Automatic text extraction and character segmentation using maximally stable extremal regions

08/11/2016
by Nitigya Sambyal, et al.

Text detection and segmentation are important prerequisites for many content-based image analysis tasks. The paper proposes a novel text extraction and character segmentation algorithm that uses Maximally Stable Extremal Regions (MSERs) as basic letter candidates. These regions are then thresholded, and the connected components of the result are determined to identify separate characters. The algorithm is tested on a set of JPEG, PNG and BMP images covering four character sets: English, Russian, Hindi and Urdu. It gives good results for the English and Russian character sets; however, character segmentation for Urdu and Hindi is considerably less accurate. The algorithm is simple and efficient, requires no training overhead, and gives good results even for low-quality images. The paper also discusses various challenges in text extraction and segmentation for multilingual inputs.
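The thresholding and connected-component stage described above can be sketched in a few lines. This is a minimal, hypothetical illustration and not the authors' code. It assumes that MSER detection has already produced a grayscale crop around a text candidate, for example via OpenCV's `cv2.MSER_create()` and `detectRegions()`. The abstract does not name the thresholding method, so Otsu's method is used here as an assumed stand-in. It also assumes dark text on a light background and 8-connectivity. The function names `otsu_threshold` and `segment_characters` and the `min_area` parameter are illustrative only.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels, values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def otsu_threshold(img: Image) -> int:
    """Assumed thresholding step: Otsu's method (maximise between-class variance)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, w_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]          # pixels with value <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixels with value > t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:       # keep the first threshold reaching the maximum
            best_var, best_t = var, t
    return best_t


def segment_characters(img: Image, min_area: int = 2) -> List[Box]:
    """Hypothetical helper: binarise a text crop, then return one bounding box
    per 8-connected dark component, ordered left to right."""
    t = otsu_threshold(img)
    mask = [[v <= t for v in row] for row in img]   # assumes dark text
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, size = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if size >= min_area:     # drop specks smaller than min_area pixels
                boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda b: b[1])
```

Calling `segment_characters` on a small crop that contains two dark glyphs on a light background returns two boxes, ordered from left to right. This design treats each connected component as one character. That assumption is plausibly one reason for the weaker Hindi and Urdu results reported above. In Devanagari, the headline stroke joins neighbouring letters. In Urdu, the script is cursive. In both cases, several characters can end up in a single component.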

research
01/11/2013

Robust Text Detection in Natural Scene Images

Text detection in natural scene images is an important prerequisite for ...
research
05/17/2013

Font Acknowledgment and Character Extraction of Digital and Scanned Images

The font recognition and character extraction is of immense importance a...
research
10/31/2018

Real-time Automatic Word Segmentation for User-generated Text

For readability and possibly for disambiguation, appropriate word segmen...
research
06/11/2018

An optimized system to solve text-based CAPTCHA

CAPTCHA(Completely Automated Public Turing test to Tell Computers and Hu...
research
08/23/2019

A BLSTM Network for Printed Bengali OCR System with High Accuracy

This paper presents a printed Bengali and English text OCR system develo...
research
07/25/2021

Character Spotting Using Machine Learning Techniques

This work presents a comparison of machine learning algorithms that are ...
research
10/01/2012

Enhanced Techniques for PDF Image Segmentation and Text Extraction

Extracting text objects from the PDF images is a challenging problem. Th...
