Devnagari document segmentation using histogram approach

09/06/2011
by Vikas J. Dongre, et al.

Document segmentation is one of the critical phases in machine recognition of any language. The accuracy of a character recognition technique depends on correct segmentation of individual symbols. Segmentation decomposes an image of a sequence of characters into sub-images of individual symbols by first segmenting lines and then words. Devnagari is the most popular script in India and is used for writing Hindi, Marathi, Sanskrit, and Nepali. Moreover, Hindi is the third most popular language in the world. Devnagari documents consist of vowels, consonants, and various modifiers, which makes proper segmentation of a Devnagari word challenging. This paper proposes a simple histogram-based approach to segmenting Devnagari documents and discusses various challenges in the segmentation of Devnagari script.
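The abstract does not spell out the algorithm, so the following is only a minimal sketch of a generic histogram (projection profile) approach, not the authors' implementation. It assumes a binarized page stored as a list of rows with 1 for ink and 0 for background. The helper names `projection_profile`, `find_runs`, and `segment_lines_and_words`, and the `word_gap` threshold, are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumed input format: binary image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]
Run = Tuple[int, int]  # half-open (start, end) index range


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def find_runs(profile: List[int], min_gap: int = 1) -> List[Run]:
    """Return the ranges where the histogram is non-zero.

    Runs separated by fewer than `min_gap` empty positions are merged, so a
    narrow gap inside a word is not mistaken for a gap between words.
    """
    runs: List[Run] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at an empty position
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Run] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment_lines_and_words(image: Image, word_gap: int = 2):
    """Split a page into text lines (row histogram), then words (column histogram)."""
    result = []
    for top, bottom in find_runs(projection_profile(image, "rows")):
        line = image[top:bottom]
        words = find_runs(projection_profile(line, "cols"), min_gap=word_gap)
        result.append(((top, bottom), words))
    return result


# Tiny synthetic page: two text lines separated by an empty row.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
segments = segment_lines_and_words(page, word_gap=2)
```

On this toy page the first line splits into two words because the gap between them is two columns wide, while the one-column gap in the second line falls below `word_gap` and is merged into a single word. A real scan would need binarization and noise handling first, and the gap threshold would have to be tuned to the font and resolution.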

