Denoising and Segmentation of Epigraphical Scripts

07/25/2021
by P Preethi, et al.

This paper presents a new method for denoising images using Haralick features and then segmenting the characters using artificial neural networks. The image is divided into kernels, each of which is converted to a GLCM (Gray Level Co-Occurrence Matrix). A Haralick feature generation function is called on each GLCM and returns an array of fourteen elements, one per feature. The Haralick values and the corresponding noise/text classification form a dictionary, which is then used to denoise the image through kernel comparison. Segmentation is the process of extracting characters from a document and can be used when letters are separated by white space, which is an explicit boundary marker. Segmentation is the first step in many Natural Language Processing problems. This paper explores segmentation using neural networks. While numerous methods exist for segmenting the characters of a document, this paper is concerned only with the accuracy of doing so using neural networks. Characters must be segmented correctly, because errors at this stage lead to incorrect recognition by Natural Language Processing tools. Artificial neural networks were used to attain an accuracy of up to 89% where the characters are delimited by white space. However, this method fails to provide acceptable results when the language heavily uses connected letters. An example is the Devanagari script, which is predominantly used in northern India.
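
To make the kernel-to-GLCM-to-features step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It builds a normalized, symmetric GLCM for one kernel and computes four of the classic Haralick features rather than all fourteen. The function names `glcm` and `haralick_subset`, the horizontal one-pixel offset, and the assumption that pixel values are already quantized to integers in `0..levels-1` are all illustrative choices that the abstract does not specify.

```python
import math


def glcm(kernel, levels, dx=1, dy=0):
    """Return a normalized, symmetric co-occurrence matrix for one kernel.

    kernel: 2D list of ints in range(levels) (assumed pre-quantized).
    dx, dy: pixel offset between the two pixels of a pair (assumed: right neighbour).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(kernel), len(kernel[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = kernel[r][c], kernel[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Compute four Haralick features from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "angular_second_moment": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "inverse_difference_moment": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# A uniform kernel versus a checkerboard kernel, both with two gray levels.
flat = haralick_subset(glcm([[0, 0, 0], [0, 0, 0], [0, 0, 0]], levels=2))
checker = haralick_subset(glcm([[0, 1], [1, 0]], levels=2))
print(flat)
print(checker)
```

In a pipeline like the one described, each kernel's feature values would be paired with its noise/text label to form the dictionary used for kernel comparison. How kernels are compared against that dictionary is not stated in the abstract, so it is left out of this sketch.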

