A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples

07/04/2019
by Lalitha Giridhar, et al.

Recognition of ancient Tamil characters has always been a challenge for epigraphers, primarily because the language has evolved over several centuries, during which the character set has both expanded and diversified. This work focuses on improving optical character recognition (OCR) techniques for the ancient Tamil script that was in use between the 7th and 12th centuries. Because comprehensively curating a functional data set of ancient Tamil characters is an arduous task, the data set used here was curated, as a case study, from cropped images of characters found on certain temple inscriptions of this period. After the image is binarized with Otsu's thresholding method, a two-dimensional convolutional neural network is defined and trained to classify and recognize the ancient Tamil characters. To implement the OCR techniques, the neural network is linked to Tesseract through the pytesseract library for Python. As an added feature, the work also incorporates Google's text-to-speech engine to produce an audio output of the digitized text. Various samples of both modern and ancient Tamil were collected and passed through the system. The results indicate that, for the Tamil inscriptions studied over the considered period, a combined efficiency of 77.7 percent can be achieved.
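
The binarization step named in the abstract can be illustrated with a minimal sketch of Otsu's method. The abstract does not say which implementation the authors used, so this is not their code: it is a pure-Python version written only to show what the method computes, and the names `otsu_threshold` and `binarize`, the 8-bit grayscale input, and the "1 means brighter than the threshold" polarity are all assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `pixels` is a flat iterable of integers in [0, levels).
    """
    pixels = list(pixels)

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image, threshold):
    """Map a 2D list of gray levels to 0/1 (assumed polarity: 1 if > threshold)."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


# Hypothetical 2x2 "image": two dark pixels and two bright ones.
image = [[10, 200],
         [12, 210]]
t = otsu_threshold(value for row in image for value in row)
binary = binarize(image, t)
```

For stone inscriptions, whether the carved strokes end up as 0 or 1 depends on lighting and on how the photograph was taken, so the polarity chosen in `binarize` may need to be inverted before the image is passed to a classifier.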

research
11/19/2012

Artificial Neural Network Based Optical Character Recognition

Optical Character Recognition deals in recognition and classification of...
research
03/30/2010

Recognition of Handwritten Roman Script Using Tesseract Open source OCR Engine

In the present work, we have used Tesseract 2.01 open source Optical Cha...
research
11/10/2020

On-Device Language Identification of Text in Images using Diacritic Characters

Diacritic characters can be considered as a unique set of characters pro...
research
08/12/2022

Character decomposition to resolve class imbalance problem in Hangul OCR

We present a novel approach to OCR (Optical Character Recognition) of Kor...
research
03/27/2022

Benchmarking Algorithms for Automatic License Plate Recognition

We evaluated a lightweight Convolutional Neural Network (CNN) called LPR...
research
05/17/2021

Unknown-box Approximation to Improve Optical Character Recognition Performance

Optical character recognition (OCR) is a widely used pattern recognition...
research
08/21/2012

An Online Character Recognition System to Convert Grantha Script to Malayalam

This paper presents a novel approach to recognize Grantha, an ancient sc...
