Development of a multi-user handwriting recognition system using Tesseract open source OCR engine

03/30/2010
by Sandip Rakshit, et al.

The objective of the paper is to recognize handwritten samples of lower-case Roman script using the Tesseract open source Optical Character Recognition (OCR) engine, released under Apache License 2.0. Handwritten data samples containing isolated and free-flow text were collected from different users. Tesseract is trained with user-specific data samples of both categories of document pages to generate separate user models, each representing a unique language set. Each such language set recognizes isolated and free-flow handwritten test samples collected from the designated user. On a three-user model, the system is trained with 1844, 1535 and 1113 isolated handwritten character samples collected from three different users, and performance is tested on 1133, 1186 and 1204 character samples collected from the test sets of the three users, respectively. The user-specific character-level accuracies were obtained as 87.92 [...] of the system is observed as 78.39 [...] characters and erroneously classifies 10.65 [...]
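As a rough illustration of the arithmetic behind the figures quoted in the abstract, the sketch below totals the per-user training and test sample counts and shows one plausible way to pool per-user accuracies into an overall character-level accuracy. The pooling rule (a mean weighted by each user's test-set size) and the function name `overall_accuracy` are assumptions made for illustration; the abstract does not state how the overall figure was computed.

```python
# Sample counts quoted in the abstract, in user order (user 1, 2, 3).
TRAIN_COUNTS = [1844, 1535, 1113]
TEST_COUNTS = [1133, 1186, 1204]


def overall_accuracy(per_user_accuracy, test_counts):
    """Pool per-user accuracies (in percent) into one overall figure.

    Assumption: the overall accuracy is the fraction of correctly
    recognized characters over all users' test characters, i.e. a
    mean weighted by test-set size. This is a hypothetical helper,
    not the paper's stated method.
    """
    if len(per_user_accuracy) != len(test_counts):
        raise ValueError("one accuracy value per user is required")
    total = sum(test_counts)
    if total == 0:
        raise ValueError("test_counts must not sum to zero")
    correct = sum(a / 100.0 * n for a, n in zip(per_user_accuracy, test_counts))
    return 100.0 * correct / total


total_train = sum(TRAIN_COUNTS)  # characters used for training, all users
total_test = sum(TEST_COUNTS)    # characters used for testing, all users
```

Because the three test sets are of similar size, a weighted mean of this kind would stay close to the plain average of the per-user accuracies.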

research
03/30/2010

Development of a Multi-User Recognition Engine for Handwritten Bangla Basic Characters and Digits

The objective of the paper is to recognize handwritten samples of basic ...
research
03/30/2010

Recognition of Handwritten Roman Script Using Tesseract Open source OCR Engine

In the present work, we have used Tesseract 2.01 open source Optical Cha...
research
03/30/2010

Recognition of Handwritten Textual Annotations using Tesseract Open Source OCR Engine for information Just In Time (iJIT)

Objective of the current work is to develop an Optical Character Recogni...
research
03/30/2010

Recognition of handwritten Roman Numerals using Tesseract open source OCR engine

The objective of the paper is to recognize handwritten samples of Roman ...
research
08/17/2013

Development of Comprehensive Devnagari Numeral and Character Database for Offline Handwritten Character Recognition

In handwritten character recognition, benchmark database plays an import...
research
03/13/2021

uTHCD: A New Benchmarking for Tamil Handwritten OCR

Handwritten character recognition is a challenging research in the field...
research
04/28/2019

TMIXT: A process flow for Transcribing MIXed handwritten and machine-printed Text

Handling large corpuses of documents is of significant importance in man...
