LexiconNet: An End-to-End Handwritten Paragraph Text Recognition System

05/23/2022
by   Lalita Kumari, et al.

Historical documents held in libraries need to be digitised. The recognition of these unconstrained cursive handwritten documents is a challenging task. In the present work, a neural network based classifier is used. The recognition of scanned document images, which are easy to train on with neural network based systems, is usually done by a two-step approach: segmentation followed by recognition. This approach has several shortcomings, including the identification of text regions, the analysis of layout diversity within pages, and ground truth segmentation. These processes are prone to errors that create a bottleneck in recognition accuracy. Thus, in this study, an end-to-end paragraph recognition system is presented with internal line segmentation and a lexicon decoder as a post-processing step, which is free from those errors. We name our model LexiconNet. In LexiconNet, given a paragraph image, a combination of convolutional and depth-wise separable convolutional modules generates a two-dimensional feature map of the image. The attention module is responsible for internal line segmentation, so that a page is processed line by line. At the decoding step, we add a connectionist temporal classification based word beam search decoder as a post-processing step. Our approach reports state-of-the-art results on standard datasets. The reported character error rate is 3.24 1.13 improvement from existing literature and the word error rate is 8.29 dataset with 43.02 and 7.35 results. The character error rate and word error rate reported in this work surpass the results reported in the literature.
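
The abstract reports results as character error rate (CER) and word error rate (WER). As a minimal sketch of how such metrics are commonly computed, the snippet below uses the Levenshtein edit distance normalised by the reference length. The function names `edit_distance`, `cer` and `wer` are illustrative, and the per-sample normalisation and whitespace tokenisation are assumptions; the abstract does not state the exact evaluation protocol used for LexiconNet.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words.

    Assumption: words are separated by whitespace.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))
    print(wer("the quick brown fox", "the quick brown dog"))
```

Both functions return a fraction; multiplying by 100 gives a percentage. Dataset-level figures are often obtained by summing edit distances and reference lengths over all samples before dividing, which can differ from averaging per-sample rates.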

