Full Page Handwriting Recognition via Image to Sequence Extraction

03/11/2021
by Sumeet S. Singh, et al.

We present a neural-network-based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Because it is based on an image-to-sequence architecture, it can be trained to extract the text present in an image and sequence it correctly without imposing any constraints on language, shape of characters, or orientation and layout of text and non-text. The model can also be trained to generate auxiliary markup related to formatting, layout and content. We use a character-level token vocabulary, thereby supporting proper nouns and terminology of any subject. The model achieves a new state of the art in full-page recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers (a dataset beset with curved and slanted lines, drawings, tables, math, chemistry and other symbols), it performs better than all commercially available HTR APIs. It is deployed in production as part of a commercial web application.
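
To illustrate why a character-level token vocabulary can cover proper nouns and subject-specific terminology, the following is a minimal sketch in Python. It is not the authors' implementation: the special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the helper names `build_vocab`, `encode` and `decode` are assumptions made for this example only. Any word spelled with characters seen during vocabulary construction can be encoded, even if the word itself never appeared.

```python
from typing import Dict, List

# Hypothetical special tokens, chosen for illustration only.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character in `texts` to an integer id."""
    specials = [PAD, SOS, EOS, UNK]
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(specials + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into ids, wrapped in start and end markers."""
    unk = vocab[UNK]
    return [vocab[SOS]] + [vocab.get(ch, unk) for ch in text] + [vocab[EOS]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert `encode`, dropping pad/start/end markers (unknowns stay visible)."""
    inverse = {i: tok for tok, i in vocab.items()}
    skipped = {PAD, SOS, EOS}
    return "".join(inverse[i] for i in ids if inverse[i] not in skipped)


vocab = build_vocab(["Schrödinger equation", "NaCl + H2O"])
ids = encode("Hilda", vocab)   # "Hilda" is not in the corpus, but its characters are
text = decode(ids, vocab)
```

A word-level vocabulary would have to map the unseen word to a single unknown token, whereas here only a character that was never seen (for example `Z` in this toy corpus) falls back to `<unk>`.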
