A Scalable Handwritten Text Recognition System

04/19/2019
by   R. Reeve Ingle, et al.

Many studies on offline Handwritten Text Recognition (HTR) systems have focused on building state-of-the-art models for line recognition on small corpora. However, adding HTR capability to a large-scale multilingual OCR system poses new challenges. This paper addresses three problems in building such systems: data, efficiency, and integration. First, one of the biggest challenges is obtaining sufficient amounts of high-quality training data. We address this problem by using online handwriting data collected for a large-scale production online handwriting recognition system. We describe our image data generation pipeline and study how online data can be used to build HTR models. We show that the data improve the models significantly when only a small number of real images is available, which is usually the case for HTR models. This enables us to support a new script at substantially lower cost. Second, we propose a line recognition model based on neural networks without recurrent connections. The model achieves accuracy comparable to that of LSTM-based models while allowing for better parallelism in training and inference. Finally, we present a simple way to integrate HTR models into an OCR system. Together, these constitute a solution for bringing HTR capability into a large-scale OCR system.
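
The abstract mentions generating images from online handwriting data but does not give the pipeline's details. The following is only a minimal sketch of the general idea, assuming online handwriting is stored as a list of strokes, each a list of (x, y) pen positions. The names `render_strokes` and `draw_line`, the fixed image height, and the padding are hypothetical choices for illustration, not the authors' method, which presumably also handles stroke width, degradation, and backgrounds.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def draw_line(img: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> None:
    """Mark the pixels on the segment (x0, y0)-(x1, y1) with Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        img[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            if x0 == x1:
                break
            err += dy
            x0 += sx
        if e2 <= dx:
            if y0 == y1:
                break
            err += dx
            y0 += sy


def render_strokes(strokes: List[Stroke], height: int = 32, pad: int = 2) -> List[List[int]]:
    """Rasterize pen strokes into a binary image (1 = ink) of the given height.

    The ink is scaled uniformly so its vertical extent fills the image minus
    the padding; the width follows from the aspect ratio of the ink.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    span_y = (max(ys) - min_y) or 1.0  # avoid dividing by zero for flat ink
    scale = (height - 1 - 2 * pad) / span_y
    width = int(round((max(xs) - min_x) * scale)) + 1 + 2 * pad
    img = [[0] * width for _ in range(height)]

    def to_pixel(p: Point) -> Tuple[int, int]:
        return (int(round((p[0] - min_x) * scale)) + pad,
                int(round((p[1] - min_y) * scale)) + pad)

    for stroke in strokes:
        pixels = [to_pixel(p) for p in stroke]
        if len(pixels) == 1:  # a single dot, e.g. the tittle of an "i"
            x, y = pixels[0]
            img[y][x] = 1
        for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
            draw_line(img, x0, y0, x1, y1)
    return img


if __name__ == "__main__":
    # Two crossing strokes forming an "X".
    image = render_strokes([[(0, 0), (10, 10)], [(0, 10), (10, 0)]], height=16)
    for row in image:
        print("".join("#" if v else "." for v in row))
```

Running the script prints a small ASCII rendering of the two crossing strokes. A line image produced this way could then be paired with the text label that accompanies the online sample.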


research
10/01/2019

A Computationally Efficient Pipeline Approach to Full Page Offline Handwritten Text Recognition

Offline handwriting recognition with deep neural networks is usually lim...
research
04/17/2018

Synthetic data generation for Indic handwritten text recognition

This paper presents a novel approach to generate synthetic dataset for h...
research
03/01/2019

Adversarial Generation of Handwritten Text Images Conditioned on Sequences

State-of-the-art offline handwriting text recognition systems tend to us...
research
03/11/2022

Preliminary experiments on automatic gender recognition based on online capital letters

In this paper we present some experiments to automatically classify onli...
research
08/18/2020

EASTER: Efficient and Scalable Text Recognizer

Recent progress in deep learning has led to the development of Optical C...
research
04/13/2020

Embedded Large-Scale Handwritten Chinese Character Recognition

As handwriting input becomes more prevalent, the large symbol inventory ...
research
05/04/2023

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Recent advancements in Deep Learning-based Handwritten Text Recognition ...
