Boosting offline handwritten text recognition in historical documents with few labeled lines

12/04/2020
by José Carlos Aradillas, et al.

In this paper, we address the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of the training labels contain errors. We make three main contributions. First, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, identifying which layers of the model need fine-tuning. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, we propose an algorithm to mitigate the effects of incorrect labels in the training set. The methods are evaluated on the ICFHR 2018 competition database and on the Washington and Parzival databases. Combining all these techniques, we demonstrate a remarkable reduction of the character error rate (CER) (up to 6
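The results above are reported as character error rate (CER). For reference, a minimal sketch of the standard definition follows, assuming CER is the total Levenshtein edit distance between recognized and ground-truth lines divided by the total number of ground-truth characters. The paper's exact normalization is not stated in this abstract, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `ref` into `hyp` (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to ""
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when chars match)
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER (assumed definition): total edit distance over
    all lines divided by the total number of reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference line")
    errors = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars


# Example: "kitten" -> "sitting" needs 3 edits over 6 reference characters.
print(cer(["kitten"], ["sitting"]))
```

Summing errors and characters over the whole set, rather than averaging per-line CER, keeps short lines from dominating the metric. This is a common convention but an assumption here. Multiply by 100 to express the value as a percentage.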

research
10/02/2022

DARE: A large-scale handwritten date recognition system

Handwritten text recognition for historical documents is an important ta...
research
11/10/2018

Handwriting Recognition of Historical Documents with few labeled data

Historical documents present many challenges for offline handwriting rec...
research
05/04/2023

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Recent advancements in Deep Learning-based Handwritten Text Recognition ...
research
04/04/2018

Boosting Handwriting Text Recognition in Small Databases with Transfer Learning

In this paper we deal with the offline handwriting text recognition (HTR...
research
10/24/2016

Record Counting in Historical Handwritten Documents with Convolutional Neural Networks

In this paper, we investigate the use of Convolutional Neural Networks f...
research
12/15/2022

The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts

Identifying the production dates of historical manuscripts is one of the...
research
12/15/2014

CITlab ARGUS for historical data tables

We describe CITlab's recognition system for the ANWRESH-2014 competition...
