Handwriting Recognition of Historical Documents with few labeled data

11/10/2018
by   Edgard Chammas, et al.

Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train a handwritten text recognition (HTR) system, yet in some scenarios transcripts are available only at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with few labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) system on only 10% of the manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme that accounts for the variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR2017 competition.

research
12/04/2020

Boosting offline handwritten text recognition in historical documents with few labeled lines

In this paper, we face the problem of offline handwritten text recogniti...
research
10/02/2022

DARE: A large-scale handwritten date recognition system

Handwritten text recognition for historical documents is an important ta...
research
03/16/2021

Digital Peter: Dataset, Competition and Handwriting Recognition Methods

This paper presents a new dataset of Peter the Great's manuscripts and d...
research
12/15/2014

CITlab ARGUS for historical data tables

We describe CITlab's recognition system for the ANWRESH-2014 competition...
research
01/27/2016

Font Identification in Historical Documents Using Active Learning

Identifying the type of font (e.g., Roman, Blackletter) used in historic...
research
02/09/2018

A Two-Stage Method for Text Line Detection in Historical Documents

This work presents a two-stage text line detection method for historical...
research
03/21/2022

Transformer-based HTR for Historical Documents

We apply the TrOCR framework to real-world, historical manuscripts and s...
