Unsupervised Writer Adaptation for Synthetic-to-Real Handwritten Word Recognition

09/18/2019
by Lei Kang, et al.

Handwritten Text Recognition (HTR) remains a challenging problem because it must deal with two major difficulties: the variability among writing styles and the scarcity of labelled data. To alleviate these problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, systems trained on such data produce encouraging but still inaccurate transcriptions of real handwritten words. In this paper, we propose an unsupervised writer adaptation approach that automatically adjusts a generic handwritten word recognizer, trained entirely on synthetic fonts, towards a new incoming writer. We experimentally validate our proposal on five different datasets covering several challenges: (i) the document source, with modern and historical samples that may suffer from paper degradation; (ii) handwriting style, with single-writer and multi-writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system maintains its performance; it therefore provides a practical and generic approach for handling new document collections without requiring any expensive and tedious manual annotation.



