One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

05/11/2021
by Mohamed Ali Souibgui, et al.
Low-resource Handwritten Text Recognition (HTR) is a hard problem because annotated data is scarce and linguistic information (dictionaries and language models) is very limited. This is the case, for example, for historical ciphered manuscripts, which are usually written with invented alphabets to hide their content. In this paper, we address this problem through a data generation technique based on Bayesian Program Learning (BPL). In contrast to traditional generation approaches, which require a huge amount of annotated images, our method generates human-like handwriting using only one sample of each symbol from the desired alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation-free fashion. Quantitative and qualitative analyses confirm the effectiveness of the proposed method, which achieves competitive results compared to using real annotated data.
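The line-composition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: BPL sampling is replaced here by a simple random shift of each exemplar, and the names `jitter_symbol`, `compose_line`, `max_shift`, and `gap` are hypothetical. It also assumes all exemplar glyphs are 2-D arrays of the same height.

```python
import numpy as np

def jitter_symbol(glyph, rng, max_shift=2):
    """Return a randomly shifted copy of a glyph.

    Assumption: this small translation is only a stand-in for
    BPL-based sampling of new handwriting variants.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    padded = np.pad(glyph, max_shift, mode="constant")
    h, w = glyph.shape
    y0, x0 = max_shift + dy, max_shift + dx
    return padded[y0:y0 + h, x0:x0 + w]

def compose_line(exemplars, labels, rng, gap=3):
    """Build a synthetic line image and its transcription.

    exemplars: dict mapping a symbol label to its single sample image.
    labels: sequence of symbol labels to render left to right.
    """
    pieces = []
    for label in labels:
        glyph = jitter_symbol(exemplars[label], rng)
        pieces.append(glyph)
        # Blank spacer between consecutive symbols.
        pieces.append(np.zeros((glyph.shape[0], gap), dtype=glyph.dtype))
    # Drop the trailing spacer before joining horizontally.
    return np.hstack(pieces[:-1]), list(labels)

# Toy usage with two made-up exemplars of equal height.
exemplars = {"a": np.ones((8, 6)), "b": np.ones((8, 5))}
rng = np.random.default_rng(0)
line, transcription = compose_line(exemplars, ["a", "b", "a"], rng)
```

Each generated pair (line image, transcription) could then serve as one training example for a segmentation-free recognizer, since no per-symbol bounding boxes are needed at training time.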

research
07/21/2021

Few Shots Is All You Need: A Progressive Few Shot Learning Approach for Low Resource Handwriting Recognition

Handwritten text recognition in low resource scenarios, such as manuscri...
research
06/25/2023

Weakly Supervised Scene Text Generation for Low-resource Languages

A large number of annotated training images is crucial for training succ...
research
09/26/2020

A Few-shot Learning Approach for Historical Ciphered Manuscript Recognition

Encoded (or ciphered) manuscripts are a special type of historical docum...
research
04/12/2022

Content and Style Aware Generation of Text-line Images for Handwriting Recognition

Handwritten Text Recognition has achieved an impressive performance in p...
research
08/15/2016

Generating Synthetic Data for Text Recognition

Generating synthetic images is an art which emulates the natural process...
research
04/18/2023

An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text

With the surging inclination towards carrying out tasks on computational...
research
03/05/2020

GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images

Although current image generation methods have reached impressive qualit...
