The Learnable Typewriter: A Generative Approach to Text Line Analysis

02/03/2023
by Ioannis Siglidis, et al.

We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements, called sprites. Our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real text images, and that they can be trained with weak supervision, is significant progress. Second, we demonstrate the potential of our method for new applications, specifically in paleography, which studies the history and variations of handwriting, and in cipher analysis. We evaluate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher, and historical handwritten charters from the 12th and early 13th centuries.
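To make the notion of reconstructing a line from sprites concrete, here is a minimal sketch in Python with NumPy. It is not the authors' model: in the paper the sprites are learned and a network decides how they are placed, whereas here the sprites, their masks, and their positions are fixed by hand. The names `compose_line` and `reconstruction_error` are hypothetical and were chosen only for illustration.

```python
import numpy as np

def compose_line(sprites, masks, sequence, positions, height, width, background=1.0):
    """Render a grayscale line by layering sprites onto a blank background.

    Illustrative assumption: every sprite has the full line height, and each
    sprite fits inside the canvas at its given horizontal position.
    """
    canvas = np.full((height, width), background, dtype=float)
    for idx, x in zip(sequence, positions):
        sprite, mask = sprites[idx], masks[idx]
        w = sprite.shape[1]
        region = canvas[:, x:x + w]
        # Alpha compositing: the mask says where the sprite covers the canvas.
        canvas[:, x:x + w] = mask * sprite + (1.0 - mask) * region
    return canvas

def reconstruction_error(canvas, target):
    """Mean squared error between a rendered line and a target image."""
    return float(np.mean((canvas - target) ** 2))

# Two hand-made 4x3 sprites: a solid block and a vertical bar, both in black ink (0.0).
ink = np.zeros((4, 3))
block_mask = np.ones((4, 3))
bar_mask = np.zeros((4, 3))
bar_mask[:, 1] = 1.0

sprites = [ink, ink]
masks = [block_mask, bar_mask]

# Place the block at column 0 and the bar at column 4 on a white 4x8 line.
line = compose_line(sprites, masks, sequence=[0, 1], positions=[0, 4], height=4, width=8)
blank = np.ones((4, 8))
error_vs_blank = reconstruction_error(line, blank)
```

In a learned setting, an error of this kind would be minimized with respect to the sprites and their placement, so that a small set of sprites ends up explaining the characters in the document. That training loop is deliberately left out of this sketch.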


research
03/19/2020

Unsupervised text line segmentation

We present an unsupervised text line segmentation method that is inspire...
research
05/19/2021

Unsupervised learning of text line segmentation by differentiating coarse patterns

Despite recent advances in the field of supervised deep learning for tex...
research
09/23/2018

Learning to Read by Spelling: Towards Unsupervised Text Recognition

This work presents a method for visual text recognition without using an...
research
09/06/2023

Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation

On-line handwritten character segmentation is often associated with hand...
research
03/16/2021

Combining Morphological and Histogram based Text Line Segmentation in the OCR Context

Text line segmentation is one of the pre-stages of modern optical charac...
research
04/12/2022

Content and Style Aware Generation of Text-line Images for Handwriting Recognition

Handwritten Text Recognition has achieved an impressive performance in p...
research
07/19/2022

You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine

Layout Analysis (the identification of zones and their classification) i...
