A Two-Stage Method for Text Line Detection in Historical Documents

02/09/2018
by Tobias Grüning, et al.

This work presents a two-stage text line detection method for historical documents. In the first stage, a deep neural network called ARU-Net assigns each pixel to one of three classes: baseline, separator, or other. The separator class marks the beginning and end of each text line. The ARU-Net can be trained from scratch with a manageably small number of manually annotated example images (fewer than 50), which is made possible by data augmentation strategies. The network predictions serve as input to the second stage, which performs a bottom-up clustering to build baselines. The method handles complex layouts as well as curved and arbitrarily oriented text lines, and it substantially outperforms current state-of-the-art approaches. For example, on the complex track of the cBAD: ICDAR2017 Competition on Baseline Detection, the F-value increases from 0.859 to 0.922. The framework to train and run the ARU-Net is open source.
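The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the first stage yields three scores per pixel, and it substitutes a plain 8-connected grouping of baseline pixels for the paper's bottom-up clustering, which the abstract does not specify. The class indices and the function names `label_pixels` and `cluster_baselines` are hypothetical.

```python
from collections import deque

# Assumed class indices; the abstract names the classes but not their order.
BASELINE, SEPARATOR, OTHER = 0, 1, 2


def label_pixels(probs):
    """Stage 1 output handling: pick the highest-scoring class per pixel.

    probs[y][x] is a sequence of three scores (baseline, separator, other).
    """
    return [[max(range(3), key=lambda c: px[c]) for px in row] for row in probs]


def cluster_baselines(labels):
    """Simplified stand-in for stage 2: group 8-connected baseline pixels.

    Separator pixels are not baseline pixels, so they split adjacent lines.
    Returns a list of lines, each a sorted list of (x, y) points.
    """
    height = len(labels)
    width = len(labels[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    lines = []
    for y in range(height):
        for x in range(width):
            if labels[y][x] != BASELINE or seen[y][x]:
                continue
            # Breadth-first search over neighbouring baseline pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            component = []
            while queue:
                cy, cx = queue.popleft()
                component.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and not seen[ny][nx] and labels[ny][nx] == BASELINE:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            lines.append(sorted(component))
    return lines
```

In this toy setting, a row of baseline pixels interrupted by a single separator pixel yields two separate lines, which mirrors the role the abstract gives the separator class.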


