READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

05/09/2017
by Tobias Grüning, et al.

Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets because these datasets comprise either clean data or simple, homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established evaluation schemes for text line segmentation, such as the Detection Rate or Recognition Accuracy, require binarized data that is annotated at the pixel level. Producing ground truth by these means is laborious and is not needed to determine a method's quality. In this paper, we propose a new evaluation scheme that is based on baselines. The proposed scheme requires no binarization and can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm.
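
The abstract does not spell out how the baseline-based scheme is computed, so the following is only a minimal sketch of the general idea, not the scheme defined in the paper. It assumes each baseline is a polyline of at least two (x, y) points, and it scores a line by the fraction of its sampled points that lie within a fixed distance of any line in the other set. The fixed `tolerance`, the sampling `step`, and all function names are assumptions made for illustration. Because the score relies only on Euclidean distances between polylines, it needs no binarized pixel data and does not depend on line orientation.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def sample_polyline(polyline, step=5.0):
    """Sample points along a polyline, at most `step` apart (assumed value)."""
    points = [polyline[0]]
    for a, b in zip(polyline, polyline[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, math.ceil(length / step))
        for i in range(1, n + 1):
            t = i / n
            points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return points


def coverage(line, others, tolerance):
    """Fraction of sampled points of `line` within `tolerance` of any other line."""
    if not others:
        return 0.0
    points = sample_polyline(line)
    hits = 0
    for p in points:
        nearest = min(
            point_segment_distance(p, a, b)
            for other in others
            for a, b in zip(other, other[1:])
        )
        if nearest <= tolerance:
            hits += 1
    return hits / len(points)


def _mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def evaluate(ground_truth, hypotheses, tolerance=10.0):
    """Return (precision, recall, f_value) under the simplified scoring above."""
    recall = _mean(coverage(gt, hypotheses, tolerance) for gt in ground_truth)
    precision = _mean(coverage(h, ground_truth, tolerance) for h in hypotheses)
    total = precision + recall
    f_value = 2.0 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_value


# Hypothetical example: two annotated baselines, one detected baseline.
ground_truth = [[(0, 0), (100, 0)], [(0, 50), (100, 50)]]
hypotheses = [[(0, 2), (100, 2)]]
precision, recall, f_value = evaluate(ground_truth, hypotheses)
```

In this hypothetical example, the single detected line lies 2 pixels from the first annotated baseline, so precision is 1.0, while the second annotated baseline is missed, so recall is 0.5. Rotating all polylines by the same angle leaves these values unchanged.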


research
02/23/2021

Page Layout Analysis System for Unconstrained Historic Documents

Extraction of text regions and individual text lines from historic docum...
research
07/09/2019

BADAM: A Public Dataset for Baseline Detection in Arabic-script Manuscripts

The application of handwritten text recognition to historical works is h...
research
03/15/2020

Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator

The present work demonstrates a fast and improved technique for dewarpin...
research
08/26/2019

End-To-End Measure for Text Recognition

Measuring the performance of text recognition and text line detection en...
research
11/23/2017

Open Evaluation Tool for Layout Analysis of Document Images

This paper presents an open tool for standardizing the evaluation proces...
research
07/19/2022

You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine

Layout Analysis (the identification of zones and their classification) i...
research
02/09/2018

A Two-Stage Method for Text Line Detection in Historical Documents

This work presents a two-stage text line detection method for historical...
