End-To-End Measure for Text Recognition

08/26/2019
by Gundram Leifert, et al.

Measuring the performance of text recognition and text line detection engines is an important step in objectively comparing systems and their configurations. Well-established measures exist for each task separately. However, there is no sophisticated evaluation scheme for measuring the quality of a combined text line detection and text recognition system. The F-measure on word level is a well-known methodology that is sometimes used in this context. Nevertheless, it does not take into account the alignment of hypothesis and ground truth text and can lead to deceptive results. Since users of automatic information retrieval pipelines in the context of text recognition are mainly interested in the end-to-end performance of a given system, there is a strong need for such a measure. Hence, we present a measure to evaluate the quality of an end-to-end text recognition system. The basis for this measure is the well-established and widely used character error rate, which, in its original form, is limited to aligned hypothesis and ground truth texts. The proposed measure is flexible in that it can be configured to penalize different reading orders between the hypothesis and ground truth and can take into account the geometric position of the text lines. Additionally, it can ignore over- and under-segmentation of text lines. With these parameters, the measure can be tailored to the needs of a given application.
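
To make the two baseline quantities named in the abstract concrete, the following is a minimal Python sketch of the classical character error rate (edit distance divided by ground truth length) and of a bag-of-words F-measure on word level. It is not the measure proposed in the paper. The function names `levenshtein`, `cer` and `word_f_measure`, as well as the whitespace tokenization, are assumptions made for illustration only.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, ground_truth: str) -> float:
    """Classical character error rate for an already aligned text pair."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(hypothesis, ground_truth) / len(ground_truth)


def word_f_measure(hypothesis: str, ground_truth: str) -> float:
    """Bag-of-words F-measure; word order is ignored (illustrative assumption)."""
    hyp = Counter(hypothesis.split())
    gt = Counter(ground_truth.split())
    matches = sum((hyp & gt).values())  # multiset intersection
    if matches == 0:
        return 0.0
    precision = matches / sum(hyp.values())
    recall = matches / sum(gt.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Same words in a different order: the bag-of-words score cannot see it.
    print(word_f_measure("b a", "a b"))
    print(cer("b a", "a b"))
```

In this sketch, a hypothesis that contains the right words in the wrong order still receives a perfect bag-of-words score, while the character error rate of the same pair is nonzero. This is one way to read the abstract's remark that the word-level F-measure ignores alignment, and it also shows why the plain character error rate presupposes an aligned text pair.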
