Rerunning OCR: A Machine Learning Approach to Quality Assessment and Enhancement Prediction

10/04/2021
by Pit Schneider, et al.

Each new and improved OCR solution forces a decision about which documents are the right candidates for reprocessing. This is especially true when the underlying collection is large and diverse in fonts, languages, and periods of publication, and consequently in OCR quality. This article describes the efforts of the National Library of Luxembourg to support those decisions, which are crucial for keeping computational overhead low and reducing the risk of quality degradation while making OCR improvement more quantifiable. In particular, this work explains the library's methodology for quality assessment at the text block level. As an extension of this technique, a further contribution is a regression model that takes the enhancement potential of a new OCR engine into account. Both are promising approaches, especially for cultural institutions dealing with historical data of lower quality.
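
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the library's actual method. It assumes a dictionary-lookup ratio as the block-level quality score and a single-feature least-squares line as the regression model; the abstract specifies neither. All function names, the choice of feature, and the `min_gain` threshold are hypothetical.

```python
import re


def dictionary_score(text, vocabulary):
    """Assumed quality proxy: share of alphabetic tokens found in a word list."""
    # [^\W\d_]+ matches runs of Unicode letters only (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one feature.

    Assumes at least two distinct x values, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def select_candidates(blocks, vocabulary, model, min_gain):
    """Return ids of text blocks whose predicted quality gain reaches min_gain.

    `blocks` maps a block id to its current OCR text; `model` is the (a, b)
    pair returned by fit_line, trained on (old score, observed gain) pairs.
    """
    a, b = model
    selected = []
    for block_id, text in blocks.items():
        predicted_gain = a * dictionary_score(text, vocabulary) + b
        if predicted_gain >= min_gain:
            selected.append(block_id)
    return selected
```

In such a setup, `fit_line` would be trained on a sample of blocks that were processed by both the old and the new engine, and `select_candidates` would then restrict reprocessing to blocks where the predicted gain justifies the computational cost.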

research
02/20/2022

The Loop Game: Quality Assessment and Optimization for Low-Light Image Enhancement

There is an increasing consensus that the design and optimization of low...
research
01/29/2020

A Scalable Framework for Quality Assessment of RDF Datasets

Over the last years, Linked Data has grown continuously. Today, we count...
research
04/20/2022

A Survey of Video-based Action Quality Assessment

Human action recognition and analysis have great demand and important ap...
research
08/19/2021

More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

Non-intrusive speech quality assessment is a crucial operation in multim...
research
04/18/2019

Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data

In this paper we investigate into the problem of image quality assessmen...
research
08/19/2022

Applying Back Propagation Algorithm and Analytic Hierarchy Process to Environment Assessment

This paper designs a new and scientific environmental quality assessment...
research
03/16/2021

Combining Morphological and Histogram based Text Line Segmentation in the OCR Context

Text line segmentation is one of the pre-stages of modern optical charac...
