Statistical Learning for OCR Text Correction

11/21/2016
by Jie Mei, et al.

The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications in a text analysis pipeline. Recent OCR post-processing models significantly improve the quality of OCR-generated text, but they still tend to suggest correction candidates from limited observations while insufficiently accounting for the characteristics of OCR errors. In this paper, we show how to enlarge the candidate suggestion space by using an external corpus and by integrating OCR-specific features in a regression approach to correct OCR-generated errors. The evaluation results show that our model can correct 61.5% of the OCR errors (considering the top 3 suggestions), for cases where the theoretical correction upper bound is 78%.
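To make the idea of ranking correction candidates concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical lexicon (`LEXICON`) standing in for the external corpus, two illustrative features (string similarity and corpus frequency), and hand-set weights (`WEIGHTS`) standing in for coefficients that a regression model would learn from training data. The OCR-specific features mentioned in the abstract are not modeled here.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


# Hypothetical unigram counts standing in for an external corpus.
LEXICON = {"modern": 120, "modem": 15, "model": 300, "mode": 80}

# Hypothetical weights; a regression approach would learn these instead.
WEIGHTS = {"similarity": 0.7, "frequency": 0.3}


def features(error: str, candidate: str) -> dict:
    """Compute illustrative features for one (error, candidate) pair."""
    dist = levenshtein(error, candidate)
    similarity = 1.0 - dist / max(len(error), len(candidate))
    total = sum(LEXICON.values())
    frequency = math.log(1 + LEXICON[candidate]) / math.log(1 + total)
    return {"similarity": similarity, "frequency": frequency}


def suggest(error: str, max_dist: int = 2, top_k: int = 3) -> list:
    """Return up to top_k lexicon words within max_dist edits, best first."""
    scored = []
    for cand in LEXICON:
        if levenshtein(error, cand) <= max_dist:
            f = features(error, cand)
            score = sum(WEIGHTS[name] * f[name] for name in WEIGHTS)
            scored.append((score, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:top_k]]
```

Calling `suggest("modcl")` ranks lexicon words within two edits of the erroneous token. A larger lexicon widens the candidate space, and the weighted sum is where additional features would enter.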

