OCR Error Correction Using Character Correction and Feature-Based Word Classification

04/21/2016
by Ido Kissos, et al.

This paper explores the use of a learned classifier for post-OCR text correction. Experiments on Arabic show that this approach, which integrates a weighted confusion matrix and a shallow language model, corrects the vast majority of segmentation and recognition errors, the most frequent error types in our dataset.
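
As a rough illustration of how a weighted confusion matrix and a shallow language model can be combined to rank correction candidates, here is a minimal sketch. It is not the authors' implementation: the confusion weights, the toy unigram counts, the 0.9 "read correctly" prior, and the function names `candidates`, `lm_prob`, and `correct` are all assumptions made for the example. It uses Latin-script words for readability, although the paper's experiments are on Arabic, and it replaces the learned classifier with a fixed sum of log-probabilities.

```python
import math

# Hypothetical confusion weights: (OCR output, intended text) -> probability.
# The values are illustrative only, not taken from the paper.
CONFUSION = {
    ("rn", "m"): 0.05,  # OCR produced "rn" where the source had "m"
    ("l", "i"): 0.03,
    ("0", "o"): 0.04,
}

# Toy unigram counts standing in for a shallow language model (assumed data).
UNIGRAMS = {"modern": 50, "modem": 5, "time": 80}
TOTAL = sum(UNIGRAMS.values())


def candidates(word):
    """Yield (candidate, channel probability) pairs.

    The word itself is kept with an assumed 0.9 probability of being correct;
    every other candidate applies exactly one confusion substitution.
    """
    yield word, 0.9
    for (seen, truth), prob in CONFUSION.items():
        start = word.find(seen)
        while start != -1:
            yield word[:start] + truth + word[start + len(seen):], prob
            start = word.find(seen, start + 1)


def lm_prob(word):
    """Unigram probability with add-one smoothing for unseen words."""
    return (UNIGRAMS.get(word, 0) + 1) / (TOTAL + len(UNIGRAMS) + 1)


def correct(word):
    """Return the candidate with the best combined log score."""
    def score(pair):
        cand, channel_p = pair
        return math.log(channel_p) + math.log(lm_prob(cand))

    return max(candidates(word), key=score)[0]


if __name__ == "__main__":
    for token in ["tlme", "m0dern", "modern"]:
        print(token, "->", correct(token))
```

In a learned-classifier setting such as the one the abstract describes, the two log scores would more plausibly serve as input features to a trained model rather than being added with equal weight as they are here.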

research
09/27/2018

Building a Lemmatizer and a Spell-checker for Sorani Kurdish

The present paper aims at presenting a lemmatization and a word-level er...
research
07/02/2018

A Simple but Effective Classification Model for Grammatical Error Correction

We treat grammatical error correction (GEC) as a classification problem ...
research
05/11/2022

Some Grammatical Errors are Frequent, Others are Important

In Grammatical Error Correction, systems are evaluated by the number of ...
research
08/29/2023

Enhancing OCR Performance through Post-OCR Models: Adopting Glyph Embedding for Improved Correction

The study investigates the potential of post-OCR models to overcome limi...
research
08/14/2016

Numerically Grounded Language Models for Semantic Error Correction

Semantic error detection and correction is an important task for applica...
research
06/22/2021

A Simple and Practical Approach to Improve Misspellings in OCR Text

The focus of our paper is the identification and correction of non-word ...
research
03/18/2022

Towards Lithuanian grammatical error correction

Everyone wants to write beautiful and correct text, yet the lack of lang...
