A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing

07/09/2023
by Aishik Rakshit, et al.

Optical Character Recognition (OCR) is used to digitize books and unstructured documents, and it also has applications in other domains such as mobility statistics, law enforcement, traffic, and security systems. State-of-the-art methods work well for OCR of printed text on license plates, shop names, and similar sources. However, existing techniques achieve limited accuracy on printed textbooks and handwritten text, which may be attributed to similar-looking characters and to variations in handwritten characters. Since these issues are difficult to address with OCR technology alone, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
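To make the post-processing idea concrete, the following is a minimal sketch of one possible correction step, not the method used in the paper (the abstract does not specify which NLP tools are applied). It assumes the OCR stage has already produced a text string, and it swaps in a simple dictionary lookup with fuzzy matching from Python's standard `difflib` module. The vocabulary `VOCAB`, the helper names `correct_token` and `postprocess`, and the `cutoff` value are all hypothetical choices made for illustration.

```python
import difflib
import re

# Hypothetical toy vocabulary (assumption); a real system would rely on a
# large lexicon or a language model instead.
VOCAB = ["optical", "character", "recognition", "handwritten", "printed", "text"]


def correct_token(token, vocab=VOCAB, cutoff=0.75):
    """Return the closest vocabulary word, or the token unchanged."""
    lower = token.lower()
    # Leave known words, punctuation/whitespace runs, and pure numbers alone.
    if lower in vocab or not lower.isalnum() or lower.isdigit():
        return token
    # Fuzzy match against the vocabulary; cutoff is an illustrative threshold.
    matches = difflib.get_close_matches(lower, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def postprocess(ocr_text):
    """Correct each word-like run while preserving spacing and punctuation."""
    parts = re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]+", ocr_text)
    return "".join(correct_token(part) for part in parts)


if __name__ == "__main__":
    # Typical look-alike confusions: 0 for o, 1 for i, c for e.
    print(postprocess("0ptical charactcr recogn1tion"))
```

A dictionary lookup like this handles only isolated look-alike substitutions and lowercases any word it corrects. Context-aware correction, which the phrase "NLP tools" suggests, would need a language model or similar component on top.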


