Estimating Post-OCR Denoising Complexity on Numerical Texts

07/03/2023
by Arthur Hemmer, et al.

Post-OCR processing has improved significantly over the past few years. However, these improvements have primarily benefited texts consisting of natural, alphabetical words, as opposed to documents of a numerical nature such as invoices, payslips, and medical certificates. To evaluate the OCR post-processing difficulty of such documents, we propose a method to estimate the denoising complexity of a text, evaluate it on several datasets of varying nature, and show that texts of a numerical nature are at a significant disadvantage. We validate our estimator by comparing its complexity ranking against the error rates of modern denoising approaches.
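The abstract does not say which error metric or ranking statistic the authors use when comparing their complexity ranking with the error rates of denoisers, and it does not describe the estimator itself. As a minimal sketch of that validation step only, assuming character error rate (CER: Levenshtein edit distance divided by reference length) as the error measure and Spearman's rank correlation as the agreement statistic, a check could look like the following. All function names are hypothetical; this is not the authors' estimator or evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate of a (denoised) prediction against the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(prediction, reference) / len(reference)


def rank(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

For example, a single O/0 confusion in a three-character number gives `cer("1O3", "103") == 1/3`. Given one estimated complexity score per dataset and one mean CER per dataset, a `spearman` value close to 1 would indicate that the estimator orders the datasets the same way the denoisers' error rates do.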
