Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator

03/15/2020
by Tanmoy Dasgupta, et al.

The present work demonstrates a fast and improved technique for dewarping nonlinearly warped document images. The images are first dewarped at the page level by estimating optimum inverse projections using curvilinear homography. The quality of the result is then estimated by evaluating a set of metrics on the text lines and rectilinear objects, measuring properties such as parallelism and orthogonality. These metrics are designed specifically to estimate the quality of the dewarping process without requiring any ground truth. If the quality is estimated to be unsatisfactory, the page-level dewarping process is repeated with finer approximations. This is followed by a line-level dewarping process that makes granular corrections to the warps in individual text lines. The methodology has been tested on the CBDAR 2007 / IUPR 2011 document image dewarping dataset, where it yields the best OCR accuracy in the shortest amount of time to date. Its usefulness has also been evaluated on the DocUNet 2018 dataset with some minor tweaks, where it produces comparable results.
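The control flow described above (page-level dewarping, a ground-truth-free quality check, repetition with finer approximations, then line-level correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two metrics (spread of baseline angles as a stand-in for parallelism, and straight-line fit residual), the thresholds, and the callables `page_dewarp`, `find_baselines`, and `line_dewarp` are all hypothetical placeholders chosen for illustration.

```python
import numpy as np


def line_metrics(baselines):
    """Return (angle_spread, mean_residual) for text-line baselines.

    Each baseline is an (N, 2) array of (x, y) points. Both metrics are
    illustrative assumptions, not the estimator defined in the paper.
    """
    angles, residuals = [], []
    for pts in baselines:
        pts = np.asarray(pts, dtype=float)
        x, y = pts[:, 0], pts[:, 1]
        # Least-squares straight line through the baseline points.
        slope, intercept = np.polyfit(x, y, 1)
        angles.append(np.arctan(slope))
        # RMS deviation of the baseline from its fitted straight line.
        residuals.append(np.sqrt(np.mean((y - (slope * x + intercept)) ** 2)))
    return float(np.std(angles)), float(np.mean(residuals))


def is_satisfactory(baselines, max_angle_spread=0.01, max_residual=1.0):
    """Assumed acceptance rule: lines are nearly parallel and nearly straight."""
    spread, residual = line_metrics(baselines)
    return spread <= max_angle_spread and residual <= max_residual


def multistage_dewarp(image, page_dewarp, find_baselines, line_dewarp,
                      max_stages=3):
    """Repeat page-level dewarping with finer stages until quality passes.

    page_dewarp(image, stage), find_baselines(image) and line_dewarp(image)
    are caller-supplied, hypothetical callables.
    """
    result = image
    for stage in range(1, max_stages + 1):
        result = page_dewarp(image, stage)
        if is_satisfactory(find_baselines(result)):
            break
    # Line-level correction always runs on the last page-level result.
    return line_dewarp(result)
```

Passing the stages in as callables keeps the sketch independent of any particular dewarping or text-line detection method; only the quality check is concrete here.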


