Towards Document Image Quality Assessment: A Text Line Based Framework and A Synthetic Text Line Image Dataset

06/05/2019
by Hongyu Li, et al.

Low-quality document images greatly reduce the chances of success in automatic text recognition and analysis, so it is necessary to assess the quality of document images uploaded in online business processes and reject those of low quality. In this paper, we address document image quality assessment, and our contributions are twofold. First, since document image quality assessment is mainly concerned with text, we propose a text line based framework to estimate document image quality, composed of three stages: text line detection, text line quality prediction, and overall quality assessment. Text line detection finds potential text lines with a detector. In the text line quality prediction stage, a quality score is computed for each text line with a CNN-based prediction model. The overall quality of the document image is finally assessed by ensembling the quality scores of all text lines. Second, to train the prediction model, a large-scale dataset comprising 52,094 text line images is synthesized with diverse attributes. For each text line image, a quality label is computed with a piece-wise function. To demonstrate the effectiveness of the proposed framework, comprehensive experiments are conducted on two popular document image quality assessment benchmarks. Our framework outperforms the state-of-the-art methods by a large margin on the large and complicated dataset.
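The three-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `TextLine`, `assess_document`, `detect_lines`, and `predict_quality` are hypothetical, the detector and the CNN-based predictor are passed in as plain callables, and the area-weighted mean used for the final ensemble is an assumption, since the abstract does not specify how the per-line scores are combined.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TextLine:
    """Axis-aligned bounding box of one detected text line (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def assess_document(
    image: Any,
    detect_lines: Callable[[Any], List[TextLine]],
    predict_quality: Callable[[Any, TextLine], float],
    default: float = 0.0,
) -> float:
    """Return an overall quality score for a document image.

    Stage 1: detect candidate text lines.
    Stage 2: score each line with the supplied prediction model.
    Stage 3: combine the line scores (ASSUMPTION: area-weighted mean).
    """
    lines = detect_lines(image)
    if not lines:
        # No text found: fall back to a caller-chosen default score.
        return default

    scores = [predict_quality(image, line) for line in lines]
    weights = [line.width * line.height for line in lines]
    total = sum(weights)
    if total == 0:
        # Degenerate boxes: fall back to an unweighted mean.
        return sum(scores) / len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / total
```

In practice, `detect_lines` would wrap a text detector and `predict_quality` would crop the line region and run it through the trained model; a caller could then reject an upload when the returned score falls below a chosen threshold.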

research
07/11/2018

CG-DIQA: No-reference Document Image Quality Assessment Based on Character Gradient

Document image quality assessment (DIQA) is an important and challenging...
research
01/04/2019

A Joint Model for Multimodal Document Quality Assessment

The quality of a document is affected by various factors, including gram...
research
09/05/2022

Forensicability Assessment of Questioned Images in Recapturing Detection

Recapture detection of face and document images is an important forensic...
research
03/20/2018

SlideNet: Fast and Accurate Slide Quality Assessment Based on Deep Neural Networks

This work tackles the automatic fine-grained slide quality assessment pr...
research
08/13/2020

Cognitive Representation Learning of Self-Media Online Article Quality

The automatic quality assessment of self-media online articles is an urg...
research
10/05/2018

ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment

Recruitment of appropriate people for certain positions is critical for ...
research
01/24/2022

Cross-Domain Document Layout Analysis via Unsupervised Document Style Guide

The document layout analysis (DLA) aims to decompose document images int...
