Direct Processing of Document Images in Compressed Domain

10/11/2014
by Mohammed Javed, et al.

With the rapid increase in the volume of big data in this digital era, fax documents, invoices, receipts, etc. are traditionally subjected to compression for efficient data storage and transfer. However, in order to process these documents, they must first be decompressed, which demands additional computing resources. This limitation motivates research into the possibility of processing compressed images directly. In this paper, we summarize the research work carried out to perform different operations straight from run-length compressed documents without going through the stage of decompression. The operations demonstrated are feature extraction; text-line, word and character segmentation; document block segmentation; and font size detection, all carried out on the compressed version of the document. The feature extraction methods demonstrate how to extract conventionally defined features such as the projection profile, run-histogram and entropy directly from the compressed document data. Document segmentation involves the extraction of compressed segments of text-lines, words and characters using the vertical and horizontal projection profile features. Further, an attempt is made to segment an arbitrarily specified block of interest from the compressed document and subsequently facilitate absolute and relative characterization of the segmented block, which finds real-time applications in the automatic processing of bank cheques, challans, etc. in the compressed domain. Finally, an application to detect font size at the text-line level is also investigated. All the proposed algorithms are validated experimentally with a sufficient data set of compressed documents.
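To make the idea of working on run-length data without decompression concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical representation in which each image row is a list of alternating run lengths that always starts with a white (background) run, possibly of length 0, so the black (foreground) runs sit at the odd indices. The function names and the toy data are illustrative only.

```python
from collections import Counter


def horizontal_profile(rle_rows):
    """Foreground pixel count per row: the sum of the black runs (odd indices)."""
    return [sum(row[1::2]) for row in rle_rows]


def black_run_histogram(rle_rows):
    """Histogram mapping black run length -> number of occurrences."""
    hist = Counter()
    for row in rle_rows:
        hist.update(run for run in row[1::2] if run > 0)
    return hist


def segment_text_lines(profile):
    """Return inclusive (start_row, end_row) pairs for each band of
    consecutive rows whose profile value is non-zero."""
    lines = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a text band begins
        elif value == 0 and start is not None:
            lines.append((start, i - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the last row
        lines.append((start, len(profile) - 1))
    return lines


# Toy 8-pixel-wide "document" (assumed data): two text bands separated by blank rows.
rle_rows = [
    [8],              # blank row
    [1, 3, 4],        # 1 white, 3 black, 4 white
    [2, 2, 1, 2, 1],  # two black runs of length 2
    [8],
    [8],
    [0, 5, 3],        # row starts with black, so the leading white run is 0
    [3, 1, 4],
    [8],
]

profile = horizontal_profile(rle_rows)
text_lines = segment_text_lines(profile)
histogram = black_run_histogram(rle_rows)
```

Every value above is computed from the run lengths alone; no pixel array is ever reconstructed. A real run-length format (for example, one of the CCITT fax encodings) would need its own parsing step before the runs are available in this form.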


research
07/02/2020

Automatic Page Segmentation Without Decompressing the Run-Length Compressed Text Documents

Page segmentation is considered to be the crucial stage for the automati...
research
02/09/2014

Direct Processing of Run Length Compressed Document Image for Segmentation and Characterization of a Specified Block

Extracting a block of interest referred to as segmenting a specified blo...
research
07/29/2019

Automatic Text Line Segmentation Directly in JPEG Compressed Document Images

JPEG is one of the popular image compression algorithms that provide eff...
research
09/13/2022

OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model

In today's technological era, document images play an important and inte...
research
05/10/2012

Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system

India is a multilingual multi-script country. In every state of India th...
research
06/02/2023

DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG 2000 Compressed Documents

For any digital application with document images such as retrieval, the ...
research
01/25/2021

Spanner Evaluation over SLP-Compressed Documents

We consider the problem of evaluating regular spanners over compressed d...
