OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model

09/13/2022
by Dikshit Sharma, et al.

Document images are an integral part of day-to-day life, and with the surge of Covid-19 in particular, digitally scanned documents have become a key means of communication because they avoid the risk of infection through physical contact. Storing and transmitting scanned document images is memory-intensive, so compression is applied to reduce image size before archival and transmission. There are two ways to extract information from, or otherwise operate on, compressed images. The first is to decompress the image, operate on it, and then compress it again for efficient storage and transmission. The second is to exploit the characteristics of the underlying compression algorithm and process the images directly in their compressed form, with no decompression or re-compression. In this paper, we propose a novel idea of developing an OCR for CCITT (The International Telegraph and Telephone Consultative Committee) compressed machine-printed TIFF document images directly in the compressed domain. After text regions are segmented into lines and words, an HMM is applied for recognition using the three CCITT coding modes: horizontal, vertical, and pass. Experimental results show that OCR based on the pass mode gives promising results.
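As a rough illustration of how an HMM can score a sequence of coding-mode symbols, the sketch below runs the scaled forward algorithm over strings of "P" (pass), "H" (horizontal), and "V" (vertical) and picks the model with the highest log-likelihood. It is not the authors' implementation: the two models, their labels "A" and "B", all probability values, and the helper names are assumptions invented for this example, and the abstract does not say how observation sequences are actually built.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm.

    obs   : sequence of symbols, here 'P', 'H', or 'V'
    start : start[s] is the initial probability of state s
    trans : trans[p][s] is the probability of moving from state p to state s
    emit  : emit[s][symbol] is the probability of state s emitting symbol
    """
    if not obs:
        return 0.0  # the empty sequence has probability 1
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

def classify(obs, models):
    """Return the label of the model that gives obs the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))

# Hypothetical two-state models; every number here is made up for illustration.
MODELS = {
    "A": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [{"P": 0.1, "H": 0.2, "V": 0.7}, {"P": 0.2, "H": 0.3, "V": 0.5}],
    ),
    "B": (
        [0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [{"P": 0.7, "H": 0.2, "V": 0.1}, {"P": 0.6, "H": 0.3, "V": 0.1}],
    ),
}

if __name__ == "__main__":
    print(classify("VVVHV", MODELS))
    print(classify("PPHPP", MODELS))
```

In a real recognizer the parameters would be estimated from training data (for example with Baum-Welch) rather than written by hand, with one model per character or word class.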


research
10/11/2014

Direct Processing of Document Images in Compressed Domain

With the rapid increase in the volume of Big data of this digital era, f...
research
07/29/2019

Automatic Text Line Segmentation Directly in JPEG Compressed Document Images

JPEG is one of the popular image compression algorithms that provide eff...
research
07/02/2020

Automatic Page Segmentation Without Decompressing the Run-Length Compressed Text Documents

Page segmentation is considered to be the crucial stage for the automati...
research
02/09/2014

Direct Processing of Run Length Compressed Document Image for Segmentation and Characterization of a Specified Block

Extracting a block of interest referred to as segmenting a specified blo...
research
09/13/2022

Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks

Image binarization techniques are being popularly used in enhancement of...
research
06/25/2020

Fine granularity access in interactive compression of 360-degree images based on rate-adaptive channel codes

In this paper, we propose a new interactive compression scheme for omnid...
