Document Image Coding and Clustering for Script Discrimination

09/21/2016
by Darko Brodic, et al.

The paper introduces a new method for discriminating between documents written in different scripts. The document is mapped into a uniformly coded text of numerical values. Each code is derived from the position of the corresponding letter in the text line, based on its typographical characteristics, and is treated as a gray level. Accordingly, the coded text defines a 1-D image, on which texture analysis by run-length statistics and local binary pattern is performed. This analysis yields feature vectors representing the script content of the document. A modified clustering approach applied to these feature vectors groups documents written in the same script. Experiments on two custom-oriented databases of historical documents, in old Cyrillic, angular and round Glagolitic, and Antiqua and Fraktur scripts, show that the proposed method outperforms well-known state-of-the-art methods.
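
The coding and texture-analysis steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four zone classes (base, ascender, descender, full), the Latin letter sets, the function names, and the simplified two-neighbour 1-D local binary pattern are all assumptions made for illustration, and the modified clustering step is not covered.

```python
from collections import Counter

# Assumed zone classes for Latin lowercase letters (illustrative only).
ASCENDER = set("bdfhklt")
DESCENDER = set("gpqy")
FULL = set("j")


def code_text(text):
    """Map each letter to an assumed code 0..3; non-letters are skipped."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch.isupper() or ch in ASCENDER:
            codes.append(1)  # ascender (capitals treated as ascenders here)
        elif ch in DESCENDER:
            codes.append(2)  # descender
        elif ch in FULL:
            codes.append(3)  # spans the full line height
        else:
            codes.append(0)  # base letter
    return codes


def run_lengths(codes):
    """Count runs of equal codes, keyed by (gray_level, run_length)."""
    runs = Counter()
    i = 0
    while i < len(codes):
        j = i
        while j < len(codes) and codes[j] == codes[i]:
            j += 1
        runs[(codes[i], j - i)] += 1
        i = j
    return runs


def lbp_histogram(codes):
    """Histogram of a simplified 1-D LBP over left and right neighbours."""
    hist = [0, 0, 0, 0]
    for i in range(1, len(codes) - 1):
        left = 1 if codes[i - 1] >= codes[i] else 0
        right = 1 if codes[i + 1] >= codes[i] else 0
        hist[left + 2 * right] += 1
    return hist


codes = code_text("dog")          # 1-D image of gray levels
features = (run_lengths(codes), lbp_histogram(codes))
```

In this sketch the run-length counts and the pattern histogram together play the role of a feature vector for one text; vectors from several documents would then be passed to a clustering step.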

research
09/07/2015

An Approach to the Analysis of the South Slavic Medieval Labels Using Image Texture

The paper presents a new script classification method for the discrimina...
research
07/17/2015

Analysis of the South Slavic Scripts by Run-Length Features of the Image Texture

The paper proposes an algorithm for the script recognition based on the ...
research
03/06/2022

A Comparative Study on Data Representation to Categorize Text Documents

In the modern world text documents play an important role in most of the...
research
03/29/2018

High Capacity Image Data Hiding of Scanned Text Documents Using Improved Quadtree

In this paper, an effective method was introduced to steganography of te...
research
03/06/2022

Evaluation of Partition-Based Text Clustering Techniques to Categorize Indic Language Documents

Wide availability of electronic data has led to the vast interest in tex...
research
10/03/2022

EraseNet: A Recurrent Residual Network for Supervised Document Cleaning

Document denoising is considered one of the most challenging tasks in co...
research
11/25/2016

Texture analysis using deterministic partially self-avoiding walk with thresholds

In this paper, we propose a new texture analysis method using the determ...
