Evaluation of Partition-Based Text Clustering Techniques to Categorize Indic Language Documents

03/06/2022
by Dulani Meedeniya, et al.

The wide availability of electronic data has led to vast interest in text analysis, information retrieval and text categorization methods. To provide a better service, there is a need for document analysis and categorization systems for non-English text, comparable to those currently available for English text documents. This study mainly focuses on categorizing Indic language documents. The main techniques examined are data pre-processing and document clustering. The approach applies a transformation based on term frequency and inverse document frequency (TF-IDF), which enhances clustering performance, and builds on latent semantic analysis, k-means clustering and Gaussian mixture model clustering. A text corpus categorized by human readers is used to test the validity of the suggested approach. The technique introduced in this work enables the processing of text documents written in Sinhala, which can help citizens and organizations do their daily work more efficiently.
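
To make the TF-IDF and k-means stages named in the abstract concrete, the following is a minimal sketch in standard-library Python. It is not the authors' implementation: the toy corpus (English placeholder tokens standing in for pre-processed Sinhala terms), the smoothed IDF variant, the function names `tfidf_vectors` and `kmeans`, and the choice to seed centroids with the first k documents are all assumptions made for illustration. The latent semantic analysis and Gaussian mixture model stages are omitted here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenised for t in set(toks))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two "sports" and two "politics" documents.
docs = [
    "cricket match team score",
    "election vote party minister",
    "team cricket player score",
    "party election vote result",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

On this toy corpus the documents sharing vocabulary end up in the same cluster. A fuller pipeline along the lines the abstract describes would reduce the TF-IDF matrix with latent semantic analysis before clustering and would compare k-means against a Gaussian mixture model; those steps need a linear-algebra library and are left out of this sketch.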

