A Novel Graph Based Clustering Approach to Document Topic Modeling

07/02/2020
by Prateek Chanda, et al.

Clustering is the task of assigning a set of objects to groups so that, under some similarity measure, objects within the same cluster are more similar to each other than to objects in other clusters. Clustering documents by research topic is an important task in text mining. In this setting, cluster analysis groups a set of documents so that documents in the same cluster share a similar topic and documents in different clusters have different topics. The proposed method is a novel graph-based clustering approach that assigns each document an importance factor, computed with a mathematical formulation that the authors present as an improvement over well-known classical methods. The document with the maximum importance factor in a cluster is taken as the centroid of that cluster. A publicly available synthetic dataset is used to evaluate the proposed algorithm, and the method is compared with some traditional graph-based methods to demonstrate its accuracy.
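The abstract does not state how the importance factor is computed, so the following is only a minimal sketch of the general idea (a document similarity graph, clusters, and a centroid chosen as the most important document), not the authors' algorithm. Everything in it is an assumption made for illustration: bag-of-words cosine similarity as the edge weight, a hypothetical `threshold` of 0.2 for keeping an edge, connected components as clusters, and weighted degree as a stand-in for the importance factor.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(docs, threshold=0.2):
    """Weighted similarity graph; the threshold is an assumed value."""
    vecs = [Counter(d.lower().split()) for d in docs]
    adj = {i: {} for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            s = cosine(vecs[i], vecs[j])
            if s >= threshold:
                adj[i][j] = s
                adj[j][i] = s
    return adj


def find_clusters(adj):
    """Clusters as connected components (an assumption, not the paper's rule)."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters


def centroid(adj, cluster):
    """Document with the largest stand-in importance (weighted degree)."""
    return max(cluster, key=lambda i: sum(adj[i].values()))


docs = [
    "graph clustering of documents",
    "graph based clustering of documents",
    "document clustering with graph methods",
    "bayesian inference for binary data",
    "bayesian model for binary data",
]
adj = build_graph(docs)
for cluster in find_clusters(adj):
    print(cluster, "centroid:", centroid(adj, cluster))
```

Weighted degree is used here only because it is the simplest centrality that needs no extra library; any other importance score could be substituted in `centroid` without changing the rest of the sketch.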


research
09/26/2013

Integrating Document Clustering and Topic Modeling

Document clustering and topic modeling are two closely related tasks whi...
research
04/15/2021

Vec2GC – A Graph Based Clustering Method for Text Representations

NLP pipelines with limited or no labeled data, rely on unsupervised meth...
research
06/15/2021

Author Clustering and Topic Estimation for Short Texts

Analysis of short text, such as social media posts, is extremely difficu...
research
05/22/2020

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

In this paper, we show how selecting and combining encodings of natural ...
research
03/08/2016

A Bayesian non-parametric method for clustering high-dimensional binary data

In many real life problems, objects are described by large number of bin...
research
12/15/2021

Text Mining Through Label Induction Grouping Algorithm Based Method

The main focus of information retrieval methods is to provide accurate a...
research
06/13/2016

Graph-Community Detection for Cross-Document Topic Segment Relationship Identification

In this paper we propose a graph-community detection approach to identif...
