Integrating Document Clustering and Topic Modeling

09/26/2013
by Pengtao Xie, et al.

Document clustering and topic modeling are two closely related tasks that can mutually benefit each other. Topic modeling can project documents into a topic space, which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance. Our model tightly couples two components: a mixture component used for discovering latent groups in a document collection, and a topic model component used for mining multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of hidden variables and learn model parameters. Experiments on two datasets demonstrate the effectiveness of our model.
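
To make the two coupled components concrete, the following is a minimal toy sketch of how a document might be generated under such a model: a cluster is drawn from a mixture, and each word then comes either from a local topic of that cluster or from a global topic shared by all clusters. This is an illustration under stated assumptions, not the paper's exact specification: the local/global switch probability `p_local`, the symmetric Dirichlet priors, all sizes, and every function name here are hypothetical.

```python
import random

def sample_dirichlet(alpha, dim, rng):
    """Draw from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_words, cluster_weights, local_topics,
                      global_topics, p_local, rng):
    """Generate one toy document as a list of word ids.

    Assumption (hypothetical): each word picks a local topic with
    probability p_local, otherwise a global topic.
    """
    # Mixture component: pick the latent group of this document.
    cluster = sample_index(cluster_weights, rng)
    # Per-document proportions over local and global topics.
    local_mix = sample_dirichlet(1.0, len(local_topics[cluster]), rng)
    global_mix = sample_dirichlet(1.0, len(global_topics), rng)
    words = []
    for _ in range(n_words):
        if rng.random() < p_local:
            # Local topic: specific to the chosen cluster.
            topic = local_topics[cluster][sample_index(local_mix, rng)]
        else:
            # Global topic: shared across all clusters.
            topic = global_topics[sample_index(global_mix, rng)]
        words.append(sample_index(topic, rng))
    return cluster, words

# Toy sizes, chosen arbitrarily for illustration.
VOCAB_SIZE, N_CLUSTERS, N_LOCAL, N_GLOBAL = 20, 3, 2, 2
rng = random.Random(0)
cluster_weights = sample_dirichlet(1.0, N_CLUSTERS, rng)
local_topics = [
    [sample_dirichlet(0.5, VOCAB_SIZE, rng) for _ in range(N_LOCAL)]
    for _ in range(N_CLUSTERS)
]
global_topics = [sample_dirichlet(0.5, VOCAB_SIZE, rng) for _ in range(N_GLOBAL)]

cluster, words = generate_document(
    15, cluster_weights, local_topics, global_topics, p_local=0.7, rng=rng
)
print("cluster:", cluster)
print("word ids:", words)
```

The sketch covers only the generative direction. Fitting such a model would mean inverting this process, which the abstract says is done with variational inference; that step is not shown here.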


