Parsimonious Topic Models with Salient Word Discovery

01/22/2014
by Hossein Soleimani, et al.

We propose a parsimonious topic model for text corpora. In related models such as Latent Dirichlet Allocation (LDA), all words are modeled topic-specifically, even though many words occur with similar frequencies across different topics. Our modeling determines salient words for each topic, which have topic-specific probabilities, with the rest explained by a universal shared model. Further, in LDA all topics are in principle present in every document. By contrast our model gives sparse topic representation, determining the (small) subset of relevant topics for each document. We derive a Bayesian Information Criterion (BIC), balancing model complexity and goodness of fit. Here, interestingly, we identify an effective sample size and corresponding penalty specific to each parameter type in our model. We minimize BIC to jointly determine our entire model -- the topic-specific words, document-specific topics, all model parameter values, and the total number of topics -- in a wholly unsupervised fashion. Results on three text corpora and an image dataset show that our model achieves higher test set likelihood and better agreement with ground-truth class labels, compared to LDA and to a model designed to incorporate sparsity.
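To make the complexity-versus-fit trade-off concrete, here is a minimal sketch of a BIC-style cost in which each parameter type carries its own effective sample size. The abstract does not give the paper's actual cost function, so the form used here (the textbook BIC, k ln n - 2 ln L, extended to a sum over parameter groups) is an assumption, and the names `bic_cost` and `param_groups`, along with all numbers, are hypothetical and chosen only for illustration.

```python
import math


def bic_cost(log_likelihood, param_groups):
    """Return a BIC-style cost; lower is better.

    log_likelihood: total log-likelihood of the data under the model.
    param_groups: iterable of (num_free_params, effective_sample_size)
        pairs, one per parameter type. With a single pair this reduces
        to the textbook BIC: k * ln(n) - 2 * ln(L).

    Assumption: the penalty is additive across parameter types. This
    illustrates the general idea, not the paper's derived criterion.
    """
    penalty = sum(k * math.log(n) for k, n in param_groups)
    return penalty - 2.0 * log_likelihood


# Hypothetical comparison of two candidate models on the same corpus.
# The dense model fits slightly better but spends many more parameters.
dense = bic_cost(-10500.0, [(5000, 2000.0)])
sparse = bic_cost(-10800.0, [(400, 2000.0), (150, 300.0)])
best = "sparse" if sparse < dense else "dense"
```

Minimizing such a cost over candidate models selects the sparser model whenever the penalty it saves outweighs the likelihood it gives up, which is the balance the abstract describes.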


