Improving the Inference of Topic Models via Infinite Latent State Replications

01/25/2023
by Daniel Rugeles, et al.

In text mining, topic models are probabilistic generative models that infer latent semantic topics from a text corpus. One of the most popular inference approaches for topic models is collapsed Gibbs sampling (CGS), which typically samples a single topic label for each observed document-word pair. In this paper, we aim to improve CGS inference for topic models. We leverage a state augmentation technique by taking the number of topic samples to infinity, and develop a new inference approach, called infinite latent state replication (ILR), that generates a robust soft topic assignment for each document-word pair. Experimental results on publicly available datasets show that ILR outperforms CGS for inference in established topic models.
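The abstract does not spell out the ILR update, so the Python sketch below is only illustrative. It contrasts the hard topic label that CGS samples for each document-word pair with a soft (fractional) assignment obtained by keeping the full collapsed conditional distribution for each token, which is the limiting behaviour one would expect as the number of replicated topic samples grows to infinity. The soft branch follows the style of zero-order collapsed variational updates rather than the authors' exact ILR algorithm, and all function and variable names here are our own.

```python
import numpy as np

def cgs_conditional(d, w, n_dk, n_kw, n_k, alpha, beta, V):
    # Collapsed conditional p(z = k | rest), up to a constant, for word w in document d.
    return (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)

def lda_inference(docs, K, V, alpha=0.1, beta=0.01, iters=200, soft=False, seed=0):
    """LDA with collapsed Gibbs sampling (soft=False) or soft assignments (soft=True)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))   # document-topic counts
    n_kw = np.zeros((K, V))   # topic-word counts
    n_k = np.zeros(K)         # topic totals
    assign = []
    # Initialization: random hard labels for CGS, uniform soft assignments otherwise.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            g = np.full(K, 1.0 / K) if soft else np.eye(K)[rng.integers(K)]
            n_dk[d] += g; n_kw[:, w] += g; n_k += g
            row.append(g)
        assign.append(row)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                g = assign[d][i]
                # Exclude the current token's (hard or fractional) counts.
                n_dk[d] -= g; n_kw[:, w] -= g; n_k -= g
                p = cgs_conditional(d, w, n_dk, n_kw, n_k, alpha, beta, V)
                p /= p.sum()
                if soft:
                    g = p                                  # keep the full distribution over topics
                else:
                    g = np.eye(K)[rng.choice(K, p=p)]      # sample a single topic label
                n_dk[d] += g; n_kw[:, w] += g; n_k += g
                assign[d][i] = g
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi

if __name__ == "__main__":
    docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 2, 3]]        # toy corpus over a 4-word vocabulary
    theta_hard, _ = lda_inference(docs, K=2, V=4)            # CGS-style hard assignments
    theta_soft, _ = lda_inference(docs, K=2, V=4, soft=True) # soft assignments (ILR-like limit)
```

In the soft branch, each token contributes fractional counts to every topic instead of a single sampled label, which is how averaging over infinitely many replicated topic samples would behave and is the intuition the abstract attributes to ILR.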


Related research

04/11/2018 · Learning Topics using Semantic Locality
The topic modeling discovers the latent topic probability of the given t...

11/11/2015 · Hierarchical Latent Semantic Mapping for Automated Topic Generation
Much of information sits in an unprecedented amount of text data. Managi...

11/19/2017 · Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models
Spectral topic modeling algorithms operate on matrices/tensors of word c...

01/15/2013 · Factorized Topic Models
In this paper we present a modification to a latent topic model, which m...

07/04/2012 · Mining Associated Text and Images with Dual-Wing Harmoniums
We propose a multi-wing harmonium model for mining multimedia data that ...

12/15/2015 · Towards Evaluation of Cultural-scale Claims in Light of Topic Model Sampling Effects
Cultural-scale models of full text documents are prone to over-interpret...

03/30/2015 · Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process
Incorporating the side information of text corpus, i.e., authors, time s...
