On Smoothing and Inference for Topic Models

05/09/2012
by Arthur Asuncion, et al.

Latent Dirichlet allocation, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. Because these algorithms can reach solutions of comparable accuracy, we are free to select the most computationally efficient approach. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
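The claim that the algorithms differ mainly in how much they smooth the counts can be illustrated with a small sketch. The per-token update forms below are not stated in the abstract; they are the forms commonly given for LDA inference and are written here as assumptions, as are the names `topic_score`, `n_wk`, `n_k`, `n_kj`, `alpha`, and `eta`. Collapsed Gibbs sampling adds the hyperparameters to the counts directly, MAP estimation subtracts 1 from each smoothed count, and variational Bayes passes each smoothed count through exp(digamma(x)), which is roughly x - 0.5 for larger x.

```python
import math


def digamma(x):
    """Digamma function for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def topic_score(n_wk, n_k, n_kj, vocab_size, alpha, eta, method):
    """Unnormalized weight of one topic for one token (illustrative only).

    n_wk: count of this word in the topic
    n_k:  total count of all words in the topic
    n_kj: count of the topic in this document
    For "cgs", the caller is assumed to pass counts that already
    exclude the current token.
    """
    w_eta = vocab_size * eta
    if method == "cgs":
        # Counts smoothed by the hyperparameters as-is.
        return (n_wk + eta) / (n_k + w_eta) * (n_kj + alpha)
    if method == "map":
        # Each smoothed count is offset by -1 (needs alpha, eta > 1
        # to stay positive when a count is zero).
        return (n_wk + eta - 1) / (n_k + w_eta - vocab_size) * (n_kj + alpha - 1)
    if method == "vb":
        # exp(digamma(x)) behaves like x - 0.5 once x is not too small.
        log_score = (digamma(n_wk + eta)
                     - digamma(n_k + w_eta)
                     + digamma(n_kj + alpha))
        return math.exp(log_score)
    raise ValueError("unknown method: " + method)


if __name__ == "__main__":
    for name in ("cgs", "map", "vb"):
        print(name, topic_score(5, 100, 3, 50, 1.1, 1.1, name))
```

Under these assumed forms, changing `alpha` and `eta` shifts the effective offsets, which is one way to read the abstract's statement that performance differences shrink once the hyperparameters are optimized.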


research
06/27/2012

Sparse Stochastic Inference for Latent Dirichlet allocation

We present a hybrid algorithm for Bayesian topic models that combines th...
research
03/04/2017

Autoencoding Variational Inference For Topic Models

Topic models are one of the most popular methods for learning representa...
research
05/27/2016

Provable Algorithms for Inference in Topic Models

Recently, there has been considerable progress on designing algorithms w...
research
03/23/2015

On some provably correct cases of variational inference for topic models

Variational inference is a very efficient and popular heuristic used in ...
research
05/02/2017

Fuzzy Approach Topic Discovery in Health and Medical Corpora

The majority of medical documents and electronic health records (EHRs) a...
research
06/17/2019

Analyses of Multi-collection Corpora via Compound Topic Modeling

As electronically stored data grow in daily life, obtaining novel and re...
research
11/02/2016

Learning Methods for Dynamic Topic Modeling in Automated Behaviour Analysis

Semi-supervised and unsupervised systems provide operators with invaluab...
