Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

06/11/2015
by Måns Magnusson, et al.

Topic models, and more specifically the Latent Dirichlet Allocation (LDA) class of models, are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated for by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
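To make the idea of partial collapsing concrete, the following is a minimal dense, serial sketch in Python, not the authors' implementation. It assumes the usual LDA notation (topic-word matrix Phi, document-topic proportions theta, symmetric Dirichlet hyperparameters `alpha` and `beta`) and integrates out only theta: Phi is drawn given the topic indicators, then the indicators are drawn given Phi. The function name, defaults, and toy corpus are hypothetical, and the sparsity and parallelism described in the abstract are deliberately left out.

```python
import numpy as np

def partially_collapsed_gibbs(docs, n_topics, vocab_size,
                              alpha=0.1, beta=0.1, n_iter=50, seed=0):
    """Illustrative sampler: theta is integrated out, Phi is sampled explicitly."""
    rng = np.random.default_rng(seed)
    # Random initial topic indicator for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    n_dk = np.zeros((len(docs), n_topics))    # document-topic counts
    n_kv = np.zeros((n_topics, vocab_size))   # topic-word counts
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kv[k, w] += 1

    for _ in range(n_iter):
        # Step 1: draw Phi | z, one Dirichlet per topic (rows are independent).
        phi = np.vstack([rng.dirichlet(beta + n_kv[k]) for k in range(n_topics)])
        # Step 2: draw z | Phi. Given Phi, documents are conditionally
        # independent, so this outer loop is the part that could run in parallel.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = z[d][i]
                n_dk[d, k_old] -= 1
                n_kv[k_old, w] -= 1
                # p(z_i = k | ...) is proportional to phi[k, w] * (n_dk + alpha).
                p = phi[:, w] * (n_dk[d] + alpha)
                k_new = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k_new
                n_dk[d, k_new] += 1
                n_kv[k_new, w] += 1
    return z, n_dk, n_kv

# Toy corpus: each document is a list of word ids in range(vocab_size).
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3], [0, 3, 1, 4, 2, 5]]
z, n_dk, n_kv = partially_collapsed_gibbs(toy_docs, n_topics=2, vocab_size=6)
```

In a fully collapsed sampler the word term would depend on the global topic-word counts, which couples every token in the corpus; conditioning on a sampled Phi removes that coupling within a sweep, which is what makes document-level parallelism possible in this sketch.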

research
04/12/2017

Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Latent Dirichlet Allocation (LDA) is a topic model widely used in natura...
research
11/12/2018

Modeling Text Complexity using a Multi-Scale Probit

We present a novel model for text complexity analysis which can be fitte...
research
06/06/2019

Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

Nonparametric extensions of topic models such as Latent Dirichlet Alloca...
research
09/02/2013

Scalable Probabilistic Entity-Topic Modeling

We present an LDA approach to entity disambiguation. Each topic is assoc...
research
08/02/2016

Blocking Collapsed Gibbs Sampler for Latent Dirichlet Allocation Models

The latent Dirichlet allocation (LDA) model is a widely-used latent vari...
research
08/19/2022

SimLDA: A tool for topic model evaluation

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has ...
research
12/10/2015

Inference in topic models: sparsity and trade-off

Topic models are popular for modeling discrete data (e.g., texts, images...
