Scaling up Dynamic Topic Models

02/19/2016
by   Arnab Bhadury, et al.
Dynamic topic models (DTMs) are very effective at discovering topics and capturing how they evolve over time in time-series data. Existing methods for posterior inference in DTMs are all batch algorithms that scan the full dataset before each model update and make inexact variational approximations under mean-field assumptions. For lack of a more scalable inference algorithm, DTMs, despite their usefulness, have not been used to capture topic dynamics at large scale. This paper fills that gap and presents a fast, parallelizable inference algorithm using Gibbs sampling with Stochastic Gradient Langevin Dynamics (SGLD) that does not make any unwarranted assumptions. We also present a Metropolis-Hastings based O(1) sampler for the topic assignment of each word token. In a distributed environment, our algorithm requires very little communication between workers during sampling (it is almost embarrassingly parallel) and scales up to large-scale applications. We learn what is, to our knowledge, the largest dynamic topic model to date, capturing the dynamics of 1,000 topics from 2.6 million documents in less than half an hour. Our empirical results show that our algorithm is not only orders of magnitude faster than the baselines but also achieves lower perplexity.
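To make the Stochastic Gradient Langevin Dynamics ingredient concrete, here is a minimal sketch of a generic SGLD update on a toy problem. It is not the paper's DTM sampler: the model (a Gaussian mean with unit observation variance and a Gaussian prior), the function name sgld_gaussian_mean, and all parameter values are assumptions chosen for illustration. Each step uses only a minibatch, rescales its gradient to the full dataset, and adds Gaussian noise whose variance equals the step size.

```python
import random


def sgld_gaussian_mean(data, n_steps=2000, batch_size=50, step_size=1e-4,
                       prior_var=100.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean with SGLD.

    Toy model (assumed for illustration, not from the paper):
        theta ~ N(0, prior_var),  x_i ~ N(theta, 1).
    """
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Minibatch drawn without replacement from the full dataset.
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior at theta.
        grad_prior = -theta / prior_var
        # Minibatch log-likelihood gradient, rescaled to the full dataset.
        grad_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Injected Gaussian noise with variance equal to the step size.
        noise = rng.gauss(0.0, step_size ** 0.5)
        theta += 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
    draws = sgld_gaussian_mean(data)[500:]  # discard burn-in
    print("posterior mean estimate:", sum(draws) / len(draws))
```

Because each update touches only batch_size points, the cost per step does not grow with the dataset, which is the property that makes this family of samplers attractive when a full batch scan before every update is too expensive. The fixed step size here is a simplification; a decaying schedule is commonly used in practice.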

research
09/24/2018

Streaming dynamic and distributed inference of latent geometric structures

We develop new models and algorithms for learning the temporal dynamics ...
research
02/23/2017

Scalable Inference for Nested Chinese Restaurant Process Topic Models

Nested Chinese Restaurant Process (nCRP) topic models are powerful nonpa...
research
10/07/2017

Topic Modeling based on Keywords and Context

Current topic models often suffer from discovering topics not matching h...
research
05/27/2016

Provable Algorithms for Inference in Topic Models

Recently, there has been considerable progress on designing algorithms w...
research
10/09/2013

Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

Supervised topic models with a logistic likelihood have two issues that ...
research
10/09/2017

Conic Scan-and-Cover algorithms for nonparametric topic modeling

We propose new algorithms for topic modeling when the number of topics i...
research
01/14/2019

Large-Scale Joint Topic, Sentiment & User Preference Analysis for Online Reviews

This paper presents a non-trivial reconstruction of a previous joint top...
