Topic Modeling via Full Dependence Mixtures

06/13/2019
by   Dan Fisher, et al.
0

We consider the topic modeling problem for large datasets. For this problem, Latent Dirichlet Allocation (LDA) with a collapsed Gibbs sampler optimization is the state-of-the-art approach in terms of topic quality. However, LDA is a slow approach, and running it on large datasets is impractical even with modern hardware. In this paper we propose to fit topics directly to the co-occurances data of the corpus. In particular, we introduce an extension of a mixture model, the Full Dependence Mixture (FDM), which arises naturally as a model of a second moment under general generative assumptions on the data. While there is some previous work on topic modeling using second moments, we develop a direct stochastic optimization procedure for fitting an FDM with a single Kullback Leibler objective. While moment methods in general have the benefit that an iteration no longer needs to scale with the size of the corpus, our approach also allows us to leverage standard optimizers and GPUs for the problem of topic modeling. We evaluate the approach on synthetic and semi-synthetic data, as well as on the SOTU and Neurips Papers corpora, and show that the approach outperforms LDA, where LDA is run on both full and sub-sampled data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/19/2011

Using Variational Inference and MapReduce to Scale Topic Modeling

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique ...
research
04/12/2017

Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Latent Dirichlet Allocation (LDA) is a topic model widely used in natura...
research
04/07/2016

Combinatorial Topic Models using Small-Variance Asymptotics

Topic models have emerged as fundamental tools in unsupervised machine l...
research
09/02/2013

Scalable Probabilistic Entity-Topic Modeling

We present an LDA approach to entity disambiguation. Each topic is assoc...
research
04/15/2021

AI supported Topic Modeling using KNIME-Workflows

Topic modeling algorithms traditionally model topics as list of weighted...
research
07/07/2015

Rethinking LDA: moment matching for discrete ICA

We consider moment matching techniques for estimation in Latent Dirichle...
research
11/30/2016

Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

While generative models such as Latent Dirichlet Allocation (LDA) have p...

Please sign up or login with your details

Forgot password? Click here to reset