Guaranteed Model Order Estimation and Sample Complexity Bounds for LDA

12/10/2013
by E. D. Gutiérrez, et al.

Determining the number of independent latent factors (topics) in mixture models such as Latent Dirichlet Allocation (LDA) is of great practical importance. In most applications, the exact number of topics is unknown and depends on the application and the size of the data set. Bayesian nonparametric methods can avoid the problem of topic number selection, but they can be impractically slow for large sample sizes and are subject to local optima. We develop a guaranteed procedure for topic number recovery that does not require learning the model's latent parameters beforehand. Our procedure relies on adapting results from random matrix theory. The performance of our topic number recovery procedure is superior to that of hLDA, a nonparametric method. We also discuss some implications of our results for the sample complexity and accuracy of popular spectral learning algorithms for LDA. Our results and procedure can be extended to spectral learning algorithms for other exchangeable mixture models as well as Hidden Markov Models.
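The abstract does not spell out the procedure, so the following is only a rough sketch of the general idea behind spectral model order estimation, not the authors' method. It assumes that the number of topics can be read off as the number of singular values of a word co-occurrence (second-moment) matrix that rise above a noise level. The function names, the synthetic data generator, and the `threshold` value are all hypothetical; in particular, the threshold here is a hand-picked placeholder, whereas the paper derives its guarantees from random matrix theory.

```python
import numpy as np


def estimate_num_topics(cooccurrence, threshold):
    """Count the singular values of `cooccurrence` that exceed `threshold`.

    `threshold` is a hypothetical, hand-picked noise level, not the
    bound derived in the paper.
    """
    singular_values = np.linalg.svd(cooccurrence, compute_uv=False)
    return int(np.sum(singular_values > threshold))


def synthetic_cooccurrence(vocab_size, num_topics, noise_scale, seed=0):
    """Build a noisy low-rank matrix that mimics a word co-occurrence matrix.

    Illustrative assumption: the noiseless matrix is
    topics @ diag(weights) @ topics.T, where each column of `topics`
    is a topic-word distribution and the topic weights are uniform.
    """
    rng = np.random.default_rng(seed)
    # Each row drawn from the Dirichlet is one topic; transpose to (V, K).
    topics = rng.dirichlet(np.ones(vocab_size), size=num_topics).T
    weights = np.full(num_topics, 1.0 / num_topics)
    signal = topics @ np.diag(weights) @ topics.T
    # Symmetric Gaussian perturbation standing in for sampling noise.
    noise = rng.normal(scale=noise_scale, size=(vocab_size, vocab_size))
    noise = (noise + noise.T) / 2.0
    return signal + noise


# Usage: a 50-word vocabulary generated from 3 topics with very small noise.
matrix = synthetic_cooccurrence(vocab_size=50, num_topics=3, noise_scale=1e-6)
estimated_topics = estimate_num_topics(matrix, threshold=1e-4)
print("estimated number of topics:", estimated_topics)
```

In this toy setting the noise is far smaller than the smallest signal singular value, so a fixed threshold suffices. With real corpora the noise level depends on the sample size, which is where a principled, theory-backed threshold becomes necessary.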


research
04/30/2012

A Spectral Algorithm for Latent Dirichlet Allocation

The problem of topic modeling can be seen as a generalization of the clu...
research
02/19/2016

Spectral Learning for Supervised Topic Models

Supervised topic models simultaneously model the latent topic structure ...
research
05/30/2016

Spectral Methods for Correlated Topic Models

In this paper, we propose guaranteed spectral methods for learning a bro...
research
09/02/2020

Local-HDP: Interactive Open-Ended 3D Object Categorization

We introduce a non-parametric hierarchical Bayesian approach for open-en...
research
03/28/2012

Spectral dimensionality reduction for HMMs

Hidden Markov Models (HMMs) can be accurately approximated using co-occu...
research
10/23/2014

Model Selection for Topic Models via Spectral Decomposition

Topic models have achieved significant successes in analyzing large-scal...
research
03/31/2017

Spectral Methods for Nonparametric Models

Nonparametric models are versatile, albeit computationally expensive, to...
