Learning Topic Models: Identifiability and Finite-Sample Analysis

10/08/2021
by Yinyin Chen, et al.

Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this paper, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. On the theoretical side, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions that rely on the existence of anchor words or pure topic documents. We conduct a finite-sample error analysis for the proposed estimator and discuss the connection of our results with existing ones. We conclude with empirical studies on both simulated and real datasets.
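To make the "volume" in volume minimization concrete, the sketch below computes the volume of the simplex whose vertices are K topic vectors (each a probability distribution over the vocabulary). It is only an illustration of the geometric quantity, not the estimator or likelihood from the paper, which the abstract does not spell out. The function names `gram_determinant` and `topic_simplex_volume` and the toy topic matrices are hypothetical.

```python
import math


def gram_determinant(edges):
    """Determinant of the Gram matrix of `edges`, by Gaussian elimination."""
    n = len(edges)
    # Gram matrix: pairwise inner products of the edge vectors.
    g = [[sum(a * b for a, b in zip(u, v)) for v in edges] for u in edges]
    det = 1.0
    for i in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(i, n), key=lambda r: abs(g[r][i]))
        if abs(g[pivot][i]) < 1e-15:
            return 0.0  # degenerate simplex: the edges are linearly dependent
        if pivot != i:
            g[i], g[pivot] = g[pivot], g[i]
            det = -det  # a row swap flips the sign of the determinant
        det *= g[i][i]
        for r in range(i + 1, n):
            factor = g[r][i] / g[i][i]
            for c in range(i, n):
                g[r][c] -= factor * g[i][c]
    return det


def topic_simplex_volume(topics):
    """(K-1)-dimensional volume of the simplex spanned by K topic vectors.

    Uses the standard formula sqrt(det(E E^T)) / (K-1)!, where the rows of E
    are the edges from the first vertex to each of the other vertices.
    """
    base = topics[0]
    edges = [[x - b for x, b in zip(t, base)] for t in topics[1:]]
    k = len(edges)
    return math.sqrt(max(gram_determinant(edges), 0.0)) / math.factorial(k)


# Toy example (assumed data): three topics over a three-word vocabulary.
wide = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tight = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

print(topic_simplex_volume(wide))
print(topic_simplex_volume(tight))
```

Both toy topic sets contain the uniform document-level word distribution in their convex hull, but `tight` encloses it with a much smaller simplex. A volume-minimization criterion prefers such a smaller enclosing simplex among those that still cover the observed documents.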

research
05/17/2018

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

We propose a new method of estimation in topic models, that is not a var...
research
03/15/2012

Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream

Topic models have proven to be a useful tool for discovering latent stru...
research
10/09/2017

Conic Scan-and-Cover algorithms for nonparametric topic modeling

We propose new algorithms for topic modeling when the number of topics i...
research
10/30/2017

Convergence Rates of Latent Topic Models Under Relaxed Identifiability Conditions

In this paper we study the frequentist convergence rate for the Latent D...
research
07/13/2021

Semiparametric Latent Topic Modeling on Consumer-Generated Corpora

Legacy procedures for topic modelling have generally suffered problems o...
research
01/05/2017

Crime Topic Modeling

The classification of crime into discrete categories entails a massive l...
research
08/23/2015

Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery

We develop necessary and sufficient conditions and a novel provably cons...