A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

05/17/2018
by Xin Bing, et al.

We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of the word-topic matrix A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p), and number of topics (K); both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. The simulations illustrate that the new algorithm is faster and more accurate than existing ones, even though it starts at a computational and theoretical disadvantage: it does not know the correct number of topics K, whereas the competing methods are given the correct value.
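
To make the roles of n, N_i, p and K concrete, here is a minimal sketch of how a corpus could be simulated from a topic model. It assumes the standard factorization in which the expected word frequencies of document i are the i-th column of A W, with A a p x K word-topic matrix and W a K x n topic-document matrix; the abstract does not spell this out, so treat it as an assumption. The function name `simulate_corpus` and the toy matrices are hypothetical, and this only generates data: it is not the estimator proposed in the paper.

```python
import random


def simulate_corpus(A, W, doc_lengths, seed=0):
    """Draw word counts for each document from a topic model.

    A: p x K nested list; column k is a distribution over the p words.
    W: K x n nested list; column i is a distribution over the K topics.
    doc_lengths: list of the n document lengths N_i.
    Returns a list of n count vectors, each of length p.
    """
    rng = random.Random(seed)
    p, K, n = len(A), len(W), len(doc_lengths)
    corpus = []
    for i in range(n):
        # Expected word frequencies of document i: column i of A W (assumed model).
        pi_i = [sum(A[j][k] * W[k][i] for k in range(K)) for j in range(p)]
        # Sample N_i words with those probabilities, then tally them.
        words = rng.choices(range(p), weights=pi_i, k=doc_lengths[i])
        counts = [0] * p
        for w in words:
            counts[w] += 1
        corpus.append(counts)
    return corpus


# Toy sizes (hypothetical): p = 4 words, K = 2 topics, n = 3 documents.
A = [[0.7, 0.0],
     [0.3, 0.0],
     [0.0, 0.6],
     [0.0, 0.4]]
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
doc_lengths = [50, 80, 120]
corpus = simulate_corpus(A, W, doc_lengths)
```

In this toy setup the first two documents each draw from a single topic, so they can only contain that topic's words, while the third mixes both topics. An estimator in the setting of the abstract would see only `corpus` and would have to recover both K and A from it.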


research
01/22/2020

Optimal estimation of sparse topic models

Topic models have become popular tools for dimension reduction and explo...
research
07/08/2021

Assigning Topics to Documents by Successive Projections

Topic models provide a useful tool to organize and understand the struct...
research
10/08/2021

Learning Topic Models: Identifiability and Finite-Sample Analysis

Topic models provide a useful text-mining tool for learning, extracting ...
research
10/17/2019

Finite sample deviation and variance bounds for first order autoregressive processes

In this paper, we study finite-sample properties of the least squares es...
research
07/12/2021

Likelihood estimation of sparse topic distributions in topic models and its applications to Wasserstein document distance calculations

This paper studies the estimation of high-dimensional, discrete, possibl...
research
10/09/2017

Conic Scan-and-Cover algorithms for nonparametric topic modeling

We propose new algorithms for topic modeling when the number of topics i...
research
03/25/2015

Quantized Nonparametric Estimation over Sobolev Ellipsoids

We formulate the notion of minimax estimation under storage or communica...
