Optimal estimation of sparse topic models

01/22/2020
by Xin Bing, et al.
Topic models have become popular tools for dimension reduction and exploratory analysis of text data, which consist of the observed frequencies of a vocabulary of p words in n documents, stored in a p × n matrix. The main premise is that the mean of this data matrix can be factorized into a product of two non-negative matrices: a p × K word-topic matrix A and a K × n topic-document matrix W. This paper studies the estimation of A when A is possibly element-wise sparse and the number of topics K is unknown. In this under-explored context, we derive a new minimax lower bound for the estimation of such A and propose a new computationally efficient algorithm for its recovery. We derive a finite-sample upper bound for our estimator and show that it matches the minimax lower bound in many scenarios. Our estimator adapts to the unknown sparsity of A, and our analysis is valid for any finite n, p, K and document lengths. Empirical results on both synthetic and semi-synthetic data show that our proposed estimator is a strong competitor to the existing state-of-the-art algorithms for both non-sparse A and sparse A, and has superior performance in many scenarios of interest.
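To make the factorization concrete, the following is a minimal sketch of how data of this form could be simulated. It is not the estimator proposed in the paper. The multinomial sampling of word counts, the Dirichlet draw for W, the way entries of A are zeroed out, and all names and default values (`simulate_topic_data`, `sparsity`, `doc_len`) are assumptions made for illustration only.

```python
import numpy as np


def simulate_topic_data(p=50, K=3, n=20, doc_len=200, sparsity=0.5, seed=0):
    """Simulate word frequencies whose mean factorizes as A @ W.

    All modeling choices here are illustrative assumptions, not the
    paper's setup: sparse A via random masking, Dirichlet W, and
    multinomial word counts with a common document length.
    """
    rng = np.random.default_rng(seed)

    # Word-topic matrix A (p x K): non-negative, each column sums to 1.
    A = rng.random((p, K))
    A[rng.random((p, K)) < sparsity] = 0.0  # element-wise sparsity
    for k in range(K):
        if A[:, k].sum() == 0:  # keep every topic column non-degenerate
            A[rng.integers(p), k] = 1.0
    A /= A.sum(axis=0, keepdims=True)

    # Topic-document matrix W (K x n): each column is a topic mixture.
    W = rng.dirichlet(np.ones(K), size=n).T

    # Mean of the data matrix (p x n): expected word frequencies.
    Pi = A @ W

    # Observed frequencies X (p x n): multinomial counts / document length.
    X = np.column_stack(
        [rng.multinomial(doc_len, Pi[:, j] / Pi[:, j].sum()) for j in range(n)]
    ) / doc_len
    return A, W, Pi, X


A, W, Pi, X = simulate_topic_data()
print(A.shape, W.shape, X.shape)
```

In this sketch the columns of A, W, Pi and X each sum to one, so X is a noisy version of Pi = A W, and the estimation problem described above is to recover A (and K) from X alone.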

research
05/17/2018

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

We propose a new method of estimation in topic models, that is not a var...
research
07/12/2021

Likelihood estimation of sparse topic distributions in topic models and its applications to Wasserstein document distance calculations

This paper studies the estimation of high-dimensional, discrete, possibl...
research
07/08/2021

Assigning Topics to Documents by Successive Projections

Topic models provide a useful tool to organize and understand the struct...
research
04/05/2022

Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction

We study the problem of robust estimation of the mean vector of a sub-Ga...
research
11/11/2017

Minimax estimation in linear models with unknown finite alphabet design

We provide minimax theory for joint estimation of F and ω in linear mode...
research
03/23/2023

PAC-Bayes Bounds for High-Dimensional Multi-Index Models with Unknown Active Dimension

The multi-index model with sparse dimension reduction matrix is a popula...
research
02/20/2014

Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition

We propose an efficient ADMM method with guarantees for high-dimensional...
