A Geometrical Approach to Topic Model Estimation

08/16/2016
by Zheng Tracy Ke, et al.

In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise, and the Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models is challenging. In this paper, we overcome the challenge by revealing a surprising insight: there is a low-dimensional simplex structure that can be viewed as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and that allows us to conveniently reconstruct the former using the latter. This insight motivates a new SVD approach to learning topic models, which we analyze with delicate random matrix theory and for which we derive the rate of convergence. We support our methods and theory numerically, using both simulated data and real data.
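
To make the setting concrete, the following is a minimal sketch of a low-rank topic matrix product hidden in a noisy corpus matrix, with a truncated SVD used to denoise it. It is not the paper's simplex-based estimator, which the abstract does not spell out. The names `A`, `W`, `D0`, `D`, the Dirichlet priors, the multinomial sampling model, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, documents, topics, words per document.
p, n, K, N = 200, 300, 3, 2000

# Assumed topic matrix A (p x K): each column is a distribution over words.
A = rng.dirichlet(np.full(p, 0.3), size=K).T
# Assumed weight matrix W (K x n): each column is a document's topic mixture.
W = rng.dirichlet(np.full(K, 0.5), size=n).T

# Noise-free matrix of expected word frequencies; its rank is at most K.
D0 = A @ W

# Observed corpus matrix: empirical word frequencies from multinomial counts.
D = np.column_stack([rng.multinomial(N, D0[:, i]) / N for i in range(n)])

# Rank-K truncated SVD of the noisy corpus matrix.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
D_hat = (U[:, :K] * s[:K]) @ Vt[:K, :]

# Frobenius-norm errors relative to the noise-free matrix.
err_raw = np.linalg.norm(D - D0)
err_svd = np.linalg.norm(D_hat - D0)
print(f"error of raw corpus matrix:    {err_raw:.4f}")
print(f"error after rank-{K} truncation: {err_svd:.4f}")
```

In this toy setup the truncation should bring the corpus matrix closer to `D0`, but it recovers only the product `A @ W`, not the topic matrix `A` itself. Recovering `A` from the singular vectors is the harder step that the abstract attributes to the simplex structure.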


research
04/02/2018

Subspace-Orbit Randomized Decomposition for Low-rank Matrix Approximation

An efficient, accurate and reliable approximation of a matrix by one of ...
research
02/22/2017

On the Power of Truncated SVD for General High-rank Matrix Estimation Problems

We show that given an estimate A that is close to a general high-rank po...
research
04/21/2021

Accurate and fast matrix factorization for low-rank learning

In this paper we tackle two important challenges related to the accurate...
research
10/07/2013

Singular Value Decomposition of Images from Scanned Photographic Plates

We want to approximate the mxn image A from scanned astronomical photogr...
research
05/24/2018

Confidence region of singular vectors for high-dimensional and low-rank matrix regression

Let M ∈ R^{m_1×m_2} be an unknown matrix with r = rank(M) ≪ (m_1, m_2) whose ...
research
05/24/2018

Confidence interval of singular vectors for high-dimensional and low-rank matrix regression

Let M ∈ R^{m_1×m_2} be an unknown matrix with r = rank(M) ≪ (m_1, m_2) whose ...
research
12/06/2019

Hybrid Kronecker Product Decomposition and Approximation

Discovering the underlying low dimensional structure of high dimensional...
