Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery

08/23/2015
by Weicong Ding, et al.

We develop necessary and sufficient conditions and a novel provably consistent and efficient algorithm for discovering topics (latent factors) from observations (documents) that are realized from a probabilistic mixture of shared latent factors that have certain properties. Our focus is on the class of topic models in which each shared latent factor contains a novel word that is unique to that factor, a property that has come to be known as separability. Our algorithm is based on the key insight that the novel words correspond to the extreme points of the convex hull formed by the row-vectors of a suitably normalized word co-occurrence matrix. We leverage this geometric insight to establish polynomial computation and sample complexity bounds based on a few isotropic random projections of the rows of the normalized word co-occurrence matrix. Our proposed random-projections-based algorithm is naturally amenable to an efficient distributed implementation and is attractive for modern web-scale distributed data mining applications.
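
The geometric insight described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows that when points are projected onto isotropic random directions, the maximizer of each projection is an extreme point of the convex hull. The function name `count_projection_maximizers`, the toy data, and the choice of 200 projections are assumptions made for this illustration. In the paper's setting, the rows would come from a suitably normalized word co-occurrence matrix, which is not constructed here.

```python
import numpy as np


def count_projection_maximizers(rows, num_projections=200, seed=0):
    """Count how often each row attains the maximum along a random direction.

    Hypothetical helper for illustration only. Rows with a nonzero count are
    extreme points of the convex hull of the rows (up to ties, which occur
    with probability zero for Gaussian directions).
    """
    rng = np.random.default_rng(seed)
    rows = np.asarray(rows, dtype=float)
    # Isotropic random directions: one column per projection.
    directions = rng.standard_normal((rows.shape[1], num_projections))
    # For each direction, find the row with the largest projected value.
    winners = np.argmax(rows @ directions, axis=0)
    return np.bincount(winners, minlength=rows.shape[0])


# Toy data (assumed): three extreme points plus three strict convex combinations.
vertices = np.eye(3)
weights = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [1 / 3, 1 / 3, 1 / 3],
])
rows = np.vstack([vertices, weights @ vertices])

counts = count_projection_maximizers(rows)
print(counts)  # the last three entries (interior points) are zero
```

In this toy example the three vertices stand in for rows of novel words, and the mixed rows stand in for words shared across topics. Because each projection is independent of the others, the projections can be computed in parallel, which is consistent with the abstract's remark about distributed implementation.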
