Topic Discovery through Data Dependent and Random Projections

03/15/2013
by   Weicong Ding, et al.
0

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so called separability condition. This is a condition on existence of novel-words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We will also discuss the statistical guarantees of the data-dependent projections method based on two mild assumptions on the prior density of topic document matrix. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate qualitative and quantitative merits of our scheme.

READ FULL TEXT

page 8

page 9

research
01/05/2013

A New Geometric Approach to Latent Topic Modeling and Discovery

A new geometrically-motivated algorithm for nonnegative matrix factoriza...
research
01/05/2018

Knowledge-based Word Sense Disambiguation using Topic Models

Word Sense Disambiguation is an open problem in Natural Language Process...
research
08/23/2015

Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery

We develop necessary and sufficient conditions and a novel provably cons...
research
08/05/2020

BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation

Existing topic modeling and text segmentation methodologies generally re...
research
11/04/2016

Generalized Topic Modeling

Recently there has been significant activity in developing algorithms wi...
research
11/19/2017

Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models

Spectral topic modeling algorithms operate on matrices/tensors of word c...

Please sign up or login with your details

Forgot password? Click here to reset