A New Geometric Approach to Latent Topic Modeling and Discovery

01/05/2013
by Weicong Ding, et al.

A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
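The geometric idea in the abstract can be illustrated on a toy, noise-free example. The sketch below is not the authors' algorithm: it swaps in a simple greedy successive-projection search to locate extreme points, and it omits the robustness and clustering steps the abstract mentions. The matrices `A` (word-by-topic) and `W` (topic-by-document), their values, and the helper names are all hypothetical, chosen so that words 0 and 1 each occur in a single topic.

```python
# Toy sketch (assumption: noise-free data, one novel word per topic).
# Rows of X are words, columns are documents.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def row_normalize(matrix):
    """Scale every row so that it sums to 1."""
    return [[v / sum(row) for v in row] for row in matrix]

def find_extreme_rows(rows, k):
    """Greedy successive projection (a swapped-in technique, not the paper's):
    pick the row with the largest norm, project it out of every row, repeat."""
    residual = [list(r) for r in rows]
    chosen = []
    for _ in range(k):
        norms = [dot(r, r) for r in residual]
        idx = max(range(len(residual)), key=norms.__getitem__)
        chosen.append(idx)
        pivot, pivot_sq = residual[idx], norms[idx]
        projected = []
        for r in residual:
            coef = dot(r, pivot) / pivot_sq
            projected.append([a - coef * p for a, p in zip(r, pivot)])
        residual = projected
    return chosen

# Hypothetical word-by-topic matrix: word 0 appears only in topic 0,
# word 1 only in topic 1; the remaining words are shared by both topics.
A = [[0.4, 0.0],
     [0.0, 0.3],
     [0.3, 0.3],
     [0.2, 0.1],
     [0.1, 0.3]]

# Hypothetical topic-by-document mixing weights (each column sums to 1).
W = [[0.9, 0.1, 0.5, 0.3],
     [0.1, 0.9, 0.5, 0.7]]

# Cross-document word frequencies X = A W, then row-normalized.
X = [[sum(A[i][t] * W[t][j] for t in range(2)) for j in range(4)]
     for i in range(5)]
X_bar = row_normalize(X)

novel_words = sorted(find_extreme_rows(X_bar, 2))
print(novel_words)
```

In this toy setting every row-normalized word vector is a convex combination of the two rows belonging to the novel words, so those two rows are the extreme points that the search recovers. With real, noisy counts and several novel words per topic, a robust extreme-point test and a clustering step would be needed, as the abstract indicates.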


research
03/15/2013

Topic Discovery through Data Dependent and Random Projections

We present algorithms for topic modeling based on the geometry of cross-...
research
02/25/2010

Syntactic Topic Models

The syntactic topic model (STM) is a Bayesian nonparametric model of lan...
research
12/11/2014

A Topic Modeling Approach to Ranking

We propose a topic modeling approach to the prediction of preferences in...
research
08/23/2015

Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery

We develop necessary and sufficient conditions and a novel provably cons...
research
05/10/2019

A New Anchor Word Selection Method for the Separable Topic Discovery

Separable Non-negative Matrix Factorization (SNMF) is an important metho...
research
04/03/2015

Learning Mixed Membership Mallows Models from Pairwise Comparisons

We propose a novel parameterized family of Mixed Membership Mallows Mode...
research
02/14/2018

Robust Continuous Co-Clustering

Clustering consists of grouping together samples giving their similar pr...
