On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference

11/12/2021
by Moontae Lee, et al.

Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By recasting unsupervised learning problems as decompositions of co-occurrence statistics, spectral methods provide transparent and efficient algorithms for posterior inference tasks such as latent topic analysis and community detection. As object vocabularies grow, however, storing co-occurrence statistics and running inference on them rapidly becomes more expensive. Rectifying the co-occurrence statistics, the key step that upholds model assumptions, becomes even more important in the presence of rare terms, yet current rectification techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of the vocabulary and the dimension of the latent space. We also present new algorithms for learning latent variables from the compressed statistics, and we verify that our methods perform comparably to previous approaches on both textual and non-textual data.
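To make the pipeline concrete, below is a minimal sketch, not the paper's implementation, of the two ingredients the abstract refers to: forming document-level co-occurrence statistics and rectifying them by alternating projections onto the constraints a joint-stochastic co-occurrence matrix should satisfy (nonnegativity, entries summing to one, and positive semidefiniteness with rank at most the number of topics K). All function and variable names are illustrative, and plain NumPy is assumed.

import numpy as np

def cooccurrence(docs, vocab_size):
    # Pool document-level co-occurrence counts; each doc is an array of word ids.
    C = np.zeros((vocab_size, vocab_size))
    total = 0.0
    for doc in docs:
        counts = np.bincount(doc, minlength=vocab_size).astype(float)
        n = counts.sum()
        if n < 2:
            continue
        # Ordered word pairs within the document, excluding self-pairs on the diagonal.
        C += np.outer(counts, counts) - np.diag(counts)
        total += n * (n - 1)
    return C / total

def rectify(C, K, iters=50):
    # Alternating projections onto: rank-K PSD matrices, matrices summing to one,
    # and nonnegative matrices (a simple stand-in for full rectification).
    C = 0.5 * (C + C.T)
    for _ in range(iters):
        # Project onto positive semidefinite matrices of rank at most K.
        vals, vecs = np.linalg.eigh(C)
        vals = np.clip(vals, 0.0, None)
        idx = np.argsort(vals)[::-1][:K]
        C = (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T
        # Project onto matrices whose entries sum to one.
        C += (1.0 - C.sum()) / C.size
        # Project onto nonnegative matrices.
        C = np.clip(C, 0.0, None)
    return C

# Toy usage: two tiny documents over a 4-word vocabulary, rectified for K=2 topics.
docs = [np.array([0, 1, 2, 1]), np.array([2, 3, 3, 0])]
C_rect = rectify(cooccurrence(docs, vocab_size=4), K=2)

Note that this sketch materializes the full vocabulary-by-vocabulary matrix; the point of the paper is precisely to avoid that cost by compressing and rectifying the statistics together as the vocabulary grows.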
