Learning Topic Models by Neighborhood Aggregation

02/22/2018
by   Ryohei Hisano, et al.

Topic models are among the most frequently used models in machine learning due to their high interpretability and modular structure. However, extending the model to include supervisory signals, incorporate pre-trained word embedding vectors, and add a nonlinear output function is not an easy task, because one has to resort to highly intricate approximate inference procedures. In this paper, we show that topic models can be viewed as performing a neighborhood aggregation algorithm in which messages are passed through a network defined over words. Under this network view of topic models, nodes correspond to words in a document, and edges correspond either to a relationship between co-occurring words in a document or to a relationship between occurrences of the same word across the corpus. The network view allows us to extend the model to include supervisory signals, incorporate pre-trained word embedding vectors, and add a nonlinear output function in a simple manner. Moreover, we describe a simple way to train the model that is well suited to a semi-supervised setting, where we have supervisory signals for only a portion of the corpus and the goal is to improve prediction performance on the held-out data. Through careful experiments, we show that our approach outperforms a state-of-the-art supervised Latent Dirichlet Allocation implementation on both held-out document classification and topic coherence.
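To make the network view concrete, the following is a minimal illustrative sketch (not the authors' implementation) of neighborhood aggregation over a word network: token nodes receive mean-aggregated messages from co-occurring words in the same document and from occurrences of the same word elsewhere in the corpus, and a linear-plus-softmax output maps the aggregated embedding to a topic distribution. The toy corpus, embedding dimensions, and mean-aggregation rule are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of word tokens (illustrative only).
docs = [["cat", "dog", "pet"], ["dog", "bark"], ["cat", "meow", "pet"]]
vocab = sorted({w for d in docs for w in d})
widx = {w: i for i, w in enumerate(vocab)}

n_topics, dim = 2, 4
# Stand-in for pre-trained word embeddings (random here).
emb = rng.normal(size=(len(vocab), dim))
# Output layer mapping an aggregated embedding to topic logits.
W = rng.normal(size=(dim, n_topics))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Each token node aggregates messages from its two edge types:
#  (1) co-occurring words in the same document, and
#  (2) occurrences of the same word in other documents.
token_topics = []
for d_id, doc in enumerate(docs):
    for w in doc:
        neighbors = [emb[widx[u]] for u in doc if u != w]          # (1)
        neighbors += [emb[widx[w]] for d2, other in enumerate(docs)
                      if d2 != d_id and w in other]                 # (2)
        msg = np.mean(neighbors, axis=0) if neighbors else emb[widx[w]]
        # Nonlinear output: softmax over topic logits.
        token_topics.append((d_id, w, softmax(msg @ W)))

for d_id, w, theta in token_topics:
    print(d_id, w, theta.round(2))
```

A supervised extension would simply attach a label loss to (an aggregate of) these per-token topic distributions and train the output weights by gradient descent, which is the kind of modification the network view makes straightforward.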


