
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

by Christopher E Moody, et al.

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe lda2vec, a model that learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors. In contrast to continuous dense document representations, this formulation produces sparse, interpretable document mixtures through a non-negative simplex constraint. Our method is simple to incorporate into existing automatic differentiation frameworks and allows for unsupervised document representations geared for use by scientists while simultaneously learning word vectors and the linear relationships between them.
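The simplex-constrained document mixture described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the mixture weights come from a softmax over unconstrained per-document weights, that a document vector is the weighted sum of topic vectors, that a context vector is the sum of a word vector and a document vector, and that sparsity is encouraged by a Dirichlet log-likelihood term with concentration `alpha` below 1. All function names, shapes, and parameter values here are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Map unconstrained weights to the probability simplex (row-wise)."""
    z = x - x.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def document_vectors(doc_weights, topic_vectors):
    """Return (proportions, doc_vecs).

    proportions: (n_docs, n_topics), non-negative rows that sum to 1.
    doc_vecs:    (n_docs, dim), each a mixture of the topic vectors.
    """
    proportions = softmax(doc_weights)
    return proportions, proportions @ topic_vectors


def dirichlet_penalty(proportions, alpha=0.5, eps=1e-12):
    """Negative Dirichlet log-likelihood, up to an additive constant.

    Assumption: a symmetric Dirichlet prior. With alpha < 1, minimizing this
    term favors sparse mixtures over uniform ones.
    """
    return -(alpha - 1.0) * np.log(proportions + eps).sum()


# Toy sizes, purely illustrative.
rng = np.random.default_rng(0)
n_docs, n_topics, dim = 4, 3, 5
doc_weights = rng.normal(size=(n_docs, n_topics))
topic_vectors = rng.normal(size=(n_topics, dim))
word_vector = rng.normal(size=dim)

proportions, doc_vecs = document_vectors(doc_weights, topic_vectors)

# Assumed combination rule: context = word vector + document vector.
context_vector = word_vector + doc_vecs[0]

uniform = np.full((1, n_topics), 1.0 / n_topics)
sparse = np.array([[0.98, 0.01, 0.01]])
```

In a full model, the word vectors, topic vectors, and document weights would all be trained together by gradient descent in an automatic differentiation framework, with the penalty added to a word-prediction loss. Under the assumptions above, `dirichlet_penalty(sparse)` is lower than `dirichlet_penalty(uniform)`, which is what pushes each document toward a few dominant topics.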




Inductive Document Network Embedding with Topic-Word Attention

Document network embedding aims at learning representations for a struct...

Bilingual Distributed Word Representations from Document-Aligned Comparable Data

We propose a new model for learning bilingual word representations from ...

Semantic Regularities in Document Representations

Recent work exhibited that distributed word representations are good at ...

Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors

We demonstrate three approaches for adapting the open-source Lucene sear...

Learning Topic Models by Neighborhood Aggregation

Topic models are one of the most frequently used models in machine learn...

Learning Topic-Sensitive Word Representations

Distributed word representations are widely used for modeling words in N...

Sparse Lifting of Dense Vectors: Unifying Word and Sentence Representations

As the first step in automated natural language processing, representing...

Code Repositories


Experiments in identifying someone's interests/knowledge using word embedding & topic modeling
