Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

05/06/2016
by Christopher E Moody, et al.

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe lda2vec, a model that learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors. In contrast to continuous dense document representations, this formulation produces sparse, interpretable document mixtures through a non-negative simplex constraint. Our method is simple to incorporate into existing automatic differentiation frameworks and allows for unsupervised document representations geared for use by scientists while simultaneously learning word vectors and the linear relationships between them.
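
As a rough illustration of the formulation described above, the sketch below builds a document vector as a mixture of topic vectors whose weights lie on the probability simplex. It is not the authors' implementation: the use of a softmax to enforce the simplex constraint, the array sizes, the value of `alpha`, and every variable name are assumptions made for this example.

```python
import numpy as np

def softmax(x):
    # Map unconstrained weights to non-negative proportions that sum to 1.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()

def dirichlet_log_prior(proportions, alpha):
    # Unnormalized Dirichlet log density with a symmetric concentration alpha.
    # With alpha < 1, this term favors sparse mixtures.
    return np.sum((alpha - 1.0) * np.log(proportions))

rng = np.random.default_rng(0)
n_topics, dim = 4, 5  # hypothetical sizes

topic_vectors = rng.normal(size=(n_topics, dim))  # one dense vector per topic
doc_weights = rng.normal(size=n_topics)           # unconstrained per-document weights

# Document proportions on the simplex (assumed here to come from a softmax).
proportions = softmax(doc_weights)

# Document vector: weighted sum of topic vectors, in the same space as word vectors.
doc_vector = proportions @ topic_vectors

# Sparsity-encouraging prior term (alpha is an illustrative choice).
log_prior = dirichlet_log_prior(proportions, alpha=0.7)
```

Because `proportions` is non-negative and sums to one, each entry can be read directly as the share of a topic in the document, which is what makes the mixture interpretable compared with an unconstrained dense document vector.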


01/10/2020

Inductive Document Network Embedding with Topic-Word Attention

Document network embedding aims at learning representations for a struct...
09/24/2015

Bilingual Distributed Word Representations from Document-Aligned Comparable Data

We propose a new model for learning bilingual word representations from ...
03/24/2016

Semantic Regularities in Document Representations

Recent work exhibited that distributed word representations are good at ...
10/22/2019

Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors

We demonstrate three approaches for adapting the open-source Lucene sear...
02/22/2018

Learning Topic Models by Neighborhood Aggregation

Topic models are one of the most frequently used models in machine learn...
05/01/2017

Learning Topic-Sensitive Word Representations

Distributed word representations are widely used for modeling words in N...
11/05/2019

Sparse Lifting of Dense Vectors: Unifying Word and Sentence Representations

As the first step in automated natural language processing, representing...

Code Repositories

vec2me

Experiments in identifying someone's interests/knowledge using word embedding & topic modeling

