Jointly Learning Word Embeddings and Latent Topics

06/21/2017
by Bei Shi, et al.

Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step" methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.
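The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the STE model itself, whose training objective is not given here. The first function extracts the local (center, context) pairs that Skip-gram-style models train on. The second part shows one possible way to store a separate vector for each (word, topic) pair, which is what lets a polysemous word such as "bank" have different representations under different topics. The names `skipgram_pairs` and `TopicEmbeddingTable`, the window size, and the random initialization are all assumptions made for illustration.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


class TopicEmbeddingTable:
    """Hypothetical container holding one vector per (word, topic) pair.

    Vectors are randomly initialized; a real model would learn them.
    """

    def __init__(self, vocab, num_topics, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.vectors = {
            (word, topic): [rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for word in sorted(vocab)
            for topic in range(num_topics)
        }

    def lookup(self, word, topic):
        """Return the vector of `word` under the given topic index."""
        return self.vectors[(word, topic)]


tokens = "the bank raised interest rates".split()
pairs = skipgram_pairs(tokens, window=1)
table = TopicEmbeddingTable(set(tokens), num_topics=2, dim=4)

print(pairs[:3])
print(table.lookup("bank", 0))
print(table.lookup("bank", 1))
```

In this sketch the same word maps to a different vector under each topic index, so a downstream similarity query can be conditioned on the topic assigned to a particular word occurrence.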

research
08/11/2020

A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings

We propose a novel generative model to explore both local and global con...
research
09/12/2018

Distilled Wasserstein Learning for Word Embedding and Topic Modeling

We propose a novel Wasserstein method with a distillation mechanism, yie...
research
03/15/2016

Topic Modeling Using Distributed Word Embeddings

We propose a new algorithm for topic modeling, Vec2Topic, that identifie...
research
05/04/2021

Unsupervised Graph-based Topic Modeling from Video Transcriptions

To unfold the tremendous amount of audiovisual data uploaded daily to so...
research
12/28/2017

Corpus specificity in LSA and Word2vec: the role of out-of-domain documents

Latent Semantic Analysis (LSA) and Word2vec are some of the most widely ...
research
02/21/2018

CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

Word embedding is a useful approach to capture co-occurrence structures ...
research
07/22/2020

Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification

Question paraphrase identification is a key task in Community Question A...
