Distilled Wasserstein Learning for Word Embedding and Topic Modeling

09/12/2018
by Hongteng Xu, et al.

We propose a novel Wasserstein method with a distillation mechanism that jointly learns word embeddings and topics. The method rests on the observation that the Euclidean distance between word embeddings can serve as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transports to the word distributions of documents, and the word embeddings are learned in a unified framework. When learning the topic model, we use a distilled underlying distance matrix to update the topic distributions and to compute the corresponding optimal transports smoothly. This strategy gives the word-embedding updates robust guidance and improves the convergence of the algorithm. As an application, we focus on patient admission records: the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, achieving superior performance on clinically meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.
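
To make the central idea concrete, the following is a minimal sketch of how pairwise Euclidean distances between word embeddings can act as the ground cost of an optimal transport problem between a topic's word distribution and a document's word distribution. It is an illustration under stated assumptions, not the authors' algorithm: the transport plan is approximated with plain Sinkhorn iterations (entropic regularization), which is swapped in here for simplicity; the distillation mechanism is not modeled; and the embeddings, histograms, function names, and the `reg` and `n_iter` values are all hypothetical.

```python
import numpy as np


def euclidean_cost(emb):
    """Pairwise Euclidean distances between the rows of emb (n_words x dim)."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sinkhorn_plan(a, b, cost, reg=0.5, n_iter=200):
    """Approximate transport plan between histograms a and b.

    Uses entropy-regularized Sinkhorn scaling (an assumption of this
    sketch, not necessarily the solver used in the paper).
    """
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)  # rescale to match the column marginal b
        u = a / (kernel @ v)    # rescale to match the row marginal a
    return u[:, None] * kernel * v[None, :]


# Hypothetical toy data: 5 words with 3-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 3))
topic = np.array([0.4, 0.3, 0.1, 0.1, 0.1])  # word distribution of a topic
doc = np.array([0.1, 0.1, 0.2, 0.3, 0.3])    # word distribution of a document

cost = euclidean_cost(emb)
# Scale the cost to [0, 1] before exponentiating, for numerical stability.
plan = sinkhorn_plan(topic, doc, cost / cost.max())
# Transport cost of the plan under the original (unscaled) distances.
distance = float((plan * cost).sum())
```

In this sketch, `plan[i, j]` is the mass moved from word `i` of the topic to word `j` of the document, and `distance` is the total cost of that plan under the embedding-induced distances. Changing the embeddings changes `cost`, and therefore the plan and the distance, which is the coupling between embeddings and topics that the abstract describes.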


