Continuous Semantic Topic Embedding Model Using Variational Autoencoder

11/24/2017
by   Namkyu Jung, et al.

This paper proposes the continuous semantic topic embedding model (CSTEM), which finds latent topic variables in documents using a continuous semantic distance function between topics and words by means of the variational autoencoder (VAE). The semantic distance can be represented by any symmetric, bell-shaped geometric distance function on Euclidean space; the Mahalanobis distance is used in this paper. To make the semantic distance behave properly, we introduce an additional model parameter for each word that removes a global factor from this distance, indicating how likely the word is to occur regardless of its topic. This addresses the problem that the Gaussian distribution used in previous topic models with continuous word embeddings could not explain the semantic relation correctly, and it helps to obtain higher topic coherence. In experiments on the 20 Newsgroups, NIPS papers, and CNN/Dailymail corpora, our model matches the performance of recent state-of-the-art models while also generating topic embedding vectors, which make it possible to observe where the topic vectors are embedded among the word vectors in real Euclidean space and how the topics are semantically related to each other.
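To make the idea of a distance-based topic-word score with a per-word global factor more concrete, the following is a minimal NumPy sketch. It is not the paper's implementation: the scoring form (negative half squared Mahalanobis distance plus a per-word bias, normalized with a softmax over the vocabulary) and all names (`topic_word_probs`, `word_vecs`, `topic_vecs`, `topic_precisions`, `word_bias`) are assumptions chosen for illustration, since the abstract does not give the exact formula.

```python
import numpy as np


def topic_word_probs(word_vecs, topic_vecs, topic_precisions, word_bias):
    """Illustrative topic-word distribution (assumed form, not from the paper).

    word_vecs:        (V, D) word embeddings
    topic_vecs:       (K, D) topic embeddings
    topic_precisions: (K, D, D) inverse covariance matrix per topic
    word_bias:        (V,) per-word global factor (topic-independent)
    Returns a (K, V) array whose rows sum to 1.
    """
    # Difference between every word and every topic: shape (K, V, D).
    diff = word_vecs[None, :, :] - topic_vecs[:, None, :]
    # Squared Mahalanobis distance d^T P_k d for each (topic, word) pair.
    sq_dist = np.einsum("kvd,kde,kve->kv", diff, topic_precisions, diff)
    # Assumed score: closer words score higher; the bias shifts a word's
    # score by the same amount under every topic.
    logits = -0.5 * sq_dist + word_bias[None, :]
    # Numerically stable softmax over the vocabulary axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy data: 6 words, 3 dimensions, 2 topics (all values are synthetic).
rng = np.random.default_rng(0)
V, D, K = 6, 3, 2
word_vecs = rng.normal(size=(V, D))
topic_vecs = rng.normal(size=(K, D))
topic_precisions = np.stack([np.eye(D)] * K)  # identity: plain Euclidean case
word_bias = np.zeros(V)

probs = topic_word_probs(word_vecs, topic_vecs, topic_precisions, word_bias)

# Raising one word's bias increases its probability under every topic.
boosted_bias = word_bias.copy()
boosted_bias[0] = 2.0
boosted = topic_word_probs(word_vecs, topic_vecs, topic_precisions, boosted_bias)
```

With identity precision matrices and zero bias, the most probable word for each topic is simply its nearest word vector; the bias term is what lets frequent words gain probability without having to sit close to every topic in the embedding space.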

research
08/19/2020

Top2Vec: Distributed Representations of Topics

Topic modeling is used for discovering latent semantic structure, usuall...
research
06/09/2016

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding sp...
research
09/10/2019

Neural Embedding Allocation: Distributed Representations of Topic Models

Word embedding models such as the skip-gram learn vector representations...
research
05/25/2023

Diversity-Aware Coherence Loss for Improving Neural Topic Models

The standard approach for neural topic modeling uses a variational autoe...
research
10/31/2022

Latent Semantic Structure in Malicious Programs

Latent Semantic Analysis is a method of matrix decomposition used for di...
research
06/28/2021

Integrating topic modeling and word embedding to characterize violent deaths

There is an escalating need for methods to identify latent patterns in t...
research
05/01/2019

Nested Variational Autoencoder for Topic Modeling on Microtexts with Word Vectors

Most of the information on the Internet is represented in the form of mi...
