Ordering-sensitive and Semantic-aware Topic Modeling

02/12/2015
by Min Yang, et al.

Topic modeling of textual corpora is an important and challenging problem. Most previous work adopts the "bag-of-words" assumption, which ignores the ordering of words. This assumption simplifies computation, but it discards the ordering information and the semantics of words in context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) that incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by a Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vectors of its surrounding words and its context. The Gaussian mixture components and the topics of documents, sentences and words are learned jointly. Extensive experiments show that our model learns better topics and more accurate word distributions for each topic. Quantitatively, compared to state-of-the-art topic modeling approaches, GMNTM obtains significantly better performance in terms of perplexity, retrieval accuracy and classification accuracy.
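
As a rough, self-contained illustration of the core idea in the abstract (topics as Gaussian mixture components over embedding vectors), the Python sketch below fits a mixture over word vectors and reads off per-document topic proportions. It is not the GMNTM training procedure described in the paper: the word vectors are random stand-ins rather than embeddings learned jointly with the mixture, sentence and document vectors are omitted, and scikit-learn's GaussianMixture replaces the paper's joint inference over documents, sentences and words.

```python
# Minimal sketch of "topics as Gaussian mixture components over vectors".
# NOT the authors' implementation: embeddings are random placeholders and
# the mixture is fit separately instead of jointly with the embeddings.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of tokens.
corpus = [
    "the stock market rallied on strong earnings".split(),
    "the team scored in the final minute of the game".split(),
    "investors worry about rising interest rates".split(),
]

# Hypothetical embedding table; in GMNTM these vectors would be learned
# jointly with the mixture, here they are random for illustration only.
vocab = sorted({w for doc in corpus for w in doc})
dim = 16
embeddings = {w: rng.normal(size=dim) for w in vocab}

# Stack all word vectors and fit a Gaussian mixture whose components
# play the role of topics (each topic = a cluster of vectors).
X = np.vstack([embeddings[w] for doc in corpus for w in doc])
n_topics = 2
gmm = GaussianMixture(n_components=n_topics,
                      covariance_type="diag",
                      random_state=0)
gmm.fit(X)

# Per-document topic proportions: average the posterior responsibilities
# of the document's word vectors over the mixture components.
for i, doc in enumerate(corpus):
    resp = gmm.predict_proba(np.vstack([embeddings[w] for w in doc]))
    print(f"doc {i} topic distribution: {resp.mean(axis=0).round(3)}")
```

With learned (rather than random) vectors, words used in similar contexts would land near each other, so a mixture component would gather semantically related words, which is what allows the model to report a word distribution per topic.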

Related research

Top2Vec: Distributed Representations of Topics (08/19/2020)
Topic modeling is used for discovering latent semantic structure, usuall...

Context Reinforced Neural Topic Modeling over Short Texts (08/11/2020)
As one of the prevalent topic mining tools, neural topic modeling has at...

Multidimensional counting grids: Inferring word order from disordered bags of words (02/14/2012)
Models of bags of words typically assume topic mixing so that the words ...

Gaussian Mixture Embeddings for Multiple Word Prototypes (11/19/2015)
Recently, word representation has been increasingly focused on for its e...

The Effectiveness of Applying Different Strategies on Recognition and Recall Textual Password (04/29/2023)
Using English words as passwords have been a popular topic in the last f...

Lifelong Neural Topic Learning in Contextualized Autoregressive Topic Models of Language via Informative Transfers (09/29/2019)
Topic models such as LDA, DocNADE, iDocNADEe have been popular in docume...

Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance (06/15/2021)
This paper presents a novel unsupervised abstractive summarization metho...
