Discovering topics with neural topic models built from PLSA assumptions

11/25/2019
by Sileye O. Ba, et al.

In this paper we present a model for unsupervised topic discovery in text corpora. The proposed model uses lookup-table embeddings of documents, words, and topics as neural network parameters to build probabilities of words given topics and probabilities of topics given documents. These probabilities are then marginalized over topics to recover probabilities of words given documents. For very large corpora, where the number of documents can be on the order of billions, a neural auto-encoder based document embedding is more scalable than the lookup-table embedding classically used. We therefore extend the lookup-based document embedding model to a continuous auto-encoder based model. Our models are trained under probabilistic latent semantic analysis (PLSA) assumptions. We evaluated our models on six datasets with a rich variety of content. The experiments show that the proposed neural topic models are very effective at capturing relevant topics. Furthermore, in terms of perplexity, the evaluation benchmarks show that our topic models outperform the latent Dirichlet allocation (LDA) model, which is classically used for topic discovery.
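The marginalization described in the abstract, p(w | d) = sum over topics t of p(w | t) p(t | d), can be illustrated with a minimal sketch. The abstract does not state how the embeddings are turned into probabilities, so the dot-product-plus-softmax form below is an assumption, as are all names (`word_given_doc`, `random_embeddings`) and the toy sizes. Training and the auto-encoder variant are not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_embeddings(n_rows, dim, rng):
    """Hypothetical lookup table: one random vector per row."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_rows)]


def word_given_doc(doc_emb, topic_emb, word_emb):
    """Return p(w | d) for one document by marginalizing over topics."""
    # p(t | d): softmax over topics of document-topic scores (assumed form).
    p_t_given_d = softmax([dot(doc_emb, t) for t in topic_emb])
    # p(w | t): softmax over the vocabulary of word-topic scores (assumed form).
    p_w_given_t = [softmax([dot(w, t) for w in word_emb]) for t in topic_emb]
    # p(w | d) = sum_t p(w | t) * p(t | d)
    return [
        sum(p_t_given_d[k] * p_w_given_t[k][w] for k in range(len(topic_emb)))
        for w in range(len(word_emb))
    ]


rng = random.Random(0)
dim, n_docs, n_topics, n_words = 4, 3, 2, 5  # toy sizes, chosen arbitrarily
docs = random_embeddings(n_docs, dim, rng)
topics = random_embeddings(n_topics, dim, rng)
words = random_embeddings(n_words, dim, rng)

p_w_d = word_given_doc(docs[0], topics, words)
print(p_w_d)
```

Because each p(w | t) and p(t | d) is a proper distribution, the resulting p(w | d) also sums to one over the vocabulary, which is what allows perplexity to be computed on held-out documents.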

research
08/19/2020

Top2Vec: Distributed Representations of Topics

Topic modeling is used for discovering latent semantic structure, usuall...
research
10/12/2021

Topic Model Supervised by Understanding Map

Inspired by the notion of Center of Mass in physics, an extension called...
research
07/18/2017

Discovering topics in text datasets by visualizing relevant words

When dealing with large collections of documents, it is imperative to qu...
research
08/01/2017

SenGen: Sentence Generating Neural Variational Topic Model

We present a new topic model that generates documents by sampling a topi...
research
08/20/2019

Learning document embeddings along with their uncertainties

Majority of the text modelling techniques yield only point estimates of ...
research
10/26/2014

A provable SVD-based algorithm for learning topics in dominant admixture corpus

Topic models, such as Latent Dirichlet Allocation (LDA), posit that docu...
research
07/30/2020

Is there something I'm missing? Topic Modeling in eDiscovery

In legal eDiscovery, the parties are required to search through their el...
