Document Informed Neural Autoregressive Topic Models with Distributional Prior

09/15/2018
by Pankaj Gupta, et al.

We address two challenges in topic models: (1) The context around a word helps determine its actual meaning, e.g., "networks" used in the contexts artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions but take little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context around each word in a document in a language-modeling fashion; the proposed model is named iDocNADE. (2) Because short texts contain few word occurrences (i.e., little context) and small corpora suffer from data sparsity, applying topic models to such texts is challenging. We therefore propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: word embeddings used as a distributional prior. The proposed variants are named DocNADE2 and iDocNADE2. These novel neural autoregressive topic model variants consistently outperform state-of-the-art generative topic models in generalization, interpretability (topic coherence), and applicability (retrieval and classification) across 6 long-text and 8 short-text datasets from diverse domains.
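To make the abstract's two ideas concrete, here is a minimal numpy sketch of how a DocNADE-style model scores a document autoregressively: each word is predicted from a hidden state built from the words before it, both directions are scored and combined (the bidirectional "informed" variant), and an optional pretrained embedding matrix is mixed into the context accumulation as a distributional prior. All names (`idocnade_log_likelihood`, the `lam` weight, the shapes of `W`, `U`, `E`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def idocnade_log_likelihood(doc, W, U, b, c, E=None, lam=0.1):
    """Score a document with a DocNADE-style autoregressive topic model.

    doc : list of word indices v_1..v_D
    W   : (H, V) input (topic-word) weights
    U   : (V, H) output weights
    b   : (V,) output bias, c : (H,) hidden bias
    E   : optional (H, V) pretrained word-embedding matrix acting as a
          distributional prior, mixed in with weight lam (hypothetical
          stand-in for the paper's embedding aggregation).

    Returns the mean of forward and backward log-likelihoods, mimicking
    how the bidirectional (informed) variant uses full context.
    """
    def directional_ll(words):
        acc = np.zeros(c.shape[0])   # running context sum over v_{<i}
        ll = 0.0
        for v in words:
            h = sigmoid(c + acc)     # hidden state from preceding words
            p = softmax(b + U @ h)   # conditional p(v_i | v_{<i})
            ll += np.log(p[v])
            acc += W[:, v]           # add current word to the context
            if E is not None:
                acc += lam * E[:, v]  # embedding prior contribution
        return ll

    return 0.5 * (directional_ll(doc) + directional_ll(doc[::-1]))
```

A usage sketch: initialize small random `W`, `U` (zero biases), call the function on a short index list, and compare scores with and without `E` to see the prior shift the likelihood; in the unidirectional DocNADE case one would simply return `directional_ll(doc)` alone.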


Related research

08/11/2018
Document Informed Neural Autoregressive Topic Models
Context information around words helps in determining their actual meani...

10/09/2018
textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior
We address two challenges of probabilistic topic modelling in order to b...

10/31/2021
Conical Classification For Computationally Efficient One-Class Topic Determination
As the Internet grows in size, so does the amount of text based informat...

09/13/2014
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
Topic modeling based on latent Dirichlet allocation (LDA) has been a fra...

09/19/2017
MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Besides the text content, documents and their associated words usually c...

01/05/2018
Knowledge-based Word Sense Disambiguation using Topic Models
Word Sense Disambiguation is an open problem in Natural Language Process...
