I. Introduction
With the rapid growth of the internet, huge amounts of text data are generated in social networks, online shopping and news websites, etc. These data create demand for powerful and efficient text analysis techniques. Probabilistic topic models such as Latent Dirichlet Allocation (LDA) [1], which discover latent topics from text collections, are popular approaches for this task. Many conventional topic models discover topics purely based on word occurrences, ignoring the meta information (a.k.a., side information) associated with the content. In contrast, when we humans read text, it is natural to leverage meta information, such as categories, authors, timestamps, and the semantic meanings of the words, to improve our comprehension. Therefore, topic models capable of using meta information should yield improved modelling accuracy and topic quality.
In practice, various kinds of meta information are available at the document level and the word level in many corpora. At the document level, labels of documents can be used to guide topic learning so that more meaningful topics can be discovered. Moreover, it is highly likely that documents with common labels discuss similar topics, which could further result in similar topic distributions. For example, if we use authors as labels for scientific papers, the topics of the papers published by the same researcher can be closely related.
At the word level, different semantic/syntactic features are also accessible, for example, features regarding word relationships, such as synonyms obtained from WordNet [2], word co-occurrence patterns obtained from a large corpus, and linked concepts from knowledge graphs. It is preferable that words having similar meanings but different morphological forms, like “dog” and “puppy”, are assigned to the same topic, even if they barely co-occur in the modelled corpus. Recently, word embeddings generated by GloVe [3] and word2vec [4] have attracted a lot of attention in natural language processing and related fields. It has been shown that word embeddings can capture both the semantic and syntactic features of words, so that similar words are close to each other in the embedding space. It seems reasonable to expect that these word embeddings will improve topic modelling
[5, 6].

Conventional topic models can suffer from a large performance degradation on short texts (e.g., tweets and news headlines) because of insufficient word co-occurrence information. In such cases, the meta information of documents and words can play an important role in analysing short texts by compensating for the lost word co-occurrence information. At the document level, for example, tweets are usually associated with hashtags, users, locations, and timestamps, which can be used to alleviate the data sparsity problem. At the word level, word semantic similarity and embeddings obtained or trained on large external corpora (e.g., Google News or Wikipedia) have been proven useful in learning meaningful topics from short texts [7, 8].
The benefit of using document and word meta information separately has been shown in several models such as [9, 10, 6]. However, existing models are usually not efficient enough, due to non-conjugacy and/or complex model structures. Moreover, most existing models use only one kind of meta information (either at the document level or at the word level). In this paper, we propose MetaLDA (code at https://github.com/ethanhezhao/MetaLDA/), a topic model that can effectively and efficiently leverage arbitrary document and word meta information encoded in binary form. Specifically, the labels of a document in MetaLDA are incorporated in the prior of the per-document topic distributions. If two documents have similar labels, their topic distributions should be generated with similar Dirichlet priors. Analogously, at the word level, the features of a word are incorporated in the prior of the per-topic word distributions, which encourages words with similar features to have similar weights across topics. Therefore, both document and word meta information, if and when they are available, can be flexibly and simultaneously incorporated in MetaLDA. MetaLDA has the following key properties:

MetaLDA jointly incorporates various kinds of document and word meta information for both regular and short texts, yielding better modelling accuracy and topic quality.

With data augmentation techniques, the inference of MetaLDA can be done by an efficient and closed-form Gibbs sampling algorithm that benefits from the full local conjugacy of the model.

The simple structure for incorporating meta information and the efficient inference algorithm give MetaLDA an advantage in running speed over other models with meta information.
We conduct extensive experiments with several real datasets including regular and short texts in various domains. The experimental results demonstrate that MetaLDA achieves improved performance in terms of perplexity, topic coherence, and running time.
II. Related Work
In this section, we review three lines of related work: models with document meta information, models with word meta information, and models for short texts.
At the document level, Supervised LDA (sLDA) [11] models document labels by learning a generalised linear model with an appropriate link function and exponential family dispersion function. However, sLDA is restricted to one label per document. Labelled LDA (LLDA) [12] assumes that each label has a corresponding topic and that a document is generated by a mixture of the topics. Although multiple labels are allowed, LLDA requires the number of topics to equal the number of labels, i.e., exactly one topic per label. As an extension of LLDA, Partially Labelled LDA (PLLDA) [10] relaxes this requirement by assigning multiple topics to a label. The Dirichlet Multinomial Regression (DMR) model [9] incorporates document labels in the prior of the topic distributions like our MetaLDA, but with the logistic-normal transformation. As full conjugacy does not exist in DMR, a part of the inference has to be done by numerical optimisation, which is slow for large sets of labels and topics. Similarly, in the Hierarchical Dirichlet Scaling Process (HDSP) [13], conjugacy is broken as well, since the topic distributions have to be renormalised. [14] introduces a Poisson factorisation model with hierarchical document labels, but the techniques cannot be applied to regular topic models, as the topic proportion vectors are also unnormalised.
Recently, there has been growing interest in incorporating word features in topic models. For example, DF-LDA [15] incorporates word must-links and cannot-links using a Dirichlet forest prior in LDA; MRF-LDA [16] encodes word semantic similarity in LDA with a Markov random field; WF-LDA [17] extends LDA to model word features with the logistic-normal transform; LF-LDA [6] integrates word embeddings into LDA by replacing the topic-word Dirichlet multinomial component with a mixture of a Dirichlet multinomial component and a word embedding component. Instead of generating word types (tokens), Gaussian LDA (GLDA) [5] directly generates word embeddings with the Gaussian distribution. Despite the exciting applications of the above models, their inference is usually less efficient due to non-conjugacy and/or complicated model structures.
Analysis of short texts with topic models has been an active area with the development of social networks. Generally, there are two ways to deal with the sparsity problem in short texts: using the intrinsic properties of short texts or leveraging meta information. For the first way, one popular approach is to aggregate short texts into pseudo-documents; for example, [18] introduces a model that aggregates tweets containing the same word, and recently, PTM [19] aggregates short texts into latent pseudo-documents. Another approach is to assume one topic per short document, known as mixture of unigrams or Dirichlet Multinomial Mixture (DMM), as in [20, 7]. For the second way, document meta information can be used to aggregate short texts; for example, [18] aggregates tweets by the corresponding authors, and [21] shows that aggregating tweets by their hashtags yields superior performance over other aggregation methods. Closely related to our work are the models that use word features for short texts. For example, [7] introduces an extension of GLDA on short texts which samples an indicator variable that chooses to generate either the type of a word or the embedding of a word, and GPU-DMM [8] extends DMM with word semantic similarity obtained from embeddings. Despite their improved performance, challenges remain for existing models: (1) for aggregation-based models, it is usually hard to choose which meta information to use for aggregation; (2) the “single topic” assumption makes DMM models lose the flexibility to capture different topic ingredients of a document; and (3) the incorporation of meta information in existing models is usually less efficient.
To our knowledge, attempts to jointly leverage document and word meta information are relatively rare. For example, meta information can be incorporated by first-order logic in LogitLDA [22] and by score functions in SCLDA [23]. However, the first-order logic and score functions need to be defined for each kind of meta information, and the definition can be infeasible when incorporating both document and word meta information simultaneously.

III. The MetaLDA Model
Given a corpus, LDA uses the same Dirichlet prior for all the per-document topic distributions and the same Dirichlet prior for all the per-topic word distributions [24]. In MetaLDA, by contrast, each document has a specific Dirichlet prior on its topic distribution, which is computed from the meta information of the document, and the parameters of the prior are estimated during training. Similarly, each topic has a specific Dirichlet prior computed from the word meta information. Here we elaborate on MetaLDA, in particular on how the meta information is incorporated. Hereafter, we will use labels as the document meta information, unless otherwise stated.
Given a collection of D documents, MetaLDA generates each document d with a mixture of K topics, and each topic k is a distribution over the vocabulary of V tokens, denoted by φ_k. For document d with N_d words, to generate the n-th (n ∈ {1, …, N_d}) word w_{d,n}, we first sample a topic z_{d,n} from the document's topic distribution θ_d, and then sample w_{d,n} from φ_{z_{d,n}}. Assume the labels of document d are encoded in a binary vector f_d ∈ {0,1}^L, where L is the total number of unique labels: f_{d,l} = 1 indicates that label l is active in document d, and vice versa. Similarly, the features of token v are stored in a binary vector g_v ∈ {0,1}^{L′}. Therefore, the document and word meta information associated with the corpus are stored in the matrices F ∈ {0,1}^{D×L} and G ∈ {0,1}^{V×L′} respectively. Although MetaLDA incorporates binary features, categorical features and real-valued features can be converted into binary values with proper transformations such as discretisation and binarisation.
Fig. 1 shows the graphical model of MetaLDA, and the generative process is as follows:

1. For each topic k ∈ {1, …, K}:

 (a) For each doc-label l ∈ {1, …, L}: Draw λ_{l,k} ∼ Ga(μ, 1/μ)

 (b) For each word-feat f′ ∈ {1, …, L′}: Draw δ_{f′,k} ∼ Ga(ν, 1/ν)

 (c) For each token v ∈ {1, …, V}: Compute β_{k,v} = ∏_{f′=1}^{L′} δ_{f′,k}^{g_{v,f′}}

 (d) Draw φ_k ∼ Dir(β_k)

2. For each document d ∈ {1, …, D}:

 (a) For each topic k: Compute α_{d,k} = ∏_{l=1}^{L} λ_{l,k}^{f_{d,l}}

 (b) Draw θ_d ∼ Dir(α_d)

 (c) For each of the N_d words in document d:

  i. Draw topic z_{d,n} ∼ Cat(θ_d)

  ii. Draw word w_{d,n} ∼ Cat(φ_{z_{d,n}})

where Ga(·, ·), Dir(·), and Cat(·) are the gamma distribution, the Dirichlet distribution, and the categorical distribution respectively.
μ and ν are the hyperparameters.

To incorporate document labels, MetaLDA learns a specific Dirichlet prior over the topics for each document by using the label information. Specifically, the information of document d's labels is incorporated in α_d, the parameter of the Dirichlet prior on θ_d. As shown in Step 2a, α_{d,k} is computed as a log-linear combination of the labels f_d. Since f_d is binary, α_{d,k} is simply the product of λ_{l,k} over all the active labels of document d, i.e., α_{d,k} = ∏_{l: f_{d,l}=1} λ_{l,k}. Drawn from a gamma distribution with mean 1, λ_{l,k} controls the impact of label l on topic k. If label l has no or little impact on topic k, λ_{l,k} is expected to be 1 or close to 1, and then it will have no or little influence on α_{d,k}, and vice versa. The hyperparameter μ controls the variation of λ_{l,k}. The incorporation of word features is analogous, but in β_k, the parameter of the Dirichlet prior on the per-topic word distributions, as shown in Step 1c.
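To make Step 2a concrete, the per-document Dirichlet parameters can be computed with a few lines of NumPy. This is an illustrative sketch only; the function and variable names are ours, not from the released implementation (which is built on Mallet):

```python
import numpy as np

def doc_dirichlet_prior(labels, lam):
    """Compute alpha[d, k] as the product of lam[l, k] over the
    document's active labels, i.e. a log-linear combination of the
    binary label vector.
    labels: binary (D x L) matrix F; lam: positive (L x K) matrix of
    per-label, per-topic weights (lambda)."""
    # exp(F @ log(lambda)) multiplies lam[l, k] over labels with f[d, l] = 1
    return np.exp(labels @ np.log(lam))

# toy example: 2 documents, 3 labels, 2 topics
F = np.array([[1, 0, 1],
              [0, 1, 0]])
lam = np.array([[2.0, 0.5],
                [1.0, 1.0],
                [3.0, 1.0]])
alpha = doc_dirichlet_prior(F, lam)
# doc 0 has labels 0 and 2 active: alpha[0] = [2*3, 0.5*1] = [6.0, 0.5]
```

Inactive labels contribute a factor of 1, so only the non-zero entries of F matter, which keeps the computation cheap for sparse label matrices.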
The intuition behind our way of incorporating meta information is as follows. At the document level, if two documents have more labels in common, their Dirichlet parameters α will be more similar, resulting in more similar topic distributions θ. At the word level, if two words have similar features, their entries in β_k will be similar for each topic k, and we can then expect that their probabilities in φ_k will be more or less the same; the two words will therefore have similar probabilities of showing up in topic k. In other words, if a topic “prefers” a certain word, we expect that it will also prefer other words with features similar to that word's. Moreover, at both the document and the word level, different labels/features may have different impacts on the topics (via λ/δ), which are automatically learnt in MetaLDA.

IV. Inference
Unlike most existing methods, our way of incorporating the meta information facilitates the derivation of an efficient Gibbs sampling algorithm. With two data augmentation techniques (i.e., the introduction of auxiliary variables), MetaLDA admits local conjugacy, and a closed-form Gibbs sampling algorithm can be derived. Note that MetaLDA incorporates the meta information in the Dirichlet priors, so we can still use LDA's collapsed Gibbs sampling algorithm for the topic assignments Z. Moreover, Steps 2a and 1c show that one only needs to consider the non-zero entries of f_d and g_v in computing the full conditionals, which further reduces the inference complexity.
Similar to LDA, the complete model likelihood (i.e., joint distribution) of MetaLDA is:

P(W, Z, Θ, Φ | Λ, Δ) = ∏_{d=1}^{D} [ Dir(θ_d | α_d) ∏_{k=1}^{K} θ_{d,k}^{n_{d,k}} ] · ∏_{k=1}^{K} [ Dir(φ_k | β_k) ∏_{v=1}^{V} φ_{k,v}^{m_{k,v}} ]    (1)

where n_{d,k} = Σ_{n=1}^{N_d} 1(z_{d,n} = k), m_{k,v} = Σ_{d=1}^{D} Σ_{n=1}^{N_d} 1(z_{d,n} = k) 1(w_{d,n} = v), and 1(·) is the indicator function.
IV-A. Sampling λ:
To sample λ, we first marginalise out Θ in the right part of Eq. (1) with the Dirichlet-multinomial conjugacy:

P(Z | Λ) = ∏_{d=1}^{D} [ Γ(α_{d,·}) / Γ(α_{d,·} + N_d) ] · [ ∏_{k=1}^{K} Γ(α_{d,k} + n_{d,k}) / Γ(α_{d,k}) ]    (2)

where α_{d,·} = Σ_{k=1}^{K} α_{d,k}, Γ(·) is the gamma function, and we refer to the first bracketed term as Gamma ratio 1 and the second as Gamma ratio 2. Gamma ratio 1 in Eq. (2) can be augmented with a set of Beta random variables as:

Γ(α_{d,·}) / Γ(α_{d,·} + N_d) = (1 / Γ(N_d)) ∫_0^1 q_d^{α_{d,·} − 1} (1 − q_d)^{N_d − 1} dq_d    (3)

where for each document d, q_d ∼ Beta(α_{d,·}, N_d). Given a set of q_d for all the documents, Gamma ratio 1 can be approximated by the product of q_d^{α_{d,·}}, i.e., ∏_{d=1}^{D} q_d^{α_{d,·}}.
Gamma ratio 2 in Eq. (2) is the Pochhammer symbol for a rising factorial, which can be augmented with an auxiliary variable t_{d,k} [25, 26, 27, 28] as follows:

Γ(α_{d,k} + n_{d,k}) / Γ(α_{d,k}) = Σ_{t_{d,k}=0}^{n_{d,k}} S(n_{d,k}, t_{d,k}) · α_{d,k}^{t_{d,k}}    (4)

where S(n, t) indicates an unsigned Stirling number of the first kind. Since Gamma ratio 2 is a normalising constant for the probability of the number of tables in the Chinese Restaurant Process (CRP) [29], t_{d,k} can be sampled by a CRP with α_{d,k} as the concentration and n_{d,k} as the number of customers:

t_{d,k} = Σ_{i=1}^{n_{d,k}} Bern( α_{d,k} / (α_{d,k} + i − 1) )    (5)

where Bern(·) samples from the Bernoulli distribution. The complexity of sampling t_{d,k} by Eq. (5) is O(n_{d,k}). For large n_{d,k}, as the standard deviation of t_{d,k} is small relative to n_{d,k} [29], one can sample t_{d,k} in a small window around the current value with reduced complexity.

By ignoring the terms unrelated to α_{d,k}, the augmentation of Eq. (4) can be simplified to a single term α_{d,k}^{t_{d,k}}. With the auxiliary variables now introduced, we simplify Eq. (2) to:
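Eq. (5) is simple to implement: the table count is a sum of independent Bernoulli draws, one per customer. A minimal sketch with NumPy (the helper name is ours):

```python
import numpy as np

def sample_table_count(alpha, n, rng):
    """CRP table count for concentration alpha and n customers:
    customer i (1-based) opens a new table w.p. alpha / (alpha + i - 1),
    so the first customer always opens one."""
    if n == 0:
        return 0
    i = np.arange(n)                    # 0 .. n-1, so p_new[0] = 1 exactly
    p_new = alpha / (alpha + i)
    return int(rng.binomial(1, p_new).sum())

rng = np.random.default_rng(0)
t = sample_table_count(2.0, 100, rng)   # always in {1, ..., 100}
```

The cost is linear in n, matching the O(n_{d,k}) complexity noted above; the windowed variant instead restricts the update to a small interval around the current value.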
P(Z, q, t | Λ) ∝ ∏_{d=1}^{D} [ q_d^{α_{d,·}} ∏_{k=1}^{K} α_{d,k}^{t_{d,k}} ]    (6)
Replacing α_{d,k} with ∏_{l=1}^{L} λ_{l,k}^{f_{d,l}}, we can get:

P(Z, q, t | Λ) ∝ ∏_{d=1}^{D} [ exp( log q_d · Σ_{k=1}^{K} ∏_{l=1}^{L} λ_{l,k}^{f_{d,l}} ) · ∏_{k=1}^{K} ( ∏_{l=1}^{L} λ_{l,k}^{f_{d,l}} )^{t_{d,k}} ]

Recall that all the document labels are binary, so λ_{l,k} is involved in computing α_{d,k} iff f_{d,l} = 1. Extracting all the terms related to λ_{l,k} from the expression above, we get the marginal posterior of λ_{l,k}:

P(λ_{l,k} | Z, q, t) ∝ Ga(λ_{l,k}; μ, 1/μ) · λ_{l,k}^{Σ_{d: f_{d,l}=1} t_{d,k}} · exp( λ_{l,k} Σ_{d: f_{d,l}=1} α_{d,k}^{¬l} log q_d )

where α_{d,k}^{¬l} is the value of α_{d,k} with λ_{l,k} removed when f_{d,l} = 1, i.e., α_{d,k} = λ_{l,k} α_{d,k}^{¬l}. With the data augmentation techniques, the posterior is transformed into a form that is conjugate to the gamma prior of λ_{l,k}. Therefore, it is straightforward to yield the following sampling strategy for λ:
q_d ∼ Beta( α_{d,·}, N_d )    (7)

t_{d,k} ∼ CRP( α_{d,k}, n_{d,k} )  (i.e., by Eq. (5))    (8)

λ_{l,k} ∼ Ga( μ + Σ_{d: f_{d,l}=1} t_{d,k},  1 / ( μ − Σ_{d: f_{d,l}=1} α_{d,k}^{¬l} log q_d ) )    (9)
We can compute and cache the value of Σ_{d: f_{d,l}=1} α_{d,k}^{¬l} log q_d first. After λ_{l,k} is sampled, α_{d,k} can be updated by:

α_{d,k} ← α_{d,k} · λ*_{l,k} / λ_{l,k}    (10)

where λ*_{l,k} is the newly-sampled value of λ_{l,k}.
To sample/compute Eqs. (7)-(10), one only iterates over the documents where label l is active (i.e., f_{d,l} = 1). Thus, the sampling of all λ takes time proportional to K·L·D̄, where D̄ is the average number of documents in which a label is active (i.e., the column-wise sparsity of F). Usually D̄ ≪ D, because a label that exists in nearly all the documents provides little discriminative information. This demonstrates how the sparsity of the document meta information is leveraged. Moreover, sampling all the tables t takes O(N), where N is the total number of words in the corpus, and can be accelerated with the window sampling technique explained above.
IV-B. Sampling δ:
Since the derivation of sampling δ is analogous to that of λ, we directly give the sampling formulas:

δ_{f′,k} ∼ Ga( ν + Σ_{v: g_{v,f′}=1} t′_{k,v},  1 / ( ν − Σ_{v: g_{v,f′}=1} β_{k,v}^{¬f′} log q′_k ) )    (11)

β_{k,v}^{¬f′} = β_{k,v} / δ_{f′,k}  (for g_{v,f′} = 1)    (12)

β_{k,v} ← β_{k,v} · δ*_{f′,k} / δ_{f′,k}    (13)

where the two auxiliary variables can be sampled by: q′_k ∼ Beta(β_{k,·}, m_{k,·}) and t′_{k,v} ∼ CRP(β_{k,v}, m_{k,v}), with β_{k,·} = Σ_{v=1}^{V} β_{k,v} and m_{k,·} = Σ_{v=1}^{V} m_{k,v}, and δ*_{f′,k} is the newly-sampled value of δ_{f′,k}. Similarly, sampling all δ takes time proportional to K·L′·V̄, where V̄ is the average number of tokens for which a feature is active (i.e., the column-wise sparsity of G, and usually V̄ ≪ V), and sampling all the tables t′ takes O(N).
IV-C. Sampling topic z:
Given Λ and Δ, the collapsed Gibbs sampling of a new topic for word w_{d,n} = v in MetaLDA is:

P(z_{d,n} = k | Z^{¬dn}, W) ∝ ( α_{d,k} + n_{d,k}^{¬dn} ) · ( β_{k,v} + m_{k,v}^{¬dn} ) / ( β_{k,·} + m_{k,·}^{¬dn} )    (14)

where the superscript ¬dn indicates that the current token is excluded from the counts. This is exactly the same as in LDA.
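As an illustration of Eq. (14), the conditional for a single token can be assembled from count matrices that exclude that token; the names below are ours, not from the Mallet-based implementation:

```python
import numpy as np

def sample_topic(d, v, alpha, beta, n_dk, m_kv, rng):
    """Collapsed Gibbs draw for one token with word id v in document d.
    n_dk: (D, K) doc-topic counts; m_kv: (K, V) topic-word counts,
    both with the current token already decremented.
    alpha: (D, K) document-specific Dirichlet parameters; beta: (K, V)."""
    p = (alpha[d] + n_dk[d]) * (beta[:, v] + m_kv[:, v]) \
        / (beta.sum(axis=1) + m_kv.sum(axis=1))
    p /= p.sum()
    return rng.choice(len(p), p=p)
```

The only difference from LDA's sampler is that alpha and beta are document- and topic-specific rather than shared symmetric constants.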
V. Experiments
In this section, we evaluate the proposed MetaLDA against several recent advances that also incorporate meta information on 6 real datasets including both regular and short texts. The goal of the experimental work is to evaluate the effectiveness and efficiency of MetaLDA’s incorporation of document and word meta information both separately and jointly compared with other methods. We report the performance in terms of perplexity, topic coherence, and running time per iteration.
V-A. Datasets
In the experiments, three regular text datasets and three short text datasets were used:

Reuters, a widely used corpus extracted from the Reuters-21578 collection, with documents that have no labels removed. (MetaLDA is able to handle documents/words without labels/features; but for a fair comparison with the other models, we removed the documents without labels and the words without features.) There are 11,367 documents and 120 labels; each document is associated with multiple labels. The vocabulary size is 8,817 and the average document length is 73.

20NG, 20 Newsgroups, a widely used dataset consisting of 18,846 news articles in 20 categories. The vocabulary size is 22,636 and the average document length is 108.

NYT, New York Times, extracted from the documents in the category “Top/News/Health” in the New York Times Annotated Corpus (https://catalog.ldc.upenn.edu/ldc2008t19). There are 52,521 documents and 545 unique labels; each document is associated with multiple labels. The vocabulary contains 21,421 tokens and there are 442 words in a document on average.

WS, Web Snippet, used in [8], contains 12,237 web search snippets and each snippet belongs to one of 8 categories. The vocabulary contains 10,052 tokens and there are 15 words in one snippet on average.

TMN, Tag My News, used in [6], consists of 32,597 English RSS news snippets from Tag My News. With a title and a short description, each snippet belongs to one of 7 categories. There are 13,370 tokens in the vocabulary and the average length of a snippet is 18.

AN, ABC News, a collection of 12,495 short news descriptions, each associated with multiple of the 194 categories. There are 4,255 tokens in the vocabulary and the average length of a description is 13.
All the datasets were tokenised by Mallet (http://mallet.cs.umass.edu), and we removed the words that occur in fewer than 5 documents or in more than 95% of the documents.
V-B. Meta Information Settings
Document labels and word features. At the document level, the labels associated with the documents in each dataset were used as the meta information. At the word level, we used a set of 100-dimensional binarised word embeddings as word features, obtained from the 50-dimensional GloVe word embeddings pre-trained on Wikipedia (https://nlp.stanford.edu/projects/glove/). To binarise the word embeddings, we first adopted the following method, similar to [30]:
g′_{v,i} = +1 if e_{v,i} > ē⁺;  −1 if e_{v,i} < ē⁻;  0 otherwise    (15)

where e_v is the original embedding vector for word v, g′_{v,i} is the binarised value for element i of e_v, and ē⁺ and ē⁻ are the average values of all the positive elements and all the negative elements respectively. The insight is that we only consider features with strong opinions (i.e., large positive or negative values) on each dimension. To transform g′_v into the final binary vector g_v, we use two binary bits to encode each dimension of g′_v: the first bit is on iff g′_{v,i} = +1, and the second is on iff g′_{v,i} = −1. Besides embeddings, MetaLDA can work with other word features such as semantic similarity as well.
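A sketch of this binarisation for a single embedding vector (the helper name is ours; whether the means are taken per vector or over the whole vocabulary is an assumption here — we compute them per vector):

```python
import numpy as np

def binarise_embedding(e):
    """Map an embedding to 2*len(e) binary features: for each dimension,
    bit one fires if the value exceeds the mean of the positive entries,
    bit two fires if it is below the mean of the negative entries."""
    pos_mean = e[e > 0].mean() if (e > 0).any() else 0.0
    neg_mean = e[e < 0].mean() if (e < 0).any() else 0.0
    hi = (e > pos_mean).astype(int)   # strongly positive dimensions
    lo = (e < neg_mean).astype(int)   # strongly negative dimensions
    return np.concatenate([hi, lo])

g = binarise_embedding(np.array([0.9, 0.1, -0.8, -0.1]))
# pos_mean = 0.5, neg_mean = -0.45, so g = [1, 0, 0, 0, 0, 0, 1, 0]
```

Values near zero produce no active bits, matching the idea of keeping only dimensions with “strong opinions”.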
Default feature. Besides the labels/features associated with the datasets, a default label/feature that is always equal to 1 is introduced in MetaLDA for each document/word. The default can be interpreted as a bias term in α/β, capturing the information unrelated to the labels/features. When there are no document labels or word features, MetaLDA with only the defaults is equivalent in model to the asymmetric-asymmetric LDA of [24].
Table I: Variants of MetaLDA studied in the experiments.

Model            | α computed with        | β computed with
-----------------|------------------------|------------------------
MetaLDA          | Document labels        | Word features
MetaLDA-dl-def   | Document labels        | Default feature
MetaLDA-dl-0.01  | Document labels        | Symmetric 0.01 (fixed)
MetaLDA-def-wf   | Default label          | Word features
MetaLDA-0.1-wf   | Symmetric 0.1 (fixed)  | Word features
MetaLDA-def-def  | Default label          | Default feature
V-C. Compared Models and Parameter Settings
We evaluate the performance of the following models:

MetaLDA and its variants: the proposed model and its variants. Here we use MetaLDA to denote the model considering both document labels and word features. Several variants of MetaLDA with document labels or word features only were also studied, as shown in Table I. These variants differ in the method of computing α and β. All the models listed in Table I were implemented on top of Mallet. The hyperparameters μ and ν were set to .

LLDA, Labelled LDA [12], and PLLDA, Partially Labelled LDA [10]: two models that make use of multiple document labels. The original implementation (https://nlp.stanford.edu/software/tmt/tmt0.4/) was used.

DMR, LDA with Dirichlet Multinomial Regression [9]: a model that can use multiple document labels. The Mallet implementation of DMR, based on SparseLDA, was used. Following Mallet, we set the mean of the label parameters to 0.0, and the variances for the default label and the document labels to 100.0 and 1.0 respectively.
WF-LDA, Word Feature LDA [17]: a model with word features. We implemented it on top of Mallet and used Mallet's default settings for the optimisation.

LF-LDA, Latent Feature LDA [6]: a model that incorporates word embeddings. The original implementation (https://github.com/datquocnguyen/LFTM) was used. Following the paper, we used 1500 and 500 MCMC iterations for initialisation and sampling respectively, set the mixture weight λ to 0.6, and used the original 50-dimensional GloVe word embeddings as word features.

GPU-DMM, Generalized Pólya Urn DMM [8]: a model that incorporates word semantic similarity. The original implementation (https://github.com/NobodyWHU/GPUDMM) was used. The word similarity was generated from the distances between the word embeddings. Following the paper, we set the two hyperparameters to 0.1 and 0.7 respectively, and the symmetric document Dirichlet prior to .

PTM, the Pseudo-document-based Topic Model [19]: a model for short text analysis. The original implementation (http://ipv6.nlsde.buaa.edu.cn/zuoyuan/) was used. Following the paper, we set the number of pseudo-documents to 1000 and to 0.1.
For all the models, except where noted, the symmetric parameters of the document and the topic Dirichlet priors were set to 0.1 and 0.01 respectively, and 2000 MCMC iterations were used to train the models.
V-D. Perplexity Evaluation
Perplexity is a measure widely used [24] to evaluate the modelling accuracy of topic models; the lower the score, the higher the modelling accuracy. To compute perplexity, we randomly selected some documents of a dataset as the training set and the rest as the test set. We first trained a topic model on the training set to obtain the word distributions of the topics (Φ). Each test document was split into two halves containing every first and every second word respectively. We then fixed the topics and trained the models on the first half to get the topic proportions (θ_d) of each test document d, and computed perplexity for predicting the second half. In regard to MetaLDA, we fixed the matrices Λ and Δ output by the training procedure. On the first half of test document d, we computed the Dirichlet prior α_d with Λ and the labels of the test document (see Step 2a), and then point-estimated θ_d. We ran all the models 5 times with different random number seeds and report the average scores with standard deviations.
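The held-out computation can be sketched as follows, assuming point estimates of the topic proportions and topic-word distributions are already available (the names are illustrative):

```python
import numpy as np

def perplexity(docs, theta, phi):
    """Per-word perplexity exp(-(1/N) sum log p(w|d)) on held-out words,
    where p(w=v|d) = sum_k theta[d, k] * phi[k, v].
    docs: list of token-id lists (the second halves of the test docs);
    theta: (D, K) topic proportions; phi: (K, V) topic-word distributions."""
    log_lik, n_words = 0.0, 0
    for d, doc in enumerate(docs):
        p = theta[d] @ phi               # (V,) mixture word probabilities
        log_lik += np.log(p[doc]).sum()
        n_words += len(doc)
    return float(np.exp(-log_lik / n_words))
```

As a sanity check, a uniform model over a vocabulary of size V yields a perplexity of exactly V.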
In testing, we may encounter words that never occur in the training documents (a.k.a., unseen words or outofvocabulary words). There are two strategies for handling unseen words for calculating perplexity on test documents: ignoring them or keeping them in computing the perplexity. Here we investigate both strategies:
V-D1. Perplexity Computed without Unseen Words
In this experiment, the perplexity is computed only on the words that appear in the training vocabulary. Here we used 80% documents in each dataset as the training set and the remaining 20% as the test set.
Table II: Perplexity on the regular text datasets (mean±std over 5 runs).

Dataset         | Reuters                       | 20NG                                  | NYT
#Topics         | 50     100    150    200      | 50       100      150      200        | 200      500
LDA             | 677±1  634±2  629±1  631±1    | 2147±7   1930±7   1820±5   1762±3     | 2293±8   2154±4
MetaLDA-def-def | 648±3  592±2  559±1  540±1    | 2093±6   1843±7   1708±5   1626±4     | 2258±9   2079±8
DMR             | 640±1  577±1  544±2  526±2    | 2080±8   1811±8   1670±4   1578±1     | 2231±13  2013
MetaLDA-dl-0.01 | 649±2  582±2  551±3  530±2    | 2067±9   1821±7   1680±5   1590±1     | 2219±4   2018±4
MetaLDA-dl-def  | 642±3  576±3  543±1  526±1    | 2050±4   1804±6   1675±8   1589±2     | 2230±3   2022±5
LF-LDA          | 841±4  787±4  772±3  771±4    | 2855±21  2576±3   2433±7   2326±8     | 2831±2   2700±5
WF-LDA          | 659±2  616±2  615±1  613±1    | 2089±7   1875±2   1784±2   1727±3     | 2287±6   2134±6
MetaLDA-0.1-wf  | 659±3  621±1  619±1  623±1    | 2098±7   1887±8   1796±8   1744±4     | 2283±4   2143±2
MetaLDA-def-wf  | 643±2  582±4  552±3  535±1    | 2068±6   1819±1   1685±7   1600±3     | 2260±7   2095±6
MetaLDA         | 633±2  568±2  536±2  517±1    | 2025±12  1781±8   1640±5   1551±6     | 2217±6   2020±6

Dataset           | Reuters             | 20NG                    | NYT
#Topics per label | 5    10   20   50   | 5     10    20    50    | 2     5
PLLDA             | 714  708  733  829  | 1997  1786  1605  1482  | 2839  2846
LLDA              | 834                 | 2607                    | 2948
Table III: Perplexity on the short text datasets (mean±std over 5 runs).

Dataset         | WS                               | TMN                                       | AN
#Topics         | 50      100      150      200    | 50        100       150       200         | 50       100
LDA             | 961±6   878±8    869±6    888±5  | 1969±14   1873±6    1881±9    1916±4      | 406±14   422±12
MetaLDA-def-def | 884±10  733±6    671±6    625±6  | 1800±11   1578±19   1469±4    1422±6      | 352±16   336±11
DMR             | 845±7   683±4    607±1    562±2  | 1750±8    1506±3    1391±7    1323±5      | 326±6    290±5
MetaLDA-dl-0.01 | 840±7   693±6    618±3    588±4  | 1767±11   1528±10   1416±7    1345±13     | 321±13   303±8
MetaLDA-dl-def  | 832±4   679±5    622±7    582±5  | 1720±7    1505±16   1395±11   1325±12     | 319±9    293±7
LF-LDA          | 1164±6  1039±17  1019±11  992±6  | 2415±35   2393±11   2371±10   2374±14     | 482±17   514±19
WF-LDA          | 894±6   839±6    827±10   842±4  | 1853±6    1766±12   1830±60   1854±45     | 397±5    410±6
MetaLDA-0.1-wf  | 889±6   832±3    839±2    853±4  | 1865±4    1784±2    1799±9    1831±6      | 388±3    410±8
MetaLDA-def-wf  | 830±6   688±8    624±5    584±4  | 1730±14   1504±3    1402±13   1342±4      | 346±15   332±8
MetaLDA         | 774±9   627±6    572±3    534±4  | 1657±4    1415±16   1304±6    1235±6      | 314±9    293±9

Dataset           | WS                   | TMN                     | AN
#Topics per label | 5     10   20   50   | 5     10    20    50    | 5    10
PLLDA             | 1060  886  735  642  | 2181  1863  1647  1456  | 440  525
LLDA              | 1543                 | 2958                    | 392
Tables II and III show the average perplexity scores with standard deviations for all the models. (For GPU-DMM and PTM, perplexity is not evaluated because the inference code for unseen documents is not publicly available. The random number seeds in the LLDA and PLLDA package are prefixed, so the standard deviations of these two models are not reported.) Note that: (1) The scores on AN with 150 and 200 topics are not reported due to overfitting observed in all the compared models. (2) Given the size of NYT, the scores for 200 and 500 topics are reported. (3) The number of latent topics in LLDA must equal the number of document labels. (4) For PLLDA, we varied the number of topics per label from 5 to 50 (2 and 5 topics on NYT). The number of topics in PLLDA is the product of the number of labels and the number of topics per label.
The results show that MetaLDA outperformed all the competitors in terms of perplexity on nearly all the datasets, showing the benefit of using both document and word meta information. Specifically, we have the following remarks:

By looking at the models using only the document-level meta information, we can see a significant improvement of these models over LDA, which indicates that document labels can play an important role in guiding topic modelling. Although the performance of the two variants of MetaLDA with document labels is comparable to DMR's, our models run much faster than DMR, as studied later in Section V-F.

It is interesting that PLLDA with 50 topics per label achieves better perplexity than MetaLDA with 200 topics on the 20NG dataset. With the 20 unique labels, the actual number of topics in PLLDA is 1000. However, with 10 topics per label in PLLDA, which is equivalent to 200 topics in MetaLDA, PLLDA is outperformed by MetaLDA significantly.

At the word level, MetaLDA-def-wf performed best among the models with word features only. Moreover, our model has a clear advantage in running speed (see Table V). Furthermore, comparing MetaLDA-def-wf with MetaLDA-def-def, and MetaLDA-0.1-wf with LDA, we can see that using the word features indeed improved perplexity.

The scores show that the improvement gained by MetaLDA over LDA on the short text datasets is larger than that on the regular text datasets. This is as expected because meta information serves as complementary information in MetaLDA and can have more significant impact when the data is sparser.

It can be observed that the models usually gained improved perplexity when the document Dirichlet parameter α is sampled/optimised rather than fixed, in line with [24].

On the AN dataset, there is no statistically significant difference between MetaLDA and DMR. On NYT, a similar trend is observed: the improvement of the models with the document labels over LDA is obvious, but not that of the models with the word features. Given the numbers of document labels (194 for AN and 545 for NYT), it is possible that the document labels already offer enough information, so the word embeddings contribute little on these two datasets.
V-D2. Perplexity Computed with Unseen Words
To test the hypothesis that the incorporation of meta information in MetaLDA can significantly improve the modelling accuracy when the corpus is sparse, we varied the proportion of documents used in training from 20% to 80% and used the remainder for testing. Naturally, when the proportion is small, the number of unseen words in the testing documents will be large. Instead of simply excluding the unseen words as in the previous experiments, here we compute the perplexity with unseen words for LDA, DMR, WF-LDA and the proposed MetaLDA. For the perplexity calculation, φ_{k,v} is needed for each topic k and each token v in the test documents. If v occurs in the training documents, φ_{k,v} can be obtained directly, while if v is unseen, φ_{k,v} can be estimated by the prior: φ_{k,v} ≈ β_{k,v} / β_{k,·}. For LDA and DMR, which do not use word features, β_{k,v} is the constant symmetric prior; for WF-LDA and MetaLDA, which use word features, β_{k,v} is computed with the features of the unseen token. Following Step 1c, for MetaLDA, β_{k,v} = ∏_{f′=1}^{L′} δ_{f′,k}^{g_{v,f′}}.
Figure 2 shows the perplexity scores on Reuters, 20NG, TMN and WS with 200, 200, 100 and 50 topics respectively. MetaLDA significantly outperformed the other models when the proportion of training documents was low and the proportion of unseen words relatively high. The gap between MetaLDA and the other three models widens as the training proportion decreases, indicating that the meta information helps MetaLDA achieve better modelling accuracy in predicting unseen words.
V-E. Topic Coherence Evaluation
We further evaluate the semantic coherence of the words in the topics learnt by LDA, PTM, DMR, LF-LDA, WF-LDA, GPU-DMM and MetaLDA. Here we use the Normalised Pointwise Mutual Information (NPMI) [32, 33] to calculate the topic coherence score for topic k with its T top words: NPMI(k) = Σ_{j=2}^{T} Σ_{i=1}^{j−1} [ log( P(w_j, w_i) / (P(w_i) P(w_j)) ) / ( −log P(w_j, w_i) ) ], where P(w_i) is the probability of word w_i, and P(w_i, w_j) is the joint probability of words w_i and w_j co-occurring within a sliding window. These probabilities were computed on an external large corpus, i.e., a 5.48 GB Wikipedia dump in our experiments. The NPMI score of each topic was calculated with the top 10 words (T = 10) by the Palmetto package (http://palmetto.aksw.org). Again, we report the average scores and standard deviations over 5 random runs.
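The pairwise score and its aggregation over a topic's top words can be sketched as follows (the helper names are ours; Palmetto additionally handles the window-based counting on the reference corpus):

```python
import math

def npmi(p_i, p_j, p_ij, eps=1e-12):
    """Normalised PMI: log(p_ij / (p_i * p_j)) / (-log p_ij), in [-1, 1]."""
    p_ij = max(p_ij, eps)               # guard against zero co-occurrence
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)

def topic_npmi(top_words, p, p_joint):
    """Average NPMI over all pairs of a topic's top words.
    p: word -> marginal probability; p_joint: frozenset pair -> probability."""
    pairs = [(wi, wj) for i, wi in enumerate(top_words)
             for wj in top_words[i + 1:]]
    return sum(npmi(p[wi], p[wj], p_joint.get(frozenset((wi, wj)), 0.0))
               for wi, wj in pairs) / len(pairs)
```

Independent word pairs score 0, perfectly co-occurring pairs score 1, and pairs that never co-occur approach −1, so higher averages indicate more coherent topics.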
Table IV. NPMI topic coherence scores (mean±std over 5 random runs), averaged over all 100 topics and over the top 20 topics.

| Model   | All 100 topics: WS | TMN           | AN            | Top 20 topics: WS | TMN           | AN            |
|---------|--------------------|---------------|---------------|-------------------|---------------|---------------|
| LDA     | 0.0030±0.0047      | 0.0319±0.0032 | 0.0636±0.0033 | 0.1025±0.0067     | 0.1370±0.0043 | 0.0010±0.0052 |
| PTM     | 0.0029±0.0048      | 0.0355±0.0016 | 0.0640±0.0037 | 0.1033±0.0081     | 0.1527±0.0052 | 0.0004±0.0037 |
| DMR     | 0.0091±0.0046      | 0.0396±0.0044 | 0.0457±0.0024 | 0.1296±0.0085     | 0.1472±0.1507 | 0.0276±0.0101 |
| LFLDA   | 0.0130±0.0052      | 0.0397±0.0026 | 0.0523±0.0023 | 0.1230±0.0153     | 0.1456±0.0087 | 0.0272±0.0042 |
| WFLDA   | 0.0091±0.0046      | 0.0390±0.0051 | 0.0457±0.0024 | 0.1296±0.0085     | 0.1507±0.0055 | 0.0276±0.0101 |
| GPUDMM  | 0.0934±0.0106      | 0.0970±0.0034 | 0.0769±0.0012 | 0.0836±0.0105     | 0.0968±0.0076 | 0.0613±0.0020 |
| MetaLDA | 0.0311±0.0038      | 0.0451±0.0034 | 0.0326±0.0019 | 0.1511±0.0093     | 0.1584±0.0072 | 0.0590±0.0065 |
Table V. Per-iteration running time of the models.

| Model         | Reuters: 50 | 100    | 150    | 200     | WS: 50 | 100    | 150    | 200     | NYT: 200 | 500      |
|---------------|-------------|--------|--------|---------|--------|--------|--------|---------|----------|----------|
| LDA           | 0.0899      | 0.1023 | 0.1172 | 0.1156  | 0.0219 | 0.0283 | 0.0301 | 0.0351  | 0.7509   | 1.1400   |
| PTM           | 4.9232      | 5.8885 | 7.2226 | 7.7670  | 1.1840 | 1.6375 | 1.8288 | 2.0030  | –        | –        |
| DMR           | 0.6112      | 0.9237 | 1.2638 | 1.6066  | 0.4603 | 0.8549 | 1.2521 | 1.7173  | 13.7546  | 31.9571  |
| MetaLDAdf0.01 | 0.1187      | 0.1387 | 0.1646 | 0.1868  | 0.0396 | 0.0587 | 0.0769 | 0.1121  | 2.4679   | 4.9928   |
| LFLDA         | 2.6895      | 5.3043 | 8.3429 | 11.4419 | 2.4920 | 6.0266 | 9.1245 | 11.5983 | 95.5295  | 328.0862 |
| WFLDA         | 1.0495      | 1.6025 | 3.0304 | 4.8783  | 1.8162 | 3.7802 | 6.1863 | 8.6599  | 14.0538  | 31.4438  |
| GPUDMM        | 0.4193      | 0.7190 | 1.0421 | 1.3229  | 0.1206 | 0.1855 | 0.2487 | 0.3118  | –        | –        |
| MetaLDA0.1wf  | 0.2427      | 0.4274 | 0.6566 | 0.9683  | 0.1083 | 0.1811 | 0.2644 | 0.3579  | 4.6205   | 12.4177  |
| MetaLDA       | 0.2833      | 0.5447 | 0.7222 | 1.0615  | 0.1232 | 0.2040 | 0.3282 | 0.4167  | 6.4644   | 16.9735  |
It is known that conventional topic models applied directly to short texts suffer from low-quality topics, caused by insufficient word co-occurrence information. Here we study whether the meta information helps MetaLDA improve topic quality, compared with other topic models that can also handle short texts. Table IV shows the NPMI scores on the three short-text datasets; higher scores indicate better topic coherence. All the models were trained with 100 topics. Besides the NPMI scores averaged over all 100 topics, we also show the scores averaged over the top 20 topics with the highest NPMI, which eliminates "rubbish" topics, following [23]. MetaLDA performed significantly better than all the other models on the WS and AN datasets in terms of NPMI, which indicates that MetaLDA can discover more meaningful topics with the document and word meta information. We note that on the TMN dataset, even though the average score of MetaLDA is still the best, its score overlaps with the others' within one standard deviation, so the difference is not statistically significant.
V-F Running Time
In this section, we empirically study the efficiency of the models in terms of per-iteration running time. The implementation details of our MetaLDA are as follows: (1) The SparseLDA framework [31] reduces the complexity of LDA to be sub-linear by breaking the conditional of LDA into three "buckets", where the "smoothing only" bucket is cached for all the documents and the "document only" bucket is cached for all the tokens in a document. We adopted a similar strategy when implementing MetaLDA. When only the document meta information is used, the Dirichlet parameters for different documents in MetaLDA are different and asymmetric. Therefore, the "smoothing only" bucket has to be computed for each document, but we can cache it for all the tokens in that document, which still gives a considerable reduction in computational complexity. However, when the word meta information is used, the SparseLDA framework no longer applies to MetaLDA, as the parameters for each topic and each token are different. (2) By adapting the DistributedLDA framework [34], our MetaLDA implementation runs in parallel with multiple threads, which enables MetaLDA to handle larger document collections. The parallel implementation was used on the NYT dataset.
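For illustration, a simplified single-token version of the three-bucket decomposition is sketched below in Python (our implementation is in Java; a real SparseLDA-style sampler also caches the smoothing and document buckets instead of recomputing them per token, which is where the speed-up comes from):

```python
import random

def sample_topic(doc_topic, word_topic, topic_sum, alpha, beta, V):
    """Draw a topic for one token using the three-bucket decomposition:
    p(z=k) ~ (alpha[k] + n_dk) * (beta + n_kw) / (V*beta + n_k)
           = alpha[k]*beta*c_k          ("smoothing only")
           + n_dk*beta*c_k              ("document only")
           + (alpha[k] + n_dk)*n_kw*c_k ("topic word"),
    with c_k = 1 / (V*beta + n_k).

    doc_topic:  sparse dict {k: n_dk} for the current document
    word_topic: sparse dict {k: n_kw} for the current token
    topic_sum:  list of n_k over all K topics
    """
    K = len(topic_sum)
    coef = [1.0 / (V * beta + topic_sum[k]) for k in range(K)]
    s = sum(alpha[k] * beta * coef[k] for k in range(K))
    r = sum(n * beta * coef[k] for k, n in doc_topic.items())
    q = {k: (alpha[k] + doc_topic.get(k, 0)) * n * coef[k]
         for k, n in word_topic.items()}
    u = random.random() * (s + r + sum(q.values()))
    if u < s:                              # smoothing bucket: dense scan
        for k in range(K):
            u -= alpha[k] * beta * coef[k]
            if u <= 0:
                return k
    u -= s
    if u < r:                              # document bucket: only topics in this doc
        for k, n in doc_topic.items():
            u -= n * beta * coef[k]
            if u <= 0:
                return k
    u -= r
    for k, mass in q.items():              # topic-word bucket: only topics with this word
        u -= mass
        if u <= 0:
            return k
    return K - 1                           # floating-point fallback
```

Because the document and topic-word buckets iterate only over non-zero counts, the per-token cost is far below O(K) on sparse data; the catch noted above is that with word-level features the prior varies per (topic, token) pair, so this factorisation no longer holds.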
The per-iteration running time of all the models is shown in Table V. Note that: (1) On the Reuters and WS datasets, all the models ran with a single thread on a desktop PC with a 3.40GHz CPU and 16GB RAM. (2) Due to the size of NYT, we report the running time only for the models that are able to run in parallel; all the parallelised models ran with 10 threads on a cluster with a 14-core 2.6GHz CPU and 128GB RAM. (3) All the models were implemented in Java. (4) As the models with meta information add extra complexity to LDA, the per-iteration running time of LDA can be treated as the lower bound.
At the document level, both MetaLDAdf0.01 and DMR use priors to incorporate the document meta information, and both were implemented in the SparseLDA framework. However, our variant is about 6 to 8 times faster than DMR on the Reuters dataset and more than 10 times faster on the WS dataset. Moreover, the larger the number of topics, the greater the speed-up of our variant over DMR. At the word level, similar patterns can be observed: our MetaLDA0.1wf ran significantly faster than WFLDA and LFLDA, especially when more topics are used (20 to 30 times faster on WS). It is not surprising that GPUDMM has running speed comparable to our variant, because GPUDMM allows only one topic for each document. With both document and word meta information, MetaLDA still ran several times faster than DMR, LFLDA, and WFLDA. On NYT with the parallel settings, MetaLDA maintains its efficiency advantage as well.
VI Conclusion
In this paper, we have presented a topic modelling framework named MetaLDA that can efficiently incorporate both document and word meta information, yielding significant improvements over comparable models in terms of perplexity and topic quality. With two data augmentation techniques, MetaLDA enjoys full local conjugacy, allowing efficient Gibbs sampling, as demonstrated by its superior per-iteration running time. Furthermore, without loss of generality, MetaLDA works with both regular texts and short texts. The improvement of MetaLDA over other models that also use meta information is most remarkable when the word co-occurrence information is insufficient. As MetaLDA incorporates meta information through the Dirichlet priors, it is possible to apply the same approach to other Bayesian probabilistic models where Dirichlet priors are used. Moreover, it would be interesting to extend our method to use real-valued meta information directly, which we leave for future work.
Acknowledgement
Lan Du was partially supported by Chinese NSFC project under grant number 61402312. Gang Liu was partially supported by Chinese Post-Doc Fund under grant number LBHQ15031.
References
 [1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” JMLR, pp. 993–1022, 2003.
 [2] G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, pp. 39–41, 1995.
 [3] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in EMNLP, 2014, pp. 1532–1543.

 [4] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in NIPS, 2013, pp. 3111–3119.
 [5] R. Das, M. Zaheer, and C. Dyer, “Gaussian LDA for topic models with word embeddings,” in ACL, 2015, pp. 795–804.
 [6] D. Q. Nguyen, R. Billingsley, L. Du, and M. Johnson, “Improving topic models with latent feature word representations,” TACL, pp. 299–313, 2015.
 [7] G. Xun, V. Gopalakrishnan, F. Ma, Y. Li, J. Gao, and A. Zhang, “Topic discovery for short texts using word embeddings,” in ICDM, 2016, pp. 1299–1304.
 [8] C. Li, H. Wang, Z. Zhang, A. Sun, and Z. Ma, “Topic modeling for short texts with auxiliary word embeddings,” in SIGIR, 2016, pp. 165–174.
 [9] D. Mimno and A. McCallum, “Topic models conditioned on arbitrary features with Dirichlet-multinomial regression,” in UAI, 2008, pp. 411–418.
 [10] D. Ramage, C. D. Manning, and S. Dumais, “Partially labeled topic models for interpretable text mining,” in SIGKDD, 2011, pp. 457–465.
 [11] J. D. Mcauliffe and D. M. Blei, “Supervised topic models,” in NIPS, 2008, pp. 121–128.
 [12] D. Ramage, D. Hall, R. Nallapati, and C. D. Manning, “Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora,” in EMNLP, 2009, pp. 248–256.
 [13] D. Kim and A. Oh, “Hierarchical Dirichlet scaling process,” Machine Learning, pp. 387–418, 2017.
 [14] C. Hu, P. Rai, and L. Carin, “Non-negative matrix factorization for discrete data with hierarchical side-information,” in AISTATS, 2016, pp. 1124–1132.
 [15] D. Andrzejewski, X. Zhu, and M. Craven, “Incorporating domain knowledge into topic modeling via Dirichlet forest priors,” in ICML, 2009, pp. 25–32.
 [16] P. Xie, D. Yang, and E. Xing, “Incorporating word correlation knowledge into topic modeling,” in NAACL, 2015, pp. 725–734.
 [17] J. Petterson, W. Buntine, S. M. Narayanamurthy, T. S. Caetano, and A. J. Smola, “Word features for Latent Dirichlet Allocation,” in NIPS, 2010, pp. 1921–1929.
 [18] L. Hong and B. D. Davison, “Empirical study of topic modeling in twitter,” in Workshop on social media analytics, 2010, pp. 80–88.
 [19] Y. Zuo, J. Wu, H. Zhang, H. Lin, F. Wang, K. Xu, and H. Xiong, “Topic modeling of short texts: A pseudo-document view,” in SIGKDD, 2016, pp. 2105–2114.
 [20] J. Yin and J. Wang, “A Dirichlet multinomial mixture model-based approach for short text clustering,” in SIGKDD, 2014, pp. 233–242.
 [21] R. Mehrotra, S. Sanner, W. Buntine, and L. Xie, “Improving LDA topic models for microblogs via tweet pooling and automatic labeling,” in SIGIR, 2013, pp. 889–892.
 [22] D. Andrzejewski, X. Zhu, M. Craven, and B. Recht, “A framework for incorporating general domain knowledge into Latent Dirichlet Allocation using first-order logic,” in IJCAI, 2011, pp. 1171–1177.
 [23] Y. Yang, D. Downey, and J. BoydGraber, “Efficient methods for incorporating knowledge into topic models,” in EMNLP, 2015, pp. 308–317.
 [24] H. M. Wallach, D. M. Mimno, and A. McCallum, “Rethinking LDA: Why priors matter,” in NIPS, 2009, pp. 1973–1981.
 [25] C. Chen, L. Du, and W. Buntine, “Sampling table configurations for the hierarchical Poisson-Dirichlet process,” in ECML, 2011, pp. 296–311.
 [26] Y. Teh, M. Jordan, M. Beal, and D. Blei, “Hierarchical Dirichlet processes,” Journal of the American Statistical Association, pp. 1566–1581, 2006.
 [27] M. Zhou and L. Carin, “Negative binomial process count and mixture modeling,” TPAMI, pp. 307–320, 2015.
 [28] H. Zhao, L. Du, and W. Buntine, “Leveraging node attributes for incomplete relational data,” in ICML, 2017, pp. 4072–4081.
 [29] W. Buntine and M. Hutter, “A Bayesian view of the Poisson-Dirichlet process,” arXiv preprint arXiv:1007.0296v2 [math.ST], 2012.

 [30] J. Guo, W. Che, H. Wang, and T. Liu, “Revisiting embedding features for simple semi-supervised learning,” in EMNLP, 2014, pp. 110–120.
 [31] L. Yao, D. Mimno, and A. McCallum, “Efficient methods for topic model inference on streaming document collections,” in SIGKDD, 2009, pp. 937–946.
 [32] N. Aletras and M. Stevenson, “Evaluating topic coherence using distributional semantics,” in International Conference on Computational Semantics, 2013, pp. 13–22.
 [33] J. H. Lau, D. Newman, and T. Baldwin, “Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality,” in EACL, 2014, pp. 530–539.
 [34] D. Newman, A. Asuncion, P. Smyth, and M. Welling, “Distributed algorithms for topic models,” JMLR, pp. 1801–1828, 2009.