Source-LDA: Enhancing probabilistic topic models using prior knowledge sources

06/02/2016
by   Justin Wood, et al.
0

A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these sets of words with a single n-gram. Such labels are useful for topic identification in summarization systems. This paper introduces a novel approach to labeling a group of n-grams comprising an individual topic. The approach taken is to complement the existing topic distributions over words with a known distribution based on a predefined set of topics. This is done by integrating existing labeled knowledge sources representing known potential topics into the probabilistic topic model. These knowledge sources are translated into a distribution and used to set the hyperparameters of the Dirichlet generated distribution over words. In the inference these modified distributions guide the convergence of the latent topics to conform with the complementary distributions. This approach ensures that the topic inference process is consistent with existing knowledge. The label assignment from the complementary knowledge sources are then transferred to the latent topics of the corpus. The results show both accurate label assignment to topics as well as improved topic generation than those obtained using various labeling approaches based off Latent Dirichlet allocation (LDA).

READ FULL TEXT
research
07/24/2015

The Polylingual Labeled Topic Model

In this paper, we present the Polylingual Labeled Topic Model, a model w...
research
05/17/2021

TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data

In this paper, we introduce a novel approach named TopicsRanksDC for top...
research
04/26/2016

Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA

In order to create a corpus exploration method providing topics that are...
research
10/26/2014

A provable SVD-based algorithm for learning topics in dominant admixture corpus

Topic models, such as Latent Dirichlet Allocation (LDA), posit that docu...
research
09/27/2018

Semantic Topic Analysis of Traffic Camera Images

Traffic cameras are commonly deployed monitoring components in road infr...
research
02/27/2018

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

We describe an algorithm for automatic classification of idiomatic and l...
research
01/12/2017

Prior matters: simple and general methods for evaluating and improving topic quality in topic modeling

Latent Dirichlet Allocation (LDA) models trained without stopword remova...

Please sign up or login with your details

Forgot password? Click here to reset