Topic representation: finding more representative words in topic models

10/23/2018
by Jinjin Chi, et al.

The top word list, i.e., the top-M words with the highest marginal probability in a given topic, is the standard topic representation in topic models. Most recent automatic topic labeling algorithms and popular topic quality metrics are based on it. However, we find empirically that the words in this type of top word list are not always representative. The objective of this paper is to find more representative top word lists for topics. To achieve this, we rerank the words in a given topic by further considering the marginal probability of each word over every other topic. The reranked list of top-M words serves as a novel topic representation for topic models. We investigate three reranking methodologies, using (1) standard deviation weight, (2) standard deviation weight with topic size, and (3) chi-square (χ²) statistic selection. Experimental results on real-world collections indicate that our representations can extract more representative words for topics, agreeing with human judgements.
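
To make the reranking idea concrete, here is a minimal Python sketch. The abstract does not give the weighting formulas, so the score used below (a word's probability in the topic multiplied by the population standard deviation of its probabilities across all topics) is only an assumed stand-in for the "standard deviation weight", not the paper's definition. The function names, the toy vocabulary, and the toy topic-word matrix `phi` are likewise hypothetical.

```python
from statistics import pstdev


def top_words_by_probability(phi, vocab, topic, m):
    """Standard representation: the m words with the highest probability in `topic`."""
    order = sorted(range(len(vocab)), key=lambda w: phi[topic][w], reverse=True)
    return [vocab[w] for w in order[:m]]


def rerank_by_std_weight(phi, vocab, topic, m):
    """Rerank words using an ASSUMED standard-deviation-style weight.

    score(w) = phi[topic][w] * pstdev of phi[k][w] over all topics k.
    This form is an illustration only; the abstract does not state the formula.
    """
    def score(w):
        spread = pstdev([row[w] for row in phi])  # how unevenly w is spread over topics
        return phi[topic][w] * spread

    order = sorted(range(len(vocab)), key=score, reverse=True)
    return [vocab[w] for w in order[:m]]


# Hypothetical toy data: 3 topics over 6 words; each row sums to 1.
vocab = ["the", "model", "gene", "dna", "market", "stock"]
phi = [
    [0.40, 0.10, 0.30, 0.20, 0.00, 0.00],  # topic 0
    [0.40, 0.10, 0.00, 0.00, 0.30, 0.20],  # topic 1
    [0.40, 0.30, 0.05, 0.05, 0.10, 0.10],  # topic 2
]

print(top_words_by_probability(phi, vocab, topic=0, m=2))
print(rerank_by_std_weight(phi, vocab, topic=0, m=2))
```

In this toy example, "the" is equally probable in every topic, so its spread is zero and it drops out of the reranked list for topic 0, while "gene" and "dna" move to the top. A real topic model would supply `phi` from its estimated topic-word distributions.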


research
08/19/2020

Top2Vec: Distributed Representations of Topics

Topic modeling is used for discovering latent semantic structure, usuall...
research
04/07/2017

Conceptualization Topic Modeling

Recently, topic modeling has been widely used to discover the abstract t...
research
12/16/2019

Optimized Tracking of Topic Evolution

Topic evolution modeling has been researched for a long time and has gai...
research
06/25/2016

Finding the Topic of a Set of Images

In this paper we introduce the problem of determining the topic that a s...
research
09/25/2019

PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space

Finding the right reviewers to assess the quality of conference submissi...
research
06/07/2011

Exploring Network Economics

In this paper, we explore what network economics is all about, focusing ...
research
02/27/2018

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

We describe an algorithm for automatic classification of idiomatic and l...
