Less is More: Learning Prominent and Diverse Topics for Data Summarization

11/29/2016
by Jian Tang, et al.

Statistical topic models facilitate the efficient exploration of large-scale data sets. Many models have been developed and broadly used to summarize the semantic structure of news, science, social media, and digital humanities. However, a common and practical objective in data exploration tasks is not to enumerate all existing topics, but to quickly extract representative ones that broadly cover the content of the corpus, i.e., a few topics that serve as a good summary of the data. Most existing topic models fit exactly as many topics as the user specifies, which imposes an unnecessary burden on users who have limited prior knowledge of the data. We instead propose new models that learn fewer but more representative topics for the purpose of data summarization. We propose a reinforced random walk that allows prominent topics to absorb tokens from similar, smaller topics, thereby enhancing the diversity among the top topics extracted. With this reinforced random walk embedded as a general process in classical topic models, we obtain diverse topic models that extract the most prominent and diverse topics from data. The inference procedures of these diverse topic models remain as simple and efficient as those of the classical models. Experimental results demonstrate that the diverse topic models not only discover topics that better summarize the data, but also require minimal prior knowledge from users.
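The abstract does not spell out the reinforced random walk, so the following is only a toy illustration of the "prominent topics absorb tokens from similar and smaller topics" idea, not the model from the paper. It assumes a fixed, hypothetical topic-similarity matrix and a hypothetical transition rule in which a token moves to a topic with probability proportional to that topic's current size times its similarity to the token's current topic. The function name `reinforced_walk` and all parameters are invented for this sketch.

```python
import random

def reinforced_walk(sizes, similarity, steps, seed=0):
    """Toy size-reinforced token reallocation among topics.

    sizes:      initial token count per topic (assumed to sum to > 0)
    similarity: similarity[i][j] in [0, 1], a hypothetical fixed matrix
    steps:      number of single-token moves to simulate
    """
    rng = random.Random(seed)
    sizes = list(sizes)
    k = len(sizes)
    for _ in range(steps):
        # Pick one token uniformly at random, i.e. a source topic
        # with probability proportional to its current size.
        i = rng.choices(range(k), weights=sizes)[0]
        # Assumed rule: the token stays with weight sizes[i], or moves
        # to topic j with weight similarity[i][j] * sizes[j].
        weights = [
            sizes[j] * (1.0 if j == i else similarity[i][j])
            for j in range(k)
        ]
        j = rng.choices(range(k), weights=weights)[0]
        sizes[i] -= 1
        sizes[j] += 1
    return sizes

# Topics 0 and 1 are similar; topic 2 is unrelated to both.
sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(reinforced_walk([50, 10, 30], sim, steps=500, seed=1))
```

Under this assumed rule, the larger of two similar topics tends to gain tokens from the smaller one over time, while a dissimilar topic keeps its tokens, which is the qualitative behavior the abstract describes. In the paper the process is embedded in topic-model inference; this sketch tracks only topic sizes.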

research
05/28/2021

A Query-Driven Topic Model

Topic modeling is an unsupervised method for revealing the hidden semant...
research
11/01/2018

ATM:Adversarial-neural Topic Model

Topic models are widely used for thematic structure discovery in text. B...
research
04/08/2023

Effects of Algorithmic Trend Promotion: Evidence from Coordinated Campaigns in Twitter's Trending Topics

In addition to more personalized content feeds, some leading social medi...
research
02/25/2023

Topic-Selective Graph Network for Topic-Focused Summarization

Due to the success of the pre-trained language model (PLM), existing PLM...
research
09/20/2022

Knowledge-Aware Bayesian Deep Topic Model

We propose a Bayesian generative model for incorporating prior domain kn...
research
09/12/2018

Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes Processes

Social media conversations unfold based on complex interactions between ...
research
09/11/2017

Research Portfolio Analysis and Topic Prominence

Stakeholders in the science system need to decide where to place their b...
