Fast, Flexible Models for Discovering Topic Correlation across Weakly-Related Collections

08/19/2015
by Jingwei Zhang, et al.

Weak topic correlation across document collections, combined with differing numbers of topics in the individual collections, presents challenges for existing cross-collection topic models. This paper introduces two probabilistic topic models, Correlated LDA (C-LDA) and Correlated HDP (C-HDP). They address problems that can arise when analyzing large, asymmetric, and potentially weakly-related collections. Topic correlations in weakly-related collections typically lie in the tail of the topic distribution, where models unable to fit large numbers of topics would overlook them. To model this long tail efficiently for large-scale analysis, our models implement a parallel sampling algorithm based on the Metropolis-Hastings and alias methods (Yuan et al., 2015). The models are first evaluated on synthetic data generated to simulate various collection-level asymmetries. We then present a case study modeling over 300k documents in collections of science and humanities research from JSTOR.
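The abstract credits the sampler's efficiency to combining Metropolis-Hastings with alias tables. As a rough illustration of that general idea (not the authors' implementation, and not the specific proposal scheme of Yuan et al., 2015), the sketch below builds a Walker/Vose alias table in O(K) time, draws from it in O(1), and uses it as a possibly stale proposal inside an independence Metropolis-Hastings step that corrects toward the current target distribution. The function names (`build_alias_table`, `alias_sample`, `mh_step`) and the toy four-topic distributions are hypothetical choices for this example.

```python
import random


def build_alias_table(weights):
    """Vose's alias method: O(K) construction for K outcomes with positive weights."""
    k = len(weights)
    total = float(sum(weights))
    scaled = [w * k / total for w in weights]  # mean of scaled values is 1
    prob = [0.0] * k
    alias = list(range(k))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]            # keep column s with probability scaled[s]
        alias[s] = l                   # otherwise fall through to outcome l
        scaled[l] -= 1.0 - scaled[s]   # l donated mass to fill column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:            # leftovers are 1.0 up to rounding error
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob, alias, rng):
    """O(1) draw: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def mh_step(current, target, proposal_weights, prob, alias, rng):
    """One independence Metropolis-Hastings step.

    Candidates come from an alias table built on proposal_weights, which may
    be stale; the acceptance ratio corrects the chain toward `target`.
    """
    cand = alias_sample(prob, alias, rng)
    ratio = (target[cand] * proposal_weights[current]) / (
        target[current] * proposal_weights[cand])
    return cand if rng.random() < min(1.0, ratio) else current


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, 0.3, 0.15, 0.05]   # toy "current" topic conditional
    stale = [0.25, 0.25, 0.25, 0.25]  # toy stale proposal the table was built on
    prob, alias = build_alias_table(stale)
    state, counts = 0, [0] * len(target)
    for _ in range(100_000):
        state = mh_step(state, target, stale, prob, alias, rng)
        counts[state] += 1
    # Empirical frequencies should approach `target` despite the stale proposal.
    print([c / sum(counts) for c in counts])
```

The design point this illustrates: rebuilding an alias table costs O(K), but each draw afterwards costs O(1), so a table can be reused across many samples even as the true distribution drifts, with the Metropolis-Hastings acceptance test absorbing the mismatch. That amortization is what makes fitting many topics, including a long tail, affordable.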

research
06/18/2020

Mapping the "long tail" of research funding: A topic analysis of NSF grant proposals in the Division of Astronomical Sciences

"Long tail" data are considered to be smaller, heterogeneous, researcher...
research
10/22/2015

Multi-GPU Distributed Parallel Bayesian Differential Topic Modelling

There is an explosion of data, documents, and other content, and people ...
research
06/01/2016

On a Topic Model for Sentences

Probabilistic topic models are generative models that describe the conte...
research
05/18/2022

Topic Segmentation of Research Article Collections

Collections of research article data harvested from the web have become ...
research
11/20/2020

Topic modelling discourse dynamics in historical newspapers

This paper addresses methodological issues in diachronic data analysis f...
research
09/26/2014

Topic Similarity Networks: Visual Analytics for Large Document Sets

We investigate ways in which to improve the interpretability of LDA topi...
research
12/05/2022

Federated Neural Topic Models

Over the last years, topic modeling has emerged as a powerful technique ...