Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

10/25/2016
by Chris Gropp, et al.

Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component in the design of intelligent systems that make sense of highly dynamic and diverse streams of text data. Traditional methods such as Dynamic Topic Modeling (DTM) do not lend themselves well to direct parallelization because of dependencies from one time step to another. In this paper, we introduce and empirically analyze Clustered Latent Dirichlet Allocation (CLDA), a method for extracting dynamic latent topics from a collection of documents. Our approach is based on data decomposition: the data is partitioned into segments, and topic modeling is then performed on each segment individually. The resulting local models are combined into a global solution using clustering. The decomposition and the resulting parallelization lead to very fast runtimes even on very large datasets. Our approach furthermore provides insight into how the composition of topics changes over time, and it can also be applied with other data partitioning strategies over any discrete features of the data, such as geographic features or classes of users. CLDA is applied successfully to seventeen years of NIPS conference papers (2,484 documents and 3,280,697 words), seventeen years of computer science journal abstracts (533,560 documents and 32,551,540 words), and forty years of the PubMed corpus (4,025,978 documents and 273,853,980 words).
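The combination step described in the abstract can be illustrated with a minimal sketch. The abstract says only that local models are merged "using clustering", so the choice of plain k-means here is an assumption, as are the function names `kmeans` and `combine_local_topics`. The hard-coded local topics are hypothetical stand-ins for the topic-word distributions that a per-segment LDA run would produce; because segments are independent, those runs could be executed in parallel.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def combine_local_topics(local_topics, k, seed=0):
    """Cluster local topics from all segments into k global topics.

    local_topics maps a segment key (e.g. a year) to a list of
    topic-word probability vectors over a shared vocabulary.
    Returns the global topics and, per segment, the global topic id
    assigned to each local topic.
    """
    flat = [topic for topics in local_topics.values() for topic in topics]
    centroids, labels = kmeans(flat, k, seed=seed)
    assignments, i = {}, 0
    for seg, topics in local_topics.items():
        assignments[seg] = labels[i:i + len(topics)]
        i += len(topics)
    return centroids, assignments


# Hypothetical local topics over a 4-word vocabulary, two per segment.
local_topics = {
    1987: [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    1988: [[0.6, 0.4, 0.0, 0.0], [0.0, 0.0, 0.4, 0.6]],
}

global_topics, assignments = combine_local_topics(local_topics, k=2)
for seg, ids in assignments.items():
    print(seg, ids)
```

Comparing the local topics assigned to the same global topic across segments is one way to see how a topic's word composition shifts over time. Keying the segments by region or user class instead of year would apply the same sketch to the other partitioning strategies the abstract mentions.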

research
07/12/2019

The Dynamic Embedded Topic Model

Topic modeling analyzes documents to learn meaningful patterns of words....
research
12/18/2020

Technical Progress Analysis Using a Dynamic Topic Model for Technical Terms to Revise Patent Classification Codes

Japanese patents are assigned a patent classification code, FI (File Ind...
research
04/13/2019

Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling

We introduce Topic Grouper as a complementary approach in the field of p...
research
08/05/2020

BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation

Existing topic modeling and text segmentation methodologies generally re...
research
07/17/2017

Exploring text datasets by visualizing relevant words

When working with a new dataset, it is important to first explore and fa...
research
04/21/2021

Clustering Introductory Computer Science Exercises Using Topic Modeling Methods

Manually determining concepts present in a group of questions is a chall...
research
12/31/2019

Domain-topic models with chained dimensions: modeling the evolution of a major oncology conference (1995-2017)

In this paper we introduce a novel approach for the computational analys...
