Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

11/30/2016
by Ryan J. Gallagher, et al.

While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.
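As a rough illustration of the kind of information-theoretic quantity the abstract refers to, the sketch below computes total correlation, TC(X) = sum_i H(X_i) - H(X), on a toy binary document-word matrix, along with how much of it a topic assignment explains, TC(X) - TC(X | Y). This assumes the objective described in the CorEx literature. It is not the authors' implementation: the topic labels are assigned by hand rather than learned, and the data and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def total_correlation(docs):
    """TC(X) = sum_i H(X_i) - H(X), with one row per document."""
    columns = list(zip(*docs))
    marginal = sum(entropy(col) for col in columns)
    joint = entropy([tuple(row) for row in docs])
    return marginal - joint


def explained_correlation(docs, labels):
    """TC(X) - TC(X | Y), where Y is a discrete topic label per document."""
    n = len(docs)
    conditional = 0.0
    for y in set(labels):
        group = [d for d, lab in zip(docs, labels) if lab == y]
        conditional += len(group) / n * total_correlation(group)
    return total_correlation(docs) - conditional


# Toy corpus (made up): columns are the words goal, match, vote, senate.
docs = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

good_labels = [0, 0, 1, 1]  # separates sports documents from politics documents
bad_labels = [0, 1, 0, 1]   # mixes the two groups

print(total_correlation(docs))                   # 3.0 bits of dependence among words
print(explained_correlation(docs, good_labels))  # 3.0: the labels explain all of it
print(explained_correlation(docs, bad_labels))   # 0.0: the labels explain none of it
```

In this toy setting, a labeling that lines up with the word co-occurrence structure accounts for all of the dependence among words, while an uninformative labeling accounts for none. Anchor words, as described in the abstract, are a way of nudging which words a topic is encouraged to explain.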


research
07/11/2012

The Author-Topic Model for Authors and Documents

We introduce the author-topic model, a generative model for documents th...
research
07/12/2019

The Dynamic Embedded Topic Model

Topic modeling analyzes documents to learn meaningful patterns of words....
research
07/06/2023

S2vNTM: Semi-supervised vMF Neural Topic Modeling

Language model based methods are powerful techniques for text classifica...
research
06/22/2016

Toward Interpretable Topic Discovery via Anchored Correlation Explanation

Many predictive tasks, such as diagnosing a patient based on their medic...
research
06/13/2019

Topic Modeling via Full Dependence Mixtures

We consider the topic modeling problem for large datasets. For this prob...
research
01/12/2017

Prior matters: simple and general methods for evaluating and improving topic quality in topic modeling

Latent Dirichlet Allocation (LDA) models trained without stopword remova...
research
08/13/2016

Analysis of Morphology in Topic Modeling

Topic models make strong assumptions about their data. In particular, di...
