A new evaluation framework for topic modeling algorithms based on synthetic corpora

01/28/2019
by Hanyu Shi, et al.

Topic models are in widespread use in natural language processing and beyond. Here, we propose a new framework for the evaluation of probabilistic topic modeling algorithms based on synthetic corpora containing an unambiguously defined ground truth topic structure. The major innovation of our approach is the ability to quantify the agreement between the planted and inferred topic structures by comparing the assigned topic labels at the level of the tokens. In experiments, our approach yields novel insights about the relative strengths of topic models as corpus characteristics vary, and the first evidence of an "undetectable phase" for topic models when the planted structure is weak. We also establish the practical relevance of the insights gained for synthetic corpora by predicting the performance of topic modeling algorithms in classification tasks in real-world corpora.
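The token-level comparison described above can be illustrated with a minimal sketch. The abstract does not specify how the corpora are generated or which agreement score is used, so everything here is an assumption made for illustration: the toy generator `make_synthetic_corpus`, its parameters, and the choice of normalized mutual information (NMI) as the score. NMI is convenient because it ignores how topics are numbered, so an inferred labeling that matches the planted one up to a renaming of topics still scores 1.

```python
import math
import random
from collections import Counter


def make_synthetic_corpus(n_docs=50, doc_len=40, n_topics=3,
                          vocab_per_topic=20, mix=0.8, seed=0):
    """Hypothetical toy generator: every token stores its planted topic.

    Each document has one dominant topic; a token is drawn from it with
    probability `mix`, otherwise from a uniformly chosen topic. Topics use
    disjoint vocabularies, so the ground truth is unambiguous.
    """
    rng = random.Random(seed)
    corpus = []
    for d in range(n_docs):
        main_topic = d % n_topics
        doc = []
        for _ in range(doc_len):
            topic = main_topic if rng.random() < mix else rng.randrange(n_topics)
            word = topic * vocab_per_topic + rng.randrange(vocab_per_topic)
            doc.append((word, topic))
        corpus.append(doc)
    return corpus


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two token labelings, 2*I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0 or h_b == 0:
        # A constant labeling carries no information about a varying one.
        return 1.0 if h_a == h_b else 0.0
    mi = sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    return 2 * mi / (h_a + h_b)


corpus = make_synthetic_corpus()
planted = [topic for doc in corpus for _, topic in doc]

# Stand-ins for model output (no real topic model is fitted here):
# a perfect recovery with renamed topics, and a purely random labeling.
relabeled = [(t + 1) % 3 for t in planted]
rng = random.Random(1)
random_guess = [rng.randrange(3) for _ in planted]

nmi_relabeled = normalized_mutual_information(planted, relabeled)
nmi_random = normalized_mutual_information(planted, random_guess)
```

In practice the two stand-in labelings would be replaced by the per-token topic assignments produced by the algorithm under evaluation. A score near 0, as with the random labeling, is what one would expect to see if the planted structure could not be detected.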


