ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

10/26/2022
by Tommaso Fornaciari, et al.

The most common ways to explore latent document dimensions are topic models and clustering methods. However, topic models have several drawbacks: for example, they require us to choose the number of latent dimensions a priori, and their results are stochastic. Most clustering methods share these issues and also lack flexibility in various ways: they do not account for the influence of different topics on a single document, they force word descriptors to belong to a single topic (hard clustering), or they necessarily rely on word representations. We propose PROgressive SImilarity Thresholds (ProSiT), a deterministic and interpretable method, agnostic to the input format, that finds the optimal number of latent dimensions and has only two hyper-parameters, which can be set efficiently via grid search. We compare this method with a wide range of topic models and clustering methods on four benchmark data sets. In most settings, ProSiT matches or outperforms the other methods in terms of six metrics of topic coherence and distinctiveness, producing replicable, deterministic results.
