For exploratory document analysis, which aims to uncover the main themes and underlying narratives within a corpus (Boyd-Graber et al., 2017), topic models are undeniably the standard approach. But in times of distributed and even contextualized embeddings, are they the only option?
This work explores an alternative to topic modeling by reformulating ‘key themes’ or ‘topics’ as clusters
of words under the modern distributed representation learning paradigm. Unsupervised pre-trained word embeddings provide a vector representation for each word type, which allows us to cluster words by their distance in high-dimensional space. The goal of this work is not strictly to outperform, but rather to benchmark standard clustering of modern embeddings against the classical approach of Latent Dirichlet Allocation (LDA; Blei et al., 2003). We restrict our study to influential embedding methods and focus on centroid-based clustering algorithms, as they provide a natural way to obtain the top words in each cluster based on distance from the cluster center.
Aside from reporting the best-performing combination of word embeddings and clustering algorithm, we are also interested in whether there are consistent patterns across the choice of embeddings and clustering algorithm. A word embedding method that does consistently well across clustering algorithms would suggest that it is a good representation for unsupervised document analysis. Similarly, a clustering algorithm that performs consistently well across embeddings would suggest that its assumptions are more likely to generalize even with future advances in word embedding methods.
Finally, we seek to incorporate document information directly into the clustering algorithm, and quantify the effects of two key methods: 1) weighting terms during clustering, and 2) reranking terms when obtaining the top representative words. Our contributions are as follows:
To our knowledge, this is the first work that systematically applies centroid-based clustering algorithms to embedding methods for document analysis.
We analyse how clustering embeddings directly can potentially achieve lower computational complexity and runtime than probabilistic generative approaches.
Our proposed approach for incorporating document information into clustering and reranking of top words results in sensible topics; the best-performing combination is comparable with LDA, but with lower time complexity and empirical runtime.
We find that the dimensions of some word embeddings can be reduced by more than 50% before clustering.
2 Related Work and Background
This work focuses on centroid-based k-means (KM) and spherical k-means (SK) for hard clustering, and Gaussian Mixture Models (GMM) for soft clustering. (We also experiment with k-medoids but observe that it is strictly worse than KM; see Appendix A.)
We apply clustering to pre-trained embeddings, namely word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), FastText (Bojanowski et al., 2017), Spherical (Meng et al., 2019), ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018). Word2vec, GloVe, FastText and Spherical all have 300 dimensions, BERT has 768, and ELMo has 1024.
Clustering word embeddings has been used for readability assessment (Cha et al., 2017), argument mining (Reimers et al., 2019), and document classification and document clustering (Sano et al., 2017). To our knowledge, there is no prior work that studies the interaction between word embeddings and clustering algorithms for unsupervised document analysis in a direct comparison with standard LDA (Blei et al., 2003). Most closely related is the work of de Miranda et al. (2019), who pursue this idea with self-organising maps, but do not provide any quantitative results. (As this is a preprint, we gladly welcome pointers to related work we may have missed.)
3.1 General Clustering Approach
We first preprocess and extract the vocabulary from our training documents (subsection 5.1). Each word is converted to its embedding representation, following which we apply the various clustering algorithms to obtain clusters, using weighted (subsection 3.3) or unweighted word types. After the clustering algorithm has converged, we obtain the top words from each cluster for evaluation.
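As a minimal sketch of this pipeline (with a toy random lookup standing in for a real pre-trained embedding model, and hypothetical variable names), the procedure amounts to:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for a pre-trained embedding lookup (hypothetical vectors;
# in the paper these would be word2vec/GloVe/BERT type embeddings).
rng = np.random.default_rng(0)
vocab = ["game", "team", "score", "bible", "faith", "church"]
embedding_of = {w: rng.normal(size=50) for w in vocab}

# One vector per word *type*; we cluster the types, not the documents.
X = np.stack([embedding_of[w] for w in vocab])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Top-J words per cluster = the J types closest to each cluster center.
def top_words(center, j=3):
    dists = np.linalg.norm(X - center, axis=1)
    return [vocab[i] for i in np.argsort(dists)[:j]]

tops = [top_words(c) for c in km.cluster_centers_]
```

With real embeddings the nearest types to each center form the cluster's topic words; here the vectors are random, so the groupings are arbitrary.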
3.2 Obtaining the top-J words
In traditional topic modeling (LDA), the top words are those with the highest probability under each topic-word distribution. For centroid-based clustering algorithms, the top words are naturally those closest to the cluster center, and for probabilistic clustering, the top words are those with the highest probability under the cluster parameters. Formally, for a cluster with center $\mu_k$, this means choosing the set of types $T_k \subseteq V$ with $|T_k| = J$ that minimizes $\sum_{w \in T_k} d(e_w, \mu_k)$, where $e_w$ is the embedding of type $w$ and $d$ is the clustering distance.
3.3 Incorporating document information
We explore various methods to incorporate corpus information into the clustering algorithm. Specifically, we examine three different schemes to assign scores to word types:
These scores are then used for weighting word types when clustering, for reranking top words, for both, or for neither (i.e., uniform weights); model names are marked accordingly.
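A compact sketch of how weighting and reranking plug into centroid clustering (random stand-in embeddings and frequencies; `sample_weight` is sklearn's mechanism for weighted k-means):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
vocab = [f"w{i}" for i in range(40)]
X = rng.normal(size=(40, 16))            # stand-in type embeddings
tf = rng.integers(1, 100, size=40)       # stand-in term-frequency scores

# Weighted clustering: frequent types pull the centroids toward them.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X, sample_weight=tf)

# Reranking: take the types nearest each center (a window), then reorder
# them by TF and keep the top J as the cluster's representative words.
def top_words(center, window=10, j=5):
    nearest = np.argsort(np.linalg.norm(X - center, axis=1))[:window]
    reranked = sorted(nearest, key=lambda i: -tf[i])
    return [vocab[i] for i in reranked[:j]]

tops = [top_words(c) for c in km.cluster_centers_]
```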
4 Computational Complexity
The complexity of KM is $O(IKVD)$, and of GMM is $O(IKVD^3)$, where $I$ is the number of iterations (in general, the number of iterations required for convergence differs by clustering algorithm and embedding representation, but we can specify a maximum number of iterations as a constant factor for worst-case analysis), $K$ is the number of clusters (topics), $V$ is the number of word types (unique vocabulary), and $D$ is the dimension of the embeddings. Weighted variants have a one-off cost for weight initialisation, and contribute a constant multiplicative factor when recalculating the centroids in the clustering algorithm. Reranking adds an $O(Km)$ factor, where $m$ is the average number of elements in a cluster. In contrast, LDA via collapsed Gibbs sampling has a complexity of $O(IKN)$, where $N$ is the number of tokens in the corpus. When $VD < N$, clustering methods can potentially achieve better performance-complexity tradeoffs.
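To make the comparison concrete, a back-of-the-envelope example with purely illustrative values ($K=20$ topics, $V=10{,}000$ types, $D=100$ dimensions after reduction, $N=2{,}000{,}000$ tokens):

```latex
\begin{align*}
\text{KM, per iteration:} \quad KVD &= 20 \cdot 10^{4} \cdot 10^{2} = 2 \times 10^{7}\\
\text{LDA, per Gibbs sweep:} \quad KN &= 20 \cdot 2 \times 10^{6} = 4 \times 10^{7}
\end{align*}
```

Here $VD = 10^{6} < N = 2 \times 10^{6}$, so each clustering iteration is cheaper than one Gibbs sweep over the corpus.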
5.1 Experimental Setup
We use the 20 Newsgroups dataset (20NG), a common text analysis dataset containing around 18,000 documents in 20 categories (available from http://qwone.com/~jason/20Newsgroups/). We adopt the dataset's standard 60-40 train-test split, run the clustering algorithms on the training set, obtain the top 10 words of each cluster, and evaluate them on the test split. We present results averaged across 5 random seeds.
We remove stopwords, punctuation and digits, lowercase tokens, and exclude words that appear in fewer than 5 documents. For contextualised word embeddings (BERT and ELMo), sentences serve as the context window for obtaining token representations, which are averaged to obtain the type representation. For BERT, we experiment with two variants: BERT(ns), which ignores subword tokens, and BERT, which averages the subword token representations. (Taking the first subword embedding as the representation for the whole word performs consistently worse than averaging subword representations.)
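The token-to-type averaging can be sketched as follows (random vectors stand in for the contextualised token embeddings a BERT/ELMo forward pass would produce):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

# Stand-in for contextualised token embeddings: in the real pipeline each
# occurrence of a word in a sentence gets its own vector from BERT/ELMo.
token_occurrences = [("church", rng.normal(size=8)) for _ in range(3)] + \
                    [("game", rng.normal(size=8)) for _ in range(5)]

# Average all token-level vectors of a word to get its *type* representation.
sums = defaultdict(lambda: np.zeros(8))
counts = defaultdict(int)
for word, vec in token_occurrences:
    sums[word] += vec
    counts[word] += 1
type_embedding = {w: sums[w] / counts[w] for w in sums}
```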
5.2 Results and Discussion
(Table: training data (words/tokens) of each embedding method — 840B (Common Crawl); 0.8B (Books) + 2.5B (Wikipedia); 0.8B (Books) + 2.5B (Wikipedia); 0.8B (1 Billion Word).)
A simple LDA run, which performs no better than our best method (KM), takes about a minute using MALLET (McCallum, 2002); running the clustering on CPU takes little more than 10 seconds using sklearn (Pedregosa et al., 2011), and a third of that using custom JAX implementations on GPU (Bradbury et al., 2018).
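The clustering step itself is a handful of dense array operations, which is why vectorized and GPU implementations are fast. A minimal NumPy sketch of one Lloyd iteration (not the paper's actual JAX code) makes the $O(KVD)$ inner term visible:

```python
import numpy as np

def kmeans_step(X, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    # Pairwise squared distances, shape (V, K): the O(KVD) term.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    new_centers = np.stack([
        X[assign == k].mean(axis=0) if (assign == k).any() else centers[k]
        for k in range(len(centers))
    ])
    return new_centers, assign

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
centers = X[rng.choice(200, size=5, replace=False)]
for _ in range(10):
    centers, assign = kmeans_step(X, centers)
```

In JAX the same step can be `jit`-compiled and run on GPU essentially unchanged.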
Incorporating Document Information
We find that simply using term frequency (TF) outperforms the other weighting schemes (subsection 3.3). In particular, TF-IDF is, perhaps surprisingly, a poorer reweighting scheme (see Appendix B). Our results in Table 2 and subsequent analysis therefore use TF for weighted clustering and reranking.
Analysis of Algorithms - Weighted Clustering
Under unweighted clustering of vocabulary types, all combinations of clustering algorithm and embedding method perform poorly compared to LDA. GMM outperforms KM and SK for both weighted and unweighted variants across all embedding methods (two-tailed t-test for GMM vs. KM and SK).
Analysis of Algorithms - Reranking
For KM and SK, extracting the top topic words (subsection 3.2) before reranking results in reasonable-looking themes, but scores poorly on NPMI. Reranking the top topic words with a window size of 100 yields a large improvement for KM and SK. Examples before and after reranking are provided in Table 1. This indicates that cluster centers are surrounded by low-frequency types, even when the clusters are centered around valid themes.
For GMM, the gains from reranking are much less pronounced. The top topic words before and after reranking for BERT-GMM have an average Jaccard similarity of 0.910, indicating that the Gaussians are already centered at word types with high frequency in the training corpus, and thus have fundamentally different cluster centers from those learned by KM.
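The Jaccard score used above compares the top-word sets before and after reranking; for two word sets it is simply:

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two word sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Hypothetical top-5 lists before and after reranking.
before = ["god", "jesus", "faith", "church", "bible"]
after  = ["god", "jesus", "faith", "church", "christ"]
```

With four of six distinct words shared, `jaccard(before, after)` is 4/6; a score near 1 means reranking barely changed the top words.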
Analysis of Embedding Method
The best performers are Spherical embeddings (Meng et al., 2019), which achieve the top two NPMI scores in the table with KM and SK. ELMo performs poorly compared to the other embeddings; this could be due to the layer-wise combination of ELMo embeddings having been tuned for other tasks. FastText and GloVe rely on weighted clustering and reranking for their performance, with which they can achieve results similar to BERT. This has important implications for practical applications where GPU resources are not always available to efficiently extract BERT embeddings from pre-trained models.
BERT embeddings perform consistently well across the clustering algorithm variants with weighting and reranking. Interestingly, excluding words that are tokenized into subwords (BERT(ns)) does not negatively impact topic coherence. This suggests that compound words that can be tokenized into subwords are not critical to finding coherent topics.
5.3 Dimensionality Reduction
We apply PCA to the word embeddings before clustering to estimate the amount of redundancy in the dimensions of large embeddings, which impacts clustering complexity (section 4). Across both KM and GMM, BERT embeddings can be reduced to 300 dimensions. Performance for both Spherical and BERT begins to fall at 200 dimensions, but this effect can be mitigated with reranking.
We observe that for GMM we can safely reduce the dimensions of BERT embeddings from 768 to 100, and even achieve better performance at lower dimensionality. The reduction is consistent across different types of embeddings, indicating that GMM performs better at lower dimensionality (Figure 1). However, given the cubic complexity of GMM in the number of dimensions (section 4), KM, which achieves comparable performance, might be preferred in practical settings.
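The reduction step can be sketched with a plain SVD-based PCA (random vectors stand in for the actual type embeddings):

```python
import numpy as np

def pca_reduce(X, d):
    """Project embeddings onto their top-d principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 768))   # e.g. BERT-sized type embeddings
X_red = pca_reduce(X, 100)        # cluster these instead of the 768-d originals
```

Clustering then runs on `X_red`, cutting the $D$ factor in the complexity terms of section 4.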
We outlined a methodology for clustering word embeddings for unsupervised document analysis, and presented a systematic comparison of various influential embedding methods and clustering algorithms. Our experiments suggest that pre-trained word embeddings, combined with weighted clustering and reranking, provide a viable alternative to traditional topic modeling at lower complexity and runtime.
- Blei et al. (2003) David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
- Bojanowski et al. (2017) Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
- Bouma (2009) Gerlof Bouma. 2009. Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL, pages 31–40.
- Boyd-Graber et al. (2017) Jordan Boyd-Graber, Yuening Hu, David Mimno, et al. 2017. Applications of topic models. Foundations and Trends® in Information Retrieval, 11(2-3):143–296.
- Bradbury et al. (2018) James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, and Skye Wanderman-Milne. 2018. JAX: composable transformations of Python+NumPy programs.
- Cha et al. (2017) Miriam Cha, Youngjune Gwon, and HT Kung. 2017. Language modeling by clustering with word embeddings for text readability assessment. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 2003–2006.
- Chang et al. (2009) Jonathan Chang, Sean Gerrish, Chong Wang, Jordan L Boyd-Graber, and David M Blei. 2009. Reading tea leaves: How humans interpret topic models. In Advances in neural information processing systems, pages 288–296.
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- McCallum (2002) Andrew Kachites McCallum. 2002. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu.
- Meng et al. (2019) Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, and Jiawei Han. 2019. Spherical text embedding. In Advances in Neural Information Processing Systems, pages 8206–8215.
- Mikolov et al. (2013) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- de Miranda et al. (2019) Guilherme Raiol de Miranda, Rodrigo Pasti, and Leandro Nunes de Castro. 2019. Detecting topics in documents by clustering word vectors. In International Symposium on Distributed Computing and Artificial Intelligence, pages 235–243. Springer.
- Pedregosa et al. (2011) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
- Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
- Peters et al. (2018) Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL.
- Reimers et al. (2019) Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019. Classification and clustering of arguments with contextualized word embeddings. arXiv preprint arXiv:1906.09821.
- Sano et al. (2017) Motoki Sano, Austin J Brockmeier, Georgios Kontonatsios, Tingting Mu, John Y Goulermas, Jun’ichi Tsujii, and Sophia Ananiadou. 2017. Distributed document and phrase co-embeddings for descriptive clustering. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 991–1001.
Appendix A k-means (KM) vs k-medoids (KD)
To further understand the effect of other centroid-based algorithms on topic coherence, we also applied the k-medoids (KD) clustering algorithm. KD is a hard clustering algorithm similar to KM but less sensitive to outliers.
As shown in Table 3, KD did as well as or worse than KM in almost all cases, and performed relatively poorly after frequency reranking. Where KD did do better than KM, the difference was not striking, and its NPMI scores remained well below those of the other top-performing models.
Appendix B Comparing Different Reranking Schemes
As mentioned in the paper, after clustering the embeddings, instead of directly retrieving the top-J terms, we can rerank the terms by a scoring metric and then retrieve the top-J highest-ranked terms. We compare term frequency (TF), term frequency–inverse document frequency (TF-IDF) and term frequency–document frequency (TF-DF); the equations are presented in subsection 3.3. To obtain a single value per word for TF-IDF, we sum over all documents to get one aggregated score.
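Under one plausible reading of these definitions (the exact equations are in subsection 3.3, so the formulas below are an illustrative interpretation, not the paper's), the three scores can be computed as:

```python
import math
from collections import Counter

# Toy tokenized corpus.
docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "barks"]]
n_docs = len(docs)

tf = Counter(w for d in docs for w in d)          # corpus term frequency
df = Counter(w for d in docs for w in set(d))     # document frequency

# Aggregated TF-IDF: sum tf(w, d) * idf(w) over all documents d.
idf = {w: math.log(n_docs / df[w]) for w in df}
tfidf = {w: sum(d.count(w) for d in docs) * idf[w] for w in tf}

# TF-DF: reward words that are both frequent and widespread.
tfdf = {w: tf[w] * df[w] for w in tf}
```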
Compared to the TF results in the main paper, the other reranking schemes, aggregated TF-IDF and TF-DF, produce more coherent topics than the original hard clustering but fare worse than reranking with TF.
| BERT 12 (average) | -0.080 | -0.014 | 0.159 | 0.215 |
| BERT 12 (first word) | -0.057 | -0.014 | 0.159 | 0.216 |