BERTopic: Neural topic modeling with a class-based TF-IDF procedure

03/11/2022
by Maarten Grootendorst, et al.

Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
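To make the last step of the pipeline concrete, here is a minimal sketch of a class-based TF-IDF, not BERTopic's actual implementation. It rests on several assumptions. The embedding and clustering steps are replaced by hand-assigned cluster labels. Tokenization is plain lowercase whitespace splitting. The weighting W(t,c) = tf(t,c) * log(1 + A / f(t)) is assumed, where tf(t,c) is the frequency of term t in cluster c, f(t) is its frequency across all clusters, and A is the average number of words per cluster. That formula is my reading of the full paper and is not stated in this abstract. The function names `c_tf_idf` and `top_terms` are hypothetical and are not part of the BERTopic library API.

```python
import math
from collections import Counter


def c_tf_idf(docs, labels):
    """Score terms per cluster with an assumed class-based TF-IDF.

    docs:   list of document strings
    labels: cluster id per document (stand-in for the clustering step)
    Returns {cluster_id: {term: weight}}.
    """
    # Treat all documents of one cluster as a single "class document"
    # and count term frequencies per class: tf(t, c).
    class_tf = {}
    for doc, label in zip(docs, labels):
        class_tf.setdefault(label, Counter()).update(doc.lower().split())

    # f(t): frequency of each term summed over all classes.
    total_tf = Counter()
    for counts in class_tf.values():
        total_tf.update(counts)

    # A: average number of words per class.
    avg_words = sum(total_tf.values()) / len(class_tf)

    # Assumed weighting: W(t, c) = tf(t, c) * log(1 + A / f(t)).
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_tf[term])
            for term, tf in counts.items()
        }
        for label, counts in class_tf.items()
    }


def top_terms(weights, n=3):
    """Return the n highest-weighted terms of one cluster (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]
```

For example, with `docs = ["the cat sat on the mat", "a cat chased the mouse", "stocks fell on the market", "the market rallied as stocks rose"]` and `labels = [0, 0, 1, 1]`, `top_terms(c_tf_idf(docs, labels)[1], 2)` returns `["market", "stocks"]`. Because this sketch does no stop-word filtering, frequent words such as "the" can still rank fairly high. A practical setup would likely filter them before counting.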


