Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

04/08/2020
by Federico Bianchi, et al.

Topic models extract meaningful groups of words from documents, allowing for a better understanding of data. However, the solutions are often not coherent enough, and thus harder to interpret. Coherence can be improved by adding more contextual knowledge to the model. Recently, neural topic models have become available, while BERT-based representations have further pushed the state of the art of neural models in general. We combine pre-trained representations and neural topic models. Pre-trained BERT sentence embeddings indeed support the generation of more meaningful and coherent topics than either standard LDA or existing neural topic models. Results on four datasets show that our approach effectively increases topic coherence.
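
The abstract does not say how topic coherence is scored. As a rough illustration only, the sketch below assumes one widely used measure, normalized pointwise mutual information (NPMI) averaged over the word pairs of a topic, with probabilities estimated from document-level co-occurrence. The function name `npmi_coherence`, the toy corpus, and the handling of edge cases are assumptions made for this example, not details taken from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic.

    Probabilities are estimated as the fraction of documents that
    contain a word (or both words of a pair). Scores lie in [-1, 1];
    higher means the topic's words tend to appear together.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # The pair never co-occurs: use the lower bound (assumed convention).
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words occur in every document, so NPMI is 0/0;
            # scored as 0.0 here (assumed convention).
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy tokenized corpus (hypothetical data for illustration).
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["stock", "market", "bank"],
    ["stock", "bank", "loan"],
]

coherent = npmi_coherence(["cat", "dog"], docs)   # words that always co-occur
mixed = npmi_coherence(["cat", "stock"], docs)    # words that never co-occur
```

In this toy corpus the topic whose words always appear together scores higher than the topic mixing unrelated words, which is the sense in which a more coherent topic is easier to interpret.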


