Topics in Contextualised Attention Embeddings

01/11/2023
by Mozhgan Talebpour, et al.

Contextualised word vectors obtained via pre-trained language models encode a variety of knowledge that has already been exploited in applications. Complementary to these language models are probabilistic topic models, which learn thematic patterns from text. Recent work has demonstrated that clustering the word-level contextual representations from a language model yields word clusters that resemble those found in the latent topics of Latent Dirichlet Allocation. The important question is how such topical word clusters emerge, through clustering, in a language model that has not been explicitly designed to model latent topics. To address this question, we design different probe experiments. Using BERT and DistilBERT, we find that the attention framework plays a key role in modelling such word topic clusters. We strongly believe that our work paves the way for further research into the relationships between probabilistic topic models and pre-trained language models.
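To make the idea of clustering word-level representations concrete, here is a minimal sketch. It is not the authors' pipeline: the abstract does not say which clustering algorithm is used, so plain k-means is an assumption, and the three-dimensional `toy_vectors` are hypothetical stand-ins for the contextualised vectors that BERT or DistilBERT would produce. The `kmeans` helper is likewise written only for this illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Hypothetical toy vectors (assumption): in practice these would be
# word-level contextual representations taken from a language model.
toy_vectors = {
    "goal": (0.9, 0.1, 0.0),
    "match": (0.8, 0.2, 0.1),
    "striker": (1.0, 0.0, 0.1),
    "election": (0.1, 0.9, 0.0),
    "senate": (0.0, 1.0, 0.1),
    "vote": (0.2, 0.8, 0.0),
}

words = list(toy_vectors)
labels = kmeans([toy_vectors[w] for w in words], k=2)

# Group words by cluster index to read each cluster as a candidate "topic".
clusters = {}
for word, lab in zip(words, labels):
    clusters.setdefault(lab, []).append(word)

for lab, members in sorted(clusters.items()):
    print(lab, members)
```

With real model outputs the vectors would have hundreds of dimensions and the clusters would be far noisier; the sketch only shows how word groupings can fall out of the geometry of the representations without any explicit topic model.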

research
05/16/2023

BERTTM: Leveraging Contextualized Word Embeddings from Pre-trained Language Models for Neural Topic Modeling

With the development of neural topic models in recent years, topic model...
research
03/11/2022

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

Topic models can be useful tools to discover latent topics in collection...
research
01/06/2023

Topics as Entity Clusters: Entity-based Topics from Language Models and Graph Neural Networks

Topic models aim to reveal the latent structure behind a corpus, typical...
research
12/15/2021

Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings

Large pre-trained language models are often trained on large volumes of ...
research
12/10/2020

Multi-Sense Language Modelling

The effectiveness of a language model is influenced by its token represe...
research
05/23/2022

Artificial intelligence for topic modelling in Hindu philosophy: mapping themes between the Upanishads and the Bhagavad Gita

A distinct feature of Hindu religious and philosophical text is that the...
research
07/29/2016

TopicResponse: A Marriage of Topic Modelling and Rasch Modelling for Automatic Measurement in MOOCs

This paper explores the suitability of using automatically discovered to...