Topics as Entity Clusters: Entity-based Topics from Language Models and Graph Neural Networks

01/06/2023
by Manuel V. Loureiro, et al.
Topic models aim to reveal the latent structure of a corpus and typically operate over a bag-of-words representation of documents. In this setting, most of the vocabulary is either irrelevant for uncovering the underlying topics or strongly related to relevant concepts, which hurts the interpretability of the resulting topics. Furthermore, the limited expressiveness of these models and their dependency on language demand considerable computational resources. Hence, we propose a novel approach for cluster-based topic modeling that employs conceptual entities. Entities are language-agnostic representations of real-world concepts that are rich in relational information. To this end, we extract vector representations of entities from (i) an encyclopedic corpus using a language model; and (ii) a knowledge base using a graph neural network. We demonstrate that our approach consistently outperforms other state-of-the-art topic models across coherency metrics and find that the explicit knowledge encoded in the graph-based embeddings provides more coherent topics than the implicit knowledge encoded with the contextualized embeddings of language models.
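As a rough illustration of what cluster-based topic modeling over entities can look like, the sketch below represents each document as the mean of its entity embeddings, clusters the documents, and describes each cluster by the entities closest to its centroid. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not name the clustering algorithm, the document representation, or the entity ranking, so k-means, mean pooling, and cosine ranking are stand-ins, and the entity names and 2-D vectors are invented toy values rather than embeddings from a language model or a graph neural network.

```python
import numpy as np

# Toy entity embeddings (hypothetical names and values, for illustration only).
entity_vecs = {
    "football":   np.array([1.0, 0.1]),
    "goalkeeper": np.array([0.9, 0.0]),
    "stadium":    np.array([0.8, 0.2]),
    "election":   np.array([0.0, 1.0]),
    "parliament": np.array([0.1, 0.9]),
    "senator":    np.array([0.2, 0.8]),
}

# Each document is reduced to the entities it mentions (assumed already linked).
docs = [
    ["football", "goalkeeper"],
    ["stadium", "football"],
    ["election", "parliament"],
    ["senator", "election"],
]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means; an assumed stand-in for the clustering step."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def top_entities(centroid, names, vectors, n=3):
    """Rank entities by cosine similarity to a cluster centroid."""
    sims = vectors @ centroid / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid)
    )
    order = np.argsort(-sims)[:n]
    return [names[i] for i in order]


# Document vector = mean of its entity vectors (an assumption of this sketch).
doc_vectors = np.array(
    [np.mean([entity_vecs[e] for e in doc], axis=0) for doc in docs]
)
labels, centroids = kmeans(doc_vectors, k=2)

names = list(entity_vecs)
vectors = np.array([entity_vecs[name] for name in names])
topics = {int(j): top_entities(centroids[j], names, vectors) for j in range(2)}

for topic_id, members in topics.items():
    print(topic_id, members)
```

Because the topics are lists of entities rather than surface words, the same clusters would apply to documents in any language once their entities are linked, which is the language-agnostic property the abstract highlights.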


