Re-Ranking Words to Improve Interpretability of Automatically Generated Topics

03/29/2019
by Areej Alokaili, et al.

Topic models, such as LDA, are widely used in Natural Language Processing. Making their output interpretable is an important area of research, with applications such as enhancing exploratory search interfaces and developing interpretable machine learning models. Conventionally, topics are represented by their n most probable words; however, these representations are often difficult for humans to interpret. This paper explores the re-ranking of topic words to generate more interpretable topic representations. A range of approaches is compared and evaluated in two experiments. The first uses crowdworkers to associate topics represented by different word rankings with related documents. The second is an automatic approach based on a document retrieval task applied to multiple domains. Results from both experiments demonstrate that re-ranking words improves topic interpretability, and that the most effective re-ranking schemes are those that combine information about the importance of words within topics with their relative frequency in the entire corpus. In addition, the close correlation between the results of the two evaluation approaches suggests that the automatic method proposed here could be used to evaluate re-ranking methods without the need for human judgements.
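
To make the idea of combining within-topic importance with corpus-wide frequency concrete, here is a minimal Python sketch. It is not taken from the paper: the scoring rule (topic probability multiplied by the log-ratio of topic probability to corpus relative frequency), the function name `rerank_words`, and the toy numbers are all assumptions chosen for illustration only.

```python
import math


def rerank_words(topic_probs, corpus_freqs, top_n=10):
    """Re-rank the words of one topic (illustrative scheme, not the paper's).

    topic_probs:  dict mapping word -> p(word | topic)
    corpus_freqs: dict mapping word -> relative frequency in the whole corpus
    Assumes every topic word has a nonzero entry in corpus_freqs.
    """
    scores = {}
    for word, p_topic in topic_probs.items():
        p_corpus = corpus_freqs[word]
        # Hypothetical score: reward words that are probable in the topic
        # AND more frequent in the topic than in the corpus overall.
        scores[word] = p_topic * math.log(p_topic / p_corpus)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


# Toy data (invented for this example).
topic = {"the": 0.30, "model": 0.20, "topic": 0.15, "of": 0.10}
corpus = {"the": 0.25, "model": 0.02, "topic": 0.01, "of": 0.12}

original_order = sorted(topic, key=lambda w: -topic[w])
reranked_order = rerank_words(topic, corpus, top_n=4)
print(original_order)
print(reranked_order)
```

In this toy example the conventional probability ordering puts the common word "the" first, while the re-ranked ordering promotes "model" and "topic", which are far more frequent in the topic than in the corpus as a whole.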

research
03/30/2023

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gaine...
research
12/16/2019

Optimized Tracking of Topic Evolution

Topic evolution modeling has been researched for a long time and has gai...
research
04/13/2023

G2T: A Simple but Effective Framework for Topic Modeling based on Pretrained Language Model and Community Detection

It has been reported that clustering-based topic models, which cluster h...
research
10/16/2014

Graph-Sparse LDA: A Topic Model with Structured Sparsity

Originally designed to model text, topic modeling has become a powerful ...
research
07/27/2022

CompText: Visualizing, Comparing & Understanding Text Corpus

A common practice in Natural Language Processing (NLP) is to visualize t...
research
05/29/2020

Automatic Generation of Topic Labels

Topic modelling is a popular unsupervised method for identifying the und...
research
06/03/2018

Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

Topic models are among the most widely used methods in natural language ...
