Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds

05/04/2022
by Yu Zhang, et al.

Discovering latent topics from text corpora has been studied for decades. Many existing topic models adopt a fully unsupervised setting, and the topics they discover may not match users' particular interests because the models cannot leverage user guidance. Although seed-guided topic discovery approaches exist that use user-provided seeds to discover topic-representative terms, they pay little attention to two factors: (1) the existence of out-of-vocabulary seeds and (2) the power of pre-trained language models (PLMs). In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds. We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other. Experiments on three real datasets from different domains demonstrate the effectiveness of SeeTopic in terms of topic coherence, accuracy, and diversity.


