Effective Neural Topic Modeling with Embedding Clustering Regularization

06/07/2023
by Xiaobao Wu, et al.

Topic models have been prevalent for decades and have a wide range of applications. However, existing topic models commonly suffer from topic collapsing: discovered topics semantically collapse towards each other, leading to highly repetitive topics, insufficient topic discovery, and damaged model interpretability. In this paper, we propose a new neural topic model, the Embedding Clustering Regularization Topic Model (ECRTM). In addition to the existing reconstruction error, we propose a novel Embedding Clustering Regularization (ECR), which forces each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space. As a result, each produced topic contains distinct word semantics, which alleviates topic collapsing. Regularized by ECR, ECRTM generates diverse and coherent topics together with high-quality topic distributions of documents. Extensive experiments on benchmark datasets demonstrate that ECRTM effectively addresses the topic collapsing issue and consistently surpasses state-of-the-art baselines in terms of topic quality, topic distributions of documents, and downstream classification tasks.
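
The idea of making each topic embedding the center of its own word embedding cluster can be illustrated with a small sketch. The abstract does not give the formulation, so everything below is an assumption made for illustration: the function names, the squared Euclidean cost, and the use of an entropic optimal-transport (Sinkhorn) plan with uniform cluster sizes to keep clusters separate. It is a toy regularizer in NumPy, not the authors' implementation.

```python
import numpy as np


def pairwise_sq_dist(words, topics):
    """Squared Euclidean distance between every word and every topic.

    words: (V, D) word embeddings, topics: (K, D) topic embeddings.
    Returns a (V, K) cost matrix.
    """
    diff = words[:, None, :] - topics[None, :, :]
    return (diff ** 2).sum(axis=-1)


def sinkhorn_plan(cost, epsilon=1.0, n_iters=200):
    """Entropic optimal-transport plan with uniform marginals.

    Assumption for this sketch: every word carries mass 1/V and every
    topic must receive mass 1/K, so no topic can end up with an empty
    cluster. Large cost/epsilon ratios can underflow in this naive form.
    """
    n_words, n_topics = cost.shape
    word_mass = np.full(n_words, 1.0 / n_words)
    topic_mass = np.full(n_topics, 1.0 / n_topics)
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n_words)
    for _ in range(n_iters):
        v = topic_mass / (kernel.T @ u)
        u = word_mass / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def clustering_regularizer(words, topics, epsilon=1.0):
    """Transport-weighted distance between words and their topic centers.

    Returns (loss, plan). The loss is small when each topic embedding
    sits at the center of its own group of word embeddings.
    """
    cost = pairwise_sq_dist(words, topics)
    plan = sinkhorn_plan(cost, epsilon)
    return float((plan * cost).sum()), plan


if __name__ == "__main__":
    # Toy data: two well-separated groups of word embeddings.
    words = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.0, 3.1]])
    separated = np.array([[0.0, 0.0], [3.0, 3.0]])
    collapsed = np.array([[1.5, 1.5], [1.5, 1.5]])
    print("separated topics:", clustering_regularizer(words, separated)[0])
    print("collapsed topics:", clustering_regularizer(words, collapsed)[0])
```

In this toy setting the regularizer is lower when the two topic embeddings sit on distinct word clusters than when they coincide, which is the behavior the abstract attributes to ECR. In a real model such a term would be added to the reconstruction error and minimized with respect to the embeddings; how the two are weighted is not stated in the abstract.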

research
08/11/2020

Context Reinforced Neural Topic Modeling over Short Texts

As one of the prevalent topic mining tools, neural topic modeling has at...
research
06/16/2022

Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models

Embedding-based neural topic models could explicitly represent words and...
research
06/10/2020

A novel sentence embedding based topic detection method for micro-blog

Topic detection is a challenging task, especially without knowing the ex...
research
02/09/2022

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Topic models have been the prominent tools for automatic topic discovery...
research
08/14/2016

Viewpoint and Topic Modeling of Current Events

There are multiple sides to every story, and while statistical topic mod...
research
08/29/2023

Classification-Aware Neural Topic Model Combined With Interpretable Analysis – For Conflict Classification

A large number of conflict events are affecting the world all the time. ...
research
08/31/2017

Video Captioning with Guidance of Multimodal Latent Topics

The topic diversity of open-domain videos leads to various vocabularies ...
