Contrastive Learning for Neural Topic Model

10/25/2021
by Thong Nguyen, et al.

Recent empirical studies show that adversarial topic models (ATM) can successfully capture the semantic patterns of a document by differentiating it from a dissimilar sample. However, this discriminative-generative architecture has two important drawbacks: (1) it does not relate similar documents, which share the same document-word distribution over salient words; (2) it restricts the ability to integrate external information, such as the sentiment of a document, which has been shown to benefit the training of neural topic models. To address these issues, we revisit the adversarial topic architecture from the viewpoint of mathematical analysis, propose a novel approach that reformulates the discriminative goal as an optimization problem, and design a novel sampling method that facilitates the integration of external variables. The reformulation encourages the model to incorporate the relations among similar samples and enforces a constraint on the similarity among dissimilar ones, while the sampling method, which is based on the internal input and the reconstructed output, helps inform the model of the salient words contributing to the main topic. Experimental results show that, in terms of topic coherence, our framework outperforms other state-of-the-art neural topic models on three common benchmark datasets that span various domains, vocabulary sizes, and document lengths.
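The idea of relating similar samples while constraining the similarity to dissimilar ones can be illustrated with a generic contrastive objective. The sketch below is not the authors' formulation: the function names `cosine` and `contrastive_loss`, the `temperature` parameter, the use of cosine similarity, and the toy vectors are all assumptions chosen for illustration. It only shows that such a loss is lower when the anchor document is closer to its positive sample than to its negative one.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(anchor, positive, negative, temperature=0.5):
    """Generic one-positive, one-negative contrastive loss (hypothetical sketch).

    The loss shrinks as the anchor becomes more similar to the positive
    sample and less similar to the negative sample.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = math.exp(cosine(anchor, negative) / temperature)
    return -math.log(pos / (pos + neg))

# Toy document representations (illustrative values, not from the paper).
anchor = [0.7, 0.2, 0.1]
similar = [0.6, 0.3, 0.1]
dissimilar = [0.1, 0.1, 0.8]

# Correct pairing: the similar document is the positive sample.
loss_good = contrastive_loss(anchor, similar, dissimilar)
# Swapped pairing: the dissimilar document is wrongly treated as positive.
loss_bad = contrastive_loss(anchor, dissimilar, similar)
```

With the correct pairing the loss is smaller than with the swapped pairing, which is the behavior a contrastive objective of this kind is meant to reward. How the positive and negative samples are actually constructed from the input and the reconstructed output is described in the full paper and is not reproduced here.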


research
03/27/2023

Improving Contextualized Topic Models with Negative Sampling

Topic modeling has emerged as a dominant method for exploring large docu...
research
07/05/2023

Graph Contrastive Topic Model

Existing NTMs with contrastive learning suffer from the sample bias prob...
research
08/12/2020

Neural Sinkhorn Topic Model

In this paper, we present a new topic modelling approach via the theory ...
research
04/26/2020

Neural Topic Modeling with Bidirectional Adversarial Training

Recent years have witnessed a surge of interests of using neural topic m...
research
12/02/2020

TAN-NTM: Topic Attention Networks for Neural Topic Modeling

Topic models have been widely used to learn representations from text an...
research
10/26/2022

ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

The most common ways to explore latent document dimensions are topic mod...
research
12/21/2016

Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data

Topic models have been successfully applied in lexicon extraction. Howev...
