Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

11/23/2022
by   Xiaobao Wu, et al.

To overcome data sparsity in short text topic modeling, existing methods commonly rely on data augmentation or on characteristics of short texts to introduce more word co-occurrence information. However, most of them do not make full use of the augmented data or these characteristics: they insufficiently learn the relations among samples, which leads to dissimilar topic distributions for semantically similar text pairs. To better address data sparsity, we propose a novel short text topic modeling framework, the Topic-Semantic Contrastive Topic Model (TSCTM). To sufficiently model the relations among samples, we employ a new contrastive learning method with efficient positive and negative sampling strategies based on topic semantics. This contrastive learning method refines the representations, enriches the learning signals, and thus mitigates the sparsity issue. Extensive experimental results show that TSCTM outperforms state-of-the-art baselines whether or not data augmentation is available, producing high-quality topics and topic distributions.
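To make the idea of contrastive learning with topic-based positive and negative sampling more concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that two texts form a positive pair when they share the same dominant topic (the argmax of their topic distribution) and a negative pair otherwise; the abstract does not specify the actual sampling rule. The function name `topic_semantic_contrastive_loss`, the `temperature` parameter, and the use of cosine similarity are likewise assumptions.

```python
import numpy as np


def topic_semantic_contrastive_loss(theta, temperature=0.5):
    """Illustrative InfoNCE-style loss over topic distributions.

    theta: (N, K) array; each row is one text's topic distribution.
    ASSUMPTION: texts sharing the same dominant topic are positives,
    texts with different dominant topics are negatives. This rule is
    hypothetical and stands in for the paper's sampling strategy.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.shape[0]

    # Hypothetical "topic semantics": index of the most probable topic.
    labels = theta.argmax(axis=1)

    # Pairwise cosine similarity, scaled by the temperature.
    normed = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    sim = (normed @ normed.T) / temperature

    same_topic = labels[:, None] == labels[None, :]
    pos_mask = same_topic & ~np.eye(n, dtype=bool)  # exclude self-pairs
    neg_mask = ~same_topic

    if not pos_mask.any():
        return 0.0  # no positive pairs in this batch

    exp_sim = np.exp(sim)
    # For each anchor i, sum exp-similarity over all of its negatives.
    neg_sum = (exp_sim * neg_mask).sum(axis=1, keepdims=True)

    # Per-pair loss: -log( exp(s_ij) / (exp(s_ij) + sum_neg_i) ).
    pair_loss = -(sim - np.log(exp_sim + neg_sum))

    # Average over positive pairs only.
    return float((pair_loss * pos_mask).sum() / pos_mask.sum())


# Two toy batches: one with well-separated topics, one with blurry topics.
separated = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
blurry = [[0.55, 0.45], [0.6, 0.4], [0.45, 0.55], [0.4, 0.6]]
loss_separated = topic_semantic_contrastive_loss(separated)
loss_blurry = topic_semantic_contrastive_loss(blurry)
```

Under these assumptions, the loss is lower when texts with the same dominant topic have similar topic distributions and texts with different dominant topics have dissimilar ones, which is the kind of relation among samples the abstract describes as insufficiently learned by earlier methods.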


