Semi-Supervised Clustering with Contrastive Learning for Discovering New Intents

01/07/2022
by Feng Wei, et al.

Most real-world dialogue systems rely on predefined intents and answers for QA services, so discovering potential intents from a large corpus in advance is important for building such services. Because in most scenarios only a few intents are already known and most remain to be discovered, we focus on semi-supervised text clustering and aim to make the proposed method benefit from labeled samples for better overall clustering performance. In this paper, we propose Deep Contrastive Semi-supervised Clustering (DCSC), which clusters text samples in a semi-supervised way and provides grouped intents to operations staff. To make full use of the limited known intents, we propose a two-stage training procedure in which DCSC is trained on both labeled and unlabeled samples, yielding better text representations and clustering performance. We conduct experiments on two public datasets to compare our model with several popular methods, and the results show that DCSC achieves the best performance across all datasets and settings, indicating the effectiveness of the improvements in our work.
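The abstract does not spell out the contrastive objective used by DCSC, so the following is only a generic illustration of what a contrastive loss over text embeddings can look like, not the authors' formulation. It is a minimal NumPy sketch of the widely used NT-Xent loss, assuming each sample has two embedded views; the function name `nt_xent_loss`, the `temperature` value, and the random embeddings are all hypothetical choices made for the example.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Generic NT-Xent contrastive loss (illustrative; not DCSC's exact loss).

    z_a, z_b: arrays of shape (n, d) holding two views of the same n samples.
    Row i of z_a and row i of z_b form a positive pair; all other rows in the
    combined batch act as negatives.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    # L2-normalize so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # A sample must not be contrasted against itself.
    np.fill_diagonal(sim, -np.inf)
    # Index of the positive partner for each of the 2n rows.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return float(-log_prob[np.arange(2 * n), pos_idx].mean())


# Hypothetical embeddings standing in for encoded utterances.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
aligned_view = base + 0.01 * rng.normal(size=(8, 16))  # near-identical view
unrelated_view = rng.normal(size=(8, 16))              # unrelated "view"

loss_aligned = nt_xent_loss(base, aligned_view)
loss_unrelated = nt_xent_loss(base, unrelated_view)
```

In this sketch the loss is lower when the two views of each sample agree, which is the property a contrastive objective relies on to pull related texts together before clustering.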

