CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering

04/20/2023
by   Mingjun Zhao, et al.

Text clustering, one of the most fundamental challenges in unsupervised learning, aims to group semantically similar text segments without relying on human annotations. With the rapid development of deep learning, deep clustering has achieved significant advantages over traditional clustering methods. Despite their effectiveness, most existing deep text clustering methods rely heavily on representations pre-trained in general domains, which may not be the most suitable choice for clustering in specific target domains. To address this issue, we propose CEIL, a novel Classification-Enhanced Iterative Learning framework for short text clustering, which aims to improve clustering performance in general by introducing a classification objective that iteratively improves the feature representations. In each iteration, we first use a language model to obtain the initial text representations, from which clustering results are produced by our proposed Category Disentangled Contrastive Clustering (CDCC) algorithm. After strict data filtering and aggregation, samples with clean category labels are retained and serve as supervision to update the language model with the classification objective via a prompt learning approach. Finally, the updated language model, with its improved representation ability, is used to enhance clustering in the next iteration. Extensive experiments demonstrate that the CEIL framework significantly improves clustering performance over iterations and is generally effective across various clustering algorithms. Moreover, by applying CEIL to CDCC, we achieve state-of-the-art clustering performance on a wide range of short text clustering benchmarks, outperforming other strong baseline methods.
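The iterative procedure described in the abstract can be pictured as a loop of encode, cluster, filter, and fine-tune steps. The following is only a minimal sketch of that control flow, not the authors' implementation: `ceil_loop`, `select_confident`, `keep_ratio`, and the `encoder`, `cluster`, and `finetune` callables are all hypothetical names introduced here. In particular, the abstract does not specify how filtering and aggregation work, so the sketch substitutes a simple distance-to-centroid rule, and it leaves CDCC and the prompt-learning update as caller-supplied functions.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Hypothetical type aliases; none of these names come from the paper.
Encoder = Callable[[Sequence[str]], np.ndarray]    # texts -> (n, d) features
Clusterer = Callable[[np.ndarray], np.ndarray]     # features -> (n,) labels
Finetuner = Callable[[Encoder, List[str], np.ndarray], Encoder]


def select_confident(features: np.ndarray, labels: np.ndarray,
                     keep_ratio: float = 0.5) -> np.ndarray:
    """Return sorted indices of the samples closest to their cluster centroid.

    Assumed stand-in for the filtering/aggregation step: within each
    cluster, keep the `keep_ratio` fraction nearest to the centroid.
    """
    keep: List[int] = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        n_keep = max(1, int(len(idx) * keep_ratio))
        keep.extend(idx[np.argsort(dist)[:n_keep]].tolist())
    return np.array(sorted(keep), dtype=int)


def ceil_loop(texts: Sequence[str], encoder: Encoder, cluster: Clusterer,
              finetune: Finetuner, n_iters: int = 3,
              keep_ratio: float = 0.5) -> Tuple[Encoder, np.ndarray]:
    """Alternate clustering and classification-style fine-tuning."""
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    labels = np.empty(0, dtype=int)
    for it in range(n_iters):
        features = encoder(texts)      # representations from the current model
        labels = cluster(features)     # e.g. CDCC, or any clustering algorithm
        if it == n_iters - 1:
            break                      # last round: keep the final clustering
        clean = select_confident(features, labels, keep_ratio)
        # Pseudo-labelled "clean" samples supervise the encoder update.
        encoder = finetune(encoder, [texts[i] for i in clean], labels[clean])
    return encoder, labels
```

Passing the clustering step in as a callable mirrors the abstract's claim that the framework applies to various clustering algorithms; any function mapping a feature matrix to integer labels can be plugged in.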

research
01/31/2021

Short Text Clustering with Transformers

Recent techniques for the task of short text clustering often rely on wo...
research
01/01/2017

Self-Taught Convolutional Neural Networks for Short Text Clustering

Short text clustering is a challenging problem due to its sparseness of ...
research
01/03/2023

ClusTop: An unsupervised and integrated text clustering and topic extraction framework

Text clustering and topic extraction are two important tasks in text min...
research
09/21/2021

Representation Learning for Short Text Clustering

Effective representation learning is critical for short text clustering ...
research
03/22/2019

An end-to-end Neural Network Framework for Text Clustering

The unsupervised text clustering is one of the major tasks in natural la...
research
06/18/2020

Online Deep Clustering for Unsupervised Representation Learning

Joint clustering and feature learning methods have shown remarkable perf...
research
03/30/2023

Iterative Prompt Learning for Unsupervised Backlit Image Enhancement

We propose a novel unsupervised backlit image enhancement method, abbrev...
