SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training

10/10/2022
by   Dan Qiao, et al.
0

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires a few labeled data for downstream tasks. However, in real-world applications, label noise inevitably exists in training data, damaging the effectiveness, robustness, and generalization of the models constructed on such data. Recently, remarkable achievements have been made to mitigate this dilemma in visual data, while only a few explore textual data. To fill this gap, we present SelfMix, a simple yet effective method, to handle label noise in text classification tasks. SelfMix uses the Gaussian Mixture Model to separate samples and leverages semi-supervised learning. Unlike previous works requiring multiple models, our method utilizes the dropout mechanism on a single model to reduce the confirmation bias in self-training and introduces a textual-level mixup training strategy. Experimental results on three text classification benchmarks with different types of text show that the performance of our proposed method outperforms these strong baselines designed for both textual and visual data under different noise ratios and noise types. Our code is available at https://github.com/noise-learning/SelfMix.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/19/2023

Zero-Shot Text Classification via Self-Supervised Tuning

Existing solutions to zero-shot text classification either conduct promp...
research
04/19/2021

Contrastive Learning Improves Model Robustness Under Label Noise

Deep neural network-based classifiers trained with the categorical cross...
research
05/26/2023

Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification

Due to the complex label hierarchy and intensive labeling cost in practi...
research
10/31/2022

Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change

Recent research has revealed that neural language models at scale suffer...
research
10/11/2022

Entity Disambiguation with Entity Definitions

Local models have recently attained astounding performances in Entity Di...
research
12/20/2020

Transductive Visual Verb Sense Disambiguation

Verb Sense Disambiguation is a well-known task in NLP, the aim is to fin...
research
04/28/2021

Boosting Co-teaching with Compression Regularization for Label Noise

In this paper, we study the problem of learning image classification mod...

Please sign up or login with your details

Forgot password? Click here to reset