Towards Robustness to Label Noise in Text Classification via Noise Modeling

by Siddhant Garg, et al.

Large NLP datasets suffer from noisy labels due to erroneous automatic and human annotation procedures. We study text classification under label noise and aim to capture this noise through an auxiliary noise model placed over the classifier. We first assign each training sample a probability of having a noisy label, using a beta mixture model fitted to the per-sample losses at an early epoch of training. We then use this score to selectively guide the learning of the noise model and the classifier. Our empirical evaluation on two text classification tasks shows that the approach improves accuracy over the baseline and prevents over-fitting to the noise.
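The noise-scoring step above can be sketched in a few lines: fit a two-component beta mixture to the (normalized) per-sample losses with EM, then read off each sample's posterior probability of belonging to the high-loss ("noisy") component. This is a minimal illustration of the general technique, not the authors' code; the initialization, iteration count, and method-of-moments updates are assumptions.

```python
import numpy as np
from scipy.stats import beta

def noisy_label_scores(losses, n_iter=50, eps=1e-6):
    """Posterior probability that each sample's label is noisy,
    from a 2-component beta mixture fitted to training losses."""
    losses = np.asarray(losses, dtype=float)
    # Normalize losses into the open interval (0, 1) for the Beta pdf.
    x = (losses - losses.min()) / (losses.max() - losses.min())
    x = np.clip(x, eps, 1 - eps)

    # Init: component 0 = low-loss (clean), component 1 = high-loss (noisy).
    params = np.array([[2.0, 5.0], [5.0, 2.0]])  # (a, b) per component
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        lik = np.stack([w * beta.pdf(x, a, b)
                        for (a, b), w in zip(params, weights)])
        resp = lik / (lik.sum(axis=0, keepdims=True) + 1e-12)

        # M-step: weighted method-of-moments update of each Beta's (a, b).
        for k in range(2):
            r = resp[k]
            m = (r * x).sum() / r.sum()                  # weighted mean
            v = (r * (x - m) ** 2).sum() / r.sum()       # weighted variance
            common = m * (1 - m) / max(v, eps) - 1
            params[k] = [max(m * common, eps), max((1 - m) * common, eps)]
        weights = resp.mean(axis=1)

    # Identify the noisy component as the one with the higher mean loss.
    means = params[:, 0] / params.sum(axis=1)
    noisy = int(np.argmax(means))
    lik = np.stack([w * beta.pdf(x, a, b)
                    for (a, b), w in zip(params, weights)])
    return lik[noisy] / (lik.sum(axis=0) + 1e-12)
```

In a training loop, these scores would then gate each sample's contribution: low-score samples train the classifier directly, while high-score samples are routed through the noise model.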




Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification

Incorrect labels in training data occur when human annotators make mista...

An Effective Label Noise Model for DNN Text Classification

Because large, human-annotated datasets suffer from labeling errors, it ...

Building a Robust Text Classifier on a Test-Time Budget

We propose a generic and interpretable learning framework for building r...

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-...

Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours

Text classification can be useful in many real-world scenarios, saving a...

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Industry datasets used for text classification are rarely created for th...

Unsupervised label noise modeling and loss correction

Despite being robust to small amounts of label noise, convolutional neur...