Towards Robustness to Label Noise in Text Classification via Noise Modeling

01/27/2021
by   Siddhant Garg, et al.

Large NLP datasets often contain noisy labels arising from erroneous automatic or human annotation. We study text classification under label noise and aim to capture this noise with an auxiliary noise model over the classifier. We first assign each training sample a probability of having a noisy label, using a beta mixture model fitted to the per-sample losses at an early epoch of training. We then use this score to selectively guide the learning of the noise model and the classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy and prevent over-fitting to the noise.
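The noisy-sample scoring step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released implementation: it assumes per-sample losses have already been collected at an early epoch, normalizes them to (0, 1), fits a two-component beta mixture by EM with weighted method-of-moments updates, and returns the posterior probability that each sample belongs to the high-loss (noisy) component. The function name and initialization values are illustrative choices.

```python
import numpy as np
from scipy.stats import beta


def noisy_label_scores(losses, n_iters=20, eps=1e-4):
    """Posterior probability that each sample's label is noisy,
    from a two-component beta mixture fitted to normalized losses."""
    losses = np.asarray(losses, dtype=float)
    # Normalize losses into (0, 1) so a beta distribution applies.
    x = (losses - losses.min()) / (losses.max() - losses.min())
    x = np.clip(x, eps, 1 - eps)

    # Init: component 0 = clean (low loss), component 1 = noisy (high loss).
    params = np.array([[2.0, 5.0], [5.0, 2.0]])  # (alpha, beta) per component
    weights = np.array([0.5, 0.5])

    for _ in range(n_iters):
        # E-step: responsibility of each component for each sample.
        pdf = np.stack([beta.pdf(x, a, b) for a, b in params])  # shape (2, N)
        resp = weights[:, None] * pdf
        resp /= resp.sum(axis=0, keepdims=True)

        # M-step: weighted method-of-moments update per beta component.
        for k in range(2):
            m = np.average(x, weights=resp[k])            # weighted mean
            v = np.average((x - m) ** 2, weights=resp[k])  # weighted variance
            common = m * (1 - m) / v - 1
            params[k] = [max(m * common, eps), max((1 - m) * common, eps)]
        weights = resp.mean(axis=1)

    # Posterior of the high-loss component = probability the label is noisy.
    pdf = np.stack([beta.pdf(x, a, b) for a, b in params])
    resp = weights[:, None] * pdf
    return resp[1] / resp.sum(axis=0)
```

The returned scores can then gate training, e.g. routing likely-noisy samples through the noise model while likely-clean samples update the classifier directly.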


04/20/2022
Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification
Incorrect labels in training data occur when human annotators make mista...

03/18/2019
An Effective Label Noise Model for DNN Text Classification
Because large, human-annotated datasets suffer from labeling errors, it ...

08/24/2018
Building a Robust Text Classifier on a Test-Time Budget
We propose a generic and interpretable learning framework for building r...

06/03/2022
Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages
For high-resource languages like English, text classification is a well-...

08/02/2022
Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours
Text classification can be useful in many real-world scenarios, saving a...

09/11/2018
Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
Industry datasets used for text classification are rarely created for th...

04/25/2019
Unsupervised Label Noise Modeling and Loss Correction
Despite being robust to small amounts of label noise, convolutional neur...