An Effective Label Noise Model for DNN Text Classification

03/18/2019
by Ishan Jindal, et al.

Because large, human-annotated datasets suffer from labeling errors, it is crucial to be able to train deep neural networks in the presence of label noise. While training image classification models with label noise has received much attention, training text classification models has not. In this paper, we propose an approach to training deep networks that is robust to label noise. This approach introduces into a convolutional neural network (CNN) architecture a non-linear processing layer (the noise model) that models the statistics of the label noise. The noise model and the CNN weights are learned jointly from noisy training data, which prevents the model from overfitting to erroneous labels. Through extensive experiments on several text classification datasets, we show that this approach enables the CNN to learn better sentence representations and is robust even to extreme label noise. We find that proper initialization and regularization of this noise model are critical. Further, in contrast to results that focus on large batch sizes for mitigating label noise in image classification, we find that altering the batch size has little effect on classification performance.
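To make the idea of a jointly learned noise model concrete, the following is a minimal NumPy sketch of a noise layer placed on top of a classifier's outputs. It is a simplified stand-in written under stated assumptions, not the layer used in the paper: the noise model here is a row-wise softmax over a square weight matrix, the identity-scaled initialization and the L2 penalty are illustrative choices, and the names `init_noise_weights`, `noisy_label_probs`, `noisy_loss`, `scale`, and `weight_decay` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_noise_weights(num_classes, scale=4.0):
    # Assumed initialization: a scaled identity, so the softmax-normalized
    # transition matrix starts close to "labels are mostly correct".
    return scale * np.eye(num_classes)

def noisy_label_probs(clean_logits, noise_weights):
    # Base network output: distribution over the (unobserved) clean labels.
    clean_probs = softmax(clean_logits, axis=1)      # shape (batch, K)
    # Noise model: row i is an estimate of P(noisy label | clean label = i).
    transition = softmax(noise_weights, axis=1)      # shape (K, K)
    # Predicted distribution over the observed, possibly noisy labels.
    return clean_probs @ transition

def noisy_loss(clean_logits, noise_weights, noisy_labels, weight_decay=0.01):
    # Cross-entropy against the noisy labels, taken after the noise layer,
    # plus an assumed L2 penalty on the noise-model weights.
    probs = noisy_label_probs(clean_logits, noise_weights)
    picked = probs[np.arange(len(noisy_labels)), noisy_labels]
    nll = -np.log(picked + 1e-12).mean()
    return nll + weight_decay * np.sum(noise_weights ** 2)

# Toy usage: stand-in logits for two sentences and three classes.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
weights = init_noise_weights(3)
loss = noisy_loss(logits, weights, np.array([0, 2]))
```

In a setup like this, the loss is computed after the noise layer during training, so the gradient updates both the classifier and the noise weights. At test time the noise layer would be dropped and predictions read from the base network's softmax output.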

