INN: A Method Identifying Clean-annotated Samples via Consistency Effect in Deep Neural Networks

by Dongha Kim et al.

In many classification problems, collecting massive clean-annotated data is not easy, and thus much research has been done on handling data with noisy labels. Most recent state-of-the-art solutions for noisy label problems are built on the small-loss strategy, which exploits the memorization effect. While it is a powerful tool, the memorization effect has several drawbacks. The performance is sensitive to the choice of the training epoch at which the memorization effect is exploited. In addition, when the labels are heavily contaminated or imbalanced, the memorization effect may not occur, in which case methods based on the small-loss strategy fail to identify clean labeled data. We introduce a new method called INN (Integration with the Nearest Neighborhoods) to refine clean labeled data from training data with noisy labels. The proposed method is based on a new discovery that the prediction pattern at neighbor regions of clean labeled data is consistently different from that of noisy labeled data, regardless of the training epoch. The INN method requires more computation but is much more stable and powerful than the small-loss strategy. Through various experiments, we demonstrate that the INN method successfully resolves the shortcomings of the memorization effect and thus is helpful for constructing more accurate deep prediction models from training data with noisy labels.
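To make the idea concrete, the following is a minimal, hypothetical sketch of an INN-style score: for each sample, average a trained model's predicted probability of that sample's given label over interpolation points between the sample and its nearest neighbors, so that clean labels (which agree with the model's behavior in the surrounding region) score consistently higher than noisy ones. The toy data, the stand-in `predict_proba` model, and the function names are illustrative assumptions, not the authors' exact algorithm.

```python
# Hypothetical INN-style scoring sketch (NOT the paper's exact method):
# integrate a model's predicted probability of a sample's assigned label
# along segments toward its nearest neighbours.
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clusters in 2D: class 0 around (-2, 0), class 1 around (+2, 0).
n = 100
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (n, 2)),
               rng.normal([+2.0, 0.0], 0.5, (n, 2))])
y_true = np.array([0] * n + [1] * n)

# Inject 20% symmetric label noise.
y = y_true.copy()
noisy_idx = rng.choice(2 * n, size=int(0.2 * 2 * n), replace=False)
y[noisy_idx] = 1 - y[noisy_idx]
is_clean = np.ones(2 * n, dtype=bool)
is_clean[noisy_idx] = False

def predict_proba(points):
    """Stand-in for a trained model: P(class 1 | x) as a logistic in x-axis."""
    return 1.0 / (1.0 + np.exp(-2.0 * points[:, 0]))

def inn_score(i, k=5, n_steps=10):
    """Average predicted probability of sample i's given label, evaluated
    at interpolation points between x_i and each of its k nearest neighbours."""
    d = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]                 # skip the sample itself
    ts = np.linspace(0.0, 1.0, n_steps)[:, None]  # interpolation weights
    probs = []
    for j in nbrs:
        pts = (1.0 - ts) * X[i] + ts * X[j]       # segment from x_i to x_j
        p1 = predict_proba(pts)
        probs.append(p1 if y[i] == 1 else 1.0 - p1)
    return float(np.mean(probs))

scores = np.array([inn_score(i) for i in range(2 * n)])
# Clean samples should score consistently higher than noisy ones.
print(scores[is_clean].mean() > scores[noisy_idx].mean())
```

In this sketch the neighborhood integral is consistently high for clean labels and low for flipped ones at any reasonable model state, which is the intuition behind why such a score is less sensitive to the training epoch than a raw small-loss criterion.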




