Leveraging Unlabeled Data to Track Memorization

12/08/2022
by   Mahsa Forouzesh, et al.
0

Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it does not require access to ground-truth labels and it only uses unlabeled data. We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can utilize susceptibility and the overall training accuracy to distinguish models that maintain a low memorization on the training set and generalize well to unseen clean data.

READ FULL TEXT

page 2

page 32

page 35

research
08/23/2022

A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

Label noise is common in large real-world datasets, and its presence har...
research
08/14/2020

Which Strategies Matter for Noisy Label Classification? Insight into Loss and Uncertainty

Label noise is a critical factor that degrades the generalization perfor...
research
05/01/2021

RATT: Leveraging Unlabeled Data to Guarantee Generalization

To assess generalization, machine learning scientists typically either (...
research
07/06/2020

Are Labels Necessary for Classifier Accuracy Evaluation?

To calculate the model accuracy on a computer vision task, e.g., object ...
research
03/15/2022

Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels

Noisy training set usually leads to the degradation of generalization an...
research
08/03/2021

Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering

Noisy labels are commonly found in real-world data, which cause performa...
research
03/30/2021

Noise-resistant Deep Metric Learning with Ranking-based Instance Selection

The existence of noisy labels in real-world data negatively impacts the ...

Please sign up or login with your details

Forgot password? Click here to reset