Identifying Mislabeled Data using the Area Under the Margin Ranking

01/28/2020
by   Geoff Pleiss, et al.

Not all data in a typical training set help with generalization; some samples can be overly ambiguous or outright mislabeled. This paper introduces a new method to identify such samples and mitigate their impact when training neural networks. At the heart of our algorithm is the Area Under the Margin (AUM) statistic, which exploits differences in the training dynamics of clean and mislabeled samples. A simple procedure - adding an extra class populated with purposefully mislabeled indicator samples - learns a threshold that isolates mislabeled data based on this metric. This approach consistently improves upon prior work on synthetic and real-world datasets. On the WebVision50 classification task our method removes 17% of training data, yielding a 1.6% (absolute) improvement in test error. On CIFAR100, removing 13% of the data leads to a 1.2% drop in error.
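As a rough illustration of the statistic the abstract describes: the margin of a sample at a given epoch is its assigned-class logit minus the largest logit among the other classes, and the AUM averages that margin over training. The sketch below is a minimal, illustrative implementation under those assumptions (function names and logit values are hypothetical, not from the paper's code):

```python
import numpy as np

def margin(logits, label):
    """Margin at one epoch: assigned-class logit minus the largest
    other-class logit (negative when another class dominates)."""
    others = np.delete(logits, label)
    return logits[label] - others.max()

def aum(logits_per_epoch, label):
    """Area Under the Margin: the margin averaged over epochs."""
    return float(np.mean([margin(z, label) for z in logits_per_epoch]))

# Hypothetical logits for one 3-class sample across two epochs.
clean_history = [np.array([2.0, 0.5, -1.0]),
                 np.array([3.0, 0.2, -1.5])]
print(aum(clean_history, 0))  # consistently positive margins -> high AUM
```

Mislabeled samples tend to accumulate low or negative margins over training, so the indicator-sample threshold the paper learns separates them from clean data by this score.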


