Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications

10/29/2019
by Nicholas Carlini, et al.

We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution. We evaluate five methods to score examples in a dataset by how well-represented the examples are, for different plausible definitions of "well-represented", and apply these to four common datasets: MNIST, Fashion-MNIST, CIFAR-10, and ImageNet. Despite being independent approaches, we find all five are highly correlated, suggesting that the notion of being well-represented can be quantified. Among other uses, we find these methods can be combined to identify (a) prototypical examples (that match human expectations); (b) memorized training examples; and (c) uncommon submodes of the dataset. Further, we show how our metrics can be used to determine an improved ordering for curriculum learning, and how they impact adversarial robustness. We release all metric values on the training and test sets we studied.
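
The abstract describes the approach only at a high level. As a rough illustration, the sketch below uses synthetic scores and hypothetical metric names (not the paper's actual scoring methods) to show how several independent per-example "well-representedness" scores might be compared by rank correlation and combined into a single ordering, which could then seed a curriculum that presents well-represented examples first.

```python
# Illustrative sketch only: the metrics here are synthetic stand-ins, not the
# paper's implementations. It demonstrates (1) checking pairwise rank
# correlation between independent per-example scores and (2) combining them
# into one ordering for curriculum learning.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_examples = 1000

# Pretend we computed five independent per-example scores (higher = more
# prototypical / better represented). They are correlated by construction
# purely for demonstration.
latent_typicality = rng.normal(size=n_examples)
scores = {
    f"metric_{i}": latent_typicality + 0.5 * rng.normal(size=n_examples)
    for i in range(5)
}

# Pairwise Spearman rank correlation between the metrics.
names = list(scores)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        rho, _ = spearmanr(scores[names[i]], scores[names[j]])
        print(f"{names[i]} vs {names[j]}: rho = {rho:.2f}")

# Combine metrics by averaging per-example ranks, then sort examples from
# most to least well-represented to obtain a candidate curriculum ordering.
ranks = np.mean([np.argsort(np.argsort(s)) for s in scores.values()], axis=0)
curriculum_order = np.argsort(-ranks)  # best-represented examples first
print("first 10 examples in the curriculum:", curriculum_order[:10])
```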


Related research

06/19/2019 · Training on test data: Removing near duplicates in Fashion-MNIST
MNIST and Fashion MNIST are extremely popular for testing in the machine...

08/11/2020 · Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
In a data poisoning attack, an attacker modifies, deletes, and/or insert...

05/24/2016 · Measuring Neural Net Robustness with Constraints
Despite having high accuracy, neural nets have been shown to be suscepti...

07/07/2022 · A Study on the Predictability of Sample Learning Consistency
Curriculum Learning is a powerful training method that allows for faster...

05/13/2018 · Curriculum Adversarial Training
Recently, deep learning has been applied to many security-sensitive appl...

03/04/2020 · The Impact of Hole Geometry on Relative Robustness of In-Painting Networks: An Empirical Study
In-painting networks use existing pixels to generate appropriate pixels...

09/29/2021 · BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining
Neural network robustness has become a central topic in machine learning...
