Characterizing Datapoints via Second-Split Forgetting

10/26/2022
by Pratyush Maini et al.

Researchers investigating example hardness have increasingly focused on the dynamics by which neural networks learn and forget examples over the course of training. Popular metrics derived from these dynamics include (i) the epoch at which an example is first classified correctly; (ii) the number of times its prediction flips during training; and (iii) whether its prediction flips when it is held out. However, these metrics do not distinguish among examples that are hard for distinct reasons, such as belonging to a rare subpopulation, being mislabeled, or belonging to a complex subpopulation. In this paper, we propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten as the network is fine-tuned on a randomly held-out partition of the data. Across multiple benchmark datasets and modalities, we demonstrate that mislabeled examples are forgotten quickly while seemingly rare examples are forgotten comparatively slowly; by contrast, metrics that consider only the first-split learning dynamics struggle to differentiate the two. At large learning rates, SSFT tends to be robust across architectures, optimizers, and random seeds. From a practical standpoint, SSFT can (i) help identify mislabeled samples, whose removal improves generalization, and (ii) provide insight into failure modes. Through a theoretical analysis of overparameterized linear models, we offer insight into how the observed phenomena may arise. Code for reproducing our experiments is available at: https://github.com/pratyushmaini/ssft
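To make the metric concrete, the sketch below fine-tunes a model that already fits the first split on the second split, and records for each first-split example the first epoch at which its prediction flips from correct to incorrect. This is a minimal PyTorch-style sketch of the idea rather than the authors' released implementation (see the repository above); the helper `correct_mask` and the assumption that loaders yield `(inputs, labels, indices)` are ours, introduced so per-example predictions can be tracked.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def correct_mask(model, loader, n, device):
    """Boolean tensor of length n: True where the model currently classifies
    the example correctly. Assumes the loader yields (inputs, labels, indices)."""
    model.eval()
    mask = torch.zeros(n, dtype=torch.bool)
    for x, y, idx in loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        mask[idx] = preds == y
    return mask

def second_split_forgetting_time(model, optimizer, first_loader, second_loader,
                                 epochs=50, device="cpu"):
    """Fine-tune a model that already fits the first split on the second split,
    recording for each first-split example the first epoch at which it is
    forgotten (misclassified after previously being classified correctly).
    Examples that are never forgotten keep SSFT = inf."""
    n = len(first_loader.dataset)
    ssft = torch.full((n,), float("inf"))
    # After training to convergence on the first split, this should be ~all True.
    prev = correct_mask(model, first_loader, n, device)
    for epoch in range(1, epochs + 1):
        model.train()
        for x, y, _ in second_loader:  # one fine-tuning epoch on the second split
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
        cur = correct_mask(model, first_loader, n, device)
        # Record the epoch only for examples forgotten for the first time.
        newly_forgotten = prev & ~cur & torch.isinf(ssft)
        ssft[newly_forgotten] = epoch
        prev = cur
    return ssft
```

Per the paper's findings, examples with very small SSFT are candidates for removal as likely mislabeled, whereas examples that are forgotten slowly (or never) are more plausibly rare but correctly labeled.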


Related research

12/12/2018 · An Empirical Study of Example Forgetting during Deep Neural Network Learning
02/12/2023 · Theory on Forgetting and Generalization of Continual Learning
02/10/2022 · Understanding Rare Spurious Correlations in Neural Networks
09/08/2021 · EMA: Auditing Data Removal from Trained Models
05/26/2019 · All Neural Networks are Created Equal
09/04/2022 · Beyond Random Split for Assessing Statistical Model Performance
12/15/2021 · Lifelong Generative Modelling Using Dynamic Expansion Graph Model
