Does 'Deep Learning on a Data Diet' reproduce? Overall yes, but GraNd at Initialization does not

03/26/2023
by Andreas Kirsch, et al.

The paper 'Deep Learning on a Data Diet' by Paul et al. (2021) introduces two innovative metrics for pruning datasets during the training of neural networks. While we are able to replicate the results for the EL2N score at epoch 20, the same cannot be said for the GraNd score at initialization. The GraNd scores later in training do provide useful pruning signals, however. The GraNd score at initialization calculates the average gradient norm of an input sample across multiple randomly initialized models, before any training has taken place. Our analysis reveals a strong correlation between the GraNd score at initialization and the input norm of a sample, suggesting that the latter could have served as a cheap new baseline for data pruning. Unfortunately, neither the GraNd score at initialization nor the input norm surpasses random pruning in performance, which contradicts one of the findings in Paul et al. (2021). We were unable to reproduce their CIFAR-10 results using either an updated version of the original JAX repository or a newly implemented PyTorch codebase. An investigation of the underlying JAX/FLAX code from 2021 surfaced a bug in the checkpoint restoring code that was fixed in April 2021 (https://github.com/google/flax/commit/28fbd95500f4bf2f9924d2560062fa50e919b1a5).
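For readers who want to probe the claim themselves, below is a minimal PyTorch sketch of the two quantities discussed above: a brute-force estimate of GraNd at initialization (the per-sample gradient norm averaged over several randomly initialized models) and the input-norm baseline it correlates with. The names `make_model`, `inputs`, and `targets` are placeholders assumed for illustration and are not part of the original codebase.

```python
import torch
import torch.nn.functional as F


def grand_scores_at_init(make_model, inputs, targets, num_inits=10, device="cpu"):
    """Approximate GraNd-at-initialization scores.

    For each sample, accumulate the norm of the per-sample loss gradient
    w.r.t. the model parameters, averaged over `num_inits` random inits.
    """
    scores = torch.zeros(len(inputs))
    for seed in range(num_inits):
        torch.manual_seed(seed)           # fresh random initialization
        model = make_model().to(device)
        model.eval()
        for i, (x, y) in enumerate(zip(inputs, targets)):
            model.zero_grad()
            logits = model(x.unsqueeze(0).to(device))
            loss = F.cross_entropy(logits, y.unsqueeze(0).to(device))
            loss.backward()
            # Norm of the concatenated parameter gradients for this one sample.
            grad_norm = torch.sqrt(
                sum((p.grad.detach() ** 2).sum()
                    for p in model.parameters() if p.grad is not None)
            )
            scores[i] += grad_norm.cpu() / num_inits
    return scores


def input_norms(inputs):
    """The cheap baseline suggested by the correlation: the L2 norm of each input."""
    return torch.stack([x.flatten().norm() for x in inputs])
```

To check the correlation observation, one would rank the training set by each score, compare the rankings (e.g. with a Spearman rank correlation), and compare the resulting pruning curves against random pruning. Note that this per-sample loop is only a sketch; an efficient implementation would batch the per-sample gradient computation.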
