Beyond neural scaling laws: beating power law scaling via data pruning

06/29/2022
by Ben Sorscher, et al.

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone come at considerable cost in compute and energy. Here we focus on the scaling of error with dataset size and show how, both in theory and in practice, we can break beyond power law scaling and reduce it to exponential scaling instead, provided we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this exponential scaling prediction empirically as a function of pruned dataset size, and indeed observe better-than-power-law scaling for ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find that most existing high-performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore develop a new simple, cheap, and scalable self-supervised pruning metric that performs comparably to the best supervised metrics. Overall, our work suggests that the discovery of good data pruning metrics may provide a viable path toward substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
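The abstract names the key ingredient, a pruning metric that ranks examples by how safely they can be discarded, without spelling out its form; the claim is that a good enough ranking turns the usual power-law decay of test error with dataset size, roughly ε(P) ∝ P^(-ν), into something closer to exponential decay. The sketch below shows one plausible self-supervised instantiation in Python: cluster self-supervised embeddings with k-means and score each example by its distance to the nearest centroid, treating far-from-prototype examples as hard. The function name, the precomputed embeddings input, and the keep_fraction, n_clusters, and keep_hard parameters are all illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans

def prune_by_prototype_distance(embeddings, keep_fraction,
                                n_clusters=100, keep_hard=True):
    """Rank examples by distance to their k-means centroid and prune.

    embeddings    -- (N, D) array of self-supervised features (assumed given)
    keep_fraction -- fraction of the dataset to retain after pruning
    keep_hard     -- True keeps far-from-prototype (hard) examples;
                     False keeps near-prototype (easy) ones
    Returns the indices of the examples to keep.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)
    # Difficulty score: distance from each example to its assigned centroid.
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
    order = np.argsort(dists)  # ascending: most prototypical (easiest) first
    n_keep = int(round(keep_fraction * len(embeddings)))
    return order[-n_keep:] if keep_hard else order[:n_keep]

Sweeping keep_fraction over a range of values and plotting test error against retained dataset size is the natural way to probe for the better-than-power-law scaling the abstract reports; whether keeping hard or easy examples works better plausibly depends on how much data one starts with, which is why the sketch exposes it as a flag.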


Related research

08/17/2022 · Understanding Scaling Laws for Recommendation Models
Scale has been a major driving force in improving machine learning perfo...

02/14/2023 · Data pruning and neural scaling laws: fundamental limitations of score-based algorithms
Data pruning algorithms are commonly used to reduce the memory and compu...

08/17/2021 · Scaling Laws for Deep Learning
Running faster will only get you so far – it is generally advisable to f...

09/27/2022 · Scaling Laws For Deep Learning Based Image Reconstruction
Deep neural networks trained end-to-end to map a measurement of a (noisy...

11/15/2022 · Power-law Scaling to Assist with Key Challenges in Artificial Intelligence
Power-law scaling, a central concept in critical phenomena, is found to ...

06/18/2020 · On the Predictability of Pruning Across Scales
We show that the error of magnitude-pruned networks follows a scaling la...

02/14/2023 · Cliff-Learning
We study the data-scaling of transfer learning from foundation models in...
