Broken Neural Scaling Laws

by Ethan Caballero, et al.

We present a smoothly broken power law functional form that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies as the amount of compute used for training, the number of model parameters, the training dataset size, or upstream performance varies) for each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision and unsupervised language tasks, diffusion generative modeling of images, arithmetic, and reinforcement learning. Compared to other functional forms for neural scaling behavior, this functional form yields considerably more accurate extrapolations on this set (the root mean squared log error of its extrapolations is, on average, 0.86 times that of the previous state of the art). Moreover, it accurately models and extrapolates scaling behavior that other functional forms cannot express, such as the non-monotonic transitions present in phenomena such as double descent and the delayed, sharp inflection points present in tasks such as arithmetic. Code is available at
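The abstract does not spell out the functional form, so the following is only a minimal sketch of one common way to write a smoothly broken power law: a base power law in x multiplied by one smooth "break" factor per transition. The parametrization (offset `a`, scale `b`, initial exponent `c0`, and per-break triples of slope change `c_i`, break location `d_i`, and sharpness `f_i`) and the function names `bnsl` and `rmsle` are assumptions made for illustration, not the authors' released code. The error metric is likewise assumed to be the root mean square of differences between log predictions and log targets.

```python
import numpy as np


def bnsl(x, a, b, c0, breaks=()):
    """Illustrative smoothly broken power law (assumed parametrization).

    y = a + b * x**(-c0) * prod_i (1 + (x / d_i)**(1 / f_i))**(-c_i * f_i)

    `breaks` is a sequence of (c_i, d_i, f_i) triples: the change in
    log-log slope, the location of the break, and its sharpness
    (smaller f_i gives a sharper transition).
    """
    x = np.asarray(x, dtype=float)
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y


def rmsle(y_true, y_pred):
    """Root mean squared log error, assumed here to compare log values directly."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Hypothetical parameters: one break at x = 10 that steepens the slope by 0.3.
x = np.logspace(0, 9, 50)
y = bnsl(x, a=0.0, b=2.0, c0=0.5, breaks=[(0.3, 10.0, 0.1)])
```

With no breaks the function reduces to an ordinary power law plus an offset. Well past a break located at `d_i`, the log-log slope changes from `-c0` to roughly `-(c0 + c_i)`, and a negative `c_i` lets the curve bend the other way, which is how a form of this kind can represent non-monotonic behavior.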


