Explaining Neural Scaling Laws

by Yasaman Bahri et al.

The test loss of well-trained neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents: super-classing image tasks does not change exponents, while changing input distribution (via changing datasets or adding noise) has a strong effect. We further explore the effect of architecture aspect ratio on scaling exponents.
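The power-law relations described in the abstract take the form L(D) ≈ a · D^(−α) in the resolution-limited regime, where D is the dataset size and α the scaling exponent. As a minimal illustration (using synthetic data and a hypothetical exponent, not values from the paper), the exponent can be recovered by linear regression in log-log space:

```python
import numpy as np

# Synthetic test losses following L(D) = a * D**(-alpha), with small
# multiplicative noise. The values of a and alpha here are illustrative.
rng = np.random.default_rng(0)
dataset_sizes = np.logspace(3, 6, 8)      # D: 1e3 .. 1e6 training examples
true_alpha, a = 0.35, 5.0
losses = a * dataset_sizes**(-true_alpha) * np.exp(rng.normal(0, 0.01, 8))

# A power law is linear in log-log coordinates:
#   log L = log a - alpha * log D
# so the slope of a least-squares fit gives -alpha.
slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(losses), 1)
alpha_hat = -slope
```

The same fit applied with model size N on the x-axis would estimate the parameter-scaling exponent; the duality discussed in the abstract relates the two exponents in the large-width limit.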





Asymptotics of Wide Convolutional Neural Networks

Wide neural networks have proven to be a rich class of architectures for...

Dynamically Stable Infinite-Width Limits of Neural Classifiers

Recent research has been focused on two different approaches to studying...

Limitations of the NTK for Understanding Generalization in Deep Learning

The “Neural Tangent Kernel” (NTK) (Jacot et al., 2018), and its empirical ...

A Constructive Prediction of the Generalization Error Across Scales

The dependency of the generalization error of neural networks on model a...

A Neural Scaling Law from the Dimension of the Data Manifold

When data is plentiful, the loss achieved by well-trained neural network...

The Scalability, Efficiency and Complexity of Universities and Colleges: A New Lens for Assessing the Higher Educational System

The growing need for affordable and accessible higher education is a maj...

Scaling in Words on Twitter

Scaling properties of language are a useful tool for understanding gener...