Explaining Neural Scaling Laws

02/12/2021
by Yasaman Bahri, et al.

The test loss of well-trained neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents: super-classing image tasks does not change exponents, while changing input distribution (via changing datasets or adding noise) has a strong effect. We further explore the effect of architecture aspect ratio on scaling exponents.
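For readers who want to try such scaling fits on their own measurements, below is a minimal sketch (not part of the paper's code) of how a resolution-limited exponent could be estimated: in that regime the test loss is assumed to follow a power law L(D) ≈ c · D^(−α_D) in the dataset size D, so a linear fit in log-log space recovers the exponent. The dataset sizes and losses here are hypothetical placeholders.

```python
# Minimal sketch (hypothetical data): estimate a power-law scaling exponent
# alpha_D by fitting L(D) = c * D**(-alpha_D) to (dataset size, test loss)
# measurements with a linear fit in log-log space.
import numpy as np

# Hypothetical measurements: test loss at several training-set sizes.
dataset_sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
test_losses = np.array([0.92, 0.61, 0.40, 0.27, 0.18])

# In log space, log L = log c - alpha_D * log D, so the fitted slope
# gives the (negated) scaling exponent.
slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(test_losses), deg=1)
alpha_D = -slope
print(f"Fitted scaling exponent alpha_D ~= {alpha_D:.2f}")
```

The same log-log fit applies to model-size scaling by substituting parameter count for dataset size.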
