A Solvable Model of Neural Scaling Laws

10/30/2022
by Alexander Maloney et al.

Large language models with a huge number of parameters, when trained on a near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource. To understand this better, we first identify the necessary properties allowing such scaling laws to arise and then propose a statistical model, a joint generative data model and random feature model, that captures this neural scaling phenomenology. By solving this model in the dual limit of large training set size and large number of parameters, we gain insight into (i) the statistical structure of datasets and tasks that lead to scaling laws, (ii) the way nonlinear feature maps, such as those provided by neural networks, enable scaling laws when trained on these datasets, (iii) the optimality of the equiparameterized scaling of training sets and parameters, and (iv) whether such scaling laws can break down and how they behave when they do. Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps and then translated into power-law scalings of the test loss, and how the finite extent of the data's spectral power law causes the model's performance to plateau.
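To make the setup concrete, the following is a minimal sketch (not the authors' code) of the kind of joint model the abstract describes: Gaussian data with an assumed power-law covariance spectrum, a linear target task, and a nonlinear random feature map fit by ridge regression. The spectral exponent alpha, the tanh nonlinearity, the ridge strength, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Generative data model (assumption: Gaussian inputs whose covariance
# has a power-law eigenvalue spectrum, lambda_i ~ i^(-alpha)) ---
M = 2000           # latent/input dimension (illustrative)
alpha = 1.5        # spectral power-law exponent (assumed value)
eigs = np.arange(1, M + 1, dtype=float) ** (-alpha)

def sample_data(T):
    """Draw T inputs x with Cov(x) = diag(eigs)."""
    return rng.standard_normal((T, M)) * np.sqrt(eigs)

w_star = rng.standard_normal(M)            # ground-truth linear task

def targets(X):
    return X @ w_star

# --- Random feature model: phi(x) = tanh(U x) with a fixed random U ---
def random_features(X, U, N):
    return np.tanh(X @ U.T) / np.sqrt(N)

def test_loss(T, N, T_test=2000, ridge=1e-4):
    """Train ridge regression on N random features of T samples; report MSE."""
    U = rng.standard_normal((N, M)) / np.sqrt(M)
    X_tr, X_te = sample_data(T), sample_data(T_test)
    y_tr, y_te = targets(X_tr), targets(X_te)
    F_tr = random_features(X_tr, U, N)
    F_te = random_features(X_te, U, N)
    w = np.linalg.solve(F_tr.T @ F_tr + ridge * np.eye(N), F_tr.T @ y_tr)
    return np.mean((F_te @ w - y_te) ** 2)

# Sweep the dataset size T at fixed feature count N: the test loss should
# fall roughly as a power law in T until the parameter bottleneck is hit.
for T in [50, 100, 200, 400, 800]:
    print(f"T={T:4d}  N=1000  test loss = {test_loss(T, 1000):.4f}")
```

Sweeping N and T together (e.g., N proportional to T) corresponds to the equiparameterized scaling of training sets and parameters whose optimality the paper analyzes.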


