The Quantization Model of Neural Scaling

03/23/2023
by Eric J. Michaud, et al.

We propose the Quantization Model of neural scaling laws, which explains both the observed power-law decrease of loss with model and data size and the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, under which learned network capabilities are quantized into discrete chunks (quanta). We show that when quanta are learned in order of decreasing use frequency, a power law in use frequencies explains the observed power-law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model internals, we auto-discover diverse model capabilities (quanta) and find tentative evidence that the distribution over the corresponding subproblems in the prediction of natural text is compatible with the power law that our theory predicts from the neural scaling exponent.
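
The link between a power law in use frequencies and a power law in loss can be illustrated numerically. The sketch below is not taken from the paper's code; it rests on simplifying assumptions chosen here for illustration: quanta are used with frequency proportional to k^-(alpha + 1), a model has learned exactly the n most frequent quanta, and each unlearned quantum contributes a fixed loss whenever it is needed. The names `expected_loss`, `fitted_exponent`, `n_quanta`, and `loss_unlearned` are hypothetical.

```python
import numpy as np

def expected_loss(n_learned, alpha, n_quanta=1_000_000, loss_unlearned=1.0):
    """Expected loss once the n most frequently used quanta are learned.

    Assumption: quantum k is needed with probability p_k proportional to
    k^-(alpha + 1); learned quanta cost 0, unlearned ones cost loss_unlearned.
    """
    k = np.arange(1, n_quanta + 1, dtype=float)
    p = k ** -(alpha + 1.0)
    p /= p.sum()
    # tail[i] = sum of p over quanta with rank > i (array index i holds rank i + 1)
    tail = p[::-1].cumsum()[::-1]
    return loss_unlearned * tail[np.asarray(n_learned)]

def fitted_exponent(n_learned, losses):
    """Slope of log(loss) against log(n), fitted by least squares."""
    slope, _intercept = np.polyfit(np.log(n_learned), np.log(losses), 1)
    return slope

alpha = 0.5
n_values = np.array([10, 30, 100, 300, 1000])
losses = expected_loss(n_values, alpha)
print("fitted slope:", fitted_exponent(n_values, losses), "target:", -alpha)
```

Under these assumptions the fitted log-log slope should come out close to -alpha, so the loss falls off roughly as n^-alpha. The finite cutoff `n_quanta` and the discreteness of the sum introduce small deviations from the ideal exponent.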


