Learning Curve Theory

02/08/2021
by Marcus Hutter, et al.

Recently, a number of empirical "universal" scaling-law papers have been published, most notably by OpenAI. "Scaling laws" refers to power-law decreases of training or test error w.r.t. more data, larger neural networks, and/or more compute. In this work we focus on scaling w.r.t. the data size n. Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models, for which the error typically decreases as n^{-1/2} or n^{-1}, where n is the sample size. We develop and theoretically analyse the simplest possible (toy) model that can exhibit n^{-β} learning curves for arbitrary power β>0, and determine whether power laws are universal or depend on the data distribution.
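
Since the abstract centres on power-law learning curves of the form err(n) ∝ n^{-β}, a short simulation may make the claim concrete. The sketch below is an illustration under assumptions, not the paper's exact construction: it uses a Zipf-type memorization toy model (instance i is drawn with probability proportional to i^{-(α+1)}, and an instance is predicted correctly iff it has already appeared in the training set) and reads off the resulting exponent β from a log-log fit. The function names, the choice α = 0.5, and the truncation to 100,000 states are all illustrative.

import numpy as np

# Minimal sketch (assumed toy model, not the paper's exact one): a Zipf-type
# "memorization" learner that is correct on a test instance iff that instance
# already appeared in the training set. The expected test error then decays as
# a power law n^{-beta}; we estimate beta numerically.

def zipf_probs(num_states, alpha):
    """Probabilities p_i proportional to i^-(alpha+1) over a finite truncation."""
    p = np.arange(1, num_states + 1, dtype=float) ** (-(alpha + 1.0))
    return p / p.sum()

def expected_error(n, p):
    """Exact expected test error after n i.i.d. training samples:
    probability that a fresh test instance was never seen,
    E[err] = sum_i p_i * (1 - p_i)^n."""
    return float(np.sum(p * (1.0 - p) ** n))

p = zipf_probs(num_states=100_000, alpha=0.5)   # alpha is an assumed tail exponent
ns = np.logspace(1, 5, 20).astype(int)
errs = np.array([expected_error(n, p) for n in ns])

# Fit log(err) ~ -beta * log(n) + c to estimate the learning-curve exponent beta.
slope, _ = np.polyfit(np.log(ns), np.log(errs), 1)
print(f"estimated learning-curve exponent beta ~ {-slope:.2f}")

Varying the assumed tail exponent α changes the fitted β, which is one way to see how, in such toy models, the learning-curve exponent can depend on the data distribution rather than being universal.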

Related research:

- Two Phases of Scaling Laws for Nearest Neighbor Classifiers (08/16/2023): A scaling law refers to the observation that the test performance of a m...
- Turing-Universal Learners with Optimal Scaling Laws (11/09/2021): For a given distribution, learning algorithm, and performance metric, th...
- Cliff-Learning (02/14/2023): We study the data-scaling of transfer learning from foundation models in...
- A Universal Law of Robustness via Isoperimetry (05/26/2021): Classically, data interpolation with a parametrized model class is possi...
- The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets (06/26/2023): We study universal traits which emerge both in real-world complex datase...
- Scaling Laws for Deep Learning (08/17/2021): Running faster will only get you so far – it is generally advisable to f...
- Power laws in code repositories: A skeptical approach (05/27/2019): Software development as done using modern methodologies and source contr...
