Scaling Laws for Deep Learning

08/17/2021
by Jonathan S. Rosenfeld, et al.

Running faster will only get you so far – it is generally advisable to first understand where the roads lead, then get a car...

The renaissance of machine learning (ML) and deep learning (DL) over the last decade has been accompanied by unscalable computational costs, limiting the field's advancement and burdening it in practice. In this thesis we take a systematic approach to addressing the algorithmic and methodological limitations at the root of these costs. We first demonstrate that DL training and pruning are predictable and governed by scaling laws – for state-of-the-art models and tasks spanning image classification and language modeling, as well as for state-of-the-art model compression via iterative pruning. Predictability, established through these scaling laws, provides a path to the principled design and trade-off reasoning currently largely lacking in the field. We then analyze the sources of the scaling laws, offering an approximation-theoretic view and showing, through the exploration of a noiseless realizable case, that DL error is in fact dominated by sources very far from the lower error limit. We conclude by building on this theoretical understanding of the scaling laws' origins: we present a conjectural path to eliminating one of the currently dominant error sources – via a data bandwidth limiting hypothesis and the introduction of Nyquist learners – which can, in principle, reach the generalization error lower limit (e.g., 0 in the noiseless case) at finite dataset size.
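As an illustration of the functional form such scaling laws typically take, the sketch below fits a saturating power law, err(n) = eps_inf + a * n^(-b), to error measurements at several dataset sizes n and extrapolates it. This is a minimal demonstration of the fitting procedure, not code or measured values from the thesis; the constants and dataset sizes are illustrative assumptions.

```python
# Minimal sketch: fit a saturating power-law scaling curve to error-vs-dataset-size
# data and extrapolate. All numbers below are assumed for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, eps_inf, a, b):
    """Saturating power law: irreducible error plus a power-law decaying term."""
    return eps_inf + a * np.power(n, -b)

# Synthetic "measurements": generalization error at several dataset sizes,
# generated from an assumed ground-truth curve with mild multiplicative noise.
rng = np.random.default_rng(0)
n = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5, 1e6])
true_params = (0.05, 2.0, 0.35)  # assumed (eps_inf, a, b) for the demo
err = scaling_law(n, *true_params) * (1 + 0.02 * rng.standard_normal(n.size))

# Least-squares fit; p0 starts the optimizer in a sensible region.
params, _ = curve_fit(scaling_law, n, err, p0=(0.01, 1.0, 0.5))
eps_inf, a, b = params
print(f"fitted: eps_inf={eps_inf:.3f}, a={a:.2f}, b={b:.2f}")

# Predictability in action: extrapolate the error at a 10x larger dataset.
print(f"predicted error at n=1e7: {scaling_law(1e7, *params):.4f}")
```

Note that eps_inf plays the role of the lower error limit discussed in the abstract: a standard power-law fit saturates at eps_inf > 0 as n grows, whereas the conjectured Nyquist learners would, in principle, reach the lower limit itself at finite dataset size.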


Related research

Beyond neural scaling laws: beating power law scaling via data pruning (06/29/2022)
Widely observed neural scaling laws, in which error falls off as a power...

Learning Curve Theory (02/08/2021)
Recently a number of empirical "universal" scaling law papers have been ...

Deep Learning Scaling is Predictable, Empirically (12/01/2017)
Deep learning (DL) creates impactful advances following a virtuous recip...

Data pruning and neural scaling laws: fundamental limitations of score-based algorithms (02/14/2023)
Data pruning algorithms are commonly used to reduce the memory and compu...

Scaling Laws Do Not Scale (07/05/2023)
Recent work has proposed a power law relationship, referred to as “scali...

Deep Learning in Business Analytics: A Clash of Expectations and Reality (05/19/2022)
Our fast-paced digital economy shaped by global competition requires inc...

BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning (11/19/2017)
Understanding the global optimality in deep learning (DL) has been attra...
