Deep Learning Scaling is Predictable, Empirically

12/01/2017
by Joel Hestness, et al.

Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art. This paper presents a large-scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, yielding power-law exponents (the "steepness" of the learning curve) that theoretical work has yet to explain. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
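The power-law scaling described above means generalization error follows roughly error(m) ≈ α·m^β for training set size m, so the exponent β can be estimated by linear regression in log-log space. A minimal sketch (the data and the exponent value here are hypothetical, not results from the paper):

```python
import numpy as np

def fit_power_law(train_sizes, errors):
    """Fit error(m) ~ alpha * m**beta via linear regression in log-log space.

    beta is the power-law exponent, i.e. the "steepness" of the
    learning curve discussed in the abstract.
    """
    log_m = np.log(train_sizes)
    log_e = np.log(errors)
    beta, log_alpha = np.polyfit(log_m, log_e, 1)  # slope, intercept
    return np.exp(log_alpha), beta

# Synthetic learning curve generated with exponent -0.35 (illustrative only).
m = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
err = 2.0 * m ** -0.35
alpha, beta = fit_power_law(m, err)
print(round(alpha, 2), round(beta, 2))  # → 2.0 -0.35
```

On real measurements the fitted β describes how quickly error falls as data grows, which is what makes accuracy targets and data-set-growth decisions predictable.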


