A Constructive Prediction of the Generalization Error Across Scales

09/27/2019
by Jonathan S. Rosenfeld, et al.

The dependence of the generalization error of neural networks on model and dataset size is of critical importance, both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependence remains elusive. In this work, we present a functional form that approximates the generalization error well in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models that can attain it across model and data scales. Our construction follows from observations conducted over a range of model and data scales, for various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales and provides accurate predictions from small- to large-scale models and data.
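The abstract does not state the functional form itself. As a minimal sketch of the general idea (fit a form on small-scale observations, then extrapolate to a larger model and dataset), the code below assumes, purely for illustration, a hypothetical form in which error decays as a power law in dataset size n and in model size m toward an irreducible floor c. The form, the helper names `power_law_error`, `solve3`, and `fit`, and all parameter values are hypothetical and are not claimed to be the paper's method.

```python
import itertools


# Hypothetical illustrative form (an assumption, not quoted from the abstract):
#   err(m, n) = a * n**(-alpha) + b * m**(-beta) + c
# n = dataset size, m = model size, c = irreducible error floor.
def power_law_error(m, n, a, alpha, b, beta, c):
    return a * n ** (-alpha) + b * m ** (-beta) + c


def solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v] for row, v in zip(A, y)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def fit(observations, exponents):
    """Grid-search (alpha, beta); for each pair, fit (a, b, c) by linear least squares."""
    best = None
    ys = [err for _, _, err in observations]
    for alpha, beta in itertools.product(exponents, repeat=2):
        rows = [[n ** (-alpha), m ** (-beta), 1.0] for m, n, _ in observations]
        # Normal equations: (X^T X) w = X^T y
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        a, b, c = solve3(xtx, xty)
        sse = sum((a * r[0] + b * r[1] + c - y) ** 2 for r, y in zip(rows, ys))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b, beta, c)
    return best[1:]  # (a, alpha, b, beta, c)


# Synthetic, noise-free "observations" at small scales (hypothetical values).
TRUE = dict(a=2.0, alpha=0.5, b=3.0, beta=0.7, c=0.05)
small_scales = [100, 200, 400, 800, 1600]
obs = [(m, n, power_law_error(m, n, **TRUE)) for m in small_scales for n in small_scales]

grid = [i / 20 for i in range(1, 31)]  # candidate exponents 0.05 .. 1.5
params = fit(obs, grid)

# Extrapolate to a model and dataset larger than any used in the fit.
predicted = power_law_error(10_000, 10_000, *params)
actual = power_law_error(10_000, 10_000, **TRUE)
print(f"predicted {predicted:.6f}, actual {actual:.6f}")
```

Grid-searching the exponents keeps each inner fit linear in (a, b, c), so the sketch needs only the standard library. Because the data are synthetic and noise-free, exact recovery here says nothing about how well any form predicts real training runs.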

06/18/2020

On the Predictability of Pruning Across Scales

We show that the error of magnitude-pruned networks follows a scaling la...
02/12/2021

Explaining Neural Scaling Laws

The test loss of well-trained neural networks often follows precise powe...
02/02/2013

A New Constructive Method to Optimize Neural Network Architecture and Generalization

In this paper, after analyzing the reasons of poor generalization and ov...
11/11/2019

Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size

Deep neural networks are highly expressive machine learning models with ...
12/01/2017

Deep Learning Scaling is Predictable, Empirically

Deep learning (DL) creates impactful advances following a virtuous recip...
12/21/2020

Predicting the Critical Number of Layers for Hierarchical Support Vector Regression

Hierarchical support vector regression (HSVR) models a function from dat...
12/27/2021

Shock trace prediction by reduced models for a viscous stochastic Burgers equation

Viscous shocks are a particular type of extreme events in nonlinear mult...