A Constructive Prediction of the Generalization Error Across Scales

by Jonathan S. Rosenfeld et al.

The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form that approximates the generalization error well in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we simultaneously construct such a form and specify the exact models that can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, on various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales and provides accurate predictions from small- to large-scale models and data.
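To make the idea concrete, here is a minimal sketch of fitting a joint data/model scaling form and extrapolating it. The specific form used below, an additive power law in dataset size n and model size m plus an irreducible error floor, is a simplified stand-in for the paper's construction (the paper's full form also handles the transition to the random-guess error region); all coefficients and scale ranges are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def error_form(X, a, alpha, b, beta, c_inf):
    """Simplified scaling form: eps(n, m) = a*n^-alpha + b*m^-beta + c_inf.

    n is the dataset size, m the model size (e.g., parameter count).
    This additive form is an illustrative assumption, not the paper's
    exact envelope construction.
    """
    n, m = X
    return a * n ** (-alpha) + b * m ** (-beta) + c_inf

# Synthetic "observed" errors on a grid of data/model scales,
# generated from hypothetical coefficients with small noise.
rng = np.random.default_rng(0)
n_grid = np.logspace(3, 6, 8)   # dataset sizes 1e3 .. 1e6
m_grid = np.logspace(5, 8, 8)   # model sizes   1e5 .. 1e8
N, M = np.meshgrid(n_grid, m_grid)
true = (2.0, 0.35, 5.0, 0.25, 0.05)
err = error_form((N.ravel(), M.ravel()), *true)
err *= 1 + 0.01 * rng.standard_normal(err.shape)

# Fit the form to the small-scale observations...
popt, _ = curve_fit(error_form, (N.ravel(), M.ravel()), err,
                    p0=(1.0, 0.5, 1.0, 0.5, 0.1), maxfev=20000)

# ...then predict the error at a larger scale, beyond the observed grid.
pred = error_form((1e7, 1e9), *popt)
```

The fit recovers the exponents alpha and beta from the observed grid, and the extrapolated prediction at (n, m) = (1e7, 1e9) illustrates the paper's small-to-large-scale prediction setting.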

