A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors

07/23/2022
by Nikhil Ghosh, et al.

In this work we establish an algorithm- and distribution-independent non-asymptotic trade-off between the model size, excess test loss, and training loss of linear predictors. Specifically, we show that models that perform well on the test data (have low excess loss) are either "classical", with training loss close to the noise level, or "modern", with a number of parameters much larger than the minimum needed to fit the training data exactly. We also provide a more precise asymptotic analysis when the limiting spectral distribution of the whitened features is Marchenko-Pastur. Remarkably, while the Marchenko-Pastur analysis is far more precise near the interpolation peak, where the number of parameters is just enough to fit the training data, in settings of most practical interest it differs from the distribution-independent bound by only a modest multiplicative constant.
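To make the described trade-off concrete, below is a minimal numerical sketch, not the paper's analysis: minimum-norm least squares on isotropic Gaussian features with a planted linear target, sweeping the number of parameters d across the interpolation point d = n. The choice of features, noise level, and minimum-norm fitting are illustrative assumptions; the abstract's bound itself is algorithm- and distribution-independent.

import numpy as np

rng = np.random.default_rng(0)
n, noise_std, n_trials = 100, 0.5, 20          # samples, label noise, averaging runs

def min_norm_fit(X, y):
    """Minimum-l2-norm least-squares solution (ridgeless regression)."""
    return np.linalg.pinv(X) @ y

# Sweep model sizes around the interpolation peak at d = n.
for d in [20, 50, 90, 100, 110, 200, 400, 1000]:
    train_losses, test_losses = [], []
    for _ in range(n_trials):
        w_star = rng.normal(size=d) / np.sqrt(d)      # planted linear target, ||w*|| ~ 1
        X = rng.normal(size=(n, d))                   # isotropic Gaussian features
        y = X @ w_star + noise_std * rng.normal(size=n)
        w_hat = min_norm_fit(X, y)
        train_losses.append(np.mean((X @ w_hat - y) ** 2))
        # For x ~ N(0, I_d), the excess population loss equals ||w_hat - w_star||^2.
        test_losses.append(np.sum((w_hat - w_star) ** 2))
    print(f"d={d:5d}  train={np.mean(train_losses):.3f}  excess test={np.mean(test_losses):.3f}")

Run as-is, the printout shows the qualitative pattern the abstract describes: below d = n the training loss stays near the noise level ("classical" regime), near d = n the excess test loss spikes at the interpolation peak, and well beyond d = n ("modern" regime) the training loss is zero while the excess test loss comes back down.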


Related research

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization (08/15/2020)
Modern deep learning models employ considerably more parameters than req...

Harmless interpolation in regression and classification with structured features (11/09/2021)
Overparametrized neural networks tend to perfectly fit noisy training da...

Harmless interpolation of noisy data in regression (03/21/2019)
A continuing mystery in understanding the empirical success of deep neur...

A Doubly Regularized Linear Discriminant Analysis Classifier with Automatic Parameter Selection (04/28/2020)
Linear discriminant analysis (LDA) based classifiers tend to falter in m...

A Universal Law of Robustness via Isoperimetry (05/26/2021)
Classically, data interpolation with a parametrized model class is possi...

Agree to Disagree: Diversity through Disagreement for Better Transferability (02/09/2022)
Gradient-based learning algorithms have an implicit simplicity bias whic...

Extrapolation in NLP (05/17/2018)
We argue that extrapolation to examples outside the training space will ...
