A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

by   Yehuda Dar, et al.

The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models – from simple linear models to deep neural networks – have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.


page 10

page 11

page 19

page 32


More Data Can Hurt for Linear Regression: Sample-wise Double Descent

In this expository note we describe a surprising phenomenon in overparam...

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Modern deep learning models employ considerably more parameters than req...

Asymptotics of Ridge Regression in Convolutional Models

Understanding generalization and estimation error of estimators for simp...

Fitting Elephants

Textbook wisdom advocates for smooth function fits and implies that inte...

Deep Double Descent via Smooth Interpolation

Overparameterized deep networks are known to be able to perfectly fit th...

Fluctuations, Bias, Variance Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

From the sampling of data to the initialisation of parameters, randomnes...

Functional Data Regression Reconciles with Excess Bases

As the development of measuring instruments and computers has accelerate...

Please sign up or login with your details

Forgot password? Click here to reset