A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

09/06/2021
by Yehuda Dar, et al.

The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, and therefore perfectly fit (i.e., interpolate) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models – from simple linear models to deep neural networks – have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
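To make the double descent picture concrete, below is a minimal Python sketch (not from the paper) of overparameterized linear regression with the minimum-l2-norm least-squares interpolator on a Gaussian random-design toy problem. The data model, dimensions, and noise level are illustrative assumptions chosen only to show the typical shape of the test-error curve as the number of features grows past the number of training samples.

```python
# Minimal sketch: double descent for minimum-l2-norm least squares on a
# Gaussian random-design toy problem (illustrative assumptions, not the
# paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d_max = 40, 2000, 120          # training samples, test samples, max feature count
beta = rng.normal(size=d_max) / np.sqrt(d_max)  # "true" coefficients over all d_max features
noise_std = 0.5

X_full = rng.normal(size=(n_train, d_max))
y = X_full @ beta + noise_std * rng.normal(size=n_train)
X_test_full = rng.normal(size=(n_test, d_max))
y_test = X_test_full @ beta + noise_std * rng.normal(size=n_test)

for d in range(5, d_max + 1, 5):
    # Use only the first d features: the model is underparameterized for
    # d < n_train and overparameterized (interpolating) for d >= n_train.
    X, X_test = X_full[:, :d], X_test_full[:, :d]
    # np.linalg.pinv gives the ordinary least-squares solution when d < n_train
    # and the minimum-l2-norm interpolator when d >= n_train.
    w = np.linalg.pinv(X) @ y
    test_mse = np.mean((X_test @ w - y_test) ** 2)
    print(f"d = {d:3d}  test MSE = {test_mse:.3f}")

# In this kind of setup the test MSE typically peaks near d = n_train (the
# interpolation threshold) and then decreases again as d grows: double descent.
```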


Related research

12/16/2019 · More Data Can Hurt for Linear Regression: Sample-wise Double Descent
In this expository note we describe a surprising phenomenon in overparam...

08/15/2020 · The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Modern deep learning models employ considerably more parameters than req...

03/08/2021 · Asymptotics of Ridge Regression in Convolutional Models
Understanding generalization and estimation error of estimators for simp...

03/31/2021 · Fitting Elephants
Textbook wisdom advocates for smooth function fits and implies that inte...

09/21/2022 · Deep Double Descent via Smooth Interpolation
Overparameterized deep networks are known to be able to perfectly fit th...

01/31/2022 · Fluctuations, Bias, Variance Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension
From the sampling of data to the initialisation of parameters, randomnes...

08/03/2023 · Functional Data Regression Reconciles with Excess Bases
As the development of measuring instruments and computers has accelerate...
