Reconciling modern machine learning and the bias-variance trade-off

12/28/2018
by Mikhail Belkin, et al.

The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.
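
As a rough, illustrative sketch of the phenomenon described above (this is not the paper's experimental setup; the data, feature map, and all parameter values below are assumptions chosen for illustration), a double descent curve can typically be reproduced with minimum-norm least-squares regression on random Fourier features of increasing dimension. Because NumPy's `lstsq` returns the minimum-norm solution once the model is overparameterized, the fits past `n_features ≈ n_train` are interpolating predictors of the "suitably chosen" kind the abstract refers to.

```python
# Hypothetical sketch: double descent with random Fourier features (RFF)
# and minimum-norm least squares. Illustrative only, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

# Noisy 1-D regression problem.
n_train, n_test = 40, 500
x_train = rng.uniform(-1, 1, n_train)
x_test = rng.uniform(-1, 1, n_test)
f = lambda x: np.sin(2 * np.pi * x)
y_train = f(x_train) + 0.3 * rng.standard_normal(n_train)
y_test = f(x_test)

def rff(x, n_features, seed=1):
    """Random Fourier feature map: x -> cos(w * x + b)."""
    r = np.random.default_rng(seed)
    w = r.normal(scale=8.0, size=n_features)
    b = r.uniform(0, 2 * np.pi, n_features)
    return np.cos(np.outer(x, w) + b)

for n_features in [2, 5, 10, 20, 40, 80, 160, 640, 2560]:
    Phi_train = rff(x_train, n_features)
    Phi_test = rff(x_test, n_features)
    # lstsq returns the minimum-norm least-squares solution, so once
    # n_features > n_train this is a minimum-norm interpolating predictor.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_mse = np.mean((Phi_train @ coef - y_train) ** 2)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"{n_features:5d} features  train {train_mse:.4f}  test {test_mse:.4f}")
```

In a typical run, test error rises as `n_features` approaches `n_train` (the interpolation threshold) and then falls again as the feature count grows well past it, tracing the second descent.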

Related research

02/26/2020 · Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
The classical bias-variance trade-off predicts that bias decreases and v...

11/18/2022 · Understanding the double descent curve in Machine Learning
The theory of bias-variance used to serve as a guide for model selection...

08/03/2020 · Multiple Descent: Design Your Own Generalization Curve
This paper explores the generalization loss of linear regression in vari...

12/11/2020 · Beyond Occam's Razor in System Identification: Double-Descent when Modeling Dynamics
System identification aims to build models of dynamical systems from dat...

11/18/2020 · Bias-Variance Trade-off and Overlearning in Dynamic Decision Problems
Modern Monte Carlo-type approaches to dynamic decision problems face the...

05/29/2021 · Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
In the past decade the mathematical theory of machine learning has lagge...
