Understanding overfitting peaks in generalization error: Analytical risk curves for l_2 and l_1 penalized interpolation

06/09/2019
by Partha P. Mitra, et al.

Traditionally in regression one minimizes the number of fitting parameters, or uses smoothing/regularization, to trade off training error (TE) against generalization error (GE). Driving TE to zero by increasing the fitting degrees of freedom (dof) is expected to increase GE. However, modern big-data approaches, including deep networks, appear to over-parametrize and drive TE to zero (data interpolation) without harming GE. Over-parametrization has the benefit that global minima of the empirical loss function proliferate and become easier to find. These phenomena have drawn theoretical attention: regression and classification algorithms have been exhibited that interpolate the data yet generalize optimally. A related and interesting phenomenon has also been noted: the existence of non-monotonic risk curves, with a peak in GE as the dof increase. It has been suggested that this peak separates a classical regime from a modern regime in which over-parametrization improves performance. Similar overfitting peaks were reported previously, in the statistical-physics approach to learning, and were attributed to the increased flexibility of the fitting model. We introduce a generative-and-fitting model pair ("Misparametrized Sparse Regression", or MiSpaR) and show that the overfitting peak can be dissociated from the point at which the fitting function gains enough dof to match the data-generative model and thus to generalize well. This complicates the interpretation of overfitting peaks as separating a "classical" from a "modern" regime. Data interpolation by itself does not guarantee good generalization: one must study interpolation under different penalty terms. We present analytical formulae for the GE curves of MiSpaR with l_2 and l_1 penalties in the interpolating limit λ→0. These risk curves exhibit important differences and help elucidate the underlying phenomena.
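The interpolating limit λ→0 of the l_2-penalized fit is the minimum-l_2-norm interpolator (for more parameters than samples), which can be computed with a pseudoinverse, so the overfitting peak described in the abstract is easy to reproduce numerically. The sketch below is only an illustration under assumed settings (sample size, noise level, sparsity, feature count), not the paper's MiSpaR model or its analytical formulae; the l_1 (basis-pursuit) analogue is only noted in a comment.

```python
# Illustrative sketch: generalization error of ridgeless (lambda -> 0, l_2)
# interpolation as the number of fitting parameters p sweeps past the number
# of samples n. All numerical choices here are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, p_true, sigma = 40, 10, 0.5      # samples, nonzero generative coefficients, noise std
p_max, n_test = 120, 2000           # maximum fitting dof, test-set size

# Sparse linear generative model: y depends only on the first p_true features,
# while the fit may use fewer or more features ("misparametrization").
beta = rng.normal(size=p_true)
X_tr, X_te = rng.normal(size=(n, p_max)), rng.normal(size=(n_test, p_max))
y_tr = X_tr[:, :p_true] @ beta + sigma * rng.normal(size=n)
y_te = X_te[:, :p_true] @ beta + sigma * rng.normal(size=n_test)

def gen_error_l2(p):
    """Generalization error when fitting the first p features.

    For p < n this is ordinary least squares; for p >= n the pseudoinverse
    gives the minimum-l_2-norm interpolator, i.e. the lambda -> 0 limit of
    the l_2-penalized fit. A minimum-l_1-norm (basis-pursuit) interpolator
    could be obtained analogously by solving a linear program.
    """
    w = np.linalg.pinv(X_tr[:, :p]) @ y_tr
    return np.mean((X_te[:, :p] @ w - y_te) ** 2)

for p in (5, 10, 20, 35, 40, 45, 60, 120):
    print(f"p = {p:3d}   GE (min-l2-norm) = {gen_error_l2(p):.3f}")
```

With these illustrative settings the printed GE typically rises sharply near the interpolation threshold p ≈ n = 40 and decreases again for larger p, while good generalization already requires p ≥ p_true = 10; the paper's analytical risk curves for the l_2 and l_1 penalties are the authoritative version of this comparison.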

Related research:

03/31/2021  Fitting Elephants
03/21/2019  Harmless interpolation of noisy data in regression
08/06/2020  Benign Overfitting and Noisy Features
02/05/2018  To understand deep learning we need to understand kernel learning
06/05/2020  Triple descent and the two kinds of overfitting: Where & why do they appear?
01/18/2023  Strong inductive biases provably prevent harmless interpolation
07/01/2016  A new analytical approach to consistency and overfitting in regularized empirical risk minimization
