Harmless interpolation of noisy data in regression

03/21/2019
by Vidya Muthukumar et al.

A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and yet generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this "overparameterization" phenomenon in the classical underdetermined linear regression problem, where all solutions that minimize training error interpolate the data, including the noise. We give a bound on how well such interpolative solutions can generalize to fresh test data, and show that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization. For appropriately sparse linear models, we provide a hybrid interpolating scheme (combining classical sparse recovery schemes with harmless noise-fitting) that achieves generalization error close to the bound on interpolative solutions.
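The following is a minimal, self-contained sketch (in Python with NumPy; an illustrative assumption, not the authors' code) of the two ideas in the abstract: the minimum-l2-norm interpolator of noisy labels in an underdetermined linear model, and a hybrid interpolator that first runs a classical sparse recovery step (here, orthogonal matching pursuit) and then harmlessly fits the leftover noise with a minimum-norm component, so that the combined predictor still interpolates the training data exactly. The dimensions, noise level, and the choice of OMP as the sparse recovery scheme are all illustrative assumptions.

```python
# Hypothetical sketch of minimum-norm interpolation vs. a hybrid
# "sparse recovery + harmless noise-fitting" interpolator.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k_true, noise_std = 50, 500, 5, 0.5

def min_norm(X, y):
    # Moore-Penrose pseudoinverse of a fat X gives the least-l2-norm interpolator.
    return np.linalg.pinv(X) @ y

def omp(X, y, k):
    # Greedy orthogonal matching pursuit: a classical sparse recovery scheme.
    support, residual, coef = [], y.copy(), np.zeros(0)
    for _ in range(k):
        corr = np.abs(X.T @ residual)
        corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    w = np.zeros(X.shape[1])
    w[support] = coef
    return w

for d in [100, 500, 2000, 8000]:                      # number of features, always > n_train
    w_star = np.zeros(d)
    w_star[:k_true] = 1.0                             # k_true-sparse ground truth
    X_tr = rng.standard_normal((n_train, d))
    y_tr = X_tr @ w_star + noise_std * rng.standard_normal(n_train)
    X_te = rng.standard_normal((n_test, d))
    y_te = X_te @ w_star + noise_std * rng.standard_normal(n_test)

    # (i) Plain minimum-norm interpolation of the noisy labels.
    w_mn = min_norm(X_tr, y_tr)

    # (ii) Hybrid: recover the sparse signal, then fit the remaining noise with a
    # minimum-norm component so the sum still interpolates the training data.
    w_sparse = omp(X_tr, y_tr, k_true)
    w_hybrid = w_sparse + min_norm(X_tr, y_tr - X_tr @ w_sparse)

    def mse(w):
        return np.mean((X_te @ w - y_te) ** 2)

    train_resid = np.max(np.abs(X_tr @ w_hybrid - y_tr))   # ~0: hybrid interpolates
    print(f"d={d:5d}  min-norm test MSE={mse(w_mn):.3f}  "
          f"hybrid test MSE={mse(w_hybrid):.3f}  hybrid train residual={train_resid:.1e}")
```

In this toy setting one would expect the hybrid predictor's test error to approach the noise level as the number of extra features grows, while the plain minimum-norm interpolator remains limited by how much it shrinks the signal; this is only meant to mirror, not reproduce, the comparison made in the abstract.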

Related research

02/20/2022 · Memorize to Generalize: on the Necessity of Interpolation in High Dimensional Linear Regression
We examine the necessity of interpolation in overparameterized models, t...

06/09/2019 · Understanding overfitting peaks in generalization error: Analytical risk curves for l_2 and l_1 penalized interpolation
Traditionally in regression one minimizes the number of fitting paramete...

10/03/2022 · Understanding Influence Functions and Datamodels via Harmonic Analysis
Influence functions estimate effect of individual data points on predict...

05/25/2019 · Global Minima of DNNs: The Plenty Pantry
A common strategy to train deep neural networks (DNNs) is to use very la...

07/23/2022 · A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors
In this work we establish an algorithm and distribution independent non-...

11/07/2022 · Highly over-parameterized classifiers generalize since bad solutions are rare
We study the generalization of over-parameterized classifiers where Empi...

02/17/2018 · An analysis of training and generalization errors in shallow and deep networks
An open problem around deep networks is the apparent absence of over-fit...
