On Uniform Convergence and Low-Norm Interpolation Learning

06/10/2020
by Lijia Zhou, et al.

We consider an underdetermined noisy linear regression model where the minimum-norm interpolating predictor is known to be consistent, and ask: can uniform convergence in a norm ball, or at least (following Nagarajan and Kolter) the subset of a norm ball that the algorithm selects on a typical input set, explain this success? We show that uniformly bounding the difference between empirical and population errors cannot show any learning in the norm ball, and cannot show consistency for any set, even one depending on the exact algorithm and distribution. But we argue that we can explain the consistency of the minimum-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors. We use this to bound the generalization error of low- (but not minimum-) norm interpolating predictors.
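
For readers who want the quantities in the abstract spelled out, here is a rough sketch in symbols. The notation is an assumption made for illustration and is not taken from the paper: X is the n-by-d design matrix with d > n, y the noisy labels, L_S and L_D the empirical and population squared errors, and B a norm-ball radius.

```latex
% Assumed notation (not from the source): X \in \mathbb{R}^{n \times d}, d > n,
% y \in \mathbb{R}^n, L_S = empirical squared error, L_D = population squared error.

% Minimum-norm interpolating predictor (closed form holds when X X^\top is invertible):
\hat{w} \;=\; \arg\min_{w \,:\, Xw = y} \|w\|_2 \;=\; X^\top (X X^\top)^{-1} y

% Uniform convergence over a norm ball of radius B:
\sup_{\|w\|_2 \le B} \bigl| L_D(w) - L_S(w) \bigr|

% Uniform convergence restricted to zero-error (interpolating) predictors,
% where the gap reduces to the population error because L_S(w) = 0:
\sup_{\|w\|_2 \le B,\; L_S(w) = 0} L_D(w)
```

The third quantity takes the supremum over a subset of the ball used in the second, so it is never larger; this is the sense in which the abstract calls the zero-error notion slightly weaker.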


03/08/2021

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

Recent work showed that there could be a large gap between the classical...
06/17/2021

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

We consider interpolation learning in high-dimensional linear regression...
11/10/2021

Tight bounds for minimum l1-norm interpolation of noisy data

We provide matching upper and lower bounds of order σ^2/log(d/n) for the...
02/27/2022

Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

We consider linear prediction with a convex Lipschitz loss, or more gene...
02/13/2019

Uniform convergence may be unable to explain generalization in deep learning

We cast doubt on the power of uniform convergence-based generalization b...
12/01/2020

On the robustness of minimum-norm interpolators

This article develops a general theory for minimum-norm interpolated est...
06/23/2022

Hausdorff Distance between Norm Balls and their Linear Maps

We consider the problem of computing the (two-sided) Hausdorff distance ...