Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression

06/14/2023
by Shahar Stein Ioushua, et al.

Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We propose a natural small-batch version of the minimum-norm estimator and derive an upper bound on its quadratic risk, showing that, for the optimal choice of batch size, the risk is inversely proportional to both the noise level and the overparameterization ratio. In contrast to the minimum-norm estimator, our batched estimator exhibits stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that this implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular a normal approximation in the Wasserstein metric of noisy projections over random subspaces.
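To make the setup concrete, here is a minimal sketch of the batch-partitioned minimum-norm idea described in the abstract. The aggregation rule (averaging the per-batch minimum-norm solutions), the specific dimensions, the noise level, and the Monte Carlo risk estimate are assumptions for illustration only; they are not taken from the paper.

```python
# Minimal sketch, NOT the paper's exact estimator: we assume the "small-batch"
# version averages per-batch minimum-norm (pseudoinverse) solutions over
# isotropic Gaussian features, and we estimate the quadratic risk by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)

n, d, sigma = 100, 400, 0.5               # samples, dimension (d > n), noise level
beta = rng.normal(size=d) / np.sqrt(d)    # ground-truth coefficients

X = rng.normal(size=(n, d))               # isotropic Gaussian features
y = X @ beta + sigma * rng.normal(size=n)

# Full minimum-norm (interpolating) estimator: beta_hat = X^+ y
beta_mn = np.linalg.pinv(X) @ y

def batch_min_norm(X, y, batch_size):
    """Split rows into batches, solve minimum-norm per batch, average (assumed rule)."""
    estimates = []
    for start in range(0, len(y), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        estimates.append(np.linalg.pinv(Xb) @ yb)
    return np.mean(estimates, axis=0)

beta_batch = batch_min_norm(X, y, batch_size=20)

def risk(beta_hat, n_test=10_000):
    # Quadratic risk estimated on fresh isotropic Gaussian test features.
    X_test = rng.normal(size=(n_test, d))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

print(f"min-norm risk: {risk(beta_mn):.3f}")
print(f"batched risk:  {risk(beta_batch):.3f}")
```

Varying `d` (and hence the overparameterization ratio d/n) in this sketch is one way to visualize the contrast the abstract describes: the full minimum-norm risk blows up near the interpolation point, while the batched variant behaves more smoothly.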


