A Continuous-Time View of Early Stopping for Least Squares Regression

10/23/2018
by Alnur Ali et al.

We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. In a random matrix theory setup, which allows the number of samples n and features p to diverge in such a way that p/n → γ ∈ (0, ∞), we derive and analyze an asymptotic risk expression for gradient flow. In particular, we compare the asymptotic risk profile of gradient flow to that of ridge regression. When the feature covariance is spherical, we show that the optimal asymptotic gradient flow risk is between 1 and 1.25 times the optimal asymptotic ridge risk. Further, we derive a calibration between the two risk curves under which the asymptotic gradient flow risk is no more than 2.25 times the asymptotic ridge risk, at all points along the path. We present a number of other results illustrating the connections between gradient flow and ℓ_2 regularization, and numerical experiments that support our theory.
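To make the comparison concrete, here is a minimal NumPy sketch (ours, not the authors'; the problem sizes, seed, and spherical design are illustrative assumptions). It contrasts the closed-form gradient flow iterate started from zero, β̂(t) = (XᵀX)⁺(I − exp(−t XᵀX/n)) Xᵀy, with the ridge estimator, pairing time t with penalty λ = 1/t as a natural time-to-penalty calibration (see the paper for the exact calibration it analyzes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 100                      # illustrative sizes; here p/n = 0.2
X = rng.standard_normal((n, p))      # spherical feature covariance (Sigma = I)
beta0 = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta0 + rng.standard_normal(n)

S = X.T @ X / n                      # sample covariance
b = X.T @ y / n
evals, V = np.linalg.eigh(S)         # spectral decomposition of S

def beta_gf(t):
    """Gradient flow at time t from zero init: S^+ (I - exp(-t S)) b."""
    safe = np.maximum(evals, 1e-12)
    # for a zero eigenvalue, the limit of (1 - exp(-t*e))/e as e -> 0 is t
    d = np.where(evals > 1e-12, -np.expm1(-t * safe) / safe, t)
    return V @ (d * (V.T @ b))

def beta_ridge(lam):
    """Ridge estimator: (S + lam * I)^{-1} b."""
    return np.linalg.solve(S + lam * np.eye(p), b)

def est_risk(bhat):
    """Estimation risk ||bhat - beta0||^2 (with Sigma = I this tracks prediction risk)."""
    return float(np.sum((bhat - beta0) ** 2))

for t in [0.1, 1.0, 10.0, 100.0]:
    lam = 1.0 / t                    # time-to-penalty pairing lambda = 1/t
    print(f"t = {t:6.1f}   gradient flow risk = {est_risk(beta_gf(t)):.4f}   "
          f"ridge risk (lam = 1/t) = {est_risk(beta_ridge(lam)):.4f}")
```

Per the theory above, the gradient flow risk should remain within a small constant factor of the calibrated ridge risk along the whole path; runs of this sketch give a quick empirical check of that behavior.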

Related research

01/14/2022 · The Implicit Regularization of Momentum Gradient Descent with Early Stopping
The study on the implicit regularization induced by gradient-based optim...

03/17/2020 · The Implicit Regularization of Stochastic Gradient Flow for Least Squares
We study the implicit regularization of mini-batch stochastic gradient d...

11/16/2021 · Online Estimation and Optimization of Utility-Based Shortfall Risk
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingl...

06/29/2023 · Solving Kernel Ridge Regression with Gradient-Based Optimization Methods
Kernel ridge regression, KRR, is a non-linear generalization of linear r...

01/20/2022 · Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization
Acceleration and momentum are the de facto standard in modern applicatio...

02/28/2021 · Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks
We investigate the asymptotic risk of a general class of overparameteriz...

12/13/2022 · Gradient flow in the Gaussian covariate model: exact solution of learning curves and multiple descent structures
A recent line of work has shown remarkable behaviors of the generalizati...
