The Implicit Regularization of Momentum Gradient Descent with Early Stopping

01/14/2022
by Li Wang, et al.

The study of the implicit regularization induced by gradient-based optimization is a longstanding pursuit. In this paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing it with explicit ℓ_2-regularization (ridge). Specifically, we study MGD in the continuous-time view, the so-called momentum gradient flow (MGF), and show that for least squares regression its behavior is closer to ridge than that of gradient descent (GD) [Ali et al., 2019]. Moreover, we prove that, under the calibration t = √(2/λ), where t is the time parameter in MGF and λ is the tuning parameter in ridge regression, the risk of MGF is no more than 1.54 times that of ridge. In particular, the relative Bayes risk of MGF to ridge lies between 1 and 1.035 under optimal tuning. Numerical experiments strongly support our theoretical results.
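The calibration above pairs an early-stopping time t for the momentum flow with a ridge penalty λ = 2/t². A minimal sketch of this comparison, assuming a simple heavy-ball (damped second-order) discretization of the momentum flow and a hypothetical damping coefficient, might look like the following; the step size, damping, and data model are illustrative choices, not the paper's exact setup:

```python
import numpy as np

# Synthetic least squares problem (illustrative data, not from the paper)
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Ridge tuning parameter and the calibrated stopping time t = sqrt(2/lambda)
lam = 1.0
t_stop = np.sqrt(2.0 / lam)

# Explicit ridge solution: (X'X/n + lam I)^{-1} X'y/n
beta_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

# Heavy-ball discretization of momentum gradient flow, stopped at time t_stop
h = 1e-3            # integration step (assumed)
friction = 3.0      # damping coefficient (hypothetical choice)
beta = np.zeros(p)  # flow started from the origin
v = np.zeros(p)
t = 0.0
while t < t_stop:
    grad = X.T @ (X @ beta - y) / n      # least squares gradient
    v += h * (-friction * v - grad)      # second-order (momentum) dynamics
    beta += h * v
    t += h

# Compare the early-stopped momentum iterate with the ridge estimator
rel_dist = np.linalg.norm(beta - beta_ridge) / np.linalg.norm(beta_ridge)
print(rel_dist)
```

The point of the sketch is only the pairing: one trajectory parameterized by time t, one estimator parameterized by λ, compared under t = √(2/λ).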

Related research

- A Continuous-Time View of Early Stopping for Least Squares Regression (10/23/2018)
- The Implicit Regularization of Stochastic Gradient Flow for Least Squares (03/17/2020)
- Solving Kernel Ridge Regression with Gradient-Based Optimization Methods (06/29/2023)
- Implicit Sparse Regularization: The Impact of Depth and Early Stopping (08/12/2021)
- Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization (01/20/2022)
- Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models? (08/26/2021)
- An Asymptotic Analysis of Minibatch-Based Momentum Methods for Linear Regression Models (11/02/2021)
