Implicit Sparse Regularization: The Impact of Depth and Early Stopping

08/12/2021
by   Jiangyuan Li, et al.

In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter N, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization and step size. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window, which leads to more stable gradient paths for sparse recovery.
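
The procedure the abstract describes can be made concrete with a small numerical sketch. The snippet below runs plain gradient descent on a depth-N diagonal parametrization beta = u**N - v**N for a synthetic sparse regression problem and tracks the estimation error along the path. All specifics here are illustrative assumptions, not the paper's experiments: the problem sizes, the initialization scale alpha, the step size eta, and the oracle stopping rule (keeping the iterate closest to the true signal, which in practice would be replaced by a held-out validation set).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse regression instance (sizes are illustrative).
n, d, s, N = 100, 400, 5, 3           # samples, dimension, sparsity, depth
X = rng.standard_normal((n, d)) / np.sqrt(n)
beta_star = np.zeros(d)
beta_star[:s] = 1.0
y = X @ beta_star + 0.05 * rng.standard_normal(n)

# Depth-N "diagonal network" parametrization: beta = u**N - v**N,
# trained by gradient descent from a small initialization alpha.
alpha, eta, T = 1e-3, 0.1, 20000      # init scale, step size, max iters (assumed)
u = np.full(d, alpha)
v = np.full(d, alpha)

best_err, best_beta = np.inf, None
for t in range(T):
    beta = u**N - v**N
    g = X.T @ (X @ beta - y) / n      # gradient of (1/2n)||X beta - y||^2 wrt beta
    u -= eta * N * u**(N - 1) * g     # chain rule through  u**N
    v += eta * N * v**(N - 1) * g     # chain rule through -v**N
    err = np.linalg.norm(beta - beta_star)
    if err < best_err:                # oracle early stopping, for illustration only
        best_err, best_beta = err, beta.copy()

print(f"error at best stopping time: {best_err:.3f}")
print(f"error at final iterate:      {np.linalg.norm(u**N - v**N - beta_star):.3f}")
```

Comparing runs with N = 2 and N = 3 (or varying alpha) should show the abstract's claims qualitatively: the error along the path first drops toward the sparse signal and later degrades as the iterates fit the noise, and larger depth keeps the error near its minimum over a wider window of iterations.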

