(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

02/17/2023
by   Mathieu Even, et al.

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. Our crisp characterisation leads to qualitative insights about the impact of stochasticity and stepsizes on the recovered solution. Specifically, we show that large stepsizes consistently benefit SGD for sparse regression problems, while they can hinder the recovery of sparse solutions for GD. These effects are magnified for stepsizes in a tight window just below the divergence threshold, in the “edge of stability” regime. Our findings are supported by experimental results.
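To make the setting concrete, here is a minimal sketch (not the paper's code or experiments) of (S)GD over a diagonal linear network: the regressor is parametrised as beta = u * v elementwise, and gradient descent is run on (u, v) for an overparametrised sparse regression problem. The dimensions, stepsizes, initialisation scale `alpha`, and the `train` helper are all illustrative assumptions.

```python
# Sketch of (S)GD on a diagonal linear network, beta = u * v (elementwise).
# All hyperparameters here are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                       # overparametrised: fewer samples than features
beta_star = np.zeros(d)
beta_star[:3] = 1.0                 # sparse ground-truth regressor
X = rng.standard_normal((n, d))
y = X @ beta_star                   # noiseless labels, so interpolation is possible

def train(stepsize, batch=None, steps=5000, alpha=0.1, seed=1):
    """Run (S)GD on the (u, v) parametrisation; batch=None means full-batch GD.

    alpha is the initialisation scale, which controls the implicit bias
    (smaller alpha biases the recovered beta = u * v towards sparsity)."""
    rng_ = np.random.default_rng(seed)
    u = np.full(d, alpha)
    v = np.zeros(d)
    for _ in range(steps):
        idx = np.arange(n) if batch is None else rng_.choice(n, size=batch)
        r = X[idx] @ (u * v) - y[idx]          # residuals on the (mini)batch
        g = X[idx].T @ r / len(idx)            # gradient w.r.t. beta = u * v
        # chain rule: d loss/du = g * v, d loss/dv = g * u (simultaneous update)
        u, v = u - stepsize * g * v, v - stepsize * g * u
    return u * v

beta_gd = train(stepsize=0.05)                 # full-batch gradient descent
beta_sgd = train(stepsize=0.01, batch=4)       # minibatch SGD
print("GD  recovery error:", np.linalg.norm(beta_gd - beta_star))
print("SGD recovery error:", np.linalg.norm(beta_sgd - beta_star))
```

The paper's analysis concerns what happens as the stepsizes above are pushed towards the divergence threshold; this sketch only sets up the model and the two training modes being compared.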


Related research:

- 06/17/2021 · Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
- 08/12/2021 · Implicit Sparse Regularization: The Impact of Depth and Early Stopping
- 05/27/2023 · The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
- 06/17/2021 · Sub-linear convergence of a tamed stochastic gradient descent method in Hilbert space
- 09/22/2015 · Stochastic gradient descent methods for estimation with large data sets
- 10/12/2017 · Graph Drawing by Stochastic Gradient Descent
- 05/10/2015 · Towards stability and optimality in stochastic gradient descent
