Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

06/17/2021
by Scott Pesme, et al.

Understanding the implicit bias of training algorithms is crucial to explaining the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous-time version, namely stochastic gradient flow. We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than that of gradient flow. Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias. To complete our analysis, we provide convergence guarantees for the dynamics. We also give experimental results that support our theoretical claims. Our findings highlight that structured noise can induce better generalisation, and they help explain the better performance of stochastic gradient descent over gradient descent observed in practice.
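
To make the setting concrete, here is a minimal sketch of discrete-time (stochastic) gradient descent on a diagonal linear network. It is an illustration under stated assumptions, not the paper's code: the abstract does not fix a parametrisation, so the sketch assumes the common choice beta = w_pos**2 - w_neg**2 with a squared loss, and all names (`train_diag_net`, `train_loss`, `alpha`, `lr`) are hypothetical. It also uses plain discrete steps rather than the stochastic gradient flow analysed in the paper.

```python
import random


def predict(beta, x):
    """Linear prediction <beta, x>."""
    return sum(b * xi for b, xi in zip(beta, x))


def train_loss(beta, X, y):
    """Squared training loss 1/(2n) * sum_i (<beta, x_i> - y_i)^2."""
    return 0.5 * sum((predict(beta, x) - v) ** 2 for x, v in zip(X, y)) / len(X)


def train_diag_net(X, y, alpha=0.1, lr=0.002, steps=5000, batch_size=None, seed=0):
    """Run (S)GD on an ASSUMED diagonal linear network beta = w_pos**2 - w_neg**2.

    batch_size=None uses the full batch (gradient descent);
    batch_size=1 samples a single example per step (SGD).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Both weight vectors start at alpha, so the initial predictor is beta = 0.
    w_pos = [alpha] * d
    w_neg = [alpha] * d
    for _ in range(steps):
        idx = range(n) if batch_size is None else rng.sample(range(n), batch_size)
        m = len(idx)
        beta = [p * p - q * q for p, q in zip(w_pos, w_neg)]
        # Gradient of the (mini-)batch squared loss with respect to beta.
        grad = [0.0] * d
        for i in idx:
            residual = predict(beta, X[i]) - y[i]
            for j in range(d):
                grad[j] += residual * X[i][j] / m
        # Chain rule: d(beta)/d(w_pos) = 2 * w_pos, d(beta)/d(w_neg) = -2 * w_neg.
        w_pos = [p - lr * 2 * p * g for p, g in zip(w_pos, grad)]
        w_neg = [q + lr * 2 * q * g for q, g in zip(w_neg, grad)]
    return [p * p - q * q for p, q in zip(w_pos, w_neg)]


if __name__ == "__main__":
    # Synthetic overparametrised problem (n < d) with a sparse ground truth.
    data_rng = random.Random(1)
    n, d = 10, 20
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    beta_star = [1.0, 1.0] + [0.0] * (d - 2)
    y = [predict(beta_star, x) for x in X]

    for name, bs in [("GD", None), ("SGD", 1)]:
        beta = train_diag_net(X, y, batch_size=bs)
        l1_norm = sum(abs(b) for b in beta)
        print(name, "train loss:", train_loss(beta, X, y), "l1 norm:", l1_norm)
```

Running both variants from the same initialisation lets one compare the recovered predictors, for example by their l1 norm or their distance to the sparse ground truth. The sketch makes no claim to reproduce the paper's experiments; step size, initialisation scale, and problem dimensions are arbitrary choices.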


research
06/20/2022

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation

Understanding the implicit bias of training algorithms is of crucial imp...
research
02/17/2023

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

In this paper, we investigate the impact of stochasticity and large step...
research
10/06/2020

A Unifying View on Implicit Bias in Training Linear Neural Networks

We study the implicit bias of gradient flow (i.e., gradient descent with...
research
05/09/2023

Robust Implicit Regularization via Weight Normalization

Overparameterized models may have many interpolating solutions; implicit...
research
04/02/2023

Saddle-to-Saddle Dynamics in Diagonal Linear Networks

In this paper we fully describe the trajectory of gradient flow over dia...
research
03/03/2023

Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks

Physics-informed neural networks (PINNs) have effectively been demonstra...
research
03/09/2023

Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics

Stochastic-gradient sampling methods are often used to perform Bayesian ...
