Saddle-to-Saddle Dynamics in Diagonal Linear Networks

04/02/2023
by Scott Pesme, et al.

In this paper we fully describe the trajectory of gradient flow over diagonal linear networks in the limit of vanishing initialisation. We show that the limiting flow successively jumps from one saddle of the training loss to another until it reaches the minimum ℓ_1-norm solution. These saddle-to-saddle dynamics translate into an incremental learning process, as each saddle corresponds to the minimiser of the loss constrained to an active set outside of which the coordinates must be zero. We explicitly characterise the visited saddles as well as the jump times through a recursive algorithm reminiscent of the Homotopy algorithm used for computing the Lasso path. Our proof leverages a convenient arc-length time-reparametrisation which enables us to keep track of the heteroclinic transitions between the jumps. Our analysis requires negligible assumptions on the data, applies to both under- and overparametrised settings, and covers complex cases where the number of active coordinates is not monotone. We provide numerical experiments to support our findings.
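The incremental-learning behaviour described above can be observed numerically. Below is a minimal sketch (not the paper's exact construction or proof setup): gradient descent on a diagonal linear network with the common beta = u^2 - v^2 parametrisation, tiny initialisation alpha, and a hypothetical sparse regression problem. With small alpha, coordinates tend to activate a few at a time, and the iterate ends up near the sparse solution.

```python
import numpy as np

# Hedged illustration: a synthetic sparse regression problem (all names
# and sizes here are assumptions for the sketch, not from the paper).
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
beta_star = np.array([2.0, -1.0, 0.0, 0.0, 0.0])  # sparse ground truth
y = X @ beta_star                                  # noiseless labels

# Diagonal linear network: beta = u*u - v*v, initialised at scale alpha.
alpha, lr, steps = 1e-3, 5e-3, 100_000
u = np.full(d, alpha)
v = np.full(d, alpha)

for _ in range(steps):
    beta = u * u - v * v
    grad = X.T @ (X @ beta - y) / n   # gradient of the squared loss in beta
    du = 2.0 * grad * u               # chain rule: d(beta)/d(u) = 2u
    dv = -2.0 * grad * v              # chain rule: d(beta)/d(v) = -2v
    u -= lr * du
    v -= lr * dv

beta = u * u - v * v
print(np.round(beta, 3))  # close to the sparse vector beta_star
```

Logging `beta` during training (rather than only at the end) shows the plateau-then-jump structure: each coordinate stays near zero for a while, then activates quickly, matching the saddle-to-saddle picture in the vanishing-initialisation limit.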


Related research

Incremental Learning in Diagonal Linear Networks (08/31/2022)
Diagonal linear networks (DLNs) are a toy simplification of artificial n...

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity (06/17/2021)
Understanding the implicit bias of training algorithms is of crucial imp...

A Unifying View on Implicit Bias in Training Linear Neural Networks (10/06/2020)
We study the implicit bias of gradient flow (i.e., gradient descent with...

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation (06/20/2022)
Understanding the implicit bias of training algorithms is of crucial imp...

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy (07/13/2020)
We provide a detailed asymptotic study of gradient flow trajectories and...

Transformers learn through gradual rank increase (06/12/2023)
We identify incremental learning dynamics in transformers, where the dif...

Connecting Optimization and Generalization via Gradient Flow Path Length (02/22/2022)
Optimization and generalization are two essential aspects of machine lea...
