Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

07/13/2020
by Edward Moroshko, et al.

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This is the simplest model displaying a transition between "kernel" and non-kernel ("rich" or "active") regimes. We show how the transition is controlled by the relationship between the initialization scale and how accurately we minimize the training loss. Our results indicate that some limit behaviors of gradient descent only kick in at ridiculous training accuracies (well beyond 10^-100). Moreover, the implicit bias at reasonable initialization scales and training accuracies is more complex and not captured by these limits.
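To make the abstract's setup concrete, here is a minimal sketch (not the authors' code): it assumes the standard depth-2 diagonal parameterization w = u*u - v*v from the diagonal-linear-network literature, trains it by plain gradient descent on the exponential loss, and exposes the initialization scale alpha that, per the abstract, interacts with training accuracy to control the kernel-versus-rich transition. The function name, data, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

# Sketch of a depth-2 "diagonal linear network": f(x) = <w, x> with w = u*u - v*v,
# trained by gradient descent on the exponential loss sum_i exp(-y_i * f(x_i)).
# The initialization scale `alpha` is the quantity the abstract discusses.

def train_diagonal_net(X, y, alpha=0.1, lr=0.01, steps=10000):
    n, d = X.shape
    u = np.full(d, alpha)              # u, v initialized at scale alpha
    v = np.full(d, alpha)
    for _ in range(steps):
        w = u * u - v * v              # effective linear predictor
        margins = y * (X @ w)
        # gradient of the exponential loss w.r.t. w
        g_w = -(X * (y * np.exp(-margins))[:, None]).sum(axis=0)
        # chain rule through the diagonal parameterization
        u -= lr * 2 * u * g_w
        v -= lr * (-2) * v * g_w
    return u * u - v * v

# Illustrative usage: small vs. large initialization scale
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
y = np.sign(X[:, 0] + 0.1 * X[:, 1])
w_small_init = train_diagonal_net(X, y, alpha=0.01)   # tends toward sparse, L1-like solutions
w_large_init = train_diagonal_net(X, y, alpha=10.0)   # stays closer to the kernel, L2-like regime
```

With a small alpha the learned predictor tends toward a sparse (L1-like) solution, while a large alpha keeps training close to the kernel (L2-like) regime; the paper's point is that how far gradient descent actually drives down the training loss also determines which of these limiting biases, if either, is observed.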


Related research

06/13/2019
Kernel and Deep Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

02/19/2021
On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Recent work has highlighted the role of initialization scale in determin...

05/13/2021
On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks
Neural networks trained via gradient descent with random initialization ...

02/20/2020
Kernel and Rich Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

06/12/2020
Implicit bias of gradient descent for mean squared error regression with wide neural networks
We investigate gradient descent training of wide neural networks and the...

04/02/2023
Saddle-to-Saddle Dynamics in Diagonal Linear Networks
In this paper we fully describe the trajectory of gradient flow over dia...

08/09/2023
How to induce regularization in generalized linear models: A guide to reparametrizing gradient flow
In this work, we analyze the relation between reparametrizations of grad...
