Implicit bias of SGD in L_2-regularized linear DNNs: One-way jumps from high to low rank

05/25/2023
by   Zihan Wang, et al.

The L_2-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices of different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima are easily avoided, since they do not fit the data, gradient descent can get stuck at rank-overestimating minima. We show that with SGD there is always a positive probability of jumping from a higher-rank minimum to a lower-rank one, while the probability of jumping back is zero. More precisely, we define a sequence of sets B_1 ⊂ B_2 ⊂ ⋯ ⊂ B_R such that B_r contains all minima of rank r or less (and no minima of higher rank) and is absorbing for small enough ridge parameter λ and learning rate η: SGD has probability 0 of leaving B_r, and from any starting point there is a non-zero probability that SGD enters B_r.
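To make the setting concrete, here is a minimal numpy sketch (not the paper's code) of SGD on an L_2-regularized three-factor deep linear network fitted to a few observed entries of a low-rank matrix. The widths, ridge parameter, learning rate, batch size, and rank threshold are illustrative assumptions; tracking the effective rank of the product during training lets one watch for jumps from higher to lower rank.

```python
# Minimal sketch: SGD on an L2-regularized deep linear network (3 factors)
# for a matrix-completion-style task. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, width, true_rank = 10, 10, 10, 2
lam, eta, steps, batch = 1e-3, 1e-2, 20000, 8

# Low-rank ground truth and a random mask of observed entries.
target = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, d))
mask = rng.random((n, d)) < 0.5
obs = np.argwhere(mask)

# Three-factor deep linear network, initialized at a near full-rank scale.
W1 = rng.normal(size=(width, d)) * 0.5
W2 = rng.normal(size=(width, width)) * 0.5
W3 = rng.normal(size=(n, width)) * 0.5

def effective_rank(M, tol=1e-2):
    # Number of singular values above a relative threshold.
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

for t in range(steps):
    # Mini-batch of observed entries -> stochastic gradient of the fit term.
    idx = obs[rng.integers(len(obs), size=batch)]
    P = W3 @ W2 @ W1
    E = np.zeros((n, d))
    E[idx[:, 0], idx[:, 1]] = P[idx[:, 0], idx[:, 1]] - target[idx[:, 0], idx[:, 1]]
    # Gradients of 0.5*|residual on batch|^2 + 0.5*lam*(|W1|^2 + |W2|^2 + |W3|^2).
    gW3 = E @ (W2 @ W1).T + lam * W3
    gW2 = W3.T @ E @ W1.T + lam * W2
    gW1 = (W3 @ W2).T @ E + lam * W1
    W1 -= eta * gW1
    W2 -= eta * gW2
    W3 -= eta * gW3
    if t % 5000 == 0:
        print(f"step {t:6d}  effective rank of W3 W2 W1: {effective_rank(W3 @ W2 @ W1)}")
```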


