Gradient descent aligns the layers of deep linear networks

10/04/2018
by Ziwei Ji, et al.

This paper establishes risk convergence and asymptotic weight matrix alignment --- a form of implicit regularization --- of gradient flow and gradient descent when applied to deep linear networks on linearly separable data. In more detail, for gradient flow applied to strictly decreasing loss functions (with similar results for gradient descent with particular decreasing step sizes): (i) the risk converges to 0; (ii) the normalized i-th weight matrix asymptotically equals its rank-1 approximation u_i v_i^⊤; (iii) these rank-1 matrices are aligned across layers, meaning |v_{i+1}^⊤ u_i| → 1. In the case of the logistic loss (binary cross entropy), more can be said: the linear function induced by the network --- the product of its weight matrices --- converges to the same direction as the maximum margin solution. This last property was identified in prior work, but only under assumptions on gradient descent which here are implied by the alignment phenomenon.
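The alignment claim can be checked numerically. Below is a minimal NumPy sketch, not code from the paper: it trains a three-layer linear network with gradient descent on the logistic loss over synthetic linearly separable data, and tracks |v_{i+1}^⊤ u_i| for adjacent layers using the top singular vectors of each weight matrix. The data, layer widths, step size, and helper names are all illustrative assumptions.

import numpy as np

# Sketch: gradient descent on a deep linear network f(x) = W3 W2 W1 x
# with logistic loss, on synthetic linearly separable data. Hypothetical
# setup; widths, step size, and iteration count are illustrative.

rng = np.random.default_rng(0)

n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))          # separable labels in {-1, +1}

# Three layers: W1 (d x d), W2 (d x d), W3 (1 x d), small random init.
W = [0.1 * rng.standard_normal(s) for s in [(d, d), (d, d), (1, d)]]

def risk_and_grads(W, X, y):
    W1, W2, W3 = W
    P = W3 @ W2 @ W1                             # induced linear function, 1 x d
    margins = y * (X @ P.ravel())
    risk = np.mean(np.logaddexp(0.0, -margins))  # logistic loss
    # dRisk/dP, computed stably, then chained through the matrix product.
    g = -(y * np.exp(-np.logaddexp(0.0, margins))) @ X / len(y)
    G = g.reshape(1, d)
    return risk, [(W3 @ W2).T @ G,               # dRisk/dW1
                  W3.T @ G @ W1.T,               # dRisk/dW2
                  G @ (W2 @ W1).T]               # dRisk/dW3

def alignments(W):
    # |v_{i+1}^T u_i|: top left singular vector of layer i against the
    # top right singular vector of layer i+1.
    out = []
    for Wi, Wj in zip(W[:-1], W[1:]):
        Ui, _, _ = np.linalg.svd(Wi)
        _, _, Vjt = np.linalg.svd(Wj)
        out.append(abs(Vjt[0] @ Ui[:, 0]))
    return out

eta = 0.5
for t in range(20001):
    risk, grads = risk_and_grads(W, X, y)
    W = [Wi - eta * Gi for Wi, Gi in zip(W, grads)]
    if t % 5000 == 0:
        print(f"step {t:6d}  risk {risk:.4f}  "
              + "alignment " + " ".join(f"{a:.4f}" for a in alignments(W)))

With a small initialization and enough steps, the printed risk should decay toward 0 and both alignment values should approach 1, matching claims (i)-(iii) above.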


