Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias

05/12/2021
by Guy Hacohen, et al.

Convolutional neural networks of different architectures seem to learn to classify images in the same order. To understand this phenomenon, we revisit the over-parameterized deep linear network model. Our analysis of this model's learning dynamics reveals that its parameters converge exponentially faster along directions corresponding to the larger principal components of the data, at a rate governed by the corresponding singular values. We term this convergence pattern the Principal Components bias (PC-bias). We show how the PC-bias streamlines the order of learning of both linear and non-linear networks, most prominently in the earlier stages of learning. We then compare our results to the spectral bias, showing that the two biases can be observed independently and affect the order of learning in different ways. Finally, we discuss how the PC-bias can explain several phenomena, including the benefits of prevalent initialization schemes, how early stopping may be related to PCA, and why deep networks converge more slowly when given random labels.
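
The PC-bias is easy to reproduce in a toy setting. Below is a minimal NumPy sketch, not taken from the paper's code; the dimensions, learning rate, and initialization scale are illustrative assumptions. A two-layer deep linear network is trained by full-batch gradient descent on synthetic data whose principal components have well-separated variances, and the remaining error of the end-to-end weight vector is printed per principal direction. Components aligned with the larger principal components shrink much faster than the rest.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, h = 2000, 5, 64                     # samples, input dim, hidden width

    # Synthetic data whose principal components have well-separated variances;
    # by construction the principal directions are (approximately) the axes.
    pc_std = np.array([3.0, 2.0, 1.0, 0.5, 0.25])
    X = rng.standard_normal((n, d)) * pc_std  # covariance ~ diag(pc_std**2)
    w_star = rng.standard_normal(d)           # ground-truth linear teacher
    y = X @ w_star

    # Over-parameterized two-layer deep linear model: f(x) = x @ W1 @ W2.
    W1 = 0.1 * rng.standard_normal((d, h))
    W2 = 0.1 * rng.standard_normal((h, 1))
    lr = 5e-3

    for step in range(3001):
        err = (X @ W1 @ W2)[:, 0] - y         # residuals, shape (n,)
        g = X.T @ err / n                     # gradient w.r.t. end-to-end weights
        dW1 = np.outer(g, W2[:, 0])           # chain rule through the factorization
        dW2 = (W1.T @ g)[:, None]
        W1 -= lr * dW1
        W2 -= lr * dW2
        if step % 500 == 0:
            w_eff = (W1 @ W2)[:, 0]           # end-to-end weight vector
            # Remaining error along each principal direction (largest first):
            # the leading entries decay much faster than the trailing ones.
            print(step, np.round(np.abs(w_eff - w_star), 3))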


