On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks

05/13/2021
by   Hancheng Min, et al.

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction for explaining this phenomenon is to study how initialization and overparametrization affect the convergence and implicit bias of training algorithms. In this paper, we present a novel analysis of single-hidden-layer linear networks trained under gradient flow, which connects initialization, optimization, and overparametrization. First, we show that the squared loss converges to its optimum exponentially fast, at a rate that depends on the level of imbalance of the initialization. Second, we show that proper initialization constrains the dynamics of the network parameters to lie within an invariant set; minimizing the loss over this set leads to the min-norm solution. Finally, we show that a large hidden-layer width, together with (properly scaled) random initialization, ensures proximity to such an invariant set during training, allowing us to derive a novel non-asymptotic upper bound on the distance between the trained network and the min-norm solution.
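To make the abstract's key quantities concrete, here is a minimal numerical sketch (illustrative only, not the authors' code): a single-hidden-layer linear network f(x) = W2 W1 x is trained on the squared loss with small-step gradient descent as a proxy for gradient flow, and the imbalance matrix D = W1 W1^T - W2^T W2, which is conserved along gradient-flow trajectories, is monitored alongside the loss. The dimensions, initialization scale, and variable names below are assumptions made for the example.

# Illustrative sketch (not the paper's code): single-hidden-layer linear network
# f(x) = W2 @ W1 @ x trained on squared loss; small-step gradient descent
# approximates gradient flow. The imbalance D = W1 W1^T - W2^T W2 is conserved
# under gradient flow, so its drift should stay near zero while the loss decays.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 5, 50, 3, 100

A_true = rng.normal(size=(d_out, d_in))      # ground-truth linear map
X = rng.normal(size=(d_in, n))               # training inputs
Y = A_true @ X                               # training targets

# Random initialization scaled with the hidden-layer width
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_hidden)
W2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)

eta = 1e-3                                   # small step size ~ gradient flow
D0 = W1 @ W1.T - W2.T @ W2                   # imbalance at initialization

for _ in range(10000):
    R = W2 @ W1 @ X - Y                      # residual on the training data
    g2 = (R @ (W1 @ X).T) / n                # dL/dW2
    g1 = (W2.T @ R @ X.T) / n                # dL/dW1
    W2 -= eta * g2
    W1 -= eta * g1

loss = 0.5 * np.mean(np.sum((W2 @ W1 @ X - Y) ** 2, axis=0))
drift = np.linalg.norm(W1 @ W1.T - W2.T @ W2 - D0)
print(f"squared loss: {loss:.3e}, imbalance drift: {drift:.3e}")

Running this, the loss decays toward zero while the imbalance drift remains small, illustrating the conserved quantity that the paper's convergence rate and invariant-set arguments are built around.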


Related research

07/13/2020 - Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
09/30/2022 - On the optimization and generalization of overparameterized implicit neural networks
06/12/2020 - Implicit bias of gradient descent for mean squared error regression with wide neural networks
05/18/2022 - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
05/12/2021 - Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias
08/25/2021 - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
07/13/2023 - Implicit regularization in AI meets generalized hardness of approximation in optimization – Sharp results for diagonal linear networks
