Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks

04/30/2019
by Gauthier Gidel, et al.

When optimizing over-parameterized models such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and of its hyper-parameters introduces biases that lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered an implicit regularization for the training of over-parameterized models. In this work, we push this idea further by studying the discrete gradient dynamics of the training of a two-layer linear network with the least-squares loss. Using a time rescaling, we show that, with a vanishing initialization and a small enough step size, these dynamics sequentially learn components that are the solutions of a reduced-rank regression with a gradually increasing rank.
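A minimal numerical sketch of this setting (not the authors' code; the network sizes, teacher map, initialization scale, and step size below are illustrative assumptions): full-batch gradient descent on a two-layer linear network W2 W1 trained with the least-squares loss from a vanishing initialization. Tracking the singular values of the product W2 W1 illustrates the components being picked up one after the other, in line with the sequential learning described above.

```python
# Sketch: gradient descent on a two-layer linear net f(x) = W2 @ W1 @ x
# with least-squares loss, small initialization, and small step size.
# All constants are illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 5, 100                            # input/output dim, hidden width, samples
X = rng.standard_normal((n, d))
A_true = np.diag([3.0, 2.0, 1.0, 0.0, 0.0])    # low-rank teacher map
Y = X @ A_true.T

init_scale, lr, steps = 1e-3, 1e-2, 3000       # vanishing init, small step size
W1 = init_scale * rng.standard_normal((k, d))
W2 = init_scale * rng.standard_normal((d, k))

for t in range(steps):
    R = X @ (W2 @ W1).T - Y                    # residuals, shape (n, d)
    G = (R.T @ X) / n                          # gradient of the loss w.r.t. the product W2 @ W1
    gW2, gW1 = G @ W1.T, W2.T @ G              # chain rule for each layer
    W2 -= lr * gW2
    W1 -= lr * gW1
    if t % 200 == 0:
        svals = np.linalg.svd(W2 @ W1, compute_uv=False)
        print(f"step {t:4d}  singular values of W2 @ W1: {np.round(svals, 3)}")
```

With a small enough initialization, the printed singular values tend to rise toward 3, then 2, then 1 at separated times, so the end-to-end map passes through approximate reduced-rank regression solutions of increasing rank.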


Related research

12/29/2022 · A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization
10/20/2021 · Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks
09/30/2022 · On the optimization and generalization of overparameterized implicit neural networks
02/19/2019 · Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network
06/30/2021 · Deep Linear Networks Dynamics: Low-Rank Biases Induced by Initialization Scale and L2 Regularization
04/19/2019 · Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process
09/27/2022 · Why neural networks find simple solutions: the many regularizers of geometric complexity
