Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

04/14/2022
by Rama Cont, et al.

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, viewed as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of the norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
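
As a concrete illustration of the setting described in the abstract, the sketch below builds a residual network with constant layer width and a smooth (tanh) activation, and trains it with plain full-batch gradient descent on a least-squares loss. The specific residual parameterization (a 1/depth scaling of each residual block), the loss, and the hyperparameters are illustrative assumptions, not the exact setup analyzed in the paper.

    # Minimal sketch, assuming a residual update x <- x + W sigma(x) / L with
    # constant width d and depth L; the paper's precise scaling may differ.
    import torch
    import torch.nn as nn

    class ConstantWidthResNet(nn.Module):
        def __init__(self, width: int, depth: int):
            super().__init__()
            self.depth = depth
            # One d x d weight matrix per residual layer (no biases, for simplicity).
            self.layers = nn.ModuleList(
                [nn.Linear(width, width, bias=False) for _ in range(depth)]
            )
            self.readout = nn.Linear(width, 1, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            for layer in self.layers:
                # Residual update with smooth activation and 1/depth scaling.
                x = x + layer(torch.tanh(x)) / self.depth
            return self.readout(x)

    def train(model, x, y, lr=0.1, steps=1000):
        # Plain full-batch gradient descent on a least-squares loss.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        return loss.item()

    if __name__ == "__main__":
        torch.manual_seed(0)
        d, L, n = 32, 64, 256          # width, depth, number of samples
        x = torch.randn(n, d)
        y = torch.randn(n, 1)
        model = ConstantWidthResNet(width=d, depth=L)
        print("final training loss:", train(model, x, y))

In this toy setup one can track the per-layer weight matrices after training and inspect how they vary with the layer index, which is the quantity whose scaling limit the paper characterizes.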

