
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

by Rama Cont et al.

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
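The setting described in the abstract can be illustrated numerically. The sketch below (not the authors' experiments; the task, dimensions, 1/L depth scaling, and learning rate are illustrative assumptions) trains a constant-width residual network with a smooth (tanh) activation by full-batch gradient descent and records the training loss, which decreases steadily as the theory predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, L = 4, 32, 16          # width, number of samples, depth
h, lr = 1.0 / L, 0.5         # 1/L residual scaling (assumed), step size

X = rng.standard_normal((d, n))
Y = np.sin(X)                            # smooth synthetic regression target
W = [0.1 * rng.standard_normal((d, d)) for _ in range(L)]  # small init

def forward(W, X):
    """Residual forward pass: x_{l+1} = x_l + h * tanh(W_l x_l)."""
    acts = [X]
    for Wl in W:
        acts.append(acts[-1] + h * np.tanh(Wl @ acts[-1]))
    return acts

losses = []
for step in range(200):
    acts = forward(W, X)
    err = acts[-1] - Y
    losses.append(0.5 * np.mean(np.sum(err ** 2, axis=0)))

    # Backpropagate through the residual recursion.
    g = err / n                          # dLoss/dx^L
    grads = [None] * L
    for l in range(L - 1, -1, -1):
        pre = W[l] @ acts[l]
        dpre = h * g * (1.0 - np.tanh(pre) ** 2)   # dLoss/d(W_l x_l)
        grads[l] = dpre @ acts[l].T
        g = g + W[l].T @ dpre            # dLoss/dx^l via the skip connection

    for l in range(L):
        W[l] -= lr * grads[l]

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With this 1/L scaling of the residual blocks, one can also inspect the trained weights `W[l]` as a function of l/L; their smooth variation across layers is the finite-depth analogue of the Hölder-continuous scaling limit established in the paper.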


Related research:

- Scaling Properties of Deep Residual Networks
- Implicit regularization of deep residual networks towards neural ODEs
- Deep orthogonal linear networks are shallow
- On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks
- Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
- AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
- How implicit regularization of Neural Networks affects the learned function – Part I