Implicit regularization of deep residual networks towards neural ODEs

09/03/2023
by Pierre Marion, et al.

Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then it remains such a discretization throughout training. Our results hold for a finite training time, and also as the training time tends to infinity, provided that the network satisfies a Polyak-Łojasiewicz condition. Importantly, this condition holds for a family of residual networks whose residuals are two-layer perceptrons with only linear overparameterization in width, and it implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
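To make the phrase "discretization of a neural ODE" concrete, here is a minimal NumPy sketch, not the authors' code. It assumes a residual update of the form h_{k+1} = h_k + (1/L) V_k tanh(W_k h_k), with the layer weights sampled at times k/L from a smooth path in depth. The 1/L scaling, the tanh activation, the dimensions, and the helper names (`weights_at`, `resnet_forward`) are all illustrative assumptions. Under these assumptions the forward pass is an explicit Euler scheme for dh/dt = V(t) tanh(W(t) h(t)) on [0, 1], so the output should stabilize as the depth L grows.

```python
import numpy as np

# Hypothetical dimensions: state size d, hidden width m of each two-layer residual.
d, m = 4, 8
rng = np.random.default_rng(0)
# Fixed base matrices; the weights at "time" t interpolate smoothly between them.
W0, W1 = rng.normal(size=(2, m, d)) / np.sqrt(d)
V0, V1 = rng.normal(size=(2, d, m)) / np.sqrt(m)

def weights_at(t):
    """Smooth (hence Lipschitz) weight path t -> (W(t), V(t)) on [0, 1]."""
    c, s = np.cos(t), np.sin(t)
    return c * W0 + s * W1, c * V0 + s * V1

def resnet_forward(x, depth):
    """Residual network whose k-th layer uses the weights sampled at t = k / depth.

    Each update h <- h + (1/depth) * V tanh(W h) is one explicit Euler step
    of the ODE dh/dt = V(t) tanh(W(t) h).
    """
    h = np.array(x, dtype=float)
    for k in range(1, depth + 1):
        W, V = weights_at(k / depth)
        h = h + (V @ np.tanh(W @ h)) / depth
    return h

x = np.ones(d)
# A very deep network serves as a proxy for the continuous-depth (ODE) output.
reference = resnet_forward(x, 4096)
errors = [np.linalg.norm(resnet_forward(x, L) - reference) for L in (8, 64, 512)]
print(errors)
```

The printed distances to the deep reference should shrink as the depth increases, which is the discrete-to-continuous correspondence that the abstract says is preserved throughout training. The sketch only illustrates the forward pass at a smooth set of weights; it does not simulate gradient flow.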

