A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

03/11/2020
by Yiping Lu, et al.

Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks, although the optimization landscape is known to be highly non-convex. To understand this success, this work presents a mean-field analysis of deep residual networks, building on a line of works that interpret the continuum limit of a deep residual network as an ordinary differential equation as the network capacity tends to infinity. Specifically, we propose a new continuum limit of deep residual networks that enjoys a good landscape in the sense that every local minimizer is global. This characterization enables us to derive the first global convergence result for multilayer neural networks in the mean-field regime. Rather than assuming convexity of the loss landscape, our proof relies on a zero-loss condition at the global minimizer, which holds whenever the model satisfies a universal approximation property. Key to our result is the observation that a deep residual network resembles a shallow network ensemble, i.e., a two-layer network. We bound the difference between this shallow network and our ResNet model via the adjoint sensitivity method, which allows us to apply existing mean-field analyses of two-layer networks to deep networks. Finally, we propose several novel training schemes based on the new continuous model, including one procedure that switches the order of the residual blocks and yields strong empirical performance on benchmark datasets.
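To make the continuum-limit view concrete: a depth-L ResNet whose residual blocks are scaled by 1/L is a forward-Euler discretization of an ODE dx/dt = f(x, θ(t)), and in that limit the output depends on the blocks largely through their distribution, which is why reordering blocks is a natural training perturbation. The sketch below is a minimal illustration of this reading, not the authors' implementation; the ResidualODENet module, the 1/L step-size scaling, and the per-epoch random block permutation are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class ResidualODENet(nn.Module):
    """Depth-L ResNet read as a forward-Euler discretization of
    dx/dt = f(x, theta(t)):  x_{l+1} = x_l + (1/L) * f_l(x_l)."""

    def __init__(self, dim: int, depth: int, width: int = 64):
        super().__init__()
        self.depth = depth
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, width), nn.ReLU(), nn.Linear(width, dim))
             for _ in range(depth)]
        )
        self.head = nn.Linear(dim, 1)

    def forward(self, x, order=None):
        # `order` permutes the residual blocks; identity order by default.
        order = range(self.depth) if order is None else order
        for l in order:
            x = x + self.blocks[l](x) / self.depth  # Euler step of size 1/L
        return self.head(x)

def train_with_block_shuffling(model, loader, epochs=10, lr=1e-3):
    """Illustrative training loop: re-draw a random block order each epoch,
    mimicking (not reproducing) the paper's order-switching scheme."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        order = torch.randperm(model.depth).tolist()
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x, order), y)
            loss.backward()
            opt.step()
```

As depth grows, the sum of 1/L-scaled block updates behaves like an integral over a distribution of block parameters, which is the sense in which the deep ResNet resembles a two-layer (mean-field) network in the analysis.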


Related research

05/11/2021 - Global Convergence of Three-layer Neural Networks in the Mean Field Regime
In the mean field regime, neural networks are appropriately scaled so th...

05/30/2021 - Overparameterization of deep ResNet: zero loss and mean-field analysis
Finding parameters in a deep neural network (NN) that fit training data ...

12/10/2021 - Global convergence of ResNets: From finite to infinite width using linear parameterization
Overparametrization is a key factor in the absence of convexity to expla...

10/13/2022 - Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
The stochastic heavy ball method (SHB), also known as stochastic gradien...

07/05/2023 - From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks
In our work, we build upon the established connection between Residual N...

10/06/2021 - On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime
Finding the optimal configuration of parameters in ResNet is a nonconvex...

05/19/2019 - Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks
We present a probabilistic analysis of the long-time behaviour of the no...
