Overparameterization of deep ResNet: zero loss and mean-field analysis

05/30/2021
by Zhiyan Ding, et al.

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, yet a basic first-order method (gradient descent) finds a global solution with perfect fit in many practical situations. We examine this phenomenon for Residual Neural Networks (ResNets) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that, in the large-NN limit, gradient descent on the parameters becomes a gradient flow for a probability distribution, characterized by a partial differential equation (PDE). Next, we show that the solution of this PDE converges to a zero-loss solution as the training time goes to infinity. Together, these results imply that training a sufficiently large ResNet also yields near-zero loss. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.
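To make the limiting regime concrete, the following LaTeX sketch writes out the kind of mean-field limit the abstract describes. The notation (residual map f, parameters theta, distribution rho, loss functional E) and the 1/(LM) scaling are illustrative assumptions, not taken from the paper itself.

% A minimal sketch of the mean-field ResNet limit described above.
% All notation here (f, \theta, \rho, E) is assumed for illustration,
% not quoted from the paper.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% A discrete ResNet with L layers and M neurons per layer, with the
% residual updates scaled by 1/(LM) so that a limit can exist:
\[
  z_{l+1} = z_l + \frac{1}{LM} \sum_{m=1}^{M} f(z_l, \theta_{l,m}),
  \qquad l = 0, \dots, L-1 .
\]

% As L and M go to infinity, the layer index becomes a continuous
% depth variable s in [0,1], and the parameters of each layer are
% replaced by a probability distribution \rho(\theta, s):
\[
  \frac{\mathrm{d} z(s)}{\mathrm{d} s}
  = \int f\bigl(z(s), \theta\bigr)\, \rho(\theta, s)\, \mathrm{d}\theta .
\]

% Gradient descent on the parameters then corresponds, in the limit,
% to a gradient-flow PDE for \rho with respect to a loss functional E:
\[
  \partial_t \rho
  = \nabla_\theta \cdot
    \Bigl( \rho \, \nabla_\theta \frac{\delta E(\rho)}{\delta \rho} \Bigr) .
\]
% The zero-loss statement is then that E(\rho(t)) tends to 0 as the
% training time t goes to infinity.

\end{document}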



Related research

On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime (10/06/2021)
Finding the optimal configuration of parameters in ResNet is a nonconvex...

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth (03/11/2020)
Training deep neural networks with stochastic gradient descent (SGD) can...

Scaling limit of the Stein variational gradient descent part I: the mean field regime (05/10/2018)
We study an interacting particle system in R^d motivated by Stein variat...

The edge of chaos: quantum field theory and deep neural networks (09/27/2021)
We explicitly construct the quantum field theory corresponding to a gene...

Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size (11/11/2019)
Deep neural networks are highly expressive machine learning models with ...

Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training (10/29/2021)
The mean field (MF) theory of multilayer neural networks centers around ...

Predicting Training Time Without Training (08/28/2020)
We tackle the problem of predicting the number of optimization steps tha...