Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network

03/17/2019
by   Huishuai Zhang, et al.

It has been proved that gradient descent converges linearly to a global minimum when training deep neural networks in the over-parameterized regime. However, according to Allen-Zhu et al. (2018), the width of each layer of a residual network (ResNet) must grow at least polynomially with the depth (the number of layers) to guarantee linear convergence of gradient descent, which shows no obvious advantage over a feedforward network. In this paper, we remove the dependence of the width on the depth for ResNet and conclude that training a deep residual network can be as easy as training a two-layer network. This theoretically justifies the benefit of skip connections in facilitating the convergence of gradient descent. Our experiments also confirm that the width required to train a ResNet successfully is much smaller than that required for a deep feedforward network.
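To make the role of the skip connection concrete, here is a minimal NumPy sketch (an illustration, not the paper's construction) contrasting a residual block with its feedforward counterpart. With the residual branch initialized to zero, the block is exactly the identity map, so stacking many such blocks neither shrinks nor amplifies the signal, which is the intuition behind why depth does not obstruct training a ResNet the way it does a plain feedforward network. The names `residual_block` and `feedforward_block` are hypothetical helpers for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """One residual block: the skip connection adds x back to the branch output."""
    return x + W2 @ relu(W1 @ x)

def feedforward_block(x, W1, W2):
    """The same block without the skip connection."""
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
d, m = 4, 16  # input dimension and hidden width
x = rng.standard_normal(d)
W1 = rng.standard_normal((m, d)) / np.sqrt(d)
W2 = np.zeros((d, m))  # residual branch initialized to zero

# With a zero residual branch, the block is the identity map:
print(np.allclose(residual_block(x, W1, W2), x))  # True
# The plain feedforward block instead collapses the input to zero:
print(np.allclose(feedforward_block(x, W1, W2), np.zeros(d)))  # True
```

Composing many residual blocks at this initialization still yields the identity, whereas composing the feedforward blocks kills the signal after a single layer, illustrating why depth is far more forgiving in the residual architecture.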

Related research

11/09/2018 - Gradient Descent Finds Global Minima of Deep Neural Networks
Gradient descent finds a global minimum in training deep neural networks...

11/11/2019 - Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size
Deep neural networks are highly expressive machine learning models with ...

12/19/2014 - Random Walk Initialization for Training Very Deep Feedforward Networks
Training very deep networks is an important open problem in machine lear...

04/18/2018 - Are ResNets Provably Better than Linear Predictors?
A residual network (or ResNet) is a standard deep neural net architectur...

06/11/2021 - Neural Optimization Kernel: Towards Robust Deep Learning
Recent studies show a close connection between neural networks (NN) and ...

10/10/2022 - A global analysis of global optimisation
Theoretical understanding of the training of deep neural networks has ma...

12/22/2016 - Highway and Residual Networks learn Unrolled Iterative Estimation
The past year saw the introduction of new architectures such as Highway ...
