On the Global Convergence of Training Deep Linear ResNets

03/02/2020
by   Difan Zou, et al.

We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD) for training L-hidden-layer linear residual networks (ResNets). We prove that for training deep residual networks with certain linear transformations at the input and output layers, which are fixed throughout training, both GD and SGD with zero initialization on all hidden weights can converge to the global minimum of the training loss. Moreover, when specializing to appropriate Gaussian random linear transformations, GD and SGD provably optimize wide enough deep linear ResNets. Compared with the global convergence result of GD for training standard deep linear networks (Du & Hu, 2019), our condition on the neural network width is sharper by a factor of O(κL), where κ denotes the condition number of the covariance matrix of the training data. We further propose modified identity input and output transformations, and show that a (d+k)-wide neural network is sufficient to guarantee the global convergence of GD/SGD, where d and k are the input and output dimensions, respectively.
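
To make the setting concrete, the sketch below implements the architecture the abstract describes: a deep linear ResNet whose hidden layers compute h -> h(I + W_l), with linear input/output transformations A and B that are fixed throughout training (here Gaussian, as in the paper's specialization), zero initialization on all hidden weights W_l, and full-batch gradient descent on the squared loss. This is a minimal illustration, not the authors' construction or experiments; the dimensions, scalings, learning rate, and synthetic data are all assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, k, m, L, n = 10, 3, 64, 8, 200           # input dim, output dim, width, depth, #samples (assumed)

X = rng.standard_normal((n, d))                       # synthetic training inputs (assumed)
Y = X @ (rng.standard_normal((d, k)) / np.sqrt(d))    # synthetic linear targets (assumed)

A = rng.standard_normal((d, m)) / np.sqrt(d)   # fixed Gaussian input transformation (scaling assumed)
B = rng.standard_normal((m, k)) / np.sqrt(m)   # fixed Gaussian output transformation (scaling assumed)
W = [np.zeros((m, m)) for _ in range(L)]       # zero initialization on all hidden weights

lr = 5e-3                                      # step size (assumed)
for step in range(1000):
    # Forward pass: f(X) = X A (I + W_1) ... (I + W_L) B, caching each hidden state.
    Hs = [X @ A]
    for Wl in W:
        Hs.append(Hs[-1] + Hs[-1] @ Wl)        # residual block: h <- h (I + W_l)
    out = Hs[-1] @ B
    loss = 0.5 * np.sum((out - Y) ** 2) / n

    # Backward pass through the residual blocks (manual chain rule).
    G = (out - Y) @ B.T / n                    # gradient w.r.t. the last hidden state
    grads = [None] * L
    for l in reversed(range(L)):
        grads[l] = Hs[l].T @ G                 # dLoss/dW_l
        G = G + G @ W[l].T                     # propagate back through (I + W_l)

    # Full-batch GD step on the hidden weights only; A and B stay fixed throughout.
    for l in range(L):
        W[l] -= lr * grads[l]

print("training loss after GD:", loss)

Because A has full row rank and B has full column rank with high probability, the end-to-end map can realize the target exactly, so the printed training loss should approach the global minimum (zero) in this synthetic setup.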

Related research

10/04/2018 - A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
We analyze speed of convergence to global optimum for gradient descent t...

01/24/2019 - Width Provably Matters in Optimization for Deep Linear Neural Networks
We prove that for an L-layer fully-connected linear neural network, if t...

06/07/2021 - Batch Normalization Orthogonalizes Representations in Deep Random Networks
This paper underlines a subtle property of batch-normalization (BN): Suc...

03/14/2017 - Convergence of Deep Neural Networks to a Hierarchical Covariance Matrix Decomposition
We show that in a deep neural network trained with ReLU, the low-lying l...

10/04/2020 - Feature Whitening via Gradient Transformation for Improved Convergence
Feature whitening is a known technique for speeding up training of DNN. ...

06/11/2019 - An Improved Analysis of Training Over-parameterized Deep Neural Networks
A recent line of research has shown that gradient-based algorithms with ...

11/28/2018 - Shared Representational Geometry Across Neural Networks
Different neural networks trained on the same dataset often learn simila...
