One of the mysteries in deep learning is that randomly initialized first-order methods like gradient descent achieve zero training loss, even if the labels are arbitrary (Zhang et al., 2016). Over-parameterization is widely believed to be the main reason for this phenomenon: only if the neural network has sufficiently large capacity can it fit all the training data. In practice, many neural network architectures are highly over-parameterized. For example, Wide Residual Networks have 100x more parameters than the number of training data points (Zagoruyko and Komodakis, 2016).
The second mysterious phenomenon in training deep neural networks is that "deeper networks are harder to train." To solve this problem, He et al. (2016) proposed the deep residual network (ResNet) architecture, which enables randomly initialized first-order methods to train neural networks with an order of magnitude more layers. Theoretically, Hardt and Ma (2016) showed that residual links in linear networks prevent gradient vanishing in a large neighborhood of zero, but for neural networks with non-linear activations, the advantages of using residual connections are not well understood.
In this paper, we demystify these two mysterious phenomena. We consider the setting where there are n data points and the neural network has H layers of width m. We focus on the least-squares loss and assume the activation function is Lipschitz and smooth. This assumption holds for many activation functions, including the soft-plus. Our contributions are summarized below.
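As a quick numerical check of this assumption for the guiding soft-plus example, σ(z) = log(1 + exp(z)): its first derivative is the sigmoid, bounded by 1, and its second derivative is bounded by 1/4, so both the Lipschitz and smoothness constants are bounded (a minimal sketch; the helper names are ours):

```python
import numpy as np

def softplus(z):
    """Soft-plus activation sigma(z) = log(1 + exp(z))."""
    return np.logaddexp(0.0, z)  # numerically stable form

def softplus_grad(z):
    """sigma'(z) = 1 / (1 + exp(-z)), the sigmoid; bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softplus_curv(z):
    """sigma''(z) = sigmoid(z) * (1 - sigmoid(z)); bounded by 1/4."""
    s = softplus_grad(z)
    return s * (1.0 - s)

z = np.linspace(-40.0, 40.0, 200_001)
print(softplus_grad(z).max())  # Lipschitz constant of sigma: at most 1
print(softplus_curv(z).max())  # smoothness constant of sigma: at most 1/4
```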
Next, we consider the ResNet architecture. We show that as long as the width m per layer is polynomial in the number of data points n and the number of layers H, randomly initialized gradient descent converges to zero training loss at a linear rate. Compared with the first result, the dependence on the number of layers improves exponentially for ResNet. This theory demonstrates the advantage of using residual connections.
Lastly, we apply the same technique to analyze convolutional ResNet. We show that if the width is polynomial in n, H, and p, where p is the number of patches, then randomly initialized gradient descent achieves zero training loss.
Our proof builds on two ideas from previous work on gradient descent for two-layer neural networks. First, we use the observation by Li and Liang (2018) that if the neural network is over-parameterized, every weight matrix stays close to its initialization. Second, following Du et al. (2018b), we analyze the dynamics of the predictions, whose convergence is determined by the least eigenvalue of the Gram matrix induced by the neural network architecture; to lower bound this least eigenvalue, it suffices to bound the distance of each weight matrix from its initialization.
Different from these two works, in analyzing deep neural networks, we need to exploit more structural properties of deep neural networks and develop new techniques. See Section 6 and the Appendix for more details.
This paper is organized as follows. In Section 2, we formally state the problem setup. In Section 3, we give our main result for the deep fully-connected neural network. In Section 4, we give our main result for the ResNet. In Section 5, we give our main result for the convolutional ResNet. In Section 6, we present a unified proof strategy for these three architectures. We conclude in Section 7 and defer all proofs to the appendix.
1.2 Related Works
Recently, many works have tried to study the optimization problem in deep learning. Since optimizing a neural network is a non-convex problem, one approach is first to develop a general theory for a class of non-convex problems satisfying desired geometric properties, and then to show that the neural network optimization problem belongs to this class. One promising candidate class is the set of functions in which all local minima are global and every saddle point has a direction of negative curvature. For this function class, researchers have shown that gradient descent (Jin et al., 2017; Ge et al., 2015; Lee et al., 2016; Du et al., 2017a) can find a global minimum. Many previous works thus try to study the optimization landscape of neural networks with different activation functions (Soudry and Hoffer, 2017; Safran and Shamir, 2018, 2016; Zhou and Liang, 2017; Freeman and Bruna, 2016; Hardt and Ma, 2016; Nguyen and Hein, 2017; Kawaguchi, 2016; Venturi et al., 2018; Soudry and Carmon, 2016; Du and Lee, 2018; Soltanolkotabi et al., 2018; Haeffele and Vidal, 2015). However, even for a deep linear network, there exists a saddle point without a direction of negative curvature (Kawaguchi, 2016), so it is unclear whether this approach can be used to obtain the global convergence guarantee of first-order methods.
Another way to attack this problem is to study the dynamics of a specific algorithm for a specific neural network architecture. Our paper also belongs to this category. Many previous works place assumptions on the input distribution and assume the labels are generated by a planted neural network. Based on these assumptions, one can obtain global convergence of gradient descent for some shallow neural networks (Tian, 2017; Soltanolkotabi, 2017; Brutzkus and Globerson, 2017; Du et al., 2018a; Li and Yuan, 2017; Du et al., 2017b). Some local convergence results have also been proved (Zhong et al., 2017a, b; Zhang et al., 2018). In comparison, our paper does not try to recover the underlying neural network. Instead, we focus on the empirical loss minimization problem and rigorously prove that randomly initialized gradient descent can achieve zero training loss.
The most related papers are Li and Liang (2018) and Du et al. (2018b), who observed that when training a two-layer fully connected neural network, most of the activation patterns do not change over iterations, an observation we also use to show the stability of the Gram matrix. They used this observation to obtain convergence rates of gradient descent on two-layer over-parameterized neural networks for the cross-entropy and least-squares losses. More recently, Allen-Zhu et al. (2018a) generalized ideas from Li and Liang (2018) to derive convergence rates for training recurrent neural networks. Our work extends these previous results in several ways: (a) we consider deep networks, (b) we generalize to ResNet architectures, and (c) we generalize to convolutional networks. To improve the width dependence on the sample size n, we utilize a smooth activation (e.g., smooth ReLU). For example, our results specialized to depth one improve upon Du et al. (2018b) in the required amount of over-parameterization. See Theorem 3.1 for the precise statement.
Chizat and Bach (2018); Wei et al. (2018); Mei et al. (2018) used optimal transport theory to analyze gradient descent on over-parameterized models. However, their results are limited to two-layer neural networks and may require an exponential amount of over-parametrization.
Daniely (2017) developed the connection between deep neural networks and kernel methods, and showed that stochastic gradient descent can learn a function that is competitive with the best function in the conjugate kernel space of the network. Andoni et al. (2014) showed that gradient descent can learn networks that are competitive with polynomial classifiers. However, these results do not imply that gradient descent can find a global minimum of the empirical loss minimization problem.
Finally, in concurrent work, Allen-Zhu et al. (2018b) also analyze gradient descent on deep neural networks. The primary difference between the two papers is that we analyze general smooth activations, whereas Allen-Zhu et al. (2018b) develop a specific analysis for the ReLU activation. The two papers also differ significantly in their data assumptions. We wish to emphasize that a fair comparison is not possible due to the difference in settings and data assumptions. We view the two papers as complementary since they address different neural network architectures.
For ResNet, the primary focus of this manuscript, we compare the required width per layer. (In all comparisons, we ignore the polynomial dependency on data-dependent parameters, which only depend on the input data and the activation function; the two papers use different such measures, so the bounds are not directly comparable.) Our width bound for Theorem 4.1 does not depend on the desired accuracy, and the relevant least eigenvalue does not depend on the depth H. As a consequence, Theorem 4.1 guarantees the convergence of gradient descent to a global minimizer. The iteration complexities of Allen-Zhu et al. (2018b) and of Theorem 4.1 also differ; see Theorem 4.1 for the precise statement.
For fully-connected networks, Allen-Zhu et al. (2018b) and Theorem 3.1 require different widths and iteration complexities. The primary difference is that for very deep fully-connected networks, Allen-Zhu et al. (2018b) require less width and fewer iterations, but we have a much milder dependence on the sample size n. Commonly used fully-connected networks such as VGG are not deep, yet dataset sizes such as that of ImageNet are very large.
In a second concurrent work, Zou et al. (2018) also analyzed the convergence of gradient descent on fully-connected networks with ReLU activation. Their emphasis is on different loss functions (e.g., hinge loss), so the results are not directly comparable. Both Zou et al. (2018) and Allen-Zhu et al. (2018b) also analyze stochastic gradient descent.
2.1 Notation

We let [n] = {1, 2, ..., n}. Given a set S, we use unif{S} to denote the uniform distribution over S. We use N(0, 1) to denote the standard Gaussian distribution. For a matrix A, we use A_ij to denote its (i, j)-th entry. For a vector v, we use ||v||_2 to denote the Euclidean norm. For a matrix A, we use ||A||_F to denote the Frobenius norm and ||A||_2 to denote the operator norm. If a matrix A is positive semi-definite, we use λ_min(A) to denote its smallest eigenvalue. We use <·, ·> to denote the standard Euclidean inner product between two vectors or matrices. We use σ(·) to denote the activation function, which in this paper we assume is Lipschitz continuous and smooth (Lipschitz gradient). The guiding example is the soft-plus, σ(z) = log(1 + exp(z)), whose Lipschitz and smoothness constants are bounded by 1. We will use c and C (e.g., c_σ) to denote universal constants that depend only on the activation. Lastly, let O(·) and Ω(·) denote standard Big-O and Big-Omega notation, hiding only absolute constants.
2.2 Problem Setup
In this paper, we focus on the empirical risk minimization problem with the quadratic loss function

L(θ) = (1/2) Σ_{i=1}^{n} (f(θ, x_i) − y_i)²,    (1)

where x_1, ..., x_n are the training inputs, y_1, ..., y_n are the labels, θ is the parameter we optimize over, and f is the prediction function, which in our case is a neural network. We consider the following architectures.
Multilayer fully-connected neural networks: Let x ∈ R^d be the input, W^(1) ∈ R^{m×d} the first weight matrix, W^(h) ∈ R^{m×m} the weight matrix at the h-th layer for 2 ≤ h ≤ H, a ∈ R^m the output layer, and σ(·) the activation function. (We assume intermediate layers are square matrices for simplicity; it is not difficult to generalize our analysis to rectangular weight matrices.) The prediction function is defined recursively as

x^(0) = x,   x^(h) = sqrt(c_σ / m) · σ(W^(h) x^(h−1)) for 1 ≤ h ≤ H,   f(x, θ) = a^T x^(H),

where c_σ > 0 is a scaling factor that normalizes the input in the initialization phase.
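A minimal numerical sketch of this forward pass, assuming the soft-plus activation and taking c_σ = 1 / E_{z∼N(0,1)}[σ(z)²] (our reading of the normalizing constant, estimated here by Monte Carlo; the function names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.logaddexp(0.0, z)

# Normalizing constant c_sigma = 1 / E_{z ~ N(0,1)}[sigma(z)^2],
# estimated by Monte Carlo, so that each scaled layer approximately
# preserves the norm of a unit-norm input at initialization.
c_sigma = 1.0 / np.mean(softplus(rng.standard_normal(1_000_000)) ** 2)

def fc_forward(x, Ws, a):
    """f(x, theta) = a^T x^(H), x^(h) = sqrt(c_sigma/m) sigma(W^(h) x^(h-1))."""
    h = x
    for W in Ws:
        m = W.shape[0]
        h = np.sqrt(c_sigma / m) * softplus(W @ h)
    return a @ h

d, m, H = 10, 1024, 5                           # hypothetical toy sizes
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                          # unit-norm input
Ws = [rng.standard_normal((m, d))] + \
     [rng.standard_normal((m, m)) for _ in range(H - 1)]  # W_ij ~ N(0, 1)
a = rng.choice([-1.0, 1.0], size=m)             # Rademacher output layer
print(fc_forward(x, Ws, a))
```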
ResNet: We use the same notation as for the multilayer fully-connected neural network and define the prediction recursively:

x^(1) = sqrt(c_σ / m) · σ(W^(1) x),   x^(h) = x^(h−1) + (c_res / (H · sqrt(m))) · σ(W^(h) x^(h−1)) for 2 ≤ h ≤ H,   f(x, θ) = a^T x^(H),

where c_res is a small constant that depends only on the activation; c_σ and c_res are constants specified in Section 4. (We will refer to this architecture as ResNet, although it differs from the standard ResNet architecture in that it has skip-connections at every layer instead of every two layers. This architecture was previously studied in Hardt and Ma (2016); we study it for ease of presentation and analysis, and it is not hard to generalize our analysis to architectures with skip-connections every two or more layers.) Note here we use a c_res / (H · sqrt(m)) scaling. This scaling plays an important role in guaranteeing that the width per layer only needs to scale polynomially with H. In practice, the small scaling is enforced by a small initialization of the residual connection (Hardt and Ma, 2016; Anonymous, 2018), which obtains state-of-the-art performance for deep residual networks. We choose an explicit scaling instead of altering the initialization scheme for notational convenience.
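A corresponding sketch of the ResNet forward pass. The value c_res = 0.1 is an arbitrary illustrative choice, not the paper's constant, and c_σ is again estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.logaddexp(0.0, z)

c_sigma = 1.0 / np.mean(softplus(rng.standard_normal(1_000_000)) ** 2)

def resnet_forward(x, W1, Ws, a, c_res=0.1):
    """x^(1) = sqrt(c_sigma/m) sigma(W^(1) x); then for h = 2..H
    x^(h) = x^(h-1) + (c_res / (H sqrt(m))) sigma(W^(h) x^(h-1));
    prediction f = a^T x^(H)."""
    m = W1.shape[0]
    H = len(Ws) + 1
    h = np.sqrt(c_sigma / m) * softplus(W1 @ x)
    for W in Ws:
        h = h + (c_res / (H * np.sqrt(m))) * softplus(W @ h)
    return h, a @ h

d, m, H = 10, 256, 50                   # depth 50 is fine under this scaling
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
W1 = rng.standard_normal((m, d))
Ws = [rng.standard_normal((m, m)) for _ in range(H - 1)]
a = rng.choice([-1.0, 1.0], size=m)

hidden, pred = resnet_forward(x, W1, Ws, a)
print(np.linalg.norm(hidden), pred)
```

Because each residual branch is scaled by c_res / (H sqrt(m)), the hidden representation's norm stays O(1) for any depth, which is the stability property the analysis exploits.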
Convolutional ResNet: Lastly, we consider the convolutional ResNet architecture. Again we define the prediction function recursively. Let x ∈ R^{d×p} be the input, where d is the number of input channels and p is the number of pixels. For h ∈ [H], we let the number of channels be m and the number of pixels be p. Given x^(h−1), we first use an operator φ(·) to divide x^(h−1) into p patches, each consisting of q consecutive pixels per channel, with stride one and zero-padding at the boundary (pixels beyond the boundary are treated as zero). Note this operator has the property

||x||_F ≤ ||φ(x)||_F ≤ sqrt(q) · ||x||_F,

because each element of x appears at least once and at most q times in φ(x). In practice, q is often small (e.g., a 3 × 3 patch), so throughout the paper we treat q as a constant in our theoretical analysis. The intermediate layers are then defined recursively as

x^(1) = sqrt(c_σ / m) · σ(W^(1) φ(x)),   x^(h) = x^(h−1) + (c_res / (H · sqrt(m))) · σ(W^(h) φ(x^(h−1))) for 2 ≤ h ≤ H,

where c_res is a small constant depending only on σ, and c_σ, c_res are constants we will specify in Section 5. Finally, the output is defined as the inner product of the output weights with x^(H). Note here we use a scaling similar to that of ResNet.
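The patch operator φ can be sketched as an im2col-style transformation (stride one, zero padding; the helper name `phi` is ours):

```python
import numpy as np

def phi(x, q):
    """Divide x in R^{d x p} into p patches of q consecutive pixels
    (stride one, zero padding), stacking channels: output is (d*q, p).
    An im2col-style sketch of the patch operator."""
    d, p = x.shape
    pad = q // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))       # zero-pad the pixel axis
    cols = [xp[:, j:j + q].reshape(-1) for j in range(p)]
    return np.stack(cols, axis=1)

x = np.arange(6.0).reshape(2, 3)               # d = 2 channels, p = 3 pixels
out = phi(x, q=3)
print(out.shape)                               # (q*d rows, p patches)
# Each entry of x appears at least once and at most q times in phi(x),
# hence ||x||_F <= ||phi(x)||_F <= sqrt(q) * ||x||_F:
print(np.linalg.norm(x), np.linalg.norm(out))
```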
To learn the deep neural network, we consider a randomly initialized gradient descent algorithm to find the global minimizer of the empirical loss (1). Specifically, we use the following random initialization scheme. For every layer h ∈ [H], each entry of W^(h) is sampled independently from a standard Gaussian distribution, W^(h)_ij ~ N(0, 1). For multilayer fully-connected neural networks and ResNet, each entry of the output layer a is sampled from a Rademacher distribution, a_i ~ unif{−1, +1}. For convolutional ResNet, the output weights are tied across pixels so that each channel has the same corresponding output weight. In this paper, we fix the output layer and train the lower layers by gradient descent: for k = 0, 1, ... and h ∈ [H],

W^(h)(k+1) = W^(h)(k) − η · ∂L(θ(k)) / ∂W^(h)(k),

where η > 0 is the step size. Hoffer et al. (2018) showed that fixing the last layer results in little to no loss in test accuracy.
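For the shallowest case (one hidden layer, fixed Rademacher output weights), the whole algorithm fits in a few lines. A sketch with hypothetical toy sizes, fitting arbitrary labels:

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy instance of the algorithm with one hidden layer:
# W_ij ~ N(0, 1), a_i ~ Rademacher (kept fixed), gradient descent on W.
n, d, m, eta = 20, 10, 4096, 0.2               # hypothetical toy sizes
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
y = rng.standard_normal(n)                     # arbitrary labels

W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

def predict(W):
    return softplus(X @ W.T) @ a / np.sqrt(m)

init_loss = 0.5 * np.sum((predict(W) - y) ** 2)
for _ in range(3000):
    Z = X @ W.T                                # pre-activations, shape (n, m)
    u = softplus(Z) @ a / np.sqrt(m)           # predictions u(k)
    # dL/dw_r = sum_i (u_i - y_i) a_r sigma'(w_r . x_i) x_i / sqrt(m)
    W -= eta * (sigmoid(Z) * (a * (u - y)[:, None])).T @ X / np.sqrt(m)

final_loss = 0.5 * np.sum((predict(W) - y) ** 2)
print(init_loss, final_loss)                   # loss is driven toward zero
```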
3 Warm Up: Deep Fully-connected Neural Networks
In this section, we show that gradient descent with a constant positive step size converges to the global minimum at a linear rate. To formally state our assumption, we first define the following population Gram matrices recursively. For h ∈ [H] and (i, j) ∈ [n] × [n], let

K^(0)_ij = x_i^T x_j,   K^(h)_ij = c_σ · E_{(u,v) ~ N(0, Λ^(h−1)_ij)} [σ(u) σ(v)],   where Λ^(h−1)_ij = [[K^(h−1)_ii, K^(h−1)_ij], [K^(h−1)_ij, K^(h−1)_jj]].

Intuitively, K^(h) represents the Gram matrix obtained after composing the kernel induced by the activation function h times. As will be apparent in the proof, this Gram matrix naturally arises as the width m goes to infinity. Based on these definitions, we make the following assumption.
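The recursion above can be estimated numerically. A sketch assuming the soft-plus activation and the normalizing constant c_σ = 1 / E[σ(z)²] (our reading, estimated by Monte Carlo; `next_gram` is our helper name):

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.logaddexp(0.0, z)

c_sigma = 1.0 / np.mean(softplus(rng.standard_normal(1_000_000)) ** 2)

def next_gram(K, n_mc=200_000):
    """One recursion step: K'_ij = c_sigma * E[sigma(u) sigma(v)] with
    (u, v) ~ N(0, [[K_ii, K_ij], [K_ij, K_jj]]), by Monte Carlo."""
    n = K.shape[0]
    Knew = np.empty_like(K)
    for i in range(n):
        for j in range(i + 1):
            cov = np.array([[K[i, i], K[i, j]], [K[i, j], K[j, j]]])
            w, V = np.linalg.eigh(cov)
            A = V * np.sqrt(np.clip(w, 0.0, None))   # cov ~= A A^T
            uv = rng.standard_normal((n_mc, 2)) @ A.T
            Knew[i, j] = Knew[j, i] = c_sigma * np.mean(
                softplus(uv[:, 0]) * softplus(uv[:, 1]))
    return Knew

X = rng.standard_normal((4, 6))
X /= np.linalg.norm(X, axis=1, keepdims=True)
K = X @ X.T                                    # K^(0): Gram of the inputs
for h in range(3):
    K = next_gram(K)
    print(h + 1, np.linalg.eigvalsh(K).min())  # least eigenvalue per depth
```

With the c_σ normalization the diagonal stays near one while repeated composition pushes the off-diagonal entries toward one, so the least eigenvalue shrinks with depth, which is consistent with the exponential depth dependence discussed below for fully-connected networks.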
The first part of the assumption states that the Gram matrix at the last layer is strictly positive definite; its least eigenvalue λ₀ determines the convergence rate of the gradient descent algorithm. Note this assumption is a generalization of the non-degeneracy assumption used in Du et al. (2018b), who considered a two-layer neural network and assumed the Gram matrix from the second layer is strictly positive definite. The second part of the assumption states that every 2 × 2 principal sub-matrix of every layer's population Gram matrix has a lower-bounded least eigenvalue. The inverse of this quantity can be viewed as a measure of the stability of the population Gram matrix. As will be apparent in the proof, this condition guarantees that if the width m is large, the Gram matrix at the initialization phase is close to the population Gram matrix.
Now we are ready to state our main theorem for deep multilayer fully-connected neural networks.
Theorem 3.1 (Convergence Rate of Gradient Descent for Deep Fully Connected Neural Networks).
Assume for all i ∈ [n], ||x_i||_2 = 1 and |y_i| ≤ C for some constant C, and that the number of hidden nodes per layer m is sufficiently large. If we set the step size η appropriately, then with probability at least 1 − δ over the random initialization, we have for k = 0, 1, 2, ...

L(θ(k)) ≤ (1 − ηλ₀/2)^k · L(θ(0)),

where λ₀ is the least eigenvalue of the last layer's population Gram matrix.
This theorem states that if the width m is large enough and we set the step size appropriately, then gradient descent converges to the global minimum with zero loss at a linear rate. The required width depends on n, H, and λ₀. The dependency on the number of samples n and on the least eigenvalue λ₀ is only polynomial, which is the same as in previous work on shallow neural networks (Du et al., 2018b; Li and Liang, 2018; Allen-Zhu et al., 2018a). However, the dependency on the number of layers H and on the stability parameter of the Gram matrices is exponential. As will be clear in Section 6 and the proofs, this exponential dependency results from the amplification factor of the multilayer fully-connected architecture.
4 ResNet

In this section we consider the convergence of gradient descent for training a ResNet. We focus on how much over-parameterization is needed to ensure the global convergence of gradient descent. Similar to the previous section, we define the population Gram matrices recursively for each pair of examples (i, j) ∈ [n] × [n]. These are the asymptotic Gram matrices as the width m goes to infinity. We make the following assumption, which determines the convergence rate and the amount of over-parameterization.
Note the quantity λ₀ defined here is different from that of the deep fully-connected neural network because here it does not depend on the depth H. In general, unless there are two data points that are parallel, λ₀ is always positive here. Now we are ready to state our main theorem for ResNet.
Theorem 4.1 (Convergence Rate of Gradient Descent for ResNet).
Assume for all i ∈ [n], ||x_i||_2 = 1 and |y_i| ≤ C for some constant C, and that the number of hidden nodes per layer m is sufficiently large (polynomial in n, H, and 1/λ₀). If we set the step size η appropriately, then with probability at least 1 − δ over the random initialization, we have for k = 0, 1, 2, ...

L(θ(k)) ≤ (1 − ηλ₀/2)^k · L(θ(0)).
In sharp contrast to Theorem 3.1, this theorem is fully polynomial in the sense that both the number of neurons and the convergence rate depend only polynomially on n and H. Note the amount of over-parameterization depends on λ₀, the smallest eigenvalue of the H-th layer's population Gram matrix. In Section D.1, we show this quantity is lower bounded and does not depend inverse-exponentially on H. The main reason that we do not have any exponential factor here is that the skip-connection block makes the overall architecture more stable in both the initialization phase and the training phase. See Section B for details.
5 Convolutional ResNet
In this section we present the convergence result of gradient descent for convolutional ResNet. Again, we focus on how much over-parameterization is needed to ensure the global convergence of gradient descent. Similar to previous sections, we first define the population Gram matrices, the asymptotic Gram matrices as the width m goes to infinity, in order to formally state our assumptions; the definition mirrors that of ResNet with the patch operator φ(·) applied at each layer. We make the following assumption, which determines the convergence rate and the amount of over-parameterization.
Note this assumption is basically the same as Assumption 4.1. Now we state our main convergence theorem for the convolutional ResNet.
Theorem 5.1 (Convergence Rate of Gradient Descent for Convolutional ResNet).
Assume for all i ∈ [n], ||x_i|| = 1 and |y_i| ≤ C for some constant C, and that the number of hidden nodes per layer m is sufficiently large (polynomial in n, H, p, and 1/λ₀). If we set the step size η appropriately, then with probability at least 1 − δ over the random initialization, we have for k = 0, 1, 2, ...

L(θ(k)) ≤ (1 − ηλ₀/2)^k · L(θ(0)).
This theorem is similar to that of ResNet. The number of neurons required per layer is only polynomial in the depth H and the number of data points n, and the step size is only polynomially small. The analysis is similar to that of ResNet, and we refer readers to Section C for details.
6 Proof Sketch
We first introduce some notation for the proof sketch: let u_i(k) = f(θ(k), x_i) denote the prediction for the i-th example at iteration k, and denote u(k) = (u_1(k), ..., u_n(k)) ∈ R^n. Note with this notation, we can write the loss as L(θ(k)) = (1/2) ||y − u(k)||₂².
Our induction hypothesis is just the following convergence rate of the empirical loss.

Condition 6.1. At the k-th iteration, we have ||y − u(k)||₂² ≤ (1 − ηλ₀/2)^k · ||y − u(0)||₂².

Note this condition implies the conclusions we want to prove. To prove Condition 6.1, we consider the effect of one iteration on the loss function.
This equation shows that if the inner product between the residual y − u(k) and the prediction change u(k+1) − u(k) is sufficiently positive, the loss decreases. Note both terms involve u(k+1) − u(k), which we now analyze more carefully. To simplify notation, we define shorthand for the per-example change and look at one coordinate of u(k+1) − u(k).
Using Taylor expansion, we have
Denote the first-order term by I₁ and the remainder by I₂, so that each coordinate of u(k+1) − u(k) equals I₁ + I₂. We will show that the I₁ term, which is linear in the step size η, drives the loss function to decrease, while the I₂ term is a perturbation term proportional to η², so it is small. We further unpack the I₁ term,
where H(k) is the n × n matrix whose (i, j)-th entry is the inner product between the gradients of the predictions u_i and u_j at iteration k. Now we unpack this matrix; recall the gradient formula. Define the Gram matrix of the last layer's features. By definition it is a Gram matrix and thus positive semi-definite, and therefore it contributes a non-negative amount to the decrease of the loss.
Now we analyze the I₂ term, which we can write in a more compact matrix form and then bound directly.
Now recall the progress of loss function in Equation (7):
For the perturbation terms, through standard calculations, we can show that both the I₂ term and the quadratic term are proportional to η², so if we set η sufficiently small, these terms are dominated by the first-order decrease and thus the loss function decreases at a linear rate. See Lemmas A.6 and A.7.
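Putting the pieces together, the one-step progress can be summarized by the following recursion (a hedged reconstruction in the notation above, with u(k) the prediction vector and λ₀ the least eigenvalue of the limiting Gram matrix):

```latex
\|\mathbf{y}-\mathbf{u}(k+1)\|_2^2
  \le \left(1-\frac{\eta\lambda_0}{2}\right)\|\mathbf{y}-\mathbf{u}(k)\|_2^2
  \quad\Longrightarrow\quad
  \|\mathbf{y}-\mathbf{u}(k)\|_2^2
  \le \left(1-\frac{\eta\lambda_0}{2}\right)^{k}\|\mathbf{y}-\mathbf{u}(0)\|_2^2 ,
```

which is exactly the linear convergence rate in Condition 6.1.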
Therefore, to prove the induction hypothesis, it suffices to prove λ_min(H(k)) ≥ λ₀/2 for k = 0, 1, ..., where λ₀ is independent of k. To analyze the least eigenvalue, we first look at the initialization. Using our assumptions on the population Gram matrix and concentration inequalities, we can show that at initialization H(0) is close to its population counterpart, which implies λ_min(H(0)) ≥ (3/4)λ₀.
Now for the k-th iteration, by matrix perturbation analysis, it is sufficient to show that ||H(k) − H(0)|| is small. To do this, we use a similar approach as in Du et al. (2018b). We show that as long as m is large enough, every weight matrix stays close to its initialization in a relative-error sense: ignoring all other parameters except m, the total movement ||W^(h)(k) − W^(h)(0)||_F stays bounded independently of m, and thus the average per-neuron distance from initialization scales like 1/sqrt(m), which tends to zero as m increases. See Lemma A.5 for precise statements with all the dependencies.
This fact in turn shows that ||H(k) − H(0)|| is small (see Lemma A.4). The main difference from Du et al. (2018b) is that we are considering deep neural networks: when translating the small deviation of the weight matrices into a deviation of the Gram matrix, there is an amplification factor that depends on the neural network architecture.
For deep fully-connected neural networks, we show this amplification factor is exponential in H. On the other hand, for ResNet and convolutional ResNet, we show this amplification factor is only polynomial in H. We further show that the required width is proportional to this amplification factor.
7 Conclusion

In this paper, we show that gradient descent on deep over-parameterized networks can obtain zero training loss. The key technique is to show that the Gram matrix is increasingly stable under over-parameterization, so that every step of gradient descent decreases the loss at a geometric rate.
We list some directions for future research:
The current paper focuses on the training loss, but does not address the test loss. It would be an important problem to show that gradient descent can also find solutions of low test loss. In particular, existing work only demonstrates that gradient descent works under the same situations as kernel methods and random feature methods (Daniely, 2017; Li and Liang, 2018).
The width of the layers is polynomial in all the parameters for the ResNet architecture, but still very large. Realistic networks have a number of parameters, not a width, that is a large constant multiple of n. We consider improving the analysis to cover commonly utilized networks an important open problem.
The current analysis is for gradient descent, instead of stochastic gradient descent. We believe the analysis can be extended to stochastic gradient descent while maintaining the linear convergence rate.
We thank Lijie Chen and Ruosong Wang for useful discussions.
- Allen-Zhu et al. (2018a) Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. On the convergence rate of training recurrent neural networks. arXiv preprint arXiv:1810.12065, 2018a.
- Allen-Zhu et al. (2018b) Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization. arXiv preprint arXiv:1811.03962, 2018b.
- Andoni et al. (2014) Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In International Conference on Machine Learning, pages 1908–1916, 2014.
- Anonymous (2018) Anonymous. The unreasonable effectiveness of (zero) initialization in deep residual learning. Openreview https://openreview.net/pdf?id=H1gsz30cKX, 2018.
- Brutzkus and Globerson (2017) Alon Brutzkus and Amir Globerson. Globally optimal gradient descent for a ConvNet with gaussian inputs. In International Conference on Machine Learning, pages 605–614, 2017.
- Chizat and Bach (2018) Lenaic Chizat and Francis Bach. On the global convergence of gradient descent for over-parameterized models using optimal transport. arXiv preprint arXiv:1805.09545, 2018.
- Daniely (2017) Amit Daniely. SGD learns the conjugate kernel class of the network. In Advances in Neural Information Processing Systems, pages 2422–2430, 2017.
- Du and Lee (2018) Simon S Du and Jason D Lee. On the power of over-parametrization in neural networks with quadratic activation. Proceedings of the 35th International Conference on Machine Learning, pages 1329–1338, 2018.
- Du et al. (2017a) Simon S Du, Chi Jin, Jason D Lee, Michael I Jordan, Aarti Singh, and Barnabas Poczos. Gradient descent can take exponential time to escape saddle points. In Advances in Neural Information Processing Systems, pages 1067–1077, 2017a.
- Du et al. (2017b) Simon S Du, Jason D Lee, and Yuandong Tian. When is a convolutional filter easy to learn? arXiv preprint arXiv:1709.06129, 2017b.
- Du et al. (2018a) Simon S Du, Jason D Lee, Yuandong Tian, Barnabas Poczos, and Aarti Singh. Gradient descent learns one-hidden-layer cnn: Don’t be afraid of spurious local minima. Proceedings of the 35th International Conference on Machine Learning, pages 1339–1348, 2018a.
- Du et al. (2018b) Simon S Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054, 2018b.
- Freeman and Bruna (2016) C Daniel Freeman and Joan Bruna. Topology and geometry of half-rectified network optimization. arXiv preprint arXiv:1611.01540, 2016.
- Ge et al. (2015) Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points: online stochastic gradient for tensor decomposition. In Proceedings of The 28th Conference on Learning Theory, pages 797–842, 2015.
- Haeffele and Vidal (2015) Benjamin D Haeffele and René Vidal. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540, 2015.
- Hardt and Ma (2016) Moritz Hardt and Tengyu Ma. Identity matters in deep learning. arXiv preprint arXiv:1611.04231, 2016.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Hoffer et al. (2018) Elad Hoffer, Itay Hubara, and Daniel Soudry. Fix your classifier: the marginal value of training the last weight layer. arXiv preprint arXiv:1801.04540, 2018.
- Jin et al. (2017) Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan. How to escape saddle points efficiently. In Proceedings of the 34th International Conference on Machine Learning, pages 1724–1732, 2017.
- Kawaguchi (2016) Kenji Kawaguchi. Deep learning without poor local minima. In Advances In Neural Information Processing Systems, pages 586–594, 2016.
- Lee et al. (2016) Jason D Lee, Max Simchowitz, Michael I Jordan, and Benjamin Recht. Gradient descent only converges to minimizers. In Conference on Learning Theory, pages 1246–1257, 2016.
- Li and Liang (2018) Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. arXiv preprint arXiv:1808.01204, 2018.
- Li and Yuan (2017) Yuanzhi Li and Yang Yuan. Convergence analysis of two-layer neural networks with relu activation. In Advances in Neural Information Processing Systems, pages 597–607, 2017.
- Mei et al. (2018) Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layers neural networks. Proceedings of the National Academy of Sciences, pages E7665–E7671, 2018.
- Nguyen and Hein (2017) Quynh Nguyen and Matthias Hein. The loss surface of deep and wide neural networks. In International Conference on Machine Learning, pages 2603–2612, 2017.
- Safran and Shamir (2016) Itay Safran and Ohad Shamir. On the quality of the initial basin in overspecified neural networks. In International Conference on Machine Learning, pages 774–782, 2016.
- Safran and Shamir (2018) Itay Safran and Ohad Shamir. Spurious local minima are common in two-layer ReLU neural networks. In International Conference on Machine Learning, pages 4433–4441, 2018.
- Soltanolkotabi (2017) Mahdi Soltanolkotabi. Learning ReLus via gradient descent. In Advances in Neural Information Processing Systems, pages 2007–2017, 2017.
- Soltanolkotabi et al. (2018) Mahdi Soltanolkotabi, Adel Javanmard, and Jason D Lee. Theoretical insights into the optimization landscape of over-parameterized shallow neural networks. IEEE Transactions on Information Theory, 2018.
- Soudry and Carmon (2016) Daniel Soudry and Yair Carmon. No bad local minima: Data independent training error guarantees for multilayer neural networks. arXiv preprint arXiv:1605.08361, 2016.
- Soudry and Hoffer (2017) Daniel Soudry and Elad Hoffer. Exponentially vanishing sub-optimal local minima in multilayer neural networks. arXiv preprint arXiv:1702.05777, 2017.
- Tian (2017) Yuandong Tian. An analytical formula of population gradient for two-layered ReLU network and its applications in convergence and critical point analysis. In International Conference on Machine Learning, pages 3404–3413, 2017.
- Venturi et al. (2018) Luca Venturi, Afonso Bandeira, and Joan Bruna. Neural networks with finite intrinsic dimension have no spurious valleys. arXiv preprint arXiv:1802.06384, 2018.
- Vershynin (2010) Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
- Wei et al. (2018) Colin Wei, Jason D Lee, Qiang Liu, and Tengyu Ma. On the margin theory of feedforward neural networks. arXiv preprint arXiv:1810.05369, 2018.
- Zagoruyko and Komodakis (2016) Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In British Machine Vision Conference, 2016.
- Zhang et al. (2016) Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.
- Zhang et al. (2018) Xiao Zhang, Yaodong Yu, Lingxiao Wang, and Quanquan Gu. Learning one-hidden-layer relu networks via gradient descent. arXiv preprint arXiv:1806.07808, 2018.
- Zhong et al. (2017a) Kai Zhong, Zhao Song, and Inderjit S Dhillon. Learning non-overlapping convolutional neural networks with multiple kernels. arXiv preprint arXiv:1711.03440, 2017a.
- Zhong et al. (2017b) Kai Zhong, Zhao Song, Prateek Jain, Peter L Bartlett, and Inderjit S Dhillon. Recovery guarantees for one-hidden-layer neural networks. arXiv preprint arXiv:1706.03175, 2017b.
- Zhou and Liang (2017) Yi Zhou and Yingbin Liang. Critical points of neural networks: Analytical forms and landscape properties. arXiv preprint arXiv:1710.11205, 2017.
- Zou et al. (2018) Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. Stochastic gradient descent optimizes over-parameterized deep relu networks. arXiv preprint arXiv:1811.08888, 2018.
Appendix A Proofs for Section 3
We first derive the formula of the gradient for the multilayer fully-connected neural network. In the expression below, the matrices D^(h) are the diagonal matrices of activation derivatives induced by the activation function, and x^(h) is the output of the h-th layer.
Through standard calculation, we can get the expression of the gradient in the following form
We first present a lemma showing that, with high probability, the features at each layer are approximately normalized.
Lemma A.1 (Lemma on Initialization Norms).
If σ(·) is Lipschitz and the width m is sufficiently large, then with probability at least 1 − δ over the random initialization, for every h ∈ [H] and i ∈ [n], we have
We follow the proof sketch described in Section 6. We first analyze the spectral property of the Gram matrix at the initialization phase. The following lemma lower bounds its least eigenvalue; it is a direct consequence of Theorem D.1 and Remark D.4.
Lemma A.2 (Least Eigenvalue at the Initialization).
If m is sufficiently large, we have
Now we proceed to analyze the training process. We prove the following lemma which characterizes how the perturbation from weight matrices propagates to the input of each layer. This Lemma is used to prove the subsequent lemmas.
Suppose for every , , and for some constant and . If is Lipschitz, we have
Here the assumption on the weight matrices can be established using Lemma E.4 and taking a union bound over h ∈ [H]. Next, we show that with high probability over the random initialization, a small perturbation in the weight matrices leads to a small perturbation in the Gram matrix.
Suppose is Lipschitz and smooth. Suppose for , , , if