A Recipe for Global Convergence Guarantee in Deep Neural Networks

04/12/2021
by Kenji Kawaguchi, et al.

Existing global convergence guarantees for (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime. This paper proposes an algorithm that is guaranteed to converge globally in the practical regime beyond the NTK regime, under a verifiable condition called the expressivity condition. The expressivity condition is defined to be both data-dependent and architecture-dependent, which is the key property that makes our results applicable to practical settings beyond the NTK regime. On the one hand, the expressivity condition is theoretically proven to hold data-independently for fully-connected deep neural networks with narrow hidden layers and a single wide layer. On the other hand, the expressivity condition is numerically shown to hold data-dependently for deep (convolutional) ResNets with batch normalization on various standard image datasets. We also show that the proposed algorithm achieves generalization performance comparable to that of the heuristic algorithm, given the same hyper-parameters and total number of iterations. Therefore, the proposed algorithm can be viewed as a step towards providing theoretical guarantees for deep learning in the practical regime.
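Since the abstract stresses that the expressivity condition is verifiable and both data- and architecture-dependent, the following is a minimal, hypothetical sketch of what a numerical check of such a condition could look like: it tests whether the last-hidden-layer feature matrix of a network has full row rank on a training batch. The network, the sizes, and the rank test itself are illustrative assumptions, not the paper's exact definition or verification procedure.

```python
# Hypothetical sketch (not the paper's exact procedure): check a data/architecture-
# dependent "expressivity-style" condition by testing whether the last hidden
# layer's feature matrix has full row rank on a training batch.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, width = 64, 10, 128          # assumed batch size, input dim, and wide-layer width
x = torch.randn(n, d)              # stand-in for a batch of training inputs

# Small fully-connected network with one wide hidden layer (illustrative architecture).
features = nn.Sequential(
    nn.Linear(d, 32), nn.ReLU(),
    nn.Linear(32, width), nn.ReLU(),
)

with torch.no_grad():
    H = features(x)                            # n x width matrix of last-hidden-layer features
    rank = torch.linalg.matrix_rank(H).item()  # numerical rank of the feature matrix

# If rank == n, the hidden features can fit any target assignment on this batch,
# which is the flavor of data-dependent expressivity the abstract refers to
# (the precise condition is defined in the paper).
print(f"feature-matrix rank = {rank}; condition holds on this batch: {rank == n}")
```

In this sketch the check is data-dependent (it uses the actual batch) and architecture-dependent (it uses the actual hidden features), which is why such a condition can be verified numerically for a given network and dataset rather than assumed a priori.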


