Phase diagram of training dynamics in deep neural networks: effect of learning rate, depth, and width

02/23/2023
by Dayal Singh Kalra et al.

We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) over long time scales and study the effect of learning rate, depth, and width of the neural network. By analyzing the maximum eigenvalue λ^H_t of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and finally (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on learning rate η ≡ c/λ^H_0, depth d, and width w. We identify several critical values of c which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness, and extract their dependence on d/w. Our results have implications for how to scale the learning rate with DNN depth and width in order to remain in the same phase of learning.
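The parameterization η ≡ c/λ^H_0 sets the learning rate relative to the sharpness at initialization. A minimal sketch of how one might compute it is below, using power iteration to estimate the top Hessian eigenvalue; the dense toy matrix and the constant c = 2.0 are illustrative assumptions (in practice λ^H is estimated from Hessian-vector products via automatic differentiation, without materializing the Hessian), not the authors' code.

```python
import numpy as np

def sharpness(hessian, num_iters=100, seed=0):
    """Estimate the largest eigenvalue of a symmetric Hessian by power iteration.

    A dense matrix stands in for the Hessian here; real implementations use
    Hessian-vector products from autodiff instead of forming the matrix.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(hessian.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        hv = hessian @ v
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient of the converged vector approximates λ^H.
    return float(v @ hessian @ v)

# Toy symmetric "Hessian" at initialization, with top eigenvalue 4.0.
H = np.diag([4.0, 1.0, 0.5])
lam0 = sharpness(H)          # λ^H_0 ≈ 4.0

# Learning rate parameterized as in the abstract: η = c / λ^H_0.
c = 2.0                      # illustrative choice of the constant c
eta = c / lam0               # η ≈ 0.5
```

With this parameterization, sweeping c (rather than η directly) compares networks of different depth and width at matched positions relative to their initial sharpness, which is the axis along which the paper's phase diagram is drawn.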


