Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability

07/26/2022
by Zhouzi Li, et al.

Recent findings (e.g., arXiv:2103.00065) demonstrate that modern neural networks trained by full-batch gradient descent (GD) typically enter a regime called Edge of Stability (EOS). In this regime, the sharpness, i.e., the maximum Hessian eigenvalue, first increases to the value 2/(step size) (the progressive sharpening phase) and then oscillates around this value (the EOS phase). This paper aims to analyze the GD dynamics and the sharpness along the optimization trajectory. Our analysis naturally divides the GD trajectory into four phases depending on the change of the sharpness. We empirically identify the norm of the output layer weights as an interesting indicator of the sharpness dynamics. Based on this empirical observation, we attempt to theoretically and empirically explain the dynamics of various key quantities that lead to the change of sharpness in each phase of EOS. Moreover, based on certain assumptions, we provide a theoretical proof of the sharpness behavior in the EOS regime in two-layer fully-connected linear neural networks. We also discuss some other empirical findings and the limitations of our theoretical results.
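To see why 2/(step size) is the natural threshold for the sharpness, consider a toy case that is not taken from the paper: a one-dimensional quadratic loss L(x) = 0.5 * a * x^2, whose Hessian (and hence sharpness) is the constant a. One GD step multiplies x by (1 - step_size * a), so the iterates shrink exactly when a < 2/step_size and grow when a > 2/step_size. The sketch below checks this numerically; the function name `run_gd` and all constants are hypothetical choices made for illustration, and a neural network loss is of course not quadratic.

```python
def run_gd(sharpness, step_size, x0=1.0, num_steps=50):
    """Run GD on the toy loss L(x) = 0.5 * sharpness * x**2.

    Returns |x| after num_steps updates. This is an illustrative
    assumption, not the two-layer linear network analyzed in the paper.
    """
    x = x0
    for _ in range(num_steps):
        grad = sharpness * x      # dL/dx for the quadratic loss
        x = x - step_size * grad  # plain gradient descent update
    return abs(x)


step_size = 0.1
threshold = 2.0 / step_size  # the value 2/(step size)

# Sharpness slightly below the threshold: iterates contract toward 0.
below = run_gd(sharpness=0.9 * threshold, step_size=step_size)

# Sharpness slightly above the threshold: iterates oscillate and grow.
above = run_gd(sharpness=1.1 * threshold, step_size=step_size)

print("threshold 2/step_size:", threshold)
print("final |x| below threshold:", below)
print("final |x| above threshold:", above)
```

On a quadratic, crossing the threshold simply means divergence. The EOS observation described in the abstract is that on neural networks the sharpness instead hovers around this value while training continues.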

research
07/09/2023

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

Cohen et al. (2021) empirically study the evolution of the largest eigen...
research
02/26/2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

We empirically demonstrate that full-batch gradient descent on neural ne...
research
07/29/2022

Adaptive Gradient Methods at the Edge of Stability

Very little is known about the training dynamics of adaptive gradient me...
research
10/10/2022

Second-order regression models exhibit progressive sharpening to the edge of stability

Recent studies of gradient descent with large step sizes have shown that...
research
10/07/2022

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Recently, researchers observed that gradient descent for deep neural net...
research
01/18/2023

Catapult Dynamics and Phase Transitions in Quadratic Nets

Neural networks trained with gradient descent can undergo non-trivial ph...
research
05/19/2022

Understanding Gradient Descent on Edge of Stability in Deep Learning

Deep learning experiments in Cohen et al. (2021) using deterministic Gra...
