Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond

05/22/2023
by Itai Kreisler, et al.

Recent research shows that when Gradient Descent (GD) is applied to neural networks, the loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent converges to its "Edge of Stability" (EoS). Here, we find a quantity that does decrease monotonically throughout GD training: the sharpness attained by the gradient flow solution (GFS), the solution that would be obtained if, from now until convergence, we train with an infinitesimal step size. Theoretically, we analyze scalar neural networks with the squared loss, perhaps the simplest setting where the EoS phenomena still occur. In this model, we prove that the GFS sharpness decreases monotonically. Using this result, we characterize settings where GD provably converges to the EoS in scalar networks. Empirically, we show that GD monotonically decreases the GFS sharpness in a squared regression model as well as practical neural network architectures.
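To make the GFS sharpness concrete, the sketch below tracks it while running GD on a toy two-parameter scalar network f(u, v) = u·v fit to a single target with the squared loss. The specific model, target, initialization, and step sizes are illustrative assumptions rather than the paper's exact setup; the GFS is approximated numerically by running GD with a very small step size from the current iterate until near convergence and measuring the top Hessian eigenvalue at the point reached.

```python
import numpy as np

# Toy scalar network f(u, v) = u * v fit to a single target y with squared loss
# L(u, v) = 0.5 * (u*v - y)**2. The model, initialization, and step sizes are
# illustrative assumptions, not necessarily the paper's exact setting.
TARGET = 1.0

def loss(w):
    u, v = w
    return 0.5 * (u * v - TARGET) ** 2

def grad(w):
    u, v = w
    r = u * v - TARGET
    return np.array([r * v, r * u])

def sharpness(w):
    # Largest eigenvalue of the loss Hessian at w.
    u, v = w
    r = u * v - TARGET
    H = np.array([[v * v, r + u * v],
                  [r + u * v, u * u]])
    return np.linalg.eigvalsh(H)[-1]  # eigvalsh returns eigenvalues in ascending order

def gfs_sharpness(w, flow_lr=1e-3, flow_steps=50_000):
    # Approximate the gradient flow solution (GFS): from the current iterate,
    # run GD with a tiny step size until (near) convergence, then report the
    # sharpness of the point reached.
    w = np.array(w, dtype=float)
    for _ in range(flow_steps):
        w -= flow_lr * grad(w)
    return sharpness(w)

# Plain GD with a moderately large step size: the loss may oscillate at first,
# while the GFS sharpness is expected to decrease monotonically.
eta = 0.4
w = np.array([2.5, 0.1])  # unbalanced initialization (assumption)
for t in range(12):
    print(f"step {t:2d}  loss={loss(w):.4f}  GFS sharpness={gfs_sharpness(w):.4f}")
    w = w - eta * grad(w)
```

In a run of this sketch, the loss rises during the first few steps before falling, while the reported GFS sharpness only shrinks, mirroring the monotonicity result stated in the abstract.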


Related research

09/30/2022  Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Traditional analyses of gradient descent show that when the largest eige...

05/27/2019  Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
Natural gradient descent has proven effective at mitigating the effects ...

02/26/2021  Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
We empirically demonstrate that full-batch gradient descent on neural ne...

05/31/2021  Why does CTC result in peaky behavior?
The peaky behavior of CTC models is well known experimentally. However, ...

05/22/2018  Step Size Matters in Deep Learning
Training a neural network with the gradient descent algorithm gives rise...

01/12/2022  There is a Singularity in the Loss Landscape
Despite the widespread adoption of neural networks, their training dynam...

06/17/2020  Image-on-Scalar Regression via Deep Neural Networks
A research topic of central interest in neuroimaging analysis is to stud...
