SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

02/17/2023
by Atish Agarwala, et al.

The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory. We show that in a simplified setting, SAM dynamically induces a stabilization related to the edge of stability (EOS) phenomenon observed in large learning rate gradient descent. Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters. Finally, we show that practical models can also exhibit this EOS stabilization, and that understanding SAM must account for these dynamics far away from any minima.
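
For context, the update the abstract analyzes is the standard SAM step: an ascent step of radius rho toward the locally sharpest nearby point, followed by a gradient step evaluated there. The sketch below is a minimal, hypothetical illustration, not the authors' code; the names sam_step, grad_fn, and the toy quadratic are placeholders we introduce for exposition.

    import numpy as np

    def sam_step(w, grad_fn, lr=0.1, rho=0.05):
        """One SAM update: ascend within a ball of radius rho, then descend.

        w       : parameter vector (np.ndarray)
        grad_fn : callable returning the loss gradient at a point
        lr      : learning rate (eta)
        rho     : SAM radius controlling the size of the ascent step
        """
        g = grad_fn(w)
        # Ascent step: normalized gradient perturbation of norm rho.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Descent step: gradient evaluated at the perturbed point.
        return w - lr * grad_fn(w + eps)

    # Toy usage on a quadratic loss 0.5 * w^T H w with one sharp direction.
    H = np.diag([30.0, 1.0])
    grad_fn = lambda w: H @ w
    w = np.array([1.0, 1.0])
    for _ in range(100):
        w = sam_step(w, grad_fn, lr=0.05, rho=0.1)

As a point of reference, plain gradient descent on a quadratic loss becomes unstable once the largest Hessian eigenvalue exceeds 2/η for learning rate η; that threshold is the edge-of-stability behavior the abstract refers to. The paper's claim is that SAM enforces an analogous ceiling on the largest eigenvalue, depending on both the learning rate and the SAM radius, throughout training rather than only near minima.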

Related research

07/13/2018  DNN's Sharpest Directions Along the SGD Trajectory
Recent work has identified that using a high learning rate or a small ba...

05/27/2023  The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
In this paper, we study the implicit regularization of stochastic gradie...

10/07/2021  Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Recent empirical advances show that training deep models with large lear...

03/03/2023  Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
Recently, flat minima are proven to be effective for improving generaliz...

09/14/2017  The Impact of Local Geometry and Batch Size on the Convergence and Divergence of Stochastic Gradient Descent
Stochastic small-batch (SB) methods, such as mini-batch Stochastic Gradi...

11/22/2016  Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
We look at the eigenvalues of the Hessian of a loss function before and ...

03/09/2020  Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
While the generalization properties of neural networks are not yet well ...
