An SDE for Modeling SAM: Theory and Insights

01/19/2023
by Enea Monzio Compagnoni, et al.

We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted a lot of interest due to its improved performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and its unnormalized variant USAM, in both the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, with an error that scales linearly with the step size). Using these models, we then explain why SAM prefers flat minima over sharp ones: we show that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that, perhaps unexpectedly, SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
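
The abstract names SAM and USAM without stating their discrete-time update rules. As a minimal sketch, assuming the commonly used full-batch formulations (SAM takes the gradient at a point perturbed along the normalized gradient, USAM skips the normalization), the two steps might look as follows. The function name `sam_step`, the parameters `lr` and `rho`, and the toy quadratic loss are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sam_step(x, grad_fn, lr=0.1, rho=0.05, normalize=True, eps=1e-12):
    """One full-batch step of SAM (normalize=True) or USAM (normalize=False).

    Assumed update: x_next = x - lr * grad(x + rho * d), where d is the
    gradient at x, divided by its norm for SAM and left as-is for USAM.
    """
    g = grad_fn(x)
    if normalize:
        # SAM: ascent direction has unit length (eps guards against g == 0).
        d = g / (np.linalg.norm(g) + eps)
    else:
        # USAM: ascent direction is the raw gradient.
        d = g
    x_perturbed = x + rho * d
    # Descend using the gradient evaluated at the perturbed point.
    return x - lr * grad_fn(x_perturbed)


# Hypothetical toy loss f(x) = 0.5 * x^T H x with one flat and one sharp direction.
H = np.diag([1.0, 10.0])


def quad_grad(x):
    return H @ x


x0 = np.array([1.0, 1.0])
x_gd = x0 - 0.1 * quad_grad(x0)                  # plain gradient descent
x_sam = sam_step(x0, quad_grad, normalize=True)   # SAM
x_usam = sam_step(x0, quad_grad, normalize=False) # USAM
```

On this toy problem the three iterates differ mainly along the sharp coordinate, which is where the perturbation changes the gradient the most. The stochastic (mini-batch) setting discussed in the abstract would replace `grad_fn` with a noisy gradient estimate.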


research
09/14/2017

The Impact of Local Geometry and Batch Size on the Convergence and Divergence of Stochastic Gradient Descent

Stochastic small-batch (SB) methods, such as mini-batch Stochastic Gradi...
research
12/07/2020

Stochastic Gradient Descent with Large Learning Rate

As a simple and efficient optimization method in deep learning, stochast...
research
12/02/2021

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective

We study the statistical properties of the dynamic trajectory of stochas...
research
06/13/2023

Exact Mean Square Linear Stability Analysis for SGD

The dynamical stability of optimization methods at the vicinity of minim...
research
08/06/2023

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

In this work, we investigate the dynamics of stochastic gradient descent...
research
05/20/2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differen...
research
11/10/2022

How Does Sharpness-Aware Minimization Minimize Sharpness?

Sharpness-Aware Minimization (SAM) is a highly effective regularization ...
