How Does Sharpness-Aware Minimization Minimize Sharpness?

11/10/2022
by Kaiyue Wen, et al.

Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks in various settings. However, the underlying workings of SAM remain elusive because of several intriguing approximations in its theoretical characterization. SAM intends to penalize one notion of sharpness of the model but implements a computationally efficient surrogate; moreover, a third notion of sharpness is used for proving generalization guarantees. The subtle differences among these notions of sharpness can lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that each of the two approximation steps in the original motivation of SAM individually leads to inaccurate local conclusions, but that their combination accidentally reveals the correct effect when full-batch gradients are applied. Furthermore, we prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of the Hessian when SAM is applied.
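The procedure the abstract refers to can be made concrete with the original SAM update from Foret et al.: SAM replaces the training loss L(w) with the perturbed objective max over ||eps|| <= rho of L(w + eps), and approximates the inner maximizer by a single normalized gradient ascent step, eps = rho * grad L(w) / ||grad L(w)||. Below is a minimal PyTorch-style sketch of one such update; the names (model, loss_fn, data, target, base_optimizer) and the rho default are illustrative assumptions, not taken from the paper's code.

    import torch

    def sam_step(model, loss_fn, data, target, base_optimizer, rho=0.05):
        """One SAM update: ascend to an approximate worst-case perturbation,
        then descend using the gradient at the perturbed point.
        Sketch only; rho is the perturbation radius (illustrative default)."""
        model.zero_grad()

        # First forward/backward pass: gradient at the current weights w.
        loss_fn(model(data), target).backward()

        # Ascent step: eps = rho * g / ||g||, the first-order approximation
        # of the worst-case perturbation within the rho-ball.
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        perturbations = []
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    continue
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)  # move to the perturbed point w + eps
                perturbations.append((p, e))

        # Second forward/backward pass: gradient at w + eps.
        model.zero_grad()
        loss_fn(model(data), target).backward()

        # Undo the perturbation, then take the actual optimizer step
        # using the gradient computed at the perturbed point.
        with torch.no_grad():
            for p, e in perturbations:
                p.sub_(e)
        base_optimizer.step()
        model.zero_grad()

The abstract's distinction between the sharpness notion SAM intends to penalize, the one this efficient two-pass variant actually implements, and the one used in the generalization proofs is precisely a question of how well this single normalized ascent step tracks the true inner maximum.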


research · 09/14/2020
Input Hessian Regularization of Neural Networks
Regularizing the input gradient has shown to be effective in promoting t...

research · 06/13/2022
Towards Understanding Sharpness-Aware Minimization
Sharpness-Aware Minimization (SAM) is a recent training method that reli...

research · 08/14/2022
Model Generalization: A Sharpness Aware Optimization Perspective
Sharpness-Aware Minimization (SAM) and adaptive sharpness-aware minimiza...

research · 07/20/2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Despite extensive studies, the underlying reason as to why overparameter...

research · 08/22/2023
Understanding Hessian Alignment for Domain Generalization
Out-of-distribution (OOD) generalization is a critical ability for deep ...

research · 11/01/2022
SADT: Combining Sharpness-Aware Minimization with Self-Distillation for Improved Model Generalization
Methods for improving deep neural network training times and model gener...

research · 01/19/2023
An SDE for Modeling SAM: Theory and Insights
We study the SAM (Sharpness-Aware Minimization) optimizer which has rece...
