Towards Understanding Sharpness-Aware Minimization

06/13/2022
by Maksym Andriushchenko et al.

Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations and significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM, which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima, are incomplete. Moreover, there is no explanation for the success of m-sharpness in SAM, which has been shown to be essential for generalization. To better understand this aspect of SAM, we theoretically analyze its implicit bias for diagonal linear networks. We prove that, for a certain class of problems, SAM always chooses a solution with better generalization properties than standard gradient descent, and that this effect is amplified by m-sharpness. We further study the properties of this implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results for SAM on non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
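
At its core, SAM performs a two-step update: an ascent step that computes a worst-case weight perturbation within an L2-ball of radius rho, followed by a descent step on the loss evaluated at the perturbed weights; with m-sharpness, this perturbation is computed separately for each micro-batch of size m and the resulting gradients are averaged. The sketch below illustrates this update in plain NumPy under stated assumptions: the `loss_grad` callback and the hyperparameter values (`lr`, `rho`, `m`) are illustrative placeholders, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def sam_step(w, batch, loss_grad, lr=0.1, rho=0.05, m=None):
    """One SAM update (minimal sketch).

    Assumes `loss_grad(w, data)` returns the gradient of the training loss
    at weights `w` on the examples in `data` (user-supplied).

    With m=None the whole batch is used (one perturbation per batch).
    With m < len(batch), a separate worst-case perturbation is computed
    for each micro-batch of size m, i.e. m-sharpness.
    """
    n = len(batch)
    m = n if m is None else m
    grads = []
    for start in range(0, n, m):
        micro = batch[start:start + m]
        g = loss_grad(w, micro)                      # gradient on the micro-batch
        eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step: worst-case perturbation
        grads.append(loss_grad(w + eps, micro))      # gradient at the perturbed weights
    return w - lr * np.mean(grads, axis=0)           # descent step with the averaged gradient
```

Smaller m makes the perturbations more local to each micro-batch; according to the abstract, this is precisely the regime in which the generalization benefit of SAM is amplified.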

