Sharpness-Aware Minimization: An Implicit Regularization Perspective

02/23/2023
by Kayhan Behdin, et al.

Sharpness-Aware Minimization (SAM) is a recent optimization framework that aims to improve deep neural network generalization by obtaining flatter (i.e., less sharp) solutions. As SAM has been numerically successful, recent papers have studied the theoretical aspects of the framework. In this work, we study SAM through an implicit regularization lens and present a new theoretical explanation of why SAM generalizes well. To this end, we study the least-squares linear regression problem and show a bias-variance trade-off for SAM's error over the course of the algorithm. We show that SAM has lower bias than Gradient Descent (GD), while having higher variance. This implies that SAM can outperform GD, especially if the algorithm is stopped early, which is often the case when training large neural networks due to the prohibitive computational cost. We extend our results to kernel regression as well as stochastic optimization, and discuss how the implicit regularization of SAM can improve upon vanilla training.
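For intuition, here is a minimal sketch of the SAM update on a least-squares regression objective, contrasted with plain gradient descent. It follows the standard two-step SAM rule (ascend to a worst-case perturbation of radius rho, then take the gradient step from the perturbed point); the data, step size eta, perturbation radius rho, and all variable names are illustrative assumptions, not taken from the paper.

import numpy as np

# Illustrative least-squares setup: y = X w* + noise (not the paper's setup).
rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def grad(w):
    # Gradient of the least-squares loss L(w) = ||Xw - y||^2 / (2n).
    return X.T @ (X @ w - y) / n

def sam_step(w, eta=0.1, rho=0.05):
    # One SAM update: move to the (approximate) worst-case perturbation
    # within an L2 ball of radius rho, then descend using the gradient
    # evaluated at that perturbed point.
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
    return w - eta * grad(w + eps)               # descent from perturbed point

def gd_step(w, eta=0.1):
    # One vanilla gradient-descent update, for comparison.
    return w - eta * grad(w)

w_sam = np.zeros(d)
w_gd = np.zeros(d)
for _ in range(200):
    w_sam = sam_step(w_sam)
    w_gd = gd_step(w_gd)

print("SAM estimation error:", np.linalg.norm(w_sam - w_star))
print("GD  estimation error:", np.linalg.norm(w_gd - w_star))

Note that rho = 0 reduces sam_step to gd_step, so the comparison between SAM and GD described in the abstract is easy to probe numerically by varying rho and the number of iterations.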


Related research

Implicit Regularization for Group Sparsity (01/29/2023)
We study the implicit regularization of gradient descent towards structu...

Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study (03/13/2020)
The notion of implicit bias, or implicit regularization, has been sugges...

Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent (04/29/2022)
In machine learning and statistical data analysis, we often run into obj...

Rethinking Gauss-Newton for learning over-parameterized models (02/06/2023)
Compared to gradient descent, Gauss-Newton's method (GN) and variants ar...

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias (10/06/2022)
Gradient regularization (GR) is a method that penalizes the gradient nor...

Minnorm training: an algorithm for training over-parameterized deep neural networks (06/03/2018)
In this work, we propose a new training method for finding minimum weigh...
