An Adaptive Policy to Employ Sharpness-Aware Minimization

04/28/2023
by Weisen Jiang et al.

Sharpness-aware minimization (SAM), which searches for flat minima via min-max optimization, has been shown to improve model generalization. However, since each SAM update requires computing two gradients, its computational cost and training time are double those of standard empirical risk minimization (ERM). Recent state-of-the-art methods accelerate SAM by reducing the fraction of SAM updates, switching between SAM and ERM updates either randomly or periodically. In this paper, we design an adaptive policy that decides when to employ SAM based on the geometry of the loss landscape. Two efficient algorithms, AE-SAM and AE-LookSAM, are proposed. We theoretically show that AE-SAM has the same convergence rate as SAM. Experimental results on various datasets and architectures demonstrate the efficiency and effectiveness of the adaptive policy.
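To make the idea concrete, the sketch below shows one way an adaptive SAM/ERM training loop can be wired up in PyTorch. The switching rule used here (tracking the mean and variance of the squared stochastic gradient norm with an exponential moving average and paying for a SAM update only when the current value is unusually large, i.e., the iterate appears to sit in a locally sharp region) is an illustrative assumption and not a verbatim reproduction of AE-SAM; the function names (`adaptive_sam_step`, `grad_sq_norm`) and hyperparameters (`rho`, `ema`, `c`) are likewise hypothetical.

```python
# Minimal sketch of an adaptive SAM/ERM switching loop in PyTorch.
# The sharpness trigger below is an assumption made for illustration.
import torch
import torch.nn as nn


def grad_sq_norm(params):
    """Squared l2 norm of the currently stored gradients."""
    return sum(p.grad.pow(2).sum() for p in params if p.grad is not None)


def adaptive_sam_step(model, loss_fn, batch, base_opt, state,
                      rho=0.05, ema=0.9, c=1.0):
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]

    # First gradient: the ordinary ERM gradient at the current weights.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()
    g2 = grad_sq_norm(params).item()

    # EMA estimates of the mean/variance of the squared gradient norm.
    state["mu"] = ema * state["mu"] + (1 - ema) * g2
    state["var"] = ema * state["var"] + (1 - ema) * (g2 - state["mu"]) ** 2
    sharp = g2 > state["mu"] + c * state["var"] ** 0.5  # "locally sharp" test

    if sharp:
        # SAM update: ascend to w + rho * g / ||g||, recompute the gradient
        # there, then restore the weights and step with that gradient.
        scale = rho / (g2 ** 0.5 + 1e-12)
        eps = [p.grad * scale for p in params]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)
        base_opt.zero_grad()
        loss_fn(model(x), y).backward()  # second gradient (the costly one)
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)
    # Otherwise: ERM update, reusing the first gradient with no extra pass.
    base_opt.step()
    return sharp


# Toy usage on random data.
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
state = {"mu": 0.0, "var": 0.0}
for _ in range(5):
    batch = (torch.randn(32, 10), torch.randint(0, 2, (32,)))
    adaptive_sam_step(model, nn.CrossEntropyLoss(), batch, opt, state)
```

The saving is visible in the two branches: the ERM branch reuses the gradient that was already computed, so only the iterations flagged as sharp pay SAM's second forward-backward pass.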


Related research

- Sharpness-Aware Training for Free (05/27/2022): Modern deep neural networks (DNNs) have achieved state-of-the-art perfor...
- δ-SAM: Sharpness-Aware Minimization with Dynamic Reweighting (12/16/2021): Deep neural networks are often overparameterized and may not easily achi...
- GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization (10/13/2022): Recently, Sharpness-Aware Minimization (SAM) algorithm has shown state-o...
- Enhancing Generalization and Plasticity for Sample Efficient Reinforcement Learning (06/19/2023): In Reinforcement Learning (RL), enhancing sample efficiency is crucial, ...
- ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks (02/23/2021): Recently, learning algorithms motivated from sharpness of loss surface a...
- Frustratingly Easy Model Generalization by Dummy Risk Minimization (08/04/2023): Empirical risk minimization (ERM) is a fundamental machine learning para...
- Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction (01/09/2023): Despite impressive performance on a wide variety of tasks, deep neural n...
