GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization

10/13/2022
by Zhiyuan Zhang, et al.

Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization ability in vision tasks, demonstrating that flat minima tend to imply better generalization. However, it is difficult to apply SAM to some natural language tasks, especially to models with drastic gradient changes, such as RNNs. In this work, we analyze the relation between the flatness of a local minimum and its generalization ability from a novel and straightforward theoretical perspective. We propose that the shift between the training and test distributions can be equivalently seen as a virtual parameter corruption or perturbation, which explains why flat minima that are robust against parameter corruptions or perturbations generalize better. On this basis, we propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better. Results on various language benchmarks validate the effectiveness of the proposed GA-SAM algorithm on natural language tasks.
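For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a standard SAM update, not of GA-SAM itself, whose gradient-strength based adaptation is not specified in the abstract. The toy quadratic loss, the function names (`loss`, `grad`, `sam_step`), and the hyperparameter values (`lr`, `rho`) are all assumptions chosen for illustration.

```python
import numpy as np

def loss(w):
    # Hypothetical toy objective: one sharp direction (curvature 10)
    # and one flat direction (curvature 0.1).
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy objective above.
    return np.array([10.0 * w[0], 0.1 * w[1]])

def sam_step(w, lr=0.05, rho=0.05, eps=1e-12):
    # 1) Compute the gradient at the current parameters.
    g = grad(w)
    # 2) Perturb the parameters toward the approximate worst case
    #    inside an L2 ball of radius rho (first-order approximation).
    e = rho * g / (np.linalg.norm(g) + eps)
    # 3) Descend using the gradient evaluated at the perturbed point.
    return w - lr * grad(w + e)

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(200):
    w = sam_step(w)

initial_loss = loss(w0)
final_loss = loss(w)
print(initial_loss, final_loss)
```

The fixed perturbation radius `rho` is the part that an adaptive variant would presumably change; with drastically varying gradients, a single radius for every step and parameter may be too large or too small, which is consistent with the difficulty the abstract describes for RNNs.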

research
03/03/2023

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Recently, flat minima are proven to be effective for improving generaliz...
research
12/16/2021

δ-SAM: Sharpness-Aware Minimization with Dynamic Reweighting

Deep neural networks are often overparameterized and may not easily achi...
research
08/04/2023

Frustratingly Easy Model Generalization by Dummy Risk Minimization

Empirical risk minimization (ERM) is a fundamental machine learning para...
research
07/18/2023

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Adaptive gradient-based optimizers, particularly Adam, have left their m...
research
10/24/2022

Sharpness-aware Minimization for Worst Case Optimization

Improvement of worst group performance and generalization performance ar...
research
04/28/2023

An Adaptive Policy to Employ Sharpness-Aware Minimization

Sharpness-aware minimization (SAM), which searches for flat minima by mi...
research
10/16/2021

Sharpness-Aware Minimization Improves Language Model Generalization

The allure of superhuman-level capabilities has led to considerable inte...
