Normalization Layers Are All That Sharpness-Aware Minimization Needs

06/07/2023
by Maximilian Mueller, et al.

Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (comprising less than 0.1% of the total parameters) with SAM outperforms perturbing all of the parameters. This finding generalizes to different SAM variants and to both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve a similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness. The code for our experiments is publicly available at https://github.com/mueller-mp/SAM-ON.
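To make the idea concrete, below is a minimal PyTorch sketch of a single SAM update in which the adversarial weight perturbation is restricted to the affine parameters of normalization layers, as the abstract describes. This is an illustrative re-implementation under our own assumptions (the helper names norm_params and sam_on_step, the default rho=0.05, and the handling of the two backward passes are ours), not the authors' code; the official implementation is at https://github.com/mueller-mp/SAM-ON.

```python
# Minimal sketch of one SAM step that perturbs only normalization-layer
# affine parameters ("SAM-ON" style). Illustrative only, not the official code.
import torch
import torch.nn as nn


def norm_params(model: nn.Module):
    """Collect the affine parameters (weight/bias) of all normalization layers."""
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
                  nn.LayerNorm, nn.GroupNorm)
    params = []
    for module in model.modules():
        if isinstance(module, norm_types):
            params += [p for p in (module.weight, module.bias)
                       if p is not None and p.requires_grad]
    return params


def sam_on_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One SAM update whose ascent step touches only normalization parameters."""
    perturbed = norm_params(model)
    if not perturbed:
        raise ValueError("Model has no trainable normalization parameters.")

    # 1) First forward/backward pass: gradients at the current point.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # 2) Ascend along the joint gradient of the normalization parameters only.
    grads = [p.grad for p in perturbed if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    eps = []
    with torch.no_grad():
        for p in perturbed:
            e = (rho / grad_norm) * p.grad if p.grad is not None else torch.zeros_like(p)
            p.add_(e)
            eps.append(e)

    # 3) Second forward/backward pass at the perturbed point.
    base_optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # 4) Undo the perturbation, then update all parameters with the base optimizer,
    #    using the gradients computed at the perturbed point.
    with torch.no_grad():
        for p, e in zip(perturbed, eps):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()
```

In a training loop, base_optimizer would be any standard optimizer over all model parameters (for example torch.optim.SGD with momentum), and sam_on_step would replace the usual single forward/backward step; only the ascent step is sparse, while the final descent step still updates every parameter.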

Related research

03/22/2021 · A Batch Normalization Classifier for Domain Adaptation
Adapting a model to perform well on unforeseen data outside its training...

09/19/2022 · Batch Layer Normalization, A new normalization layer for CNNs and RNN
This study introduces a new normalization layer termed Batch Layer Norma...

01/13/2022 · Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method
Deep sequencing has become one of the most popular tools for transcripto...

11/24/2021 · NAM: Normalization-based Attention Module
Recognizing less salient features is the key for model compression. Howe...

12/07/2018 · On Batch Orthogonalization Layers
Batch normalization has become ubiquitous in many state-of-the-art nets...

06/14/2022 · Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Normalization layers (e.g., Batch Normalization, Layer Normalization) we...

10/11/2022 · Understanding the Failure of Batch Normalization for Transformers in NLP
Batch Normalization (BN) is a core and prevalent technique in accelerati...
