Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

06/14/2022
by Kaifeng Lyu, et al.

Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets. Motivated by the long-held belief that flatter minima lead to better generalization, this paper gives mathematical analysis and supporting experiments suggesting that normalization (together with the accompanying weight decay) encourages gradient descent (GD) to reduce the sharpness of the loss surface. Here "sharpness" is defined carefully, since the loss is scale-invariant, a known consequence of normalization. Specifically, for a fairly broad class of neural nets with normalization, our theory explains how GD with a finite learning rate enters the so-called Edge of Stability (EoS) regime, and characterizes the trajectory of GD in this regime via a continuous sharpness-reduction flow.
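The two ingredients the abstract refers to can be seen on a toy problem. Below is a minimal PyTorch sketch, not the paper's code: the architecture, data, and the hyperparameters eta and wd are illustrative assumptions. It (1) checks that the loss is invariant to rescaling the weights feeding into a normalization layer, (2) estimates sharpness as the top Hessian eigenvalue via power iteration on Hessian-vector products, and (3) runs full-batch GD with weight decay so that eigenvalue can be compared against the classical stability threshold 2/eta.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy scale-invariant setup: BatchNorm (affine=False) right after fc1
# makes the loss invariant to rescaling fc1's weights, because BN
# divides out the scale of its input.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 32, bias=False)
        self.bn = nn.BatchNorm1d(32, affine=False)
        self.fc2 = nn.Linear(32, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.bn(self.fc1(x))))

net = Net().train()
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss_fn = nn.MSELoss()

# 1) Scale-invariance: the loss is unchanged (up to BN's small eps)
#    when the weights feeding the normalization layer are rescaled.
with torch.no_grad():
    loss_before = loss_fn(net(x), y).item()
    net.fc1.weight.mul_(3.0)
    loss_after = loss_fn(net(x), y).item()
    net.fc1.weight.div_(3.0)  # undo the rescaling
print(f"loss before rescaling: {loss_before:.6f}  after: {loss_after:.6f}")

# 2) Sharpness diagnostic: estimate the top Hessian eigenvalue by power
#    iteration on Hessian-vector products (no explicit Hessian needed).
params = list(net.parameters())

def top_hessian_eig(iters=20):
    loss = loss_fn(net(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = (v @ hv).item()  # Rayleigh quotient with unit-norm v
        v = hv / (hv.norm() + 1e-12)
    return eig

# 3) Full-batch GD with weight decay; in the EoS regime the top
#    eigenvalue is expected to hover near the stability threshold 2/eta.
eta, wd = 0.05, 5e-4
for step in range(501):
    loss = loss_fn(net(x), y)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * (g + wd * p)
    if step % 100 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  "
              f"lambda_max {top_hessian_eig():.2f}  (2/eta = {2 / eta:.1f})")
```

Whether lambda_max actually settles near 2/eta on this toy problem depends on the data, architecture, and learning rate; the point of the sketch is the diagnostic itself, not the particular numbers it prints.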


Related research

On Batch Orthogonalization Layers (12/07/2018)
Batch normalization has become ubiquitous in many state-of-the-art nets....

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate (10/06/2020)
Recent works (e.g., (Li and Arora, 2020)) suggest that the use of popula...

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct? (11/19/2018)
Yes, they do. This work investigates a perspective for deep learning: wh...

MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization (10/19/2020)
Substantial experiments have validated the success of Batch Normalizatio...

Normalization Layers Are All That Sharpness-Aware Minimization Needs (06/07/2023)
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of m...

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes (09/08/2022)
A fundamental property of deep learning normalization techniques, such a...

Weighted Optimization: better generalization by smoother interpolation (06/15/2020)
We provide a rigorous analysis of how implicit bias towards smooth inter...
