On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

06/29/2021
by Ekaterina Lobacheva, et al.

Despite the conventional wisdom that combining batch normalization with weight decay improves neural network training, some recent works show that their joint use may cause instabilities at the late stages of training. Other works, in contrast, report convergence to an equilibrium, i.e., the stabilization of training metrics. In this paper, we study this contradiction and show that, instead of converging to a stable equilibrium, the training dynamics settle into consistent periodic behavior: the training process regularly exhibits instabilities that do not lead to complete training failure but instead trigger a new period of training. We rigorously investigate the mechanism underlying this periodic behavior, both empirically and theoretically, and show that it is indeed caused by the interaction between batch normalization and weight decay.
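The mechanism can be illustrated with a toy simulation. The sketch below is not from the paper; it assumes a generic scale-invariant loss, which is the situation batch normalization creates for the weights of the preceding layer. For such weights the gradient norm scales as 1/||w||, so the effective step size is lr/||w||^2: weight decay shrinks ||w||, the effective step size grows, updates eventually become large enough to inflate ||w|| again, and the cycle restarts. All names and constants here are illustrative.

    import numpy as np

    # Toy simulation (assumptions: a generic scale-invariant loss and plain
    # SGD; the loss, dimensions, and constants are illustrative, not from
    # the paper).
    rng = np.random.default_rng(0)
    d, lr, wd, steps = 50, 0.1, 5e-3, 20000

    w = rng.normal(size=d)
    target = rng.normal(size=d)
    target /= np.linalg.norm(target)

    norms = []
    for _ in range(steps):
        u = w / np.linalg.norm(w)            # only the direction of w matters
        noise = 0.5 * rng.normal(size=d)     # stand-in for mini-batch noise
        g_u = (u - target) + noise           # gradient w.r.t. the direction u
        # Chain rule through the normalization: project out the radial
        # component and divide by ||w||, so ||g|| scales as 1 / ||w||.
        g = (g_u - np.dot(g_u, u) * u) / np.linalg.norm(w)
        w -= lr * (g + wd * w)               # SGD step with weight decay
        norms.append(np.linalg.norm(w))

    # ||w|| first shrinks under weight decay; once it is small, the
    # 1 / ||w|| gradient scaling makes single steps large enough to push
    # it back up, and the two forces keep trading off.
    print(np.round(norms[::2000], 3))

In this toy run, ||w|| decays under weight decay and is repeatedly kicked back up once it drops below the scale at which individual updates dominate; the paper analyzes how, in real networks, these kicks appear as the abrupt destabilizations that open each new training period.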

Related research

07/05/2022 · Understanding and Improving Group Normalization
Various normalization layers have been proposed to help the training of ...

01/05/2020 · Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters
In this paper, we investigate the empirical impact of orthogonality regu...

05/06/2019 · Batch Normalization is a Cause of Adversarial Vulnerability
Batch normalization (batch norm) is often used in an attempt to stabiliz...

03/05/2018 · Norm matters: efficient and accurate normalization schemes in deep networks
Over the past few years batch-normalization has been commonly used in de...

10/15/2018 · Revisit Batch Normalization: New Understanding from an Optimization View and a Refinement via Composition Optimization
Batch Normalization (BN) has been used extensively in deep learning to a...

10/06/2020 · Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., (Li and Arora, 2020)) suggest that the use of popula...

07/12/2019 · Faster Neural Network Training with Data Echoing
In the twilight of Moore's law, GPUs and other specialized hardware acce...
