Four Things Everyone Should Know to Improve Batch Normalization

06/09/2019
by   Cecilia Summers, et al.

A key component of most neural network architectures is the use of normalization layers, such as Batch Normalization. Despite its common use and large utility in optimizing deep architectures that would otherwise be intractable, it has been challenging both to generically improve upon Batch Normalization and to understand the specific circumstances that lend themselves to other enhancements. In this paper, we identify four improvements to the generic form of Batch Normalization and the circumstances under which they work, yielding performance gains across all batch sizes while requiring no additional computation during training. These contributions include: a method for incorporating the current example into the inference-time normalization statistics, which fixes a training vs. inference discrepancy; recognition and validation of the powerful regularization effect of Ghost Batch Normalization for small and medium batch sizes; an examination of the effect of weight decay regularization on the scaling and shifting parameters; and a new normalization algorithm for very small batch sizes that combines the strengths of Batch and Group Normalization. We validate our results empirically on four datasets: CIFAR-100, SVHN, Caltech-256, and ImageNet.
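To make the Ghost Batch Normalization idea mentioned above concrete, here is a minimal PyTorch sketch: the batch is split into smaller "ghost" sub-batches that are each normalized with their own statistics, which injects noise into the batch statistics and acts as a regularizer. The class name `GhostBatchNorm2d` and the `virtual_batch_size` parameter are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of Ghost Batch Normalization, assuming PyTorch.
# GhostBatchNorm2d and virtual_batch_size are hypothetical names chosen
# for illustration; they do not come from the paper's code.
import torch
import torch.nn as nn


class GhostBatchNorm2d(nn.Module):
    """Applies BatchNorm2d independently to 'ghost' sub-batches of the input."""

    def __init__(self, num_features, virtual_batch_size=32, **bn_kwargs):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        self.bn = nn.BatchNorm2d(num_features, **bn_kwargs)

    def forward(self, x):
        if self.training:
            # Split the batch into chunks of at most virtual_batch_size examples
            # and normalize each chunk with its own batch statistics.
            chunks = x.split(self.virtual_batch_size, dim=0)
            return torch.cat([self.bn(chunk) for chunk in chunks], dim=0)
        # At inference time, fall back to the running statistics,
        # as in ordinary Batch Normalization.
        return self.bn(x)


# Example: a 128-example batch normalized as four ghost batches of 32.
x = torch.randn(128, 16, 8, 8)
gbn = GhostBatchNorm2d(16, virtual_batch_size=32)
out = gbn(x)
print(out.shape)  # torch.Size([128, 16, 8, 8])
```

In this sketch the running mean and variance are updated once per ghost batch during training; how the running statistics (and the current example) are combined at inference is exactly the kind of design choice the paper's first contribution addresses.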


