Norm matters: efficient and accurate normalization schemes in deep networks

03/05/2018
by Elad Hoffer, et al.

Over the past few years, batch normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remain poorly understood, and several shortcomings hinder its use for certain tasks. In this work we present a novel view on the purpose and function of normalization methods and weight decay, as tools to decouple the weights' norm from the underlying optimized objective. We also improve the use of weight normalization and show the connection between practices such as normalization, weight decay and learning-rate adjustments. Finally, we suggest several alternatives to the widely used L^2 batch-norm, using normalization in L^1 and L^∞ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations.
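To illustrate the idea of replacing the L^2 statistic, below is a minimal PyTorch sketch of an L^1-style batch normalization: activations are centered and divided by the mean absolute deviation, rescaled by sqrt(π/2) so that it approximates the standard deviation under a Gaussian assumption. The function name and the toy usage are ours for illustration; this is not the authors' reference implementation and omits the learnable scale/shift and running statistics of a full batch-norm layer.

```python
import math
import torch

def l1_batch_norm(x, eps=1e-5):
    """L1-style batch normalization sketch for a batch of shape (N, C).

    Each channel is centered and divided by its mean absolute deviation,
    scaled by sqrt(pi/2) so the result roughly matches division by the
    standard deviation when the inputs are Gaussian.
    """
    mu = x.mean(dim=0, keepdim=True)                # per-channel mean
    centered = x - mu
    mad = centered.abs().mean(dim=0, keepdim=True)  # mean absolute deviation
    scale = math.sqrt(math.pi / 2.0) * mad + eps    # L1 surrogate for std
    return centered / scale

# Toy usage: a random batch of 32 samples with 8 features.
x = torch.randn(32, 8)
y = l1_batch_norm(x)
print(y.mean(dim=0), y.std(dim=0))  # roughly zero mean, roughly unit scale
```

Because the statistic involves only absolute values and a mean, it avoids squaring activations, which is one reason such variants can be friendlier to half-precision arithmetic than the standard L^2 formulation.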

Related research

06/16/2017
L2 Regularization versus Batch and Weight Normalization
Batch Normalization is a commonly used trick to improve the training of ...

11/14/2019
Understanding the Disharmony between Weight Normalization Family and Weight Decay: ε-shifted L_2 Regularizer
The merits of fast convergence and potentially better performance of the...

05/06/2019
Batch Normalization is a Cause of Adversarial Vulnerability
Batch normalization (batch norm) is often used in an attempt to stabiliz...

06/21/2019
Backpropagation-Friendly Eigendecomposition
Eigendecomposition (ED) is widely used in deep networks. However, the ba...

06/29/2021
On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
Despite the conventional wisdom that using batch normalization with weig...

02/27/2019
Equi-normalization of Neural Networks
Modern neural networks are over-parametrized. In particular, each rectif...

09/15/2022
Theroretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate
Batch normalization is widely used in deep learning to normalize interme...