L2 Regularization versus Batch and Weight Normalization

06/16/2017
by   Twan van Laarhoven, et al.
0

Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue.

READ FULL TEXT

page 7

page 8

10/02/2020

Weight and Gradient Centralization in Deep Neural Networks

Batch normalization is currently the most widely used variant of interna...
03/05/2018

Norm matters: efficient and accurate normalization schemes in deep networks

Over the past few years batch-normalization has been commonly used in de...
06/15/2020

Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay

We comprehensively reveal the learning dynamics of deep neural networks ...
02/06/2021

The Implicit Biases of Stochastic Gradient Descent on Deep Neural Networks with Batch Normalization

Deep neural networks with batch normalization (BN-DNNs) are invariant to...
09/24/2017

Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

Batch normalization (BN) has become a de facto standard for training dee...
05/15/2022

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks

L2 regularization for weights in neural networks is widely used as a sta...
10/12/2018

Mode Normalization

Normalization methods are a central building block in the deep learning ...