Understanding Batch Normalization

06/01/2018
by Johan Bjorck et al.

Batch normalization is a ubiquitous deep learning technique that normalizes activations in intermediate layers. It is associated with improved accuracy and faster learning, but despite its enormous success there is little consensus regarding why it works. We aim to rectify this and take an empirical approach to understanding batch normalization. Our primary observation is that the higher learning rates that batch normalization enables have a regularizing effect that dramatically improves the generalization of normalized networks, a finding we both demonstrate empirically and motivate theoretically. We show how activations become large and how convolutional channels become increasingly ill-behaved in the deeper layers of unnormalized networks, and how this results in larger input-independent gradients. Beyond gradient scaling, we demonstrate how the learning rate in unnormalized networks is further limited because, under large parameter updates, the magnitude of the activations grows exponentially with network depth, a problem batch normalization trivially avoids. Motivated by recent results in random matrix theory, we argue that ill-conditioning of the activations is due to fluctuations in random initialization, shedding new light on classical initialization schemes and their consequences.
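As context for the abstract, the following is a minimal NumPy sketch of the batch-normalization transform it refers to: each feature of an intermediate layer's activations is normalized to zero mean and unit variance over the mini-batch, then rescaled and shifted by learned parameters. The function and variable names (batch_norm_forward, gamma, beta, eps) are illustrative and not taken from the paper.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of activations, then apply a learned scale and shift.

    x:     activations of shape (batch_size, features)
    gamma: learned per-feature scale, shape (features,)
    beta:  learned per-feature shift, shape (features,)
    """
    mu = x.mean(axis=0)                        # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)      # zero-mean, unit-variance activations
    return gamma * x_hat + beta                # restore representational freedom

# Example: a batch of intermediate activations that has drifted away from zero mean / unit variance
x = np.random.randn(32, 64) * 5.0 + 3.0
y = batch_norm_forward(x, gamma=np.ones(64), beta=np.zeros(64))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])   # approximately 0 and 1 per feature
```

At training time the statistics are computed from the current mini-batch as above; at inference, running averages of the mean and variance accumulated during training are typically used instead.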

