How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift)

05/29/2018
by Shibani Santurkar, et al.

Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training. These findings bring us closer to a true understanding of our DNN training toolkit.
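For readers unfamiliar with the operation under discussion, a minimal NumPy sketch of batch normalization follows. It normalizes each feature over the mini-batch (zero mean, unit variance) and then applies the learnable scale `gamma` and shift `beta`; the function name and test data are illustrative, not from the paper.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization for a (batch, features) array.

    Each feature column is standardized using statistics computed
    over the batch dimension, then rescaled by gamma and beta.
    """
    mean = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)              # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize
    return gamma * x_hat + beta      # learnable affine transform

# Example: inputs with a shifted, scaled distribution
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
# With gamma=1, beta=0, each feature of y has ~zero mean and ~unit variance.
```

At training time the statistics come from the current mini-batch, as above; at inference time implementations instead use running averages accumulated during training.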


Related research

03/04/2016  Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
While the authors of Batch Normalization (BN) identify and address an im...

01/09/2020  An Internal Covariate Shift Bounding Algorithm for Deep Neural Networks by Unitizing Layers' Outputs
Batch Normalization (BN) techniques have been proposed to reduce the so-...

02/11/2015  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Training Deep Neural Networks is complicated by the fact that the distri...

03/14/2023  Context Normalization for Robust Image Classification
Normalization is a pre-processing step that converts the data into a mor...

02/09/2018  Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches
As an indispensable component, Batch Normalization (BN) has successfully...

12/07/2017  Solving internal covariate shift in deep learning with linked neurons
This work proposes a novel solution to the problem of internal covariate...

10/15/2018  Revisit Batch Normalization: New Understanding from an Optimization View and a Refinement via Composition Optimization
Batch Normalization (BN) has been used extensively in deep learning to a...
