Mean-field Analysis of Batch Normalization

03/06/2019
by Mingwei Wei, et al.

Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization with higher learning rates and achieving faster convergence. In this paper, we use mean-field theory to analytically quantify the impact of BatchNorm on the geometry of the loss landscape for multi-layer networks consisting of fully-connected and convolutional layers. We show that it has a flattening effect on the loss landscape, as quantified by the maximum eigenvalue of the Fisher Information Matrix. These findings are then used to justify the use of larger learning rates for networks that use BatchNorm, and we provide a quantitative characterization of the maximal allowable learning rate that ensures convergence. Experiments support our theoretically predicted maximum learning rate, and furthermore suggest that networks with smaller values of the BatchNorm parameter achieve lower loss after the same number of epochs of training.
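The link between loss-landscape curvature and the maximal stable learning rate can be illustrated with a toy computation. The sketch below (an assumption for illustration, not the paper's method) builds the empirical Fisher Information Matrix of a small logistic-regression model, takes its largest eigenvalue as the curvature scale, and applies the classical stability criterion η < 2/λ_max: a flatter landscape (smaller λ_max) admits a larger learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression model on random data; for this model the
# empirical FIM is X^T diag(p(1-p)) X / n.
n, d = 200, 5
X = rng.normal(size=(n, d))
w = rng.normal(size=d)

p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
F = (X.T * (p * (1 - p))) @ X / n      # empirical Fisher Information Matrix

lam_max = np.linalg.eigvalsh(F).max()  # largest eigenvalue = sharpest curvature
eta_max = 2.0 / lam_max                # stability bound: eta must stay below this

print(f"lambda_max = {lam_max:.4f}, eta_max ~ {eta_max:.2f}")
```

The paper's claim can be read through this lens: BatchNorm reduces λ_max of the FIM, which raises the 2/λ_max ceiling on the learning rate.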


Related research

06/04/2018 · Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
This study analyzes the Fisher information matrix (FIM) by applying mean...

10/17/2021 · A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization
The training dynamics of two-layer neural networks with batch normalizat...

06/07/2019 · The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Normalization methods play an important role in enhancing the performanc...

06/16/2020 · Curvature is Key: Sub-Sampled Loss Surfaces and the Implications for Large Batch Training
We study the effect of mini-batching on the loss landscape of deep neura...

05/13/2023 · Depth Dependence of μP Learning Rates in ReLU MLPs
In this short note we consider random fully connected ReLU networks of w...

06/07/2016 · Systematic evaluation of CNN advances on the ImageNet
The paper systematically studies the impact of a range of recent advance...

06/21/2022 · On the Maximum Hessian Eigenvalue and Generalization
The mechanisms by which certain training interventions, such as increasi...
