Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

01/19/2020
by Junjie Yan, et al.

Batch Normalization (BN) is one of the most widely used techniques in deep learning, but its performance degrades severely when the batch size is insufficient. This weakness limits the use of BN in many computer vision tasks, such as detection and segmentation, where the batch size is usually small due to memory constraints. Many modified normalization techniques have therefore been proposed, but they either fail to restore the performance of vanilla BN completely, or introduce additional nonlinear operations into the inference procedure at substantial extra cost. In this paper, we reveal that two extra batch statistics are involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics, associated with the gradients, can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases, without introducing any additional nonlinear operations into the inference procedure. We prove the benefits of MABN through both theoretical analysis and experiments, which demonstrate its effectiveness on multiple computer vision benchmarks, including ImageNet and COCO. The code has been released at https://github.com/megvii-model/MABN.
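To make the abstract's central claim concrete, the sketch below writes out BN's backward pass so the two extra batch statistics become visible, and shows how a moving-average variant in the spirit of MABN would substitute running estimates for them. This is a simplified illustration under our own assumptions (2-D activations, per-feature normalization, invented helper names), not the paper's exact algorithm.

```python
import numpy as np

def bn_forward_backward(x, dy, gamma, eps=1e-5):
    """Vanilla BN over the batch axis, with every batch statistic explicit.
    x, dy: arrays of shape (N, C); gamma: shape (C,)."""
    mu = x.mean(axis=0)                    # forward batch statistic 1
    var = x.var(axis=0)                    # forward batch statistic 2
    x_hat = (x - mu) / np.sqrt(var + eps)

    # Backward propagation involves two *extra* batch statistics:
    g = dy.mean(axis=0)                    # mean of the upstream gradients
    psi = (dy * x_hat).mean(axis=0)        # gradient-activation correlation
    dx = gamma / np.sqrt(var + eps) * (dy - g - x_hat * psi)
    return x_hat, dx

def mabn_style_backward(dy, x_hat, gamma, var_ema, g_ema, psi_ema, eps=1e-5):
    """Hypothetical MABN-style substitution: the noisy per-batch statistics
    are replaced by exponential moving averages maintained across training
    iterations, which stabilizes the gradient when the batch size N is small."""
    return gamma / np.sqrt(var_ema + eps) * (dy - g_ema - x_hat * psi_ema)
```

With a sufficiently large batch, g and psi concentrate around their expectations and vanilla BN trains stably; with a small batch they become high-variance estimates, which is the backward-pass failure mode the paper identifies.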

