Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

01/16/2018
by Xiang Li, et al.

This paper first answers the question "why do the two most powerful techniques, Dropout and Batch Normalization (BN), often lead to worse performance when they are combined?" from both theoretical and statistical perspectives. Theoretically, we find that Dropout shifts the variance of a specific neural unit when the network is switched from its training state to its test state. BN, however, maintains in the test phase the statistical variance accumulated over the entire learning procedure. This inconsistency of variance (we name this scheme "variance shift") causes unstable numerical behavior at inference and ultimately leads to more erroneous predictions when Dropout is applied before BN. Thorough experiments on DenseNet, ResNet, ResNeXt and Wide ResNet confirm our findings. Guided by the uncovered mechanism, we then explore several strategies that modify Dropout and try to overcome the limitations of the combination by avoiding the variance-shift risk.
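As a rough illustration of the mechanism (a minimal NumPy sketch written for this summary, not code from the paper), consider a single unit with unit variance passed through inverted Dropout with an assumed keep probability p. During training the surviving activations are rescaled by 1/p, so the variance BN accumulates is about 1/p; at test time Dropout becomes the identity and the same unit arrives with variance 1, which is the shift described above.

```python
import numpy as np

# Monte Carlo sketch of the "variance shift" between Dropout's train and
# test behavior. The keep probability p and the unit's distribution are
# illustrative assumptions, not values taken from the paper.

rng = np.random.default_rng(0)
p = 0.5                                   # keep probability (assumed)
x = rng.normal(0.0, 1.0, size=1_000_000)  # one neural unit with unit variance

# Train mode: Bernoulli mask, surviving activations rescaled by 1/p
# (inverted Dropout). This is what BN sees while accumulating statistics.
mask = rng.random(x.shape) < p
x_train = x * mask / p

# Test mode: Dropout is disabled, the unit passes through unchanged.
x_test = x

print(f"variance BN accumulates in training: {x_train.var():.3f}")  # ~ 1/p = 2.0
print(f"variance BN actually sees at test:   {x_test.var():.3f}")   # ~ 1.0
```

In this toy setting BN's frozen running variance (roughly 1/p) no longer matches the test-time input variance (roughly 1), so the normalized activations are systematically mis-scaled; this is the kind of inference-time instability the paper attributes to placing Dropout before BN.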
