How to Use Dropout Correctly on Residual Networks with Batch Normalization

02/13/2023
by Bum Jun Kim, et al.

For the stable optimization of deep neural networks, regularization methods such as dropout and batch normalization are used across a wide range of tasks. Nevertheless, the correct position at which to apply dropout has rarely been discussed, and practitioners have placed it at different positions. In this study, we investigate the correct position to apply dropout. We demonstrate that for a residual network with batch normalization, applying dropout at certain positions increases performance, whereas applying it at other positions decreases performance. Based on theoretical analysis, we provide the following guideline for the correct position to apply dropout: apply one dropout after the last batch normalization but before the last weight layer in the residual branch. We provide detailed theoretical explanations to support this claim and verify them through module tests. In addition, we investigate the correct position of dropout in the head that produces the final prediction. Although the current consensus is to apply dropout after global average pooling, we prove that applying dropout before global average pooling leads to a more stable output. The proposed guidelines are validated through experiments with different datasets and models.
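As a rough illustration of the guideline, here is a minimal PyTorch sketch (not the authors' code) of a pre-activation-style residual block with dropout placed after the last batch normalization but before the last weight layer, and a prediction head that applies dropout before global average pooling. The class names, channel sizes, and drop rate are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn


class PreActResidualBlock(nn.Module):
    """Residual block (BN-ReLU-Conv ordering) with dropout inserted after the
    last batch normalization but before the last weight layer in the branch,
    following the guideline stated in the abstract."""

    def __init__(self, channels: int, drop_rate: float = 0.3):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)   # last batch normalization in the branch
        self.dropout = nn.Dropout(drop_rate)  # dropout goes here ...
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)  # ... before the last weight layer
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.dropout(self.relu(self.bn2(out)))  # after last BN, before last weight layer
        out = self.conv2(out)
        return x + out  # residual connection


class PredictionHead(nn.Module):
    """Prediction head that applies dropout *before* global average pooling,
    the ordering the abstract argues yields a more stable output than the
    common dropout-after-GAP placement."""

    def __init__(self, channels: int, num_classes: int, drop_rate: float = 0.3):
        super().__init__()
        self.dropout = nn.Dropout(drop_rate)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.dropout(x)            # dropout before global average pooling
        x = self.gap(x).flatten(1)     # (N, C, H, W) -> (N, C)
        return self.fc(x)


if __name__ == "__main__":
    # Quick shape check with dummy data.
    block = PreActResidualBlock(channels=64)
    head = PredictionHead(channels=64, num_classes=10)
    logits = head(block(torch.randn(8, 64, 32, 32)))
    print(logits.shape)  # torch.Size([8, 10])
```

Element-wise `nn.Dropout` is used here for simplicity; whether element-wise or channel-wise dropout is intended is not specified in the abstract.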


