Dropout Reduces Underfitting

03/02/2023
by Zhuang Liu, et al.

Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During the early phase, we find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient. This helps counteract the stochasticity of SGD and limit the influence of individual batches on model training. Our findings lead us to a solution for improving performance in underfitting models - early dropout: dropout is applied only during the initial phases of training, and turned off afterwards. Models equipped with early dropout achieve lower final training loss compared to their counterparts without dropout. Additionally, we explore a symmetric technique for regularizing overfitting models - late dropout, where dropout is not used in the early iterations and is only activated later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning and our methods can be useful tools for future neural network training, especially in the era of large data. Code is available at https://github.com/facebookresearch/dropout .
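
The early-dropout idea described above can be expressed as a simple schedule that keeps dropout active only during an initial span of training and disables it afterwards. Below is a minimal PyTorch sketch of that schedule, assuming a model whose dropout layers are standard nn.Dropout modules; the set_dropout helper, the 20-epoch cutoff, and the 0.1 drop rate are illustrative assumptions, not the paper's exact recipe.

```python
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    # Update the drop probability of every nn.Dropout module in place.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

def train(model, loader, optimizer, loss_fn,
          num_epochs=100, early_epochs=20, drop_rate=0.1):
    for epoch in range(num_epochs):
        # Early dropout: active only for the first `early_epochs` epochs,
        # then turned off for the rest of training.
        # Late dropout would flip the condition: p = 0.0 early, drop_rate later.
        set_dropout(model, drop_rate if epoch < early_epochs else 0.0)
        model.train()
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
```

The switch point and drop rate are hyperparameters; the values above are placeholders, and the paper's official implementation at the linked repository gives the settings used in the experiments.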
