Implicit regularization of dropout

07/13/2022
by   Zhongwang Zhang, et al.
0

It is important to understand how the popular regularization method dropout helps the neural network training find a good generalization solution. In this work, we theoretically derive the implicit regularization of dropout and study the relation between the Hessian matrix of the loss function and the covariance matrix of the dropout noise, supported by a series of experiments. We then numerically study two implications of the implicit regularization of dropout, which intuitively rationalize why dropout helps generalization. First, we find that the training with dropout finds the neural network with a flatter minimum compared with standard gradient descent training in experiments, and the implicit regularization is the key for finding flat solutions. Second, trained with dropout, input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) would tend to condense on isolated orientations. Condensation is a feature in non-linear learning process, which makes the neural network low complexity. Although our theory mainly focuses on the dropout used in the last hidden layer, our experiments apply for general dropout in training neural networks. This work points out the distinct characteristics of dropout compared with stochastic gradient descent and serves as an important basis for fully understanding dropout.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/30/2019

On the Regularization Properties of Structured Dropout

Dropout and its extensions (eg. DropBlock and DropConnect) are popular h...
research
02/28/2020

The Implicit and Explicit Regularization Effects of Dropout

Dropout is a widely-used regularization technique, often required to obt...
research
06/26/2018

On the Implicit Bias of Dropout

Algorithmic approaches endow deep learning systems with implicit bias th...
research
11/01/2021

A variance principle explains why dropout finds flatter minima

Although dropout has achieved great success in deep learning, little is ...
research
06/18/2023

Dropout Regularization Versus ℓ_2-Penalization in the Linear Model

We investigate the statistical behavior of gradient descent iterates wit...
research
05/25/2023

Stochastic Modified Equations and Dynamics of Dropout Algorithm

Dropout is a widely utilized regularization technique in the training of...
research
03/02/2023

Dropout Reduces Underfitting

Introduced by Hinton et al. in 2012, dropout has stood the test of time ...

Please sign up or login with your details

Forgot password? Click here to reset