Stochastic Modified Equations and Dynamics of Dropout Algorithm

05/25/2023
by Zhongwang Zhang, et al.

Dropout is a widely used regularization technique in the training of neural networks; nevertheless, its underlying mechanism and its impact on generalization remain poorly understood. In this work, we derive stochastic modified equations for analyzing the dynamics of dropout, whereby its discrete iteration process is approximated by a class of stochastic differential equations. To investigate the mechanism by which dropout facilitates the identification of flatter minima, we study the noise structure of the derived stochastic modified equation for dropout. By drawing on the structural resemblance between the Hessian and the covariance through several intuitive approximations, we empirically demonstrate that the inverse variance-flatness relation and the Hessian-variance relation hold throughout the training process of dropout. These theoretical and empirical findings substantially advance our understanding of dropout's inherent tendency to locate flatter minima.
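As a rough illustration of the approach (a generic sketch in the spirit of the stochastic modified equation literature, not the exact equation derived in the paper; the loss L, the learning rate \eta, and the dropout-noise covariance \Sigma are assumed notation here):

d\Theta_t = -\nabla L(\Theta_t)\,dt + \sqrt{\eta}\,\Sigma(\Theta_t)^{1/2}\,dW_t ,

where W_t denotes a standard Wiener process, so that the discrete dropout iteration \theta_{k+1} = \theta_k - \eta \nabla L(\theta_k; m_k), with m_k a random Bernoulli mask, is matched in the weak sense by the SDE. Analyzing the structure of \Sigma, and its resemblance to the Hessian of L, is what the abstract refers to as the noise-structure analysis behind the flatness results.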

Related research

11/01/2021  A variance principle explains why dropout finds flatter minima
Although dropout has achieved great success in deep learning, little is ...

07/13/2022  Implicit regularization of dropout
It is important to understand how the popular regularization method drop...

12/01/2018  Stochastic Training of Residual Networks: a Differential Equation Viewpoint
During the last few years, significant attention has been paid to the st...

06/08/2015  Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing ...

09/24/2020  How Many Factors Influence Minima in SGD?
Stochastic gradient descent (SGD) is often applied to train Deep Neural ...

05/23/2018  Pushing the bounds of dropout
We show that dropout training is best understood as performing MAP estim...

01/16/2018  Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
This paper first answers the question "why do the two most powerful tech...
