The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent

03/01/2018
by Zhanxing Zhu et al.

Understanding generalization in deep learning has attracted much attention recently, and the learning algorithm, such as stochastic gradient descent (SGD), plays an important role in generalization performance. Along this line, we study the anisotropic noise introduced by SGD and investigate its importance for generalization in deep neural networks. Through a thorough empirical analysis, we show that the anisotropic diffusion of SGD tends to follow the curvature information of the loss landscape, and is thus effective at escaping from sharp, poor minima toward more stable, flat minima. We verify this understanding by comparing SGD's anisotropic diffusion with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics) and with other types of position-dependent noise.
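The contrast the abstract draws, SGD's minibatch noise versus the isotropic noise of Langevin dynamics, can be illustrated on a toy problem. The sketch below (a hypothetical NumPy example, not the authors' code) estimates the per-sample gradient-noise covariance for a small linear regression near its minimum and shows that it is strongly anisotropic, with eigenvalues tracking those of the Hessian, unlike the identity covariance injected by Langevin dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression with an ill-conditioned design, so the loss
# landscape has very different curvature along different directions.
n, d = 5000, 2
X = rng.normal(size=(n, d)) * np.array([10.0, 0.1])  # sharp vs. flat direction
w_true = np.array([1.0, -1.0])
y = X @ w_true + 0.5 * rng.normal(size=n)            # label noise, std 0.5

w = w_true.copy()  # evaluate gradient-noise statistics near the minimum

# Per-sample gradients of the squared loss 0.5 * (x·w - y)^2
residuals = X @ w - y
per_sample_grads = residuals[:, None] * X            # shape (n, d)

# SGD gradient-noise covariance vs. the Hessian of the mean loss
C = np.cov(per_sample_grads.T)                       # anisotropic noise
H = X.T @ X / n                                      # curvature

# The noise covariance is far from isotropic, and its eigenvalues mirror
# those of the Hessian -- unlike the identity covariance of Langevin noise.
eigs_C = np.sort(np.linalg.eigvalsh(C))
eigs_H = np.sort(np.linalg.eigvalsh(H))
print("noise-cov eigenvalues:", eigs_C)
print("Hessian eigenvalues:  ", eigs_H)
print("noise anisotropy ratio:", eigs_C[-1] / eigs_C[0])
```

For this regression model the noise covariance is approximately the label-noise variance times the Hessian, so the diffusion is largest along the sharpest directions, which is the curvature-following behavior the paper analyzes.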


Related research

06/24/2020
Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Stochastic gradient descent (SGD) and its variants are mainstream method...

06/02/2022
Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions
Generalization is one of the most important problems in deep learning (D...

05/27/2021
The Sobolev Regularization Effect of Stochastic Gradient Descent
The multiplicative structure of parameters and input data in the first l...

06/26/2019
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Large-batch stochastic gradient descent (SGD) is widely used for trainin...

10/27/2019
A geometric interpretation of stochastic gradient descent using diffusion metrics
Stochastic gradient descent (SGD) is a key ingredient in the training of...

09/22/2020
Anomalous diffusion dynamics of learning in deep neural networks
Learning in deep neural networks (DNNs) is implemented through minimizin...

04/04/2022
Deep learning, stochastic gradient descent and diffusion maps
Stochastic gradient descent (SGD) is widely used in deep learning due to...
