Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

02/13/2021
by Alexander Camuto, et al.

Gaussian noise injections (GNIs) are a family of simple and widely used regularisation methods for training neural networks: one injects additive or multiplicative Gaussian noise into the network activations at every iteration of the optimisation algorithm, typically stochastic gradient descent (SGD). In this paper we focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of SGD. We show that this effect induces asymmetric heavy-tailed noise on the SGD gradient updates. To model these modified dynamics, we first develop a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise. Using this model, we then formally prove that GNIs induce an 'implicit bias' that varies with the heaviness of the tails and the level of asymmetry. Our empirical results confirm that different types of neural networks trained with GNIs are well modelled by the proposed dynamics, and that the implicit effect of these injections induces a bias that degrades network performance.
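
The injection itself is straightforward to implement. Below is a minimal PyTorch-style sketch of a GNI layer; the noise scale sigma, the layer sizes, and the additive/multiplicative switch are illustrative choices, not values prescribed by the paper.

```python
import torch
import torch.nn as nn

class GNILayer(nn.Module):
    """Inject Gaussian noise into activations during training.

    sigma and the additive/multiplicative switch are illustrative
    hyperparameters, not values taken from the paper.
    """
    def __init__(self, sigma: float = 0.1, multiplicative: bool = False):
        super().__init__()
        self.sigma = sigma
        self.multiplicative = multiplicative

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        if not self.training:          # noise is injected only while optimising
            return h
        eps = torch.randn_like(h) * self.sigma
        # additive: h + eps ; multiplicative: h * (1 + eps)
        return h * (1.0 + eps) if self.multiplicative else h + eps

# usage: interleave with ordinary layers and train with SGD as usual
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), GNILayer(sigma=0.1),
    nn.Linear(256, 10),
)
```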

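The paper's Langevin-like model can also be explored numerically. The sketch below uses an Euler discretisation of an SDE driven by asymmetric alpha-stable noise (via scipy.stats.levy_stable) as one concrete member of the asymmetric heavy-tailed family the abstract refers to; the tail index alpha, skewness beta, step size, and the toy quadratic loss are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_levy_langevin(grad, theta0, eta=1e-3, alpha=1.7, beta=0.5,
                           n_steps=1000, seed=0):
    """Euler discretisation of a Langevin-like SDE driven by an
    asymmetric alpha-stable Levy process.

    alpha in (0, 2] : tail index (alpha = 2 recovers Brownian motion)
    beta in [-1, 1] : skewness (beta != 0 makes the noise asymmetric)

    The step scaling eta**(1/alpha) follows the self-similarity of
    alpha-stable processes; all numeric defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    path = [theta.copy()]
    for _ in range(n_steps):
        noise = levy_stable.rvs(alpha, beta, size=theta.shape,
                                random_state=rng)
        theta = theta - eta * grad(theta) + eta ** (1.0 / alpha) * noise
        path.append(theta.copy())
    return np.array(path)

# toy quadratic loss f(theta) = 0.5 * ||theta||^2, so grad(theta) = theta
path = simulate_levy_langevin(lambda th: th, theta0=np.zeros(2))
```

Varying alpha and beta in such a simulation is a simple way to see, qualitatively, how tail heaviness and asymmetry shift where the iterates concentrate.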

