Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances

06/08/2023
by Marcel Kühn, et al.

Stochastic gradient descent (SGD) has become a cornerstone of neural network optimization, yet the noise it introduces is often assumed to be uncorrelated over time, despite the ubiquity of epoch-based training. In this work, we challenge this assumption and investigate the effects of epoch-based noise correlations on the stationary distribution of discrete-time SGD with momentum, restricted to a quadratic loss. Our main contributions are twofold: first, we calculate the exact autocorrelation of the noise for training in epochs under the assumption that the noise is independent of small fluctuations in the weight vector; second, we explore how the correlations introduced by the epoch-based learning scheme influence the SGD dynamics. We find that for directions with a curvature greater than a hyperparameter-dependent crossover value, the results for uncorrelated noise are recovered. For relatively flat directions, however, the weight variance is significantly reduced. We provide an intuitive explanation for these results based on a crossover between correlation times, contributing to a deeper understanding of the dynamics of SGD in the presence of epoch-based noise correlations.
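To make the mechanism concrete, here is a minimal Python sketch (not the paper's code) that compares the stationary weight variance under epoch-based sampling without replacement against i.i.d. minibatch sampling, for a single quadratic direction. The dataset size, curvature h, learning rate eta, momentum beta, and the helper stationary_variance are all illustrative choices. Because the minibatches of one epoch partition the dataset, the noise terms within an epoch sum to zero exactly; a direction that relaxes slowly compared to an epoch therefore feels only this vanishing net kick, which is the intuition behind the reduced variance in flat directions.

```python
import numpy as np

# Toy illustration (hypothetical setup): discrete-time SGD with momentum on a
# single quadratic direction L(w) = 0.5 * h * w**2, with per-sample gradient
# noise that is independent of w. All parameter values are arbitrary choices.
rng = np.random.default_rng(0)

N, B = 256, 16           # dataset size, minibatch size (N/B batches per epoch)
h = 0.01                 # curvature; chosen "flat" relative to the crossover
eta, beta = 0.1, 0.9     # learning rate, momentum

noise = rng.normal(size=N)
noise -= noise.mean()    # per-sample noise sums to zero: full gradient exact

def stationary_variance(epoch_based, steps=200_000):
    """Run SGD with momentum and return the empirical weight variance."""
    w, v = 0.0, 0.0
    order = np.arange(N)
    ws = np.empty(steps)
    for t in range(steps):
        i = t % (N // B)
        if epoch_based:
            if i == 0:
                rng.shuffle(order)               # reshuffle once per epoch
            batch = order[i * B:(i + 1) * B]     # without replacement
        else:
            batch = rng.integers(0, N, size=B)   # i.i.d., uncorrelated noise
        g = h * w + noise[batch].mean()          # stochastic gradient
        v = beta * v - eta * g                   # momentum update
        w += v
        ws[t] = w
    return ws[steps // 2:].var()                 # discard the transient

print("epoch-based:", stationary_variance(True))
print("i.i.d.:     ", stationary_variance(False))
```

With these values the relaxation time (1 - beta) / (eta * h) is much longer than an epoch, so the epoch-based variance should come out markedly smaller. Raising the curvature well above the crossover, so that a direction relaxes within an epoch, should bring the two printed variances together, recovering the uncorrelated-noise result.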


