On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes

02/10/2021
by Liu Ziyin, et al.

The noise in stochastic gradient descent (SGD), caused by minibatch sampling, remains poorly understood despite its enormous practical importance for training efficiency and generalization. In this work, we study minibatch noise in SGD. Motivated by the observation that minibatch sampling does not always cause a fluctuation, we set out to find the conditions under which minibatch noise emerges. We first derive analytically solvable results for linear regression under various settings and compare them to the approximations commonly used to understand SGD noise. We show that some degree of mismatch between model and data complexity is needed for SGD to induce noise, and that such a mismatch may arise from static noise in the labels or in the input, from the use of regularization, or from underparametrization. Our results motivate a more accurate general formulation for describing minibatch noise.
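As an illustration of the paper's starting observation, the following minimal numpy sketch (our own, not taken from the paper) runs minibatch SGD on a linear regression problem twice: once with noise-free, realizable labels and once with static label noise added. In the first case the iterate settles down and stops fluctuating; in the second it keeps fluctuating at stationarity. The dimensions, learning rate, and batch size are arbitrary illustrative choices.

```python
# Sketch: minibatch SGD on linear regression, with and without static label noise.
# With noiseless, realizable labels every per-example gradient vanishes at the
# solution, so minibatch sampling causes no fluctuation; label noise restores it.
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 8                          # samples, parameters (underparametrized: n > d)
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)

def run_sgd(y, lr=0.05, batch=16, steps=5000):
    """Plain minibatch SGD on the squared loss; returns the variance of late iterates."""
    w = np.zeros(d)
    tail = []
    for t in range(steps):
        idx = rng.choice(n, batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
        if t >= steps - 1000:          # collect iterates after approximate convergence
            tail.append(w.copy())
    return np.var(np.array(tail), axis=0).mean()

y_clean = X @ w_true                                  # realizable, noise-free labels
y_noisy = y_clean + 0.5 * rng.standard_normal(n)      # static label noise

print("stationary iterate variance (clean labels):", run_sgd(y_clean))
print("stationary iterate variance (noisy labels):", run_sgd(y_noisy))
```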


