How Many Factors Influence Minima in SGD?

09/24/2020
by Victor Luo, et al.

Stochastic gradient descent (SGD) is widely used to train deep neural networks (DNNs), and considerable research effort has been devoted to investigating the convergence dynamics of SGD and the minima it finds. The influencing factors identified in the literature include the learning rate, the batch size, the Hessian, and the gradient covariance; stochastic differential equations are used to model SGD and to establish relationships among these factors for characterizing the minima found by SGD. It has been found that the ratio of batch size to learning rate is a main factor governing the underlying SGD dynamics; however, the influence of other important factors, such as the Hessian and the gradient covariance, is not entirely agreed upon. This paper reviews the factors and relationships studied in the recent literature and presents numerical findings on these relationships. In particular, it confirms the four-factor and general relationship results obtained in Wang (2019), while the three-factor and associated relationship results found in Jastrzȩbski et al. (2018) may not hold beyond the special case considered there.
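As a rough, self-contained illustration of the batch-size-to-learning-rate relationship discussed above (not the paper's experiment), the following Python sketch runs SGD on a hypothetical 1-D quadratic loss with Gaussian per-sample gradient noise. All names and constants here (run_sgd, eta, h, c, the step counts) are illustrative assumptions. Under the SDE view of SGD, the stationary variance of the iterates on this toy loss scales roughly like eta*c/(2*h*B), so runs sharing the ratio eta/B should show a similar spread around the minimum.

import numpy as np

rng = np.random.default_rng(0)

def run_sgd(eta, batch_size, h=1.0, c=1.0, steps=100_000):
    # SGD on the toy loss L(w) = 0.5 * h * w**2: each mini-batch gradient
    # is the true gradient h*w plus per-sample noise of variance c,
    # averaged over the batch (so the noise variance is c / batch_size).
    w, tail = 0.0, []
    for t in range(steps):
        noise = rng.normal(0.0, np.sqrt(c), size=batch_size).mean()
        w -= eta * (h * w + noise)
        if t > steps // 2:  # keep only post-burn-in iterates
            tail.append(w)
    return np.var(tail)

# Runs sharing eta/batch_size settle into a similarly wide stationary
# distribution; doubling that ratio roughly doubles the variance.
print(run_sgd(eta=0.01, batch_size=10))  # eta/B = 1e-3
print(run_sgd(eta=0.02, batch_size=20))  # eta/B = 1e-3, similar variance
print(run_sgd(eta=0.02, batch_size=10))  # eta/B = 2e-3, roughly doubled

On this toy problem the ratio alone determines the spread; the paper's point is that on real DNN losses, factors such as the Hessian and the gradient covariance also enter, which is where the three-factor and four-factor accounts diverge.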


Related research

11/13/2017: Three Factors Influencing Minima in SGD
We study the properties of the endpoint of stochastic gradient descent (...

12/03/2018: Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Stochastic gradient descent (SGD) is almost ubiquitously used for traini...

08/17/2023: Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models
We analyze the dynamics of streaming stochastic gradient descent (SGD) i...

09/14/2017: The Impact of Local Geometry and Batch Size on the Convergence and Divergence of Stochastic Gradient Descent
Stochastic small-batch (SB) methods, such as mini-batch Stochastic Gradi...

02/24/2021: On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
It is generally recognized that finite learning rate (LR), in contrast t...

02/24/2018: A Walk with SGD
Exploring why stochastic gradient descent (SGD) based optimization metho...

05/25/2023: Stochastic Modified Equations and Dynamics of Dropout Algorithm
Dropout is a widely utilized regularization technique in the training of...
