Escaping Saddles with Stochastic Gradients

03/15/2018
by Hadi Daneshmand, et al.

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the problem dimensionality. Based on this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
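To make the abstract's claim concrete, below is a minimal, self-contained Python sketch; it is not the paper's algorithm or experiments. The noise model is a stand-in assumption hardwired to match the observation above: the per-direction gradient noise variance is taken proportional to the magnitude of the corresponding Hessian eigenvalue (the paper formalizes a related property as a correlated-negative-curvature condition; consult the full text for the precise statement). Started exactly at a strict saddle, where the exact gradient vanishes, plain gradient descent never moves, while a plain SGD step drifts along the negative-curvature direction and escapes.

import numpy as np

# Toy objective: f(x) = 0.5 * x^T H x with H = diag(1, -1).
# The origin is a strict saddle point; the second coordinate is the
# negative-curvature direction (eigenvalue -1).
H = np.diag([1.0, -1.0])

def full_grad(x):
    # Exact gradient of the toy objective.
    return H @ x

def stochastic_grad(x, rng):
    # Assumed noise model (not from the paper's code): gradient noise
    # whose per-coordinate standard deviation scales with
    # sqrt(|eigenvalue|), so the noise has a strong component along
    # the negative-curvature direction, as the abstract describes.
    noise = rng.standard_normal(2) * np.sqrt(np.abs(np.diag(H)))
    return H @ x + noise

rng = np.random.default_rng(0)
eta = 0.05  # step size

# Plain gradient descent initialized exactly at the saddle never moves:
# the exact gradient is zero there.
x_gd = np.zeros(2)
for _ in range(200):
    x_gd = x_gd - eta * full_grad(x_gd)

# Plain SGD initialized at the saddle escapes: along the eigenvalue -1
# axis each step amplifies the iterate by (1 + eta), so the injected
# noise component there grows geometrically.
x_sgd = np.zeros(2)
for _ in range(200):
    x_sgd = x_sgd - eta * stochastic_grad(x_sgd, rng)

print("GD  after 200 steps:", x_gd)   # stays at [0. 0.]
print("SGD after 200 steps:", x_sgd)  # large |x_sgd[1]|: escaped

Initializing exactly at the saddle is the adversarial case: since the exact gradient is zero there, any escape by SGD must come from the gradient noise itself, which is precisely the component the paper shows is strong along negative curvature directions.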

Related research

03/07/2021  Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods
Stochastically controlled stochastic gradient (SCSG) methods have been p...

06/10/2022  Stochastic Zeroth order Descent with Structured Directions
We introduce and analyze Structured Stochastic Zeroth order Descent (S-S...

02/11/2022  The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive st...

08/04/2021  Stochastic Subgradient Descent Escapes Active Strict Saddles
In non-smooth stochastic optimization, we establish the non-convergence ...

06/08/2023  Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Stochastic gradient descent (SGD) has become a cornerstone of neural net...

07/03/2019  Distributed Learning in Non-Convex Environments – Part I: Agreement at a Linear Rate
Driven by the need to solve increasingly complex optimization problems i...

02/17/2023  SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular a...
