On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

06/19/2020
by Panayotis Mertikopoulos et al.

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability 1 under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability 1 for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is O(1/n^p) if the method is employed with a Θ(1/n^p) step-size schedule. This provides an important guideline for tuning the algorithm's step-size, as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.
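To make the Θ(1/n^p) step-size schedule concrete, here is a minimal sketch of SGD with polynomial step-size decay on a toy one-dimensional non-convex objective. The function name `sgd` and all parameter values are illustrative assumptions, not taken from the paper's experiments; the noisy gradient oracle simulates the stochastic setting the abstract describes.

```python
import random

def sgd(grad, x0, steps, gamma=0.5, p=0.75, noise=0.1, seed=0):
    """SGD with a Theta(1/n^p) step-size schedule (illustrative sketch).

    grad  : deterministic gradient of the objective
    noise : scale of the additive Gaussian noise in the gradient oracle
    p     : decay exponent of the step-size schedule gamma / n**p
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, steps + 1):
        step = gamma / n**p                         # Theta(1/n^p) schedule
        g = grad(x) + noise * rng.gauss(0.0, 1.0)   # noisy gradient sample
        x -= step * g
    return x

# Toy non-convex objective f(x) = x^4 - 2x^2, whose minimizers x = +/-1
# are Hurwicz (the Hessian there is positive definite).
grad = lambda x: 4 * x**3 - 4 * x

x_star = sgd(grad, x0=0.3, steps=5000)
```

Starting from a point in the basin of x = 1, the vanishing step-size suppresses the gradient noise over time, so the iterate settles near the minimizer, which is the qualitative behavior the O(1/n^p) rate quantifies.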

Related research

- 11/05/2014: Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems
- 09/03/2023: Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments
- 07/20/2022: Adaptive Step-Size Methods for Compressed SGD
- 05/18/2020: Convergence of constant step stochastic gradient descent for non-smooth non-convex functions
- 06/08/2022: High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
- 02/01/2021: Painless step size adaptation for SGD
- 03/22/2022: Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum
