Log In Sign Up

On the Convergence of Perturbed Distributed Asynchronous Stochastic Gradient Descent to Second Order Stationary Points in Non-convex Optimization

by   Lifu Wang, et al.

In this paper, the second order convergence of non-convex optimization in asynchronous stochastic gradient descent (ASGD) algorithm is studied systematically. We investigate the behavior of ASGD near and away from saddle points and show that, different from general stochastic gradient descent(SGD), ASGD may return back after escaping the saddle points, yet after staying near a saddle point for a long enough time (O(T)), ASGD will finally go away from strictly saddle points. An inequality is given to describe the process of ASGD to escape from saddle points. We show the exponential instability of the perturbed gradient dynamics near the strictly saddle points and use a novel Razumikhin-Lyapunov method to give a more detailed estimation about how the time delay parameter T influence the speed to escape. In particular, we consider the optimization of smooth nonconvex functions, and propose a perturbed asynchronous stochastic gradient descent algorithm with guarantee of convergence to second order stationary points with high probability in O(1/ϵ^4) iterations. To the best of our knowledge, this is the first work on the second order convergence of asynchronous algorithm.


page 1

page 2

page 3

page 4


Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence

In this paper, we study stochastic non-convex optimization with non-conv...

Escape saddle points faster on manifolds via perturbed Riemannian stochastic recursive gradient

In this paper, we propose a variant of Riemannian stochastic recursive g...

Second-Order Guarantees of Stochastic Gradient Descent in Non-Convex Optimization

Recent years have seen increased interest in performance guarantees of g...

Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods

Stochastically controlled stochastic gradient (SCSG) methods have been p...

Stochastic Gradient Descent Escapes Saddle Points Efficiently

This paper considers the perturbed stochastic gradient descent algorithm...

On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions

The alternating gradient descent (AGD) is a simple but popular algorithm...

Distributed Learning in Non-Convex Environments – Part II: Polynomial Escape from Saddle-Points

The diffusion strategy for distributed learning from streaming data empl...