1 Introduction
Nonconvex stochastic optimization is crucial in machine learning and has attracted tremendous attention and unprecedented popularity. Many modern tasks, including low-rank matrix factorization/completion and principal component analysis
(Candès & Recht, 2009; Jolliffe, 2011), dictionary learning (Sun et al., 2017; Reynolds et al., 2000), and notably deep neural networks
(Hinton & Salakhutdinov, 2006), are formulated as nonconvex stochastic optimization problems. In this paper, we concentrate on finding an approximate solution to the following minimization problem: (1.1)
Here,
denotes a family of stochastic functions indexed by some random variable
that obeys some prescribed distribution , and we consider the general case where and have Lipschitz-continuous gradients and Hessians and might be nonconvex. In empirical risk minimization tasks, is a uniform discrete distribution over the set of training sample indices, and the stochastic function corresponds to the nonconvex loss associated with such a sample. One of the classical algorithms for optimizing (1.1) is the Stochastic Gradient Descent (SGD) method, which performs descent updates iteratively via the inexpensive stochastic gradient
that serves as an unbiased estimator of (the inaccessible) gradient
(Robbins & Monro, 1951; Bottou & Bousquet, 2008), i.e. . Let denote the positive step size; then at step , the iteration performs the following update: (1.2)
where is randomly sampled at iteration . SGD admits perhaps the simplest update rule among stochastic first-order methods; see Algorithm 1 for a formal illustration of the meta-algorithm. It has gained tremendous popularity among researchers and engineers due to its exceptional effectiveness and performance in practice. Taking the example of training deep neural networks, the dominant algorithm at present is SGD (Abadi et al., 2016)
, where the stochastic gradient is computed via one backpropagation step. Superior characteristics of SGD have been observed in many empirical studies, including but
not limited to fast convergence, desirable solutions of low training loss, and good generalization ability. Turning to the theoretical side, relatively mature and concrete analyses in the existing literature (Rakhlin et al., 2012; Agarwal et al., 2009) show that SGD achieves an optimal rate of convergence for convex objective functions under standard regimes. Specifically, the convergence rate of in terms of the function optimality gap matches the algorithmic lower bound for an appropriate class of strongly convex functions (Agarwal et al., 2009).
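To make the update rule (1.2) concrete, here is a minimal sketch of the SGD iteration in Python/NumPy. The oracle `grad_sample` and the toy quadratic objective are our own illustrative assumptions, not part of the paper's setup:

```python
import numpy as np

def sgd(grad_sample, x0, eta, num_steps, rng):
    """Plain SGD as in (1.2): x_{k+1} = x_k - eta * g_k, where g_k is an
    unbiased stochastic gradient drawn at step k.  `grad_sample(x, rng)` is
    a hypothetical oracle returning one stochastic gradient at x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - eta * grad_sample(x, rng)
    return x

# Toy instance: f(x) = 0.5 * ||x||^2 with additive zero-mean gradient noise,
# so grad_sample is an unbiased estimator of the true gradient x.
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = sgd(noisy_grad, x0=np.ones(5), eta=0.1, num_steps=2000, rng=rng)
print(np.linalg.norm(x_final))  # small: iterates hover near the minimizer 0
```

With a constant step size the iterates do not converge exactly but settle into a noise ball around the minimizer, which is the behavior the later analysis controls.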
Despite the optimal convex optimization rates that SGD achieves, provable nonconvex convergence results for SGD have long been confined to finding an approximate first-order stationary point: with high probability, SGD finds an
such that in stochastic gradient computational costs, under the gradient Lipschitz condition on (Nesterov, 2004). In contrast, our goal in this paper is to find an approximate second-order stationary point such that and the least eigenvalue of the Hessian matrix
is , where denotes the so-called Hessian-Lipschitz parameter to be specified later (Nesterov & Polyak, 2006; Tripuraneni et al., 2018; Carmon et al., 2018; Agarwal et al., 2017). Put differently, we need to escape from all first-order stationary points that admit a strongly negative Hessian eigenvalue (a.k.a. saddle points) (Dauphin et al., 2014) and land at a point that quantitatively resembles a local minimizer in terms of the gradient norm and the least Hessian eigenvalue. Results on the convergence rate of SGD for finding an approximate second-order stationary point have been scarce until very recently.^{1} ^{1}Some authors work with stationary points; we ignore such an expression due to the natural choice in the optimization literature (Nesterov & Polyak, 2006; Jin et al., 2017). To the best of our knowledge, Ge et al. (2015) provided the first theoretical result showing that SGD with artificially injected spherical noise can escape from all saddle points in polynomial time. Moreover, Ge et al. (2015) showed that SGD finds an approximate second-order stationary point at a stochastic gradient computational cost of . A recent follow-up work by Daneshmand et al. (2018) derived a convergence rate of stochastic gradient computations. These milestone works (Ge et al., 2015; Daneshmand et al., 2018) showed that SGD can always escape from saddle points and can find an approximate local solution of (1.1) at a stochastic gradient computational cost that is polynomially dependent on problem-specific parameters. Motivated by these recent works, the current paper tries to answer the following questions:

Is it possible to sharpen the analysis of SGD algorithm and obtain a reduced stochastic gradient computational cost for finding an approximate secondorder stationary point?

Is artificial noise injection absolutely necessary for SGD to find an approximate secondorder stationary point with an almost dimensionfree stochastic gradient computational cost?
To answer question (i) above, we provide a sharp analysis and prove that SGD finds an approximate stationary point at a remarkable stochastic gradient computational cost for solving (1.1). This is an unexpected result, because it has been conjectured by many (Xu et al., 2018; Allen-Zhu & Li, 2018; Tripuraneni et al., 2018) that an cost is required to find an approximate second-order stationary point. Our result on SGD negates this conjecture and improves upon the sharpest stochastic gradient computational cost for SGD known prior to this work. To answer question (ii) above, we propose a novel dispersive noise assumption and prove that under such an assumption, SGD requires no artificial noise injection in order to achieve the aforementioned sharp stochastic gradient computational cost. Such a noise assumption is satisfied in the cases of infinite online samples and zeroth-order optimization with Gaussian sampling, and can be satisfied automatically by injecting artificial ball-shaped, spherical uniform, or Gaussian noise.
We emphasize that the stochastic gradient computational cost is, however, not the complexity lower bound for finding an approximate second-order stationary point of optimization problem (1.1). Recently, Fang et al. (2018)
applied a novel variance reduction technique named
Spider tracking and proposed the SpiderSFO^{+} algorithm, which provably achieves a stochastic gradient computational cost of for finding an approximate second-order stationary point. It is our belief that variance reduction techniques are necessary to achieve a stochastic gradient computational cost that is strictly sharper than .
1.1 Our Contributions
In this work, we theoretically study the SGD algorithm for minimizing a nonconvex function . Specifically, this work contributes the following:

We propose a sharp convergence analysis for the classical and simple SGD algorithm and prove that the total stochastic gradient computational cost to find a second-order stationary point is at most , under Lipschitz-continuous gradient and Hessian assumptions on the objective function. This convergence rate matches the best accelerated nonconvex stochastic optimization results, such as those based on Nesterov's momentum acceleration, negative curvature search, and quadratic and cubic regularization tricks.

We propose the dispersive noise assumption and prove that under such an assumption, SGD is guaranteed to escape all saddle points that have a strongly negative Hessian eigenvalue. This type of noise generalizes artificial ball-shaped noise and is widely applicable to many tasks.

Our novel analytic tools for proving saddle escaping and fast convergence of SGD are of independent interest, and they shed light on developing and analyzing new stochastic optimization algorithms.
Organization
The rest of the paper is organized as follows. §2 provides the SGD algorithm and the main convergence rate theorem for finding an approximate second-order stationary point. In §3, we sketch the proof of our convergence rate theorem by providing and discussing three core propositions. We conclude our paper in §5 with proposed future directions. Missing proofs are detailed in order in the Appendix.
Notation
Let
denote the Euclidean norm of a vector or the spectral norm of a square matrix. Denote
for a sequence of vectors and positive scalars if there is a global constant such that , and such hides a polylogarithmic factor of and . Denote if there is a global constant which hides a polylogarithmic factor such that . We denote if there is a global constant which hides a polylogarithmic factor of and such that . Further, we denote the linear transformation of a set
as . Let denote the least eigenvalue of a real symmetric matrix . We denote as the neighborhood of , i.e. the set .
2 Algorithm and Main Result
In this section, we formally state SGD and the corresponding convergence rate theorem. In §2.1, we present the key assumptions on the objective functions and noise distributions. In §2.2, we detail SGD in Algorithm 2 and present the main convergence rate theorem.
2.1 Assumptions and Definitions
We first present the following smoothness assumption on the objective adopted in this paper:
Assumption 1 (Smoothness).
Let be twice-differentiable with Lipschitz-continuous gradients and Lipschitz-continuous Hessians, i.e. for all ,
(2.1) 
and
(2.2) 
With the Hessian-Lipschitz parameter prescribed in (2.2), we formally define the approximate second-order stationary point. To the best of our knowledge, this concept first appeared in Nesterov & Polyak (2006):
Definition 1 (Second-order Stationary Point).
We call an approximate second-order stationary point if
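As an illustration of Definition 1, the sketch below numerically checks the two conditions (small gradient norm and a nearly nonnegative Hessian spectrum). The function names and the tolerance parameters `eps_g`, `eps_h` are hypothetical stand-ins for the thresholds in the definition:

```python
import numpy as np

def is_second_order_stationary(grad, hess, x, eps_g, eps_h):
    """Numerically check approximate second-order stationarity of x:
    the gradient norm is at most eps_g AND the least Hessian eigenvalue
    is at least -eps_h.  `grad` and `hess` are user-supplied callables."""
    g_ok = np.linalg.norm(grad(x)) <= eps_g
    lam_min = np.linalg.eigvalsh(hess(x)).min()  # least eigenvalue
    h_ok = lam_min >= -eps_h
    return g_ok and h_ok

# Saddle of f(x, y) = x^2 - y^2 at the origin: the gradient vanishes, but
# the Hessian has eigenvalue -2, so the origin is first-order stationary
# yet NOT second-order stationary.
grad = lambda v: np.array([2 * v[0], -2 * v[1]])
hess = lambda v: np.diag([2.0, -2.0])
print(is_second_order_stationary(grad, hess, np.zeros(2), 1e-3, 1e-3))  # False
```

This is precisely the distinction the paper targets: points like this saddle pass the first-order test but fail the second-order one.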
Let the starting point of our SGD algorithm be . We make the following boundedness assumption:
Assumption 2 (Boundedness).
The initial suboptimality gap is finite, where is the global infimum value of .
Turning to the assumptions on noise, we first assume the following:
Assumption 3 (Bounded Noise).
For any , the stochastic gradient satisfies:
(2.3) 
An alternative (slightly weaker) assumption that also works is that the norm of the noise follows a sub-Gaussian distribution, i.e. for any ,
(2.4) 
Assumptions 1, 2 and 3 are standard in the nonconvex optimization literature (Ge et al., 2015; Xu et al., 2018; Allen-Zhu & Li, 2018; Fang et al., 2018). We treat the parameters , , , and as global constants, and focus on the dependence of the stochastic gradient complexity on and .
To enable fast escape from saddle points, we need an extra noise shape assumption. Let be a positive real number, and let be a unit vector. We define a set property as follows:
Definition 2 (narrow property).
We say that a Borel set satisfies the narrow property if for any and , holds, where denotes the complement of .
It is easy to verify that the first parameter in the narrow property is linearly scalable and translation invariant, i.e. if satisfies the narrow property, then for any , also satisfies the narrow property. Next, we introduce the dispersive property as follows:
Definition 3 (dispersive property).
Obviously, if satisfies the dispersive property, then for any fixed vector , also satisfies the dispersive property. We then present the dispersive noise assumption as follows:
Assumption 4 (Dispersive Noise).
For an arbitrary point , admits the dispersive property (as in Definition 3) for any unit vector .
Assumption 4 is motivated by the key lemma for escaping from saddle points in Jin et al. (2017), which obtains a sharp rate for gradient descent escaping from saddle points. Such an assumption enables SGD to move out of a stuck region with probability in its first step, and hence to escape from saddle points (by repeating a logarithmic number of rounds). We would like to emphasize that dispersive noises include many canonical examples; see the following.
Examples of Dispersive Noises
Here we exemplify a few noise distributions that satisfy the dispersive property, that is, for an arbitrary set with the narrow property, where . We prove the following proposition:
Proposition 1.
For the following noise distributions, (2.5) in Definition 3 is satisfied:

Gaussian noise: , where is standard Gaussian noise with covariance matrix ;

Uniform ball-shaped noise or spherical noise: , where is uniformly sampled from the unit ball centered at ;

Artificial noise injection: , where is some independent artificial noise that is dispersive for any .
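The canonical distributions in Proposition 1 can be sampled in a few lines. The sketch below is illustrative (the function names are ours); it uses the standard tricks of normalizing a Gaussian vector for uniform spherical sampling and applying a radial power transform for uniform ball sampling:

```python
import numpy as np

def gaussian_noise(dim, sigma, rng):
    """Isotropic Gaussian noise, one of the canonical dispersive examples."""
    return sigma * rng.standard_normal(dim)

def ball_noise(dim, radius, rng):
    """Uniform sample from the ball of the given radius: pick a direction
    uniformly on the sphere, then a radius with density proportional to
    r^{d-1} via the inverse-CDF transform u^{1/d}."""
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    r = radius * rng.uniform() ** (1.0 / dim)
    return r * direction

def sphere_noise(dim, radius, rng):
    """Uniform sample from the sphere of the given radius."""
    direction = rng.standard_normal(dim)
    return radius * direction / np.linalg.norm(direction)

rng = np.random.default_rng(0)
print(np.linalg.norm(ball_noise(10, 1.0, rng)) <= 1.0)  # True by construction
```

Each of these is rotation-invariant, which is the intuition for why no thin slab (a set with the narrow property) can capture much of its mass.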
2.2 SGD and Main Theorem
Our SGD algorithm for analysis purposes is detailed in Algorithm 2. It differs from classical SGD algorithms only in its stopping criterion. Distinct from the classical variants that simply terminate after a certain number of steps and output the final iterate or a randomly drawn iterate, the SGD we consider here introduces a ball-controlled mechanism as the stopping criterion: if exits a small neighborhood within iterations (Lines 2 to 6), one starts over and does the next round of SGD; if exiting does not occur within iterations, then the algorithm simply outputs an arithmetic average of the last iterates within the neighborhood, which in turn is an approximate second-order stationary point with high probability. In contrast with the stopping criterion in the deterministic setting, which checks the descent in function values (Jin et al., 2017), the function value in the stochastic setting is fairly costly to approximate (it costs stochastic gradient computations), and the error plateaus might be hard to observe theoretically.
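One plausible reading of this ball-controlled mechanism can be sketched as follows. All names, the restart policy, and the toy gradient oracle are our own assumptions rather than a verbatim transcription of Algorithm 2:

```python
import numpy as np

def sgd_ball_controlled(grad_sample, x0, eta, radius, K, max_rounds, rng):
    """Ball-controlled SGD sketch: run SGD from an anchor; if the iterate
    exits the radius-ball around the anchor within K steps, restart the
    next round from the current iterate; if it stays inside for all K
    steps, return the average of the in-ball iterates."""
    x0 = np.asarray(x0, dtype=float)
    for _ in range(max_rounds):
        anchor, x, trace = x0.copy(), x0.copy(), []
        for _ in range(K):
            x = x - eta * grad_sample(x, rng)
            if np.linalg.norm(x - anchor) > radius:  # exited the ball
                break
            trace.append(x.copy())
        else:  # never exited: output the average of the in-ball iterates
            return np.mean(trace, axis=0)
        x0 = x  # start the next round from where we exited
    return x0  # fallback if no round stabilized

rng = np.random.default_rng(1)
noisy_grad = lambda x, rng: x + 0.05 * rng.standard_normal(x.shape)
out = sgd_ball_controlled(noisy_grad, np.ones(3), 0.1, 0.5, 200, 50, rng)
```

On this toy quadratic, early rounds exit the ball (the function value is still decreasing fast), and the final round stabilizes near the minimizer, mirroring the dichotomy the analysis exploits: either the iterate escapes the ball and makes progress, or it stays and its average is nearly stationary.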
Parameter Setting
We set the hyperparameters^{2} for Algorithm 2 as follows: (^{2}Set . Because in (2.2) involves logarithmic factors of and , a simple choice is to set the step size as .)
(2.6) 
For brevity of analysis, we assume , and . In other words, we assume the accuracy .
Now we are ready to present our main theorem on SGD.
Theorem 1 (SGD Rate).
Strikingly, Theorem 1 indicates that SGD in Algorithm 2 achieves a stochastic gradient computational cost of to find an approximate second-order stationary point. Compared with existing algorithms that achieve an convergence rate, SGD is comparatively the simplest to implement and does not invoke any additional techniques or iterations such as momentum acceleration (Jin et al., 2018b), cubic regularization (Tripuraneni et al., 2018), regularization (Allen-Zhu, 2018a), or Neon negative curvature search (Xu et al., 2018; Allen-Zhu & Li, 2018).
Admittedly, the best-known SGD theoretical guarantee in Theorem 1 relies on a dispersive noise assumption. To remove such an assumption, we argue that one needs to run an SGD step with dispersive noise in only steps of each round to enable efficient escaping. We propose a variant of SGD called Noise-Scheduled SGD, which requires artificial noise injection but does not rely on a dispersive noise assumption. The algorithm is shown in Algorithm 3, and one can obtain its convergence property straightforwardly.
Remark 1.
For the function class that admits the strict-saddle property (Carmon et al., 2018; Ge et al., 2015; Jin et al., 2017), an approximate second-order stationary point is guaranteed to be an approximate local minimizer. For example, for optimizing a strict-saddle function, one can first find an approximate second-order stationary point with , which is guaranteed to be an approximate local minimizer due to the strict-saddle property. Our SGD convergence rate is independent of the target accuracy , and one can apply standard convex optimization theory to obtain an convergence rate in terms of the optimality gap. We omit the details due to space limitations.
3 Proof Sketches
For convenience, when we study Algorithm 2 in each inner loop from Line 2 to Line 8, we override the definition of as its initial vector. We briefly introduce our proof techniques in this section; the detailed rigorous proofs are deferred to the Appendix. Our proof consists of two main ingredients. The first is to prove that SGD can efficiently escape saddle points: with high probability, if , moves out of in iterations. The second is to show that SGD converges at a faster rate of , rather than . We further separate the second goal into two parts:

Throughout the execution of the algorithm, each time moves out of , with high probability the function value decreases by at least .

If does not move out of within iterations, then with high probability we have found a desired approximate second-order stationary point.
Let be the filtration containing the full information of all previous iterations, where denotes the sigma-field, and let be the first time (mathematically, a stopping time) that exits the neighborhood of , i.e.
(3.1) 
Both and are measurable with respect to , where denotes the indicator function.
3.1 Part I: Escaping Saddles
Our goal is to prove the following proposition:
Proposition 2.
Proposition 2 essentially says that if the function has a negative Hessian eigenvalue at , then the iteration exits the neighborhood of in steps with high probability.
To prove Proposition 2, we let , , be the SGD iteration starting from a fixed point and using the same stochastic samples as the iteration , i.e.
(3.3) 
Obviously, we have . Let be the first step number (a stopping time) such that exits the neighborhood of . Formally,
(3.4) 
It is easy to see from (3.1) that . Inspired by Jin et al. (2017), we cope with the stochasticity of gradients and define the so-called bad initialization region as the set of points from which the iteration exits the neighborhood of with probability :
(3.5) 
We will show that the bad initialization region enjoys the narrow property, where . Since the first step provides continuous noise, as assumed in Assumption 3, a properly selected moves the iteration out of the bad initialization region in its first step with probability . Repeating such an argument over a logarithmic number of rounds ensures that escaping occurs with high probability.
The idea is to prove the following lemma:
Lemma 1.
Let the assumptions of Proposition 2 hold, and let (WLOG) be an arbitrary eigenvector of corresponding to its smallest eigenvalue , which satisfies . Then for any fixed and any pair of points , we have (3.6)
Lemma 1 is inspired by Lemma 15 in Jin et al. (2017). Nevertheless, due to the noise introduced at each update step, the analysis of stochastic gradient descent differs from that of gradient descent in many aspects. For example, instead of showing the decrease of the function value, we need to show that with positive probability, at least one of the two iterations, or , exits the neighborhood of . Our proof is also more intuitive compared with that of Lemma 15 in Jin et al. (2017). The core idea is to focus on analyzing the difference trajectory between and , and to show that the rotation speed of the difference trajectory is the same as its expansion speed. A detailed proof is provided in §C.1.
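The coupling idea behind Lemma 1 can be illustrated numerically: run two SGD trajectories from points separated along the negative-curvature eigenvector, feed both the same noise so that the noise cancels in their difference, and watch the difference expand geometrically. The quadratic saddle below is our own toy example, not one from the paper:

```python
import numpy as np

# Quadratic saddle f(x) = 0.5 * x^T H x with one negative eigenvalue.
H = np.diag([1.0, -0.5])            # eigenvalue -0.5: negative curvature
eta, T = 0.1, 80
rng = np.random.default_rng(0)

r = 1e-6
x = np.array([0.3, 0.0])
y = x + np.array([0.0, r])          # shifted along the bottom eigenvector

for _ in range(T):
    xi = 0.01 * rng.standard_normal(2)   # SAME noise for both trajectories
    x = x - eta * (H @ x + xi)
    y = y - eta * (H @ y + xi)

# The shared noise cancels in the difference, which obeys the deterministic
# recurrence d <- (I - eta * H) d and grows like (1 + eta * 0.5)^T along
# the negative-curvature direction.
print(np.linalg.norm(y - x) / r)    # ~ 1.05^80, roughly 49.6
```

Since at least one of two exponentially separating trajectories must leave any fixed-radius ball, the coupled pair certifies escape, which is the mechanism the lemma formalizes.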
3.2 Part II: Faster Descent
The goal of Part II is to prove the following proposition:
Proposition 3 (Faster Descent).
Proposition 3 is the key for SGD to achieve the reduced stochastic computation costs. It shows that no matter what the local landscape of looks like, once moves out of the ball within iterations, the function value decreases by at least . Put differently, on average the function value decreases by at least per iteration during the execution of Algorithm 2. We present the basic argument below.
We start by reviewing the more traditional approach for proving sufficient descent of SGD, and then we discuss how to improve it, as done in this work. The previous approaches are all based on the idea of Nesterov (2004), which mainly takes advantage of the gradient-smoothness condition of the objective. The proof can be briefly described as follows:
(3.8)  
From the above derivation, in order to guarantee monotone descent of the function value in expectation, the step size needs to be
(3.9) 
where the last equality uses . Plugging (3.9) into (3.8), and using , we have that the function value decreases by at least per iteration. This result indicates that, in the worst case, SGD takes stochastic oracle calls to find an approximate first-order stationary point. This simple argument is the reason why previous works conjectured that the complexity of SGD is .
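The algebra behind the bound (3.8) can be sanity-checked numerically. On the toy quadratic below (our own example, whose Hessian is exactly L times the identity), the smoothness expansion holds pointwise with equality; taking expectations and using the unbiasedness of the stochastic gradient then recovers (3.8):

```python
import numpy as np

# Check the expansion behind the classical descent bound (3.8) on the
# quadratic f(z) = 0.5 * L * ||z||^2, whose Hessian is exactly L * I:
#   f(x - eta*g) = f(x) - eta*<grad f(x), g> + (L * eta^2 / 2) * ||g||^2
# holds pointwise, and E[g] = grad f(x) turns it into the expected bound.
L, sigma, eta = 2.0, 1.0, 0.05
x = np.array([1.0, -2.0])
g_true = L * x  # the exact gradient of f at x

rng = np.random.default_rng(0)
g = g_true + sigma * rng.standard_normal((10_000, 2))  # stochastic gradients
f = lambda z: 0.5 * L * np.sum(z * z, axis=-1)

lhs = f(x - eta * g)  # f evaluated at each perturbed point
rhs = f(x) - eta * (g @ g_true) + 0.5 * L * eta**2 * np.sum(g * g, axis=1)
print(np.allclose(lhs, rhs))  # True: the expansion is exact for this quadratic
```

For general smooth (non-quadratic) objectives the right-hand side is only an upper bound, which is exactly how the descent lemma is used in the text.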
However, in this paper we show that the above analysis can be further improved by using the Hessian-smoothness condition of the objective, considering the decomposition of the objective function , and treating the components and separately as follows:

(Case 1) The component is locally near-convex, in the sense that for all . In this case, by using techniques for near-convex problems, it is possible to take a larger step size and prove a faster convergence rate.

(Case 2) The component is locally near-concave, in the sense that for all . In this case, it can be shown that the last term on the right-hand side of (3.8) can be reduced to . Therefore the step size can be chosen as , leading to a fast reduction of the function value.
To formalize the above observations into a rigorous proof, we introduce the quadratic approximation of at the point , defined as
(3.10) 
We let be the subspace spanned by all eigenvectors of whose eigenvalues are greater than , and let denote the complementary subspace. Also let and be the projection matrices onto and , respectively, and let the full SVD of be . We introduce and , respectively, and define the following two auxiliary functions and :
(3.11) 
and
(3.12) 
For the previously mentioned decomposition of , one may simply take and . It can be checked that and . It follows that we only need to analyze the two quadratic approximations and separately. We then bound the difference between and as .
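The subspace splitting can be sketched as follows. The eigenvalue threshold `delta` and the function names are our illustrative assumptions; the construction itself is the standard spectral projection:

```python
import numpy as np

def split_projections(H, delta):
    """Project onto the span S of eigenvectors of H with eigenvalue
    greater than -delta (the 'near-convex' part) and onto its complement
    (the 'near-concave' part).  Returns the projection matrices."""
    lam, U = np.linalg.eigh(H)           # eigendecomposition of symmetric H
    mask = lam > -delta
    P_S = U[:, mask] @ U[:, mask].T
    return P_S, np.eye(H.shape[0]) - P_S

H = np.diag([2.0, 0.5, -1.0])
P, P_perp = split_projections(H, delta=0.1)
# P + P_perp is the identity, and both projectors commute with H, so the
# quadratic form x^T H x splits exactly into the two subspace contributions.
x = np.array([1.0, 2.0, 3.0])
split_sum = (P @ x) @ H @ (P @ x) + (P_perp @ x) @ H @ (P_perp @ x)
print(np.allclose(x @ H @ x, split_sum))  # True
```

This exact splitting of the quadratic form is what allows the two auxiliary functions to be analyzed independently and then recombined.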
The analysis for can be obtained via the standard analysis informally described above in Case 2 (see Lemma 7).
Our proof technique for dealing with is to introduce an auxiliary trajectory with the following deterministic updates for as:
(3.13) 
and . We then track and analyze the difference trajectory between and (see Lemma 6). Since simply performs gradient descent, we can arrive at our final result for (see Lemma 5), which leads to a rigorous statement of Case 1.
Finally, via the fact that moves out of the ball within iterations throughout the execution of Algorithm 2, we prove that with high probability the sum of the gradient norms can be lower-bounded as:
(3.14) 
which ensures sufficient descent of the function value. Putting the above arguments together, we obtain Proposition 3.
3.3 Part III: Finding SSP
Part III proves the following proposition:
Proposition 4.
With probability at least , if has not moved out of the ball within iterations, then, letting , we have
(3.15) 
Proposition 4 can be obtained via the same idea as Part II. We first study the quadratic approximation function and then bound the difference between and .
4 Discussions on Related Works
With the recent surge of deep learning, many researchers in the machine learning community have studied the nonconvex SGD method from various perspectives. We compare our results with concurrent theoretical works on nonconvex SGD in the following discussions. For clarity, we also compare the convergence rates of the works most related to ours in Table 1.
Pioneer SGD: The first work on SGD escaping from saddle points, Ge et al. (2015), obtained a stochastic gradient computational cost of .^{3} ^{3}The analysis in (Ge et al., 2015) indicates a factor of at least . We are aware that the authors of Ge et al. (2015) have claimed a different proof obtaining a stochastic computational cost of using the technique of Jin et al. (2017), but we have not found it online as of the initial release of this work. Later, Jin et al. (2017; 2018b) proposed noise-perturbed GD and AGD and achieved sharp gradient computational costs, which suggests the possibility of a sharper SGD rate for escaping saddles. Our analysis in this work is partially motivated by Jin et al. (2017) for escaping from saddle points, but it generalizes the noise condition and needs no deliberate noise injection (which, strictly speaking, makes the perturbed method no longer the original GD/SGD algorithm).

Concurrent SGD: A recent result by Daneshmand et al. (2018) obtains a stochastic computational cost of for finding an approximate second-order stationary point. The highlight of their work is that they need no injection of artificial noise. Nevertheless, in their work the Correlated Negative Curvature parameter cannot be treated as a constant: in the case of injected spherical or Gaussian noise, it can be at most linearly dependent on [Assumption 4], so their result is again not (almost) dimension-free, and their worst-case convergence rate should be interpreted as .

NC search + SGD: The Neon+SGD methods (Xu et al., 2018; Allen-Zhu & Li, 2018) achieve a dimension-free convergence rate of for the general problem of the form (1.1) to reach an approximate second-order stationary point. Prior to this, classical nonconvex GD/SGD only achieved such a rate for finding an approximate first-order stationary point (Nesterov, 2004); with the help of the Neon method, it successfully escapes from saddles via a Negative Curvature (NC) search iteration.

Regularization + SGD: Very recently, Allen-Zhu (2018a) took a quadratic regularization approach and equipped it with the negative-curvature search iteration Neon2 (Allen-Zhu & Li, 2018), which successfully improves the rate to . In comparison, our method achieves essentially the same rate without using regularization methods. Tripuraneni et al. (2018) proposed a stochastic variant of the cubic regularization method (Nesterov & Polyak, 2006; Agarwal et al., 2017) and achieves the same convergence rate, being the first to achieve such a rate without invoking variance-reduced gradient techniques.^{4} ^{4}Note that in the convergence rate here, we also include the number of stochastic Hessian-vector product evaluations, each of which takes about the same time as one stochastic gradient evaluation.

NC search + VR: Allen-Zhu (2018b) converted an NC search method to the online stochastic setting (Carmon et al., 2018) and achieved a convergence rate of for finding an approximate second-order stationary point. For finding a relaxed approximate second-order stationary point, Allen-Zhu (2018b) obtains a lower stochastic gradient computational cost of . With a recently proposed optimal variance-reduced gradient technique applied, Spider achieves the state-of-the-art stochastic gradient computational cost (Fang et al., 2018).^{5} ^{5}The independent work Zhou et al. (2018a) achieves a similar convergence rate for finding an approximate second-order stationary point by imposing a third-order smoothness condition on the objective.
Algorithm  |  SG Comp. Cost
Neon+SGD (Xu et al., 2018)  |
Neon2+SGD (Allen-Zhu & Li, 2018)  |
Stochastic Cubic (Tripuraneni et al., 2018)  |
RSGD5 (Allen-Zhu, 2018a)  |
Natasha2 (Allen-Zhu, 2018b)  |
Neon2+SNVRG (Zhou et al., 2018a)  |
SpiderSFO^{+} (Fang et al., 2018)  |
(Ge et al., 2015)  |
(Daneshmand et al., 2018)  |
(this work)  |
Orange-boxed: Spider, reported in the orange box, is the only existing stochastic algorithm variant that achieves a provably faster rate (by an order) than simple SGD.
: Allen-Zhu (2018b) also obtains a stochastic gradient computational cost of for finding a relaxed approximate second-order stationary point.
: With additional third-order smoothness assumptions, SNVRG (Zhou et al., 2018a) achieves a stochastic gradient cost of .
4.1 More Related Works
VR Methods
In the past two years, sharper convergence rates for nonconvex stochastic optimization have been achieved using variance-reduced gradient techniques (Schmidt et al., 2017; Johnson & Zhang, 2013; Xiao & Zhang, 2014; Defazio et al., 2014). SVRG/SCSG (Lei et al., 2017) adopts the technique of Johnson & Zhang (2013), introduces a novel random stopping criterion for its inner loops, and achieves a stochastic gradient cost of . Very recently, two independent works, namely SPIDER (Fang et al., 2018) and SVRC (Zhou et al., 2018b), designed sharper variance-reduced gradient methods and obtained a stochastic gradient computational cost of , which is state-of-the-art and near-optimal in the sense that they achieve the algorithmic lower bound in the finite-sum setting.
Escaping Saddles in SingleFunction Case
Recently, many theoretical works have studied convergence to an approximate second-order stationary point, or escaping from saddle points, in the case of a single function (Carmon & Duchi, 2016; Jin et al., 2017; Carmon et al., 2018, 2017; Agarwal et al., 2017; Jin et al., 2018b; Lee et al., 2017; Du et al., 2017). Among them, Jin et al. (2017) proposed a variant of gradient descent perturbed by ball-shaped noise, which can efficiently escape saddle points and achieves a sharp gradient computational cost of , also achieved by Neon+GD (Xu et al., 2018; Allen-Zhu & Li, 2018). Another line of works applies momentum acceleration techniques (Agarwal et al., 2017; Carmon et al., 2017; Jin et al., 2018b) and achieves a rate of for a general optimization problem.
Escaping Saddles in FiniteSum Case
For the finite-sum setting, many works have applied variance-reduced gradient methods (Agarwal et al., 2017; Carmon et al., 2018; Fang et al., 2018; Zhou et al., 2018a) and further reduced the stochastic gradient computational cost to (Agarwal et al., 2017; Allen-Zhu & Li, 2018). Reddi et al. (2018) proposed a simpler algorithm that obtains a stochastic gradient cost of . With the recursive gradient method applied (Fang et al., 2018; Zhou et al., 2018a), the stochastic gradient cost is further reduced to , which is the state-of-the-art.
Miscellaneous
It is well-known that for the general nonconvex optimization problem in the form of (1.1), finding an approximate global minimizer is NP-hard in the worst case (Hillar & Lim, 2013). In light of this, many works turn to studying convergence properties for specific models. Faster convergence to local or even global minimizers can be guaranteed for many statistical learning tasks such as principal component analysis (Li et al., 2018a; Jain & Kar, 2017), matrix completion (Jain et al., 2013; Ge et al., 2016; Sun & Luo, 2016), dictionary learning (Sun et al., 2015, 2017), as well as linear and nonlinear neural networks (Zhong et al., 2017; Li & Yuan, 2017; Li et al., 2018b).
5 Conclusions and Future Direction
In this paper, we presented a sharp convergence analysis of the classical SGD algorithm. We showed that, equipped with a ball-controlled stopping criterion, SGD achieves a stochastic gradient computational cost of for finding an approximate second-order stationary point, which improves upon the best-known SGD convergence rate prior to our work. While this work focuses on a sharpened convergence rate, several important questions remain:

It is still unknown whether SGD can achieve a rate faster than , or whether is exactly the lower bound for SGD on the general problem of the form (1.1). As mentioned in §1, we conjecture that variance reduction methods are necessary to reach an approximate second-order stationary point in fewer than steps.

We have not considered several important extensions in this work, such as the convergence rate of SGD for constrained optimization problems, and how to extend the analysis in this paper to the proximal case.

It will also be interesting to study the stochastic version of Nesterov’s accelerated gradient descent (AGD) (Jin et al., 2018b).
Acknowledgement
The authors would like to thank Chris Junchi Li for providing us with a proof that SGD escapes saddle points in computational cost and for carefully revising our paper. The authors would also like to thank Haishan Ye for very helpful discussions, and Huan Li, Zebang Shen, and Li Shen for very helpful comments.
References
 Abadi et al. (2016) Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., & Zheng, X. (2016). Tensorflow: A system for largescale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283).
 Agarwal et al. (2009) Agarwal, A., Wainwright, M. J., Bartlett, P. L., & Ravikumar, P. K. (2009). Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems (pp. 1–9).

 Agarwal et al. (2017) Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., & Ma, T. (2017). Finding approximate local minima faster than gradient descent. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (pp. 1195–1199). ACM.
 Allen-Zhu (2018a) Allen-Zhu, Z. (2018a). How to make the gradients small stochastically: Even faster convex and nonconvex SGD. In Advances in Neural Information Processing Systems (pp. 1165–1175).
 Allen-Zhu (2018b) Allen-Zhu, Z. (2018b). Natasha 2: Faster nonconvex optimization than SGD. In Advances in Neural Information Processing Systems (pp. 2676–2687).
 Allen-Zhu & Li (2018) Allen-Zhu, Z. & Li, Y. (2018). Neon2: Finding local minima via first-order oracles. In Advances in Neural Information Processing Systems (pp. 3720–3730).
 Bartlett et al. (2008) Bartlett, P. L., Dani, V., Hayes, T. P., Kakade, S. M., Rakhlin, A., & Tewari, A. (2008). High-probability regret bounds for bandit online linear optimization. In Proceedings of the 31st Conference On Learning Theory.
 Bottou & Bousquet (2008) Bottou, L. & Bousquet, O. (2008). The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems (pp. 161–168).
 Candès & Recht (2009) Candès, E. J. & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717.
 Carmon & Duchi (2016) Carmon, Y. & Duchi, J. C. (2016). Gradient descent efficiently finds the cubic-regularized nonconvex Newton step. arXiv preprint arXiv:1612.00547.
 Carmon et al. (2017) Carmon, Y., Duchi, J. C., Hinder, O., & Sidford, A. (2017). “Convex Until Proven Guilty”: Dimension-free acceleration of gradient descent on nonconvex functions. In International Conference on Machine Learning (pp. 654–663).
 Carmon et al. (2018) Carmon, Y., Duchi, J. C., Hinder, O., & Sidford, A. (2018). Accelerated methods for nonconvex optimization. SIAM Journal on Optimization, 28(2), 1751–1772.
 Daneshmand et al. (2018) Daneshmand, H., Kohler, J., Lucchi, A., & Hofmann, T. (2018). Escaping saddles with stochastic gradients. In International Conference on Machine Learning (pp. 1155–1164).
 Dauphin et al. (2014) Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., & Bengio, Y. (2014). Identifying and attacking the saddle point problem in highdimensional nonconvex optimization. In Advances in Neural Information Processing Systems (pp. 2933–2941).
 Defazio et al. (2014) Defazio, A., Bach, F., & Lacoste-Julien, S. (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (pp. 1646–1654).
 Du et al. (2017) Du, S. S., Jin, C., Lee, J. D., Jordan, M. I., Singh, A., & Poczos, B. (2017). Gradient descent can take exponential time to escape saddle points. In Advances in Neural Information Processing Systems (pp. 1067–1077).
 Fang et al. (2018) Fang, C., Li, C. J., Lin, Z., & Zhang, T. (2018). SPIDER: Near-optimal nonconvex optimization via stochastic path-integrated differential estimator. In Advances in Neural Information Processing Systems (pp. 686–696).
 Freedman (1975) Freedman, D. A. (1975). On tail probabilities for martingales. Annals of Probability, 3(1), 100–118.

 Ge et al. (2015) Ge, R., Huang, F., Jin, C., & Yuan, Y. (2015). Escaping from saddle points – online stochastic gradient for tensor decomposition. In Proceedings of The 28th Conference on Learning Theory (pp. 797–842).
 Ge et al. (2016) Ge, R., Lee, J. D., & Ma, T. (2016). Matrix completion has no spurious local minimum. In Advances in Neural Information Processing Systems (pp. 2973–2981).
 Hillar & Lim (2013) Hillar, C. J. & Lim, L.-H. (2013). Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6), 45.
 Hinton & Salakhutdinov (2006) Hinton, G. E. & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
 Jain & Kar (2017) Jain, P. & Kar, P. (2017). Nonconvex optimization for machine learning. Foundations and Trends® in Machine Learning, 10(3–4), 142–336.
 Jain et al. (2013) Jain, P., Netrapalli, P., & Sanghavi, S. (2013). Low-rank matrix completion using alternating minimization. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing (pp. 665–674). ACM.
 Jin et al. (2017) Jin, C., Ge, R., Netrapalli, P., Kakade, S. M., & Jordan, M. I. (2017). How to escape saddle points efficiently. In Proceedings of the 34th International Conference on Machine Learning (pp. 1724–1732).
 Jin et al. (2018a) Jin, C., Liu, L. T., Ge, R., & Jordan, M. I. (2018a). On the local minima of the empirical risk. In Advances in Neural Information Processing Systems (pp. 4901–4910).
 Jin et al. (2018b) Jin, C., Netrapalli, P., & Jordan, M. I. (2018b). Accelerated gradient descent escapes saddle points faster than gradient descent. In Proceedings of the 31st Conference On Learning Theory (pp. 1042–1085).
 Johnson & Zhang (2013) Johnson, R. & Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems (pp. 315–323).
 Jolliffe (2011) Jolliffe, I. (2011). Principal component analysis. In International encyclopedia of statistical science (pp. 1094–1096). Springer.
 Kallenberg & Sztencel (1991) Kallenberg, O. & Sztencel, R. (1991). Some dimension-free features of vector-valued martingales. Probability Theory and Related Fields, 88(2), 215–247.
 Lee et al. (2017) Lee, J. D., Panageas, I., Piliouras, G., Simchowitz, M., Jordan, M. I., & Recht, B. (2017). Firstorder methods almost always avoid saddle points. arXiv preprint arXiv:1710.07406.
 Lei et al. (2017) Lei, L., Ju, C., Chen, J., & Jordan, M. I. (2017). Nonconvex finite-sum optimization via SCSG methods. In Advances in Neural Information Processing Systems (pp. 2345–2355).
 Li et al. (2018a) Li, C. J., Wang, M., Liu, H., & Zhang, T. (2018a). Near-optimal stochastic approximation for online principal component estimation. Mathematical Programming, 167(1), 75–97.
 Li et al. (2018b) Li, Y., Ma, T., & Zhang, H. (2018b). Algorithmic regularization in overparameterized matrix sensing and neural networks with quadratic activations. In Conference On Learning Theory (pp. 2–47).

 Li & Yuan (2017) Li, Y. & Yuan, Y. (2017). Convergence analysis of two-layer neural networks with ReLU activation. In Advances in Neural Information Processing Systems (pp. 597–607).
 Nesterov (2004) Nesterov, Y. (2004). Introductory lectures on convex optimization: A basic course, volume 87. Springer.
 Nesterov & Polyak (2006) Nesterov, Y. & Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1), 177–205.
 Pinelis (1994) Pinelis, I. (1994). Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, (pp. 1679–1706).
 Rakhlin et al. (2012) Rakhlin, A., Shamir, O., & Sridharan, K. (2012). Making gradient descent optimal for strongly convex stochastic optimization. In Proceedings of the 29th International Conference on Machine Learning (pp. 449–456).

 Reddi et al. (2018) Reddi, S., Zaheer, M., Sra, S., Poczos, B., Bach, F., Salakhutdinov, R., & Smola, A. (2018). A generic approach for escaping saddle points. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (pp. 1233–1242).
 Reynolds et al. (2000) Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1–3), 19–41.
 Robbins & Monro (1951) Robbins, H. & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, (pp. 400–407).
 Schmidt et al. (2017) Schmidt, M., Le Roux, N., & Bach, F. (2017). Minimizing finite sums with the stochastic average gradient. Mathematical Programming, 162(1–2), 83–112.
 Sun et al. (2015) Sun, J., Qu, Q., & Wright, J. (2015). When are nonconvex problems not scary? arXiv preprint arXiv:1510.06096.
 Sun et al. (2017) Sun, J., Qu, Q., & Wright, J. (2017). Complete dictionary recovery over the sphere i: Overview and the geometric picture. IEEE Transactions on Information Theory, 63(2), 853–884.
 Sun & Luo (2016) Sun, R. & Luo, Z.-Q. (2016). Guaranteed matrix completion via nonconvex factorization. IEEE Transactions on Information Theory, 62(11), 6535–6579.
 Tripuraneni et al. (2018) Tripuraneni, N., Stern, M., Jin, C., Regier, J., & Jordan, M. I. (2018). Stochastic cubic regularization for fast nonconvex optimization. In Advances in Neural Information Processing Systems (pp. 2904–2913).
 Xiao & Zhang (2014) Xiao, L. & Zhang, T. (2014). A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4), 2057–2075.
 Xu et al. (2018) Xu, Y., Rong, J., & Yang, T. (2018). First-order stochastic algorithms for escaping from saddle points in almost linear time. In Advances in Neural Information Processing Systems (pp. 5531–5541).
 Zhang (2005) Zhang, T. (2005). Learning bounds for kernel regression using effective data dimensionality. Neural Computation, 17(9), 2077–2098.
 Zhang et al. (2017) Zhang, Y., Liang, P., & Charikar, M. (2017). A hitting time analysis of stochastic gradient langevin dynamics. In Conference on Learning Theory.
 Zhong et al. (2017) Zhong, K., Song, Z., Jain, P., Bartlett, P. L., & Dhillon, I. S. (2017). Recovery guarantees for one-hidden-layer neural networks. arXiv preprint arXiv:1706.03175.
 Zhou et al. (2018a) Zhou, D., Xu, P., & Gu, Q. (2018a). Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782.
 Zhou et al. (2018b) Zhou, D., Xu, P., & Gu, Q. (2018b). Stochastic nested variance reduced gradient descent for nonconvex optimization. In Advances in Neural Information Processing Systems (pp. 3922–3933).
Appendix A Proof of Proposition 1 and Algorithm 3
Proof of Proposition 1.

Recall the multivariate Gaussian noise , where . We show that it satisfies (2.5); clearly, it satisfies (2.4).
Let be an arbitrary unit vector; due to symmetry, in what follows we assume WLOG that . Recall that we have a set satisfying the narrow property in Definition 2. Then
If the set contains no points of for each , then is a subset of and has Lebesgue measure . This is because for any given there exists an such that , and we pick to be the infimum of all such . It is then easy to conclude that for any , and that
Therefore, for any admitting the narrow property with , we have for any given ,
where is of Lebesgue measure . Taking expectation again gives
which completes the proof that is disperse for any .
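As a numerical sanity check of this Gaussian case (not part of the proof), the following sketch estimates the mass that isotropic Gaussian noise places in a thin slab around a hyperplane, a prototypical narrow set in the sense of Definition 2, and compares it against a peak-density upper bound. The dimension, noise radius, and covariance scaling below are assumed values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (assumed, not the paper's): dimension d, noise
# scale r, and slab half-width delta.
d, r, delta = 50, 1.0, 0.01
n = 200_000

# Isotropic Gaussian noise xi ~ N(0, (r^2/d) I_d).  By symmetry, only the
# one-dimensional projection <xi, e_1> matters, so we sample it directly.
proj = rng.normal(scale=r / np.sqrt(d), size=n)

# Empirical mass in the narrow slab {x : |<x, e_1>| <= delta}, versus the
# bound (slab width) * (peak density) = 2*delta / (sqrt(2*pi) * r/sqrt(d)).
mass = float(np.mean(np.abs(proj) <= delta))
bound = 2 * delta / (np.sqrt(2 * np.pi) * r / np.sqrt(d))
print(mass, bound)
```

The bound shrinks linearly in delta, which matches the qualitative claim above: Gaussian noise places only a small amount of probability mass in any sufficiently narrow set.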

For example, recall the uniform ball-shaped noise , where is uniformly sampled from , the unit ball centered at . We prove that (2.5) holds in this case as well. Assume once again WLOG that , by symmetry. Using classical results from multivariate calculus (or see Jin et al. (2017)) and the narrow property in Definition 2 of the set , we have