Convergence of Gradient Algorithms for Nonconvex C^1+α Cost Functions

12/01/2020
by   Zixuan Wang, et al.

This paper is concerned with the convergence of stochastic gradient algorithms with momentum terms in the nonconvex setting. A class of stochastic momentum methods, including stochastic gradient descent, the heavy-ball method, and Nesterov's accelerated gradient, is analyzed in a general framework under quite mild assumptions. We show that the expected gradient converges and provide an explicit upper bound on the convergence rate. A supermartingale can then be constructed by suitable approximations of the noise and momentum terms, which allows us to prove almost sure convergence via Doob's supermartingale convergence theorem together with a detailed discussion of the number of upcrossings. Notably, the usual Lipschitz condition on the gradient of the objective function is relaxed to Hölder continuity. Another improvement is that no additional restrictions are imposed on the stepsizes. As a byproduct, we apply a localization procedure to extend our results to stochastic stepsizes.
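To make the class of methods concrete, the following is a minimal sketch of a generic stochastic momentum update that covers SGD (momentum parameter set to zero), heavy ball, and Nesterov's accelerated gradient as special cases. The toy nonconvex objective, the function names, and the diminishing stepsize schedule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grad(x, rng, noise=0.1):
    # Stochastic gradient of the toy nonconvex cost f(x) = x^2 + 3*sin(x)^2,
    # i.e. f'(x) = 2x + 3*sin(2x), perturbed by additive Gaussian noise.
    return 2 * x + 3 * np.sin(2 * x) + noise * rng.standard_normal()

def momentum_step(x, v, rng, alpha, beta=0.9, nesterov=False):
    # Heavy ball evaluates the gradient at the current iterate x;
    # Nesterov evaluates it at the look-ahead point x + beta*v.
    # With beta = 0 this reduces to plain stochastic gradient descent.
    g = grad(x + beta * v if nesterov else x, rng)
    v_new = beta * v - alpha * g
    return x + v_new, v_new

rng = np.random.default_rng(0)
x, v = 3.0, 0.0
for k in range(2000):
    # Diminishing stepsize alpha_k ~ 1/sqrt(k+1), a common (assumed) choice.
    x, v = momentum_step(x, v, rng, alpha=0.05 / np.sqrt(k + 1))

print(abs(grad(x, rng, noise=0.0)))  # gradient magnitude at the final iterate
```

Running the loop drives the iterate toward the stationary point of the toy cost, so the (noiseless) gradient magnitude at the final iterate is small, in line with the expected-gradient convergence the paper establishes.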
