Convergence of Gradient Algorithms for Nonconvex C^1+α Cost Functions
This paper is concerned with the convergence of stochastic gradient algorithms with momentum terms in the nonconvex setting. A class of stochastic momentum methods, including stochastic gradient descent, the heavy-ball method, and Nesterov's accelerated gradient, is analyzed in a general framework under mild assumptions. We show that the expected gradient converges and provide an explicit upper bound on the convergence rate. A supermartingale is then constructed by suitable approximations of the noise and momentum terms, which allows us to prove almost sure convergence via Doob's supermartingale convergence theorem together with a detailed analysis of the number of upcrossings. Notably, the usual Lipschitz condition on the gradient of the objective function is relaxed to Hölder continuity. A further improvement is that no additional restrictions are imposed on the stepsizes. As a byproduct, we extend our results to stochastic stepsizes via a localization procedure.
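To make the class of methods concrete, the following is a minimal sketch (not the paper's exact scheme; the test function, constants, and the `run` helper are our own illustrative choices) of a unified stochastic momentum iteration that recovers SGD (momentum `beta = 0`), heavy ball, and a Nesterov-style look-ahead variant, applied to a one-dimensional nonconvex cost with unbiased gradient noise and a diminishing stepsize:

```python
# Hypothetical illustration, not the paper's algorithm: a unified stochastic
# momentum iteration on the nonconvex cost f(x) = x^2 + 3*sin(x)^2, whose
# unique stationary point is x = 0.
import math
import random

def grad(x):
    # Exact gradient of f(x) = x^2 + 3*sin(x)^2.
    return 2 * x + 6 * math.sin(x) * math.cos(x)

def run(beta, nesterov=False, steps=5000, seed=0):
    """beta = 0: SGD; beta > 0: heavy ball; nesterov=True: look-ahead variant."""
    rng = random.Random(seed)
    x, v = 3.0, 0.0
    for k in range(1, steps + 1):
        alpha = 0.1 / math.sqrt(k)                 # diminishing stepsize
        point = x + beta * v if nesterov else x    # Nesterov evaluates ahead
        g = grad(point) + rng.gauss(0.0, 0.1)      # unbiased noisy gradient
        v = beta * v - alpha * g                   # momentum accumulation
        x = x + v
    return x

if __name__ == "__main__":
    for name, kwargs in [("sgd", dict(beta=0.0)),
                         ("heavy ball", dict(beta=0.9)),
                         ("nesterov", dict(beta=0.9, nesterov=True))]:
        print(name, run(**kwargs))
```

With the diminishing stepsize above, all three variants drift toward the stationary point at the origin despite the gradient noise, which is the qualitative behavior the convergence results describe.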