Stochastic algorithms with geometric step decay converge linearly on sharp functions

07/22/2019
by Damek Davis, et al.

Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is geometric step decay, which halves the step size after every few epochs. In recent work, geometric step decay was shown to improve exponentially upon classical sublinear rates for the class of sharp convex functions. In this work, we ask whether geometric step decay similarly improves stochastic algorithms for the class of sharp nonconvex problems. Such losses feature in modern statistical recovery problems and pose a challenge absent in the convex setting: the region of convergence is local, so one must bound the probability of escape. Our main result shows that, for a large class of stochastic, sharp, nonsmooth, and nonconvex problems, a geometric step decay schedule endows well-known algorithms with a local linear rate of convergence to global minimizers. This guarantee applies to the stochastic projected subgradient, proximal point, and prox-linear algorithms. As an application of our main result, we analyze two statistical recovery tasks, phase retrieval and blind deconvolution: we match the best known guarantees under Gaussian measurement models and establish new guarantees under heavy-tailed distributions.
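To make the schedule concrete, below is a minimal sketch, not the paper's exact algorithm or constants, of a stochastic subgradient method with geometric step decay applied to real-valued robust phase retrieval with the sharp loss f(x) = (1/m) sum_i |<a_i, x>^2 - b_i|. The normalized steps, initial step size, decay factor, epoch length, and initialization radius are illustrative assumptions.

```python
import numpy as np

def subgrad(x, a, b):
    """Subgradient of |<a, x>^2 - b| at x for a single measurement (a, b)."""
    r = np.dot(a, x) ** 2 - b
    return 2.0 * np.sign(r) * np.dot(a, x) * a

def geometric_step_decay_subgradient(A, b, x0, step0=1.0, decay=0.5,
                                     epochs=20, iters_per_epoch=500, seed=0):
    """Stochastic subgradient steps; the step size is halved after each epoch."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    x = x0.copy()
    step = step0
    for _ in range(epochs):
        for _ in range(iters_per_epoch):
            i = rng.integers(m)              # sample one measurement
            g = subgrad(x, A[i], b[i])
            norm = np.linalg.norm(g)
            if norm > 0:
                x -= step * g / norm         # normalized subgradient step
        step *= decay                        # geometric step decay
    return x

# Synthetic Gaussian phase-retrieval instance, initialized near the signal
# (the guarantee is local, so a good initialization is assumed).
rng = np.random.default_rng(1)
d, m = 50, 1000
x_star = rng.standard_normal(d)
A = rng.standard_normal((m, d))
b = (A @ x_star) ** 2
x0 = x_star + 0.1 * rng.standard_normal(d)
x_hat = geometric_step_decay_subgradient(A, b, x0)
# Report distance up to the global sign ambiguity of phase retrieval.
print(min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star)))
```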


Related research

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure (04/29/2019)
There is a stark disparity between the step size schedules used in pract...

On the Convergence of SARAH and Beyond (06/05/2019)
The main theme of this work is a unifying algorithm, abbreviated as L2S,...

Making the Last Iterate of SGD Information Theoretically Optimal (04/29/2019)
Stochastic gradient descent (SGD) is one of the most widely used algorit...

Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization (06/05/2021)
Many popular learning-rate schedules for deep neural networks combine a ...

Nonconvex Demixing From Bilinear Measurements (09/18/2018)
We consider the problem of demixing a sequence of source signals from th...

Distributed Projected Subgradient Method for Weakly Convex Optimization (04/28/2020)
The stochastic subgradient method is a widely-used algorithm for solving...

Stochastic optimization under time drift: iterate averaging, step decay, and high probability guarantees (08/16/2021)
We consider the problem of minimizing a convex function that is evolving...
