Stochastic Polyak Stepsize with a Moving Target

06/22/2021
by Robert M. Gower, et al.

We propose a new stochastic gradient method that uses recorded past loss values to reduce the variance. Our method can be interpreted as a new stochastic variant of the Polyak Stepsize that converges globally without assuming interpolation. Our method introduces auxiliary variables, one for each data point, that track the loss value for each data point. We provide a global convergence theory for our method by showing that it can be interpreted as a special variant of online SGD. The new method only stores a single scalar per data point, opening up new applications for variance reduction where memory is the bottleneck.
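The paper's exact update rules are not reproduced here, but the idea can be sketched as follows: a Polyak-style stepsize in which a stored scalar per data point (a "moving target") stands in for the unknown per-sample optimal loss. The sketch below runs this on a synthetic least-squares problem; the tracking rate `beta`, the stepsize cap `gamma_max`, and all variable names are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: n samples, d features (illustrative data).
n, d = 50, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def loss_i(x, i):
    """Loss on a single data point: 0.5 * (a_i^T x - b_i)^2."""
    r = A[i] @ x - b[i]
    return 0.5 * r * r

def grad_i(x, i):
    """Gradient of the single-sample loss."""
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
targets = np.zeros(n)  # one stored scalar per data point (the "moving targets")
beta = 0.9             # target tracking rate (hypothetical value)
gamma_max = 1.0        # stepsize cap for stability (hypothetical value)

for step in range(2000):
    i = rng.integers(n)
    g = grad_i(x, i)
    fi = loss_i(x, i)
    # Polyak-style stepsize with the recorded target in place of f_i^*.
    gamma = min(max(fi - targets[i], 0.0) / (g @ g + 1e-12), gamma_max)
    x -= gamma * g
    # Move the stored target toward the current loss on this data point.
    targets[i] = beta * targets[i] + (1 - beta) * loss_i(x, i)

initial_loss = np.mean(0.5 * b**2)                       # loss at x = 0
mean_loss = np.mean([loss_i(x, j) for j in range(n)])    # loss after training
```

Note the memory cost: only the `targets` array of `n` scalars is kept, matching the abstract's point that the method stores a single scalar per data point rather than a full gradient per sample.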
