Adaptive Learning Rates for Faster Stochastic Gradient Methods

08/10/2022
by Samuel Horvath, et al.

In this work, we propose new adaptive step-size strategies that improve several stochastic gradient methods. Our first method (StoPS) is based on the classical Polyak step size (Polyak, 1987) and extends the recent stochastic adaptation of this step size, SPS (Loizou et al., 2021); our second method, denoted GraDS, rescales the step size by the "diversity of stochastic gradients". We provide a theoretical analysis of these methods for strongly convex smooth functions and show that they enjoy deterministic-like rates despite stochastic gradients. Furthermore, we demonstrate the theoretical superiority of our adaptive methods on quadratic objectives. Unfortunately, both StoPS and GraDS depend on unknown quantities, which makes them practical only for overparametrized models. To remedy this, we drop this undesired dependence and redefine StoPS and GraDS as StoP and GraD, respectively. We show that these new methods converge linearly to a neighbourhood of the optimal solution under the same assumptions. Finally, we corroborate our theoretical claims with experimental validation, which reveals that GraD is particularly useful for deep learning optimization.
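For intuition on the Polyak-type step sizes the abstract builds on: the classical Polyak step size for minimizing f with known optimal value f* is gamma_t = (f(x_t) - f*) / ||grad f(x_t)||^2, and SPS (Loizou et al., 2021) replaces these quantities with their per-sample counterparts, with a bound c and a cap gamma_max. The sketch below is a minimal, hedged illustration of that SPS-style rule inside an SGD loop on an overparametrized least-squares problem, where the per-sample optimum f_i* can be taken as 0; it is not the StoPS or GraDS rule analyzed in the paper, and the function name, problem setup, and parameter values are illustrative assumptions.

```python
import numpy as np

def sps_sgd(A, b, c=0.5, gamma_max=1.0, epochs=20, seed=0):
    """SGD with an SPS-style stochastic Polyak step size on least squares.

    Per-sample loss: f_i(x) = 0.5 * (a_i @ x - b_i)^2, with f_i^* assumed 0
    (valid for interpolating/overparametrized least squares).
    Step size: gamma_t = min(f_i(x) / (c * ||grad f_i(x)||^2), gamma_max).
    Illustrative sketch only; not the paper's StoPS/GraDS methods.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            a_i, b_i = A[i], b[i]
            resid = a_i @ x - b_i
            loss_i = 0.5 * resid ** 2      # f_i(x), with f_i^* = 0
            grad_i = resid * a_i           # gradient of f_i at x
            g2 = grad_i @ grad_i
            if g2 == 0.0:                  # already at a per-sample optimum
                continue
            gamma = min(loss_i / (c * g2), gamma_max)
            x -= gamma * grad_i
    return x

# Toy usage: overparametrized least squares, so interpolation (f_i^* = 0) holds.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = rng.standard_normal(100)
b = A @ x_true
x_hat = sps_sgd(A, b)
print("final mean squared residual:", np.mean((A @ x_hat - b) ** 2))
```

On this interpolating least-squares example the rule reduces to a randomized Kaczmarz-style update; the paper's StoPS and GraDS additionally involve quantities (such as full-batch values and the "diversity of stochastic gradients") that this sketch does not attempt to estimate.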


Related research

02/24/2020 · Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
We propose a stochastic variant of the classical Polyak step-size (Polya...

11/28/2022 · Stochastic Steffensen method
Is it possible for a first-order method, i.e., only first derivatives al...

01/17/2020 · ADAMT: A Stochastic Optimization with Trend Correction Scheme
Adam-type optimizers, as a class of adaptive moment estimation methods w...

06/04/2021 · An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning
We present and analyze an algorithm for optimizing smooth and convex or ...

11/09/2020 · Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Standard first-order stochastic optimization algorithms base their updat...

02/13/2020 · Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
Adaptivity is an important yet under-studied property in modern optimiza...

02/07/2020 · On the Effectiveness of Richardson Extrapolation in Machine Learning
Richardson extrapolation is a classical technique from numerical analysi...
