The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

02/11/2022
by   Matthew Faw, et al.

We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient descent (SGD) methods, where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non-adaptive methods in this setting. Specifically, all prior works rely on some subset of the following assumptions: (i) uniformly-bounded gradient norms, (ii) uniformly-bounded stochastic gradient variance (or even noise support), (iii) conditional independence between the step size and stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order-optimal convergence rate of 𝒪(polylog(T)/√(T)) after T iterations under the same assumptions as optimally-tuned non-adaptive SGD (unbounded gradient norms and affine noise variance scaling), and crucially, without needing any tuning parameters. We thus establish that adaptive gradient methods exhibit order-optimal convergence in much broader regimes than previously understood.
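For readers unfamiliar with the algorithm, the following is a minimal sketch of the AdaGrad-Norm update analyzed in the abstract: the step size is divided by the running root of accumulated squared stochastic-gradient norms, so it self-tunes without knowledge of problem parameters. The function and parameter names (e.g. `stochastic_grad`, `eta`, `b0`) and the quadratic test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adagrad_norm(x0, stochastic_grad, eta=1.0, b0=1.0, T=1000):
    """Run T iterations of AdaGrad-Norm from x0.

    Update rule (sketch):
        b_{t+1}^2 = b_t^2 + ||g_t||^2
        x_{t+1}   = x_t - (eta / b_{t+1}) * g_t
    where g_t is a stochastic gradient at x_t.
    """
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(T):
        g = stochastic_grad(x)             # observed stochastic gradient at the current iterate
        b_sq += np.dot(g, g)               # accumulate squared gradient norms
        x = x - (eta / np.sqrt(b_sq)) * g  # self-tuning step size eta / b_{t+1}
    return x

# Illustrative usage: noisy gradients of f(x) = 0.5 * ||x||^2
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x_final = adagrad_norm(np.ones(10), noisy_grad, T=5000)
```

Note that the step size is never hand-tuned to the smoothness or noise level; the accumulated gradient norms adapt it automatically, which is the behavior the paper's analysis covers under unbounded gradients and affine variance.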


Related research

- 02/17/2023: SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
- 02/13/2023: Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD
- 07/20/2022: Adaptive Step-Size Methods for Compressed SGD
- 03/15/2018: Escaping Saddles with Stochastic Gradients
- 06/08/2020: Stochastic Optimization with Non-stationary Noise
- 08/28/2019: Linear Convergence of Adaptive Stochastic Gradient Descent
- 09/29/2022: META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions
