SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

02/17/2023
by   Amit Attia, et al.

We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general “affine variance” noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.
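To make the method concrete: the scalar (AdaGrad-Norm) form of the stepsize divides a base learning rate by the running sum of squared stochastic-gradient norms, which is how SGD self-tunes without knowing the smoothness constant or the noise level. Below is a minimal sketch of that update; the function names, the initial accumulator value, and the toy affine-variance objective are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sgd_adagrad_norm(grad_fn, x0, eta=1.0, b0=1e-2, steps=1000, rng=None):
    """SGD with AdaGrad-Norm stepsizes: a single scalar stepsize that
    self-tunes via the cumulative squared norms of stochastic gradients.

    grad_fn(x, rng) should return a stochastic gradient at x.
    (Sketch only; b0 and eta are illustrative defaults.)"""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    acc = b0 ** 2  # accumulator of squared gradient norms
    for _ in range(steps):
        g = grad_fn(x, rng)
        acc += float(np.dot(g, g))
        x -= (eta / np.sqrt(acc)) * g  # stepsize shrinks as gradients accumulate
    return x

# Illustrative usage: a noisy quadratic with affine-variance-style noise,
# i.e. noise whose scale grows with the gradient norm.
def noisy_quad_grad(x, rng):
    g = x  # gradient of f(x) = 0.5 * ||x||^2
    return g + (0.1 + 0.5 * np.linalg.norm(g)) * rng.standard_normal(x.shape)

x_final = sgd_adagrad_norm(noisy_quad_grad, x0=np.ones(10), steps=5000)
print(np.linalg.norm(x_final))
```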


Related research

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance (02/11/2022)
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive st...

High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails (06/28/2021)
We consider non-convex stochastic optimization using first-order algorit...

Near-Optimal High-Probability Convergence for Non-Convex Stochastic Optimization with Variance Reduction (02/13/2023)
Traditional analyses for non-convex stochastic optimization problems cha...

High Probability Convergence of Stochastic Gradient Methods (02/28/2023)
In this work, we describe a generic approach to show convergence with hi...

Large deviations rates for stochastic gradient descent with strongly convex functions (11/02/2022)
Recent works have shown that high probability metrics with stochastic gr...

Stochastic linear optimization never overfits with quadratically-bounded losses on general data (02/14/2022)
This work shows that a diverse collection of linear optimization methods...

Escaping Saddles with Stochastic Gradients (03/15/2018)
We analyze the variance of stochastic gradients along negative curvature...
