Large deviations rates for stochastic gradient descent with strongly convex functions

11/02/2022
by Dragana Bajović, et al.

Recent works have shown that high-probability metrics for stochastic gradient descent (SGD) are informative and, in some cases, advantageous compared with the commonly adopted mean-square-error-based metrics. In this work we provide a formal framework for the study of general high-probability bounds for SGD, based on the theory of large deviations. The framework allows for a generic (not necessarily bounded) gradient noise satisfying mild technical assumptions, and in particular permits the noise distribution to depend on the current iterate. Under these assumptions, we derive an upper large deviations bound for SGD with strongly convex functions. The corresponding rate function captures analytical dependence on the noise distribution and other problem parameters. This is in contrast with conventional mean-square-error analysis, which captures the noise only through its variance and reflects neither the effect of higher-order moments nor the interplay between the noise geometry and the shape of the cost function. We also derive the exact large deviation rate for the case when the objective function is quadratic, and show that the obtained rate function matches the one from the general upper bound, thereby establishing the tightness of the general bound. Numerical examples illustrate and corroborate the theoretical findings.
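To make the large-deviations viewpoint concrete, here is a minimal Monte Carlo sketch. It is not the paper's setup: the quadratic objective, Gaussian noise model, 1/k step size, threshold eps, and all constants below are illustrative assumptions. It simply checks, for SGD on a strongly convex quadratic, that the probability of the iterate remaining far from the minimizer decays roughly exponentially in the iteration count, which is the kind of behavior a large deviations rate quantifies.

```python
# Illustrative sketch (all constants are assumptions, not the paper's setting).
# We run SGD on f(x) = x^2 / 2, whose minimizer is x* = 0 and whose gradient
# is x, with additive i.i.d. Gaussian gradient noise and step size 1/k, and
# empirically estimate P(|x_k - x*| >= eps) as the iteration count k grows.

import numpy as np

rng = np.random.default_rng(0)

def sgd_tail_prob(num_iters, eps, num_runs=20000, x0=5.0, noise_scale=1.0):
    """Estimate P(|x_k| >= eps) after num_iters SGD steps, averaged over
    num_runs independent runs (vectorized across runs)."""
    x = np.full(num_runs, x0)
    for k in range(1, num_iters + 1):
        noise = noise_scale * rng.standard_normal(num_runs)
        x -= (1.0 / k) * (x + noise)  # noisy gradient of f(x) = x^2 / 2
    return np.mean(np.abs(x) >= eps)

eps = 0.5
for k in (10, 20, 40):
    p = sgd_tail_prob(k, eps)
    # Empirical rate estimate: -log P(|x_k| >= eps) / k should stabilize
    # as k grows if the tail probability decays exponentially in k.
    print(f"k={k:3d}  P(|x_k| >= {eps}) ~= {p:.4f}  rate ~= {-np.log(p) / k:.3f}")
```

In this toy Gaussian case the final iterate reduces to a negative sample mean of the noise, so the printed rate estimates drift toward the classical Cramér rate eps^2 / (2 * noise_scale^2) = 0.125 as k grows; with heavier-tailed noise the empirical rate changes even at fixed variance, illustrating why a rate function can carry more information than the variance alone.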


Related research

12/13/2018
Tight Analyses for Non-Smooth Stochastic Gradient Descent
Consider the problem of minimizing functions that are Lipschitz and stro...

10/04/2019
The Complexity of Finding Stationary Points with Stochastic Gradient Descent
We study the iteration complexity of stochastic gradient descent (SGD) f...

02/17/2023
SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular a...

04/06/2022
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
We introduce a general framework for nonlinear stochastic gradient desce...

05/27/2019
Robustness of accelerated first-order algorithms for strongly convex optimization problems
We study the robustness of accelerated first-order algorithms to stochas...

06/02/2021
Statistical optimality conditions for compressive ensembles
We present a framework for the theoretical analysis of ensembles of low-...
