Tight Analyses for Non-Smooth Stochastic Gradient Descent

12/13/2018
by Nicholas J. A. Harvey, et al.

Consider the problem of minimizing functions that are Lipschitz and strongly convex, but not necessarily differentiable. We prove that after T steps of stochastic gradient descent, the error of the final iterate is O(log(T)/T) with high probability. We also construct a function from this class for which the error of the final iterate of deterministic gradient descent is Ω(log(T)/T). This shows that the upper bound is tight and that, in this setting, the last iterate of stochastic gradient descent has the same general error rate (with high probability) as deterministic gradient descent. This resolves both open questions posed by Shamir (2012). An intermediate step of our analysis proves that the suffix averaging method achieves error O(1/T) with high probability, which is optimal (for any first-order optimization method). This improves results of Rakhlin et al. (2012) and Hazan and Kale (2014), both of which achieved error O(1/T), but only in expectation, and achieved a high probability error bound of O(log(log(T))/T), which is suboptimal. We prove analogous results for functions that are Lipschitz and convex, but not necessarily strongly convex or differentiable. After T steps of stochastic gradient descent, the error of the final iterate is O(log(T)/√T) with high probability, and there exists a function for which the error of the final iterate of deterministic gradient descent is Ω(log(T)/√T).
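The abstract contrasts two ways to read out an answer from SGD on a strongly convex, non-smooth objective: the final iterate and a suffix average (the mean of the last half of the iterates). Below is a minimal sketch of that comparison, assuming a toy 1-D objective f(x) = |x| + (λ/2)x² (λ-strongly convex, non-differentiable at its minimum x* = 0), Gaussian subgradient noise, and the standard step size η_t = 1/(λt); the objective, the noise model, and the helper name `sgd_suffix_avg` are illustrative choices, not the paper's construction.

```python
import numpy as np

def sgd_suffix_avg(T=10_000, lam=1.0, noise=0.1, seed=0):
    """Stochastic subgradient descent on f(x) = |x| + (lam/2) * x**2.

    Returns the final iterate and the suffix average (mean of the
    last T/2 iterates). Illustrative sketch, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    x = 5.0  # arbitrary starting point
    iterates = np.empty(T)
    for t in range(1, T + 1):
        g = np.sign(x) + lam * x            # subgradient of |x| + (lam/2) x^2
        g += noise * rng.standard_normal()  # stochastic subgradient oracle
        x -= g / (lam * t)                  # step size eta_t = 1 / (lam * t)
        iterates[t - 1] = x
    return iterates[-1], iterates[T // 2:].mean()

final, suffix = sgd_suffix_avg()
f = lambda x: abs(x) + 0.5 * x**2  # optimum: x* = 0, f(x*) = 0
print(f"final iterate error:  {f(final):.2e}")
print(f"suffix average error: {f(suffix):.2e}")
```

On a typical run the suffix average lands noticeably closer to the optimum than the final iterate, consistent with the O(1/T) versus O(log(T)/T) high-probability rates stated above.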


