Unified Optimal Analysis of the (Stochastic) Gradient Method

07/09/2019
by Sebastian U. Stich et al.

In this note we give a simple proof for the convergence of stochastic gradient descent (SGD) methods on μ-strongly convex functions, under an L-smoothness assumption that is milder than the standard one. We show that SGD converges after T iterations as O(L ‖x_0 − x⋆‖² exp[−μT/(4L)] + σ²/(μT)), where σ² measures the variance of the stochastic gradients. For deterministic gradient descent (GD), and for SGD in the interpolation setting, we have σ² = 0 and recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
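The two regimes in the bound can be illustrated with a minimal SGD sketch. The snippet below is not the analysis from the note (which uses carefully chosen, decreasing step sizes); it is a hypothetical toy: SGD with a constant step size 1/L on a 1-D strongly convex quadratic, where the gradient oracle adds zero-mean noise of standard deviation σ. With σ = 0 (the deterministic/interpolation case) the error decays exponentially, matching the exp[−μT/(4L)] term.

```python
import random

def sgd(x0, a, L, T, sigma, seed=0):
    """Toy SGD on f(x) = (a/2) * (x - 1.0)**2 (minimizer x* = 1.0).

    f is a-strongly convex and a-smooth; we assume mu <= a <= L.
    The stochastic gradient oracle returns f'(x) plus zero-mean
    Gaussian noise with standard deviation sigma (so variance sigma^2).
    Step size 1/L is a simplification of the schedule in the note.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(T):
        g = a * (x - 1.0) + rng.gauss(0.0, sigma)  # stochastic gradient
        x -= (1.0 / L) * g                         # constant step 1/L
    return x

# sigma = 0: deterministic GD, error contracts by (1 - a/L) per step,
# i.e. exponential convergence to x* = 1.0.
x_T = sgd(x0=5.0, a=1.0, L=2.0, T=50, sigma=0.0)
```

With σ > 0 the iterates instead settle into a noise ball around x⋆, whose size shrinks as the step size does; this is the behavior captured by the σ²/(μT) term of the bound.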
