Unified Optimal Analysis of the (Stochastic) Gradient Method
In this note we give a simple proof of convergence for stochastic gradient descent (SGD) on μ-strongly convex functions under an L-smoothness assumption that is milder than the standard one. We show that after T iterations SGD converges as O( L‖x_0 − x*‖² exp[−μT/(4L)] + σ²/(μT) ), where σ² measures the variance of the stochastic gradients. For deterministic gradient descent (GD), and for SGD in the interpolation setting, we have σ² = 0 and recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
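The two-term behavior described above can be illustrated numerically. The following sketch (our own illustration, not code from the paper; all names and parameter values are chosen for the demo) runs SGD with additive gradient noise on a μ-strongly convex, L-smooth quadratic: the distance to the minimizer first shrinks geometrically, then stalls at a noise floor of order σ²/μ.

```python
import numpy as np

# Illustrative sketch (not the paper's code): SGD on the mu-strongly convex,
# L-smooth quadratic f(x) = 0.5 * x^T A x, whose minimizer x* is the origin,
# with additive gradient noise of variance sigma^2 per coordinate.
rng = np.random.default_rng(0)

mu, L = 1.0, 10.0
A = np.diag([mu, L])      # eigenvalues mu and L: strong convexity and smoothness
sigma = 0.1               # standard deviation of the gradient noise
x = np.array([5.0, 5.0])  # x_0, so ||x_0 - x*||^2 = 50
T = 2000
step = 1.0 / L            # a simple constant step size (1/L) for this demo

for t in range(T):
    noisy_grad = A @ x + sigma * rng.standard_normal(2)
    x = x - step * noisy_grad

# After T steps the exponential term has vanished and ||x_T - x*||^2 sits
# at the O(sigma^2 / mu) noise floor, far below the initial distance 50.
print(float(np.sum(x**2)))
```

Setting `sigma = 0.0` (the interpolation / deterministic case mentioned in the abstract) removes the noise floor and the iterates converge exponentially to x*.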