How Good is SGD with Random Shuffling?

07/31/2019
by   Itay Safran, et al.
4

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions. This setting has been investigated in several recent works, but the optimal error rates remains unclear. In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. In particular, we prove that after k passes over n individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least Ω(1/(nk)^2+1/nk^3), which partially corresponds to recently derived upper bounds, and we conjecture to be tight. Moreover, if the functions are only shuffled once, then the lower bound increases to Ω(1/nk^2). Since there are strictly smaller upper bounds for random reshuffling, this proves an inherent performance gap between SGD with single shuffling and repeated shuffling. As a more minor contribution, we also provide a non-asymptotic Ω(1/k^2) lower bound (independent of n) for cyclic gradient descent, where no random shuffling takes place.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/18/2019

Error Lower Bounds of Constant Step-size Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) plays a central role in modern machine...
research
03/13/2023

Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond

We study convergence lower bounds of without-replacement stochastic grad...
research
06/28/2021

The Convergence Rate of SGD's Final Iterate: Analysis on Dimension Dependence

Stochastic Gradient Descent (SGD) is among the simplest and most popular...
research
10/10/2018

Tight Dimension Independent Lower Bound on Optimal Expected Convergence Rate for Diminishing Step Sizes in SGD

We study convergence of Stochastic Gradient Descent (SGD) for strongly c...
research
06/21/2023

Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds

Stochastic gradient descent (SGD) is perhaps the most prevalent optimiza...
research
07/07/2020

Streaming Complexity of SVMs

We study the space complexity of solving the bias-regularized SVM proble...

Please sign up or login with your details

Forgot password? Click here to reset