Random Shuffling Beats SGD after Finite Epochs

06/26/2018
by   Jeff Z. HaoChen, et al.
0

A long-standing problem in the theory of stochastic gradient descent (SGD) is to prove that its without-replacement version RandomShuffle converges faster than the usual with-replacement version. We present the first (to our knowledge) non-asymptotic solution to this problem, which shows that after a "reasonable" number of epochs RandomShuffle indeed converges faster than SGD. Specifically, we prove that under strong convexity and second-order smoothness, the sequence generated by RandomShuffle converges to the optimal solution at the rate O(1/T^2 + n^3/T^3), where n is the number of components in the objective, and T is the total number of iterations. This result shows that after a reasonable number of epochs RandomShuffle is strictly better than SGD (which converges as O(1/T)). The key step toward showing this better dependence on T is the introduction of n into the bound; and as our analysis will show, in general a dependence on n is unavoidable without further changes to the algorithm. We show that for sparse data RandomShuffle has the rate O(1/T^2), again strictly better than SGD. Furthermore, we discuss extensions to nonconvex gradient dominated functions, as well as non-strongly convex settings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/10/2019

Better Communication Complexity for Local SGD

We revisit the local Stochastic Gradient Descent (local SGD) method and ...
research
10/21/2021

Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent

We design step-size schemes that make stochastic gradient descent (SGD) ...
research
02/24/2020

Closing the convergence gap of SGD without replacement

Stochastic gradient descent without replacement sampling is widely used ...
research
03/04/2019

SGD without Replacement: Sharper Rates for General Smooth Convex Functions

We study stochastic gradient descent without replacement () for smooth ...
research
10/21/2013

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

We obtain an improved finite-sample guarantee on the linear convergence ...
research
03/12/2021

Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?

We propose matrix norm inequalities that extend the Recht-Ré (2012) conj...
research
06/12/2020

SGD with shuffling: optimal rates without component convexity and large epoch requirements

We study without-replacement SGD for solving finite-sum optimization pro...

Please sign up or login with your details

Forgot password? Click here to reset