Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems

06/12/2021
by Itay Safran et al.

Recently, there has been much interest in studying the convergence rates of without-replacement SGD and in proving that it is faster than with-replacement SGD in the worst case. However, these works either ignore the problem's geometry, including its condition number, or do not provide tight bounds in terms of it. Perhaps surprisingly, we prove that when the condition number is taken into account, without-replacement SGD does not significantly improve on with-replacement SGD in terms of worst-case bounds, unless the number of epochs (passes over the data) is larger than the condition number. Since many problems in machine learning and other areas are both ill-conditioned and involve large datasets, this indicates that without-replacement sampling does not necessarily improve over with-replacement sampling for realistic iteration budgets. We show this by providing new lower and upper bounds for quadratic problems with commuting quadratic terms that are tight up to log factors, precisely quantifying the dependence on the problem parameters.
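To make the comparison concrete, below is a minimal sketch (not the paper's construction or experiments) of the two sampling schemes on a one-dimensional finite-sum quadratic, where the component quadratics trivially commute; the values of n, the step size eta, and the number of epochs are illustrative assumptions.

    # Toy finite-sum quadratic: min_x (1/n) * sum_i 0.5 * a_i * (x - b_i)^2
    # Compares with-replacement SGD against random-reshuffling (without-replacement) SGD.
    import numpy as np

    rng = np.random.default_rng(0)
    n, epochs, eta = 100, 20, 0.01            # illustrative problem size, epoch budget, step size
    a = rng.uniform(0.5, 5.0, size=n)         # per-component curvatures
    b = rng.normal(size=n)                    # per-component minimizers
    x_star = np.sum(a * b) / np.sum(a)        # minimizer of the average objective

    def grad(x, i):
        """Gradient of the i-th component f_i(x) = 0.5 * a_i * (x - b_i)^2."""
        return a[i] * (x - b[i])

    def sgd(with_replacement):
        x = 0.0
        for _ in range(epochs):
            if with_replacement:
                order = rng.integers(0, n, size=n)   # i.i.d. sampling with replacement
            else:
                order = rng.permutation(n)           # random reshuffling: each epoch visits every component once
            for i in order:
                x -= eta * grad(x, i)
        return (x - x_star) ** 2                     # squared distance to the minimizer

    print("with replacement   :", sgd(True))
    print("without replacement:", sgd(False))

A single run's output depends on the seed and the illustrative settings; the paper's claim concerns worst-case bounds as a function of the condition number and the number of epochs, not any particular instance.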

Related research

04/18/2020 · On Tight Convergence Rates of Without-replacement SGD
For solving finite-sum optimization problems, SGD without replacement sa...

10/20/2021 · Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
In distributed learning, local SGD (also known as federated averaging) a...

03/12/2021 · Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
We propose matrix norm inequalities that extend the Recht-Ré (2012) conj...

04/02/2023 · Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition
Modern machine learning models are often over-parameterized and as a res...

02/27/2022 · Benign Underfitting of Stochastic Gradient Descent
We study to what extent may stochastic gradient descent (SGD) be underst...

06/12/2020 · SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization pro...

06/05/2023 · Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Modern machine learning paradigms, such as deep learning, occur in or cl...
