Closing the convergence gap of SGD without replacement

02/24/2020
by Shashank Rajput, et al.

Stochastic gradient descent without replacement sampling is widely used in practice for model training. However, the vast majority of SGD analyses assume data sampled with replacement, under which, when the function minimized is strongly convex, an O(1/T) rate can be established for SGD run for T iterations. A recent line of breakthrough work on SGD without replacement (SGDo) established an O(n/T^2) convergence rate when the function minimized is strongly convex and is a sum of n smooth functions, and an O(1/T^2 + n^3/T^3) rate for sums of quadratics. On the other hand, the tightest known lower bound postulates an Ω(1/T^2 + n^2/T^3) rate, leaving open the possibility of better SGDo convergence rates in the general case. In this paper, we close this gap: we show that SGD without replacement achieves a rate of O(1/T^2 + n^2/T^3) when the sum of the functions is a quadratic, and we offer a new lower bound of Ω(n/T^2) for strongly convex functions that are sums of smooth functions.
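To make the sampling scheme concrete, below is a minimal sketch of SGD without replacement (random reshuffling) on a toy least-squares objective, i.e., a sum of n quadratics. The problem instance, step size, and helper names (make_quadratic_problem, sgdo) are illustrative assumptions, not the paper's experimental setup; the key point is that each epoch draws a fresh random permutation of the n samples and processes them in that order.

```python
import numpy as np

def make_quadratic_problem(n=50, d=5, seed=0):
    """Synthetic least-squares instance: f(w) = (1/n) * sum_i (a_i^T w - b_i)^2.
    (Illustrative data, not from the paper.)"""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    b = A @ w_star + 0.1 * rng.normal(size=n)
    return A, b

def sgdo(A, b, epochs=100, lr=0.01, seed=0):
    """SGD without replacement: reshuffle the n samples at the start of each
    epoch, then take one gradient step per sample in that shuffled order."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)          # sample indices without replacement
        for i in perm:
            grad = 2.0 * (A[i] @ w - b[i]) * A[i]   # gradient of (a_i^T w - b_i)^2
            w -= lr * grad
    return w

if __name__ == "__main__":
    A, b = make_quadratic_problem()
    w_hat = sgdo(A, b)
    w_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # closed-form minimizer
    print("distance to least-squares solution:", np.linalg.norm(w_hat - w_ls))
```

In the rates quoted above, T counts the total number of such per-sample steps across all epochs (T = n times the number of epochs), so the without-replacement bounds improve on the O(1/T) with-replacement rate once multiple epochs are run.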

Related research

03/04/2019
SGD without Replacement: Sharper Rates for General Smooth Convex Functions
We study stochastic gradient descent without replacement () for smooth ...

10/10/2018
Tight Dimension Independent Lower Bound on Optimal Expected Convergence Rate for Diminishing Step Sizes in SGD
We study convergence of Stochastic Gradient Descent (SGD) for strongly c...

02/19/2021
Permutation-Based SGD: Is Random Optimal?
A recent line of ground-breaking results for permutation-based SGD has c...

06/28/2023
Ordering for Non-Replacement SGD
One approach for reducing run time and improving efficiency of machine l...

06/21/2023
Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds
Stochastic gradient descent (SGD) is perhaps the most prevalent optimiza...

10/20/2021
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
In distributed learning, local SGD (also known as federated averaging) a...

06/26/2018
Random Shuffling Beats SGD after Finite Epochs
A long-standing problem in the theory of stochastic gradient descent (SG...
