
SGD without Replacement: Sharper Rates for General Smooth Convex Functions
We study stochastic gradient descent without replacement for smooth ...

Tight Dimension Independent Lower Bound on Optimal Expected Convergence Rate for Diminishing Step Sizes in SGD
We study convergence of Stochastic Gradient Descent (SGD) for strongly c...

Permutation-Based SGD: Is Random Optimal?
A recent line of groundbreaking results for permutation-based SGD has c...

Random Shuffling Beats SGD after Finite Epochs
A longstanding problem in the theory of stochastic gradient descent (SG...

Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
Recently, there has been much interest in studying the convergence rates...

How Good is SGD with Random Shuffling?
We study the performance of stochastic gradient descent (SGD) on smooth ...

On Tight Convergence Rates of Without-replacement SGD
For solving finite-sum optimization problems, SGD without replacement sa...
Closing the convergence gap of SGD without replacement
Stochastic gradient descent without replacement sampling is widely used in practice for model training. However, the vast majority of SGD analyses assume that data is sampled with replacement; in that setting, when the function minimized is strongly convex, an O(1/T) rate can be established when SGD is run for T iterations. A recent line of breakthrough work on SGD without replacement (SGDo) established an O(n/T^2) convergence rate when the function minimized is strongly convex and is a sum of n smooth functions, and an O(1/T^2+n^3/T^3) rate for sums of quadratics. On the other hand, the tightest known lower bound is Ω(1/T^2+n^2/T^3), leaving open the possibility of better SGDo convergence rates in the general case. In this paper, we close this gap: we show that SGD without replacement achieves a rate of O(1/T^2+n^2/T^3) when the sum of the functions is a quadratic, and we offer a new lower bound of Ω(n/T^2) for strongly convex functions that are sums of smooth functions.
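To make the two sampling schemes concrete, the following is a minimal Python sketch, not the paper's code or experimental setup. It runs SGD on a one-dimensional sum of n quadratics, either drawing indices i.i.d. with replacement or using a fresh random permutation in each epoch (SGDo). The function names `sgd` and `minimizer`, the components 0.5 * a[i] * (x - b[i])**2, the constant step size, and the seed are all assumptions chosen for illustration. The total number of iterations in the sketch is T = epochs * n.

```python
import random


def sgd(a, b, epochs, step, without_replacement, seed=0):
    """Run SGD on F(x) = (1/n) * sum_i 0.5 * a[i] * (x - b[i])**2 (1-D, illustrative).

    Returns the final iterate and the index order used in each epoch.
    """
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    orders = []
    for _ in range(epochs):
        if without_replacement:
            # SGDo: every component is visited exactly once per epoch,
            # in a freshly shuffled order.
            order = list(range(n))
            rng.shuffle(order)
        else:
            # Classical SGD: n indices drawn independently and uniformly,
            # so some components may repeat and others may be skipped.
            order = [rng.randrange(n) for _ in range(n)]
        orders.append(order)
        for i in order:
            # Gradient of the i-th component at x is a[i] * (x - b[i]).
            x -= step * a[i] * (x - b[i])
    return x, orders


def minimizer(a, b):
    """Closed-form minimizer of F: the a-weighted average of b."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(a)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [1.0, -1.0, 2.0, 0.5]
    x_star = minimizer(a, b)
    for flag in (False, True):
        x, _ = sgd(a, b, epochs=200, step=0.01, without_replacement=flag)
        label = "without replacement" if flag else "with replacement"
        print(f"{label}: |x - x*| = {abs(x - x_star):.4f}")
```

A constant step size is used only to keep the sketch short; the rates quoted in the abstract depend on step-size choices that this example does not attempt to reproduce.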