Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

04/02/2023
by Chen Fan, et al.

Modern machine learning models are often over-parameterized and, as a result, can interpolate the training data. In this setting, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient descent (SGD) known as random reshuffling (RR). Unlike SGD, which samples data with replacement at every iteration, RR chooses a random permutation of the data at the beginning of each epoch and at each iteration takes the next sample from that permutation. For under-parameterized models, it has been shown that RR can converge faster than SGD under certain assumptions. However, previous works have not shown that RR outperforms SGD in over-parameterized settings except in some highly restrictive scenarios. For the class of Polyak-Łojasiewicz (PL) functions, we show that RR can outperform SGD in over-parameterized settings when either of the following holds: (i) the number of samples (n) is less than the product of the condition number (κ) and the parameter (α) of a weak growth condition (WGC), or (ii) n is less than the parameter (ρ) of a strong growth condition (SGC).
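For reference, these conditions are commonly stated as follows (the standard forms from the over-parameterized SGD literature; the paper's exact normalizations may differ): an L-smooth function f is μ-PL if ½‖∇f(w)‖² ≥ μ(f(w) − f*), with condition number κ = L/μ; the SGC with parameter ρ requires E_i‖∇f_i(w)‖² ≤ ρ‖∇f(w)‖²; and the WGC with parameter α requires E_i‖∇f_i(w)‖² ≤ 2αL(f(w) − f*).

The only algorithmic difference between the two methods is how the sample index is drawn at each step. Below is a minimal Python sketch, assuming a finite-sum objective f(w) = (1/n) Σ_i f_i(w), a per-sample gradient oracle grad_i, and a constant step size; these names and the toy problem are illustrative, not taken from the paper.

```python
import numpy as np

def random_reshuffling(w, grad_i, n, step_size, num_epochs, rng=None):
    """Sampling without replacement: draw one permutation per epoch and
    step through it, so every sample is used exactly once per epoch."""
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(num_epochs):
        perm = rng.permutation(n)             # fresh permutation each epoch
        for i in perm:
            w = w - step_size * grad_i(w, i)  # one stochastic gradient step
    return w

def sgd_with_replacement(w, grad_i, n, step_size, num_iters, rng=None):
    """Plain SGD: draw an index uniformly at random at every iteration,
    so a sample may be revisited (or never seen) within an epoch."""
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(num_iters):
        i = int(rng.integers(n))
        w = w - step_size * grad_i(w, i)
    return w

# Toy interpolation problem: least squares with an exact solution w*,
# so every per-sample loss is minimized at the same point.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ np.ones(5)
grad = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_rr = random_reshuffling(np.zeros(5), grad, n=20, step_size=0.1, num_epochs=50)
```

The epoch structure of RR guarantees that every sample is visited exactly once per epoch; this is the property the paper exploits to show faster convergence than with-replacement SGD when n is small relative to κα or ρ.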

Related research:

- 10/16/2018: Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
- 07/07/2020: Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle
- 07/11/2019: Amplifying Rényi Differential Privacy via Shuffling
- 06/05/2023: Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
- 06/12/2021: Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
- 02/03/2022: Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods
- 02/19/2021: Permutation-Based SGD: Is Random Optimal?
