Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

06/05/2023
by Chaoyue Liu, et al.

Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
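
As a rough illustration of the setting described above (not the paper's actual regularity condition or proof construction), the sketch below runs single-sample SGD with a constant, never-decayed step size on an overparameterized least-squares problem in which an interpolating solution exists. The problem sizes, step-size rule, and iteration count are illustrative assumptions.

```python
# Minimal sketch: constant-step, single-sample SGD on an overparameterized
# least-squares problem (more parameters than samples, so interpolation holds).
# This is an illustration of the regime, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params = 50, 500            # overparameterized: far more parameters than data
X = rng.standard_normal((n_samples, n_params)) / np.sqrt(n_params)
w_true = rng.standard_normal(n_params)
y = X @ w_true                           # labels are exactly realizable, so a zero-loss solution exists

w = np.zeros(n_params)
L_max = np.max(np.sum(X**2, axis=1))     # per-sample smoothness bound for 0.5*(x_i @ w - y_i)^2
step = 1.0 / L_max                       # constant step size, chosen once and never decayed

for _ in range(20000):
    i = rng.integers(n_samples)          # one sampled gradient per iteration
    residual = X[i] @ w - y[i]
    w -= step * residual * X[i]          # SGD step on the sampled squared loss

print("final mean squared training error:", np.mean((X @ w - y) ** 2))
```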

Related research

- 11/06/2018: On exponential convergence of SGD in non-convex over-parametrized learning
  Large over-parametrized models learned via stochastic gradient descent (...
- 02/11/2018: SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
  Stochastic gradient descent (SGD) is the optimization algorithm of choic...
- 04/02/2023: Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition
  Modern machine learning models are often over-parameterized and as a res...
- 03/07/2022: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
  Stochastic gradient descent (SGD) has achieved great success due to its ...
- 04/12/2021: A Recipe for Global Convergence Guarantee in Deep Neural Networks
  Existing global convergence guarantees of (stochastic) gradient descent ...
- 06/12/2021: Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
  Recently, there has been much interest in studying the convergence rates...
- 06/14/2022: A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization
  We consider the problem of training a deep neural network with nonsmooth...
