Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent

10/21/2021
by   Sharan Vaswani, et al.

We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise σ^2 in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number κ, we first prove that T iterations of SGD with Nesterov acceleration and exponentially decreasing step-sizes can achieve a near-optimal Õ(exp(-T/√(κ)) + σ^2/T) convergence rate. Under a relaxed assumption on the noise, with the same step-size scheme and knowledge of the smoothness, we prove that SGD can achieve an Õ(exp(-T/κ) + σ^2/T) rate. To be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower bounds) that SGD converges at the desired rate, but only to a neighbourhood of the solution. Next, we use SGD with an offline estimate of the smoothness and prove convergence to the minimizer. However, the convergence slows down in proportion to the estimation error, and we prove a lower bound justifying this slowdown. Finally, we empirically demonstrate that exponential step-sizes coupled with a novel variant of SLS are effective compared to other step-size schemes.
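To make the idea of exponentially decreasing step-sizes concrete, here is a minimal sketch of plain SGD (without Nesterov acceleration or line-search) on a toy strongly-convex quadratic. The abstract does not spell out the schedule, so the form η_k = (1/L) α^k with α = (β/T)^(1/T) is an assumption, as are the function names, the toy objective, and the Gaussian noise model.

```python
import random


def exponential_step_sizes(T, L, beta=1.0):
    """Assumed schedule: eta_k = (1/L) * alpha**k for k = 1..T,
    with alpha = (beta / T) ** (1 / T), so the last step is beta / (L * T)."""
    alpha = (beta / T) ** (1.0 / T)
    return [alpha ** k / L for k in range(1, T + 1)]


def objective(curvatures, w):
    """Toy objective f(w) = 0.5 * sum_i a_i * w_i**2 (hypothetical test problem)."""
    return 0.5 * sum(a * wi * wi for a, wi in zip(curvatures, w))


def sgd_on_quadratic(curvatures, w0, T, sigma, seed=0):
    """Run T steps of SGD on the toy quadratic.

    The stochastic gradient is the true gradient a_i * w_i plus
    zero-mean Gaussian noise with standard deviation sigma.
    The smoothness constant L is the largest curvature and is assumed known.
    """
    rng = random.Random(seed)
    L = max(curvatures)
    w = list(w0)
    for eta in exponential_step_sizes(T, L):
        grad = [a * wi + rng.gauss(0.0, sigma) for a, wi in zip(curvatures, w)]
        w = [wi - eta * g for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    curvatures = [1.0, 10.0]  # condition number kappa = 10
    w0 = [1.0, 1.0]
    for sigma in (0.0, 1.0):
        w = sgd_on_quadratic(curvatures, w0, T=1000, sigma=sigma)
        print(f"sigma={sigma}: f(w_T) = {objective(curvatures, w):.3e}")
```

The schedule starts near 1/L, which gives fast progress when the noise is small, and decays to roughly 1/(LT), which damps the noise in later iterations. It needs the horizon T in advance but not σ^2.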


10/18/2019

Error Lower Bounds of Constant Step-size Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) plays a central role in modern machine...
05/13/2016

Barzilai-Borwein Step Size for Stochastic Gradient Descent

One of the major issues in stochastic gradient descent (SGD) methods is ...
08/28/2019

Linear Convergence of Adaptive Stochastic Gradient Descent

We prove that the norm version of the adaptive stochastic gradient metho...
10/19/2021

Accelerated Graph Learning from Smooth Signals

We consider network topology identification subject to a signal smoothne...
04/29/2019

Making the Last Iterate of SGD Information Theoretically Optimal

Stochastic gradient descent (SGD) is one of the most widely used algorit...
10/13/2021

On the Double Descent of Random Features Models Trained with SGD

We study generalization properties of random features (RF) regression in...
06/26/2018

Random Shuffling Beats SGD after Finite Epochs

A long-standing problem in the theory of stochastic gradient descent (SG...