SGD: General Analysis and Improved Rates

01/27/2019
by Robert Mansel Gower, et al.

We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis has been performed, and most of our variants of SGD have never been explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. This also allows us to determine the mini-batch size that optimizes the total complexity, and to show explicitly that the optimal mini-batch size grows with the variance of the stochastic gradient evaluated at the minimum; when that variance is zero, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules that describe when one should switch from a constant to a decreasing stepsize regime.
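To make these ideas concrete, below is a minimal Python sketch of mini-batch SGD on a least-squares objective, combining a sampled mini-batch gradient with a stepsize-switching rule that moves from a constant to a decreasing stepsize. The function name minibatch_sgd, the smoothness estimate L_hat, and the switching point switch_epoch are illustrative assumptions for this sketch; they are not the exact expressions or constants derived in the paper.

import numpy as np

def minibatch_sgd(A, b, batch_size=8, n_epochs=50, switch_epoch=25, seed=0):
    """Mini-batch SGD sketch on f(x) = (1/2n) * ||Ax - b||^2 with a constant
    stepsize that switches to a decreasing schedule after switch_epoch.
    All constants below are illustrative placeholders, not the paper's exact
    expressions."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    steps_per_epoch = n // batch_size

    # Placeholder for the expected-smoothness constant; in the paper it
    # depends on the sampling law and the mini-batch size.
    L_hat = np.linalg.norm(A, ord=2) ** 2 / n
    gamma0 = 1.0 / (2.0 * L_hat)  # stepsize used in the constant phase

    t = 0
    for epoch in range(n_epochs):
        for _ in range(steps_per_epoch):
            # Sampling without replacement, used here for simplicity; the
            # paper's analysis covers arbitrary sampling laws, including
            # sampling with replacement and independent sampling.
            idx = rng.choice(n, size=batch_size, replace=False)
            grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size

            # Stepsize-switching rule: constant early on, then O(1/t) decay.
            if epoch < switch_epoch:
                gamma = gamma0
            else:
                gamma = gamma0 / (1 + (t - switch_epoch * steps_per_epoch))
            x -= gamma * grad
            t += 1
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10))
    x_true = rng.standard_normal(10)
    b = A @ x_true + 0.1 * rng.standard_normal(200)
    x_hat = minibatch_sgd(A, b)
    print("residual norm:", np.linalg.norm(A @ x_hat - b))

The switch point is hard-coded here for illustration; the stepsize-switching rules in the paper instead characterize when the constant-stepsize phase should end as a function of problem-dependent quantities.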

Related research

05/03/2020  Adaptive Learning of the Optimal Mini-Batch Size of SGD
Recent advances in the theoretical understanding of SGD (Qian et al., 201...

01/31/2019  Optimal mini-batch and step sizes for SAGA
Recently it has been shown that the step sizes of a family of variance r...

05/01/2017  Determinantal Point Processes for Mini-Batch Diversification
We study a mini-batch diversification scheme for stochastic gradient des...

04/08/2018  Active Mini-Batch Sampling using Repulsive Point Processes
The convergence speed of stochastic gradient descent (SGD) can be improv...

11/16/2021  Stochastic Extragradient: General Analysis and Improved Rates
The Stochastic Extragradient (SEG) method is one of the most popular alg...

07/21/2021  Differentiable Annealed Importance Sampling and the Perils of Gradient Noise
Annealed importance sampling (AIS) and related algorithms are highly eff...

06/22/2022  A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
Mini-batch SGD with momentum is a fundamental algorithm for learning lar...
