Beating SGD Saturation with Tail-Averaging and Minibatching

02/22/2019
by Nicole Mücke, et al.

While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many of its practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching, and averaging, in particular tail averaging. Our results show how these variants of SGD can be combined to achieve optimal learning errors, hence providing practical insights. In particular, we show for the first time in the literature that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Finally, we show that combining tail averaging with mini-batching allows more aggressive step-size choices than either technique alone.
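To make the averaging schemes concrete, below is a minimal sketch of tail-averaged mini-batch SGD on a synthetic least-squares problem. The data, step size, batch size, and tail fraction are illustrative choices, not the paper's experimental protocol; tail averaging here simply means averaging only the last fraction of the iterates instead of all of them.

```python
# Minimal sketch of tail-averaged mini-batch SGD for least squares.
# All sizes and hyperparameters are illustrative, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: y = X @ w_true + noise.
n, d = 2000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def minibatch_sgd(X, y, step_size, batch_size, n_iters, tail_fraction=0.5):
    """Run mini-batch SGD on the squared loss and return both the
    uniform average and the tail average of the iterates."""
    n, d = X.shape
    w = np.zeros(d)
    iterates = np.empty((n_iters, d))
    for t in range(n_iters):
        idx = rng.integers(0, n, size=batch_size)  # sample a mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= step_size * grad
        iterates[t] = w
    tail_start = int((1 - tail_fraction) * n_iters)
    return iterates.mean(axis=0), iterates[tail_start:].mean(axis=0)

uniform_avg, tail_avg = minibatch_sgd(X, y, step_size=0.01,
                                      batch_size=32, n_iters=2000)
print("uniform-average error:", np.linalg.norm(uniform_avg - w_true))
print("tail-average error:   ", np.linalg.norm(tail_avg - w_true))
```

On a well-conditioned problem like this one, the tail average tracks the late iterates and avoids the bias that early, far-from-optimum iterates contribute to the uniform average, which is the saturation effect the paper's analysis addresses.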

Related research

10/12/2016
Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging
This work characterizes the benefits of averaging techniques widely used...

02/13/2019
Anytime Tail Averaging
Tail averaging consists in averaging the last examples in a stream. Comm...

10/12/2022
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
We study the finite-time behaviour of the popular temporal difference (T...

04/16/2018
Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Stochastic gradient methods enable learning probabilistic models from la...

05/26/2022
Trainable Weight Averaging for Fast Convergence and Better Generalization
Stochastic gradient descent (SGD) and its variants are commonly consider...

09/26/2022
Two-Tailed Averaging: Anytime Adaptive Once-in-a-while Optimal Iterate Averaging for Stochastic Optimization
Tail averaging improves on Polyak averaging's non-asymptotic behaviour b...

03/02/2020
Iterate Averaging Helps: An Alternative Perspective in Deep Learning
Iterate averaging has a rich history in optimisation, but has only very ...
