Accelerating Stochastic Gradient Descent

04/26/2017
by   Prateek Jain, et al.
0

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/24/2018

Asynchronous decentralized accelerated stochastic gradient descent

In this work, we introduce an asynchronous decentralized accelerated sto...
research
09/23/2019

Necessary and Sufficient Conditions for Adaptive, Mirror, and Standard Gradient Methods

We study the impact of the constraint set and gradient geometry on the c...
research
05/30/2019

Implicit Regularization of Accelerated Methods in Hilbert Spaces

We study learning properties of accelerated gradient descent methods for...
research
09/28/2021

An Accelerated Stochastic Gradient for Canonical Polyadic Decomposition

We consider the problem of structured canonical polyadic decomposition. ...
research
02/09/2016

Poor starting points in machine learning

Poor (even random) starting points for learning/training/optimization ar...
research
10/25/2017

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

This work provides a simplified proof of the statistical minimax optimal...
research
03/15/2018

On the insufficiency of existing momentum schemes for Stochastic Optimization

Momentum based stochastic gradient methods such as heavy ball (HB) and N...

Please sign up or login with your details

Forgot password? Click here to reset