Reducing the variance in online optimization by transporting past gradients

06/08/2019
by Sébastien M. R. Arnold, et al.

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
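To make the idea concrete, the following is a minimal sketch of how an IGT-style estimator can be used in place of the plain stochastic gradient. It assumes the transported estimate is a running average that evaluates each new stochastic gradient at an extrapolated point θ_t + (γ_t/(1−γ_t))(θ_t − θ_{t−1}) with γ_t = t/(t+1), so the Hessian correction is applied implicitly; the names `igt_sgd`, `grad_fn`, and `samples` are illustrative and not the authors' released implementation.

```python
import numpy as np

def igt_sgd(grad_fn, theta0, samples, lr=0.1):
    """Sketch of SGD with an IGT-style transported gradient estimate.

    grad_fn(theta, sample) is a hypothetical stochastic-gradient oracle.
    """
    theta, theta_prev = theta0.copy(), theta0.copy()
    g = np.zeros_like(theta0)            # transported gradient estimate
    for t, sample in enumerate(samples):
        gamma = t / (t + 1)              # averaging weight gamma_t = t/(t+1)
        # Evaluate the new gradient at an extrapolated point so that past
        # gradients are implicitly transported to the current iterate
        # without forming the Hessian explicitly.
        shifted = theta + (gamma / (1 - gamma)) * (theta - theta_prev)
        g = gamma * g + (1 - gamma) * grad_fn(shifted, sample)
        theta_prev = theta.copy()
        theta = theta - lr * g           # plain SGD step; g can equally feed
                                         # heavy ball or Adam, as in the abstract
    return theta
```

The same estimate `g` can be dropped into a momentum or Adam update in place of the raw minibatch gradient, which is the sense in which the abstract describes IGT as a drop-in replacement.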

Related research

06/13/2014 · Smoothed Gradients for Stochastic Variational Inference
Stochastic variational inference (SVI) lets us scale up Bayesian computa...

10/20/2017 · Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through bett...

06/20/2017 · A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization
We describe a framework for deriving and analyzing online optimization a...

06/22/2021 · Asynchronous Stochastic Optimization Robust to Arbitrary Delays
We consider stochastic optimization with delayed gradients where, at eac...

08/28/2020 · ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm
The theory and practice of stochastic optimization has focused on stocha...

02/06/2023 · U-Clip: On-Average Unbiased Stochastic Gradient Clipping
U-Clip is a simple amendment to gradient clipping that can be applied to...

09/02/2022 · Revisiting Outer Optimization in Adversarial Training
Despite the fundamental distinction between adversarial and natural trai...
