U-Clip: On-Average Unbiased Stochastic Gradient Clipping

02/06/2023
by Bryn Elesedy, et al.

U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip uses gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping). Instead of discarding the clipped portion of the gradient, however, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^{t} (u_i - g_i) = o(t)$, where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10, with further validation given on ImageNet.
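The buffer mechanism described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes component-wise clipping, and the names `clip_componentwise`, `u_clip_step`, `simulate` and the threshold `gamma` are hypothetical. The gradients are synthetic Gaussian noise chosen only for the demonstration. With a buffer that starts at zero, each step satisfies u_i - g_i = b_{i-1} - b_i, so the cumulative bias telescopes to minus the current buffer. The sketch checks that identity numerically; it does not prove that the buffer stays bounded, which is the paper's result.

```python
import random


def clip_componentwise(vector, gamma):
    """Clip every component of `vector` to the interval [-gamma, gamma]."""
    return [max(-gamma, min(gamma, x)) for x in vector]


def u_clip_step(grad, buffer, gamma):
    """One U-Clip-style step (hypothetical helper, component-wise clipping).

    The buffer is added to the gradient before clipping, and whatever the
    clip removes is carried over to the next iteration instead of being
    discarded.
    """
    carried = [g + b for g, b in zip(grad, buffer)]
    update = clip_componentwise(carried, gamma)
    new_buffer = [c - u for c, u in zip(carried, update)]
    return update, new_buffer


def simulate(num_steps=1000, dim=3, gamma=1.0, seed=0):
    """Run U-Clip on synthetic Gaussian gradients (an assumption made
    purely for illustration) and track the cumulative bias sum(u_i - g_i)."""
    rng = random.Random(seed)
    buffer = [0.0] * dim
    cumulative_bias = [0.0] * dim
    for _ in range(num_steps):
        grad = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        update, buffer = u_clip_step(grad, buffer, gamma)
        cumulative_bias = [
            c + (u - g) for c, u, g in zip(cumulative_bias, update, grad)
        ]
    return cumulative_bias, buffer


if __name__ == "__main__":
    bias, buf = simulate()
    # The cumulative bias should equal minus the buffer, up to rounding.
    print("cumulative bias:", bias)
    print("negative buffer:", [-b for b in buf])
```

In an optimizer, `update` would take the place of the raw gradient in the parameter step. A norm-based variant would only need a different clipping function; the buffer bookkeeping stays the same.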


