Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

05/02/2023
by Anastasia Koloskova, et al.

Gradient clipping is a popular modification to standard (stochastic) gradient descent that, at every iteration, limits the gradient norm to a threshold c > 0. It is widely used, for example, to stabilize the training of deep learning models (Goodfellow et al., 2016) or to enforce differential privacy (Abadi et al., 2016). Despite the popularity and simplicity of the clipping mechanism, its convergence guarantees often require specific values of c and strong noise assumptions. In this paper, we give convergence guarantees that show precise dependence on arbitrary clipping thresholds c, and we show that our guarantees are tight for both deterministic and stochastic gradients. In particular, we show that (i) for deterministic gradient descent, the clipping threshold affects only the higher-order terms of the convergence rate, and (ii) in the stochastic setting, convergence to the true optimum cannot be guaranteed under the standard noise assumption, even with arbitrarily small step-sizes. We give matching upper and lower bounds on the gradient norm achieved by clipped SGD, and we illustrate these results with experiments.
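To make the mechanism concrete, the sketch below implements the standard clipping operator, clip_c(g) = min(1, c / ||g||) * g, and one step of clipped SGD. This is a minimal illustration, not code from the paper; the names clip_gradient and clipped_sgd_step, the step size eta, and the noisy quadratic example are chosen here for exposition.

import numpy as np

def clip_gradient(g, c):
    # Rescale g so that ||g||_2 <= c; identity when the norm is already small.
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def clipped_sgd_step(x, grad_fn, c, eta):
    # One clipped (S)GD update: x_{t+1} = x_t - eta * clip_c(grad_fn(x_t)).
    return x - eta * clip_gradient(grad_fn(x), c)

# Toy example: minimize f(x) = 0.5 * ||x||^2 with a noisy gradient oracle.
rng = np.random.default_rng(0)
x = np.ones(10)
for _ in range(1000):
    x = clipped_sgd_step(x, lambda z: z + 0.1 * rng.standard_normal(z.shape),
                         c=1.0, eta=0.01)
print(np.linalg.norm(x))  # hovers near, but not exactly at, zero

With stochastic gradients, the iterates in this sketch settle near the optimum rather than converging to it exactly, which matches the stochastic bias of clipping discussed in the abstract.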


