Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness

02/12/2021 · by Vien V. Mai, et al.

Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize training for problems prone to the exploding-gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results for the clipped stochastic (sub)gradient method (SGD) applied to non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that clipped SGD enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods in this regime as well. Numerical results confirm our theoretical developments.
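To make the clipping step concrete, below is a minimal Python sketch of a clipped stochastic (sub)gradient update with a heavy-ball-style momentum average; setting momentum = 0 recovers plain clipped SGD, mirroring the special-case relationship noted above. All names and parameter values (clipped_sgd_momentum, grad_fn, clip_level, and so on) are illustrative assumptions, and details such as where clipping is applied relative to the momentum average may differ from the method analyzed in the paper.

```python
import numpy as np

def clipped_sgd_momentum(grad_fn, x0, step_size=0.1, clip_level=1.0,
                         momentum=0.9, num_iters=1000):
    """Clipped stochastic (sub)gradient method with a heavy-ball-style
    momentum average. Setting momentum=0 recovers plain clipped SGD.
    grad_fn(x) should return a stochastic (sub)gradient at x."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # momentum buffer
    for _ in range(num_iters):
        g = grad_fn(x)
        # Clip by global norm: rescale g so that ||g|| <= clip_level.
        g_norm = np.linalg.norm(g)
        if g_norm > clip_level:
            g = g * (clip_level / g_norm)
        # Exponential moving average of clipped gradients (heavy-ball style).
        v = momentum * v + (1.0 - momentum) * g
        x = x - step_size * v
    return x

# Example: f(x) = ||x||^4 / 4 has a rapidly growing (non-Lipschitz) gradient,
# the kind of objective where unclipped SGD can be unstable.
rng = np.random.default_rng(0)
grad_fn = lambda x: np.linalg.norm(x) ** 2 * x + 0.1 * rng.standard_normal(x.shape)
x_final = clipped_sgd_momentum(grad_fn, x0=np.array([3.0, -2.0]))
```

Because the gradient norm is capped at clip_level, no single noisy (sub)gradient can move the iterate arbitrarily far, which is the stabilizing effect the abstract refers to.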


Related research

07/09/2019 · Unified Optimal Analysis of the (Stochastic) Gradient Method
In this note we give a simple proof for the convergence of stochastic gr...

09/11/2019 · The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
We analyze (stochastic) gradient descent (SGD) with delayed updates on s...

02/13/2020 · Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimization
Stochastic gradient methods with momentum are widely used in application...

11/24/2015 · Performance Limits of Stochastic Sub-Gradient Learning, Part I: Single Agent Case
In this work and the supporting Part II, we examine the performance of s...

06/25/2021 · Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems
Stochastic nested optimization, including stochastic compositional, min-...

01/09/2023 · Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation
The stochastic proximal point (SPP) methods have gained recent attention...

02/02/2022 · HMC and Langevin united in the unadjusted and convex case
We consider a family of unadjusted HMC samplers, which includes standard...
