Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition

05/28/2019
by   Jingzhao Zhang, et al.
0

We provide a theoretical explanation for the fast convergence of gradient clipping and adaptively scaled gradient methods commonly used in neural network training. Our analysis is based on a novel relaxation of gradient smoothness conditions that is weaker than the commonly used Lipschitz smoothness assumption. We validate the new smoothness condition in experiments on large-scale neural network training applications where adaptively-scaled methods have been empirically shown to outperform standard gradient based algorithms. Under this new smoothness condition, we prove that two popular adaptively scaled methods, gradient clipping and normalized gradient, converge faster than the theoretical lower bound of fixed-step gradient descent. We verify this fast convergence empirically in neural network training for language modeling and image classification.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/01/2022

Stochastic Gradient Methods with Preconditioned Updates

This work considers non-convex finite sum minimization. There are a numb...
research
10/12/2021

On Convergence of Training Loss Without Reaching Stationary Points

It is a well-known fact that nonconvex optimization is computationally i...
research
06/01/2022

Convergence of Stein Variational Gradient Descent under a Weaker Smoothness Condition

Stein Variational Gradient Descent (SVGD) is an important alternative to...
research
11/03/2022

A Convergence Theory for Federated Average: Beyond Smoothness

Federated learning enables a large amount of edge computing devices to l...
research
06/27/2022

Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness

Convergence and convergence rate analyses of adaptive methods, such as A...
research
02/26/2020

Lipschitz standardization for robust multivariate learning

Current trends in machine learning rely on out-of-the-box gradient-based...
research
03/04/2013

Riemannian metrics for neural networks I: feedforward networks

We describe four algorithms for neural network training, each adapted to...

Please sign up or login with your details

Forgot password? Click here to reset