Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates

08/23/2017
by Leslie N. Smith, et al.

In this paper, we describe a phenomenon, which we named "super-convergence", where residual networks can be trained using an order of magnitude fewer iterations than are needed with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates improves performance by regularizing the network. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. We also derive a simplification of the Hessian-free optimization method to compute an estimate of the optimal learning rate. The architectures and code to replicate the figures in this paper are available at github.com/lnsmith54/super-convergence.
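The key ingredient the abstract mentions, a cyclical learning rate with a large maximum value, is easy to sketch. Below is a minimal, framework-agnostic Python illustration of the "1cycle"-style schedule associated with super-convergence: one linear ramp up to the large peak rate, a symmetric ramp back down, and a short final phase that anneals the rate well below its starting value. The function name, parameter names, and specific values are illustrative assumptions, not the authors' released code.

    def one_cycle_lr(step, total_steps, base_lr=0.1, max_lr=3.0,
                     final_frac=0.1, min_lr=1e-4):
        # Piecewise-linear schedule: ramp base_lr -> max_lr over the first
        # half of the cycle, back down over the second half, then anneal
        # toward min_lr for the final `final_frac` of training.
        # All values here are illustrative, not the paper's exact settings.
        cycle_steps = int(total_steps * (1.0 - final_frac))
        half = cycle_steps // 2
        if step < half:  # ramp up to the large peak rate
            return base_lr + (max_lr - base_lr) * step / half
        if step < cycle_steps:  # ramp back down symmetrically
            return max_lr - (max_lr - base_lr) * (step - half) / (cycle_steps - half)
        remaining = total_steps - cycle_steps  # final annihilation phase
        return base_lr - (base_lr - min_lr) * (step - cycle_steps) / remaining

    # Example: compute the per-iteration rate for a 10,000-step run.
    lrs = [one_cycle_lr(t, total_steps=10000) for t in range(10000)]

In practice, the peak rate is chosen with a learning-rate range test: the largest rate at which the training loss still decreases.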

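The abstract's last technical claim, an estimate of the optimal learning rate derived from a simplification of Hessian-free optimization, can also be sketched briefly. For a locally quadratic loss, the best step size along the gradient g is (g·g)/(g·Hg), and the Hessian-vector product Hg can be approximated with a finite difference of two gradient evaluations. The sketch below shows one standard way to realize such an estimate on a toy quadratic; the function names and test problem are assumptions for illustration, not the paper's exact recipe.

    import numpy as np

    def estimate_lr(grad_fn, theta, eps=1e-4):
        # Optimal step along g for a locally quadratic loss is
        # (g.g) / (g.Hg); approximate Hg by a finite difference
        # of gradients at theta and theta + eps * g.
        g = grad_fn(theta)
        hg = (grad_fn(theta + eps * g) - g) / eps
        return float(g @ g) / float(g @ hg)

    # Toy quadratic L(x) = 0.5 * x^T A x, so grad(x) = A x and the
    # estimate can be checked against the exact value (g.g)/(g.A g).
    A = np.diag([1.0, 10.0, 100.0])
    x0 = np.array([1.0, 1.0, 1.0])
    print(estimate_lr(lambda x: A @ x, x0))  # ~0.0101 for this problem

Because the finite difference is exact for a quadratic, the printed value matches the closed-form step size; on a real network it is only a local estimate.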
Related research:

- Super-Convergence with an Unstable Learning Rate (02/22/2021): Conventional wisdom dictates that learning rate should be in the stable ...
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect (10/07/2021): Recent empirical advances show that training deep models with large lear...
- The large learning rate phase of deep learning: the catapult mechanism (03/04/2020): The choice of initial learning rate can have a profound effect on the pe...
- Super-convergence and Differential Privacy: Training faster with better privacy guarantees (03/18/2021): The combination of deep neural networks and Differential Privacy has bee...
- Exploring loss function topology with cyclical learning rates (02/14/2017): We present observations and discussion of previously unreported phenomen...
- Training Neural Networks for and by Interpolation (06/13/2019): The majority of modern deep learning models are able to interpolate the ...
- Catapult Dynamics and Phase Transitions in Quadratic Nets (01/18/2023): Neural networks trained with gradient descent can undergo non-trivial ph...
