Adaptive Learning Rate Clipping Stabilizes Learning

06/21/2019
by Jeffrey M. Ede, et al.

Artificial neural network training with stochastic gradient descent can be destabilized by "bad batches" with high losses. This is often problematic for training with small batch sizes, high-order loss functions, or unstably high learning rates. To stabilize learning, we have developed adaptive learning rate clipping (ALRC) to limit backpropagated losses to a number of standard deviations above their running means. ALRC is designed to complement existing learning algorithms: our algorithm is computationally inexpensive, can be applied to any loss function or batch size, is robust to hyperparameter choices, and does not affect backpropagated gradient distributions. Experiments with CIFAR-10 supersampling show that ALRC decreases errors for unstable mean quartic error training, while stable mean squared error training is unaffected. We also show that ALRC decreases unstable mean squared errors for partial scanning transmission electron micrograph completion. Our source code is publicly available at https://github.com/Jeffrey-Ede/ALRC
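The abstract describes clipping each loss to a running mean plus a number of running standard deviations. A minimal NumPy sketch of that idea follows; the class name, hyperparameter defaults, and the choice to update the running moments with the clipped loss are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

class ALRC:
    """Sketch of adaptive learning rate clipping: cap each loss at
    mu + n*sigma, where mu and sigma come from exponential running
    moments of past losses. Defaults here are illustrative."""

    def __init__(self, n_sigma=3.0, decay=0.999, mu1_init=25.0, mu2_init=30.0**2):
        self.n = n_sigma
        self.decay = decay
        self.mu1 = mu1_init  # running mean of the loss
        self.mu2 = mu2_init  # running mean of the squared loss

    def clip(self, loss):
        """Return the loss, rescaled if it exceeds mu1 + n*sigma.

        In an autodiff framework the scale factor would be treated as a
        constant (stop-gradient), so gradients keep their direction and
        are only shrunk in magnitude, leaving their distribution intact.
        """
        sigma = np.sqrt(max(self.mu2 - self.mu1 ** 2, 1e-8))
        threshold = self.mu1 + self.n * sigma
        if loss > threshold:
            scale = threshold / loss  # would be detached from the graph
            loss = scale * loss       # numerically equal to `threshold`
        # Update the running moments (here with the clipped value).
        self.mu1 = self.decay * self.mu1 + (1 - self.decay) * loss
        self.mu2 = self.decay * self.mu2 + (1 - self.decay) * loss ** 2
        return loss
```

A typical loss of around the running mean passes through unchanged, while a "bad batch" with a loss far above the threshold is scaled down to it before backpropagation.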

