Ranger21: a synergistic deep learning optimizer

06/25/2021
by Less Wright, et al.

As optimizers are critical to the performance of neural networks, a large number of papers innovating on the subject are published every year. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than as composable algorithms, so many worthwhile improvements are rarely seen outside their initial publication. Taking advantage of this untapped potential, we introduce Ranger21, a new optimizer that combines AdamW with eight components carefully selected after reviewing and testing ideas from the literature. We found that the resulting optimizer provides significantly improved validation accuracy and training speed, smoother training curves, and is even able to train a ResNet50 on ImageNet2012 without Batch Normalization layers, a problem on which AdamW systematically stays stuck in a bad initial state.
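The abstract describes Ranger21 as AdamW augmented with eight composable components. As a rough illustration of that composable view only (not the authors' exact recipe, and covering just one such component), the sketch below applies gradient centralization to the gradients before a standard PyTorch AdamW step; the helper name centralize_gradients and the hyperparameter values are illustrative assumptions.

# Illustrative sketch: AdamW plus one extra, composable component
# (gradient centralization). Not the full Ranger21 algorithm.
import torch
from torch import nn

def centralize_gradients(model: nn.Module) -> None:
    # Subtract the per-tensor mean from each multi-dimensional gradient.
    for p in model.parameters():
        if p.grad is None or p.grad.dim() <= 1:
            continue  # leave biases and other 1-D parameters untouched
        dims = tuple(range(1, p.grad.dim()))
        p.grad.sub_(p.grad.mean(dim=dims, keepdim=True))

model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)

x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
centralize_gradients(model)   # extra component applied before the AdamW update
optimizer.step()
optimizer.zero_grad()

Other components named in the literature around this optimizer (for example warm-up schedules or lookahead-style slow weights) could be layered onto the same loop in a similar way, which is the composability argument the abstract makes.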
