A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

02/12/2021
by Zachary Nado, et al.

Recently, the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
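To make the layer-wise normalization concrete, here is a minimal NumPy sketch of a LARS-style step for a single layer: heavy-ball momentum applied to a gradient whose step size is rescaled by the ratio of the weight norm to the gradient norm. The function name, hyperparameter defaults, and the exact form of the trust ratio are illustrative assumptions rather than the paper's specification; LAMB applies the analogous per-layer rescaling to Adam's update instead.

```python
import numpy as np

def lars_style_step(w, grad, buf, lr=0.1, momentum=0.9,
                    weight_decay=1e-4, trust_coef=0.001):
    """One LARS-style update for a single layer.

    w: layer weights; grad: gradient of the loss w.r.t. w;
    buf: momentum buffer (same shape as w).
    Names and default values are illustrative, not tuned settings.
    """
    g = grad + weight_decay * w                # gradient with L2 penalty
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    # Layer-wise normalization: rescale the step by ||w|| / ||g||
    # so every layer moves a comparable relative distance.
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coef * w_norm / g_norm
    else:
        local_lr = 1.0
    buf = momentum * buf + lr * local_lr * g   # heavy-ball momentum
    return w - buf, buf
```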


Related research

07/09/2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Increasing the batch size is a popular way to speed up neural network tr...

08/02/2021
Batch Normalization Preconditioning for Neural Network Training
Batch normalization (BN) is a popular and ubiquitous method in deep lear...

07/05/2022
Understanding and Improving Group Normalization
Various normalization layers have been proposed to help the training of ...

11/27/2020
Improving Layer-wise Adaptive Rate Methods using Trust Ratio Clipping
Training neural networks with large batch is of fundamental significance...

07/13/2021
Automated Learning Rate Scheduler for Large-batch Training
Large-batch training has been essential in leveraging large-scale datase...

08/26/2021
The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size
Recently, convergence as well as convergence rate analyses of deep learn...

10/11/2019
On Empirical Comparisons of Optimizers for Deep Learning
Selecting an optimizer is a central step in the contemporary deep learni...
