The Limit of the Batch Size

06/15/2020
by Yang You et al.

Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce ImageNet/ResNet-50 training from 29 hours to around 1 minute. In this paper, we focus on studying the limit of the batch size. We believe this study can provide guidance to AI supercomputer and algorithm designers. We provide detailed numerical optimization instructions for step-by-step comparison. Moreover, it is important to understand the generalization and optimization performance of huge-batch training. Hoffer et al. introduced the "ultra-slow diffusion" theory for large-batch training. However, our experiments show results that contradict the conclusion of Hoffer et al. We provide comprehensive experimental results and detailed analysis to study the limitations of batch-size scaling and the "ultra-slow diffusion" theory. For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and we provide detailed studies of the performance of many state-of-the-art optimization schemes under this setting. We propose an optimization recipe that is able to improve the top-1 test accuracy by 18%.
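For context, large-batch recipes in this line of work typically pair a batch-size-proportional learning rate with a gradual warmup. The sketch below is an illustrative assumption, not the recipe proposed in this paper; the function name `scaled_lr` and the parameters `base_lr`, `base_batch`, and `warmup_epochs` are hypothetical, and the default values are common ResNet-50 settings rather than numbers reported here.

```python
# Illustrative sketch (not from this paper): linear batch-size scaling of the
# learning rate with a gradual warmup, as commonly used in large-batch training.

def scaled_lr(epoch, batch_size, base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Learning rate at `epoch` under linear batch-size scaling with warmup."""
    target_lr = base_lr * batch_size / base_batch
    if epoch < warmup_epochs:
        # Linear warmup: ramp from base_lr up to the scaled target rate.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr


# Example: a 32,768-sample batch targets 0.1 * 32768 / 256 = 12.8 after warmup.
for epoch in range(8):
    print(epoch, round(scaled_lr(epoch, 32768), 3))
```

Layer-wise adaptive optimizers such as LARS and LAMB, which are among the schemes commonly studied for large-batch training, further rescale this rate per layer by the ratio of the layer's weight norm to its update norm.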

Related research

01/24/2019 - Large-Batch Training for LSTM and Beyond
Large-batch training approaches have enabled researchers to utilize larg...

09/14/2017 - ImageNet Training in Minutes
Finishing 90-epoch ImageNet-1k training with ResNet-50 on a NVIDIA M40 G...

11/20/2019 - Auto-Precision Scaling for Distributed Deep Learning
In recent years, large-batch optimization is becoming the key of distrib...

11/12/2017 - Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be ...

11/13/2018 - ImageNet/ResNet-50 Training in 224 Seconds
Scaling the distributed deep learning to a massive GPU cluster level is ...

06/01/2021 - Concurrent Adversarial Learning for Large-Batch Training
Large-batch training has become a commonly used technique when training ...

05/05/2020 - Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
The choice of hyper-parameters affects the performance of neural models....