Measuring the Effects of Data Parallelism on Neural Network Training

by   Christopher J. Shallue, et al.

Recent hardware developments have made unprecedented amounts of data parallelism available for accelerating neural network training. Among the simplest ways to harness next-generation accelerators is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured in the number of steps necessary to reach a goal out-of-sample error. Eventually, increasing the batch size will no longer reduce the number of training steps required, but the exact relationship between the batch size and how many training steps are necessary is of critical importance to practitioners, researchers, and hardware designers alike. We study how this relationship varies with the training algorithm, model, and dataset and find extremely large variation between workloads. Along the way, we reconcile disagreements in the literature on whether batch size affects model quality. Finally, we discuss the implications of our results for efforts to train neural networks much faster in the future.


page 1

page 2

page 3

page 4


Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Increasing the batch size is a popular way to speed up neural network tr...

Faster Neural Network Training with Data Echoing

In the twilight of Moore's law, GPUs and other specialized hardware acce...

Training Multiscale-CNN for Large Microscopy Image Classification in One Hour

Existing approaches to train neural networks that use large images requi...

Superposition of many models into one

We present a method for storing multiple models within a single set of p...

Parallelizing Over Artificial Neural Network Training Runs with Multigrid

Artificial neural networks are a popular and effective machine learning ...

An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training

Motivated by the potential for parallel implementation of batch-based al...

Parameter Re-Initialization through Cyclical Batch Size Schedules

Optimal parameter initialization remains a crucial problem for neural ne...

Please sign up or login with your details

Forgot password? Click here to reset