Efficient Strong Scaling Through Burst Parallel Training

12/19/2021
by Seo Jin Park, et al.

As emerging deep neural network (DNN) models continue to grow in size, using large GPU clusters to train DNNs is becoming an essential requirement for achieving acceptable training times. In this paper, we consider the case where future increases in cluster size will cause the global batch size that can be used to train models to reach a fundamental limit: beyond a certain point, larger global batch sizes cause sample efficiency to degrade, increasing overall time to accuracy. As a result, to achieve further improvements in training performance, we must instead consider "strong scaling" strategies that hold the global batch size constant and allocate smaller batches to each GPU. Unfortunately, this makes it significantly more difficult to use cluster resources efficiently. We present DeepPool, a system that addresses this efficiency challenge through two key ideas. First, burst parallelism allocates large numbers of GPUs to foreground jobs in bursts to exploit the unevenness in parallelism across layers. Second, GPU multiplexing prioritizes throughput for foreground training jobs while packing in background training jobs to reclaim underutilized GPU resources, thereby improving cluster-wide utilization. Together, these two ideas enable DeepPool to deliver a 2.2-2.4x improvement in total cluster throughput over standard data parallelism with a single task when the cluster scale is large.
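To make the weak- vs. strong-scaling distinction concrete, the following is a minimal Python sketch, not taken from the paper: the batch sizes, GPU counts, and the `critical_batch` threshold are illustrative assumptions. It shows how weak scaling grows the global batch with the cluster (eventually passing the point where sample efficiency degrades), while strong scaling holds the global batch fixed and shrinks each GPU's local batch instead.

```python
# Illustrative sketch of weak vs. strong scaling of data-parallel training.
# All numbers below are assumptions for demonstration, not measurements
# or parameters from the DeepPool paper.

def weak_scaling_global_batch(per_gpu_batch: int, num_gpus: int) -> int:
    """Weak scaling: keep the per-GPU batch fixed, so the global batch
    grows linearly with the number of GPUs."""
    return per_gpu_batch * num_gpus


def strong_scaling_per_gpu_batch(global_batch: int, num_gpus: int) -> int:
    """Strong scaling: keep the global batch fixed, so each GPU receives
    a smaller local batch as the cluster grows."""
    return max(1, global_batch // num_gpus)


if __name__ == "__main__":
    critical_batch = 8192   # assumed sample-efficiency limit (hypothetical)
    per_gpu_batch = 64      # assumed local batch under weak scaling
    global_batch = 4096     # assumed fixed global batch under strong scaling

    for num_gpus in (16, 64, 256, 1024):
        weak = weak_scaling_global_batch(per_gpu_batch, num_gpus)
        strong = strong_scaling_per_gpu_batch(global_batch, num_gpus)
        status = "OK" if weak <= critical_batch else "over critical batch"
        print(f"{num_gpus:5d} GPUs | weak-scaled global batch: {weak:6d} "
              f"({status}) | strong-scaled per-GPU batch: {strong:4d}")
```

As the strong-scaled per-GPU batch shrinks, each GPU spends proportionally more time on communication and per-kernel overheads and less on useful compute; that underutilization is the gap DeepPool's burst parallelism and GPU multiplexing aim to reclaim.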

Related research

- Elastic deep learning in multi-tenant GPU cluster (09/26/2019)
- Aryl: An Elastic Cluster Scheduler for Deep Learning (02/16/2022)
- Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters (08/08/2021)
- The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism (07/25/2020)
- A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing (01/04/2019)
- Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism (03/15/2019)
