Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches

03/03/2023
by Siyu Wang, et al.

Pipeline parallelism has been demonstrated to be a remarkable approach for improving throughput when training deep neural networks with billions of parameters over heterogeneous clusters. The 1F1B schedule is a widely adopted strategy for memory and performance optimization, which interleaves the forward and backward stage computations of different micro-batches. However, a common issue with 1F1B scheduling is that stage computation is delayed by data transfer when network resources are preempted by other tasks, even though communication between stages is already minimal; exclusive access to these network resources cannot be guaranteed in cloud offerings. We present a general scheduling technique that adapts pipeline parallelism to preempted network environments at the cost of a certain amount of additional memory pressure. The core idea is to extend the 1F1B schedule to kFkB, which groups k micro-batches and alternately executes k forward and k backward computations. We propose Ada-Grouper, an adaptive kFkB scheduler that periodically adjusts the group size k to maintain an optimal balance between communication and computation efficiency as network conditions change, while staying within the memory limit. Experimental results demonstrate that our design maintains stable performance for pipeline parallelism, yielding a performance increase of up to 4 compared with 1F1B in preempted network scenarios.
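As a rough Python illustration of the kFkB idea described in the abstract, the sketch below generates the per-stage operation order for a given group size k, plus a hypothetical rule for adapting k. With k = 1 the schedule reduces to the familiar 1F1B order. The function names, the warm-up formula, and the adaptation thresholds are assumptions chosen for illustration, not the paper's implementation.

    # Minimal sketch of a kFkB per-stage schedule (illustrative, not the
    # authors' implementation). k = 1 reproduces the classic 1F1B order:
    # warm-up forwards, a steady state alternating forwards and backwards,
    # and a cool-down of the remaining backwards. Larger k groups k forward
    # and k backward computations, trading extra activation memory for
    # fewer communication round-trips between stages.
    def kfkb_schedule(stage, num_stages, num_microbatches, k=1):
        """Return the op order for one stage as a list of ('F', mb) / ('B', mb)."""
        # Assumed warm-up depth: earlier stages keep more micro-batches in flight.
        warmup = min((num_stages - stage - 1) * k, num_microbatches)
        ops, fwd, bwd = [], 0, 0
        for _ in range(warmup):                       # warm-up forwards
            ops.append(("F", fwd))
            fwd += 1
        while fwd < num_microbatches:                 # steady state: k forwards, then k backwards
            for _ in range(min(k, num_microbatches - fwd)):
                ops.append(("F", fwd))
                fwd += 1
            for _ in range(min(k, num_microbatches - bwd)):
                ops.append(("B", bwd))
                bwd += 1
        while bwd < num_microbatches:                 # cool-down backwards
            ops.append(("B", bwd))
            bwd += 1
        return ops

    # Hypothetical adaptation rule (names and thresholds are assumptions):
    # grow k while communication stalls dominate and the larger group still
    # fits in the memory budget; shrink it back when the network recovers.
    def adapt_group_size(k, stall_fraction, group_mem_bytes, mem_budget_bytes, k_max=8):
        if stall_fraction > 0.10 and (k + 1) * group_mem_bytes <= mem_budget_bytes:
            return min(k + 1, k_max)
        if stall_fraction < 0.02 and k > 1:
            return k - 1
        return k

    # Example: 4 stages, 8 micro-batches. Stage 0 with k=1 gives the usual
    # 1F1B order F0 F1 F2 F3 B0 F4 B1 ...; with k=2 the steady-state forwards
    # and backwards come in pairs, at roughly twice the peak activation memory.
    print(kfkb_schedule(stage=0, num_stages=4, num_microbatches=8, k=2))

Larger k amortizes inter-stage transfers over more back-to-back computations, which is why it helps when the network is preempted, but each increment of k holds roughly one more group of activations in memory, hence the memory limit in the adaptation rule.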

Related research

12/23/2020 · BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training
The size of deep neural networks (DNNs) grows rapidly as the complexity ...

04/01/2021 · Optimizer Fusion: Efficient Training with Better Locality and Parallelism
Machine learning frameworks adopt iterative optimizers to train neural n...

04/21/2020 · torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
We design and implement a ready-to-use library in PyTorch for performing...

08/30/2023 · Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
Large-scale language models have become increasingly challenging and exp...

12/27/2021 · Design and Experimental Evaluation of Algorithms for Optimizing the Throughput of Dispersed Computing
With growing deployment of Internet of Things (IoT) and machine learning...

02/24/2023 · Decoupling the All-Reduce Primitive for Accelerating Distributed Deep Learning
Communication scheduling has been shown to be effective in accelerating ...
