Breadth-First Pipeline Parallelism

11/11/2022
by Joel Lamy-Poirier, et al.

We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost and memory usage by combining a high GPU utilization with a small batch size per GPU, and by making use of fully sharded data parallelism. Experimentally, we observed increases of up to 53%.

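The difference between a breadth-first and a depth-first pipeline schedule can be made concrete with a toy enumeration of the compute order on one pipeline rank. The sketch below is illustrative only and is not the paper's implementation: the looped stage placement (rank r holds stages r, r + P, r + 2P, ...) and the names P, V, M, depth_first_order and breadth_first_order are assumptions made for this example.

```python
# Minimal sketch (assumed, not the paper's code): forward-pass compute order
# on one pipeline rank under a looped stage placement with P pipeline ranks,
# V stages per rank, and M micro-batches.

def depth_first_order(rank, P, V, M):
    """Each micro-batch is pushed through all of this rank's local stages
    before the next micro-batch starts (interleaved, depth-first style)."""
    local_stages = [rank + v * P for v in range(V)]
    return [(m, s) for m in range(M) for s in local_stages]

def breadth_first_order(rank, P, V, M):
    """All micro-batches are processed on one local stage before the rank
    moves on to its next local stage (breadth-first style)."""
    local_stages = [rank + v * P for v in range(V)]
    return [(m, s) for s in local_stages for m in range(M)]

if __name__ == "__main__":
    # 4 pipeline ranks, 2 stages per rank, 4 micro-batches; inspect rank 0.
    print("depth-first  :", depth_first_order(0, P=4, V=2, M=4))
    print("breadth-first:", breadth_first_order(0, P=4, V=2, M=4))
```

In this toy ordering, the breadth-first schedule groups all micro-batches of a stage together, which is the property that would let per-stage communication (such as sharded-weight gathers and gradient reductions under fully sharded data parallelism) be amortized and overlapped even with a small batch size per GPU, whereas the depth-first ordering interleaves stages within each micro-batch.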