PipeMare: Asynchronous Pipeline Parallel DNN Training

10/09/2019 ∙ by Bowen Yang, et al.

Recently there has been a flurry of interest in using pipeline parallelism to train neural networks. Pipeline parallelism enables larger models to be partitioned spatially across chips and within a chip, leading to both lower network communication and higher overall hardware utilization. Unfortunately, to preserve statistical efficiency, existing pipeline-parallelism techniques sacrifice hardware efficiency by introducing bubbles into the pipeline and/or incurring extra memory costs. In this paper, we investigate to what extent these sacrifices are necessary. Theoretically, we derive a simple but robust training method, called PipeMare, that tolerates asynchronous updates during pipeline-parallel execution. Using this, we show empirically, on a ResNet network and a Transformer network, that PipeMare can achieve final model qualities that match those of synchronous training techniques (a quality gap of at most 0.9) while either using up to 2.0X less weight and optimizer memory or being up to 3.3X faster than other pipeline-parallel training techniques. To the best of our knowledge, we are the first to explore these techniques and fine-grained pipeline parallelism (e.g., the number of pipeline stages equals the number of layers) during neural network training.
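
To make the "asynchronous updates" problem concrete, the NumPy sketch below (not from the paper) simulates the weight staleness that bubble-free pipeline-parallel execution introduces: each stage applies gradients that were computed against an older copy of its weights, with earlier stages seeing staler copies. The stage count, delay schedule, learning rate, and toy least-squares objective are all illustrative assumptions, not PipeMare's actual algorithm or schedule.

```python
# Illustrative sketch (assumptions, not the authors' code): delayed-gradient
# updates on a toy least-squares problem, mimicking the per-stage weight
# staleness of asynchronous pipeline-parallel training.
import numpy as np

rng = np.random.default_rng(0)
num_stages, dim, lr, steps = 4, 8, 0.05, 200

# Toy target: each "stage" s should recover the row w_true[s].
w_true = rng.normal(size=(num_stages, dim))
w = np.zeros((num_stages, dim))

# Buffer of weight snapshots; larger index = older snapshot.
history = [w.copy() for _ in range(num_stages)]

for t in range(steps):
    x = rng.normal(size=dim)
    for s in range(num_stages):
        delay = num_stages - 1 - s       # assumed schedule: earlier stages are staler
        w_stale = history[delay][s]
        # Gradient of 0.5 * ((w[s] - w_true[s]) @ x)**2, but evaluated at the
        # stale copy w_stale and applied to the current weights w[s]; this
        # mismatch is the asynchrony the abstract refers to.
        grad = ((w_stale - w_true[s]) @ x) * x
        w[s] -= lr * grad
    history = [w.copy()] + history[:-1]  # age the snapshots by one step

print("per-stage error:", np.linalg.norm(w - w_true, axis=1).round(3))
```

Running the sketch shows the delayed updates still converging on this toy problem, though with more sensitivity to the learning rate as the delay grows, which is the tension between hardware and statistical efficiency that the paper studies.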


Related research

11/11/2022 ∙ Breadth-First Pipeline Parallelism
We introduce Breadth-First Pipeline Parallelism, a novel training schedu...

10/24/2019 ∙ XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training
We propose XPipe, an efficient asynchronous pipeline model parallelism a...

11/16/2018 ∙ GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
GPipe is a scalable pipeline parallelism library that enables learning o...

03/25/2020 ∙ Pipelined Backpropagation at Scale: Training Large Models without Batches
Parallelism is crucial for accelerating the training of deep neural netw...

04/21/2020 ∙ torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
We design and implement a ready-to-use library in PyTorch for performing...

02/01/2022 ∙ Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++
Pipeline is a fundamental parallel programming pattern. Mainstream pipel...

04/26/2022 ∙ Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
DNN models across many domains continue to grow in size, resulting in hi...
