XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training

10/24/2019
by   Lei Guan, et al.
0

We propose XPipe, an efficient asynchronous pipeline model parallelism approach for multi-GPU DNN training. XPipe is designed to make use of multiple GPUs to concurrently and continuously train different parts of a DNN model. To improve GPU utilization and achieve high throughput, it splits a mini-batch into a set of micro-batches and allows the overlapping of the pipelines of multiple micro-batches, including those belonging to different mini-batches. Most importantly, the weight prediction strategy adopted by XPipe enables it to effectively address the weight inconsistency and staleness issues incurred by the asynchronous pipeline parallelism. As a result, XPipe incorporates the advantages of both synchronous and asynchronous pipeline parallelism approaches. It can achieve high throughput while obtaining very comparable (even slightly better) model quality as the synchronous counterpart. Experimental results show that XPipe outperforms other existing synchronous and asynchronous model parallelism approaches.

READ FULL TEXT
research
07/14/2021

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

Training large deep learning models at scale is very challenging. This p...
research
07/02/2020

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU ...
research
04/22/2022

Efficient Pipeline Planning for Expedited Distributed DNN Training

To train modern large DNN models, pipeline parallelism has recently emer...
research
10/09/2019

PipeMare: Asynchronous Pipeline Parallel DNN Training

Recently there has been a flurry of interest around using pipeline paral...
research
08/31/2023

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Large Language Model (LLM) inference consists of two distinct phases - p...
research
12/17/2020

High-Throughput Synchronous Deep RL

Deep reinforcement learning (RL) is computationally demanding and requir...
research
03/11/2020

Evaluating Abstract Asynchronous Schwarz solvers on GPUs

With the commencement of the exascale computing era, we realize that the...

Please sign up or login with your details

Forgot password? Click here to reset