Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

07/14/2021
by   Shigang Li, et al.

Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training large-scale models. Chimera is a synchronous approach and therefore incurs no loss of accuracy, which makes it more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of bubbles by up to 50%; in addition, Chimera has a more balanced activation memory consumption. Evaluations are conducted on Transformer-based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.
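For intuition on where the roughly 50% bubble reduction comes from: with D pipeline stages, synchronous fill-and-drain schedules such as GPipe and 1F1B leave each worker idle for 2(D-1) slots per training iteration, whereas Chimera's two interleaved bidirectional pipelines cut this to D-2. The sketch below compares the resulting counts; it assumes equal-cost forward and backward slots (a simplification, since the paper's own accounting weights backward passes more heavily), and the helper names are hypothetical, not from the authors' code.

```python
# Bubble-count comparison for synchronous pipeline schedules
# (a minimal sketch based on the counts reported in the Chimera paper).
# D = number of pipeline stages, N = micro-batches per worker per iteration.

def bubbles_gpipe(D: int) -> int:
    # GPipe-style fill/drain: D-1 idle slots in forward plus D-1 in backward.
    return 2 * (D - 1)

def bubbles_1f1b(D: int) -> int:
    # 1F1B (PipeDream-Flush / DAPPLE) keeps the same 2(D-1) bubble count.
    return 2 * (D - 1)

def bubbles_chimera(D: int) -> int:
    # Two interleaved bidirectional pipelines halve the idle slots to D-2.
    return D - 2

def bubble_ratio(bubbles: int, N: int) -> float:
    # Counting one forward and one backward slot per micro-batch,
    # a worker has 2N useful slots per iteration.
    return bubbles / (2 * N + bubbles)

if __name__ == "__main__":
    D, N = 8, 8  # e.g., 8 stages with 8 micro-batches
    for name, b in [("GPipe", bubbles_gpipe(D)),
                    ("1F1B", bubbles_1f1b(D)),
                    ("Chimera", bubbles_chimera(D))]:
        print(f"{name:8s} bubbles={b:2d} ratio={bubble_ratio(b, N):.2%}")
```

With D = N = 8, the idle fraction drops from about 47% for GPipe/1F1B to about 27% for Chimera, consistent with the abstract's claim of up to 50% fewer bubbles as D grows.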


Related research

10/24/2019
XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training
We propose XPipe, an efficient asynchronous pipeline model parallelism a...

11/25/2022
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Pipeline parallelism enables efficient training of Large Language Models...

05/27/2017
AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks
New types of machine learning hardware in development and entering the m...

08/30/2023
Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
Large-scale language models have become increasingly challenging and exp...

04/22/2022
Efficient Pipeline Planning for Expedited Distributed DNN Training
To train modern large DNN models, pipeline parallelism has recently emer...

05/10/2022
Reducing Activation Recomputation in Large Transformer Models
Training large transformer models is one of the most important computati...

11/16/2018
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
GPipe is a scalable pipeline parallelism library that enables learning o...
