CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner

03/14/2021
by Cheng Luo, et al.

Distributed deep learning workloads include throughput-intensive training tasks on GPU clusters, where distributed Stochastic Gradient Descent (SGD) incurs significant communication delays after backward propagation, forcing workers to wait for gradient synchronization via a centralized parameter server or directly among decentralized workers. We present CrossoverScheduler, an algorithm that allows the communication cycles of one distributed training application to be filled by other applications through pipelining communication and computation. With CrossoverScheduler, the running performance of distributed training can be significantly improved without sacrificing convergence rate or network accuracy. We achieve this by introducing Crossover Synchronization, which allows multiple distributed deep learning applications to time-share the same GPU alternately. A prototype of CrossoverScheduler is built and integrated with Horovod. Experiments on a variety of distributed tasks show that CrossoverScheduler achieves a 20% speedup for image classification tasks on the ImageNet dataset.
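The core idea, filling one application's gradient-synchronization window with another application's computation, can be illustrated with a small, framework-free sketch. This is a toy model under stated assumptions, not the paper's Horovod integration: the shared GPU is modeled as a lock that each application holds only during its forward/backward pass and releases during gradient synchronization, so two jobs time-share the device alternately.

# Toy model of Crossover Synchronization (illustrative assumptions only):
# two training "applications" share one GPU, modeled here as a lock.
# Each application holds the GPU only for its forward/backward pass and
# releases it during gradient synchronization, so one job's communication
# window is filled by the other job's computation.
import threading
import time

gpu = threading.Lock()                 # the single GPU both jobs time-share
t0 = time.perf_counter()

def train(name, steps, compute_s, comm_s):
    for step in range(steps):
        with gpu:                      # forward + backward pass on the GPU
            time.sleep(compute_s)      # stand-in for GPU computation
            print(f"{time.perf_counter() - t0:5.2f}s  {name} step {step}: compute done")
        time.sleep(comm_s)             # stand-in for gradient all-reduce,
                                       # which does not need the GPU
        print(f"{time.perf_counter() - t0:5.2f}s  {name} step {step}: sync done")

# Two applications whose phases interleave in a crossover manner.
jobs = [
    threading.Thread(target=train, args=("app-A", 3, 0.3, 0.2)),
    threading.Thread(target=train, args=("app-B", 3, 0.3, 0.2)),
]
for j in jobs:
    j.start()
for j in jobs:
    j.join()

Running this script prints a timeline in which app-A's synchronization overlaps app-B's computation and vice versa; on a real cluster the same interleaving would be realized by scheduling the kernels and collectives of separate Horovod jobs rather than by a lock.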


