Deep learning has achieved phenomenal advances in various fields, including image recognition, speech processing, machine translation, gaming, and health care. The key to this success is the increasing size of models, which enables high accuracy. At the same time, such large and complex models are difficult to train: training a model may take hours or even days. Therefore, it is crucial to accelerate training in a distributed manner to promote wider applications of deep learning.
In distributed training, multiple workers running on a number of compute nodes cooperatively train a model with the help of communication between workers. The currently most widely used approach is data parallelism, in which each worker keeps a replica of the whole model, processes training samples independently, and synchronizes the parameters every iteration. Parameter Server (PS) is the first approach to support distributed training; it introduces a central node that manages one or more shared versions of the parameters of the whole model. More recently, All-Reduce, an alternative distributed solution utilizing the advanced Ring All-Reduce algorithm, has been shown to provide performance superior to PS [26, 45, 52, 31]. To fundamentally improve scalability, general decentralized training [35, 34, 37, 21, 22, 33, 48, 47] has also received intensive research interest. It has recently been theoretically shown, for the first time, that decentralized algorithms can outperform centralized ones. While PS and All-Reduce are both special cases of the decentralized method, a general decentralized training scheme can use an arbitrary communication graph, with spectral gap, doubly stochastic averaging, and independence properties, to specify point-to-point communication between workers.
The first key problem of distributed learning is the intensive communication among workers. During execution, gradients or parameter updates are transferred between workers in different nodes to reach the eventually trained model. In PS, all workers need to communicate with the parameter servers, easily causing a communication bottleneck even when the number of workers is relatively small. In All-Reduce, communication is more evenly distributed among workers; however, since it logically implements all-to-all communication, the amount of parameters transferred is still high. More importantly, to hide communication latency, All-Reduce uses delicately pipelined operations among all workers, which makes this solution vulnerable to system heterogeneity, i.e., differences in the performance of nodes (workers) and in the speed of communication links. Specifically, because All-Reduce requires global synchronization in every step, its performance is strongly bounded by the slowest worker, and thus it cannot tolerate heterogeneity well. We believe that heterogeneity is the second key challenge of distributed training.
To tolerate heterogeneity, both system-level and algorithm-level techniques have been proposed. At the system level, backup workers and bounded staleness have been shown to be effective in mitigating the effects of random worker slowdown in both PS [5, 2, 20, 51, 42] and decentralized training. However, if some workers experience severe and continuous slowdown, the benefits of system-level solutions are limited, since the whole system will eventually be dragged down by the slow workers or communication links. This motivates more fundamental algorithm-level solutions. In particular, AD-PSGD probabilistically reduces the effects of heterogeneity with randomized communication. In an additional synchronization thread, each worker randomly selects one other worker, averages the parameters of the two, and atomically updates both versions. Moreover, a worker needs to wait for the current synchronization to finish before starting another, regardless of whether it actively initiated the synchronization or was passively selected by another worker. While the slow workers inevitably have staler parameters and will drag down others' progress, this only happens when they happen to be selected. Unfortunately, the existing implementation only supports a certain type of communication graph and suffers from deadlock otherwise. More importantly, the parameter update protocol in AD-PSGD incurs significant synchronization overhead to ensure atomicity.
Figure 1 shows the training performance (defined as the time for the training loss to reach a target value) of the VGG-16 model on the CIFAR-10 dataset, for All-Reduce and AD-PSGD running on GPUs of GTX nodes as workers, in a homogeneous and a heterogeneous (one worker slowed down) execution environment. In Figure 1, we see AD-PSGD's excellent ability to tolerate heterogeneity: it is much faster than All-Reduce in the heterogeneous setting. However, the figure also shows that All-Reduce is much faster than AD-PSGD in the homogeneous environment. Thus, the open question is whether it is possible to improve AD-PSGD so that its performance is comparable to All-Reduce in a homogeneous environment while still maintaining its superior ability to tolerate heterogeneity.
In this paper, we propose Ripples, a high-performance heterogeneity-aware asynchronous decentralized training approach. Compared to the state-of-the-art solutions, Ripples gets the best of both worlds: it achieves better performance than All-Reduce in a homogeneous environment and significantly outperforms AD-PSGD in both homogeneous and heterogeneous environments. We achieve this almost ideal solution with intensive synchronization optimization, emphasizing the interplay between algorithm and system implementation. To reduce synchronization cost, we propose a novel communication primitive, Partial All-Reduce, that allows a large group of workers to synchronize quickly. To reduce synchronization conflicts, we propose static group scheduling for homogeneous environments and simple but smart techniques (Group Buffer and Group Division) that avoid conflicts at the cost of slightly reduced randomness.
We perform experiments on the Maverick2 cluster of the Texas Advanced Computing Center (TACC). We train a common model, VGG-16, on the CIFAR-10 dataset to look deeply into the different algorithms. We also train a large model, ResNet-50, on a large dataset, ImageNet, to validate the optimizations. Our experiments show that in a homogeneous environment, Ripples is faster than the state-of-the-art implementation of All-Reduce, faster than Parameter Server, and faster than AD-PSGD. In a heterogeneous setting, Ripples shows a speedup over All-Reduce, and also obtains a speedup over the Parameter Server baseline.
2 Background and Motivation
2.1 Distributed Training
In distributed training, a single model is trained collaboratively by multiple workers, which run in distributed compute nodes. Training is most commonly accomplished with Stochastic Gradient Descent (SGD), which is an iterative algorithm that reaches the minimum of the loss function by continuously applying approximate gradients computed over randomly selected data samples. In each iteration, there are typically three steps: (1) randomly select samples from the data set; (2) compute gradients based on the selected data; and (3) apply gradients to the model parameters.
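The three steps above can be sketched in a few lines. The following toy example is our own illustration, with a linear least-squares model standing in for a deep network; the learning rate and batch size are hypothetical choices, not values from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))              # toy data set
true_w = np.array([1.0, -2.0, 0.5, 3.0])   # ground-truth parameters
y = X @ true_w                             # noiseless labels

w = np.zeros(4)                            # model parameters
lr = 0.1                                   # hypothetical learning rate
for step in range(200):
    idx = rng.integers(0, len(X), size=32)      # (1) randomly select samples
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / len(idx)  # (2) compute gradients
    w -= lr * grad                              # (3) apply gradients to parameters

final_loss = float(np.mean((X @ w - y) ** 2))
```

After a few hundred iterations of this loop, `w` approaches `true_w`; distributed data parallelism, described next, parallelizes exactly this loop across workers.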
There are a number of schemes to achieve parallelism among multiple workers in distributed training: data parallelism [41, 45], model parallelism, hybrid parallelism [27, 50], and pipeline parallelism. Among them, data parallelism can be easily deployed without significant efficiency loss compared with the other schemes; thus, it is supported by many popular machine learning frameworks such as TensorFlow, MXNet, and PyTorch. Recent papers [27, 50] discussed the trade-offs between data parallelism and model parallelism and proposed hybrid approaches. Due to the space limit, we do not discuss the other approaches in detail. Given the popularity of data parallelism and its unresolved open problems, we focus on this model in this paper.
In data parallelism, each worker consumes training data independently and computes gradients based on its own selected data. The gradients obtained by distributed workers are then gathered and applied to model parameters during synchronization, and the updated model is subsequently used in the next iteration. Synchronization is both an essential part of parallelizing SGD and a critical factor in determining the training performance.
2.2 Existing Synchronization Approaches
There are three main categories of approaches to performing synchronization in data parallelism: Parameter Servers (PS), All-Reduce, and generalized decentralized approaches.
Training with PS involves one or more central nodes, called Parameter Servers, that gather gradients from all workers and send the updated model back to the workers. This straightforward approach enables relatively easy management of the training process. However, PS has limited scalability due to the communication bottleneck at the Parameter Servers. Parameter Hub removes this bottleneck by introducing a new network device that serves as the Parameter Server. While promising, it requires special hardware support that does not exist in common distributed environments (e.g., Amazon AWS).
In contrast to PS, All-Reduce replaces the central nodes with carefully scheduled global communication to achieve better parallelism. The state-of-the-art solutions [41, 45, 31] leverage Ring All-Reduce, an advanced all-reduce algorithm that effectively utilizes the bandwidth between computation devices. Specifically, workers are organized as a ring, and gradients are divided into chunks and passed around the ring in parallel. Different chunks of gradients are first accumulated at different workers and are then broadcast to all workers, again in parallel. This algorithm achieves parallelism close to the theoretical upper bound. Another algorithm, Hierarchical All-Reduce [7, 31], has been successfully scaled to large numbers of nodes and GPUs. Utilizing All-Reduce algorithms based on MPI [9, 14, 11] and NCCL, Horovod enables high-performance data parallelism and has proved effective and efficient; building on All-Reduce algorithms and high-performance implementations, researchers were able to use the fastest supercomputer, Summit, to train a deep learning model at exascale.
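The chunked ring schedule described above can be sketched as an in-memory simulation. The function below is an illustrative reconstruction, not an excerpt of NCCL, MPI, or Horovod; real implementations overlap these steps with actual network transfers.

```python
import numpy as np

def ring_all_reduce(vectors):
    """Simulate Ring All-Reduce over a list of equal-length arrays, one
    per worker. Returns the list of (identical) summed arrays that each
    worker ends up holding."""
    n = len(vectors)
    chunks = [list(np.array_split(v.astype(float).copy(), n)) for v in vectors]
    # Reduce-scatter: at step s, worker i sends chunk (i - s) mod n to its
    # ring successor; after n-1 steps, worker i holds the full sum of
    # chunk (i + 1) mod n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] += data
    # All-gather: circulate the fully reduced chunks around the ring so
    # that every worker obtains all of them.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] = data
    return [np.concatenate(chunks[i]) for i in range(n)]
```

Dividing the result by the number of workers yields the mean, which is what gradient averaging in data parallelism needs; each worker sends only about 2(n-1)/n of one model copy in total, which is why the algorithm utilizes bandwidth so well.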
Recently, the general decentralized approaches allow the point-to-point communication between workers by specifying a communication graph. Both PS and All-Reduce can be considered as special case of the communication graph. Two main algorithms proposed so far are Decentralized Parallel SGD (D-PSGD)  and Asynchronous D-PSGD (AD-PSGD) . In D-PSGD, every worker has its own version of parameters, and only synchronizes with its neighbors in the graph. As training proceeds, local information at a worker propagates along edges of the communication graph and gradually reaches every other worker, and thus models at different workers converge collaboratively to the same optimal point. The convergence rate has been proved to be similar to that of PS and All-Reduce . Like All-Reduce, D-PSGD does not suffer from communication bottleneck. However, it relies on a fixed communication topology, which may be susceptible to heterogeneity (more discussion in Section 2.3).
To tolerate heterogeneity, AD-PSGD  introduces a random communication mechanism on top of D-PSGD. Instead of synchronizing with all the neighbors specified by the communication graph, a worker randomly selects a single neighbor, and performs an atomic model averaging with the neighbor, regardless of whether they are in the same iteration or not. While the slow workers inevitably have staler parameters and will affect the training of the global model, it will not block the progress of other workers unless it is selected, which happens only occasionally.
2.3 Challenges and Problems
Communication With the continuously increasing compute capability (e.g., GPUs), communication has become more important and the focus of recent optimizations. The communication bottleneck in PS has been eliminated by approaches based on Ring All-Reduce, but the latter’s strongly synchronized communication pattern has lower heterogeneity tolerance. The generalized decentralized training captures both schemes and enables more optimization opportunities.
Heterogeneity With the communication problem largely mitigated, performance degradation in heterogeneous distributed environments becomes a major challenge. Also known as the straggler problem, it occurs due to performance differences among workers and fluctuations of communication speed and bandwidth. Heterogeneity is pervasive and can be caused by multiple factors such as resource sharing in data centers, paging, caching, and hardware faults. The trend of heterogeneity and the “long tail effects” have also been discussed and confirmed in other recent works [5, 12, 28, 24, 35]. A number of countermeasures for different synchronization schemes have been proposed, such as asynchronous execution, bounded staleness, backup workers, adjusting the learning rate of stale gradients, and sending accumulated gradients over bandwidth-scarce links only when they reach a significance threshold. Unfortunately, these techniques are mostly applicable to PS and decentralized training.
For All-Reduce, with its delicate communication schedule, it is difficult to apply these ideas, making it inherently vulnerable to heterogeneity. From the computation aspect, a global barrier is introduced by the All-Reduce operation, so the throughput of computation is determined by the slowest worker in the cluster. From the communication aspect, although the Ring All-Reduce algorithm is ideal in theory, the speed of sending chunks along the ring is bounded by the edge with the slowest connection.
Considering the delicacy of All-Reduce, and due to the well-known limits of PS, tolerating heterogeneity in decentralized approach is particularly important. Recent work Hop  presented the first detailed distributed protocol to support general decentralized training  with backup worker and bounded staleness to tolerate random slowdown. Although the results are promising, the proposed methods are essentially system techniques to mitigate the effects of heterogeneity. The alternative way is algorithmic technique, with AD-PSGD  as an excellent example. While AD-PSGD is both communication-efficient and tolerates heterogeneity well, the atomic model averaging step poses a key challenge of synchronization.
Synchronization Conflict Atomic model averaging requires that two model averaging operations be serialized if they involve the same worker. This requirement ensures fast convergence; a more relaxed semantics would increase the mutual influence of model updates from different workers, making the globally trained model more vulnerable to “staler” updates. Note that the problem is different from the synchronization relaxation in HOGWILD!, where a conflict happens when two workers try to update the same shared parameter. There, conflicts are expected to be rare, since HOGWILD! requires the cost function to be “sparse” and separable: workers only update a small fraction of the parameters in each iteration, and the sparsity ensures that updates from different workers rarely involve the same parameter. Therefore, that algorithm can still converge even without any locks. In AD-PSGD, however, the conflict is of a different nature and is expected to be frequent, because every worker can initiate model averaging and two of them are likely to end up choosing the same worker.
To ensure atomic model averaging and avoid deadlock, as exemplified in Figure 2(a), AD-PSGD divides the workers into two sets, an active set and a passive set, and requires that edges in the communication graph exist only between the two sets; i.e., neighbors of active workers can only be passive workers, and vice versa. This division is only possible when the communication graph is bipartite. In the implementation, only active workers are allowed to initiate model averaging, while passive workers can only respond; this is slightly different from the algorithm, in which every worker can initiate averaging. When an active worker needs to synchronize, it sends its model to the selected neighbor and blocks until it gets a response. A violation of atomicity can only happen when two active workers select the same passive worker, and it is avoided by letting the passive worker handle the requests one by one. Note that this scheme will incur deadlock if all workers are allowed to initiate model averaging or if the graph is not bipartite.
Besides restricting the communication graph between workers, the synchronization overhead is an even more crucial problem in a distributed environment. When training the VGG-16 model on CIFAR-10 and the ResNet-50 model on ImageNet using AD-PSGD on GPUs, Figure 2(b) shows that a large fraction of the time can be spent on synchronization. This is measured by comparing the per-iteration time of workers without synchronization (i.e., skipping the synchronization operation to obtain the pure computation time) and workers with synchronization enabled.
(a) An example deadlock: all workers first lock themselves (①), and then try to lock their neighbors in a cycle (②), which blocks forever. (b) Computation and synchronization ratio of different algorithms on different tasks.
3 Partial All-Reduce
Based on the results in Section 2.3, we mainly focus on the synchronization challenge for decentralized training. This section first presents a deep analysis of AD-PSGD, which motivates our key contribution, the Partial All-Reduce primitive.
3.1 AD-PSGD Insights
The AD-PSGD algorithm is shown in Figure 3. Similar to traditional training schemes such as PS and All-Reduce, in one iteration it first computes gradients and then performs synchronization; the difference is that it only synchronizes with a randomly selected neighbor, instead of all the workers. Therefore, the global barrier is removed, enabling higher training throughput and better heterogeneity tolerance.
In AD-PSGD, each worker has a local version of the parameters, which can be seen as a single concatenated vector, since the shapes of the individual tensors do not matter in synchronization. Concatenating all the workers' weight vectors together, the global state can be represented as a matrix W ∈ R^{N×n}, where N is the total size of the weights in the model and n is the number of workers.
In this formalization, one iteration of the AD-PSGD algorithm at a worker can be seen as one update to W. Formally, it can be represented as: W_{k+1} = W_k · T_k − γ · G_k. Here, G_k = G(Ŵ_k; ξ_k, i_k) is the update to W according to gradient computation at a random worker i_k, based on the previous version Ŵ_k of that worker's parameters and a random subset ξ_k of the training samples; it is non-zero only in the column of worker i_k. T_k is a synchronization matrix that represents the process of model averaging: W ← W · T_k.
Figure 4 shows an example of T_k, in which one worker performs a synchronization with another. More generally, for an averaging update between worker i and worker j, the non-zero entries of the matrix T_k are: T_k[i,i] = T_k[j,j] = T_k[i,j] = T_k[j,i] = 1/2, and T_k[l,l] = 1 for every other worker l.
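As a sanity check, the synchronization matrix above can be built and applied in a few lines of numpy. The orientation below (one column of W per worker, averaging via right-multiplication) and the concrete worker indices are our own illustrative choices.

```python
import numpy as np

def sync_matrix(n, i, j):
    """Pairwise synchronization matrix T for averaging workers i and j.

    Non-zero entries: T[i,i] = T[j,j] = T[i,j] = T[j,i] = 1/2, and
    T[l,l] = 1 for every other worker l. Right-multiplying the weight
    matrix W (one column per worker) by T replaces columns i and j with
    their average and leaves the rest untouched."""
    T = np.eye(n)
    T[i, i] = T[j, j] = 0.5
    T[i, j] = T[j, i] = 0.5
    return T

# Example: 4 workers; workers 0 and 2 average their models.
T = sync_matrix(4, 0, 2)
W = np.arange(8.0).reshape(2, 4)   # 2 weights, one column per worker
W_new = W @ T                      # columns 0 and 2 become their mean
```

The matrix is doubly stochastic by construction, which is one of the convergence requirements revisited in Section 3.3.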
In AD-PSGD, a conflict happens when two workers both select the same third worker for synchronization. In order to keep the atomicity of weight updating, the two operations need to be serialized. In the matrix formalization, assume that T_1 represents the synchronization between workers a and c, and T_2 represents the synchronization between workers b and c. Ignoring the gradient term in the update, the updated weights can be represented as: W_{k+2} = W_k · T_1 · T_2.
Figure 5 shows an example of two workers a and b requiring synchronization with the same worker c. The matrix on the right of Figure 5 shows the product of T_1 and T_2 as a fused synchronization matrix T_f = T_1 · T_2, which gives the final update over all the weights.
We can observe that the product is commutative in AD-PSGD: T_1 and T_2 can be exchanged (not mathematically, but logically), because the order of synchronization is determined by the order of lock acquisition, which is completely random. Based on the atomicity requirement, the key insight is that in AD-PSGD, although the two synchronizations can be mathematically fused, they have to be executed sequentially.
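The serialized execution of two conflicting synchronizations corresponds to a matrix product, which can be checked numerically. The small numpy sketch below (worker indices are our own choice for illustration) shows that the fused product remains doubly stochastic, yet the two orders differ mathematically, matching the observation that commutativity holds only logically.

```python
import numpy as np

def sync_matrix(n, i, j):
    """Pairwise averaging matrix between workers i and j (see Figure 4)."""
    T = np.eye(n)
    T[i, i] = T[j, j] = T[i, j] = T[j, i] = 0.5
    return T

# Three workers: T1 averages workers 0 and 2, T2 averages workers 1 and 2.
T1 = sync_matrix(3, 0, 2)
T2 = sync_matrix(3, 1, 2)
fused = T1 @ T2   # serialized execution: first T1, then T2 (W @ T1 @ T2)
```

Either order is an acceptable outcome for the algorithm, since lock acquisition order is random; the point of the next subsection is that both orders can be replaced by a single approximate averaging step.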
3.2 Partial All-Reduce and Group Fusion
We propose Group Fusion: fusing multiple synchronizations approximately into one with reduced synchronization cost. In the precise fused synchronization, according to the fused matrix, several workers update their weights to a certain linear combination of the weights of each worker in the group.
Next, we seek a proper approximation of the fused synchronization that admits an efficient implementation. Our goal is to leverage Ring All-Reduce, the high-performance algorithm that computes the mean of several copies of the weights in near-optimal time. We cannot directly use All-Reduce to execute the synchronization among the three workers in Figure 5, because All-Reduce produces the same result for each worker, which differs from the outcome produced by multiplying a sequence of synchronization matrices in a certain order (shown on the right of Figure 5).
Thanks to the commutative property of the T's, our key idea is to slightly relax the entries of the fused matrix so that All-Reduce can be leveraged to perform the synchronization it specifies. Generally, assume a group G of workers performs a single fused synchronization together; the synchronization modifies the weights of all the workers in G. The approximate fused matrix T_G is defined by the following non-zero entries: T_G[i,j] = 1/|G| for all i, j ∈ G, and T_G[l,l] = 1 for every worker l ∉ G.
Figure 6 shows an example of T_G with the modified entries. Although the example only involves a few workers, the group can contain an arbitrary number of workers.
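A numpy sketch of the approximate matrix T_G (group membership chosen arbitrarily for illustration) confirms that applying it averages the group's columns, which is exactly what an All-Reduce restricted to the group computes, while leaving other workers untouched.

```python
import numpy as np

def p_reduce_matrix(n, group):
    """Approximate fused synchronization matrix T_G.

    Entries: T[i, j] = 1/|group| for all i, j in the group, and
    T[l, l] = 1 for workers outside it. Right-multiplying W by T_G
    replaces every group column with the group mean."""
    T = np.eye(n)
    g = list(group)
    for i in g:
        T[i, i] = 0.0          # clear the identity entry for group members
    for i in g:
        for j in g:
            T[i, j] = 1.0 / len(g)
    return T

# Example: 5 workers; workers 0, 1, and 3 form the group.
T_G = p_reduce_matrix(5, {0, 1, 3})
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5))    # 4 weights, one column per worker
W_new = W @ T_G
```

Note that T_G is symmetric and doubly stochastic, which is what makes the convergence argument in Section 3.3 go through.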
Applying T_G is equivalent to performing All-Reduce (with averaging) within the group G. We define this operation as Partial All-Reduce, or P-Reduce, to distinguish our primitive from the conventional All-Reduce in deep learning training, which is performed among all workers. Based on P-Reduce, we present a formal description of the new algorithm in Figure 7.
Compared to the original AD-PSGD algorithm, there are two key differences. First, in Step 3, each worker can randomly generate a group that may be larger than 2, as long as the group contains the worker itself. The AD-PSGD group of size 2 (one worker randomly selects a neighbor) becomes a special case. This essentially enlarges the unit of synchronization to groups of any size. Larger groups have two implications: (1) they potentially enable faster propagation of model parameter updates among workers, speeding up convergence; and (2) they increase the chance of conflicts. Thus, the new algorithm allows the system to explore this trade-off. The second difference from AD-PSGD is that the synchronization is performed by the new P-Reduce primitive involving all the workers in the group, instead of by individual messages among workers. This directly reduces the cost of synchronization.
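The modified algorithm can be simulated on a toy objective. The sketch below is our own illustration with a hypothetical quadratic loss, not the paper's implementation: each worker holds its own parameter column, and an iteration is a local gradient step followed by a P-Reduce (averaging) over a random group that contains the updating worker.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim, lr = 8, 4, 0.1
target = np.array([1.0, -2.0, 0.5, 3.0])   # minimizer of the toy loss
W = rng.normal(size=(dim, n_workers))      # one parameter column per worker

for it in range(500):
    i = int(rng.integers(n_workers))       # worker finishing an iteration
    grad = 2 * (W[:, i] - target)          # local gradient of ||w - target||^2
    W[:, i] -= lr * grad
    k = int(rng.integers(2, n_workers + 1))   # random group size >= 2
    others = [w for w in range(n_workers) if w != i]
    group = [i] + list(rng.choice(others, size=k - 1, replace=False))
    # P-Reduce: every group member's column becomes the group mean.
    W[:, group] = W[:, group].mean(axis=1, keepdims=True)

spread = float(np.max(np.abs(W - W.mean(axis=1, keepdims=True))))
err = float(np.max(np.abs(W.mean(axis=1) - target)))
```

Despite never taking a global barrier, all columns drift toward the common optimum, and the averaging steps keep them close to one another.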
Although group fusion inspired the idea of P-Reduce, the algorithm in Figure 7 does not need to fuse groups during execution. In fact, the effect of fusing two groups of size 2 in AD-PSGD is reflected in generating a group of arbitrary size in Step 3 of Figure 7. As a result, Ripples only needs to deal with group generation, not group fusion. The system still needs to satisfy the atomicity requirement: if two groups do not share common workers, their P-Reduces can be executed concurrently; in an unrealistic but ideal situation, none of the generated groups would conflict. Compared to All-Reduce, P-Reduce retains the efficient implementation while avoiding the global barrier.
3.3 Convergence Property Analysis
To guarantee that models at different workers converge to the same point, three requirements for the synchronization matrices are proposed in AD-PSGD. In the following, we show that although T_G is not exactly the same as the result of multiplying a sequence of pairwise synchronization matrices in a certain order, our definition of T_G satisfies all three convergence properties, just as AD-PSGD does.
Doubly stochastic averaging T_k is doubly stochastic for all k: the sum of each row and each column equals 1, in both the pairwise matrices and our T_G.
Spectral gap There exists a ρ < 1 such that max{|λ_2(E[T_k^T T_k])|, |λ_n(E[T_k^T T_k])|} ≤ ρ. Basically, the second-largest eigenvalue magnitude of the expected synchronization is bounded away from 1, and E[T_k] can be regarded as a Markov transition matrix. According to expander graph theory, the spectral gap condition is fulfilled if the graph corresponding to the random walk is connected. That means an update on any worker can be propagated through a sequence of groups to the whole graph. When designing the group generation methods in the following sections, we always keep this property in mind to guarantee convergence.
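The spectral gap condition can be checked numerically for a given communication scheme. The sketch below (our own illustration) builds the expected synchronization matrix for uniformly random pairwise averaging over a graph's edges and inspects its eigenvalues; a connected ring exhibits a gap, while a disconnected graph does not.

```python
import numpy as np

def expected_sync_matrix(n, edges):
    """Expectation E[T] when each synchronization averages a uniformly
    chosen edge (i, j) of the communication graph."""
    E = np.zeros((n, n))
    for (i, j) in edges:
        T = np.eye(n)
        T[i, i] = T[j, j] = T[i, j] = T[j, i] = 0.5
        E += T
    return E / len(edges)

# Connected 6-worker ring: eigenvalue 1 is simple, so there is a gap.
ring = [(i, (i + 1) % 6) for i in range(6)]
E = expected_sync_matrix(6, ring)
eig = np.sort(np.abs(np.linalg.eigvalsh(E)))[::-1]
```

The same check on a disconnected graph yields a repeated eigenvalue 1 (no gap), reflecting that updates can never cross between the components.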
Dependence of random variables T_k is a random variable dependent on i_k (the worker initiating the synchronization), but independent of W_k and ξ_k. So far, the only requirement on the generated group is that it contains the initiating worker i_k; the group is generated randomly without any dependence on W_k or ξ_k. Therefore, this condition is fulfilled.
4 Group Generation and Conflict Detection
With P-Reduce, a group of workers becomes the basic unit of the synchronization procedure. As a collective operation, P-Reduce must be called by all workers in the group, which means that all group members need the same group information to initiate it. Establishing such a consistent view of the group among all of its members is non-trivial. This section discusses how to generate the groups and how to serialize conflicting groups.
4.1 Group Generator
In Figure 7, each worker needs to randomly generate a group. This can be performed by each worker based on the communication graph with randomly selected neighbors. The workers in each group then collectively perform a P-Reduce. The system needs to ensure atomicity: P-Reduces of groups with overlapping workers must be serialized. This can be implemented in either a centralized or a distributed manner. In general, a distributed protocol involves multiple rounds of communication and coordination between workers. For simplicity, Ripples implements a centralized component, onto which we also offload the group generation functionality from the workers; we therefore call it the Group Generator (GG). When a worker needs to perform a synchronization, it simply contacts the GG without any group information, and the GG selects the group on behalf of the worker while maintaining atomicity. In the following, we explain the protocol using an example. Note that the communications between workers and the GG are only small messages, and do not introduce a communication or scalability bottleneck.
In Figure 8, we consider four workers among a total of 8. In the beginning, two of them (call them w_1 and w_2) finish an iteration and need to perform a synchronization. Instead of generating groups locally, they both send a synchronization request to the GG, indicated by ① and ②. The GG maintains atomicity with a local lock vector, a bit vector indicating whether each worker is currently performing a P-Reduce; it is initialized to all 0s. Assume that no other synchronization is being performed in the system, and the GG receives the request from w_1 first. The GG randomly generates a group on behalf of w_1 (③) and sets the corresponding bits in the lock vector (④). Then, the GG notifies the workers in the group (⑤) so that they can collectively perform the P-Reduce. Later, the GG receives the synchronization request from w_2 and randomly generates another group. Unfortunately, this group conflicts with the first one due to two overlapping workers, and needs to be serialized. We achieve this by simply blocking the group and storing it in a pending-group queue (⑥). In the meantime, the members of the first group receive the notifications from the GG and perform the P-Reduce (⑦). They also acknowledge the GG to release the locks (⑧). After the locks for the first group are released, the pending group can proceed after setting the corresponding bits in the lock vector.
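The lock-vector protocol can be summarized in a small single-threaded sketch. This is a hypothetical simplification: the real GG handles network messages, group generation, and concurrency, all of which are omitted here.

```python
from collections import deque

class GroupGenerator:
    """Sketch of the centralized Group Generator's serialization logic.

    One lock bit per worker; a group whose members are all unlocked may
    start its P-Reduce immediately, otherwise it waits in a pending
    queue until a release frees its members."""
    def __init__(self, n_workers):
        self.locked = [False] * n_workers
        self.pending = deque()

    def _try_start(self, group):
        if any(self.locked[w] for w in group):
            return False
        for w in group:
            self.locked[w] = True          # set bits in the lock vector
        return True

    def request_sync(self, group):
        """Returns True if the group's P-Reduce may start now,
        False if it was queued behind a conflicting group."""
        if self._try_start(group):
            return True
        self.pending.append(group)
        return False

    def release(self, group):
        """Acknowledgement after a P-Reduce finishes: free the locks and
        start any pending groups that are now conflict-free."""
        for w in group:
            self.locked[w] = False
        started = []
        for _ in range(len(self.pending)):
            g = self.pending.popleft()
            if self._try_start(g):
                started.append(g)
            else:
                self.pending.append(g)
        return started
```

Non-overlapping groups proceed concurrently, while overlapping ones are serialized in request order, mirroring steps ④, ⑥, and ⑧ of the example.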
4.2 Decentralized Static Scheduler
As we have seen in the example in Figure 8, two overlapping groups need to be serialized to ensure atomicity, causing delay in the execution. We can eliminate the conflict by statically scheduling the groups to be non-overlapping.
We design a conflict-free schedule as shown in Figure 9. There are 16 workers in total, and the schedule is periodic with a cycle length of 4. Every row corresponds to an iteration, and colored blocks with group indices indicate the grouping of workers. For example, in the first row, four workers are colored yellow with the index “G1”, which means that these 4 workers form one group in every iteration that maps to this row of the cycle. Group indices do not indicate the sequence of execution; in fact, groups in the same row are expected to execute concurrently. In addition, some workers do not participate in synchronization in certain iterations, shown by gray blocks marked with a hyphen "-". Skipping synchronization decreases the frequency of communication and thus shortens the training time; this technique has been proved helpful in [29, 49].
To implement static scheduling, a naive way is to store the schedule table in the GG, and workers can access it by contacting the GG. Alternatively, we can store the table inside each worker, saving a round trip of communication between the worker and the GG. Since every worker has the same schedule table stored locally, a consistent view of the groups is naturally ensured.
|Phase||L.W. 0||L.W. 1||L.W. 2||L.W. 3|
|0||Sync with L.W. 0s on ALL NODES||No sync||Sync with L.W. 3||Sync with L.W. 2|
|1||Sync L.W. 0-3|
|2||Sync with L.W. 3||Sync with L.W. 1 on the opposite node on the ring||No sync||Sync with L.W. 0|
|3||Sync L.W. 0-3|
Notes: This table shows the rules that generate the schedule for workers running on one node; the rules are the same for all 4 nodes. L.W. i stands for Local Worker i, the i-th worker on this node. The schedule has 4 phases, each corresponding to one training step, and repeats itself after every 4 steps.
In fact, storing a table is unnecessary, since the schedule is generated in a rule-based manner. For example, our previously proposed schedule is based on a worker's rank within its node. For the case of 4 workers per node, the rule of scheduling is shown in Figure 10. In this way, a worker can simply call a local function to obtain its group in an iteration. The logic of the rule function guarantees that the schedule is consistent among all the workers, and a conflict-free static schedule is therefore enforced.
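One possible shape of such a rule function, reconstructed from the phase table in Figure 10 under the assumption of 4 nodes with 4 local workers each (the function and parameter names are ours), is sketched below.

```python
def group_of(rank, step, n_nodes=4, workers_per_node=4):
    """Return the sorted list of global ranks in this worker's group for
    this step, or None if the worker skips synchronization.
    Hypothetical reconstruction of the rule-based static schedule."""
    node, lw = divmod(rank, workers_per_node)
    phase = step % 4
    if phase in (1, 3):                        # all local workers on this node
        return [node * workers_per_node + w for w in range(workers_per_node)]
    if phase == 0:
        if lw == 0:                            # L.W.0 across ALL nodes
            return [n * workers_per_node for n in range(n_nodes)]
        if lw == 1:
            return None                        # no sync this phase
        other = 5 - lw                         # pairs L.W.2 <-> L.W.3
        return sorted([rank, node * workers_per_node + other])
    # phase == 2
    if lw in (0, 3):
        other = 3 - lw                         # pairs L.W.0 <-> L.W.3
        return sorted([rank, node * workers_per_node + other])
    if lw == 1:                                # L.W.1 with the opposite node's L.W.1
        opp = (node + n_nodes // 2) % n_nodes
        return sorted([rank, opp * workers_per_node + 1])
    return None                                # L.W.2 skips this phase
```

Every member of a group computes the same group locally, and groups within a phase are disjoint, so no lock vector or pending queue is needed.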
4.3 Discussion: Random vs. Static
Although static scheduling can ideally eliminate conflict and speed up execution, randomized group generation is more suitable for heterogeneous environment. We compare the different characteristics of the two approaches below.
Random GG is centralized, but it is different from Parameter Servers in that it does not involve massive weight transfer. It only costs minor CPU and network resources compared with gradient accumulation or weight synchronization. In our experiment, it is found that GG can be put on a node together with workers without incurring any performance loss. However, in random GG, contacting the GG induces communication overhead, and conflicting groups need to be serialized, resulting in additional wait time.
In contrast, a GG implemented as a static scheduler has no communication latency. With a proper design of the rule function, it can not only fully parallelize synchronization, but also utilize the architecture of the worker devices to accelerate every single P-Reduce operation; for example, it can schedule more intra-node synchronizations and reduce the number of large-scale inter-node synchronizations. However, the rule function is only pseudo-random, which breaks the strict convergence condition of AD-PSGD, although the resulting algorithm still converges well in our experiments.
When a certain worker is slower than the others, the original AD-PSGD algorithm is able to tolerate the slowdown. The static scheduler does not have such ability, as the schedule is fixed: synchronizations with the slow worker will slow down the whole training. With random GG, the stragglers' effect can be largely ameliorated. A well-designed group generation strategy can ensure that, at any time, most workers are able to proceed without depending on the few slow workers, thus relieving the slowdown problem. Also, slowdown detection and conflict avoidance mechanisms, discussed in the following section, can be easily integrated into random GG, making it better adapted to heterogeneous environments.
5 Smart Randomized Group Generation
The basic implementation of the scheduler in GG is to always randomly generate a group, as specified in Step 3 of Figure 7. With the centralized GG, our objective is to leverage global and runtime information to generate groups more intelligently, to: (1) avoid conflicts; and (2) embrace heterogeneity. For example, a worker may have already been assigned to several groups and thus have several pending P-Reduces to perform; if that worker is selected for yet another group, the other workers in it will have to wait for all the previously scheduled P-Reduces to finish. Similarly, when a slow worker is in a group, the whole group may be blocked by this worker. Moreover, performing P-Reduce in different groups takes different amounts of time due to architectural factors, and group selection can even introduce contention on communication links. Based on these insights, we propose intelligent scheduling mechanisms for GG to further improve performance.
5.1 Conflict Avoidance by Global Division
An intuitive way of reducing conflicts is to keep a Group Buffer (GB) for each worker: the ordered list of groups that include that worker. When a group is formed, its information is inserted into the GB of every worker involved. A consensus group order across all GBs is easily ensured, since the GG, as a centralized structure, generates groups serially. Based on the GB, when the GG receives a synchronization request from a worker, it first looks up the worker's GB: if it is empty, a new group is generated for the worker; otherwise, the first existing group in the worker's GB serves as the selected group.
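The GB lookup above can be sketched as follows. This is a minimal stand-in, assuming a simple in-process dictionary of deques; class and method names (`GroupGenerator`, `request_group`) are illustrative, not the paper's API:

```python
from collections import defaultdict, deque
import random

class GroupGenerator:
    """Centralized GG with a per-worker Group Buffer (GB)."""

    def __init__(self, num_workers, group_size=3):
        self.num_workers = num_workers
        self.group_size = group_size
        self.gb = defaultdict(deque)  # worker id -> ordered pending groups

    def request_group(self, worker):
        # Serve the first pending group if the worker is already scheduled.
        if self.gb[worker]:
            return self.gb[worker].popleft()
        # Otherwise randomly generate a new group containing the worker.
        others = [w for w in range(self.num_workers) if w != worker]
        group = tuple(sorted([worker] + random.sample(others, self.group_size - 1)))
        # Record the group in the GBs of the *other* members, so their next
        # request is served from the buffer instead of spawning a new group.
        for member in group:
            if member != worker:
                self.gb[member].append(group)
        return group
```

Since the GG is single-threaded and serial, every GB sees groups in the same order, which is what guarantees the consensus order mentioned above.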
The main insight is that P-Reduce is a collective operation: if worker i initiates a synchronization with worker j, i.e., i and j are in the same group, the P-Reduce of this group is only performed when j also requests its synchronization. Therefore, this simple mechanism avoids generating a new group for j when it is already scheduled (and ready) to execute a P-Reduce. However, with random group generation, nothing prevents the selection of j into a different group not initiated by i. In this case, the overlapping groups and the corresponding P-Reduce operations are still serialized.
Inspired by static scheduling, we propose an operation called Global Division (GD) that divides all current workers with empty GBs into several non-conflicting groups. A GD is called whenever a worker needs to generate a group and its GB is empty. A simple example is shown in Figure 11. In total we have 4 workers, and initially all GBs are empty. On the left, random selection shows a possible scenario without the GD optimization: the groups are randomly generated, so if a group initiated by one worker includes two of the others, another group initiated by the remaining worker can still include one of them as an overlapping worker, thus introducing a conflict. On the right, with GD, when a worker requests a group, the GG not only generates one for it, but also randomly generates groups for the other idle workers (just one more group in this example, as there are only 4 workers). In this way, when another worker later requests a group, the GG directly provides the non-conflicting group generated before.
It is worth emphasizing two conditions. First, a GD only generates groups for the currently "idle" workers (including the caller) that are not assigned to any group; thus, when a worker requests a group, groups may be generated in the above manner for just a subset of workers. Second, a GD is only called when the initiator's GB is empty; otherwise the first group in the initiator's GB is returned.
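Under these two conditions, a GD amounts to a random partition of the idle workers. A minimal sketch (the helper name and the handling of a leftover worker are our assumptions):

```python
import random

def global_division(idle_workers, group_size):
    """Global Division: randomly partition the idle workers (those with empty
    GBs) into non-conflicting groups of roughly `group_size` members."""
    workers = list(idle_workers)
    random.shuffle(workers)
    groups = [workers[i:i + group_size] for i in range(0, len(workers), group_size)]
    # Merge a trailing singleton into the previous group, since a group needs
    # at least two members to perform a meaningful P-Reduce.
    if len(groups) > 1 and len(groups[-1]) == 1:
        groups[-2].extend(groups.pop())
    return groups
```

The resulting groups are pairwise disjoint by construction, so none of the P-Reduces scheduled by one GD call can conflict with another from the same call.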
Indeed, the proposed schemes to avoid conflicts make group generation not fully random. However, we argue that the effects are not critical. For the first optimization, based on GB, we only reuse the existing group involving the worker that is requesting synchronization; this group is still generated in a fully random manner (if we do not use GD). For GD, we essentially generate a random group partition over all idle workers, triggered by the first worker in the set to initiate a synchronization. So the difference is between randomly generating each group and generating a random partition. We acknowledge that they are not the same, but believe that our method does not significantly damage the randomness; we leave the theoretical analysis as future work. Based on the results shown in our evaluation, these ideas work very well in practice.
5.2 Architecture-Aware Scheduling
If the groups are randomly divided, multiple groups may need to use the network bandwidth at the same time, causing congestion, which is suboptimal from an architectural perspective. In fact, All-Reduce is fast because it has a balanced utilization of the different connections between devices, such as InfiniBand HCA cards, QPI paths (the Intel QuickPath Interconnect between CPU sockets within one node), and PCIe slots. To better utilize the bandwidth of these different connections, we propose a new communication pattern called Inter-Intra Synchronization that can be naturally incorporated with GD. Here, a node, commonly running multiple workers, is considered a unit. The scheme has an Inter phase and an Intra phase.
Inter phase. One worker on each node is selected as the Head Worker of that node. All Head Workers are randomly divided into several groups that synchronize in an inter-node manner. At the same time, the non-Head Workers are randomly assigned to groups containing only local workers on the same node. In this way, only the Head Workers generate inter-node communication, while the others incur only local communication, which can be carefully arranged to avoid congestion on PCIe switches or QPI.
Intra phase. Workers within a node synchronize with all other local workers collectively. In other words, this phase is a P-Reduce among all the workers in the same node, without any inter-node communication. Following the Inter phase, the updates from workers on other nodes can be quickly propagated among local workers in this phase.
The two phases can be realized easily with GD operations. Specifically, two groups are inserted into the GB of each worker, each generated by a GD: one mainly among Head Workers on different nodes (the Inter phase), the other purely among local workers on the same node (the Intra phase). An example is shown in Figure 12.
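The group structure produced by one Inter-Intra round can be sketched as below. The function shape is illustrative (in the paper the two phases are scheduled through GDs inside the GG, not by a standalone helper), and the head-worker choice here is simply random:

```python
import random

def inter_intra_groups(nodes, group_size=3):
    """One Inter-Intra round. `nodes` maps node id -> list of worker ids on
    that node. Returns (inter_phase_groups, intra_phase_groups)."""
    heads, locals_per_node = [], {}
    for node, workers in nodes.items():
        head = random.choice(workers)          # one Head Worker per node
        heads.append(head)
        locals_per_node[node] = [w for w in workers if w != head]

    # Inter phase: Head Workers form inter-node groups; the remaining
    # workers form purely local groups on their own node.
    random.shuffle(heads)
    inter = [heads[i:i + group_size] for i in range(0, len(heads), group_size)]
    for workers in locals_per_node.values():
        if len(workers) >= 2:
            inter.append(workers)              # local-only group

    # Intra phase: each node synchronizes all of its local workers together.
    intra = [list(workers) for workers in nodes.values()]
    return inter, intra
```

Only the `heads` groups cross node boundaries, so at most one worker per node uses the inter-node links in the Inter phase, while the Intra phase touches no inter-node link at all.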
It is worth noting that the proposed Inter-Intra Synchronization is not the same as hierarchical All-Reduce, which is mathematically equivalent to an All-Reduce among all workers in one step, accelerated by the hierarchical architecture. After an All-Reduce, all workers end up with the same weights. In contrast, the Inter-Intra synchronization strategy spreads multiple partial updates through P-Reduce in an architecture-aware and controlled manner; thus, workers end up with different weights after the synchronization.
5.3 Tolerating Slowdown
The mechanisms proposed so far are mainly effective in a homogeneous execution environment but do not help with slowdowns: as mentioned earlier, slow workers involved in groups can block both their own group and other groups.
We propose a simple solution by keeping track of execution information in the GG. Specifically, an additional counter is kept in the GG for each worker, recording how many times the worker has requested a group. When a worker is significantly slower than the others, the value of its counter will be much smaller than the average. Since a GD starts when a worker with an empty GB requests a group, an additional rule filters the workers that can receive a group in the division: a worker's counter, c_w, should not be significantly smaller than the initiator's counter, c_init, i.e., c_w >= c_init - d, where d is a constant that can be adjusted.
This filter works as follows. When a fast worker initiates a GD, only fast workers are assigned to groups, avoiding the problem of being blocked by slow workers. When a slow worker initiates a division, some faster workers may be involved to synchronize with it; but the selected workers have empty buffers, as required by the GD operation, so neither the fast workers nor the slow worker needs to wait long for the synchronization. With this filter rule, the effect of slow workers is minimized.
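The filter rule reduces to a one-line predicate over the counters. A sketch, assuming the difference-threshold form of the rule described above (the function name and default threshold are ours):

```python
def eligible_for_division(counters, initiator, d=2):
    """Slowdown filter for GD: keep only workers whose request counter is not
    significantly smaller than the initiator's, i.e. c_w >= c_init - d."""
    c_init = counters[initiator]
    return [w for w, c in counters.items() if c >= c_init - d]
```

When a fast worker (high counter) initiates, slow workers fail the test and are skipped; when a slow worker initiates, everyone passes, so it can still synchronize with idle fast workers.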
6 Implementation
We implement the proposed algorithms and protocols using TensorFlow and its extensions. Specifically, Ripples is implemented as customized TensorFlow operators.
6.1 Partial All-Reduce
Partial All-Reduce is implemented as a GPU TensorFlow operator. It takes the variables and the group as input tensors, and outputs a new tensor representing the result of the synchronization. NCCL is used to execute the All-Reduce, and MPI is used to help create the NCCL communicator. We use a simple but effective strategy for faster P-Reduce: all weights are flattened and concatenated into one tensor before the operation, then separated and reshaped afterwards.
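The flatten/restore round trip can be sketched with NumPy as a stand-in for the GPU tensor operations inside the TensorFlow op (function names are illustrative):

```python
import numpy as np

def flatten_weights(weights):
    """Concatenate all weight arrays into one flat buffer for a single
    P-Reduce, remembering the shapes so the result can be restored."""
    shapes = [w.shape for w in weights]
    flat = np.concatenate([w.ravel() for w in weights])
    return flat, shapes

def restore_weights(flat, shapes):
    """Split the reduced buffer and reshape each slice back into a weight."""
    out, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        out.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return out
```

Reducing one large buffer instead of many small tensors amortizes the per-call launch and communication overhead, which is why the concatenation pays off.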
NCCL bounds the number of communicators that can exist simultaneously, yet it is inefficient to destroy all communicators after use. To save the time of creating communicators, a distributed cache is used, which provides consistent presence of communicators: it does not remove cached items, but simply stops caching once its size exceeds a threshold.
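The caching policy (never evict, stop caching past a threshold) can be sketched as below; the class is a simplified single-process stand-in for the distributed cache, and `create_fn` stands in for the expensive NCCL communicator setup:

```python
class CommunicatorCache:
    """Cache of communicators keyed by (sorted) group membership.
    Items are never evicted; caching simply stops past `max_size`."""

    def __init__(self, create_fn, max_size=1000):
        self.create_fn = create_fn   # expensive communicator construction
        self.max_size = max_size
        self.cache = {}

    def get(self, group):
        key = tuple(sorted(group))   # same workers -> same communicator
        if key in self.cache:
            return self.cache[key]
        comm = self.create_fn(key)
        if len(self.cache) < self.max_size:  # stop caching beyond threshold
            self.cache[key] = comm
        return comm
```

Never evicting keeps the "consistent presence" property: a group that was cached on one worker stays cached everywhere, so no worker is surprised by a missing communicator mid-P-Reduce.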
6.2 Group Generator
The Group Generator is a centralized controller serving all workers, and it requires low-latency remote function calls, for which RPC is used. The server is a lightweight Python program built on the gRPC Python package, with the core of the algorithms written in C++. It can be started and stopped easily.
The client is wrapped as another TensorFlow Python operator. One function implements the static scheduler according to the scheduling rules; another implements the dynamic group generator backed by the centralized GG, also over gRPC. We can easily switch between the group generation methods using execution flags.
7 Evaluation
7.1 Evaluation Setup
7.1.1 Hardware Environment
We conduct our experiments on the Maverick2 cluster at TACC. Maverick2 is managed by SLURM. In the GTX partition, each node is configured as shown in the table in Figure 14.
Model: Super Micro X10DRG-Q motherboard
Processor: 2 x Intel Xeon E5-2620 v4
GPUs: 4 x NVIDIA GTX 1080 Ti
7.1.2 Dataset and Model
To test the performance of Ripples and compare it with other works, we train models on both medium and large datasets. First, we train the VGG-16 model on the CIFAR-10 image classification dataset; the model's trainable weights are 32-bit floating-point values, and a typical training setup is used for the SGD optimizer's learning rate and the per-worker batch size. Second, we train ResNet-50 on the ImageNet dataset, whose images are classified into 1000 classes, to verify that Ripples is a valid algorithm that converges well across different tasks. A Momentum optimizer is used, and the initial learning rate decays on a fixed epoch schedule. The training models are implemented using TensorFlow.
7.1.3 Baseline Setup
Parameter Server is already integrated into TensorFlow. We implement AD-PSGD using the remote variable access supported by the TensorFlow distributed module. Horovod is adopted as a high-performance state-of-the-art baseline, which significantly outperforms many other implementations of All-Reduce; it is configured with NCCL2 to achieve the best All-Reduce speed, and we tune the size of its fusion buffer for better utilization of the InfiniBand network. In all test runs, each worker occupies a whole GPU. For better affinity, we bind each worker's process to the CPU socket its GPU is directly attached to. In random GG, the group size is 3.
We use the time it takes for the model (randomly initialized with a fixed random seed across different experiments) to reach a target accuracy as the performance metric on VGG-16. We also inspect the loss-versus-iteration curve and the average duration of an iteration to analyze the effect of our optimizations.
7.2 Interactions between Computation, Communication and Convergence
To better understand how much time communication takes in deep learning training relative to computation, we first measured the computation time with different batch sizes and the communication time with different settings (note that the size of the weights to be synchronized is independent of the batch size). Figure 15 shows the comparison. Because of better utilization of SIMD devices, computation is slightly more efficient when the batch size is larger. Interestingly, an All-Reduce among workers within a single node, or among workers placed on separate nodes, is significantly faster than one spanning multiple nodes that each run multiple workers.
Although reducing communication by lowering the synchronization frequency can increase training throughput, it makes convergence harder. Figure 16 presents a simple experiment showing that the number of iterations needed to converge increases as the communication frequency drops. To obtain the best convergence time, a proper level of synchronization intensity is necessary; this result shows that we cannot improve AD-PSGD simply by enlarging the amount of computation between synchronizations.
7.3 Speedup in Homogeneous Environment
In a homogeneous environment, VGG-16 trained on CIFAR-10 is used to compare Ripples, with its different ways of group generation, against Parameter Server, All-Reduce, and AD-PSGD. The per-iteration and convergence-time speedups are shown in Figure 17. Ripples is much faster than Parameter Server and the original AD-PSGD. All-Reduce is also much faster than these two baselines, due to the high throughput provided by Horovod. However, Ripples with either the static scheduler or smart GG outperforms even All-Reduce, thanks to its smaller synchronization groups and architecture-aware scheduling.
As shown in Figure 18, AD-PSGD has the best convergence speed in terms of number of iterations. All-Reduce is mathematically equivalent to Parameter Server; they differ slightly due to random sampling and competition in synchronization. Ripples with the static scheduler has similar convergence speed to Parameter Server, but gains speedup from its higher throughput. The number of iterations with random GG is smaller than with smart GG, which in turn is smaller than with static scheduling; this reflects the decreasing amount of randomness from random GG to smart GG to static scheduling.
These results further demonstrate the trade-off between execution efficiency and statistical efficiency. Although AD-PSGD needs fewer iterations to converge to the same error, the execution time of each iteration is seriously affected by synchronization overhead, as shown in Figure 2(b). Ripples successfully exploits this trade-off by slightly sacrificing statistical efficiency, i.e., running more iterations (0.96x vs. 0.78x), mainly due to the reduced randomness, to gain significant speedup in per-iteration execution time (5.10x vs. 1.18x), eventually leading to an overall execution-time speedup (5.26x vs. 1.42x).
7.4 Heterogeneity Tolerance
One of the key advantages of Ripples is its better tolerance of heterogeneity. Using the same setup as Section 7.3, heterogeneity is simulated by adding 2x or 5x the normal iteration time as sleep, every iteration, on one specific worker, the slow worker. The results are shown in Figure 19. In terms of the capability to tolerate slowdown, the 2x-slowdown results show that: (1) random GG (3.03x vs. 2.13x) degrades slightly more than AD-PSGD (1.42x vs. 1.37x), but remains much faster thanks to the more efficient P-Reduce synchronization primitive; (2) smart GG (5.26x vs. 4.23x) is better than random GG (3.03x vs. 2.13x); and (3) while both suffer under slowdown, Ripples static (5.01x vs. 2.47x) is still considerably better than All-Reduce (4.27x vs. 1.66x). We also see that with 2x slowdown, All-Reduce is still faster than AD-PSGD, although much slower than itself in the homogeneous setting; with 5x slowdown, All-Reduce achieves only a little more than half the performance of AD-PSGD. Random GG degrades slightly more than AD-PSGD because the larger group size (3) in Ripples increases the chance of conflicts; nevertheless, smart GG outperforms AD-PSGD by a large margin.
7.5 Validation on Large Model and Dataset
This section examines the training performance of ResNet-50 on ImageNet by running only 10 hours of training for each algorithm. We conduct the experiment in this manner to avoid affecting other experiments on the cluster, as the TACC supercomputer is shared by thousands of researchers.
The training accuracy and loss curves for the 10-hour executions are shown in Figure 20. Note that the execution environment is homogeneous, without slower workers. All-Reduce performs best in this case, followed by Ripples with smart GG, while AD-PSGD suffers from throughput issues. For ResNet-50 on ImageNet, the upper bound on the effective batch size is very large; therefore, although we make our best effort to enlarge the batch size, All-Reduce obtains a much bigger numerical convergence advantage per iteration, while Ripples can train more iterations in the same time. Smart GG performs better than the static scheduler because it has more randomness in synchronization. Observing the loss curves, Ripples still has competitive convergence speed compared with the state-of-the-art approach, All-Reduce, on large datasets.
8 Conclusion
In this paper, we propose Ripples, a high-performance heterogeneity-aware asynchronous decentralized training approach. To reduce synchronization cost, we propose a novel communication primitive, Partial All-Reduce, that allows a large group of workers to synchronize quickly. To reduce synchronization conflicts, we propose static group scheduling for homogeneous environments and simple techniques (Group Buffer and Global Division) that avoid conflicts with only slightly reduced randomness. Our experiments show that in a homogeneous environment, Ripples is faster than the state-of-the-art implementation of All-Reduce, Parameter Server, and AD-PSGD. In a heterogeneous setting, Ripples retains its speedup over All-Reduce and AD-PSGD.
References
- (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
- (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
- (2018) Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. arXiv:1802.09941.
- Maverick2 user guide - TACC user portal. https://portal.tacc.utexas.edu/user-guides/maverick2
- (2016) Revisiting distributed synchronous SGD. In International Conference on Learning Representations Workshop Track.
- (2015) MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274.
- (2013) Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3), Atlanta, GA, USA, pp. 1337–1345.
- (2015) MPI: a message-passing interface standard. https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
- Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, dual-rail Mellanox EDR InfiniBand | TOP500 supercomputer sites. https://www.top500.org/system/179397
- Intel MPI Library | Intel Software. https://software.intel.com/en-us/mpi-library
- (2013) The tail at scale. Communications of the ACM 56(2), pp. 74–80.
- (2016) The impact of translation technologies on the process and product of translation. International Journal of Communication 10, pp. 969.
- (2004) Open MPI: goals, concept, and design of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, pp. 97–104.
- (2017) Accurate, large minibatch SGD: training ImageNet in 1 hour. CoRR abs/1706.02677.
- (2018) PipeDream: fast and efficient pipeline parallel DNN training. CoRR abs/1806.03377.
- (2018) Applied machine learning at Facebook: a datacenter infrastructure perspective. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 620–629.
- Identity mappings in deep residual networks. In European Conference on Computer Vision, pp. 630–645.
- Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29, pp. 82–97.
- (2013) More effective distributed ML via a stale synchronous parallel parameter server. In Advances in Neural Information Processing Systems (NIPS'13), pp. 1223–1231.
- (2018) Decentralized distributed deep learning in heterogeneous WAN environments. In Proceedings of the ACM Symposium on Cloud Computing (SoCC '18), pp. 505–505.
- (2019) DLion: decentralized distributed deep learning in micro-clouds. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19), Renton, WA.
- (2006) Expander graphs and their applications. Bulletin of the American Mathematical Society 43, pp. 439–561.
- (2017) Gaia: geo-distributed machine learning approaching LAN speeds. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), Boston, MA, pp. 629–647.
- (2017) NCCL 2.0. GTC.
- (2018) Highly scalable deep learning training system with mixed-precision: training ImageNet in four minutes. arXiv:1807.11205.
- (2018) Beyond data and model parallelism for deep neural networks. CoRR abs/1807.05358.
- (2017) Heterogeneity-aware distributed parameter servers. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17), pp. 463–478.
- (2019) Accelerating distributed stochastic gradient descent with adaptive periodic parameter averaging: poster. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP '19), pp. 403–404.
- (2009) Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto.
- (2018) Exascale deep learning for climate analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 51.
- Scaling distributed machine learning with the parameter server. In International Conference on Big Data Science and Computing, pp. 3.
- (2018) Pipe-SGD: a decentralized pipelined SGD framework for distributed deep net training. In Advances in Neural Information Processing Systems (NIPS'18), pp. 8056–8067.
- (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems 30, pp. 5330–5340.
- (2018) Asynchronous decentralized parallel stochastic gradient descent. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, pp. 3049–3058.
- (2018) Parameter Hub: a rack-scale parameter server for distributed deep neural network training. CoRR abs/1805.07891.
- (2019) Hop: heterogeneity-aware decentralized training. CoRR abs/1902.01064.
- (2009) Bandwidth optimal all-reduce algorithms for clusters of workstations. Journal of Parallel and Distributed Computing 69(2), pp. 117–124.
- (2011) Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, pp. 693–701.
- (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3), pp. 211–252.
- (2018) Horovod: fast and easy distributed deep learning in TensorFlow. arXiv:1802.05799.
- (2016) Tornado: a system for real-time iterative analysis over evolving data. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16), pp. 417–430.
- (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529, pp. 484–503.
- (2014) Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556.
- (2019) Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes. arXiv:1902.06855.
- Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR).
- (2018) Communication compression for decentralized training. In NeurIPS.
- (2018) D2: decentralized training over decentralized data. In Proceedings of the 35th International Conference on Machine Learning, PMLR 80, Stockholm, Sweden, pp. 4848–4856.
- (2018) Adaptive communication strategies to achieve the best error-runtime trade-off in local-update SGD. arXiv:1810.08313.
- (2018) Supporting very large models using automatic dataflow graph partitioning. CoRR abs/1807.08887.
- (2015) Petuum: a new platform for distributed machine learning on big data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15), pp. 1335–1344.
- (2019) Yet another accelerated SGD: ResNet-50 training on ImageNet in 74.7 seconds. arXiv:1903.12650.
- (2018) Artificial intelligence in healthcare. Nature Biomedical Engineering 2.
- (2014) DimmWitted: a study of main-memory statistical analytics. Proceedings of the VLDB Endowment 7(12), pp. 1283–1294.