A^2CiD^2: Accelerating Asynchronous Communication in Decentralized Deep Learning

06/14/2023
by Adel Nabli, et al.

Distributed training of Deep Learning models has been critical to many recent successes in the field. Current standard methods primarily rely on synchronous centralized algorithms, which induce major communication bottlenecks and limit their usability to High-Performance Computing (HPC) environments with strong connectivity. Decentralized asynchronous algorithms are emerging as a potential alternative, but their practical applicability still lags. In this work, we focus on peer-to-peer asynchronous methods due to their flexibility and parallelization potential. To mitigate the increase in bandwidth they require at large scale and in poorly connected settings, we introduce a principled asynchronous, randomized, gossip-based algorithm driven by a continuous local momentum, named A^2CiD^2. Beyond inducing a significant communication acceleration at no cost other than doubling the parameters, A^2CiD^2 requires only minimal adaptation to be incorporated into other asynchronous approaches. We demonstrate its efficiency theoretically and numerically: empirically, on the ring graph, adding A^2CiD^2 has the same effect as doubling the communication rate. In particular, we show consistent improvements on the ImageNet dataset using up to 64 asynchronous workers (A100 GPUs) and various communication network topologies.
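To make the setting concrete, the following is a minimal, illustrative Python sketch of asynchronous peer-to-peer training with pairwise gossip averaging and a doubled parameter set per worker, echoing the "doubling the parameters" mentioned above. The Worker class, the auxiliary variable x_wave, the mixing coefficients eta and alpha, and the toy quadratic objectives are assumptions made for illustration; this is not the exact A^2CiD^2 update rule from the paper, nor does it carry its theoretical guarantees.

import numpy as np

class Worker:
    # One peer in an asynchronous pairwise-gossip setup.
    # Each peer keeps two copies of the parameters (the "doubling"):
    #   x      -- working parameters updated by local gradient steps,
    #   x_wave -- an auxiliary variable coupled to x by a momentum-style
    #             mixing step (eta and alpha are illustrative values only).
    def __init__(self, dim, lr=0.1, eta=0.5, alpha=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.x = rng.normal(size=dim)
        self.x_wave = self.x.copy()
        self.lr, self.eta, self.alpha = lr, eta, alpha

    def local_step(self, grad):
        # One local SGD step, then mix the two parameter copies.
        self.x -= self.lr * grad(self.x)
        x_old = self.x.copy()
        self.x += self.eta * (self.x_wave - self.x)
        self.x_wave += self.alpha * (x_old - self.x_wave)

def gossip(a, b):
    # Pairwise averaging: the only communication primitive required.
    avg = 0.5 * (a.x + b.x)
    a.x, b.x = avg.copy(), avg.copy()

if __name__ == "__main__":
    # Toy heterogeneous objectives: worker i minimizes ||x - t_i||^2.
    targets = [np.full(4, float(i)) for i in range(4)]
    workers = [Worker(dim=4, seed=i) for i in range(4)]
    rng = np.random.default_rng(42)
    for _ in range(300):
        for w, t in zip(workers, targets):
            w.local_step(lambda x, t=t: 2.0 * (x - t))
        # Random pairwise exchange along a ring graph.
        i = rng.integers(len(workers))
        gossip(workers[i], workers[(i + 1) % len(workers)])
    print("consensus estimate:",
          np.round(np.mean([w.x for w in workers], axis=0), 2))

The key point the sketch tries to convey is structural: workers never synchronize globally; they only perform local gradient steps on their own data and occasional pairwise exchanges, while the auxiliary copy provides the momentum-style coupling that the paper argues accelerates the flow of information across the graph.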


