Exponential Graph is Provably Efficient for Decentralized Deep Training

10/26/2021
by Bicheng Ying, et al.

Decentralized SGD is an emerging training method for deep learning, known for requiring much less (and therefore faster) communication per iteration: it relaxes the exact averaging step of parallel SGD to inexact averaging. The less exact the averaging is, however, the more iterations training needs in total. The key to making decentralized SGD efficient is therefore to realize nearly exact averaging with little communication, which requires a skillful choice of communication topology, an under-studied topic in decentralized optimization. In this paper, we study so-called exponential graphs, in which every node is connected to O(log(n)) neighbors, where n is the total number of nodes. This work proves that such graphs achieve fast communication and effective averaging simultaneously. We also discover that a sequence of log(n) one-peer exponential graphs, in which each node communicates with a single neighbor per iteration, together achieve exact averaging. This favorable property enables the one-peer exponential graph to average as effectively as its static counterpart while communicating more efficiently. We apply these exponential graphs to decentralized (momentum) SGD and obtain the state-of-the-art balance between per-iteration communication and iteration complexity among all commonly used topologies. Experimental results on a variety of tasks and models demonstrate that decentralized (momentum) SGD over exponential graphs delivers both fast and high-quality training. Our code is implemented through BlueFog and is available at https://github.com/Bluefog-Lib/NeurIPS2021-Exponential-Graph.
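To make the topologies concrete, the following is a minimal NumPy sketch (an illustration, not the BlueFog implementation from the repository). It builds the mixing matrices of a static exponential graph and of the one-peer exponential graphs under the common assumptions that the node count n is a power of two and that weights are uniform, and it checks numerically that applying log2(n) one-peer graphs in sequence yields exact averaging. In decentralized SGD, each iteration replaces the global average of parallel SGD with a single multiplication by one such mixing matrix.

```python
# Sketch of (one-peer) exponential graph mixing matrices, assuming n = 2^tau.
import numpy as np

def static_exp_graph(n):
    """Mixing matrix of the static exponential graph: node i averages with
    neighbors (i + 2^j) mod n for j = 0..log2(n)-1, uniform weights."""
    tau = int(np.log2(n))
    W = np.eye(n)
    for i in range(n):
        for j in range(tau):
            W[i, (i + 2**j) % n] = 1.0
    return W / (tau + 1)  # weight 1/(log2(n)+1) on self and each neighbor

def one_peer_exp_graph(n, k):
    """Mixing matrix of the k-th one-peer exponential graph: each node talks
    to the single neighbor (i + 2^(k mod log2(n))) mod n, weight 1/2 each."""
    tau = int(np.log2(n))
    W = 0.5 * np.eye(n)
    for i in range(n):
        W[i, (i + 2**(k % tau)) % n] += 0.5
    return W

n = 8
P = np.eye(n)
for k in range(int(np.log2(n))):       # apply log2(n) one-peer graphs in sequence
    P = one_peer_exp_graph(n, k) @ P
print(np.allclose(P, np.full((n, n), 1.0 / n)))  # True: exact averaging
```

The check succeeds because the one-peer matrices commute and every index 0..n-1 has a unique binary expansion, so the product of the log2(n) matrices expands into the uniform average over all n nodes; each matrix costs only one message per node per iteration.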


