DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

06/01/2023
by Lisang Ding, et al.

Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than relying on a central parameter server to collect gradients from all agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. This communication is governed by the communication topology and the gossip weight matrices. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training and better scalability than the ring, grid, torus, and hypercube topologies. However, it requires the number of agents to be a power of 2, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an Õ(n^3) transient iteration complexity. Our proof is based on newly discovered properties of the gossip weight matrices and a novel approach to combining them with DSGD's convergence analysis. Numerical experiments demonstrate the efficiency of DSGD-CECA.
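
To make the gossip mechanism concrete, below is a minimal single-machine NumPy sketch, assuming a toy quadratic objective and a power-of-2 number of agents; the names (n_agents, local_grad, targets) are illustrative and not from the paper. It alternates a local SGD step with a dynamic one-peer exponential-2 gossip step; the paper's DSGD-CECA replaces this gossip scheme with a communication-optimal exact consensus scheme that keeps the unit per-iteration communication cost without the power-of-2 restriction, which this sketch does not implement.

# Sketch: decentralized SGD with dynamic one-peer exponential-2 gossip,
# simulated on one machine. Assumes n_agents is a power of 2 (the very
# restriction DSGD-CECA removes) and a toy quadratic local loss.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, lr, steps = 8, 10, 0.05, 200   # n_agents must be a power of 2 here

x = rng.normal(size=(n_agents, dim))          # one model copy per agent (rows)
targets = rng.normal(size=(n_agents, dim))    # stand-in for each agent's local data

def local_grad(i, xi):
    # Stochastic gradient of the toy local loss 0.5 * ||x - target_i||^2.
    return (xi - targets[i]) + 0.1 * rng.normal(size=dim)

for t in range(steps):
    # 1) Local SGD step on every agent.
    x = x - lr * np.stack([local_grad(i, x[i]) for i in range(n_agents)])

    # 2) One-peer exponential-2 gossip: at iteration t, agent i averages its
    #    model with the model of agent (i + 2^(t mod log2 n)) mod n, so each
    #    agent sends to one peer and receives from one peer per iteration.
    offset = 2 ** (t % int(np.log2(n_agents)))
    peer = (np.arange(n_agents) + offset) % n_agents
    x = 0.5 * (x + x[peer])

# With a power-of-2 number of agents, the model copies end up nearly identical.
print("max disagreement across agents:", np.abs(x - x.mean(axis=0)).max())

When n_agents is a power of 2, log2(n_agents) consecutive gossip rounds of this form average all model copies exactly, which is the finite-time consensus property that makes the one-peer exponential-2 topology attractive in the first place.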

Related research

10/14/2022  Communication-Efficient Topologies for Decentralized Learning with O(1) Consensus Rate
10/26/2021  Exponential Graph is Provably Efficient for Decentralized Deep Training
07/01/2020  Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training
10/21/2020  Decentralized Deep Learning using Momentum-Accelerated Consensus
05/19/2023  Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence
02/28/2020  Decentralized gradient methods: does topology matter?
05/30/2018  On Consensus-Optimality Trade-offs in Collaborative Deep Learning
