Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

05/17/2021
by Kun Yuan, et al.

We consider decentralized stochastic optimization problems in which a network of agents, each owning a local cost function, cooperates to find a minimizer of the globally averaged cost. A widely studied decentralized algorithm for this problem is D-SGD, in which each node applies a stochastic gradient descent step and then averages its estimate with its neighbors. D-SGD is attractive due to its efficient per-iteration communication and can achieve linear speedup in convergence (in terms of the network size). However, D-SGD is very sensitive to the network topology. For smooth objective functions, the transient stage of D-SGD (which measures how fast the algorithm reaches the linear-speedup regime) is on the order of O(n/(1-β)^2) and O(n^3/(1-β)^4) for strongly convex and generally convex cost functions, respectively, where 1-β ∈ (0,1) is a topology-dependent quantity that approaches 0 for large and sparse networks. Hence, D-SGD suffers from slow convergence on large and sparse networks. In this work, we study the non-asymptotic convergence properties of the D^2/Exact-diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D^2/Exact-diffusion is shown to have an improved transient stage on the order of O(n/(1-β)) and O(n^3/(1-β)^2) for strongly convex and generally convex cost functions, respectively. Moreover, we provide a lower bound on the transient stage of D-SGD under homogeneous data distributions, which coincides with the transient stage of D^2/Exact-diffusion in the strongly convex setting. These results show that removing the influence of data heterogeneity can ameliorate the network topology dependence of D-SGD. Compared with existing bounds for decentralized algorithms, the bound for D^2/Exact-diffusion is the least sensitive to network topology.
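The two update rules compared above can be summarized in a few lines of code. The following is a minimal NumPy sketch, not code from the paper: the mixing matrix W, the function names, and the particular adapt-correct-combine form written for D^2/Exact-diffusion are assumptions based on the standard formulations of these algorithms.

```python
import numpy as np

def dsgd_step(x, grads, W, lr):
    """One D-SGD iteration: every node takes a local stochastic gradient
    step and then averages its iterate with its neighbors through the
    mixing matrix W (W[i, j] > 0 only if j is a neighbor of i).

    x, grads : (n, d) arrays; row i holds node i's iterate / stochastic gradient
    W        : (n, n) symmetric doubly stochastic mixing matrix
    lr       : step size
    """
    return W @ (x - lr * grads)

def exact_diffusion_step(x, psi_prev, grads, W, lr):
    """One D^2/Exact-diffusion iteration in its usual adapt-correct-combine
    form (an assumption here; the abstract does not spell out the recursion).
    The correction term (x - psi_prev) is what removes the data-heterogeneity
    bias that slows down plain D-SGD."""
    psi = x - lr * grads        # adapt: local stochastic gradient step
    phi = psi + x - psi_prev    # correct: cancel the heterogeneity-induced bias
    return W @ phi, psi         # combine: average with neighbors; return new psi
```

In the transient-stage bounds above, β is commonly taken to be the second-largest eigenvalue magnitude of such a mixing matrix W, so 1-β shrinks as the network becomes larger and sparser.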

