Communication-efficient Decentralized Machine Learning over Heterogeneous Networks

09/12/2020
by Pan Zhou, et al.

In recent years, distributed machine learning has increasingly been executed over heterogeneous networks, such as a local area network within a multi-tenant cluster or a wide area network connecting data centers and edge clusters. In these heterogeneous networks, link speeds among worker nodes vary significantly, making it challenging for state-of-the-art machine learning approaches to perform efficient training. Both centralized and decentralized training approaches suffer from low-speed links. In this paper, we propose a decentralized approach, named NetMax, that enables worker nodes to communicate via high-speed links and thus significantly speeds up the training process. NetMax has the following novel features. First, it uses a novel consensus algorithm that allows worker nodes to train model copies on their local datasets asynchronously and to synchronize these copies by exchanging information over peer-to-peer communication rather than through a central master node (i.e., a parameter server). Second, in each iteration, every worker node randomly selects one peer to exchange information with, according to a fine-tuned probability; in particular, peers connected via high-speed links are selected with high probability. Third, the selection probabilities are designed to minimize the total convergence time. Moreover, we mathematically prove the convergence of NetMax. We evaluate NetMax on heterogeneous cluster networks and show that it achieves speedups of 3.7X, 3.4X, and 1.9X over the state-of-the-art decentralized training approaches Prague, Allreduce-SGD, and AD-PSGD, respectively.
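To make the peer-selection and synchronization steps concrete, the sketch below gives a minimal, hypothetical Python illustration rather than the authors' implementation. It assumes each worker knows the measured link speeds to its peers and, as a simplifying stand-in for NetMax's optimized probabilities (which are derived by minimizing total convergence time), samples a peer with probability proportional to link speed; the toy loss, learning rate, and link-speed values are assumptions made only for illustration.

    import random
    import numpy as np

    def select_peer(link_speeds):
        """Sample one peer for this iteration, favoring high-speed links.

        link_speeds: dict mapping peer id -> measured link speed (e.g., MB/s).
        NetMax tunes these probabilities to minimize total convergence time;
        as a simplification, this sketch makes them proportional to link speed.
        """
        peers = list(link_speeds)
        total = sum(link_speeds.values())
        weights = [link_speeds[p] / total for p in peers]
        return random.choices(peers, weights=weights, k=1)[0]

    # Hypothetical setup: worker 0 holds a model copy and sees three peers
    # over links of very different speeds (all values are illustrative).
    link_speeds = {1: 1000.0, 2: 100.0, 3: 10.0}      # MB/s
    models = {p: np.zeros(4) for p in (0, 1, 2, 3)}   # per-worker model copies

    def local_gradient(x):
        # Placeholder gradient of a toy quadratic loss on the local dataset.
        return x - np.ones_like(x)

    for step in range(100):
        # Asynchronous local SGD update on worker 0's own data.
        models[0] = models[0] - 0.1 * local_gradient(models[0])
        # Peer-to-peer exchange: average with one randomly selected peer.
        peer = select_peer(link_speeds)
        avg = 0.5 * (models[0] + models[peer])
        models[0], models[peer] = avg, avg.copy()

    print("worker 0 model after 100 steps:", models[0])

In a real deployment, the exchange step would transfer parameters over the selected link, and the selection probabilities would come from NetMax's convergence-time optimization rather than this proportional heuristic.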

Related research

- Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection (02/22/2020): Distributed learning techniques such as federated learning have enabled ...
- Learning from Peers at the Wireless Edge (01/30/2020): The last mile connection is dominated by wireless links where heterogene...
- Communication Efficient Decentralized Training with Multiple Local Updates (10/21/2019): Communication efficiency plays a significant role in decentralized optim...
- Task allocation for decentralized training in heterogeneous environment (11/16/2021): The demand for large-scale deep learning is increasing, and distributed ...
- Event Notifications in Value-Adding Networks (08/01/2022): Linkages between research outputs are crucial in the scholarly knowledge...
- Efficient Variance-Reduced Learning for Fully Decentralized On-Device Intelligence (08/04/2017): This work develops a fully decentralized variance-reduced learning algor...
- Throughput-Optimal Topology Design for Cross-Silo Federated Learning (10/23/2020): Federated learning usually employs a client-server architecture where an...
