Hop: Heterogeneity-Aware Decentralized Training

02/04/2019
by Qinyi Luo, et al.

Recent work has shown that decentralized algorithms can deliver superior performance over centralized ones in the context of machine learning. The two approaches, whose main difference lies in their distinct communication patterns, are both susceptible to performance degradation in heterogeneous environments. Although considerable effort has been devoted to making centralized algorithms robust to heterogeneity, little has been done to address this problem in decentralized algorithms. This paper proposes Hop, the first heterogeneity-aware decentralized training protocol. Based on a unique characteristic of decentralized training that we identify, the iteration gap, we propose a queue-based synchronization mechanism that efficiently implements backup workers and bounded staleness in the decentralized setting. To cope with deterministic slowdown, we propose skipping iterations so that the effect of slower workers is further mitigated. We build a prototype implementation of Hop on TensorFlow. Experimental results on CNN and SVM workloads show significant speedup over standard decentralized training in heterogeneous settings.
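To make the queue-based idea concrete, below is a minimal Python sketch, not the paper's TensorFlow implementation, of per-neighbor update queues that enforce a staleness bound, with slow neighbors optionally treated as backup workers that may be skipped in a given iteration. The names UpdateQueue and wait_for_neighbors, and the parameters staleness_bound, num_backup, and poll_interval, are illustrative assumptions rather than the paper's API.

    import collections
    import threading
    import time

    class UpdateQueue:
        """Per-neighbor queue of (iteration, update) pairs (illustrative sketch)."""
        def __init__(self):
            self.updates = collections.deque()
            self.lock = threading.Lock()

        def push(self, iteration, update):
            # The sending neighbor appends its update, tagged with its iteration number.
            with self.lock:
                self.updates.append((iteration, update))

        def try_pop(self, min_iteration):
            # Return the freshest update from iteration >= min_iteration, else None.
            with self.lock:
                if self.updates and self.updates[-1][0] >= min_iteration:
                    return self.updates[-1]
            return None

    def wait_for_neighbors(queues, my_iteration, staleness_bound=2, num_backup=0,
                           poll_interval=0.01):
        """Block until all but num_backup neighbors have an update that is at most
        staleness_bound iterations behind my_iteration (bounded iteration gap)."""
        required = len(queues) - num_backup
        min_iteration = my_iteration - staleness_bound
        gathered = {}
        while len(gathered) < required:
            for i, q in enumerate(queues):
                if i not in gathered:
                    update = q.try_pop(min_iteration)
                    if update is not None:
                        gathered[i] = update
            if len(gathered) < required:
                # Neighbors still too far behind act as backups and may be skipped.
                time.sleep(poll_interval)
        return list(gathered.values())

In this sketch the iteration gap between a worker and each neighbor it waits for is bounded by staleness_bound, and setting num_backup greater than zero lets the worker proceed without the slowest neighbors, mirroring how backup workers are used in the centralized setting.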


