Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange

11/22/2017
by   Mohamed A. Attia, et al.
0

Owing to data-intensive large-scale applications, distributed computation systems have gained significant recent interest, due to their ability of running such tasks over a large number of commodity nodes in a time efficient manner. One of the major bottlenecks that adversely impacts the time efficiency is the computational heterogeneity of distributed nodes, often limiting the task completion time due to the slowest worker. In this paper, we first present a lower bound on the expected computation time based on the work-conservation principle. We then present our approach of work exchange to combat the latency problem, in which faster workers can be reassigned additional leftover computations that were originally assigned to slower workers. We present two variations of the work exchange approach: a) when the computational heterogeneity knowledge is known a priori; and b) when heterogeneity is unknown and is estimated in an online manner to assign tasks to distributed workers. As a baseline, we also present and analyze the use of an optimized Maximum Distance Separable (MDS) coded distributed computation scheme over heterogeneous nodes. Simulation results also compare the proposed approach of work exchange, the baseline MDS coded scheme and the lower bound obtained via work-conservation principle. We show that the work exchange scheme achieves time for computation which is very close to the lower bound with limited coordination and communication overhead even when the knowledge about heterogeneity levels is not available.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/20/2019

Optimal Load Allocation for Coded Distributed Computation in Heterogeneous Clusters

Recently, coding has been a useful technique to mitigate the effect of s...
research
01/05/2018

Near Optimal Coded Data Shuffling for Distributed Learning

Data shuffling between distributed cluster of nodes is one of the critic...
research
04/16/2019

Heterogeneous Coded Computation across Heterogeneous Workers

Coded distributed computing framework enables large-scale machine learni...
research
04/16/2019

Heterogeneous Computation across Heterogeneous Workers

Coded distributed computing framework enables large-scale machine learni...
research
12/16/2020

Incentive Mechanism Design for Distributed Coded Machine Learning

A distributed machine learning platform needs to recruit many heterogene...
research
04/27/2022

Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

To improve the utility of learning applications and render machine learn...
research
09/08/2021

Computational Polarization: An Information-theoretic Method for Resilient Computing

We introduce an error resilient distributed computing method based on an...

Please sign up or login with your details

Forgot password? Click here to reset