Coded Computation across Shared Heterogeneous Workers with Communication Delay

09/23/2021
by   Yuxuan Sun, et al.

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and its assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers for parallel computation. The goal is to minimize the communication plus computation delay of the slowest task. We propose worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks of either a single master or multiple masters, respectively. The resulting non-convex delay minimization problem is then solved by employing a Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms reduce the task completion delay compared to the benchmarks, and observe that the dedicated and fractional worker assignment policies have different scopes of application.
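To illustrate how coded computation mitigates stragglers in a matrix multiplication task, the following is a minimal sketch, not the scheme proposed in the paper. It assumes a toy (3, 2) parity code over a single master and three workers: the matrix is split into two row blocks, a third worker receives their sum, and the product is recovered from whichever two workers finish first. The function names (`encode`, `decode`, `matvec`) and the example data are hypothetical.

```python
from itertools import combinations


def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into row blocks A1, A2 and add a parity block A1 + A2.

    Assumes A has an even number of rows (illustrative restriction).
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from the outputs of any two of the three workers.

    `results` maps a worker index (0, 1 or 2) to its output vector.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # A2 x = (A1 + A2) x - A1 x
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # A1 x = (A1 + A2) x - A2 x
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
blocks = encode(A)
outputs = {i: matvec(block, x) for i, block in enumerate(blocks)}
expected = matvec(A, x)

# Any single worker may straggle: every pair of results suffices.
for finished in combinations(range(3), 2):
    partial = {i: outputs[i] for i in finished}
    assert decode(partial) == expected
```

The redundancy here is fixed at 50% extra load; the paper's contribution lies in optimizing how much redundant load to add and how to assign it across heterogeneous workers, which this sketch does not attempt.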


