Stream Distributed Coded Computing

03/02/2021
by   Alejandro Cohen, et al.

Emerging large-scale, data-hungry algorithms require computations to be delegated from a central server to several worker nodes. A major challenge in distributed computation is tolerating the delays and failures caused by stragglers. To address this challenge, introducing an efficient amount of redundant computation via distributed coded computing has received significant attention. Recent approaches in this area have mainly focused on introducing the minimum computational redundancy needed to tolerate a certain number of stragglers. To the best of our knowledge, the current literature lacks a unified end-to-end design for a heterogeneous setting in which workers vary in their computation and communication capabilities. The contribution of this paper is a novel framework for joint scheduling and coding in a setting where the workers and the arrivals of streaming computational jobs follow stochastic models. In our initial joint scheme, we propose a systematic framework that shows how to select a set of workers and how to split the computational load among them, according to their differing capabilities, so as to minimize the average in-order job execution delay. Through simulations, we demonstrate that our framework dramatically outperforms a naive method that splits the computational load uniformly among the workers, and that it performs close to the ideal.
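To make the load-splitting idea concrete, the following minimal Python sketch assigns coded shares in proportion to each worker's estimated service rate and compares the resulting job completion time against a uniform split. All names, rates, and the redundancy factor here are hypothetical illustrations under a simple deterministic-speed assumption, not the paper's actual scheme or parameters.

# Hypothetical sketch: splitting one coded job across heterogeneous workers.
# Worker speeds and the redundancy factor are illustrative assumptions,
# not the paper's stochastic model.

def split_load(total_rows, speeds, redundancy=1.2):
    """Assign coded row counts in proportion to each worker's speed.

    total_rows : rows of the computation needed to decode one job
    speeds     : estimated service rates of the selected workers
    redundancy : coded overprovisioning factor (>1 tolerates stragglers)
    """
    coded_rows = int(total_rows * redundancy)  # MDS-style redundant rows
    total_speed = sum(speeds)
    return [round(coded_rows * s / total_speed) for s in speeds]

def completion_time(loads, speeds, needed):
    """Time until enough coded rows return to decode the job.

    Each worker finishes its share at load/speed; the job decodes once
    the earliest responders have jointly returned `needed` rows.
    """
    done = 0.0
    for t, load in sorted((l / s, l) for l, s in zip(loads, speeds)):
        done += load
        if done >= needed:
            return t
    return float("inf")

if __name__ == "__main__":
    speeds = [4.0, 2.0, 1.0]   # heterogeneous worker rates (assumed)
    n = 600                    # rows needed to decode one job
    prop = split_load(n, speeds)
    unif = split_load(n, [1.0] * len(speeds))
    print("proportional split:", prop, "->", completion_time(prop, speeds, n))
    print("uniform split:     ", unif, "->", completion_time(unif, speeds, n))

With these assumed rates, the proportional split finishes in 103 time units versus 240 for the uniform split, since the uniform scheme is bottlenecked by the slowest worker; this mirrors, in a toy setting, the gap the paper reports between its framework and the naive uniform baseline.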


