Timely-Throughput Optimal Coded Computing over Cloud Networks

04/11/2019
by   Chien-Sheng Yang, et al.

In modern distributed computing systems, unpredictable and unreliable infrastructure results in high variability of computing resources. Meanwhile, demand is growing significantly for timely, event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for the variability of computing speed in cloud networks. In this model, each worker is in either a good state or a bad state in terms of computation speed, and the transitions between these states follow a Markov chain that is unknown to the scheduler. We then consider a Coded Computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against nodes that may be in a bad state. Given timely computation requests submitted to the system with computation deadlines, our goal is to design the optimal computation-load allocation scheme and the optimal data encoding scheme that maximize the timely computation throughput (i.e., the average number of computation tasks that are accomplished before their deadlines). Our main result is the development of a dynamic computation strategy called Lagrange Estimate-and-Allocate (LEA), which achieves the optimal timely computation throughput. Compared to the static allocation strategy, LEA increases the timely computation throughput by 1.4X to 17.5X in simulations across various scenarios and by 1.27X to 6.5X in experiments over Amazon EC2 clusters.
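The two-state worker model described in the abstract can be illustrated with a short simulation. The following is a minimal sketch, not the LEA strategy itself: the function names, the probabilities `p_gg` and `p_bb` (chance of staying in the good or bad state), and the simple counting estimator are all assumptions chosen for illustration. It only shows how a scheduler that does not know the chain could estimate its transition probabilities from observed worker states.

```python
import random

GOOD, BAD = 0, 1  # hypothetical labels for the two speed states


def simulate_worker(p_gg, p_bb, steps, seed=0):
    """Simulate one worker's state over `steps` rounds.

    p_gg: probability of staying good (assumed value, not from the paper)
    p_bb: probability of staying bad (assumed value, not from the paper)
    """
    rng = random.Random(seed)
    state = GOOD
    states = [state]
    for _ in range(steps - 1):
        stay = p_gg if state == GOOD else p_bb
        if rng.random() >= stay:
            # Leave the current state with probability 1 - stay.
            state = BAD if state == GOOD else GOOD
        states.append(state)
    return states


def estimate_transitions(states):
    """Estimate (p_gg, p_bb) by counting observed self-transitions.

    Returns None for a state that was never observed as a starting state.
    """
    stays = {GOOD: 0, BAD: 0}
    totals = {GOOD: 0, BAD: 0}
    for prev, cur in zip(states, states[1:]):
        totals[prev] += 1
        if prev == cur:
            stays[prev] += 1
    est_gg = stays[GOOD] / totals[GOOD] if totals[GOOD] else None
    est_bb = stays[BAD] / totals[BAD] if totals[BAD] else None
    return est_gg, est_bb


if __name__ == "__main__":
    observed = simulate_worker(p_gg=0.9, p_bb=0.6, steps=20000, seed=1)
    print(estimate_transitions(observed))
```

With enough observed rounds, the counting estimates approach the true stay probabilities, which is the kind of information a dynamic allocation strategy could use when deciding how much load to assign to each worker.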


