Chronos: A Unifying Optimization Framework for Speculative Execution of Deadline-critical MapReduce Jobs

04/16/2018
by Maotong Xu, et al.

Meeting desired application deadlines in cloud processing systems such as MapReduce is crucial as the nature of cloud applications is becoming increasingly mission-critical and deadline-sensitive. It has been shown that the execution times of MapReduce jobs are often adversely impacted by a few slow tasks, known as stragglers, which result in high latency and deadline violations. While a number of strategies have been developed in existing work to mitigate stragglers by launching speculative or clone task attempts, none of them provides a quantitative framework that optimizes the speculative execution for offering guaranteed Service Level Agreements (SLAs) to meet application deadlines. In this paper, we bring several speculative scheduling strategies together under a unifying optimization framework, called Chronos, which defines a new metric, Probability of Completion before Deadlines (PoCD), to measure the probability that MapReduce jobs meet their desired deadlines. We systematically analyze PoCD for popular strategies including Clone, Speculative-Restart, and Speculative-Resume, and quantify their PoCD in closed form. The result illuminates an important tradeoff between PoCD and the cost of speculative execution, measured by the total (virtual) machine time required under different strategies. We propose an optimization problem to jointly optimize PoCD and execution cost in different strategies, and develop an algorithmic solution that is guaranteed to be optimal. Chronos is prototyped on Hadoop MapReduce and evaluated against three baseline strategies using both experiments and trace-driven simulations, achieving 50 with up to 80
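
To make the PoCD-versus-cost tradeoff concrete, here is a minimal sketch for a Clone-style strategy. It is not the paper's derivation: the abstract states neither the task-time distribution nor the closed-form expressions. The sketch assumes that attempt times are independent Pareto variables with minimum `t_min` and shape `beta`, that every task starts `extra_attempts + 1` attempts at time zero, and that the remaining attempts are killed as soon as the first one finishes. The function names and all parameters are hypothetical.

```python
import random


def pocd_clone(n_tasks, extra_attempts, deadline, t_min, beta):
    """PoCD under the assumptions stated above (not the paper's formula).

    A task misses the deadline only if all of its attempts are late, and
    the job meets the deadline only if every task does.
    """
    if deadline <= t_min:
        return 0.0  # no attempt can finish before the minimum task time
    p_attempt_late = (t_min / deadline) ** beta  # Pareto tail probability
    p_task_late = p_attempt_late ** (extra_attempts + 1)
    return (1.0 - p_task_late) ** n_tasks


def simulate_clone(n_tasks, extra_attempts, deadline, t_min, beta,
                   trials=20000, seed=0):
    """Monte Carlo estimate of (PoCD, mean machine time per job).

    Assumption: machine time per task is (attempts) x (time of the
    fastest attempt), because the slower attempts are killed then.
    """
    rng = random.Random(seed)
    attempts = extra_attempts + 1
    met = 0
    total_machine_time = 0.0
    for _ in range(trials):
        job_finish = 0.0
        for _ in range(n_tasks):
            # random.paretovariate has minimum 1, so scale it by t_min.
            first = min(t_min * rng.paretovariate(beta)
                        for _ in range(attempts))
            job_finish = max(job_finish, first)
            total_machine_time += attempts * first
        if job_finish <= deadline:
            met += 1
    return met / trials, total_machine_time / trials


if __name__ == "__main__":
    for r in range(4):
        exact = pocd_clone(10, r, deadline=4.0, t_min=1.0, beta=2.0)
        est, cost = simulate_clone(10, r, deadline=4.0, t_min=1.0, beta=2.0)
        print(f"r={r} closed-form PoCD={exact:.4f} "
              f"simulated PoCD={est:.4f} mean machine time={cost:.2f}")
```

Under these assumptions, adding attempts raises the PoCD while changing the machine time consumed, which is the kind of tradeoff an optimizer would balance when choosing the number of attempts.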

research
06/05/2020

Skedulix: Hybrid Cloud Scheduling for Cost-Efficient Execution of Serverless Applications

We present a framework for scheduling multifunction serverless applicati...
research
04/12/2020

QoS-Driven Job Scheduling: Multi-Tier Dependency Considerations

For a cloud service provider, delivering optimal system performance whil...
research
11/10/2020

Scheduling Bag-of-Tasks in Clouds using Spot and Burstable Virtual Machines

Leading Cloud providers offer several types of Virtual Machines (VMs) in...
research
11/05/2021

SLA-Driven Load Scheduling in Multi-Tier Cloud Computing: Financial Impact Considerations

A cloud service provider strives to provide a high Quality of Service (Q...
research
02/21/2022

Non-Clairvoyant Scheduling with Predictions Revisited

In non-clairvoyant scheduling, the task is to find an online strategy fo...
research
12/31/2021

BatchLens: A Visualization Approach for Analyzing Batch Jobs in Cloud Systems

Cloud systems are becoming increasingly powerful and complex. It is high...
research
11/09/2020

TrimTuner: Efficient Optimization of Machine Learning Jobs in the Cloud via Sub-Sampling

This work introduces TrimTuner, the first system for optimizing machine ...
