Modeling Constrained Preemption Dynamics Of Transient Cloud Servers

11/12/2019
by Prateek Sharma, et al.

In this paper, we conduct a first-of-its-kind empirical study and statistical analysis of the preemption behavior of Google's Preemptible VMs, whose distinguishing characteristic is a maximum lifetime of 24 hours. This temporal constraint makes preemption modeling challenging, since existing memoryless models are not applicable. We develop a new probability model of constrained preemptions based on a large-scale empirical study of over 1,500 VM preemptions. We place our preemption probability model in the framework of reliability theory and use insights from statistical mechanics to understand the general nature of constrained preemptions. To highlight the effectiveness of our model, we develop optimized job scheduling and checkpointing policies for constrained preemptions. Compared to existing preemption modeling techniques, our model-based policies can reduce the running time of jobs on preemptible VMs by up to 5× and reduce the probability of job failure by more than 2×. We also implement our policies as part of a batch computing service, which can reduce cost by 5× compared to conventional cloud deployments.

research
12/06/2019

Scheduling in the Presence of Data Intensive Compute Jobs

We study the performance of non-adaptive scheduling policies in computi...
research
08/04/2023

A Deep Dive into the Google Cluster Workload Traces: Analyzing the Application Failure Characteristics and User Behaviors

Large-scale cloud data centers have gained popularity due to their high ...
research
01/22/2022

Scheduling Policies for Stability and Optimal Server Running Cost in Cloud Computing Platforms

We propose throughput and cost optimal job scheduling algorithms in clou...
research
02/11/2022

Incentive Compatible Queues Without Money

For job scheduling systems, where jobs require some amount of processing...
research
08/06/2020

Learning Insulin-Glucose Dynamics in the Wild

We develop a new model of insulin-glucose dynamics for forecasting blood...
research
01/03/2022

Balanced Nonadaptive Redundancy Scheduling

Distributed computing systems implement redundancy to reduce the job com...
