Modeling Constrained Preemption Dynamics Of Transient Cloud Servers

11/12/2019 ∙ by Prateek Sharma, et al. ∙ Indiana University

In this paper, we conduct a first-of-its-kind empirical study and statistical analysis of the preemption behavior of Google's Preemptible VMs, whose distinguishing characteristic is a maximum lifetime of 24 hours. This temporal constraint introduces many challenges in preemption modeling, since existing memoryless models are not applicable. We introduce and develop a new probability model of constrained preemptions that is based on a large scale empirical study of over 1,500 VM preemptions. We place our preemption probability model in the framework of reliability theory and use insights from statistical mechanics to understand the general nature of constrained preemptions. To highlight the effectiveness of our model, we develop optimized policies for job scheduling and checkpointing for constrained preemptions. Compared to existing preemption modeling techniques, our model-based policies can reduce the running time of jobs on preemptible VMs by up to 5×, and reduce the probability of job failure by more than 2×. We also implement our policies as part of a batch computing service, which can reduce costs by 5× compared to conventional cloud deployments.


I Introduction

Transient cloud computing is an emerging and popular resource allocation model used by all major cloud providers, which allows unused capacity to be offered at low cost as preemptible virtual machines. Transient VMs can be unilaterally revoked and preempted by the cloud provider, and applications running inside them face fail-stop failures. Due to their volatile nature, transient VMs are offered at steeply discounted rates. Amazon EC2 spot instances spo (2013), Google Cloud Preemptible VMs pre , and Azure Low-priority Batch VMs azu are all examples of transient VMs, and are offered at discounts ranging from 50 to 90% compared to conventional, non-preemptible “on-demand” VMs.

To expand the usability and appeal of transient VMs, many systems and techniques have been proposed that seek to ameliorate the effects of preemptions and reduce the computing costs of applications. Fault-tolerance mechanisms Sharma et al. (2015); Marathe et al. (2014), resource management policies Sharma et al. (2017); Wieder et al. (2012), and cost optimization techniques Dubois and Casale (2016); Shastri and Irwin (2017) have been proposed for a wide range of applications, including interactive web services, distributed data processing, and parallel computing. These techniques have been shown to minimize the performance degradation and downtimes due to preemptions, and to reduce computing costs by up to 90%.

However, the success of these techniques depends on probabilistic estimates of when and how frequently preemptions occur. For instance, many fault-tolerance and resource optimization policies are parametrized by the mean time to failure (MTTF) of the transient VMs. A commonly used technique in transient computing is to periodically checkpoint application state, and the “optimal” checkpointing frequency that minimizes the total expected running time of a job depends on the MTTF of the VMs Daly (2006).

Past work on transient computing has focused on Amazon EC2’s spot instances, whose preemption characteristics are determined by dynamic prices (which are in turn set using a continuous second-price auction Ben-Yehuda et al. (2013)). Transiency-mitigation techniques such as VM migration Sharma et al. (2015), checkpointing Sharma et al. (2016a); Marathe et al. (2014), and diversification Sharma et al. (2017) all use price signals to model the availability and preemption rates of spot instances. However, these pricing-based models do not generalize to other transient VMs having a flat price (such as Google’s or Azure’s offerings). Moreover, no information about preemption characteristics is publicly available for these offerings, not even coarse-grained metrics such as MTTFs. This lack of information and understanding about preemption behavior precludes most failure modeling and transient computing optimizations.

To address this gap, we seek to understand the preemption characteristics of Google’s Preemptible VMs, whose distinguishing characteristic is that they have a maximum lifetime of 24 hours. We conduct a large empirical study of over 1,500 preemptions of Google Preemptible VMs, and develop an analytical probability model of preemptions. We find that the temporal constraint is a radical departure from pricing-based preemptions, and presents fundamental challenges in preemption modeling and their effective use.

Due to the temporal constraint on preemptions, classical models that form the basis of preemption modeling and policies, such as memoryless exponential failure rates, are not applicable. We find that preemption rates are not uniform, but bathtub shaped with multiple distinct temporal phases, and cannot be modeled by existing bathtub distributions such as Weibull. We capture these characteristics by developing a new probability model. Our model uses reliability theory principles to capture the 24-hour lifetime of VMs, and generalizes to VMs of different resource capacities, geographical regions, and across different temporal domains. To the best of our knowledge, this is the first work on constrained preemption modeling. Our investigation also points to an interesting connection to statistical mechanics (the Tonks gas model Tonks (1936)), which can be leveraged to obtain fresh insights for modeling temporally constrained preemptions.

We show the applicability and effectiveness of our model by developing optimized policies for job scheduling and checkpointing. These policies are fundamentally dependent on empirical and analytical insights from our model such as different time-dependent failure rates of different types of VMs. These optimized policies are a building block for transient computing systems and reducing the performance degradation and costs of preemptible VMs. We implement and evaluate these policies as part of a batch computing service, which we also use for empirically evaluating the effectiveness of our model and policies under real-world conditions.

Towards our goal of developing a better understanding of constrained preemptions, we make the following contributions:

  1. We conduct a large-scale, first-of-its-kind empirical study of preemptions of Google’s Preemptible VMs. We then show a statistical analysis of preemptions based on the VM type, temporal effects, geographical regions, etc. Our analysis indicates that the 24-hour constraint is a defining characteristic, and that the preemption rates are not uniform, but have distinct phases.

  2. We develop a probability model of constrained preemptions based on empirical and statistical insights that point to distinct failure processes underpinning the preemption rates. Our model captures the key effects resulting from the 24 hour lifetime constraint associated with these VMs, and we analyze it through the lens of reliability theory and statistical mechanics.

  3. Based on our preemption model, we develop optimized policies for job scheduling and checkpointing that minimize the total time and cost of running applications. These policies reduce job running times by up to 5× compared to existing preemption models used for transient VMs.

  4. We implement and evaluate our policies as part of a batch computing service for Google Preemptible VMs. Our service is especially suitable for scientific simulation applications, and can reduce computing costs by 5× compared to conventional cloud deployments, and reduce job failure probability by more than 2×.

II Background

We now give an overview of transient cloud computing, and the use of preemption models in transient computing systems.

II.1 Transient Cloud Computing

Infrastructure as a service (IaaS) clouds such as Amazon EC2, Google Public Cloud, Microsoft Azure, etc., typically provide computational resources in the form of virtual machines (VMs), on which users can deploy their applications. Conventionally, these VMs are leased on an “on-demand” basis: cloud customers can start up a VM when needed, and the cloud platform provisions and runs these VMs until they are shut down by the customer. Cloud workloads, and hence the utilization of cloud platforms, show large temporal variation. To satisfy user demand, cloud capacity is typically provisioned for the peak load, and thus the average utilization tends to be low, of the order of 25% Verma et al. (2015); Cortez et al. (2017).

To increase their overall utilization, large cloud operators have begun to offer their surplus resources as low-cost servers with transient availability, which can be preempted by the cloud operator at any time (after a small advance warning). These preemptible servers, such as Amazon Spot instances ec2 , Google Preemptible VMs pre , and Azure batch VMs azu , have become popular in recent years due to their discounted prices, which can be 7-10× lower than those of conventional non-preemptible servers. Due to their popularity among users, smaller cloud providers such as Packet pac and Alibaba ali have also started offering transient cloud servers.

However, effective use of transient servers is challenging for applications because of their uncertain availability Singh et al. (2014). Preemptions are akin to fail-stop failures, and result in loss of the application memory and disk state, leading to downtimes for interactive applications such as web services, and poor throughput for batch-computing applications. Consequently, researchers have explored fault-tolerance techniques such as checkpointing Sharma et al. (2016a); Marathe et al. (2014); Subramanya et al. (2015) and resource management techniques Sharma et al. (2017) to ameliorate the effects of preemptions. The effect of preemptions depends on the application’s delay insensitivity and fault model, and mitigating preemptions for different applications remains an active research area Joaquim et al. (2019).

II.2 Modeling Preemptions of Transient VMs

Underlying all techniques and systems in transient computing is the notion of using some probabilistic, or even deterministic, model of their preemptions. Such a preemption model is then used to quantify and analyze the impact of preemptions on application performance and availability, and to design model-informed policies that minimize the effect of preemptions. For example, the preemption rate or MTTF (Mean Time To Failure) of transient servers has found extensive use in selecting the appropriate type of transient server for applications Sharma et al. (2017); Subramanya et al. (2015), determining the optimal checkpointing frequency Sharma et al. (2016a); Marathe et al. (2014); Harlap et al. (2017); Ghit and Epema (2017), etc.

However, all prior work on transient computing has exclusively focused on Amazon’s EC2 spot instances. Launched in 2009, spot instances are the first example of transient cloud servers, and their low price (often 90% cheaper than equivalent on-demand instances) provided the motivation to develop optimized policies for reducing the impact of preemptions and the overall cost.

The preemptions of EC2 spot instances are based on their price, which is dynamically adjusted based on the supply and demand of cloud resources. Spot prices are based on a continuous second-price auction, and if the spot price increases above a pre-specified maximum price, then the server is preempted Ben-Yehuda et al. (2013). Thus, the time-series of spot prices can be used for understanding preemption characteristics such as the frequency of preemptions and the “Mean Time To Failure” (MTTF) of the spot instances. Publicly available historical spot prices (Amazon posts spot prices for the trailing 3 months, and researchers have been collecting these prices since 2010 Javadi et al. (2011)) have been used to characterize and model spot instance preemptions Sharma et al. (2015); Zheng et al. (2015); Shastri et al. (2016); Wolski and Brevik (2016). For example, past work has analyzed spot prices and shown that the MTTFs of spot instances of different hardware configurations and geographical zones range from a few hours to a few days Wolski et al. (2017a); Ouyang et al. (2016); Wolski and Brevik (2016); Baughman et al. (2018); Wolski et al. (2017b).

However, using pricing information for preemption modeling is not a generalizable approach and is not applicable to other types of transient cloud VMs such as Google Preemptible VMs and Azure Low-priority batch VMs. These VMs have flat pricing, and thus pricing cannot be used to infer preemptions, unlike in the case of EC2. Moreover, these cloud providers (Google and Azure) do not expose any public information about their preemption characteristics.

The total lack of information about preemption characteristics precludes the use of the vast array of optimizations and systems that have been developed to make transient computing more appealing to different kinds of applications. Therefore, in this paper, we seek to develop the first empirical model of preemptions of Google Preemptible VMs pre . Our empirical data and preemption model allow the development of preemption mitigation policies.

Google Preemptible VMs have a maximum lifetime of 24 hours, and this constrained preemption introduces new challenges in preemption modeling. Past work on failure modeling of EC2 spot instances has assumed preemptions to be memoryless and follow the exponential distribution Zheng et al. (2015); Sharma et al. (2016b, a); Ghit and Epema (2017). However, the 24 hour constraint precludes such memoryless assumptions and, as we see in the next section, requires new modeling techniques.

III Constrained Preemptions of Google Preemptible VMs

In this section, we first present an empirical analysis of preemptions of Google Preemptible VMs, and then develop a new probability model based on our observations. Finally, we discuss the unique aspects and general characteristics of constrained preemptions using reliability theory and statistical mechanics.

III.1 Empirical Study Of Preemptions

Figure 1: CDF of lifetimes of Google Preemptible VMs. Our proposed distribution for modeling the constrained preemption dynamics provides a better fit to the empirical data compared to other failure distributions. Inset shows the probability density functions.

To understand the nature of temporally constrained preemptions, we conducted the first empirical study of Google’s Preemptible VMs, which have a fixed price and a maximum 24 hour lifetime. Our empirical study is necessitated by the fact that the cloud operator (Google) does not disclose any other information about the preemption rates; thus relatively little is known about the preemptions of these VMs and, as a result, about their performance.

We launched 1,516 Google Preemptible VMs of different types over a two month period (Feb–April 2019), and measured their time to preemption (i.e., their useful lifetime). To ensure the generality of our empirical observations, VMs of different resource capacities were launched in four geographical regions; during days and nights and on all days of the week; and running different workloads. A sample of over 100 such preemption events is shown in Figure 1, which shows the cumulative distribution function (CDF) of the VM lifetimes of the n1-highcpu-16 VM in the us-east1-b zone. Note that the cloud operator (Google) caps the maximum lifetime of the VM to 24 hours, and all the VMs are preempted before that limit.

(a) Preemption characteristics of different VM types. Larger VMs are more likely to be preempted.
(b) Variations due to time of day and workload.
(c) n1-highcpu-16 in different regions.
Figure 5: Analysis of preemption characteristics by VM-type, region, time-of-day, and workload type.

Observation 1: The lifetimes of VMs are not uniformly distributed, but have three distinct phases.

In the first (initial) phase, characterized by VM lifetimes below about 3 hours, we observe that many VMs are quickly preempted after they are launched, and thus have a steep rate of failure. The rate of failure (preemption rate) is the derivative of the CDF. In the second phase, VMs that survive past 3 hours enjoy a relatively low preemption rate over a broad range of lifetimes (characterized by the slowly rising CDF in Figure 1). The third and final phase exhibits a steep increase in the number of preemptions as the preemption deadline of 24 hours approaches. The overall rate of preemptions is “bathtub” shaped, as shown by the solid black line in the inset of Figure 1 (discussed in detail below).

Observation 2: The preemption behavior, imposed by the constraint of the 24 hour lifetime, is substantially different from conventional failure characteristics of hardware components and EC2 spot instances.

In “classical” reliability analysis, the time to failure usually follows an exponential distribution with CDF $F(t) = 1 - e^{-\lambda t}$, where $\lambda = 1/\text{MTTF}$. Figure 1 shows the CDF of the exponential distribution when fitted to the observed preemption data, by finding the distribution parameter $\lambda$ that minimizes the least squares error. The classic exponential distribution is unable to model the observed preemption behavior because it assumes that the rate of preemptions is independent of the lifetime of the VMs, i.e., the preemptions are memoryless. This assumption breaks down when there is a fixed upper bound on the lifetime.

Observation 3: The three preemption phases and associated bathtub shaped preemption probability are general, universal characteristics of Preemptible VMs.

In general, the preemption dynamics of a VM are determined by the supply and demand of VMs of that particular type. Thus, our empirical study looked at preemptions of VMs of different sizes, in different geographical zones, at different times of the day, and running different workloads (Figure 5). In all cases, we find that there are three distinct phases associated with the preemption dynamics giving rise to the bathtub shaped preemption probability. We argue that this is not a coincidence, but may be a result of practical and fundamental outcomes of cluster management policies.

While the specific preemption policy is up to the cloud operator, we will show that the bathtub behavior has benefits for applications. For applications that do not incorporate explicit fault-tolerance (such as checkpointing), early preemptions result in less wasted work than if the preemptions were uniformly distributed over the 24 hour interval. Furthermore, the low rate of preemptions in the middle period allows jobs that are shorter than 24 hours to finish execution with only a low probability of failure, once they survive the initial preemption phase. We evaluate the performance of applications with bathtub shaped preemptions in Section VI. In addition to being beneficial to applications, we also conjecture that the bathtub behavior may be a fundamental and general characteristic of constrained preemptions, which we show later in Section III.3.

Observation 4: Larger VMs have a higher rate of preemptions.

Figure 5(a) shows the preemption data from five different types of VMs in the Google Cloud, n1-highcpu-{2,4,8,16,32}, where the number indicates the number of CPUs. All VMs are running in the us-central1-c zone. We see that the larger VMs (16 and 32 CPUs) have a higher probability of preemption compared to the smaller VMs. While this could simply be due to higher demand for larger VMs, it can also be explained from a cluster management perspective. Larger VMs require more computational resources (such as CPU and memory), and when the supply of resources is low, the cloud operator can quickly reclaim a large amount of resources by preempting larger VMs. This observed behavior aligns with the guidelines for using preemptible VMs, which suggest the use of smaller VMs when possible pre .

Observation 5: Preemptions exhibit diurnal variations, and are also affected by the workload inside the VM.

From Figure 5(b), we can see that VMs have a slightly longer lifetime during the night (8 PM to 8 AM) than during the day. This is expected because, fundamentally, the preemption rates are higher during periods of higher demand. We also notice that completely idle VMs have longer lifetimes than VMs running some workload. Presumably, this is because the lower resource utilization of idle VMs is more amenable to resource overcommitment, resulting in fewer preemptions.

Figure 6: QQ plot of different preemption models. Weibull and Gompertz-Makeham can model preemptions up to CDF=0.5, but not over the entire range.

III.2 Failure Probability Model

We now develop an analytical probability model for finding a preemption at time $t$ (the preemption dynamics) that is faithful to the empirically observed data and provides a basis for developing running-time and cost-minimizing optimizations. Modeling preemptions constrained by a finite deadline raises many challenges for existing preemption models that have been used for other transient servers such as EC2 spot instances. We first discuss why existing approaches to preemption modeling are not adequate, and then present our closed-form probability model and associated reliability theory connections.

III.2.1 Inadequacy of existing failure distributions

Spot instance preemptions have been modeled using the exponential distribution Zheng et al. (2015); Sharma et al. (2016b, a), which is the default in most reliability theory applications. However, the strict 24 hour constraint and the distinct preemption phases are not compatible with the memoryless property of the exponential distribution. To describe failures (preemptions) that are not memoryless (i.e., that have an increasing or decreasing failure rate over time), the classic Weibull distribution with CDF $F(t) = 1 - e^{-(t/\lambda)^k}$ is often employed. However, the Weibull distribution is also unable to fit the empirical data (Figure 1), and is especially unable to model the sharp increase in preemptions near the 24 hour deadline.

For constrained preemptions, the increase in failure rate as modeled by the Weibull distribution is not high enough. Other distributions, such as Gompertz-Makeham, have also been used for modeling bathtub behavior, especially for actuarial use-cases Missov and Lenart (2013). The key idea is to incorporate an exponential aging process, which is used to model human mortality. The CDF of the Gompertz-Makeham distribution is given by $F(t) = 1 - \exp\left(-\lambda t - \frac{\eta}{b}\left(e^{bt} - 1\right)\right)$; when fitted to the data in Figure 1, it is also unable to provide a good model for the observed preemption data.

The non-trivial bathtub-shaped failure rate of Google Preemptible VMs (Figure 1) requires models that capture the sudden onset of the rise in preemptions near the deadline, which is challenging for existing failure distributions because of the sharp inflection point. From an application and transiency policy perspective, the preemption model must provide insights about the phase transitions, so that the application can adapt to the sharp differences in preemption rates. For example, the preemption model should be able to warn applications about the impending deadline, which existing failure distributions cannot account for. Thus, it is important not only to minimize the total distribution fitting error, but also to capture the changes in phase. However, as we can see from the QQ plots in Figure 6, existing distributions are unable to capture the effects of the deadline and all the phases of the preemptions, and a new modeling approach is needed, which we develop next.

III.2.2 Our model

Our failure probability model seeks to address the drawbacks of existing reliability theory models for modeling constrained preemptions. The presence of three distinct phases exhibiting non-differentiable transition points (sudden changes in CDF near the deadline, for example) suggests that for accurate results, models that treat the probability as a step function (CDF as a piecewise-continuous function) could be employed. However, this limits the range of model applicability and general interpretability of the underlying preemption behavior. Our goal is to provide a broadly applicable, continuously differentiable, and informative model built on reasonable assumptions.

We begin by making a key assumption: the preemption behavior arises from the presence of two distinct failure processes. The first process dominates over the initial temporal phase and yields the classic exponential distribution that captures the high rate of early preemptions. The second process dominates over the final phase near the 24 hour maximum VM lifetime and is assumed to be characterized by an exponential term that captures the sharp rise in preemptions that results from this constrained lifetime.

Based on these observations, we propose the following general form for the CDF:

$F(t) = C\left[\left(1 - e^{-\lambda_1 t}\right) + e^{\lambda_2 (t - \tau)}\right]$   (1)

where $t$ is the time to preemption, $\lambda_1$ is the rate of preemptions in the initial phase, $\lambda_2$ is the rate of preemptions in the final phase, $\tau$ denotes the time that characterizes the “activation” of the final phase where preemptions occur at a very high rate, and $C$ is a scaling constant. The model is fit to data for $0 \le t \le T$, where $T = 24$ hours represents the temporal interval (deadline). Combinations of the 4 fit parameters ($\lambda_1$, $\lambda_2$, $\tau$, and $C$) are chosen to ensure that the boundary condition $F(T) = 1$ is satisfied. In practice, typical fits place the activation time $\tau$ within a few hours of the 24-hour deadline, with $\lambda_2 \gg \lambda_1$.

For most of its life, a VM sees failures according to the classic exponential distribution with a rate of failure equal to $\lambda_1$; this behavior is captured by the $C\left(1 - e^{-\lambda_1 t}\right)$ term in Equation 1. As VMs get closer to their maximum lifetime imposed by the cloud operator, they are reclaimed (i.e., preempted) at a high rate $\lambda_2$, which is captured by the second exponential term, $C e^{\lambda_2 (t - \tau)}$, of Equation 1. Shifting the argument ($t$) of this term by $\tau$ ensures that the exponential reclamation is only applicable near the end of the VM’s maximum lifetime and does not dominate over the entire temporal range.

The analytical model and the associated distribution function introduced above provide a much better fit to the empirical data (Figure 1) and capture the different phases of the preemption dynamics through the parameters $\lambda_1$, $\lambda_2$, $\tau$, and $C$. These parameters can be obtained for a given empirical CDF using least squares function fitting methods (we use scipy’s optimize.curve_fit with the dogbox technique sci ). The failure or preemption rate can be derived from this CDF as:

$f(t) = \frac{dF}{dt} = C\left[\lambda_1 e^{-\lambda_1 t} + \lambda_2 e^{\lambda_2 (t - \tau)}\right]$   (2)

Plotting $f(t)$ vs. $t$ yields a bathtub-type failure rate function for the associated fit parameters (inset of Figure 1).
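As a concrete illustration, the sketch below fits the CDF of Equation 1 to a set of VM lifetimes using scipy’s optimize.curve_fit with the dogbox technique, as described above. The synthetic data, the “true” parameter values, and the initial guesses are purely illustrative stand-ins for our measured fits.

```python
import numpy as np
from scipy.optimize import curve_fit

T = 24.0  # maximum VM lifetime (hours)

def cdf_model(t, lam1, lam2, tau, C):
    """Eq. 1: early exponential failures plus a deadline-activated rise."""
    return C * ((1.0 - np.exp(-lam1 * t)) + np.exp(lam2 * (t - tau)))

# Synthetic lifetimes drawn from the model itself (a stand-in for empirical
# measurements) via inverse-transform sampling on a dense grid. The "true"
# parameters below are illustrative; C is chosen so that F(T) is close to 1.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, T, 10_000)
true = (0.3, 3.0, 23.7, 0.29)
lifetimes = np.sort(np.interp(rng.uniform(0, cdf_model(T, *true), 1500),
                              cdf_model(grid, *true), grid))
ecdf = np.arange(1, len(lifetimes) + 1) / len(lifetimes)

# Least-squares fit with the 'dogbox' technique mentioned in the text;
# bounds keep the rates positive and the activation time inside [0, T].
params, _ = curve_fit(cdf_model, lifetimes, ecdf,
                      p0=[0.5, 2.0, 22.0, 0.5],
                      bounds=([0.0, 0.0, 0.0, 0.0],
                              [np.inf, np.inf, T, 1.0]),
                      method="dogbox")
lam1, lam2, tau, C = params
```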

In the absence of any prior work on constrained preemption dynamics, our aim is to provide an interpretable model with a minimal number of parameters, that provides a sufficiently accurate characterization of observed preemptions data. Further generalization of this model to include more failure processes would introduce more parameters and reduce the generalization power.

III.2.3 Reliability Analysis

We now analyze and place our model in a reliability theory framework.

Expected Lifetime: Our analytical model also helps crystallize the differences in VM preemption dynamics, by allowing us to easily calculate their expected lifetime. More formally, we define the expected lifetime of a VM, $E[L]$, as:

$E[L] = \int_0^T t \, f(t) \, dt$   (3)

where $f(t)$ is the rate of preemptions of the VM (Equation 2).

This expected lifetime can be used in lieu of MTTF, for policies and applications that require a “coarse-grained” comparison of the preemption rates of servers of different types, which has been used for cost-minimizing server selection Sharma et al. (2016a).
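Continuing the earlier sketch, the expected lifetime of Equation 3 can be computed by numerically integrating $t f(t)$ over $[0, T]$; pdf_model below is an illustrative implementation of Equation 2, and the function names are our own.

```python
from scipy.integrate import quad
import numpy as np

def pdf_model(t, lam1, lam2, tau, C):
    """Eq. 2: f(t) = dF/dt for the CDF of Eq. 1."""
    return C * (lam1 * np.exp(-lam1 * t) + lam2 * np.exp(lam2 * (t - tau)))

def expected_lifetime(lam1, lam2, tau, C, T=24.0):
    """Eq. 3: numerically integrate t * f(t) over [0, T]."""
    val, _ = quad(lambda t: t * pdf_model(t, lam1, lam2, tau, C), 0.0, T)
    return val  # hours; usable in lieu of MTTF for coarse VM comparisons
```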

Hazard Rate: The hazard rate governs the dynamics of the failure (or survival) processes. It is generally defined as $h(t) = f(t)/S(t)$, often expressed via the following differential equation (rate law):

$\frac{dS(t)}{dt} = -h(t) \, S(t)$   (4)

where $S(t) = 1 - F(t)$ is the survival function associated with a CDF $F(t)$, and $f(t)$ is the failure probability function (rate) at time $t$. The survival function indicates the fraction of VMs that have survived until time $t$. The hazard rate can also be directly expressed in terms of the CDF as follows: $h(t) = F'(t)/(1 - F(t))$. The exponential distribution has a constant hazard rate $h(t) = \lambda$. The Gompertz-Makeham distribution has an increasing failure rate to account for the increase in mortality, and its hazard rate is accordingly non-uniform and given by $h(t) = \lambda + \eta e^{bt}$.

Since we model multiple failure rates and deadline-driven preemptions, our hazard rate is expected to increase with time. Defining the survival function for our model, $S(t) = 1 - F(t) = 1 - C\left[\left(1 - e^{-\lambda_1 t}\right) + e^{\lambda_2 (t - \tau)}\right]$, and using Eq. 4 yields the hazard rate associated with our model:

$h(t) = \frac{C\left[r_1(t) + r_2(t)\right]}{S(t)}$   (5)

where we have introduced $r_1(t) = \lambda_1 e^{-\lambda_1 t}$ and $r_2(t) = \lambda_2 e^{\lambda_2 (t - \tau)}$ to denote the rates of preemptions associated with the initial and final phases respectively.

Recall that the sharp increase in preemption rate only happens close to the deadline, which means that $\tau \gg 1/\lambda_2$. Thus, when $t \ll \tau$, we get $h(t) \approx C \, r_1(t)/S(t)$, mimicking the hazard rate for the classic exponential distribution. As $t$ approaches and exceeds $\tau$ (i.e., $t \gtrsim \tau$), the increase in the hazard rate due to the second failure process kicks in, accounting for the deadline-driven rise in preemptions. Note that our hazard rate satisfies $h(t) \ge 0$ for $0 \le t \le T$.
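For completeness, a one-function sketch of the hazard rate of Equation 5, built from the cdf_model and pdf_model helpers of the earlier sketches:

```python
def hazard_rate(t, lam1, lam2, tau, C):
    """Eq. 5: h(t) = f(t) / S(t), with S(t) = 1 - F(t)."""
    S = 1.0 - cdf_model(t, lam1, lam2, tau, C)   # survival function
    return pdf_model(t, lam1, lam2, tau, C) / S  # bathtub shaped over [0, T]
```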

III.3 Insights on the bathtub shaped distribution

For constrained preemptions, one might expect to see uniformly distributed preemptions with a probability $1/T$ over $[0, T]$. However, as our empirical analysis shows, the preemption distribution is bathtub shaped. Interestingly, we can show using exact analytical arguments that non-uniform, bathtub distributions are in fact a general characteristic of systems with constrained preemptions, modulo some assumptions.

Lemma: Consider $N$ randomly distributed preemptions over an interval $[0, T]$. Assume that each preemption takes $\delta$ time-units to perform, and preemptions cannot overlap, i.e., they occur in a mutually exclusive manner. Then, there exists $t^*$ such that $p(t^*) > 1/T$, where $p(t)$ is the probability of finding a preemption at time $t$.

We first make some preliminary remarks and introduce concepts necessary to complete the proof.

Firstly, mutual exclusion of preemptions implies that there is a finite non-zero waiting time between preemptions. For $N$ preemptions to occur within the interval, evidently, we must have $N\delta \le T$. Also, while $\delta > 0$, the time to perform a preemption is generally expected to be much smaller than the total time interval, $\delta \ll T$. The $N$ preemptions occupy a “temporal volume” of $N\delta$ (volume here represents the one-dimensional volume). We assume that while a preemption may start as late as $T - \delta$, the last preemption must finish by $T$. Thus, the amount of free or excluded “temporal volume” available within the constrained system is $V = T - N\delta$. The idea of excluded volume is central in physics and materials engineering, where it underpins the origin of entropic or steric forces in material systems Krauth (2006); Jing et al. (2015).

Secondly, we note that the system of $N$ preemptions within a constrained deadline of interval $T$ maps exactly to a well known and analytically solvable system in classical statistical mechanics, the Tonks gas model Tonks (1936), where one considers a system of hard spheres of diameter $\delta$ moving along a line segment of length $T$. The structural quantities associated with this system, including the probability of finding a sphere at position $x$ within the interval, are computed by evaluating the partition function of the system, which essentially measures the number of valid system configurations Krauth (2006). Employing this mapping and the associated statistical mechanics tools, the original model of non-overlapping (interacting) preemptions can be mapped to a system of overlapping (non-interacting) preemptions, each allowed to access an excluded volume of $V = T - N\delta$, and the number of valid configurations is given by the partition function $Z(N) = V^N/N!$. For the case of $N$ preemptions, we have $Z(N) = (T - N\delta)^N/N!$.

We are interested in calculating the probability that a preemption starts at time $t = T - \delta$, i.e., $p(T - \delta)$. Given that the time to perform the preemption is generally expected to be much smaller than the total time interval, $p(T - \delta)$ is the probability of finding a preemption near the deadline. The assumption of mutually exclusive preemptions implies that no other preemption can be found for $t > T - \delta$, that is, $p(t > T - \delta) = 0$. Hence, the remaining $N - 1$ preemptions must occur such that the last of those finishes by $T - \delta$ (the preemption at time $T - \delta$ essentially sets an effective deadline for the other preemptions). The number of ways this can happen is given by the partition function $Z(N - 1) = V'^{\,N-1}/(N-1)!$, where $V' = (T - \delta) - (N - 1)\delta = T - N\delta$ is the corresponding excluded temporal volume accessible to each of the $N - 1$ preemptions. It is interesting to note that this excluded volume is the same as that of the original $N$-preemption system: this fortuitous result arises because the decrease in available volume to place the preemptions is commensurate with the need to place $N - 1$ preemptions instead of $N$.

The probability is obtained as the ratio of the valid configurations given by the two partition functions computed above. That is, $p(T - \delta) = Z(N-1)/Z(N) = N/(T - N\delta) > 1/T$, since $N \ge 1$ and $\delta > 0$. Choosing $t^* = T - \delta$ completes the proof.

By symmetry arguments, the above lemma is in fact valid for both end points of the interval, i.e., $p(0) = p(T - \delta) > 1/T$. Thus, the probability of preemption is higher near the end points (deadline) than the average preemption probability of $1/T$, and we get a bathtub shaped distribution. Thus, the bathtub distribution can be considered to be a general artifact of constrained preemptions. Of course, the empirical preemption distribution is determined by the cloud platform’s policies and supply and demand, and we elaborate more about the generality of our model and observation in Section VIII.

For the above proof, we assumed that each preemption event occurs over a timespan of $\delta$, which is determined by the preemption warning that the cloud platform provides (which is 30 seconds for Google Preemptible VMs and 120 seconds for Amazon EC2 spot instances). Preempting a VM and reclaiming its resources involves manipulating the cluster-management state, and mutually exclusive preemptions may be convenient for cluster management, since serializing VM preemptions makes accounting and other cluster operations easier. From an application standpoint, non-overlapping preemptions are also beneficial, since handling multiple concurrent preemptions is significantly more challenging Sharma et al. (2017).
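The lemma can also be checked numerically. The following sketch places $N$ non-overlapping preemptions of width $\delta$ uniformly at random in $[0, T]$ (using the standard free-volume transformation for hard rods) and histograms the start times; the endpoint bins come out elevated relative to the middle, consistent with the bathtub shape. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, delta, trials = 24.0, 4, 0.5, 100_000

starts = np.empty((trials, N))
for i in range(trials):
    # Sample N points uniformly in the free volume T - N*delta, sort them,
    # then add back the excluded widths: a measure-preserving map onto
    # valid non-overlapping configurations (Tonks gas).
    y = np.sort(rng.uniform(0.0, T - N * delta, N))
    starts[i] = y + delta * np.arange(N)

# Start times live in [0, T - delta]; bins of width delta.
hist, _ = np.histogram(starts, bins=47, range=(0.0, T - delta), density=True)
print(f"first bin: {hist[0]:.4f}  middle bin: {hist[23]:.4f}  "
      f"last bin: {hist[-1]:.4f}")  # endpoint bins exceed the middle
```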

IV Application Policies For Constrained Preemptions

Having analyzed the statistical behavior of constrained preemptions and presented our probability model, we now examine how the bathtub shape of the failure rate impacts applications. Based on insights drawn from our statistical analysis and the model, we develop various policies for ameliorating the effects of preemptions. Prior work in transient computing has established the benefits of such policies for a broad range of applications. However, the constrained nature of preemptions introduces new challenges that do not arise in other transient computing environments such as Amazon EC2 spot instances, and thus new approaches are required.

IV.1 Impact On Running Time

When a preemption occurs during the job’s execution, it results in wasted work, assuming there is no checkpointing. This increases the job’s total expected running time, since it must restart after a preemption. The expected wasted work depends on two factors:

  1. The probability of the job being preempted during its execution.

  2. When during the execution the preemption occurs.

We can analyze the wasted work due to a preemption using the failure probability model. We first compute the expected amount of wasted work assuming the job faces a single preemption, which we denote by $W(T_R)$, where $T_R$ is the original job running time (without preemptions):

$W(T_R) = \int_0^{T_R} t \; p(t \mid t < T_R) \, dt$   (6)

where $p(t \mid t < T_R) = f(t)/F(T_R)$. Here, $F(T_R)$ is the probability that there is a preemption within time $T_R$, where $F$ is the CDF of Equation 1. $f(t)$ is the probability of a preemption at time $t$, and is given by the probability distribution function of Equation 2. We can therefore write the above equation as:

$W(T_R) = \frac{1}{F(T_R)} \int_0^{T_R} t \, f(t) \, dt$   (7)

We note that the integral is the same as the “expected lifetime” given by Equation 3 (with the upper limit $T$ replaced by $T_R$). The above expression for the expected waste given a single preemption can be used by users and application frameworks to estimate the increase in running time due to preemptions. The total running time (also known as makespan) of a job with preemptions is given by:

$E[T] = T_R + F(T_R) \, W(T_R)$   (8)

where $W(T_R)$ is given by Equation 7 and $F(T_R)$ by Equation 1. The above equation for $E[T]$ thus becomes:

$E[T] = T_R + \int_0^{T_R} t \, f(t) \, dt$   (9)

This expression for the expected running time assumes that the job will be preempted at most once. An expression that considers the higher-order terms and multiple job failures follows easily from the base case, but has relatively low practical value. The probability of multiple preemptions is low, and most transient computing systems seek to avoid repeated preemptions, discarding the job or moving it to on-demand VMs if multiple preemptions occur.

Consequences for applications: Based on our analysis, both the expected wasted time $W(T_R)$ and the expected running time depend on the length of the job for non-memoryless constrained preemptions. For memoryless exponential distributions, the expected waste reduces to a simple function of the failure rate alone, but this assumption is not valid for constrained preemptions, and thus job lengths must be considered when evaluating the suitability of Preemptible VMs.

Users and transient computing systems can use the expected running time analysis for scheduling and monitoring purposes. Since the preemption characteristics are dependent on the type of the VM and temporal effects, this analysis also allows principled selection of VM types for jobs of a given length. For instance, VMs having a higher initial rate of preemptions are particularly detrimental for short jobs, because the jobs will see high rate of failure and are not long enough to run during the VM’s stable period with low preemption rates. We evaluate the expected wasted time and running time for Google Preemptible VMs later in Section VI.
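As an illustration, Equations 7 and 9 can be evaluated numerically with the model helpers sketched in Section III.2 (cdf_model and pdf_model); the function names here are our own.

```python
from scipy.integrate import quad

def wasted_work(Tr, *params):
    """Eq. 7: expected wasted work given that a single preemption
    occurs within the job's running time Tr (hours)."""
    num, _ = quad(lambda t: t * pdf_model(t, *params), 0.0, Tr)
    return num / cdf_model(Tr, *params)

def expected_makespan(Tr, *params):
    """Eq. 9: E[T] = Tr + integral_0^Tr t f(t) dt (single-preemption model)."""
    num, _ = quad(lambda t: t * pdf_model(t, *params), 0.0, Tr)
    return Tr + num
```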

IV.2 Job Scheduling and VM Reuse Policy

Many cloud-based applications and services are long-running, and typically run a continuous sequence of tasks and jobs on cloud VMs. In the case of deadline-constrained bathtub preemptions, applications face a choice: they can either run a new task on an already running VM, or relinquish the VM and run the task on a new VM. This choice is important in the case of non-uniform failure rates, since the job’s failure probability depends on the “age” of the server. Because of the bathtub failure distribution, VMs enjoy a long period of low failure rates during the middle of their total lifespan. Thus, it is beneficial to reuse VMs for multiple jobs, and relinquishing VMs after every job completion may not be an optimal choice.

However, jobs launched towards the end of a VM’s life face a tradeoff. While they may start during a period of low failure rate, the sharp deadline-imposed increase in preemptions near 24 hours poses a high risk of failure, especially for longer jobs. The alternative is to discard the VM and run the job on a new VM. However, since newly launched VMs also have high preemption rates (and thus high job failure probability), the choice of running the job on an existing server vs. a new server is not obvious.

Our job scheduling policy uses the preemption model to determine the preemption probability of a job of a given length $t$. Assume that the running VM’s age (time since launch) is $a$. Then, the probability of failure on the existing VM is $\frac{F(a+t) - F(a)}{1 - F(a)}$. The intuition is to reuse the VM only if the expected running time is lower than that of running on a new VM. To compute the expected running time of a job of length $t$ starting at VM age $a$, we modify our earlier expression for running time (Equation 9) to:

$E[T \mid a] = t + \frac{1}{1 - F(a)} \int_a^{a+t} (s - a) \, f(s) \, ds$   (10)

The alternative is to discard the VM and launch a new VM, in which case Equation 9 applies. Depending on the VM’s age $a$ and the job’s running time $t$, we can compare Equations 9 and 10, and run the job on whichever option yields the lower expected running time.
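A sketch of this decision rule, reusing the expected_makespan helper above; makespan_on_aged_vm follows the conditional-makespan form of Equation 10, and the function names are illustrative.

```python
from scipy.integrate import quad

def makespan_on_aged_vm(job_len, age, *params):
    """Eq. 10: expected makespan of a job of length job_len (hours)
    started on a VM that has already survived to `age` hours."""
    surv = 1.0 - cdf_model(age, *params)
    lost, _ = quad(lambda s: (s - age) * pdf_model(s, *params),
                   age, age + job_len)
    return job_len + lost / surv

def should_reuse_vm(job_len, age, *params):
    """Reuse the running VM only if its conditional expected makespan
    (Eq. 10) does not exceed that on a freshly launched VM (Eq. 9)."""
    return (makespan_on_aged_vm(job_len, age, *params)
            <= expected_makespan(job_len, *params))
```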

IV.3 Checkpointing Policy

A common technique for reducing the total expected running time of jobs on transient servers is to use fault-tolerance techniques such as periodic checkpointing Sharma et al. (2016a). Checkpointing application state to stable storage (such as network file systems or centralized cloud storage) reduces the amount of wasted work due to preemptions. However, each checkpoint entails capturing, serializing, and writing application state to a disk, and increases the total running time of the application. Thus, the frequency of checkpointing can have a significant effect on the total expected running time.

Existing checkpointing systems for handling hardware failures in high performance computing, and for cloud transient servers such as EC2 spot instances, incorporate the classic Young-Daly Dongarra et al. ; Daly (2006); Sharma et al. (2016a); Marathe et al. (2014) periodic checkpointing interval, which assumes that failures are exponentially distributed. That is, the application is checkpointed every $\sqrt{2 \, c \, \mathrm{MTTF}}$ time units, where $c$ is the time overhead of writing a single checkpoint to disk.
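For reference, a minimal sketch of this classic interval (function and argument names are illustrative):

```python
import math

def young_daly_interval(ckpt_cost, mttf):
    """Classic Young-Daly checkpoint period sqrt(2 * c * MTTF);
    both arguments must use the same time unit (e.g., hours)."""
    return math.sqrt(2.0 * ckpt_cost * mttf)

# e.g., a 1-minute checkpoint cost and a 1-hour MTTF (values from Section VI):
print(young_daly_interval(1.0 / 60.0, 1.0))  # ~0.18 hours, i.e. ~11 minutes
```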

However, checkpointing with a uniform period is sub-optimal in case of time dependent failure rates, and especially for bathtub failure rates. A sub-optimal checkpointing rate can lead to increased recomputation and wasted work, or result in excessive checkpointing overhead. Intuitively, the checkpointing rate should depend on the failure rate, and our analytical preemption model can be used for designing an optimized checkpointing schedule.

We now present our checkpointing policy that uses the preemption model and provides non-uniform, failure-rate dependent checkpointing. In a nutshell, our policy allows us to compute the optimal checkpointing schedule for jobs of different lengths and different starting times, employing a new dynamic programming approach that minimizes the total expected makespan.

Algorithm description: Let the uninterrupted running time of the job be $T_R$. For ease of exposition, we assume that each job-step takes one unit of time, yielding $T_R$ job-steps. Let the checkpoint cost be $c$; i.e., each checkpoint increases the running time by $c$. We seek to minimize the total expected running time, or makespan, which is the sum of $T_R$, the expected periodic checkpointing cost, and the expected recomputation.

The makespan can be recursively defined and computed. Let $M(w, a)$ denote the makespan, where $w$ is the remaining length of the job to be executed, and $a$ is the time elapsed since the VM’s starting time (i.e., the VM’s current age). We now need to determine when to take the next checkpoint, which we take after $k$ job-steps. Let $M^*(w, a)$ denote the minimum expected makespan:

$M^*(w, a) = \min_{1 \le k \le w} M_k(w, a)$   (11)

The makespan is affected by whether or not there is a failure before we take the checkpoint:

$M_k(w, a) = P_s \, M_s + (1 - P_s) \, M_f$   (12)

Here $P_s$ denotes the probability of the job successfully executing without failures until the checkpoint is taken, i.e., from $a$ to $a + k$. $P_s$ is computed using the CDF, and $P_s = \frac{1 - F(a + k)}{1 - F(a)}$.

$M_s$ is the expected makespan if there are no job failures while the job is executing from step $a$ to $a + k$, and is given by a recursive definition:

$M_s = k + c + M^*(w - k,\; a + k + c)$   (13)

Note that the makespan includes the amount of work already done ($k$), the checkpointing overhead ($c$), and the expected minimum makespan of the rest of the job. Similarly, when the job fails before step $k$, that portion is “lost work”, denoted by $L(a, a+k)$: the expected lost work when there is a failure during the time interval $a$ to $a + k$. A failure before the checkpoint results in no progress, and $w$ steps of the job still remain. The expected makespan in the failure case is then given by:

$M_f = L(a, a+k) + M^*(w, 0)$   (14)

In the case of memoryless failures, $L$ is approximated as $k/2$, i.e., half the checkpointing interval. In our case, the lost work is the wasted work that we defined earlier in Equation 7, adjusted for the different start and end times:

$L(a, a+k) = \frac{1}{F(a+k) - F(a)} \int_a^{a+k} (s - a) \, f(s) \, ds$   (15)

where $f$ is the probability density function from Equation 2.

Computing the optimal checkpoint schedule: We can find the minimum makespan by using Equations 11–15. Given a job of length $T_R$, minimizing the total expected makespan involves computing $M^*(T_R, a)$, where $a$ is the current age of the server. Since the makespan is recursively defined, we can do this minimization using dynamic programming, and extract the job-steps at which checkpointing results in a minimum expected makespan. The job’s checkpointing schedule is determined as follows (assume the job starts at $a = 0$ for ease of exposition). We first locate the checkpointing interval $k_1$ that minimizes $M(T_R, 0)$. Then, we recursively find the next checkpointing interval $k_2$ by minimizing $M(T_R - k_1, k_1 + c)$, and so on, until the entire job is covered.

If a job encounters a failure, it is resumed from the most recent checkpoint, on a new VM. After every such resume event, we recompute the optimal checkpointing schedule for the remaining job, since the job’s failure rate depends on the VM age when it starts, and the job may be resumed at a later time or on a VM of a different type. Our algorithm yields non-uniform intervals that track the failure rate: for a 5 hour job launched on a new VM (time=0), the checkpointing intervals are short during the initial high-failure phase and grow longer once the VM enters its stable phase. Further analysis of our algorithm is presented in Section VI.2.2.
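A compact sketch of this dynamic program is shown below, reusing cdf_model and pdf_model from the Section III.2 sketches along with their fitted parameters. The unit-step discretization, the fresh-VM restart (age 0) after a failure, and the algebraic fixed-point step that keeps $M^*(w, 0)$ from recursing into itself are implementation choices for illustration, not necessarily those of our production service.

```python
from functools import lru_cache
from scipy.integrate import quad

STEP = 0.25   # hours per job-step (illustrative discretization)
CKPT = 1      # checkpoint cost c, in job-steps (~15 minutes here)
PARAMS = (lam1, lam2, tau, C)   # fitted parameters from the earlier sketch

def F(step):
    """CDF at a job-step index, clamped away from 1 for numerical safety."""
    return min(cdf_model(min(step * STEP, 24.0), *PARAMS), 1.0 - 1e-9)

def lost_work(a, b):
    """Eq. 15: expected job-steps lost to a failure in step interval (a, b)."""
    num, _ = quad(lambda s: (s / STEP - a) * pdf_model(s, *PARAMS),
                  a * STEP, b * STEP)
    return num / max(F(b) - F(a), 1e-12)

@lru_cache(maxsize=None)
def M_star(w, a):
    """Eq. 11: minimum expected makespan for w remaining steps, VM age a."""
    if w <= 0:
        return 0.0
    best = float("inf")
    for k in range(1, w + 1):                     # next checkpoint after k steps
        p_s = (1.0 - F(a + k)) / (1.0 - F(a))     # survive until the checkpoint
        m_s = k + CKPT + M_star(w - k, a + k + CKPT)   # Eq. 13
        lost = lost_work(a, a + k)
        if a == 0:
            # A failure restarts this same subproblem on a fresh VM; solving
            # the fixed point M = p_s*m_s + (1-p_s)*(lost + M) analytically
            # avoids M_star(w, 0) recursing into itself.
            cand = (p_s * m_s + (1.0 - p_s) * lost) / p_s
        else:
            cand = p_s * m_s + (1.0 - p_s) * (lost + M_star(w, 0))  # Eqs. 12, 14
        best = min(best, cand)
    return best
```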

V Implementing a Batch Computing Service For Preemptible VMs

We have implemented a prototype batch computing service that incorporates our policies for constrained preemptions. We use this service to examine the effectiveness and practicality of our model and policies in real-world settings. Our service is implemented as a light-weight, extensible framework that makes it convenient and cheap to run batch jobs in the cloud. We have implemented our prototype in Python in about 2,000 lines of code, and currently support running VMs on the Google Cloud Platform gcp .

We use a centralized controller (Figure 7), which implements the VM selection and job scheduling policies described in Section IV. The controller can run on any machine (including the user’s local machine, or inside a cloud VM), and exposes an HTTP API to end-users. Users submit jobs to the controller via the HTTP API, which then launches and maintains a cluster of cloud VMs, and maintains the job queue and metadata in a local database.

Our service integrates and interfaces with two primary external services. First, it uses the Google Cloud API gcl for launching, terminating, and monitoring VMs. Once a cluster is launched, it then configures a cluster manager such as Slurm slu or Torque tor , to which it submits jobs. Our service uses the Slurm cluster manager, with each VM acting as a Slurm “cloud” node, which allows Slurm to gracefully handle VM preemptions. The Slurm master node runs on a small, 2 CPU non-preemptible VM, which is shared by all applications and users. We monitor job completions and failures (due to VM preemptions) through the use of Slurm callbacks, which issue HTTP requests back to the central service controller.

Policy Implementation: Our service creates and manages clusters of transient cloud servers, manages all aspects of the VM lifecycle and costs, and implements the model-based policies. It parametrizes the bathtub model based on the VM type, region, time-of-day, and day-of-week. When a new batch job is to be launched, we find a “free” VM in the cluster that is idle, and use the job scheduling policy to determine whether that VM is suitable or a new VM must be launched. Due to the bathtub nature of the failure rate, VMs that have survived the initial failures are “stable” and have a very low rate of failure, and thus are “valuable”. We keep these stable VMs as “hot spares” instead of terminating them, for a period of one hour. For the checkpointing policy, the running time of our dynamic programming algorithm grows polynomially with the job length $T_R$. To minimize this overhead, we precompute the checkpointing schedules of jobs of different lengths, and do not need to compute the checkpoint schedule for every new job.

Figure 7: Architecture and system components of our batch computing service.

Bag of Jobs Abstraction For Scientific Simulations: While our service is intended for general batch jobs, we incorporate a special optimization for scientific simulation workloads that improves the ease-of-use of our service, and also helps in our policy implementation. Our insight is that most scientific simulations involve launching a series of jobs that explore a large parameter space that results from different combinations of physical and computational parameters. These workloads can be abstracted as a “bag of jobs”, with each job running the same application with different parameters. A bag of jobs is characterized by the job and all the different parameters with which it must be executed. Within a bag, jobs show little variation in their running time and execution characteristics.

We allow users to submit entire bags of jobs, which permits us to determine the running time of jobs based on previous jobs in the bag. For constrained preemptions, the running time and checkpointing are determined by job lengths, and the job run time estimates are extremely useful. Having a large sequence of jobs is also particularly useful with bathtub preemptions, since we can re-use “stable” VMs with low preemption probability for running new jobs from a bag. If jobs were submitted one at a time, a batch computing service may have to terminate the VM after job completion, which would increase the job failure probability resulting from running on new VMs that have a high initial failure rate.
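As a hypothetical usage example, a bag of jobs might be submitted to the controller’s HTTP API as follows; the endpoint path, host name, and JSON fields are illustrative assumptions, not the service’s documented interface.

```python
import requests

# A "bag of jobs": one command template swept over a small parameter grid.
bag = {
    "command": "./nanoconfinement --salt-conc {c} --confinement-nm {d}",
    "params": [{"c": c, "d": d} for c in (0.3, 0.5, 0.9) for d in (3, 4, 5)],
    "vm_type": "n1-highcpu-16",
    "zone": "us-east1-b",
}

# Submit the whole bag in one request; the controller queues the jobs and
# schedules them onto (possibly reused) preemptible VMs.
resp = requests.post("http://controller:8080/bags", json=bag, timeout=30)
print(resp.json())  # e.g., a bag id and the number of queued jobs
```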

VI Model and Policy Evaluation

In this section, we present analytical and empirical evaluation of constrained preemptions. We have already presented the statistical analysis of our model in Section III, and we now focus on answering the following questions:

  1. How do constrained preemptions impact the total running time of applications?

  2. What is the effect of our model-based policies when compared to existing transient computing approaches?

  3. What is the cost and performance of our batch computing service for real-world workloads?

Environment and Workloads: All our empirical evaluation is conducted on the Google Public cloud using our batch computing service described in Section V. We use three scientific computing workloads that are representative of typical applications in the broad domains of physics and material sciences:

Nanoconfinement. The nanoconfinement application launches molecular dynamics (MD) simulations of ions in nanoscale confinement created by material surfaces Jing et al. (2015); Kadupitiya et al. (2017).

Shapes. The Shapes application runs an MD-based optimization dynamics to predict the optimal shape of deformable, charged nanoparticles Jadhao et al. (2014); Brunk and Jadhao (2019).

LULESH. Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) is a popular benchmark for hydrodynamics simulations of continuum material models Karlin et al. (2013a, b).

VI.1 Impact of Constrained Preemptions on Job Running Times

(a) Computation wasted due to one preemption.
(b) Expected increase in running time.
Figure 10: Wasted computation and expected increase in running time for uniform vs. bathtub failures. For jobs longer than about 5 hours, the bathtub distribution results in significantly lower wasted computation.

We begin by examining how constrained preemptions impact total job running times. When a preemption occurs during the job’s execution, it results in wasted work, assuming there is no checkpointing. This increases the job’s total expected running time, since it must restart after a preemption. In the case of constrained preemptions, the expected waste depends both on the probability of job preemption and on when the job was preempted.

For a job of length $T_R$, the wasted work, assuming that the job faces a single preemption, is $W(T_R)$, and is given by Equation 7. We first analyze this wasted work for jobs of different lengths in Figure 10(a). We analyze two failure probability distributions for constrained preemptions: a uniform distribution such that $f(t) = 1/T$, and the bathtub shaped distribution with parameters corresponding to the n1-highcpu-16 VM type shown in Figure 1.

For the uniform distribution, the wasted work is linear in the job length, and is given by $W(T_R) = T_R/2$. For the bathtub distribution, the wasted work is given by Equation 7, and is significantly lower, especially for longer jobs (longer than 5 hours). With the bathtub distribution, jobs see a high rate of failure initially, but that also reduces the wasted work. Once jobs survive the initial high failure rate, the rate of failure is low, and thus the wasted work is more or less constant for all but the shortest and longest jobs.

We now examine the expected increase in running time, which also accounts for the probability of failure and is given by $F(T_R) \, W(T_R)$. Figure 10(b) shows this expected increase in running times for jobs of different lengths. We see that for uniformly distributed preemptions, the increase in running time is quadratic in the job length (and is given by $T_R^2/(2T)$). Interestingly, the high rate of early failures for the bathtub distribution results in a slightly worse (i.e., higher) running time for short jobs. However, for jobs longer than 5 hours, a cross-over point is reached, and the bathtub distribution provides a significantly lower overhead of preemptions. For instance, for a 10 hour job, the increase in running time is about 30 minutes, or 5%. In comparison, if failures were uniformly distributed, the increase would be about 2 hours ($T_R^2/(2T) = 100/48 \approx 2.1$ hours).

Thus, the bathtub preemptions are beneficial for applications and users, as the low failure rate during the middle periods results in significantly lower wasted work, compared to the uniformly distributed failures. Since the failure rate distribution is ultimately controlled by the cloud provider, our analysis can be used to determine the appropriate preemption distribution based on the job length distributions. For instance, if short jobs are very common, then uniformly distributed preemptions are preferable, otherwise, bathtub distributions can offer significant benefits.

Result: For constrained preemptions, bathtub distributions significantly reduce the expected increase in running times for medium to long running jobs (longer than about 5 hours), but are slightly inferior for short jobs (under about 5 hours).

VI.2 Model-based Policies

We now evaluate the effectiveness of model-driven policies that we proposed earlier in Section IV. Specifically, we seek to compare the effectiveness of our job scheduling and checkpointing policies with existing transient computing approaches.

VI.2.1 Job Scheduling

In the previous subsection, we have quantified the increase in running time due to preemptions, but we had assumed that jobs start on a newly launched server. In many scenarios however, a server may be used for running a long-running sequence of jobs, such as in a batch-computing service. Our job scheduling policy is model-driven and decides whether to request a new VM for a job or run it on an existing VM. A new VM may be preferable if the job starts running near the VM’s 24 hour preemption deadline.

Figure 13(a) shows the effect of our job scheduling policy for a six hour job, for different job starting times (relative to the VM’s starting time). We compare against a baseline of memoryless job scheduling that is not informed by constrained preemption dynamics. Such memoryless policies are the default in existing transient computing systems such as SpotOn Subramanya et al. (2015). In the absence of insights about bathtub preemptions, the memoryless policy continues to run jobs on the existing VM. As the figure shows, the empirical job failure probability is bathtub shaped. However, since the job is 6 hours long, with the memoryless policy it will always fail when launched after 18 hours. In contrast, our model-based policy determines that after 18 hours we are better off running the job on a newer VM, and results in a much lower job failure probability ($\approx 0.4$). Thus, our model-based job scheduling policy can reduce job failure probability by taking into account the time-varying failure rates of VMs, which is not considered by existing systems that use memoryless scheduling policies.

(a) Effect of job start time on the failure probability.
(b) Job failure probability for jobs of different lengths.
Figure 13: Job failure probability is lower with our deadline aware policy across all job sizes.

The job failure probability is determined by the job length and the job starting time. We examine the failure probability for jobs of different lengths in Figure 13(b), in which we average the failure probability across different start times. We again see that our policy results in significantly lower failure probability compared to memoryless scheduling. For all but the shortest and longest jobs, the failure probability with our policy is half of that of existing memoryless policies. This reduction is primarily due to how the two policies perform for jobs launched near the end of the VM preemption deadline, which we examined previously in Figure 13(a).

Result: Our model-based job scheduling and VM-reuse policy can decrease job failure probability by more than 2×.

VI.2.2 Checkpointing

We now evaluate our model-based checkpointing policy, which uses a dynamic programming approach. With our policy, the checkpointing rate is determined by the VM's current failure rate. In contrast, all prior work in transient computing and most prior work in fault-tolerance assumes that failures are exponentially distributed (i.e., memoryless), and uses the Young-Daly checkpointing interval. In the Young-Daly approach, checkpoints are taken after a constant period given by $\tau = \sqrt{2\,\delta \cdot \mathrm{MTTF}}$, where $\delta$ is the time to take a single checkpoint. However, in the case of constrained preemptions with bathtub distributions, the failure rate is time-dependent and not memoryless.
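For reference, the sketch below computes the Young-Daly interval for the memoryless baseline, along with a simplified quasi-static variant of rate-adaptive checkpointing. This is an illustrative approximation, not our dynamic programming policy: hazard(t) stands in for the model's time-varying failure rate, and the quasi-static rule simply plugs the instantaneous MTTF into the Young-Daly formula.

```python
import math

def young_daly_interval(checkpoint_cost_hours, mttf_hours):
    """Constant Young-Daly checkpointing interval: sqrt(2 * delta * MTTF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mttf_hours)

# Memoryless baseline used in our comparison: delta = 1 minute, MTTF = 1 hour.
tau = young_daly_interval(1.0 / 60.0, 1.0)
print(f"Young-Daly interval: {tau:.2f} h (~{tau * 60:.0f} min)")
# ~0.18 h, i.e., a checkpoint roughly every 11 minutes: a high, constant
# checkpointing rate regardless of the VM's current failure behavior.

def quasi_static_interval(vm_age_hours, checkpoint_cost_hours, hazard):
    """Rate-adaptive interval: reuse Young-Daly with the instantaneous MTTF
    1/hazard(t). A crude stand-in for the full dynamic program, which also
    accounts for how the failure rate evolves over the job's lifetime."""
    mttf_now = 1.0 / max(hazard(vm_age_hours), 1e-9)
    return young_daly_interval(checkpoint_cost_hours, mttf_now)
```

During the low-failure-rate middle phase, hazard(t) is small, so the quasi-static interval grows and checkpointing overhead shrinks, which is the intuition behind the bathtub-shaped overhead curve discussed next.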

Figure 16(a) shows the expected increase in running time for a 4 hour job, accounting for both the checkpointing overhead and the expected recomputation due to preemptions. Throughout, we assume that each checkpoint takes 1 minute. The increase in running time depends on the failure rate and thus on the job's starting time. With our model-based checkpointing policy, the increase in running time is bathtub shaped and stays below 5%, and is around 1% when the job is launched while the VM is between 5 and 15 hours old.

We also compare with the Young-Daly Daly (2006) periodic checkpointing policy, using the VM's initial failure rate to set the MTTF, which yields an MTTF of 1 hour. This results in a high, constant rate of checkpointing and increases the running time of the job by more than 25%, primarily due to the overhead of checkpointing itself. Note that checkpointing at a lower frequency decreases the checkpointing overhead but increases the required recomputation.

Next, we examine the expected running time of jobs of different lengths, when all jobs start at time=0, i.e., are launched on a freshly launched VM. Figure 16(b) shows the expected increase in running time with our model-based checkpointing policy and with the Young-Daly policy at MTTF=1 hour. With our policy, running times increase by 10% for short jobs (less than 2 hours long) and by less than 5% for longer jobs. In contrast, the Young-Daly policy yields a constant 25% increase in running times. Thus, our model-based policy reduces the checkpointing overhead and brings the performance overhead of running on preemptible VMs to below 5%.

Result: Our checkpointing policy can reduce the performance overhead of preemptions to under 5%, compared to over 25% with conventional periodic checkpointing.

(a) Checkpointing overhead for different job starting times.
(b) Increase in running time with checkpointing when jobs start at time=0.
Figure 16: Checkpointing effectiveness.

VI.3 Effectiveness on Scientific Computing Workloads

We now show the effectiveness of our batch computing service on Google Preemptible VMs. We run scientific simulation workloads described earlier in this section, and are interested in understanding the real-world effectiveness of our model-based service.

(a) Cost
(b) Preemptions
Figure 19: Cost and preemptions with our service.

Cost: The primary motivation for using preemptible VMs is their significantly lower cost compared to conventional, non-preemptible “on-demand” cloud VMs. To evaluate the cost of using our batch computing service, we run a bag of 100 jobs on a cluster of 32 VMs of type n1-highcpu-32. Within a bag, different jobs explore different physical parameters, and job running times show little variance. Figure 19(a) shows the cost of using Preemptible VMs compared to conventional on-demand VMs. For all three applications, using our service reduces costs by about 5×.

We note that for this experiment, our service was using model-driven job scheduling, but was not using checkpointing, since the applications lacked checkpointing mechanisms. Using checkpointing would reduce the costs even further, since it would reduce the increase in running time (and server costs) due to recomputation.

Preemptions: Finally, we examine the effect of preemptions on the increase in running time under real-world settings. We ran a cluster of 32 n1-highcpu-32 VMs running the Nanoconfinement application, and repeated the experiment multiple times to observe the effect of preemptions. Figure 19(b) shows the increase in running time of the entire bag of jobs as a function of the number of VM preemptions observed over the course of execution. The net impact of preemptions is a roughly linear increase in running time: each preemption adds roughly 3%, which validates our earlier analytical evaluation. The result also highlights the effectiveness of the job scheduling and VM-reuse policy, since most jobs run on stable VMs, and those that run on new VMs “fail fast” and result in only a small amount of wasted work and increase in running time.

Result: Our batch computing service can reduce costs by up to 5× compared to conventional on-demand cloud VMs. With the VM-reuse policy, the performance impact of preemptions is as low as 3% per preemption.

VII Related Work

Transient Cloud Computing. The low cost of transient cloud servers has made them very appealing in spite of their preemptible nature, and their efficient and effective use has been the subject of a significant amount of research Sharma (2018). The significantly lower cost of spot instances makes them attractive for running preemption- and delay-tolerant batch jobs Subramanya et al. (2015); Jain et al.; Yi et al. (2010); Wieder et al. (2012); Liu (2011); Chohan et al. (2010); Dubois and Casale (2016); Varshney and Simmhan (2019). The challenges posed by Amazon EC2 spot instances, the first transient cloud servers, have received significant attention from both academia and industry spo. The distinguishing characteristic of EC2 spot instances is their dynamic auction-based pricing, and choosing the “right” bid price to minimize cost and performance degradation is the focus of much of the past work on transient computing Javadi et al. (2011); Mihailescu and Teo (2012); Tang et al. (2012); Wee (2011); Xu and Li (2013); Zhang et al. (2011); Zafer et al. (2012); Zheng et al. (2015); Song et al. (2012); Wolski et al. (2017a); Guo et al. (2015).

On the other hand, the effective use of transient resources provided by other cloud providers such as Google, Microsoft, Packet, and Alibaba largely remains unexplored. Ours is the first work that studies the preemption characteristics and addresses the challenges involved in running large-scale applications on the Google Preemptible VMs, and provides insights on the unique constrained preemption dynamics.

Preemption Mitigation. Effective use of transient servers usually entails the use of fault-tolerance techniques such as checkpointing Sharma et al. (2016a), migration Sharma et al. (2015), and replication Subramanya et al. (2015). In the context of HPC workloads, Marathe et al. (2014); Gong et al. (2015); Taifi et al. (2011) develop checkpointing and bidding strategies for MPI applications running on EC2 spot instances. However, periodic checkpointing Dongarra et al. ; Bougeret et al. (2011) is not optimal in our case because preemptions are not memoryless.

Preemption Modeling. Conventionally, exponential distributions have been used to model preemptions, even for EC2 spot instances Zheng et al. (2015); Sharma et al. (2016a, b). Our preemption model for Google Preemptible VMs, developed in Section III, provides a novel characterization of bathtub-shaped failure rates not captured even by Weibull distributions, and is distinct from prior efforts Mudholkar and Srivastava (1993); Crevecoeur (1993).

VIII Discussion and Future Directions

Constrained preemptions are a relatively unexplored phenomenon and are challenging to model. Our model and the associated data expand transient cloud computing beyond EC2 spot instances. We have evaluated the model under different practical conditions, including different VM types and temporal domains, and have shown it to be general and robust. However, many questions and avenues of future investigation remain open:

What if preemption characteristics change? Ultimately, preemption characteristics depend on cloud provider policies, the supply and demand of transient, on-demand, and reserved VMs, etc., and may change over time. Our model allows detecting policy and phase changes by comparing observed data with model predictions to identify change-points, and a long-running cloud service can continuously update the model based on recent preemption behavior (see the sketch below). However, changes are rare: Google's preemption policy has not changed since its inception in 2015. Regardless, we believe that VMs with constrained preemptions are an interesting new type of transient resource, and our analysis, observations, and policies should continue to be relevant. Furthermore, we demonstrate that the multi-phase bathtub failure distribution may be a fundamental characteristic of constrained preemptions that benefits both the cloud platform and applications, so models that capture the distinct preemption phases would remain relevant even if finer-grained preemption characteristics change.
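As a hedged illustration of such change detection, one could periodically compare recently observed preemption times against the fitted model with a goodness-of-fit test. The sketch below uses a Kolmogorov-Smirnov test; model_cdf stands in for our fitted preemption-time CDF, and the window size and significance threshold are arbitrary choices, not values from our system.

```python
from scipy import stats

def preemption_policy_changed(recent_preemption_times, model_cdf, alpha=0.01):
    """Flag a potential provider policy change when a window of recently
    observed preemption times no longer matches the fitted model.
    `model_cdf` must be a callable CDF over VM lifetimes in [0, 24] hours."""
    statistic, p_value = stats.kstest(recent_preemption_times, model_cdf)
    return p_value < alpha  # test rejects the fit => refit the model

# Usage sketch: maintain a sliding window of, say, the last 100 observed
# preemption times, and trigger a model refit whenever the test rejects.
```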

Phase-wise model. Our statistical analysis indicates that preemption rates have three distinct phases. Our model is continuously differentiable and captures the three phases reasonably well. However, it may be possible to use a “phase-wise” model, such as a piece-wise continuously differentiable model, in which the three phases are modeled either as three segmented linear regions (found using segmented linear regression, as sketched below) or as an initial exponential phase followed by two linear phases. Such a piece-wise model could capture the phase transitions even more accurately, and is part of our future work.
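A minimal version of the segmented-linear variant could be fit by grid-searching the two breakpoints and solving least squares within each segment, as sketched below. The breakpoint grid and the use of plain least squares are our assumptions for illustration, not a prescription for the final phase-wise model.

```python
import numpy as np

def fit_three_segments(t, rate, breaks):
    """Fit three independent linear segments to an empirical failure-rate
    curve `rate` sampled at times `t`, split at the two breakpoints.
    Returns total squared error and per-segment (slope, intercept) pairs."""
    b1, b2 = breaks
    total_err, coeffs = 0.0, []
    for lo, hi in [(t.min(), b1), (b1, b2), (b2, t.max() + 1e-9)]:
        mask = (t >= lo) & (t < hi)
        if mask.sum() < 3:
            return np.inf, None  # degenerate segment: reject this split
        A = np.vstack([t[mask], np.ones(mask.sum())]).T
        sol, residual, _, _ = np.linalg.lstsq(A, rate[mask], rcond=None)
        total_err += residual[0] if residual.size else 0.0
        coeffs.append(sol)
    return total_err, coeffs

def best_breakpoints(t, rate, grid):
    """Grid search over candidate breakpoint pairs (b1 < b2)."""
    candidates = [(b1, b2) for b1 in grid for b2 in grid if b1 < b2]
    return min(candidates, key=lambda b: fit_three_segments(t, rate, b)[0])
```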

Connection to constrained systems and statistical mechanics. Our proof of Lemma 1 used a mapping to constrained physical systems and employed statistical mechanics tools such as partition functions Krauth (2006). We have only presented the initial connection between the behavior of constrained preemptions and the statistical mechanics of constraint-driven phenomena in many-particle systems Krauth (2006); Solis et al. (2013), and we conjecture that a deeper analogy may exist. Central to our proof is the assumption of mutually exclusive preemptions, that is, that the provider preempts VMs in a mutually exclusive manner. This assumption makes sense from a cluster management and application perspective. However, analyzing constrained preemptions under weaker versions of the mutual-exclusion assumption is also possible with statistical mechanics approaches. For example, to study situations where weakly overlapping preemptions are preferred, one can leverage the statistical mechanics framework of constrained “soft” particles often investigated using molecular dynamics simulations Jing et al. (2015).

IX Conclusion

The effective use of transient computing relies on understanding preemption characteristics. While past work on transient computing has developed techniques and systems for Amazon's EC2 spot instances, ours is the first work to understand the behavior of Google's Preemptible VMs, which have the unique characteristic of a maximum 24-hour lifetime. Our large-scale empirical study shows that this constraint imposes a bathtub failure distribution, and we develop a new preemption probability model that captures its three distinct temporal phases. Our insights and model-based policies can reduce preemption overheads by up to 5× compared to existing preemption models, and our batch computing service can reduce computing costs by up to 5×.

References

  • spo (2013) “Scientific Computing Using Spot Instances,” http://aws.amazon.com/ec2/spot-and-science/ (2013).
  • (2) “Google cloud preemptible vm instances documentation,” https://cloud.google.com/compute/docs/instances/preemptible.
  • (3) “Azure low-priority batch vms,” https://docs.microsoft.com/en-us/azure/batch/batch-low-pri-vms.
  • Sharma et al. (2015) P. Sharma, S. Lee, T. Guo, D. Irwin,  and P. Shenoy, in EuroSys (2015).
  • Marathe et al. (2014) A. Marathe, R. Harris, D. Lowenthal, B. R. De Supinski, B. Rountree,  and M. Schulz, in HPDC (ACM, 2014).
  • Sharma et al. (2017) P. Sharma, D. Irwin,  and P. Shenoy, in Proceedings of ACM Measurement and Analysis of Computer Systems, Vol. 1 (2017) p. 23.
  • Wieder et al. (2012) A. Wieder, P. Bhatotia, A. Post,  and R. Rodrigues, in NSDI 12 (2012).
  • Dubois and Casale (2016) D. J. Dubois and G. Casale, Cluster Computing , 1 (2016).
  • Shastri and Irwin (2017) S. Shastri and D. Irwin, in Proceedings of the 2017 Symposium on Cloud Computing (ACM, 2017) pp. 493–505.
  • Daly (2006) J. T. Daly, Future Generation Computer Systems 22 (2006).
  • Ben-Yehuda et al. (2013) O. Ben-Yehuda, M. Ben-Yehuda, A. Schuster,  and D. Tsafrir, ACM TEC 1 (2013).
  • Sharma et al. (2016a) P. Sharma, T. Guo, X. He, D. Irwin,  and P. Shenoy, in EuroSys (2016).
  • Tonks (1936) L. Tonks, Phys. Rev. 50, 955 (1936).
  • Verma et al. (2015) A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune,  and J. Wilkes, in EuroSys (ACM, 2015).
  • Cortez et al. (2017) E. Cortez, A. Bonde, A. Muzio, M. Russinovich, M. Fontoura,  and R. Bianchini, in Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17 (ACM, New York, NY, USA, 2017) pp. 153–167.
  • (16) “Amazon ec2 spot instances,” https://aws.amazon.com/ec2/spot/.
  • (17) “Packet Spot Market,” https://support.packet.com/kb/articles/spot-market.
  • (18) “Alibaba Cloud Preemptible Instances,” https://www.alibabacloud.com/help/doc-detail/52088.htm.
  • Singh et al. (2014) R. Singh, P. Sharma, D. Irwin, P. Shenoy,  and K. Ramakrishnan, IEEE Internet Computing 18 (2014).
  • Subramanya et al. (2015) S. Subramanya, T. Guo, P. Sharma, D. Irwin,  and P. Shenoy, in SOCC (2015).
  • Joaquim et al. (2019) P. Joaquim, M. Bravo, L. Rodrigues,  and M. Matos, in Proceedings of the Fourteenth EuroSys Conference 2019, EuroSys ’19 (ACM, New York, NY, USA, 2019) pp. 35:1–35:16.
  • Harlap et al. (2017) A. Harlap, A. Tumanov, A. Chung, G. R. Ganger,  and P. B. Gibbons, in Proceedings of the Twelfth European Conference on Computer Systems, EuroSys ’17 (ACM, New York, NY, USA, 2017) pp. 589–604.
  • Ghit and Epema (2017) B. Ghit and D. Epema, in Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’17 (ACM, New York, NY, USA, 2017) pp. 105–116.
  • Zheng et al. (2015) L. Zheng, C. Joe-Wong, C. W. Tan, M. Chiang,  and X. Wang, in SIGCOMM (2015).
  • Shastri et al. (2016) S. Shastri, A. Rizk,  and D. Irwin, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’16 (IEEE Press, Piscataway, NJ, USA, 2016) pp. 85:1–85:11.
  • Wolski and Brevik (2016) R. Wolski and J. Brevik, in Proceedings of the 24th High Performance Computing Symposium (Society for Computer Simulation International, 2016) p. 13.
  • Wolski et al. (2017a) R. Wolski, J. Brevik, R. Chard,  and K. Chard, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’17 (ACM Press, Denver, Colorado, 2017) pp. 1–11.
  • Ouyang et al. (2016) X. Ouyang, D. Irwin,  and P. Shenoy, in IEEE International Conference on Distributed Computing Systems (ICDCS) (2016).
  • Baughman et al. (2018) M. Baughman, C. Haas, R. Wolski, I. Foster,  and K. Chard, in Proceedings of the 9th Workshop on Scientific Cloud Computing (ACM, 2018) p. 1.
  • Wolski et al. (2017b) R. Wolski, J. Brevik, R. Chard,  and K. Chard, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (ACM, 2017) p. 18.
  • Sharma et al. (2016b) P. Sharma, D. Irwin,  and P. Shenoy, in Proceedings of the 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud) (USENIX, 2016).
  • Missov and Lenart (2013) T. I. Missov and A. Lenart, Theoretical Population Biology 90, 29 (2013).
  • (33) “Scipy curve fit documentation,” https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html.
  • Krauth (2006) W. Krauth, Statistical mechanics: algorithms and computations, Vol. 13 (OUP Oxford, 2006).
  • Jing et al. (2015) Y. Jing, V. Jadhao, J. W. Zwanikken,  and M. Olvera de la Cruz, The Journal of chemical physics 143, 194508 (2015).
  • (36) J. Dongarra, T. Herault, and Y. Robert, 66.
  • (37) “Google Cloud Platform,” https://cloud.google.com/.
  • (38) “Google Cloud API Documentation,” https://cloud.google.com/apis/docs/overview.
  • (39) “Slurm Workload Manager,” https://slurm.schedmd.com/documentation.html.
  • (40) “Torque Resource Manager,” http://www.adaptivecomputing.com/products/torque/.
  • Kadupitiya et al. (2017) J. Kadupitiya, S. Marru, G. C. Fox,  and V. Jadhao, “Ions in nanoconfinement,”  (2017), online on nanoHUB; source code on GitHub at github.com/softmaterialslab/nanoconfinement-md.
  • Jadhao et al. (2014) V. Jadhao, C. K. Thomas,  and M. Olvera de la Cruz, Proceedings of the National Academy of Sciences 111, 12673 (2014).
  • Brunk and Jadhao (2019) N. E. Brunk and V. Jadhao, Journal of Materials Chemistry B  (2019).
  • Karlin et al. (2013a) I. Karlin, A. Bhatele, J. Keasler, B. L. Chamberlain, J. Cohen, Z. DeVito, R. Haque, D. Laney, E. Luke, F. Wang, D. Richards, M. Schulz,  and C. Still, in 27th IEEE International Parallel & Distributed Processing Symposium (IEEE IPDPS 2013) (Boston, USA, 2013).
  • Karlin et al. (2013b) I. Karlin, J. Keasler,  and R. Neely, LULESH 2.0 Updates and Changes, Tech. Rep. LLNL-TR-641973 (2013).
  • Sharma (2018) P. Sharma, “Transiency-driven Resource Management for Cloud Computing Platforms,” https://scholarworks.umass.edu/dissertations_2/1388/ (2018).
  • (47) N. Jain, I. Menache,  and O. Shamir, in 11th International Conference on Autonomic Computing (ICAC 14) (USENIX Association).
  • Yi et al. (2010) S. Yi, D. Kondo,  and A. Andrzejak, in Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on (IEEE, 2010) pp. 236–243.
  • Liu (2011) H. Liu, in HotCloud (2011).
  • Chohan et al. (2010) N. Chohan, C. Castillo, M. Spreitzer, M. Steinder, A. Tantawi,  and C. Krintz, in HotCloud (2010).
  • Varshney and Simmhan (2019) P. Varshney and Y. Simmhan, IEEE Transactions on Parallel and Distributed Systems , 1 (2019).
  • (52) “Spotinst,” https://spotinst.com/.
  • Javadi et al. (2011) B. Javadi, R. Thulasiram,  and R. Buyya, in UCC (2011).
  • Mihailescu and Teo (2012) M. Mihailescu and Y. M. Teo, in CCGrid (2012).
  • Tang et al. (2012) S. Tang, J. Yuan,  and X. Li, in CLOUD (2012).
  • Wee (2011) S. Wee, in CCGrid (2011).
  • Xu and Li (2013) H. Xu and B. Li, Performance Evaluation Review 40 (2013).
  • Zhang et al. (2011) Q. Zhang, E. Gürses, R. Boutaba,  and J. Xiao, in Hot-ICE (2011).
  • Zafer et al. (2012) M. Zafer, Y. Song,  and K. Lee, in CLOUD (2012).
  • Song et al. (2012) Y. Song, M. Zafer,  and K. Lee, in Infocom (2012).
  • Guo et al. (2015) W. Guo, K. Chen, Y. Wu,  and W. Zheng, in Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC ’15 (ACM Press, Portland, Oregon, USA, 2015) pp. 191–202.
  • Gong et al. (2015) Y. Gong, B. He,  and A. C. Zhou, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’15 (ACM Press, Austin, Texas, 2015) pp. 1–12.
  • Taifi et al. (2011) M. Taifi, J. Y. Shi,  and A. Khreishah, in Algorithms and Architectures for Parallel Processing, Vol. 7017, edited by Y. Xiang, A. Cuzzocrea, M. Hobbs,  and W. Zhou (Springer Berlin Heidelberg, Berlin, Heidelberg, 2011) pp. 109–120.
  • Bougeret et al. (2011) M. Bougeret, H. Casanova, M. Rabie, Y. Robert,  and F. Vivien, in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’11 (ACM Press, Seattle, Washington, 2011) p. 1.
  • Mudholkar and Srivastava (1993) G. S. Mudholkar and D. K. Srivastava, IEEE transactions on reliability 42, 299 (1993).
  • Crevecoeur (1993) G. Crevecoeur, IEEE Transactions on reliability 42, 148 (1993).
  • Solis et al. (2013) F. J. Solis, V. Jadhao,  and M. Olvera de la Cruz, Phys. Rev. E 88, 053306 (2013).