CloudCoaster: Transient-aware Bursty Datacenter Workload Scheduling

07/03/2019
by Samuel S. Ogden, et al.

Today's clusters often have to divide resources among a diverse set of jobs. These jobs are heterogeneous both in execution time and in their rate of arrival. Execution-time heterogeneity has led to the development of hybrid schedulers that can schedule both short and long jobs to ensure good task placement. However, arrival-rate heterogeneity, or burstiness, remains a problem in existing schedulers. These hybrid schedulers manage resources on a statically provisioned cluster, which can quickly be overwhelmed by bursts in the number of arriving jobs. In this paper we propose CloudCoaster, a hybrid scheduler that dynamically resizes the cluster by leveraging cheap transient servers. CloudCoaster schedules jobs in an intelligent way that increases job performance while reducing overall resource cost. We evaluate the effectiveness of CloudCoaster through simulations on real-world traces and compare it against a state-of-the-art hybrid scheduler. CloudCoaster improves the average queueing delay of short jobs by 4.8X while maintaining long job performance. In addition, CloudCoaster reduces the short partition budget by over 29.5

