BoPF: Mitigating the Burstiness-Fairness Tradeoff in Multi-Resource Clusters

12/07/2019
by Tan N. Le, et al.

Simultaneously supporting latency- and throughput-sensitive workloads in a shared environment is an increasingly common challenge in big data clusters. Despite many advances, existing cluster schedulers force the same performance goal, fairness in most cases, on all jobs. Latency-sensitive jobs suffer, while throughput-sensitive ones thrive. Prioritization does the opposite: it opens a path for latency-sensitive jobs to dominate. In this paper, we tackle the challenge of supporting both short-term performance and long-term fairness simultaneously, with high resource utilization, by proposing Bounded Priority Fairness (BoPF). BoPF provides short-term resource guarantees to latency-sensitive jobs and maintains long-term fairness for throughput-sensitive jobs. BoPF is the first scheduler that can provide long-term fairness, burst guarantees, and Pareto efficiency in a strategyproof manner for multi-resource scheduling. Deployments and large-scale simulations show that BoPF closely approximates the performance of Strict Priority as well as the fairness characteristics of Dominant Resource Fairness (DRF). In deployments, BoPF speeds up latency-sensitive jobs by 5.38 times compared to DRF while still maintaining long-term fairness. Meanwhile, BoPF improves the average completion times of throughput-sensitive jobs by up to 3.05 times compared to Strict Priority.
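The fairness baseline named above, DRF, equalizes each user's *dominant share*: the largest fraction of any single resource that the user holds. The sketch below is a minimal illustration of DRF's progressive-filling idea. It is the baseline that BoPF is compared against, not BoPF itself. It assumes fixed per-task demand vectors, a static cluster capacity, and deterministic tie-breaking by user name. The function name `drf_allocate` and the example numbers are illustrative only.

```python
from typing import Dict, List


def drf_allocate(capacity: List[float], demands: Dict[str, List[float]]) -> Dict[str, int]:
    """Hypothetical sketch of DRF progressive filling (baseline, not BoPF).

    capacity: total amount of each resource, e.g. [CPUs, GB of memory].
    demands:  per-task demand vector for each user, same resource order.
    Returns the number of tasks each user is allocated.
    """
    n = len(capacity)
    for user, d in demands.items():
        # A user with an all-zero demand would never stop receiving tasks.
        if len(d) != n or not any(x > 0 for x in d):
            raise ValueError(f"invalid demand vector for user {user!r}")

    consumed = [0.0] * n
    tasks = {u: 0 for u in demands}
    dominant = {u: 0.0 for u in demands}  # each user's current dominant share
    active = set(demands)                 # users that may still receive tasks

    while active:
        # Serve the user with the lowest dominant share; break ties by name.
        user = min(active, key=lambda u: (dominant[u], u))
        d = demands[user]
        if all(consumed[r] + d[r] <= capacity[r] + 1e-9 for r in range(n)):
            for r in range(n):
                consumed[r] += d[r]
            tasks[user] += 1
            dominant[user] = max(tasks[user] * d[r] / capacity[r] for r in range(n))
        else:
            # The next task does not fit, so this user is saturated.
            active.remove(user)
    return tasks


# Example: 9 CPUs and 18 GB; A's tasks need <1 CPU, 4 GB>, B's need <3 CPUs, 1 GB>.
print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))  # {'A': 3, 'B': 2}
```

In this example, A ends with 3 tasks (12 of 18 GB, a 2/3 memory share) and B with 2 tasks (6 of 9 CPUs, a 2/3 CPU share). The two dominant shares are therefore equal. Allocations like this, applied uniformly to every job, are what the abstract argues leave latency-sensitive bursts underserved.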


research
09/12/2019

Differential Approximation and Sprinting for Multi-Priority Big Data Engines

Today's big data clusters based on the MapReduce paradigm are capable of...
research
08/25/2021

Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

Diverse workloads such as interactive supercomputing, big data analysis,...
research
03/22/2019

heSRPT: Optimal Parallel Scheduling of Jobs With Known Sizes

When parallelizing a set of jobs across many servers, one must balance a...
research
07/02/2019

Themis: Fair and Efficient GPU Cluster Scheduling for Machine Learning Workloads

Modern distributed machine learning (ML) training workloads benefit sign...
research
05/22/2019

Opportunistic Temporal Fair Mode Selection and User Scheduling for Full-duplex Systems

In-band full-duplex (FD) communications - enabled by recent advances in ...
research
08/20/2018

Throughput Optimization of Coexistent LTE-U and WiFi in Next Generation Networks

Next generation networks are envisioned to have ubiquitous availability ...
research
04/25/2016

Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

We present a scheduler that improves cluster utilization and job complet...
