Optimal Resource Allocation for Elastic and Inelastic Jobs

05/19/2020
by Benjamin Berg, et al.

Modern data centers are tasked with processing heterogeneous workloads consisting of various classes of jobs. These classes differ in their arrival rates, size distributions, and job parallelizability. With respect to parallelizability, some jobs are elastic, meaning they can parallelize linearly across many servers. Other jobs are inelastic, meaning they can only run on a single server. Although job classes can differ drastically, they are typically forced to share a single cluster. When sharing a cluster among heterogeneous jobs, one must decide how to allocate servers to each job at every moment in time. In this paper, we design and analyze allocation policies that aim to minimize the mean response time across jobs, where a job's response time is the time from when it arrives until it completes. We model this problem in a stochastic setting where each job may be elastic or inelastic. Job sizes are drawn from exponential distributions but are unknown to the system. We show that, in the common case where elastic jobs are larger on average than inelastic jobs, the optimal allocation policy is Inelastic-First, which gives inelastic jobs preemptive priority over elastic jobs. We obtain this result by introducing a novel sample path argument. We also show that there exist cases where Elastic-First (giving priority to elastic jobs) performs better than Inelastic-First. We then provide the first analysis of mean response time under both Elastic-First and Inelastic-First by leveraging recent techniques for solving high-dimensional Markov chains.
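
The two priority policies named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function `allocate`, its parameters, and the choice to report servers per job class (rather than per individual job) are assumptions made for illustration. The sketch relies only on what the abstract states: an inelastic job runs on a single server, elastic jobs parallelize linearly across servers, and the favored class gets preemptive priority.

```python
def allocate(num_inelastic, num_elastic, num_servers, policy="IF"):
    """Split servers between job classes at one moment in time.

    Hypothetical helper, not from the paper. Returns a pair
    (servers given to inelastic jobs, servers given to elastic jobs).
    """
    if policy == "IF":
        # Inelastic-First: each inelastic job takes one server, up to
        # the cluster size; elastic jobs use whatever is left over.
        inelastic = min(num_inelastic, num_servers)
        elastic = num_servers - inelastic if num_elastic > 0 else 0
    elif policy == "EF":
        # Elastic-First: if any elastic job is present, the elastic
        # class can absorb every server, so inelastic jobs get none.
        elastic = num_servers if num_elastic > 0 else 0
        inelastic = min(num_inelastic, num_servers - elastic)
    else:
        raise ValueError("policy must be 'IF' or 'EF'")
    return inelastic, elastic


# Example: 3 inelastic jobs and 2 elastic jobs sharing 8 servers.
print(allocate(3, 2, 8, "IF"))  # (3, 5)
print(allocate(3, 2, 8, "EF"))  # (0, 8)
```

In this sketch no server sits idle under either policy while an elastic job is present; the policies differ only in which class is served first.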


