
heSRPT: Parallel Scheduling to Minimize Mean Slowdown

by Benjamin Berg, et al.

Modern data centers serve workloads that can exploit parallelism. A job that parallelizes across multiple servers completes more quickly, but it receives diminishing returns from each additional server. Because allocating many servers to a single job is inefficient, it is unclear how best to divide a fixed number of servers among many parallelizable jobs. This paper provides the first optimal allocation policy for minimizing the mean slowdown of parallelizable jobs of known size when all jobs are present at time 0. Our policy provides a simple closed-form formula for the optimal allocations at every moment in time. Minimizing mean slowdown usually requires favoring short jobs over long ones (as in the Shortest Remaining Processing Time, or SRPT, policy). However, because parallelizable jobs have sublinear speedup functions, system efficiency is also an issue. System efficiency is maximized by giving equal allocations to all jobs, and thus competes with the goal of prioritizing small jobs. Our optimal policy, high-efficiency SRPT (heSRPT), balances these competing goals. heSRPT completes jobs in order of their size, but maintains overall system efficiency by allocating some servers to each job at every moment in time. Our results also generalize to provide the optimal allocation policy with respect to mean flow time. Finally, we consider the online case, where jobs arrive to the system over time. While optimizing mean slowdown in the online setting is even more difficult, we find that heSRPT provides an excellent heuristic policy for this setting. In fact, our simulations show that heSRPT significantly outperforms state-of-the-art allocation policies for parallelizable jobs.
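The tension between favoring short jobs and keeping the system efficient can be illustrated with a small simulation. The sketch below is not heSRPT and does not reproduce the paper's closed-form formula; it only compares two extreme policies. It assumes a hypothetical speedup function s(k) = k^p with 0 < p < 1 (the abstract says only that speedup is sublinear), servers that can be divided fractionally, and illustrative names (`speedup`, `mean_slowdown`, `equal_split`, `smallest_first`) that do not come from the source.

```python
def speedup(k, p):
    # Assumed sublinear speedup: k servers give a service rate of k**p.
    return k ** p


def equal_split(remaining):
    # Every unfinished job gets the same fraction of the servers.
    return {j: 1.0 / len(remaining) for j in remaining}


def smallest_first(remaining):
    # SRPT-like extreme: all servers go to the job with the least work left.
    target = min(remaining, key=lambda j: (remaining[j], j))
    return {j: (1.0 if j == target else 0.0) for j in remaining}


def mean_slowdown(sizes, policy, n_servers=16.0, p=0.5):
    # All jobs are present at time 0; slowdown = completion time / job size.
    remaining = {j: float(x) for j, x in enumerate(sizes)}
    completion = {}
    now = 0.0
    while remaining:
        shares = policy(remaining)
        rates = {j: speedup(shares[j] * n_servers, p) for j in remaining}
        # Advance time to the next job completion.
        dt = min(remaining[j] / rates[j] for j in remaining if rates[j] > 0)
        now += dt
        for j in list(remaining):
            remaining[j] -= rates[j] * dt
            if remaining[j] <= 1e-9:
                completion[j] = now
                del remaining[j]
    return sum(completion[j] / sizes[j] for j in completion) / len(sizes)


for sizes in ([1, 1], [1, 100]):
    print(sizes,
          "equal:", mean_slowdown(sizes, equal_split),
          "smallest-first:", mean_slowdown(sizes, smallest_first))
```

Under these assumptions, equal splitting gives the lower mean slowdown when the two jobs have the same size, while giving everything to the smallest job does better when one job is much shorter than the other. Neither extreme is best in both cases, which is the gap that a policy balancing the two goals is meant to close.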




heSRPT: Optimal Parallel Scheduling of Jobs With Known Sizes

When parallelizing a set of jobs across many servers, one must balance a...

Optimal Resource Allocation for Elastic and Inelastic Jobs

Modern data centers are tasked with processing heterogeneous workloads c...

Towards Optimality in Parallel Scheduling

To keep pace with Moore's law, chip designers have focused on increasing...

Scheduling to Optimize Sojourn Time of Successful Jobs

Deep neural networks training jobs and other iterative computations freq...

Matching Queues, Flexibility and Incentives

Motivated in part by online marketplaces such as ridesharing and freelan...

Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints

In this paper, we study an optimal online resource reservation problem i...

Energy-Efficient Job-Assignment Policy with Asymptotically Guaranteed Performance Deviation

We study a job-assignment problem in a large-scale server farm system wi...