Quantitative Verification of Scheduling Heuristics

01/10/2023
by Saksham Goel, et al.

Computer systems use many scheduling heuristics to allocate resources. Understanding the performance properties of these heuristics is hard because it requires a representative workload and extensive code instrumentation. As a result, widely deployed schedulers can make poor decisions that lead to unpredictable performance. We propose a methodology that studies the specification of a heuristic with automated verification tools, searching for performance issues over a large set of workloads, system characteristics, and implementation details. Our key insight is that much of the complexity of the system can be overapproximated without oversimplification, allowing system and heuristic developers to quickly and confidently characterize the performance of their designs. We showcase the power of our methodology through four case studies. First, we produce bounds on the performance of two classical algorithms, SRPT (shortest remaining processing time) scheduling and work stealing, under practical assumptions. Then, we create a model that identifies two bugs in the Linux CFS scheduler. Finally, we verify a recent observation that TCP unfairness can cause some ML training workloads to spontaneously converge to a state of high network utilization.
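
To make the idea of searching over a set of workloads concrete, the following is a minimal sketch and not the authors' method: it swaps the automated verification tools for plain brute-force enumeration over a tiny, bounded workload space. The names `simulate` and `worst_case`, the unit-time-slot single-server model, and the FIFO baseline are all assumptions introduced here for illustration.

```python
from itertools import product


def simulate(jobs, policy):
    """Simulate one server in unit time slots; return the total response time.

    jobs   -- list of (arrival_time, size) pairs with integer values
    policy -- "fifo" (earliest arrival first) or "srpt" (least remaining work first)
    """
    remaining = [size for _, size in jobs]
    total_response = 0
    finished = 0
    t = 0
    while finished < len(jobs):
        ready = [i for i, (arrival, _) in enumerate(jobs)
                 if arrival <= t and remaining[i] > 0]
        if ready:
            if policy == "srpt":
                # Preemptive: re-pick the job with the least remaining work each slot.
                i = min(ready, key=lambda k: (remaining[k], jobs[k][0], k))
            else:
                # FIFO: the earliest arrival keeps the server until it finishes.
                i = min(ready, key=lambda k: (jobs[k][0], k))
            remaining[i] -= 1
            if remaining[i] == 0:
                total_response += (t + 1) - jobs[i][0]
                finished += 1
        t += 1
    return total_response


def worst_case(n_jobs, max_arrival, max_size):
    """Enumerate every bounded workload; return (ratio, workload) maximizing FIFO/SRPT."""
    choices = [(a, s) for a in range(max_arrival + 1)
               for s in range(1, max_size + 1)]
    best_ratio, best_jobs = 0.0, None
    for jobs in product(choices, repeat=n_jobs):
        ratio = simulate(list(jobs), "fifo") / simulate(list(jobs), "srpt")
        if ratio > best_ratio:
            best_ratio, best_jobs = ratio, jobs
    return best_ratio, best_jobs


if __name__ == "__main__":
    ratio, jobs = worst_case(n_jobs=2, max_arrival=1, max_size=3)
    print(f"worst FIFO/SRPT total-response ratio {ratio:.2f} on workload {jobs}")
```

Enumeration like this grows exponentially with the number of jobs, which is one reason a solver-backed search over an overapproximated model is attractive for realistic workload spaces.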
