Deep Reinforcement Agent for Scheduling in HPC

02/11/2021
by Yuping Fan, et al.

The cluster scheduler is crucial in high-performance computing (HPC): it determines when and which user jobs are allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems and the highly dynamic nature of application workloads place a tremendous burden on manually designed and tuned scheduling heuristics. More aggressive optimization and automation are needed for cluster scheduling in HPC. In this work, we present an automated HPC scheduling agent named DRAS (Deep Reinforcement Agent for Scheduling) that leverages deep reinforcement learning. DRAS is built on a novel hierarchical neural network incorporating special HPC scheduling features such as resource reservation and backfilling. A unique training strategy is presented to enable DRAS to rapidly learn the target environment. Once provided with a specific scheduling objective by the system manager, DRAS automatically learns to improve its policy through interaction with the scheduling environment and dynamically adjusts its policy as the workload changes. Experiments with different production workloads demonstrate that DRAS outperforms existing heuristic and optimization approaches by up to 45%.
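The abstract names resource reservation and backfilling as the HPC scheduling features built into the network. For readers unfamiliar with them, the following is a minimal sketch of a classic hand-written heuristic of that kind (first-come-first-served with one reservation and EASY-style backfilling). It is not the DRAS agent or its neural network; the `Job` fields, the `schedule_step` function, and the single-resource node model are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes requested (hypothetical single-resource model)
    walltime: int   # user-estimated runtime


def schedule_step(queue, free_nodes, running, now=0):
    """Pick jobs to start now: FCFS with one reservation plus backfilling.

    queue:   waiting jobs in arrival order.
    running: list of (end_time, nodes) for jobs already executing.
    Returns (started_jobs, reserved_start_time_of_blocked_head_or_None).
    """
    queue = list(queue)
    started = []
    # Start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running = running + [(now + job.walltime, job.nodes)]
        started.append(job)
    if not queue:
        return started, None

    # Reservation: earliest time at which the blocked head job can start.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started, None  # head job is larger than the whole machine
    extra = avail - head.nodes  # nodes still spare at the reserved time

    # Backfilling: a later job may start now if it fits in the free nodes
    # and does not delay the reserved start of the head job.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_before = now + job.walltime <= shadow_time
        if ends_before or job.nodes <= extra:
            if not ends_before:
                extra -= job.nodes  # job overlaps the reservation window
            free_nodes -= job.nodes
            started.append(job)
    return started, shadow_time
```

For example, with 4 free nodes and one running job that holds 6 nodes until time 100, a head job asking for 8 nodes is reserved for time 100, while smaller jobs behind it can start immediately as long as they either finish by then or fit in the 2 nodes left over at the reserved time. A learned scheduler replaces the fixed ordering rules in such a heuristic with a policy trained against a chosen objective.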


