Energy hardware and workload aware job scheduling towards interconnected HPC environments

06/22/2021
by Marco D'Amico, et al.

New HPC machines are approaching exascale. Their power consumption has been increasing, and researchers are studying ways to reduce it. A second trend is the growing complexity of HPC machines, with increasingly heterogeneous hardware components and different cluster architectures cooperating in the same machine. We refer to these as heterogeneous multi-cluster environments. To optimize performance and energy consumption in these environments, this paper proposes an Energy-Aware-Multi-Cluster (EAMC) job scheduling policy. EAMC-policy optimizes the scheduling and placement of jobs by predicting the performance and energy consumption of arriving jobs for different hardware architectures and processor frequencies, reducing the workload's energy consumption, makespan, and response time. The policy assigns a different priority to each job-resource combination so that the most efficient ones are favored, while less efficient ones are still considered to a variable degree, reducing response time and increasing cluster utilization. We implemented EAMC-policy in Slurm and evaluated a scenario in which two CPU clusters collaborate in the same machine. Simulations of workloads running applications modeled from real-world ones show a reduction of response time and makespan by up to 25 20 by 49
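The idea of giving each job-resource combination its own priority can be illustrated with a minimal sketch. This is not the EAMC-policy implementation and it does not use Slurm's API. The `Prediction` record, the `rank_combinations` function, the `slack` parameter, and the energy estimate (predicted power times predicted runtime) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-job prediction for one cluster/frequency pair."""
    cluster: str
    freq_ghz: float
    runtime_s: float   # predicted runtime in seconds
    power_w: float     # predicted average power in watts

    @property
    def energy_j(self) -> float:
        # Assumed energy model: average power times runtime.
        return self.runtime_s * self.power_w


def rank_combinations(predictions, slack):
    """Rank job-resource combinations by predicted energy.

    Combinations whose predicted energy exceeds the best one by more than
    the fraction `slack` are dropped. A larger `slack` admits less efficient
    combinations. Returns (priority, prediction) pairs, where priority 0 is
    the most favored combination.
    """
    if not predictions:
        return []
    best = min(p.energy_j for p in predictions)
    eligible = [p for p in predictions if p.energy_j <= best * (1.0 + slack)]
    eligible.sort(key=lambda p: p.energy_j)
    return list(enumerate(eligible))


# Illustrative numbers only, not measurements from the paper.
candidates = [
    Prediction("cluster_a", 2.0, runtime_s=100.0, power_w=200.0),
    Prediction("cluster_a", 1.5, runtime_s=120.0, power_w=150.0),
    Prediction("cluster_b", 2.4, runtime_s=80.0, power_w=300.0),
]

for priority, p in rank_combinations(candidates, slack=0.2):
    print(priority, p.cluster, p.freq_ghz, p.energy_j)
```

Raising `slack` lets a scheduler fall back to less efficient combinations when the favored resources are busy, which is one way to read the trade-off between energy and response time described in the abstract.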

