Using Unused: Non-Invasive Dynamic FaaS Infrastructure with HPC-Whisk

11/01/2022
by Bartłomiej Przybylski, et al.

Modern HPC workload managers, and their careful tuning, contribute to the high utilization of HPC clusters. However, due to inevitable uncertainty, it is impossible to avoid node idleness completely. Although such idle slots are usually too short for any HPC job, they are too long to ignore. The Function-as-a-Service (FaaS) paradigm is a promising way to fill this gap, and can be a good match, as typical FaaS functions last seconds, not hours. Here we show how to build a FaaS infrastructure on idle nodes in an HPC cluster without significantly affecting the performance of the HPC jobs. We dynamically adapt to a changing set of idle physical machines by integrating two open-source systems: the Slurm workload manager and the OpenWhisk FaaS platform. We designed and implemented a prototype solution that allowed us to cover up to 90% of the idle time slots on a 50k-core cluster that runs production workloads.
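To make a coverage figure such as "up to 90% of the idle time slots" concrete, it can be computed from per-node interval logs. The sketch below assumes that coverage means the fraction of idle node-time during which a FaaS worker was running on that node; this definition, the `(start, end)` interval format, and the helper names (`idle_coverage`, `_merge`, `_overlap`) are hypothetical illustrations, not the metric or code used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical log format: (start, end) timestamps in seconds, start < end.
Interval = Tuple[float, float]


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping ones so no time is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def _overlap(a: List[Interval], b: List[Interval]) -> float:
    """Total length of the intersection of two sorted, merged interval lists."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first; the other may overlap more.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def idle_coverage(idle: Dict[str, List[Interval]],
                  faas: Dict[str, List[Interval]]) -> float:
    """Fraction of idle node-time overlapped by FaaS-worker activity (assumed metric)."""
    total_idle = 0.0
    covered = 0.0
    for node, slots in idle.items():
        idle_merged = _merge(slots)
        total_idle += sum(end - start for start, end in idle_merged)
        covered += _overlap(idle_merged, _merge(faas.get(node, [])))
    return covered / total_idle if total_idle > 0 else 0.0
```

For example, with idle slots `{"n1": [(0, 100)], "n2": [(50, 150)]}` and worker intervals `{"n1": [(10, 100)], "n2": [(60, 130)]}`, 160 of the 200 idle seconds are covered, giving 0.8. Merging each node's intervals first keeps overlapping log entries from inflating either total.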
