Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter

01/12/2023
by Jie Li, et al.

The resource demands of HPC applications vary significantly. However, HPC systems commonly assign resources on a per-node basis to prevent interference from co-located workloads. This gap between coarse-grained resource allocation and varying resource demands can lead to underutilization of HPC resources. In this study, we comprehensively analyzed the resource usage and characteristics of NERSC Perlmutter, a state-of-the-art HPC system with both CPU-only and GPU-accelerated nodes. Our three-week usage analysis revealed that the majority of jobs had low CPU utilization and that around 86% [...] host memory. Additionally, 52.1% [...] memory, and the memory capacity was over-provisioned in some ways for all jobs. The study also found that 60% [...] indicate that resource underutilization may occur as users adapt workflows to a system with new resources. Our research provides valuable insights into performance characterization and offers new perspectives for system operators to understand and track the migration of workloads. Furthermore, it can be useful for designing, optimizing, and procuring HPC systems.
