Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter

01/12/2023
by Jie Li, et al.

The resource demands of HPC applications vary significantly. However, it is common for HPC systems to assign resources on a per-node basis to prevent interference from co-located workloads. This gap between the coarse-grained resource allocation and the varying resource demands can lead to underutilization of HPC resources. In this study, we comprehensively analyzed the resource usage and characteristics of NERSC Perlmutter, a state-of-the-art HPC system with both CPU-only and GPU-accelerated nodes. Our three-week usage analysis revealed that the majority of jobs had low CPU utilization and that around 86% [...] host memory. Additionally, 52.1% [...] memory, and the memory capacity was over-provisioned in some ways for all jobs. The study also found that 60% [...] indicate that resource underutilization may occur as users adapt workflows to a system with new resources. Our research provides valuable insights into performance characterization and offers new perspectives for system operators to understand and track the migration of workloads. Furthermore, it can be extremely useful for designing, optimizing, and procuring HPC systems.

06/22/2020

Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters

Traditionally, HPC workloads have been deployed in bare-metal clusters; ...
11/04/2022

Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

Current HPC systems provide memory resources that are statically configu...
04/12/2022

The MIT Supercloud Workload Classification Challenge

High-Performance Computing (HPC) centers and cloud providers support an ...
01/12/2018

A Workload Analysis of NSF's Innovative HPC Resources Using XDMoD

Workload characterization is an integral part of performance analysis of...
08/04/2021

The MIT Supercloud Dataset

Artificial intelligence (AI) and Machine learning (ML) workloads are an ...
08/14/2020

The Impact of Auto-Refactoring Code Smells on the Resource Utilization of Cloud Software

Cloud-based software-as-a-service (SaaS) have gained popularity due to t...
03/24/2022

Quantum Computing in the Cloud: Analyzing job and machine characteristics

As the popularity of quantum computing continues to grow, quantum machin...