A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

08/28/2023
by Jacob Wahlgren, et al.

Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising and non-disruptive option for memory disaggregation is rack-scale memory pooling, where shared memory pools supplement node-local memory. This work outlines the prospects and requirements for adoption and clarifies several misconceptions. We propose a quantitative method for dissecting application requirements on the memory system from the top down in three levels, moving from general, to multi-tier memory systems, and then to memory pooling. We provide a multi-level profiling tool and LBench to facilitate the quantitative approach. We evaluate a set of representative HPC workloads on an emulated platform. Our results show that prefetching activities can significantly influence memory traffic profiles. Interference in memory pooling has varied impacts on applications, depending on their access ratios to memory tiers and arithmetic intensities. Finally, in two case studies, we show the benefits of our findings at the application and system levels, achieving 50 remote access and 13 co-located workloads in interference-aware job scheduling.

