Evaluating the Potential of Disaggregated Memory Systems for HPC applications

06/06/2023
by   Nan Ding, et al.

Disaggregated memory is a promising approach that addresses the limitations of traditional memory architectures by enabling memory to be decoupled from compute nodes and shared across a data center. Cloud platforms have deployed such systems to improve overall system memory utilization, but performance can vary across workloads. High-performance computing (HPC) is crucial in scientific and engineering applications, and HPC machines also face the issue of underutilized memory. As a result, improving system memory utilization while understanding workload performance is essential for HPC operators, and understanding the potential of a disaggregated memory system before deployment is a critical step. This paper proposes a methodology for exploring the design space of a disaggregated memory system. It incorporates key metrics that affect performance on disaggregated memory systems: memory capacity, local and remote memory access ratio, injection bandwidth, and bisection bandwidth, providing an intuitive approach to guide machine configurations based on technology trends and workload characteristics. We apply our methodology to analyze thirteen diverse workloads, including AI training, data analysis, genomics, protein, fusion, atomic nuclei, and traditional HPC bookends. Our methodology demonstrates the ability to comprehend the potential and pitfalls of a disaggregated memory system and provides motivation for machine configurations. Our results show that eleven of our thirteen applications can leverage injection bandwidth disaggregated memory without affecting performance, while one pays a rack bisection bandwidth penalty and two pay the system-wide bisection bandwidth penalty. We also show that intra-rack memory disaggregation would meet the applications' memory requirements and provide enough remote memory bandwidth.
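As a rough illustration of how the bandwidth metrics named above might interact, the sketch below picks the tightest bandwidth limit on remote accesses and estimates run time relative to an all-local run. It is not the paper's model: the `Machine` fields, the function names, the example bandwidth values, and the assumption that local and remote traffic overlap perfectly are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical per-node bandwidths in GB/s (not taken from the paper).
    local_bw: float             # local memory bandwidth
    injection_bw: float         # network injection bandwidth
    rack_bisection_bw: float    # per-node share of rack bisection bandwidth
    system_bisection_bw: float  # per-node share of system bisection bandwidth


def effective_remote_bw(m, scope):
    """Return (limiting resource, bandwidth) for remote memory accesses.

    scope is "rack" for intra-rack disaggregation or "system" for
    system-wide disaggregation. The slowest link on the path is assumed
    to set the remote bandwidth.
    """
    limits = {"injection": m.injection_bw,
              "rack bisection": m.rack_bisection_bw}
    if scope == "system":
        limits["system bisection"] = m.system_bisection_bw
    name = min(limits, key=limits.get)
    return name, limits[name]


def relative_time(m, remote_fraction, scope):
    """Run time relative to an all-local run, under a bandwidth-only model.

    remote_fraction is the share of memory traffic served remotely (0..1).
    Local and remote transfers are assumed to overlap fully, so the slower
    of the two determines the time per byte.
    """
    _, remote_bw = effective_remote_bw(m, scope)
    local_time = (1.0 - remote_fraction) / m.local_bw
    remote_time = remote_fraction / remote_bw
    return max(local_time, remote_time) * m.local_bw


if __name__ == "__main__":
    node = Machine(local_bw=200.0, injection_bw=25.0,
                   rack_bisection_bw=25.0, system_bisection_bw=5.0)
    for scope in ("rack", "system"):
        limit, bw = effective_remote_bw(node, scope)
        print(scope, limit, bw, relative_time(node, 0.5, scope))
```

Under these assumptions, a workload with a small remote access ratio stays bound by local memory, while one with a large ratio becomes bound by whichever of injection or bisection bandwidth is smallest for the chosen disaggregation scope.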


