Exploring GPU Stream-Aware Message Passing using Triggered Operations

08/09/2022
by Naveen Namashivayam, et al.

Modern heterogeneous supercomputing systems comprise compute blades that offer both CPUs and GPUs. On such systems, it is essential to move data efficiently between these compute engines across a high-speed network. Although current-generation scientific applications and systems software stacks are GPU-aware, CPU threads are still required to orchestrate data-movement communication operations and inter-process synchronization operations. This work explores a new GPU stream-aware MPI communication strategy, called stream-triggered (ST) communication, that allows both the computation and communication control paths to be offloaded to the GPU. The proposed ST communication strategy is implemented on HPE Slingshot Interconnects over a new proprietary HPE Slingshot NIC (Slingshot 11) using its supported triggered operations feature. The performance of the new strategy is evaluated using a microbenchmark kernel called Faces, based on the nearest-neighbor communication pattern in the CORAL-2 Nekbone benchmark, on a heterogeneous node architecture consisting of AMD CPUs and GPUs.
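
The control-flow idea behind a triggered operation can be illustrated with a small toy model. The sketch below is only a conceptual simulation in Python, assuming the common counter-and-threshold formulation of triggered operations. It is not the paper's implementation, and it does not use any MPI, Slingshot, or GPU runtime API. Every name in it (`TriggerCounter`, `Stream`, `run_demo`) is hypothetical. The host arms a deferred send and enqueues work once, after which the in-order stream alone drives both the computation and the start of the communication.

```python
from collections import deque


class TriggerCounter:
    """Hypothetical stand-in for a NIC counter that gates deferred operations."""

    def __init__(self):
        self.value = 0
        self._pending = []  # list of (threshold, callback) pairs

    def arm(self, threshold, callback):
        # Defer `callback` until the counter value reaches `threshold`.
        self._pending.append((threshold, callback))

    def increment(self):
        self.value += 1
        ready = [cb for th, cb in self._pending if th <= self.value]
        self._pending = [(th, cb) for th, cb in self._pending if th > self.value]
        for cb in ready:
            cb()


class Stream:
    """Hypothetical in-order work queue standing in for a GPU stream."""

    def __init__(self):
        self._ops = deque()

    def enqueue(self, op):
        self._ops.append(op)

    def run(self):
        # Operations execute strictly in the order they were enqueued.
        while self._ops:
            self._ops.popleft()()


def run_demo():
    log = []
    send_buffer = []
    remote_buffer = []
    counter = TriggerCounter()
    stream = Stream()

    def kernel():
        # Stands in for a GPU kernel that produces the data to send.
        send_buffer.extend([1, 2, 3])
        log.append("kernel")

    def trigger():
        # Stands in for a stream operation that bumps the counter.
        log.append("trigger")
        counter.increment()

    def send():
        # Stands in for the deferred send that the counter releases.
        remote_buffer.extend(send_buffer)
        log.append("send")

    # Host side: arm the deferred send and enqueue the work up front.
    counter.arm(1, send)
    stream.enqueue(kernel)
    stream.enqueue(trigger)

    # From here on, only the stream's ordering decides when the send fires.
    stream.run()
    return log, remote_buffer
```

In this model, calling `run_demo()` runs the kernel, then the trigger, then the send, without the host code stepping in between the kernel finishing and the send starting. That ordering is the property the abstract describes as offloading the communication control path.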


research
06/27/2023

Exploring Fully Offloaded GPU Stream-Aware Message Passing

Modern heterogeneous supercomputing systems are comprised of CPUs, GPUs,...
research
01/21/2021

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Dask is a popular parallel and distributed computing framework, which ri...
research
09/13/2022

Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures

Supercomputer architectures are trending toward higher computational thr...
research
10/20/2020

Modeling Data Movement Performance on Heterogeneous Architectures

The cost of data movement on parallel systems varies greatly with machin...
research
12/14/2018

An Empirical Evaluation of Allgatherv on Multi-GPU Systems

Applications for deep learning and big data analytics have compute and m...
research
03/04/2022

Machine Learning for CUDA+MPI Design Rules

We present a new strategy for automatically exploring the design space o...
research
11/04/2021

Safe and Practical GPU Acceleration in TrustZone

We present a holistic design for GPU-accelerated computation in TrustZon...
