Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures

09/13/2022
by Shelby Lockhart et al.

Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we use performance modeling to characterize irregular point-to-point communication with MPI in heterogeneous compute environments, and we demonstrate the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. The presented models suggest staging communicated data through host processes and then applying node-aware communication strategies when inter-node message counts are high. Notably, the models also predict that node-aware communication using all available CPU cores to communicate inter-node data is the most performant strategy when communicating with a large number of nodes. The models are validated through a case study of the irregular point-to-point communication patterns in distributed sparse matrix-vector products. Finally, we discuss the implications of the model predictions for communication strategy design on emerging supercomputer architectures.
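The prediction about using all CPU cores can be made concrete with a minimal sketch. It assumes a simple postal (latency-bandwidth) model, which may differ from the models in the paper. A node's aggregated inter-node messages are split evenly across the processes that send them, and the slowest sender sets the time. The function name `inter_node_time` and every parameter value (`ALPHA`, `BETA`, 4 GPUs and 40 cores per node, 8192-byte messages) are hypothetical. The sketch also ignores the intra-node redistribution and device-to-host copies that spreading data across CPU cores requires, as well as contention for network injection bandwidth. It only shows why the per-message latency term favors more senders as the number of destination nodes grows.

```python
import math


def inter_node_time(n_msgs, msg_bytes, n_senders, alpha, beta):
    """Postal-model estimate (hypothetical) of the inter-node step.

    n_msgs aggregated messages of msg_bytes each are split as evenly as
    possible across n_senders processes that send concurrently; the time is
    set by the sender with the most messages. Injection-bandwidth contention
    and any intra-node redistribution are deliberately ignored.
    """
    per_sender = math.ceil(n_msgs / n_senders)
    return per_sender * (alpha + beta * msg_bytes)


# Hypothetical machine parameters, not measurements:
ALPHA = 2e-6          # per-message latency in seconds (2 us)
BETA = 1.0 / 10e9     # seconds per byte (10 GB/s)
GPUS_PER_NODE = 4     # one MPI rank per GPU
CORES_PER_NODE = 40   # one MPI rank per CPU core
MSG_BYTES = 8192      # size of each aggregated inter-node message

if __name__ == "__main__":
    # Assume one aggregated message per destination node.
    for n_dest_nodes in (4, 64, 512):
        t_gpu = inter_node_time(n_dest_nodes, MSG_BYTES, GPUS_PER_NODE, ALPHA, BETA)
        t_all = inter_node_time(n_dest_nodes, MSG_BYTES, CORES_PER_NODE, ALPHA, BETA)
        print(f"{n_dest_nodes:4d} destination nodes: "
              f"one rank per GPU {t_gpu * 1e6:8.1f} us, "
              f"all cores {t_all * 1e6:8.1f} us")
```

With only a few destination nodes, both choices send one message per process and cost the same. With hundreds of destination nodes, the one-rank-per-GPU variant serializes many more messages per sender. A fuller model would add the cost of moving data onto the extra cores and cap aggregate throughput at the network interface's injection rate.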
