Modeling Data Movement Performance on Heterogeneous Architectures

10/20/2020
by Amanda Bienz, et al.

The cost of data movement on parallel systems varies greatly with machine architecture, job partition, and even nearby jobs. Performance models that accurately capture this cost provide a tool for analysis, allowing communication bottlenecks to be pinpointed. Modern heterogeneous architectures increase the variance in data movement cost because there are several viable paths for inter-GPU communication. In this paper, we present performance models for the various paths of inter-node communication on modern heterogeneous architectures. We model the performance of utilizing all available CPU cores, as well as the benefit of copying data to the CPUs when sending many messages. Finally, we present optimizations for a variety of MPI collectives based on the performance expectations provided by these models.
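
The abstract does not state the form of the models, so the following is only a minimal sketch under a stated assumption: each communication path is described by the common postal (alpha-beta) model, time = alpha + beta * bytes. The class `PostalModel`, the functions `direct_gpu_cost` and `staged_cpu_cost`, and all parameters are hypothetical names introduced for illustration, not the authors' models. The sketch compares sending many messages directly from a GPU with copying the data to the host once and splitting the messages across several CPU cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostalModel:
    """Assumed postal (alpha-beta) cost model: time = alpha + beta * bytes."""
    alpha: float  # per-message latency (seconds); must be measured per path
    beta: float   # per-byte transfer cost (seconds/byte); must be measured per path

    def cost(self, nbytes: int) -> float:
        return self.alpha + self.beta * nbytes


def direct_gpu_cost(gpu_path: PostalModel, n_msgs: int, msg_bytes: int) -> float:
    """Hypothetical path 1: every message is sent from the GPU, one after another."""
    return n_msgs * gpu_path.cost(msg_bytes)


def staged_cpu_cost(copy_path: PostalModel, cpu_path: PostalModel,
                    n_msgs: int, msg_bytes: int, n_cores: int) -> float:
    """Hypothetical path 2: one bulk device-to-host copy, then the messages
    are divided evenly among n_cores CPU cores that send concurrently."""
    copy_time = copy_path.cost(n_msgs * msg_bytes)
    msgs_per_core = -(-n_msgs // n_cores)  # ceiling division
    return copy_time + msgs_per_core * cpu_path.cost(msg_bytes)
```

With alpha and beta fitted from measurements for each path, comparing the two returned values for a given message count and size indicates which path such a model would predict to be cheaper. The sketch ignores contention between cores and network injection limits, which a more faithful model would need to include.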

research
09/13/2022

Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures

Supercomputer architectures are trending toward higher computational thr...
research
10/27/2017

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

For computational fluid dynamics (CFD) applications with a large number ...
research
08/09/2022

Exploring GPU Stream-Aware Message Passing using Triggered Operations

Modern heterogeneous supercomputing systems are comprised of compute bla...
research
06/27/2023

Exploring Fully Offloaded GPU Stream-Aware Message Passing

Modern heterogeneous supercomputing systems are comprised of CPUs, GPUs,...
research
03/05/2023

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Hardware heterogeneity is here to stay for high-performance computing. L...
research
04/25/2018

Geometric Partitioning and Ordering Strategies for Task Mapping on Parallel Computers

We present a new method for mapping applications' MPI tasks to cores of ...
research
12/13/2022

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

Partitioning applications between NDP and host CPU cores causes inter-se...
