Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

02/23/2023
by Ayesha Afzal, et al.

Understanding the performance bottlenecks that arise from the intricate hardware-software interactions of highly parallel programs on HPC clusters is crucial. This paper sheds light on automatic asynchronous MPI communication in memory-bound parallel programs on multicore clusters and on how it can be facilitated. For instance, slowing down MPI processes by deliberately injecting delays can improve performance if certain conditions are met. This leads to the counter-intuitive conclusion that noise, independent of its source, is not always detrimental but can be leveraged for performance improvements. We employ phase-space graphs as a new tool to visualize parallel program dynamics. They are useful for spotting certain patterns in parallel execution that easily go unnoticed with traditional tracing tools. We investigate five microbenchmarks and applications on different supercomputer platforms: an MPI-augmented STREAM Triad, two implementations of Lattice-Boltzmann fluid solvers, and the LULESH and HPCG proxy applications.


research
05/15/2020

Elastic execution of checkpointed MPI applications

MPI applications begin with a fixed number of ranks and, by default, the ...
research
05/27/2022

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

This paper studies the utility of using data analytics and machine learn...
research
11/12/2020

Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

Asynchronous programming models (APM) are gaining more and more traction...
research
01/15/2023

Synthesizing Proxy Applications for MPI Programs

Proxy applications (proxy-apps) are basic tools for evaluating the perfo...
research
05/09/2022

The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs

The performance of highly parallel applications on distributed-memory sy...
research
11/06/2017

Enabling rootless Linux Containers in multi-user environments: the udocker tool

Containers are increasingly used as means to distribute and run Linux se...
research
02/07/2020

Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs

Analytic, first-principles performance modeling of distributed-memory pa...
