Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

05/27/2022
by Ayesha Afzal, et al.

This paper studies the utility of data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with a regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or the lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
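
As an illustration of the kind of observable named above, the following minimal sketch computes a correlation function over per-process MPI time per time step. It is not the authors' implementation: the data are synthetic (a noisy sinusoidal wave traveling across ranks, standing in for a desynchronization pattern), and the function names `pearson`, `rank_correlation`, and `synthetic_wave` as well as all parameter values are assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_correlation(mpi_time, max_dist):
    """Correlate rank 0's MPI-time series with ranks at distance 1..max_dist.

    mpi_time[r][t] is the MPI time of rank r in time step t.
    """
    return [pearson(mpi_time[0], mpi_time[d]) for d in range(1, max_dist + 1)]


def synthetic_wave(n_ranks, n_steps, wavelength, period, noise, seed=0):
    """Hypothetical test data: a noisy wave of MPI time traveling across ranks."""
    rng = random.Random(seed)
    return [
        [
            1.0
            + math.sin(2.0 * math.pi * (t / period - r / wavelength))
            + rng.gauss(0.0, noise)
            for t in range(n_steps)
        ]
        for r in range(n_ranks)
    ]


if __name__ == "__main__":
    data = synthetic_wave(n_ranks=16, n_steps=200, wavelength=8, period=20, noise=0.05)
    for dist, c in enumerate(rank_correlation(data, max_dist=8), start=1):
        print(f"rank distance {dist}: correlation {c:+.3f}")
```

In this synthetic setting the correlation oscillates with rank distance, following the assumed wavelength of the pattern. The input is only one number per process per time step, which is far less data than a full MPI trace.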


research
02/23/2023

Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

Comprehending the performance bottlenecks at the core of the intricate h...
research
10/16/2021

Verification of MPI programs

In this paper, we outline an approach to verifying parallel programs. A ...
research
05/15/2020

Elastic execution of checkpointed MPI applications

MPI applications begin with a fixed number of ranks and, by default, the ...
research
01/15/2023

Synthesizing Proxy Applications for MPI Programs

Proxy applications (proxy-apps) are basic tools for evaluating the perfo...
research
02/07/2020

Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs

Analytic, first-principles performance modeling of distributed-memory pa...
research
09/27/2018

Performance of MPI sends of non-contiguous data

We present an experimental investigation of the performance of MPI deriv...
research
03/04/2021

Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

Most distributed-memory bulk-synchronous parallel programs in HPC assume...
