Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs

02/07/2020
by   Ayesha Afzal, et al.

Analytic, first-principles performance modeling of distributed-memory parallel codes is notoriously imprecise. Even for applications with extremely regular and homogeneous compute-communicate phases, simply adding communication time to computation time often does not yield a satisfactory prediction of parallel runtime, because system noise, variations in communication time, and inherent load imbalance cause deviations from the expected simple lockstep pattern. In this paper, we highlight the specific cases of provoked and spontaneous desynchronization of memory-bound, bulk-synchronous pure MPI and hybrid MPI+OpenMP programs. Using simple microbenchmarks, we observe that although desynchronization can increase the waiting time per process, it does not necessarily cause lower resource utilization; instead, it can increase the memory bandwidth available per core. In the case of significant communication overhead, even natural noise can push the system into a state of automatic overlap of communication and computation, improving the overall time to solution. The saturation point, i.e., the number of processes per memory domain required to achieve full memory bandwidth, is pivotal for the dynamics of this process and the emerging stable wave pattern. We also demonstrate how hybrid MPI+OpenMP programming can prevent this desirable desynchronization by eliminating the bandwidth bottleneck among processes. A Chebyshev filter diagonalization application is used to demonstrate some of the observed effects in a realistic setting.
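To make the class of programs under study concrete, the following is a minimal sketch (not the authors' actual microbenchmark) of a memory-bound, bulk-synchronous compute-communicate loop of the kind the abstract describes: a STREAM-triad-like sweep per process followed by a blocking ring exchange with neighbors. The array length N, message size MSG, and iteration count are illustrative assumptions, not values from the paper.

```c
/* Sketch of a bulk-synchronous, memory-bound compute-communicate loop.
 * Sizes below are assumptions chosen to exceed cache capacity. */
#include <mpi.h>
#include <stdlib.h>

#define N    (40 * 1000 * 1000)   /* triad array length (assumption)       */
#define MSG  (1 << 20)            /* doubles per halo-like message (assumption) */
#define ITER 500

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
           *c = malloc(N * sizeof *c);
    double *sbuf = malloc(MSG * sizeof *sbuf), *rbuf = malloc(MSG * sizeof *rbuf);
    for (long i = 0; i < N; ++i) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.5; }

    int right = (rank + 1) % size, left = (rank - 1 + size) % size;

    for (int it = 0; it < ITER; ++it) {
        /* Compute phase: memory-bound triad sweep. In the hybrid case the
         * OpenMP threads of one process share this loop and thus share the
         * memory-domain bandwidth. */
        #pragma omp parallel for
        for (long i = 0; i < N; ++i)
            a[i] = b[i] + 1.5 * c[i];

        /* Communication phase: blocking ring exchange provides the
         * bulk-synchronous coupling between neighboring processes. */
        MPI_Sendrecv(sbuf, MSG, MPI_DOUBLE, right, 0,
                     rbuf, MSG, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(a); free(b); free(c); free(sbuf); free(rbuf);
    MPI_Finalize();
    return 0;
}
```

With one process per core (pure MPI) and more processes per memory domain than the saturation point, individual processes can fall out of lockstep without lowering aggregate bandwidth utilization; running one multi-threaded process per memory domain instead removes that slack, which is the hybrid effect the abstract refers to.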


