Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

03/04/2021
by Ayesha Afzal, et al.

Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute resources are available continuously and homogeneously across the allocated set of compute nodes. However, long one-off delays on individual processes can ripple through the system and cause global disturbances, so-called idle waves. Their propagation is mainly governed by the communication topology of the underlying parallel code. This paper studies the propagation mechanisms of idle waves across the ranks of MPI-parallel programs. We present a validated analytic model for their propagation velocity with respect to communication parameters and topology, with special emphasis on sparse communication patterns. We study the interaction of idle waves with MPI collectives and show that, depending on the implementation, a collective may be transparent to the wave. Finally, we analyze two mechanisms of idle wave decay: topological decay, which is rooted in differences in communication characteristics among parts of the system, and noise-induced decay, which is caused by system or application noise. We show that noise-induced decay is largely independent of the detailed noise characteristics and depends mainly on the overall noise power. We derive an analytic expression for the idle wave decay rate with respect to noise power. For model validation, we use microbenchmarks and stencil algorithms on three different supercomputing platforms.
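To make the idea of an idle wave concrete, the following is a minimal toy sketch in Python. It is not the analytic model from the paper. It assumes an open chain of ranks, a fixed execution time per step, zero communication time, no noise, and a bidirectional exchange with all partners up to a given rank distance. The function names `simulate_idle_wave` and `delayed_ranks` and all parameters are hypothetical and chosen for illustration only.

```python
def simulate_idle_wave(n_ranks, n_steps, t_exec, delay, delayed_rank, distance=1):
    """Toy bulk-synchronous model (an assumption, not the paper's model).

    Returns finish[step][rank]: the time at which each rank completes each step.
    A one-off delay is injected on `delayed_rank` during step 0.
    """
    finish = []
    prev = [0.0] * n_ranks
    for step in range(n_steps):
        cur = []
        for rank in range(n_ranks):
            lo = max(0, rank - distance)
            hi = min(n_ranks - 1, rank + distance)
            # A rank can start a step only after it and all of its
            # communication partners have finished the previous step.
            start = max(prev[lo:hi + 1])
            extra = delay if (step == 0 and rank == delayed_rank) else 0.0
            cur.append(start + t_exec + extra)
        finish.append(cur)
        prev = cur
    return finish


def delayed_ranks(finish, step, t_exec):
    """Ranks that finish `step` later than the undisturbed schedule."""
    nominal = (step + 1) * t_exec
    return [rank for rank, t in enumerate(finish[step]) if t > nominal + 1e-12]


if __name__ == "__main__":
    finish = simulate_idle_wave(n_ranks=11, n_steps=4, t_exec=1.0,
                                delay=5.0, delayed_rank=5)
    for step in range(4):
        print(step, delayed_ranks(finish, step, t_exec=1.0))
```

In this toy setting the set of affected ranks grows by `distance` ranks per step in each direction, which illustrates why the communication topology governs how fast the disturbance spreads. Because the sketch contains neither noise nor topological inhomogeneity, the wave never decays here.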

