Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

by Ayesha Afzal et al.

Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute resources are available continuously and homogeneously across the allocated set of compute nodes. However, a long one-off delay on an individual process can ripple through the system and cause a global disturbance, a so-called idle wave. This process is mainly governed by the communication topology of the underlying parallel code. In this paper we study the propagation mechanisms of idle waves across the ranks of MPI-parallel programs. We present a validated analytic model for their propagation velocity with respect to communication parameters and topology, with a special emphasis on sparse communication patterns. We study the interaction of idle waves with MPI collectives and show that, depending on the implementation, a collective may be transparent to the wave. Finally, we analyze two mechanisms of idle wave decay: topological decay, which is rooted in differences in communication characteristics among parts of the system, and noise-induced decay, which is caused by system or application noise. We show that noise-induced decay is largely independent of the noise characteristics and depends only on the overall noise power. We derive an analytic expression for the idle wave decay rate with respect to noise power. For model validation we use microbenchmarks and stencil algorithms on three different supercomputing platforms.
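To make the notion of an idle wave concrete, the following is a minimal toy sketch in Python. It is not the analytic model from the paper. It assumes an open chain of ranks with no wraparound, a fixed compute time per iteration, zero communication cost, no noise, and a rule that a rank may start its next iteration only after all neighbors within a given distance have finished the current one. The function names and parameters (`simulate_idle_wave`, `wave_front`, `t_exec`, `distance`) are hypothetical and chosen only for illustration.

```python
def simulate_idle_wave(n_ranks=16, n_iters=8, t_exec=1.0, delay=5.0,
                       delayed_rank=0, distance=1):
    """Toy bulk-synchronous model with a one-off delay (illustrative only).

    Returns idle[k][i], the time rank i spends waiting in iteration k.
    Assumption: rank i waits for every rank j with |i - j| <= distance.
    """
    start = [0.0] * n_ranks  # time at which each rank starts the iteration
    idle = []
    for k in range(n_iters):
        # Compute phase; the delay is injected once, in iteration 0.
        done = [
            start[i] + t_exec + (delay if k == 0 and i == delayed_rank else 0.0)
            for i in range(n_ranks)
        ]
        # Communication phase: wait for the slowest rank in the neighborhood.
        nxt = []
        for i in range(n_ranks):
            lo = max(0, i - distance)
            hi = min(n_ranks - 1, i + distance)
            nxt.append(max(done[lo:hi + 1]))
        idle.append([nxt[i] - done[i] for i in range(n_ranks)])
        start = nxt
    return idle


def wave_front(idle, threshold=1e-9):
    """Highest rank index that is idle in each iteration (None if no rank is)."""
    return [
        max((i for i, w in enumerate(row) if w > threshold), default=None)
        for row in idle
    ]


if __name__ == "__main__":
    for d in (1, 2):
        fronts = wave_front(simulate_idle_wave(distance=d))
        print(f"distance={d}: wave front per iteration = {fronts}")
```

In this simplified setting the front of the idle period advances by `distance` ranks per iteration until it reaches the end of the chain, which illustrates why the communication topology governs how fast the disturbance spreads. Decay does not appear here because the sketch contains neither noise nor topological inhomogeneity.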






