Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

03/04/2021
by Ayesha Afzal, et al.

Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute resources are available continuously and homogeneously across the allocated set of compute nodes. However, a long one-off delay on an individual process can ripple through the system and cause a global disturbance, a so-called idle wave. This process is mainly governed by the communication topology of the underlying parallel code. This paper studies the propagation mechanisms of idle waves across the ranks of MPI-parallel programs. We present a validated analytic model for their propagation velocity with respect to communication parameters and topology, with a special emphasis on sparse communication patterns. We study the interaction of idle waves with MPI collectives and show that, depending on the implementation, a collective may be transparent to the wave. Finally, we analyze two mechanisms of idle wave decay: topological decay, which is rooted in differences in communication characteristics among parts of the system, and noise-induced decay, which is caused by system or application noise. We show that noise-induced decay is largely independent of noise characteristics and depends only on the overall noise power. An analytic expression for the idle wave decay rate with respect to noise power is derived. For model validation we use microbenchmarks and stencil algorithms on three different supercomputing platforms.
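To make the notion of an idle wave concrete, the following is a minimal toy simulation in Python. It is not the validated analytic model from the paper. It assumes an open chain of ranks, a fixed compute time `t_exec` and communication time `t_comm` per iteration, no noise, and that each rank must wait for every rank within `distance` positions before it can proceed. All function and parameter names are hypothetical and chosen for illustration only.

```python
def simulate_idle_wave(n_ranks, n_iters, t_exec, t_comm, distance,
                       delay_rank, delay):
    """Return idle[k][i]: time rank i spends waiting in iteration k.

    Toy assumptions (not taken from the paper): open chain of ranks,
    noise-free execution, and synchronization with all ranks within
    `distance` positions at the end of every compute phase.
    """
    finish = [0.0] * n_ranks  # time each rank finished the previous iteration
    idle = []
    for k in range(n_iters):
        # Compute phase; the one-off delay hits one rank in iteration 0 only.
        compute_end = [
            finish[i] + t_exec + (delay if (k == 0 and i == delay_rank) else 0.0)
            for i in range(n_ranks)
        ]
        new_finish, waits = [], []
        for i in range(n_ranks):
            lo = max(0, i - distance)
            hi = min(n_ranks - 1, i + distance)
            # A rank can communicate only once its slowest partner is ready.
            ready = max(compute_end[lo:hi + 1])
            waits.append(ready - compute_end[i])
            new_finish.append(ready + t_comm)
        finish = new_finish
        idle.append(waits)
    return idle


def front_distance(idle_row, delay_rank):
    """Largest distance from the delayed rank at which a rank is idle."""
    waiting = [abs(i - delay_rank) for i, w in enumerate(idle_row) if w > 0.0]
    return max(waiting) if waiting else 0


idle = simulate_idle_wave(n_ranks=41, n_iters=5, t_exec=1.0, t_comm=0.25,
                          distance=2, delay_rank=20, delay=4.0)
fronts = [front_distance(row, 20) for row in idle]
print(fronts)
```

In this toy setting the wave front advances by `distance` ranks per iteration in both directions, and the idle period keeps its full length because no noise is present. Adding a random term to `compute_end` would be one way to experiment with noise-induced decay under the same assumptions.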

