(Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI

05/20/2022
by   Jesper Larsson Träff, et al.

We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of n distinct blocks of data from a root processor to all other processors in 1-ported, p-processor networks with fully bidirectional communication. For any p and n, broadcasting in this model requires n-1+⌈log_2 p⌉ communication rounds. In contrast to other constructions, all processors follow the same circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes O(log^3 p) time steps per processor, and each processor can compute its part of the schedule independently of the other processors in O(log p) space. The result is a significant improvement, with considerable practical import, over the sequential O(p log^2 p) time and O(p log p) space construction of Träff and Ripke (2009). The round-optimal schedule construction is then used to implement communication-optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the Message-Passing Interface), which significantly and practically improve over the implementations in standard MPI libraries (e.g., OpenMPI, Intel MPI) for certain problem ranges. The application to the irregular allgather operation is entirely new.
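The round count quoted above can be checked with a few lines of Python. This is only a minimal sketch of the formula n-1+⌈log_2 p⌉, not the schedule construction itself; the function name `optimal_broadcast_rounds` is a hypothetical helper introduced here for illustration. The intuition behind the bound: in a 1-ported network the root cannot start sending the last of its n blocks before round n, and any single block needs at least ⌈log_2 p⌉ rounds to reach all p processors, because the number of processors holding it can at most double per round.

```python
def optimal_broadcast_rounds(n: int, p: int) -> int:
    """Rounds needed to broadcast n blocks to p processors (1-ported model).

    Hypothetical helper: evaluates n - 1 + ceil(log2(p)) from the abstract.
    """
    if n < 1 or p < 1:
        raise ValueError("n and p must be positive integers")
    # For p >= 1, (p - 1).bit_length() equals ceil(log2(p)) exactly,
    # which avoids floating-point rounding for large p.
    ceil_log2_p = (p - 1).bit_length()
    return n - 1 + ceil_log2_p


if __name__ == "__main__":
    # Evaluate the formula for a few (n, p) combinations.
    for n, p in [(1, 8), (4, 8), (10, 5), (100, 1000)]:
        print(f"n={n}, p={p}: {optimal_broadcast_rounds(n, p)} rounds")
```

For a single block (n = 1) the formula reduces to the familiar ⌈log_2 p⌉ rounds; each additional block adds one round.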


