Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

06/02/2023
by Gerald Collom, et al.

Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications implement irregular messages individually using point-to-point communication, and any optimizations are added directly into the application. As a result, these optimizations lack portability. There is no easy way to optimize point-to-point messages within MPI, as the interface for single messages provides no information on the full set of communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for optimizing irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication with neighborhood collectives in existing codebases such as Hypre BoomerAMG, and shows a speedup of up to 1.32x on sparse matrix-vector multiplication within a BoomerAMG solve through the use of the optimized neighbor collectives. The authors analyze multiple implementations of neighborhood collectives, including a standard implementation, which simply wraps standard point-to-point communication, as well as multiple implementations of locality-aware aggregation. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing optimizations to be added to existing codebases regardless of the system MPI installation.
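
To make the contrast between the standard implementation and locality-aware aggregation concrete, the following is a minimal sketch that only counts messages; it does not use MPI. It assumes one common form of aggregation, in which all data travelling between a given pair of nodes is combined into a single inter-node message. The abstract does not specify the aggregation schemes used, so the function name `count_internode_messages`, the contiguous rank-to-node mapping, and the sample pattern are all hypothetical.

```python
def node_of(rank, ranks_per_node):
    """Assumed mapping: ranks are assigned to nodes in contiguous blocks."""
    return rank // ranks_per_node


def count_internode_messages(pattern, ranks_per_node):
    """Count inter-node messages for an irregular communication pattern.

    pattern: dict mapping a source rank to a list of destination ranks.
    Returns (standard, aggregated):
      standard   - one message per (source rank, destination rank) pair
                   that crosses a node boundary, as in plain point-to-point
      aggregated - one message per (source node, destination node) pair,
                   assuming all traffic between two nodes is combined
    """
    standard = 0
    node_pairs = set()
    for src, dsts in pattern.items():
        for dst in dsts:
            src_node = node_of(src, ranks_per_node)
            dst_node = node_of(dst, ranks_per_node)
            if src_node != dst_node:
                standard += 1
                node_pairs.add((src_node, dst_node))
    return standard, len(node_pairs)


# Hypothetical pattern: 4 ranks on node 0 sending to ranks on nodes 0, 1, and 2.
pattern = {
    0: [4, 5, 6],
    1: [4, 7],
    2: [5, 8],
    3: [1, 9],
}

standard, aggregated = count_internode_messages(pattern, ranks_per_node=4)
print(standard, aggregated)  # prints "8 2"
```

In this sketch the eight inter-node messages collapse to two, at the cost of extra intra-node redistribution before and after the inter-node step, which the count above ignores. A neighborhood collective gives the MPI library the whole pattern up front, which is what makes this kind of regrouping possible inside the library rather than in application code.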

research
09/13/2023

MPI Advance : Open-Source Message Passing Optimizations

The large variety of production implementations of the message passing i...
research
04/30/2020

A more secure IPv6 neighborhood process

The process of neighborhood establishment in an IPv6 network is made out...
research
08/26/2023

A Locality-Aware Sparse Dynamic Data Exchange

Parallel architectures are continually increasing in performance and sca...
research
06/07/2022

A Locality-Aware Bruck Allgather

Collective algorithms are an essential part of MPI, allowing application...
research
10/19/2020

High-Performance Distributed RMA Locks

We propose a topology-aware distributed Reader-Writer lock that accelera...
research
06/06/2018

Improving Performance Models for Irregular Point-to-Point Communication

Parallel applications are often unable to take full advantage of emergin...
research
04/25/2018

Fast parallel multidimensional FFT using advanced MPI

We present a new method for performing global redistributions of multidi...
