Scalable Communication Endpoints for MPI+Threads Applications

02/06/2020
by   Rohit Zambre, et al.

Hybrid MPI+threads programming is gaining prominence as an alternative to the traditional "MPI everywhere" model to better handle the disproportionate increase in the number of cores compared with other on-node resources. Current implementations of these two models represent the two extreme cases of communication-resource sharing in modern MPI implementations. In the MPI-everywhere model, each MPI process has a dedicated set of communication resources (also known as endpoints), which is ideal for performance but wasteful of resources. With MPI+threads, current MPI implementations share a single communication endpoint among all threads, which is ideal for resource usage but harmful to performance. In this paper, we explore the tradeoff space between performance and communication-resource usage in MPI+threads environments. We first demonstrate the two extreme cases: one where all threads share a single communication endpoint and another where each thread gets its own dedicated communication endpoint (similar to the MPI-everywhere model), and we showcase the inefficiencies of both. Next, we perform a thorough analysis of the different levels of resource sharing in the context of Mellanox InfiniBand. Using the lessons learned from this analysis, we design an improved resource-sharing model that produces scalable communication endpoints, achieving the same performance as dedicated communication resources per thread while using only a third of the resources.
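The core tradeoff the abstract describes can be pictured with a small, self-contained analogy (this is an illustration, not the paper's code or an MPI API): when all threads funnel sends through one shared endpoint, they must serialize on that endpoint's lock, whereas dedicated per-thread endpoints remove the contention. The `Endpoint` class and `run` helper below are hypothetical names invented for this sketch.

```python
import threading

class Endpoint:
    """A stand-in for a communication endpoint guarded by a lock,
    analogous to how an MPI library serializes threads on a shared
    network context."""
    def __init__(self):
        self.lock = threading.Lock()
        self.sent = 0

    def send(self):
        # Each "send" must hold the endpoint's lock; with many
        # threads on one endpoint, this lock becomes the bottleneck.
        with self.lock:
            self.sent += 1

def run(endpoints, n_threads, msgs_per_thread):
    """Spawn n_threads workers; each sends msgs_per_thread messages
    on the endpoint assigned to it (round-robin over `endpoints`)."""
    def worker(tid):
        ep = endpoints[tid % len(endpoints)]
        for _ in range(msgs_per_thread):
            ep.send()
    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(ep.sent for ep in endpoints)

# Shared case: 4 threads contend on a single endpoint's lock.
shared_total = run([Endpoint()], 4, 1000)
# Dedicated case: one endpoint per thread, so no lock contention.
dedicated_total = run([Endpoint() for _ in range(4)], 4, 1000)

# Both deliver the same message count; they differ only in how much
# the threads serialize on shared state.
assert shared_total == dedicated_total == 4000
```

The paper's contribution sits between these extremes: partially shared endpoints that recover the performance of the dedicated case with roughly a third of the resources.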


