MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

07/14/2020
by Huan Zhou, et al.

The advent of multi-/many-core processors in clusters encourages hybrid parallel programming, which combines the Message Passing Interface (MPI) for inter-node parallelism with a shared-memory model for on-node parallelism. Compared to the traditional hybrid approach of MPI plus OpenMP, a new but promising hybrid approach of MPI plus MPI-3 shared-memory extensions (MPI+MPI) is gaining traction. We describe an algorithmic approach for collective operations (with allgather and broadcast as concrete examples) in the context of hybrid MPI+MPI that minimizes memory consumption and memory copies. With this approach, only one copy of the data is maintained per node and shared by the on-node processes. This removes the on-node copies of replicated data that each MPI process must otherwise hold when the collectives are invoked in the context of pure MPI. We compare our approach to collectives for hybrid MPI+MPI with the traditional approach for pure MPI, and discuss the synchronization required to guarantee data integrity. The performance of our approach has been validated on a Cray XC40 system (Cray MPI) and an NEC cluster (OpenMPI), showing that it achieves comparable or better performance for allgather operations. We have further validated our approach with a standard computational kernel, namely distributed matrix multiplication, and a Bayesian Probabilistic Matrix Factorization code.
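To make the memory argument concrete, the following back-of-the-envelope estimate compares the per-node footprint of the allgather result under the two schemes. It is only a sketch under stated assumptions: the symbols N (number of nodes), p (processes per node), and m (bytes contributed by each process) are introduced here for illustration and do not appear in the abstract, and send buffers and any buffers internal to the MPI library are ignored.

```latex
% Assumed symbols (not from the abstract):
%   N = number of nodes, p = processes per node,
%   m = bytes contributed by each process to the allgather.
\begin{align*}
  \text{gathered result}            &= N \, p \, m \\
  M_{\text{pure MPI}}  \ \text{(per node, one result copy per process)} &= p \cdot (N \, p \, m) = N \, p^{2} \, m \\
  M_{\text{MPI+MPI}}   \ \text{(per node, one shared result copy)}      &= N \, p \, m \\
  \frac{M_{\text{pure MPI}}}{M_{\text{MPI+MPI}}} &= p
\end{align*}
```

Under these assumptions the per-node saving grows linearly with the number of processes per node. As a purely illustrative instance (the numbers are not taken from the paper), N = 4, p = 24, and m = 1 MiB give 2304 MiB per node for pure MPI against 96 MiB for the shared copy.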


