Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided

01/21/2020
by Robert Gerstenberger, et al.

Modern interconnects offer remote direct memory access (RDMA) features. Yet most applications rely on explicit message passing for communication despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have to be demonstrated in practice. In this work, we develop scalable bufferless protocols that implement the MPI-3.0 specification. Our protocols support scaling to millions of cores with negligible memory consumption while providing the highest performance and minimal overheads. To arm programmers, we provide a spectrum of performance models for all critical functions and demonstrate the usability of our library and models with several application studies with up to half a million processes. We show that our design is comparable to, or better than, UPC and Fortran Coarrays in terms of latency, bandwidth, and message rate. We also demonstrate application performance improvements with comparable programming complexity.

research
10/26/2020

Leveraging MPI RMA to optimise halo-swapping communications in MONC on Cray machines

Remote Memory Access (RMA), also known as single sided communications, p...
research
07/22/2020

Collectives in hybrid MPI+MPI code: design, practice and performance

The use of hybrid scheme combining the message passing programming model...
research
10/19/2020

High-Performance Distributed RMA Locks

We propose a topology-aware distributed Reader-Writer lock that accelera...
research
06/28/2022

NumS: Scalable Array Programming for the Cloud

Scientists increasingly rely on Python tools to perform scalable distrib...
research
10/25/2018

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

TensorFlow has been the most widely adopted Machine/Deep Learning framew...
research
07/26/2022

Productivity meets Performance: Julia on A64FX

The Fujitsu A64FX ARM-based processor is used in supercomputers such as ...
research
05/08/2019

Implementing Efficient Message Logging Protocols as MPI Application Extensions

Message logging protocols are enablers of local rollback, a more efficie...
