High-Performance Distributed RMA Locks

10/19/2020
by Patrick Schmid, et al.

We propose a topology-aware distributed Reader-Writer lock that accelerates irregular workloads for supercomputers and data centers. The core idea behind the lock is a modular design built from three interacting distributed data structures: a counter of readers/writers in the critical section, a set of queues for ordering writers waiting for the lock, and a tree that binds all the queues and synchronizes writers with readers. Each structure is associated with a parameter for favoring either readers or writers, so that a given configuration can be viewed as a point in a three-dimensional parameter space and performance can be tuned accordingly. We also develop a distributed topology-aware MCS lock that is a building block of the above design and improves on state-of-the-art MPI implementations. Both schemes use non-blocking Remote Memory Access (RMA) techniques for the highest performance and scalability. We evaluate our schemes on a Cray XC30 and illustrate that they outperform state-of-the-art MPI-3 RMA locking protocols by 81% [...] hashtable that represents irregular workloads such as key-value stores or graph processing.
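
For readers unfamiliar with the MCS lock mentioned above, the following is a minimal shared-memory sketch of the classic MCS queue lock. It is not the paper's distributed, topology-aware RMA implementation. The names `AtomicRef`, `QNode`, `MCSLock`, and `run_demo` are hypothetical; the atomic swap and compare-and-swap on the queue tail are emulated with a small mutex, and a `threading.Event` stands in for the flag that each waiter would spin on locally.

```python
import threading
import time


class AtomicRef:
    """Emulates a single word supporting atomic swap and compare-and-swap.

    Assumption: a mutex stands in for hardware or RMA atomics here.
    """

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def swap(self, new):
        with self._guard:
            old = self._value
            self._value = new
            return old

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class QNode:
    """Per-waiter queue node; each waiter blocks only on its own flag."""

    def __init__(self):
        self.next = None
        self.granted = threading.Event()


class MCSLock:
    def __init__(self):
        self.tail = AtomicRef(None)  # last node in the waiter queue

    def acquire(self):
        node = QNode()
        pred = self.tail.swap(node)  # enqueue ourselves atomically
        if pred is not None:
            pred.next = node         # link behind the predecessor
            node.granted.wait()      # wait until the predecessor hands over
        return node

    def release(self, node):
        if node.next is None:
            # No visible successor: try to reset the queue to empty.
            if self.tail.compare_and_swap(node, None):
                return
            # A successor has swapped the tail but not linked itself yet.
            while node.next is None:
                time.sleep(0)
        node.next.granted.set()      # pass the lock to the successor


def run_demo(n_threads, n_iters):
    """Increment a shared counter under the lock; return the final value."""
    lock = MCSLock()
    state = {"counter": 0}

    def worker():
        for _ in range(n_iters):
            node = lock.acquire()
            state["counter"] = state["counter"] + 1  # critical section
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


if __name__ == "__main__":
    print(run_demo(4, 500))
```

Because every waiter blocks on its own node rather than on a shared location, a release touches only the successor's flag. This locality is what makes the queue structure attractive as a building block for locks spread across several processes.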

10/30/2018

BCL: A Cross-Platform Distributed Container Library

One-sided communication is a useful paradigm for irregular parallel appl...
01/21/2020

Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided

Modern interconnects offer remote direct memory access (RDMA) features. ...
10/28/2019

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations

Remote memory access (RMA) is an emerging high-performance programming m...
03/11/2020

Constellation: A High Performance Geo-Distributed Middlebox Framework

Middleboxes are increasingly deployed across geographically distributed ...
10/18/2020

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages

We propose Atomic Active Messages (AAM), a mechanism that accelerates ir...
07/14/2020

Irregular Accesses Reorder Unit: Improving GPGPU Memory Coalescing for Graph-Based Workloads

GPGPU architectures have become established as the dominant parallelizat...
02/15/2021

Simulation-based Optimization and Sensibility Analysis of MPI Applications: Variability Matters

Finely tuning MPI applications and understanding the influence of keypar...