
Lightweight MPI Communicators with Applications to Perfectly Balanced Schizophrenic Quicksort

by Michael Axtmann et al.

MPI uses the concept of communicators to connect groups of processes, and it provides nonblocking collective operations on communicators to overlap communication and computation. Flexible algorithms demand flexible communicators: for example, a process may work on different subproblems within different process groups simultaneously, new process groups may be created, and the membership of a process group may change. Depending on the number of communicators, the time for communicator creation can drastically increase the running time of an algorithm. Furthermore, creating a new communicator synchronizes all processes, because communicator creation routines are blocking collective operations. We present RBC, a communication library based on MPI that creates range-based communicators in constant time and without communication. These RBC communicators support (non)blocking point-to-point communication as well as (non)blocking collective operations. Our experiments show that the library reduces the time to create a new communicator by a factor of more than 400, while the running time of collective operations remains about the same. We also propose Schizophrenic Quicksort, a distributed sorting algorithm that avoids any load imbalance; using RBC communicators improves its performance by a factor of 15 for moderate inputs. Finally, we discuss different approaches to bringing nonblocking (local) creation of lightweight (range-based) communicators into MPI.
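The core idea behind range-based communicators can be sketched without MPI: if a process group is always a contiguous range of world ranks, then the group is fully described by its endpoints, so local rank, group size, and recursive splits (as needed by a distributed quicksort) are O(1) local computations, with no collective call and no synchronization. The following minimal sketch illustrates this; `RangeComm` and its members are illustrative names, not the actual RBC API:

```cpp
#include <cassert>

// Illustrative sketch (not the RBC API): a range-based communicator is
// fully described by the inclusive world-rank range [first, last].
// All queries and splits below are constant-time local computations --
// contrast with MPI_Comm_split / MPI_Comm_create, which are blocking
// collective operations over the parent communicator.
struct RangeComm {
    int first;  // world rank of the first group member
    int last;   // world rank of the last group member

    int size() const { return last - first + 1; }

    // Local rank within this group of a process with the given world rank.
    int rank_of(int world_rank) const { return world_rank - first; }

    // Split the range into lower and upper halves in constant time,
    // e.g. for the recursion of a distributed sorting algorithm.
    RangeComm lower_half() const { return {first, first + size() / 2 - 1}; }
    RangeComm upper_half() const { return {first + size() / 2, last}; }
};
```

Because each process can derive any subgroup's descriptor from the range endpoints alone, recursing into a subproblem costs no communication at all, which is what removes communicator creation from the critical path.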



