Improving MPI Collective I/O Performance With Intra-node Request Aggregation

07/29/2019
by Qiao Kang, et al.

Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistributes I/O requests among the calling processes into a form that minimizes file access costs. As modern parallel computers grow toward the exascale era, the communication cost of this request redistribution can quickly dominate collective I/O performance. The effect has been observed in parallel jobs that run on multiple compute nodes with a high count of MPI processes on each node. To reduce the communication cost, we present a new design for collective I/O that adds an extra communication layer to aggregate requests among processes within the same compute node. This approach can significantly reduce inter-node communication congestion when redistributing the I/O requests. We evaluate its performance against the original two-phase I/O on a Cray XC40 parallel computer with Intel KNL processors. Using I/O patterns from two large-scale production applications and an I/O benchmark, we show performance improvements of up to 29 times when running 16384 MPI processes on 256 compute nodes.
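To illustrate why intra-node aggregation can relieve inter-node congestion, the following is a minimal counting sketch in Python, not the paper's implementation. It assumes a worst case in which every sending process holds data destined for every global aggregator, and it counts only the messages that global aggregators receive in one redistribution round. The function name `messages_to_global_aggregators`, the choice of one local aggregator per node, and the figure of 256 global aggregators are all hypothetical values picked for illustration; only the 16384 processes and 256 nodes come from the abstract.

```python
def messages_to_global_aggregators(num_nodes, procs_per_node,
                                   num_global_aggregators,
                                   local_aggregators_per_node=None):
    """Worst-case count of messages received by global aggregators
    in one request-redistribution round.

    Simplifying assumption (not from the paper): every sender has
    data for every global aggregator.
    """
    if local_aggregators_per_node is None:
        # Original two-phase I/O: every process sends its own requests.
        senders = num_nodes * procs_per_node
    else:
        # Intra-node aggregation: requests are first gathered onto a few
        # local aggregators per node, and only those send off-node.
        senders = num_nodes * local_aggregators_per_node
    return senders * num_global_aggregators


# Scale taken from the abstract: 16384 processes on 256 nodes (64 per node).
NUM_NODES = 256
PROCS_PER_NODE = 16384 // NUM_NODES
GLOBAL_AGGREGATORS = 256  # hypothetical: one global aggregator per node

two_phase = messages_to_global_aggregators(
    NUM_NODES, PROCS_PER_NODE, GLOBAL_AGGREGATORS)
with_intra_node = messages_to_global_aggregators(
    NUM_NODES, PROCS_PER_NODE, GLOBAL_AGGREGATORS,
    local_aggregators_per_node=1)

print(two_phase, with_intra_node, two_phase // with_intra_node)
```

Under these assumptions the message count falls by a factor equal to the number of processes per node divided by the number of local aggregators per node. This is a count of messages only; it says nothing about data volume or about the added cost of the intra-node gather, so it should not be read as a prediction of the measured speedup.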

