A Survey of Potential MPI Complex Collectives: Large-Scale Mining and Analysis of HPC Applications

05/31/2023
by Pouya Haghi, et al.

Offload of MPI collectives to network devices, e.g., NICs and switches, is being implemented as an effective mechanism to improve application performance by reducing inter- and intra-node communication and bypassing MPI software layers. Given the wide deployment of accelerators and programmable NICs/switches in data centers, we posit that there is an opportunity to further improve performance by extending this idea of in-network collective processing to a new class of more complex collectives. The most basic type of complex collective is the fusion of existing collectives. In previous work we demonstrated the efficacy of this additional hardware and software support and showed that it can substantially improve the performance of certain applications. In this work we extend that approach. We characterize a large number of MPI applications to determine overall applicability, in both breadth and type, and so provide insight for hardware designers and MPI developers about future offload possibilities. Besides widening the scope of prior surveys to include finding (potential) new MPI constructs, we also use new methods to extend the survey process. Prior surveys of MPI usage considered lists of applications assembled from application developers' knowledge. The approach taken in this paper is instead based on automated mining of a large collection of source code. More specifically, the mining is performed using the GitHub REST API. We use a database management system to store the results and to answer queries. A further advantage of this approach is that it supports more complex analyses of MPI usage, which are expressed as user queries.
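To make the idea of storing mined results in a database and querying them for fusion candidates more concrete, here is a minimal sketch in Python. It is not the authors' tooling: the table layout, the list of collectives, the sample file, and the helper names (`find_collectives`, `build_db`, `adjacent_pairs`) are all assumptions made for illustration, and the sketch works on in-memory source text rather than fetching code through the GitHub REST API. It also treats two collectives as "adjacent" whenever they follow each other in file order, which is only a rough textual proxy for a real fusion opportunity.

```python
import re
import sqlite3

# Illustrative subset of MPI collective names (an assumption, not the paper's list).
COLLECTIVES = ("MPI_Allreduce", "MPI_Reduce", "MPI_Bcast", "MPI_Allgather",
               "MPI_Alltoall", "MPI_Gather", "MPI_Scatter", "MPI_Barrier")
CALL_RE = re.compile(r"\b(" + "|".join(COLLECTIVES) + r")\s*\(")


def find_collectives(source):
    """Return (line_number, call_name) pairs in order of appearance."""
    calls = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            calls.append((lineno, match.group(1)))
    return calls


def build_db(files):
    """Store every collective call found in `files` (path -> source text)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE calls (path TEXT, seq INTEGER, line INTEGER, name TEXT)")
    for path, source in files.items():
        rows = [(path, seq, line, name)
                for seq, (line, name) in enumerate(find_collectives(source))]
        db.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)
    return db


def adjacent_pairs(db):
    """Count pairs of collectives that follow each other in the same file."""
    query = """
        SELECT a.name, b.name, COUNT(*) AS n
        FROM calls a JOIN calls b
          ON a.path = b.path AND b.seq = a.seq + 1
        GROUP BY a.name, b.name
        ORDER BY n DESC, a.name, b.name
    """
    return db.execute(query).fetchall()


# Hypothetical mined file, standing in for code retrieved from a repository.
SAMPLE = {
    "solver.c": """
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
        MPI_Bcast(&flag, 1, MPI_INT, 0, comm);
        compute();
        MPI_Allreduce(&r, &rr, 1, MPI_DOUBLE, MPI_SUM, comm);
        MPI_Bcast(&flag, 1, MPI_INT, 0, comm);
    """,
}

if __name__ == "__main__":
    for first, second, n in adjacent_pairs(build_db(SAMPLE)):
        print(first, "->", second, n)
```

Keeping the raw call sequence in a table, rather than precomputed statistics, means new questions (for example, which pairs are most frequent per repository) can be asked later as additional SQL queries without re-mining the code.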

research
06/28/2022

Lessons Learned on MPI+Threads Communication

Hybrid MPI+threads programming is gaining prominence, but, in practice, ...
research
12/10/2021

MANA-2.0: A Future-Proof Design for Transparent Checkpointing of MPI at Scale

MANA-2.0 is a scalable, future-proof design for transparent checkpointin...
research
09/05/2022

A Fault Resilient Approach to Non-collective Communication Creation in MPI

The increasing size of HPC architectures makes the faults' presence an e...
research
05/01/2020

How I Learned to Stop Worrying About User-Visible Endpoints and Love MPI

MPI+threads is gaining prominence as an alternative to the traditional M...
research
04/08/2023

C-Coll: Introducing Error-bounded Lossy Compression into MPI Collectives

With the ever-increasing computing power of supercomputers and the growi...
research
06/05/2018

Energy-efficient localised rollback after failures via data flow analysis

Exascale systems will suffer failures hourly. HPC programmers rely mostl...
research
05/13/2018

Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform

Advances in detectors and computational technologies provide new opportu...
