Efficient and Eventually Consistent Collective Operations

03/31/2022
by Roman Iakymchuk et al.

Collective operations are common features of parallel programming models and are frequently used in High-Performance Computing (HPC) and machine/deep learning (ML/DL) applications. In strong scaling scenarios, collective operations can negatively impact overall application performance: as the core count increases, the load per rank decreases, while the time spent in collective operations grows logarithmically with the number of ranks. In this article, we propose a design for eventually consistent collectives suitable for ML/DL computations by reducing communication in Broadcast and Reduce, as well as by exploring the Stale Synchronous Parallel (SSP) synchronization model for the Allreduce collective. Moreover, we enrich the GASPI ecosystem with frequently used classic/consistent collective operations, such as Allreduce for large messages and the AlltoAll used in an HPC code. Our implementations show promising preliminary results, with significant improvements over the vendor-provided MPI alternatives, especially for Allreduce and AlltoAll.
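
To make the SSP idea concrete, below is a minimal, hypothetical sketch in Python with mpi4py. It is not the paper's implementation (the paper targets the GASPI API, which is not shown here); it only illustrates the staleness-bounded synchronization the abstract describes. All names in it (STALENESS, grad, incoming, result) are invented for the example: each rank keeps computing with a possibly stale global sum, and blocks on the outstanding non-blocking Allreduce only once the staleness bound is exceeded.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

STALENESS = 3            # hypothetical staleness bound s
grad = np.ones(4)        # local contribution (e.g., a gradient)
incoming = np.zeros(4)   # target buffer of the in-flight Allreduce
result = np.zeros(4)     # last completed global sum (possibly stale)

request = None
stale_iters = 0

for it in range(10):
    if request is None:
        # Launch a non-blocking Allreduce and keep computing meanwhile.
        request = comm.Iallreduce(grad.copy(), incoming, op=MPI.SUM)

    if request.Test():
        result[:] = incoming          # fresh global sum arrived
        request, stale_iters = None, 0
    else:
        stale_iters += 1
        if stale_iters > STALENESS:
            # Staleness bound exceeded: block, i.e., fall back to
            # bulk-synchronous behavior until the collective completes.
            request.Wait()
            result[:] = incoming
            request, stale_iters = None, 0

    # ... update local state here using result, which is either exact
    # or at most STALENESS iterations stale ...
    grad *= 0.9                       # placeholder local computation

if request is not None:
    request.Wait()                    # drain the last outstanding collective

Launched with, for example, mpirun -np 4 python ssp_allreduce.py, each rank advances using the most recent completed global sum, which may lag the true value by a bounded number of iterations; that bounded lag is the essence of the SSP model.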

