C-Coll: Introducing Error-bounded Lossy Compression into MPI Collectives

04/08/2023
by Jiajun Huang, et al.

With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communication has become a critical bottleneck in large-scale distributed and parallel processing. Large messages in MPI collectives are a particular concern because they can significantly delay overall parallel performance. To address this issue, prior research simply applied off-the-shelf fixed-rate lossy compressors within MPI collectives, leading to suboptimal performance, limited generalizability, and unbounded errors. In this paper, we propose a novel solution, called C-Coll, which leverages error-bounded lossy compression to significantly reduce the message size, resulting in a substantial reduction in communication cost. The key contributions are three-fold. (1) We develop two general, optimized lossy-compression-based frameworks covering both types of MPI collectives (collective data movement and collective computation), tailored to their particular characteristics. Our framework not only reduces communication cost but also preserves data accuracy. (2) We customize an optimized version of SZx, an ultra-fast error-bounded lossy compressor, to meet the specific needs of collective communication. (3) We integrate C-Coll into multiple collectives, such as MPI_Allreduce, MPI_Scatter, and MPI_Bcast, and perform a comprehensive evaluation on real-world scientific datasets. Experiments show that our solution outperforms the original MPI collectives as well as multiple baselines and related efforts by 3.5-9.7X.
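To make the core idea concrete, the following is a minimal C sketch of a compression-accelerated broadcast in the spirit of the paper's collective-data-movement framework: the root compresses the payload once, only the (much smaller) compressed bytes travel through MPI_Bcast, and every receiver decompresses locally. The functions compress_eb() and decompress_eb() are hypothetical stand-ins (trivial memcpy stubs here, not SZx's real API); an actual build would call an error-bounded compressor such as SZx, and the paper's pipelining and accuracy-control optimizations are not shown.

/* compressed_bcast.c -- hedged sketch of a compression-accelerated MPI_Bcast.
 * NOT the paper's C-Coll implementation; compress_eb()/decompress_eb() are
 * hypothetical stand-ins for an error-bounded compressor such as SZx. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Stub compressor: copies the data verbatim. A real version would call an
 * error-bounded lossy compressor; `eb` is the absolute error bound. */
static size_t compress_eb(const float *in, size_t n, double eb, unsigned char *out) {
    (void)eb;                              /* unused by the stub */
    memcpy(out, in, n * sizeof(float));
    return n * sizeof(float);
}

static void decompress_eb(const unsigned char *in, size_t nbytes, float *out) {
    memcpy(out, in, nbytes);
}

/* Broadcast `n` floats from `root`, moving only compressed bytes on the wire. */
static void bcast_compressed(float *data, size_t n, double eb, int root, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);

    unsigned char *buf = malloc(n * sizeof(float));   /* worst-case buffer */
    unsigned long nbytes = 0;

    if (rank == root)
        nbytes = compress_eb(data, n, eb, buf);

    /* Two broadcasts: the compressed size, then the compressed payload. */
    MPI_Bcast(&nbytes, 1, MPI_UNSIGNED_LONG, root, comm);
    MPI_Bcast(buf, (int)nbytes, MPI_BYTE, root, comm);

    if (rank != root)
        decompress_eb(buf, nbytes, data);
    free(buf);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    size_t n = 1 << 20;
    float *data = malloc(n * sizeof(float));
    if (rank == 0)
        for (size_t i = 0; i < n; i++) data[i] = (float)i;

    bcast_compressed(data, n, 1e-3 /* error bound */, 0, MPI_COMM_WORLD);

    free(data);
    MPI_Finalize();
    return 0;
}

Note that collective computation such as MPI_Allreduce is harder than pure data movement: intermediate steps must decompress, reduce, and recompress partial results, which is why the paper treats the two collective types with separate frameworks.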

Related research:
- gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters (08/09/2023)
- Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques (05/17/2023)
- Improving MPI Collective I/O Performance With Intra-node Request Aggregation (07/29/2019)
- SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets (01/31/2022)
- Accurate runtime selection of optimal MPI collective algorithms using analytical performance modelling (04/23/2020)
- A Survey of Potential MPI Complex Collectives: Large-Scale Mining and Analysis of HPC Applications (05/31/2023)
- Simulation-based Optimization and Sensibility Analysis of MPI Applications: Variability Matters (02/15/2021)
