Best-Effort Communication Improves Performance and Scales Robustly on Conventional Hardware

11/20/2022
by Matthew Andres Moreno, et al.

Here, we test the performance and scalability of fully asynchronous, best-effort communication on existing, commercially available HPC hardware. A first set of experiments tested whether best-effort communication strategies can benefit performance compared to the traditional perfect communication model. At high CPU counts, best-effort communication improved both the number of computational steps executed per unit time and the solution quality achieved within a fixed-duration run window. Under the best-effort model, characterizing the distribution of quality of service across processing components and over time is critical to understanding the actual computation being performed. Additionally, a complete picture of scalability under the best-effort model requires analysis of how such quality of service fares at scale. To answer these questions, we designed and measured a suite of quality of service metrics: simulation update period, message latency, message delivery failure rate, and message delivery coagulation. Under a lower communication-intensivity benchmark parameterization, we found that median values for all quality of service metrics were stable when scaling from 64 to 256 processes. Under maximal communication intensivity, we found only minor (and, in most cases, nil) degradation in median quality of service. In an additional set of experiments, we tested the effect of an apparently faulty compute node on performance and quality of service. Despite extreme quality of service degradation among that node and its clique, median performance and quality of service remained stable.
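
To make the best-effort idea and the delivery failure rate metric concrete, here is a minimal single-process sketch. It is an illustration under stated assumptions, not the paper's implementation or any library's API: the names `BestEffortChannel`, `try_send`, `receive_all`, and `run_demo` are hypothetical, and the drop-newest-when-full policy is assumed purely for the example.

```python
from collections import deque


class BestEffortChannel:
    """Hypothetical bounded buffer: a send never blocks; it is dropped if the buffer is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.attempted = 0  # total send attempts
        self.dropped = 0    # sends discarded because the buffer was full

    def try_send(self, message):
        """Attempt delivery; return False (and count a drop) instead of waiting."""
        self.attempted += 1
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(message)
        return True

    def receive_all(self):
        """Drain whatever has arrived so far; may return an empty list."""
        messages = list(self.buffer)
        self.buffer.clear()
        return messages

    def failure_rate(self):
        """Fraction of send attempts that were dropped."""
        return self.dropped / self.attempted if self.attempted else 0.0


def run_demo(steps=12, capacity=2, receive_every=4):
    """Sender emits one message per step; receiver drains only every few steps."""
    channel = BestEffortChannel(capacity)
    received = []
    for step in range(steps):
        channel.try_send(step)  # sender proceeds whether or not delivery succeeded
        if (step + 1) % receive_every == 0:
            received.extend(channel.receive_all())
    return received, channel.failure_rate()


if __name__ == "__main__":
    received, rate = run_demo()
    print(received, rate)
```

In this toy setup the sender outpaces the receiver, so some messages are lost while the sender keeps computing. A perfect-communication model would instead stall the sender until buffer space became available.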


research
05/21/2021

Conduit: A C++ Library for Best-effort High Performance Computing

Developing software to effectively take advantage of growth in parallel ...
research
07/03/2018

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way

FPGA-based heterogeneous architectures provide programmers with the abil...
research
06/16/2021

Comparison of Automated Machine Learning Tools for SMS Spam Message Filtering

Short Message Service (SMS) is a very popular service used for communica...
research
09/13/2022

Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures

Supercomputer architectures are trending toward higher computational thr...
research
07/10/2020

Stability, memory, and messaging tradeoffs in heterogeneous service systems

We consider a heterogeneous distributed service system, consisting of n ...
research
05/16/2022

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications

Making serverless computing widely applicable requires detailed performa...
research
04/21/2023

Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery

Early-bird communication is a communication/computation overlap techniqu...
