Conduit: A C++ Library for Best-effort High Performance Computing

05/21/2021
by Matthew Andres Moreno, et al.

Developing software to effectively take advantage of growth in parallel and distributed processing capacity poses significant challenges. Traditional programming techniques allow a user to assume that execution, message passing, and memory are always kept synchronized. However, maintaining this consistency becomes increasingly costly at scale. One proposed strategy is "best-effort computing", which relaxes synchronization and hardware reliability requirements, accepting nondeterminism in exchange for efficiency. Although many programming languages and frameworks aim to facilitate software development for high performance applications, existing tools do not directly provide a prepackaged best-effort interface. The Conduit C++ Library aims to provide such an interface for convenient implementation of software that uses best-effort inter-thread and inter-process communication. Here, we describe the motivation, objectives, design, and implementation of the library. Benchmarks on a communication-intensive graph coloring problem and a compute-intensive digital evolution simulation show that Conduit's best-effort model can improve scaling efficiency and solution quality, particularly in a distributed, multi-node context.
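To make the best-effort idea concrete, the following is a minimal sketch of a non-blocking, inter-thread channel that drops messages instead of waiting when its buffer is full. It is not Conduit's API: the class name `BestEffortChannel` and the methods `try_put`, `get_latest`, and `dropped` are hypothetical, chosen only for illustration. The sketch assumes exactly one producer thread and one consumer thread, and requires C++17.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical best-effort channel (illustration only, not Conduit's API).
// Assumes a single producer thread and a single consumer thread.
template <typename T, std::size_t Capacity>
class BestEffortChannel {
  static_assert(Capacity > 0, "channel needs at least one slot");

  std::array<T, Capacity> buffer_{};
  std::atomic<std::size_t> head_{0};  // index of the next unread message
  std::atomic<std::size_t> tail_{0};  // index of the next free slot
  std::size_t dropped_{0};            // touched by the producer only

 public:
  // Never blocks. If the buffer is full, the message is dropped
  // and the call returns false.
  bool try_put(const T& value) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t head = head_.load(std::memory_order_acquire);
    if (tail - head == Capacity) {
      ++dropped_;
      return false;
    }
    buffer_[tail % Capacity] = value;
    tail_.store(tail + 1, std::memory_order_release);
    return true;
  }

  // Never blocks. Skips any backlog and returns the most recently
  // delivered message, or std::nullopt if nothing new has arrived.
  std::optional<T> get_latest() {
    const std::size_t tail = tail_.load(std::memory_order_acquire);
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail) return std::nullopt;
    T latest = buffer_[(tail - 1) % Capacity];
    head_.store(tail, std::memory_order_release);  // discard older messages
    return latest;
  }

  // Number of messages dropped so far (call from the producer thread).
  std::size_t dropped() const { return dropped_; }
};

// Usage sketch: the sender is never stalled by a slow receiver.
inline void usage_example() {
  BestEffortChannel<int, 2> channel;
  channel.try_put(10);
  channel.try_put(20);
  const bool delivered = channel.try_put(30);  // buffer full: dropped
  assert(!delivered);
  assert(channel.get_latest().value() == 20);
}
```

The design choice illustrated here is the trade the abstract describes: neither side ever waits on the other, so the receiver may miss messages or act on stale data, and the outcome can vary from run to run.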


research
09/07/2019

Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware

Distributed memory programming is the established paradigm used in high-...
research
11/20/2022

Best-Effort Communication Improves Performance and Scales Robustly on Conventional Hardware

Here, we test the performance and scalability of fully-asynchronous, bes...
research
05/15/2023

FMI: Fast and Cheap Message Passing for Serverless Functions

Serverless functions provide elastic scaling and a fine-grained billing ...
research
02/04/2019

Blaze: Simplified High Performance Cluster Computing

MapReduce and its variants have significantly simplified and accelerated...
research
10/28/2021

NetDAM: Network Direct Attached Memory with Programmable In-Memory Computing ISA

Data-intensive applications like distributed AI-training may require mul...
research
02/22/2023

A Unified Cloud-Enabled Discrete Event Parallel and Distributed Simulation Architecture

Cloud simulation environments today are largely employed to model and si...
research
02/14/2018

A co-located partitions strategy for parallel CFD-DEM couplings

In this work, a new partition-collocation strategy for the parallel exec...
