Machine Learning for CUDA+MPI Design Rules

03/04/2022
by Carl Pearson, et al.

We present a new strategy for automatically exploring the design space of key CUDA+MPI programs and providing design rules that discriminate slow from fast implementations. In such programs, the order of operations (e.g., GPU kernels, MPI communication) and the assignment of operations to resources (e.g., GPU streams) make the space of possible designs enormous. Systems experts have the task of redesigning and reoptimizing these programs to effectively utilize each new platform. This work provides a prototype tool to reduce that burden. In our approach, a directed acyclic graph of CUDA and MPI operations defines the design space for the program. Monte Carlo tree search discovers regions of the design space that have a large impact on the program's performance. A sequence-to-vector transformation defines features for each explored implementation, and each implementation is assigned a class label according to its relative performance. A decision tree is trained on the features and labels to produce design rules for each class; these rules can be used by systems experts to guide their implementations. We demonstrate our strategy using a key kernel from scientific computing, sparse matrix-vector multiplication (SpMV), on a platform with multiple MPI ranks and GPU streams.
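As a rough illustration of the final stages of this pipeline, the Python sketch below featurizes a set of candidate operation schedules, labels each one by its relative performance, and trains a shallow decision tree whose printed splits read as design rules. It is a minimal sketch under stated assumptions: the operation names, the bag-of-operations feature vector, the synthetic runtime model standing in for real CUDA+MPI measurements, and the median-runtime split into "fast" and "slow" classes are all illustrative choices, not the paper's actual tool, and the Monte Carlo tree search over the operation DAG is not shown.

# Hypothetical sketch of the featurize / label / learn stage described above.
# Operation names, featurization, runtime model, and labeling rule are assumptions.
import random
from sklearn.tree import DecisionTreeClassifier, export_text

OPS = ["spmv_local", "spmv_remote", "pack", "MPI_Isend", "MPI_Irecv",
       "MPI_Waitall", "stream_sync"]

def random_schedule(n_ops=10):
    # Stand-in for one explored point in the design space: an ordered list of operations.
    return [random.choice(OPS) for _ in range(n_ops)]

def to_vector(schedule):
    # Assumed sequence-to-vector transform: a count of each operation kind,
    # plus the position of the first MPI_Waitall as a simple ordering feature.
    counts = [schedule.count(op) for op in OPS]
    first_wait = schedule.index("MPI_Waitall") if "MPI_Waitall" in schedule else len(schedule)
    return counts + [first_wait]

def measured_runtime(schedule):
    # Placeholder for a real CUDA+MPI measurement; here, a synthetic score.
    return 1.0 + 0.1 * schedule.count("stream_sync") + 0.01 * random.random()

random.seed(0)
schedules = [random_schedule() for _ in range(200)]
runtimes = [measured_runtime(s) for s in schedules]
X = [to_vector(s) for s in schedules]

# Label each implementation by relative performance: split at the median runtime
# (an assumed, illustrative labeling rule).
median = sorted(runtimes)[len(runtimes) // 2]
y = ["fast" if t < median else "slow" for t in runtimes]

# A shallow decision tree yields human-readable rules separating the classes.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
feature_names = [f"count({op})" for op in OPS] + ["pos(first MPI_Waitall)"]
print(export_text(tree, feature_names=feature_names))

The printed tree is a set of nested threshold tests over the features (for example, a bound on the number of stream synchronizations), which is the kind of design rule the abstract describes handing to systems experts.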
