Specifying and Testing GPU Workgroup Progress Models

09/13/2021
by   Tyler Sorensen, et al.
0

As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model. This work is a collection of tools experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a constraint for concurrent programs that require relative forward progress to terminate. Using this constraint, we synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, this allows us to determine the expected status of each litmus test – i.e. whether it is guaranteed eventual termination – under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, an intuitive progress model defined by prior work and hypothesized to describe the workgroup schedulers of existing GPUs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/18/2022

A Variant of Concurrent Constraint Programming on GPU

The number of cores on graphical computing units (GPUs) is reaching thou...
research
01/17/2019

TaDA Live: Compositional Reasoning for Termination of Fine-grained Concurrent Programs

We introduce TaDA Live, a separation logic for reasoning compositionally...
research
01/26/2021

C-for-Metal: High Performance SIMD Programming on Intel GPUs

The SIMT execution model is commonly used for general GPU development. C...
research
04/11/2020

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs

GPUs are playing an increasingly important role in general-purpose compu...
research
09/14/2021

Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

To address the challenge of performance analysis on the US DOE's forthco...
research
05/22/2023

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

Machine Learning (ML) models contain highly-parallel computations, such ...
research
03/01/2022

Synthesizing Fine-Grained Synchronization Protocols for Implicit Monitors (Extended Version)

A monitor is a widely-used concurrent programming abstraction that encap...

Please sign up or login with your details

Forgot password? Click here to reset