Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)

07/06/2017
by Tyler Sorensen, et al.

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g., OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver, or compiler support, so our results provide a lower bound on the efficiency with which cooperative kernels can be implemented in practice.
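To illustrate why blocking algorithms need fair scheduling, here is a minimal toy simulation in Python. It is not the paper's framework and uses no GPU API. The names `workgroup`, `run`, `slots`, and `fair` are hypothetical, and the scheduler model is an assumption: an "unfair" scheduler lets a workgroup keep its slot until it finishes, while a "fair" one rotates waiting workgroups in. Each simulated workgroup arrives at a global barrier and spins until all the others have arrived.

```python
from collections import deque


def workgroup(wg_id, state, num_workgroups):
    """One simulated workgroup: arrive at a global barrier, then spin."""
    state["arrived"] += 1
    while state["arrived"] < num_workgroups:
        yield  # blocked on the barrier; control returns to the scheduler
    state["finished"].append(wg_id)


def run(num_workgroups, slots, fair, max_steps=1000):
    """Return True if all workgroups pass the barrier within max_steps.

    slots: how many workgroups can occupy the (simulated) device at once.
    fair:  if True, a blocked workgroup gives up its slot and rejoins the
           queue; if False, it keeps the slot until it finishes (assumed
           model of an unfair scheduler).
    """
    state = {"arrived": 0, "finished": []}
    pending = deque(
        workgroup(i, state, num_workgroups) for i in range(num_workgroups)
    )
    resident = deque()
    for _ in range(max_steps):
        # Fill free slots with workgroups that are waiting to run.
        while pending and len(resident) < slots:
            resident.append(pending.popleft())
        if not resident:
            return True  # every workgroup has finished
        wg = resident.popleft()
        try:
            next(wg)  # execute the workgroup until it blocks or finishes
        except StopIteration:
            continue  # finished: its slot is now free
        if fair:
            pending.append(wg)   # preempted: someone else gets the slot
        else:
            resident.append(wg)  # unfair: keeps spinning in its slot
    return False  # step budget exhausted: the barrier never completed


print(run(4, slots=2, fair=False))  # more workgroups than slots, unfair
print(run(4, slots=2, fair=True))   # same setup, fair scheduling
print(run(4, slots=4, fair=False))  # unfair, but every workgroup is resident
```

In this toy model the barrier completes under unfair scheduling only when every workgroup fits on the device at once. That is one way to read the "scheduling quirks" mentioned above, and it also suggests why such a kernel cannot easily give up resources to another workload.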

