Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

06/23/2014
by Sreepathi Pai, et al.

Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) uses a FIFO policy to schedule the kernels' thread blocks. We show that FIFO leaves performance to chance, resulting in a significant loss of performance and fairness. To improve both, we propose using the preemptive Shortest Remaining Time First (SRTF) policy instead. Although SRTF requires an estimate of the runtime of GPU kernels, we show that such an estimate can be obtained easily through online profiling and a simple observation about the grid structure of GPU kernels. Specifically, we propose a novel Structural Runtime Predictor. Using a simple Staircase model of GPU kernel execution, we show that the runtime of a kernel can be predicted by profiling only the first few thread blocks. We evaluate an online predictor based on this model on benchmarks from ERCBench and find that it can estimate the actual runtime reasonably well after the execution of only a single thread block. Next, we design a thread block scheduler that is concurrent kernel-aware and uses this predictor. We implement the SRTF policy and evaluate it on two-program workloads from ERCBench. SRTF improves system throughput (STP) by 1.18x and average normalized turnaround time (ANTT) by 2.25x over FIFO. When compared to MPMax, a state-of-the-art resource allocation policy for concurrent kernels, SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also propose SRTF/Adaptive, which controls the resource usage of concurrently executing kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by 2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an oracle optimal scheduling policy), bridging 49% of the gap between FIFO and SJF.
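
The abstract does not give the Staircase model's formula, so the following Python sketch is only one plausible reading of it. It assumes that thread blocks execute in equal-length waves, that each wave fills every streaming multiprocessor (SM) up to a fixed number of resident blocks, and that one profiled block time is representative of all blocks. The function names, parameters, and the ceiling-based formula are illustrative assumptions and are not taken from the paper.

```python
import math


def staircase_runtime(total_blocks, resident_per_sm, num_sms, block_time):
    """Estimate kernel runtime under an assumed staircase (wave) model.

    total_blocks    -- number of thread blocks in the kernel's grid
    resident_per_sm -- blocks that can be resident on one SM at once (assumed fixed)
    num_sms         -- number of SMs available to the kernel
    block_time      -- measured runtime of one profiled thread block
    """
    if total_blocks <= 0 or resident_per_sm <= 0 or num_sms <= 0:
        raise ValueError("block and SM counts must be positive")
    blocks_per_wave = resident_per_sm * num_sms
    # Each wave is one "step" of the staircase and lasts about one block time.
    steps = math.ceil(total_blocks / blocks_per_wave)
    return steps * block_time


def pick_srtf(predicted_remaining):
    """Return the kernel name with the shortest predicted remaining time.

    predicted_remaining -- dict mapping kernel name to predicted remaining time
    """
    return min(predicted_remaining, key=predicted_remaining.get)


if __name__ == "__main__":
    # Hypothetical numbers: 100 blocks, 4 resident blocks per SM, 15 SMs.
    estimate = staircase_runtime(100, 4, 15, block_time=2.0)
    print(estimate)
    print(pick_srtf({"kernel_a": estimate, "kernel_b": 3.0}))
```

In this sketch a scheduler would refresh each kernel's estimate as thread blocks finish and give the next free slot to the kernel returned by `pick_srtf`. Preemption and the resource controls of SRTF/Adaptive are outside its scope.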


