GPU Load Balancing

12/17/2022
by Muhammad Osama, et al.

Fine-grained workload and resource balancing is key to high performance for both regular and irregular computations on GPUs. In this dissertation, we conduct an extensive survey of existing load-balancing techniques and use it to build an abstraction that addresses the difficulty of scheduling computations on the GPU. We propose a fine-grained GPU load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules, with a programmable interface for implementing new load-balancing schedules. Prior to our work, the only way to unleash the GPU's potential on irregular problems was to balance the workload through application-specific, tightly coupled load-balancing techniques. With our open-source load-balancing framework, we hope to improve programmers' productivity when developing irregular-parallel algorithms on the GPU, and to improve the overall performance of such applications by providing a quick path to experimenting with a variety of existing load-balancing techniques.

Using insights from load-balancing irregular workloads, we build Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method partitions an even share of the aggregate inner-loop iterations among the physical processing elements. This yields near-perfect utilization of computing resources, regardless of how efficiently the output tiling for a given problem quantizes across the underlying processing elements. On GPU processors, our Stream-K parallelization of GEMM produces peak speedups of up to 14x and 6.7x, and an average performance response that is both higher and more consistent across 32K GEMM problem geometries than state-of-the-art math libraries such as CUTLASS and cuBLAS.
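The idea of decoupling load balancing from work processing can be illustrated with a small sequential Python sketch. It assumes a CSR-style workload in which each segment (for example, a sparse-matrix row or a vertex's neighbor list) owns a variable number of work atoms. The schedule gives each worker an even share of atoms regardless of segment boundaries, and a separate user-supplied function does the actual work. The names `even_share`, `work_balanced_schedule`, `run`, and `spmv` are hypothetical and are not the API of the dissertation's framework.

```python
from bisect import bisect_right
from typing import Callable, Iterator, List, Tuple


def even_share(total: int, workers: int, w: int) -> Tuple[int, int]:
    """Half-open range [begin, end) of atoms for worker w; shares differ by at most one."""
    base, extra = divmod(total, workers)
    begin = w * base + min(w, extra)
    end = begin + base + (1 if w < extra else 0)
    return begin, end


def work_balanced_schedule(offsets: List[int], workers: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield, per worker, the (segment, atom) pairs in its even share of atoms.

    offsets is a prefix-sum array: segment s owns atoms offsets[s] .. offsets[s+1]-1.
    The schedule knows nothing about what processing an atom means.
    """
    total = offsets[-1]
    for w in range(workers):
        begin, end = even_share(total, workers, w)
        # One binary search locates the segment that owns the first atom.
        seg = bisect_right(offsets, begin) - 1
        assignment = []
        for atom in range(begin, end):
            while offsets[seg + 1] <= atom:  # skip finished or empty segments
                seg += 1
            assignment.append((seg, atom))
        yield assignment


def run(offsets: List[int], workers: int, process: Callable[[int, int], None]) -> None:
    """Apply the user's work function to every atom, worker by worker (sequential simulation)."""
    for assignment in work_balanced_schedule(offsets, workers):
        for seg, atom in assignment:
            process(seg, atom)


def spmv(row_offsets: List[int], cols: List[int], values: List[float],
         x: List[float], workers: int) -> List[float]:
    """Example work function: CSR sparse matrix-vector product y = A x."""
    y = [0.0] * (len(row_offsets) - 1)

    def process(row: int, nz: int) -> None:
        # A row may be split across workers, so a GPU version would need an atomic add here.
        y[row] += values[nz] * x[cols[nz]]

    run(row_offsets, workers, process)
    return y


# Row 0 has 1 nonzero, row 1 is empty, row 2 has 5 nonzeros.
print(spmv([0, 1, 1, 6], [0, 0, 1, 2, 3, 4], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0] * 5, workers=3))
```

With three workers, each receives exactly two nonzeros even though the rows hold 1, 0, and 5. The heavy row 2 is split across all three workers. That split is what a parallel version must reconcile with atomics or a segmented reduction. The sequential loop above hides this cost.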
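Stream-K gives each processing element an even share of the aggregate inner-loop (multiply-accumulate) iterations instead of whole output tiles. That idea can be sketched as a minimal sequential simulation under several simplifying assumptions. The matrix dimensions are assumed to divide evenly by the block sizes. The "processing elements" run one after another. Tiles shared between elements are fixed up by simply summing their partial results into C, whereas a real GPU kernel must coordinate those partial tiles explicitly. The function names and the `blk_m`, `blk_n`, `blk_k` parameters are hypothetical and are not CUTLASS's or the dissertation's interface.

```python
from typing import List, Tuple


def even_share(total: int, parts: int, p: int) -> Tuple[int, int]:
    """Half-open range [begin, end) of iterations for part p; shares differ by at most one."""
    base, extra = divmod(total, parts)
    begin = p * base + min(p, extra)
    return begin, begin + base + (1 if p < extra else 0)


def stream_k_assignments(num_tiles: int, iters_per_tile: int,
                         num_pes: int) -> List[List[Tuple[int, int, int]]]:
    """Per PE, the (tile, k_begin, k_end) segments covering its even share of MAC iterations."""
    total = num_tiles * iters_per_tile  # iterations are flattened tile-major
    out = []
    for pe in range(num_pes):
        begin, end = even_share(total, num_pes, pe)
        segs, it = [], begin
        while it < end:
            tile, k = divmod(it, iters_per_tile)
            k_end = min(iters_per_tile, k + (end - it))  # stop at tile edge or end of share
            segs.append((tile, k, k_end))
            it += k_end - k
        out.append(segs)
    return out


def stream_k_gemm(A, B, blk_m: int, blk_n: int, blk_k: int, num_pes: int):
    """Compute C = A @ B by walking each PE's Stream-K iteration range."""
    m, kdim, n = len(A), len(A[0]), len(B[0])
    assert m % blk_m == 0 and n % blk_n == 0 and kdim % blk_k == 0  # simplifying assumption
    tiles_n = n // blk_n
    num_tiles = (m // blk_m) * tiles_n
    iters_per_tile = kdim // blk_k
    C = [[0.0] * n for _ in range(m)]
    for segs in stream_k_assignments(num_tiles, iters_per_tile, num_pes):
        for tile, k_begin, k_end in segs:
            tm, tn = divmod(tile, tiles_n)
            for i in range(tm * blk_m, (tm + 1) * blk_m):
                for j in range(tn * blk_n, (tn + 1) * blk_n):
                    # Partial dot product over this PE's k-iterations of the tile.
                    acc = 0.0
                    for kk in range(k_begin * blk_k, k_end * blk_k):
                        acc += A[i][kk] * B[kk][j]
                    # Simplified fixup: partial sums from PEs sharing a tile add into C.
                    C[i][j] += acc
    return C


print(stream_k_assignments(4, 3, 3))
```

For example, take 4 output tiles of 3 k-iterations each on 3 processing elements. A tile-per-element schedule needs two waves, a makespan of 6 iterations, and leaves two elements idle during the second wave. `stream_k_assignments(4, 3, 3)` instead gives every element exactly 4 iterations, at the price of splitting tiles 1 and 2 between two elements.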
