Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications

03/22/2023
by   Stijn Heldens, et al.
0

Graphic Processing Units (GPUs) have become ubiquitous in scientific computing. However, writing efficient GPU kernels can be challenging due to the need for careful code tuning. To automatically explore the kernel optimization space, several auto-tuning tools - like Kernel Tuner - have been proposed. Unfortunately, these existing auto-tuning tools often do not concern themselves with integration of tuning results back into applications, which puts a significant implementation and maintenance burden on application developers. In this work, we present Kernel Launcher: an easy-to-use C++ library that simplifies the creation of highly-tuned CUDA applications. With Kernel Launcher, programmers can capture kernel launches, tune the captured kernels for different setups, and integrate the tuning results back into applications using runtime compilation. To showcase the applicability of Kernel Launcher, we consider a real-world computational fluid dynamics code and tune its kernels for different GPUs, input domains, and precisions.

READ FULL TEXT

page 14

page 16

research
10/04/2022

Benchmarking optimization algorithms for auto-tuning GPU kernels

Recent years have witnessed phenomenal growth in the application, and ca...
research
10/18/2019

A Benchmark Set of Highly-efficient CUDA and OpenCL Kernels and its Dynamic Autotuning with Kernel Tuning Toolkit

Autotuning of performance-relevant source-code parameters allows to auto...
research
07/14/2017

Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels

We propose an online auto-tuning approach for computing kernels. Differe...
research
03/19/2017

CLTune: A Generic Auto-Tuner for OpenCL Kernels

This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluate...
research
01/30/2017

Autotuning GPU Kernels via Static and Predictive Analysis

Optimizing the performance of GPU kernels is challenging for both human ...
research
03/15/2020

Towards automated kernel selection in machine learning systems: A SYCL case study

Automated tuning of compute kernels is a popular area of research, mainl...
research
03/04/2023

Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: Given the heterogen...

Please sign up or login with your details

Forgot password? Click here to reset