DeepAI AI Chat
Log In Sign Up

Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications

03/22/2023
by   Stijn Heldens, et al.
eScience Center
0

Graphic Processing Units (GPUs) have become ubiquitous in scientific computing. However, writing efficient GPU kernels can be challenging due to the need for careful code tuning. To automatically explore the kernel optimization space, several auto-tuning tools - like Kernel Tuner - have been proposed. Unfortunately, these existing auto-tuning tools often do not concern themselves with integration of tuning results back into applications, which puts a significant implementation and maintenance burden on application developers. In this work, we present Kernel Launcher: an easy-to-use C++ library that simplifies the creation of highly-tuned CUDA applications. With Kernel Launcher, programmers can capture kernel launches, tune the captured kernels for different setups, and integrate the tuning results back into applications using runtime compilation. To showcase the applicability of Kernel Launcher, we consider a real-world computational fluid dynamics code and tune its kernels for different GPUs, input domains, and precisions.

READ FULL TEXT

page 14

page 16

10/04/2022

Benchmarking optimization algorithms for auto-tuning GPU kernels

Recent years have witnessed phenomenal growth in the application, and ca...
07/14/2017

Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels

We propose an online auto-tuning approach for computing kernels. Differe...
03/19/2017

CLTune: A Generic Auto-Tuner for OpenCL Kernels

This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluate...
01/30/2017

Autotuning GPU Kernels via Static and Predictive Analysis

Optimizing the performance of GPU kernels is challenging for both human ...
03/15/2020

Towards automated kernel selection in machine learning systems: A SYCL case study

Automated tuning of compute kernels is a popular area of research, mainl...
03/04/2023

Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: Given the heterogen...