Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels

04/10/2019
by John Lawson, et al.

Over recent years, heterogeneous systems have become more prevalent across HPC, with over 100 supercomputers in the TOP500 incorporating GPUs or other accelerators. These hardware platforms have different performance characteristics and optimization requirements. To make the most of multiple accelerators, a developer has to provide implementations of their algorithms tuned for each device. Hardware vendors provide libraries targeting their devices specifically, which deliver good performance but frequently have differing API designs, hampering portability. The SYCL programming model allows users to write heterogeneous programs in completely standard C++, so developers have access to the power of C++ templates when developing compute kernels. In this paper we show that by writing highly parametrized kernels for matrix multiplication and convolution we achieve performance competitive with vendor implementations across different architectures. Furthermore, tuning for new devices amounts to choosing the combinations of kernel parameters that perform best on the hardware.
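To make the idea of a "highly parametrized kernel" concrete, the sketch below shows one way such a kernel can look in SYCL 2020: a tiled matrix multiply whose tile sizes are compile-time template parameters, so that an auto-tuner can instantiate and benchmark many variants on each device. This is an illustrative sketch only; the function and parameter names (matmul_tiled, TileM, TileN, TileK) are hypothetical and are not taken from the paper's actual kernels.

```cpp
// Sketch of a parametrized SYCL matrix multiply: C = A * B for row-major
// M x K and K x N matrices. Tile sizes are template parameters, so each
// instantiation is a distinct kernel that can be benchmarked per device.
// Assumes M, N and K are multiples of the tile sizes, and that
// TileM * TileN does not exceed the device's maximum work-group size.
#include <sycl/sycl.hpp>

template <int TileM, int TileN, int TileK>
void matmul_tiled(sycl::queue& q, const float* A, const float* B, float* C,
                  int M, int N, int K) {
  sycl::buffer<float, 2> bufA(A, sycl::range<2>(M, K));
  sycl::buffer<float, 2> bufB(B, sycl::range<2>(K, N));
  sycl::buffer<float, 2> bufC(C, sycl::range<2>(M, N));

  q.submit([&](sycl::handler& cgh) {
    sycl::accessor a{bufA, cgh, sycl::read_only};
    sycl::accessor b{bufB, cgh, sycl::read_only};
    sycl::accessor c{bufC, cgh, sycl::write_only};
    // Local-memory staging tiles whose shapes are fixed at compile time by
    // the template parameters -- the "highly parametrized" part.
    sycl::local_accessor<float, 2> tileA(sycl::range<2>(TileM, TileK), cgh);
    sycl::local_accessor<float, 2> tileB(sycl::range<2>(TileK, TileN), cgh);

    cgh.parallel_for(
        sycl::nd_range<2>({size_t(M), size_t(N)}, {TileM, TileN}),
        [=](sycl::nd_item<2> it) {
          const int row = it.get_global_id(0);
          const int col = it.get_global_id(1);
          const int lr = it.get_local_id(0);
          const int lc = it.get_local_id(1);
          float acc = 0.0f;
          for (int k0 = 0; k0 < K; k0 += TileK) {
            // Cooperatively load one tile of A and one tile of B.
            for (int k = lc; k < TileK; k += TileN)
              tileA[lr][k] = a[row][k0 + k];
            for (int k = lr; k < TileK; k += TileM)
              tileB[k][lc] = b[k0 + k][col];
            sycl::group_barrier(it.get_group());
            for (int k = 0; k < TileK; ++k)
              acc += tileA[lr][k] * tileB[k][lc];
            sycl::group_barrier(it.get_group());
          }
          c[row][col] = acc;
        });
  });
  q.wait();  // Results are written back to C when the buffers go out of scope.
}

// Example instantiation: matmul_tiled<8, 16, 16>(q, A, B, C, M, N, K);
```

Tuning for a new device then amounts to sweeping the template parameters (subject to the device's work-group-size and local-memory limits) and keeping the fastest instantiation, which mirrors the tuning workflow described in the abstract.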


Related research

- Input-Aware Auto-Tuning of Compute-Bound HPC Kernels (02/15/2018): Efficient implementations of HPC applications for parallel architectures...
- libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms (08/15/2018): Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xe...
- Towards automated kernel selection in machine learning systems: A SYCL case study (03/15/2020): Automated tuning of compute kernels is a popular area of research, mainl...
- A Technique for Finding Optimal Program Launch Parameters Targeting Manycore Accelerators (06/01/2019): In this paper, we present a new technique to dynamically determine the v...
- KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs (11/05/2019): In this paper we present KLARAPTOR (Kernel LAunch parameters RAtional Pr...
- Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures (12/24/2021): The manuscript describes fast and scalable architectures and associated ...
- OpenCL Performance Prediction using Architecture-Independent Features (10/31/2018): OpenCL is an attractive model for heterogeneous high-performance computi...
