Towards automated kernel selection in machine learning systems: A SYCL case study

03/15/2020
by John Lawson, et al.

Automated tuning of compute kernels is a popular area of research, mainly focused on finding optimal kernel parameters for a problem with fixed input sizes. This approach works well for deploying machine learning models, where the network topology is constant, but machine learning research often involves changing network topologies and hyperparameters. Traditional kernel auto-tuning has limited impact in this case; a more general selection of kernels is required for libraries to accelerate machine learning research. In this paper, we present initial results using machine learning to select kernels in a case study deploying high-performance SYCL kernels in libraries that target a range of heterogeneous devices, from desktop GPUs to embedded accelerators. The techniques investigated apply more generally and could similarly be integrated with other heterogeneous programming systems. By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
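
To make the distinction between fixed-size auto-tuning and learned kernel selection concrete, here is a minimal sketch in Python. The abstract does not say which learning method the authors used, so a 1-nearest-neighbour lookup stands in purely for illustration. The kernel configuration names, the problem sizes, and the `select_kernel` helper are all hypothetical and do not come from the paper or from any SYCL library.

```python
import math

# Hypothetical auto-tuning results: each (M, N, K) matrix-multiply size maps to
# the kernel configuration an offline tuner found fastest for that size.
# Every name and number here is invented for illustration.
TUNED = {
    (64, 64, 64): "tile_4x4",
    (128, 128, 128): "tile_4x4",
    (1024, 1024, 1024): "tile_8x8",
    (2048, 2048, 512): "tile_8x8",
    (1, 4096, 4096): "row_vector",
    (1, 1024, 1024): "row_vector",
}


def features(size):
    """Map a problem size into log2 space so relative differences dominate."""
    return tuple(math.log2(dim) for dim in size)


def select_kernel(size, tuned=TUNED):
    """Pick the configuration tuned for the closest known size.

    This is a 1-nearest-neighbour stand-in for a learned selector: it
    generalises tuned results to sizes that were never benchmarked.
    """
    target = features(size)
    nearest = min(tuned, key=lambda known: math.dist(features(known), target))
    return tuned[nearest]


if __name__ == "__main__":
    # A size seen during tuning and two unseen sizes.
    for size in [(128, 128, 128), (900, 900, 900), (1, 2048, 2048)]:
        print(size, "->", select_kernel(size))
```

A plain auto-tuner could only answer for the six sizes in the table. The selector also returns a choice for unseen sizes such as `(900, 900, 900)`, which is the situation the abstract describes for research workloads whose topologies keep changing.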

research
08/30/2020

Performance portability through machine learning guided kernel selection in SYCL libraries

Automatically tuning parallel compute kernels allows libraries and frame...
research
04/10/2019

Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels

Over recent years heterogeneous systems have become more prevalent acros...
research
02/15/2018

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Efficient implementations of HPC applications for parallel architectures...
research
03/22/2023

Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications

Graphic Processing Units (GPUs) have become ubiquitous in scientific com...
research
09/25/2020

Flexible Performant GEMM Kernels on GPUs

General Matrix Multiplication or GEMM kernels take centre place in high ...
research
08/21/2022

IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM

GEMM with the small size of input matrices is becoming widely used in ma...
research
06/01/2016

Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms

The popularity of neural networks (NNs) spans academia, industry, and po...