libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms

08/15/2018
by   Daniel Hanlon, et al.
0

Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors (PHIs), and Field-Programmable Gate Arrays (FPGAs) are now ubiquitous in extreme-scale high performance computing (HPC), cloud, and Big data platforms to facilitate execution of workloads that demand high energy efficiency. They present unique interfaces and programming models therefore posing several limitations, which must be addressed to facilitate execution of large workloads. There is no library providing a unifying interface that allows programmers to write reusable out-of-core implementations of their data-parallel kernels that can run efficiently on different mainstream accelerators such as GPUs, PHIs, and FPGAs. We address this shortage in this paper. We present a library called libhclooc, which provides a unifying interface facilitating out-of-core implementations for data parallel kernels on the three different mainstream accelerators (GPUs, Intel Xeon Phis, FPGAs). We implement out-of-core matrix-matrix multiplication (MMOOC) using the libhclooc API and demonstrate its superior performance over vendor implementations. We show that it suffers from a maximum overhead of 10 abstraction) compared to the state-of-the-art optimised implementations for Nvidia K40c GPU, Nvidia P100 PCIe GPU, and Intel Xeon Phi 3120P respectively. We also show that using libhclooc API reduces the number of lines of code (LOC) by 75

READ FULL TEXT
research
01/26/2021

C-for-Metal: High Performance SIMD Programming on Intel GPUs

The SIMT execution model is commonly used for general GPU development. C...
research
04/10/2019

Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels

Over recent years heterogeneous systems have become more prevalent acros...
research
07/14/2019

A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels

This paper proposes a versatile high-performance execution model, inspir...
research
06/28/2023

Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics

High-energy physics (HEP) experiments have developed millions of lines o...
research
07/07/2023

High-performance evaluation of high angular momentum 4-center Gaussian integrals on modern accelerated processors

We present a high-performance evaluation method for 4-center 2-particle ...
research
05/22/2023

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

Machine Learning (ML) models contain highly-parallel computations, such ...

Please sign up or login with your details

Forgot password? Click here to reset