Implementing implicit OpenMP data sharing on GPUs

11/28/2017
by Gheorghe-Teodor Bercea, et al.

OpenMP is a shared-memory programming model that supports the offloading of target regions to accelerators such as NVIDIA GPUs. The implementation in Clang/LLVM aims to deliver a generic GPU compilation toolchain that supports both the native CUDA C/C++ and the OpenMP device offloading models. There are situations where the semantics of OpenMP and those of CUDA diverge. One such example is the policy for implicitly handling local variables. In CUDA, local variables are implicitly mapped to thread-local memory and thus become private to a CUDA thread. In OpenMP, due to semantics that allow the nesting of regions executed by different numbers of threads, variables need to be implicitly shared among the threads of a contention group. In this paper we introduce a redesign of the OpenMP device data sharing infrastructure that is responsible for the implicit sharing of local variables in the Clang/LLVM toolchain. The new infrastructure lowers implicitly shared variables to the shared memory of the GPU. We measure the amount of shared memory used by our scheme in cases that involve scalar variables and statically allocated arrays. The evaluation is carried out by offloading to NVIDIA K40 and P100 GPUs. For scalar variables the pressure on shared memory is relatively low, under 26% shared memory utilization on the K40, and does not negatively impact occupancy. The limiting occupancy factor in that case is register pressure. The data sharing scheme offers users a simple memory model for controlling the implicit allocation of device shared memory.
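
The divergence described above can be illustrated with a small sketch. It is not taken from the paper: the function name `scale_on_device` and its parameters are hypothetical, chosen only for illustration. The variable `factor` is declared in the sequential part of a target region and then read inside a nested parallel region, so under OpenMP's default data-sharing rules it is shared by all threads of that region. The variable `tmp` is declared inside the parallel loop and stays private to each thread, which matches how CUDA treats every local variable.

```cpp
#include <vector>

// Hypothetical helper for illustration only (not from the paper).
// Multiplies every element of `data` by 2 * base inside a target region.
std::vector<int> scale_on_device(std::vector<int> data, int base) {
  int *ptr = data.data();
  int n = static_cast<int>(data.size());

  // Offload to a device if one is available; otherwise the region runs on
  // the host. Without OpenMP support the pragmas are ignored and the code
  // runs sequentially with the same result.
  #pragma omp target map(tofrom: ptr[0:n])
  {
    // Local to the target region but declared outside the parallel region:
    // it is written once by the initial thread and is implicitly shared
    // with the threads of the nested parallel region below.
    int factor = base * 2;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
      // Declared inside the parallel region: private to each thread.
      int tmp = ptr[i];
      ptr[i] = tmp * factor;  // every thread reads the shared `factor`
    }
  }
  return data;
}
```

Calling `scale_on_device({1, 2, 3}, 5)` returns `{10, 20, 30}`. Where a compiler places `factor` on the device is an implementation decision. The abstract states that the scheme presented in the paper lowers such implicitly shared variables to GPU shared memory.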

