Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

04/09/2023
by   Yehonatan Fridman, et al.
0

Over the last decade, most of the increase in computing power has been gained by advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performances in various computing tasks, their utilization requires code adaptations and transformations. Thus, OpenMP, the most common standard for multi-threading in scientific computing applications, introduced offloading capabilities between host (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs – the Intel Ponte Vecchio Max 1100 and the NVIDIA A100 GPUs – were released to the market, with the oneAPI and NVHPC compilers for offloading, correspondingly. In this work, we present early performance results of OpenMP offloading capabilities to these devices while specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware in representative scientific mini-app (the LULESH benchmark). Our results show that the coverage for version 4.5 is nearly complete in both latest NVHPC and oneAPI tools. However, we observed a lack of support in versions 5.0, 5.1, and 5.2, which is particularly noticeable when using NVHPC. From the performance perspective, we found that the PVC1100 and A100 are relatively comparable on the LULESH benchmark. While the A100 is slightly better due to faster memory bandwidth, the PVC1100 reaches the next problem size (400^3) scalably due to the larger memory size.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/10/2018

Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures

For reasons of both performance and energy efficiency, high-performance ...
research
11/06/2020

Study of Automatic GPU Offloading Method from Various Language Applications

In recent years, utilization of heterogeneous hardware other than small ...
research
10/16/2021

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading

HPC systems employ a growing variety of compute accelerators with differ...
research
09/18/2023

Comparing Performance and Portability between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs

The heterogeneous computing paradigm has led to the need for portable an...
research
11/01/2022

Apple Silicon Performance in Scientific Computing

With the release of the Apple Silicon System-on-a-Chip processors, and t...
research
10/04/2020

High level programming abstractions for leveraging hierarchical memories with micro-core architectures

Micro-core architectures combine many low memory, low power computing co...
research
10/19/2020

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Heterogeneous systems are becoming increasingly prevalent. In order to e...

Please sign up or login with your details

Forgot password? Click here to reset