Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

10/19/2020
by Joshua Hoke Davis, et al.

Heterogeneous systems are becoming increasingly prevalent. To exploit the rich compute resources of such systems, application developers need robust programming models that let them seamlessly migrate legacy code from today's systems to tomorrow's. For more than a decade, directives have been established as one of the promising paths for tackling programming challenges on emerging systems. This work applies and demonstrates OpenMP offloading directives on five proxy applications. We observe that performance varies widely from one compiler to another; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While some issues can be worked around by the developer, others must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x speedup for the su3 proxy application on NERSC's Cori system when using the Clang compiler, and by switching max reductions to add reductions in the laplace mini-app, we gain a 15.7x speedup when using the Cray-llvm compiler on Cori.
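
To make the max-versus-add reduction change concrete, the following is a minimal sketch in C of what such a switch might look like in a Jacobi-style Laplace sweep. It is not the paper's code: the function names `sweep_max` and `sweep_sum`, the flat row-major grid layout, and the choice of a sum of squared differences as the add-based error measure are all assumptions made for illustration. Note that replacing a max norm with a sum changes the convergence criterion, so any tolerance would need to be adjusted accordingly.

```c
#include <assert.h>

/* Hypothetical helper: one Jacobi sweep over the interior of an n x n grid.
 * Returns the largest absolute change, computed with a max reduction. */
double sweep_max(const double *a, double *anew, int n)
{
    assert(n >= 3);  /* need at least one interior point */
    double err = 0.0;
    #pragma omp target teams distribute parallel for collapse(2) \
        reduction(max:err) map(to: a[0:n*n]) \
        map(tofrom: anew[0:n*n]) map(tofrom: err)
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            anew[i*n + j] = 0.25 * (a[i*n + j - 1] + a[i*n + j + 1]
                                  + a[(i - 1)*n + j] + a[(i + 1)*n + j]);
            double d = anew[i*n + j] - a[i*n + j];
            if (d < 0.0) d = -d;
            if (d > err) err = d;   /* max reduction */
        }
    }
    return err;
}

/* Hypothetical helper: the same sweep, but the error is accumulated as a
 * sum of squared changes, so the reduction operator becomes '+'. */
double sweep_sum(const double *a, double *anew, int n)
{
    assert(n >= 3);
    double err = 0.0;
    #pragma omp target teams distribute parallel for collapse(2) \
        reduction(+:err) map(to: a[0:n*n]) \
        map(tofrom: anew[0:n*n]) map(tofrom: err)
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            anew[i*n + j] = 0.25 * (a[i*n + j - 1] + a[i*n + j + 1]
                                  + a[(i - 1)*n + j] + a[(i + 1)*n + j]);
            double d = anew[i*n + j] - a[i*n + j];
            err += d * d;           /* add reduction */
        }
    }
    return err;
}
```

Both functions compute the same updated grid; only the reduction operator and the error measure differ. Without OpenMP support enabled, the pragmas are ignored and the loops run serially on the host, which is convenient for checking correctness before measuring any compiler-specific performance difference.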


