Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

11/06/2017
by   G. D. Balogh, et al.
0

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL) each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options, and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. Results of this work show how clang's CUDA compiler frequently outperform NVIDIA's nvcc, performance issues with directive-based approaches on complex kernels, and OpenMP 4 support maturing in clang and XL; currently around 10

READ FULL TEXT

page 8

page 14

page 15

page 18

research
02/11/2018

Locality Optimized Unstructured Mesh Algorithms on GPUs

Unstructured-mesh based numerical algorithms such as finite volume and f...
research
02/11/2018

Improving Locality of Unstructured Mesh Algorithms on GPUs

To most efficiently utilize modern parallel architectures, the memory ac...
research
02/13/2023

Revet: A Language and Compiler for Dataflow Threads

Spatial dataflow architectures such as reconfigurable dataflow accelerat...
research
10/16/2020

Probabilistic Programming with CuPPL

Probabilistic Programming Languages (PPLs) are a powerful tool in machin...
research
09/26/2020

Applying Type Oriented Programming to the PGAS Memory Model

The Partitioned Global Address Space memory model has been popularised b...
research
12/21/2018

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

There is a large body of legacy scientific code written in languages lik...
research
10/19/2020

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Heterogeneous systems are becoming increasingly prevalent. In order to e...

Please sign up or login with your details

Forgot password? Click here to reset