Locality Optimized Unstructured Mesh Algorithms on GPUs

02/11/2018
by András Attila Sulyok, et al.

Unstructured-mesh-based numerical algorithms, such as finite volume and finite element methods, form an important class of applications in many scientific and engineering domains. The key difficulty in achieving higher performance for these applications is their indirect memory accesses, which lead to data races when parallelized. Current methods for handling such data races reduce parallelism and yield suboptimal performance. Particularly on modern many-core architectures such as GPUs, which have ever-increasing core and thread counts, reducing data movement and exploiting memory locality are vital for good performance. In this work we present novel locality-exploiting optimizations for the efficient execution of unstructured-mesh algorithms on GPUs. Building on a two-layered coloring strategy for handling data races, we introduce novel reordering and partitioning techniques that further improve execution efficiency. The new optimizations are applied to several well-established unstructured-mesh applications, and their performance is investigated on NVIDIA's latest P100 and V100 GPUs. We demonstrate significant speedups (1.1–1.75×) over the state of the art. A range of performance metrics, including runtime, memory transactions, achieved bandwidth, GPU occupancy and data reuse factors, are benchmarked and used to understand and explain the key factors impacting performance. The optimized algorithms are implemented as an open-source software library, and we illustrate its use for improving the performance of existing or new unstructured-mesh applications.
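To make the idea of two-layered coloring concrete, the following is a minimal sketch and not the authors' implementation. It assumes an edge-based loop that indirectly increments data on the two nodes of each edge, that edges are grouped into contiguous blocks of a fixed size, and that a simple greedy coloring is used at both levels. The first level colors blocks so that blocks of the same color share no node and can run concurrently; the second level colors the edges inside each block so that threads of the same color never update the same node at the same time. The names `greedy_color`, `two_level_coloring`, `same_color_disjoint` and `block_size` are hypothetical.

```python
from collections import defaultdict


def greedy_color(items, touched):
    """Greedy coloring: give each item the smallest color not already used
    by an earlier item that touches a common mesh node.
    `touched(i)` returns the node ids that item i updates indirectly."""
    node_colors = defaultdict(set)  # node id -> colors already used on it
    colors = {}
    for i in items:
        used = set()
        for n in touched(i):
            used |= node_colors[n]
        c = 0
        while c in used:
            c += 1
        colors[i] = c
        for n in touched(i):
            node_colors[n].add(c)
    return colors


def two_level_coloring(edge_to_nodes, block_size):
    """Hypothetical two-layered coloring of an edge loop.
    Level 1: color blocks of edges so same-colored blocks share no node.
    Level 2: color edges within each block so same-colored threads
    never write to the same node."""
    n_edges = len(edge_to_nodes)
    # Assumption: blocks are contiguous ranges of edge indices.
    blocks = [list(range(s, min(s + block_size, n_edges)))
              for s in range(0, n_edges, block_size)]
    block_nodes = [{n for e in b for n in edge_to_nodes[e]} for b in blocks]
    block_color = greedy_color(range(len(blocks)), lambda b: block_nodes[b])
    thread_color = {}
    for b in blocks:
        # Thread colors are local to a block, so numbering restarts at 0.
        thread_color.update(greedy_color(b, lambda e: edge_to_nodes[e]))
    return blocks, block_color, thread_color


def same_color_disjoint(colors, touched):
    """Return True if no two items with the same color touch a common node."""
    seen = set()
    for i, c in colors.items():
        for n in touched(i):
            if (c, n) in seen:
                return False
            seen.add((c, n))
    return True
```

For a chain mesh with edges `[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]` and `block_size=2`, the blocks alternate between two block colors, and within each block the two edges, which share a node, receive different thread colors. On a GPU, blocks of one color would run as one kernel launch, and each thread block would apply its increments one thread color at a time (for example, separated by a barrier), which avoids atomics at the cost of some lost parallelism.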
