Accelerating High-Order Stencils on GPUs

09/10/2020
by Ryuichi Sai, et al.

Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations efficiently on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well studied in the literature, not all of the proposed enhancements work well for high-order stencils, such as those used for seismic modeling. Furthermore, coping with boundary conditions often requires different computational logic, which complicates efficient exploitation of thread-level parallelism on GPUs. In this paper, we study high-order stencils and their unique characteristics on GPUs. We manually crafted a collection of CUDA implementations of a 25-point seismic modeling stencil and its related boundary conditions. We evaluate their code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conduct an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Some of our implementations achieve twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations have excellent performance portability.
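As an illustration only, and assuming the 25-point stencil is the common 8th-order-in-space star Laplacian (radius 4 along each axis, so 1 + 6 * 4 = 25 points) often used in seismic wave propagation, a minimal CPU reference in C++ might look like the sketch below. The names `Grid3D` and `laplacian25` are hypothetical, the weights are the standard 8th-order central-difference coefficients rather than values taken from the paper, and points inside the 4-cell halo are simply skipped instead of being handled with the boundary conditions the paper studies. A serial reference like this is the kind of baseline a CUDA kernel could be checked against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D grid stored with x as the fastest-varying index.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;
    Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}
    std::size_t idx(std::size_t i, std::size_t j, std::size_t k) const {
        return (k * ny + j) * nx + i;
    }
    double& at(std::size_t i, std::size_t j, std::size_t k) { return data[idx(i, j, k)]; }
    double at(std::size_t i, std::size_t j, std::size_t k) const { return data[idx(i, j, k)]; }
};

// Stencil radius: 4 neighbors on each side along each axis (assumed, 8th order).
constexpr std::size_t kRadius = 4;
// Standard 8th-order central-difference weights for d^2/dx^2, offsets 0..4.
constexpr double kCoef[kRadius + 1] = {
    -205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0};

// Applies the 25-point star Laplacian to interior points only; the 4-cell
// halo near each face is left untouched (no boundary condition applied).
void laplacian25(const Grid3D& in, Grid3D& out, double h) {
    assert(out.nx == in.nx && out.ny == in.ny && out.nz == in.nz);
    const double inv_h2 = 1.0 / (h * h);
    for (std::size_t k = kRadius; k + kRadius < in.nz; ++k)
        for (std::size_t j = kRadius; j + kRadius < in.ny; ++j)
            for (std::size_t i = kRadius; i + kRadius < in.nx; ++i) {
                // The center point appears once per axis, hence 3 * kCoef[0].
                double acc = 3.0 * kCoef[0] * in.at(i, j, k);
                for (std::size_t r = 1; r <= kRadius; ++r) {
                    acc += kCoef[r] * (in.at(i - r, j, k) + in.at(i + r, j, k)
                                     + in.at(i, j - r, k) + in.at(i, j + r, k)
                                     + in.at(i, j, k - r) + in.at(i, j, k + r));
                }
                out.at(i, j, k) = acc * inv_h2;
            }
}
```

Because these weights differentiate quadratics exactly, filling the grid with x^2 + y^2 + z^2 and using h = 1 should produce 6 at every interior point, which makes a convenient sanity check for any optimized GPU variant.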

research
09/09/2023

Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework

PDE discretization schemes yielding stencil-like computing patterns are ...
research
10/23/2018

High Performance Computing with FPGAs and OpenCL

In this work we evaluate the potential of FPGAs for accelerating HPC wor...
research
06/07/2021

High Order Impedance Boundary Condition for the Three-dimensional Scattering Problem in Electromagnetism

In this paper, we propose a variational formulation with the use of high...
research
03/10/2023

Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes

We explore the performance and portability of the high-level programming...
research
05/26/2021

kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

Empirical Dynamic Modeling (EDM) is a state-of-the-art non-linear time-s...
research
09/12/2019

PittPack: An Open-Source Poisson's Equation Solver for Extreme-Scale Computing with Accelerators

We present a parallel implementation of a direct solver for the Poisson'...
research
03/09/2021

Fast tree-based algorithms for DBSCAN on GPUs

DBSCAN is a well-known density-based clustering algorithm to discover cl...
