From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types

09/26/2022
by Gregor Daiß et al.

Octo-Tiger, a large-scale 3D AMR code for simulating the merger of stars, combines HPX, Kokkos and explicit SIMD types to achieve performance portability across a broad range of heterogeneous hardware. On A64FX CPUs, however, we encountered several missing pieces that caused problems with SIMD vectorization and hindered performance. We therefore add std::experimental::simd as an option for Octo-Tiger's Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extension) SIMD backend. Additionally, we add the missing SIMD implementations to the Kokkos kernels within Octo-Tiger's hydro solver. We test our changes by running Octo-Tiger on three different CPUs: an A64FX, an Intel Icelake and an AMD EPYC CPU, evaluating SIMD speedup and node-level performance. We get a good SIMD speedup on the A64FX CPU, as well as noticeable speedups on the other two CPU platforms. However, we also experience a scaling issue on the EPYC CPU.
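To illustrate what writing a kernel against an explicit SIMD type can look like, here is a minimal sketch using std::experimental::simd from the C++ Parallelism TS v2 (shipped with libstdc++ in GCC 11 and later). It is not taken from Octo-Tiger: the kernel `axpy` and the helpers `run_axpy_native` and `run_axpy_scalar` are hypothetical names chosen for this example, and neither Kokkos, HPX nor the SVE backend appears in it. The SIMD type is a template parameter, so the same loop body can be instantiated with a different vector type.

```cpp
#include <cstddef>
#include <vector>
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical kernel (not from Octo-Tiger): y[i] = a * x[i] + y[i].
// simd_t is a template parameter, so the loop body is written once and
// can be instantiated with different SIMD types.
// Assumes x and y have the same length.
template <typename simd_t>
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t n = x.size();
    const std::size_t width = simd_t::size();  // number of lanes
    const simd_t av(a);                        // broadcast a to all lanes
    std::size_t i = 0;
    // Main loop: process `width` elements per iteration.
    for (; i + width <= n; i += width) {
        simd_t xv(&x[i], stdx::element_aligned);  // load from memory
        simd_t yv(&y[i], stdx::element_aligned);
        yv = av * xv + yv;
        yv.copy_to(&y[i], stdx::element_aligned); // store back
    }
    // Remainder loop: leftover elements are handled one at a time.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Instantiation with the widest SIMD type the compile target supports.
inline std::vector<double> run_axpy_native(double a, const std::vector<double>& x,
                                           std::vector<double> y) {
    axpy<stdx::native_simd<double>>(a, x, y);
    return y;
}

// Instantiation with a one-lane SIMD type, usable as a scalar baseline.
inline std::vector<double> run_axpy_scalar(double a, const std::vector<double>& x,
                                           std::vector<double> y) {
    axpy<stdx::simd<double, stdx::simd_abi::scalar>>(a, x, y);
    return y;
}
```

Comparing the runtime of the native instantiation against the one-lane instantiation is one simple way to estimate a SIMD speedup for such a kernel; the width of `native_simd<double>` depends on the compiler flags and target architecture.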

