Vectorization and Minimization of Memory Footprint for Linear High-Order Discontinuous Galerkin Schemes

03/28/2020
by Jean-Matthieu Gallard, et al.

We present a sequence of optimizations to the performance-critical compute kernels of the high-order discontinuous Galerkin solver of the hyperbolic PDE engine ExaHyPE, successively tackling bottlenecks due to SIMD operations, cache hierarchies, and restrictions in the software design. Starting from a generic scalar implementation of the numerical scheme, our first optimized variant applies state-of-the-art optimization techniques: it vectorizes loops, improves the data layout, and uses Loop-over-GEMM to perform tensor contractions via highly optimized matrix multiplication functions provided by the LIBXSMM library. We show that memory stalls, caused by a memory footprint exceeding our L2 cache size, limited the gains from vectorization. We therefore introduce a new kernel that applies a sum factorization approach to reduce the kernel's memory footprint and improve its cache locality. With the L2 cache bottleneck removed, we were able to exploit additional vectorization opportunities by introducing a hybrid Array-of-Structure-of-Array data layout that resolves the data layout conflict between the matrix multiplication kernels and the point-wise functions that implement PDE-specific terms. With this last kernel, evaluated in a benchmark simulation at high polynomial order, only 2% of the floating point operations are still performed using scalar instructions, and 22.5% of the available performance is achieved.
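The Loop-over-GEMM idea and the data layout conflict mentioned above can be illustrated with a minimal NumPy sketch. It is not the ExaHyPE implementation: the function names, the array shapes, and the index orders `Q[z, y, x, v]` (Array-of-Structure) and `Q[z, y, v, x]` (taken here as the hybrid layout) are assumptions made for illustration, and NumPy's `@` operator stands in for the small matrix multiplications that the paper delegates to LIBXSMM.

```python
import numpy as np


def derive_x_aos(Q, D):
    """Contract D with the x index of Q[z, y, x, v] (assumed AoS layout).

    Loop-over-GEMM: the 4D contraction is expressed as a loop over
    (z, y) slices, each handled by one small matrix multiplication.
    """
    out = np.empty_like(Q)
    nz, ny = Q.shape[:2]
    for z in range(nz):
        for y in range(ny):
            out[z, y] = D @ Q[z, y]  # (N x N) @ (N x V)
    return out


def aos_to_aosoa(Q):
    """Reorder Q[z, y, x, v] into Q[z, y, v, x] (assumed hybrid layout).

    After the reorder, x is the unit-stride index, so a point-wise
    function can process a whole line of x values per variable.
    """
    return np.ascontiguousarray(Q.transpose(0, 1, 3, 2))


def derive_x_aosoa(Q, D):
    """Same contraction on Q[z, y, v, x]; each slice is now V x N."""
    out = np.empty_like(Q)
    nz, ny = Q.shape[:2]
    for z in range(nz):
        for y in range(ny):
            out[z, y] = Q[z, y] @ D.T  # (V x N) @ (N x N)
    return out


# Hypothetical sizes: N nodes per dimension, V variables per node.
N, V = 4, 3
rng = np.random.default_rng(0)
Q_aos = rng.standard_normal((N, N, N, V))
D = rng.standard_normal((N, N))

result_aos = derive_x_aos(Q_aos, D)
result_aosoa = derive_x_aosoa(aos_to_aosoa(Q_aos), D)
```

Both variants compute the same contraction. The only difference is which index is contiguous in memory, which is the property a vectorized point-wise function depends on.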


