Hyperbolic Diffusion in Flux Reconstruction: Optimisation through Kernel Fusion within Tensor-Product Elements

07/22/2021
by Will Trojak, et al.

This initial study presents novel methods for fusing GPU kernels in the artificial compressibility method (ACM), using flux reconstruction on tensor-product elements with constant Jacobians. Fusion is made possible by hyperbolising the diffusion terms, which eliminates the expensive algorithmic steps needed to form the viscous stresses. Two fusion approaches are presented that offer differing levels of parallelism. Differing levels are found to be necessary to accommodate the change in workload as the order of accuracy of the elements is increased. Several further optimisations of these approaches are demonstrated, including a generation-time memory manager that maximises resource usage. The fused kernels achieve speedups of 3-4 times, which compares favourably with a theoretical maximum speedup of 4. In three-dimensional test cases, the generated fused kernels are found to reduce total runtime by ∼25%, and, when compared to the standard ACM formulation, simulations demonstrate that a speedup of 2.3 times can be achieved.
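As a rough illustration of why kernel fusion can pay off, the sketch below compares a chain of separate element-wise passes with a single fused pass and counts how many array elements each version reads and writes. It is a toy model only: the three stages, the function names `run_unfused` and `run_fused`, and the traffic counter are assumptions made for illustration, and it does not reproduce the paper's kernels, its flux reconstruction operators, or its theoretical speedup bound.

```python
import numpy as np

# Hypothetical element-wise stages standing in for separate GPU kernels.
STAGES = (
    lambda x: 2.0 * x,
    lambda x: x + 1.0,
    lambda x: x * x,
)


def run_unfused(u):
    """Apply each stage as its own pass over the whole array.

    Every pass reads its full input and writes a full intermediate array,
    mimicking one kernel launch per stage with global-memory round trips.
    """
    traffic = 0
    data = u
    for stage in STAGES:
        out = stage(data)
        traffic += data.size + out.size  # elements read + elements written
        data = out
    return data, traffic


def run_fused(u):
    """Apply all stages to each element in one pass.

    Intermediates stay in a local variable (the analogue of registers),
    so the array is read once and written once.
    """
    out = np.empty_like(u)
    for i in range(u.size):
        x = u[i]
        for stage in STAGES:
            x = stage(x)
        out[i] = x
    traffic = u.size + out.size
    return out, traffic


u = np.arange(8.0)
unfused_result, unfused_traffic = run_unfused(u)
fused_result, fused_traffic = run_fused(u)
print(unfused_traffic, fused_traffic)
```

In this toy model the traffic ratio equals the number of stages. On real hardware the achievable gain also depends on factors the sketch ignores, such as register and shared-memory pressure, occupancy, and how much parallelism each fused kernel exposes.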

