Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt)

06/06/2023
by Lingqi Zhang, et al.

General-purpose graphics processing units (GPGPUs) power most of the top HPC systems. The total capacity of GPU scratchpad memory has increased more than 40-fold over the last decade. However, existing temporal-blocking optimizations for stencil computations have not aggressively exploited this large scratchpad capacity. This work uses the 2D Jacobian 5-point iterative stencil (j2d5pt) as a case study to investigate how large scratchpad memory can be used. Unlike existing research, which tiles the domain at thread-block granularity, we tile the domain so that each tile is large enough to use all available scratchpad memory on the GPU. Consequently, we can process several time steps inside a single tile before writing the result back to global memory. Our evaluation shows that our implementation performs comparably to state-of-the-art implementations, while being much simpler and not requiring code auto-generation.
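
To make "several time steps inside a single tile" concrete, here is a minimal CPU-side sketch of temporal blocking for a 5-point Jacobi-style stencil. It is not the authors' GPU implementation. It assumes a fixed (Dirichlet) boundary ring, equal weights of 0.2, and overlapped tiling with a halo as wide as the number of fused steps. The abstract does not say which tiling scheme the paper uses, and the names `naive`, `temporally_blocked`, `tile` and `depth` are hypothetical. A Python list copy stands in for the scratchpad. Each tile is loaded once, advanced `depth` steps locally, and only its interior is written back. This cuts global-memory round trips from `steps` to `steps / depth` at the cost of redundant halo computation. Larger tiles, as the abstract argues for, shrink that overhead relative to useful work.

```python
# Sketch only. Assumptions: fixed boundary ring, equal 0.2 weights,
# overlapped tiling; a local list copy plays the role of GPU scratchpad.

def j2d5pt_step(src, r0, r1, c0, c1):
    """Return a copy of src with rows [r0, r1) x cols [c0, c1) advanced one step."""
    dst = [row[:] for row in src]
    for i in range(r0, r1):
        for j in range(c0, c1):
            dst[i][j] = 0.2 * (src[i][j] + src[i - 1][j] + src[i + 1][j]
                               + src[i][j - 1] + src[i][j + 1])
    return dst


def naive(grid, steps):
    """Reference: one full-grid sweep (one global-memory round trip) per step."""
    n = len(grid)
    for _ in range(steps):
        grid = j2d5pt_step(grid, 1, n - 1, 1, n - 1)
    return grid


def temporally_blocked(grid, steps, tile, depth):
    """Load each tile plus a halo of width `depth`, advance it `depth` steps
    locally, then write back only the tile interior. Assumes steps % depth == 0."""
    n = len(grid)
    assert steps % depth == 0
    for _ in range(steps // depth):
        out = [row[:] for row in grid]  # boundary ring carried over unchanged
        for ti in range(1, n - 1, tile):
            for tj in range(1, n - 1, tile):
                ti_end, tj_end = min(ti + tile, n - 1), min(tj + tile, n - 1)
                # Tile plus halo, clamped to the grid: the "scratchpad" load.
                r0, r1 = max(ti - depth, 0), min(ti_end + depth, n)
                c0, c1 = max(tj - depth, 0), min(tj_end + depth, n)
                local = [row[c0:c1] for row in grid[r0:r1]]
                h, w = r1 - r0, c1 - c0
                for _ in range(depth):
                    # Local edges are never updated, so stale halo values
                    # corrupt one more ring per step; after `depth` steps the
                    # corruption still has not reached the tile interior.
                    local = j2d5pt_step(local, 1, h - 1, 1, w - 1)
                for i in range(ti, ti_end):
                    for j in range(tj, tj_end):
                        out[i][j] = local[i - r0][j - c0]
        grid = out
    return grid


# Usage: both versions should agree on a small test grid.
n = 34
grid0 = [[float((7 * i + 13 * j) % 17) for j in range(n)] for i in range(n)]
ref = naive(grid0, 8)
blk = temporally_blocked(grid0, 8, tile=8, depth=4)
err = max(abs(a - b) for ra, rb in zip(ref, blk) for a, b in zip(ra, rb))
print("max abs difference:", err)
```

The halo width must be at least the number of fused steps: each step consumes one ring of valid halo data. On a GPU this is the trade-off the abstract points at. Deeper temporal blocking needs wider halos, and only a large scratchpad budget per tile keeps the redundant halo work small compared with the tile itself.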
