SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs

08/23/2022
by Xingyu Tian, et al.

Stencil computation is one of the fundamental computing patterns in many application domains, such as scientific computing and image processing. While there are promising studies that accelerate stencils on FPGAs, no automated acceleration framework systematically explores both spatial and temporal parallelism for iterative stencils, which can be either computation-bound or memory-bound. In this paper, we present SASA, a scalable and automatic stencil acceleration framework for modern HBM-based FPGAs. SASA takes a high-level stencil DSL description and the target FPGA platform as inputs, automatically determines the best spatial and temporal parallelism configuration using our accurate analytical model, and generates the optimized FPGA design with that configuration in TAPA high-level synthesis C++, along with the corresponding host code. Compared to SODA, a state-of-the-art automatic stencil acceleration framework that only exploits temporal parallelism, SASA achieves an average speedup of 3.74x and a speedup of up to 15.73x on the HBM-based Xilinx Alveo U280 FPGA board for a wide range of stencil kernels.
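To make the term "iterative stencil" concrete, here is a minimal software sketch of a 5-point Jacobi-style stencil in plain C++. It is an illustration only, not SASA's DSL, its generated TAPA code, or any kernel from the paper; the function names `jacobi2d_step` and `jacobi2d`, the averaging weights, and the copy-through boundary handling are all assumptions chosen for the example. Loosely speaking, spatial parallelism would split the grid cells of one sweep across several processing elements, while temporal parallelism would chain several sweeps so that the output of one feeds the next.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One sweep of a 5-point stencil over a rows x cols grid stored row-major.
// Each interior cell becomes the average of itself and its four neighbours.
// Boundary cells are copied through unchanged (an assumption for this sketch).
std::vector<float> jacobi2d_step(const std::vector<float>& in,
                                 std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in);  // start from a copy so boundaries are preserved
    for (std::size_t i = 1; i + 1 < rows; ++i) {
        for (std::size_t j = 1; j + 1 < cols; ++j) {
            // Cells within one sweep are independent of each other, which is
            // what a spatially parallel design could exploit.
            out[i * cols + j] = (in[i * cols + j] +
                                 in[(i - 1) * cols + j] +
                                 in[(i + 1) * cols + j] +
                                 in[i * cols + j - 1] +
                                 in[i * cols + j + 1]) / 5.0f;
        }
    }
    return out;
}

// Iterative stencil: apply the sweep `iterations` times.
// Sweep t + 1 consumes the output of sweep t, which is the dependency a
// temporally parallel design could pipeline.
std::vector<float> jacobi2d(std::vector<float> grid,
                            std::size_t rows, std::size_t cols,
                            int iterations) {
    for (int t = 0; t < iterations; ++t) {
        grid = jacobi2d_step(grid, rows, cols);
    }
    return grid;
}
```

For example, calling `jacobi2d(grid, 3, 3, 1)` on a 3x3 grid that is zero everywhere except for a value of 5 in the centre leaves the boundary untouched and replaces the centre with the average of the five cells it reads.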

