Mapping Stencils on Coarse-grained Reconfigurable Spatial Architecture

11/06/2020
by Jesmin Jahan Tithi, et al.

Stencils are a class of computational patterns in which each output grid point depends on a fixed-shape neighborhood of points in an input grid. Stencil computations are prevalent in scientific applications and consume a significant portion of supercomputing resources, so optimizing stencil programs for the best performance has always been important. A rich body of research has focused on optimizing stencil computations on almost all parallel architectures. Stencil applications have regular dependency patterns, inherent pipeline parallelism, and plenty of data reuse, which makes them a perfect match for a coarse-grained reconfigurable spatial architecture (CGRA). A CGRA consists of many small, simple processing elements (PEs) connected by an on-chip network. Each PE can be configured to execute part of a stencil computation, and all PEs run in parallel. The network can also be configured so that loaded data is passed directly from one PE to a neighboring PE and thus reused by many PEs without register spilling or additional memory traffic. The key to performance is how efficiently a stencil computation is mapped to the CGRA. In this paper, we show a few unique and generalizable ways of mapping one-dimensional and multidimensional stencil computations to a CGRA that fully exploit the available data reuse and parallelism. Our simulation experiments demonstrate that these mappings are efficient and enable the CGRA to outperform state-of-the-art GPUs.
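To make the data-reuse idea concrete, here is a minimal Python sketch of a one-dimensional stencil evaluated two ways: directly, and with a toy model of a PE chain in which each input value is loaded once and then handed from PE to neighboring PE. This is not the mapping proposed in the paper. The function names, the shift-register model of the PEs, and the coefficient layout are all assumptions made for illustration.

```python
def stencil_1d_reference(x, coeffs):
    """Direct evaluation: y[i] = sum over k of coeffs[k] * x[i + k]."""
    taps = len(coeffs)
    return [
        sum(coeffs[k] * x[i + k] for k in range(taps))
        for i in range(len(x) - taps + 1)
    ]


def stencil_1d_pe_chain(x, coeffs):
    """Toy PE-chain model (an assumption, not the paper's design).

    Each step loads one value from "memory" into PE 0 while every PE
    forwards the value it holds to the next PE. Once the chain is full,
    every step produces one output from values already held in the PEs.
    Returns (outputs, number_of_memory_loads).
    """
    taps = len(coeffs)
    regs = [0] * taps          # regs[k] is the value currently held by PE k
    loads = 0
    out = []
    for value in x:
        loads += 1                       # exactly one memory load per step
        regs = [value] + regs[:-1]       # neighbor-to-neighbor forwarding
        if loads >= taps:                # chain is full
            # regs[0] holds the newest input, regs[-1] the oldest.
            out.append(
                sum(coeffs[k] * regs[taps - 1 - k] for k in range(taps))
            )
    return out, loads


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    coeffs = [1, 1, 1]                   # hypothetical 3-point stencil
    print(stencil_1d_reference(x, coeffs))   # [6, 9, 12]
    print(stencil_1d_pe_chain(x, coeffs))    # ([6, 9, 12], 5)
```

In this toy model the chain performs one load per input element (5 loads for 3 outputs), whereas the direct version reads three inputs per output (9 reads). The gap grows with the stencil width, which is the kind of reuse the abstract attributes to PE-to-PE forwarding.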
