Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

08/27/2015
by Cheng Liu, et al.

Offloading compute-intensive nested loops to FPGA accelerators has been demonstrated by many researchers to be an effective performance enhancement technique across numerous application domains. To construct such accelerators with high design productivity, researchers have increasingly turned to overlay architectures as an intermediate generation target built on top of off-the-shelf FPGAs. However, achieving the desired performance-overhead trade-off remains a major productivity challenge, because it requires complex application-specific customization over a large design space covering multiple architectural parameters. In this work, an automatic nested loop acceleration framework utilizing a regular soft coarse-grained reconfigurable array (SCGRA) overlay is presented. Given high-level resource constraints, the framework automatically customizes the overlay architectural design parameters, the high-level compilation options, and the communication between the accelerator and the host processor to optimize performance for the given application. In our experiments, at a cost of 10 to 20 minutes of additional tool run time, the proposed customization process resulted in up to 5 times additional speedup over a baseline accelerator generated by the same framework without customization. Overall, when compared to the equivalent software running on the host ARM processor alone on the Zedboard, the resulting accelerators achieved up to 10 times speedup.
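The customization step described above amounts to a search over a design space under a resource budget. The following is a minimal sketch of that idea only, not the framework's actual algorithm: the parameter names (`rows`, `cols`, `unroll`), the candidate values, and both cost functions are hypothetical placeholders chosen for illustration, and the numbers in them are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """Hypothetical overlay configuration: array shape plus a loop unroll factor."""
    rows: int
    cols: int
    unroll: int


def resource_cost(cfg):
    # Assumption: resource use is proportional to the number of processing elements.
    return cfg.rows * cfg.cols


def estimated_runtime(cfg, iterations):
    # Toy model (assumption): work is spread over the PEs that unrolling can keep
    # busy, plus an overhead term that grows with the array size.
    pes = cfg.rows * cfg.cols
    busy_pes = min(pes, cfg.unroll)
    return iterations / busy_pes + 2.0 * pes


def explore(max_pes, iterations):
    """Exhaustively search the candidate space; return (config, runtime) or None."""
    best = None
    for rows, cols, unroll in product((2, 3, 4, 5), (2, 3, 4, 5), (1, 2, 4, 8, 16, 32)):
        cfg = Config(rows, cols, unroll)
        if resource_cost(cfg) > max_pes:
            continue  # violates the high-level resource constraint
        runtime = estimated_runtime(cfg, iterations)
        if best is None or runtime < best[1]:
            best = (cfg, runtime)
    return best


if __name__ == "__main__":
    print(explore(max_pes=16, iterations=1000))
```

A real flow would replace the toy cost functions with estimates derived from scheduling and implementation results, and would likely prune the space rather than enumerate it exhaustively.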


