Early DSE and Automatic Generation of Coarse Grained Merged Accelerators

by Iulian Brumar, et al.

Post-Moore's law area-constrained systems rely on accelerators to deliver performance enhancements. Coarse grained accelerators can offer substantial domain acceleration, but manual, ad-hoc identification of code to accelerate is prohibitively expensive. Because cycle-accurate simulators and high-level synthesis (HLS) flows are so time-consuming, manual creation of high-utilization accelerators that exploit control and data flow patterns at optimal granularities is rarely successful. To address these challenges, we present AccelMerger, the first automated methodology to create coarse grained, control- and data-flow-rich, merged accelerators. AccelMerger uses sequence alignment matching to recognize similar function call-graphs and loops, and neural networks to quickly evaluate their post-HLS characteristics. It accurately identifies which functions to accelerate, and it merges accelerators to respect an area budget and to accommodate system communication characteristics like latency and bandwidth. Merging two accelerators can save as much as 99% of the area of one. The space saved is used by a globally optimal integer linear program to allocate more accelerators for increased performance. We demonstrate AccelMerger's effectiveness using HLS flows without any manual effort to fine-tune the resulting designs. On FPGA-based systems, AccelMerger yields application performance improvements of up to 16.7x over software implementations, and 1.91x on average with respect to state-of-the-art early-stage design space exploration tools.
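To illustrate why area saved by merging can translate into more accelerators under a fixed budget, here is a minimal sketch. It is not AccelMerger's formulation: the abstract describes an integer linear program, while this sketch uses a plain 0/1 knapsack solved by dynamic programming as a simplified stand-in. The function name `select_accelerators`, the candidate names, and all area and cycle numbers are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, int]  # (name, area in integer units, cycles saved)


def select_accelerators(candidates: List[Candidate],
                        area_budget: int) -> Tuple[int, List[str]]:
    """Pick the subset of candidates that maximizes cycles saved
    without exceeding area_budget (exact 0/1 knapsack by DP)."""
    # best[a] holds (cycles saved, chosen names) using at most area a.
    best: List[Tuple[int, List[str]]] = [(0, [])] * (area_budget + 1)
    for name, area, saved in candidates:
        # Iterate areas downward so each candidate is used at most once.
        for a in range(area_budget, area - 1, -1):
            total = best[a - area][0] + saved
            if total > best[a][0]:
                best[a] = (total, best[a - area][1] + [name])
    return best[area_budget]


# Hypothetical numbers: three separate accelerators under a budget of 80.
unmerged = [("fft", 40, 900), ("fir", 30, 500), ("sort", 30, 600)]

# Assumption: "fft" and "fir" merge into one unit of area 45 that keeps
# the combined savings of both; this replaces the two separate entries.
merged = [("fft_fir", 45, 1400), ("sort", 30, 600)]

unmerged_result = select_accelerators(unmerged, 80)
merged_result = select_accelerators(merged, 80)
print(unmerged_result)
print(merged_result)
```

With these made-up numbers, only two of the three separate accelerators fit in the budget, whereas the merged unit leaves room for the third. A real flow would also have to model communication latency and bandwidth, which this sketch ignores.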



PAGURUS: Low-Overhead Dynamic Information Flow Tracking on Loosely Coupled Accelerators

Software-based attacks exploit bugs or vulnerabilities to get unauthoriz...

A Compilation Flow for the Generation of CNN Inference Accelerators on FPGAs

We present a compilation flow for the generation of CNN inference accele...

Address Translation Design Tradeoffs for Heterogeneous Systems

This paper presents a broad, pathfinding design space exploration of mem...

Extending High-Level Synthesis for Task-Parallel Programs

C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popu...

autoXFPGAs: An End-to-End Automated Exploration Framework for Approximate Accelerators in FPGA-Based Systems

Generation and exploration of approximate circuits and accelerators has ...

ApproxFPGAs: Embracing ASIC-Based Approximate Arithmetic Components for FPGA-Based Systems

There has been abundant research on the development of Approximate Circu...

Data-Driven Offline Optimization For Architecting Hardware Accelerators

Industry has gradually moved towards application-specific hardware accel...
