FADO: Floorplan-Aware Directive Optimization for High-Level Synthesis Designs on Multi-Die FPGAs

12/22/2022
by   Linfeng Du, et al.
Multi-die FPGAs are widely adopted to deploy large hardware accelerators. Two factors impede the performance optimization of HLS designs implemented on multi-die FPGAs. On the one hand, nets crossing die boundaries incur long delays, and properly floorplanning and pipelining an application under this constraint is an NP-hard problem. On the other hand, traditional automated search flows for HLS directive optimization target single-die FPGAs, and hence cannot consider the resource constraints on each die or the timing issues incurred by die crossings. Further, because of the large design scale, legalizing the floorplan of the HLS design generated under each group of configurations during directive optimization leads to an excessively long runtime. To co-optimize the directives and floorplan of HLS designs on multi-die FPGAs, we propose the FADO framework, which formulates the directive-floorplan co-search problem based on multi-choice multi-dimensional bin-packing and solves it using an iterative optimization flow. For each step of directive search, a latency-bottleneck-guided greedy algorithm searches for more efficient directive configurations. For floorplanning, instead of repeatedly invoking global floorplanning algorithms, we implement a more efficient incremental floorplan legalization algorithm. It mainly applies the worst-fit online bin-packing algorithm to balance the floorplan, together with an offline best-fit-decreasing re-packing to compact the floorplan, followed by pipelining of long wires crossing die boundaries. In experiments on HLS designs mixing dataflow and non-dataflow kernels, FADO not only automates the co-optimization and finishes with a 693X to 4925X shorter runtime than design space exploration (DSE) assisted by global floorplanning, but also yields an improvement of 1.16X to 8.78X in overall workflow execution time after implementation on the Xilinx Alveo U250 FPGA.
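
To make the two bin-packing heuristics named in the abstract concrete, the following is a minimal Python sketch of textbook worst-fit (online) and best-fit-decreasing (offline) packing. It is not the authors' implementation: it assumes a single scalar resource per kernel and identical die capacities, whereas FADO's formulation is multi-choice and multi-dimensional. The function names `worst_fit_place` and `best_fit_decreasing` and all parameters are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def worst_fit_place(loads: List[int], capacity: int, size: int) -> Optional[int]:
    """Online worst-fit: place one item on the least-loaded die that can hold it.

    `loads` is mutated in place. Returns the chosen die index, or None if
    the item fits on no die. (Hypothetical helper, scalar resource only.)
    """
    best = None
    for die, load in enumerate(loads):
        fits = load + size <= capacity
        if fits and (best is None or load < loads[best]):
            best = die
    if best is not None:
        loads[best] += size
    return best


def best_fit_decreasing(
    sizes: List[int], num_dies: int, capacity: int
) -> Optional[Tuple[Dict[int, int], List[int]]]:
    """Offline best-fit-decreasing: sort items from largest to smallest, then
    place each on the fullest die that still has room.

    Returns (item index -> die index, final loads), or None if some item
    cannot be placed. (Hypothetical helper, scalar resource only.)
    """
    loads = [0] * num_dies
    assignment: Dict[int, int] = {}
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for item in order:
        best = None
        for die, load in enumerate(loads):
            fits = load + sizes[item] <= capacity
            if fits and (best is None or load > loads[best]):
                best = die
        if best is None:
            return None
        loads[best] += sizes[item]
        assignment[item] = best
    return assignment, loads
```

With four hypothetical dies of capacity 100 and kernel sizes 30, 20, 40, and 10, worst-fit places one kernel on each die, while best-fit-decreasing packs all four kernels onto a single die. This illustrates why worst-fit suits balancing and best-fit-decreasing suits compaction.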


