Automatic multi-dimensional pipelining for high-level synthesis of dataflow accelerators

08/04/2023
by Kingshuk Majumder, et al.

In recent years, demand has surged for edge computing of image processing and machine learning workloads, reigniting interest in custom hardware accelerators that deliver higher performance and better energy efficiency. These workloads frequently exhibit affine memory accesses and constant loop bounds. In this paper, we introduce an automatic scheduler for high-level synthesis, based on Integer Linear Programming (ILP), that pipelines aggressively to increase parallelism. We propose a unified ILP formulation that identifies pipelining opportunities along multiple loop and scalar dimensions. Our multi-dimensional pipelining technique encompasses both the inner-loop pipelining and the dataflow optimizations of Vitis HLS, and it handles more general memory access patterns than the Vitis HLS dataflow optimization. Furthermore, our approach generates statically scheduled circuits, which improves resource efficiency. We have integrated our scheduler into HIR, a high-level synthesis compiler framework based on MLIR, and evaluated its performance. Compared to Vitis HLS, our scheduler achieves more aggressive pipelining across multiple producer-consumer loop nests, reducing overall execution latency. Across a set of representative benchmarks, the producer-consumer pipelined execution enabled by our scheduler yields an average performance improvement of 2.42X over loop pipelining alone, and an average improvement of 1.30X over Vitis HLS with dataflow optimizations.
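To make the benefit of producer-consumer pipelining concrete, the following is a minimal Python sketch of a toy latency model. It is not the paper's ILP formulation. All names and numbers (`loop_latency`, `min_consumer_offset`, the trip count, initiation intervals, and pipeline depths) are illustrative assumptions. The model assumes a producer loop that writes `A[i]` in iteration `i` and a consumer loop that reads `A[read_index(j)]` in iteration `j`, and it finds by brute force the earliest static start cycle for the consumer that respects every read-after-write dependence.

```python
def loop_latency(trip_count, ii, depth):
    """Cycles for a pipelined loop: the last iteration starts at
    (trip_count - 1) * ii and needs `depth` more cycles to finish."""
    return (trip_count - 1) * ii + depth


def min_consumer_offset(n_prod, n_cons, ii_prod, ii_cons, depth_prod, read_index):
    """Smallest consumer start cycle such that every read follows its write.

    Toy assumptions (not from the paper):
      - producer iteration i makes A[i] visible at cycle i * ii_prod + depth_prod
      - consumer iteration j reads A[read_index(j)] at cycle offset + j * ii_cons
    """
    offset = 0
    for j in range(n_cons):
        i = read_index(j)
        assert 0 <= i < n_prod, "consumer reads an element the producer never writes"
        ready = i * ii_prod + depth_prod
        offset = max(offset, ready - j * ii_cons)
    return offset


# Hypothetical parameters chosen only for illustration.
N = 1024
II, DEPTH_P, DEPTH_C = 1, 4, 3

producer = loop_latency(N, II, DEPTH_P)
consumer = loop_latency(N, II, DEPTH_C)

# Loop pipelining only: the consumer waits for the whole producer loop.
sequential = producer + consumer

# Producer-consumer pipelining: the consumer starts at a fixed static offset.
offset = min_consumer_offset(N, N, II, II, DEPTH_P, lambda j: j)
overlapped = max(producer, offset + consumer)

# A shifted (stencil-like) affine access: consumer iteration j reads A[j + 2].
stencil_offset = min_consumer_offset(N, N - 2, II, II, DEPTH_P, lambda j: j + 2)

print(sequential, overlapped, stencil_offset)
```

In this toy setting the overlapped schedule finishes in roughly half the cycles of the sequential one, and the shifted access pattern only increases the consumer's start offset. Because the offset is a compile-time constant, no handshaking between the loops is needed in the model, which is the intuition behind statically scheduled circuits.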

research
02/27/2021

HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description

The emergence of machine learning, image and audio processing on edge de...
research
01/15/2018

Improving Communication Patterns in Polyhedral Process Networks

Embedded system performances are bounded by power consumption. The trend...
research
12/24/2016

Application-aware Retiming of Accelerators: A High-level Data-driven Approach

Flexibility at hardware level is the main driving force behind adaptive ...
research
03/29/2020

Analytical Model of Memory-Bound Applications Compiled with High Level Synthesis

The increasing demand of dedicated accelerators to improve energy effici...
research
12/22/2022

FADO: Floorplan-Aware Directive Optimization for High-Level Synthesis Designs on Multi-Die FPGAs

Multi-die FPGAs are widely adopted to deploy large hardware accelerators...
research
12/20/2021

Dijkstra-Through-Time: Ahead of time hardware scheduling method for deterministic workloads

Most of the previous works on data flow optimizations for Machine Learni...
research
03/28/2018

An Approach for Finding Permutations Quickly: Fusion and Dimension matching

Polyhedral compilers can perform complex loop optimizations that improve...
