Improving Communication Patterns in Polyhedral Process Networks

01/15/2018
by Christophe Alias, et al.

The performance of embedded systems is bounded by power consumption. The trend is to offload compute-intensive kernels to hardware accelerators such as GPUs, Xeon Phi, or FPGAs. FPGA chips combine the flexibility of programmable chips with the energy efficiency of specialized hardware, and appear as a natural solution. Hardware compilers from high-level languages (high-level synthesis, HLS) are required to exploit the full capabilities of FPGAs while satisfying tight time-to-market constraints. Compiler optimizations for parallelism and data locality deeply restructure the execution order of the processes, and hence the read/write patterns in the communication channels. This breaks most FIFO channels, which then have to be implemented with addressable buffers. Expensive hardware is required to enforce synchronizations, which often results in a dramatic performance loss. In this paper, we present an algorithm to partition the communications so that most FIFO channels can be recovered after loop tiling, a key optimization for parallelism and data locality. Experimental results show a drastic improvement in FIFO detection for regular kernels, at the cost of a small amount of additional storage. As a bonus, the storage can even be reduced in some cases.
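To make the effect of tiling on a channel concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm of the paper. The domain size `N`, the tile size `T`, the dependence "the consumer at (i, j) reads the value produced at (i, j-1)", and the helper names are all hypothetical. A channel is treated as a FIFO when the values are read in the same order as they were written. The split into intra-tile and inter-tile communications is one simple way to partition a channel, chosen here only to show that a partition can recover the FIFO property.

```python
from itertools import product

N, T = 4, 2  # hypothetical domain size and tile size


def lexicographic(n):
    """Original execution order: row by row."""
    return list(product(range(n), range(n)))


def tiled(n, t):
    """Tiled execution order: tile by tile, then point by point inside a tile."""
    return sorted(product(range(n), range(n)),
                  key=lambda p: (p[0] // t, p[1] // t, p[0], p[1]))


def source(point):
    """Hypothetical dependence: (i, j) reads the value produced at (i, j-1)."""
    i, j = point
    return (i, j - 1) if j >= 1 else None


def is_fifo(schedule, keep=lambda src, dst: True):
    """True if the kept values are read in the order they were written.

    Producer and consumer are assumed to follow the same schedule.
    """
    deps = [(source(dst), dst) for dst in schedule if source(dst) is not None]
    deps = [(s, d) for s, d in deps if keep(s, d)]
    read_order = [s for s, _ in deps]
    sent = set(read_order)
    write_order = [p for p in schedule if p in sent]
    return read_order == write_order


def same_tile(src, dst, t=T):
    """True if producer and consumer iterations fall in the same tile."""
    return (src[0] // t, src[1] // t) == (dst[0] // t, dst[1] // t)


original_ok = is_fifo(lexicographic(N))
tiled_ok = is_fifo(tiled(N, T))
intra_ok = is_fifo(tiled(N, T), keep=same_tile)
inter_ok = is_fifo(tiled(N, T), keep=lambda s, d: not same_tile(s, d))

print(original_ok, tiled_ok, intra_ok, inter_ok)
```

In this toy setting the single channel is a FIFO under the original order and stops being one after tiling, while each of the two sub-channels obtained by the split is a FIFO again.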
