Increasing FPGA Accelerators Memory Bandwidth with a Burst-Friendly Memory Layout

02/11/2022
by Corentin Ferry, et al.

Offloading compute-intensive kernels to hardware accelerators relies on the large degree of parallelism offered by these platforms. However, the effective bandwidth of the memory interface is often a bottleneck that limits the accelerator's performance. Techniques enabling data reuse, such as tiling, reduce memory traffic but still often leave accelerators I/O-bound. A further increase in effective bandwidth is possible by using burst rather than element-wise accesses, provided the data is contiguous in memory. In this paper, we propose a memory allocation technique, and provide a proof-of-concept source-to-source compiler pass, that enables such burst transfers by modifying the data layout in external memory. We assess how this technique increases memory throughput, leaving room to exploit additional parallelism, at a minimal logic overhead.
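To make the contiguity requirement concrete, the following is a minimal host-side sketch in C. It is not the layout or compiler pass proposed in the paper, whose details the abstract does not give; it only illustrates the general idea under stated assumptions. The sizes `N` and `T` and the function names `load_tile_row_major`, `load_tile_contiguous`, and `to_tiled` are hypothetical, and each `memcpy` stands in for one memory transfer. With a row-major array, a T x T tile is scattered over T rows and needs T short transfers; once the data is rearranged so that each tile is stored contiguously, the same tile can be fetched as one long run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: an N x N array cut into T x T tiles (T divides N). */
enum { N = 8, T = 4, TILES = N / T };

/* Row-major layout: tile (ti, tj) spans T different rows, so it is fetched
   as T separate runs of T elements. Returns the number of transfers. */
size_t load_tile_row_major(const float *mem, size_t ti, size_t tj, float *buf)
{
    assert(ti < (size_t)TILES && tj < (size_t)TILES);
    size_t transfers = 0;
    for (size_t r = 0; r < T; ++r) {
        const float *src = mem + (ti * T + r) * N + tj * T;
        memcpy(buf + r * T, src, T * sizeof(float)); /* one short transfer */
        ++transfers;
    }
    return transfers;
}

/* Tile-contiguous layout: tile (ti, tj) is one run of T*T elements, so a
   single long transfer is enough. Returns the number of transfers. */
size_t load_tile_contiguous(const float *mem, size_t ti, size_t tj, float *buf)
{
    assert(ti < (size_t)TILES && tj < (size_t)TILES);
    const float *src = mem + (ti * TILES + tj) * T * T;
    memcpy(buf, src, (size_t)T * T * sizeof(float)); /* one long transfer */
    return 1;
}

/* Rearranges a row-major array so that every tile is stored contiguously. */
void to_tiled(const float *row_major, float *tiled)
{
    for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
            size_t ti = i / T, tj = j / T, r = i % T, c = j % T;
            tiled[((ti * TILES + tj) * T + r) * T + c] = row_major[i * N + j];
        }
    }
}
```

Both loaders fill the buffer with the same tile contents; only the number and length of the transfers differ, which is the property a burst-capable memory interface can exploit.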

research
06/03/2016

GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator

GRVI is an FPGA-efficient RISC-V RV32I soft processor. Phalanx is a para...
research
08/29/2022

Improving the Efficiency of OpenCL Kernels through Pipes

In an effort to lower the barrier to the adoption of FPGAs by a broader ...
research
05/02/2022

Zebra: Memory Bandwidth Reduction for CNN Accelerators With Zero Block Regularization of Activation Maps

The large amount of memory bandwidth between local buffer and external D...
research
12/30/2018

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

Memory bandwidth bottleneck is a major challenge in processing machine ...
research
06/20/2023

An Introduction to the Compute Express Link (CXL) Interconnect

The Compute Express Link (CXL) is an open industry-standard interconnect...
research
09/20/2018

SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs

We propose SoaAlloc, a dynamic object allocator for Single-Method Multip...
research
11/02/2020

On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator

Dedicated accelerators are being designed to address the huge resource r...
