Benchmarking High Bandwidth Memory on FPGAs

05/09/2020
by Zeke Wang, et al.

FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state. However, the performance characteristics of HBM are still not well specified, especially in the context of FPGAs. In this paper, we bridge the gap between nominal specifications and actual performance by benchmarking HBM on a state-of-the-art FPGA, i.e., a Xilinx Alveo U280 featuring a two-stack HBM subsystem. To this end, we propose Shuhai, a benchmarking tool that allows us to demystify all the underlying details of HBM on an FPGA. FPGA-based benchmarking should also provide a more accurate picture of HBM than doing so on CPUs/GPUs, since CPUs/GPUs are noisier systems due to their complex control logic and cache hierarchy. Since the memory itself is complex, leveraging custom hardware logic to benchmark inside an FPGA provides more details as well as accurate and deterministic measurements. We observe that 1) HBM is able to provide up to 425 GB/s memory bandwidth, and 2) how HBM is used has a significant impact on performance, which in turn demonstrates the importance of unveiling the performance characteristics of HBM so as to select the best approach. As a yardstick, we also apply Shuhai to DDR4 to show the differences between HBM and DDR4. Shuhai can be easily generalized to other FPGA boards or other generations of memory, e.g., HBM3 and DDR3. We will make Shuhai open-source, benefiting the community.

research
10/12/2020

When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization

With the recent release of High Bandwidth Memory (HBM) based FPGA boards...
research
10/18/2020

Optimizing Memory Performance of Xilinx FPGAs under Vitis

Plenty of research efforts have been devoted to FPGA-based acceleration,...
research
06/10/2020

Unified Characterization Platform for Emerging NVM Technology: Neural Network Application Benchmarking Using off-the-shelf NVM Chips

In this paper, we present a unified FPGA based electrical test-bench for...
research
08/29/2022

Improving the Efficiency of OpenCL Kernels through Pipes

In an effort to lower the barrier to the adoption of FPGAs by a broader ...
research
05/09/2018

Parallel Programming for FPGAs

This book focuses on the use of algorithmic high-level synthesis (HLS) t...
research
02/08/2023

Feature-based SpMV Performance Analysis on Contemporary Devices

The SpMV kernel is characterized by high performance variation per input...
research
12/03/2018

Programming Strategies for Irregular Algorithms on the Emu Chick

The Emu Chick prototype implements migratory memory-side processing in a...
