ScalaBFS: A Scalable BFS Accelerator on HBM-Enhanced FPGAs

05/25/2021
by   Kexin Li, et al.
0

High Bandwidth Memory (HBM) provides massive aggregated memory bandwidth by exposing multiple memory channels to the processing units. To achieve high performance, an accelerator built on top of an FPGA configured with HBM (i.e., FPGA-HBM platform) needs to scale its performance according to the available memory channels. In this paper, we propose an accelerator for BFS (Breadth-First Search) algorithm, named as ScalaBFS, that builds multiple processing elements to sufficiently exploit the high bandwidth of HBM to improve efficiency. We implement the prototype system of ScalaBFS and conduct BFS in both real-world and synthetic scale-free graphs on Xilinx Alveo U280 FPGA card real hardware. The experimental results show that ScalaBFS scales its performance almost linearly according to the available memory pseudo channels (PCs) from the HBM2 subsystem of U280. By fully using the 32 PCs and building 64 processing elements (PEs) on U280, ScalaBFS achieves a performance up to 19.7 GTEPS (Giga Traversed Edges Per Second). When conducting BFS in sparse real-world graphs, ScalaBFS achieves equivalent GTEPS to Gunrock running on the state-of-art Nvidia V100 GPU that features 64-PC HBM2 (twice memory bandwidth than U280).

READ FULL TEXT
research
10/12/2020

When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization

With the recent release of High Bandwidth Memory (HBM) based FPGA boards...
research
06/03/2016

GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator

GRVI is an FPGA-efficient RISC-V RV32I soft processor. Phalanx is a para...
research
05/16/2022

TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs

The emergence of high-bandwidth memory (HBM) brings new opportunities to...
research
11/08/2022

Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization

Optimizing data movements is becoming one of the biggest challenges in h...
research
12/03/2018

Programming Strategies for Irregular Algorithms on the Emu Chick

The Emu Chick prototype implements migratory memory-side processing in a...
research
01/13/2017

An OpenCL(TM) Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means to perfor...
research
09/17/2020

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling

Ongoing climate change calls for fast and accurate weather and climate m...

Please sign up or login with your details

Forgot password? Click here to reset