TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs

05/16/2022
by Weikang Qiao, et al.

The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which has conventionally been bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, existing sorter designs cannot simply be scaled up at the same rate as the available off-chip bandwidth, because the on-chip resources they require grow much faster and would in turn bound sorting performance. Second, designers need an in-depth understanding of HBM characteristics to use the HBM bandwidth effectively. To tackle these challenges, we present TopSort, a novel two-phase sorting solution optimized for HBM-based FPGAs. In the first phase, 16 merge trees work in parallel to fully utilize 32 HBM channels. In the second phase, TopSort reuses the logic from phase one to form a wider merge tree that merges the partially sorted results from phase one. TopSort also adopts HBM-specific optimizations to reduce resource overhead and improve bandwidth utilization. TopSort can sort up to 4 GB of data using all 32 HBM channels, with an overall sorting throughput of 15.6 GB/s. TopSort is 6.7x and 2.2x faster than state-of-the-art CPU and FPGA sorters, respectively.
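
The two-phase flow described in the abstract can be illustrated with a small software analogy. This is a minimal sketch assuming a plain in-memory list of integers; it is not TopSort's hardware design. Python's `heapq.merge` stands in for a hardware merge tree, and the names `merge_tree_sort`, `two_phase_sort`, and the `fan_in` parameter are hypothetical. The default `num_trees=16` mirrors the 16 phase-one merge trees mentioned above; HBM channels, partition sizing, and the real merge-tree fan-in are not modeled.

```python
import heapq
import random
from typing import List


def merge_tree_sort(run: List[int], fan_in: int = 4) -> List[int]:
    """Sort one partition by repeated fan_in-way merges (a software stand-in for a merge tree).

    fan_in is a hypothetical parameter, not a value taken from TopSort.
    """
    runs = [[x] for x in run]  # every element starts as a sorted run of length 1
    while len(runs) > 1:
        # One pass: each group of up to fan_in sorted runs is merged into one longer run.
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []


def two_phase_sort(data: List[int], num_trees: int = 16, fan_in: int = 4) -> List[int]:
    """Hypothetical two-phase sort: independent partition sorts, then one wide merge."""
    if not data:
        return []
    # Phase 1: split the input into up to num_trees partitions and sort each one.
    # In hardware these trees would run in parallel; here they run one after another.
    size = -(-len(data) // num_trees)  # ceiling division
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    partial = [merge_tree_sort(p, fan_in) for p in partitions]
    # Phase 2: a single wider merge combines all partially sorted results.
    return list(heapq.merge(*partial))


if __name__ == "__main__":
    values = [random.randrange(1000) for _ in range(10_000)]
    assert two_phase_sort(values) == sorted(values)
    print("sorted", len(values), "values")
```

In this sketch the phase-two merge width equals the number of phase-one partitions, so the final step only has to merge `num_trees` already-sorted streams rather than re-sort the whole input.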
