Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication

11/24/2021
by Linghao Song, et al.

Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses into the sparse matrix make accelerator design challenging, but high-bandwidth-memory (HBM) based FPGAs are a good fit for SpMV acceleration. In this paper, we present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing, which together support the efficient processing of arbitrary SpMV workloads. In an evaluation on twelve large matrices, Serpens achieves 1.91x and 1.76x higher geomean throughput than the state-of-the-art accelerators GraphLily and Sextans, respectively. On 2,519 SuiteSparse matrices, Serpens achieves 2.10x higher throughput than a K80 GPU. In energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better than GraphLily, Sextans, and the K80, respectively. Scaled up to 24 HBM channels, Serpens achieves up to 30,204 MTEPS, up to 3.79x over GraphLily.
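For readers new to the kernel itself, the sketch below shows a plain CSR (compressed sparse row) SpMV in C. It only illustrates the y = Ax operation and the irregular accesses into the dense vector x that the abstract refers to; CSR and the function name spmv_csr are assumptions for illustration, not necessarily the storage format or interface Serpens uses internally.

```c
/* Minimal CSR SpMV reference: y = A * x.
 * CSR stores, per row, a contiguous run of (column, value) pairs;
 * row_ptr[i]..row_ptr[i+1] delimits row i's entries.
 * Note the indirect read x[col_idx[k]]: this is the random-access
 * pattern that makes SpMV hard to accelerate. */
#include <stdio.h>

static void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
                     const float *vals, const float *x, float *y) {
    for (int i = 0; i < n_rows; ++i) {
        float acc = 0.0f;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            acc += vals[k] * x[col_idx[k]];  /* gather from x */
        y[i] = acc;
    }
}

int main(void) {
    /* 3x3 example matrix:
     *   [1 0 2]
     *   [0 3 0]
     *   [4 0 5]  */
    int   row_ptr[] = {0, 2, 3, 5};
    int   col_idx[] = {0, 2, 1, 0, 2};
    float vals[]    = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    float x[]       = {1.0f, 2.0f, 3.0f};
    float y[3];

    spmv_csr(3, row_ptr, col_idx, vals, x, y);
    for (int i = 0; i < 3; ++i)
        printf("y[%d] = %f\n", i, y[i]);  /* expect 7, 6, 19 */
    return 0;
}
```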

Related research

- 09/22/2021: Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
  Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for...
- 03/08/2021: Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs
  Top-K SpMV is a key component of similarity-search on sparse embeddings....
- 09/28/2022: Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver
  The continued growth in the processing power of FPGAs coupled with high ...
- 07/24/2023: Entropy Maximization in Sparse Matrix by Vector Multiplication (max_E SpMV)
  The peak performance of any SpMV depends primarily on the available memo...
- 07/23/2022: Bandwidth-Hard Functions from Random Permutations
  ASIC hash engines are specifically optimized for parallel computations o...
- 03/28/2013: A Massively Parallel Associative Memory Based on Sparse Neural Networks
  Associative memories store content in such a way that the content can be...
- 07/19/2019: PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations
  Processing in memory (PIM) moves computation into memories with the goal...
