Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs

by Alberto Parravicini, et al.

Top-K SpMV is a key component of similarity search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. Modern FPGA accelerator cards, however, have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU, with 20x higher power-efficiency.
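The packet-wise compression and the FPGA pipeline are beyond a short snippet, but the computation being accelerated is easy to state: multiply a CSR-encoded sparse matrix by a dense query vector, then keep only the K largest entries of the result. The sketch below is an illustrative NumPy reference (the function name, toy matrix, and K-selection via partial sort are our own choices, not the authors' code):

```python
import numpy as np

def topk_spmv(indptr, indices, data, x, k):
    """Reference Top-K SpMV: y = A @ x for a CSR matrix A, then return
    the indices and values of the K largest entries of y (descending)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for row in range(n_rows):
        # Each CSR row stores its nonzeros in data[start:end],
        # with their column positions in indices[start:end].
        start, end = indptr[row], indptr[row + 1]
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    # Partial sort: only the K largest scores need full ordering.
    top = np.argpartition(y, -k)[-k:]
    top = top[np.argsort(-y[top])]
    return top, y[top]

# Toy 3x3 CSR matrix: rows [1 0 2], [0 3 0], [4 0 5]
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x       = np.array([1.0, 2.0, 1.0])

rows, scores = topk_spmv(indptr, indices, data, x, k=2)
# rows -> [2, 1], scores -> [9.0, 6.0]
```

For embedding similarity, the matrix rows are sparse item embeddings, `x` is the query, and the Top-K selection returns the K most similar items without materializing a full sort of all scores.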




