A reduced-precision streaming SpMV architecture for Personalized PageRank on FPGA

09/22/2020
by   Alberto Parravicini, et al.
0

Sparse matrix-vector multiplication is often employed in many data-analytic workloads in which low latency and high throughput are more valuable than exact numerical convergence. FPGAs provide quick execution times while offering precise control over the accuracy of the results thanks to reduced-precision fixed-point arithmetic. In this work, we propose a novel streaming implementation of Coordinate Format (COO) sparse matrix-vector multiplication, and study its effectiveness when applied to the Personalized PageRank algorithm, a common building block of recommender systems in e-commerce websites and social networks. Our implementation achieves speedups up to 6x over a reference floating-point FPGA architecture and a state-of-the-art multi-threaded CPU implementation on 8 different data-sets, while preserving the numerical fidelity of the results and reaching up to 42x higher energy efficiency compared to the CPU implementation.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

10/29/2021

Design and implementation of an out-of-order execution engine of floating-point arithmetic operations

In this thesis, work is undertaken towards the design in hardware descri...
04/13/2021

MELOPPR: Software/Hardware Co-design for Memory-efficient Low-latency Personalized PageRank

Personalized PageRank (PPR) is a graph algorithm that evaluates the impo...
10/22/2021

Experience with PCIe streaming on FPGA for high throughput ML inferencing

Achieving maximum possible rate of inferencing with minimum hardware res...
10/23/2020

Efficient Floating-Point Givens Rotation Unit

High-throughput QR decomposition is a key operation in many advanced sig...
10/20/2020

Sparse Tucker Tensor Decomposition on a Hybrid FPGA-CPU Platform

Recommendation systems, social network analysis, medical imaging, and da...
04/05/2021

Near-Precise Parameter Approximation for Multiple Multiplications on A Single DSP Block

A multiply-accumulate (MAC) operation is the main computation unit for D...
05/12/2020

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

Personalized recommendations are the backbone machine learning (ML) algo...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.