An FPGA cached sparse matrix vector product (SpMV) for unstructured computational fluid dynamics simulations

07/24/2021
by Guillermo Oyarzun, et al.

Field Programmable Gate Arrays (FPGAs) implement algorithm-specific architectures that improve a code's FLOP-per-watt ratio. Such devices are regaining interest due to the rise of new tools that facilitate their programming, such as OmpSs. The computational fluid dynamics community is always investigating new architectures that can improve the performance of its algorithms. Commonly, those algorithms have a low arithmetic intensity and reach only a small percentage of peak performance. The sparse matrix-vector multiplication (SpMV) is one of the most time-consuming operations in unstructured simulations. The matrix's sparsity pattern determines the indirect memory accesses to the multiplying vector. This access pattern is hard to predict, which causes traditional implementations to perform poorly. In this work, we present an FPGA architecture that maximizes the reuse of the vector by introducing a cache-like structure. The cache is implemented as a circular list that keeps the vector components in BRAM for as long as they are needed. Following this strategy, a speedup of up to 16 times is obtained compared to a naive implementation of the algorithm.
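To make the access pattern described above concrete, the following is a minimal software sketch and not the authors' FPGA design. It assumes a CSR matrix layout and a first-in-first-out replacement policy for the circular list. Neither detail is stated in the abstract, and all names (`spmv_csr_cached`, `cache_size`, and so on) are hypothetical. It computes y = A x while routing every indirect read of x through a small circular buffer and counting hits and misses.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Plain CSR sparse matrix-vector product, used as a reference."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: the column index selects the entry of x.
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_csr_cached(row_ptr, col_idx, vals, x, cache_size):
    """Same product, but reads of x go through a circular-buffer cache.

    Assumption: FIFO replacement. The slot at `head` is overwritten on
    every miss and `head` then advances, wrapping around the buffer.
    """
    slot_col = [None] * cache_size  # column index held in each slot
    slot_val = [0.0] * cache_size   # cached component of x
    where = {}                      # column index -> slot number
    head = 0
    hits = misses = 0

    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j in where:
                hits += 1
            else:
                misses += 1
                evicted = slot_col[head]
                if evicted is not None:
                    del where[evicted]
                slot_col[head] = j
                slot_val[head] = x[j]  # models the slow external read
                where[j] = head
                head = (head + 1) % cache_size
            y[i] += vals[k] * slot_val[where[j]]
    return y, hits, misses


# Illustrative 4x4 tridiagonal matrix (made-up data, not from the paper).
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0, 4.0]

y_ref = spmv_csr(row_ptr, col_idx, vals, x)
y_cached, hits, misses = spmv_csr_cached(row_ptr, col_idx, vals, x, cache_size=2)
```

For this banded example, a two-entry buffer already serves 6 of the 10 reads of x from the cache, because neighbouring rows touch overlapping columns. Unstructured meshes give less regular patterns, so the achievable reuse depends on the matrix ordering and the buffer size.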

research
09/29/2020

Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX

The A64FX CPU powers the current number one supercomputer on the Top500 ...
research
06/15/2018

FPGA acceleration of Model Predictive Control for Iter Plasma current and shape control

A faster implementation of the Quadratic Programming (QP) solver used in...
research
01/12/2023

RAD-Sim: Rapid Architecture Exploration for Novel Reconfigurable Acceleration Devices

With the continued growth in field-programmable gate array (FPGA) capaci...
research
12/14/2018

Impact of Traditional Sparse Optimizations on a Migratory Thread Architecture

Achieving high performance for sparse applications is challenging due to...
research
01/17/2022

Low hardware consumption, resolution-configurable Gray code oscillator time-to-digital converters implemented in 16nm, 20nm and 28nm FPGAs

This paper presents a low hardware consumption, resolution-configurable,...
research
09/22/2020

A reduced-precision streaming SpMV architecture for Personalized PageRank on FPGA

Sparse matrix-vector multiplication is often employed in many data-analy...
research
10/28/2016

Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

Future architectures designed to deliver exascale performance motivate t...
