Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir Computing

01/21/2021
by Matthew Denton, et al.

Reservoir computing systems rely on the recurrent multiplication of a very large, sparse, fixed matrix. We argue that direct spatial implementation of these fixed matrices minimizes the work performed in the computation and allows significant reductions in latency and power through constant propagation and logic minimization. Bit-serial arithmetic makes it feasible to implement massive static matrices. We present the structure of our bit-serial matrix multiplier and evaluate canonical signed digit representation as a way to further reduce logic utilization. We have implemented these matrices on a large FPGA and provide a simple, extensible cost model. These FPGA implementations reduce latency by 50x on average, and by up to 86x, versus GPU libraries. Compared with a recent sparse DNN accelerator, we measure a 4.1x to 47x reduction in latency, depending on matrix dimension and sparsity. Throughput of the FPGA solution is also competitive across a wide range of matrix dimensions and batch sizes. Finally, we discuss how these techniques could be deployed in ASICs, making them applicable to dynamic sparse matrix computations.
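To illustrate the canonical signed digit (CSD) idea the abstract mentions, here is a minimal sketch in Python (the function names are ours; the paper's actual implementation is FPGA logic, not software). CSD recodes a constant into digits from {-1, 0, +1} with no two adjacent nonzero digits, so a spatial constant multiplier needs one adder or subtractor per nonzero digit rather than one per set bit of the binary encoding.

```python
def csd_digits(n):
    """Canonical signed digit recoding of a nonnegative integer.

    Returns digits in {-1, 0, +1}, least significant first, such that
    sum(d * 2**i) == n and no two adjacent digits are nonzero.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d           # clears the low bits so runs of 1s collapse
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def constant_multiply(x, digits):
    """Multiply x by the constant encoded in CSD digits using only
    shifts and adds/subtracts -- the operations a spatial constant
    multiplier would instantiate, one per nonzero digit."""
    return sum(d * (x << i) for i, d in enumerate(digits) if d)


# Example: the constant 7 is 111 in binary (three set bits) but
# 100-1 in CSD (one add, one subtract), since 7 = 8 - 1.
digits = csd_digits(7)
assert digits == [-1, 0, 0, 1]
assert constant_multiply(5, digits) == 35
# CSD guarantees no two adjacent nonzero digits:
assert all(not (a and b) for a, b in zip(digits, digits[1:]))
```

For a dense run of ones such as 7 (binary 111), CSD cuts the nonzero digit count from three to two; over a large fixed matrix these savings compound, which is the logic-utilization reduction the abstract evaluates.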


Related research

09/12/2018  An FPGA Implementation of a Time Delay Reservoir Using Stochastic Logic
This paper presents and demonstrates a stochastic logic time delay reser...

04/29/2020  Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
This paper describes REAP, a software-hardware approach that enables hig...

06/10/2023  RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge
Deep Neural Network (DNN) based inference at the edge is challenging as ...

09/28/2022  LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors
This work proposes a novel reconfigurable architecture for low latency G...

02/10/2020  Efficient Matrix Multiplication: The Sparse Power-of-2 Factorization
We present an algorithm to reduce the computational effort for the multi...

11/22/2020  Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads
Sparse matrices are the key ingredients of several application domains, ...

03/10/2018  Efficient FPGA Implementation of Conjugate Gradient Methods for Laplacian System using HLS
In this paper, we study FPGA based pipelined and superscalar design of t...
