Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite

04/17/2023
by Jose Nunez-Yanez, et al.

In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity, using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator computes the tensor operations, while dynamic reconfiguration is used to switch precision between int8 and float modes. Compared with supporting both arithmetic precisions simultaneously, this dynamic reconfiguration improves performance, because more cores can be mapped to the resource-constrained device, and lowers power consumption. We compare the proposed hardware with a high-performance systolic architecture for dense matrices, obtaining 25 […] same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and a single core achieves up to 20x acceleration over the software-optimized NEON RUY library.
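The abstract describes a four-stage dataflow (read, compute, scale, write) that serves both sparse and dense operands in int8 or float mode. The Python sketch below models that flow in software only, to make the stage boundaries concrete. It assumes a CSR encoding for the sparse operand and a single per-tensor requantization multiplier with saturation to int8 in the scale stage; all function names (`dense_to_csr`, `read_stage`, `compute_stage`, `scale_stage`, `write_stage`, `fades_like_spmm`) are hypothetical and do not describe the actual FADES hardware or its interface.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical CSR container for the sparse left-hand operand:
# values[k] sits at column col_idx[k]; row r owns values[row_ptr[r]:row_ptr[r+1]].
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(a: Sequence[Sequence[float]]) -> CSR:
    """Encode a dense matrix as CSR, keeping only the nonzero entries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def read_stage(csr: CSR) -> Iterator[Tuple[int, List[Tuple[int, float]]]]:
    """Stage 1 (read): stream one sparse row at a time as (col, value) pairs."""
    values, col_idx, row_ptr = csr
    for r in range(len(row_ptr) - 1):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        yield r, list(zip(col_idx[lo:hi], values[lo:hi]))


def compute_stage(rows, b):
    """Stage 2 (compute): multiply each sparse row by dense B, visiting only nonzeros.

    In int8 mode the inputs are integers and the sums play the role of wide
    (int32-like) accumulators; Python ints never overflow, so no wrap-around
    is modelled here.
    """
    n_cols = len(b[0])
    for r, nz in rows:
        acc = [0] * n_cols
        for k, a_val in nz:
            for j in range(n_cols):
                acc[j] += a_val * b[k][j]
        yield r, acc


def scale_stage(rows, mode: str, multiplier: float = 1.0):
    """Stage 3 (scale): requantize to int8 in int8 mode; pass through in float mode."""
    for r, acc in rows:
        if mode == "int8":
            # Assumed requantization: scale, round, then saturate to [-128, 127].
            yield r, [max(-128, min(127, round(x * multiplier))) for x in acc]
        else:
            yield r, acc


def write_stage(rows, n_rows: int) -> List[List]:
    """Stage 4 (write): place each finished row into the output matrix."""
    out: List[List] = [[] for _ in range(n_rows)]
    for r, vals in rows:
        out[r] = vals
    return out


def fades_like_spmm(a, b, mode="float", multiplier=1.0):
    """Chain the four stages; generators let each row flow from stage to stage."""
    rows = read_stage(dense_to_csr(a))
    rows = compute_stage(rows, b)
    rows = scale_stage(rows, mode, multiplier)
    return write_stage(rows, len(a))


if __name__ == "__main__":
    a = [[0, 2], [3, 0]]
    b = [[1, 1], [1, 1]]
    print(fades_like_spmm(a, b))  # [[2, 2], [3, 3]]
    print(fades_like_spmm(a, b, mode="int8", multiplier=50.0))  # [[100, 100], [127, 127]]
```

Because the compute stage iterates only over stored nonzeros, its work grows with the number of nonzeros rather than the full row length, which is the intuition behind a sparse mode that can beat dense mode; the sketch says nothing about the hardware speedups reported above.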

research
04/26/2021

Capstan: A Vector RDA for Sparsity

This paper proposes Capstan: a scalable, parallel-patterns-based, reconf...
research
09/14/2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores

The exponentially growing model size drives the continued success of dee...
research
08/01/2020

CuttleSys: Data-Driven Resource Management for Interactive Applications on Reconfigurable Multicores

Multi-tenancy for latency-critical applications leads to resource inter...
research
11/23/2022

Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

While coarse-grained reconfigurable arrays (CGRAs) have emerged as promi...
research
07/31/2016

Data-Driven Background Subtraction Algorithm for in-Camera Acceleration in Thermal Imagery

Detection of moving objects in videos is a crucial step towards successf...
research
02/07/2021

CrossStack: A 3-D Reconfigurable RRAM Crossbar Inference Engine

Deep neural network inference accelerators are rapidly growing in import...
research
12/14/2016

Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization

We present efficient realization of Householder Transform (HT) based QR ...
