RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge

06/10/2023
by Adithya Krishna, et al.

Deep Neural Network (DNN) based inference at the edge is challenging as these compute- and data-intensive algorithms need to be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity in both activations and weights, inherent to DNNs, is a key knob to leverage. In this paper, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and the number of channels). RAMAN can also be configured to trade off accuracy against power/latency using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow inspired by Gustavson's algorithm that has optimal input activation (IA) and output activation (OA) reuse to minimize memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum writeback traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can reduce storage requirements by up to 50%. RAMAN is implemented on a resource-constrained Efinix Ti60 FPGA with 37.2K LUTs and 8.6K register utilization. RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W by leveraging both weight and activation sparsity.
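To make the Gustavson-style dataflow concrete, the sketch below shows a row-wise sparse product in Python: zero input activations and zero weights are skipped, and each output row's partial sums are accumulated in a local buffer before a single writeback, which is the idea behind eliminating Psum writeback traffic. This is a minimal software illustration, not RAMAN's hardware dataflow; the function names (`csr_from_dense`, `gustavson_spgemm`), the CSR-like layout, and the toy matrices are assumptions made for the example.

```python
from collections import defaultdict

def csr_from_dense(mat):
    """Convert a dense 2-D list into {row: [(col, val), ...]}, keeping non-zeros only."""
    csr = {}
    for i, row in enumerate(mat):
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        if nz:
            csr[i] = nz
    return csr

def gustavson_spgemm(A_csr, B_csr, n_rows):
    """Row-wise (Gustavson) product: for each non-zero A[i][k], scale row k of B
    and accumulate into a local Psum buffer for output row i."""
    OA = {}
    for i in range(n_rows):
        acc = defaultdict(float)               # local Psum accumulation for row i
        for k, a_val in A_csr.get(i, []):      # skip zero input activations
            for j, b_val in B_csr.get(k, []):  # skip zero weights
                acc[j] += a_val * b_val
        if acc:
            OA[i] = sorted(acc.items())        # single writeback per output row
    return OA

# Toy usage: sparse IA (2x3) times sparse W (3x2).
IA = csr_from_dense([[1, 0, 2],
                     [0, 0, 3]])
W  = csr_from_dense([[0, 4],
                     [5, 0],
                     [6, 0]])
print(gustavson_spgemm(IA, W, n_rows=2))  # {0: [(0, 12.0), (1, 4.0)], 1: [(0, 18.0)]}
```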


