Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

06/30/2022
by Jung Hwan Heo et al.

This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state of the art in hardware acceleration of lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach unlocked by an emerging pruning scheme, periodic pattern-based sparsity (PPS). By exploiting the regularity of PPS, our sparsity-aware compiler optimally reorders the weights, and a simple indexing unit in hardware matches each weight with its corresponding activation. Through this compiler-hardware co-design, the SPS dataflow achieves a higher degree of parallelism while avoiding the high indexing overhead of conventional sparse formats and incurring no loss in model accuracy. Evaluated on popular benchmarks such as VGG and ResNet, the SPS dataflow and its accompanying neural network compiler outperform prior convolutional neural network (CNN) accelerator designs targeting FPGA devices. Compared with other sparsity-supporting weight storage formats, SPS achieves a 4.49x gain in energy efficiency while reducing total weight storage (non-pruned weights plus indexing) by 3.67x and indexing memory alone by 22,044x.
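The core mechanism is easy to see in code. Below is a minimal NumPy sketch, not the authors' implementation: the helper names (pps_prune, pps_matvec) and the magnitude-based pattern selection are assumptions for illustration only. It shows why periodic sparsity makes indexing cheap: one small offset table is shared across every period of every row, instead of storing one index per surviving weight as formats like CSR do.

```python
import numpy as np

def pps_prune(weights: np.ndarray, period: int, keep: int):
    """Hypothetical sketch of periodic pattern-based sparsity (PPS):
    within every `period`-wide group of columns, the same `keep` offsets
    survive in every group and every row, so one tiny table indexes all
    nonzero weights."""
    rows, cols = weights.shape
    assert cols % period == 0, "columns must be a multiple of the period"
    groups = weights.reshape(rows, cols // period, period)

    # Pick one shared pattern: the offsets with the largest mean magnitude
    # over all rows and groups (an assumed saliency heuristic; the paper's
    # compiler may select patterns differently).
    saliency = np.abs(groups).mean(axis=(0, 1))        # shape: (period,)
    pattern = np.sort(np.argsort(saliency)[-keep:])    # shared offsets

    packed = groups[:, :, pattern].reshape(rows, -1)   # dense nonzeros only
    return packed, pattern                             # weights + tiny index

def pps_matvec(packed: np.ndarray, pattern: np.ndarray,
               period: int, activations: np.ndarray) -> np.ndarray:
    """Multiply packed weights by an activation vector, replaying the
    periodic pattern the way a simple hardware indexing unit would."""
    rows, n_packed = packed.shape
    keep = len(pattern)
    out = np.zeros(rows)
    for g in range(n_packed // keep):                  # each period group
        for k, off in enumerate(pattern):              # fixed, cheap lookup
            out += packed[:, g * keep + k] * activations[g * period + off]
    return out

# Example: 75% sparsity (keep 1 of every 4 columns) on a toy layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
x = rng.standard_normal(16)
packed, pattern = pps_prune(w, period=4, keep=1)
y = pps_matvec(packed, pattern, period=4, activations=x)
```

Because the pattern repeats, index storage is O(keep) per layer rather than O(nonzeros), which is the kind of regularity behind the large indexing-memory reduction reported above.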

