S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

07/16/2021
by Zhi-Gang Liu, et al.

Exploiting sparsity is a key technique for accelerating quantized convolutional neural network (CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit unstructured sparsity and achieve significant speedups. Because the sparsity patterns are unbounded and largely unpredictable, however, exploiting unstructured sparsity requires complicated hardware with significant energy and area overhead, which is particularly detrimental to mobile/IoT inference scenarios where energy and area efficiency are crucial. We propose to exploit structured sparsity, specifically Density Bound Block (DBB) sparsity, for both weights and activations. A DBB tensor bounds the maximum number of non-zeros in each fixed-size block, and thus exposes statically predictable sparsity patterns that enable lean sparsity-exploiting hardware. We propose new hardware primitives that implement DBB sparsity for (static) weights and (dynamic) activations, respectively, with very low overhead. Building on these primitives, we describe S2TA, a systolic-array-based CNN accelerator that exploits joint weight and activation DBB sparsity along with new dimensions of data reuse unavailable on a traditional systolic array. In 16nm, S2TA achieves more than 2x speedup and energy reduction over a strong baseline, a systolic array with zero-value clock gating, across five popular CNN benchmarks. Compared to two recent non-systolic sparse accelerators, Eyeriss v2 (65nm) and SparTen (45nm), S2TA in 65nm uses about 2.2x and 3.1x less energy per inference, respectively.
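
To make the DBB format concrete, here is a minimal sketch (in NumPy; the function name dbb_prune and the block size / non-zero bound values are illustrative assumptions, not taken from the paper) of pruning a weight matrix so that every fixed-size block along a row keeps at most a bounded number of non-zero entries:

```python
import numpy as np

def dbb_prune(weights: np.ndarray, block_size: int = 8, max_nonzeros: int = 4) -> np.ndarray:
    """Enforce Density Bound Block (DBB) sparsity on a 2-D weight matrix:
    within each block of `block_size` consecutive elements in a row, keep
    at most `max_nonzeros` entries (largest magnitude) and zero the rest."""
    out = weights.copy()
    rows, cols = out.shape
    assert cols % block_size == 0, "pad columns to a multiple of block_size"
    for r in range(rows):
        for c in range(0, cols, block_size):
            block = out[r, c:c + block_size]            # view into `out`
            # indices of the smallest-magnitude entries to drop
            drop = np.argsort(np.abs(block))[:block_size - max_nonzeros]
            block[drop] = 0.0
    return out

# Usage: prune a random 4x16 weight matrix to at most 4 non-zeros per 8-wide block.
w = np.random.randn(4, 16).astype(np.float32)
w_dbb = dbb_prune(w, block_size=8, max_nonzeros=4)
assert all((w_dbb[r, c:c + 8] != 0).sum() <= 4
           for r in range(4) for c in range(0, 16, 8))
```

Because the non-zero bound per block is fixed ahead of time, hardware can provision a fixed amount of indexing and multiplexing logic per block, which is what makes the sparsity pattern "statically predictable" in the sense used in the abstract.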

Related research

09/04/2020 - Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
07/15/2023 - PASS: Exploiting Post-Activation Sparsity in Streaming Architectures for CNN Acceleration
04/19/2021 - RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging
05/16/2020 - Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
06/30/2022 - Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators
05/22/2023 - HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
03/27/2023 - Maple: A Processing Element for Row-Wise Product Based Sparse Tensor Accelerators
