Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

09/12/2023
by Matteo Grimaldi, et al.

The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one way to reduce inference latency. It is known that unstructured sparsity causes less accuracy degradation than structured sparsity, but it requires extensive inference-engine changes to yield latency benefits. To tackle this challenge, we propose a solution for inducing semi-structured activation sparsity that is exploitable through minor runtime modifications. To attain high speedups at inference time, we design a sparse training procedure that is aware of the final position of the activations in the General Matrix Multiplication (GEMM) computation. We extensively evaluate the proposed solution across various models for image classification and object detection tasks. Remarkably, our approach yields a 1.25× speedup with a minimal accuracy drop of 1.1% for the ResNet18 model on the ImageNet dataset. Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models provide a good latency-accuracy trade-off, outperforming models that employ structured pruning alone.
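As a rough illustration of what activation-side semi-structured sparsity can look like, the sketch below applies an N:M-style magnitude mask to groups of consecutive channels after the activation function during training, so the network learns with the induced sparsity pattern. This is a minimal sketch under assumed choices: the group size, the 2:4 pattern, the channel-wise grouping, and the names semi_structured_mask and SparseActReLU are illustrative assumptions, not the paper's actual procedure, which tailors the sparsity pattern to the activations' position in the GEMM.

import torch
import torch.nn as nn
import torch.nn.functional as F

def semi_structured_mask(x: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    # Keep the n largest-magnitude values in every group of m consecutive
    # channels and zero the rest (an N:M-style pattern along the channel axis).
    # The grouping and granularity here are illustrative assumptions.
    b, c, h, w = x.shape
    assert c % m == 0, "channel count must be divisible by the group size m"
    groups = x.reshape(b, c // m, m, h, w)
    # Indices of the n largest-magnitude entries within each group of m channels.
    idx = groups.abs().topk(n, dim=2).indices
    mask = torch.zeros_like(groups).scatter_(2, idx, 1.0)
    return (groups * mask).reshape(b, c, h, w)

class SparseActReLU(nn.Module):
    # ReLU followed by semi-structured masking of the activations,
    # applied during training so the network adapts to the induced sparsity.
    def __init__(self, n: int = 2, m: int = 4):
        super().__init__()
        self.n, self.m = n, m

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return semi_structured_mask(F.relu(x), self.n, self.m)

# Example: mask a batch of feature maps (hypothetical shapes).
feats = torch.randn(8, 64, 56, 56)
sparse_feats = SparseActReLU(n=2, m=4)(feats)

In practice, such a fixed per-group pattern is what lets a runtime skip multiply-accumulates with only minor kernel changes, since the positions of the zeros are known relative to the GEMM tiling rather than being arbitrary.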


