Layer-adaptive Structured Pruning Guided by Latency

05/23/2023
by Siyuan Pan, et al.

Structured pruning can simplify network architecture and improve inference speed. Better results can be obtained when pruning is guided jointly by a latency-aware loss function that accounts for the underlying hardware and inference engine on which the final model is deployed. Existing latency-oriented pruning methods have demonstrated leading performance, but they often overlook hardware characteristics and the connections within the network. To address this problem, we propose a global importance score, SP-LAMP (Structured Pruning Layer-Adaptive Magnitude-based Pruning), derived by extending the global importance score LAMP from unstructured to structured pruning. In SP-LAMP, each layer contains one filter with an SP-LAMP score of 1, and the remaining filters are grouped. We then use a group knapsack solver to maximize the total SP-LAMP score under a latency constraint. In addition, we improve the latency-collection strategy to make it more accurate. In particular, for ResNet50/ResNet18 on ImageNet and CIFAR10, SP-LAMP is 1.28x/8.45x faster with a +1.7 change in accuracy, respectively. Experimental results with ResNet56 on CIFAR10 demonstrate that our algorithm achieves lower latency than alternative approaches while preserving accuracy and FLOPs.
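As a rough illustration of the two ingredients named above, here is a toy sketch (not the paper's implementation; the exact SP-LAMP definition, the filter grouping, and the latency model are assumptions for illustration): a LAMP-style per-filter score, where each filter's squared norm is normalized by the trailing sum over filters of equal or larger norm so that the largest filter in a layer scores 1, and a group knapsack dynamic program that picks exactly one keep-option per layer to maximize total score under a latency budget.

```python
import numpy as np

def sp_lamp_scores(filter_norms):
    """LAMP-style score per filter (illustrative): squared norm divided by
    the sum of squared norms of all filters with norm >= this one.
    The largest filter in the layer gets score 1."""
    order = np.argsort(filter_norms)          # ascending by magnitude
    sq = filter_norms[order].astype(float) ** 2
    tail = np.cumsum(sq[::-1])[::-1]          # trailing sums of squared norms
    scores = np.empty_like(sq)
    scores[order] = sq / tail                 # map back to original filter order
    return scores

def group_knapsack(groups, budget):
    """Group knapsack DP (illustrative). groups[i] is a list of
    (latency_cost, score) options for layer i (e.g., one option per
    candidate keep-count); exactly one option must be chosen per layer.
    Returns (best total score, chosen option index per layer)."""
    NEG = float("-inf")
    dp = [NEG] * (budget + 1)
    dp[0] = 0.0                               # dp[b]: best score at exact cost b
    choice = [[-1] * (budget + 1) for _ in groups]
    for gi, options in enumerate(groups):
        new = [NEG] * (budget + 1)
        for b in range(budget + 1):
            for oi, (cost, score) in enumerate(options):
                if cost <= b and dp[b - cost] > NEG and dp[b - cost] + score > new[b]:
                    new[b] = dp[b - cost] + score
                    choice[gi][b] = oi
        dp = new
    best_b = max(range(budget + 1), key=lambda b: dp[b])
    sel, b = [], best_b                       # backtrack the chosen options
    for gi in range(len(groups) - 1, -1, -1):
        oi = choice[gi][b]
        sel.append(oi)
        b -= groups[gi][oi][0]
    return dp[best_b], sel[::-1]
```

For example, with two layers offering options `[(1, 1.0), (2, 3.0)]` and `[(1, 2.0), (3, 5.0)]` and a budget of 4, the solver picks the cheap option in layer 1 and the expensive one in layer 2 for a total score of 6.0.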
