Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions

03/14/2023
by   Kaiqi Zhao, et al.

Pruning is a promising approach to compressing deep learning models for deployment on resource-constrained edge devices. However, many existing pruning solutions rely on unstructured pruning, which yields models that cannot run efficiently on commodity hardware. They also often require users to manually explore and tune the pruning process, which is time-consuming and often leads to sub-optimal results. To address these limitations, this paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach that automatically generates small, accurate, and hardware-efficient models that meet user objectives. First, it proposes iterative structured pruning using activation-based attention maps to effectively identify and prune unimportant filters. Then, it proposes adaptive pruning policies for automatically meeting the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks. A comprehensive evaluation shows that AAP substantially outperforms state-of-the-art structured pruning methods across a variety of model architectures. Our code is at: https://github.com/kaiqi123/Automatic-Attention-Pruning.git.
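To make the idea of activation-based filter scoring concrete, here is a minimal NumPy sketch. It is not taken from the AAP repository. It assumes a filter's importance can be summarized as the mean of the p-th power of its absolute activations over the batch and spatial positions, and that filters scoring below a fixed threshold are pruning candidates. The function names, the exponent `p`, and the threshold rule are illustrative assumptions, and the paper's actual scoring and thresholding may differ.

```python
import numpy as np

def filter_attention_scores(activations, p=1):
    """Score each filter by the mean of |activation|^p.

    activations: array of shape (batch, filters, height, width).
    Returns one score per filter (assumed importance proxy, not AAP's exact formula).
    """
    return np.mean(np.abs(activations) ** p, axis=(0, 2, 3))

def select_filters_to_prune(scores, threshold):
    """Return indices of filters whose score falls below the threshold."""
    return np.flatnonzero(scores < threshold)

# Toy feature maps: 8 samples, 4 filters, 5x5 spatial size.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4, 5, 5))
acts[:, 2] *= 0.01  # make filter 2 nearly inactive

scores = filter_attention_scores(acts)
pruned = select_filters_to_prune(scores, threshold=0.1)
print(scores, pruned)
```

In an iterative scheme, a step like this would presumably be repeated: prune the low-scoring filters, retrain or fine-tune, recompute the scores, and stop once the accuracy, memory, or latency objective is met.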


research
01/21/2022

Adaptive Activation-based Structured Pruning

Pruning is a promising approach to compress complex deep learning models...
research
01/22/2022

Iterative Activation-based Structured Pruning

Deploying complex deep learning models on edge devices is challenging be...
research
03/01/2023

Structured Pruning for Deep Convolutional Neural Networks: A survey

The remarkable performance of deep Convolutional neural networks (CNNs) ...
research
05/31/2022

ViNNPruner: Visual Interactive Pruning for Deep Learning

Neural networks grow vastly in size to tackle more sophisticated tasks. ...
research
09/21/2023

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning

Transformer-based speech recognition (ASR) model with deep layers exhibi...
research
06/29/2022

Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs

Pruning effectively compresses overparameterized models. Despite the suc...
research
05/23/2023

Layer-adaptive Structured Pruning Guided by Latency

Structured pruning can simplify network architecture and improve inferen...
