Performance-aware Approximation of Global Channel Pruning for Multitask CNNs

03/21/2023
by Hancheng Ye, et al.

Global channel pruning (GCP) aims to remove a subset of channels (filters) across different layers of a deep model without hurting its performance. Previous works focus on either single-task model pruning or simply adapting it to the multitask scenario, and still face the following problems when handling multitask pruning: 1) due to task mismatch, a backbone pruned well for a classification task preserves filters that extract category-sensitive information, so filters that may be useful for other tasks can be removed during the backbone pruning stage; 2) for multitask predictions, different filters within and between layers are more closely related and interact more strongly than in single-task prediction, making multitask pruning more difficult. Therefore, aiming at multitask model compression, we propose a Performance-Aware Global Channel Pruning (PAGCP) framework. We first theoretically present the objective for achieving superior GCP by considering the joint saliency of filters within and across layers. A sequentially greedy pruning strategy is then proposed to optimize this objective, in which a performance-aware oracle criterion is developed to evaluate the sensitivity of filters to each task and preserve the globally most task-related filters. Experiments on several multitask datasets show that the proposed PAGCP can reduce FLOPs and parameters by over 60%, and achieves 1.2x∼3.3x acceleration on both cloud and mobile platforms.
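To make the idea of a sequentially greedy, performance-aware pruning loop concrete, below is a minimal PyTorch sketch. Everything in it is an illustrative assumption rather than the authors' released PAGCP implementation: the toy two-task network, the drop_budget threshold, and the oracle_score function, which here simply measures the maximum relative loss increase over tasks when a channel is masked, are hypothetical stand-ins for the paper's performance-aware oracle criterion.

```python
# Minimal sketch of sequentially greedy, performance-aware channel pruning.
# Names and thresholds are illustrative assumptions, not the PAGCP release.
import torch
import torch.nn as nn

class TinyMultitaskNet(nn.Module):
    """Toy backbone with two task heads, used only to illustrate the loop."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head_a = nn.Linear(32, 10)  # e.g. classification head
        self.head_b = nn.Linear(32, 4)   # e.g. a second (regression) head

    def forward(self, x, masks):
        # Channel masks simulate pruning by zeroing filter outputs.
        x = torch.relu(self.conv1(x)) * masks["conv1"].view(1, -1, 1, 1)
        x = torch.relu(self.conv2(x)) * masks["conv2"].view(1, -1, 1, 1)
        x = self.pool(x).flatten(1)
        return self.head_a(x), self.head_b(x)

def task_losses(model, masks, x, y_a, y_b):
    out_a, out_b = model(x, masks)
    return (nn.functional.cross_entropy(out_a, y_a),
            nn.functional.mse_loss(out_b, y_b))

@torch.no_grad()
def oracle_score(model, masks, layer, ch, x, y_a, y_b, base):
    """Relative loss increase per task when channel `ch` of `layer` is zeroed;
    taking the max over tasks keeps filters that any single task needs."""
    masks[layer][ch] = 0.0
    la, lb = task_losses(model, masks, x, y_a, y_b)
    masks[layer][ch] = 1.0
    return max((la - base[0]) / base[0], (lb - base[1]) / base[1]).item()

@torch.no_grad()
def greedy_prune(model, x, y_a, y_b, drop_budget=0.05):
    masks = {n: torch.ones(m.out_channels)
             for n, m in model.named_modules() if isinstance(m, nn.Conv2d)}
    for layer in masks:                  # prune layers sequentially
        base = task_losses(model, masks, x, y_a, y_b)
        # Rank channels once by the oracle score, then greedily zero the
        # least sensitive ones until the per-task drop budget is exceeded.
        scores = sorted(
            (oracle_score(model, masks, layer, c, x, y_a, y_b, base), c)
            for c in range(len(masks[layer])))
        for score, c in scores:
            if score > drop_budget:
                break
            masks[layer][c] = 0.0
    return masks

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyMultitaskNet().eval()
    x = torch.randn(8, 3, 32, 32)
    y_a, y_b = torch.randint(0, 10, (8,)), torch.randn(8, 4)
    masks = greedy_prune(model, x, y_a, y_b)
    for name, m in masks.items():
        print(name, f"kept {int(m.sum())}/{len(m)} channels")
```

In practice the full method also accounts for joint intra- and inter-layer filter saliency and would recompute scores as channels are removed; the single-pass ranking above is only a simplification to keep the sketch short.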
