AutoSlim: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

07/06/2019 ∙ by Ning Liu, et al. ∙ Northeastern University ∙ Syracuse University ∙ Beijing Didi Infinity Technology and Development Co., Ltd.

Structured weight pruning is a representative model compression technique for DNNs that reduces storage and computation requirements and accelerates inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoSlim, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-the-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which has an underlying incompatibility with the target pruning problem. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that AutoSlim is the key to achieving ultra-high pruning rates on the numbers of weights and FLOPs that could not be achieved before. As an example, AutoSlim outperforms the prior work on automatic model compression by up to 33× in pruning rate under the same accuracy. We release all models of this work at an anonymous link: http://bit.ly/2VZ63dS.


1 Introduction

The high computational and storage requirements of large-scale DNNs, such as VGG (1) or ResNet (2), make them prohibitive for broad, real-time applications on mobile devices. Model compression techniques have been proposed that aim at reducing both the storage and computational costs of the DNN inference phase (3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15). One key model compression technique is DNN weight pruning (3; 4; 5; 6; 7; 8; 9; 10; 11; 14; 15), which reduces the number of weight parameters with minor (or no) accuracy loss.

There are mainly two categories of weight pruning. The general, non-structured pruning (4; 6; 7; 10; 14; 15) can prune arbitrary weights in a DNN. Despite the high pruning rate (weight reduction), it suffers from limited acceleration in actual hardware implementations due to the sparse weight matrix storage and the associated indices (7; 3; 8). On the other hand, structured pruning (3; 5; 8; 11) can directly reduce the size of the weight matrix while maintaining the form of a full matrix, without the need for indices. It is thus more compatible with hardware acceleration and has become the recent research focus. There are multiple types/schemes of structured pruning, e.g., filter pruning, channel pruning, and column pruning for the CONV layers of a DNN, as summarized in (3; 4; 8; 11). Recently, a systematic solution framework (10; 11) has been developed based on the powerful optimization tool ADMM (Alternating Direction Method of Multipliers) (16; 17; 18). It is applicable to the different schemes of structured pruning (and to non-structured pruning) and achieves the state-of-the-art results (10; 11) to date.

The structured pruning problem of DNNs is highly flexible, comprising a large number of hyperparameters, including the scheme (and combination) of structured pruning for each layer, the per-layer weight pruning rate, etc. A conventional hand-crafted policy has to explore this large design space to determine hyperparameters for weight or computation (FLOPs) reduction with minimum accuracy loss. The trial-and-error process is highly time-consuming, and the derived hyperparameters are usually sub-optimal. It is thus desirable to employ an automated process of hyperparameter determination for the structured pruning problem, motivated by the concept of AutoML (automated machine learning) (19; 20; 21; 22; 23; 24; 25). The recent work AMC (9) employs the popular deep reinforcement learning (DRL) (19; 20) technique for the automatic determination of per-layer pruning rates. However, it has two limitations: (i) it employs an early weight pruning technique based on fixed regularization, and (ii) it only considers filter pruning as the structured pruning scheme. As we shall see later, the underlying incompatibility between the utilized DRL framework and the pruning problem further limits its ability to achieve high weight pruning rates (the maximum reported pruning rate in (9) is only 5× and is for non-structured pruning).

This work makes the following innovative contributions to the automatic hyperparameter determination process for DNN structured pruning. First, we analyze such an automatic process in detail and extract its generic flow, with four steps: (i) action sampling, (ii) quick action evaluation, (iii) decision making, and (iv) actual pruning and result generation. Next, we identify three sources of performance improvement compared with prior work. We adopt the ADMM-based structured weight pruning algorithm as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss. Furthermore, we find that the DRL framework has an underlying incompatibility with the characteristics of the target pruning problem, and conclude that these issues can be mitigated simultaneously using an effective heuristic search method enhanced by experience-based guided search.

Combining all the improvements results in our automatic framework AutoSlim, which outperforms the prior work on automatic model compression by up to 33× in pruning rate under the same accuracy. Through extensive experiments on the CIFAR-10 and ImageNet datasets, we conclude that AutoSlim is the key to achieving ultra-high pruning rates on the numbers of weights and FLOPs that could not be achieved before, while DRL cannot compete with human experts in achieving high pruning rates. We release the code and all models of this work at an anonymous link: http://bit.ly/2VZ63dS.

Figure 1: Different structured pruning schemes: A filter-based view and a GEMM view.

2 Related Work

DNN Weight Pruning and Structured Pruning: DNN weight pruning includes two major categories: the general, non-structured pruning (4; 6; 7; 10; 14; 15), where arbitrary weights can be pruned, and structured pruning (3; 4; 5; 8; 11), which maintains certain regularity. Non-structured pruning can result in a higher pruning rate (weight reduction). However, as the weights are stored in a sparse matrix format with indices, it often results in performance degradation in highly parallel implementations such as GPUs. This limitation is overcome in structured weight pruning.

Figure 1 illustrates three structured pruning schemes on the CONV layers of a DNN: filter pruning, channel pruning, and filter-shape pruning (a.k.a. column pruning), which remove whole filter(s), channel(s), and the same location in each filter of a layer, respectively. CONV operations in DNNs are commonly transformed to matrix multiplications by converting weight tensors and feature map tensors to matrices (3), known as general matrix multiplication (GEMM). The key advantage of structured pruning is that a full matrix is maintained in GEMM, with reduced dimensions and without the need for indices, thereby facilitating hardware implementations.
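To make the three schemes concrete, the following sketch (our own illustration, not code from this work) shows how each scheme acts on the GEMM view of a CONV weight tensor; the layer dimensions are arbitrary assumptions:

```python
import torch

# Illustrative sketch: how the three structured pruning schemes act on the GEMM
# view of a CONV weight tensor of shape (filters N, channels C, Kh, Kw).
W = torch.randn(64, 32, 3, 3)                       # hypothetical CONV layer weights
W_gemm = W.reshape(64, -1)                          # GEMM view: rows = filters, columns = filter shapes

# Filter pruning: remove whole rows of the GEMM matrix.
keep_filters = torch.ones(64, dtype=torch.bool)
keep_filters[10:20] = False
W_filter_pruned = W_gemm[keep_filters, :]           # still a full (dense) matrix, just smaller

# Channel pruning: remove all Kh*Kw columns that belong to the pruned input channels.
keep_channels = torch.ones(32, dtype=torch.bool)
keep_channels[:4] = False
col_mask = keep_channels.repeat_interleave(3 * 3)   # expand channel mask to GEMM columns
W_channel_pruned = W_gemm[:, col_mask]

# Column (filter-shape) pruning: remove individual columns, i.e., the same
# (channel, kh, kw) position across every filter of the layer.
keep_cols = torch.ones(32 * 3 * 3, dtype=torch.bool)
keep_cols[::7] = False
W_column_pruned = W_gemm[:, keep_cols]
```

In all three cases the result remains a full matrix of reduced dimensions, which is exactly the property exploited by GEMM-based hardware implementations.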

It is also worth mentioning that filter pruning and channel pruning are correlated (8): pruning a filter in layer i (more precisely, after batch normalization) results in the removal of the corresponding channel in layer i+1. The relationship in ResNet (2) and MobileNet (26) is more complicated due to bypass links.

ADMM:

Alternating Direction Method of Multipliers (ADMM) is a powerful mathematical optimization technique, by decomposing an original problem into two subproblems that can be solved separately and efficiently (16). Consider the general optimization problem min_x f(x) + g(x). In ADMM, it is decomposed into two subproblems on x and z (z is an auxiliary variable subject to x = z), to be solved iteratively until convergence. The first subproblem derives x given z: min_x f(x) + q1(x|z). The second subproblem derives z given x: min_z g(z) + q2(z|x). Both q1(x|z) and q2(z|x) are quadratic functions.
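For reference, the standard (scaled-form) ADMM iteration for this decomposition can be written as follows; this is the textbook formulation consistent with (16), restated here for readability rather than quoted from this paper:

```latex
% Generic ADMM for  min_x f(x) + g(z)  subject to  x = z,  with scaled dual variable u:
\begin{aligned}
x^{k+1} &:= \arg\min_{x}\; f(x) + \tfrac{\rho}{2}\,\lVert x - z^{k} + u^{k} \rVert_2^2 \\
z^{k+1} &:= \arg\min_{z}\; g(z) + \tfrac{\rho}{2}\,\lVert x^{k+1} - z + u^{k} \rVert_2^2 \\
u^{k+1} &:= u^{k} + x^{k+1} - z^{k+1}
\end{aligned}
```

The two quadratic terms above play the roles of q1(x|z) and q2(z|x) in the text.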

As a key property, ADMM can effectively deal with a subset of combinatorial constraints and yield optimal (or at least high-quality) solutions. The associated constraints in DNN weight pruning (both non-structured and structured) belong to this subset (27; 28). In the DNN weight pruning problem, f(x) is the loss function of the DNN, and the first subproblem is DNN training with dynamic regularization, which can be solved using current gradient descent techniques and solution tools (29; 30) for DNN training. g(x) corresponds to the combinatorial constraints on the number of weights. As a result of the compatibility with ADMM, the second subproblem has an optimal, analytical solution for weight pruning via Euclidean projection. This solution framework applies both to non-structured pruning and to the different variations of structured pruning schemes.
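As an illustration of this Euclidean projection, the sketch below projects a GEMM-view weight matrix onto a column-sparsity constraint (keep the k columns of largest norm); the function name and the choice of constraint are our illustrative assumptions, and analogous projections apply to filter-wise and non-structured constraints:

```python
import torch

def project_column_sparsity(V: torch.Tensor, k: int) -> torch.Tensor:
    """Euclidean projection of a GEMM-view weight matrix V (filters x columns)
    onto the set of matrices with at most k non-zero columns: keep the k columns
    with the largest L2 norm and zero out the rest."""
    col_norms = V.norm(dim=0)                     # L2 norm of each column
    keep = torch.topk(col_norms, k).indices       # indices of the k largest columns
    Z = torch.zeros_like(V)
    Z[:, keep] = V[:, keep]
    return Z

# Example: project W + U (from the ADMM iteration) so that only 100 columns remain.
W_plus_U = torch.randn(64, 288)
Z = project_column_sparsity(W_plus_U, k=100)
```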

AutoML: Much recent work has investigated the concept of automated machine learning (AutoML), i.e., using machine learning for hyperparameter determination in DNNs. Neural architecture search (NAS) (19; 20; 25) is a representative application of AutoML. NAS has been deployed in Google’s Cloud AutoML framework, which frees customers from the time-consuming DNN architecture design process. The most related prior work, AMC (9), applies AutoML to DNN weight pruning, leveraging a similar DRL framework as Google AutoML to generate the weight pruning rate for each layer of the target DNN. In conventional machine learning methods, the overall performance (accuracy) depends greatly on the quality of features (31). To reduce the burdensome manual feature selection process, automated feature engineering (32) learns to generate appropriate feature sets in order to improve the performance of the corresponding machine learning tools.

3 The Proposed AutoSlim Framework for DNN Structured Pruning

Given a pretrained DNN or a predefined DNN structure, the automatic hyperparameter determination process decides the per-layer weight pruning rate and the type (and possible combination) of structured pruning scheme for each layer. The objective is the maximum reduction in the number of weights or FLOPs with minimum accuracy loss.
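To make the search space concrete, the hypothetical record below sketches what such a per-layer "action" could contain; the class name and fields are ours, for illustration only, and the released implementation may organize this differently.

```python
from dataclasses import dataclass

@dataclass
class LayerAction:
    """Hypothetical per-layer hyperparameters searched by the automatic process."""
    layer_name: str          # e.g., "conv3_2"
    pruning_rate: float      # fraction of weights to remove in this layer
    filter_fraction: float   # share of pruned weights removed as whole filters
    column_fraction: float   # share removed as columns (filter shapes); the two sum to 1.0

# One "action" for the whole network is simply a list of such records, e.g.,
# action = [LayerAction("conv1_1", 0.30, 0.5, 0.5), LayerAction("conv1_2", 0.45, 0.3, 0.7), ...]
```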

3.1 Automatic Process: Generic Flow and Key Steps

Figure 2: The generic flow of automatic hyperparameter determination framework, and sources of performance improvements.

Figure 2 illustrates the generic flow of such an automatic process, which applies to both AutoSlim and the prior work AMC. Here we call a sampled selection of hyperparameters an “action" for compatibility with DRL. The flow has the following steps: (i) action sampling, (ii) quick action evaluation, (iii) decision making, and (iv) actual pruning and result generation. Due to the large search space of hyperparameters, steps (i) and (ii) should be fast. This is especially important for step (ii), in that we cannot employ the time-consuming, retraining-based weight pruning (e.g., fixed regularization (3; 8) or ADMM-based techniques) to evaluate the actual accuracy loss. Instead, we can only use a simple heuristic, e.g., eliminating a pre-defined portion (based on the chosen hyperparameters) of the weights with the least magnitudes in each layer, and then evaluating the accuracy. This is similar to (9). Step (iii) makes the decision on the hyperparameter values based on the collection of action samples and evaluations. Step (iv) generates the pruning result, and the optimized (core) algorithm for structured weight pruning is employed here. This algorithm can be more complicated and of higher performance (e.g., the ADMM-based one), as it is only performed once in each round.
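As a minimal sketch of step (ii), the code below masks the smallest-magnitude columns of each CONV layer according to a candidate action and measures accuracy without any retraining; it reuses the illustrative LayerAction record from above and is our own sketch, not the released implementation:

```python
import copy
import torch

def fast_evaluate(model, action, eval_loader, device="cuda"):
    """Step (ii) sketch: temporarily zero out the smallest-magnitude columns of
    each CONV layer according to the action's per-layer pruning rate, then
    measure accuracy on a small evaluation set. No retraining is done, so this
    is only a rough proxy for the accuracy reachable after actual pruning."""
    pruned = copy.deepcopy(model).to(device).eval()
    convs = [m for m in pruned.modules() if isinstance(m, torch.nn.Conv2d)]
    for conv, layer_action in zip(convs, action):
        # GEMM view; conv weights are contiguous, so this view aliases the weights.
        W = conv.weight.data.view(conv.weight.shape[0], -1)
        n_prune = int(layer_action.pruning_rate * W.shape[1])
        if n_prune > 0:
            drop = torch.topk(W.norm(dim=0), n_prune, largest=False).indices
            W[:, drop] = 0.0                       # mask least-magnitude columns in place
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in eval_loader:
            pred = pruned(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total
```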

The overall automatic process is often iterative; the above steps (i) through (iv) reflect only one round. The reason is that it is difficult to reach high pruning rates in a single round, so the overall weight pruning process is progressive. This applies to both AMC and AutoSlim. The number of rounds is 4 - 8 in AutoSlim for fair comparison. Note that AutoSlim supports a flexible number of progressive rounds to achieve the maximum weight/FLOPs reduction given an accuracy requirement (or with zero accuracy loss).

3.2 Motivation: Sources of Performance Improvements

Based on the generic flow, we identify three sources of performance improvement (in terms of pruning rate, accuracy, etc.) compared with prior work. The first is the structured pruning scheme. Our observation is that an effective combination of filter pruning (which is correlated with channel pruning) and column pruning performs better than filter pruning alone (as employed in AMC (9)). Comparison results are shown in Section 4. This is because of the high flexibility of column pruning, while the hardware-friendly full matrix format in GEMM is maintained. The second is the core algorithm for structured weight pruning in step (iv). We adopt the state-of-the-art ADMM-based weight pruning algorithm in this step. Furthermore, we propose a further improvement, a purification step on top of the ADMM-based algorithm, taking advantage of the special characteristics after ADMM regularization. In Sections 3.3 and 3.4, we discuss the core algorithm and the proposed purification step, respectively.

The third source of improvement is the underlying principle of action sampling (step (i)) and decision making (step (iii)). The DRL-based framework in (9) adopts an exploration-vs.-exploitation-based search for action sampling. For step (iii), it trains a neural network using action samples and fast evaluations, and uses the neural network to make decisions on hyperparameter values. Our hypothesis is that DRL is inherently incompatible with the target automatic process, and can be easily outperformed by effective heuristic search methods (such as simulated annealing or genetic algorithms), especially their enhanced versions. More specifically, the DRL-based framework adopted in (9) can hardly achieve high pruning rates (the maximum pruning rate in (9) is only 5× and is for non-structured pruning), due to the following reasons.

First, the sample actions in DRL are generated in a randomized manner and are evaluated (step (ii)) using a very simple heuristic. As a result, these action samples and evaluation results (rewards) are only rough estimations. When a neural network is trained on them and relied upon for making decisions, it will hardly generate satisfactory decisions, especially for high pruning rates.

Second, there is a common limitation of reinforcement learning techniques (both the basic one and DRL) on optimization problems with constraints (33; 34; 35; 36). As pruning rates cannot be set as hard constraints in DRL, it has to adopt a composite reward function combining accuracy loss and weight-number/FLOPs reduction. This is the source of a controllability issue, as the relative strength of accuracy loss and weight reduction is very different for small pruning rates (the first couple of rounds) and high pruning rates (the latter rounds). There is thus a paradox of using either a single reward function in DRL (hard to satisfy the requirement throughout the pruning process) or multiple reward functions (how many? how to adjust the parameters?). Third, it is difficult for DRL to support a flexible and adaptive number of rounds in the automatic process to achieve the maximum pruning rates. As different DNNs have vastly different degrees of compression, it is challenging to achieve the best weight/FLOPs reduction with a fixed, predefined number of rounds. These issues can be observed in Section 4 in the difficulty of DRL to achieve high pruning rates. As they can be mitigated by effective heuristic search, we emphasize that an additional benefit of heuristic search is the ability to perform guided search based on prior human experience. In fact, DRL research also tries to learn from heuristic search methods for action sampling (37; 38; 39; 40), but the generality of such approaches has not been widely evaluated.

3.3 Core Algorithm for Structured Weight Pruning

This work adopts the ADMM-based weight pruning algorithm (10; 11) as the core algorithm, which generates state-of-the-art results in both non-structured and structured weight pruning. Details are in (10; 11; 16; 17; 18). The major step in the algorithm is ADMM regularization. Consider a general N-layer DNN with loss function f({W_i}, {b_i}), where W_i and b_i correspond to the collections of weights and biases in layer i, respectively. The overall (structured) weight pruning problem is defined as

minimize f({W_i}, {b_i}),   subject to   W_i ∈ S_i,   i = 1, ..., N,    (1)

where S_i reflects the requirement that the remaining weights in layer i satisfy predefined “structures". Please refer to (3; 8) for more details.

By (i) defining indicator functions g_i(W_i), with g_i(W_i) = 0 if W_i ∈ S_i and g_i(W_i) = +∞ otherwise; (ii) incorporating auxiliary variables Z_i and dual variables U_i; and (iii) adopting the augmented Lagrangian (16), ADMM regularization decomposes the overall problem into two subproblems and solves them iteratively until convergence. The first subproblem is min_{W_i, b_i} f({W_i}, {b_i}) + Σ_i (ρ_i/2) ||W_i − Z_i^k + U_i^k||_F^2. It can be solved using current gradient descent techniques and solution tools for DNN training. The second subproblem is min_{Z_i} Σ_i g_i(Z_i) + Σ_i (ρ_i/2) ||W_i^{k+1} − Z_i + U_i^k||_F^2, which can be optimally solved as a Euclidean mapping (projection of W_i^{k+1} + U_i^k onto S_i).

Overall, ADMM regularization is a dynamic regularization, in which the regularization target is dynamically adjusted in each iteration, without imposing a penalty on all of the weights. This is the reason that ADMM regularization outperforms prior work based on fixed ℓ1/ℓ2 regularization or projected gradient descent (PGD). To further enhance the convergence rate, the multi-ρ method (41) is adopted in ADMM regularization, in which the ρ_i values gradually increase with the ADMM iterations.
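A condensed sketch of the first subproblem, i.e., DNN training with the dynamic regularization term, is shown below; it is our own illustration under the assumption that Z_i, U_i, and ρ_i are kept per CONV layer, and the variable names and ρ schedule are not from the released code.

```python
import torch

def admm_regularized_epoch(model, conv_layers, Z, U, rho, loader, optimizer, device="cuda"):
    """One training epoch of the first ADMM subproblem:
       minimize  loss(W)  +  sum_i (rho_i / 2) * ||W_i - Z_i + U_i||_F^2.
    The quadratic term is a *dynamic* regularizer: Z_i and U_i change at every
    ADMM iteration, so no fixed penalty is imposed on all the weights."""
    model.train()
    criterion = torch.nn.CrossEntropyLoss()
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x.to(device)), y.to(device))
        for i, conv in enumerate(conv_layers):
            loss = loss + (rho[i] / 2) * torch.norm(conv.weight - Z[i] + U[i]) ** 2
        loss.backward()
        optimizer.step()

# After each ADMM iteration: Z_i <- Euclidean projection of (W_i + U_i) onto S_i,
# U_i <- U_i + W_i - Z_i, and rho_i is gradually increased (the multi-rho method).
```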

3.4 Purification and Unused Weights Removal Step

After ADMM-based structured weight pruning, we propose a purification and unused-weights-removal step for further weight reduction without accuracy loss. First, as also noticed in prior work (8), a specific filter in layer i is responsible for generating one channel in layer i+1. As a result, removing the filter in layer i (in fact, removing the corresponding batch norm results) also results in the removal of the corresponding channel in layer i+1, thereby achieving further weight reduction. Beyond this straightforward procedure, there is a further margin of weight reduction based on the characteristics of ADMM regularization. As ADMM regularization is essentially a dynamic, L2-norm-based regularization procedure, a large number of non-zero but small weight values remain after regularization. Due to the non-convex nature of ADMM regularization, our observation is that removing these weights maintains the accuracy, and occasionally even slightly improves it. We therefore define two thresholds for each DNN layer, a column-wise threshold and a filter-wise threshold. When the norm of a column (or filter) of weights is below the threshold, the column (or filter) is removed, and the corresponding channel in layer i+1 is removed upon filter removal in layer i. The structures in each DNN layer are maintained after this purification step.

These two threshold values are layer-specific, depending on the relative weight values of each layer and the sensitivity of the overall accuracy. They are additional hyperparameters to be determined for each layer in the AutoSlim framework, for maximum weight/FLOPs reduction without accuracy loss.
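A sketch of the column-wise and filter-wise purification for a single layer is given below; the function name and threshold handling are our illustrative assumptions rather than the released implementation.

```python
import torch

def purify_layer(W: torch.Tensor, col_thresh: float, filter_thresh: float):
    """Zero out columns / filters whose norms fall below the layer-specific
    thresholds after ADMM regularization. W has shape (filters, channels, Kh, Kw).
    Returns the purified weights plus the surviving filter indices, so that the
    corresponding channels of the next layer can also be removed."""
    n_filters = W.shape[0]
    W2d = W.reshape(n_filters, -1).clone()            # GEMM view (copy)
    col_norms = W2d.norm(dim=0)
    W2d[:, col_norms < col_thresh] = 0.0              # column-wise purification
    filter_norms = W2d.norm(dim=1)
    keep = filter_norms >= filter_thresh              # filter-wise purification
    W2d[~keep, :] = 0.0
    return W2d.reshape(W.shape), keep.nonzero(as_tuple=True)[0]
```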

3.5 The Overall AutoSlim Framework for Structured Weight Pruning and Purification

In this section, we discuss the AutoSlim framework based on the enhanced, guided heuristic search method, in which the automatic process determines per-layer weight pruning rates, structured pruning schemes (and combinations), as well as the hyperparameters of the purification step (discussed in Section 3.4). The overall framework has two phases, as shown in Figure 3: Phase I for structured weight pruning based on ADMM, and Phase II for the purification step. Each phase has multiple progressive rounds as discussed in Section 3.1, in which the weight pruning result from the previous round serves as the starting point of the subsequent round. We use Phase I as the illustrative example; Phase II uses similar steps.

Figure 3: Illustration of the AutoSlim framework.

The AutoSlim framework supports a flexible number of progressive rounds, as well as hard constraints on the weight or FLOPs reduction. In this way, it aims to achieve the maximum weight or FLOPs reduction while maintaining accuracy (or satisfying an accuracy requirement). For each round, we set the overall reduction in weight number/FLOPs to be a factor of 2 (with a small variance), based on the result from the previous round. In this way, we can achieve around 4× weight/FLOPs reduction within two rounds, already outperforming the reported structured pruning results in prior work (9).

We leverage a classical heuristic search technique, simulated annealing (SA), enhanced with guided search based on prior experience. The enhancement is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on the overall accuracy. The basic idea of SA lies in the search for actions: when a perturbation of the candidate action results in a better evaluation result (step (ii) in Figure 2), the perturbation is accepted; otherwise, the perturbation is accepted with a probability depending on the degradation of the evaluation result, as well as a temperature T. The reason is to avoid being trapped in a local minimum during the search process. The temperature T gradually decreases during the search, in analogy to the physical “annealing" process.

REQUIRE: Initial (unpruned) DNN model or DNN structure.
for each progressive round do
      Initialize the action, i.e., the per-layer partitioning of structured pruning schemes and pruning rates, satisfying the heuristic constraint; initialize the temperature T.
      while T > stop temperature do
            for each iteration at temperature T do
                  Generate a perturbation (magnitude decreases with T) of the action, satisfying the heuristic constraint.
                  Perform fast evaluation of the perturbed action.
                  if better evaluation result (higher accuracy) then
                        Accept the perturbation.
                  else
                        Accept the perturbation with probability e^(−ΔE/T), where ΔE is the increase in evaluation cost (accuracy loss).
            Cool down the temperature T.
      The action outcome becomes the decision of hyperparameter values.
      Perform ADMM-based structured pruning to generate the pruning result for the next round.
Algorithm 1 AutoSlim framework for structured weight pruning (a similar process applies to purification).

Given the overall pruning rate (on weight number or FLOPs) of the current round, we initialize a randomized action using the following process: i) order all layers by the number of remaining weights, ii) assign a randomized pruning rate (and a partition between filter and column pruning schemes) to each layer, such that a layer with more weights receives no lower pruning rate, and iii) normalize the per-layer pruning rates so that the target overall pruning rate is met. We also set a high initial temperature T. We define a perturbation as a change of the weight pruning rates (and the portions of structured pruning schemes) in a subset of DNN layers. The perturbation also satisfies the requirement that a layer with more remaining weights has a higher pruning rate. The result evaluation is the fast evaluation introduced in Section 3.1. The acceptance/denial of an action perturbation, the decrease of the temperature T, and the associated reduction in the degree of perturbation with T follow the SA rules until convergence. The action outcome becomes the decision of hyperparameter values (step (iii); this is different from DRL, which trains a neural network). ADMM-based structured pruning is then adopted to generate the pruning result (step (iv)), possibly serving as the starting point of the next round, until the final result.
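For concreteness, the SA search of Algorithm 1 can be condensed into the following sketch, assuming exponential cooling and user-supplied evaluate/perturb functions that enforce the guided-search constraint; all names and default values here are illustrative.

```python
import math
import random

def simulated_annealing(init_action, evaluate, perturb,
                        T_init=100.0, T_stop=1e-3, cooling=0.9, iters_per_T=10):
    """Heuristic search of Algorithm 1. `evaluate` is the fast evaluation of
    Section 3.1 (returns accuracy); `perturb` proposes a modified action that
    still satisfies the guided-search constraint (layers with more remaining
    weights get no lower pruning rate)."""
    action, acc = init_action, evaluate(init_action)
    T = T_init
    while T > T_stop:
        for _ in range(iters_per_T):
            candidate = perturb(action, magnitude=T / T_init)   # smaller moves as T drops
            cand_acc = evaluate(candidate)
            delta_e = acc - cand_acc                            # increase in accuracy loss
            if delta_e <= 0 or random.random() < math.exp(-delta_e / T):
                action, acc = candidate, cand_acc               # accept better or, probabilistically, worse moves
        T *= cooling                                            # cool down
    return action                                               # becomes the hyperparameter decision
```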

4 Evaluation, Experimental Results, and Discussions

In this section, the effectiveness of AutoSlim is evaluated on VGG-16 and ResNet-18 on the CIFAR-10 dataset, and on VGG-16 and ResNet-18/50 on the ImageNet dataset. We focus on structured pruning of the CONV layers, which are the most computationally intensive layers in DNNs and account for the major storage in state-of-the-art DNNs such as ResNet. We consider two objective functions: reduction in the number of weight parameters and reduction in computation (FLOPs). The implementations are based on PyTorch (42). In the ADMM-based structured pruning algorithm, the number of epochs in each progressive round is 200, which is lower than in the prior iterative pruning and retraining heuristic (7). We use an initial penalty parameter ρ for ADMM and an initial learning rate; the ADAM (29) optimizer is utilized. In the SA setup, we use a cooling factor γ and Boltzmann's constant k. The initial probability of accepting high-energy (bad) moves is set to be relatively high.

We aim at a fair and comprehensive evaluation of the three sources of performance improvement discussed in Section 3.2. In order to illustrate the effect of each individual source of improvement, we design a number of combinations of techniques in the experiments. For the structured pruning scheme, we compare filter pruning only vs. combined structured pruning (abbr. Fil vs. Comb). For the core algorithm, we compare fixed regularization vs. ADMM-based pruning (abbr. Fix vs. ADMM). For the action sampling and decision making framework, we compare DRL, manual optimization, and enhanced SA (abbr. DRL vs. Man vs. SA). For example, the configuration Comb-ADMM-SA reflects the full AutoSlim, while Fil-Fix-DRL reflects AMC (9).

Through extensive experiments, we conclude that AutoSlim is the key to achieving ultra-high pruning rates on the numbers of weights and FLOPs that could not be achieved before, while DRL cannot compete with human experts in achieving high structured pruning rates.

4.1 Results and Discussions on CIFAR-10 Dataset

Figure 4: Detailed per-layer portion of pruned weights on VGG-16 for CIFAR-10 under two objective functions Params# and FLOPs#.

Table 1 shows the comparison results on weight reduction and FLOPs reduction on VGG-16 for the CIFAR-10 dataset, while Table 2 shows the results on ResNet-18. The proposed AutoSlim framework (Comb-ADMM-SA) uses two objective functions: reducing the number of weight parameters or the number of FLOPs. For VGG-16, compared to the prior work 2PFPCE (5) (Fil-Fix-Man) with 4× weight reduction, AutoSlim achieves 61.1× weight reduction under the same accuracy, a 15.3× improvement. For ResNet-18, compared to the prior work AMC (9) (Fil-Fix-DRL) with 1.7× weight reduction, AutoSlim achieves 61.2× reduction under the same accuracy, a significant improvement as high as 33×. When accounting for the different numbers of parameters in ResNet-18 and ResNet-50 (used in AMC), the improvement can even be perceived as 120×. This implies that the high redundancy of DNNs on the CIFAR-10 dataset had not been exploited by prior work.

AutoSlim outperforms the prior work due to all three sources of improvement. We perform further comparisons to show the improvement due to the enhanced SA compared with DRL and manual hyperparameter optimization. More specifically, we compare AutoSlim (Comb-ADMM-SA) with the configurations Comb-ADMM-Man and Comb-ADMM-DRL at (about) the same accuracy. As can be observed in the two tables, AutoSlim achieves a moderate improvement in pruning rate compared with manual hyperparameter optimization, but significantly outperforms the DRL-based framework (all other sources of improvement being the same). This demonstrates our statement that DRL is not compatible with ultra-high pruning rates. For relatively small pruning rates, DRL can hardly outperform the manual process either, as the improvement over 2PFPCE (Fil-Fix-Man) is smaller than the improvement over AMC (Fil-Fix-DRL).

Table 1: Comparison results on VGG-16 for the CIFAR-10 dataset.

Method          Acc.     Params Rt.  FLOPs Rt.  Objective
2PFPCE (5)      92.8%    4×          N/A        N/A
Comb-ADMM-Man   93.26%   44.3×       8.1×       N/A
AutoSlim        93.21%   52.2×       8.8×       Params#
AutoSlim        92.72%   61.1×       10.6×      Params#
AutoSlim        92.65%   59.1×       10.8×      FLOPs#
AutoSlim        92.79%   51.3×       9.1×       FLOPs#

Table 2: Comparison results on ResNet-18 (ResNet-50 in AMC) for the CIFAR-10 dataset.

Method          Acc.     Params Rt.  FLOPs Rt.  Objective
AMC (9)         93.5%    1.7×        N/A        N/A
Comb-ADMM-DRL   93.55%   11.8×       3.8×       Params#
Comb-ADMM-Man   93.69%   43.3×       9.6×       N/A
AutoSlim        93.75%   55.6×       12.3×      Params#
AutoSlim        93.43%   61.2×       13.3×      Params#
AutoSlim        92.98%   80.8×       17.2×      Params#
AutoSlim        93.81%   54.2×       12.2×      FLOPs#

Next, we compare the two objectives: weight reduction and FLOPs reduction. Figure 4 shows the portion of pruned weights per layer on VGG-16 for CIFAR-10 under the Params# and FLOPs# search objectives. One can only observe slight differences in the portion of pruned weights per layer. This is because further weight reduction in the first several layers results in significant accuracy degradation. This convergence of the results under the two objectives appears to be a characteristic of ultra-high pruning rates.

4.2 Results and Discussions on ImageNet Dataset

In this subsection, we show the application of AutoSlim to the ImageNet dataset, along with additional comparisons against Fil-ADMM-SA (illustrating the first source of improvement). Table 3 and Table 4 show the comparison results of structured pruning on VGG-16 and on ResNet-18 (ResNet-50), respectively, for the ImageNet dataset. We can clearly see the advantage of AutoSlim over prior work, such as (8) (Fil-Fix-Man), AMC (9) (Fil-Fix-DRL), and ThiNet (43) (Fil-Fix-Man). We can also see the advantage of AutoSlim over manual hyperparameter determination (Comb-ADMM-Man), improving the structured pruning rate from 2.7× to 3.3× on ResNet-18 (ResNet-50) under the same (Top-5) accuracy. Finally, AutoSlim also outperforms filter pruning only (Fil-ADMM-SA), improving the structured pruning rate from 3.8× to 6.4× on VGG-16 under the same (Top-5) accuracy. This demonstrates the advantage of combined filter and column pruning over filter pruning alone, when the other sources of improvement are the same. Besides, our Fil-ADMM-SA also outperforms the prior work (Fil-Fix-Man) and (Fil-Fix-DRL), demonstrating the advantage of the proposed AutoSlim framework.

Table 3: Comparison results on VGG-16 for the ImageNet dataset.

Method               Top-5 Acc. Loss   Params Rt.  Objective
Filter pruning (8)   1.7%              4×          N/A
AMC (9)              1.4%              4×          N/A
Fil-ADMM-SA          0.6%              3.8×        Params#
AutoSlim             0.6%              6.4×        Params#

Table 4: Comparison results on ResNet-18 (ResNet-50) for the ImageNet dataset.

Method           Top-5 Acc. Loss   Params Rt.  Objective
ThiNet (43)      1.1%              2×          N/A
Comb-ADMM-Man    0.1%              2.7×        N/A
AutoSlim         0.1%              3.3×        Params#

Last but not least, the proposed AutoSlim framework can also be applied to non-structured pruning. For non-structured pruning of the ResNet-50 model on the ImageNet dataset, AutoSlim achieves a 9.2× non-structured pruning rate on the CONV layers without accuracy loss (92.7% Top-5 accuracy), which outperforms manual hyperparameter optimization with ADMM-based pruning (8× pruning rate) and the prior work AMC (4.8× pruning rate), as shown in Table 5.

Table 5: Comparison results on non-structured weight pruning on ResNet-50 using the ImageNet dataset.

Method      Top-5 Acc. Loss   Params Rt.  Objective
AMC (9)     0%                4.8×        N/A
ADMM-Man    0%                8.0×        N/A
AutoSlim    0%                9.2×        Params#
AutoSlim    0.7%              17.4×       Params#

5 Conclusion

This work proposes AutoSlim, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-the-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which has an underlying incompatibility with the target pruning problem. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that AutoSlim is the key to achieving ultra-high pruning rates on the numbers of weights and FLOPs that could not be achieved before.

References