Architecture-aware Network Pruning for Vision Quality Applications

08/05/2019 · by Wei-Ting Wang, et al.

Convolutional neural networks (CNNs) deliver impressive achievements in the computer vision and machine learning fields. However, CNNs incur high computational complexity, especially for vision quality applications because of large image resolutions. In this paper, we propose an iterative architecture-aware pruning algorithm with an adaptive magnitude threshold that cooperates with quality-metric measurement simultaneously. We show the performance improvement on vision quality applications and provide a comprehensive analysis with flexible pruning configurations. With the proposed method, the Multiply-Accumulate (MAC) counts of state-of-the-art low-light imaging (SID) and super-resolution (EDSR) are reduced by 58% and 37%, respectively, without quality-metric drop. The memory bandwidth (BW) requirement of the convolutional layers is also reduced by 20% (EDSR) and 39% (SID).







1 Introduction

CNNs are adopted as an essential ingredient in the computer vision and machine learning areas [13, 6, 17]. Vision perception tasks, including image classification, object detection and semantic segmentation, are comprehensively investigated in association with CNNs. Even in the image processing field, such as super resolution, high dynamic range imaging and de-noising, CNNs have delivered progressive and promising improvements in image quality in recent years [4, 16].

However, compared to perception tasks, vision quality tasks require higher computational complexity and BW. MobileNetV1 [11] is designed with 569M MAC for ImageNet classification. In contrast, the low-light photography network SID [4] and the super-resolution network EDSR [16] take 560G MAC and 1.4T MAC per inference, respectively. It is therefore more challenging to deploy CNN models on mobile devices for vision quality applications.

Network pruning [21] is an effective methodology for performance optimization. Sparsity is defined as the ratio of the number of zero weights divided by the number of total weights. A better pruning algorithm delivers higher sparsity and correspondingly reduces more MAC and BW. However, quality drop is one of the major challenges in network pruning: Fig. 1 shows a visible defect on SID even with only a 0.1 PSNR degradation.

In this paper, we propose architecture-aware pruning to maximize sparsity and MAC reduction without quality-metric (PSNR or SSIM) drops. We also analyze the effects of MAC and BW reduction under different configurations of the pruned structures. The proposed method applies to algorithms including, but not limited to, SID and EDSR.

2 Related Works

Network pruning has been widely explored in the literature. To decide which weights should be pruned, some works add evaluation terms to the loss function, such as group lasso [24] and MAC regularization [7]. However, it is difficult to find a proper ratio between the additional pruning-related loss and the original loss. Other works create standalone evaluation functions, including sensitivity [14, 5] and weight magnitude [8]. The sensitivity method computes the impact of weights on the training loss and removes low-impact weights. The weight-magnitude method simply prunes a weight if its absolute value is less than a threshold, which is easier to apply to large-scale CNNs. In this work, we use the weight-magnitude method to prune the network.
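As a minimal illustration of the weight-magnitude criterion (our own sketch, not the paper's implementation), pruning every weight whose absolute value falls below a threshold is a one-line mask:

```python
import numpy as np

def magnitude_prune(weights, threshold):
    """Fine-grained magnitude pruning: zero every weight whose
    absolute value is below the threshold."""
    return weights * (np.abs(weights) >= threshold)

w = np.array([[0.8, -0.05],
              [0.02, -0.9]])
pruned = magnitude_prune(w, threshold=0.1)
# -0.05 and 0.02 are zeroed; 0.8 and -0.9 survive unchanged.
```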

Pruning granularity. There are two granularities of pruning: fine-grained pruning [8, 23] and coarse-grained pruning [2, 19, 20, 15]. The fine-grained method prunes individual weights (i.e., within a filter kernel), whereas the coarse-grained method considers network structures (i.e., along the output and input channels). According to [10], fine-grained pruning needs additional dedicated hardware to handle irregular sparsity. The coarse-grained method may obtain a higher compression ratio without the need for a compression header [18]. Therefore, we focus on coarse-grained output-channel-wise pruning.

Iterative pruning. To prevent catastrophic accuracy degradation, iterative pruning is viewed as an effective retraining procedure [25, 12, 3]. For vision quality applications, quality metrics are required as the termination criterion of the pruning procedure.

Figure 1: A slight quality-metric drop (PSNR -0.09) may incur visible defects (SID). (a) PSNR: 25.41. (b) PSNR: 25.32.

3 Proposed Method

3.1 Architecture-aware Pruning

An output channel is pruned if its maximum absolute weight value is less than the magnitude threshold. For a convolutional layer, the weight kernel has tensor shape K × K × C_in × C_out, where C_in is the number of input channels, C_out is the number of output channels and K is the kernel size. Output-channel-wise pruning removes the weights along the output-channel dimension. The kernel shape becomes K × K × C_in × (C_out − P_out) if P_out output channels are pruned. The output-channel pruned ratio is defined as R = P_out / C_out.
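The output-channel criterion can be sketched as follows, assuming a weight tensor of shape (K, K, C_in, C_out); the function name and threshold value are our own:

```python
import numpy as np

def prune_output_channels(kernel, threshold):
    """Output-channel-wise pruning: drop every output channel whose
    maximum absolute weight is below the magnitude threshold.
    `kernel` has shape (K, K, C_in, C_out)."""
    channel_max = np.abs(kernel).max(axis=(0, 1, 2))  # one value per output channel
    keep = channel_max >= threshold
    pruned_ratio = 1.0 - keep.mean()                  # fraction of channels removed
    return kernel[..., keep], pruned_ratio

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(3, 3, 8, 16))
kernel[..., :4] *= 0.01                               # make 4 channels weak
pruned, ratio = prune_output_channels(kernel, threshold=0.05)
# The 4 weakened channels fall below the threshold and are removed.
```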

Once output channels of a layer are pruned, the corresponding input channels of the following layer are also removed. We define one layer's sparsity as 1 − ((C_in − P_in)(C_out − P_out)) / (C_in · C_out), where C_in and C_out are the numbers of input and output channels, P_out is the number of pruned output channels, and P_in is the number of pruned input channels in the layer. The network sparsity is defined as the ratio of the number of zero weights of a pruned network divided by the number of total weights of the original network.
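A small sketch of these two sparsity definitions, under the assumption that a layer's sparsity is the fraction of its original weights removed when P_in input and P_out output channels are pruned:

```python
def layer_sparsity(c_in, c_out, p_in, p_out):
    """Fraction of a layer's original weights zeroed when p_in input
    and p_out output channels are removed."""
    return 1.0 - (c_in - p_in) * (c_out - p_out) / (c_in * c_out)

def network_sparsity(layers):
    """layers: (c_in, c_out, p_in, p_out, kernel_size) per layer.
    Zeroed weights of the pruned network over total original weights."""
    total = sum(k * k * ci * co for ci, co, _, _, k in layers)
    zeroed = sum(k * k * (ci * co - (ci - pi) * (co - po))
                 for ci, co, pi, po, k in layers)
    return zeroed / total

# Pruning 8 of 32 output channels in layer 1 also removes the
# corresponding 8 input channels of layer 2.
layers = [(16, 32, 0, 8, 3), (32, 64, 8, 0, 3)]
```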

3.1.1 Keep Layer Depth

Usually, in vision quality applications, each layer in the network is semantically designed for quality-sensitive primitives, such as edge and chroma, with respect to different resolutions. Intensively removing a whole layer can severely degrade quality. Therefore, we keep the network architecture by preserving a minimum number of output channels in every layer.
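A sketch of this safeguard, where `min_channels` is a hypothetical lower bound of our choosing: if thresholding would empty a layer, the strongest channels are retained instead.

```python
import numpy as np

def prune_keep_depth(kernel, threshold, min_channels=2):
    """Prune weak output channels but always keep at least `min_channels`,
    so the layer (and hence the network depth) is never removed entirely."""
    channel_max = np.abs(kernel).max(axis=(0, 1, 2))
    keep = channel_max >= threshold
    if keep.sum() < min_channels:
        # Thresholding would (almost) empty the layer:
        # fall back to keeping the strongest channels.
        keep = np.zeros_like(keep)
        keep[np.argsort(channel_max)[-min_channels:]] = True
    return kernel[..., keep]

weak = np.full((3, 3, 4, 6), 1e-4)   # every channel below the threshold
kept = prune_keep_depth(weak, threshold=0.05)
# The layer survives with min_channels output channels instead of zero.
```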

3.1.2 Enhance MAC Efficiency

A pruned network with higher weight sparsity does not necessarily imply higher computation reduction. We define MAC/weight, E = M / W (Eq. 1), for each layer as an indicator of MAC efficiency, where M and W are the numbers of MACs and weights of the layer, respectively. Fig. 2 shows that MAC/weight is much larger on both the top and bottom layers of SID because of its U-Net [22] network topology. Therefore, to productively reduce computation, output channels of a layer with higher E tend to be pruned more via a higher magnitude threshold.
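For a plain convolution, the MAC count equals the weight count times the output spatial size, so MAC/weight (Eq. 1) reduces to H_out × W_out. A sketch with hypothetical layer sizes mimicking a U-Net, where the first and last layers run at full resolution while middle layers are downsampled:

```python
def mac_per_weight(h_out, w_out, c_in, c_out, k):
    """MAC/weight (Eq. 1) of one convolutional layer. MACs are
    h_out*w_out*c_in*c_out*k*k and weights are c_in*c_out*k*k,
    so the ratio is simply the output spatial size."""
    macs = h_out * w_out * c_in * c_out * k * k
    weights = c_in * c_out * k * k
    return macs / weights

# Hypothetical U-Net-like sizes: first/last layers at full resolution,
# middle encoder layers heavily downsampled.
edge_layer = mac_per_weight(512, 512, 32, 32, 3)    # high MAC/weight
middle_layer = mac_per_weight(32, 32, 256, 256, 3)  # low MAC/weight
```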

Figure 2: MAC/weight are much larger on top and bottom layers in SID. However, MAC/weight for most layers are uniform in EDSR.
Figure 3: An example of residual block. There are 4 layers with 6 output channels in each layer. Color parts represent removed output channels. One color denotes one group of channels in balance pruned output channel method.

3.1.3 Balance Pruned Output Channel

Residual blocks are universally used in network topology design, for instance in EDSR, which is a variation of ResNet [9] with a long shortcut. However, pruning output channels from a residual block is arduous because of element-wise operations (e.g., element-wise ADD) or concatenation. Fig. 3 illustrates an example where a magnitude threshold is applied to a 4-layer residual block. Because of the element-wise ADD after layer Conv-B and layer Conv-D, the output channels of a given layer (Conv-D) and its preceding layer (Conv-B) with the same index (5) must be grouped and pruned at the same time. As a result, layers Conv-B and Conv-D have fewer pruned output channels than layers Conv-A and Conv-C.

We propose a guidance (Eq. 2) that makes output channels of residual blocks easier to prune by increasing the magnitude threshold on layers with a lower ratio of pruned output channels. The MAC-efficiency factor mentioned in Sec. 3.1.2 is also applied. Let R_l be the ratio of pruned output channels of layer l and T_base the original magnitude threshold base; the threshold of layer l grows as R_l shrinks. Thus, output channels of a layer with lower R_l have a higher tendency to be pruned.
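A sketch of the grouping constraint, together with one hypothetical form of the guidance (the paper's exact Eq. 2 is not reproduced here; the names and the `balanced_threshold` formula are our assumptions):

```python
import numpy as np

def grouped_channel_mask(kernel_b, kernel_d, threshold):
    """Output channels feeding an element-wise ADD must be pruned jointly:
    channel i of Conv-B and Conv-D is removed only when it is weak
    (below threshold) in BOTH layers."""
    max_b = np.abs(kernel_b).max(axis=(0, 1, 2))
    max_d = np.abs(kernel_d).max(axis=(0, 1, 2))
    return (max_b >= threshold) | (max_d >= threshold)  # True = keep

def balanced_threshold(t_base, pruned_ratio, efficiency):
    """Hypothetical guidance in the spirit of Eq. 2: the threshold of a
    layer grows with its MAC/weight efficiency and with how few of its
    output channels are pruned so far, making them easier to prune."""
    return t_base * efficiency * (1.0 - pruned_ratio)

b = np.full((3, 3, 4, 2), 1e-4)      # both channels weak in Conv-B
d = np.full((3, 3, 4, 2), 1e-4)
d[..., 1] = 1.0                      # channel 1 strong in Conv-D
mask = grouped_channel_mask(b, d, threshold=0.05)
# Channel 0 is weak in both layers and may be pruned; channel 1 is kept.
```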


3.2 Quality Metric Guarantee

To maintain the quality metric (PSNR and SSIM) while maximizing pruned MAC, our algorithm prunes and retrains the network iteratively. The iteration terminates when either the target quality-metric criterion or the maximum number of training steps is reached. The overall flow is shown in Algorithm 1.

1:  Input: target quality Q_t, target sparsity increment ΔS, threshold increment ΔT, total steps N
2:  S_target ← total-network-sparsity + ΔS
3:  T_base ← initial threshold base
4:  repeat
5:     for layer l in network do
6:        R_l ← pruned-output-channel-ratio(l)
7:        M_l ← MAC(l)
8:        W_l ← weight-size(l)
9:        E_l ← M_l / W_l   {Eq. 1}
10:       T_l ← threshold(T_base, E_l, R_l)   {Eq. 2}
11:       prune-output-channels-by-threshold(l, T_l)
12:    end for
13:    S ← calculate-total-network-sparsity()
14:    T_base ← T_base + ΔT
15: until S ≥ S_target
16: repeat
17:    retrain-pruned-network()
18:    Q ← evaluate-quality-metric()
19:    n ← get-current-step()
20: until (Q ≥ Q_t or n ≥ N)
21: if Q ≥ Q_t then
22:    jump to line 2
23: end if
Algorithm 1: Architecture-aware and Quality-Metric-Guaranteed Pruning
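The control flow of Algorithm 1 can be sketched as a small Python driver; the callbacks and the toy model below are placeholders of our own, not the authors' implementation:

```python
def iterative_prune(state, target_quality, sparsity_step, max_steps,
                    prune_fn, retrain_fn, quality_fn, sparsity_fn):
    """Outer loop of Algorithm 1: raise the sparsity target, prune until
    it is reached (lines 4-15), retrain up to max_steps (lines 16-20),
    and start another round only if quality is recovered (lines 21-23)."""
    while True:
        target_sparsity = sparsity_fn(state) + sparsity_step
        while sparsity_fn(state) < target_sparsity:
            prune_fn(state)
        for _ in range(max_steps):
            retrain_fn(state)
            if quality_fn(state) >= target_quality:
                break
        if quality_fn(state) < target_quality:
            return state  # quality not recovered: stop pruning

# Toy model: each prune step adds 0.125 sparsity and costs 1 dB;
# retraining restores quality only while sparsity stays <= 0.5.
state = {"sparsity": 0.0, "quality": 30.0}
def prune_fn(s):
    s["sparsity"] = min(1.0, s["sparsity"] + 0.125)
    s["quality"] -= 1.0
def retrain_fn(s):
    if s["sparsity"] <= 0.5:
        s["quality"] = 30.0
final = iterative_prune(state, target_quality=30.0, sparsity_step=0.125,
                        max_steps=3, prune_fn=prune_fn, retrain_fn=retrain_fn,
                        quality_fn=lambda s: s["quality"],
                        sparsity_fn=lambda s: s["sparsity"])
# Pruning stops at 0.625 sparsity, where retraining no longer recovers 30 dB.
```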

4 Experimental Result

4.1 Experiment Setup

We investigate both SID for low-light photography and EDSR (baseline network, ×2 scale) for super resolution. SID uses its own dataset [4] and EDSR adopts the DIV2K dataset [1]. The input size of the network is set to the maximum image resolution in the datasets, 1424×2128 for SID and 1020×1020 for DIV2K, to calculate MAC and BW.

In the SID dataset, we use images captured by the Sony α7S II camera as our training and validation data, which contain 280 and 93 pairs, respectively. The pre-processing stage is aligned with the setting in the SID paper. In the DIV2K dataset, we use the pre-processing setting mentioned in [16] to generate 5,458,040 training patches from 800 training images and use 100 validation images as our validation data.


| Network | Solution | MAC | # of Weights | # of Activations | BW | PSNR | SSIM |
|---------|----------|-------------|--------------|------------------|-------------|-------|-------|
| SID  | Original | 560 (100%)  | 7757 (100%)  | 1915 (100%) | 1922 (100%) | 28.54 | 0.767 |
| SID  | Method-A | 458 (82%)   | 6918 (89%)   | 1632 (85%)  | 1639 (85%)  | 28.54 | 0.768 |
| SID  | Method-B | 354 (63%)   | 5275 (68%)   | 1485 (78%)  | 1491 (78%)  | 28.54 | 0.771 |
| SID  | Method-C | 270 (48%)   | 5584 (72%)   | 1219 (64%)  | 1225 (64%)  | 28.54 | 0.769 |
| SID  | Method-D | 236 (42%)   | 4241 (55%)   | 1169 (61%)  | 1173 (61%)  | 28.55 | 0.768 |
| EDSR | Original | 1428 (100%) | 1367 (100%)  | 5076 (100%) | 5077 (100%) | 34.42 | 0.942 |
| EDSR | Method-A | 1085 (76%)  | 1037 (76%)   | 4481 (88%)  | 4481 (88%)  | 34.43 | 0.942 |
| EDSR | Method-B | 1085 (76%)  | 1037 (76%)   | 4481 (88%)  | 4481 (88%)  | 34.43 | 0.942 |
| EDSR | Method-C | 1085 (76%)  | 1037 (76%)   | 4481 (88%)  | 4481 (88%)  | 34.43 | 0.942 |
| EDSR | Method-D | 897 (63%)   | 857 (63%)    | 4083 (80%)  | 4083 (80%)  | 34.42 | 0.942 |

Table 1: Detailed results. The method labels (Original, Method-A to Method-D) follow Sec. 4.2. BW, considering only convolutional layers, consists of both weights and activations. Each weight and activation is represented with 4-byte floating-point numerical precision.
Figure 4: SID results of our method compared to the original (without pruning). (a)(c) Original (PSNR: 28.54, SSIM: 0.767). (b)(d) Pruned with Method-D (PSNR: 28.55, SSIM: 0.768).
Figure 5: EDSR results of our method compared to the original (without pruning). (a) Image sampled from DIV2K. (b) Original (PSNR: 34.42, SSIM: 0.942). (c) Proposed Method-D (PSNR: 34.42, SSIM: 0.942).
Figure 6: Pruned output channel per layer on SID
Figure 7: Pruned output channel per layer on EDSR

4.2 Result

A comprehensive analysis is elaborated in Table 1. We evaluate four distinct approaches. Method-A stands for magnitude-threshold pruning without any structural hints. Method-B keeps the depth of the network (Sec. 3.1.1). Method-C further considers the MAC/weight ratio (Sec. 3.1.2) on the basis of Method-B. Method-D integrates all proposed techniques. All methods are conducted under the quality-metric constraints (Sec. 3.2). For both SID and EDSR, no PSNR or SSIM drop is observed with any method. Fig. 4 and Fig. 5 show that the quality difference is indistinguishable.

Keep Layer Depth. In SID, Method-B reduces the remaining MAC to 63% of the original, compared to 82% for Method-A. As shown in Fig. 6, Method-A may remove all output channels of a layer because it does not keep layer depth, which leads to severe quality-metric drops that cannot be recovered in the retraining steps.

Enhance MAC Efficiency. In SID, Fig. 6 shows that Method-C prunes more weights on the top and bottom layers, which have larger MAC/weight (Eq. 1). Therefore, Method-C reduces the remaining MAC to 48% of the original, compared to 63% for Method-B.

Balance Pruned Output Channel. In SID, Method-D increases weight sparsity by 17% but reduces MAC by only 6% compared to Method-C. Fig. 6 illustrates that Method-D prunes less on the top and bottom layers, which have larger MAC/weight.

In EDSR, there is no difference among Method-A, Method-B and Method-C: no layer is fully pruned by Method-A, and MAC/weight is identical for all layers (the last layer cannot be pruned), as shown in Fig. 2. Fig. 7 shows that Method-D reduces the remaining MAC from 76% to 63%, a further reduction of more than 10%, in shortcut-connected layers.

In summary, our methodology significantly reduces both MAC and BW, which implies reduced inference latency. For BW, we obtain 39% and 20% reduction on SID and EDSR, respectively. Our methodology also works well on complex network architectures.

5 Conclusion

To minimize computational complexity without quality drop in vision quality applications, our architecture-aware pruning prunes more aggressively where the complexity metric (e.g., MAC) is high on SID and on shortcut-connected layers on EDSR. The MAC of SID and EDSR is reduced by 58% and 37%, respectively. Memory bandwidth is also reduced without degradation of PSNR, SSIM or subjective quality. The reduction of computational complexity and memory bandwidth can benefit general mobile devices without special hardware design.


  • [1] E. Agustsson and R. Timofte (2017-07) NTIRE 2017 challenge on single image super-resolution: dataset and study. Cited by: §4.1.
  • [2] S. Anwar and W. Sung (2016) Compact deep convolutional neural networks with coarse pruning. CoRR abs/1610.09639. External Links: Link, 1610.09639 Cited by: §2.
  • [3] G. Castellano, A. M. Fanelli, and M. Pelillo (1997-05) An iterative pruning algorithm for feedforward neural networks. IEEE Transactions on Neural Networks 8 (3), pp. 519–531. External Links: Document, ISSN 1045-9227 Cited by: §2.
  • [4] C. Chen, Q. Chen, J. Xu, and V. Koltun (2018) Learning to see in the dark. CoRR abs/1805.01934. External Links: Link, 1805.01934 Cited by: §1, §1, §4.1.
  • [5] A. P. Engelbrecht (2001-11) A new pruning heuristic based on variance analysis of sensitivity information. Trans. Neur. Netw. 12 (6), pp. 1386–1399. External Links: ISSN 1045-9227, Link, Document Cited by: §2.
  • [6] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik (2013) Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR abs/1311.2524. External Links: Link, 1311.2524 Cited by: §1.
  • [7] A. Gordon, E. Eban, O. Nachum, B. Chen, T. Yang, and E. Choi (2017) MorphNet: fast & simple resource-constrained structure learning of deep networks. CoRR abs/1711.06798. External Links: Link, 1711.06798 Cited by: §2.
  • [8] S. Han, J. Pool, J. Tran, and W. J. Dally (2015) Learning both weights and connections for efficient neural networks. CoRR abs/1506.02626. External Links: Link, 1506.02626 Cited by: §2, §2.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2015) Deep residual learning for image recognition. CoRR abs/1512.03385. External Links: Link, 1512.03385 Cited by: §3.1.3.
  • [10] Y. He and S. Han (2018) ADC: automated deep compression and acceleration with reinforcement learning. CoRR abs/1802.03494. External Links: Link, 1802.03494 Cited by: §2.
  • [11] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861. External Links: Link, 1704.04861 Cited by: §1.
  • [12] H. Hu, R. Peng, Y. Tai, and C. Tang (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. CoRR abs/1607.03250. External Links: Link, 1607.03250 Cited by: §2.
  • [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 1097–1105. External Links: Link Cited by: §1.
  • [14] Y. Le Cun, J. S. Denker, and S. A. Solla (1989) Optimal brain damage. In Proceedings of the 2Nd International Conference on Neural Information Processing Systems, NIPS’89, Cambridge, MA, USA, pp. 598–605. External Links: Link Cited by: §2.
  • [15] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf (2016) Pruning filters for efficient convnets. CoRR abs/1608.08710. External Links: Link, 1608.08710 Cited by: §2.
  • [16] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee (2017) Enhanced deep residual networks for single image super-resolution. CoRR abs/1707.02921. External Links: Link, 1707.02921 Cited by: §1, §1, §4.1.
  • [17] J. Long, E. Shelhamer, and T. Darrell (2014) Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038. External Links: Link, 1411.4038 Cited by: §1.
  • [18] H. Mao, S. Han, J. Pool, W. Li, X. Liu, Y. Wang, and W. J. Dally (2017) Exploring the regularity of sparse structure in convolutional neural networks. CoRR abs/1705.08922. External Links: Link, 1705.08922 Cited by: §2.
  • [19] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz (2016) Pruning convolutional neural networks for resource efficient transfer learning. CoRR abs/1611.06440. External Links: Link, 1611.06440 Cited by: §2.
  • [20] A. Polyak and L. Wolf (2015) Channel-level acceleration of deep face representations. IEEE Access 3 (), pp. 2163–2175. External Links: Document, ISSN 2169-3536 Cited by: §2.
  • [21] R. Reed (1993-Sep.) Pruning algorithms-a survey. IEEE Transactions on Neural Networks 4 (5), pp. 740–747. External Links: Document, ISSN 1045-9227 Cited by: §1.
  • [22] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. CoRR abs/1505.04597. External Links: Link, 1505.04597 Cited by: §3.1.2.
  • [23] S. Srinivas, A. Subramanya, and R. V. Babu (2016) Training sparse neural networks. CoRR abs/1611.06694. External Links: Link, 1611.06694 Cited by: §2.
  • [24] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li (2016) Learning structured sparsity in deep neural networks. CoRR abs/1608.03665. External Links: Link, 1608.03665 Cited by: §2.
  • [25] T. Yang, Y. Chen, and V. Sze (2016) Designing energy-efficient convolutional neural networks using energy-aware pruning. CoRR abs/1611.05128. External Links: Link, 1611.05128 Cited by: §2.