CNNs have been adopted as an essential ingredient in computer vision and machine learning [13, 6, 17]. Vision perception tasks, including image classification, object detection, and semantic segmentation, have been comprehensively investigated with CNNs. Even in image processing fields such as super resolution, high dynamic range imaging, and de-noising, CNNs have delivered progressive and promising improvements in image quality in recent years [4, 16].
However, vision quality tasks demand far higher computational complexity and bandwidth (BW) than perception tasks. MobileNetV1  is designed with 569M MACs for ImageNet classification, whereas low-light photography (SID) and super resolution (EDSR) take 560G MACs and 1.4T MACs per inference, respectively. This makes it more challenging to deploy CNN models for vision quality applications on mobile devices.
Network pruning  is an effective methodology for performance optimization. Sparsity is defined as the number of zero weights divided by the total number of weights. A better pruning algorithm delivers higher sparsity and correspondingly reduces more MACs and BW. However, quality drop is one of the major challenges in network pruning: Fig. 1 shows a visible defect on SID even with only 0.1 dB PSNR degradation.
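As a minimal sketch of the sparsity definition above (the function name and NumPy layout are our own, not from the paper):

```python
import numpy as np

def sparsity(weights):
    """Sparsity = (# zero weights) / (# total weights)."""
    w = np.concatenate([np.ravel(t) for t in weights])
    return np.count_nonzero(w == 0) / w.size

# Toy example: two "layers" of weights, three of six entries are zero.
layers = [np.array([0.0, 0.5, -0.2, 0.0]), np.array([0.0, 1.0])]
```

Here `sparsity(layers)` evaluates to 0.5, i.e. half of the weights are zero and the corresponding MACs could in principle be skipped.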
In this paper, we propose architecture-aware pruning to maximize sparsity and MAC reduction without quality-metric (PSNR or SSIM) drops. We also analyze the effects of MAC and BW reduction with different configurations associating with pruned structures. The proposed method focus on algorithms including but not limited to SID and EDSR.
2 Related Works
Network pruning has been widely explored in the existing literature. To answer which weights should be pruned, some works add evaluation terms to the loss function, such as group lasso and MAC regularization . However, it is difficult to find a proper ratio between the additional pruning-related loss and the original loss. Other works create standalone evaluation functions, including sensitivity [14, 5] and weight magnitude . The sensitivity method computes the impact of weights on the training loss and removes low-impact weights. The weight magnitude method simply prunes a weight if its absolute value is less than a threshold, which is easier to apply to large-scale CNNs. In this work, we use the weight magnitude method to prune the network.
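The weight magnitude criterion can be sketched in a few lines (a hypothetical helper, assuming NumPy; not the paper's implementation):

```python
import numpy as np

def magnitude_prune(w, threshold):
    """Zero every weight whose absolute value falls below the threshold."""
    pruned = w.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

w = np.array([0.01, -0.30, 0.05, 0.80, -0.02])
```

With `threshold=0.1`, only the two large-magnitude weights survive; the rest are set to zero and become candidates for removal.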
Pruning granularity. There are two granularities of pruning: fine-grained pruning [8, 23] and coarse-grained pruning [2, 19, 20, 15]. The fine-grained method prunes individual weights (i.e., within a filter kernel), whereas the coarse-grained method considers network structures (i.e., along the output and input channels). According to , fine-grained pruning needs additional dedicated hardware to handle irregular sparsity. The coarse-grained method may obtain a higher compression ratio without the need for a compression header . Therefore, we focus on coarse-grained output-channel-wise pruning.
3 Proposed Method
3.1 Architecture-aware Pruning
An output channel is pruned if its maximum absolute weight value is less than the magnitude threshold. For a convolutional layer, the weight kernel has tensor shape $K \times K \times C_{in} \times C_{out}$, where $C_{in}$ is the number of input channels, $C_{out}$ is the number of output channels, and $K$ is the kernel size. Output-channel-wise pruning removes the weights along output channels. The kernel shape becomes $K \times K \times C_{in} \times (C_{out} - P_{out})$ if $P_{out}$ output channels are pruned. The output-channel pruned ratio is defined as $P_{out} / C_{out}$.
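The channel-selection rule can be illustrated as follows (a sketch under our own assumptions: NumPy kernels laid out as `(K, K, C_in, C_out)`, names hypothetical):

```python
import numpy as np

def prune_output_channels(kernel, threshold):
    """Remove output channels whose maximum absolute weight is below
    the magnitude threshold. kernel shape: (K, K, C_in, C_out)."""
    max_abs = np.abs(kernel).max(axis=(0, 1, 2))  # one value per output channel
    keep = max_abs >= threshold
    return kernel[:, :, :, keep], keep

# Toy kernel: channels 1 and 3 are all-zero, channels 0 and 2 carry weight.
kernel = np.zeros((3, 3, 2, 4))
kernel[..., 0] = 1.0
kernel[..., 2] = 0.5
```

Pruning with `threshold=0.1` keeps channels 0 and 2, shrinking the kernel from `(3, 3, 2, 4)` to `(3, 3, 2, 2)`, i.e. an output-channel pruned ratio of 2/4.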
Once output channels of a layer are pruned, the corresponding input channels of the following layer are also removed. We define one layer's sparsity as $1 - \frac{(C_{in} - P_{in})(C_{out} - P_{out})}{C_{in} C_{out}}$, where $P_{in}$ is the number of pruned input channels of the layer. The network sparsity is defined as the number of zero weights of the pruned network divided by the total number of weights of the original network.
3.1.1 Keep Layer Depth
Usually, in vision quality applications, each layer in the network is semantically designed for quality-sensitive primitives, such as edge and chroma, with respect to different resolutions. Removing a layer entirely can severely degrade the quality. Therefore, we keep the network architecture by preserving a minimum number of output channels per layer.
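One way to realize this guard is to fall back to the largest-magnitude channels whenever the threshold would empty a layer (a sketch with hypothetical names; the paper does not spell out this mechanism):

```python
def channels_to_keep(max_abs_per_channel, threshold, min_channels=1):
    """Prune by magnitude, but always preserve at least `min_channels`
    output channels so that no layer is removed entirely."""
    keep = [m >= threshold for m in max_abs_per_channel]
    if sum(keep) < min_channels:
        # Threshold would empty the layer: keep the strongest channels instead.
        order = sorted(range(len(max_abs_per_channel)),
                       key=lambda i: max_abs_per_channel[i], reverse=True)
        keep = [False] * len(max_abs_per_channel)
        for i in order[:min_channels]:
            keep[i] = True
    return keep
```

If every channel falls below the threshold, the layer still retains its single strongest channel, so the network depth is unchanged.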
3.1.2 Enhance MAC Efficiency
A pruned network with higher weight sparsity does not necessarily imply higher computation reduction. We define MAC/weight, $E = M/W$ (Eq. 1), for each layer as an indicator of MAC efficiency, where $M$ and $W$ are the numbers of MACs and weights of the layer, respectively. Fig. 2 shows that MAC/weight is much larger for both the top and bottom layers of SID because of its U-Net  network topology. Therefore, to productively reduce computation, output channels of a layer with higher $E$ tend to be pruned more because of a higher magnitude threshold.
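For a plain convolution (no bias, stride folded into the output size) the ratio has a closed form, sketched below with our own naming; removing one weight from a high-$E$ layer saves proportionally more MACs:

```python
def conv_mac_per_weight(k, c_in, c_out, out_h, out_w):
    """MAC/weight (Eq. 1) for a conv layer:
    M = k*k*c_in*c_out*out_h*out_w, W = k*k*c_in*c_out,
    so the ratio reduces to the output spatial size out_h*out_w."""
    macs = k * k * c_in * c_out * out_h * out_w
    weights = k * k * c_in * c_out
    return macs / weights
```

For example, a 3x3 convolution producing a 128x128 feature map has $E = 16384$ MACs per weight, regardless of its channel counts.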
3.1.3 Balance Pruned Output Channel
Residual blocks are universally used in network topology design, for example in EDSR, which is a variation of ResNet  with a long shortcut. However, pruning output channels from a residual block is arduous because of element-wise operations (i.e., element-wise ADD) or concatenation. Fig. 3 illustrates an example in which a magnitude threshold is applied to a 4-layer residual block. Because of the element-wise ADD after layers Conv-B and Conv-D, the output channels of a given layer (Conv-D) and its preceding layer (Conv-B) with the same index (5) must be grouped and pruned at the same time. As a result, layers Conv-B and Conv-D have fewer pruned output channels than layers Conv-A and Conv-C.
We propose a guidance (Eq. 2) that makes output channels of a residual block easier to prune by increasing the magnitude threshold on layers with a lower ratio of pruned output channels. The MAC efficiency mentioned in Sec. 3.1.2 is also applied. $r_l$ is the ratio of pruned output channels of layer $l$, and $T$ is the original magnitude-threshold base. Thus, output channels of a layer with lower $r_l$ have a higher tendency to be pruned.
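The grouping constraint itself can be expressed compactly: a channel index may be dropped only if it is below threshold in every layer feeding the same element-wise ADD (a sketch with our own NumPy layout `(K, K, C_in, C_out)` and hypothetical names):

```python
import numpy as np

def grouped_keep_mask(kernels, threshold):
    """For layers whose outputs are summed element-wise (e.g. Conv-B and
    Conv-D in a residual block), channel i can be pruned only if its
    maximum absolute weight is below the threshold in *every* layer.
    kernels: arrays shaped (K, K, C_in, C_out) with identical C_out."""
    masks = [np.abs(k).max(axis=(0, 1, 2)) >= threshold for k in kernels]
    return np.logical_or.reduce(masks)  # keep the channel if any layer needs it

# Toy case: each layer alone would keep one channel; grouped, two survive.
k1 = np.zeros((1, 1, 1, 3)); k1[0, 0, 0] = [0.50, 0.01, 0.01]
k2 = np.zeros((1, 1, 1, 3)); k2[0, 0, 0] = [0.01, 0.40, 0.01]
```

This is exactly why shortcut-connected layers end up with fewer pruned channels, motivating the per-layer threshold adjustment of Eq. 2.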
3.2 Quality Metric Guarantee
To maintain the quality metrics (PSNR and SSIM) while maximizing pruned MAC, our algorithm prunes and retrains the network iteratively. The iteration terminates when either the target quality-metric criteria can no longer be met or the maximum number of training steps is reached. The overall flow is shown in Algorithm 1.
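The iterate-until-quality-breaks loop can be sketched as follows (all callables are placeholders standing in for the paper's prune, retrain, and evaluation steps; this is not Algorithm 1 verbatim):

```python
def prune_with_quality_guarantee(model, prune_step, retrain, evaluate,
                                 target_psnr, max_steps):
    """Alternately prune and retrain, keeping the last model that still
    meets the target quality metric; stop when retraining can no longer
    recover quality or the training-step budget is exhausted."""
    best = model
    steps = 0
    while steps < max_steps:
        candidate = prune_step(best)
        candidate, used = retrain(candidate)   # returns (model, steps used)
        steps += used
        if evaluate(candidate) < target_psnr:
            break  # quality criterion violated; keep the last good model
        best = candidate
    return best
```

A toy run with scalar stand-ins (each prune costs 1.0 dB, each retrain recovers 0.4 dB) converges just above the 38 dB target before stopping.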
4 Experimental Results
4.1 Experiment Setup
We investigate both SID for low-light photography and EDSR (baseline network, x2) for super resolution. SID uses its own dataset  and EDSR adopts the DIV2K dataset . The input size of the network is set to the maximum image resolution in the datasets, 1424x2128 for SID and 1020x1020 for DIV2K, to calculate MAC and BW.
In the SID dataset, we use images captured by the Sony α7S II camera as our training and validation data, which contain 280 and 93 pairs, respectively. The pre-processing stage is aligned with the setting in the SID paper. For DIV2K, we use the pre-processing setting mentioned in  to generate 5,458,040 training patches from the 800 training images and use the 100 validation images as our validation data.
The comprehensive analysis is elaborated in Table 1.
We evaluate four distinct approaches.
Method-A stands for magnitude threshold pruning without any structural hints. Method-B keeps the depth of network (Sec. 3.1.1).
Method-C further considers MAC/weight ratio (Sec. 3.1.2) on the basis of Method-B.
Method-D integrates all proposed techniques. All methods are conducted in company with quality-metric constraints (Sec. 3.2).
For both SID and EDSR, no PSNR or SSIM drop is observed with any method. Fig. 4 and Fig. 5 show no distinguishable quality difference.
Keep Layer Depth. In SID, Method-B reduces the remaining MAC from 82% to 63% of the original, compared to Method-A. As shown in Fig. 6, Method-A may remove all the output channels of a layer because it does not keep layer depth, which leads to severe quality-metric drops that cannot be recovered in the retraining steps.
Enhance MAC Efficiency. In SID, Fig. 6 shows that Method-C prunes more weights in both the top and bottom layers, which have larger MAC/weight (Eq. 1). Therefore, Method-C reduces the remaining MAC from 63% to 48%, compared to Method-B.
Balance Pruned Output Channel. In SID, Method-D increases weight sparsity by 17% but reduces the remaining MAC by only 6%, compared to Method-C. Fig. 6 illustrates that Method-D prunes less in the top and bottom layers, which have larger MAC/weight. In EDSR, there is no difference among Method-A, Method-B, and Method-C because no layer is entirely pruned by Method-A and MAC/weight is identical for all layers (the last layer cannot be pruned), as shown in Fig. 2. Fig. 7 shows that Method-D reduces the remaining MAC from 76% to 63%, a reduction of more than 10%, in the shortcut-connected layers.
In summary, our methodology achieves significant reductions in both MAC and BW, which implies reduced inference latency. For BW, we achieve 39% and 20% reductions on SID and EDSR, respectively. Our methodology also works well on complex network architectures.
5 Conclusion
To minimize computational complexity without quality drop in vision quality applications, our architecture-aware pruning prunes more aggressively where the complexity metric (e.g., MAC) is high in SID and in the shortcut-connected layers of EDSR. The MACs of SID and EDSR are reduced by 58% and 37%, respectively. Memory bandwidth is also reduced without degradation of PSNR, SSIM, or subjective quality. The reduction of computational complexity and memory bandwidth can benefit general mobile devices without special hardware design.
References
- (2017-07) NTIRE 2017 challenge on single image super-resolution: dataset and study.
- (2016) Compact deep convolutional neural networks with coarse pruning. CoRR abs/1610.09639.
- (1997-05) An iterative pruning algorithm for feedforward neural networks. IEEE Transactions on Neural Networks 8 (3), pp. 519–531.
- (2018) Learning to see in the dark. CoRR abs/1805.01934.
- (2001-11) . Trans. Neur. Netw. 12 (6), pp. 1386–1399.
- (2013) Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR abs/1311.2524.
- (2017) MorphNet: fast & simple resource-constrained structure learning of deep networks. CoRR abs/1711.06798.
- (2015) Learning both weights and connections for efficient neural networks. CoRR abs/1506.02626.
- (2015) Deep residual learning for image recognition. CoRR abs/1512.03385.
- ADC: automated deep compression and acceleration with reinforcement learning. CoRR abs/1802.03494.
- (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861.
- Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. CoRR abs/1607.03250.
- (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1097–1105.
- (1989) Optimal brain damage. In Proceedings of the 2nd International Conference on Neural Information Processing Systems, NIPS'89, pp. 598–605.
- (2016) Pruning filters for efficient convnets. CoRR abs/1608.08710.
- (2017) Enhanced deep residual networks for single image super-resolution. CoRR abs/1707.02921.
- (2014) Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038.
- (2017) Exploring the regularity of sparse structure in convolutional neural networks. CoRR abs/1705.08922.
- Pruning convolutional neural networks for resource efficient transfer learning. CoRR abs/1611.06440.
- (2015) Channel-level acceleration of deep face representations. IEEE Access 3, pp. 2163–2175.
- (1993-09) Pruning algorithms: a survey. IEEE Transactions on Neural Networks 4 (5), pp. 740–747.
- (2015) U-Net: convolutional networks for biomedical image segmentation. CoRR abs/1505.04597.
- (2016) Training sparse neural networks. CoRR abs/1611.06694.
- (2016) Learning structured sparsity in deep neural networks. CoRR abs/1608.03665.
- (2016) Designing energy-efficient convolutional neural networks using energy-aware pruning. CoRR abs/1611.05128.