1 Introduction
Channel pruning has been recognized as an effective neural network compression/acceleration method [27, 19, 2, 3, 18, 48] and is widely used in industry. A typical pruning approach contains three stages: training a large over-parameterized network, pruning the less-important weights or channels, and fine-tuning or retraining the pruned network. The second stage is the key. It usually performs iterative layer-wise pruning and fast fine-tuning or weight reconstruction to retain the accuracy [15, 1, 28, 36].
Conventional channel pruning methods mainly rely on data-driven sparsity constraints [24, 30] or human-designed policies [19, 27, 35, 21, 33, 2]. Recent AutoML-style works automatically prune channels in an iterative mode, based on a feedback loop [48] or reinforcement learning [18]. Compared with conventional pruning methods, the AutoML methods save human effort and can directly optimize metrics such as hardware latency.
Apart from the idea of keeping the important weights in the pruned network, a recent study [31] finds that the pruned network can achieve the same accuracy whether or not it inherits the weights of the original network. This finding suggests that the essence of channel pruning is finding a good pruning structure, i.e., the layer-wise channel numbers.
However, exhaustively searching for the optimal pruning structure is computationally prohibitive. Consider a network with 10 layers, each containing 32 channels: the number of possible combinations of layer-wise channel numbers could be $32^{10}$. Inspired by recent Neural Architecture Search (NAS) work, specifically the One-Shot model [5], as well as the weight prediction mechanism in HyperNetworks [13], we propose to train a PruningNet that can generate weights for all candidate pruned network structures, so that we can search for good-performing structures by simply evaluating their accuracy on the validation data, which is highly efficient.
To train the PruningNet, we use stochastic training. As shown in Figure 1, the PruningNet generates the weights for pruned networks from the corresponding network encoding vectors, each of which gives the number of channels in each layer. By stochastically feeding in different network encoding vectors, the PruningNet gradually learns to generate weights for various pruned structures. After training, we search for good-performing Pruned Networks with an evolutionary search method, which can flexibly incorporate various hard constraints such as computation FLOPs or hardware latency. Moreover, by directly searching for the best pruned network via determining the channels for each layer or each stage, we can prune channels in the shortcut without extra effort, which is seldom addressed in previous channel pruning solutions. We name the proposed method MetaPruning.
We apply our approach to MobileNets [20, 41]. At the same FLOPs, our accuracy is 2.2% to 6.6% higher than MobileNet V1 and 0.7% to 3.7% higher than MobileNet V2. At the same latency, our accuracy is 2.1% to 9.0% higher than MobileNet V1 and 1.2% to 9.9% higher than MobileNet V2. Compared with state-of-the-art AutoML-based channel pruning methods [18, 48], MetaPruning produces superior or comparable results.
Our contributions are four-fold:


We propose a meta learning approach, MetaPruning, for channel pruning. The core of this approach is learning a meta network (named PruningNet) that generates weights for various pruning structures. With a single trained PruningNet, we can obtain various pruned networks under different constraints.

Compared to conventional pruning methods, MetaPruning frees humans from cumbersome hyper-parameter tuning and enables direct optimization of the desired metrics.

Compared to other AutoML methods, MetaPruning can easily enforce hard constraints in the search for desired structures, without manually tuning reinforcement learning hyper-parameters.

The meta learning approach is able to effortlessly prune the channels in the shortcuts of ResNet-like structures, which is non-trivial because the channels in a shortcut affect more than one layer.
2 Related Works
There are extensive studies on compressing and accelerating neural networks, such as quantization [38, 32], pruning [19, 25, 14] and compact network design [20, 41, 50, 34]. A comprehensive survey is provided in [43]. Here, we summarize the approaches that are most related to our work.
Pruning. Network pruning is a prevalent approach for removing redundancy in DNNs. In weight pruning, individual weights are pruned to compress the model size [25, 16, 14, 12]. However, weight pruning results in unstructured sparse filters, which can hardly be accelerated by general-purpose hardware. Recent works [21, 27, 35, 19, 33, 49] focus on channel pruning in CNNs, which removes entire weight filters instead of individual weights. Traditional channel pruning methods trim channels based on the importance of each channel, either in an iterative mode [19, 33] or by adding a data-driven sparsity constraint [24, 30]. In most traditional channel pruning methods, the compression ratio for each layer needs to be set manually based on human expertise or heuristics, which is time-consuming and prone to being trapped in sub-optimal solutions.
AutoML. Recently, AutoML methods [18, 48] take the real-time inference latency on multiple devices into account to iteratively prune channels in different layers of a network via reinforcement learning [18] or an automatic feedback loop [48]. Compared with traditional channel pruning methods, AutoML methods help to alleviate the manual effort of tuning the hyper-parameters in channel pruning. Our proposed MetaPruning also involves little human participation. Unlike previous AutoML pruning methods, which are carried out in a layer-wise pruning and fine-tuning loop, our method is motivated by recent findings [31], which suggest that instead of selecting “important” weights, the essence of channel pruning sometimes lies in identifying the best pruned network. From this perspective, we propose MetaPruning for directly finding the optimal pruned network structures. Compared to previous AutoML pruning methods [18, 48], MetaPruning enjoys higher flexibility in precisely meeting the constraints and is able to prune the channels in the shortcut.
Meta Learning. Meta-learning refers to learning from observing how different machine learning approaches perform on various learning tasks. Meta learning can be used in few/zero-shot learning [39, 11, 44]. A comprehensive overview of meta learning is provided in [26]. In this work, we are inspired by [13] to use meta learning for weight prediction. Weight prediction means that the weights of a neural network are predicted by another neural network rather than directly learned [13]. Recent works also apply meta learning to various tasks and achieve state-of-the-art results in detection [47], super-resolution with arbitrary magnification [23], and instance segmentation [22].
Neural Architecture Search. Studies on neural architecture search try to find the optimal network structures and hyper-parameters with reinforcement learning [51, 4, 46, 37, 40] or gradient-based approaches [29, 45]. Parameter sharing [7, 5, 45, 29] and weight prediction [6, 10] methods are also extensively studied in neural architecture search. One-shot architecture search [5] uses an over-parameterized network with multiple operation choices in each layer. By jointly training multiple choices with drop-path, it can search for the path with the highest accuracy in the trained network, which also inspired our two-step pruning pipeline. Tuning channel width is also included in some neural architecture search methods. ChamNet [8] builds an accuracy predictor atop a Gaussian Process with Bayesian optimization to predict the network accuracy for various channel widths, expansion ratios, and numbers of blocks in each stage. Despite its high accuracy, building such an accuracy predictor requires a substantial amount of computational power. FBNet [45] and ProxylessNAS [7] include blocks with several different middle channel choices in the search space. Unlike in neural architecture search, in the channel pruning task the channel width choices in each layer are continuous, which makes enumerating every channel width choice as an independent operation infeasible. The proposed MetaPruning, which targets channel pruning, is able to solve this continuous channel pruning challenge by training the PruningNet with weight prediction, which will be explained in Sec. 3.
3 Methodology
In this section, we introduce our meta learning approach for automatically pruning channels in deep neural networks such that the pruned network meets various hard constraints.
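We formulate the channel pruning problem as
$$(c_1, c_2, \dots, c_l)^* = \arg\min_{c_1, c_2, \dots, c_l} \mathcal{L}(\mathcal{A}(c_1, c_2, \dots, c_l; w)) \quad \text{s.t.} \quad \mathcal{C} < \text{constraint}, \qquad (1)$$
where $\mathcal{A}$ is the network before pruning. We try to find the pruned network channel widths $(c_1, c_2, \dots, c_l)$ for the 1st layer to the $l$-th layer that give the minimum loss $\mathcal{L}$ after the weights $w$ are trained, with the cost $\mathcal{C}$ meeting the hard constraint (i.e., FLOPs or latency).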
To achieve this, we propose to construct a PruningNet, a kind of meta network, with which we can quickly estimate the quality of any potential pruned network structure by evaluating it on the validation data only. We can then apply any search method, which is an evolutionary algorithm in this paper, to search for the best pruned network.
3.1 PruningNet training
Channel pruning is non-trivial because of the layer-wise dependence among channels: pruning one channel may significantly influence the following layers and, in turn, degrade the overall accuracy. Previous methods try to decompose the channel pruning problem into sub-problems of pruning the unimportant channels layer by layer [19] or adding sparsity regularization [24]. AutoML methods prune channels automatically with a feedback loop [48] or reinforcement learning [18]. Among those methods, how to prune channels in the shortcut is seldom addressed. Most previous methods prune only the middle channels in each block [48, 18], which limits the overall compression ratio unless the input image resolution is manually reduced.
Carrying out the channel pruning task with consideration of the overall pruned network structure is beneficial for finding optimal solutions and can solve the shortcut pruning problem. However, obtaining the best pruned network is not straightforward: even for a small network with 10 layers, each containing 32 channels, the number of possible pruned network structures is huge.
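To give a sense of scale, the count for this toy network can be worked out directly. The sketch below assumes each layer may keep any number of channels from 1 to 32; the evaluation rate is a made-up figure used only for illustration.

```python
# Count the candidate pruned structures for the toy network in the text:
# 10 layers, each assumed free to keep between 1 and 32 channels.
num_layers = 10
choices_per_layer = 32

num_structures = choices_per_layer ** num_layers
print(f"{num_structures:,} candidate structures")

# Hypothetical evaluation budget (an assumption, not a measured figure).
evals_per_second = 1_000
seconds_per_year = 365 * 24 * 3600
years = num_structures / evals_per_second / seconds_per_year
print(f"about {years:,.0f} years at {evals_per_second} evaluations per second")
```

Even under this generous assumed rate, exhaustive evaluation would take tens of thousands of years, which is why a search procedure is needed.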
Inspired by the recent work [31], which suggests that the weights left by pruning are not important compared to the pruned network structure, we are motivated to directly find the best pruned network structure. In this sense, we may directly predict the optimal pruned network without iteratively deciding which weight filters are important. To achieve this goal, we construct a meta network, PruningNet, which provides reasonable weights for various pruned network structures so that their performance can be ranked.
The PruningNet is a meta network that takes a network encoding vector $(c_1, c_2, \dots, c_l)$ as input and outputs the weights of the pruned network:
$$W = \mathrm{PruningNet}(c_1, c_2, \dots, c_l) \qquad (2)$$
A PruningNet block consists of two fully-connected layers. In the forward pass, the PruningNet takes the network encoding vector (i.e., the number of channels in each layer) as input and generates the weight matrix. Meanwhile, a Pruned Network is constructed with the output channel width of each layer equal to the corresponding element of the network encoding vector. The generated weight matrix is cropped to match the number of input and output channels in the Pruned Network. Given a batch of input images, we can calculate the loss of the Pruned Network with the generated weights.
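The forward pass of one PruningNet block (generate, reshape, crop) can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the class name `PruningNetBlock`, the layer sizes, the ReLU between the two fully-connected layers, and the restriction to a 2-D weight matrix (as for a 1x1 convolution) are all assumptions.

```python
import random


def linear(x, weight, bias):
    """Fully-connected layer on plain lists: y = weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class PruningNetBlock:
    """Toy weight generator for one layer (all sizes are hypothetical)."""

    def __init__(self, encoding_len, max_out, max_in, hidden=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.max_out, self.max_in = max_out, max_in
        self.w1, self.b1 = init(hidden, encoding_len), [0.1] * hidden
        self.w2, self.b2 = init(max_out * max_in, hidden), [0.0] * (max_out * max_in)

    def generate(self, encoding, out_ch, in_ch):
        # Two fully-connected layers map the encoding to a flat weight vector.
        hidden = relu(linear(encoding, self.w1, self.b1))
        flat = linear(hidden, self.w2, self.b2)
        # Reshape to (max_out, max_in), then crop the top-left (out_ch, in_ch) part.
        full = [flat[r * self.max_in:(r + 1) * self.max_in]
                for r in range(self.max_out)]
        return [row[:in_ch] for row in full[:out_ch]]


encoding = [2.0, 3.0, 5.0]   # channels kept in a 3-layer toy network
block = PruningNetBlock(encoding_len=3, max_out=6, max_in=4)
# Weights for the second layer: 2 input channels, 3 output channels.
weights = block.generate(encoding, out_ch=3, in_ch=2)
print(len(weights), len(weights[0]))  # 3 2
```

The generator always produces the full-size matrix and crops it afterwards, so every pruned structure reuses the same PruningNet parameters.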
In the backward pass, instead of updating the weights in the Pruned Networks, we calculate the gradients w.r.t. the weights in the PruningNet. Since the reshape operation, as well as the convolution between the output of the fully-connected layer in the PruningNet and the output of the previous convolutional layer in the Pruned Network, is differentiable, the gradients of the weights in the PruningNet can be easily calculated by the chain rule. The PruningNet is end-to-end trainable. The detailed structure of the PruningNet connected with the Pruned Network is shown in Figure 3.
Stochastic training. We propose to randomly change the network encoding vector in the training phase. At each iteration, the network encoding vector is generated by randomly choosing the number of channels in each layer. With different network encodings, different Pruned Networks are constructed and the corresponding weights are provided by the PruningNet. By stochastically training with different encoding vectors, the PruningNet learns to predict reasonable weights for Pruned Networks.
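The per-iteration sampling of encoding vectors might look like the sketch below. The function name `sample_encoding`, the lower bound `min_ratio`, the step `step_ratio`, and the layer widths are illustrative assumptions, not values taken from the text.

```python
import random


def sample_encoding(max_channels, min_ratio=0.25, step_ratio=0.05, rng=random):
    """Draw one network encoding vector: a channel count for every layer.

    min_ratio and step_ratio are illustrative assumptions.
    """
    encoding = []
    for c_max in max_channels:
        low = max(1, int(min_ratio * c_max))
        step = max(1, int(step_ratio * c_max))
        encoding.append(rng.choice(range(low, c_max + 1, step)))
    return encoding


max_channels = [32, 64, 128, 128, 256]   # hypothetical unpruned widths
rng = random.Random(0)
for iteration in range(3):
    encoding = sample_encoding(max_channels, rng=rng)
    # In training, this encoding would be fed to the PruningNet, the Pruned
    # Network would be built from it, and one gradient step would be taken.
    print(iteration, encoding)
```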
3.2 Pruned-Network search
After the PruningNet is trained, we can obtain the accuracy of each potential pruned network by inputting the network encoding into the PruningNet, generating the corresponding weights and doing the evaluation on the validation data.
Since the number of network encoding vectors is huge, we cannot enumerate them all. To find a pruned network with high accuracy under the constraint, we use an evolutionary search, which can easily incorporate any soft or hard constraints.
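A constrained evolutionary search of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: `toy_accuracy` is a hypothetical stand-in for validation accuracy obtained with PruningNet-generated weights, `toy_cost` is a crude FLOPs proxy, and the population size, top-k, and mutation probability are arbitrary choices.

```python
import random


def evolutionary_search(evaluate, cost, max_channels, budget,
                        population=20, top_k=5, iterations=10,
                        mutate_prob=0.2, seed=0):
    """Return the best gene (channel counts) found with cost(gene) <= budget."""
    rng = random.Random(seed)

    def random_gene():
        return [rng.randint(1, c) for c in max_channels]

    def mutate(gene):
        # Re-draw each element with probability mutate_prob.
        return [rng.randint(1, c) if rng.random() < mutate_prob else g
                for g, c in zip(gene, max_channels)]

    def crossover(a, b):
        # Take each element from one of the two parents at random.
        return [rng.choice(pair) for pair in zip(a, b)]

    def fill(genes, make):
        # Hard constraint: candidates over budget are simply discarded.
        while len(genes) < population:
            gene = make()
            if cost(gene) <= budget:
                genes.append(gene)
        return genes

    genes = fill([], random_gene)
    for _ in range(iterations):
        parents = sorted(genes, key=evaluate, reverse=True)[:top_k]

        def child():
            if rng.random() < 0.5:
                return mutate(rng.choice(parents))
            return crossover(*rng.sample(parents, 2))

        genes = fill(list(parents), child)
    return max(genes, key=evaluate)


def toy_cost(gene):
    # FLOPs proxy for a chain of 1x1 convolutions: sum of c_in * c_out.
    return sum(a * b for a, b in zip(gene, gene[1:]))


def toy_accuracy(gene):
    # Hypothetical stand-in for accuracy on the validation data.
    return sum(c ** 0.5 for c in gene)


best = evolutionary_search(toy_accuracy, toy_cost, max_channels=[32] * 6, budget=1500)
print(best, toy_cost(best))
```

Because infeasible candidates are rejected before they enter the population, every gene the search returns satisfies the budget exactly, with no penalty weight to tune.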
In the evolutionary algorithm used in MetaPruning, each pruned network is encoded as a vector of the channel numbers in each layer, called the gene of the pruned network. Under the hard constraint, we first randomly select a number of genes and obtain the accuracy of the corresponding pruned networks by evaluation. Then the top k genes with the highest accuracy are selected to generate new genes with mutation and crossover. Mutation is carried out by randomly changing a proportion of the elements in a gene. Crossover means that we randomly recombine the elements of two parent genes to generate an offspring. We can easily enforce the hard constraint by eliminating the unqualified genes. By repeating the top-k selection and new gene generation for several iterations, we obtain the gene that meets the hard constraints while achieving the highest accuracy. The detailed algorithm is described in Algorithm 1.
4 Experimental Results
In this section, we demonstrate the effectiveness of our proposed MetaPruning method. We first explain the experiment settings and introduce how to apply MetaPruning to MobileNet V1 [20] and V2 [41], which can be easily generalized to other network structures. Second, we compare our results with the uniform pruning baseline as well as two state-of-the-art AutoML channel pruning methods. Third, we visualize the pruned networks obtained with MetaPruning. Last, ablation studies are carried out to examine the effect of weight prediction in our method.
4.1 Experiment settings
The proposed MetaPruning is efficient to run, as it does not involve an iterative fine-tuning process and requires the same number of training epochs as training a network normally. Consequently, it is feasible to carry out all experiments on the ImageNet 2012 classification dataset [9].
Our MetaPruning method consists of two stages. In the first stage, we train the PruningNet from scratch with stochastic training. In the second stage, we use an evolutionary search algorithm to find the best pruned network and train it from scratch normally. For the training process in both stages, we use the same standard data augmentation strategies as ResNet [17] to process the input images and the same training scheme as [34]. The resolution of the input image is set to 224 × 224 for all experiments. We split the original training images into a sub-validation dataset, which contains 50000 images randomly selected from the training images with 50 images from each of the 1000 classes, and a sub-training dataset with the rest of the images. We train the PruningNet on the sub-training dataset and evaluate the performance of pruned networks on the sub-validation dataset in the searching phase. When evaluating a pruned network, we recalculate the running mean and running variance in the BatchNorm layers with 20000 sub-training images, which takes only a few seconds. After obtaining the best pruned network, we train it from scratch on the original training dataset and evaluate it on the test dataset.
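The class-balanced split into sub-training and sub-validation sets can be sketched as below. The function name `split_sub_validation` and the toy label list are assumptions for illustration; the real split uses 50 images from each of the 1000 ImageNet classes.

```python
import random
from collections import defaultdict


def split_sub_validation(labels, per_class, seed=0):
    """Return (sub_train_idx, sub_val_idx) with per_class held-out images per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    sub_val = []
    for indices in by_class.values():
        sub_val.extend(rng.sample(indices, per_class))
    held_out = set(sub_val)
    sub_train = [i for i in range(len(labels)) if i not in held_out]
    return sub_train, sorted(sub_val)


# Toy stand-in for the ImageNet training labels: 10 classes x 100 images.
labels = [c for c in range(10) for _ in range(100)]
sub_train, sub_val = split_sub_validation(labels, per_class=5)
print(len(sub_train), len(sub_val))  # 950 50
```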
4.2 MetaPruning on MobileNets
To demonstrate the effectiveness of our MetaPruning method, we apply it to MobileNets [20, 41], which are well-designed networks for mobile or embedded devices. Compared to other networks like ResNet [17] and VGG [42], MobileNets are more compact with less redundancy, and therefore harder to prune. MobileNets extensively use depth-wise convolutions followed by point-wise convolutions, which greatly reduces the redundancy of the original convolutions used in VGG and ResNet. In this work, we choose to prune the filters in MobileNets to demonstrate the effectiveness of our method and to make further improvements over MobileNets, so that we can provide off-the-shelf solutions for channel pruning on resource-limited devices. Of course, our method can be easily generalized to other network architectures.
4.2.1 MobileNet V1
MobileNet V1 is a network without shortcuts. The input vector to the PruningNet is the number of channels in each layer. A PruningNet block is composed of two concatenated fully-connected layers, and the number of PruningNet blocks equals the number of dwconv-pwconv blocks (i.e., two concatenated layers with a 3x3 depth-wise convolution and a 1x1 point-wise convolution) in MobileNet V1. In the PruningNet, we first decode the network encoding vector into the input compression ratio and output compression ratio of each dwconv-pwconv block. To generate each weight matrix, a vector is fed to the first fully-connected layer of the PruningNet block, which has an output size of 64. The second fully-connected layer uses this 64-dimensional encoding to output a vector of length $C_{out} \times C_{in} \times W \times H$, which we then reshape to $(C_{out}, C_{in}, W, H)$ as the weight matrix of the convolution layer, where $C_{out}$ and $C_{in}$ are the numbers of output and input channels and $W \times H$ is the kernel size.
In stochastic training, the number of output channels for each dwconv-pwconv block is randomly selected within a bounded range at a fixed step; a finer or coarser step can be chosen according to the desired pruning granularity. After the PruningNet takes the network encoding vector and generates the corresponding weight matrix, the top-left part of the generated weight matrix that matches the input and output channels is cropped and used in training, as shown in Figure 4 (a), and the rest of the weights can be regarded as ‘untouched’ in this iteration. In different iterations, different channel width encoding vectors are generated.
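The decoding step described above for MobileNet V1 could be sketched as follows. The helper `decode_ratios` and the channel widths are hypothetical; the sketch assumes each block reads the previous block's output and that the image input is never pruned.

```python
def decode_ratios(encoding, original_channels):
    """Map channel counts to (input_ratio, output_ratio) for each block.

    Assumes block i reads the output of block i-1; the first block reads
    an unpruned input, so its input ratio is 1.0.
    """
    ratios = []
    prev_ratio = 1.0
    for kept, original in zip(encoding, original_channels):
        out_ratio = kept / original
        ratios.append((prev_ratio, out_ratio))
        prev_ratio = out_ratio
    return ratios


original = [64, 128, 128, 256]    # hypothetical unpruned block widths
encoding = [32, 96, 128, 64]      # sampled channel counts
for block_id, pair in enumerate(decode_ratios(encoding, original)):
    print(block_id, pair)
```

Each pair is the small vector that the matching PruningNet block would receive as input.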
4.2.2 MobileNet V2
In MobileNet V2, each stage starts with a bottleneck block that matches the dimension between two stages. If a stage consists of more than one block, the following blocks in the stage contain a shortcut that adds the input feature maps to the output feature maps, so the input and output channels within a stage must be identical, as shown in Figure 4 (b). To prune this ResNet-like structure containing shortcuts, we generate two network encoding vectors: one encodes the overall stage output channels, to match the channels in the shortcut, and the other encodes the middle channels of each block. We construct the pruned network according to both the overall stage channel encoding and the middle channel encoding. In the PruningNet, we first decode the network encoding vectors into the input compression ratio, output compression ratio, and middle channel compression ratio of each block. Then we generate the corresponding weight matrices for that block, with a vector fed to the corresponding block in the PruningNet. The PruningNet block design is the same as that for MobileNet V1, and the number of PruningNet blocks equals the number of bottleneck blocks in MobileNet V2.
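How the two encoding vectors combine into per-block shapes can be illustrated with the sketch below. The function `build_block_specs`, the stage layout, and all channel numbers are toy assumptions rather than the actual MobileNet V2 configuration.

```python
def build_block_specs(stage_out, blocks_per_stage, mid_channels, stem_channels):
    """Combine a stage-level and a block-level encoding into per-block specs.

    Each spec is (in_ch, mid_ch, out_ch, has_shortcut). All blocks of a stage
    share the stage's output width, so every shortcut adds tensors of equal width.
    """
    specs = []
    in_ch = stem_channels
    mids = iter(mid_channels)
    for out_ch, n_blocks in zip(stage_out, blocks_per_stage):
        for block_idx in range(n_blocks):
            # The first block of a stage changes the width and has no shortcut.
            specs.append((in_ch, next(mids), out_ch, block_idx > 0))
            in_ch = out_ch
    return specs


stage_out = [16, 24, 40]                      # pruned output width per stage (toy)
blocks_per_stage = [1, 2, 3]
mid_channels = [20, 60, 90, 100, 150, 130]    # one entry per block (toy)
specs = build_block_specs(stage_out, blocks_per_stage, mid_channels, stem_channels=32)
for spec in specs:
    print(spec)
```

Tying every block in a stage to one stage-level number is what keeps the shortcut addition valid while still letting the shortcut width be pruned.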
4.3 Comparisons with state-of-the-arts
We compare our method with the uniform pruning baselines as well as two state-of-the-art AutoML-based channel pruning methods.
 Uniform Baselines. MobileNets [20, 41] propose to use multipliers to uniformly prune the channel width in each layer to meet the resource constraints.
 AMC [18] utilizes reinforcement learning to iteratively prune channels in each layer, taking the accuracy as well as the latency as the reward.
 NetAdapt [48] automatically decides what proportion to prune in which layer, based on a feedback loop with latency estimated from the device.
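For reference, the uniform baseline amounts to scaling every layer by the same multiplier. The sketch below uses the standard full-width MobileNet V1 output channels; the rounding rule in `uniform_prune` is an assumption and may differ from the reference implementation.

```python
# Output channels of the first convolution and the 13 point-wise convolutions
# in MobileNet V1 at full width.
MOBILENET_V1_CHANNELS = [32, 64, 128, 128, 256, 256] + [512] * 6 + [1024, 1024]


def uniform_prune(channels, ratio):
    """Scale every layer by the same ratio (rounding rule is an assumption)."""
    return [max(1, int(c * ratio)) for c in channels]


for ratio in (0.75, 0.5, 0.25):
    print(ratio, uniform_prune(MOBILENET_V1_CHANNELS, ratio))
```

Every layer shrinks by the same factor, whereas the searched structures are free to keep different proportions in different layers.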
4.3.1 Pruning under FLOPs constraint
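Table 1: MobileNet V1 under FLOPs constraints, uniform baselines vs. MetaPruning.

| Ratio | Uniform Baselines Top-1 Acc | Uniform Baselines FLOPs | MetaPruning Top-1 Acc | MetaPruning FLOPs |
|---|---|---|---|---|
| 1x | 70.6% | 569M | – | – |
| 0.75x | 68.4% | 325M | 70.6% | 324M |
| 0.5x | 63.7% | 149M | 66.1% | 149M |
| 0.25x | 50.6% | 41M | 57.2% | 41M |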
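
Table 2: MobileNet V2 under FLOPs constraints, uniform baselines vs. MetaPruning.

| Uniform Baselines Top-1 Acc | Uniform Baselines FLOPs | MetaPruning Top-1 Acc | MetaPruning FLOPs |
|---|---|---|---|
| 74.7% | 585M | – | – |
| 72.0% | 313M | 72.7% | 291M |
| 67.2% | 140M | 68.2% | 140M |
| 66.5% | 127M | 67.3% | 124M |
| 64.0% | 106M | 65.0% | 105M |
| 62.1% | 87M | 63.8% | 84M |
| 54.6% | 43M | 58.3% | 43M |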
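
Table 3: Comparison with AutoML pruning methods on MobileNet V1 and V2.

| Network | FLOPs | Top-1 Acc |
|---|---|---|
| 0.75x MobileNet V1 [20] | 325M | 68.4% |
| NetAdapt [48] | 284M | 69.1% |
| AMC [18] | 285M | 70.5% |
| MetaPruning | 281M | 70.4% |
| 0.75x MobileNet V2 [41] | 220M | 69.8% |
| AMC [18] | 220M | 70.8% |
| MetaPruning | 217M | 71.2% |
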
Table 1 compares our accuracy with the uniform pruning baselines reported in [20]. With the pruning scheme learned by MetaPruning, we obtain 6.6% higher accuracy than the baseline 0.25x MobileNet V1. Furthermore, as our method can be generalized to prune the shortcuts in a network, we also achieve a decent improvement on MobileNet V2, as shown in Table 2. Previous pruning methods only prune the middle channels of the bottleneck structure [48, 18], which limits their maximum compression ratio at a given input resolution. With MetaPruning, we obtain a 3.7% accuracy boost when the model size is as small as 43M FLOPs.
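The per-ratio gains on MobileNet V1 follow directly from the Table 1 entries; the short check below recomputes them (variable names are illustrative).

```python
# (ratio, uniform baseline top-1 %, MetaPruning top-1 %) copied from Table 1.
table1 = [(0.75, 68.4, 70.6), (0.5, 63.7, 66.1), (0.25, 50.6, 57.2)]

gains = [round(ours - baseline, 1) for _, baseline, ours in table1]
for (ratio, _, _), gain in zip(table1, gains):
    print(f"{ratio}x MobileNet V1: +{gain} points top-1")
```

The gain grows as the budget shrinks, from 2.2 points at 0.75x to 6.6 points at 0.25x.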
In Table 3, we compare MetaPruning with the state-of-the-art AutoML pruning methods. MetaPruning achieves results superior or comparable to AMC [18] and NetAdapt [48]. Moreover, MetaPruning avoids manually tuning the reinforcement learning hyper-parameters and can obtain a pruned network that precisely meets the FLOPs or latency constraints. With the PruningNet trained once, using the same number of epochs as normally training the target network, we can obtain multiple pruned network structures that strike different accuracy-speed trade-offs, which is more efficient than the state-of-the-art AutoML pruning methods [18, 48].
4.3.2 Pruning under latency constraint
There is increasing interest in directly optimizing the latency on the target devices. Without knowing the implementation details inside the device, MetaPruning learns to prune channels according to the latency estimated from the device.
As the number of potential Pruned Networks is very large, measuring the latency of each network is too time-consuming. Under the reasonable assumption that the execution time of each layer is independent, we can obtain the network latency by summing the run-time of all layers in the network. Following the practice in [45, 48], we first construct a look-up table by estimating the latency of executing different convolution layers with different input and output channel widths on the target device, which is a Titan Xp GPU in our experiments. Then we can calculate the latency of the constructed network from the look-up table.
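The look-up-table estimate can be sketched as below. The function `fake_measure` is a hypothetical stand-in for timing a layer on the target device, and the channel options are toy values.

```python
def build_lookup_table(measure, channel_options):
    """Measure each (in_ch, out_ch) layer configuration once."""
    return {(cin, cout): measure(cin, cout)
            for cin in channel_options for cout in channel_options}


def network_latency(encoding, table, input_channels):
    """Sum per-layer latencies, assuming layers execute independently."""
    total, cin = 0.0, input_channels
    for cout in encoding:
        total += table[(cin, cout)]
        cin = cout
    return total


def fake_measure(cin, cout):
    # Hypothetical stand-in for timing a layer on the target device (ms).
    return 0.01 + 1e-5 * cin * cout


options = [8, 16, 24, 32]
table = build_lookup_table(fake_measure, options)
latency = network_latency([16, 24, 32], table, input_channels=8)
print(f"{latency:.4f} ms")
```

Once the table is built, scoring a candidate costs only a few dictionary look-ups, so the latency constraint can be checked for every gene during the search.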
We carried out experiments on MobileNet V1 and V2. Table 4 and Table 5 show that the pruned networks discovered by MetaPruning achieve significantly higher accuracy than the uniform baselines at the same latency.
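
Table 4: MobileNet V1 under latency constraints, uniform baselines vs. MetaPruning.

| Ratio | Uniform Baselines Top-1 Acc | Uniform Baselines Latency | MetaPruning Top-1 Acc | MetaPruning Latency |
|---|---|---|---|---|
| 1x | 70.6% | 0.62ms | – | – |
| 0.75x | 68.4% | 0.48ms | 70.5% | 0.48ms |
| 0.5x | 63.7% | 0.31ms | 67.4% | 0.30ms |
| 0.25x | 50.6% | 0.17ms | 59.6% | 0.17ms |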
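
Table 5: MobileNet V2 under latency constraints, uniform baselines vs. MetaPruning.

| Ratio | Uniform Baselines Top-1 Acc | Uniform Baselines Latency | MetaPruning Top-1 Acc | MetaPruning Latency |
|---|---|---|---|---|
| 1.4x | 74.7% | 0.95ms | – | – |
| 1x | 72.0% | 0.70ms | 73.2% | 0.67ms |
| 0.65x | 67.2% | 0.49ms | 71.7% | 0.47ms |
| 0.35x | 54.6% | 0.28ms | 64.5% | 0.29ms |
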
4.4 Pruned result visualization
In channel pruning, people are curious about what the best pruning heuristic is, and many human experts work on manually designing pruning policies. With the same curiosity, we wonder whether MetaPruning learns any reasonable pruning schemes that contribute to its high accuracy. In visualizing the pruned network structures, we find that MetaPruning did learn something interesting.
Figure 5 shows the pruned network structure of MobileNet V1. We observe significant peaks in the pruned network every time there is a down-sampling operation. When down-sampling occurs with a stride-2 depth-wise convolution, the resolution degradation in the feature map needs to be compensated for by using more channels to carry the same amount of information. Thus, MetaPruning automatically learns to keep more channels at the down-sampling layers. The same phenomenon is also observed in MobileNet V2, as shown in Figure 6. The middle channels are pruned less when the corresponding block is responsible for shrinking the feature map size.
Moreover, when we automatically prune the shortcut channels in MobileNet V2 with MetaPruning, we find that, although the 145M pruned network contains only half of the FLOPs of the 300M pruned network, the 145M network keeps a similar number of channels in the last stages as the 300M network and prunes more channels in the early stages. We suspect this is because the classifier for the ImageNet dataset contains 1000 output nodes, and thus more channels are needed at later stages to extract sufficient features. When the FLOPs are restricted to 45M, the network almost reaches the maximum pruning ratio and has no choice but to prune the channels in the later stages, and the accuracy degradation from the 145M network to the 45M network is much more severe than that from 300M to 145M.
4.5 Ablation study
In this section, we discuss the effect of weight prediction in the MetaPruning method.
We ask what happens if we do not use the two fully-connected layers in the PruningNet for weight prediction, but instead directly apply the proposed stochastic training and crop the same weight matrix to match the input and output channels in the Pruned Network. We compare the performance of the PruningNet with and without weight prediction. We select the channel numbers by uniformly pruning each layer at a ratio in the range [0.25, 1], and evaluate the accuracy of both networks. From Figure 8, we see that the PruningNet without weight prediction achieves 10% lower accuracy. We further use the PruningNet without weight prediction to search for a pruned MobileNet V1 with less than 45M FLOPs, and the obtained network achieves only 55.3% top-1 accuracy, 1.9% lower than the pruned network obtained with weight prediction. This is intuitive. For example, the weight matrix for an input channel width of 64 may not be optimal when the total number of input channels is increased to 128 by appending 64 more channels. In that case, the weight prediction mechanism in meta learning is effective in de-correlating the weights for different pruned structures, and thus achieves much higher accuracy for the PruningNet.
5 Conclusion
In this work, we have presented MetaPruning for channel pruning. This meta learning approach has the following advantages: 1) it achieves much higher accuracy than the uniform pruning baselines and better or comparable accuracy than other AutoML-based channel pruning methods; 2) it can flexibly optimize with respect to different constraints without introducing extra hyper-parameters; 3) ResNet-like architectures can be effectively handled; 4) the whole pipeline is highly efficient.
References

[1] J. M. Alvarez and M. Salzmann. Learning the number of neurons in deep networks. In Advances in Neural Information Processing Systems, pages 2270–2278, 2016.
[2] S. Anwar, K. Hwang, and W. Sung. Structured pruning of deep convolutional neural networks. ACM Journal on Emerging Technologies in Computing Systems (JETC), 13(3):32, 2017.
[3] S. Anwar and W. Sung. Compact deep convolutional neural networks with coarse pruning. arXiv preprint arXiv:1610.09639, 2016.
[4] B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016.
[5] G. Bender, P.-J. Kindermans, B. Zoph, V. Vasudevan, and Q. Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018.
[6] A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Smash: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344, 2017.
[7] H. Cai, L. Zhu, and S. Han. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332, 2018.
[8] X. Dai, P. Zhang, B. Wu, H. Yin, F. Sun, Y. Wang, M. Dukhan, Y. Hu, Y. Wu, Y. Jia, et al. Chamnet: Towards efficient network design through platform-aware model adaptation. arXiv preprint arXiv:1812.08934, 2018.
[9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. 2009.
[10] M. Denil, B. Shakibi, L. Dinh, N. De Freitas, et al. Predicting parameters in deep learning. In Advances in neural information processing systems, pages 2148–2156, 2013.
[11] M. Elhoseiny, B. Saleh, and A. Elgammal. Write a classifier: Zero-shot learning using purely textual descriptions. In Proceedings of the IEEE International Conference on Computer Vision, pages 2584–2591, 2013.
[12] Y. Guo, A. Yao, and Y. Chen. Dynamic network surgery for efficient dnns. In Advances In Neural Information Processing Systems, pages 1379–1387, 2016.
[13] D. Ha, A. Dai, and Q. V. Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
[14] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
[15] S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pages 1135–1143, 2015.
[16] B. Hassibi, D. G. Stork, and G. J. Wolff. Optimal brain surgeon and general network pruning. In IEEE international conference on neural networks, pages 293–299. IEEE, 1993.
[17]
K. He, X. Zhang, S. Ren, and J. Sun.
Deep residual learning for image recognition.
In
Proceedings of the IEEE conference on computer vision and pattern recognition
, pages 770–778, 2016.  [18] Y. He, J. Lin, Z. Liu, H. Wang, L.J. Li, and S. Han. Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018.
 [19] Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1389–1397, 2017.
 [20] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
 [21] H. Hu, R. Peng, Y.W. Tai, and C.K. Tang. Network trimming: A datadriven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
 [22] R. Hu, P. Dollár, K. He, T. Darrell, and R. Girshick. Learning to segment every thing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4233–4241, 2018.
 [23] X. Hu, H. Mu, X. Zhang, Z. Wang, J. Sun, and T. Tan. Metasr: A magnificationarbitrary network for superresolution. arXiv preprint arXiv:1903.00875, 2019.
 [24] Z. Huang and N. Wang. Datadriven sparse structure selection for deep neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 304–320, 2018.
 [25] Y. LeCun, J. S. Denker, and S. A. Solla. Optimal brain damage. In Advances in neural information processing systems, pages 598–605, 1990.
 [26] C. Lemke, M. Budka, and B. Gabrys. Metalearning: a survey of trends and technologies. Artificial intelligence review, 44(1):117–130, 2015.
 [27] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
 [28] B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Pensky. Sparse convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 806–814, 2015.
 [29] H. Liu, K. Simonyan, and Y. Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.
 [30] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision, pages 2736–2744, 2017.
 [31] Z. Liu, M. Sun, T. Zhou, G. Huang, and T. Darrell. Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270, 2018.
 [32] Z. Liu, B. Wu, W. Luo, X. Yang, W. Liu, and K.-T. Cheng. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm. In Proceedings of the European Conference on Computer Vision (ECCV), pages 722–737, 2018.
 [33] J.-H. Luo, J. Wu, and W. Lin. Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE international conference on computer vision, pages 5058–5066, 2017.
 [34] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pages 116–131, 2018.
 [35] Z. Mariet and S. Sra. Diversity networks. Proceedings of ICLR, 2016.
 [36] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz. Pruning convolutional neural networks for resource efficient transfer learning. arXiv preprint arXiv:1611.06440, 2016.
 [37] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018.
 [38] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision, pages 525–542. Springer, 2016.
 [39] S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. 2016.
 [40] E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2902–2911. JMLR.org, 2017.
 [41] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
 [42] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 [43] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12):2295–2329, 2017.
 [44] Y.-X. Wang and M. Hebert. Learning to learn: Model regression networks for easy small sample learning. In European Conference on Computer Vision, pages 616–634. Springer, 2016.
 [45] B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. arXiv preprint arXiv:1812.03443, 2018.
 [46] L. Xie and A. Yuille. Genetic cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 1379–1388, 2017.
 [47] T. Yang, X. Zhang, Z. Li, W. Zhang, and J. Sun. Metaanchor: Learning to detect objects with customized anchors. In Advances in Neural Information Processing Systems, pages 318–328, 2018.
 [48] T.-J. Yang, A. Howard, B. Chen, X. Zhang, A. Go, M. Sandler, V. Sze, and H. Adam. Netadapt: Platform-aware neural network adaptation for mobile applications. In Proceedings of the European Conference on Computer Vision (ECCV), pages 285–300, 2018.
 [49] J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang. Slimmable neural networks. arXiv preprint arXiv:1812.08928, 2018.
 [50] X. Zhang, X. Zhou, M. Lin, and J. Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018.
 [51] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.