1 Introduction
Deep convolutional neural networks (CNNs) have recently achieved great success in various fields including computer vision, natural language processing, pattern recognition, and bioinformatics. However, the arbitrary complexity of target problems and the need for extensive hyperparameter search make it impractical to manually explore the ideal deep network architectures customized for given tasks. Consequently, neural architecture search (NAS) approaches have been studied actively, and the models identified by NAS techniques
[57, 39, 47, 18] started to surpass the performance of traditional deep neural networks [43, 16, 24] designed by humans. Despite such successful results, optimizing deep neural networks remains challenging even for sophisticated AutoML techniques because the search space of existing NAS methods is limited while their search cost is high.

Table 1: Comparison of architecture search and pruning algorithms.

| | AMC [18] | NetAdapt [52] | Huang et al. [26] | MnasNet [47] | ProxylessNAS & FBNet [5, 50] | FGNAS (Ours) |
|---|---|---|---|---|---|---|
| Structure search (prune channels) | ✓ | ✓ | ✓ | | | ✓ |
| Operation search (find efficient operations) | | | | ✓ | ✓ | ✓ |
| Layer-wise optimization | ✓ | ✓ | ✓ | | ✓ | ✓ |
| Channel-wise optimization | ✓ | ✓ | ✓ | | | ✓ |
| Optimization method | RL | trial-and-error | policy-gradient | RL | gradient-based | gradient-based |
Researchers have aimed to develop flexible and scalable NAS techniques with large search spaces and to identify unique models different from manually designed structures [57]. However, NAS methods often suffer from huge computational cost and reduce their search space significantly for practical reasons. For example, [57, 3, 39, 33, 35] search for two cells as basic building blocks and construct full models by stacking them. To tackle the redundancy between cells and increase the diversity of full models, MnasNet [47] adopts blocks, a smaller search unit than cells. Recently, FBNet [50] and ProxylessNAS [5] have reduced their search units further to individual layers. Although the resulting models become more flexible by decreasing the granularity of search units and increasing the diversity of the generated models through their composition, those methods are limited to allocating a single operation per layer, so the operation configurations of the whole network are proportional to the number of layers.
On the contrary, we present a flexible and scalable neural architecture search algorithm. The search unit of our algorithm is a channel, which is even smaller than a layer; each channel chooses a different operation (we define an operation as a series of convolution, normalization, and activation function application), including no-operation, which is equivalent to channel pruning. This search strategy improves the flexibility of the resulting models because it is possible to generate a large number of configurations even within a single layer, and the number of configurations increases exponentially as layers are added. Such an extremely flexible framework incurs small overhead, which allows us to maintain various operations for search and increase the search space significantly. Figure 1 illustrates the proposed fine-grained neural architecture search (FGNAS) approach, where our per-channel search algorithm generates a feature map given by a composition of multiple operations and also reduces the number of channels by pruning.
FGNAS is trained to maximize the validation accuracy efficiently and stably by stochastic gradient descent. Moreover, it is convenient to regularize individual channels by incorporating FLOPs and latency into the training objective. Therefore, the proposed algorithm has a great deal of flexibility and scalability to maximize the accuracy of searched models while facilitating the consideration of various optimization criteria. Our overall contribution is summarized as follows:

We propose a flexible and scalable fine-grained neural architecture search algorithm, which performs per-channel operation search, including channel pruning, efficiently and is optimized end-to-end by stochastic gradient descent.

Our framework conveniently handles diverse objectives of neural architecture search, such as the number of parameters, FLOPs, and latency, in addition to accuracy.

The models produced by our algorithm achieve outstanding performance improvements with respect to various evaluation metrics in image classification and single-image super-resolution problems.
The rest of this paper is organized as follows. We first discuss existing works related to deep neural network optimization and neural architecture search in Section 2. Section 3 describes the proposed algorithm in detail, including training methods, and Section 4 presents experimental results in comparison to the existing methods.
2 Related Work
This section describes existing efficient convolutional network designs and neural architecture search techniques in detail. Table 1 presents a snapshot of the algorithms discussed in this section.
Efficient Convolution Networks
Designing compact convolutional neural networks has been an active research problem in the last few years. While hand-crafted models achieve efficient convolutional operations by revising network structures [27, 21, 54, 42, 23], simple rule-based network quantization [14] and pruning techniques [15, 9, 13, 14, 38, 36, 31, 19] successfully reduce the redundancy of deep and complex pretrained models. Recent pruning methods automatically remove filters and/or activations using reinforcement learning [18], trial-and-error [52], and policy-gradient [26] approaches. They optimize a network in a layer-by-layer fashion, which is inefficient in dealing with inter-layer relationships, while our FGNAS optimizes all layers jointly using a gradient-based method.

Neural Architecture Search (NAS)
Automatic architecture search techniques conceptually have more flexibility in the identified models than hand-crafted methods. NASNet [57] and MetaQNN [3] adopt reinforcement learning for non-differentiable optimization. ENAS [39] employs an RNN controller to search for the optimal model by drawing a series of sample models and maximizing their expected reward, while PNAS [33] performs a progressive architecture search by predicting the accuracy of candidate models. Evolutionary search [40] employs tournament selection; although it is the first algorithm to surpass the state-of-the-art classification accuracy, it requires significantly more computational resources. DARTS [35] relaxes the discrete architecture representation to a continuous one and addresses the scalability issue by making the objective function differentiable. MnasNet [47] and DPP-Net [12] are optimized with respect to accuracy and runtime via reinforcement learning and a performance predictor, respectively. EfficientNet [48] improves network efficiency by simply scaling the depth, width, and resolution of a backbone network. MobileNetV3 [20] adopts block-wise search [47] with layer-wise pruning [52] and presents a novel architecture design with Squeeze-and-Excitation [22]. Recently, multiple-choice gating functions have often been adopted for differentiable and multi-objective search techniques. ProxylessNAS [5] and FBNet [50] search for efficient convolution operations in each layer. MixConv [49] finds a new depthwise convolution operation that has multiple kernel sizes within a layer. Our FGNAS presents per-channel convolution operation search, which constructs maximally flexible layer configurations as illustrated in Figure 1 and runs efficiently through a differentiable optimization.
3 Proposed Algorithm
This section first presents our efficient search formulation via binary masking and discusses our gating function, which allows us to perform an end-to-end differentiable search. Then, we present the objective function of our algorithm based on a resource regularizer, which directly penalizes each channel, and describe the exact search space.
3.1 Formulation of Operation Search
[Figure 4: (a) The gating function produces a binary value in the forward pass and a softmax probability in the backward pass for gradient-descent optimization. (b) The collection of gating functions is a relaxed version of the binary masks in (2). (c) The gates control searched architectures by determining active channels in the forward pass. During the gradient-descent optimization in the backward pass, the resource regularizer penalizes channels with high resource consumption, while the task-specific loss attempts to keep a channel alive if it performs well on the target task.]

Although FGNAS has a large search space and generates flexible output models, a critical concern is how to perform NAS efficiently through a proper configuration of the search space. To tackle this challenge, FGNAS constructs a feature map using a composition of multiple operations as illustrated in Figure 2, where the composition allows us to generate a large number of virtual operations and increases the flexibility of searched models. Given an input tensor in the $l$th layer, denoted by $\mathbf{x}^l$, the output of the layer, $\mathbf{y}^l$, is expressed as

$\mathbf{y}^l = \frac{1}{N^l} \sum_{n=1}^{N^l} \mathbf{z}^l_n,$  (1)

where $N^l$ is the number of operations at the $l$th layer considered in our search and

$\mathbf{z}^l_n = \mathbf{m}^l_n \otimes \mathrm{op}^l_n(\mathbf{x}^l).$  (2)

Note that $\mathbf{m}^l_n \in \{0, 1\}^{C^l}$ is a binary vector, $\mathrm{op}^l_n(\cdot)$ represents the $n$th operation producing a tensor with $C^l$ channels, and $\otimes$ denotes the channel-wise binary masking operator. In other words, the output tensor is given by the average of the masked tensors, where the mask of each tensor is learned by our search algorithm, which also allows channel pruning by masking out the same channels in all output tensors. In addition to the operation search, we optionally consider identity connections from a preceding layer, which modifies (1) as

$\mathbf{y}^l = \frac{1}{N^l + 1} \left( \mathbf{x}^{l'} + \sum_{n=1}^{N^l} \mathbf{z}^l_n \right),$  (3)

where $\mathbf{x}^{l'}$ denotes the feature map from which the identity connection originates.
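As a toy illustration of the masked composition in (1)–(3), the following NumPy sketch averages the channel-masked outputs of several candidate operations, with an optional identity path; the shapes and function names here are our own assumptions, not the paper's code.

```python
import numpy as np

def layer_output(x, ops, masks, identity=None):
    """x: (C, H, W) input; ops: list of N callables returning (C_out, H, W);
    masks: (N, C_out) binary masks; identity: optional (C_out, H, W) tensor."""
    # Eq. (2): mask each operation's output channel-wise.
    masked = [m[:, None, None] * op(x) for op, m in zip(ops, masks)]
    # Eq. (1) / Eq. (3): average the masked tensors (and the identity path).
    terms = masked + ([identity] if identity is not None else [])
    return sum(terms) / len(terms)

# Toy example: two "operations" on a 2-channel 4x4 feature map.
x = np.ones((2, 4, 4))
ops = [lambda t: 2.0 * t, lambda t: -t]
masks = np.array([[1, 0], [1, 1]])  # channel 1 of the first op is pruned
y = layer_output(x, ops, masks)
```

Masking the same channel in every operation's output, as described above, is what makes the formulation subsume channel pruning.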
In our algorithm, each operation is defined by a series of convolution, normalization, and activation function application, as illustrated in Figure 1 (b). Figure 3 presents our efficient operation structure, which increases the number of operations with little additional cost because all three operations in Figure 3 share the feature map preceding normalization. For the parts of backbone networks where convolutional layers are not followed by normalization and activation layers, an operation is simply equivalent to a convolution.
3.2 Per-Channel Differentiable Gating Functions
To relax the binary mask in (2), we introduce a relaxed gating function $g$ and define a collection of the gating functions, denoted by $\mathcal{G}$, as

$\mathcal{G} = \left\{ g^l_{n,c} \;\middle|\; l = 1, \dots, L,\; n = 1, \dots, N^l,\; c = 1, \dots, C^l \right\},$  (4)

where $l$ and $n$ denote the layer and operation index, respectively, and $C^l$ is the number of channels. A relaxed gating function for each channel $c$, parametrized by $\theta^l_c$, is given by

$g^l_{n,c}(\theta^l_c) = \mathbb{1}\!\left[ n = \operatorname*{arg\,max}_k \, p^l_{k,c} \right],$  (5)

where $\mathbb{1}[\cdot]$ is an indicator function that returns 1 when its input is true and 0 otherwise, and $p^l_{n,c}$ denotes the value corresponding to dimension $n$ after applying a softmax function to $\theta^l_c$. Figure 4 (a) and (b) illustrate $g$ and $\mathcal{G}$, respectively.

Using the relaxed gating function, we reformulate the channel-wise tensor masking in (2) as

$\mathbf{z}^l_n = \mathbf{g}^l_n \otimes \mathrm{op}^l_n(\mathbf{x}^l), \quad \text{where } \mathbf{g}^l_n = \left( g^l_{n,1}, \dots, g^l_{n,C^l} \right).$  (6)

This relaxed gating function allows us to update the architecture by a gradient-descent optimization method because the backward function is differentiable.
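The forward/backward behavior of the relaxed gate can be sketched as follows; this is a minimal NumPy illustration with made-up names (a real implementation would rely on an automatic-differentiation framework). The forward pass emits a one-hot binary choice, while gradients are taken through the softmax probabilities, i.e. a straight-through style estimator.

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def gate_forward(theta):
    """Binary one-hot mask used to build the architecture in the forward pass."""
    g = np.zeros_like(theta)
    g[np.argmax(theta)] = 1.0
    return g

def gate_backward(theta, grad_out):
    """Backward pass as if the gate were softmax(theta): multiply the incoming
    gradient by the Jacobian of the softmax."""
    p = softmax(theta)
    jac = np.diag(p) - np.outer(p, p)  # Jacobian of softmax
    return jac @ grad_out

theta = np.array([2.0, 0.5, -1.0])   # logits over 3 candidate operations
g = gate_forward(theta)              # one-hot: the first operation is chosen
```

Because the softmax Jacobian is well defined everywhere, the gate parameters receive gradients even though the forward output is discrete.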
3.3 Resource Regularizer on Channels
The proposed approach aims to maximize the accuracy on a target task and minimize the resource usage of the identified model. Hence, our objective function is composed of two terms: one is the task-specific loss and the other is a regularizer penalizing the overhead of networks, such as parameters, FLOPs, and latency. To search for operations per channel, the proposed regularizer computes the amount of resource usage of each channel, which changes over iterations due to the gradual update of architectures. Figure 4 (c) illustrates an overview of the resource regularizer, and the rest of this section discusses the details.
Let $\mathcal{L}_{\text{task}}$ denote a loss function for an arbitrary task (in our work, the tasks are image classification and super-resolution) and let $\mathcal{R}$ be a differentiable regularizer that estimates the resources of the current model identified by our search algorithm. Then, the objective function is formally given by

$\mathcal{L}(\mathbf{w}, \boldsymbol{\theta}) = \mathcal{L}_{\text{task}}(\mathbf{w}, \boldsymbol{\theta}) + \lambda \, \mathcal{R}(\boldsymbol{\theta}),$  (7)

where $\mathbf{w}$ and $\boldsymbol{\theta}$ are the learnable parameters of the neural networks and the gating functions $\mathcal{G}$, respectively, and $\lambda$ is the hyperparameter balancing the two terms. Specifically, the regularizer is given by

$\mathcal{R}(\boldsymbol{\theta}) = \sum_{l=1}^{L} \sum_{n=1}^{N^l} r_t\!\left( \mathrm{op}^l_n;\, C^l_{\text{in},n},\, C^l_{\text{out},n} \right),$  (8)

where $r_t(\cdot)$ is a resource measurement function of the $n$th operation, $t$ indicates the type of the resources, and $L$ is the number of layers. Note that $C^l_{\text{in},n}$ and $C^l_{\text{out},n}$ are the numbers of input and output channels of the $n$th operation, respectively, and they are differentiable via the gating functions, defined as

$C^l_{\text{out},n} = \left\| b\!\left( \mathbf{g}^l_n \right) \right\|_1$  (9)

and

$C^l_{\text{in},n} = \left\| b\!\left( \sum_{n'=1}^{N^{l-1}} \mathbf{g}^{l-1}_{n'} \right) \right\|_1,$  (10)

where $\|\cdot\|_1$ denotes the $\ell_1$ norm of a vector. The function $b(\cdot)$ produces a binary vector, valued 1 for nonzero elements of the input vector, in the forward pass, but is an identity function in the backward pass. A skip connection from an earlier layer affects (10) because we need to consider an extra term in the summation.
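A toy sketch of the regularizer machinery follows; all function names and numbers here are our own examples. The active channel counts of (9)–(10) are L1 norms of binarized gate vectors, (8) sums a per-operation cost such as FLOPs, and, as the section goes on to describe, device latency can be approximated by an affine function of FLOPs fitted by least squares on (here synthetic, made-up) measurements.

```python
import numpy as np

def active_channels(gates):
    """||b(g)||_1: the number of channels whose gate is nonzero."""
    return float(np.count_nonzero(gates))

def conv_flops(c_in, c_out, k, h, w):
    """Standard FLOPs count of a k x k convolution on an h x w feature map."""
    return c_in * c_out * k * k * h * w

# Eq. (8)-style accumulation over the operations of one layer:
# a 16x16 map, 4 input channels, kernel sizes 3 and 5.
gates = [np.array([1.0, 1.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0, 1.0])]
reg = sum(conv_flops(4, active_channels(g), k, 16, 16)
          for g, k in zip(gates, [3, 5]))

# Affine latency surrogate: latency ~ a * FLOPs + b within one configuration,
# fitted by ordinary least squares on synthetic measurements.
flops = np.array([1e6, 2e6, 4e6, 8e6])
latency_ms = np.array([1.1, 2.0, 4.1, 7.9])
A = np.stack([flops, np.ones_like(flops)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, latency_ms, rcond=None)
```

With the affine surrogate in place, the latency regularizer inherits the differentiability of the FLOPs count with respect to the gates.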
On the other hand, $r_{\text{params}}$ and $r_{\text{FLOPs}}$ are well-defined functions of the convolution kernel size, the number of channels, the feature map resolution, etc. They are differentiable with respect to the number of active channels by the definitions in (9) and (10). However, it is not straightforward to define the latency measurement function on specific devices such as the Google Pixel 1 and Samsung Galaxy S8. We address this problem by fitting affine functions of the relation between latency and FLOPs; it turns out that convolution operations present strong correlations between latency and FLOPs under a particular condition given by the combination of input feature map size, kernel size, stride, convolutional groups, and so on. By approximating latency as a function of FLOPs, (8) with $t = \text{latency}$ naturally penalizes all channels to minimize the runtime of networks.

3.4 Search Space
FGNAS searches for an operation in each channel; the granularity of architecture search is as small as a channel. Consequently, the possible combinations of operations in FGNAS are significantly more numerous than those of any other NAS technique. Specifically, the search space in a single layer is $(N^l)^{C^l}$, where $N^l$ is the number of operations and $C^l$ is the number of channels at the $l$th layer, with minor variations depending on the network configuration (e.g., the existence of skip connections). This is far beyond the range of other approaches because most NAS techniques are limited to adopting a per-layer search strategy and exploring a few building blocks instead of directly optimizing the whole model.
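The per-layer count above is easy to verify numerically; in the following sketch the operation and channel numbers are arbitrary examples of ours.

```python
# Per-layer search space size discussed above: with N candidate operations per
# channel (no-operation included among them) and C channels, each channel makes
# an independent choice, giving N ** C configurations for a single layer.

def layer_search_space(num_ops, num_channels):
    return num_ops ** num_channels

# Even a modest layer is astronomically large: 4 operations, 64 channels.
size = layer_search_space(4, 64)
```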
Table 2 lists the search space of operations in our search algorithm. The backbone networks for image classification include VGG, ResNet, DenseNet, EfficientNet, and MobileNetV2, while EDSR is employed for image super-resolution. Note that we insert a 1×1 convolution operation after an identity connection to reduce the number of input channels to the first convolution operation of a residual (or dense) block.
Table 2: Search space of operations.

| Factor | Search Space |
|---|---|
| Convolution types | Normal, Depthwise |
| Convolution kernel sizes | 1, 3, 5, 7, 9, 11 |
| Normalization method | BN |
| Activation functions | ReLU, PReLU, tanh |
| The number of channels | 0, 1, 2, …, $C^l$ |
Table 3: Performance comparison with state-of-the-art architectures on CIFAR-10.

| Model | Type | Search Cost (GPU-days) | Top-1 Acc. | Parameters |
|---|---|---|---|---|
| DenseNet-BC [24] | manual | – | 96.5 % | 25.6 M |
| Hierarchical Evolution [34] | evolution | 300 | 96.3 % | 15.7 M |
| P-DARTS (large) [6] + cutout | gradient-based | 0.3 | 97.8 % | 10.5 M |
| ProxylessNAS-G [5] + cutout | gradient-based | 4.0 | 97.9 % | 5.7 M |
| ENAS [39] + cutout | RL | 0.5 | 97.1 % | 4.6 M |
| EfficientNet-B0 [48] | model scaling | – | 98.1 % | 4.0 M |
| EfficientNet-B0-FGNAS (Large) + cutout | gradient-based | 0.1 | 98.2 % | 3.9 M |
| P-DARTS [6] + cutout | gradient-based | 0.3 | 97.5 % | 3.4 M |
| NASNet-A [58] + cutout | RL | 1800 | 97.4 % | 3.3 M |
| DARTS [35] (first order) + cutout | gradient-based | 1.5 | 97.0 % | 3.3 M |
| DARTS [35] (second order) + cutout | gradient-based | 4 | 97.2 % | 3.3 M |
| AmoebaNet-A [40] + cutout | evolution | 3150 | 96.6 % | 3.2 M |
| PNAS [33] | SMBO | 225 | 96.6 % | 3.2 M |
| SNAS [51] + mild constraint + cutout | gradient-based | 1.5 | 97.0 % | 2.9 M |
| SNAS [51] + moderate constraint + cutout | gradient-based | 1.5 | 97.2 % | 2.8 M |
| AmoebaNet-B [40] + cutout | evolution | 3150 | 97.5 % | 2.8 M |
| EfficientNet-B0-FGNAS (Small) + cutout | gradient-based | 0.5 | 97.8 % | 2.7 M |
Table 4: Comparison with channel pruning methods on ImageNet. The NetAdapt entry is a reported result with latency similar to Multiplier (0.75) in [52].

| Model | Search Space | Method | Type | Top-1 Acc. | Parameters | FLOPs | CPU |
|---|---|---|---|---|---|---|---|
| MobileNetV2 (224) | No Search | Baseline | manual | 72.0 % | 3.4 M | 600 M | 75 ms |
| | + Channel Pruning | Multiplier (0.75) [42] | manual | 69.8 % | 2.6 M | 418 M | 56 ms |
| | | NetAdapt [52] | trial-and-error | 70.9 % | – | – | 64 ms |
| | | FGNAS (P) | gradient-based | 70.9 % | 3.5 M | 410 M | 53 ms |
| | + 5×5 DConv | FGNAS | gradient-based | 71.4 % | 3.1 M | 378 M | 53 ms |

4 Experiment
This section first presents the benchmark datasets for the image classification and super-resolution tasks, and describes the implementation details of our algorithm. Then, we present the experimental results including performance analysis.
4.1 Dataset
CIFAR-10 [30] and ILSVRC2012 [41] are popular datasets for image classification. The former contains 50K training and 10K testing 32×32 images in 10 classes. The latter consists of 1.2M training and 50K validation images in 1,000 object categories, which are a subset of ImageNet [8]. DIV2K [1] is a training dataset for image super-resolution, which contains 800 2K-resolution images, while we evaluate super-resolution algorithms on Set5 [4], Set14 [53], B100 [37], and Urban100 [25].
4.2 Implementation Details
Search steps
The proposed algorithm searches for architectures in four steps: (1) determine a backbone network and candidate operations for each layer; (2) pretrain the network without gating functions; (3) search for architectures by learning the gating function parameters until the resource of the searched architecture reaches the target; (4) fine-tune the searched architecture with fixed gating function parameters.
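The four steps can be outlined as a control-flow skeleton; all functions here are stubs we introduce for illustration, not the authors' code.

```python
# Skeleton of the FGNAS search procedure described above: pretrain without
# gates, search by updating gate parameters until a resource target is met,
# then fine-tune with the gates frozen. `measure` and `search_step` are
# placeholders for the real training/measurement routines.

def run_fgnas_pipeline(target_resource, measure, search_step, log):
    log.append("pretrain")                  # step (2): train without gates
    while measure() > target_resource:      # step (3): search until target met
        search_step()
        log.append("search")
    log.append("finetune")                  # step (4): gates fixed

resource = [3]
log = []
run_fgnas_pipeline(
    target_resource=1,
    measure=lambda: resource[0],
    search_step=lambda: resource.__setitem__(0, resource[0] - 1),
    log=log,
)
```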
CIFAR-10
The backbone network is EfficientNet-B0 [48], whose architecture is designed for ImageNet and transferred to CIFAR-10. The search space consists of kernel sizes of 1, 3, and 5 in depthwise convolution layers and the number of channels in all layers. We train the model for 160 epochs with mini-batch size 128 and initial learning rate 0.01. The resource of interest is the number of network parameters, which is penalized by the resource regularizer. We use the standard SGD optimizer with Nesterov momentum [45] and Cutout augmentation [10]. The weight decay and momentum are 0.0001 and 0.9, respectively.

Table 5: Ablation study of the search space with VGG16 on CIFAR-10.

| Model | Search Space | Method | Top-1 Acc. | FLOPs |
|---|---|---|---|---|
| VGG16 | No Search | Baseline | 93.7 % | 627 M |
| | + Channel pruning | FGNAS (P) | 93.6 % | 149 M |
| | + 1×1–11×11 Conv. | FGNAS | 93.6 % | 119 M |
| | + ReLU, PReLU, Tanh | FGNAS | 93.6 % | 110 M |
ImageNet
MobileNetV2 [42] is the backbone network, whose architecture is compactly designed for ImageNet classification. The search space consists of kernel sizes of 3 and 5 in depthwise convolution layers and the number of channels in all layers. We train models using mini-batch size 256 with the initial learning rate set to 0.01. Training runs for 400 epochs, and the learning rate is divided by 10 at 50% and 75% of the total number of training epochs. The resource of interest is the latency of networks, and the hyperparameter $\lambda$ is set to 0.0012 for the resource regularizer. We evaluate our models on a Google Pixel 1 CPU using Google's TensorFlow Lite engine.
DIV2K
The backbone network is a small version of EDSR [32], whose architecture has 64 channels per layer and 16 residual blocks. The search space consists of ReLU, PReLU, and tanh in activation layers and the number of channels in all layers. The model is pretrained for 300 epochs using Adam [29] with mini-batch size 16 and 96×96-pixel patches. The resource of interest is the FLOPs of networks, which is penalized by the resource regularizer. The image restoration performance measures are PSNR and SSIM on the Y channel of the YCbCr color space with scaling factor 2.
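The evaluation protocol above can be sketched as follows; this is a minimal NumPy version using the standard BT.601 RGB-to-Y conversion, and SSIM is omitted for brevity.

```python
import numpy as np

def rgb_to_y(img):
    """Y channel of YCbCr from an RGB image in [0, 255] (ITU-R BT.601)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr(ref, test, peak=255.0):
    """PSNR in dB between two images on the same value range."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Toy check: an image off by one gray level everywhere.
ref = np.full((8, 8), 128.0)
noisy = ref + 1.0
```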
4.3 Image Classification
Results on CIFAR-10
Table 3 illustrates the performance comparison with state-of-the-art architectures. FGNAS (Large) outperforms the backbone network EfficientNet-B0 [48] with fewer parameters, and FGNAS (Small) has 2.1× fewer parameters than ProxylessNAS-G [5] with comparable accuracy. The search cost of the proposed algorithm is small, although finding smaller networks requires more time.
Table 6: Channel pruning results on CIFAR-10.

| Model | Method | Type | Top-1 Acc. | FLOPs |
|---|---|---|---|---|
| VGG16 | Baseline | manual | 93.7 % | 627 M |
| | Huang et al. [26] | policy-gradient | 90.9 % | 222 M |
| | Slimming [36] | rule-based | 93.6 % | 211 M |
| | FGNAS (P) | gradient-based | 93.6 % | 149 M |
| VGG19 | Baseline | manual | 94.0 % | 797 M |
| | Slimming [36] | rule-based | 93.8 % | 391 M |
| | DCP [56] | gradient-based | 94.2 % | 398 M |
| | FGNAS (P) | gradient-based | 94.3 % | 348 M |
| ResNet-18 | Baseline | manual | 91.5 % | 26.0 G |
| | Huang et al. [26] | policy-gradient | 90.7 % | 6.2 G |
| | FGNAS (P) | gradient-based | 92.5 % | 1.3 G |
| ResNet-20 | Baseline | manual | 92.2 % | 81 M |
| | Soft Filter [17] | rule-based | 91.2 % | 57 M |
| | FGNAS (P) | gradient-based | 91.7 % | 34 M |
| DenseNet-40 | Baseline | manual | 94.3 % | 566 M |
| | Slimming [36] | rule-based | 93.5 % | 188 M |
| | FGNAS (P) | gradient-based | 93.6 % | 149 M |
Table 7: Ablation study of the gating function.

| Type | Channel Pruning | Multiple-operation | Top-1 Acc. | FLOPs |
|---|---|---|---|---|
| (1) | ✓ | | 91.0 % | 278 M |
| (2) | | ✓ | 91.6 % | 131 M |
| Ours | ✓ | ✓ | 92.5 % | 61 M |
Table 8: Image super-resolution results with scaling factor 2.

| Model | Type | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Parameters | FLOPs |
|---|---|---|---|---|---|---|---|
| SRCNN [11] | manual | 36.66 dB / 0.9542 | 32.42 dB / 0.9063 | 31.36 dB / 0.8879 | 29.50 dB / 0.8946 | 57 K | 105.4 G |
| VDSR [28] | manual | 37.53 dB / 0.9587 | 33.03 dB / 0.9124 | 31.90 dB / 0.8960 | 30.76 dB / 0.9140 | 665 K | 1,225.2 G |
| CARN-M [2] | manual | 37.53 dB / 0.9583 | 33.26 dB / 0.9141 | 31.92 dB / 0.8960 | 31.23 dB / 0.9144 | 412 K | 182.4 G |
| CARN [2] | manual | 37.76 dB / 0.9590 | 33.52 dB / 0.9166 | 32.09 dB / 0.8978 | 31.92 dB / 0.9256 | 1,592 K | 445.6 G |
| MemNet [46] | manual | 37.78 dB / 0.9597 | 33.28 dB / 0.9142 | 32.08 dB / 0.8978 | 31.51 dB / 0.9312 | 677 K | 5,324.8 G |
| EDSR [32] | manual | 38.11 dB / 0.9601 | 33.92 dB / 0.9198 | 32.32 dB / 0.9013 | 32.93 dB / 0.9351 | 40,712 K | 18,769.5 G |
| RDN [55] | manual | 38.24 dB / 0.9614 | 34.01 dB / 0.9212 | 32.34 dB / 0.9017 | 32.89 dB / 0.9353 | 22,114 K | 10,192.4 G |
| FALSR-B [7] | evolution | 37.61 dB / 0.9585 | 33.29 dB / 0.9143 | 31.97 dB / 0.8967 | 31.28 dB / 0.9191 | 326 K | 149.4 G |
| ESRN-V [44] | evolution | 37.85 dB / 0.9600 | 33.42 dB / 0.9161 | 32.10 dB / 0.8987 | 31.79 dB / 0.9248 | 324 K | 146.8 G |
| EDSR-FGNAS | gradient-based | 37.86 dB / 0.9593 | 33.44 dB / 0.9157 | 32.11 dB / 0.8987 | 31.85 dB / 0.9254 | 212 K | 97.6 G |
Results on ImageNet
Table 4 presents the performance comparison with the MobileNetV2 Multiplier [42] and NetAdapt [52], which successfully prune channels of efficiently designed networks [42, 20]. For a fair comparison, we also evaluate the proposed algorithm as a pure channel pruning method, referred to as FGNAS (P), whose search space contains only the number of channels in all layers. FGNAS (P) is better than the other channel pruning methods in both FLOPs and latency, and FGNAS achieves 1.6% higher Top-1 accuracy than Multiplier. The model latency reaches the target latency within 40 epochs at the search stage, which indicates the search cost of the proposed algorithm.
Ablation study of search space
Our search method easily enlarges the search space by adding operations to the layers of backbone networks to find more efficient architectures. Table 5 shows that the proposed algorithm finds faster networks in the larger search space with the same Top-1 accuracy. Figure 5 draws FLOPs/accuracy curves of our search methods. FGNAS consistently outperforms FGNAS (P) while reducing the network runtime, and finds an architecture with 5.7× fewer FLOPs than the original VGG16 on CIFAR-10.
Searched architecture analysis
To analyze the performance improvement from flexible architectures, we visualize two FGNAS architectures derived from VGG16 on CIFAR-10, which have 250M and 110M FLOPs. The search space consists of kernel sizes of 1, 3, 5, 7, 9, and 11 in convolutions, ReLU, PReLU, and tanh in activation functions, and the number of channels in all layers. The networks searched by FGNAS and the original VGG16 differ by less than 0.3% in accuracy. Figure 6 (a) shows that the 3rd, 5th, 8th, and 10th layers, located right after pooling operations, retain more channels than the following layers, and the 110M-FLOPs network prunes most of the channels in the 10–12th layers of the 250M-FLOPs network. As illustrated in Figure 6 (b), the 110M-FLOPs network has a much larger number of operation types within a layer, which leads to complex layer configurations; note that the 5th layer has 31 different operation types. Figure 6 (c) shows that 1×1 convolutions appear more frequently for network efficiency. Figure 7 (a) shows that convolutions with 1×1 kernels produce more channels in the 8–13th layers, where the feature map resolutions are 4×4 and 2×2 pixels. On the other hand, the 1–8th layers prefer 3×3 convolutions to 1×1 and prune most channels at the 10th layer, as illustrated in Figure 7 (b). The channels from 5×5 convolutions mainly remain at the 3rd, 5th, and 8th layers, located right after pooling operations.
Channel pruning results on CIFAR-10
We evaluate the channel pruning performance of our algorithm, FGNAS (P), on diverse backbone networks: VGGNet [43], ResNet [16], and DenseNet [24]. Since the original networks are designed for ImageNet, we adopt the versions modified for CIFAR-10 [36, 26]. Table 6 shows that the proposed algorithm outperforms the existing pruning methods [26, 36, 56, 17] even with fewer FLOPs. Huang et al. [26] remove channels layer-by-layer with RL-based policy gradient estimation, whose search cost is 30 GPU-days on an Nvidia K40. Since FGNAS (P) searches over all layers simultaneously using differentiable gating functions, its search cost is 1 GPU-hour on a GeForce 1080 Ti for CIFAR-10. We reproduced the DenseNet-40 result of Slimming [36] for a fair comparison.
Ablation study of gating function
We evaluate the proposed search algorithm with modified gating functions that exclude its advantages one by one. Table 7 shows that each component significantly improves the performance of searched architectures. Note that the Type (2) gating function in Table 7 searches for an operation per channel, while the gating functions in ProxylessNAS [5] and FBNet [50] choose one operation per layer.
4.4 Image SuperResolution
To further verify the practical effectiveness of our approach, we evaluate our search method on image super-resolution (SR) tasks. The primary metric for this task is the FLOPs of networks because FLOPs are easy to calculate regardless of input image resolution, which is arbitrary in SR problems.
Results
Table 8 shows the FLOPs of networks producing an HD image (1280×720 resolution) with scaling factor 2. Since SR networks require a substantially larger amount of FLOPs than conventional image classification networks, our search algorithm aims to find faster networks. FGNAS achieves 1.5× fewer FLOPs and parameters than the state-of-the-art NAS approaches [7, 44], as illustrated in Table 8. Note that FGNAS is even faster than SRCNN [11], which consists of 3 convolution layers. The searched residual blocks have a large number of channels and activation operations. The number of channels for skip connections gradually increases with network depth. The search cost is 0.5 GPU-days with a GeForce 2080 Ti.
5 Conclusion
We presented a novel architecture search technique, referred to as FGNAS, which provides a unified framework of structure and operation search via channel pruning. The proposed approach can be optimized by a gradient-based method, and we formulate a differentiable regularizer of neural networks with respect to resources, which facilitates efficient and stable optimization with diverse task-specific and resource-aware loss functions.
References
 [1] Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPRW, 2017.
 [2] Namhyuk Ahn, Byungkon Kang, and Kyung-Ah Sohn. Fast, accurate, and lightweight super-resolution with cascading residual network. arXiv:1803.08664, 2018.
 [3] Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. ICLR, 2017.
 [4] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie-Line Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
 [5] Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR, 2019.
 [6] Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In ICCV, 2019.
 [7] Xiangxiang Chu, Bo Zhang, Hailong Ma, Ruijun Xu, Jixiang Li, and Qingyuan Li. Fast, accurate and lightweight super-resolution with neural architecture search. arXiv:1901.07261, 2019.
 [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
 [9] Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In NIPS, 2014.
 [10] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552, 2017.
 [11] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. TPAMI, 38:295–307, 2014.
 [12] Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, and Min Sun. DPP-Net: Device-aware progressive search for Pareto-optimal neural architectures. In ECCV, 2018.
 [13] Xuanyi Dong, Junshi Huang, Yi Yang, and Shuicheng Yan. More is less: A more complicated network with less inference complexity. In CVPR, 2017.
 [14] Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. ICLR, 2016.
 [15] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NIPS. 2015.
 [16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2016.
 [17] Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang. Soft filter pruning for accelerating deep convolutional neural networks. In IJCAI, 2018.
 [18] Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for model compression and acceleration on mobile devices. In ECCV, 2018.
 [19] Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In ICCV, Oct 2017.
 [20] Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam. Searching for MobileNetV3. In ICCV, 2019.
 [21] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
 [22] Jie Hu, Li Shen, and Gang Sun. Squeezeandexcitation networks. In CVPR, 2018.
 [23] Gao Huang, Shichen Liu, Laurens van der Maaten, and Kilian Q. Weinberger. CondenseNet: An efficient DenseNet using learned group convolutions. CVPR, 2018.
 [24] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. CVPR, 2017.
 [25] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
 [26] Q. Huang, K. Zhou, S. You, and U. Neumann. Learning to prune filters in convolutional neural networks. In WACV, 2018.
 [27] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv:1602.07360, 2016.
 [28] Jiwon Kim, Jungkwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
 [29] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2015.
 [30] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar10 (canadian institute for advanced research).
 [31] Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. ICLR, 2017.
 [32] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image superresolution. arXiv preprint, arXiv:1707.02921, 2017.
 [33] Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, LiJia Li, Li FeiFei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In ECCV, 2018.
 [34] Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical representations for efficient architecture search. In ICLR, 2018.
 [35] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In ICLR, 2019.
 [36] Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In ICCV, 2017.
 [37] David R. Martin, Charless C. Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
 [38] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. Pruning convolutional neural networks for resource efficient transfer learning. In ICLR, 2017.
 [39] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In ICML, 2018.
 [40] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
 [41] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
 [42] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
 [43] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint, arXiv:1409.1556, 2014.
 [44] Dehua Song, Chang Xu, Xu Jia, Yiyi Chen, Chunjing Xu, and Yunhe Wang. Efficient residual dense block search for image super-resolution. arXiv preprint, arXiv:1909.11409, 2019.
 [45] Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey E. Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
 [46] Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
 [47] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. arXiv preprint, arXiv:1807.11626, 2018.
 [48] Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, 2019.
 [49] Mingxing Tan and Quoc V. Le. MixConv: Mixed depthwise convolutional kernels. In BMVC, 2019.
 [50] Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In CVPR, 2019.
 [51] Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: stochastic neural architecture search. In ICLR, 2019.
 [52] Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, and Hartwig Adam. NetAdapt: Platform-aware neural network adaptation for mobile applications. In ECCV, 2018.
 [53] Roman Zeyde, Michael Elad, and Matan Protter. On single image scale-up using sparse-representations. In ICCS, 2010.
 [54] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
 [55] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image superresolution. In CVPR, 2018.
 [56] Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, and Jin-Hui Zhu. Discrimination-aware channel pruning for deep neural networks. In NIPS, 2018.
 [57] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
 [58] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In CVPR, 2018.