As an important category of tasks in computer vision, image restoration aims to estimate the underlying image from its degraded measurements, which is known to be an ill-posed inverse problem. Depending on the type of degradation, image restoration can be categorized into different sub-problems, such as denoising, de-raining, inpainting and super-resolution. In this work, we mainly focus on image denoising and image de-raining, although the NAS method developed here is general enough to be applied to most image restoration problems. Most recent work on image restoration has shifted to deep learning, which significantly outperforms conventional methods. Nonetheless, discovering state-of-the-art neural network architectures requires substantial effort. Recently, there has been growing interest in developing algorithmic solutions to automate the manual process of architecture design. Architectures found automatically have achieved highly competitive performance in high-level vision tasks such as image classification, object detection and semantic segmentation [3, 4]. In this paper, we propose IR-NAS, a neural architecture search (NAS) algorithm for low-level image restoration tasks including both image denoising and image de-raining. Our main contributions can be summarized in the following four aspects.
We propose an efficient neural architecture search method for low-level image restoration, termed IR-NAS, and apply it to image denoising and de-raining tasks.
The proposed IR-NAS is able to search for both inner cell structures and outer layer widths. It is also memory and computation efficient, taking only 6 hours to search on a single GPU and one third of the memory of Auto-Deeplab to search for the same structure.
We apply our proposed IR-NAS on two denoising datasets of different noise modes and one widely used de-raining dataset for evaluation. Experiments show that the IR-NAS designed networks outperform state-of-the-art algorithms on the three datasets with fewer parameters and faster speed.
We conduct comparison experiments to analyse the networks found by our NAS algorithm in terms of their internal structure, offering insights into the architectures found by NAS.
2 Related Work
Low-level image processing. Currently, owing to the popularity of convolutional neural networks (CNNs), image restoration algorithms, including image denoising and image de-raining, have achieved a significant performance boost. For image denoising, DnCNN predicts the residual present in the image instead of the denoised image itself, showing good performance. FFDNet  attempts to address spatially varying noise by appending noise level maps to the input of DnCNN. NLRN  incorporates non-local operations into a recurrent neural network (RNN) for image restoration. N3Net formulates a differentiable version of nearest neighbor search to further improve DnCNN. Recently, some algorithms focus on denoising real noisy images, since many existing denoisers tend to overfit additive white Gaussian noise (AWGN) and generalize poorly to real-world noisy images, which are contaminated with noise far more sophisticated than additive Gaussian noise. CBDNet  uses a simulated camera pipeline to supplement real training data. Similar work proposes a camera simulator that aims to accurately simulate the degradation and noise transformation performed by camera pipelines.
For image de-raining, Fu et al.  introduced deep learning methods for solving the de-raining problem, where rain streaks are modelled as the residues between the input and output of the networks in an end-to-end fashion. Yang et al.  design a deep recurrent dilated network to jointly detect and remove rain streaks. Li et al.  design a scale-aware multi-stage recurrent network that consists of several parallel sub-networks estimating rain streaks of different sizes and densities individually. Recently, Zhang et al.  propose to classify rain density to guide the rain removal step. Li et al.  propose a recurrent squeeze-and-excitation based context aggregation network that removes rain streaks through multiple stages.
Network architecture search. Network architecture search (NAS) aims to automate the design of neural architectures and thereby replace conventional hand-crafted ones. Early attempts employ evolutionary algorithms (EAs) to optimize the neural architectures and parameters, where the best architecture may be obtained by iteratively mutating a population of candidate architectures. An alternative to EA is to use reinforcement learning (RL) techniques, such as policy gradients and Q-learning , to train a recurrent neural network that acts as a meta-controller, generating sequences that encode potential architectures by exploring a predefined search space. One drawback is that these EA and RL based methods tend to require a large amount of computation. Recently, speed-up techniques such as hyper-networks , network morphism  and shared weights  have led to substantial reductions in search cost.
Our work is most closely related to DARTS , ProxylessNAS  and Auto-Deeplab . DARTS is based on a continuous relaxation of the architecture representation, which allows efficient search of the cell architecture via gradient descent and has achieved competitive performance. We extend its search space to include cell widths by layering multiple candidate paths. Another optimization based NAS method with widths in its search space is ProxylessNAS. However, it is limited to discovering sequential structures and choosing kernel widths within manually designed blocks (Inverted Bottlenecks ). By introducing multiple paths of different widths, the search space of our IR-NAS resembles that of Auto-Deeplab. The two major differences are: 1) to retain high resolution feature maps, we do not downsample the features but rely on automatically selected dilated convolutions and deformable convolutions to adapt the receptive field; 2) we share the cell weights across different paths, which reduces memory consumption to one third of the Auto-Deeplab counterpart.
One more relevant work is FALSR , which is proposed for the super-resolution task. FALSR involves both RL and EA in its controller, and it takes about 3 days on 8 Tesla-V100 GPUs to find the final architecture. Our proposed IR-NAS takes only 6 hours to search on a single GPU. Compared with FALSR, IR-NAS is thus about 96× faster in searching.
3 Our Proposed Approach
Following [22, 23], we employ a gradient-based architecture search strategy in IR-NAS: we search for a computation cell as the basic block and then build the final architecture by stacking the searched block with different widths. Differing from these methods, IR-NAS has a more flexible search space and is able to search for both the cell structures and the cell widths. In this section, we first introduce how to search architectures for cells; then we explain how to determine the widths of cells. Finally we present our search strategy and our designed loss.
3.1 Cell Architecture Search
For cell architecture search, we employ the continuous relaxation strategy proposed in . More specifically, we build a supercell that integrates all possible layer types, as shown on the left side of Figure 1. This supercell is a directed acyclic graph containing an ordered sequence of nodes. In Figure 1, we only show three nodes for clear exposition.
We denote the supercell in layer l as C_l, which takes the outputs of the previous cell and of the cell before the previous one as inputs and outputs one tensor. Inside C_l, each node takes the two inputs of the current cell and the outputs of all previous nodes as input and outputs one tensor. Taking the j-th node in C_l as an example, its output H_j^l is calculated as follows:

    H_j^l = Σ_{x ∈ I_j} O^{(x→j)}(x),

where I_j = {H_{l-1}, H_{l-2}, H_1^l, ..., H_{j-1}^l} is the input set of node j, H_{l-1} and H_{l-2} are the outputs of the cells in layers l-1 and l-2, respectively, and O^{(x→j)} is the operator applied on edge (x, j), drawn from the set O of possible layer types. Here, to make the search space continuous, we relax each O^{(x→j)} into a weighted sum over all candidate operators:

    O^{(x→j)}(x) = Σ_{o ∈ O} [exp(α_o^{(x→j)}) / Σ_{o' ∈ O} exp(α_{o'}^{(x→j)})] · o(x),

where o corresponds to a possible layer type and α_o^{(x→j)} denotes the weight of operator o on edge (x, j). Following several recent image restoration networks [7, 26, 15], we do not reduce the spatial resolution of the input. To preserve pixel-level information for low-level image processing, we choose not to downsample the features but to rely on operations with adaptive receptive fields, such as dilated convolutions and deformable convolutions. In this paper, we provide the following 6 types of basic operators:
conv: conventional convolution;
sep: separable convolution;
dil: convolution with dilation rate 2;
def: deformable convolution V2 ;
skip: skip connection;
none: no connection and return zero.
The output H_l of cell C_l is the concatenation of the outputs of all its nodes:

    H_l = concat(H_1^l, H_2^l, ..., H_n^l).
In summary, the task of cell architecture search reduces to learning the continuous weights α, which are updated via gradient descent. After the supercell is trained, for each node we rank the corresponding inputs according to their α values, then keep the top two inputs and remove the rest to obtain the compact cell, as shown on the right of Figure 1.
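The continuous relaxation and the subsequent top-2 pruning can be illustrated with a toy scalar sketch in plain Python. This is our own simplification, not the authors' implementation: real operators are convolutions on tensors, and the three-operator set here merely stands in for the six candidates.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy operator set standing in for {conv, sep, dil, def, skip, none};
# each operator here is just a scalar function for illustration.
OPS = {
    "conv": lambda x: 2.0 * x,
    "skip": lambda x: x,
    "none": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    w = softmax([alphas[name] for name in OPS])
    return sum(wi * op(x) for wi, op in zip(w, OPS.values()))

def discretize(input_alphas, keep=2):
    """After search, keep the top-`keep` inputs of a node, ranking each
    incoming edge by the largest softmax weight among its operators."""
    scored = {inp: max(softmax(list(a.values())))
              for inp, a in input_alphas.items()}
    return sorted(scored, key=scored.get, reverse=True)[:keep]
```

With all α equal, `mixed_op` returns the plain average of the candidate outputs; as training pushes one α up, the mixture approaches that single operator, which is what makes the final discretization step reasonable.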
3.2 Cell Width Search
In the last section, we presented the main idea of cell architecture search, which is used to design the specific architectures inside cells. However, the overall network is built by stacking several cells with different widths, so searching architectures for cells alone is not sufficient.
In this section, we introduce the inter-cell search space, which determines the widths of the cells in different layers. Similarly, we build a supernet that contains several supercells of different widths in each layer. As illustrated on the left of Figure 2, the supernet consists of three parts:
1) the start part, which consists of the input layer and two convolution layers;
2) the middle part, which contains L layers, each holding three supercells of different widths;
3) the end part, which concatenates the outputs of the last layer and feeds them to a convolution layer to generate the output. The task of cell width search is to select a proper width for each layer of the middle part.
Our supernet provides three paths of cells with different widths. At each layer, the supernet can double the width, keep the previous width, or halve it. After searching, only one cell at each layer is kept. The continuous relaxation strategy introduced for cell architecture search is reused for the inter-cell search.
At each layer l, there are three cells C_l^1, C_l^2 and C_l^3 with widths W, 2W and 4W, where W is the basic width and is set to 10 during the search phase. The output feature of layer l is the triple

    H_l = (h_l^1, h_l^2, h_l^3),

where h_l^i is the output of C_l^i. The channel width of h_l^i is n · w_i, where n is the number of nodes in the cells and w_i is the width of C_l^i.
Each cell C_l^i is connected to the cells C^1, C^2 and C^3 in the previous layer and in the layer two steps before. We first process the outputs from those layers with 1×1 convolutions to obtain features whose widths match the input of C_l^i. Then the output of the i-th cell in layer l is computed as

    h_l^i = C_l^i( Σ_j β_{l-1}^j · conv_{1×1}(h_{l-1}^j),  Σ_j β_{l-2}^j · conv_{1×1}(h_{l-2}^j) ),

where β_l^i is the weight of C_l^i. That is, we combine the three outputs of each earlier layer according to the corresponding β weights and then feed the mixtures to C_l^i as input. After the supernet is trained, we select the width for each layer according to the β values.
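A toy scalar sketch of this shared-weight mixing follows. It is our own simplification: the 1×1 projections are replaced by identity maps, and softmax normalisation of the β weights within a layer is our assumption.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def layer_output(prev_outputs, prevprev_outputs,
                 betas_prev, betas_prevprev, cell):
    """Shared-weight path mixing (toy scalar version): the three
    width-path outputs of each earlier layer are 'projected' (here:
    identity, standing in for the 1x1 convolutions), combined with
    their normalised beta weights, and the single mixture is fed to
    ONE shared cell instead of three separately parameterised ones."""
    wp = softmax(betas_prev)
    wpp = softmax(betas_prevprev)
    mix_prev = sum(w * h for w, h in zip(wp, prev_outputs))
    mix_prevprev = sum(w * h for w, h in zip(wpp, prevprev_outputs))
    return cell(mix_prev, mix_prevprev)
```

The key design point is that `cell` is evaluated once per layer rather than once per incoming path, which is where the memory saving over Auto-Deeplab's per-path cells comes from.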
Note the similarity of this design with Auto-Deeplab, where an analogous mechanism is used to select feature strides for image segmentation. However, in Auto-Deeplab, the outputs from the three different levels are first processed by separate cells with different sets of weights before being summed into the output:

    h_l^i = Σ_j β_{l-1}^j · C_l^{j→i}(h_{l-1}^j, ·),

where each path j has its own cell C_l^{j→i}.
By reusing the cell weights across the three paths, we cut the memory consumption of the supernet to one third and can thus use a deeper and wider supernet for more accurate approximations.
Note that, differently from the cell architecture search, we cannot simply rank the cells of different widths according to their β values and keep the top one. In cell width search, the channel widths of the outputs of different cells in the same layer differ considerably. Reusing the strategy adopted in cell architecture search may cause the widths of adjacent layers in the final network to change drastically, which has a negative effect on efficiency, as explained in . In cell width search, we instead view the β values as probabilities and use the Viterbi algorithm to select the path with the maximum probability as the final result. In addition, an ASPP module is added after the last cell in the final architecture, as illustrated on the right of Figure 2.
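The width selection can be sketched as a standard Viterbi decode over the per-layer β weights. This is our own illustration; in particular, restricting transitions to adjacent width options is our reading of the doubling/halving rule, not a confirmed implementation detail.

```python
def select_widths(log_betas, allowed=None):
    """Viterbi decoding over per-layer width weights.

    log_betas: list over layers; each entry holds one log-probability
    per width option. allowed[i][j] says whether a cell of width
    option i may feed one of width option j in the next layer
    (adjacent options only by default, so the width at most doubles
    or halves between layers). Returns the maximum-probability
    sequence of width indices, one per layer."""
    n = len(log_betas[0])
    if allowed is None:
        allowed = [[abs(i - j) <= 1 for j in range(n)] for i in range(n)]
    score = list(log_betas[0])  # best score ending at each option
    back = []                   # backpointers per layer
    for layer in log_betas[1:]:
        new, ptr = [], []
        for j in range(n):
            best_i = max((i for i in range(n) if allowed[i][j]),
                         key=lambda i: score[i])
            new.append(score[best_i] + layer[j])
            ptr.append(best_i)
        score, back = new, back + [ptr]
    j = max(range(n), key=lambda j: score[j])
    path = [j]
    for ptr in reversed(back):  # walk backpointers to recover the path
        j = ptr[j]
        path.append(j)
    return path[::-1]
```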
3.3 Searching with Gradient Descent
In terms of the optimization method, our proposed IR-NAS belongs to the family of differentiable architecture search: searching is itself an optimization process. For image denoising and image de-raining, the two most widely used evaluation metrics are PSNR and SSIM. Inspired by this, we design the following loss for optimizing the supernet:
    L = || F(x) - y ||_2^2 + w · L_ssim,   with   L_ssim = 1 - S(F(x), y),

where x and y denote the input image and the corresponding ground truth, L_ssim is the loss term that we design to enforce visible structure in the result, F is the supernet, S(·, ·) is the structural similarity index , and w is a weighting coefficient empirically set to 0.5 in our experiments. While optimizing the supernet with gradient descent, we split the training set into three disjoint parts: Train W, Train A and Train V. Train W and Train A are used to optimize, respectively, the weights of the supernet (the kernels of the convolution layers) and the weights of the different layer types and cell widths (α and β). Train V is used to evaluate the performance of the trained supernet. More details are given in the implementation details section.
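A numeric sketch of this loss on 1-D signals (our own simplification: the real S(·, ·) computes SSIM over local Gaussian windows of an image, while this version applies the SSIM formula once, globally):

```python
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over whole signals: keeps only the formula,
    with the standard stabilising constants c1 and c2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def restoration_loss(pred, gt, w=0.5):
    """MSE fidelity term plus the SSIM term L_ssim = 1 - S(pred, gt),
    weighted by w = 0.5 as in the text."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    return mse + w * (1.0 - ssim_global(pred, gt))
```

For a perfect restoration the SSIM term vanishes along with the MSE, so the loss is zero exactly when prediction and ground truth coincide.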
4.1 Datasets and Implementation Details
Datasets For the denoising experiments, we use two datasets. The first one is BSD500 . Following [31, 32, 7], we use as the training set the combination of 200 images from its training set and 100 images from its validation set, and test on the 200 images of its test set. On this dataset, we generate noisy images by adding white Gaussian noise to the clean images with σ = 30, 50 and 70.
The second one is SIM1800, which we build ourselves. As the additive white noise model cannot accurately reproduce real-world noise, we use the camera pipeline simulation method proposed in  to build this new denoising dataset, which contains 1600 training samples and 212 test samples.
Specifically, we first use the camera pipeline simulation method to add noise to 25k patches extracted from the MIT-Adobe5k dataset , then manually pick the 1812 patches with the most realistic visual effects, and finally randomly select 1600 patches as the training set, reserving the rest as the test set.
For the de-raining experiments, we compare the IR-NAS designed models with previous works on the outdoor synthetic dataset of 800 rain images (Rain800), which consists of 700 training image pairs and 100 test image pairs.
[Table: Methods | # parameters (M) | time cost (s)]
Search settings The supernet that we build for image denoising consists of 4 cells, each with 5 nodes. The supernet for image de-raining contains 3 cells, each with 4 nodes. The basic width of both supernets is set to 10 during search.
In designing the network for image denoising, we conduct the architecture search on BSD500 and apply the networks found by IR-NAS to both denoising datasets. Specifically, we combine the 200 images of the training set and the 100 images of the validation set into one training set, 2% of which is randomly selected to evaluate the performance of the supernet (Train V). The rest is divided equally into two parts: one part is used to update the kernels of the convolution layers (Train W) and the other is used to optimize the architecture parameters (Train A). Similarly, for image de-raining, we search for the architecture on the training set of Rain800, which is also split into three parts.
We train the supernet for 100 epochs with a batch size of 12, optimizing the kernel and architecture parameters with two separate optimizers. For learning the kernels of the convolution layers, we employ SGD with momentum 0.9 and weight decay 0.0003; the learning rate decays from 0.025 to 0.001 with a cosine annealing schedule . For learning the architecture parameters, we use Adam with both the learning rate and weight decay set to 0.001. In the first 20 epochs we update only the kernel parameters; from epoch 21 onward we alternately optimize the convolution kernels and the architecture parameters.
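The alternating schedule can be sketched with two scalar "parameters" standing in for the kernel weights and the architecture weights. This is a toy illustration of the warm-up-then-alternate pattern only; the actual losses, optimizers and parameter sets are of course far richer.

```python
def train_supernet(steps, warmup, step_w, step_a):
    """Toy alternating schedule: for the first `warmup` steps only the
    'kernel' parameter w is updated; afterwards kernel updates and
    'architecture' updates of a alternate. Both parameters here just
    minimise simple quadratics (w^2 and a^2) by gradient descent."""
    w, a = 5.0, 5.0
    for t in range(steps):
        w -= step_w * 2 * w             # gradient step on the kernel loss w^2
        if t >= warmup and t % 2 == 1:  # alternate only after warm-up
            a -= step_a * 2 * a         # gradient step on the arch loss a^2
    return w, a
```

Warming up the kernels first means the architecture weights are updated against a partially trained supernet rather than random features, which is the rationale for delaying the architecture updates.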
During search training, we randomly crop patches and feed them to the network. During evaluation, we split each image into adjacent patches, feed them to the network, and finally join the corresponding patch results to form the whole image. We evaluate the supernet every 1000 iterations and save the architecture with the highest PSNR and SSIM as the search result.
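The patch-wise evaluation can be sketched as a split-and-stitch pair. This is a minimal version under our own simplifying assumptions: non-overlapping patches and image sides divisible by the patch size (the actual evaluation would need padding or overlap handling at the borders).

```python
def split_into_patches(img, size):
    """Split a 2-D image (list of rows) into adjacent non-overlapping
    size x size patches, each tagged with its top-left coordinate."""
    h, w = len(img), len(img[0])
    return [[(r, c), [row[c:c + size] for row in img[r:r + size]]]
            for r in range(0, h, size) for c in range(0, w, size)]

def stitch_patches(patches, h, w):
    """Re-assemble per-patch results into a whole h x w image."""
    out = [[0] * w for _ in range(h)]
    for (r, c), patch in patches:
        for i, row in enumerate(patch):
            out[r + i][c:c + len(row)] = row
    return out
```

In use, each patch would be run through the network between the split and the stitch; splitting and stitching without any processing is an exact round trip.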
Training settings For image denoising and image de-raining, we train the final architectures found by IR-NAS with the same strategy. Specifically, we train each network for 600k iterations with the Adam optimizer, where the initial learning rate and batch size are set to 0.05 and 12, respectively. For data augmentation in image denoising, we use random crops, random rotations, and horizontal and vertical flipping; patches are randomly cropped from the input images. For fair comparison, following , we train a separate model for each noise level on BSD500. For image de-raining, we use random crops and horizontal flipping for augmentation.
4.2 Comparisons with State-of-the-Art Results
In this section, we compare the IR-NAS designed networks with a number of recent image denoising and de-raining methods and use PSNR and SSIM to quantitatively measure the restoration performance of those methods.
Image denoising results For the image denoising experiments, we compare our IR-NAS designed network with several published image denoising methods, including BM3D , WNNM , RED , MemNet , NLRN  and N3Net . The comparison results on BSD500 and SIM1800 are listed in Table 1 and Table 2, respectively. Figures 3 and 4 show the visual effects.
Table 1 shows that NLRN, N3Net and IR-NAS beat the other models by a clear margin. Among the top three methods, our proposed IR-NAS achieves the best performance when σ is set to 50 and 70. When the noise level σ is set to 30, the SSIM of NLRN is slightly higher (by 0.002) than that of our IR-NAS, but the PSNR of NLRN is much lower (by nearly 1 dB) than that of IR-NAS.
Broadly speaking, our IR-NAS achieves better performance than the others. In addition, compared with NLRN and N3Net, the network designed by IR-NAS has fewer parameters and faster inference. As listed in Table 1, the IR-NAS designed network contains 0.63M parameters, which is 92.65% of the parameter count of N3Net and 64.29% of that of NLRN. Compared with N3Net, the IR-NAS designed network reduces the inference time on the BSD500 test set by 31.26%. Figure 3 shows that the network designed by IR-NAS achieves the best visual effect.
As NLRN and N3Net beat the other denoising models by a large margin on BSD500, we now compare the network designed by IR-NAS with NLRN and N3Net on SIM1800. Table 2 lists the results, from which we can see that the SSIM of the IR-NAS designed network is much higher than that of NLRN and N3Net, while its PSNR is slightly lower. In summary, the performance of the IR-NAS designed network is competitive with that of NLRN and N3Net on SIM1800. Figure 4 shows a visual comparison.
Image de-raining results On Rain800, we compare the de-raining network found by IR-NAS with seven previous methods. The results are listed in Table 3 and shown in Figure 5. As shown in Table 3, the de-raining network designed by IR-NAS performs much better than the others. From RESCAN to the network designed by IR-NAS, PSNR and SSIM improve by 2.22 dB and 0.0275, respectively. In addition, the inference speed of the IR-NAS designed de-raining network is 2.03× that of RESCAN.
4.3 Architecture Analysis
In this section, we analyse the architectures designed by IR-NAS. Figure 6 (a) and (b) show the searched networks: (a) shows the search results at the outer network level and (b) shows the details inside the cells. From Figure 6 (a) and (b), we can see that:
In both the denoising network and the de-raining network found by our IR-NAS, the cell closest to the output layer has the maximum number of channels, which is consistent with previous manually designed networks.
Generally speaking, at the same width, deformable convolution is more flexible and powerful than other convolution operations. Even so, inside the cells, instead of connecting all nodes with the powerful deformable convolution, IR-NAS connects different nodes with different types of operators, such as conventional convolution, dilated convolution and skip connection.
We believe these results show that IR-NAS is able to select proper operators.
Separable convolutions are not included in the searched results. We conjecture that this is because we do not limit the FLOPs or the number of parameters during search. Interestingly, the networks found by our IR-NAS still have fewer parameters than other manually designed models.
From Figure 6 (b), we can see that the networks designed by IR-NAS consist of many fragmented branches, which might be the main reason that the designed networks perform better than previous denoising and de-raining models.
As explained in , a fragmented structure is beneficial for accuracy. Here we verify whether IR-NAS improves accuracy by designing a proper architecture, or by simply integrating various branch structures and convolution operations. We modify the architecture found by IR-NAS in two different ways and then compare the modified architectures with the unmodified one. The first modification replaces the conventional convolutions in the searched architectures with deformable convolutions, as shown in Figure 6 (c). As mentioned above, deformable convolution is more flexible than conventional convolution, so this replacement should increase the capacity of the networks. The second modification changes the connections between nodes inside each cell, as shown in Figure 6 (d). This modification aims to verify whether the connection pattern built by IR-NAS is indeed appropriate.
[Table: Methods | Image denoising (σ) | Image de-raining]
The modified parts are marked in red in Figure 6 (c) and (d). Beyond the two modifications above, we also modified other parts for comparison experiments; limited by space, we only show two examples for each task in this paper. The comparison results for the two modification operations are listed in Table 4.
From Table 4, we can see that both modifications reduce accuracy on image denoising and de-raining. For image de-raining in particular, replacing the convolution operations reduces PSNR and SSIM by 4.83 dB and 0.1931, respectively, and changing the connections decreases PSNR and SSIM to 25.69 dB and 0.8416, respectively.
In summary, we can draw one conclusion from the comparison results: IR-NAS does design a proper structure and select proper convolution operations, rather than simply assembling a complex network from various convolution operations.
4.4 Ablation Study
In this section we analyse how our designed loss term improves image restoration results. We implement two baselines: (1) IR-NAS trained with the MSE loss alone, and (2) IR-NAS* trained with the combination of MSE and L_ssim. Table 5 shows the denoising results of these two variants and two recent state-of-the-art denoisers, NLRN and N3Net, on BSD500. We also conduct the experiments on the de-raining dataset Rain800, with results summarized in Table 6. Both IR-NAS and IR-NAS* outperform the other competitive models on both datasets, while IR-NAS* trained with the combined loss brings a gain over IR-NAS trained with the single loss; for image de-raining in particular, the gain is remarkable: about 0.8 dB PSNR and 0.02 SSIM, respectively. In short, our designed loss is useful for improving both the PSNR and SSIM metrics.
5 Conclusion
In this work, we have proposed IR-NAS, a neural architecture search algorithm for low-level image restoration tasks. The proposed IR-NAS is both memory and computation efficient: it takes only 6 hours to search on a single GPU and only one third of the memory of Auto-Deeplab to search for the same structure. We have applied IR-NAS to three datasets, namely two denoising datasets and one de-raining dataset, achieving highly competitive or better performance compared with previous state-of-the-art methods, with fewer parameters and faster inference.
We have also introduced an SSIM based loss, L_ssim, which proves very useful for improving the two evaluation metrics, PSNR and SSIM. In future work, we plan to improve the efficiency of IR-NAS and apply it to more low-level tasks.
-  B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv: Comp. Res. Repository, vol. abs/1611.01578, 2016.
-  G. Ghiasi, T.-Y. Lin, and Q. V. Le, “Nas-fpn: Learning scalable feature pyramid architecture for object detection,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 7036–7045, 2019.
-  C. Liu, L.-C. Chen, F. Schroff, H. Adam, W. Hua, A. L. Yuille, and L. Fei-Fei, “Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 82–92, 2019.
-  V. Nekrasov, H. Chen, C. Shen, and I. Reid, “Fast neural architecture search of compact semantic segmentation models via auxiliary cells,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 9126–9135, 2019.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, 2017.
-  K. Zhang, W. Zuo, and L. Zhang, “Ffdnet: Toward a fast and flexible solution for cnn-based image denoising,” IEEE Trans. Image Process., vol. 27, no. 9, pp. 4608–4622, 2018.
-  D. Liu, B. Wen, Y. Fan, C. C. Loy, and T. S. Huang, “Non-local recurrent network for image restoration,” in Proc. Advances in Neural Inf. Process. Syst., pp. 1673–1682, 2018.
-  T. Plötz and S. Roth, “Neural nearest neighbors networks,” in Proc. Advances in Neural Inf. Process. Syst., pp. 1087–1098, 2018.
-  S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, “Toward convolutional blind denoising of real photographs,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 1712–1722, 2019.
-  R. Jaroensri, C. Biscarrat, M. Aittala, and F. Durand, “Generating training data for denoising real rgb images via camera pipeline simulation,” arXiv: Comp. Res. Repository, vol. abs/1904.08825, 2019.
-  X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 3855–3863, 2017.
-  W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 1357–1366, 2017.
-  R. Li, L.-F. Cheong, and R. T. Tan, “Single image deraining using scale-aware multi-stage recurrent network,” arXiv: Comp. Res. Repository, vol. abs/1712.06830, 2017.
-  H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 695–704, 2018.
-  X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, “Recurrent squeeze-and-excitation context aggregation net for single image deraining,” in Proc. Eur. Conf. Comp. Vis., pp. 254–269, 2018.
-  H. Liu, K. Simonyan, O. Vinyals, C. Fernando, and K. Kavukcuoglu, “Hierarchical representations for efficient architecture search,” arXiv: Comp. Res. Repository, vol. abs/1711.00436, 2017.
-  B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 8697–8710, 2018.
-  Z. Zhong, J. Yan, W. Wu, J. Shao, and C.-L. Liu, “Practical block-wise neural network architecture generation,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 2423–2432, 2018.
-  C. Zhang, M. Ren, and R. Urtasun, “Graph hypernetworks for neural architecture search,” arXiv: Comp. Res. Repository, vol. abs/1810.05749, 2018.
-  T. Elsken, J. H. Metzen, and F. Hutter, “Efficient multi-objective neural architecture search via lamarckian evolution,” arXiv: Comp. Res. Repository, vol. abs/1804.09081, 2018.
-  H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, “Efficient neural architecture search via parameter sharing,” arXiv: Comp. Res. Repository, vol. abs/1802.03268, 2018.
-  H. Liu, K. Simonyan, and Y. Yang, “Darts: Differentiable architecture search,” arXiv: Comp. Res. Repository, vol. abs/1806.09055, 2018.
-  H. Cai, L. Zhu, and S. Han, “Proxylessnas: Direct neural architecture search on target task and hardware,” arXiv: Comp. Res. Repository, vol. abs/1812.00332, 2018.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision, pp. 630–645, Springer, 2016.
-  X. Chu, B. Zhang, H. Ma, R. Xu, J. Li, and Q. Li, “Fast, accurate and lightweight super-resolution with neural architecture search,” arXiv: Comp. Res. Repository, vol. abs/1901.07261, 2019.
-  T. Plötz and S. Roth, “Neural nearest neighbors networks,” in Proc. Advances in Neural Inf. Process. Syst., pp. 1087–1098, 2018.
-  X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable convnets v2: More deformable, better results,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 9308–9316, 2019.
-  N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “Shufflenet v2: Practical guidelines for efficient cnn architecture design,” in Proc. Eur. Conf. Comp. Vis., pp. 116–131, 2018.
-  Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
-  D. Martin, C. Fowlkes, D. Tal, J. Malik, et al., “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. IEEE Int. Conf. Comp. Vis., pp. 416–423, 2001.
-  X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in Proc. Advances in Neural Inf. Process. Syst., pp. 2802–2810, 2016.
-  Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in Proc. IEEE Int. Conf. Comp. Vis., pp. 4539–4547, 2017.
-  V. Bychkovsky, S. Paris, E. Chan, and F. Durand, “Learning photographic global tonal adjustment with a database of input/output image pairs,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 97–104, IEEE, 2011.
-  K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
-  S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 2862–2869, 2014.
-  I. Loshchilov and F. Hutter, “Sgdr: Stochastic gradient descent with warm restarts,” in Proc. Int. Conf. Learn. Representations, 2017.
-  Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in Proc. IEEE Int. Conf. Comp. Vis., pp. 3397–3405, 2015.
-  Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 2736–2744, 2016.