I Introduction
As a common weather condition, rain impacts not only human visual perception but also computer vision systems, such as self-driving vehicles and surveillance systems. Due to the effects of light refraction and scattering, objects in an image are easily blurred and occluded by individual rain streaks. This problem becomes more severe under heavy rain due to the increased density of rain streaks. Since most existing computer vision algorithms are designed under the assumption of clean inputs, their performance is easily degraded by rainy weather. Thus, designing effective and efficient algorithms for rain streak removal is a significant problem with many downstream uses. Figure 1 shows an example of our lightweight pyramid network.
I-A Related works
Depending on the input data, rain removal algorithms can be categorized into video-based and single-image-based methods.
I-A1 Video-based methods
We first briefly review rain removal methods for video, which were the major focus in the early stages of this problem. These methods use both spatial and temporal information from the video. The first study on video deraining removed rain from a static background using average intensities from the neighboring frames [1]. Other methods focus on deraining in the Fourier domain [2], using Gaussian mixture models [3], low-rank approximations [4], and matrix completion [5]. In [6], the authors divide rain streaks into sparse and dense ones, and then propose a matrix-decomposition-based algorithm for deraining. More recently, [7] proposed a patch-based mixture of Gaussians for rain removal in video. Though these methods work well, they require the temporal content of video. In this paper, we instead focus on the single-image deraining problem.
I-A2 Single-image methods
Since information is drastically reduced in individual images, single-image deraining is a much more difficult problem. Methods for addressing this problem have employed kernels [8], low-rank approximations [4, 9] and dictionary learning [10, 11, 12, 13]. In [8], rain streaks are detected and removed by using kernel regression and non-local means filtering. In [10], the authors decompose a rainy image into its low- and high-frequency components; the high-frequency part is processed to extract and remove rain streaks via sparse-coding-based dictionary learning. In [11], a self-learning method is proposed to automatically distinguish rain streaks in the high-frequency part. A discriminative sparse coding method is proposed in [12]: by forcing the coefficient vector of the rain layer to be sparse, the objective function is solved to separate the background and rain streaks. Other methods have used mixture models [14] and local gradients [15] to model and then remove rain streaks. In [14], the authors explore patch-based Gaussian mixture model (GMM) priors for both the clean and rain layers. The GMM prior for the background layer is learned from natural images, while that for the rain streak layer is learned from rainy images. In [15], three new priors are defined by exploring local image gradients; the priors are used to model the objective function, which is solved using the alternating direction method of multipliers (ADMM).
Deep learning has also been introduced for this problem. Convolutional neural networks (CNNs) have proven useful for a variety of high-level vision tasks [16, 17, 18, 19, 20] as well as various image processing problems [21, 22, 23, 24, 25]. In [26], a related deep-learning-based work was introduced to remove static raindrops and dirt spots from pictures taken through windows. Our previous CNN-based method for removing dynamic rain streaks was introduced in [27], where the authors build a relatively shallow network with three layers to extract features of rain streaks from the high-frequency content of a rainy image. Building on an effective strategy for training very deep networks [20], two deeper networks were proposed based on image residuals [28] and multi-scale information [29]. In [30], the authors utilize the generative adversarial framework to further enhance textures and improve the visual quality of derained results. Recently, in [31], a density-aware multi-stream densely connected CNN was proposed for joint rain density estimation and deraining. This method automatically generates a rain density label, which is further utilized to guide rain streak removal.
I-B Our contributions
Though very deep networks achieve excellent performance on single-image deraining, a main drawback that potentially limits their application in mobile devices, autonomous driving, and other computer vision tasks is their huge number of parameters. As networks become deeper, more storage space is required [32]. To address this issue, we propose a lightweight pyramid network (LPNet), which contains fewer than 8K parameters, designed with the single-image rain removal problem in mind. Instead of designing a complex network structure, we use problem-specific knowledge to simplify the learning process. Specifically, we first adopt the Laplacian pyramid to decompose a degraded/rainy image into different levels. We then use recursive and residual networks to build a subnetwork for each level to reconstruct the Gaussian pyramid of the derained image. A specific loss function is selected for training each subnetwork according to its physical characteristics, and the whole network is trained with multi-task supervision. The final recovered image is the bottom level of the reconstructed Gaussian pyramid.
The main feature of our LPNet approach is to use the mature Gaussian-Laplacian image pyramid technique [33] to transform one hard problem into several easier subproblems. In other words, since the Laplacian pyramid contains different levels that separate large-scale edges from small-scale details, one can design a simple and lightweight subnetwork for each level in a divide-and-conquer way. The contributions of our paper are summarized as follows:

We show that by combining the classical Gaussian-Laplacian pyramid technique with CNNs, a simple network structure with few parameters and relatively shallow depth is sufficient for excellent performance. To our knowledge, the resulting network is by far the most lightweight (in terms of parameters) among deep networks with comparable performance.

Through multi-scale techniques and recursive and residual deep learning, our proposed network achieves state-of-the-art performance on single-image deraining. Although LPNet is trained on synthetic data by necessity, it still generalizes well to real-world images.

We discuss how LPNet can be applied to other fundamental low- and high-level vision tasks in image processing. We also show how LPNet can improve downstream applications such as object recognition.
II Lightweight pyramid network for deraining
In Figure 2, we show our proposed LPNet for single-image deraining. To summarize at a high level: we first decompose a rainy image into a Laplacian pyramid and build a subnetwork for each pyramid level. Each subnetwork is then trained with its own loss function according to the physical characteristics of the data at that level. The network outputs a Gaussian pyramid of the derained image, and the final derained result is the bottom level of that Gaussian pyramid.
II-A Motivation
Since rain streaks are blended with object edges and the background scene, it is hard to directly learn the deraining function in the image domain [27]. To simplify the problem, it is natural to train a network on the high-frequency information in images, which primarily contains rain streaks and edges without background interference. Based on this motivation, the authors in [27, 28] use the guided filter [34] to obtain the high-frequency component of an image as the input to a deep network, which is then derained and fused back with the low-resolution information of the same image. However, these two methods fail when very thick rain streaks cannot be extracted by the guided filter. Inspired by this decomposition idea, we instead build a lightweight pyramid of networks to simplify the learning process and, as a result, reduce the number of necessary parameters.
II-B Stage 1: The Laplacian pyramid
We first decompose a rainy image X into its Laplacian pyramid, which is a set of images with N levels:

(1)  L_n(X) = G_n(X) − up(G_{n+1}(X)),  n = 1, …, N−1,  with L_N(X) = G_N(X),

where G_n(X) is the Gaussian pyramid and G_1(X) = X. Each G_{n+1}(X) = down(G_n(X)) is computed by downsampling using a Gaussian kernel, and up(·) denotes the corresponding upsampling operation.
The reasons we choose the classical Laplacian pyramid to decompose the rainy image are fourfold: 1) The background scene can be fully extracted at the top pyramid level, while the other levels contain rain streaks and details at different spatial scales. Thus, the rain interference is isolated and each subnetwork only needs to deal with high-frequency components at a single scale. 2) This decomposition strategy allows the network to take advantage of the sparsity at each level, which motivates many other deraining methods [8, 11, 27], to simplify the learning problem. However, unlike previous deraining methods that use a single-scale decomposition, LPNet performs a multi-scale decomposition using Laplacian pyramids. 3) As shown in Figure 3, compared with the image domain, deep learning at each pyramid level is closer to an identity mapping (e.g., the top row is more similar to the middle row, as evident in the bottom row), which is known to be the situation where residual learning (ResNet) excels [20]. 4) The Laplacian pyramid is a mature algorithm with low computational cost. Most calculations are convolutions (Gaussian filtering), which can be easily embedded into existing systems with GPU acceleration.
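As a concrete illustration of the decomposition in Eq. (1), here is a minimal NumPy/SciPy sketch. The Gaussian blur width, the factor-2 resampling, and the function names are illustrative assumptions, not the paper's exact smoothing kernel or implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_laplacian_pyramid(img, levels=5, sigma=1.0):
    """Illustrative Laplacian pyramid: L_n = G_n - up(G_{n+1}),
    with the top level L_N = G_N (the low-pass residual)."""
    gauss = [img.astype(np.float64)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(gauss[-1], sigma)  # Gaussian smoothing
        gauss.append(blurred[::2, ::2])              # downsample by 2
    lap = []
    for n in range(levels - 1):
        up = zoom(gauss[n + 1], 2, order=1)          # bilinear upsampling
        up = up[:gauss[n].shape[0], :gauss[n].shape[1]]
        lap.append(gauss[n] - up)                    # band-pass detail level
    lap.append(gauss[-1])                            # smooth top level
    return gauss, lap
```

Adding up(G_{n+1}) back to each L_n recovers G_n exactly, which is the property that the Gaussian pyramid reconstruction in Eq. (8) relies on.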
II-C Stage 2: Subnetwork structure
After decomposing the rainy image into different pyramid levels, we build a subnetwork independently for each level to predict the corresponding level of a clean Gaussian pyramid. All subnetworks have the same structure but different numbers of kernels. We adopt residual learning [20] for each network structure and recursive blocks [35] to reduce parameters. The subnetwork structure can be expressed as follows:
Feature extraction
The first layer extracts features from the n-th input level,

(2)  H_{0,i}^n = σ(W_{0,i}^n ∗ L_n(X) + b_{0,i}^n),

where i indexes the feature map, ∗ is the convolution operation, W_{0,i}^n are weights and b_{0,i}^n are biases. σ(·) is an activation function for nonlinearity.
Recursive block
To reduce the number of parameters, we build the intermediate inference layers in a recursive fashion. The basic idea is to share parameters among recursive blocks. Motivated by our experiments, we adopt three convolutional operations in each recursive block. The calculations in the t-th recursive block are

(3)  H_{t,1}^n = σ(W_1^n ∗ H_{t−1}^n + b_1^n),
(4)  H_{t,2}^n = σ(W_2^n ∗ H_{t,1}^n + b_2^n),
(5)  H_{t,3}^n = σ(W_3^n ∗ H_{t,2}^n + b_3^n),

where H_{t,1}^n, H_{t,2}^n and H_{t,3}^n are intermediate features in the recursive block, {W_j^n, b_j^n} (j = 1, 2, 3) are parameters shared among the recursive blocks, and t = 1, …, T.
To help propagate information and back-propagate gradients, the output feature map H_t^n of the t-th recursive block is calculated by adding H_0^n through a skip connection:

(6)  H_t^n = σ(H_{t,3}^n + H_0^n).
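The recursion in Eqs. (3)-(6) can be sketched in a few lines of NumPy; a single feature map and hand-picked kernels stand in for the learned multi-channel layers, so this is only a structural illustration of the parameter sharing, not the trained network.

```python
import numpy as np
from scipy.ndimage import convolve

def lrelu(x, slope=0.2):
    # Leaky ReLU, the activation sigma used in the paper.
    return np.where(x >= 0, x, slope * x)

def run_recursive_blocks(h0, shared, num_blocks):
    """Apply `num_blocks` recursive blocks to the feature map h0.
    Every block reuses the SAME three kernels in `shared`, so depth
    grows while the parameter count stays fixed. A single feature map
    is used here for clarity; real layers hold many maps."""
    h = h0
    for _ in range(num_blocks):
        z = h
        for w, b in shared:                    # three shared conv+activation ops
            z = lrelu(convolve(z, w, mode='nearest') + b)
        h = lrelu(z + h0)                      # skip connection back to H_0
    return h
```

Because `shared` is reused by every block, adding more recursive blocks deepens the subnetwork without adding a single parameter.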
Gaussian pyramid reconstruction
To obtain the output level of the pyramid, the reconstruction layer is expressed as:

(7)  L̂_n(Ŷ) = W_rec^n ∗ H_T^n + b_rec^n + L_n(X),

where the input level L_n(X) is added back through the residual connection [20].
After obtaining the output Laplacian pyramid L̂_n(Ŷ), the corresponding Gaussian pyramid of the derained image Ŷ can be reconstructed by

(8)  G_n(Ŷ) = max(0, L̂_n(Ŷ) + up(G_{n+1}(Ŷ))),  n = 1, …, N−1,

where G_N(Ŷ) = L̂_N(Ŷ). Since each level of a Gaussian pyramid should be greater than or equal to 0, we use max(0, ·), which is in fact the rectified linear unit (ReLU) operation [16], to correct the outputs. The final derained image is the bottom level of the Gaussian pyramid, i.e., Ŷ = G_1(Ŷ).
In [36, 37, 38], the authors build similar networks based on image pyramids; these are the works most closely related to our own. However, those papers apply similar structures to other tasks, such as image generation and super-resolution, using different network designs on the pyramid.
II-D Loss Function
Given a training set of rainy images with corresponding ground-truth clean images, the most widely used loss function for training a network is the mean squared error (MSE). However, MSE usually generates over-smoothed results due to the squared penalty, which works poorly at edges in an image. Thus, for each subnetwork we adopt a different loss function and minimize their combination. Following [39], we choose the ℓ1 and SSIM [40] losses. Specifically, as shown in Figure 3, since finer details and rain streaks exist in the lower pyramid levels, we use the SSIM loss to train the corresponding subnetworks so as to better preserve high-frequency information. In contrast, larger structures and smooth background areas exist in the higher pyramid levels, so we use the ℓ1 loss to update the corresponding network parameters there. The overall loss function is
(9)  L = L_ℓ1 + L_SSIM,

where L_SSIM is the SSIM loss and L_ℓ1 is the ℓ1 loss. In this paper, we set the number of pyramid levels N based on our experiments. We use the SSIM loss for the lower levels and the ℓ1 loss for all levels.
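The combined objective of Eq. (9) can be sketched as below. The single-window SSIM here is a simplification of the standard local-window SSIM of [40], and which levels receive the SSIM term (`ssim_levels`) is an assumption for illustration; the paper chooses this experimentally.

```python
import numpy as np

def l1_loss(x, y):
    # Mean absolute error between a predicted and ground-truth level.
    return np.mean(np.abs(x - y))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-window SSIM for images scaled to [0,1];
    the loss is 1 - SSIM, so identical inputs give zero loss."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - ssim

def pyramid_loss(gauss_hat, gauss_gt, ssim_levels):
    """Eq. (9) sketch: l1 on every Gaussian pyramid level, plus an SSIM
    term on the lower levels that carry fine details and rain streaks."""
    loss = sum(l1_loss(p, q) for p, q in zip(gauss_hat, gauss_gt))
    loss += sum(ssim_loss(gauss_hat[n], gauss_gt[n]) for n in ssim_levels)
    return loss
```

In training, each subnetwork's parameters receive gradients only from the terms computed on its own level, which is the multi-task supervision described above.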
II-E Removing batch normalization
As one of the most effective ways to alleviate internal covariate shift, batch normalization (BN) [41] is widely adopted before the nonlinearity in each layer of existing deep-learning-based methods. However, we argue that by introducing the image pyramid, BN can be removed to improve the flexibility of the network. This is because BN constrains the feature maps to obey a Gaussian distribution, while in our experiments we found that the distributions of the lower Laplacian pyramid levels of both clean and rainy images are sparse. To demonstrate this, in Figure 4 we show the histograms of each Laplacian pyramid level computed from 200 clean and light-rain training image pairs from [29]. As can be seen, compared to the image domain in Figure 4(a), the distributions of the lower pyramid levels, i.e., Figures 4(c) to (f), are much sparser and do not obey a Gaussian distribution. This implies that we do not need BN to further constrain the feature maps, since the mapping problem has already become easy to handle. Moreover, removing BN substantially reduces GPU memory usage, since a BN layer consumes the same amount of memory as the preceding convolutional layer. Based on the above observations, we remove the BN layers from our network to improve flexibility and to reduce the number of parameters and the required computing resources.
II-F Parameter settings
We decompose an RGB image into an N-level Laplacian pyramid using a fixed Gaussian smoothing kernel, which is also used to reconstruct the Gaussian pyramid. In our network architecture, each subnetwork has the same structure with a different number of kernels. The convolutional layers share a fixed spatial kernel size, with a smaller kernel size used in part of each subnetwork to further increase nonlinearity and reduce parameters, and each subnetwork uses the same number T of recursive blocks. For the activation function σ, we use leaky rectified linear units (LReLUs) [42] with a negative slope of 0.2.
Moreover, as shown in the last row of Figure 3, the higher levels are closer to an identity mapping, since rain streaks only remain in the lower levels. This means that for higher levels, fewer parameters are required to learn a good network. Thus, from low to high levels, we set the number of kernels per layer in decreasing order. Since the top level is a tiny and smoothed version of the image and rain streaks remain in the high-frequency parts, the top-level subnetwork acts more like a simple global contrast adjustment, so we assign it the fewest kernels. As shown in Figure 2, by connecting the upsampled version of the output from the next higher level, the direct prediction of all subnetworks is in fact the clean Laplacian pyramid. We show the intermediate results predicted by each subnetwork in Figure 5. It is clear that rain streaks remain in the lower levels while the higher levels are almost unchanged, which demonstrates that our diminishing parameter setting is reasonable. As a result, the total number of trainable parameters is only 7,548, far fewer than the hundreds of thousands often encountered in deep learning.
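To make the diminishing-width budget concrete, the sketch below counts parameters for one subnetwork under illustrative assumptions (3×3 kernels, 3 input channels, and hypothetical per-level widths of 16, 8, 4, 2, 1); these are not claimed to be the paper's exact configuration.

```python
def subnet_params(width, k=3, in_ch=3):
    """Rough parameter count for one subnetwork: a feature-extraction
    conv, three shared convs (counted once, because every recursive
    block reuses them), and a reconstruction conv. Illustrative only."""
    feat = k * k * in_ch * width + width           # weights + biases
    shared = 3 * (k * k * width * width + width)   # shared across all blocks
    recon = k * k * width * in_ch + in_ch
    return feat + shared + recon

# Hypothetical diminishing widths from the lowest to the highest level.
widths = [16, 8, 4, 2, 1]
per_level = [subnet_params(w) for w in widths]
total = sum(per_level)
```

Because the three inner convolutions are shared, adding recursive blocks leaves `shared` unchanged; and since the count grows roughly quadratically with width, the low levels dominate the budget while the narrow top level is almost free.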
II-G Training details
We use synthetic rainy images from [29] as our training data. This dataset contains 1800 images with heavy rain and 200 images with light rain. We randomly generate one million clean/rainy patch pairs. We use TensorFlow [43] to train LPNet with the Adam solver [44], using a fixed mini-batch size and learning rate for a fixed number of epochs. The whole network is trained in an end-to-end fashion.
III Experiments
We compare our LPNet with four state-of-the-art deraining methods: the Gaussian mixture model (GMM) of [14], a CNN baseline SRCNN [22], the deep detail network (DDN) of [28], and joint rain detection and removal (JORDER) [29], which is also a deep learning method. For a fair comparison, all CNN-based methods are retrained on the same training dataset.
III-A Synthetic data
Three synthetic datasets are chosen for comparison. Two of them are from [29], each containing 100 images: one is synthesized with heavy rain, called Rain100H, and the other with light rain, called Rain100L. The third dataset, called Rain12, is from [14] and contains 12 synthetic images. None of the testing images is included in the training data. Following [29], for each CNN method we train two models, one on the heavy-rain dataset and one on the light-rain dataset. The model trained on the light-rain dataset is used to test Rain12.
Figures 6 to 8 show visual results from each dataset. As can be seen, GMM [14] fails to remove rain streaks from heavy-rain images. SRCNN [22] and DDN [28] are able to remove the rain streaks but tend to generate obvious artifacts. Our LPNet has visual results comparable to JORDER and outperforms the other methods.
We also adopt PSNR and SSIM [40] for quantitative evaluation in Table I. Our method has SSIM values comparable to JORDER while outperforming the other methods, in agreement with the visual results. Though our result has a lower PSNR value than JORDER, the visual quality is comparable. This is because PSNR is calculated from the mean squared error (MSE), which measures global pixel errors without considering local image characteristics. Moreover, as shown in Table I, our LPNet contains far fewer parameters, potentially making LPNet more suitable for storage-constrained settings, e.g., mobile devices.
Rainy images  GMM [14]  SRCNN [22]  DDN [28]  JORDER [29]  Our LPNet  
SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  
Rain100H  0.38  13.56  0.43  15.05  0.70  22.84  0.76  21.92  0.83  26.54  0.81  23.73 
Rain100L  0.84  26.90  0.86  28.65  0.91  29.39  0.93  32.16  0.97  36.63  0.95  34.26 
Rain12  0.86  30.14  0.91  32.02  0.92  31.90  0.94  31.76  0.95  33.92  0.95  35.35 
Parameters #  --  --  20,099  57,369  369,792  7,548 
III-B Real-world data
In this section, we show that LPNet learned on synthetic training data still performs well on real-world data. Figure 9 shows five visual results on real-world images; the model trained on the light-rain dataset is used for this test. As can be seen, LPNet generates consistently promising derained results on images with different kinds of rain streaks.
Since no ground truth exists, we conduct an independent user study to provide realistic feedback and to quantify the subjective evaluation. We collect 50 real-world rainy images from the Internet as a new dataset (our code and data will be released soon). We use the five compared methods to generate derained results, randomly order the outputs together with the original rainy image, and display them on a screen. We then separately asked 20 participants to score each image from 1 to 5, with the instruction that visible rain streaks should decrease the score and clarity should increase it (1 represents the worst quality and 5 the best). Table II shows the average scores over these 1,000 trials; our LPNet achieves the best score. In Figure 10, we show a scatter plot of the rainy-input scores versus the derained user scores. This small-scale experiment provides additional support that our LPNet improves deraining on real-world images.
Inputs  GMM  SRCNN  DDN  JORDER  Our LPNet 
1.31  2.12  3.39  3.41  3.41  3.58 
Moreover, when dealing with dense rain, LPNet trained on images with heavy rain has a dehazing effect, as shown in Figure 11, which can further improve visual quality. This is because the highest-level subnetwork (the low-pass component) can adjust image contrast. Although dehazing is not the main focus of this paper, we believe LPNet can be easily modified for joint deraining and dehazing.
III-C Running time and convergence
To demonstrate the efficiency of LPNet, we report the average running time on test images in Table III. Three different image sizes are chosen, and each is tested over 100 images. The GMM is implemented on the CPU according to the provided code, while the deep CNN-based methods are tested on both CPU and GPU. All experiments are performed on a server with an Intel(R) Xeon(R) E5-2683 CPU, 64GB RAM and an NVIDIA GTX 1080. The GMM has the slowest running time, since complicated inference is required to process each new image. Our method has a comparable or even faster running time on both CPU and GPU than the other deep models, because LPNet uses relatively shallow networks for each level and therefore requires fewer convolutions.
GMM [14]  SRCNN [22]  DDN [28]  JORDER [29]  Our LPNet  
Image size  CPU  GPU  CPU  GPU  CPU  GPU  CPU  GPU  CPU  GPU 
500×500  1.99×10²  --  0.25  0.03  1.51  0.16  2.95×10²  0.18  0.67  0.12 
750×750  3.09×10²  --  0.58  0.09  3.33  0.22  5.98×10²  0.36  1.49  0.16 
1024×1024  6.52×10²  --  1.07  0.11  5.40  0.32  1.20×10³  0.82  2.46  0.20 
We also show the average training loss as a function of the training epoch in Figure 12. We observe that LPNet converges quickly on both the light- and heavy-rain training datasets. Since heavy rain streaks are harder to handle, as shown in the first row of Figure 6, the training error on heavy rain oscillates.
III-D Parameter settings
In this section, we discuss different parameter settings to study their impact on performance.
III-D1 Increasing the number of parameters
We conducted an experiment on the Rain100H dataset with an increased number of parameters, i.e., 16 feature maps for all convolutional layers in each subnetwork. The results are shown in Table IV. As can be seen, the SSIM value is better than JORDER's and the PSNR value is also improved. We believe the performance can be further improved by using more parameters. However, increasing the number of parameters requires more storage and computing resources. Figure 13 shows one example using different numbers of parameters; the visual quality is almost the same. Thus, we use our diminishing parameter setting to balance effectiveness and efficiency.
JORDER [29]  Our LPNet (default)  Our LPNet (increasing)  
SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  
Rain100H  0.83  26.54  0.81  23.73  0.84  24.09 
Parameters #  369,792  7,548  27,055 
III-D2 Skip connections
Though the Laplacian pyramid introduces sparsity at each level to simplify the mapping problem, it is still essential to add skip connections in each subnetwork, for two reasons. First, image information may be lost during feed-forward convolutional operations; skip connections help propagate information and improve deraining performance. Second, skip connections help back-propagate gradients when updating parameters, which accelerates training. In Figure 14 we show the training curves on the heavy-rain dataset with and without skip connections. As can be seen, using skip connections brings a faster convergence rate and a lower training loss.
III-D3 Loss function
We use SSIM as part of the loss function (9) for two main reasons. First, SSIM is calculated from local image characteristics, e.g., local contrast, luminance and details, which are also the characteristics of rain streaks. Thus, using SSIM as a loss function is appropriate for guiding the network training. Second, the human visual system is also sensitive to local image characteristics. SSIM is designed to reward visually pleasing results, unlike PSNR, and has therefore become a prominent measure in the image processing community. We also use the ℓ1 loss because ℓ1 does not over-penalize larger errors and thus can preserve structures and edges. In contrast, the widely used ℓ2 loss (on which PSNR is based) often generates over-smoothed results because it heavily penalizes large errors and tolerates small ones; ℓ2 therefore struggles to preserve underlying image structures compared with ℓ1. Figure 15 shows two results generated using our combined loss (9) and the ℓ2 loss, respectively. As can be seen, using our combined loss (9) preserves more details.
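The over-penalization argument can be seen in a two-line numerical example (a toy residual pattern, not data from the paper): with many tiny errors over smooth regions and one large error at an edge pixel, the squared penalty is almost entirely driven by that single edge error.

```python
import numpy as np

# Toy residual vector: 99 tiny errors over smooth regions
# plus one large error at an edge pixel.
residuals = np.array([0.01] * 99 + [1.0])

# Fraction of each loss contributed by the single edge error.
l2_share = residuals[-1] ** 2 / np.sum(residuals ** 2)   # ~0.99: l2 dominated by it
l1_share = residuals[-1] / np.sum(np.abs(residuals))     # ~0.50: l1 weights it linearly
```

Because the lone edge residual dominates the ℓ2 objective, gradient descent preferentially smooths edges to reduce it; under ℓ1 the same residual carries only its proportional share, so edges survive.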
III-E Extensions
III-E1 Generalization to other image processing tasks
Since both Laplacian pyramids and CNNs are fundamental and general image processing technologies, our network design has potential value for other low-level vision tasks. Figure 16 shows experimental results on image denoising and JPEG artifact reduction, which share the property of rainy images that the desired image is corrupted by high-frequency content. This test demonstrates that LPNet can generalize to similar image restoration problems.
III-E2 Pre-processing for high-level vision tasks
Due to its lightweight architecture, our LPNet can potentially be incorporated efficiently into other high-level vision systems. For example, we study the problem of object detection in rainy environments. Since rain streaks can blur and occlude objects, object detection performance degrades in rainy weather. Figure 17 shows a visual result of object detection when combined with the popular Faster R-CNN model [45]. It is obvious that rain streaks degrade the performance of Faster R-CNN, i.e., causing missed detections and low recognition confidence. After deraining by LPNet, the detection performance improves notably over the naive Faster R-CNN.
Additionally, due to the lightweight architecture, using LPNet with Faster R-CNN does not significantly increase the complexity: to process a single color image, the running time is 3.7 seconds for Faster R-CNN alone and 4.0 seconds for LPNet + Faster R-CNN.
IV Conclusion
In this paper, we have introduced a lightweight deep network based on the classical Gaussian-Laplacian pyramid for single-image deraining. Our LPNet contains several subnetworks that take the Laplacian pyramid as input and predict the clean Gaussian pyramid. By using the pyramid to simplify the learning problem and adopting recursive blocks to share parameters, LPNet has fewer than 8K parameters while still achieving good performance. Moreover, due to its generality and lightweight architecture, our LPNet has potential value for other low- and high-level vision tasks.
References
 [1] K. Garg and S. K. Nayar, “Detection and removal of rain from videos,” in CVPR, 2004.
 [2] P. C. Barnum, S. Narasimhan, and T. Kanade, “Analysis of rain and snow in frequency space,” Int’l. J. Computer Vision, vol. 86, no. 2, pp. 256–274, 2010.
 [3] J. Bossu, N. Hautiere, and J. P. Tarel, “Rain or snow detection in image sequences through use of a histogram of orientation of streaks,” Int’l. J. Computer Vision, vol. 93, no. 3, pp. 348–367, 2011.
 [4] Y. L. Chen and C. T. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in ICCV, 2013.
 [5] J. H. Kim, J. Y. Sim, and C. S. Kim, “Video deraining and desnowing using temporal correlation and low-rank matrix completion,” IEEE Trans. Image Process., vol. 24, no. 9, pp. 2658–2670, 2015.
 [6] W. Ren, J. Tian, Z. Han, A. Chan, and Y. Tang, “Video desnowing and deraining based on matrix decomposition,” in ICCV, 2017.
 [7] W. Wei, L. Yi, Q. Xie, Q. Zhao, D. Meng, and Z. Xu, “Should we encode rain streaks in video as deterministic or stochastic?,” in ICCV, 2017.
 [8] J. H. Kim, C. Lee, J. Y. Sim, and C. S. Kim, “Single-image deraining using an adaptive non-local means filter,” in IEEE ICIP, 2013.
 [9] Y. Chang, L. Yan, and S. Zhong, “Transformed lowrank model for line pattern noise removal,” in ICCV, 2017.
 [10] L. W. Kang, C. W. Lin, and Y. H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1742–1755, 2012.
 [11] D. A. Huang, L. W. Kang, Y. C. F. Wang, and C. W. Lin, “Self-learning based image decomposition with applications to single image denoising,” IEEE Trans. Multimedia, vol. 16, no. 1, pp. 83–93, 2014.
 [12] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in ICCV, 2015.
 [13] Y. Wang, S. Liu, C. Chen, and B. Zeng, “A hierarchical approach for rain or snow removing in a single color image,” IEEE Trans. Image Process., vol. 26, no. 8, pp. 3936–3950, 2017.
 [14] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in CVPR, 2016.
 [15] L. Zhu, C. W. Fu, D. Lischinski, and P. A. Heng, “Joint bilayer optimization for singleimage rain streak removal,” in ICCV, 2017.

 [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in NIPS, 2012.
 [17] H. Cecotti, M. P. Eckstein, and B. Giesbrecht, “Single-trial classification of event-related potentials in rapid serial visual presentation tasks using supervised spatial filtering,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 11, pp. 2030–2042, 2014.
 [18] T. Chen, L. Lin, L. Liu, X. Luo, and X. Li, “DISC: Deep image saliency computing via progressive representation learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 6, pp. 1135–1149, 2016.
 [19] M. Gong, J. Zhao, J. Liu, Q. Miao, and L. Jiao, “Change detection in synthetic aperture radar images based on deep neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 1, pp. 125–138, 2016.
 [20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
 [21] W. Hou, X. Gao, D. Tao, and X. Li, “Blind image quality assessment via deep learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1275–1286, 2015.
 [22] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
 [23] Y. Tai, J. Yang, X. Liu, and C. Xu, “MemNet: A persistent memory network for image restoration,” in ICCV, 2017.
 [24] X. Hu, G. Feng, S. Duan, and L. Liu, “A memristive multilayer cellular neural network with applications to image processing,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 8, pp. 1889–1901, 2017.
 [25] R. Dian, S. Li, A. Guo, and L. Fang, “Deep hyperspectral image sharpening,” IEEE Trans. Neural Netw. Learn. Syst., 2018.
 [26] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in ICCV, 2013.
 [27] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for singleimage rain removal,” IEEE Trans. Image Process., vol. 26, no. 6, pp. 2944–2956, 2017.
 [28] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in CVPR, 2017.
 [29] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017.
 [30] H. Zhang, V. Sindagi, and V. M. Patel, “Image deraining using a conditional generative adversarial network,” arXiv preprint arXiv:1701.05957, 2017.
 [31] H. Zhang and V.M. Patel, “Densityaware single image deraining using a multistream dense network,” in CVPR, 2018.
 [32] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” in ICLR, 2016.
 [33] P. Burt and E. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. 31, no. 4, pp. 532–540, 1983.
 [34] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1397–1409, 2013.
 [35] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in CVPR, 2017.
 [36] E. L. Denton, S. Chintala, and R. Fergus, “Deep generative image models using a Laplacian pyramid of adversarial networks,” in NIPS, 2015.
 [37] W. S. Lai, J. B. Huang, N. Ahuja, and M. H. Yang, “Deep Laplacian pyramid networks for fast and accurate super-resolution,” in CVPR, 2017.
 [38] X. Shen, Y. C. Chen, X. Tao, and J. Jia, “Convolutional neural pyramid for image processing,” arXiv preprint arXiv:1704.02071, 2017.
 [39] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imaging, vol. 3, no. 1, pp. 47–57, 2017.
 [40] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
 [41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in ICML, 2015.
 [42] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML, 2013.
 [43] M. Abadi, A. Agarwal, P. Barham, et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
 [44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2014.
 [45] S. Ren, K. He, R. Girshick, and J. Sun, “Faster RCNN: Towards realtime object detection with region proposal networks,” in NIPS, 2015.