1 Introduction
Rain streaks and raindrops often occlude or blur key information in images captured outdoors. Rain removal for images and videos is therefore a useful and necessary preprocessing step for outdoor vision systems. An effective rain removal technique can help downstream detection or recognition deliver more accurate results [17].
Current rain removal methods can be mainly divided into two categories: video rain removal (VRR) and single image rain removal (SIRR). Compared with VRR, which can exploit the temporal correlation among consecutive frames, SIRR is generally much more difficult, since far less prior knowledge can be extracted from a single image. Since it was first proposed by Kang et al. [17], the SIRR problem has attracted much attention. Recently, deep learning methods [9, 10, 29, 32, 31, 23, 12, 11, 8] have been empirically shown to achieve state-of-the-art performance for SIRR by training carefully designed networks to detect and remove rain streaks simultaneously.
Although they achieve good performance on this task, current deep learning approaches still have some methodological limitations. First, regarding training data: since it is hard to obtain clean/rainy image pairs from real rainy scenes, previous methods use synthesized data instead, mainly by adding "fake" rain streaks synthesized with Photoshop (see https://www.photoshopessentials.com/photoeffects/rain/) to clean images. Two samples of such synthesized rainy images are shown in Figure 1.(b) and (c) (the corresponding clean image is shown in Figure 1.(a)). Although they vary in rain streak direction and density, synthesized rainy images still cannot cover the much wider range of rain streak patterns found in real rainy images. For instance, in Figure 1.(d), the rain streaks in a single frame have multiple directions due to wind; in Figure 1.(e), the rain streaks form multiple layers because of their different distances to the camera; in Figure 1.(f), the rain streaks aggregate into an effect similar to fog or mist. There is therefore an obvious bias between synthetic training data and real test data in this task, so a network trained on synthetic data may not generalize well to real test data.
Meanwhile, one of the main problems of deep learning methods is that they generally require a sufficiently large number of supervised samples to train a derain network (ideally, for our task, natural images with and without real rain), which are usually time-consuming and cumbersome to collect. In contrast, a large number of unsupervised samples, i.e., real rainy images without their corresponding clean versions, can easily be obtained in practice. How to properly feed these cheap samples into network training is not only meaningful and necessary for the task investigated here, but possibly also essential for the next generation of deep learning to fully exploit unsupervised data in general image restoration tasks.
Because the training and test data are inconsistently distributed, this task can naturally be viewed as a typical domain adaptation problem. How to transfer from learning synthesized rain patterns (training, supervised) to learning real rain patterns (testing, unsupervised) is crucial. To alleviate this issue of previous supervised deep learning methods for SIRR, instead of manually collecting a more suitable supervised dataset (real rainy images and their corresponding clean ones), we propose a novel semi-supervised method that also feeds unsupervised real rainy images into network training, with the ultimate aim of transferring from the synthesized rain domain to the real rain domain. Unlike previous supervised deep learning methods that only use synthesized image pairs as network inputs, our method makes full use of unsupervised real rainy images during training in a mathematically sound manner. Specifically, our model allows both supervised synthetic data and unsupervised real data to be fed into the network simultaneously. The network parameters are optimized by combining two losses: the least-squares residuals between the network outputs of supervised inputs and their ground-truth labels, and, for unsupervised samples, the negative log-likelihood (NLL) of a specific parametrized rain distribution evaluated on the difference between the network outputs of unsupervised inputs and the original rainy images. In this manner, both supervised synthetic and unsupervised real samples are rationally employed for network training.
In summary, the main contributions of the proposed method are:

To our knowledge, this is the first work to address the domain adaptation issue in the SIRR task, and we are the first to propose a semi-supervised transfer learning framework for it. Unlike previous deep learning SIRR methods, our model can fully use unsupervised real rainy images, which are easy to collect in practice, without needing the corresponding clean ones. Such unsupervised samples not only reduce the time and labor costs of pre-collecting image pairs with/without real rain for updating network parameters, but also alleviate the overfitting of the deep network to the limited rain types covered by the supervised training samples, by complementing them with unsupervised samples that contain more general and practical rain characteristics.

We provide a general methodology for simultaneously utilizing supervised and unsupervised knowledge in image restoration tasks. For supervised samples, the traditional least-squares loss between network outputs and their clean images can be employed directly. For unsupervised samples, the residual between the expected clean output and the original noisy input can be rationally formulated through a likelihood term under a parametrized distribution designed from domain knowledge about the residual (e.g., rain in our study).

We design an Expectation Maximization (EM) algorithm combined with a gradient descent strategy to solve the proposed model. The rain distribution parameters and the network parameters are optimized in sequence in each epoch. Experiments on synthesized and, especially, real rainy images show that our model is capable of transferring from learning synthesized to real rain patterns, substantiating its superiority over state-of-the-art methods.
2 Related work
2.1 Single image rain removal methods
The SIRR problem was first proposed by Kang et al. [17], who detected rain in the high-frequency part of an image based on morphological component analysis and dictionary learning. Chen et al. [5] also operated on the high-frequency part of the rainy image, but employed a hybrid feature set, including histogram of oriented gradients, depth of field, and Eigen color, to distinguish the rain portions from the image and enhance the texture/edge information. After that, Luo et al. [26] introduced a screen blend model and used discriminative sparse coding for rain layer separation, solving the model with a greedy pursuit algorithm. Li et al. [24] incorporated patch-based Gaussian mixture models to encode prior information about the image background and the rain layer, and trained the model parameters on pre-collected clean and rainy images. Similarly, Zhang et al. [30] learned a set of generic sparsity-based and low-rank representation-based convolutional filters to represent background and rain streaks, respectively. Gu et al. [14] combined analysis sparse representation for image large-scale structures and synthesis sparse representation for image fine-scale textures, including the directional prior and the non-negativeness prior in their JCAS model. More recently, Zhu et al. [33] proposed a joint optimization process that alternates between removing rain-streak details from the background layer and removing non-streak details from the rain layer. Their model is aided by rain priors (narrow directions and self-similarity of rain patches) and a background prior (centralized sparse representation). Chang et al. [4] transformed a rainy image into a domain where the line pattern appearance has a distinctly low-rank structure, and proposed a model with compositional directional total variation and low-rank priors to handle rain streaks, treated as line pattern noise, and camera noise at the same time.

While these model-based methods are mathematically sound, most of them are slow at test time because they need to solve an optimization problem for each image. Deep learning has an advantage in test speed and has proven effective in many computer vision tasks [21, 20, 15], including SIRR. Fu et al. first introduced deep learning to this task in [9], training a convolutional neural network (CNN) with three hidden layers on the high-frequency domain of the image. Later, they further improved the CNN by introducing deeper hidden layers, batch normalization and a negative residual mapping structure, and achieved better results [10]. To better handle heavy rain images (where individual streaks are hardly visible and the image looks similar to mist or fog), Yang et al. [29] exploited a contextualized dilated network with a binary map, in which rain streak detection, estimation and removal are predicted in sequential order. Zhang et al. [32] applied the GAN mechanism and introduced a perceptual loss function for rain removal. Afterwards, they developed a density-aware multi-stream dense network for joint rain density estimation and deraining [31]. In summary, these methods learn from synthesized rain data and test the learned networks on real scenes.

2.2 Video rain removal methods
For completeness, we briefly list several representative state-of-the-art video rain removal methods. Since the extra inter-frame information is extremely helpful, these methods show relatively better reconstruction than SIRR methods. Early video deraining methods [13, 2, 3, 18] designed many useful techniques to detect potential rain streaks based on their physical characteristics and removed the detected rain with image restoration algorithms. In recent years, low-rankness [6, 27], total variation [16], stochastic distribution priors [28], convolutional sparse coding [22] and neural networks [25] have been applied to the task with satisfying results.
Since a real-world SIRR problem provides little information beyond the rainy image itself, designing an effective SIRR method is more challenging than designing a VRR one.
3 Semi-supervised model for SIRR
Figure 2 shows the framework of our model, including the training data (both supervised and unsupervised samples) and the network loss. As introduced above, our model feeds not only supervised synthesized rainy images but also unsupervised real rainy images into network training, in order to transfer from learning synthesized rain patterns to learning real rain patterns.
3.1 Model formulation
As shown in Figure 1.(d,e,f), real rain usually shows more complex patterns and appearances than synthesized rain. However, for technical reasons, the "labels" of such data (i.e., the corresponding clean images) are generally unavailable. Although we can hardly extract the rain layer, or the clean background, exactly from a real rainy image, we can instead design a parametrized distribution to approximate its stochastic configuration. Since rain generally has a multimodal structure, because streaks occur at different distances from the camera, we approximately model the rain as a Gaussian mixture model (GMM). That is,
$$R \sim \sum_{k=1}^{K} \pi_k\, \mathcal{N}(R \mid \mu_k, \Sigma_k), \tag{1}$$
where $\pi_k$, $\mu_k$ and $\Sigma_k$ denote the mixture coefficients, Gaussian means and variances, respectively. Given enough components and appropriately learned parameters, a mixture of Gaussians can approximate any continuous distribution arbitrarily well, so it is suitable for describing the rain streaks to be extracted from the input rainy image. The negative log-likelihood imposed on the unsupervised samples can thus be written as:
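$$\ell(\Pi, \Sigma; R) = -\sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(R_n \mid 0, \Sigma_k), \tag{2}$$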
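where $\Pi = \{\pi_1, \dots, \pi_K\}$, $\Sigma = \{\Sigma_1, \dots, \Sigma_K\}$, $R_n$ is the rain in the $n$-th unsupervised sample, $K$ is the number of mixture components, and $N$ is the number of samples. Note that the means of the Gaussian components are manually set to zero; this does not affect the results in our experiments.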
With this encoding, we can also construct an objective function for unsupervised rainy images, which can further be used to fine-tune the network parameters by back-propagating its gradients through the network layers.
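As a sketch of how Eq. (2) can be evaluated numerically, the snippet below computes the zero-mean GMM negative log-likelihood, assuming each pixel of the extracted rain layer is treated as one scalar sample (an assumption; the text does not fix the sample granularity). The function name `gmm_nll` is hypothetical, and the log-sum-exp stabilisation is a standard numerical choice rather than something stated in the paper.

```python
import numpy as np

def gmm_nll(residual, pi, var):
    """Negative log-likelihood of Eq. (2) under a zero-mean GMM.

    Assumption: every entry of `residual` is one scalar rain sample R_n.
    pi:  mixture coefficients, shape (K,), summing to 1
    var: per-component variances Sigma_k, shape (K,)
    """
    r = np.asarray(residual, dtype=np.float64).reshape(-1, 1)   # (N, 1)
    pi = np.asarray(pi, dtype=np.float64)[None, :]              # (1, K)
    var = np.asarray(var, dtype=np.float64)[None, :]            # (1, K)
    # log(pi_k * N(R_n | 0, var_k)) for every sample/component pair
    log_comp = np.log(pi) - 0.5 * np.log(2 * np.pi * var) - r**2 / (2 * var)
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_lik.sum()
```

With K = 3 components, as used in the implementation details, `pi` and `var` would each hold three entries.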
Meanwhile, we follow the network structure and negative residual mapping of [10] (a deep convolutional neural network) to formulate the loss function on supervised samples. The network, denoted by $f_w(\cdot)$ (where $w$ represents the network parameters), is supposed to remove the rain streaks of an input image and output a rain-free one. The classical CNN loss minimizes the least-squares error between the derained output $f_w(x)$ and the ground-truth label $y$, as shown in the upper panel of Figure 2. That is, the loss function imposed on the supervised samples takes the following least-squares form:
$$\mathcal{L}_{sup}(w) = \sum_{n=1}^{N_s} \big\| f_w(x_n) - y_n \big\|_F^2, \tag{3}$$
where $x_n$ ($n = 1, \dots, N_s$) represent the synthesized rainy samples and $y_n$ their clean ground-truth images.
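To make the supervised term concrete, the sketch below writes the derained output in residual form, f_w(x) = x - g(x), where `predict_rain` is a hypothetical stand-in for a network that predicts the rain layer, and then evaluates Eq. (3). It only illustrates the loss under that assumption; it is not the architecture of [10].

```python
import numpy as np

def derain_output(x, predict_rain):
    """Residual form: the derained image is the input minus the predicted rain.

    `predict_rain` is a hypothetical callable standing in for the network.
    """
    return x - predict_rain(x)

def supervised_loss(xs, ys, predict_rain):
    """Eq. (3): sum over pairs of the squared Frobenius norm ||f_w(x_n) - y_n||^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        diff = derain_output(x, predict_rain) - y
        total += float(np.sum(diff ** 2))
    return total
```

For example, with a 2x2 all-ones rainy patch, an all-zeros clean patch and a "network" that predicts half of the input as rain, the loss is 4 x 0.5^2 = 1.0.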
Moreover, since a GMM can fit almost any continuous distribution, we constrain it so that it fits the real rain samples in a meaningful way: the discrepancy between the synthesized and real rain domains should not be too large. We enforce this by minimizing, with a small controlling parameter, a Kullback-Leibler (KL) divergence between a Gaussian $\mathcal{N}_{syn}$ learned from the synthesized rain and the mixture of Gaussians $\mathrm{GMM}_{real}$ learned from the real rain during training, as shown in the middle-right of Figure 2. This indicates that our model is expected to transfer from synthesized rain to real rain, rather than to arbitrary domains. Since this KL divergence is not analytically tractable, we use the minimum of the KL divergences between $\mathcal{N}_{syn}$ and each component of $\mathrm{GMM}_{real}$ as a simple empirical substitute, which ensures that at least one GMM component learned from the real samples resembles rain. That is,
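$$\mathcal{L}_{KL}(\Sigma) = \min_{k} D_{KL}\big(\mathcal{N}_{syn} \,\big\|\, \mathcal{N}_k\big), \tag{4}$$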
where $\mathcal{N}_k = \mathcal{N}(0, \Sigma_k)$ indicates the $k$-th component of $\mathrm{GMM}_{real}$.
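For zero-mean univariate Gaussians the KL divergence has a closed form, KL(N(0, a) || N(0, b)) = 0.5 (a/b - 1 - log(a/b)), so the surrogate in Eq. (4) reduces to a minimum over K scalars. The sketch below assumes pixel-wise scalar variances and the KL direction from the synthetic Gaussian to each GMM component; both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def kl_zero_mean(var_p, var_q):
    """KL( N(0, var_p) || N(0, var_q) ) for univariate zero-mean Gaussians."""
    ratio = var_p / var_q
    return 0.5 * (ratio - 1.0 - np.log(ratio))

def min_component_kl(var_syn, gmm_var):
    """Surrogate of Eq. (4): smallest KL from the synthetic-rain Gaussian
    to any single component of the real-rain GMM (assumed direction)."""
    return min(kl_zero_mean(var_syn, v) for v in gmm_var)
```

The minimum is zero as soon as one component's variance equals the synthetic-rain variance, which matches the stated intent that at least one component should resemble rain.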
To further remove potential residual rain streaks in the output image, we add a total variation (TV) regularizer that slightly smooths the image. Note that, together with the likelihood term on rain described above, this forms a complete MAP model (likelihood + regularizer) on the to-be-estimated network outputs of the unsupervised real rainy images. It gives gradient descent a reasonable direction for training the network on these unsupervised data even without explicit guidance from corresponding clean images.
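The text does not say which TV variant is used. The sketch below assumes the common anisotropic form (sum of absolute horizontal and vertical neighbour differences) applied to a network output array of shape H x W or H x W x C; `total_variation` is a hypothetical helper name.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV: sum of absolute vertical and horizontal differences.

    Works on H x W or H x W x C arrays (axis 0 = height, axis 1 = width).
    """
    img = np.asarray(img, dtype=np.float64)
    dv = np.abs(np.diff(img, axis=0)).sum()  # differences between rows
    dh = np.abs(np.diff(img, axis=1)).sum()  # differences between columns
    return dv + dh
```

A constant image has zero TV, so the term only penalizes local intensity changes such as leftover streak edges (and, if weighted too heavily, genuine texture, which is why a small weight is used).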
By combining Eqs. (2), (3), (4) and the TV term, the entire objective function for training the network is formulated as:
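$$\min_{w,\,\Pi,\,\Sigma}\; \sum_{n=1}^{N_s}\big\|f_w(x_n)-y_n\big\|_F^2 + \alpha\sum_{n=1}^{N}\mathrm{TV}\big(f_w(\tilde{x}_n)\big) + \beta \min_k D_{KL}\big(\mathcal{N}_{syn}\,\big\|\,\mathcal{N}_k\big) - \lambda\sum_{n=1}^{N}\log\sum_{k=1}^{K}\pi_k\,\mathcal{N}\big(\tilde{x}_n-f_w(\tilde{x}_n)\,\big|\,0,\Sigma_k\big) \tag{5}$$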
where $(x_n, y_n)$ are the corresponding rainy input and ground-truth label pairs of the synthesized supervised data, and $\tilde{x}_n$ is a rainy input of the real unsupervised data, which has no ground-truth label. Through the last term of Eq. (5), the unsupervised data are fed into the same network $f_w$ as the supervised data, and the term $\tilde{x}_n - f_w(\tilde{x}_n)$ is the rain extracted from the input rainy image, which plays the role of $R_n$ in Eq. (2). $\alpha$, $\beta$ and $\lambda$ are trade-off parameters weighting the TV, KL and likelihood terms, respectively. Note that when $\alpha$ and $\lambda$ equal 0, our model degenerates to the conventional supervised deep learning model [10].
With this objective, the network can be trained not only on well-annotated supervised data but also on purely unsupervised inputs, by fully encoding prior information about the rain streak distribution. Compared with traditional deep learning techniques trained only on supervised samples, the network is expected to generalize better, since the objective facilitates a rational transfer from the supervised samples to unsupervised types of rain.
3.2 The EM algorithm
Since the GMM likelihood in Eq. (5) is hard to optimize directly, we use the Expectation Maximization (EM) algorithm [7] to solve the model iteratively. In the E step, the posterior distribution representing the responsibility of each mixture component is calculated. In the M step, the mixture distribution parameters and the CNN parameters are updated.
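We introduce a latent variable $z_n \in \{0,1\}^K$ with $z_{nk} \in \{0,1\}$ and $\sum_{k=1}^{K} z_{nk} = 1$, indicating the assignment of the noise term $R_n$ to a certain component of the mixture model. According to Bayes' theorem, the posterior responsibility of component $k$ for generating the noise $R_n$ is

$$\gamma_{nk} = \frac{\pi_k\,\mathcal{N}(R_n \mid 0, \Sigma_k)}{\sum_{l=1}^{K}\pi_l\,\mathcal{N}(R_n \mid 0, \Sigma_l)}. \tag{6}$$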
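The E step in Eq. (6) can be sketched as follows, again assuming scalar pixel residuals and zero-mean components. The function name `e_step` is hypothetical; normalising in log space is a standard numerical precaution, not a detail from the paper.

```python
import numpy as np

def e_step(residual, pi, var):
    """Eq. (6): responsibility gamma[n, k] of component k for sample R_n."""
    r = np.asarray(residual, dtype=np.float64).reshape(-1, 1)   # (N, 1)
    pi = np.asarray(pi, dtype=np.float64)[None, :]              # (1, K)
    var = np.asarray(var, dtype=np.float64)[None, :]            # (1, K)
    # log of the numerator pi_k * N(R_n | 0, var_k)
    log_num = np.log(pi) - 0.5 * np.log(2 * np.pi * var) - r**2 / (2 * var)
    log_num -= log_num.max(axis=1, keepdims=True)    # stabilise before exp
    gamma = np.exp(log_num)
    return gamma / gamma.sum(axis=1, keepdims=True)  # normalise over components
```

Each row of the returned (N, K) array sums to one; a residual near zero is mostly attributed to a narrow (small-variance) component.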
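After the E step, the objective in Eq. (5) unfolds into a function that is differentiable with respect to the GMM parameters:

$$\min_{w,\,\Pi,\,\Sigma}\; \sum_{n=1}^{N_s}\big\|f_w(x_n)-y_n\big\|_F^2 + \alpha\sum_{n=1}^{N}\mathrm{TV}\big(f_w(\tilde{x}_n)\big) + \beta \min_k D_{KL}\big(\mathcal{N}_{syn}\,\big\|\,\mathcal{N}_k\big) - \lambda\sum_{n=1}^{N}\sum_{k=1}^{K}\gamma_{nk}\Big[\log\pi_k + \log\mathcal{N}\big(\tilde{x}_n-f_w(\tilde{x}_n)\,\big|\,0,\Sigma_k\big)\Big] \tag{7}$$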
The closed-form solutions for the mixture coefficients and Gaussian covariances are [7]:
$$\pi_k = \frac{1}{N}\sum_{n=1}^{N}\gamma_{nk}, \tag{8}$$
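$$\Sigma_k = \frac{\sum_{n=1}^{N}\gamma_{nk}\,R_n R_n^{\top}}{\sum_{n=1}^{N}\gamma_{nk}}, \qquad R_n = \tilde{x}_n - f_w(\tilde{x}_n). \tag{9}$$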
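A sketch of the M step in Eqs. (8) and (9) for scalar pixel residuals, where R_n R_n^T reduces to R_n^2 because the means are fixed at zero. It takes the responsibilities `gamma` from the E step as input. The name `m_step` and the `eps` guards are hypothetical numerical safeguards, not part of the described method.

```python
import numpy as np

def m_step(residual, gamma, eps=1e-8):
    """Eqs. (8)-(9) for a zero-mean GMM over scalar residuals R_n."""
    r = np.asarray(residual, dtype=np.float64).reshape(-1, 1)   # (N, 1)
    gamma = np.asarray(gamma, dtype=np.float64)                 # (N, K)
    nk = gamma.sum(axis=0)                                      # effective count per component
    pi = nk / r.shape[0]                                        # Eq. (8)
    # Eq. (9) with zero means: responsibility-weighted average of R_n^2
    var = (gamma * r**2).sum(axis=0) / np.maximum(nk, eps)
    return pi, var + eps   # eps: hypothetical floor against zero variance
```

For instance, if the residuals 1 and -1 are fully assigned to one component and 2 and -2 to another, the update gives equal weights 0.5 and variances of about 1 and 4.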
We can then employ gradient methods to optimize the objective in Eq. (7), and the resulting gradient can easily be back-propagated to the network to gradually improve its parameters $w$. We use Adam [19], an off-the-shelf first-order gradient-based optimizer, to train the network on objective (7) over both synthesized supervised and real unsupervised training samples.
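Once the responsibilities are fixed by the E step, the unsupervised likelihood term of Eq. (7) is quadratic in each residual, so its gradient with respect to the network output is -R_n * sum_k gamma_nk / Sigma_k (before scaling by lambda), since dR_n/df = -1. The sketch below assumes scalar pixel residuals and uses hypothetical function names. In practice, an autodiff framework such as TensorFlow would produce this gradient automatically. The usage check compares the closed form against central finite differences.

```python
import numpy as np

def weighted_nll(residual, gamma, pi, var):
    """Unsupervised part of Eq. (7), without lambda; gamma held fixed."""
    r = np.asarray(residual, dtype=np.float64).reshape(-1, 1)   # (N, 1)
    gamma = np.asarray(gamma, dtype=np.float64)                 # (N, K)
    pi = np.asarray(pi, dtype=np.float64)[None, :]              # (1, K)
    var = np.asarray(var, dtype=np.float64)[None, :]            # (1, K)
    log_n = -0.5 * np.log(2 * np.pi * var) - r**2 / (2 * var)   # log N(R_n | 0, var_k)
    return -(gamma * (np.log(pi) + log_n)).sum()

def grad_wrt_output(derained, rainy, gamma, var):
    """Gradient of weighted_nll w.r.t. the network output f_w(x~).

    R = x~ - f_w(x~), so dR/df = -1 and dT/df = -R * sum_k gamma_k / var_k.
    """
    derained = np.asarray(derained, dtype=np.float64)
    r = (np.asarray(rainy, dtype=np.float64) - derained).reshape(-1, 1)
    gamma = np.asarray(gamma, dtype=np.float64)
    var = np.asarray(var, dtype=np.float64)[None, :]
    grad_r = (gamma * r / var).sum(axis=1)     # dT/dR for each pixel
    return (-grad_r).reshape(derained.shape)

# Usage: finite-difference check on a toy 3-pixel example.
rainy = np.array([0.3, -0.2, 0.5])
out = np.array([0.1, 0.0, 0.2])
gam = np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])
pis, variances = np.array([0.4, 0.6]), np.array([0.05, 0.5])
analytic = grad_wrt_output(out, rainy, gam, variances)
```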
3.3 Discussions on domain transfer learning
The main difference between the proposed method and other supervised deep learning SIRR methods is the use of real-world rainy images whose ground truths (rain-free images or rain layers) are unavailable during training. A main motivation for this is that manually synthesized rain usually differs from real rain collected in practice. In several deep learning SIRR methods [9, 10, 29, 31], clean images are used to synthesize rainy images with Photoshop. Although each clean image is used to synthesize several different types of rainy images, as shown in the upper panel of Figure 1, differences in the scale, illuminance and camera distance of real rain streaks, as well as the fog or mist that usually accompanies them, are hardly considered, leaving a non-negligible gap between the synthesized rainy images used for training and the real rainy images used for testing.
In our method, the use of unsupervised real rainy data alleviates this problem. As shown in Figure 3, we use the same synthesized rainy data as [10] as the supervised training data. To empirically show the domain transfer capability of our model, we synthesize rainy images in a different way, introduced in [28], and split them into an unsupervised input set and a validation set. The supervised training rain and the validation rain therefore lie in distinct domains. We found that our model is better able to overcome this gap and transfer from the training data domain to the validation data domain. Although our semi-supervised model fits the training data somewhat less closely when the unsupervised term in Eq. (5) plays a more important role (Column 1 of Figure 3, where the green and blue lines are our semi-supervised model with different unsupervised term parameters), Column 2 of Figure 3 shows that our model performs better on the target domain (solid lines represent the supervised data domain and dotted lines the target domain). Moreover, as the training dataset grows, the baseline supervised CNN (red line in Figure 3) increasingly fits the specific patterns of the training data (i.e., its performance on training data improves) and therefore generalizes less well to the validation data when the two lie in separate domains (i.e., its performance on test data does not improve correspondingly and may even slightly worsen), as shown in Column 3 of Figure 3. In contrast, the unsupervised term in our loss function effectively alleviates this issue, as shown in Columns 4 and 5 of Figure 3, which is critical for real rain removal.
4 Experimental results
In this section, we evaluate our method on both synthesized and real-world rainy data. The compared methods include discriminative sparse coding (DSC) [26], layer priors (LP) [24], the CNN method [10], joint bi-layer optimization (JBO) [33], multi-task deep learning (JORDER) [29] and the multi-stream dense network (DIDMDN) [31]. They cover both conventional unsupervised model-driven methods and more recent supervised data-driven deep learning methods. Our method can, to some extent, be viewed as an intrinsic combination of the two methodologies.
4.1 Implementation details
For supervised training data, we use one million 64×64 synthesized rainy/clean image patch pairs, the same as those used by the baseline CNN method [10]. For unsupervised training data, we collect real-world rainy images from the datasets provided by [29, 28, 32] and from Google image search, and randomly crop one million 64×64 patches from these images to constitute the unsupervised samples. The batch size is 20. The initial learning rate is , decaying by a factor of 0.1 after every 5 epochs, and we train for 15 epochs in total. Training is implemented in TensorFlow [1]. We set the number of GMM components to 3. The trade-off parameter λ of the likelihood term is simply set to 0.5 throughout all our experiments. The parameter α, which controls the TV smoothing term, is set to a small value . The parameter β, which controls the KL divergence term, is set to be . The network structure and related parameters are directly inherited from the baseline method [10].
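The schedule described above (decay by a factor of 0.1 every 5 epochs over 15 epochs) and the 64×64 random patch cropping can be sketched as follows. The initial learning rate is left as a parameter, epochs are counted from 0, and both function names are hypothetical illustrations rather than the authors' code.

```python
import numpy as np

def step_lr(initial_lr, epoch, drop=0.1, every=5):
    """Step decay: multiply the rate by `drop` after every `every` epochs."""
    return initial_lr * drop ** (epoch // every)

def random_patches(image, num, size=64, rng=None):
    """Randomly crop `num` size x size patches from an H x W (x C) image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    ys = rng.integers(0, h - size + 1, num)   # top-left row indices
    xs = rng.integers(0, w - size + 1, num)   # top-left column indices
    return np.stack([image[y:y + size, x:x + size] for y, x in zip(ys, xs)])
```

Under this convention, epochs 0 to 4 use the initial rate, epochs 5 to 9 use one tenth of it, and epochs 10 to 14 use one hundredth.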
4.2 Experiments on synthetic images
Table 1: PSNR (dB) on the synthetic test images with dense and sparse rain streaks.

Dataset  Input  DSC [26]  LP [24]  JORDER [29]  CNN [10]  JBO [33]  DIDMDN [31]  Ours
Dense    17.95  19.00     19.27    18.75        19.90     18.87     18.60        21.60
Sparse   24.14  25.05     25.67    24.22        26.88     25.24     25.66        26.98
In this subsection, we evaluate the rain removal effect of our method on synthetic data in terms of both visual quality and a quantitative metric. We use the rain synthesis approach of [28] to generate the rainy test images. Considering the complexity and diversity of rain streaks, we compare our method with others under two scenarios, sparse rain streaks and dense rain streaks, using ten test images in each scenario. Figure 4 shows an example of synthetic data with sparse rain streaks. The added streaks are sparse but have multiple lengths and layers, reflecting different distances to the camera. As shown in Figure 4, DSC [26] and JBO [33] fail to remove the main component of the rain streaks. LP [24] tends to blur the image and over-smooth texture and edge information. The two deep learning methods, CNN [10] and JORDER [29], remove rain better, but rain streaks still clearly remain in their results. Comparatively, our method removes the sparse rain streaks better while keeping the background information.
We also design experiments for the dense rain streak scenario. In the real world, when rain is heavy, dense rain streaks aggregate and blur the image, similar to fog or mist. In Figure 5, the added rain is heavy, with not only long rain streaks but also a blurring effect that damages the visual quality of the image. As shown in Figure 5, the results of DSC [26], JORDER [29] and JBO [33] still contain obvious rain streaks, while LP [24] again over-smooths the image. Compared with the baseline CNN method [10], our method gives better restoration results.
Since the ground truth is known in the synthetic experiments, we use the most widely used metric, Peak Signal-to-Noise Ratio (PSNR), for quantitative evaluation. As Table 1 shows, our method attains the best PSNR on both groups of data, in agreement with the visual results in Figures 4 and 5.
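For reference, PSNR is 10 log10(peak^2 / MSE). The sketch below assumes images stored as arrays with a known peak value (255 for 8-bit images). The text does not state whether the reported numbers were computed on RGB or on the luminance channel, so this is an illustration only, and `psnr` is a hypothetical helper name.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For images scaled to [0, 1] (peak = 1.0), a uniform error of 0.1 gives MSE = 0.01 and therefore a PSNR of 20 dB.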
4.3 Experiments on real images
The most direct way to evaluate an SIRR method is to inspect its restoration results on real-world rainy images. We select test images from Google image search and, to better represent the diversity of real rain scenarios, intentionally choose images with different types of rain streaks, as shown in Figure 6.
To confirm the necessity of transfer learning for this task, Figure 7 lists the complete set of synthesized rain types [9] in our supervised training data. The bias between the rain in Figures 6 and 7 is obvious, so these results substantiate the transfer ability of our model. The derained images verify that, compared with the competing methods, our method removes more rain streaks while better preserving the image structure and visual quality.
5 Conclusion
In this paper, we have attempted to solve the SIRR problem in a semi-supervised transfer learning manner, training a CNN on both synthesized supervised and real unsupervised rainy images. In this way, our method alleviates two issues of conventional deep learning methods designed for this task: the difficulty of collecting training samples and overfitting to those samples. Experiments on synthesized and real images substantiate the effectiveness of the proposed method.
We admit that our model cannot handle every rainy image, as some can be extremely complicated. Involving more elaborate priors on the rain and background layers in network training could be a future direction for further improving performance on this task. This semi-supervised transfer learning methodology could also be applied to other inverse problems. We hope to incorporate human prior knowledge into the learning process of neural networks, more fully realizing the combination of data-driven and model-based methods. The ultimate goal is to take advantage of both supervised data-driven deep learning methods, which shorten test time to meet online requirements, and model-based methods, which steer network training in a more explainable direction.
Acknowledgments
This research was supported by the National Key R&D Program of China (2018YFB1004300), China NSFC projects (61661166011, 11690011, 61603292, 61721002, U1811461), National Science Foundation grants IIS1619078 and IIS1815561, and the Army Research Office ARO W911NF1610138.
References

 [1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
 [2] Peter C Barnum, Srinivasa Narasimhan, and Takeo Kanade. Analysis of rain and snow in frequency space. International Journal of Computer Vision, 86(23):256, 2010.
 [3] Jérémie Bossu, Nicolas Hautière, and Jean-Philippe Tarel. Rain or snow detection in image sequences through use of a histogram of orientation of streaks. International Journal of Computer Vision, 93(3):348–367, 2011.

 [4] Yi Chang, Luxin Yan, and Sheng Zhong. Transformed low-rank model for line pattern noise removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1726–1734, 2017.
 [5] Duan-Yu Chen, Chien-Cheng Chen, and Li-Wei Kang. Visual depth guided color image rain streaks removal using sparse coding. IEEE Transactions on Circuits and Systems for Video Technology, 24(8):1430–1455, 2014.
 [6] Yi-Lei Chen and Chiou-Ting Hsu. A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1968–1975. IEEE, 2013.
 [7] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (methodological), pages 1–38, 1977.
 [8] Zhiwen Fan, Huafeng Wu, Xueyang Fu, Yue Huang, and Xinghao Ding. Residual-guide network for single image deraining. In 2018 ACM Multimedia Conference on Multimedia Conference, pages 1751–1759. ACM, 2018.
 [9] Xueyang Fu, Jia-Bin Huang, Xinghao Ding, Yinghao Liao, and John Paisley. Clearing the skies: A deep network architecture for single-image rain removal. IEEE Transactions on Image Processing, 26(6):2944–2956, 2017.
 [10] Xueyang Fu, Jia-Bin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. Removing rain from single images via a deep detail network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
 [11] Xueyang Fu, Borong Liang, Yue Huang, Xinghao Ding, and John Paisley. Lightweight pyramid networks for image deraining. arXiv preprint arXiv:1805.06173, 2018.
 [12] Xueyang Fu, Qi Qi, Yue Huang, Xinghao Ding, Feng Wu, and John Paisley. A deep tree-structured fusion model for single image deraining. arXiv preprint arXiv:1811.08632, 2018.
 [13] Kshitiz Garg and Shree K Nayar. Detection and removal of rain from videos. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. IEEE, 2004.
 [14] Shuhang Gu, Deyu Meng, Wangmeng Zuo, and Lei Zhang. Joint convolutional analysis and synthesis sparse representation for single image layer separation. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 1717–1725. IEEE, 2017.
 [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

 [16] Tai-Xiang Jiang, Ting-Zhu Huang, Xi-Le Zhao, Liang-Jian Deng, and Yao Wang. A novel tensor-based video rain streaks removal approach via utilizing discriminatively intrinsic priors. In Proceedings of the Conference on Computer Vision and Pattern Recognition, 2017.
 [17] Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing, 21(4):1742–1755, 2012.
 [18] Jin-Hwan Kim, Jae-Young Sim, and Chang-Su Kim. Video deraining and desnowing using temporal correlation and low-rank matrix completion. IEEE Transactions on Image Processing, 24(9):2658–2670, 2015.
 [19] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
 [21] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [22] Minghan Li, Qi Xie, Qian Zhao, Wei Wei, Shuhang Gu, Jing Tao, and Deyu Meng. Video rain streak removal by multiscale convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6644–6653, 2018.
 [23] Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In European Conference on Computer Vision, pages 262–277. Springer, 2018.
 [24] Yu Li, Robby T Tan, Xiaojie Guo, Jiangbo Lu, and Michael S Brown. Rain streak removal using layer priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2736–2744, 2016.
 [25] Jiaying Liu, Wenhan Yang, Shuai Yang, and Zongming Guo. D3rnet: Dynamic routing residue recurrent network for video rain removal. IEEE Transactions on Image Processing, 2018.
 [26] Yu Luo, Yong Xu, and Hui Ji. Removing rain from a single image via discriminative sparse coding. In Proceedings of the IEEE International Conference on Computer Vision, pages 3397–3405, 2015.
 [27] Weihong Ren, Jiandong Tian, Zhi Han, Antoni Chan, and Yandong Tang. Video desnowing and deraining based on matrix decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4210–4219, 2017.
 [28] W. Wei, L. Yi, Q. Xie, Q. Zhao, D. Meng, and Z. Xu. Should we encode rain streaks in video as deterministic or stochastic? In 2017 IEEE International Conference on Computer Vision (ICCV), volume 00, pages 2535–2544, Oct. 2018.
 [29] Wenhan Yang, Robby T Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1357–1366, 2017.
 [30] He Zhang and Vishal M Patel. Convolutional sparse and low-rank coding-based rain streak removal. In Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on, pages 1259–1267. IEEE, 2017.
 [31] He Zhang and Vishal M Patel. Density-aware single image deraining using a multi-stream dense network. In CVPR, 2018.
 [32] He Zhang, Vishwanath Sindagi, and Vishal M Patel. Image deraining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957, 2017.
 [33] Lei Zhu, Chi-Wing Fu, Dani Lischinski, and Pheng-Ann Heng. Joint bi-layer optimization for single-image rain streak removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2526–2534, 2017.