A Model-driven Deep Neural Network for Single Image Rain Removal

by   Hong Wang, et al.
Xi'an Jiaotong University

Deep learning (DL) methods have achieved state-of-the-art performance in the task of single image rain removal. Most current DL architectures, however, still lack sufficient interpretability and are not fully integrated with the physical structures inside general rain streaks. To address this issue, we propose a model-driven deep neural network for the task, with fully interpretable network structures. Specifically, based on the convolutional dictionary learning mechanism for representing rain, we propose a novel single image deraining model and utilize the proximal gradient descent technique to design an iterative algorithm containing only simple operators for solving the model. Such a simple implementation scheme allows us to unfold it into a new deep network architecture, called rain convolutional dictionary network (RCDNet), with almost every network module corresponding one-to-one to an operation involved in the algorithm. By training the proposed RCDNet end-to-end, all the rain kernels and proximal operators can be automatically extracted, faithfully characterizing the features of both rain and clean background layers, which naturally leads to better deraining performance, especially in real scenarios. Comprehensive experiments substantiate the superiority of the proposed network, especially its good generality to diverse testing scenarios and the good interpretability of all its modules, as compared with state-of-the-art methods both visually and quantitatively. The source code is available at <https://github.com/hongwang01/RCDNet>.


1 Introduction

Images taken under various rain conditions often suffer from unfavorable visibility, which severely affects the performance of outdoor computer vision tasks, such as object tracking [5], video surveillance [37], and pedestrian detection [31]. Hence, removing rain streaks from rainy images is an important pre-processing task and has drawn much research attention in recent years [39, 26].

In the past years, various methods have been proposed for the single image rain removal task. Many researchers focused on exploring the physical properties of the rain layer and background layer, and introduced various prior structures to regularize and separate them. Along this research line, the representative methods include layer priors with Gaussian mixture model (GMM) [28], discriminative sparse coding (DSC) [51], and joint convolutional analysis and synthesis sparse representation (JCAS) [13]. In particular, inspired by the fact that rain streaks repeatedly appear at different locations over a rainy image with similar local patterns like shape, thickness, and direction, researchers very recently represented this configuration of the rain layer by the convolutional dictionary learning model [15, 16]. Such a representation finely delivers this prior knowledge by imposing rain kernels (conveying repetitive local patterns) on sparse rain maps, as intuitively depicted in Fig. 1 (a). These methods thus achieved state-of-the-art (SOTA) performance when the background can also be well represented, e.g., by a low-rank prior in surveillance video sequences [25].

Albeit effective in certain applications, the rationality of these techniques depends on the subjective prior assumptions imposed on the unknown background and rain layers to be recovered. In real scenarios, however, such learning regimes cannot always adapt to different rainy images with complex, diverse, and variant structures collected from different sources. Besides, these methods generally need time-consuming iterative computations, which often causes efficiency issues in real applications.

Driven by the significant success of deep learning (DL) in low level vision, recent years have also witnessed the rapid progress of deep convolutional neural networks (CNN) for single image rain removal [8, 52, 53, 40]. The current DL-based derainers mainly focus on designing network modules, and then train the network parameters on abundant rainy/clean image pairs to extract the background layer. Typical deraining network structures include deep detail network (DDN) [9], recurrent squeeze-and-excitation context aggregation module (RESCAN) [27], progressive image deraining network (PReNet) [35], spatial attentive unit (SPANet) [41], and many others.

These DL strategies, however, also possess evident deficiencies. The most significant one is their weak interpretability. Network structures are often complicated and diverse, making it difficult to analyze the role of different modules and to understand the insights underlying their mechanism. Besides, most of them treat the CNN as an encapsulated end-to-end mapping module without examining its rationality, and neglect the intrinsic prior knowledge of rain streaks such as sparsity and nonlocal similarity. This makes such methods prone to overfitting the training samples.

To alleviate the aforementioned issues, this paper designs an interpretable deep network, which sufficiently considers the characteristics of rain streaks and attempts to combine the advantages of the conventional model-driven prior-based and current data-driven DL-based methodologies. Specifically, our contributions are mainly three-fold:

Firstly, we propose a concise rain convolutional dictionary (RCD) model for a single image by exploiting the intrinsic convolutional dictionary learning mechanism to encode rain shapes, and specifically adopt the proximal gradient technique [2] to design an optimization algorithm for solving it. Different from traditional solvers for the RCD model, which contain complex operations (e.g., Fourier transformation), the algorithm only contains simple computations (see Fig. 1 (b)) that are easy to implement with general network modules. This allows our algorithm to be easily unfolded into a deep network architecture.

Secondly, by unfolding the algorithm, we design a new deep network architecture for image deraining, called RCDNet. The distinguishing feature of this network is the exact step-by-step correspondence between its modules and the algorithm operators, so that every module inherits the interpretability of the corresponding algorithm step. Specifically, as shown in Fig. 1 (b) and (c), each iteration of the algorithm contains two sub-steps, respectively updating the rain map (convolved with the learned rain kernels) and the background layer, and each stage of the RCDNet correspondingly contains two sub-networks (M-net and B-net). Each output of an intermediate layer in the network thus has a clear interpretation, which greatly facilitates a deeper analysis of what happens inside the network during training, and a comprehensive understanding of why the network works or not (see the analysis in Sec. 5.2).

Thirdly, comprehensive experimental results substantiate the superiority of the RCDNet over SOTA conventional prior-based and current DL-based methods both quantitatively and visually. In particular, owing to its good interpretability, general users can intuitively understand the underlying rationality and insights of the network by visualizing the amelioration process (like the gradually rectified background and rain maps) over all network layers. Moreover, the network yields generally useful rain kernels for expressing rain shapes, and proximal operators for delivering the prior knowledge of background and rain maps for a rainy image, which facilitates their use on more real-world rainy images.

The paper is organized as follows. Sec. 2 reviews the related rain removal work. Sec. 3 presents the RCD model for rain removal as well as the algorithm designed for solving it. Then Sec. 4 introduces the unfolding deep network for the algorithm. The experimental results are demonstrated in Section 5 and the paper is finally concluded.

2 Related work

In this section, we give a brief review on the most related work on rain removal for images. Depending on the input data, the existing algorithms can be categorized into two groups: video based and single image based ones.

2.1 Video deraining methods

Garg and Nayar [10] first analyzed the visual effects of raindrops on imaging systems, and utilized a space-time correlation model to capture the dynamics of raindrops and a physics-based motion blur model to illustrate the photometry of rain. For better visual quality, they further proposed to increase the exposure time or reduce the depth of field of a camera [12, 11]. Later, both temporal and chromatic properties of rain were considered, and the background layer was extracted from rainy video by utilizing different strategies such as K-means clustering, Kalman filter [33], and GMM [3]. Besides, a spatio-temporal frequency based raindrop detection method was provided in [1].

In recent years, researchers introduced more intrinsic characteristics of rainy video to the task, e.g., similarity and repeatability of rain streaks [4], low-rankness among multi-frames [20], and sparsity and smoothness of rain streaks [18]. To handle heavy rain and dynamic scenes, a matrix decomposition based video deraining algorithm was presented in [36]. Afterwards, rain streaks were encoded as a patch based GMM to adapt to a wider range of rain variations [45]. More characteristics of rain streaks in a rainy video were then explored, including repetitive local patterns and multi-scale configurations, and they were described by a multiscale convolutional sparse coding model [25]. More recently, some DL-based methods have been proposed for this task. Chen et al. [19] presented a CNN architecture and utilized superpixels to handle torrential rainfall with opaque streak occlusions. To further improve visual quality, Liu et al. [30] designed a joint recurrent rain removal and reconstruction network that integrated rain degradation classification, rain removal, and background detail reconstruction. To handle dynamic video contexts, they further developed a dynamic routing residue recurrent network [29]. Though these methods work well for videos, they cannot be directly applied to single images due to the lack of temporal knowledge.

2.2 Single image deraining methods

Compared with the video deraining task, which operates on a sequence of images, rain removal from a single image is much more challenging. The early attempts utilized model-driven strategies that decompose a single rainy image into a low frequency part (LFP) and a high frequency part (HFP), and then extract the rain layer from the HFP with various processing techniques such as guided filtering [6, 21] and nonlocal means filtering [23]. Later, researchers focused more on exploring the prior knowledge of the rain and rain-free layers of a rainy image, and on designing proper regularizers to extract and separate them [22, 38, 51, 28, 42, 56]. E.g., [13] considered the specific sparsity characteristics of rain-free and rain parts and expressed them as joint analysis and synthesis sparse representation models, respectively. [15] used a similar manner to deliver local repetitive patterns of rain streaks across the image as the RCD model. Albeit achieving good performance in certain scenarios, these prior-based methods rely on subjective prior assumptions, and cannot always work well for the complicated and highly diverse rain shapes in real rainy images collected from different sources.

Recently, a number of DL-based single image rain streak removal methods were proposed through constructing diverse network modules [8, 9, 27, 52, 53]. To handle heavy rain, Yang et al. [49] developed a multi-stage joint rain detection and estimation network for single images (JORDER_E). Very recently, Ren et al. [35] designed PReNet, which repeatedly unfolds several Resblocks and an LSTM layer. Wang et al. [41] presented an attention unit based SPANet for removing rain in a local-to-global manner. By using abundant rainy/clean image pairs to train the deep model, these methods achieve favorable visual quality and SOTA quantitative measures of derained results. Most of these methods, however, just utilize network modules assembled from off-the-shelf components in current DL toolkits to directly learn the background layer in an end-to-end way, and largely ignore the intrinsic prior structures inside the rain streaks. As a result, their network architectures lack evident interpretability and still leave room for further performance enhancement.

At present, there is a new type of single image derainer that tries to combine prior and DL methodologies. For example, Mu et al. [32] utilized a CNN to implicitly learn prior knowledge for background and rain streaks, and formulated them into traditional bi-layer optimization iterations. Wei et al. [44] provided a semi-supervised rain removal method (SIRR) that described the rain layer prior as a general GMM and jointly trained the DDN backbone. Albeit obtaining initial success, these methods still use CNN architectures as the main modules of the network, and thus still lack sufficient interpretability.

3 RCD model for single image deraining

3.1 Model formulation

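For an observed color rainy image denoted as $\mathcal{O} \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ are the height and width of the image, respectively, it can be rationally separated as:

$$\mathcal{O} = \mathcal{B} + \mathcal{R}, \tag{1}$$

where $\mathcal{B}$ and $\mathcal{R}$ represent the background and rain layers of the image, respectively. Then, the aim of most DL-based deraining methods is to estimate the mapping function (expressed by a deep network) from $\mathcal{O}$ to $\mathcal{B}$ (or $\mathcal{R}$).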

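Instead of heuristically constructing a complex deep network architecture, we first consider the problem under the conventional prior-based methodology through exploiting the prior knowledge for representing rain streaks [13, 15, 25]. Specifically, as shown in Fig. 1 (a), by adopting the RCD mechanism, the rain layer can be modeled as:

$$\mathcal{R}^c = \sum_{n=1}^{N} \mathcal{C}_n^c \otimes M_n, \quad c = 1, 2, 3, \tag{2}$$

where $\mathcal{R}^c$ denotes the $c$-th color channel of $\mathcal{R}$, $\{\mathcal{C}_n^c\}_{n,c} \subset \mathbb{R}^{k \times k}$ is a set of rain kernels which describes the repetitive local patterns of rain streaks, and $\{M_n\}_n \subset \mathbb{R}^{H \times W}$ represents the corresponding rain maps, which mark the locations where the local patterns repeatedly appear. $N$ is the number of kernels and $\otimes$ is the 2-dimensional (2D) convolutional operation. For conciseness, we rewrite (2) as $\mathcal{R} = \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n$ throughout the paper, where $\mathcal{C}_n \in \mathbb{R}^{k \times k \times 3}$ is the tensor form of the $\mathcal{C}_n^c$s and the convolution is performed between $\mathcal{C}_n$ and the matrix $M_n$ channel by channel. Then, we can rewrite the model (1) as:

$$\mathcal{O} = \mathcal{B} + \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n. \tag{3}$$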
It should be noted that the rain kernels can actually be viewed as a convolutional dictionary [16] for representing repetitive and similar local patterns underlying rain streaks, and a small number of rain kernels can finely represent a wide range of rain shapes (footnote 1: the number of kernels $N$ is fixed to a single value for all our experiments). They are common knowledge for representing different rain types across all rainy images, and thus can be learned from abundant training data by virtue of the strong learning capability of end-to-end training in deep learning (see more details in Sec. 4). Unlike the rain kernels, the rain maps must vary with the input rainy image, as the locations of rain streaks are totally random. Therefore, for predicting the clean image from a testing input rainy one, the key issue is to output the $M_n$s and $\mathcal{B}$ from $\mathcal{O}$ with the rain kernels $\mathcal{C}_n$s fixed, and the corresponding optimization problem is:

$$\min_{\mathcal{M}, \mathcal{B}} \left\| \mathcal{O} - \mathcal{B} - \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n \right\|_F^2 + \alpha g_1(\mathcal{M}) + \beta g_2(\mathcal{B}), \tag{4}$$

where $\mathcal{M} \in \mathbb{R}^{H \times W \times N}$ is the tensor form of the $M_n$s. $\alpha$ and $\beta$ are trade-off parameters. $g_1(\cdot)$ and $g_2(\cdot)$ denote the regularizers that deliver the prior structures of $M_n$ and $\mathcal{B}$, respectively.
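The generative side of the RCD model, in which each color channel of the rain layer is a sum of rain kernels applied to shared rain maps, can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names `conv2d_same` and `rain_layer`, the zero padding, and the use of correlation (the deep learning convention for "convolution") are all assumptions made for illustration.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D correlation of an H x W list with an odd k x k kernel."""
    H, W = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i + u - r, j + v - r
                    if 0 <= y < H and 0 <= x < W:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


def rain_layer(kernels, maps):
    """Sum over n of kernels[n][c] applied to maps[n], for every channel c.

    kernels[n][c] is a k x k kernel, maps[n] is an H x W rain map.
    Returns a list of per-channel H x W rain layers.
    """
    H, W = len(maps[0]), len(maps[0][0])
    channels = len(kernels[0])
    R = [[[0.0] * W for _ in range(H)] for _ in range(channels)]
    for n, M_n in enumerate(maps):
        for c in range(channels):
            contrib = conv2d_same(M_n, kernels[n][c])
            for i in range(H):
                for j in range(W):
                    R[c][i][j] += contrib[i][j]
    return R


# One vertical-streak kernel shared by three channels, one sparse rain map.
streak = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
kernels = [[streak, streak, streak]]
rain_map = [[0.0] * 5 for _ in range(5)]
rain_map[2][2] = 1.0  # a single activation in the middle of the map
R = rain_layer(kernels, [rain_map])
# The activation becomes a 3-pixel vertical streak in column 2 of each channel.
```

A single nonzero entry in the rain map is enough to paint a full streak, which is the sense in which the maps stay sparse while the kernels carry the repeated local pattern.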

Figure 2: (a) The proposed network with $S$ stages. The network takes a rainy image $\mathcal{O}$ as input and outputs the learned rain kernel $\mathcal{C}$, rain map $\mathcal{M}$, and clean background image $\mathcal{B}$. (b) Illustration of the network architecture at the $s$-th stage. Each stage consists of M-net and B-net to accomplish the update of rain map $\mathcal{M}$ and background layer $\mathcal{B}$, respectively. The images are best viewed zoomed in on screen.

3.2 Optimization algorithm

Since we want to build a deep unfolding network whose structure corresponds step by step to the solver of problem (4), it is critical to build an algorithm that contains only simple computations that are easy to transform into network modules. The traditional solvers for RCD-based models usually contain certain complicated operations, e.g., the Fourier transform and inverse Fourier transform [16, 46, 25], which are hard to transform exactly from algorithm to network structure. We thus build a new algorithm for solving the problem by alternately updating $\mathcal{M}$ and $\mathcal{B}$ with the proximal gradient method [2]. In this manner, only simple computations are involved. The details are as follows:

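Updating $\mathcal{M}$: The rain maps can be updated by solving the quadratic approximation [2] of the problem (4) as:

$$\min_{\mathcal{M}} \frac{1}{2} \left\| \mathcal{M} - \left( \mathcal{M}^{(s-1)} - \eta_1 \nabla f\left(\mathcal{M}^{(s-1)}\right) \right) \right\|_F^2 + \alpha \eta_1 g_1(\mathcal{M}), \tag{5}$$

where $\mathcal{M}^{(s-1)}$ is the updating result of the last iteration, $\eta_1$ is the stepsize parameter, and $f\left(\mathcal{M}^{(s-1)}\right) = \left\| \mathcal{O} - \mathcal{B}^{(s-1)} - \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s-1)} \right\|_F^2$. Corresponding to general regularization terms [7], the solution of Eq. (5) is:

$$\mathcal{M}^{(s)} = \mathrm{prox}_{\alpha \eta_1}\left( \mathcal{M}^{(s-1)} - \eta_1 \nabla f\left(\mathcal{M}^{(s-1)}\right) \right). \tag{6}$$

Moreover, by substituting

$$\nabla f\left(\mathcal{M}^{(s-1)}\right) = \mathcal{C} \otimes^T \left( \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s-1)} + \mathcal{B}^{(s-1)} - \mathcal{O} \right), \tag{7}$$

where $\mathcal{C} \in \mathbb{R}^{k \times k \times N \times 3}$ is a 4-D tensor stacked by the $\mathcal{C}_n$s, $\otimes^T$ denotes the transposed convolution (footnote 2: for any tensor $\mathcal{A} \in \mathbb{R}^{H \times W \times 3}$, we can calculate the $n$-th channel of $\mathcal{C} \otimes^T \mathcal{A}$ by $\sum_{c=1}^{3} \mathcal{C}_n^c \otimes^T \mathcal{A}^c$), and the constant factor 2 from differentiating the squared norm is absorbed into the stepsize $\eta_1$, we can obtain the updating formula for $\mathcal{M}$ as (footnote 3: it can be proved that, with small enough $\eta_1$ and $\eta_2$, Eq. (8) and Eq. (10) both lead to a reduction of the objective function (4) [2]):

$$\mathcal{M}^{(s)} = \mathrm{prox}_{\alpha \eta_1}\left( \mathcal{M}^{(s-1)} - \eta_1 \mathcal{C} \otimes^T \left( \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s-1)} + \mathcal{B}^{(s-1)} - \mathcal{O} \right) \right), \tag{8}$$

where $\mathrm{prox}_{\alpha \eta_1}(\cdot)$ is the proximal operator dependent on the regularization term $g_1(\cdot)$ with respect to $\mathcal{M}$. Instead of choosing a fixed regularizer in the model, the form of the proximal operator can be automatically learned from training data. More details will be presented in the next section.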

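Updating $\mathcal{B}$: Similarly, the quadratic approximation of the problem (4) with respect to $\mathcal{B}$ is:

$$\min_{\mathcal{B}} \frac{1}{2} \left\| \mathcal{B} - \left( \mathcal{B}^{(s-1)} - \eta_2 \nabla h\left(\mathcal{B}^{(s-1)}\right) \right) \right\|_F^2 + \beta \eta_2 g_2(\mathcal{B}), \tag{9}$$

where $\nabla h\left(\mathcal{B}^{(s-1)}\right) = \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s)} + \mathcal{B}^{(s-1)} - \mathcal{O}$, and it is easy to deduce that the final updating rule for $\mathcal{B}$ is (see footnote 3):

$$\mathcal{B}^{(s)} = \mathrm{prox}_{\beta \eta_2}\left( (1 - \eta_2) \mathcal{B}^{(s-1)} + \eta_2 \left( \mathcal{O} - \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s)} \right) \right), \tag{10}$$

where $\mathrm{prox}_{\beta \eta_2}(\cdot)$ is the proximal operator correlated to the regularization term $g_2(\cdot)$ with respect to $\mathcal{B}$.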
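The step from the quadratic approximation (9) to the updating rule (10) is only stated as easy to deduce. A short derivation is sketched here, under the assumption that the symbols lost in this copy are $\mathcal{O}$ for the rainy image, $\mathcal{B}$ for the background, $\mathcal{C}_n \otimes M_n$ for the rain terms, and $\eta_2$ for the stepsize, and that the gradient of the data term with respect to $\mathcal{B}$ is the residual $\sum_n \mathcal{C}_n \otimes M_n^{(s)} + \mathcal{B}^{(s-1)} - \mathcal{O}$:

```latex
\begin{aligned}
\mathcal{B}^{(s-1)} - \eta_2 \nabla h\big(\mathcal{B}^{(s-1)}\big)
&= \mathcal{B}^{(s-1)} - \eta_2 \Big( \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s)} + \mathcal{B}^{(s-1)} - \mathcal{O} \Big) \\
&= (1 - \eta_2)\, \mathcal{B}^{(s-1)} + \eta_2 \Big( \mathcal{O} - \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s)} \Big).
\end{aligned}
```

Applying the proximal operator associated with the background regularizer to this point gives the update for $\mathcal{B}^{(s)}$. For $0 \le \eta_2 \le 1$ the argument is a convex combination of the previous background and the current estimate obtained by subtracting the rain layer from the input.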

Based on this iterative algorithm, we can then construct our deep unfolding network as follows.

4 The rain convolutional dictionary network

Inspired by the deep unfolding techniques recently proposed for various tasks such as deconvolution [54], compressed sensing [50], and dehazing [48], we build a network structure for the single image rain removal task by unfolding each iterative step of the aforementioned algorithm into a corresponding network module. We especially focus on making all network modules correspond one-to-one to the operators of the algorithm, for better interpretability.

As shown in Fig. 2 (a), the proposed network consists of $S$ stages, corresponding to $S$ iterations of the algorithm for solving (4). Each stage achieves the sequential updates of $\mathcal{M}$ and $\mathcal{B}$ by M-net and B-net. As displayed in Fig. 2 (b), exactly corresponding to each iteration of the algorithm, in each stage of the network, M-net takes the observed rainy image $\mathcal{O}$ and the previous outputs $\mathcal{B}^{(s-1)}$ and $\mathcal{M}^{(s-1)}$ as inputs, and outputs an updated $\mathcal{M}^{(s)}$, and then B-net takes $\mathcal{O}$ and $\mathcal{M}^{(s)}$ as inputs, and outputs an updated $\mathcal{B}^{(s)}$.

4.1 Network design

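The key issue of unrolling the algorithm here is how to represent the two proximal operators involved in (8) and (10), while the other operations can be naturally performed with commonly used operators in normal networks [34]. In this work, we simply choose a ResNet [14] to construct the two proximal operators, as many other works did [47, 48]. Then, we can separately decompose the updating rules for $\mathcal{M}$ as (8) and $\mathcal{B}$ as (10) into sub-steps and achieve the following procedures for the $s$-th stage of the RCDNet:

$$\text{M-net:} \quad \begin{cases} \hat{\mathcal{R}}^{(s)} = \mathcal{O} - \mathcal{B}^{(s-1)}, \\ \tilde{\mathcal{R}}^{(s)} = \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s-1)}, \\ \mathcal{E}^{(s)} = \eta_1 \mathcal{C} \otimes^T \left( \tilde{\mathcal{R}}^{(s)} - \hat{\mathcal{R}}^{(s)} \right), \\ \mathcal{M}^{(s)} = \mathrm{proxNet}_{\theta_m^{(s)}}\left( \mathcal{M}^{(s-1)} - \mathcal{E}^{(s)} \right), \end{cases} \tag{11}$$

$$\text{B-net:} \quad \begin{cases} \mathcal{R}^{(s)} = \sum_{n=1}^{N} \mathcal{C}_n \otimes M_n^{(s)}, \\ \hat{\mathcal{B}}^{(s)} = \mathcal{O} - \mathcal{R}^{(s)}, \\ \mathcal{B}^{(s)} = \mathrm{proxNet}_{\theta_b^{(s)}}\left( (1 - \eta_2) \mathcal{B}^{(s-1)} + \eta_2 \hat{\mathcal{B}}^{(s)} \right), \end{cases} \tag{12}$$

where $\mathrm{proxNet}_{\theta_m^{(s)}}(\cdot)$ and $\mathrm{proxNet}_{\theta_b^{(s)}}(\cdot)$ are two ResNets consisting of several Resblocks with the parameters $\theta_m^{(s)}$ and $\theta_b^{(s)}$ at the $s$-th stage, respectively.

We can then design the network architecture, as shown in Fig. 2, by transforming the operators in (11) and (12) step-by-step. All the parameters involved can be automatically learned from training data in an end-to-end manner, including $\{\theta_m^{(s)}, \theta_b^{(s)}\}_{s=1}^{S}$, rain kernels $\mathcal{C}$, $\eta_1$, and $\eta_2$.

It should be noted that both sub-networks are highly interpretable. As shown in Fig. 2 (b), the M-net accomplishes the extraction of residual information of rain maps. Specifically, $\hat{\mathcal{R}}^{(s)}$ is the rain layer estimated with the previous background $\mathcal{B}^{(s-1)}$, and $\tilde{\mathcal{R}}^{(s)}$ is the rain layer achieved by the generative model (2) with the estimated $\mathcal{M}^{(s-1)}$. Then the M-net calculates the residual information between the two rain layers obtained in these two ways, and extracts the residual information of rain maps $\mathcal{E}^{(s)}$ with the transposed convolution of rain kernels to update the rain map. Next, the B-net recovers the background $\hat{\mathcal{B}}^{(s)}$ estimated with the current rain kernel $\mathcal{C}$ and rain maps $\mathcal{M}^{(s)}$, and fuses this estimated $\hat{\mathcal{B}}^{(s)}$ with the previously estimated $\mathcal{B}^{(s-1)}$ by weighted parameters $\eta_2$ and $(1 - \eta_2)$ to get the updated background $\mathcal{B}^{(s)}$. Here, we set $\mathcal{M}^{(0)}$ as 0 and initialize $\mathcal{B}^{(0)}$ by a convolutional operator on $\mathcal{O}$ (footnote 4: more network design details are described in the supplemental file).

Remark: From Fig. 2, the input tensor of $\mathrm{proxNet}_{\theta_b^{(s)}}(\cdot)$ has the same size $H \times W \times 3$ as the to-be-estimated $\mathcal{B}$. Evidently, this is not beneficial for learning since most of the previous updating information would be compressed due to the few channels. To better keep and deliver image features, we simply expand the input tensor at the 3rd mode for more channels in experiments (see more in the supplemental file).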
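To make the data flow of one stage concrete, the following pure-Python sketch runs the M-net and B-net sub-steps on a 1D, single-channel toy signal. It is only an illustration under stated assumptions: the learned ResNet proximal operators are replaced by hand-picked stand-ins (soft-thresholding for the rain maps, clipping to [0, 1] for the background), the background is initialized to zero rather than by a convolution on the input, and all names (`stage`, `correlate_same`, `correlate_transpose`, `eta1`, `eta2`, `tau`) are hypothetical.

```python
def correlate_same(signal, kernel):
    """Zero-padded 'same' 1D correlation: out[i] = sum_u kernel[u] * signal[i + u - r]."""
    n, r = len(signal), len(kernel) // 2
    out = [0.0] * n
    for i in range(n):
        for u, w in enumerate(kernel):
            j = i + u - r
            if 0 <= j < n:
                out[i] += w * signal[j]
    return out


def correlate_transpose(signal, kernel):
    """Adjoint of correlate_same (odd-length kernel): the transposed convolution."""
    return correlate_same(signal, kernel[::-1])


def soft_threshold(x, tau):
    """Stand-in for the learned prox on rain maps (assumption: an L1-type prior)."""
    return [max(abs(v) - tau, 0.0) * (1.0 if v > 0 else -1.0) for v in x]


def clip01(x):
    """Stand-in for the learned prox on the background (assumption: a [0, 1] box)."""
    return [min(max(v, 0.0), 1.0) for v in x]


def rain_from_maps(kernels, maps):
    """Rain layer as the sum of every kernel applied to its rain map."""
    rain = [0.0] * len(maps[0])
    for kernel, m in zip(kernels, maps):
        rain = [a + b for a, b in zip(rain, correlate_same(m, kernel))]
    return rain


def stage(O, B_prev, M_prev, kernels, eta1=0.5, eta2=0.5, tau=0.01):
    """One unfolded stage: M-net sub-steps followed by B-net sub-steps."""
    # M-net: compare the two rain estimates and map the residual back to the maps.
    R_hat = [o - b for o, b in zip(O, B_prev)]
    R_tilde = rain_from_maps(kernels, M_prev)
    diff = [rt - rh for rt, rh in zip(R_tilde, R_hat)]
    M_new = []
    for kernel, m in zip(kernels, M_prev):
        E = [eta1 * g for g in correlate_transpose(diff, kernel)]
        M_new.append(soft_threshold([a - e for a, e in zip(m, E)], eta1 * tau))
    # B-net: rebuild the rain layer, subtract it, and fuse with the old background.
    R = rain_from_maps(kernels, M_new)
    B_hat = [o - r for o, r in zip(O, R)]
    B_new = clip01([(1 - eta2) * b + eta2 * bh for b, bh in zip(B_prev, B_hat)])
    return B_new, M_new, R


O = [0.2, 0.2, 0.9, 0.2, 0.2, 0.2]
kernels = [[0.25, 0.5, 0.25]]
B, M = [0.0] * len(O), [[0.0] * len(O)]
for _ in range(20):
    B, M, R = stage(O, B, M, kernels)
residual = sum((o - b - r) ** 2 for o, b, r in zip(O, B, R))
```

With these fixed stand-ins the loop is ordinary alternating proximal gradient descent, so it cannot separate rain from background in any meaningful way; the point is only that each line of `stage` maps onto one sub-step of the stage description, with the two `prox` stand-ins marking where the learned ResNets would sit.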

4.2 Network training

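Training loss. For simplicity, we adopt the mean square error (MSE) [21] for the learned background and rain layer at every stage as the training objective function:

$$L = \sum_{s=0}^{S} \lambda_s \left\| \mathcal{B}^{(s)} - \mathcal{B} \right\|_F^2 + \sum_{s=1}^{S} \gamma_s \left\| \mathcal{R}^{(s)} - \mathcal{R} \right\|_F^2, \tag{13}$$

where $\mathcal{B}^{(s)}$ and $\mathcal{R}^{(s)}$ separately denote the derained result and extracted rain layer as expressed in (12) at the $s$-th stage ($s = 0, 1, \cdots, S$). $\lambda_s$ and $\gamma_s$ are tradeoff parameters (footnote 5: in all experiments, we set the final-stage weights $\lambda_S$ and $\gamma_S$ so that the outputs at the final stage play a dominant role, and the other parameters to 0.1 to help find the correct parameters in each stage; more parameter settings are discussed in the supplementary material).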
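The stage-wise objective can be sketched in a few lines of pure Python. This is a minimal illustration rather than the training code: images are flattened to 1D lists, the function names are hypothetical, and the default final-stage weight of 1.0 in `stage_weights` is an assumption, since only the 0.1 used for the other stages survives in this copy of the text.

```python
def squared_error(x, y):
    """Sum of squared differences between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def stage_weights(num_terms, final_weight=1.0, other_weight=0.1):
    """Weights for num_terms stage outputs, with the last stage emphasized.

    final_weight=1.0 is an assumed value; other_weight=0.1 follows the text.
    """
    return [other_weight] * (num_terms - 1) + [final_weight]


def multi_stage_loss(backgrounds, rains, clean, rain_gt, lambdas, gammas):
    """Weighted sum of per-stage squared errors for background and rain layers.

    backgrounds holds the background estimate of every stage, including the
    initialization; rains holds the rain layer of every stage after it.
    """
    loss_b = sum(w * squared_error(b, clean) for w, b in zip(lambdas, backgrounds))
    loss_r = sum(w * squared_error(r, rain_gt) for w, r in zip(gammas, rains))
    return loss_b + loss_r


# Toy case with a single stage: an initialization plus one stage output.
clean, rain_gt = [1.0, 1.0], [0.0, 0.0]
backgrounds = [[0.0, 0.0], [1.0, 1.0]]
rains = [[0.5, 0.5]]
loss = multi_stage_loss(backgrounds, rains, clean, rain_gt,
                        stage_weights(2), stage_weights(1))
```

Supervising every stage, rather than only the last one, gives each intermediate output a direct training signal, which fits the goal of making intermediate results interpretable.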

Implementation details. We train our network on an NVIDIA GeForce GTX 1080Ti GPU. We adopt the Adam optimizer [24] with a batch size of 16 and a patch size of $64 \times 64$. The learning rate is divided by 5 every 25 epochs, starting from its initial value, and training runs for 100 epochs in total.
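The step decay described here is easy to express as a function of the epoch index. The sketch below is illustrative only: `learning_rate` is a hypothetical helper, and the initial learning rate is left as a required argument because its value does not survive in this copy of the text.

```python
def learning_rate(epoch, initial_lr, decay_factor=5.0, step=25):
    """Learning rate at a 0-based epoch: divided by decay_factor every `step` epochs."""
    return initial_lr / (decay_factor ** (epoch // step))


# Relative schedule over 100 epochs, expressed as a fraction of the initial rate.
schedule = [learning_rate(epoch, 1.0) for epoch in range(100)]
```

Over 100 epochs the rate takes four distinct values, ending at 1/125 of the initial one.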

5 Experimental results

We first conduct ablation study and model visualization to verify the underlying mechanism of the proposed network, and then present experiments on synthesized benchmark datasets and real datasets for performance evaluation.

5.1 Ablation study

Dataset and performance metrics. In this section, we use Rain100L to perform all the ablation studies. This synthesized dataset consists of 200 rainy/clean image pairs for training and 100 pairs for testing [49]. Two performance metrics are employed: peak signal-to-noise ratio (PSNR) [17] and structural similarity (SSIM) [43]. Note that as the human visual system is sensitive to the Y channel of a color image in YCbCr space, we compute PSNR and SSIM based on this luminance channel.
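As a rough illustration of evaluating PSNR on the luminance channel, the sketch below converts RGB pixels to Y and computes PSNR between two flattened Y images. The exact conventions of the authors' evaluation scripts are not given here, so the ITU-R BT.601 conversion with a 16 to 235 output range and the peak value of 255 are assumptions, as are the function names.

```python
import math


def rgb_to_y(r, g, b):
    """Luma of one pixel with R, G, B in [0, 1], BT.601 with a 16..235 range."""
    return 16.0 + 65.481 * r + 128.553 * g + 24.966 * b


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Two tiny RGB "images" of two pixels each, compared on the Y channel only.
clean_rgb = [(0.5, 0.5, 0.5), (0.2, 0.4, 0.6)]
derained_rgb = [(0.5, 0.5, 0.5), (0.25, 0.4, 0.6)]
clean_y = [rgb_to_y(*p) for p in clean_rgb]
derained_y = [rgb_to_y(*p) for p in derained_rgb]
score = psnr(clean_y, derained_y)
```

Because the green channel dominates the Y conversion, errors in green lower this score more than equally sized errors in red or blue.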

| Stage No. | S=0 | S=2 | S=5 | S=8 | S=11 | S=14 | S=17 | S=20 |
|---|---|---|---|---|---|---|---|---|
| PSNR | 35.93 | 38.46 | 39.35 | 39.60 | 39.81 | 39.90 | 40.00 | 39.91 |
| SSIM | 0.9689 | 0.9813 | 0.9842 | 0.9850 | 0.9855 | 0.9858 | 0.9860 | 0.9858 |

Table 1: Effect of stage number $S$ on the performance of RCDNet.

Table 1 reports the effect of the stage number $S$ on the deraining performance of our network. Here, $S=0$ means that the initialization $\mathcal{B}^{(0)}$ is directly regarded as the recovery result. Taking $S=0$ as a baseline, it is seen that with only 2 stages, our method already achieves a significant gain in rain removal performance (PSNR rises from 35.93 to 38.46), which validates the essential role of the proposed M-net and B-net. We also observe that when $S=20$, the deraining performance is slightly lower than that of $S=17$, since a larger $S$ makes gradient propagation more difficult. Based on this observation, we set $S$ as 17 throughout all our experiments. More ablation results and discussions are listed in the supplementary material.

Figure 3: Visualization of the recovered background $\mathcal{B}^{(s)}$, as expressed in Eq. (12), and the rain layer $\mathcal{R}^{(s)}$ at different stages. The stage number $S$ is 17. PSNR/SSIM for reference. The images are best viewed zoomed in on screen.
Figure 4: At the final stage $s = 17$, the extracted rain layer, rain kernels $\mathcal{C}_n$, and rain maps $M_n$ for the input in Fig. 3. The lower left shows the rain kernels learned from Rain100L. The images are best viewed zoomed in on screen.

5.2 Model verification

We then show how the interpretability of this RCDNet facilitates an easy analysis for the working mechanism inside the network modules.

Figure 5: 1st column: input rainy image (upper) and groundtruth (lower). 2nd to 12th columns: derained results (upper) and extracted rain layers (lower) by 11 competing methods. PSNR/SSIM for reference. Bold indicates top rank.

Fig. 3 presents the extracted background layer $\mathcal{B}^{(s)}$ (1st row), $\hat{\mathcal{B}}^{(s)}$ (2nd row) that represents the role of M-net in helping restore the clean background, and the rain layer $\mathcal{R}^{(s)}$ (3rd row) at different stages. We can find that with the increase of $s$, $\mathcal{R}^{(s)}$ covers more rain streaks and fewer image details, and $\hat{\mathcal{B}}^{(s)}$ and $\mathcal{B}^{(s)}$ are also gradually ameliorated. This should be attributed to the proper guidance of the RCD prior for rain streaks and the mutual promotion of M-net and B-net, which drives the RCDNet to evolve in the right direction.

Fig. 4 presents the learned rain kernels and the rain maps for the input in Fig. 3. Clearly, the RCDNet finely extracts proper rain layers explicitly based on the RCD model. This not only verifies the reasonability of our method but also shows what is distinctive about our proposal. On one hand, we utilize an M-net to learn sparse rain maps instead of directly learning rain streaks, which makes the learning process easier. On the other hand, we exploit training data to automatically learn rain kernels representing general repetitive local patterns of rain with diverse shapes. This facilitates their use on more real-world rainy images.

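| Methods | Rain100L PSNR | Rain100L SSIM | Rain100H PSNR | Rain100H SSIM | Rain1400 PSNR | Rain1400 SSIM | Rain12 PSNR | Rain12 SSIM |
|---|---|---|---|---|---|---|---|---|
| Input | 26.90 | 0.8384 | 13.56 | 0.3709 | 25.24 | 0.8097 | 30.14 | 0.8555 |
| DSC [51] | 27.34 | 0.8494 | 13.77 | 0.3199 | 27.88 | 0.8394 | 30.07 | 0.8664 |
| GMM [28] | 29.05 | 0.8717 | 15.23 | 0.4498 | 27.78 | 0.8585 | 32.14 | 0.9145 |
| JCAS [13] | 28.54 | 0.8524 | 14.62 | 0.4510 | 26.20 | 0.8471 | 33.10 | 0.9305 |
| Clear [8] | 30.24 | 0.9344 | 15.33 | 0.7421 | 26.21 | 0.8951 | 31.24 | 0.9353 |
| DDN [9] | 32.38 | 0.9258 | 22.85 | 0.7250 | 28.45 | 0.8888 | 34.04 | 0.9330 |
| RESCAN [27] | 38.52 | 0.9812 | 29.62 | 0.8720 | 32.03 | 0.9314 | 36.43 | 0.9519 |
| PReNet [35] | 37.45 | 0.9790 | 30.11 | ***0.9053*** | ***32.55*** | ***0.9459*** | 36.66 | 0.9610 |
| SPANet [41] | 35.33 | 0.9694 | 25.11 | 0.8332 | 29.85 | 0.9148 | 35.85 | 0.9572 |
| JORDER_E [49] | ***38.59*** | ***0.9834*** | ***30.50*** | 0.8967 | 32.00 | 0.9347 | ***36.69*** | ***0.9621*** |
| SIRR [44] | 32.37 | 0.9258 | 22.47 | 0.7164 | 28.44 | 0.8893 | 34.02 | 0.9347 |
| RCDNet | **40.00** | **0.9860** | **31.28** | **0.9093** | **33.04** | **0.9472** | **37.71** | **0.9649** |

Table 2: PSNR and SSIM comparisons on four benchmark datasets. Bold and bold italic indicate top 1st and 2nd rank, respectively.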
Figure 6: Rain removal performance comparisons on a rainy image from SPA-Data. The images are better to be zoomed in on screen.
Figure 7: Derained results for two samples with various rain patterns from Internet-Data. The images are better to be zoomed in on screen.

5.3 Experiments on synthetic data

Comparison methods and datasets. We then compare our network with the current SOTA single image derainers, including model-based DSC [51], GMM [28], and JCAS [13]; and DL-based Clear [8], DDN [9], RESCAN [27], PReNet [35], SPANet [41], JORDER_E [49], and SIRR [44] (footnote 6: the code/project links for these comparison methods are listed in the supplementary material), based on four benchmark datasets, including Rain100L, Rain100H [49], Rain1400 [9], and Rain12 [28].

Fig. 5 illustrates the deraining performance of all competing methods on a rainy image from Rain100L. As shown, the deraining result of RCDNet is better than that of the other methods in sufficiently removing the rain streaks and finely recovering the image textures. Moreover, the rain layer extracted by RCDNet contains fewer unexpected background details as compared with other competing methods. Our RCDNet thus achieves the best PSNR and SSIM.

Table 2 reports the quantitative results of all competing methods. It is seen that our RCDNet attains best deraining performance among all methods on each dataset. This substantiates the flexibility and generality of our method, in diverse rain types contained in these datasets.
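The size of the PSNR margin behind this claim can be read off Table 2 directly. The short sketch below does the arithmetic for the four strongest DL baselines in the table; the dictionary is just a transcription of the reported PSNR values, and the helper name `margin_over_best_baseline` is hypothetical.

```python
# PSNR values (dB) transcribed from Table 2 for the strongest baselines and RCDNet.
psnr_db = {
    "Rain100L": {"RESCAN": 38.52, "PReNet": 37.45, "SPANet": 35.33,
                 "JORDER_E": 38.59, "RCDNet": 40.00},
    "Rain100H": {"RESCAN": 29.62, "PReNet": 30.11, "SPANet": 25.11,
                 "JORDER_E": 30.50, "RCDNet": 31.28},
    "Rain1400": {"RESCAN": 32.03, "PReNet": 32.55, "SPANet": 29.85,
                 "JORDER_E": 32.00, "RCDNet": 33.04},
    "Rain12": {"RESCAN": 36.43, "PReNet": 36.66, "SPANet": 35.85,
               "JORDER_E": 36.69, "RCDNet": 37.71},
}


def margin_over_best_baseline(scores, ours="RCDNet"):
    """Return the best competing method and the PSNR gap to it, rounded to 0.01 dB."""
    baselines = {name: value for name, value in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, round(scores[ours] - baselines[best], 2)


margins = {dataset: margin_over_best_baseline(scores)
           for dataset, scores in psnr_db.items()}
```

On these numbers the gap to the best competitor ranges from 0.49 dB on Rain1400 (over PReNet) to 1.41 dB on Rain100L (over JORDER_E).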

5.4 Experiments on real data

We then analyze the performance of all methods on two real datasets from [41]: the first one (called SPA-Data) contains 638492 rainy/clean image pairs for training and 1000 testing ones, and the second one (called Internet-Data) includes 147 rainy images without groundtruth.

Table 3 and Fig. 6 compare the derained results of all competing methods on SPA-Data quantitatively and visually. Even for such complex rain patterns, the proposed RCDNet still achieves evidently superior performance over the other methods. As in the synthetic experiments, our method removes the rain streaks and recovers image details better than the other competing ones.

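| Methods | PSNR | SSIM |
|---|---|---|
| Input | 34.15 | 0.9269 |
| DSC | 34.95 | 0.9416 |
| GMM | 34.30 | 0.9428 |
| JCAS | 34.95 | 0.9453 |
| Clear | 34.39 | 0.9509 |
| DDN | 36.16 | 0.9463 |
| RESCAN | 38.11 | 0.9707 |
| PReNet | 40.16 | 0.9816 |
| SPANet | 40.24 | 0.9811 |
| JORDER_E | 40.78 | 0.9811 |
| SIRR | 35.31 | 0.9411 |
| RCDNet | 41.47 | 0.9834 |

Table 3: PSNR and SSIM comparisons on SPA-Data [41].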

Further, we select two real hard samples with various rain densities to evaluate the generalization ability of all competing methods. From Fig. 7, we can find that traditional model-based methods tend to leave obvious rain streaks. Although DL-based comparison methods remove apparent rain streaks, they still leave distinct rain marks or blur some image textures. Comparatively, our RCDNet better preserves background details as well as removes more rain streaks. This shows its good generalization capability to unseen complex rain types.

6 Conclusion

In this paper, we have explored the intrinsic prior structure of rain streaks, which can be explicitly expressed as a convolutional dictionary learning model, and proposed a novel interpretable network architecture for single image deraining. Each module in the network corresponds one-to-one to an operator of the algorithm designed for solving the model, and thus the network is almost “white-box”, with easily visualized interpretation for all its module elements. Comprehensive experiments on synthetic and real rainy images validate that such interpretability benefits the proposed network, and especially facilitates the analysis of what happens inside the network and why it works in the testing prediction process. The elements extracted through end-to-end learning by the network, like the rain kernels, are also potentially useful for related tasks on rainy images.

Acknowledgment. This research was supported by the China NSFC projects under contract 11690011, 61721002, U1811461 and MoE-CMCC “Artificial Intelligence” Project with No. MCM20190701.


  • [1] P. C. Barnum, S. Narasimhan, and T. Kanade (2010) Analysis of rain and snow in frequency space. International journal of computer vision 86 (2-3), pp. 256. Cited by: §2.1.
  • [2] A. Beck and M. Teboulle (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences 2 (1), pp. 183–202. Cited by: §1, §3.2, §3.2, footnote 3.
  • [3] J. Bossu, N. Hautière, and J. Tarel (2011) Rain or snow detection in image sequences through use of a histogram of orientation of streaks. International journal of computer vision 93 (3), pp. 348–367. Cited by: §2.1.
  • [4] Y. L. Chen and C. T. Hsu (2013) A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1968–1975. Cited by: §2.1.
  • [5] D. Comaniciu, V. Ramesh, and P. Meer (2003) Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (5), pp. 564–575. Cited by: §1.
  • [6] X. Ding, L. Chen, X. Zheng, H. Yue, and D. Zeng (2016) Single image rain and snow removal via guided l0 smoothing filter. Multimedia Tools and Applications 75 (5), pp. 2697–2712. Cited by: §2.2.
  • [7] D. L. Donoho (1995) De-noising by soft-thresholding. IEEE transactions on information theory 41 (3), pp. 613–627. Cited by: §3.2.
  • [8] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley (2017) Clearing the skies: a deep network architecture for single-image rain removal. IEEE Transactions on Image Processing 26 (6), pp. 2944–2956. Cited by: §1, §2.2, §5.3, Table 2.
  • [9] X. Fu, J. Huang, D. Zeng, H. Yue, X. Ding, and J. Paisley (2017) Removing rain from single images via a deep detail network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3855–3863. Cited by: §1, §2.2, §5.3, Table 2.
  • [10] K. Garg and S. K. Nayar (2004) Detection and removal of rain from videos. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. I–I. Cited by: §2.1.
  • [11] K. Garg and S. K. Nayar (2005) When does a camera see rain?. In Tenth IEEE International Conference on Computer Vision, Vol. 2, pp. 1067–1074. Cited by: §2.1.
  • [12] K. Garg and S. K. Nayar (2007) Vision and rain. International Journal of Computer Vision 75 (1), pp. 3–27. Cited by: §2.1.
  • [13] S. Gu, D. Meng, W. Zuo, and Z. Lei (2017) Joint convolutional analysis and synthesis sparse representation for single image layer separation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1708–1716. Cited by: §1, §2.2, §3.1, §5.3, Table 2.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  • [15] Z. He and V. M. Patel (2017) Convolutional sparse and low-rank coding-based rain streak removal. In IEEE Winter Conference on Applications of Computer Vision, pp. 1259–1267. Cited by: §1, §2.2, §3.1.
  • [16] F. Huang and A. Anandkumar (2015) Convolutional dictionary learning through tensor factorization. Computer Science, pp. 1–30. Cited by: §1, §3.1, §3.2.
  • [17] Q. Huynh-Thu and M. Ghanbari (2008) Scope of validity of PSNR in image/video quality assessment. Electronics Letters 44 (13), pp. 800–801. Cited by: §5.1.
  • [18] T. X. Jiang, T. Z. Huang, X. L. Zhao, L. J. Deng, and Y. Wang (2017) A novel tensor-based video rain streaks removal approach via utilizing discriminatively intrinsic priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4057–4066. Cited by: §2.1.
  • [19] C. Jie, C. H. Tan, J. Hou, L. P. Chau, and L. He (2018) Robust video content alignment and compensation for rain removal in a CNN framework. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6286–6295. Cited by: §2.1.
  • [20] K. Jin-Hwan, S. Jae-Young, and K. Chang-Su (2015) Video deraining and desnowing using temporal correlation and low-rank matrix completion. IEEE Transactions on Image Processing 24 (9), pp. 2658–2670. Cited by: §2.1.
  • [21] X. Jing, Z. Wei, L. Peng, and X. Tang (2012) Removing rain and snow in a single image using guided filter. In IEEE International Conference on Computer Science and Automation Engineering, Vol. 2, pp. 304–307. Cited by: §2.2, §4.2.
  • [22] L. W. Kang, C. W. Lin, and Y. H. Fu (2012) Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing 21 (4), pp. 1742–1755. Cited by: §2.2.
  • [23] J. H. Kim, C. Lee, J. Y. Sim, and C. S. Kim (2014) Single-image deraining using an adaptive nonlocal means filter. In IEEE International Conference on Image Processing, pp. 914–917. Cited by: §2.2.
  • [24] D. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. Computer Science. Cited by: §4.2.
  • [25] M. Li, Q. Xie, Q. Zhao, W. Wei, S. Gu, J. Tao, and D. Meng (2018) Video rain streak removal by multiscale convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6644–6653. Cited by: §1, §2.1, §3.1, §3.2.
  • [26] S. Li, I. B. Araujo, W. Ren, Z. Wang, E. K. Tokuda, R. H. Junior, R. Cesar-Junior, J. Zhang, X. Guo, and X. Cao (2019) Single image deraining: a comprehensive benchmark analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3838–3847. Cited by: §1.
  • [27] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha (2018) Recurrent squeeze-and-excitation context aggregation net for single image deraining. In Proceedings of the European Conference on Computer Vision, pp. 254–269. Cited by: §1, §2.2, §5.3, Table 2.
  • [28] Y. Li (2016) Rain streak removal using layer priors. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2736–2744. Cited by: §1, §2.2, §5.3, Table 2.
  • [29] J. Liu, W. Yang, S. Yang, and Z. Guo (2018) D3R-net: dynamic routing residue recurrent network for video rain removal. IEEE Transactions on Image Processing 28 (2), pp. 699–712. Cited by: §2.1.
  • [30] J. Liu, W. Yang, S. Yang, and Z. Guo (2018) Erase or fill? deep joint recurrent rain removal and reconstruction in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3233–3242. Cited by: §2.1.
  • [31] O. Ludwig, D. Delgado, V. Goncalves, and U. Nunes (2009) Trainable classifier-fusion schemes: an application to pedestrian detection. In International IEEE Conference on Intelligent Transportation Systems, pp. 1–6. Cited by: §1.
  • [32] P. Mu, J. Chen, R. Liu, X. Fan, and Z. Luo (2019) Learning bilevel layer priors for single image rain streaks removal. IEEE Signal Processing Letters 26 (2), pp. 307–311. Cited by: §2.2.
  • [33] W. Park and K. Lee (2008) Rain removal using Kalman filter in video. In International Conference on Smart Manufacturing Application, pp. 494–497. Cited by: §2.1.
  • [34] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. Cited by: §4.1.
  • [35] D. Ren, W. Zuo, Q. Hu, P. Zhu, and D. Meng (2019) Progressive image deraining networks: a better and simpler baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3937–3946. Cited by: §1, §2.2, §5.3, Table 2.
  • [36] W. Ren, J. Tian, H. Zhi, A. Chan, and Y. Tang (2017) Video desnowing and deraining based on matrix decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4210–4219. Cited by: §2.1.
  • [37] M. S. Shehata, J. Cai, W. M. Badawy, T. W. Burr, M. S. Pervez, R. J. Johannesson, and A. Radmanesh (2008) Video-based automatic incident detection for smart roads: the outdoor environmental challenges regarding false alarms. IEEE Transactions on Intelligent Transportation Systems 9 (2), pp. 349–360. Cited by: §1.
  • [38] S. Sun, S. Fan, and Y. F. Wang (2014) Exploiting image structural similarity for single image rain removal. In IEEE International Conference on Image Processing (ICIP), pp. 4482–4486. Cited by: §2.2.
  • [39] H. Wang, Y. Wu, M. Li, Q. Zhao, and D. Meng (2019) A survey on rain removal from video and single image. arXiv:1909.08326. Cited by: §1.
  • [40] H. Wang, Q. Xie, Y. Wu, Q. Zhao, and D. Meng (2020) Single image rain streaks removal: a review and an exploration. International Journal of Machine Learning and Cybernetics, pp. 1–20. Cited by: §1.
  • [41] T. Wang, X. Yang, K. Xu, S. Chen, Q. Zhang, and R. W. Lau (2019) Spatial attentive single-image deraining with a high quality real rain dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12270–12279. Cited by: §1, §2.2, §5.3, §5.4, Table 2, Table 3.
  • [42] Y. Wang, S. Liu, C. Chen, and B. Zeng (2017) A hierarchical approach for rain or snow removing in a single color image. IEEE Transactions on Image Processing 26 (8), pp. 3936–3950. Cited by: §2.2.
  • [43] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Processing 13 (4), pp. 600–612. Cited by: §5.1.
  • [44] W. Wei, D. Meng, Q. Zhao, Z. Xu, and Y. Wu (2019) Semi-supervised transfer learning for image rain removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3877–3886. Cited by: §2.2, §5.3, Table 2.
  • [45] W. Wei, L. Yi, Q. Xie, Q. Zhao, D. Meng, and Z. Xu (2017) Should we encode rain streaks in video as deterministic or stochastic?. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2516–2525. Cited by: §2.1.
  • [46] B. Wohlberg (2014) Efficient convolutional sparse coding. In IEEE International Conference on Acoustics, Speech and Signal Processing. Cited by: §3.2.
  • [47] Q. Xie, M. Zhou, Q. Zhao, D. Meng, W. Zuo, and Z. Xu (2019) Multispectral and hyperspectral image fusion by MS/HS fusion net. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1585–1594. Cited by: §4.1.
  • [48] D. Yang and J. Sun (2018) Proximal dehaze-net: a prior learning-based deep network for single image dehazing. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 702–717. Cited by: §4.1, §4.
  • [49] W. Yang, R. T. Tan, J. Feng, J. Liu, S. Yan, and Z. Guo (2019) Joint rain detection and removal from a single image with contextualized deep networks. IEEE Transactions on Pattern Analysis and Machine Intelligence PP (99), pp. 1–1. Cited by: §2.2, §5.1, §5.3, Table 2.
  • [50] Y. Yang, J. Sun, H. Li, and Z. Xu (2017) ADMM-net: a deep learning approach for compressive sensing MRI. arXiv preprint arXiv:1705.06869. Cited by: §4.
  • [51] L. Yu, X. Yong, and J. Hui (2015) Removing rain from a single image via discriminative sparse coding. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3397–3405. Cited by: §1, §2.2, §5.3, Table 2.
  • [52] H. Zhang and V. M. Patel (2018) Density-aware single image de-raining using a multi-stream dense network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 695–704. Cited by: §1, §2.2.
  • [53] H. Zhang, V. Sindagi, and V. M. Patel (2019) Image de-raining using a conditional generative adversarial network. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §1, §2.2.
  • [54] J. Zhang, J. Pan, W. Lai, R. W. Lau, and M. Yang (2017) Learning fully convolutional networks for iterative non-blind deconvolution. Cited by: §4.
  • [55] X. Zhang, H. Li, Y. Qi, W. K. Leow, and T. K. Ng (2006) Rain removal in video by combining temporal and chromatic properties. In IEEE International Conference on Multimedia and Expo, pp. 461–464. Cited by: §2.1.
  • [56] L. Zhu, C. W. Fu, D. Lischinski, and P. A. Heng (2017) Joint bi-layer optimization for single-image rain streak removal. In Proceedings of the IEEE international conference on computer vision, pp. 2526–2534. Cited by: §2.2.