Progressive Image Deraining Networks: A Better and Simpler Baseline (CVPR 2019)
Along with the deraining performance improvement of deep networks, their structures and learning become more and more complicated and diverse, making it difficult to analyze the contribution of various network modules when developing new deraining networks. To handle this issue, this paper provides a better and simpler baseline deraining network by considering network architecture, input and output, and loss functions. Specifically, by repeatedly unfolding a shallow ResNet, progressive ResNet (PRN) is proposed to take advantage of recursive computation. A recurrent layer is further introduced to exploit the dependencies of deep features across stages, forming our progressive recurrent network (PReNet). Furthermore, intra-stage recursive computation of ResNet can be adopted in PRN and PReNet to notably reduce network parameters with graceful degradation in deraining performance. For network input and output, we take both stage-wise result and original rainy image as input to each ResNet and finally output the prediction of residual image. As for loss functions, single MSE or negative SSIM losses are sufficient to train PRN and PReNet. Experiments show that PRN and PReNet perform favorably on both synthetic and real rainy images. Considering its simplicity, efficiency and effectiveness, our models are expected to serve as a suitable baseline in future deraining research. The source codes are available at https://github.com/csdwren/PReNet.
Rain is a common weather condition, and has severe adverse effect on not only human visual perception but also the performance of various high level vision tasks such as image classification, object detection, and video surveillance [14, 7]. Single image deraining aims at restoring clean background image from a rainy image, and has drawn considerable recent research attention. For example, several traditional optimization based methods [1, 21, 22, 9] have been suggested for modeling and separating rain streaks from background clean image. However, due to the complex composition of rain and background layers, image deraining remains a challenging ill-posed problem.
[Figure panels: Rainy image; RESCAN.]
[Figure 2: (a) PRN and the illustration of PRN with stage recursion. (b) PReNet and the illustration of PReNet with stage recursion.]
Driven by the unprecedented success of deep learning in low level vision [33, 3, 18, 15, 27], recent years have also witnessed the rapid progress of deep convolutional neural networks (CNNs) in image deraining. Fu et al. show that it is difficult to train a CNN to directly predict the background image from a rainy image, and instead utilize a 3-layer CNN to remove rain streaks from the high-pass detail layer rather than from the input image. Subsequently, other formulations have been introduced, such as residual learning for predicting the rain streak layer, joint detection and removal of rain streaks, and joint rain density estimation and deraining.
On the other hand, many modules have been suggested to constitute different deraining networks, including residual blocks [6, 10], dilated convolution [30, 29], dense blocks, squeeze-and-excitation, and recurrent layers [20, 25]. Multi-stream and multi-stage networks are also deployed to capture multi-scale characteristics and to remove heavy rain. Moreover, several models are designed to improve computational efficiency by utilizing lightweight networks in a cascaded scheme or a Laplacian pyramid framework, but at the cost of obvious degradation in deraining performance. To sum up, despite the progress in deraining performance, the structures of deep networks have become more and more complicated and diverse. As a result, it is difficult to analyze the contribution of various modules and their combinations, and to develop new models by introducing modules into existing deeper and complex deraining networks.
In this paper, we aim to present a new baseline network for single image deraining to demonstrate that: (i) by combining only a few modules, a better and simpler baseline network can be constructed and achieve noteworthy performance gains over state-of-the-art deeper and complex deraining networks, and (ii) contrary to the earlier observation of Fu et al., the improvement of deraining networks may ease the difficulty of training CNNs to directly recover the clean image from the rainy image. Moreover, the simplicity of the baseline network makes it easier to develop new deraining models by introducing other network modules or modifying the existing ones.
To this end, we consider the network architecture, input and output, and loss functions to form a better and simpler baseline network. In terms of network architecture, we begin with a basic shallow residual network (ResNet) with five residual blocks. Then, progressive ResNet (PRN) is introduced by recursively unfolding the ResNet into multiple stages without increasing the number of model parameters (see Fig. 2(a)). Moreover, a recurrent layer [11, 26] is introduced to exploit the dependencies of deep features across recursive stages to form the PReNet in Fig. 2(b). From Fig. 1, a 6-stage PReNet can remove most rain streaks at the first stage, and then the remaining rain streaks can be progressively removed, leading to promising deraining quality at the final stage. Furthermore, the variants PRN$_r$ and PReNet$_r$ are presented by adopting intra-stage recursive unfolding of only one ResBlock, which reduces network parameters at the cost of only minor performance degradation.
Using PRN and PReNet, we further investigate the effect of network input/output and loss function. In terms of network input, we take both the stage-wise result and the original rainy image as input to each ResNet, and empirically find that the introduction of the original image does benefit deraining performance. In terms of network output, we adopt the residual learning formulation by predicting the rain streak layer. Moreover, in contrast to earlier findings, it is feasible to directly learn a PRN or PReNet model for predicting the clean background from the rainy image. Finally, instead of hybrid losses with careful hyper-parameter tuning [4, 6], a single negative SSIM or MSE loss can readily train PRN and PReNet with favorable deraining performance.
Comprehensive experiments have been conducted to evaluate our baseline networks PRN and PReNet. On four synthetic datasets, our PReNet and PRN are computationally very efficient, and achieve much better quantitative and qualitative deraining results in comparison with the state-of-the-art methods. In particular, on heavy rainy dataset Rain100H , the performance gains by our PRN and PReNet are still significant. The visually pleasing deraining results on real rainy images and videos have also validated the generalization ability of the trained PReNet and PRN models.
The contribution of this work is four-fold:
Baseline deraining networks, i.e., PRN and PReNet, are proposed, by which better and simpler networks can work well in removing rain streaks, and provide a suitable basis to future studies on image deraining.
By taking advantage of intra-stage recursive computation, PRN$_r$ and PReNet$_r$ are also suggested to reduce network parameters while maintaining state-of-the-art deraining performance.
Using PRN and PReNet, the deraining performance can be further improved by taking both stage-wise result and original rainy image as input to each ResNet, and our progressive networks can be readily trained with single negative SSIM or MSE loss.
Extensive experiments show that our baseline networks are computationally very efficient, and perform favorably against state-of-the-arts on both synthetic and real rainy images.
In this section, we present a brief review on two representative types of deraining methods, i.e., traditional optimization-based and deep network-based ones.
A rainy image is commonly modeled as the linear summation of a clean background image and a rain layer. Image deraining can then be formulated by incorporating proper regularizers on both the background image and the rain layer, and solved by specific optimization algorithms. Among these methods, Gaussian mixture model (GMM), sparse representation, and low rank representation have been adopted for modeling the background image or the rain layer. Based on the linear summation model, optimization-based methods have also been extended to video deraining [8, 12, 13, 16, 19]. On the other hand, the screen blend model is assumed to be more realistic for the composition of a rainy image, based on which Luo et al. use discriminative dictionary learning to separate rain streaks by enforcing that the two layers share the fewest dictionary atoms. However, the real composition is generally more complicated and the regularizers are still insufficient in characterizing the background and rain layers, so optimization-based methods remain limited in deraining performance.
When applying deep networks to single image deraining, one natural solution is to learn a direct mapping to predict the clean background image from the rainy image. However, it is suggested that plain fully convolutional networks (FCNs) are ineffective in learning the direct mapping [5, 6]. Instead, Fu et al. [5, 6] apply a low-pass filter to decompose the rainy image into a base layer and a detail layer. By assuming that the base layers of the rainy and clean images are approximately the same, FCNs are then deployed to predict the clean detail layer from the rainy detail layer. In contrast, Li et al. adopt the residual learning formulation to predict the rain layer from the rainy image. More complicated learning formulations, such as joint detection and removal of rain streaks, and joint rain density estimation and deraining, are also suggested. Adversarial loss is also introduced to enhance the texture details of the deraining result [32, 25]. In this work, we show that the improvement of deraining networks actually eases the difficulty of learning, and it is also feasible to train PRN and PReNet to learn either the direct or the residual mapping.
For the architecture of deraining networks, Fu et al. first adopt a shallow CNN and then a deeper ResNet. A multi-task CNN architecture has also been designed for joint detection and removal of rain streaks, in which contextualized dilated convolution and a recurrent structure are adopted to handle multi-scale and heavy rain streaks. Subsequently, Zhang et al. propose a density-aware multi-stream densely connected CNN for jointly estimating rain density and removing rain streaks. An attentive-recurrent network has been developed for single image raindrop removal. Most recently, Li et al. recurrently utilize dilated CNN and squeeze-and-excitation blocks to remove heavy rain streaks. In comparison to these deeper and complex networks, our work incorporates ResNet, a recurrent layer and multi-stage recursion to constitute a better, simpler and more efficient deraining network.
Besides, several lightweight networks, e.g., the cascaded scheme and the Laplacian pyramid framework, are also developed to improve computational efficiency but at the cost of obvious performance degradation. As for PRN and PReNet, we further introduce intra-stage recursive computation to reduce network parameters while maintaining state-of-the-art deraining performance, resulting in our PRN$_r$ and PReNet$_r$ models.
In this section, progressive image deraining networks are presented by considering network architecture, input and output, and loss functions. To this end, we first describe the general framework of progressive networks as well as input/output, then implement the network modules, and finally discuss the learning objectives of progressive networks.
A simple deep network generally cannot succeed in removing rain streaks from rainy images [5, 6]. Instead of designing deeper and complex networks, we suggest to tackle the deraining problem in multiple stages, where a shallow ResNet is deployed at each stage. One natural multi-stage solution is to stack several sub-networks, which inevitably leads to the increase of network parameters and susceptibility to over-fitting. In comparison, we take advantage of inter-stage recursive computation [15, 27, 20] by requiring multiple stages share the same network parameters. Besides, we can incorporate intra-stage recursive unfolding of only 1 ResBlock to significantly reduce network parameters with graceful degradation in deraining performance.
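We first present a progressive residual network (PRN) as shown in Fig. 2(a). In particular, we adopt a basic ResNet with three parts: (i) a convolution layer $f_{in}$ receives network inputs, (ii) several residual blocks (ResBlocks) $f_{res}$ extract deep representation, and (iii) a convolution layer $f_{out}$ outputs deraining results. With $y$ denoting the rainy image and $x^{t-1}$ the estimate from the previous stage, the inference of PRN at stage $t$ can be formulated as

$$x^{t-0.5} = f_{in}(x^{t-1}, y), \qquad x^{t} = f_{out}(f_{res}(x^{t-0.5})), \tag{1}$$

where $f_{in}$, $f_{res}$ and $f_{out}$ are stage-invariant, i.e., network parameters are reused across different stages.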
We note that $f_{in}$ takes the concatenation of the current estimation $x^{t-1}$ and rainy image $y$ as input. In comparison to taking only $x^{t-1}$ as input, the inclusion of $y$ can further improve the deraining performance. The network output can be the prediction of either the rain layer or the clean background image. Our empirical study shows that, although predicting the rain layer performs moderately better, it is also possible to learn progressive networks for predicting the background image.
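We further introduce a recurrent layer into PRN, by which feature dependencies across stages can be propagated to facilitate rain streak removal, resulting in our progressive recurrent network (PReNet). The only difference between PReNet and PRN is the inclusion of the recurrent state $s^{t}$,

$$x^{t-0.5} = f_{in}(x^{t-1}, y), \qquad s^{t} = f_{recurrent}(s^{t-1}, x^{t-0.5}), \qquad x^{t} = f_{out}(f_{res}(s^{t})), \tag{2}$$

where the recurrent layer $f_{recurrent}$ takes both $x^{t-0.5}$ and the recurrent state $s^{t-1}$ as input at stage $t$.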
The recurrent layer can be implemented using either a convolutional Long Short-Term Memory (LSTM) [11, 26] or a convolutional Gated Recurrent Unit (GRU). In PReNet, we choose LSTM due to its empirical superiority in image deraining.
The architecture of PReNet is shown in Fig. 2(b). By unfolding PReNet over recursive stages, the deep representation that facilitates rain streak removal is propagated by the recurrent states. The deraining results at intermediate stages in Fig. 1 show that the heavy rain streak accumulation can be gradually removed stage by stage.
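The stage-wise recursion of PRN and PReNet described above can be sketched in a few lines of plain Python. This is only a structural illustration: the function name `progressive_derain`, the toy scalar "layers" and the choice of initializing the estimate with the rainy input are assumptions of this sketch, not the released implementation.

```python
def progressive_derain(y, f_in, f_res, f_out, f_recurrent=None, num_stages=6):
    """Unfold one shared set of stage functions for num_stages stages."""
    x = y          # assumption: the initial estimate is the rainy input itself
    state = None   # recurrent state, used only when f_recurrent is given (PReNet)
    outputs = []
    for _ in range(num_stages):
        h = f_in(x, y)                      # each stage sees the estimate AND y
        if f_recurrent is not None:
            state = f_recurrent(state, h)   # carry features across stages
            h = state
        x = f_out(f_res(h))                 # same f_res / f_out at every stage
        outputs.append(x)
    return outputs


# Toy scalar stand-ins (hypothetical, NOT trained layers).
toy_in = lambda x, y: 0.75 * x + 0.25 * y   # mixes current estimate with input
toy_res = lambda h: h                       # identity in place of ResBlocks
toy_out = lambda h: 0.5 * h                 # halves the feature value

# PRN-style recursion: no recurrent layer.
prn_outputs = progressive_derain(8.0, toy_in, toy_res, toy_out)

# PReNet-style recursion: a toy recurrent layer averaging state and feature.
toy_rec = lambda s, h: h if s is None else 0.5 * (s + h)
prenet_outputs = progressive_derain(8.0, toy_in, toy_res, toy_out, toy_rec)
```

Because the same callables are reused at every stage, adding stages changes the amount of computation but not the number of parameters, which is the point of inter-stage recursion.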
We hereby present the network architectures of PRN and PReNet. All the filters are with size $3 \times 3$ and padding $1 \times 1$. Generally, $f_{in}$ is a 1-layer convolution with ReLU nonlinearity, $f_{res}$ includes 5 ResBlocks and $f_{out}$ is also a 1-layer convolution. Due to the concatenation of 3-channel RGB $y$ and 3-channel RGB $x^{t-1}$, the convolution in $f_{in}$ has 6 and 32 channels for input and output, respectively. $f_{out}$ takes the output of $f_{res}$ (or $f_{recurrent}$) with 32 channels as input and outputs a 3-channel RGB image for PRN (or PReNet). In $f_{recurrent}$, all the convolutions in LSTM have 32 input channels and 32 output channels. $f_{res}$ is the key component to extract deep representation for rain streak removal, and we provide two implementations, i.e., conventional ResBlocks shown in Fig. 3(a) and recursive ResBlocks shown in Fig. 3(b).
[Figure 3: (a) Conventional ResBlocks. (b) Recursive ResBlocks.]
Conventional ResBlocks: As shown in Fig. 3(a), we first implement $f_{res}$ with 5 ResBlocks in its conventional form, i.e., each ResBlock includes 2 convolution layers followed by ReLU. All the convolution layers receive 32-channel features without downsampling or upsampling operations. Conventional ResBlocks are adopted in PRN and PReNet.
Recursive ResBlocks: Motivated by [15, 27], we also implement $f_{res}$ by recursively unfolding one ResBlock 5 times, as shown in Fig. 3(b). Since network parameters mainly come from ResBlocks, the intra-stage recursive computation leads to a much smaller model size, resulting in PRN$_r$ and PReNet$_r$. We have evaluated the performance of recursive ResBlocks in Sec. 4.2, indicating its favorable tradeoff between model size and deraining performance.
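To make the model-size argument concrete, the following back-of-the-envelope count compares 5 distinct ResBlocks with one ResBlock unfolded 5 times. It is a sketch under stated assumptions: $3 \times 3$ kernels, a bias term in every convolution, and only the PRN-style layers (6-to-32 input convolution, 32-channel ResBlocks with 2 convolutions each, 32-to-3 output convolution); the helper names are hypothetical and the recurrent layer is left out.

```python
def conv_params(in_ch, out_ch, kernel=3, bias=True):
    """Weights (+ optional biases) of one 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def prn_params(num_unique_resblocks, feat=32):
    """Parameter count of a PRN-style network without a recurrent layer."""
    f_in = conv_params(6, feat)             # rainy image + estimate -> features
    resblock = 2 * conv_params(feat, feat)  # two convolutions per ResBlock
    f_out = conv_params(feat, 3)            # features -> RGB output
    return f_in + num_unique_resblocks * resblock + f_out


conventional = prn_params(5)  # 5 ResBlocks, each with its own weights
recursive = prn_params(1)     # 1 ResBlock whose weights are reused 5 times
print(conventional, recursive)  # prints "95107 21123"
```

Under these assumptions the ResBlocks account for 92,480 of the 95,107 parameters, so sharing one ResBlock removes roughly three quarters of the model.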
Recently, hybrid loss functions, e.g., MSE+SSIM, $\ell_1$+SSIM and even adversarial loss, have been widely adopted for training deraining networks, but they considerably increase the burden of hyper-parameter tuning. In contrast, benefiting from the progressive network architecture, we empirically find that a single loss function, e.g., MSE loss or negative SSIM loss, is sufficient to train PRN and PReNet.
For a model with $T$ stages, we have $T$ outputs, i.e., $x^{1}$, $x^{2}$, …, $x^{T}$. By only imposing supervision on the final output $x^{T}$, the MSE loss is

$$\mathcal{L} = \| x^{T} - x^{gt} \|^{2}, \tag{3}$$

and the negative SSIM loss is

$$\mathcal{L} = -\mathrm{SSIM}(x^{T}, x^{gt}), \tag{4}$$

where $x^{gt}$ is the corresponding ground-truth clean image. It is worth noting that our empirical study shows that negative SSIM loss outperforms MSE loss in terms of both PSNR and SSIM.

Moreover, recursive supervision can be imposed on each intermediate result,

$$\mathcal{L} = -\sum_{t=1}^{T} \lambda_{t}\, \mathrm{SSIM}(x^{t}, x^{gt}), \tag{5}$$

where $\lambda_{t}$ is the tradeoff parameter for stage $t$. The experimental result in Sec. 4.1.1 shows that recursive supervision cannot achieve any performance gain when $t = T$, but can be adopted to generate visually satisfying results at early stages.
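The three training objectives above (MSE on the final output, negative SSIM on the final output, and a weighted negative SSIM summed over all stages) can be sketched as follows. Several things here are assumptions of the sketch: images are flat lists of intensities in [0, 1], the MSE is averaged over pixels, `global_ssim` is a simplified single-window SSIM rather than the usual sliding-window version, and the stage weights in the example are illustrative values only.

```python
def mse_loss(x_final, x_gt):
    """Mean squared error between the final-stage output and ground truth."""
    return sum((a - b) ** 2 for a, b in zip(x_final, x_gt)) / len(x_gt)


def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM computed over the whole image as one window."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def neg_ssim_loss(x_final, x_gt):
    """Negative SSIM on the final-stage output only."""
    return -global_ssim(x_final, x_gt)


def recursive_neg_ssim_loss(stage_outputs, x_gt, weights):
    """Weighted negative SSIM summed over every stage output."""
    return -sum(w * global_ssim(x, x_gt) for w, x in zip(weights, stage_outputs))


x_gt = [0.1, 0.5, 0.9, 0.3]
noisy = [0.3, 0.4, 0.7, 0.5]
weights = [0.5] * 5 + [1.5]  # illustrative: larger weight on the final stage
perfect_loss = recursive_neg_ssim_loss([x_gt] * 6, x_gt, weights)
```

With six perfect stage outputs the recursive loss reaches its minimum, the negative sum of the weights, which shows how the weights set the relative importance of each stage.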
In this section, we first conduct ablation studies to verify the main components of our methods, then quantitatively and qualitatively evaluate progressive networks, and finally assess PReNet on real rainy images and video. All the source code, pre-trained models and results can be found at https://github.com/csdwren/PReNet.
Our progressive networks are implemented using PyTorch, and are trained on a PC equipped with two NVIDIA GTX 1080Ti GPUs. In our experiments, all the progressive networks share the same training setting. The patch size is $100 \times 100$, and the batch size is 18. The ADAM algorithm is adopted to train the models with an initial learning rate of $1 \times 10^{-3}$, and training ends after 100 epochs. When reaching 30, 50 and 80 epochs, the learning rate is decayed by multiplying it by 0.2.
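The step-wise decay at 30, 50 and 80 epochs can be written as a small function. This is a minimal sketch: the name `learning_rate` is hypothetical, `epoch` counts completed epochs starting from 0, and the defaults for `base_lr` and `gamma` (1e-3 and 0.2) are assumptions of this sketch exposed as parameters so they can be changed.

```python
def learning_rate(epoch, base_lr=1e-3, milestones=(30, 50, 80), gamma=0.2):
    """Learning rate after `epoch` completed epochs under step decay."""
    # Count how many milestones have been reached so far.
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays


# Learning rate at the start of training and just after each milestone.
schedule = [learning_rate(e) for e in (0, 30, 50, 80)]
```

In PyTorch the same behaviour is usually obtained with a multi-step scheduler rather than a hand-written function; the explicit version is shown here only to make the arithmetic visible.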
All the ablation studies are conducted on a heavy rainy dataset  with 1,800 rainy images for training and 100 rainy images (Rain100H) for testing. However, we find that 546 rainy images from the 1,800 training samples have the same background contents with testing images. Therefore, we exclude these 546 images from training set, and train all our models on the remaining 1,254 training images.
Using PReNet ($T = 6$) as an example, we discuss the effect of loss functions on deraining performance, including MSE loss, negative SSIM loss, and recursive supervision loss.
[Figure 4: (a) Rainy image. (b) PReNet-MSE. (c) PReNet-SSIM. (d) PReNet-RecSSIM.]
Negative SSIM vs. MSE. We train two PReNet models by minimizing MSE loss (PReNet-MSE) and negative SSIM loss (PReNet-SSIM), and Table 1 lists their PSNR and SSIM values on Rain100H. Unsurprisingly, PReNet-SSIM outperforms PReNet-MSE in terms of SSIM. We also note that PReNet-SSIM even achieves higher PSNR, which may be partially attributed to PReNet-MSE being inclined to get trapped in a poor solution. As shown in Fig. 4, the deraining result by PReNet-SSIM is also visually more plausible than that by PReNet-MSE. Therefore, negative SSIM loss is adopted as the default in the following experiments.
Single vs. Recursive Supervision. The negative SSIM loss can be imposed only on the final stage (PReNet-SSIM) in Eqn. (4) or recursively on each stage (PReNet-RecSSIM) in Eqn. (5). For PReNet-RecSSIM, we set $\lambda_{t} = 0.5$ ($t = 1, 2, \ldots, 5$) and $\lambda_{6} = 1.5$, where the tradeoff parameter for the final stage is larger than the others. From Table 1, PReNet-RecSSIM performs moderately inferior to PReNet-SSIM. As shown in Fig. 4, the deraining results by PReNet-SSIM and PReNet-RecSSIM are visually indistinguishable. The results indicate that a single loss on the final stage is sufficient to train progressive networks. Furthermore, Fig. 5 shows the intermediate PSNR and SSIM results at each stage for PReNet-SSIM ($T = 6$) and PReNet-RecSSIM ($T = 6$). It can be seen that PReNet-RecSSIM achieves much better intermediate results than PReNet-SSIM, making PReNet-RecSSIM ($T = 6$) very promising in computing-resource-constrained environments, since inference can be stopped at any stage $t$.
In this subsection, we assess the effect of several key modules of progressive networks, including recurrent layer, multi-stage recursion, and intra-stage recursion.
Recurrent Layer. Using PReNet ($T = 6$), we test two types of recurrent layers, i.e., LSTM (PReNet-LSTM) and GRU (PReNet-GRU). It can be seen from Table 3 that LSTM performs slightly better than GRU in terms of quantitative metrics, and thus is adopted as the default implementation of the recurrent layer in our experiments. We further compare progressive networks with and without the recurrent layer, i.e., PReNet and PRN, in Table 4, and the introduction of the recurrent layer clearly benefits the deraining performance in terms of PSNR and SSIM.
Intra-stage Recursion. From Table 4, intra-stage recursion, i.e., recursive ResBlocks, is introduced to significantly reduce the number of parameters of progressive networks, resulting in PRN$_r$ and PReNet$_r$. As for deraining performance, it is reasonable to see that PRN and PReNet respectively achieve higher average PSNR and SSIM values than PRN$_r$ and PReNet$_r$. But in terms of visual quality, PRN$_r$ and PReNet$_r$ are comparable with PRN and PReNet, as shown in Fig. 6.
[Figure 6: (a) Rainy image. (b)-(e) Results of the four progressive network variants.]
Recursive Stage Number $T$. Table 2 lists the PSNR and SSIM values of PReNet models with different numbers of stages $T$. One can see that PReNet with more stages (from 2 stages to 6 stages) usually leads to higher average PSNR and SSIM values. However, a larger $T$ also makes PReNet more difficult to train. When $T$ grows beyond 6, PReNet performs a little inferior to the 6-stage PReNet. Thus, we set $T = 6$ in the following experiments.
Network Input. We also test a variant of PReNet that only takes $x^{t-1}$ at each stage as input to $f_{in}$ (i.e., PReNet$_x$), a strategy that has been adopted in [29, 20]. From Table 3, PReNet$_x$ is obviously inferior to PReNet in terms of both PSNR and SSIM, indicating the benefit of receiving the rainy image $y$ at each stage.
Network Output. We consider two types of network outputs by incorporating residual learning formulation (i.e., PReNet in Table 3) or not (i.e., PReNet-LSTM in Table 3). From Table 3, residual learning can make a further contribution to performance gain. It is worth noting that, benefited from progressive networks, it is feasible to learn PReNet for directly predicting clean background from rainy image, and even PReNet-LSTM can achieve appealing deraining performance.
[Table: quantitative comparison. Columns: Method, GMM, DDN, RGN, JORDER, RESCAN, and the four progressive network variants.]
[Figure panels: Rainy image; GMM; DDN; RESCAN.]
[Table 7 (running time). Columns: Image Size, DDN, JORDER, RESCAN, PRN, PReNet.]
Our progressive networks are evaluated on three synthetic datasets, i.e., Rain100H , Rain100L  and Rain12 . Five competing methods are considered, including one traditional optimization-based method (GMM ) and three state-of-the-art deep CNN-based models, i.e., DDN , JORDER  and RESCAN , and one lightweight network (RGN ). For heavy rainy images (Rain100H) and light rainy images (Rain100L), the models are respectively trained, and the models for light rain are directly applied on Rain12. Since the source codes of RGN are not available, we adopt the numerical results reported in . As for JORDER, we directly compute average PSNR and SSIM on deraining results provided by the authors. We re-train RESCAN  for Rain100H with the default settings. Besides, we have tried to train RESCAN for light rainy images, but the results are much inferior to the others. So its results on Rain100L and Rain12 are not reported in our experiments.
Our PReNet achieves significant PSNR and SSIM gains over all the competing methods. We also note that, although RESCAN is re-trained on the full 1,800 rainy images for Rain100H, the performance gain by our PReNet trained on the strict 1,254 rainy images is still notable. Moreover, even PReNet$_r$ can perform better than all the competing methods. From Fig. 7, visible dark noises along rain directions can still be observed in the results by the other methods. In comparison, the results by PRN and PReNet are visually more pleasing.
We further evaluate progressive networks on another dataset  which includes 12,600 rainy images for training and 1,400 rainy images for testing (Rain1400). From Table 6, all the four versions of progressive networks outperform DDN in terms of PSNR and SSIM. As shown in Fig. 8, the visual quality improvement by our methods is also significant, while the result by DDN still contains visible rain streaks.
Table 7 lists the running time of different methods on a computer equipped with an NVIDIA GTX 1080Ti GPU. We only give the running time of DDN, JORDER and RESCAN, because the codes of the other competing methods are not available. We note that the running time of DDN takes the separation of the detail layer into account. Unsurprisingly, PRN and PReNet are much more efficient due to their simple network architecture.
Using two real rainy images in Fig. 9, we compare PReNet with two state-of-the-art deep methods, i.e., JORDER and DDN. It can be seen that PReNet and JORDER perform better than DDN in removing rain streaks. For the first image, rain streaks remain visible in the result by DDN, while PReNet and JORDER can generate satisfying deraining results. For the second image, some rain streaks remain in the results by DDN and JORDER, while the result by PReNet is clearer.
Finally, PReNet is adopted to process a rainy video in a frame-by-frame manner, and is compared with a state-of-the-art video deraining method, i.e., FastDerain. As shown in Fig. 10, for frame #510, both FastDerain and our PReNet can remove all the rain streaks, indicating the performance of PReNet even without the help of temporal consistency. However, FastDerain fails when the scene switches between frames, since it is developed by exploiting the consistency of adjacent frames. As a result, for frames #571, #572 and #640, rain streaks remain in the results by FastDerain, while our PReNet performs favorably and is not affected by switching frames or accumulation error.
[Figure 10: Frame #510; Frame #571; Frame #572; Frame #640.]
In this paper, a better and simpler baseline network is presented for single image deraining. Instead of deeper and complex networks, we find that the simple combination of ResNet and multi-stage recursion, i.e., PRN, can result in favorable performance. Moreover, the deraining performance can be further boosted by the inclusion of a recurrent layer, and the stage-wise result is also taken as input to each ResNet, resulting in our PReNet model. Furthermore, the network parameters can be reduced by incorporating inter- and intra-stage recursive computation (PRN$_r$ and PReNet$_r$). Our progressive deraining networks can be readily trained with a single negative SSIM or MSE loss. Extensive experiments validate the superiority of our PReNet and PReNet$_r$ on synthetic and real rainy images in comparison to state-of-the-art deraining methods. Taking their simplicity, effectiveness and efficiency into account, it is also appealing to exploit our models as baselines when developing new deraining networks.
Proceedings of the IEEE International Conference on Computer Vision, pages 1968–1975, 2013.
Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1715–1723, 2017.
A novel tensor-based video rain streaks removal approach via utilizing discriminatively intrinsic priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
Proceedings of the International Conference on Machine Learning, pages 807–814, 2010.