Rain streaks degrade the visual quality of images and video. Because rain blocks and blurs objects in the scene, the results of many outdoor computer vision applications, such as object detection (Ren et al., 2015a), are adversely affected. This is because most existing algorithms are trained on data captured under well-controlled conditions. An effective method for removing rain streaks is therefore desirable for a wide range of practical applications. Deep learning has been introduced for this problem, since convolutional neural networks (CNNs) have proven powerful for a variety of vision tasks.
However, existing models for rain streak removal tend to learn the negative residual within a single model, so they have to be carefully designed with tons of parameters to capture the different modalities of rain streaks. Moreover, most methods are optimized with the Euclidean distance, which inevitably generates blurry predictions, since per-pixel losses do not match the perceptual difference between output and ground-truth images as judged by human visual perception (Johnson et al., 2016). Further, it is wasteful to use a resource-hungry model to meet all kinds of demands in rain streak removal. For example, under light rain a simple model can obtain a decent deraining result, whereas a heavily rainy image should be handled by a computationally intensive model that can detect rain streaks of different shapes and scales.
To address the above drawbacks, we propose the residual-guide feature fusion network (ResGuideNet), built in a cascaded architecture. Each block contains a global shortcut so that it predicts a residual (Fu et al., 2017b), which makes the learning process much easier. However, simply cascading basic building blocks makes it difficult to improve the reconstruction quality in deeper blocks. We conjecture that this is because a cascaded architecture may lose valuable intermediate reconstruction features, which makes it difficult for the deeper blocks to learn new rain streak patterns. We therefore propose to concatenate the predicted residuals from shallower blocks into the deeper blocks. With this simple operation, the shallower residuals guide the deeper predictions towards a finer estimation, as shown in Figure 1.
In addition, we apply supervision to all intermediate outputs, which yields a coarse-to-fine residual as the blocks go deeper. The basic rain streak removal block is based on recursive computation with a proper shortcut strategy, which reduces the number of network parameters while keeping good deraining performance. The final recovered image merges the outputs of all intermediate reconstructions, which can be viewed as ensemble learning.
The contributions of our paper are three-fold:
We build a single, separable network that can handle different rainy conditions. By passing negative residual features from shallow blocks to deep blocks, a coarse-to-fine estimation of the negative rain streak residual can be obtained. As the application scenario changes, the user can detach a portion of our model to meet varying computational requirements at test time.
We apply supervision to all the intermediate and final reconstructions with a combined loss function. The model combines all intermediate results to obtain the final result, which can be viewed as ensemble learning.
We discuss how ResGuideNet can be applied to other low-level vision tasks, including denoising, and how the reconstructed images can benefit downstream applications such as object detection.
2. Related Works
Depending on the input format, existing rain streak removal algorithms can be roughly categorized into video-based methods and single-image methods. Video-based methods (Barnum et al., 2010) (Bossu et al., 2011) (Kim et al., 2015) (Garg and Nayar, 2004) (Santhaseelan and Asari, 2015) leverage inter-frame information between adjacent frames to identify rainy regions and remove rain streaks.
Removing rain streaks from a single image is more challenging, since less information is available. (Kang et al., 2012) attempt to separate rain streaks from background details in the high-frequency layer by sparse-coding-based dictionary learning. (Luo et al., 2015) propose a rain removal framework based on discriminative sparse coding. (Li et al., 2016) learn the background from pre-collected natural images and the rain from rainy images by using two Gaussian mixture models (GMMs).
Deep learning has also been introduced for restoration problems, and convolutional neural networks (CNNs) have found great success in many computer vision problems. The first CNN-based method for single-image deraining was introduced by (Fu et al., 2017a), who build a relatively shallow network with 3 layers to learn the mapping function. In (Fu et al., 2017b), the authors combine this idea with ResNet (He et al., 2016b) (He et al., 2016a) and present a deep detail network (DDN) that learns the residual from the high-frequency part of rainy images. (Zhang et al., 2017) propose a conditional GAN-based algorithm for removing rain streaks from a single image. (Yang et al., 2017) learn a binary rain region mask and remove the rain streaks simultaneously through a multi-scale network (JORDER). (Zhang and Patel, 2018b) exploit rain density information with a multi-stream densely connected network (DID-MDN) for joint rain-density estimation and deraining. Further, single-image dehazing methods (Cai et al., 2016) (Ren et al., 2016) (Li et al., 2017) (Zhang and Patel, 2018a) achieve promising results by introducing deep learning models.
In the image restoration field, achieving good performance with a moderate number of network parameters is an important goal when designing a deep neural network. (Socher et al., 2012) (Eigen et al., 2013) (Liang and Hu, 2015) propose to reuse the same convolutional filter weights to learn hierarchical feature representations. To avoid the vanishing gradient problem and reduce the total number of parameters in very deep models, (Kim et al., 2016) (Tai et al., 2017a) propose recursive computation with proper supervision and shortcuts, achieving state-of-the-art performance in single-image super-resolution with few parameters.
Since rain streaks always overlap with background texture, most methods learn the negative residual of the input with a complex or carefully designed model. However, this may lead to over-smoothed results and requires tons of parameters to optimize. It is also infeasible to apply a resource-hungry model to video frame by frame, because the processing is too time-consuming. On the other hand, to meet the different demands of practical applications, a lightweight or detachable network is desirable, since a huge number of parameters limits deployment in mobile devices, autonomous driving and video surveillance. Existing methods, however, use a fixed computational budget to handle both "easy" and "hard" application scenarios, which makes them less flexible in real-world applications.
As shown in Figure 3, we test our ResGuideNet under heavy and light rain conditions. Our method produces a progressively better reconstruction as the blocks go deeper. Under light rain, however, the SSIM (Wang et al., 2004) does not improve much from block to block; the averaged SSIM on the heavy and light test datasets is shown in Figure 4.
Thus, we would like to build a model that achieves good results on all devices, whatever their computational constraints. Furthermore, users can improve the average reconstruction quality by reducing the computation spent on light rain conditions and saving it for heavy cases.
Motivated by prior work on resource-efficient implementation (Huang et al., 2017a), we aim to construct a CNN that can be sliced to meet computational limits when processing rain streaks under different rainy conditions. Unfortunately, the later layers of a deep neural network are inherently tied to the features produced by earlier layers. Thus, we build a model that incorporates a series of deraining sub-networks and progressively generates a cleaner estimation from a rainy input. A portion of the whole model can also be used on its own to handle different rainy conditions.
3.2. Residual Feature Reuse
A major challenge for deep learning models is their optimization. To address the vanishing gradient problem in back-propagation, shortcuts have been proposed to stabilize the gradient flow in deep residual networks (ResNet). Assuming that a residual mapping is much easier to learn than the original unreferenced mapping, a residual network explicitly learns a residual mapping for every few stacked layers. With this strategy, deep neural networks can be trained easily, and ResNet has therefore achieved very impressive performance on a number of tasks. In addition, (Huang et al., 2017b) propose to densely concatenate feature maps from lower to deeper layers, which alleviates the vanishing gradient problem and reduces the number of model parameters; this may be interpreted as removing the need to relearn redundant features. (Tai et al., 2017b) introduce dense connections in regression tasks and show that they benefit long-term memory and the restoration of mid- and high-frequency information.
In this paper, we adopt global residual learning with a long shortcut in each block to ease the learning process. Each block consists of several convolutional layers with Leaky Rectified Linear Units; we refer to this architecture as the Baseline model. However, simply cascading blocks does not give promising results. We conjecture that it is difficult for deeper blocks to extract new rain streak patterns, and that valuable information contained in the intermediate reconstructions from lower blocks is lost. To deal with this problem, we propose to pass information from previous blocks to deeper ones, which compensates for the lost information and further enhances high-frequency signals.
We evaluate the benefit of moving from naively cascaded deraining blocks (Baseline) to our negative residual reuse by feature fusion (Baseline-RR), using 5 blocks. For a fair comparison, we increase the number of feature maps in each building block of the Baseline model so that it has the same number of parameters as Baseline-RR. We conduct the experiments on the dataset provided by (Fu et al., 2017b). As is clear from the visual quality of the reconstructions in Figure 5, Baseline-RR obtains a more visually pleasing reconstruction and a higher SSIM value as the blocks go deeper. In Figure 6, Baseline-RR shows a gradual increase in SSIM as the blocks become deeper, whereas the Baseline model does not have this property. The SSIM values are averaged over all test images.
3.3. Loss Function
Since rain streaks blend with object edges and the background scene, it is hard to distinguish rain streaks from object structure by simply optimizing a per-pixel loss function. Per-pixel losses cannot capture the perceptual difference between output and ground-truth images as perceived by humans. A model trained with the $\ell_2$ (MSE) loss tends to produce a blurred reconstruction.
Therefore, for each block we adopt an $\ell_2$+SSIM loss (Wang et al., 2004), which preserves global structure better while keeping per-pixel similarity. We minimize the combination of these loss functions in the training stage. Figure 7 shows the effectiveness of combining the SSIM loss with the $\ell_2$ loss, and shows that supervising the intermediate outputs benefits the whole model. Note that this experimental result is obtained by averaging over 100 test images from the dataset of (Fu et al., 2017b).
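The overall loss function for block $B_i$ is

$$\mathcal{L}_{B_i} = \frac{1}{M}\sum_{j=1}^{M}\Big[\big\| f_{B_i}(X_j^{B_i}; W, b) + X_j - Y_j \big\|_2^2 + \lambda\,\big(1 - \mathrm{SSIM}\big(f_{B_i}(X_j^{B_i}; W, b) + X_j,\; Y_j\big)\big)\Big] \tag{1}$$

where $M$ is the number of training rainy patches and $i$ is the index of the block. $X_j$, $Y_j$ and $X_j^{B_i}$ denote the rainy patch, the corresponding clean patch and the input of block $B_i$, respectively. $W$ and $b$ are the parameters of our model that need to be learned. $f_{B_i}$ denotes the mapping function of each block, and $\mathrm{SSIM}(\cdot)$ denotes the SSIM function.
$\lambda$ is the hyperparameter that balances the MSE loss and the SSIM loss; we set $\lambda$ to 1 via cross-validation, which gives satisfactory results.
Note that the overall ResGuideNet loss contains $n+1$ loss terms if ResGuideNet contains $n$ blocks:

$$\mathcal{L} = \sum_{i=1}^{n}\mathcal{L}_{B_i} + \mathcal{L}_{\mathrm{merge}} \tag{2}$$

where the second term on the right-hand side is the loss of the final reconstruction merged from all previous intermediate outputs, with the same form as $\mathcal{L}_{B_i}$.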
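As a rough illustration of the combined per-block objective, the following sketch evaluates an MSE term plus a weighted SSIM term for a single patch. It makes several assumptions that are not taken from the paper: patches are flat lists of values in [0, 1], SSIM is computed from global patch statistics rather than averaged over local windows, the SSIM term enters as `1 - SSIM`, and the names `mse`, `global_ssim` and `block_loss` are hypothetical.

```python
from statistics import fmean


def mse(pred, target):
    """Mean squared error between two equally sized flat pixel lists."""
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


def global_ssim(x, y, data_range=1.0):
    """SSIM from global statistics of the whole patch.

    Simplification: the standard index averages this quantity over local
    windows; here a single window covers the entire patch.
    """
    c1 = (0.01 * data_range) ** 2  # stabilising constants from Wang et al. (2004)
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def block_loss(residual, rainy, clean, lam=1.0):
    """MSE + lam * (1 - SSIM) for the reconstruction rainy + residual."""
    recon = [x + r for x, r in zip(rainy, residual)]
    return mse(recon, clean) + lam * (1.0 - global_ssim(recon, clean))
```

With a residual equal to `clean - rainy` the loss is zero up to rounding, and any rain left in the reconstruction increases both terms. A training implementation would use a differentiable, windowed SSIM instead of this global variant.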
3.4. Recursive Computation
As mentioned above, the trade-off between the number of parameters and model performance can be addressed with a recursive strategy in which the nonlinear mapping operator is shared within each block. We adopt two convolutional operations in each recursive unit. The relationship between the outputs of the $(t-1)$-th and $t$-th recursions ($t = 1, \dots, T$) within each block can be written as

$$H^{t} = g\left(H^{t-1}\right) \tag{3}$$

where $g$ denotes the recursive unit, which is shared by all recursions within one block.
However, as the recursions continue, the network depth increases, which introduces a severe vanishing gradient problem that makes training difficult. To alleviate this problem and to propagate information more easily, the output feature map $H^{0}$ of the first feature-extraction Conv+LReLU structure is fed into the outputs of all subsequent recursive units. The structure can then be reformulated as

$$H^{t} = g\left(H^{t-1}\right) + H^{0} \tag{4}$$

The recursive computation is shown in the bottom-left of Figure 2. We evaluate the benefit of moving from ResGuideNet without recursion (ResGuideNet-NRecur) to our adopted ResGuideNet, which uses 5 recursions in each block. The qualitative results are shown in Figure 8. As is evident, reusing kernels and propagating information directly forward from the output of the first layer within each block benefit the restoration of image content.
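The weight sharing and the shortcut from the first feature map can be illustrated with a toy example. The sketch below is not the network itself: it assumes 1-D signals instead of 2-D feature maps, an additive shortcut, a leaky-ReLU slope of 0.2, and hypothetical names (`conv1d`, `recursive_block`). Its only purpose is to show that the same two kernels are reused at every recursion, so depth grows without adding parameters.

```python
def leaky_relu(value, slope=0.2):
    """Leaky rectified linear unit (the slope value is an assumption)."""
    return value if value > 0 else slope * value


def conv1d(signal, kernel):
    """Same-size 1-D correlation with zero padding; kernel length must be odd."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def recursive_block(h0, kernel_a, kernel_b, recursions=5):
    """Apply one shared two-convolution unit `recursions` times.

    The same two kernels are reused at every step, and the first feature
    map h0 is added back to the output of each step.
    """
    h = list(h0)
    for _ in range(recursions):
        u = [leaky_relu(v) for v in conv1d(h, kernel_a)]
        u = [leaky_relu(v) for v in conv1d(u, kernel_b)]
        h = [a + b for a, b in zip(u, h0)]
    return h
```

For example, `recursive_block([1.0, 2.0], [0, 1, 0], [0, 1, 0], recursions=2)` uses identity kernels, so each recursion simply adds `h0` once more and the result is `[3.0, 6.0]`.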
Figure panels: Rainy images | GMM (Li et al., 2016) | DDN (Fu et al., 2017b) | JORDER (Yang et al., 2017) | DID-MDN (Zhang and Patel, 2018b) | ResGuideNet | ResGuideNet
3.5. Inter-block Ensemble
(Breiman, 1996) first studied in depth the idea of ensemble learning, which combines predictors instead of selecting a single one. Ensemble learning has also been introduced in neural networks to improve performance. (Drucker et al., 1989) arrange a committee of neural networks in a simple voting scheme, where the final prediction is the averaged result. Recently, (He et al., 2016a) (Huang et al., 2017b), which use deep neural networks for several computer vision tasks, also employ the ensemble technique.
Motivated by the ensemble idea, we integrate the intermediate reconstructions of all blocks, aggregated by concatenation, to form the best reconstruction. As shown in the bottom-right of Figure 2, the final reconstruction is obtained by fusing all intermediate reconstructions with a 1×1 convolution. Note that we use the merged result only in Section 4, since the per-block outputs are more convenient for comparison in the other sections. We refer to the output of the merging operation as ResGuideNet and to the output of block $i$ as ResGuideNet$_i$. An improved result can be observed in Table 1. The experiment is conducted on the test dataset of (Fu et al., 2017b).
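Fusing concatenated reconstructions with a 1×1 convolution amounts to a learned per-pixel weighted sum over the block outputs. The following sketch shows this for single-channel images stored as nested lists; the single-channel restriction, the scalar bias and the function name `merge_1x1` are assumptions made for illustration.

```python
def merge_1x1(reconstructions, weights, bias=0.0):
    """Fuse per-block outputs with a 1x1 convolution.

    reconstructions: list of n images, each a list of rows of floats.
    weights: n scalars, one per block output (the 1x1 kernel that maps
    n input channels to one output channel).
    """
    if len(reconstructions) != len(weights):
        raise ValueError("need one weight per intermediate reconstruction")
    height, width = len(reconstructions[0]), len(reconstructions[0][0])
    return [[bias + sum(w * img[r][c] for w, img in zip(weights, reconstructions))
             for c in range(width)]
            for r in range(height)]
```

With uniform weights `1/n` this reduces to the plain averaging used by committee-style ensembles, whereas learned weights let the fusion favor the more reliable blocks.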
3.6. The Proposed Architecture
As discussed, the proposed ResGuideNet consists of repeated blocks. Each block includes several convolutional kernels and a global shortcut. ResGuideNet propagates rain streak residual information from shallow blocks into deeper ones. The network architecture is shown in Figure 2. The final reconstruction is obtained by concatenating all intermediate outputs and compressing them into the final rain streak residual. $\ell_2$+SSIM supervision is applied to guide each block and the final merged output.
Our basic network structure can be expressed as:

$$\begin{aligned} R_1 &= B_1(X), \\ R_2 &= B_2(X;\, R_1), \\ R_3 &= B_3(X;\, R_1, R_2), \\ &\;\;\vdots \end{aligned} \tag{5}$$

where $B_i$ denotes the $i$-th block, each of which consists of several convolutional layers with Leaky Rectified Linear Units. $X$ and $Y$ denote a rainy and clean image pair. $R_i$ denotes the negative residual that is the output of block $i$. The input of block $i$ is written as $(X;\, R_1, \dots, R_{i-1})$. Note that the left side of the semicolon is the input of each block, while the right side lists the residual features that guide it; more guidance is provided as the blocks go deeper. $R_1$, $R_2$ and $R_3$ should all approximate $Y - X$ in the training stage, as indicated in Equation 1, so it is easier for deeper blocks to learn new rain streak information with the guidance of the rain streak residuals from shallow blocks.
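The data flow of the guided cascade, including the option to detach the deeper blocks at test time, can be sketched as follows. This is a structural illustration only: it assumes that every block receives the rainy input together with the residuals of all shallower blocks, and `run_cascade` and `make_toy_block` are hypothetical names, with the toy block standing in for a trained sub-network.

```python
def run_cascade(rainy, blocks, num_blocks=None):
    """Run the first `num_blocks` blocks and keep every intermediate result.

    Each block is a callable block(rainy, guides) -> residual, where
    `guides` holds the residuals predicted by all shallower blocks.
    """
    active = blocks if num_blocks is None else blocks[:num_blocks]
    residuals, reconstructions = [], []
    for block in active:
        residual = block(rainy, list(residuals))  # guided by shallower residuals
        residuals.append(residual)
        reconstructions.append([x + r for x, r in zip(rainy, residual)])
    return residuals, reconstructions


def make_toy_block(step):
    """Hypothetical stand-in for a trained block: refines the latest guide."""
    def block(rainy, guides):
        previous = guides[-1] if guides else [0.0] * len(rainy)
        return [p - step for p in previous]
    return block
```

Calling `run_cascade(rainy, blocks, num_blocks=3)` on a five-block model evaluates only the first three blocks, which corresponds to slicing the network for light rain or a limited computational budget.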
We compare our algorithm with several state-of-the-art deep and non-deep techniques on synthetic and real-world datasets.
4.1. Implementation details
We train and test the algorithm using TensorFlow for the Python environment on an NVIDIA GeForce GTX 1080 with 8GB of GPU memory. We use the Xavier method to initialize the network parameters and RMSProp for parameter learning. The initial learning rate is 0.001 and the training batch size is 16. Training ResGuideNet required 50,000 iterations. For all experiments, every convolution except the merge convolution uses the same filter size, and each convolutional layer has 16 feature maps.
Since clean and rainy image pairs from the real world are hard to obtain, four synthetic datasets are available for comparison. (Yang et al., 2017) provide two datasets, synthesized with heavy and light rain respectively, each containing 100 test images. The third dataset, collected by (Li et al., 2016), contains 12 synthetic images. The last one, provided by (Fu et al., 2017b), contains 10K pairs of rainy/clean images with different orientations and magnitudes of rain streaks. For a fair comparison, we train the deep learning-based models separately on the heavy-rain and light-rain datasets of (Yang et al., 2017) and test each on the corresponding test set; the 12-image dataset of (Li et al., 2016) is tested with a model trained on one of these datasets. During training, we randomly generate 0.8 million rainy/clean patch pairs of size 128×128.
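Generating aligned training pairs requires cropping the same window from the rainy image and its clean counterpart. A minimal sketch is given below, assuming images stored as nested lists of rows; the function name `random_patch_pair` and its `rng` parameter are hypothetical and not part of the described pipeline.

```python
import random


def random_patch_pair(rainy, clean, size=128, rng=random):
    """Crop the same randomly placed size x size window from both images."""
    height, width = len(rainy), len(rainy[0])
    if (height, width) != (len(clean), len(clean[0])):
        raise ValueError("rainy and clean images must have the same shape")
    if height < size or width < size:
        raise ValueError("image is smaller than the requested patch")
    top = rng.randint(0, height - size)   # randint is inclusive at both ends
    left = rng.randint(0, width - size)

    def crop(image):
        return [row[left:left + size] for row in image[top:top + size]]

    return crop(rainy), crop(clean)
```

Passing a seeded `random.Random` instance as `rng` makes the sampled patches reproducible across runs.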
4.3. Evaluation on Synthetic dataset
We train and test all the methods on the same datasets (Yang et al., 2017) (Li et al., 2016), except DID-MDN, whose training code has not been published. SSIM (Wang et al., 2004) and PSNR are adopted for the quantitative evaluation shown in Table 2. Our method achieves SSIM values comparable to JORDER while outperforming the other methods, which is consistent with the visual results. Even the intermediate result of the third block (ResGuideNet$_3$) is decent compared with the other methods in Figure 9. Moreover, ResGuideNet contains far fewer parameters than the other methods and can be sliced into a smaller network to handle light rain with limited resources, which potentially makes ResGuideNet easy to deploy in a variety of real-world applications.
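For reference, PSNR is derived from the mean squared error between the restored and the ground-truth image. The sketch below assumes flat pixel lists with values in [0, 1] (hence `data_range=1.0`); the function name `psnr` is chosen for illustration.

```python
import math


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat pixel lists."""
    err = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / err)
```

A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of about 20 dB. For 8-bit images, `data_range` would be 255.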
4.4. Evaluation on Real-world dataset
In this section, we show that ResGuideNet trained on synthetic data still works well in real-world applications. We run the other methods with their optimal settings. Figure 10 shows visual results on real-world rainy images. Since no ground truth exists, we only show qualitative results. As shown, ResGuideNet generates less blurred results and performs well on multiple kinds of rain streaks.
4.5. Running Time
To illustrate the efficiency of ResGuideNet in practical applications, we report the average running time over 100 test images in Table 3. All tests use a 500×500 rainy image as input. GMM is a non-deep method that runs on the CPU according to the provided code, while the deep methods are tested on both CPU and GPU. All experiments are performed in the environment described in the implementation details. GMM is the slowest because of its complicated inference at test time. Our method is fast on the GPU compared with the other methods. Under light rain, we can use the output of the third block as the final output, which runs even faster. This experiment shows that ResGuideNet has promising practical value.
5.1. Generalization to other image processing tasks
In this section, we present further evaluations on other general image processing tasks. We train ResGuideNet on the train and val sets of the Berkeley Segmentation Dataset 500 (BSD500), which together contain 300 images, and test it on the BSD500 test set, which contains 100 images. We add Gaussian noise with a standard deviation of 0.1 to both the training and test datasets. The averaged SSIM on our test dataset is 0.927. The reconstruction quality can be seen in Figure 11. This experiment demonstrates that ResGuideNet can generalize to similar image restoration problems.
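The noisy inputs for such a denoising experiment can be synthesized along the following lines. This sketch assumes that the standard deviation of 0.1 refers to intensities scaled to [0, 1] and that noisy values are clipped back to that range; both points, and the name `add_gaussian_noise`, are assumptions rather than details given in the text.

```python
import random


def add_gaussian_noise(image, sigma=0.1, rng=random):
    """Add zero-mean Gaussian noise to a [0, 1] image and clip the result."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]
```

The clean image then serves as the ground truth and the noisy copy as the network input, in the same way as the clean/rainy pairs used for deraining.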
5.2. Pre-processing for high-level vision tasks
Most existing models for high-level tasks are trained on images captured under good conditions, so their performance degrades in rainy conditions, since rain streaks block and blur the key structure of objects. Figure 12 shows a case in which, under heavy rain, a pre-trained Faster R-CNN (Ren et al., 2015b) model trained under good conditions fails to capture some objects and produces low recognition confidence. When we use ResGuideNet as a pre-processing model for Faster R-CNN, the detection performance improves greatly over the naive Faster R-CNN fed with the degraded image.
We presented ResGuideNet, a novel convolutional network architecture for single-image deraining that is easy to deploy in a number of practical applications. We build our model from several deraining sub-networks in a cascaded manner. By propagating negative residuals from shallow blocks to deeper ones, the deeper blocks effectively extract new information about the negative rain streak residual and obtain the rain residual in a coarse-to-fine fashion. The final reconstruction takes all intermediate outputs into account to leverage more information across blocks, which can be viewed as ensemble learning. With the proposed architecture, ResGuideNet uses few parameters while still achieving good performance. For different rain conditions and computational resources, ResGuideNet can be detached into a smaller network that still achieves decent reconstruction. Moreover, extensive experiments have shown that ResGuideNet can generalize to other low-level tasks and has potential value for high-level vision problems.
- Barnum et al. (2010) Peter C Barnum, Srinivasa Narasimhan, and Takeo Kanade. 2010. Analysis of Rain and Snow in Frequency Space. International Journal of Computer Vision 86, 2-3 (2010), 256.
- Bossu et al. (2011) Jérémie Bossu, Nicolas Hautière, and Jean-Philippe Tarel. 2011. Rain or snow detection in image sequences through use of a histogram of orientation of streaks. International journal of computer vision 93, 3 (2011), 348–367.
- Breiman (1996) Leo Breiman. 1996. Stacked regressions. Machine learning 24, 1 (1996), 49–64.
- Cai et al. (2016) Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, and Dacheng Tao. 2016. Dehazenet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing 25, 11 (2016), 5187–5198.
- Drucker et al. (1989) Harris Drucker, Corinna Cortes, L. D. Jackel, and Yann Lecun. 1989. Boosting and Other Ensemble Methods. Neural Computation 6, 6 (1989), 1289–1301.
- Eigen et al. (2013) David Eigen, Jason Rolfe, Rob Fergus, and Yann LeCun. 2013. Understanding deep architectures using a recursive convolutional network. arXiv preprint arXiv:1312.1847 (2013).
- Fu et al. (2017a) Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, and John Paisley. 2017a. Clearing the skies: A deep network architecture for single-image rain removal. IEEE Transactions on Image Processing 26, 6 (2017), 2944–2956.
- Fu et al. (2017b) Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. 2017b. Removing Rain from Single Images via a Deep Detail Network. In CVPR.
- Garg and Nayar (2004) Kshitiz Garg and Shree K. Nayar. 2004. Detection and Removal of Rain from Videos. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on.
- He et al. (2016a) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016a. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
- He et al. (2016b) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016b. Identity mappings in deep residual networks. In European Conference on Computer Vision. Springer, 630–645.
- Huang et al. (2017a) Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q Weinberger. 2017a. Multi-scale dense convolutional networks for efficient prediction. arXiv preprint arXiv:1703.09844 (2017).
- Huang et al. (2017b) Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. 2017b. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Vol. 1. 3.
- Johnson et al. (2016) Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision. Springer, 694–711.
- Kang et al. (2012) L. W. Kang, C. W. Lin, and Y. H. Fu. 2012. Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing 21, 4 (2012), 1742.
- Kim et al. (2016) Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. 2016. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1637–1645.
- Kim et al. (2015) J. H. Kim, J. Y. Sim, and C. S. Kim. 2015. Video Deraining and Desnowing Using Temporal Correlation and Low-Rank Matrix Completion. IEEE Transactions on Image Processing 24, 9 (2015), 2658–2670.
- Li et al. (2017) Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. 2017. An All-in-One Network for Dehazing and Beyond. arXiv preprint arXiv:1707.06543 (2017).
- Li et al. (2016) Yu Li, Robby T Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown. 2016. Rain Streak Removal Using Layer Priors. In CVPR. 2736–2744.
- Liang and Hu (2015) Ming Liang and Xiaolin Hu. 2015. Recurrent convolutional neural network for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3367–3375.
- Luo et al. (2015) Yu Luo, Yong Xu, and Hui Ji. 2015. Removing Rain from a Single Image via Discriminative Sparse Coding. In IEEE International Conference on Computer Vision. 3397–3405.
- Ren et al. (2015a) Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015a. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS.
- Ren et al. (2015b) Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015b. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS.
- Ren et al. (2016) Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. 2016. Single image dehazing via multi-scale convolutional neural networks. In European conference on computer vision. Springer, 154–169.
- Santhaseelan and Asari (2015) Varun Santhaseelan and Vijayan K. Asari. 2015. Utilizing Local Phase Information to Remove Rain from Video. International Journal of Computer Vision 112, 1 (2015), 71–89.
- Socher et al. (2012) Richard Socher, Brody Huval, Bharath Bath, Christopher D Manning, and Andrew Y Ng. 2012. Convolutional-recursive deep learning for 3d object classification. In Advances in Neural Information Processing Systems. 656–664.
- Tai et al. (2017a) Ying Tai, Jian Yang, and Xiaoming Liu. 2017a. Image Super-Resolution via Deep Recursive Residual Network. In Proceeding of IEEE Computer Vision and Pattern Recognition. Honolulu, HI.
- Tai et al. (2017b) Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu. 2017b. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4539–4547.
- Wang et al. (2004) Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600–612.
- Yang et al. (2017) Wenhan Yang, Robby T Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. 2017. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1357–1366.
- Zhang and Patel (2018a) He Zhang and Vishal M Patel. 2018a. Densely Connected Pyramid Dehazing Network. In CVPR.
- Zhang and Patel (2018b) He Zhang and Vishal M Patel. 2018b. Density-aware Single Image De-raining using a Multi-stream Dense Network. In CVPR.
- Zhang et al. (2017) He Zhang, Vishwanath Sindagi, and Vishal M Patel. 2017. Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957 (2017).