Most, if not all, classic and contemporary vision algorithms, such as object detection (Oneata et al., 2014) and tracking (Zhang et al., 2014), work reasonably well on images of high visibility, but degrade dramatically or even fail when fed low-quality inputs. In real-world scenarios, especially outdoor scenes, rain is an annoying and unavoidable nuisance that can significantly alter or degrade the content and color of images (Narasimhan and Nayar, 2003). Such situations frequently occur on rainy days when one records an event at a square with a smartphone, when a surveillance camera monitors a street, or when an autonomous vehicle drives on a road. Rain generally appears in two forms, namely steady rain and dynamic rain. Steady rain is caused by distant microscopic raindrops accumulated globally throughout the scene, while dynamic rain comes from large particles (rain streaks) that look like random, local corruptions. The left column of Fig. 1 gives two such examples. Effective approaches are therefore needed to eliminate or reduce the negative effects of rain.
Formally, a rainy image can be seen as a superimposition of two layers, $O = \Psi(B, R)$, where $O$ designates the observed data, and $R$ and $B$ represent the rain layer and the desired clean background, respectively. In addition, $\Psi(\cdot, \cdot)$ is a blending function. Decomposing the two layers from a single image is mathematically ill-posed, since the number of unknowns to recover is twice the number of given measurements.
1.1. Previous Arts and Challenges
Over the past decades, the rain removal problem has drawn much attention from the community. In terms of the amount of input required, existing rain removal methods fall into two classes, i.e., multi-image based and single-image based methods. Early attempts at deraining mostly belong to the former category. A representative solution was proposed in (Garg and Nayar, 2005), based on the observation that the visibility of rain in images depends heavily on the exposure time and depth of field of the camera. One can reduce the rain effect by testing on several images and adjusting the operational parameters of the camera, but this procedure demands too much expertise for typical consumers. The work in (Garg and Nayar, 2007) employs two constraints to automatically find and exclude possible rain streaks, and then fills the holes by averaging the values of their temporal neighbors, which removes the need for such expertise. Several follow-ups along this line, including (Zhang et al., 2006) and (You et al., 2013), try to improve the accuracy of rain streak detection and/or the quality of background inpainting. A more detailed review of multi-image based rain streak removal approaches can be found in (Tripathi and Mukhopadhyay, 2014). Generally, methods of this kind provide reasonable results when the given information is sufficiently redundant, but this condition is often violated in practice.
For the sake of flexibility and applicability, single-image based approaches are more desirable but also more challenging. Kang et al. (2012) proposed a two-step method. The first step separates the input rain image into a low-frequency component containing its structure and a high-frequency one containing both rain streaks and background textures. The image textures are then distinguished from the rain streaks in the detail layer according to constructed dictionaries, and added back to the structure layer. However, the separation in the detail layer is difficult, and tends either to over-smooth the background or to leave noticeable rain streaks. Its follow-ups include (Huang et al., 2014; Sun et al., 2014). Chen and Hsu (2013) proposed a unified objective function for rain removal that exploits the repetitive appearance of rain streaks and uses a low-rank model to regularize the rain streak layer. This is problematic, as other repetitive structures such as building windows also fit the low-rank assumption. Kim et al. (2013) tried to detect rain streaks by a kernel regression method and to remove the suspects via nonlocal means filtering. This approach frequently suffers from inaccurate detection of rain streaks. Luo et al. (2015) created a new blending model and attempted to reconstruct the background and rain layers of image patches over a self-trained dictionary by discriminative sparse coding. Although the method has an elegant formulation, its blending model still needs physical validation, and its effectiveness in removing rain is somewhat weak, as thin structures often remain at the rain streak locations in the output. Li et al. (2016) used patch-based priors for both layers, namely a Gaussian mixture model (GMM) learned from external clean natural images for the background and another GMM trained on rain regions selected from the input image itself for the rain layer.
These prior-based methods, even with the help of trained dictionaries or GMMs, are on the one hand still unable to capture sufficiently distinctive features for the background and rain layers. On the other hand, their computational cost is too high for practical use.
With the emergence of deep learning, a number of low-level vision tasks have benefited from deep models supported by large-scale training data, such as (Xie et al., 2012; Zhang et al., 2017) for denoising, (Dong et al., 2016) for super-resolution, (Dong et al., 2015) for compression artifact removal and (Cai et al., 2016) for dehazing, as deep architectures can better capture explicit and implicit features. As for deraining, Fu et al. proposed a deep detail network (DDN) (Fu et al., 2017), inspired by (Kang et al., 2012). It first decomposes a rain image into a detail layer and a structure layer. The network then focuses on the high-frequency layer to learn the residual map of rain streaks. The restored result is formed by adding the extracted details back to the structure layer. Yang et al. (2017a) proposed a convolutional neural network (CNN) based method to jointly detect and remove rain streaks from a single image (JORDER). They used a multi-stream network to capture rain streak components of different scales and shapes. The rain information is then fed into the network to further learn the rain streak intensity. By doing so recurrently, the rain effect can be detected and removed from input images. The work in (Zhang et al., 2017) proposes a single-image deraining method called image de-raining conditional generative adversarial network (ID-CGAN), which incorporates quantitative, visual and discriminative performance into the objective function. Although deep learning based strategies have made great progress in rain removal compared with traditional methods, two challenges still remain:
How to enhance the effectiveness of deep architectures for better utilizing training data and achieving more accurate restored results;
How to improve the efficiency of processing testing images for fulfilling the high-speed requirement in real-world (real-time) tasks.
1.2. Our Contributions
To address the aforementioned challenges, we propose a novel deep decomposition-composition network (DDC-Net) to effectively and efficiently remove the rain effect from a single image under various conditions. Concretely, the contributions can be summarized as follows. The designed network is composed of a decomposition net and a composition net. The decomposition net is built for splitting rainy images into clean background and rain layers. The model size is kept small while the performance remains promising, which boosts the effectiveness of the architecture. The composition net reproduces the input rain images from the two layers separated by the decomposition net, aiming to further improve the quality of decomposition. Different from previous deep models, ours explicitly takes care of the recovery accuracy of the rain layer. Following the screen blending mode, instead of simple additive blending, we synthesize a training dataset containing triplets [rain image, clean background, rain information]. During the testing phase, only the decomposition net is needed. Experiments on both synthetic and real images are conducted to reveal the high-quality recovery achieved by our design and to show its superiority over other state-of-the-art methods. Our method is significantly faster than the competitors, making it attractive for practical use. All the trained models and the synthesized dataset are available at
2. Deep Decomposition-Composition Network
The designed network architecture is illustrated in Fig. 2. It consists of two modules, i.e., the decomposition network and the composition network.
2.1. Decomposition Net
As can be seen from Fig. 2, the decomposition network, which aims to separate the rain image into clean background and rain layers, has two main branches: one focuses on restoring the background and the other on recovering the rain information.
Inspired by the effectiveness of encoder-decoder networks in image denoising (Mao et al., 2016), inpainting (Pathak et al., 2016) and matting (Xu et al., 2017), we construct our decomposition branch on a residual encoder-decoder architecture with the following designs specific to clean background and rain layer prediction: 1) the first two convolutional layers in the encoder are changed to dilated convolutions (Yang et al., 2017b) to enlarge the receptive field; these dilated layers are applied with padding, and max-pooling is used to down-sample the feature maps; 2) two decoder networks (the clean background branch and the auxiliary rain branch) recover the clean background and the rain layer, respectively; and 3) features from the deconvolution module of the clean background branch are concatenated to the auxiliary rain branch to better capture rain information during the up-sampling stage (the downward arrows in Fig. 2). The underlying principle is that the background features are expected to help exclude textures belonging to the background from the rain part. It is worth noting that we also tried feeding rain features into the background branch, but this did not noticeably improve performance. The reason is that rain textures are typically much simpler and more regular than background textures, so the gain is marginal or negligible.
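To illustrate why dilation enlarges the receptive field, the following is a minimal 1-D sketch in plain Python. The function names and the dilation rate of 2 are assumptions made for illustration; the text does not state the rate used in the network.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input samples) covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with `dilation - 1` skipped samples between taps."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for tap, weight in enumerate(kernel):
            acc += weight * signal[start + tap * dilation]
        out.append(acc)
    return out


signal = [0, 1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

# Ordinary convolution: each output looks at 3 neighbouring samples.
dense = dilated_conv1d(signal, kernel, dilation=1)

# Dilation 2 (assumed rate): same 3 weights, but each output spans 5 samples.
dilated = dilated_conv1d(signal, kernel, dilation=2)
```

With the same number of weights, the dilated kernel covers a wider window, which is the property exploited by the first two encoder layers.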
The residual encoder-decoder network has five convolution modules, each consisting of several convolutional layers, ReLU activations and skip links. Specifically, the feature maps from the first, second, third, fourth and fifth convolution modules have sizes of 1/2, 1/4, 1/8, 1/16 and 1/32 of the input image size, respectively. The corresponding decoder introduces up-sampling operations to build the feature maps up from low to high resolution.
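The resolution schedule above can be checked with a short sketch. It assumes that every module halves each spatial dimension with floor rounding; the helper names and the 224×224 input are hypothetical, and padding the input to a multiple of 32 is one common way (not confirmed by the text) to keep all five sizes exact.

```python
def encoder_feature_sizes(height, width, num_modules=5):
    """Spatial size after each encoder module, halving once per module."""
    sizes = []
    h, w = height, width
    for _ in range(num_modules):
        h, w = h // 2, w // 2  # assumption: 2x down-sampling with floor rounding
        sizes.append((h, w))
    return sizes


def pad_to_multiple(size, multiple=32):
    """Smallest value >= size that is divisible by `multiple`."""
    return ((size + multiple - 1) // multiple) * multiple


# Illustrative input size (not taken from the text).
sizes = encoder_feature_sizes(224, 224)

# An input side of 250 would need to grow to 256 for five exact halvings.
padded_side = pad_to_multiple(250)
```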
Pre-train on synthetic images: Decomposition is challenging without paired supervision, since a model can then learn an arbitrary mapping to the target domain, with no guarantee that an individual input is mapped to its desired clean background and rain layers. Therefore, we first cast image deraining as a paired image-to-image mapping problem, in which the clean images and the corresponding synthesized rainy images serve as paired information. To measure the content difference between the recovered and the ground-truth images, we use the Euclidean distance between the obtained results and the target images. The losses of the clean background and the rain layer in the pre-training stage can thus be written as:
$$\mathcal{L}_B = \frac{1}{N}\sum_{i=1}^{N} \left\| f_b(O^i) - B^i \right\|_F^2, \qquad \mathcal{L}_R = \frac{1}{N}\sum_{i=1}^{N} \left\| f_r(O^i) - R^i \right\|_F^2,$$
where $N$ denotes the number of training images in each process, $\|\cdot\|_F$ designates the Frobenius norm, and $O^i$, $B^i$ and $R^i$ stand for the $i$-th input, background and rain images, respectively. In addition, $f_b$ and $f_r$ are the clean background branch and the auxiliary rain branch, respectively.
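As a rough illustration of the two pre-training losses, the sketch below computes a batch-averaged squared Frobenius distance for the background and rain outputs. The averaging over the batch and all function names are assumptions; images are represented as small nested lists purely to keep the example self-contained.

```python
def squared_frobenius(a, b):
    """Squared Frobenius norm of (a - b) for two equally sized 2-D lists."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def pretrain_losses(pred_backgrounds, pred_rains, backgrounds, rains):
    """Return (background loss, rain loss), each averaged over the batch."""
    n = len(backgrounds)
    loss_b = sum(squared_frobenius(p, t) for p, t in zip(pred_backgrounds, backgrounds)) / n
    loss_r = sum(squared_frobenius(p, t) for p, t in zip(pred_rains, rains)) / n
    return loss_b, loss_r


# One toy 2x2 training pair: predictions versus ground truth.
pred_b = [[[0.5, 0.5], [0.5, 0.5]]]
true_b = [[[0.5, 0.5], [0.5, 1.0]]]
pred_r = [[[0.0, 0.0], [0.0, 0.0]]]
true_r = [[[0.0, 0.0], [0.0, 0.5]]]

loss_b, loss_r = pretrain_losses(pred_b, pred_r, true_b, true_r)
```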
Fine-tune on real images: With the model learned from synthesized images, our decomposition network can map an individual rainy input to its desired clean background and rain layer. However, since synthetic rain layers do not capture effects such as attenuation and splash in real scene radiance, we use a collection of real rain-free and rainy images to fine-tune our model.
We propose to solve the unpaired image-to-image problem by introducing a generative adversarial network (GAN) loss (Goodfellow et al., 2014; Yang et al., 2018) to better model the formation of rainy images. The concept of the GAN was first proposed by Goodfellow et al. (2014) and has attracted substantial attention from the community. The GAN is based on a minimax two-player game, which provides a simple yet powerful way to estimate a target distribution and generate new samples. The framework consists of two adversarial models: a generative model $G$ and a discriminative model $D$. The generative model $G$ captures the data distribution, while the discriminative model $D$ estimates the probability that a sample comes from the training data rather than from $G$. The generator takes noise as input and tries to generate samples that fool the discriminator, while the discriminator aims to determine whether a sample comes from the model distribution or the data distribution; eventually, the generator produces samples that the discriminator cannot distinguish from real data.
In this stage, we do not need any supervision for the rain layers. To make the restored results more realistic, we train the discriminator to distinguish real rain-free images from the restored ("fake") ones, so that the adversarial model helps train the decomposition network to generate images that the discriminator cannot tell apart from real ones. The adversarial loss is defined as follows:
$$\mathcal{L}_{adv} = \mathbb{E}_{I}\left[\log D(I)\right] + \mathbb{E}_{O}\left[\log\left(1 - D(f_b(O))\right)\right],$$
where $I$ is a real rain-free image, $O$ is a real rainy image, $D$ is the discriminator and $f_b$ is the clean background branch of the decomposition network, which plays the role of the generator.
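For intuition, the standard GAN losses of Goodfellow et al. (2014), on which such an adversarial term builds, can be evaluated directly on discriminator probabilities. This is a minimal sketch in the original minimax form; the function names and the small `eps` guard are assumptions, and it is not claimed to match the exact training code.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Negative of E[log D(real)] + E[log(1 - D(fake))]; minimized by the discriminator."""
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake, eps=1e-12):
    """E[log(1 - D(fake))]; minimized by the generator in the minimax form."""
    return sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)


# An undecided discriminator assigns 0.5 to every sample.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])

# The generator loss drops when restored images are scored as more "real".
fooled = generator_loss([0.9])
caught = generator_loss([0.1])
```

When the discriminator outputs 0.5 everywhere, its loss equals 2 log 2, the equilibrium value of the minimax game.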
The discriminator $D$ consists of five convolutional layers, each followed by a ReLU nonlinearity. The detailed configuration of $D$ is given in Table 1. A sigmoid activation function is applied to the outputs of the last convolutional layer to produce the probability that the input image is "real" rather than "fake".
Table 1. Configuration of the discriminator.
| Layer | Kernel dimension | Stride | Output size |
2.2. Composition Net
Our composition net aims to reproduce the original rainy image from the outputs of the decomposition model, and then uses the reconstructed rainy image as self-supervised information to guide back-propagation. With the decomposition model, we can disentangle a rainy image into two components. The first is the recovered rain-free image $\hat{B}$ from the clean background branch. The second is the rain layer, denoted as $\hat{R}$, learned by the auxiliary rain branch. Therefore, we can directly compose the corresponding rain image in a simple way:
$$\hat{O} = \hat{B} + \hat{R}.$$
To go beyond this simple rule, we first concatenate the clean background image and the rain layer from the decomposition network, and then adopt an additional CNN block to model the formation of real rainy images. The proposed composition network can thus realize a more general formation process and account for unknown phenomena in real images. We then define a quadratic training cost function to measure the difference between the reconstructed rainy output and the original rainy image as
$$\mathcal{L}_O = \frac{1}{N}\sum_{i=1}^{N} \left\| f(O^i) - O^i \right\|_F^2,$$
where $f$ is the whole network depicted in Fig. 2, $O^i$ is the $i$-th rainy input and $N$ is the number of training images. Note that, in the testing stage, only the clean background branch is required to generate the desired results.
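The self-supervised reconstruction idea can be illustrated with a toy per-pixel stand-in for the composition block. Here the concatenated [background, rain] channels pass through a single linear map (a 1×1 convolution), which is a deliberate simplification: the actual block is a CNN whose layers are not specified here, and all names below are hypothetical.

```python
def compose_pixel(background, rain, weights=(1.0, 1.0), bias=0.0):
    """Blend one pixel from the concatenated [background, rain] channels."""
    w_b, w_r = weights
    return w_b * background + w_r * rain + bias


def composition_loss(backgrounds, rains, observed, weights=(1.0, 1.0), bias=0.0):
    """Mean squared error between recomposed and observed rainy pixels."""
    errors = [
        (compose_pixel(b, r, weights, bias) - o) ** 2
        for b, r, o in zip(backgrounds, rains, observed)
    ]
    return sum(errors) / len(errors)


# Two pixels: separated layers and the rainy observation they should explain.
backgrounds = [0.5, 0.5]
rains = [0.5, 0.0]
observed = [0.75, 0.5]

# Fixed additive rule (weights 1, 1) cannot explain the first pixel.
additive_loss = composition_loss(backgrounds, rains, observed)

# A learned blend (here hand-picked weights) reproduces the observation exactly.
tuned_loss = composition_loss(backgrounds, rains, observed, weights=(1.0, 0.5))
```

The gap between the two losses is the signal that back-propagation would use to adjust both the blending weights and the decomposition outputs.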
2.3. Training Dataset
Note that, instead of the additive mode ($O = B + R$), we advocate the screen blend mode to synthesize data that better approximates real-world cases. The screen mode takes the form:
$$O = 1 - (1 - B) \circ (1 - R) = B + R - B \circ R,$$
where $\circ$ means the Hadamard product, and $O$, $B$ and $R$ denote the rainy image, the clean background and the rain layer, respectively. In this mode, the values of the pixels in the two layers are inverted, multiplied, and then inverted again. The clean background images are from the BSD300 dataset (https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/). Moreover, the rain part is generated following the steps at http://www.photoshopessentials.com/photo-effects/rain/ with varying intensities, orientations and overlaps. Finally, we fuse the two parts by screen blending. In this manner, the synthesized rainy images are more physically meaningful. As a consequence, we obtain a dataset containing triplets [rainy image, clean background, rain layer]. For the fine-tuning stage, we collect real images and randomly crop fixed-size samples from them.
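The difference between the two blending modes is easy to see on pixel values in [0, 1]. The sketch below applies the invert, multiply, invert rule described above; the function names are illustrative, and the clipping in the additive variant is an assumption about how overflow would be handled.

```python
def additive_blend(background, rain):
    """Additive mode, clipped to the valid range (clipping is an assumption)."""
    return min(background + rain, 1.0)


def screen_blend(background, rain):
    """Screen mode: invert both layers, multiply, invert again."""
    return 1.0 - (1.0 - background) * (1.0 - rain)


def screen_blend_image(background, rain):
    """Element-wise screen blend of two equally sized 2-D lists."""
    return [
        [screen_blend(b, r) for b, r in zip(row_b, row_r)]
        for row_b, row_r in zip(background, rain)
    ]


# A bright background pixel under a strong streak saturates with addition,
# whereas the screen mode stays below 1 and keeps some background contrast.
saturated = additive_blend(0.8, 0.6)
screened = screen_blend(0.8, 0.6)

rainy = screen_blend_image([[0.5, 0.2]], [[0.5, 0.0]])
```

Where the rain layer is zero, the screen mode returns the background unchanged, so the blend only affects pixels touched by rain.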
2.4. Implementation Details
During training, we use a fixed batch size and patch size. We use stochastic gradient descent (SGD) with weight decay and momentum for optimization. The maximum number of iterations is 100K: one learning rate is used for the first 70K iterations, and training then continues for another 30K iterations with a second learning rate. The entire network is trained on an NVIDIA GTX 1080 GPU using the Keras framework. During the testing phase, only the background decomposition branch is used, which makes inference efficient.
3. Experimental Verification
This section evaluates our DDC-Net on the task of rain removal, in comparison with state-of-the-art methods including GMM (Li et al., 2016), JORDER (Yang et al., 2017a), DDN (Fu et al., 2017) and ID-CGAN (Zhang et al., 2017). The codes are either downloaded from the authors' websites or provided by the authors. To measure performance quantitatively, we employ PSNR, SSIM and elapsed time as the metrics. All the comparisons shown in this paper are conducted under the same hardware configuration.
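As a reminder of how the first metric is computed, the sketch below implements PSNR from its standard definition, 10·log10(peak² / MSE), for images stored as nested lists with values in [0, 1]. The function names and the toy pixel values are illustrative only; SSIM is omitted because it requires windowed statistics.

```python
import math


def mean_squared_error(reference, restored):
    """Mean squared error between two equally sized 2-D lists."""
    diffs = [
        (x - y) ** 2
        for row_ref, row_res in zip(reference, restored)
        for x, y in zip(row_ref, row_res)
    ]
    return sum(diffs) / len(diffs)


def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mean_squared_error(reference, restored)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


# Toy example: every pixel is off by 0.1, so the MSE is 0.01.
ground_truth = [[0.0, 1.0]]
derained = [[0.1, 0.9]]
score = psnr(ground_truth, derained)
```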
Table 2. PSNR and SSIM on the synthetic images.
| Method | GMM | JORDER | DDN | ID-CGAN | Our w/o CN | Our DDC |
Table 3. Running time (in seconds) for different image sizes.
| Image size | GMM | JORDER | DDN | ID-CGAN | Our DDC |
|---|---|---|---|---|---|
| 250×250 | 234.9 (CPU) | 48.59/2.23 (CPU/GPU) | 1.81/0.27 (CPU/GPU) | 0.15 (GPU) | 0.98/0.03 (CPU/GPU) |
| 500×500 | 772.4 (CPU) | 88.61/3.34 (CPU/GPU) | 13.27/0.74 (CPU/GPU) | 0.55 (GPU) | 4.04/0.12 (CPU/GPU) |
3.1. Synthetic Data
We first synthesize rainy images, which are shown in the top row of Fig. 3. Table 2 lists the PSNR and SSIM values of all the competitors. The numerical results show that, except for the first case (PSNR and SSIM) and the fifth case (SSIM only), our DDC outperforms the others by large margins. In the first case, DDC falls slightly behind DDN in both PSNR and SSIM, but is still superior to the remaining techniques. Figure 3 provides two visual comparisons between the methods, from which we can observe that our DDC produces visually striking results. GMM leaves rain streaks in both outputs. ID-CGAN not only removes rain ineffectively but also alters the color, making the results unrealistic. DDN performs reasonably well on the upper image but not on the lower one, while JORDER produces a good result for the lower image but an unsatisfactory one for the upper. In addition, with the composition net disabled (denoted as Ours w/o CN), the decomposition net can still largely separate the rain effect from the inputs. With the complete DDC-Net, the results are further improved and quite close to the ground truth (please zoom in to see more details). This verifies the rationality and effectiveness of our design. One may wonder whether employing only the rain branch, with the clean background branch disabled, can produce reasonable results. Our answer is negative: because the rain information has a (relatively) simple pattern and occupies a small fraction of the image, merely recovering the rain cannot guarantee the quality of background recovery. We do not explicitly provide such an ablation study in this paper. Besides, running time is another important aspect to test. Table 3 reports the time taken by the different methods, averaged over 10 images.
Note that GMM is implemented for CPUs, while the current version of ID-CGAN is only available for GPUs, so we only provide the CPU time for GMM and the GPU time for ID-CGAN. The table demonstrates that our DDC-Net is significantly faster than the competitors, making it more attractive for practical use. It is worth mentioning that GMM needs to execute a complex optimization for each input image, while the deep methods, including JORDER, DDN, ID-CGAN and our DDC-Net, only need simple feed-forward operations. This is why GMM takes the last place in this comparison.
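The speed advantage can be quantified directly from the GPU timings reported in Table 3. The sketch below only divides each competitor's time by that of DDC-Net; the dictionary layout and function name are illustrative.

```python
# GPU running times copied from Table 3, keyed by image size and method.
gpu_times = {
    "250x250": {"JORDER": 2.23, "DDN": 0.27, "ID-CGAN": 0.15, "DDC": 0.03},
    "500x500": {"JORDER": 3.34, "DDN": 0.74, "ID-CGAN": 0.55, "DDC": 0.12},
}


def speedups(times, ours="DDC"):
    """How many times slower each competitor is than `ours`."""
    return {
        method: elapsed / times[ours]
        for method, elapsed in times.items()
        if method != ours
    }


speedup_250 = speedups(gpu_times["250x250"])
speedup_500 = speedups(gpu_times["500x500"])

for size, table in (("250x250", speedup_250), ("500x500", speedup_500)):
    for method, factor in sorted(table.items()):
        print(f"{size}: DDC-Net is {factor:.1f}x faster than {method}")
```

On these numbers the GPU speed-up over DDN lies between roughly 6x and 9x, and the speed-up over ID-CGAN is close to 5x.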
3.2. Real Data
The performance of the proposed method is also evaluated on several real-world images. We show the deraining results in Figs. 5 and 7. From the results of GMM (Li et al., 2016), we see that the rain streaks are not well removed in the second case, while the other three cases appear over-derained. In the results of JORDER (Yang et al., 2017a), under-deraining occurs in all four cases: the rain effect, although alleviated, is still clearly visible. The main drawback of ID-CGAN (Zhang et al., 2017) exposed in the comparison is the severe color alteration/degradation after deraining. Furthermore, some halo artifacts emerge, for instance in the region around the right arm of the man in Fig. 7. Arguably, the top places in this comparison go to DDN (Fu et al., 2017) and our DDC-Net. Taking a closer look at the results, we find that DDN leaves some rain effect on the tree region in the first case and smooths out texture details, for instance the earphone wire of the rightmost woman, in the fourth case. Note that for the cases in Figs. 5 and 7, a fast enhancement technique, LIME (Guo et al., 2017), is applied as a dehazing step to the derained results of GMM, JORDER, DDN and our DDC-Net, while ID-CGAN can do this job simultaneously but with a high risk of color degradation. We again emphasize that DDC-Net is about 5 times faster than DDN.
4. Discussion and Conclusion
Although this paper concentrates on the rain removal task, our architecture is general enough to be applied to other layer-decomposition problems such as dust removal, and to recovery tasks such as inpainting. To verify this point, we show two dust removal results obtained by our DDC-Net in Fig. 8. From the results, we can observe that the dust/water-based stains on windows are removed effectively and that the recovered results are visually striking. As for efficiency, our design can be further accelerated by deep neural network compression techniques, which would very likely fulfill the real-time requirement in practice. For the composition network, besides the convolution operator, there may be more suitable ways of blending. For example, the so-called Porter-Duff operators define modes including both linear and non-linear blending, and some of them might be learnable with an appropriate parameterization.
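As a concrete instance of such an operator, the sketch below implements the classic Porter-Duff "over" rule for a single channel with non-premultiplied colors. Treating the rain layer as a semi-transparent source over an opaque background is only an illustrative assumption, and the opacity would be the quantity to learn in a parameterized version.

```python
def porter_duff_over(src_color, src_alpha, dst_color, dst_alpha):
    """Composite a source over a destination; returns (color, alpha)."""
    out_alpha = src_alpha + dst_alpha * (1.0 - src_alpha)
    if out_alpha == 0.0:
        return 0.0, 0.0  # fully transparent result
    out_color = (
        src_color * src_alpha + dst_color * dst_alpha * (1.0 - src_alpha)
    ) / out_alpha
    return out_color, out_alpha


# A white streak with 50% opacity over an opaque mid-gray background pixel.
color, alpha = porter_duff_over(1.0, 0.5, 0.5, 1.0)

# A fully transparent source leaves the background unchanged.
untouched, _ = porter_duff_over(1.0, 0.0, 0.5, 1.0)
```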
Finally, we conclude our work. This paper has proposed a novel deep architecture for single-image rain removal, namely the deep decomposition-composition network, which consists of two main sub-nets: the decomposition network and the composition network. The decomposition network is built to split rain images into clean background and rain layers. Different from previous architectures, our decomposition model contains, besides a component representing the desired clean image, an extra component for the rain layer. During the training phase, the additional composition structure is employed to reproduce the input from the separated clean image and rain information, which further boosts the quality of decomposition. Moreover, our model pre-trained on synthetic data is further fine-tuned with unpaired supervision to adapt better to real cases. Experiments on both synthetic and real images have been conducted to reveal the efficacy of our design and to demonstrate its clear advantages over other state-of-the-art methods. In terms of running time, our method is significantly faster than the other techniques, which can broaden the applicability of deraining to tasks with high-speed requirements.
- Cai et al. (2016) B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. 2016. An end-to-end system for single image haze removal. IEEE TIP 25, 11 (2016), 5187–5198.
- Chen and Hsu (2013) Y. Chen and C. Hsu. 2013. A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In ICCV. 1968–1975.
- Dong et al. (2016) C. Dong, C. C. Loy, K. He, and X. Tang. 2016. Image super-resolution using deep convolutional networks. IEEE TPAMI 38, 2 (2016), 295–307.
- Dong et al. (2015) C. Dong, Y. Deng, C. C. Loy, and X. Tang. 2015. Compression artifacts reduction by a deep convolutional network. In ICCV.
- Fu et al. (2017) X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley. 2017. Removing rain from single images via a deep detail network. In CVPR. 1715–1723.
- Garg and Nayar (2005) K. Garg and S. Nayar. 2005. When does a camera see rain? In ICCV. 1067–1074.
- Garg and Nayar (2007) K. Garg and S. Nayar. 2007. Vision and Rain. IJCV 75, 1 (2007), 3–27.
- Goodfellow et al. (2014) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial nets. In NIPS. 2672–2680.
- Guo et al. (2017) X. Guo, Y. Li, and H. Ling. 2017. LIME: Low-light Image Enhancement via Illumination Map Estimation. IEEE TIP 26, 2 (2017), 982–993.
- Huang et al. (2014) D. Huang, L. Kang, Y. Wang, and C. Lin. 2014. Self-learning based image decomposition with applications to single image denoising. IEEE TMM 16, 1 (2014), 83–93.
- Kang et al. (2012) L. Kang, C. Lin, and Y. Fu. 2012. Automatic single-image-based rain streaks removal via image decomposition. IEEE TIP 21, 4 (2012), 1742–1755.
- Kim et al. (2013) J. Kim, C. Lee, J. Sim, and C. Kim. 2013. Single-image deraining using an adaptive nonlocal means filter. In ICIP. 914–917.
- Li et al. (2016) Y. Li, R. Tan, X. Guo, J. Lu, and M. Brown. 2016. Rain Streak Removal Using Layer Priors. In CVPR.
- Luo et al. (2015) Y. Luo, Y. Xu, and H. Ji. 2015. Removing rain from a single image via discriminative sparse coding. In ICCV. 3397–3405.
- Mao et al. (2016) X. Mao, C. Shen, and Y.-B. Yang. 2016. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In NIPS.
- Narasimhan and Nayar (2003) S. Narasimhan and S. Nayar. 2003. Contrast restoration of weather degraded images. IEEE TPAMI 25, 6 (2003), 713–724.
- Oneata et al. (2014) D. Oneata, J. Revaud, J. Verbeek, and C. Schmid. 2014. Spatio-Temporal Object Detection Proposals. In ECCV. 737–752.
- Pathak et al. (2016) D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. Efros. 2016. Context encoders: Feature learning by inpainting. In CVPR.
- Sun et al. (2014) S. Sun, S. Pan, and Y. Wang. 2014. Exploiting image structural similarity for single image rain removal. In ICME. 4482–4486.
- Tripathi and Mukhopadhyay (2014) A. Tripathi and S. Mukhopadhyay. 2014. Removal of rain from videos: a review. Signal, Image and Video Processing 8, 8 (2014), 1421–1430.
- Xie et al. (2012) J. Xie, L. Xu, and E. Chen. 2012. Image denoising and inpainting with deep neural networks. In NIPS. 341–349.
- Xu et al. (2017) N. Xu, B. Price, S. Cohen, and T. Huang. 2017. Deep Image Matting. arXiv preprint arXiv:1703.03872 (2017).
- Yang et al. (2017a) W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan. 2017a. Deep joint rain detection and removal from a single image. In CVPR. 1357–1366.
- Yang et al. (2017b) W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan. 2017b. Deep Joint Rain Detection and Removal from a Single Image. 1357–1366.
- Yang et al. (2018) X. Yang, Z. Xu, and J. Luo. 2018. Towards Perceptual Image Dehazing by Physics-based Disentanglement and Adversarial Training. (2018).
- You et al. (2013) S. You, R. Tan, R. Kawakami, and K. Ikeuchi. 2013. Adherent raindrop detection and removal in video. In CVPR. 1035–1042.
- Zhang et al. (2017) H. Zhang, V. Sindagi, and V. M. Patel. 2017. Image De-raining Using a Conditional Generative Adversarial Network. arXiv:1701.05957v2 (2017).
- Zhang et al. (2014) K. Zhang, L. Zhang, and M. Yang. 2014. Real-Time Compressive Tracking. In ECCV. 866–879.
- Zhang et al. (2017) K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. 2017. Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE TIP 26, 7 (2017), 3142–3155.
- Zhang et al. (2006) X. Zhang, H. Li, Y. Qi, W. Leow, and T. Ng. 2006. Rain removal in video by combining temporal and chromatic properties. In ICME. 461–464.