Physical Model Guided Deep Image Deraining

03/30/2020
by Honghe Zhu, et al.
Dalian University of Technology

Single image deraining is an urgent task because rain degradation makes many computer vision systems, such as video surveillance and autonomous driving, fail to work, so an effective deraining algorithm is needed. In this paper, we propose a novel network based on physical model guided learning for single image deraining, which consists of three sub-networks: a rain streaks network, a rain-free network, and a guide-learning network. The concatenation of the rain streaks and rain-free image, estimated by the rain streaks network and the rain-free network respectively, is input to the guide-learning network to guide further learning, and the direct sum of the two estimated images is constrained against the input rainy image according to the physical model of a rainy image. Moreover, we develop a Multi-Scale Residual Block (MSRB) to better utilize multi-scale information, which is shown to boost deraining performance. Quantitative and qualitative experimental results demonstrate that the proposed method outperforms state-of-the-art deraining methods. The source code will be available at <https://supercong94.wixsite.com/supercong94>.


1 Introduction

Rain is a very common weather phenomenon. Images and videos captured in rain contain raindrops and rain streaks of different speeds, directions, and density levels, which can make many computer vision systems fail to work. Therefore, removing the rain components from rainy images or videos to recover a clear background scene is necessary. Deraining methods fall into two categories: single image-based methods [16, 10, 1, 13, 5, 6, 20] and video-based methods [8, 22, 7, 19]. Since temporal information can be leveraged by analyzing the differences between adjacent frames, video-based deraining is easier than the single-image case. In this paper, we explore the more difficult problem, single image deraining.

Image deraining has attracted much attention in recent years and is typically based on the following physical rainy model: the observed rainy image is modeled as a linear sum of a rain-free background image and rain streaks. Mathematically, the model can be expressed as:

$$O = B + R \tag{1}$$

where $O$, $B$, and $R$ denote the observed rainy image, the clear background image, and the rain streaks, respectively. Based on Eq. (1), deraining methods should remove $R$ from $O$ to obtain $B$, which is a highly ill-posed problem because, theoretically, a given $O$ admits infinitely many decompositions into $B$ and $R$.
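To make the decomposition concrete, here is a minimal NumPy sketch of Eq. (1); the background, streak pattern, and image size are placeholders rather than the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((256, 256, 3)).astype(np.float32)  # stand-in clear background
R = np.zeros_like(B)
R[::8] = 0.4                                      # toy "streak" rows for illustration
O = np.clip(B + R, 0.0, 1.0)                      # observed rainy image, O = B + R

# Ill-posedness: for any perturbation d, (B + d, R - d) sums to the same O,
# so priors or learned constraints are needed to pick one decomposition.
```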

To make the problem tractable, numerous conventional methods adopt various priors on rain streaks or the clean background scene to constrain the solution space, such as sparse coding [16], image decomposition [10], low-rank structure [1], and Gaussian mixture models [13]. These deraining methods make simple hypotheses on $R$, i.e., the rain streaks, such as assuming that the streaks are sparse and share similar falling directions and shapes, which only hold in specific cases.

With the rise of deep learning, numerous methods have achieved great success in many computer vision tasks [2, 15, 14] thanks to their powerful feature representation capability. Deraining methods have also improved significantly via deep learning [9, 21, 5, 6, 12]. However, these methods still have some limitations.

On the one hand, many existing methods estimate only the rain streaks or only the rain-free image [12, 6, 20], neglecting that the estimated rain streaks and rain-free image together can serve as a physical-model guide for the deraining process. On the other hand, multi-scale operations capture rain-streak information at different levels and should therefore boost deraining, yet numerous deep learning-based methods [6, 21, 12] do not incorporate multi-scale information.

To address the above limitations, we propose a novel network based on physical model guided learning, which utilizes the physical model to guide the learning process and applies multi-scale operations to the feature maps. Specifically, the sum of the estimated rain streaks and rain-free image is compared against the corresponding rainy image as a constraint term according to the rainy physical model (1), and their concatenation is fed into the guide-learning network as guidance for further learning. Moreover, we design a Multi-Scale Residual Block (MSRB) to obtain features at different levels.

Our contributions are summarized as follows:

  • We design the guide-learning network based on the rainy physical model; this guidance boosts the deraining performance in both detail and texture.

  • We propose a Multi-Scale Residual Block (MSRB) to better utilize multi-scale information; experiments show that the block improves the rain-streak representation capability.

  • Our proposed network outperforms the state-of-the-art methods on synthetic and real-world datasets, both quantitatively and qualitatively.

2 Related Work

In this section, we present a brief review of single image deraining approaches, which can be split into prior-based methods and deep learning-based methods.

For prior-based methods, Kang et al. [10] first decomposed the rainy image into low- and high-frequency layers, and then utilized sparse coding to remove the rain streaks in the high-frequency layer. Chen et al. [1] assumed the rain streaks are low-rank and proposed an effective low-rank structure to model them. Luo et al. [16] proposed a discriminative sparse coding framework to accurately separate rain streaks and the clean background scene. Li et al. [13] used patch priors based on Gaussian mixture models for both the rain streaks and the clear background to remove the rain streaks.

For deep learning-based methods, Fu et al. [5, 6] first applied deep learning to single image deraining: they decompose the rainy image into low- and high-frequency parts, and then feed the high-frequency part into a convolutional neural network (CNN) to estimate the residual rain streaks. Yang et al. [20] proposed a recurrent contextual network that can jointly detect and remove rain streaks. Zhang et al. [21] designed a generative adversarial network to prevent the degeneration of the background image and utilized a perceptual loss to refine visual quality. Fan et al. [4] generalized a residual-guide network for deraining. Li et al. [12] utilized squeeze-and-excitation to learn different weights for different rain-streak layers. Ren et al. [18] investigated network architecture, input and output, and loss functions, and provided a better and simpler baseline deraining network.

3 Proposed Method

Figure 1: Overall network framework. MSRB is shown in Fig. 2. The overall network consists of three sub-networks: the rain streaks network, the rain-free network, and the guide-learning network. The rain streaks network and rain-free network learn to estimate the rain streaks and rain-free image, respectively, and their outputs are concatenated and fed into the guide-learning network for further guided learning.

In this section, we describe our proposed method in detail, including the overall network framework, the multi-scale residual block (MSRB), and the loss functions.

3.1 Overall framework

As shown in Fig. 1, the proposed network consists of three sub-networks: the rain streaks network, the rain-free network, and the guide-learning network. The first two sub-networks share the same encoder-decoder structure. To learn better spatial contextual information that further guides the restoration of a clear image, the estimated rain streaks and rain-free image are concatenated and fed into the guide-learning network, which uses multi-stream dilated convolution to refine the deraining result. Moreover, to constrain the rain streaks network and the rain-free network to generate consistent results, the sum of the estimated rain streaks and rain-free image is constrained by a norm penalty against the input rainy image, according to the rainy physical model (1). Furthermore, the MSRB is designed to acquire multi-scale information by combining multi-scale operations with a residual block.
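To make the data flow concrete, the following PyTorch sketch traces Fig. 1 under our reading of it. The encoder-decoder internals of the first two sub-networks are omitted, and the channel widths, dilation rates, and layer counts of the GuideNet below are placeholder assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GuideNet(nn.Module):
    """Sketch of the guide-learning network: parallel dilated-convolution
    streams over the concatenated (rain streaks, rain-free) estimates.
    Widths and dilation rates are assumptions, not the paper's settings."""
    def __init__(self, ch=32):
        super().__init__()
        self.head = nn.Conv2d(6, ch, 3, padding=1)  # 6 = 3 (streaks) + 3 (rain-free)
        self.streams = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        )
        self.fuse = nn.Conv2d(3 * ch, 3, 3, padding=1)

    def forward(self, x):
        f = torch.relu(self.head(x))
        f = torch.cat([torch.relu(s(f)) for s in self.streams], dim=1)
        return self.fuse(f)

def derain(rain_net, bg_net, guide_net, O):
    """One forward pass through the three sub-networks of Fig. 1."""
    R_hat = rain_net(O)                       # estimated rain streaks
    B_hat = bg_net(O)                         # estimated rain-free image
    guide_in = torch.cat([R_hat, B_hat], 1)   # concatenation guides refinement
    return R_hat, B_hat, guide_net(guide_in)  # final refined rain-free image
```

At training time, R_hat + B_hat is additionally compared with O, which is the physical-model constraint of Eq. (1) described above.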

3.2 Multi-Scale Residual Block (MSRB)

Multi-scale features have been widely leveraged in many computer vision systems, such as face alignment [17], semantic segmentation [24], depth estimation [3], and single image super-resolution [23]. Combining features at different scales yields a better representation of an object and its surrounding context. Therefore, we propose the multi-scale residual block (MSRB), which concatenates feature maps at different scales within a residual block, as shown in Fig. 2.

Figure 2: Multi-Scale Residual Block (MSRB).

We describe the MSRB mathematically. First, we apply downsampling operations with different kernel sizes and strides to obtain multi-scale features:

$$F_s = D_{k_s, t_s}(F_{in}), \quad s = 1, \dots, S, \tag{2}$$

where $D_{k,t}$ denotes the downsampling operation with kernel size $k$ and stride $t$. Then, all the scales are fused and fed into three convolution layers, and the original input signal is added back to learn the residual:

$$F_{out} = F_{in} + \mathrm{Conv}\big(\mathcal{C}(U(F_1), \dots, U(F_S))\big), \tag{3}$$

where $U$ denotes the upsampling operation, $\mathcal{C}$ denotes concatenation along the channel dimension, and $\mathrm{Conv}$ denotes a series of three convolution operations. The MSRB thus learns features at different scales and fuses them all to form the primary feature.
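A possible PyTorch realization of Eqs. (2) and (3) is sketched below. The pooling-based downsampling, the scale set, and the 3x3 kernel sizes are assumptions, since the exact operators did not survive extraction; only the overall structure (downsample, upsample, concatenate, three convolutions, residual addition) follows the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSRB(nn.Module):
    """Multi-Scale Residual Block sketch: extract features at several
    scales (Eq. 2), upsample, concatenate, fuse through three convolution
    layers, and add the input back as a residual (Eq. 3)."""
    def __init__(self, ch=32, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.Sequential(  # the "Conv" of Eq. (3)
            nn.Conv2d(len(scales) * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for s in self.scales:
            f = F.avg_pool2d(x, kernel_size=s, stride=s) if s > 1 else x  # D_{k,t}
            feats.append(F.interpolate(f, size=(h, w), mode="bilinear",
                                       align_corners=False))              # U
        return x + self.fuse(torch.cat(feats, dim=1))  # residual connection
```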

3.3 Loss function

We use a norm $\lVert\cdot\rVert$ as the distance measure in all loss terms below.

For the rain streaks network and the rain-free network:

$$\mathcal{L}_{R} = \lVert \hat{R} - R \rVert, \tag{4}$$
$$\mathcal{L}_{B} = \lVert \hat{B} - B \rVert, \tag{5}$$

where $\hat{R}$ and $\hat{B}$ denote the estimated rain streaks layer and clean background image, and $R$ and $B$ denote the ground-truth rain streaks and rain-free image. For the guide-learning network:

$$\mathcal{L}_{G} = \lVert \hat{B}_{g} - B \rVert, \tag{6}$$

where $\hat{B}_{g}$ denotes the output of the guide-learning network, i.e., the final estimated rain-free image.

Moreover, we compute the norm of the difference between the input rainy image and the sum of $\hat{R}$ and $\hat{B}$, in order to constrain the solution space of the rain streaks and rain-free networks according to the rainy physical model (1):

$$\mathcal{L}_{P} = \lVert (\hat{R} + \hat{B}) - O \rVert. \tag{7}$$

The overall loss function is then defined as:

$$\mathcal{L} = \mathcal{L}_{G} + \lambda_{1}\mathcal{L}_{R} + \lambda_{2}\mathcal{L}_{B} + \lambda_{3}\mathcal{L}_{P}, \tag{8}$$

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are constant weights.
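A minimal sketch of Eqs. (4)-(8) follows, with an L1 distance standing in for the unspecified norm (an assumption on our part) and with the weight values reported in Sec. 4.1.

```python
import torch.nn.functional as F

def total_loss(R_hat, B_hat, B_final, R_gt, B_gt, O,
               lam1=0.5, lam2=0.5, lam3=0.001):
    """Combined objective of Eq. (8). The L1 distance is a stand-in for
    the paper's norm; the weights follow Sec. 4.1."""
    L_R = F.l1_loss(R_hat, R_gt)        # Eq. (4): rain streaks network
    L_B = F.l1_loss(B_hat, B_gt)        # Eq. (5): rain-free network
    L_G = F.l1_loss(B_final, B_gt)      # Eq. (6): guide-learning network
    L_P = F.l1_loss(R_hat + B_hat, O)   # Eq. (7): physical-model constraint
    return L_G + lam1 * L_R + lam2 * L_B + lam3 * L_P
```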

Dataset     Metric   DSC [16]   LP [13]   DDN [6]   JORDER [20]   RESCAN [12]   PReNet [18]   Ours
Rain100H    PSNR     15.66      14.26     22.26     23.45         25.92         27.89         28.96
            SSIM     0.42       0.54      0.69      0.74          0.84          0.89          0.90
Rain100L    PSNR     24.16      29.11     34.85     36.11         36.12         36.69         38.64
            SSIM     0.87       0.88      0.95      0.97          0.96          0.98          0.99
Rain1200    PSNR     21.44      22.46     30.95     29.75         32.35         32.38         33.42
            SSIM     0.79       0.80      0.86      0.87          0.89          0.92          0.93
Table 1: Quantitative evaluation on three synthetic datasets. The best results are marked in bold.

4 Experimental Results

[Figure 3: panels (a) Input, (b) DDN, (c) JORDER, (d) RESCAN, (e) PReNet, (f) Ours, (g) GT, with per-example PSNR/SSIM:]

Example   (b) DDN      (c) JORDER   (d) RESCAN   (e) PReNet   (f) Ours     (g) GT
1         22.12/0.79   20.31/0.75   24.49/0.79   24.93/0.92   25.84/0.93   Inf/1
2         22.89/0.71   20.86/0.69   24.73/0.72   26.20/0.88   27.49/0.89   Inf/1
3         30.80/0.88   31.20/0.91   32.74/0.87   33.81/0.95   35.12/0.96   Inf/1

Figure 3: Visual and quantitative comparisons on three synthetic examples. The proposed method clearly performs better than the other four deep learning-based methods, especially in the marked boxes. Our results in (f) have the highest PSNR and SSIM values and are the cleanest.

In this section, we conduct deraining experiments on three synthetic datasets and on real-world images, comparing with six state-of-the-art deraining methods: discriminative sparse coding (DSC) [16], layer priors (LP) [13], deep detail network (DDN) [6], the recurrent version of joint rain detection and removal (JORDER) [20], RESCAN [12], and PReNet [18].

4.1 Experiment settings

Synthetic Datasets. We evaluate our method on three synthetic datasets: Rain100H, Rain100L, and Rain1200, all of which contain rain streaks of various sizes, shapes, and directions. Rain100H and Rain100L each provide 1800 image pairs for training and 200 for testing. Rain1200 provides 12000 images for training and 1200 for testing. We choose Rain100H as our analysis dataset.

Real-world Testing Images. We also evaluate our method on real-world images provided by Zhang et al. [21] and Yang et al. [20]. These images contain diverse rain components, varying in orientation and density.

Training Settings. During training, we randomly crop patch pairs from each training image pair. The batch size is 64. Each convolution layer except the last uses leaky-ReLU as the activation function. We optimize the network with the ADAM algorithm [11]. The learning rate is decayed twice, at epochs 1200 and 1600, and the total number of epochs is 2000. $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are set to 0.5, 0.5, and 0.001, respectively. The entire network is trained on 8 Nvidia GTX 1080Ti GPUs using PyTorch.
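A minimal PyTorch sketch of this schedule follows; the stand-in model, initial learning rate, and decay factor are placeholders, since the exact values were lost in extraction.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)                # stand-in for the network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is a placeholder
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[1200, 1600], gamma=0.1)         # decay twice; gamma assumed

for epoch in range(2000):
    # ... train one epoch with batch size 64, minimizing the loss of Eq. (8) ...
    scheduler.step()
```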

Evaluation Criteria. We use peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the quality of the recovered results against the ground truth. PSNR and SSIM are computed only on the synthetic datasets, since both metrics require the corresponding ground-truth images in addition to the estimated rain-free images. Real-world images can only be assessed by visual comparison.
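For reference, both metrics can be computed with scikit-image as below (our choice of implementation; the paper does not specify one). Images are assumed to be floats in [0, 1].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(derained, gt):
    """PSNR/SSIM against ground truth; both need gt, hence synthetic-only."""
    psnr = peak_signal_noise_ratio(gt, derained, data_range=1.0)
    ssim = structural_similarity(gt, derained, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```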

4.2 Results on synthetic datasets

Tab. 1 quantitatively compares our method with six state-of-the-art deraining methods on Rain100H, Rain100L, and Rain1200: two conventional methods, DSC [16] (ICCV15) and LP [13] (CVPR16), and four deep learning-based methods, DDN [6] (CVPR17), JORDER [20] (CVPR17), RESCAN [12] (ECCV18), and PReNet [18] (CVPR19). Our proposed method outperforms these state-of-the-art approaches on all three datasets.

We also show several challenging synthetic examples for visual comparison in Fig. 3. Since the prior-based methods are clearly worse than the deep learning-based methods according to Tab. 1, we only compare visual performance against deep learning methods. The first column of Fig. 3 shows synthetic images severely degraded by rain streaks. Fig. 3 (b) and (c) show the results of DDN [6] and JORDER [20]; both fail to recover an acceptably clean image. Fig. 3 (d) and (e) show the results of RESCAN [12] and PReNet [18], which exhibit unpleasant artifacts in the marked boxes. As shown in Fig. 3 (f), our method generates the best deraining results, both quantitatively and visually.


(a) Input (b) DDN (c) JORDER (d) RESCAN (e) PReNet (f) Ours
Figure 4: Results of our method on two real-world examples, compared with four deep learning-based deraining methods. Our results in (f) are visually better than the others. In particular, for the first example, our method generates the clearest and cleanest result, while the other methods retain obvious artifacts or rain streaks. This demonstrates that our method performs better than the state-of-the-art methods. Please zoom in for the best view.

4.3 Results on real-world images

To evaluate the robustness of our method on real-world images, we provide two examples from real-world rainy datasets in Fig. 4. For the first example, our method generates the clearest and cleanest result, while the other methods retain obvious artifacts or rain streaks. For the second example, the other methods produce unpleasant artifacts in the marked box, while our approach generates a cleaner result. More examples on both synthetic and real-world datasets are provided in our supplementary material.

4.4 Ablation study

It is worth exploring the effectiveness of the multi-scale design and the physical-model constraint in our network. We therefore design experiments with different combinations of the proposed components: the three sub-networks, the multi-scale structure, the multi-stream dilated convolution, and $\mathcal{L}_{P}$. Tab. 2 shows the comparative results, where W and W/O denote with and without the multi-scale structure, respectively. The multi-scale design boosts deraining performance for all models, which shows that it is meaningful. Furthermore, Tab. 3 compares the effectiveness of the multi-stream dilated convolution and the physical-model constraint $\mathcal{L}_{P}$. Fig. 5 shows the outputs of the three sub-networks on two real-world rainy images. The cropped patches in Fig. 5 (d) preserve detail and texture better than those in Fig. 5 (c), which demonstrates that the guide-learning network is effective in our proposed network.

     Metric   M_1      M_2      M_3      M_4
W    PSNR     28.63    28.56    28.62    28.97
     SSIM     0.8949   0.8946   0.8968   0.9015
W/O  PSNR     28.24    28.24    28.46    28.72
     SSIM     0.8898   0.8900   0.8941   0.8986
Table 2: Ablation study on different models. The best results are marked in bold.
Figure 5: Outputs of the different sub-networks on two real-world images. Columns: (a) input, (b) estimated rain streaks, (c) output of the rain-free network, (d) output of the guide-learning network.
  • M_1: Only rain streaks network.

  • M_2: Only rain-free network.

  • M_3: Only the estimated rain-free image is input to the guide-learning network.

  • M_4: (Default) The concatenation of the estimated rain streaks and rain-free image is input to the guide-learning network.

  • R_1: Our proposed network without the multi-stream dilated convolution.

  • R_2: Our proposed network without $\mathcal{L}_{P}$.

  • R_3: Our proposed network with both the multi-stream dilated convolution and $\mathcal{L}_{P}$, i.e., our final network.

$\mathcal{L}_{P}$ is the norm of the difference between the rainy image and the direct sum of the two images estimated by the first two sub-networks.

Metric   R_1      R_2      R_3
PSNR     28.95    28.92    28.97
SSIM     0.9011   0.9012   0.9015
Table 3: Analysis of the effectiveness of the multi-stream dilated convolution and the physical-model constraint. The best results are marked in bold.

5 Conclusion

In this paper, we propose an effective method for single image deraining. Our network is built on the rainy physical model with guide-learning, and the experiments demonstrate that the physical-model constraint and guide-learning are both beneficial. The proposed Multi-Scale Residual Block is verified to boost deraining performance. Quantitative and qualitative results on both synthetic and real-world datasets demonstrate the favorable performance of our network for single image deraining.

Acknowledgement

This work was supported by the National Natural Science Foundation of China [grant number 61976041]; the National Key R&D Program of China [grant number 2018AAA0100301]; and the National Science and Technology Major Project [grant numbers 2018ZX04041001-007, 2018ZX04016001-011].

References

  • [1] Y. Chen and C. Hsu (2013) A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In ICCV, pp. 1968–1975.
  • [2] Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun (2018) Cascaded pyramid network for multi-person pose estimation. In CVPR, pp. 7103–7112.
  • [3] D. Eigen, C. Puhrsch, and R. Fergus (2014) Depth map prediction from a single image using a multi-scale deep network. arXiv:1406.2283.
  • [4] Z. Fan, H. Wu, X. Fu, Y. Huang, and X. Ding (2018) Residual-guide network for single image deraining. In ACM MM, pp. 1751–1759.
  • [5] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley (2017) Clearing the skies: a deep network architecture for single-image rain removal. IEEE Transactions on Image Processing 26 (6), pp. 2944–2956.
  • [6] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley (2017) Removing rain from single images via a deep detail network. In CVPR, pp. 1715–1723.
  • [7] K. Garg and S. K. Nayar (2006) Photorealistic rendering of rain streaks. ACM Trans. Graph. 25 (3), pp. 996–1002.
  • [8] K. Garg and S. K. Nayar (2007) Vision and rain. International Journal of Computer Vision 75 (1), pp. 3–27.
  • [9] D. Huang, L. Kang, M. Yang, C. Lin, and Y. F. Wang (2012) Context-aware single image rain removal. In ICME, pp. 164–169.
  • [10] L. Kang, C. Lin, and Y. Fu (2012) Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing 21 (4), pp. 1742–1755.
  • [11] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv:1412.6980.
  • [12] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha (2018) Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, pp. 262–277.
  • [13] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown (2016) Rain streak removal using layer priors. In CVPR, pp. 2736–2744.
  • [14] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie (2017) Feature pyramid networks for object detection. In CVPR, pp. 936–944.
  • [15] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In CVPR, pp. 3431–3440.
  • [16] Y. Luo, Y. Xu, and H. Ji (2015) Removing rain from a single image via discriminative sparse coding. In ICCV, pp. 3397–3405.
  • [17] X. Peng, R. S. Feris, X. Wang, and D. N. Metaxas (2016) A recurrent encoder-decoder network for sequential face alignment. In ECCV, pp. 38–56.
  • [18] D. Ren, W. Zuo, Q. Hu, P. Zhu, and D. Meng (2019) Progressive image deraining networks: a better and simpler baseline. In CVPR.
  • [19] A. K. Tripathi and S. Mukhopadhyay (2014) Removal of rain from videos: a review. Signal, Image and Video Processing 8 (8), pp. 1421–1430.
  • [20] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan (2017) Deep joint rain detection and removal from a single image. In CVPR, pp. 1685–1694.
  • [21] H. Zhang, V. Sindagi, and V. M. Patel (2017) Image de-raining using a conditional generative adversarial network. arXiv:1701.05957.
  • [22] X. Zhang, H. Li, Y. Qi, W. K. Leow, and T. K. Ng (2006) Rain removal in video by combining temporal and chromatic properties. In ICME, pp. 461–464.
  • [23] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu (2018) Residual dense network for image super-resolution. arXiv:1802.08797.
  • [24] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017) Pyramid scene parsing network. In CVPR, pp. 6230–6239.