A Deep Tree-Structured Fusion Model for Single Image Deraining

11/21/2018 ∙ by Xueyang Fu, et al. ∙ Xiamen University

We propose a simple yet effective deep tree-structured fusion model based on feature aggregation for the deraining problem. We argue that by effectively aggregating features, a relatively simple network can still handle tough image deraining problems well. First, to capture the spatial structure of rain we use dilated convolutions as our basic network block. We then design a tree-structured fusion architecture which is deployed within each block (spatial information) and across all blocks (content information). Our method is based on the assumption that adjacent features contain redundant information. This redundancy obstructs the generation of new representations and can be reduced by hierarchically fusing adjacent features. Thus, the proposed model is more compact and can effectively use spatial and content information. Experiments on synthetic and real-world datasets show that our network achieves better deraining results with fewer parameters.




1 Introduction

Rain can severely impair the performance of many computer vision systems, such as road surveillance, autonomous driving and consumer cameras. Effectively removing rain streaks from images is therefore an important task in the computer vision community, and many algorithms have been designed to remove rain streaks from single rainy images. Unlike video-based methods [11, 2, 3, 31, 30, 18, 35, 24, 5], which have useful temporal information, single image deraining is a significantly harder problem. At the same time, success on single images can be directly extended to video, and so single image deraining has received much research attention.

In general, single image deraining methods can be categorized into two classes: model-driven and data-driven. Model-driven methods are designed by using handcrafted image features to describe physical characteristics of rain streaks, or by exploring prior knowledge to constrain the ill-posed problem. In [20], the derained image is obtained by filtering a rainy image with a nonlocal mean smoothing filter. Several model-driven methods adopt various priors to separate rain streaks and content from rainy images. For example, in [19] morphological component analysis based dictionary learning is used to remove rain streaks in high frequency regions. To recognize rain streaks, a self-learning based image decomposition method is introduced in [16]. In [27], based on image patches, discriminative sparse coding is proposed to distinguish rain streaks from non-rain content. In [6, 4], low-rank assumptions are used to model and separate rain streaks. In [33], the authors use a hierarchical scheme combined with dictionary learning to progressively remove rain and snow. In [13], the authors utilize convolutional analysis and synthesis sparse representation to extract rain streaks. In [40], three priors are explored and combined into a joint optimization process for rain removal.

(a) Rainy image
(b) Our result
Figure 1: A deraining example of a real-world image under extreme rainy conditions. Our network contains 35,427 parameters.
Figure 2: The framework of the proposed deep tree-structured fusion model for single image deraining. Our network contains eight dilated convolution blocks; d indicates the dilation factor. The fusion operation is expressed in Equation (2.2).

Recently, data-driven methods using deep learning have dominated high-level vision tasks [14, 17] and low-level image processing [8, 7, 32, 28, 29]. The first deep learning method for rain streak removal was introduced in [9], where the authors use domain knowledge and train the network on high-frequency parts to simplify the learning process. This method was improved in [10] by combining ResNet [14] with a global skip connection. Other methods focus on designing advanced network structures to improve deraining performance. In [36], a recurrent dilated network with multi-task learning is proposed for joint rain streak detection and removal. In [25], a recurrent neural network architecture is adopted and combined with squeeze-and-excitation (SE) blocks [15] for rain removal. In [38], a density-aware multi-stream dense CNN, combined with a generative adversarial network [12, 39], is proposed to jointly estimate rain density and remove rain.

Deep learning methods either incorporate domain knowledge to simplify the learning process with generic networks [9, 10] or design new network architectures for more effective representations [36, 38]. These works do not model the structure of the features themselves for deraining. In this paper, we show that feature fusion can improve single image deraining while reducing the number of parameters, as shown in Figure 1.

In this paper, we propose a deep tree-structured hierarchical fusion model. The proposed tree-structured fusion operation is deployed within each dilated convolutional block and across all blocks, and can explore both spatial and content information. The proposed network is easy to implement by using standard CNN techniques and has far fewer parameters than typical networks for this problem.

2 Proposed method

Figure 2 shows the framework of our proposed hierarchical network. We adopt the multi-scale dilated convolution as the basic operation within each network block to learn multi-scale rain structures. Then, a tree-structured fusion operation within and across blocks is designed to reduce the redundancy of adjacent features. This operation enables the network to better explore and reorganize features in width and depth. The direct output of the network is the residual image, a common modeling technique in existing deraining methods [10, 36] that eases learning. The final derained result is the difference between the rainy image and the estimated residual. We describe our proposed architecture in more detail below.

2.1 Network components

Our proposed network contains three basic network components: one feature extraction layer, eight dilated convolution blocks and one reconstruction layer. The feature extraction layer is designed to extract basic features from the input color image. The operation of this layer is defined as

    F_0 = σ(W_0 ∗ X + b_0),

where X is the input rainy image, F_l is the feature map, l indexes the layer number, ∗ indicates the convolution operation, W and b are the parameters of the convolution, and σ(·) is the non-linear activation.

Different from typical image noise, rain streaks are spatially long. We therefore use dilated convolutions [37] in the basic network block to capture this structure. Dilated convolutions increase the receptive field, enlarging the contextual area while preserving resolution, by dilating the same filter to different scales. To reduce the number of parameters, in each block we use one convolutional kernel with different dilation factors. The multi-scale features within each dilated convolution block are obtained by

    F_l^d = σ(W_l ∗_d F_{l-1} + b_l),

where d is the dilation factor, ∗_d is the convolution with dilation d, F_l^d is the corresponding output feature, and l = 1, ..., L with L the total number of layers. Note that the parameters W_l and b_l are shared across the different dilated convolutions. The multi-scale features are fused through tree-structured aggregation to generate single-scale features F̂_l. (This hierarchical operation is detailed in the following section.) To better propagate information, we use a skip connection to generate the output of each block:

    F_l = F_{l-1} + F̂_l.
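As a concrete illustration of the shared-kernel multi-scale operation, the sketch below applies one 3×3 kernel at several dilation factors in NumPy. The single-channel setting, image size and dilation factors here are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def dilated_conv2d(x, w, dilation):
    """'Same'-padded single-channel 2D convolution with a dilated k x k kernel."""
    k = w.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # tap (i, j) of the kernel, spaced `dilation` pixels apart
            out += w[i, j] * xp[i * dilation:i * dilation + x.shape[0],
                                j * dilation:j * dilation + x.shape[1]]
    return out

# One shared kernel W_l applied at several dilation factors, followed by ReLU.
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
x = rng.standard_normal((16, 16))
multi_scale = [np.maximum(dilated_conv2d(x, w, d), 0) for d in (1, 2, 4)]
```

Because every scale reuses the same weights, adding scales enlarges the receptive field without adding parameters, which is the motivation for sharing W_l and b_l above.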
The reconstruction layer is used to generate the color residual from the previous features. The final result is obtained by

    Y = X − R,

where R and Y are the output residual and the derained image, respectively.

2.2 Tree-structured feature fusion

In this section, we detail our proposed tree-structured feature fusion strategy. In [36], a parallel fusion structure directly adds all feature maps of different dilation factors. In contrast, we design a tree-structured operation that fuses adjacent features, using a 1×1 convolution to let the network perform this fusion automatically. As illustrated in Figure 3, the parallel structure of [36] can be seen as an instance of this tree-structured fusion in which Equation (2.2) is replaced by a summation.
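The structural point of Figure 3 can be sketched abstractly: hierarchically fusing adjacent pairs with a generic pairwise fusion function collapses to the parallel (summation) structure of [36] when the fusion is plain addition. The helper name below is ours, for illustration only:

```python
def tree_fuse(feats, fuse):
    """Hierarchically fuse adjacent features pairwise until one remains."""
    feats = list(feats)
    while len(feats) > 1:
        fused = [fuse(feats[i], feats[i + 1]) for i in range(0, len(feats) - 1, 2)]
        if len(feats) % 2:  # an odd leftover is promoted to the next level
            fused.append(feats[-1])
        feats = fused
    return feats[0]

# When the pairwise fusion is plain addition, the tree reduces to the
# parallel summation structure; with a learned fusion it does not.
add = lambda a, b: a + b
result = tree_fuse([1, 2, 3, 4], add)  # equals sum([1, 2, 3, 4])
```

In the actual network the `fuse` argument is the learned 1×1 convolution of Equation (2.2), so each internal node of the tree produces a new representation rather than a sum.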

(a) Parallel-structured fusion
(b) Tree-structured fusion
Figure 3: Comparison of fusion strategies with JORDER [36]. Red circles denote the fusion operation (2.2). If all red circles are replaced by summation, (b) becomes (a).

We adopt and deploy this hierarchical feature fusion within each basic block and across the entire network. This allows for information propagation similar to ResNet [14] and information fusion similar to DenseNet [17]. It also provides a sparser structure that reduces parameters and memory usage. The fusion operation is defined as

    F_fused = σ(W_f ∗ [F_a, F_b] + b_f),

where F_a and F_b are adjacent features that have the same dimensions, [F_a, F_b] denotes their concatenation, and W_f is a kernel of size 1×1 that fuses F_a and F_b. After fusion, the generated F_fused has the same dimensions as F_a and F_b. As shown in Figure 2, by employing this fusion operation within each block and across all blocks, the network has a tree-structured representation in both width and depth. We design this strategy based on the assumption that adjacent features contain redundant information.
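A minimal NumPy sketch of this fusion operation, assuming channels-last feature maps; the shapes and random initialization are illustrative, not the trained values:

```python
import numpy as np

def fuse(fa, fb, wf, bf):
    """Concatenate two same-shaped feature maps along channels, then mix
    them with a 1x1 convolution (a per-pixel linear map) and a ReLU."""
    cat = np.concatenate([fa, fb], axis=-1)  # (H, W, 2C)
    fused = cat @ wf + bf                    # 1x1 conv == matmul over channels
    return np.maximum(fused, 0)              # (H, W, C), same shape as inputs

rng = np.random.default_rng(1)
C = 16
fa = rng.standard_normal((8, 8, C))
fb = rng.standard_normal((8, 8, C))
wf = rng.standard_normal((2 * C, C)) / np.sqrt(2 * C)  # 1x1 kernel weights
bf = np.zeros(C)
fused = fuse(fa, fb, wf, bf)
```

Because a 1×1 convolution acts independently at each spatial position, it can only recombine channels; this is what lets the network learn which parts of the two redundant inputs to keep.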

(a) Feature maps (d = 1)
(b) Feature maps (d = 2)
(c) Directly adding (a) and (b)
(d) Fusing (a) and (b)
Figure 4: One example of fusion within the 1st block. The fused features are generated using Equation (2.2). Yellow rectangles indicate that the fused features (d) contain more effective representations of spatial information, giving a significant enhancement of object details and edges.

To illustrate the value of this fusion operation, we show the learned within-block feature maps in Figure 4. In Figures 4(a) and (b) we show the adjacent feature maps generated with dilation factors d = 1 and d = 2 in the first block. As can be seen, the two feature maps have a similar appearance and thus contain redundant information. Figure 4(d) shows the fused feature maps using Equation (2.2). It is clear that these fused features are more informative: both rain and object details are highlighted.

(a) 3rd block features
(b) 4th block features
(c) Directly adding (a) and (b)
(d) Fusing (a) and (b)
Figure 5: One example of the fusion effect across blocks. The fused features are generated using Equation (2.2). Yellow rectangles indicate that the fused features (d) contain more effective representations of content information, with relatively high contrast for recognizing objects.

The tree-structured fusion is also used across blocks, which we illustrate in Figure 5 for the third and fourth blocks. As can be seen in Figures 5(a) and (b), the two feature maps have similar content, so features from adjacent blocks are still redundant. As shown in Figure 5(d), compared with the two input features, the fused feature maps not only remain similar in their high-frequency content, but also generate new representations. Moreover, compared with direct addition, shown in Figures 4(c) and 5(c), the proposed hierarchical fusion generates more effective spatial and content representations.

We show a statistical analysis of this redundancy in Figure 6. Figures 6(a) and (b) show statistics of the difference between adjacent features generated by different dilation factors. It is clear that adjacent features are similar, indicating a duplication of information, as also illustrated in Figures 4(a) and (b). This redundancy also exists across blocks, as shown in Figures 6(c) and (d). This is because, for this regression task, the resolution of the feature maps at deeper layers is the same as that of the input image, meaning deeper features show no significant change in global content [10, 36, 25]. This is in contrast to high-level vision problems that require pooling operations to extract high-level semantic information [23, 14]. To a certain extent the redundancy in global content will persist as the network deepens, motivating fusion in this direction as well. The corresponding fused features show a significant change in Figure 6: the average appears shifted and the standard deviation becomes larger, indicating that the fused features remove redundant spatial information, as also illustrated in Figures 4(d) and 5(d).
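The error-bar statistics in Figure 6 amount to summarizing the element-wise difference between two adjacent feature maps. A small sketch, with synthetic features standing in for real network activations:

```python
import numpy as np

def diff_stats(fa, fb):
    """Mean and standard deviation of the difference between two feature maps.
    A mean near zero with a small spread indicates duplicated information;
    after fusion the mean shifts and the spread widens."""
    d = fa - fb
    return float(d.mean()), float(d.std())

rng = np.random.default_rng(2)
base = rng.standard_normal((8, 8))
fa = base + 0.05 * rng.standard_normal((8, 8))  # two nearly identical
fb = base + 0.05 * rng.standard_normal((8, 8))  # "adjacent" features
mean_adj, std_adj = diff_stats(fa, fb)          # small spread: redundancy
```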

(a) Within the 1st block
(b) Within the 3rd block
(c) Across 1st and 2nd blocks
(d) Across 5th and 6th blocks
Figure 6: Error bars verifying the redundancy in adjacent features.

2.3 Loss Function

The most widely used loss function for training a network is the mean squared error (MSE). However, MSE usually generates over-smoothed results because of its squared penalty. To address this drawback, we combine the MSE with the SSIM [34] as our loss function to balance rain removal and detail preservation:

    L = L_MSE + λ L_SSIM,        (6)

where L_MSE and L_SSIM are the MSE and SSIM losses, respectively, computed over the N training pairs and their ground truth images, and λ is the parameter that balances the two losses.
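A sketch of the combined objective. For brevity it uses a simplified whole-image SSIM (no sliding window, unlike the windowed SSIM of [34]) and an illustrative balance weight, since the paper's exact λ is not stated here:

```python
import numpy as np

def mse(y, t):
    return float(np.mean((y - t) ** 2))

def ssim_global(y, t, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM computed over the whole image instead of local windows."""
    mu_y, mu_t = y.mean(), t.mean()
    var_y, var_t = y.var(), t.var()
    cov = float(np.mean((y - mu_y) * (t - mu_t)))
    return ((2 * mu_y * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_y ** 2 + mu_t ** 2 + c1) * (var_y + var_t + c2))

def total_loss(y, t, lam=0.1):  # lam is an illustrative value, not the paper's
    # SSIM is a similarity in [-1, 1], so (1 - SSIM) turns it into a loss.
    return mse(y, t) + lam * (1.0 - ssim_global(y, t))
```

The MSE term pushes toward small pixel errors (high PSNR) while the SSIM term preserves local structure; λ trades the two off.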

2.4 Parameter settings and training details

The fusion kernels in Equation (2.2) have size 1×1 and the remaining kernels are 3×3. The number of feature maps is 16 for all convolutions and the non-linear activation σ is the ReLU [23]. The dilation factor increases within each block, and we found eight dilated convolution blocks are enough to generate good results, giving two convolutional layers and eight blocks as shown in Figure 2. All network components can be built using standard CNN techniques. Other activation functions and network structures, such as xUnit [22] and recurrent architectures [25], can be directly embedded.

We use TensorFlow [1] and Adam [21] with a mini-batch size of 10 to train our network. We initialize the learning rate to 0.001, divide it by 10 at 100K and 200K iterations, and terminate training after 300K iterations. We randomly select patch pairs from the training image datasets as inputs. All experiments are performed on a server with an Intel(R) Xeon(R) CPU E5-2683, 64GB RAM and an NVIDIA GTX 1080. The network is trained end-to-end.
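The stated learning-rate schedule is a simple piecewise-constant function of the iteration count:

```python
def learning_rate(step):
    """Initial rate 1e-3, divided by 10 at 100K and again at 200K iterations."""
    lr = 1e-3
    if step >= 100_000:
        lr /= 10
    if step >= 200_000:
        lr /= 10
    return lr
```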

3 Experiments

We compare our network with two model-based deraining methods: the Gaussian Mixture Model (GMM) [26] and Joint Convolutional Analysis and Synthesis (JCAS) [13]. We also compare with four deep learning-based methods: Deep Detail Network (DDN) [10], JOint Rain DEtection and Removal (JORDER) [36], Density-aware Image Deraining (DID) [38] and REcurrent Squeeze-and-excitation Context Aggregation Net (RESCAN) [25]. All methods are retrained for a fair and meaningful comparison. (Due to retraining and different data, the quantitative results may differ from those reported in the corresponding articles.)

| Dataset (SSIM / PSNR) | GMM | JCAS | DDN | JORDER | DID | RESCAN | Ours |
| Rain100H | 0.43 / 15.05 | 0.51 / 15.23 | 0.81 / 26.88 | 0.84 / 26.54 | 0.83 / 26.12 | 0.85 / 26.45 | 0.88 / 27.46 |
| Rain1400 | 0.83 / 26.53 | 0.85 / 26.80 | 0.89 / 29.99 | 0.90 / 28.90 | 0.90 / 29.84 | 0.91 / 31.18 | 0.92 / 31.32 |
| Rain1200 | 0.80 / 22.46 | 0.81 / 25.16 | 0.86 / 30.95 | 0.87 / 29.75 | 0.90 / 29.65 | 0.89 / 32.35 | 0.92 / 32.30 |
| Parameters # | - | - | 58,175 (-39%) | 369,792 (-90%) | 372,839 (-90%) | 54,735 (-35%) | 35,427 |
Table 1: Average SSIM and PSNR values on synthesized images. Numbers in parentheses indicate the parameter reduction of our model relative to each method.
| Image size | GMM (CPU) | JCAS (CPU) | DDN (CPU / GPU) | JORDER (CPU / GPU) | DID (CPU / GPU) | RESCAN (CPU / GPU) | Ours (CPU / GPU) |
| 512 × 512 | 1.9910 | 0.9710 | 1.51 / 0.16 | 2.9510 / 0.18 | 7.26 / 0.31 | 6.29 / 0.13 | 1.94 / 0.16 |
| 1024 × 1024 | 6.5210 | 6.5710 | 5.40 / 0.32 | 1.2010 / 0.82 | 1.7810 / 0.78 | 1.2310 / 0.28 | 7.51 / 0.28 |
Table 2: Comparison of running time (seconds).

3.1 Synthetic data

We use three public synthetic datasets provided by JORDER [36], DDN [10] and DID [38]. These three datasets were generated using different synthetic strategies. The JORDER dataset contains 100 testing images with heavy rain streaks; the other two contain 1400 and 1200 testing images, respectively. We refer to them as Rain100H, Rain1400 and Rain1200 below.

(a) Clean SSIM
(b) Input 0.81
(c) GMM 0.90
(d) JCAS 0.90
(e) DDN 0.89
(f) JORDER 0.91
(g) DID 0.91
(h) RESCAN 0.93
(i) Ours 0.93
(a) Clean SSIM
(b) Input 0.35
(c) GMM 0.41
(d) JCAS 0.49
(e) DDN 0.78
(f) JORDER 0.81
(g) DID 0.82
(h) RESCAN 0.79
(i) Ours 0.85
Figure 7: One visual comparison on the ‘Rain1400’ dataset.
Figure 8: One visual comparison on the ‘Rain100H’ dataset.
(a) Clean SSIM
(b) Input 0.29
(c) GMM 0.45
(d) JCAS 0.51
(e) DDN 0.82
(f) JORDER 0.86
(g) DID 0.91
(h) RESCAN 0.89
(i) Ours 0.91
Figure 9: One visual comparison on the ‘Rain1200’ dataset.

Figures 7 to 9 show three visual results, one from each dataset, with different rain orientations and magnitudes. It is clear that GMM and JCAS fail due to modeling limitations. As shown in the red rectangles, DDN and JORDER are able to remove the rain streaks but tend to generate artifacts. DID has good deraining performance while slightly blurring edges, as shown in Figures 9(e) and (g). RESCAN and our model have similar global visual performance and outperform the other methods. We also report PSNR and SSIM for quantitative evaluation in Table 1. Our method has the best overall results on both PSNR and SSIM, indicating that our tree-structured fusion can better represent spatial information. Moreover, our network contains the fewest parameters, which makes it more suitable for practical applications.

3.2 Real-world data

We also show that our learned network, which is trained on synthetic data, translates well to real-world rainy images. Figures 10 and 11 show two typical visual results on real-world images. The red rectangles indicate that our network can simultaneously remove rain and preserve details.

(a) Input
(b) GMM
(c) JCAS
(d) DDN
(f) DID
(h) Ours
(a) Input
(b) GMM
(c) JCAS
(d) DDN
(f) DID
(h) Ours
Figure 10: Visual comparisons on real-world rainy images.
Figure 11: Visual comparisons on real-world rainy images.

We further collect 300 real-world rainy images from the Internet and existing articles [10, 36, 38] as a new dataset. (We will release our code and this new dataset.) We then asked participants to conduct a user study for realistic feedback. Participants were told to rank each derained result, with results randomly presented without knowledge of the corresponding algorithm; the lowest score represents the worst quality and the highest score the best quality. We show the scatter plots in Figure 12, where our network has the best performance. This provides additional support for our tree-structured fusion model.

Figure 12: Scatter plots of rainy inputs vs. derained user scores. The global mean and standard deviation are also shown. (Best viewed zoomed in.)

3.3 Running time

We show the average running time of different methods in Table 2. Two image sizes are chosen and each is averaged over 100 images. GMM and JCAS are implemented on CPU according to the provided code, while the deep learning-based methods are tested on both CPU and GPU. GMM has the slowest running time since complicated inference is still required to process each new image. Our method has GPU running time comparable with other deep learning methods and is significantly faster than several deep models on a CPU. This is because our network is straightforward, without extra operations such as recurrent structures [36, 38].

3.4 Ablation study

We provide an ablation study to demonstrate the advantage of each part of our network structure.

3.4.1 Fusion deployment

To validate our tree-structured feature fusion strategy, we design two variants of the proposed network for exhaustive comparison: one that only uses the fusion operation within each block, and one that only uses the fusion operation across all blocks. This experiment is trained and tested on the Rain100H dataset with JORDER as the baseline. Table 3 shows the SSIM performance on Rain100H. Compared with JORDER [36], the within-block variant brings a performance increase of 1.56%, while the across-block variant leads to a 3.35% improvement. Large receptive fields help to capture rain streaks over larger areas, a crucial factor for the image deraining problem; however, deploying fusion operations across blocks brings more benefit when building very deep models, since more and richer content information is generated. By combining both hierarchical structures into a single final network (our proposed model), the highest SSIM value is obtained.

| | JORDER | Within-block fusion only | Across-block fusion only | Final |
| SSIM | 0.835 | 0.848 | 0.863 | 0.877 |
Table 3: Ablation study on fusion deployment.
| | Fewer blocks | Default blocks (8) | More blocks |
| Smaller dilation range | 0.823 | 0.857 | 0.863 |
| Default dilation range | 0.841 | 0.877 | 0.879 |
| Larger dilation range | 0.846 | 0.881 | 0.882 |
Table 4: Ablation study (SSIM) on dilation factors and block numbers.

3.4.2 Dilation factor versus block number

We test the impact of the dilation factor and the block number on the Rain100H dataset, varying the range of dilation factors and the number of basic blocks. The SSIM results are shown in Table 4. As can be seen, increasing the dilation and the number of blocks generates higher SSIM values. Moreover, larger dilations result in larger receptive fields, which gives a greater advantage than increasing the number of blocks. However, increasing the dilation factor and the block number eventually brings only limited improvement at the cost of slower speed. Thus, to balance performance and speed, we choose eight blocks with the default dilation setting.

3.4.3 Parameter reduction

We next design an experiment that constrains all deep learning-based methods to a similar number of parameters, keeping the respective network structures unchanged. Table 5 shows the parameter counts and a quantitative comparison on the Rain100H dataset. The improvement of our model becomes more significant when all methods have a similar number of parameters: our combination of dilated convolutions with a tree-structured fusion process can more efficiently represent and remove rain with a relatively lightweight network architecture.

| | DDN | JORDER | DID | RESCAN | Ours |
| SSIM | 0.79 | 0.81 | 0.82 | 0.84 | 0.88 |
| PSNR | 25.14 | 25.47 | 25.65 | 26.17 | 27.46 |
| Parameters # | 33,267 | 36,528 | 35,812 | 34,790 | 35,427 |
Table 5: Quantitative comparisons under parameter reduction on Rain100H.

3.4.4 Loss function

We also test the impact of using the MSE and SSIM loss functions. Figure 13 shows one visual comparison on the Rain100H dataset. As shown in Figure 13(b), using only the MSE loss generates an overly smooth image with obvious artifacts, because the squared penalty over-penalizes larger errors, which tend to occur at edges. SSIM focuses on structural similarity and is appropriate for preserving details; however, the result has low contrast, as shown in Figure 13(c), because image contrast is related to low-frequency information, which SSIM does not penalize as heavily. Using the combined loss in Equation (6) further improves the performance. In Table 6, we show quantitative evaluations for the different loss functions. Since PSNR is a monotonic function of MSE, each quantitative measure naturally favors the objective function that directly optimizes it. Figure 13 shows that balancing the two results in a better image.
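The remark that PSNR is a function of MSE can be made explicit: for a fixed peak value, PSNR is strictly decreasing in MSE, so minimizing MSE maximizes PSNR. A short sketch for images in [0, 1]:

```python
import numpy as np

def psnr(y, t, peak=1.0):
    """PSNR in dB; strictly decreasing in MSE."""
    err = np.mean((y - t) ** 2)
    return 10.0 * np.log10(peak ** 2 / err)

rng = np.random.default_rng(3)
t = rng.random((32, 32))
near = np.clip(t + 0.01 * rng.standard_normal(t.shape), 0, 1)  # small error
far = np.clip(t + 0.10 * rng.standard_normal(t.shape), 0, 1)   # large error
```

This is why the MSE-trained network wins on PSNR while the SSIM-trained one wins on SSIM in Table 6.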

| Dataset (SSIM / PSNR) | MSE loss | SSIM loss | MSE + SSIM (Eq. 6) |
| Rain100H | 0.85 / 28.63 | 0.88 / 25.47 | 0.88 / 27.46 |
| Rain1400 | 0.91 / 33.24 | 0.92 / 30.41 | 0.92 / 31.32 |
| Rain1200 | 0.90 / 33.21 | 0.93 / 30.13 | 0.92 / 32.30 |
Table 6: Quantitative comparisons for different loss functions.
(a) Input
(b) MSE loss
(c) SSIM loss
(d) Eq. (6)
Figure 13: An example using different losses. The SSIM + MSE loss generates a clean result with good global contrast.
(a) ResNet (shallow)
(b) DenseNet (shallow)
(c) ResNet (deep)
(d) DenseNet (deep)
(e) Ours
Figure 14: Visualizations of feature maps of last convolutional layers of ResNet [14], DenseNet [17] and our model.

3.5 Comparison with ResNet and DenseNet

Finally, to further support our tree-structured feature fusion strategy, we compare with two popular general network architectures, ResNet [14] and DenseNet [17], which are also designed with feature aggregation in mind. We use the same hyper-parameter settings (16 feature maps per convolution) and the same loss function (6). We remove all pooling operations for this image regression problem, and performance is evaluated on the Rain100H dataset.

First, we build relatively shallow models based on ResNet (10 convolutional layers) and DenseNet (3 dense blocks) so that all networks have roughly the same number of parameters. Then, we increase the depth of ResNet (82 convolutional layers) and DenseNet (20 dense blocks) to construct deeper models, a common means of significantly increasing model capacity and improving performance [32]. As shown in Table 7, our model significantly outperforms the other two models under similar parameter settings. Moreover, the deeper models achieve only marginal improvement at the cost of increased parameters and computational burden.

We also show a visual comparison of the feature maps of the last convolutional layers in Figure 14. Intuitively, deeper networks can generate richer representations due to more nonlinear operations and information propagation [17, 32]. However, as shown in Figure 14, even compared to the deepest features, our relatively shallow tree-structured model generates more distinguishing features: both rain streaks and objects are represented and highlighted well. We believe this is because, as ResNet and DenseNet deepen, direct one-way feature propagation can only reduce a limited amount of redundancy, which hinders the generation of new features. Our model, on the other hand, reduces feature redundancy hierarchically using a tree-structured representation, which, as seen in models such as wavelet trees, is a good way to represent redundant information within an image.

| | ResNet (shallow) | ResNet (deep) | DenseNet (shallow) | DenseNet (deep) | Ours |
| SSIM | 0.84 | 0.89 | 0.85 | 0.90 | 0.88 |
| PSNR | 25.71 | 28.03 | 25.87 | 28.24 | 27.46 |
| Parameters # | 38,003 | 186,483 | 35,683 | 232,883 | 35,427 |
Table 7: Comparisons with ResNet and DenseNet on Rain100H.

4 Conclusion

We have introduced a deep tree-structured fusion model for single image deraining. By using a simple feature fusion operation in the network, both spatial and content information are fused to reduce redundant information. These new, compact fused features lead to a significant improvement on both synthetic and real-world rainy images with a significantly reduced number of parameters. We anticipate that this tree-structured framework can improve other vision and image restoration tasks and plan to investigate this in future work.


  • [1] M. Abadi, A. Agarwal, P. Barham, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
  • [2] P. C. Barnum, S. Narasimhan, and T. Kanade. Analysis of rain and snow in frequency space. Int’l. J. Computer Vision, 86(2):256–274, 2010.
  • [3] J. Bossu, N. Hautiere, and J. P. Tarel. Rain or snow detection in image sequences through use of a histogram of orientation of streaks. Int’l. J. Computer Vision, 93(3):348–367, 2011.
  • [4] Y. Chang, L. Yan, and S. Zhong. Transformed low-rank model for line pattern noise removal. In ICCV, 2017.
  • [5] J. Chen, C. Tan, J. Hou, L. Chau, and H. Li. Robust video content alignment and compensation for rain removal in a CNN framework. In CVPR, 2018.
  • [6] Y. L. Chen and C. T. Hsu. A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In ICCV, 2013.
  • [7] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell., 38(2):295–307, 2016.
  • [8] D. Eigen, D. Krishnan, and R. Fergus. Restoring an image taken through a window covered with dirt or rain. In ICCV, 2013.
  • [9] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley. Clearing the skies: A deep network architecture for single-image rain removal. IEEE Trans. Image Process., 26(6):2944–2956, 2017.
  • [10] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley. Removing rain from single images via a deep detail network. In CVPR, 2017.
  • [11] K. Garg and S. K. Nayar. Vision and rain. Int’l. J. Computer Vision, 75(1):3–27, 2007.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
  • [13] S. Gu, D. Meng, W. Zuo, and L. Zhang. Joint convolutional analysis and synthesis sparse representation for single image layer separation. In ICCV, 2017.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [15] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In CVPR, 2018.
  • [16] D. A. Huang, L. W. Kang, Y. C. F. Wang, and C. W. Lin. Self-learning based image decomposition with applications to single image denoising. IEEE Trans. Multimedia, 16(1):83–93, 2014.
  • [17] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
  • [18] T.-X. Jiang, T.-Z. Huang, X.-L. Zhao, L.-J. Deng, and Y. Wang. A novel tensor-based video rain streaks removal approach via utilizing discriminatively intrinsic priors. In CVPR, 2017.
  • [19] L. W. Kang, C. W. Lin, and Y. H. Fu. Automatic single image-based rain streaks removal via image decomposition. IEEE Trans. Image Process., 21(4):1742–1755, 2012.
  • [20] J. H. Kim, C. Lee, J. Y. Sim, and C. S. Kim. Single-image deraining using an adaptive nonlocal means filter. In IEEE ICIP, 2013.
  • [21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2014.
  • [22] I. Kligvasser, T. Rott Shaham, and T. Michaeli. xUnit: Learning a spatial activation function for efficient image restoration. In CVPR, 2018.
  • [23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • [24] M. Li, Q. Xie, Q. Zhao, W. Wei, S. Gu, J. Tao, and D. Meng. Video rain streak removal by multiscale convolutional sparse coding. In CVPR, 2018.
  • [25] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, 2018.
  • [26] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown. Rain streak removal using layer priors. In CVPR, 2016.
  • [27] Y. Luo, Y. Xu, and H. Ji. Removing rain from a single image via discriminative sparse coding. In ICCV, 2015.
  • [28] J. Pan, S. Liu, D. Sun, et al. Learning dual convolutional neural networks for low-level vision. In CVPR, 2018.
  • [29] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu. Attentive generative adversarial network for raindrop removal from a single image. In CVPR, 2018.
  • [30] W. Ren, J. Tian, Z. Han, A. Chan, and Y. Tang. Video desnowing and deraining based on matrix decomposition. In CVPR, 2017.
  • [31] V. Santhaseelan and V. K. Asari. Utilizing local phase information to remove rain from video. Int’l. J. Computer Vision, 112(1):71–89, 2015.
  • [32] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
  • [33] Y. Wang, S. Liu, C. Chen, and B. Zeng. A hierarchical approach for rain or snow removing in a single color image. IEEE Trans. Image Process., 26(8):3936–3950, 2017.
  • [34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612, 2004.
  • [35] W. Wei, L. Yi, Q. Xie, Q. Zhao, D. Meng, and Z. Xu. Should we encode rain streaks in video as deterministic or stochastic? In ICCV, 2017.
  • [36] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan. Deep joint rain detection and removal from a single image. In CVPR, 2017.
  • [37] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2015.
  • [38] H. Zhang and V. Patel. Density-aware single image de-raining using a multi-stream dense network. In CVPR, 2018.
  • [39] H. Zhang, V. Sindagi, and V. M. Patel. Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957, 2017.
  • [40] L. Zhu, C. W. Fu, D. Lischinski, and P. A. Heng. Joint bi-layer optimization for single-image rain streak removal. In ICCV, 2017.