Lightweight Pyramid Networks for Image Deraining

05/16/2018 ∙ by Xueyang Fu, et al. ∙ Xiamen University

Existing deep convolutional neural networks have found major success in image deraining, but at the expense of an enormous number of parameters. This limits their potential application, for example, on mobile devices. In this paper, we propose a lightweight pyramid of networks (LPNet) for single image deraining. Instead of designing complex network structures, we use domain-specific knowledge to simplify the learning process. Specifically, we find that by introducing the mature Gaussian-Laplacian image pyramid decomposition technique into the neural network, the learning problem at each pyramid level is greatly simplified and can be handled by a relatively shallow network with few parameters. We adopt recursive and residual network structures to build the proposed LPNet, which has fewer than 8K parameters while still achieving state-of-the-art performance on rain removal. We also discuss the potential value of LPNet for other low- and high-level vision tasks.


I Introduction

As a common weather condition, rain impacts not only human visual perception but also computer vision systems, such as self-driving vehicles and surveillance systems. Due to the effects of light refraction and scattering, objects in an image are easily blurred and blocked by individual rain streaks. In heavy rain, this problem becomes more severe due to the increased density of rain streaks. Since most existing computer vision algorithms are designed under the assumption of clear inputs, their performance is easily degraded by rainy weather. Thus, designing effective and efficient algorithms for rain streak removal is a significant problem with many downstream uses. Figure 1 shows an example of our lightweight pyramid network.

I-A Related works

Fig. 1: A deraining example from our LPNet for single image deraining: (a) rainy image; (b) our result. The whole network contains only 7,548 parameters.
Fig. 2: The proposed structure of our deep lightweight pyramid of networks (LPNet), based on Gaussian-Laplacian image pyramids. The bottom level of the reconstructed Gaussian pyramid is the final derained image.

Depending on the input data, rain removal algorithms can be categorized into video-based and single-image-based methods.

I-A1 Video-based methods

We first briefly review rain removal methods for video, which were the major focus in the early stages of this problem. These methods use both spatial and temporal information from the video. The first study on video deraining removed rain from a static background using average intensities from neighboring frames [1]. Other methods focus on deraining in the Fourier domain [2], using Gaussian mixture models [3], low-rank approximations [4] and matrix completion [5]. In [6], the authors divide rain streaks into sparse and dense ones, and a matrix-decomposition-based algorithm is then proposed for deraining. More recently, [7] proposed a patch-based mixture of Gaussians for rain removal in video. Though these methods work well, they require the temporal content of video. In this paper, we instead focus on the single image deraining problem.

I-A2 Single-image methods

Since information is drastically reduced in individual images, single image deraining is a much more difficult problem. Methods for addressing this problem have employed kernels [8], low-rank approximations [4, 9] and dictionary learning [10, 11, 12, 13]. In [8], rain streaks are detected and removed using kernel regression and non-local means filtering. In [10], the authors decompose a rainy image into its low- and high-frequency components; the high-frequency part is processed to extract and remove rain streaks using sparse-coding-based dictionary learning. In [11], a self-learning method is proposed to automatically distinguish rain streaks in the high-frequency part. A discriminative sparse coding method is proposed in [12]: by forcing the coefficient vector of the rain layer to be sparse, the objective function is solved to separate the background and rain streaks. Other methods have used mixture models [14] and local gradients [15] to model and then remove rain streaks. In [14], by utilizing Gaussian mixture models (GMMs), the authors explore patch-based priors for both the clean and rain layers. The GMM prior for the background layer is learned from natural images, while that for the rain streak layer is learned from rainy images. In [15], three new priors are defined by exploring local image gradients. The priors are used to model the objective function, which is solved using the alternating direction method of multipliers (ADMM).

Deep learning has also been introduced for this problem. Convolutional neural networks (CNNs) have proven useful for a variety of high-level vision tasks [16, 17, 18, 19, 20] as well as various image processing problems [21, 22, 23, 24, 25]. In [26], a related work based on deep learning was introduced to remove static raindrops and dirt spots from pictures taken through windows. Our previous CNN-based method for removing dynamic rain streaks was introduced in [27], where a relatively shallow network with 3 layers is built to extract features of rain streaks from the high-frequency content of a rainy image. Based on the introduction of an effective strategy for training very deep networks [20], two deeper networks were proposed based on image residuals [28] and multi-scale information [29]. In [30], the authors utilize the generative adversarial framework to further enhance textures and improve the visual quality of derained results. Recently, in [31], a density-aware multi-stream densely connected CNN was proposed for joint rain density estimation and deraining. This method can automatically generate a rain density label, which is further utilized to guide rain streak removal.

I-B Our contributions

Though very deep networks achieve excellent performance on single image deraining, a main drawback that potentially limits their application in mobile devices, automated driving, and other computer vision tasks is their huge number of parameters: as networks become deeper, more storage space is required [32]. To address this issue, we propose a lightweight pyramid network (LPNet), which contains fewer than 8K parameters, with the single image rain removal problem in mind. Instead of designing a complex network structure, we use problem-specific knowledge to simplify the learning process. Specifically, we first adopt Laplacian pyramids to decompose a degraded/rainy image into different levels. Then we use recursive and residual networks to build a sub-network for each level to reconstruct the Gaussian pyramid of the derained image. A specific loss function is selected for training each sub-network according to its own physical characteristics, and the whole training is performed with multi-task supervision. The final recovered image is the bottom level of the reconstructed Gaussian pyramid.

The main feature of our LPNet approach is to use the mature Gaussian-Laplacian image pyramid technique [33] to transform one hard problem into several easier sub-problems. In other words, since the Laplacian pyramid contains different levels that separate large-scale edges from small-scale details, one can design a simple and lightweight sub-network to handle each level in a divide-and-conquer way. The contributions of our paper are summarized as follows:

  1. We show that by combining the classical Gaussian-Laplacian pyramid technique with CNNs, a simple network structure with few parameters and relatively shallow depth is sufficient for excellent performance. To our knowledge, the resulting network is by far the most lightweight (in terms of parameters) among deep networks with comparable performance.

  2. Through multi-scale techniques and recursive and residual deep learning, our proposed network achieves state-of-the-art performance on single image deraining. Although LPNet is trained on synthetic data by necessity, it still generalizes well to real-world images.

  3. We discuss how LPNet can be applied to other fundamental low- and high-level vision tasks in image processing. We also show how LPNet can improve downstream applications such as object recognition.

II Lightweight pyramid network for deraining

In Figure 2, we show our proposed LPNet for single image deraining. At a high level, we first decompose a rainy image into a Laplacian pyramid and build a sub-network for each pyramid level. Each sub-network is then trained with its own loss function according to the specific physical characteristics of the data at that level. The network outputs a Gaussian pyramid of the derained image, and the final derained result is the bottom level of that Gaussian pyramid.

II-A Motivation

Since rain streaks are blended with object edges and the background scene, it is hard to learn the deraining function directly in the image domain [27]. To simplify the problem, it is natural to train a network on the high-frequency information in images, which primarily contains rain streaks and edges without background interference. Based on this motivation, the authors in [27, 28] use the guided filter [34] to obtain the high-frequency component of an image as the input to a deep network, which is then derained and fused back with the low-frequency information of the same image. However, these two methods fail when very thick rain streaks cannot be extracted by the guided filter. Inspired by this decomposition idea, we instead build a lightweight pyramid of networks to simplify the learning process and, as a result, reduce the number of necessary parameters.

II-B Stage 1: The Laplacian pyramid

We first decompose a rainy image $X$ into its Laplacian pyramid, which is a set of images $\{L_1(X), \dots, L_N(X)\}$ with $N$ levels:

$L_n(X) = G_n(X) - \mathrm{upsample}\big(G_{n+1}(X)\big), \quad n = 1, \dots, N-1, \qquad (1)$

where $G_n(X)$ is the Gaussian pyramid, $n = 1, \dots, N$. Each level $G_{n+1}(X)$ is computed by downsampling $G_n(X)$ using a Gaussian kernel, with $G_1(X) = X$ and $L_N(X) = G_N(X)$.
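For concreteness, the following is a minimal NumPy sketch of this decomposition. The 5-tap binomial smoothing kernel matches the fixed kernel described in Section II-F, while the zero-insertion upsampling and all function names are our own illustrative choices, not the authors' released code.

```python
# A minimal sketch of the Gaussian-Laplacian pyramid decomposition in Eq. (1).
import numpy as np
from scipy.ndimage import convolve

# 2D smoothing kernel: outer product of the classic 1D binomial filter.
k1d = np.array([0.0625, 0.25, 0.375, 0.25, 0.0625])
kernel2d = np.outer(k1d, k1d)

def blur(img):
    # Gaussian filtering, applied independently to each color channel.
    return np.stack([convolve(img[..., c], kernel2d, mode='reflect')
                     for c in range(img.shape[-1])], axis=-1)

def downsample(img):
    return blur(img)[::2, ::2]      # smooth, then keep every other pixel

def upsample(img, shape):
    # Zero-insertion upsampling followed by smoothing (x4 keeps intensity).
    up = np.zeros((shape[0], shape[1], img.shape[-1]))
    up[::2, ::2] = img
    return blur(up) * 4.0

def laplacian_pyramid(x, levels=5):
    gauss = [x]
    for _ in range(levels - 1):
        gauss.append(downsample(gauss[-1]))
    # L_n(X) = G_n(X) - upsample(G_{n+1}(X)); the top level is G_N itself.
    lap = [g - upsample(g1, g.shape) for g, g1 in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])
    return lap
```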

The reasons we choose the classical Laplacian pyramid to decompose the rainy image are fourfold: 1) The background scene can be fully extracted at the top level of the pyramid, while the other levels contain rain streaks and details at different spatial scales. Thus, the rain interference is removed and each sub-network only needs to deal with high-frequency components at a single scale. 2) This decomposition strategy allows the network to take advantage of the sparsity at each level, which motivates many other deraining methods [8, 11, 27], to simplify the learning problem. However, unlike previous deraining methods that use a single-scale decomposition, LPNet performs a multi-scale decomposition using Laplacian pyramids. 3) As shown in Figure 3, compared with the image domain, deep learning at each pyramid level is closer to an identity mapping (e.g., the top row is more similar to the middle row, as evident in the bottom row), which is known to be the situation where residual learning (ResNet) excels [20]. 4) The Laplacian pyramid is a mature algorithm with low computational cost. Most calculations are convolutions (Gaussian filtering), which can be easily embedded into existing systems with GPU acceleration.

Fig. 3: An example of the Laplacian pyramid (three levels shown; the 3rd and 5th levels are enlarged for better visualization). Top row: a rainy image and its pyramid levels; middle row: the clean image and its pyramid levels; bottom row: histograms of the residuals between the two rows, demonstrating the increased sparsity over the image domain.
Fig. 4: Statistical histogram distributions of 200 clean and rainy pairs from [29], shown for (a) the image domain and (b)-(f) the 5th through 1st pyramid levels. To highlight the tail error, (c)-(f) are logarithmically transformed.

II-C Stage 2: Sub-network structure

After decomposing $X$ into different pyramid levels, we build a set of sub-networks independently, one for each level, to predict the corresponding clean Gaussian pyramid $\{\hat{G}_n(Y)\}$. All the sub-networks have the same structure but different numbers of kernels. We adopt residual learning [20] for each network structure and recursive blocks [35] to reduce parameters. The sub-network structure can be expressed as follows:

Feature extraction

The first layer extracts features from the $n$th input level $L_n(X)$:

$H^n_0 = \sigma\big(W^n_0 * L_n(X) + b^n_0\big), \qquad (2)$

where $H^n_0$ denotes the extracted feature maps of level $n$, $*$ is the convolution operation, $W^n_0$ are weights and $b^n_0$ are biases; $\sigma(\cdot)$ is an activation function for non-linearity.

Recursive block

To reduce the number of parameters, we build the intermediate inference layers in a recursive fashion. The basic idea is to share parameters among recursive blocks. Motivated by our experiments, we adopt three convolutional operations in each recursive block. The calculations in the $t$th recursive block are

$H^n_{t,1} = \sigma\big(W^n_1 * H^n_{t-1} + b^n_1\big), \qquad (3)$
$H^n_{t,2} = \sigma\big(W^n_2 * H^n_{t,1} + b^n_2\big), \qquad (4)$
$H^n_{t,3} = \sigma\big(W^n_3 * H^n_{t,2} + b^n_3\big), \qquad (5)$

where $H^n_{t,1}$, $H^n_{t,2}$ and $H^n_{t,3}$ are intermediate features in the $t$th recursive block, $\{W^n_{1,2,3}, b^n_{1,2,3}\}$ are parameters shared among the recursive blocks of level $n$, and $H^n_0$ is the output of the feature extraction layer.

To help propagate information and back-propagate gradients, the output feature map $H^n_t$ of the $t$th recursive block is calculated by adding $H^n_0$ through a skip connection:

$H^n_t = \sigma\big(H^n_{t,3} + H^n_0\big). \qquad (6)$
Gaussian pyramid reconstruction

To obtain the output level of the pyramid, the reconstruction layer is expressed as:

$\hat{L}_n(Y) = W^n_{rec} * H^n_T + b^n_{rec} + L_n(X), \qquad (7)$

where $T$ is the number of recursive blocks. After obtaining the output Laplacian pyramid $\{\hat{L}_n(Y)\}$, the corresponding Gaussian pyramid $\{\hat{G}_n(Y)\}$ of the derained image can be reconstructed by

$\hat{G}_n(Y) = \max\big(0,\ \hat{L}_n(Y) + \mathrm{upsample}(\hat{G}_{n+1}(Y))\big), \qquad (8)$

where $\hat{G}_N(Y) = \max(0, \hat{L}_N(Y))$. Since each level of a Gaussian pyramid should be equal to or larger than 0, we use $\max(0, \cdot)$, which is in fact the rectified linear unit (ReLU) operation [16], to simply correct the outputs. The final derained image is the bottom level of the Gaussian pyramid, i.e., $\hat{Y} = \hat{G}_1(Y)$.
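To make the shared-parameter recursion and the reconstruction concrete, below is a hedged TensorFlow 2 sketch of one sub-network (Eqs. (2)-(7)) and of the Gaussian pyramid reconstruction in Eq. (8). The layer names, the placement of the 3x3 and 1x1 kernels, and the default block count follow our reading of Section II-F rather than the authors' implementation; `upsample` is an assumed helper.

```python
import tensorflow as tf

def make_subnet(num_feats, num_blocks=5):
    """One LPNet-style sub-network (Eqs. (2)-(7)) for a single pyramid level."""
    lrelu = lambda x: tf.nn.leaky_relu(x, alpha=0.2)
    conv0 = tf.keras.layers.Conv2D(num_feats, 3, padding='same', activation=lrelu)
    # W1-W3 are created once and reused by every recursive block (Eqs. (3)-(5)).
    conv1 = tf.keras.layers.Conv2D(num_feats, 3, padding='same', activation=lrelu)
    conv2 = tf.keras.layers.Conv2D(num_feats, 3, padding='same', activation=lrelu)
    conv3 = tf.keras.layers.Conv2D(num_feats, 1, padding='same', activation=lrelu)
    recon = tf.keras.layers.Conv2D(3, 1, padding='same')   # map back to RGB

    def forward(lap_level):
        h0 = conv0(lap_level)                    # Eq. (2): feature extraction
        h = h0
        for _ in range(num_blocks):              # shared-parameter recursion
            h = conv3(conv2(conv1(h)))           # Eqs. (3)-(5)
            h = tf.nn.leaky_relu(h + h0, 0.2)    # Eq. (6): skip connection
        return recon(h) + lap_level              # Eq. (7): residual prediction
    return forward

def reconstruct_gaussian(lap_pred, upsample):
    """Eq. (8): rebuild the Gaussian pyramid from predicted Laplacian levels.
    lap_pred[0] is the finest level; upsample() is an assumed helper."""
    g = tf.nn.relu(lap_pred[-1])                 # top level, clipped at zero
    for lap in reversed(lap_pred[:-1]):
        g = tf.nn.relu(lap + upsample(g))        # max(0, .) keeps levels >= 0
    return g                                     # bottom level: derained image
```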

The works most closely related to our own are [36, 37, 38], in which the authors build similar networks based on image pyramids. However, these papers apply similar structures to other tasks, such as image generation and super-resolution, using different network designs at each pyramid level.

II-D Loss Function

Given a training set $\{X_i, Y_i\}_{i=1}^{D}$, where $D$ is the number of training pairs and $Y_i$ is the ground truth, the most widely used loss function for training a network is the mean squared error (MSE). However, MSE usually generates over-smoothed results due to the squared penalty, which works poorly at edges in an image. Thus, for each sub-network we adopt different loss functions and minimize their combination. Following [39], we choose $\ell_1$ and SSIM [40] as our loss functions. Specifically, as shown in Figure 3, since finer details and rain streaks exist in the lower pyramid levels, we use the SSIM loss to train the corresponding sub-networks to better preserve high-frequency information. On the contrary, larger structures and smooth background areas exist in the higher pyramid levels, so we use the $\ell_1$ loss to update the corresponding network parameters there. The overall loss function is

$\mathcal{L} = \sum_{i=1}^{D} \Big( \sum_{n=1}^{N} \mathcal{L}_{\ell_1}\big(\hat{G}_n(Y_i), G_n(Y_i)\big) + \sum_{n \in S} \mathcal{L}_{\mathrm{SSIM}}\big(\hat{G}_n(Y_i), G_n(Y_i)\big) \Big), \qquad (9)$

where $\mathcal{L}_{\mathrm{SSIM}}$ is the SSIM loss, $\mathcal{L}_{\ell_1}$ is the $\ell_1$ loss, and $S$ denotes the set of lower pyramid levels. In this paper, we set the pyramid level $N = 5$ based on our experiments; we use the SSIM loss for the lower levels and the $\ell_1$ loss for all levels.
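A minimal sketch of this combined objective is given below, using TensorFlow's stock `tf.image.ssim`. Which levels receive the SSIM term is our assumption (here the three finest, following the discussion above); `pred_pyramid` and `gt_pyramid` are lists of per-level image tensors.

```python
# A hedged sketch of the multi-level loss in Eq. (9): l1 on every level plus
# SSIM on the lower (detail-heavy) levels.
import tensorflow as tf

def lpnet_loss(pred_pyramid, gt_pyramid, ssim_levels=(0, 1, 2), max_val=1.0):
    """pred_pyramid / gt_pyramid: lists of [batch, h, w, 3] tensors,
    index 0 = finest level. ssim_levels is an assumed split."""
    loss = 0.0
    for n, (pred, gt) in enumerate(zip(pred_pyramid, gt_pyramid)):
        loss += tf.reduce_mean(tf.abs(pred - gt))          # l1 term, all levels
        if n in ssim_levels:                               # SSIM on fine levels
            loss += tf.reduce_mean(1.0 - tf.image.ssim(pred, gt, max_val))
    return loss
```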

II-E Removing batch normalization

As one of the most effective ways to alleviate internal covariate shift, batch normalization (BN) [41] is widely adopted before the nonlinearity in each layer of existing deep-learning-based methods. However, we argue that by introducing the image pyramid technology, BN can be removed to improve the flexibility of the network. This is because BN constrains the feature maps to obey a Gaussian distribution, while during our experiments we found that the distributions of the lower Laplacian pyramid levels of both clean and rainy images are sparse. To demonstrate this viewpoint, in Figure 4 we show the histogram distributions of each Laplacian pyramid level from 200 clean and light-rain training image pairs from [29]. As can be seen, compared to the image domain in Figure 4(a), the distributions of the lower pyramid levels, i.e., Figures 4(c) to (f), are more sparse and do not obey a Gaussian distribution. This implies that we do not need BN to further constrain the feature maps, since the mapping problem has already become easy to handle. Moreover, removing BN significantly reduces GPU memory usage, since a BN layer consumes the same amount of memory as the preceding convolutional layer. Based on the above observation and analysis, we remove BN layers from our network to improve flexibility and to reduce the number of parameters and the demand on computing resources.
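To illustrate how such a sparsity check can be run, here is a toy probe built on the `laplacian_pyramid` helper sketched in Section II-B. The synthetic ramp image and the excess-kurtosis proxy (heavy-tailed, zero-peaked distributions score high) are our illustrative choices, not the paper's protocol, which uses the 200 training pairs from [29].

```python
# Toy sparsity probe: compare the image domain against a fine Laplacian level.
import numpy as np
from scipy.stats import kurtosis

# A smooth synthetic image: mostly low-frequency content, like a background.
x = np.linspace(0.0, 1.0, 256)
img = np.tile(x, (256, 1))[..., None].repeat(3, axis=-1)

lap = laplacian_pyramid(img, levels=5)     # helper from Section II-B sketch
print('image domain :', kurtosis(img.ravel()))     # low: values spread out
print('finest level :', kurtosis(lap[0].ravel()))  # high: peaked at zero (sparse)
```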

II-F Parameter settings

We decompose an RGB image into a 5-level Laplacian pyramid using the fixed smoothing kernel [0.0625, 0.25, 0.375, 0.25, 0.0625], which is also used to reconstruct the Gaussian pyramid. In our network architecture, each sub-network has the same structure with a different number of kernels. The kernel sizes for $W^n_0$, $W^n_1$ and $W^n_2$ are $3 \times 3$. For $W^n_3$ and $W^n_{rec}$, the kernel size is $1 \times 1$ to further increase non-linearity and reduce parameters. The number of recursive blocks is $T = 5$ for each sub-network. For the activation function $\sigma(\cdot)$, we use leaky rectified linear units (LReLUs) [42] with a negative slope of 0.2.

Moreover, as shown in the last row of Figure 3, the higher levels are closer to an identity mapping since rain streaks mostly remain in the lower levels. This means that for higher levels, fewer parameters are required to learn a good network. Thus, from the lowest to the highest level, we set the kernel numbers to 16, 8, 4 and 2, respectively. Since the top level is a tiny and smoothed version of the image and rain streaks remain in the high-frequency parts, the top-level sub-network acts more like a simple global contrast adjustment. Thus we set the kernel number to 1 for the top level. As shown in Figure 2, by connecting the up-sampled version of the output from the higher level, the direct prediction of all sub-networks is in fact the clean Laplacian pyramid. We show the intermediate results predicted by each sub-network in Figure 5. It is clear that rain streaks remain in the lower levels while the higher levels are almost unchanged. This demonstrates that our diminishing parameter setting is reasonable. As a result, the total number of trainable parameters is only 7,548, far fewer than the hundreds of thousands often encountered in deep learning.
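As a sanity check on the reported budget, the arithmetic below reproduces the total under the kernel-size split and feature-map schedule stated above; that it lands exactly on 7,548 supports this reading, though the breakdown is our reconstruction rather than the authors' published one.

```python
# Back-of-the-envelope parameter count for the Section II-F configuration,
# assuming 3x3 kernels for W0-W2 and 1x1 kernels for W3 and the
# reconstruction layer, with 16/8/4/2/1 feature maps from fine to coarse.
def subnet_params(c, channels=3):
    p = 3 * 3 * channels * c + c            # W0: feature extraction
    p += 2 * (3 * 3 * c * c + c)            # W1, W2 (shared across blocks)
    p += 1 * 1 * c * c + c                  # W3 (1x1, also shared)
    p += 1 * 1 * c * channels + channels    # reconstruction back to RGB
    return p

total = sum(subnet_params(c) for c in (16, 8, 4, 2, 1))
print(total)  # -> 7548, matching the count reported for LPNet
```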

Fig. 5: One example of intermediate results predicted by our LPNet.

II-G Training details

We use synthetic rainy images from [29] as our training data. This dataset contains 1800 images with heavy rain and 200 images with light rain. We randomly generate one million clean/rainy patch pairs. We use TensorFlow [43] to train LPNet with the Adam solver [44], and the whole network is trained in an end-to-end fashion.
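For completeness, a skeletal end-to-end training step under these settings might look as follows. Here `gaussian_pyramid` is an assumed helper built from Eq. (1), `lpnet_loss` is the sketch from Section II-D, the model is assumed to return the predicted Gaussian pyramid as a list, and the optimizer settings are placeholders rather than the paper's exact hyperparameters.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()  # learning rate left at the default

@tf.function
def train_step(model, rainy_batch, clean_batch):
    with tf.GradientTape() as tape:
        pred_pyramid = model(rainy_batch)            # predicted Gaussian pyramid
        gt_pyramid = gaussian_pyramid(clean_batch)   # assumed helper (Eq. (1))
        loss = lpnet_loss(pred_pyramid, gt_pyramid)  # combined loss, Eq. (9)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```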

Fig. 6: Two synthetic images from “Rain100H” [29] with different rain orientations and magnitudes. Left to right: (a) ground truth; (b) rainy images; (c) GMM; (d) SRCNN; (e) DDN; (f) JORDER; (g) our LPNet.
Fig. 7: Two synthetic images from “Rain100L” [29] with different rain orientations and magnitudes, with the same panel layout as Figure 6.
Fig. 8: Two synthetic images from “Rain12” [14] with different rain orientations and magnitudes, with the same panel layout as Figure 6.

III Experiments

We compare our LPNet with four state-of-the-art deraining methods: the Gaussian mixture model (GMM) of [14], a CNN baseline SRCNN [22], the deep detail network (DDN) of [28], and joint rain detection and removal (JORDER) [29], which is also a deep learning method. For a fair comparison, all CNN-based methods are retrained on the same training dataset.

III-A Synthetic data

Three synthetic datasets are chosen for comparison. Two of them are from [29], and each contains 100 images: one is synthesized with heavy rain, called Rain100H, and the other with light rain, called Rain100L. The third dataset, called Rain12, is from [14] and contains 12 synthetic images. None of the testing images are included in the training data. Following [29], for each CNN method we train two models, one on the heavy and one on the light rain dataset. The model trained on the light rain dataset is used to test Rain12.

Figures 6 to 8 show visual results from each dataset. As can be seen, GMM [14] fails to remove rain streaks from heavy rainy images. SRCNN [22] and DDN [28] are able to remove the rain streaks but tend to generate obvious artifacts. Our LPNet produces visual results comparable with JORDER and outperforms the other methods.

We also adopt PSNR and SSIM [40] for quantitative evaluation in Table I. Our method has SSIM values comparable with JORDER while outperforming the other methods, in agreement with the visual results. Though our results have lower PSNR values than JORDER, the visual quality is comparable. This is because PSNR is calculated from the mean squared error (MSE), which measures global pixel errors without considering local image characteristics. Moreover, as shown in Table I, our LPNet contains far fewer parameters, potentially making LPNet more suitable for storage-constrained settings, e.g., mobile devices.

              Rainy images    GMM [14]       SRCNN [22]     DDN [28]       JORDER [29]    Our LPNet
              SSIM   PSNR     SSIM   PSNR    SSIM   PSNR    SSIM   PSNR    SSIM   PSNR    SSIM   PSNR
Rain100H      0.38   13.56    0.43   15.05   0.70   22.84   0.76   21.92   0.83   26.54   0.81   23.73
Rain100L      0.84   26.90    0.86   28.65   0.91   29.39   0.93   32.16   0.97   36.63   0.95   34.26
Rain12        0.86   30.14    0.91   32.02   0.92   31.90   0.94   31.76   0.95   33.92   0.95   35.35
Parameters #  -               -              20,099         57,369         369,792        7,548
TABLE I: Average SSIM and PSNR values on synthesized images.

III-B Real-world data

In this section, we show that LPNet, learned on synthetic training data, still performs well on real-world data. Figure 9 shows visual results on real-world images; the model trained on the light rain dataset is used for this testing. As can be seen, LPNet generates consistently promising derained results on images with different kinds of rain streaks.

Fig. 9: Results on real-world rainy images with different rain orientations and magnitudes. Left to right: (a) rainy images; (b) GMM; (c) SRCNN; (d) DDN; (e) JORDER; (f) our LPNet.

Since no ground truth exists, we conduct an independent user study to provide realistic feedback and quantify the subjective evaluation. We collect 50 real-world rainy images from the Internet as a new dataset (our code and data will be released soon). We use the five compared methods to generate derained results, randomly order the outputs along with the original rainy image, and display them on a screen. We then separately ask 20 participants to subjectively rank each image from 1 to 5 according to quality, with the instruction that visible rain streaks should decrease the quality and clarity should increase it (1 represents the worst quality and 5 the best). We show the average scores from these 1,000 trials in Table II; our LPNet has the best performance. In Figure 10, we show the scatter plot of user scores for the rainy inputs versus the derained results. This small-scale experiment gives additional support that our LPNet improves deraining on real-world images.

          Inputs   GMM    SRCNN   DDN    JORDER   Our LPNet
          1.31     2.12   3.39    3.41   3.41     3.58
TABLE II: Average scores of the user study.
Fig. 10: Scatter plot of the user study.

Moreover, when dealing with dense rain, LPNet trained on images with heavy rain has a dehazing effect, as shown in Figure 11, which can further improve the visual quality. This is because the highest-level sub-network (the low-pass component) can adjust image contrast. Although dehazing is not the main focus of this paper, we believe that LPNet can be easily modified for joint deraining and dehazing.

Fig. 11: An example of the dehazing effect: (a) result of the model trained on light rain; (b) result of the model trained on heavy rain. Our LPNet trained on the heavy rain dataset can further improve image contrast.

III-C Running time and convergence

To demonstrate the efficiency of LPNet, we show the average running time for a test image in Table III. Three different image sizes are chosen, and each one is tested over 100 images. The GMM is implemented on CPUs according to the provided code, while the deep CNN-based methods are tested on both CPU and GPU. All experiments are performed on a server with an Intel(R) Xeon(R) CPU E5-2683, 64GB RAM and an NVIDIA GTX 1080. The GMM has the slowest running time since complicated inference is required to process each new image. Our method has a comparable and even faster computational time on both CPU and GPU compared with the other deep models. This is because LPNet uses relatively shallow networks at each level and therefore requires fewer convolutions.

               GMM [14]          SRCNN [22]      DDN [28]        JORDER [29]        Our LPNet
Image size     CPU        GPU    CPU     GPU     CPU     GPU     CPU          GPU   CPU     GPU
500 × 500      1.99×10³   -      0.25    0.03    1.51    0.16    2.95×10²     0.18  0.67    0.12
750 × 750      3.09×10³   -      0.58    0.09    3.33    0.22    5.98×10²     0.36  1.49    0.16
1024 × 1024    6.52×10³   -      1.07    0.11    5.40    0.32    1.20×10³     0.82  2.46    0.20
TABLE III: Comparison of running time (seconds).

We also show the average training loss as a function of training epochs in Figure 12. We observe that LPNet converges quickly when trained on both the light and heavy rain datasets. Since heavy rain streaks are harder to handle, as shown in the first row of Figure 6, the training error on heavy rain oscillates.

Fig. 12: Convergence on different training datasets.

III-D Parameter settings

In this section, we discuss different parameter settings and study their impact on performance.

III-D1 Increasing the parameter number

We have conducted an experiment on the Rain100H dataset with an increased number of parameters, i.e., 16 feature maps for all convolutional layers in each sub-network. The results are shown in Table IV. As can be seen, the SSIM evaluation is better than JORDER and the PSNR value is also improved. We believe that the performance can be further improved by using more parameters. However, increasing the number of parameters requires more storage and computing resources. Figure 13 shows one example using different numbers of parameters. As can be seen, the visual quality is almost the same. Thus, we use our diminishing parameter setting to strike a balance between effectiveness and efficiency.

              JORDER [29]     Our LPNet (default)    Our LPNet (increased)
              SSIM    PSNR    SSIM    PSNR           SSIM    PSNR
Rain100H      0.83    26.54   0.81    23.73          0.84    24.09
Parameters #  369,792         7,548                  27,055
TABLE IV: SSIM and PSNR comparison for different parameter settings.
Fig. 13: One example using different numbers of parameters: (a) rainy image; (b) default kernel numbers; (c) 16 feature maps per layer.

III-D2 Skip connections

Though the Laplacian pyramid introduces sparsity at each level to simplify the mapping problem, it is still essential to add skip connections in each sub-network. We adopt skip connections for two reasons. First, image information may be lost during the feed-forward convolutional operations; skip connections help to propagate information and improve the deraining performance. Second, skip connections help to back-propagate gradients when updating parameters, which accelerates the training procedure. In Figure 14, we show the training curves on the heavy rain dataset with and without all skip connections. As can be seen, using skip connections brings a faster convergence rate and a lower training loss.

Fig. 14: Training curves w/ and w/o skip connections.

III-D3 Loss function

We use SSIM as a part of the loss function (9) for two main reasons. First, SSIM is calculated based on local image characteristics, e.g., local contrast, luminance and details, which are also the characteristics of rain streaks. Thus, using SSIM as a loss function is appropriate for guiding the network training. Second, the human visual system is also sensitive to local image characteristics. SSIM has been shown to generate more visually pleasing results than PSNR and has therefore become a prominent measure in the image processing community. We also use the $\ell_1$ loss because $\ell_1$ does not over-penalize large errors and can thus preserve structures and edges. On the contrary, the widely used $\ell_2$ loss (on which PSNR is based) often generates over-smoothed results because it penalizes large errors heavily while tolerating small errors. Therefore, $\ell_2$ struggles to preserve underlying image structures compared with $\ell_1$. Figure 15 shows two results generated using our combined loss (9) and the $\ell_2$ loss, respectively. As can be seen, using our combined loss (9) preserves more details.

Fig. 15: A deraining example using different losses: (a) rainy image; (b) $\ell_2$ loss; (c) SSIM + $\ell_1$ loss. The SSIM + $\ell_1$ loss generates a sharper result.

III-E Extensions

III-E1 Generalization to other image processing tasks

Since both Laplacian pyramids and CNNs are fundamental and general image processing technologies, our network design has potential value for other low-level vision tasks. Figure 16 shows experimental results on image denoising and JPEG artifact reduction, which share the property of rainy images that the desired image is corrupted by high-frequency content. This test demonstrates that LPNet can generalize to similar image restoration problems.

Fig. 16: Denoising (top) and JPEG artifact reduction (bottom): (a) inputs; (b) our results.
Fig. 17: An example of joint deraining and object detection on a real-world image: (a) direct detection; (b) deraining + detection. We use Faster R-CNN [45] to perform object detection with a confidence threshold of 0.8.

III-E2 Pre-processing for high-level vision tasks

Due to its lightweight architecture, our LPNet can potentially be incorporated efficiently into other high-level vision systems. For example, we study the problem of object detection in rainy environments. Since rain streaks can blur and block objects, the performance of object detection degrades in rainy weather. Figure 17 shows a visual result of object detection obtained by combining LPNet with the popular Faster R-CNN model [45]. It is obvious that rain streaks degrade the performance of Faster R-CNN, i.e., detections are missed and recognition confidence is low. On the other hand, after deraining by LPNet, the detection performance improves notably over applying Faster R-CNN alone.

Additionally, due to the lightweight architecture, using LPNet with Faster R-CNN does not significantly increase the complexity. To process a single color image, the running time is 3.7 seconds for Faster R-CNN and 4.0 seconds for LPNet + Faster R-CNN.

IV Conclusion

In this paper, we have introduced a lightweight deep network based on the classical Gaussian-Laplacian pyramid for single image deraining. Our LPNet contains several sub-networks that take the Laplacian pyramid as input and predict the clean Gaussian pyramid. By using the pyramid to simplify the learning problem and adopting recursive blocks to share parameters, LPNet has fewer than 8K parameters while still achieving good performance. Moreover, due to its generality and lightweight architecture, LPNet has potential value for other low- and high-level vision tasks.

References

  • [1] K. Garg and S. K. Nayar, “Detection and removal of rain from videos,” in CVPR, 2004.
  • [2] P. C. Barnum, S. Narasimhan, and T. Kanade, “Analysis of rain and snow in frequency space,” Int’l. J. Computer Vision, vol. 86, no. 2, pp. 256–274, 2010.
  • [3] J. Bossu, N. Hautiere, and J. P. Tarel, “Rain or snow detection in image sequences through use of a histogram of orientation of streaks,” Int’l. J. Computer Vision, vol. 93, no. 3, pp. 348–367, 2011.
  • [4] Y. L. Chen and C. T. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in ICCV, 2013.
  • [5] J. H. Kim, J. Y. Sim, and C. S. Kim, “Video deraining and desnowing using temporal correlation and low-rank matrix completion,” IEEE Trans. Image Process., vol. 24, no. 9, pp. 2658–2670, 2015.
  • [6] W. Ren, J. Tian, Z. Han, A. Chan, and Y. Tang, “Video desnowing and deraining based on matrix decomposition,” in ICCV, 2017.
  • [7] W. Wei, L. Yi, Q. Xie, Q. Zhao, D. Meng, and Z. Xu, “Should we encode rain streaks in video as deterministic or stochastic?,” in ICCV, 2017.
  • [8] J. H. Kim, C. Lee, J. Y. Sim, and C. S. Kim, “Single-image deraining using an adaptive nonlocal means filter,” in IEEE ICIP, 2013.
  • [9] Y. Chang, L. Yan, and S. Zhong, “Transformed low-rank model for line pattern noise removal,” in ICCV, 2017.
  • [10] L. W. Kang, C. W. Lin, and Y. H. Fu, “Automatic single image-based rain streaks removal via image decomposition,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1742–1755, 2012.
  • [11] D. A. Huang, L. W. Kang, Y. C. F. Wang, and C. W. Lin, “Self-learning based image decomposition with applications to single image denoising,” IEEE Trans. Multimedia, vol. 16, no. 1, pp. 83–93, 2014.
  • [12] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in ICCV, 2015.
  • [13] Y. Wang, S. Liu, C. Chen, and B. Zeng, “A hierarchical approach for rain or snow removing in a single color image,” IEEE Trans. Image Process., vol. 26, no. 8, pp. 3936–3950, 2017.
  • [14] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in CVPR, 2016.
  • [15] L. Zhu, C. W. Fu, D. Lischinski, and P. A. Heng, “Joint bi-layer optimization for single-image rain streak removal,” in ICCV, 2017.
  • [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in NIPS, 2012.
  • [17] H. Cecotti, M. P. Eckstein, and B. Giesbrecht, “Single-trial classification of event-related potentials in rapid serial visual presentation tasks using supervised spatial filtering,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 11, pp. 2030–2042, 2014.
  • [18] T. Chen, L. Lin, L. Liu, X. Luo, and X. Li, “DISC: Deep image saliency computing via progressive representation learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 6, pp. 1135–1149, 2016.
  • [19] M. Gong, J. Zhao, J. Liu, Q. Miao, and L. Jiao, “Change detection in synthetic aperture radar images based on deep neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 1, pp. 125–138, 2016.
  • [20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
  • [21] W. Hou, X. Gao, D. Tao, and X. Li, “Blind image quality assessment via deep learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1275–1286, 2015.
  • [22] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
  • [23] Y. Tai, J. Yang, X. Liu, and C. Xu, “MemNet: A persistent memory network for image restoration,” in ICCV, 2017.
  • [24] X. Hu, G. Feng, S. Duan, and L. Liu, “A memristive multilayer cellular neural network with applications to image processing,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 8, pp. 1889–1901, 2017.
  • [25] R. Dian, S. Li, A. Guo, and L. Fang, “Deep hyperspectral image sharpening,” IEEE Trans. Neural Netw. Learn. Syst., 2018.
  • [26] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in ICCV, 2013.
  • [27] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE Trans. Image Process., vol. 26, no. 6, pp. 2944–2956, 2017.
  • [28] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in CVPR, 2017.
  • [29] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017.
  • [30] H. Zhang, V. Sindagi, and V. M. Patel, “Image de-raining using a conditional generative adversarial network,” arXiv preprint arXiv:1701.05957, 2017.
  • [31] H. Zhang and V.M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in CVPR, 2018.
  • [32] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” in ICLR, 2016.
  • [33] P. Burt and E. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. 31, no. 4, pp. 532–540, 1983.
  • [34] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1397–1409, 2013.
  • [35] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in CVPR, 2017.
  • [36] E. L. Denton, S. Chintala, and R. Fergus, “Deep generative image models using a laplacian pyramid of adversarial networks,” in NIPS, 2015.
  • [37] W. S. Lai, J. B. Huang, N. Ahuja, and M. H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in CVPR, 2017.
  • [38] X. Shen, Y. C. Chen, X. Tao, and J. Jia, “Convolutional neural pyramid for image processing,” arXiv preprint arXiv:1704.02071, 2017.
  • [39] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imaging, vol. 3, no. 1, pp. 47–57, 2017.
  • [40] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in ICML, 2015.
  • [42] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML, 2013.
  • [43] M. Abadi, A. Agarwal, P. Barham, et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
  • [44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2014.
  • [45] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in NIPS, 2015.