Single Image Deraining: From Model-Based to Data-Driven and Beyond

12/16/2019 ∙ by Wenhan Yang, et al. ∙ City University of Hong Kong, Nanyang Technological University, Peking University

Rain removal or deraining methods attempt to restore the clean background scene from images degraded by rain streaks and rain accumulation (or the rain veiling effect). Early single-image deraining methods perform optimization on a cost function, in which various priors are developed to represent the properties of the rain and background-scene layers. Since 2017, single-image deraining has entered a deep-learning era: methods are built on deep networks, e.g. convolutional neural networks, recurrent neural networks, and generative adversarial networks, and demonstrate impressive performance. Given this rapid development, this article provides a comprehensive survey of deraining methods over the last decade. Rain appearance models are first summarized, followed by a discussion of two categories of deraining approaches: model-based and data-driven. For the former, we organize the literature based on the underlying models and priors. For the latter, we discuss several aspects of deep learning, i.e. models, architectures, priors, auxiliary variables, loss functions, and training datasets. This survey presents milestones of cutting-edge single-image deraining methods, reviews a broad selection of previous works in different categories, and provides insights on the historical development from model-based to data-driven methods. It also summarizes quantitative and qualitative performance comparisons. Beyond discussing existing deraining methods, we also discuss future directions and trends.


1 Introduction

Rain introduces a series of visual degradations to captured images and videos, leading to visibility impairment. Rain streaks introduce severe intensity changes and occlusions. Rain accumulation [1], where distant rain streaks that cannot be seen individually and water particles in the atmosphere accumulate into a layer of veil over the background, generates noticeable degradation of contrast and visibility. Considering the temporal properties of rain scenes, fast-changing rain accumulation can cause flicker; this intensity change along the temporal dimension is called rain accumulation flow [2]. Some examples of the above-mentioned degradations are shown in Fig. 1. Besides human vision, many computer vision algorithms also suffer from these degradations, since most of them assume clear weather without the interference of rain streaks and rain accumulation. Hence, restoring images degraded by rain, called deraining or rain removal, has significant practical value and is much desired in practical applications.

An early study of video deraining was started in 2004 by Garg and Nayar [3], who made the first analysis of the photometry, dynamics, and visual appearance of rain, and developed an approach based on these properties to detect and remove rain streaks from videos. Single-image deraining was pioneered by Kang et al. [4] in 2012; the method extracts the high-frequency layer of a rain image and decomposes it into rain and non-rain components using dictionary learning and sparse coding. Starting from 2017, data-driven deep-learning methods, which learn features automatically, have become dominant in the literature.

Fig. 1: Different types of visibility degradation caused by rain. (a) Rain streaks. (b) Rain accumulation.
Fig. 2: Milestones of single image rain removal methods, including sparse coding, Gaussian mixture models, deep convolutional networks, generative adversarial networks, semi/un-supervised learning, and benchmarks.

In this survey, we focus on single-image deraining: given an image degraded by rain streaks and rain accumulation, the aim is to estimate the rain-free background image. Different from video deraining methods, which leverage the temporal redundancy and dynamics of rain, single-image deraining methods only use the spatial redundancy (correlations of neighboring pixels) and the photometric properties of rain. In this survey, we also briefly include some deep-learning-based adherent raindrop removal methods, since adherent raindrops (i.e. water droplets attached to a lens or windscreen) are also part of rain degradation, although in some cases adherent raindrops can be avoided by placing the camera under a shelter.

The milestones of single-image deraining in the past years are presented in Fig. 2. Most existing methods can be categorized into six classes based on their basic models and architectures: sparse coding, Gaussian mixture models (GMMs), deep convolutional networks (CNNs), generative adversarial networks (GANs), benchmarks, and semi/un-supervised methods. Before 2017, the typical methods were model-based, namely non-deep-learning approaches. Since 2017, single-image deraining has entered a period of deep learning, and the community has witnessed the efficacy of deep networks, with significant improvements reported. From 2017 to 2019, more than 30 papers adopted the deep-learning approach, significantly more than before 2017.

Model-based methods rely more on the photometric analysis of rain streaks. They enforce handcrafted priors on both the rain and background layers; the prior models are estimated from paired rain/non-rain images or patches, and an optimization on a cost function is then employed. The prior models of rain streaks can be extracted from synthetic paired rain images; for example, Luo et al. [5] train dictionaries for both the rain streak and background layers. These priors can also be learned online from extracted rain patches; for example, Li et al. [6] train Gaussian mixture models on human-selected rain patches, and Zhu et al. [7] estimate rain directions based on rain-dominated regions.

In recent years, the popularity of model-based methods has been overtaken by data-driven deep-learning methods. These methods employ deep networks to extract hierarchical features, and are thus capable of modeling more complicated mappings from rain images to clean ones than hand-crafted features. Some rain-related priors are injected into the networks to learn more effective features, e.g. rain masks [1] and background features [8]. Some methods utilize recurrent architectures [1] or recursive networks [9] to remove rain progressively. There is also a series of works focusing on better utilization of the rich hierarchical information of deep features, e.g. [10, 11].

Deep networks have led to rapid progress in deraining performance. However, many deep-learning deraining methods train their networks with fully supervised learning, which causes the following problem: since obtaining paired rain and rain-free images is intractable, the simplest solution is to use synthetic images, yet there are domain gaps between synthetic and real rain images that keep deraining performance from being optimal. To overcome this problem, semi-supervised/unsupervised methods [12] and methods that additionally learn from real rain data [13] have been introduced.

Our paper aims to provide a comprehensive survey of single-image deraining methods that focus on removing rain streaks and rain accumulation. This survey offers a starting point for quickly understanding the main developments of the field, the limitations of existing methods, and future directions. In Section II, the rain appearance model is introduced. We then provide a detailed survey of single-image rain removal methods in Section III, including their synthetic rain models, a brief illustration of deraining challenges, the method architectures, and related technical details. Particular emphasis is placed on deep-learning-based methods, as they offer the most flexibility in network architecture design and side-information utilization. Subsequently, in Section IV, we detail the technical evolution of network architectures and basic blocks, and summarize datasets and loss functions.

As part of this survey, we summarize the quantitative comparisons of a number of single-image rain removal methods published in previous works and provide qualitative comparisons in Section V. We collect a dataset consisting of real rain images, and place particular emphasis on qualitative evaluation, with the quality of different results also rated by human annotators. Finally, the paper is concluded in Section VI with a discussion of the current state of single-image deraining methods, valuable insights from the evaluation results, and the limitations of existing methods as well as possible avenues for future work.

Fig. 3: The improvement of single-image rain removal, from model-based methods to data-driven approaches.

2 Raindrop Appearance Models

Understanding the appearance of raindrops helps us understand the appearance of rain, since the photometric properties of rain depend on those of raindrops. A raindrop is usually approximated as having a spherical shape [14]. As shown in Fig. 4, consider a point B on the surface of the raindrop with surface normal n̂. Scene rays reach the observer from B via refraction, specular reflection, and internal reflection. Hence, the radiance of point B is approximated as the sum of the radiance of the refracted ray, the radiance of the specularly reflected ray, and the radiance of the internally reflected rays:

L(B) = L_r(B) + L_s(B) + L_p(B),   (1)

where L_r, L_s, and L_p denote the refracted, specularly reflected, and internally reflected radiance, respectively. Considering that these radiances depend on the environmental radiance in the direction of the reflected or refracted ray, Eq. (1) can be rewritten as:

L(B) = R L_e(r̂) + S L_e(ŝ) + P L_e(p̂),   (2)

where L_e is the environmental radiance, r̂, ŝ, and p̂ are the directions of the refracted, specularly reflected, and internally reflected rays, and R, S, and P denote the fractions of incident environmental radiance that reach the camera after refraction, reflection, and internal reflection, respectively. We refer to these fractions as radiance transfer functions.

With more derivations and approximations, we can reach the composite raindrop model, expressed as:

(3)

whose parameters are the incident angle, the refractive index of water, and Fresnel's reflectivity coefficient for unpolarized light. Based on the statistics in [14], the radiance of a drop is mainly decided by the refraction through the drop, which therefore dominates the raindrop's appearance.

When raindrops are captured in real scenes, they are at a low resolution and their regions are usually just a few pixels wide. In this case, their complex appearances are generally lost, and only the average brightness of the raindrops is relevant. We can draw several conclusions: 1) a drop is likely to be much brighter than its background (the region that the raindrop occludes); 2) without motion blur, the average brightness of a stationary raindrop does not depend heavily on the background signal; 3) a raindrop's brightness does not affect that of other raindrops. Based on these three conclusions, we can approximate the raindrop signal captured in real scenes as locally stationary spherical regions with a constant intensity value.

Fig. 4: A raindrop’s appearance is a complex mapping of the environmental radiance, which is determined by reflection, refraction and internal reflection.

For a moving raindrop, its appearance changes significantly: the raindrop easily becomes a rain streak. The appearance of a rain streak depends on the brightness of the raindrop, the background scene radiance, and the camera's exposure time. The change in a pixel's intensity caused by a moving raindrop can be approximated as [14]:

ΔI = I_r − I_b = τ (Ē_r − E_b),   (4)

with I_r = τ Ē_r + (T − τ) E_b and I_b = T E_b, where τ is the time during which a drop remains within a pixel, T is the exposure time, E_b is the background irradiance, and Ē_r is the time-averaged irradiance caused by the drop. Based on Eq. (4), we can reach two conclusions: 1) a raindrop causes a positive change in intensity and stays at a pixel for a time far less than the integration time of a typical video camera; 2) the change in intensity ΔI observed along a rain streak is linearly related to the background intensity I_b. Based on the derived numerical bounds [14], τ is far smaller than T, and Ē_r is brighter than the occluded background irradiance E_b. In most real cases, the background term dominates the appearance of I_r. Therefore, in successive rain synthesis models, rain streaks are assumed to be linearly superimposed on the background image:

O = B + S.   (5)
Method | Degradation Factors | Main Features | Publication
Additive Composite Model (ACM) | Streak | Simple and effective | Li et al. 2016 [6]
Screen Blend Model (SBM) | Streak | Streaks and backgrounds are combined nonlinearly | Luo et al. 2015 [5]
Heavy Rain Model (HRM) | Streak, Accumulation | Overlapping streaks generating accumulation | Yang et al. 2017 [1]
Rain Model with Occlusion (ROM) | Streak, Occlusion | Considers rain occlusions | Liu et al. 2018 [15]
Comprehensive Rain Model (CRM) | Streak, Occlusion, Accumulation, Flow | Considers comprehensive visual degradation | Yang et al. 2019 [2]
Depth-Aware Rain Model (DARM) | Streak, Accumulation | Streak and accumulation modeling correlated with depth | Hu et al. 2019 [16]
TABLE I: Summary of rain synthesis models in the literature.

3 Literature Survey

In this section, we first review the rain synthesis models proposed in existing methods. Unlike the model in the previous section (Sec. 2), the models we discuss here are only loosely based on physics and, to our knowledge, their correctness has not been verified either theoretically or experimentally. Despite this, the methods built on these models demonstrate their effectiveness on real rain images. Having discussed various rain models, we briefly explain the categorization of existing methods adopted in this paper, and then provide a comprehensive survey of deraining methods.

3.1 Synthetic Rain Models

Additive Composite Model The simplest and most popular rain model used in existing studies is the additive composite model [6, 4], which follows Eq. (5) and is expressed as:

O = B + S,   (6)

where B denotes the background layer, S is the rain streak layer, and O is the image degraded by rain streaks. Here, the model assumes that the appearance of rain streaks is simply additive to the background, and that there is no rain accumulation (or rain veiling effect) in the rain-degraded image.

Screen Blend Model Luo et al. [5] propose a non-linear composite model, the screen blend model, in which the two layers influence each other's appearance:

O = B + S − B ⊙ S,   (7)

where ⊙ denotes point-wise multiplication.

Luo et al. [5] claim that the screen blend model can capture some visual properties of real rain images, such as the effect of internal reflections, and thus generate more visually authentic rain images. The combination of the rain and background layers is signal-dependent: when the brightness of a background region is low, the rain layer plays a dominant role in the appearance of the rain image; when a region has bright intensities, the rain image is dominated by the background layer.
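To make the two composition models concrete, the NumPy sketch below synthesizes a rainy image with both the additive composite model of Eq. (6) and the screen blend model of Eq. (7). The streak generator (sparse noise stretched by a directional blur) is a simplification we assume for illustration only; it is not the synthesis pipeline of [5] or [6].

```python
import numpy as np
from scipy.ndimage import rotate, uniform_filter1d

def synth_streaks(h, w, density=0.002, length=15, angle=70, seed=0):
    """Crude rain-streak layer: sparse bright dots stretched by a directional blur."""
    rng = np.random.default_rng(seed)
    s = (rng.random((h, w)) < density).astype(np.float32)
    s = uniform_filter1d(s, size=length, axis=0)              # vertical motion blur
    s = rotate(s, angle=angle - 90, reshape=False, order=1)   # tilt the streaks
    return np.clip(s / (s.max() + 1e-8), 0.0, 1.0)

def additive_model(b, s):
    """Eq. (6): O = B + S."""
    return np.clip(b + s, 0.0, 1.0)

def screen_blend_model(b, s):
    """Eq. (7): O = B + S - B * S (point-wise product)."""
    return np.clip(b + s - b * s, 0.0, 1.0)

if __name__ == "__main__":
    background = np.full((128, 128), 0.4, dtype=np.float32)   # stand-in clean image in [0, 1]
    streaks = synth_streaks(128, 128)
    o_acm = additive_model(background, streaks)
    o_sbm = screen_blend_model(background, streaks)
    print(o_acm.max(), o_sbm.max())   # the screen blend never exceeds 1 even before clipping
```

The screen blend variant makes the streak contribution weaker over bright backgrounds, which is exactly the signal-dependent behavior described above.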

Heavy Rain Model Yang et al. [1] propose a rain model that includes both rain streaks and rain accumulation. This is the first time in the image deraining literature that a model covers both rain phenomena. Rain accumulation, or the rain veiling effect, is the result of water particles in the atmosphere and distant rain streaks that cannot be seen individually; its visual effect is similar to mist or fog, which leads to low contrast. To model the rain accumulation, the Koschmieder model is adopted to approximate the visual appearance in turbid media, and rain streaks with different directions and shapes are overlapped, yielding:

O = α (B + Σ_{t=1}^{s} S_t) + (1 − α) A,   (8)

where S_t denotes a rain streak layer in which all rain streaks share the same direction, t indexes the rain streak layers and s is their maximum number, A is the global atmospheric light, and α is the atmospheric transmission.
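A minimal sketch of the heavy rain composition of Eq. (8), assuming a scalar transmission and a constant atmospheric light; real synthesis pipelines typically derive the transmission from depth or sample it per image.

```python
import numpy as np

def heavy_rain_model(b, streak_layers, alpha=0.7, atmospheric_light=0.9):
    """Eq. (8): O = alpha * (B + sum_t S_t) + (1 - alpha) * A.

    b                 : clean background in [0, 1]
    streak_layers     : list of streak layers S_t, each with its own direction
    alpha             : atmospheric transmission (scalar or per-pixel map)
    atmospheric_light : global atmospheric light A
    """
    s_total = np.sum(streak_layers, axis=0)
    o = alpha * (b + s_total) + (1.0 - alpha) * atmospheric_light
    return np.clip(o, 0.0, 1.0)

# usage: combine several differently oriented streak layers (e.g. produced by the
# synth_streaks() helper sketched earlier) to mimic overlapping streaks plus accumulation
```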

Rain Model with Occlusion Liu et al. [15] extend the heavy rain model to an occlusion-aware hybrid rain model for rain in video. The model separates rain streaks into two types: transparent rain streaks that are added to the background layers, and solid rain streaks that totally occlude the background layers. The locations of the solid rain streaks are indicated by a map, called the rain reliance map. The formulation of this rain model is expressed as:

(9)

where R is the rain reliance map, defined as:

R(x) = 1 for pixels x inside the rain-occluded region, and R(x) = 0 otherwise.   (10)

Comprehensive Rain Model Yang et al. [2] combine all the above-mentioned degradation factors into a comprehensive rain model for rain appearance in video. The degradation factors include rain streaks, rain accumulation, rain occlusion, and rain flow. The model is formulated as:

(11)

where an additional rain flow term models the regionally inconsistent, abrupt changes of rain accumulation.

Depth-Aware Rain Model Hu et al. [16] further connect the rain streaks and the fog-like accumulation to the scene depth, creating a depth-aware rain model:

(12)

where both the rain streak layer and the transmission are functions of the scene depth, as expressed in:

(13)

Here, an intensity image of uniformly-distributed rain streaks in the image space is attenuated into a depth-dependent rain streak intensity map; one parameter controls the rain streak intensity, and another determines the thickness of the fog, where a larger value denotes a thicker fog, and vice versa.
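The exact formulation of [16] is not reproduced above, so the sketch below only illustrates the general idea under standard assumptions: a Koschmieder-style transmission that decays exponentially with depth, and a streak map whose intensity is attenuated for distant scene points. The specific attenuation applied to the streaks is our assumption for illustration, not the published model.

```python
import numpy as np

def depth_aware_rain(b, streaks, depth, beta=0.05, atmospheric_light=0.9, gamma=0.03):
    """Illustrative depth-aware synthesis (not the exact model of Hu et al. [16]).

    b      : clean background in [0, 1]
    streaks: intensity image of uniformly-distributed rain streaks
    depth  : per-pixel scene depth, same shape as b
    beta   : fog thickness; a larger beta gives a thicker fog
    gamma  : how fast streak intensity decays with depth (assumed form)
    """
    t = np.exp(-beta * depth)                   # Koschmieder-style transmission
    s_depth = streaks * np.exp(-gamma * depth)  # depth-attenuated streak map (assumption)
    o = (b + s_depth) * t + (1.0 - t) * atmospheric_light
    return np.clip(o, 0.0, 1.0)
```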

Discussions Following these different rain models, various kinds of rain degradation can be synthesized. In general, the heavy rain models [1, 13] and the depth-aware rain model [16] (the two being in fact equivalent) cover the most comprehensive rain degradation for single rain-image synthesis, and we recommend using them to produce synthesized paired rain/non-rain images for model training and evaluation. From our evaluation in Sec. 5.2, one can observe that JORDER-E and HeavyRainRestorer achieve good visual results. Nevertheless, as mentioned at the beginning of this section, all these models are heuristic: they might not be entirely correct physically, but to some extent they are effective for the image deraining problem.

3.2 Deraining Challenges

The goal of single image deraining is to recover the clean, rain-free background scene from a rain-degraded image. We identify a few main challenges in achieving this goal:

  • Difficulties in appearance modeling of rain images In the real world, rain appearances are complicated. Spatially, rain can appear in many different ways: rain streaks vary in size, shape, scale, density, direction, etc. Rain accumulation is not visually as refined as fog: depending on the condition of water particles in the atmosphere, rain accumulation (or the rain veiling effect) can be dense even in light rain. All of this makes modeling the spatial appearance of rain difficult, which in turn makes synthesizing physically correct rain images a complex task.

  • Ill-posedness of the deraining problem Even with a simple rain model that considers only rain streaks, estimating the background scene from a degraded image is an ill-posed problem. The reason is that we only have the pixel intensity values produced by light carrying fused information of rain and the background scene. To make matters worse, in some cases the background information can be totally occluded by rain streaks, dense rain accumulation, or both.

  • Difficulties in finding proper priors As rain and background information may overlap in the feature space, it is non-trivial to separate them. Background textures can be falsely deemed as rain, resulting in incorrect deraining. Hence, strong priors for background textures and rain are necessary. However, finding these priors is difficult, since background textures are diverse and some are similar in appearance to rain streaks or rain accumulation.

  • Real paired ground truths Most deep-learning methods rely on paired rain and clean background images to train their networks. However, obtaining real rain images together with their exact clean background counterparts is intractable; even for a static background, lighting conditions always change. This difficulty impacts not only deep-learning methods but also the evaluation of any method. Currently, qualitative evaluation relies on human subjective judgement of whether the restored images are good, while quantitative evaluation relies on synthetic images. Unfortunately, up to now, there are significant gaps between synthetic and real images.

In the following, we will discuss how existing deraining methods deal with these challenges.

3.3 Single-image Deraining Methods

We categorize single-image deraining methods into two basic approaches: model-based (non-deep-learning) and data-driven (deep-learning) approaches. The model-based methods can be further split into two categories, sparse coding and Gaussian mixture models (GMMs), while the deep-learning-based approaches can be categorized into deep CNNs, generative adversarial networks (GANs), and semi/unsupervised methods.

3.3.1 Model-based Methods

Existing model-based methods employ optimization frameworks for deraining, as shown in the top panel of Fig. 3. These methods deal only with rain streaks and ignore the presence of rain accumulation. A general optimization framework can be expressed as:

min_{B,S} ||O − B − S||_F^2 + Ψ_B(B) + Ψ_S(S) + Ψ_BS(B, S),   (14)

where Ψ_B(B) denotes the priors on the background layer, Ψ_S(S) represents the priors on the rain streak layer, and Ψ_BS(B, S) is the joint prior describing the intrinsic relationship between the rain streak and background layers. Different prior terms are designed to better describe and separate the rain streaks from the background layers.
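The toy sketch below makes the framework of Eq. (14) concrete: a rainy image is separated into background and streak layers by gradient descent on a fidelity term plus two illustrative priors (total variation on the background, an L1 sparsity penalty on the streaks). The specific priors and the plain Adam solver are our simplifications; published model-based methods use more elaborate priors and dedicated solvers.

```python
import torch

def tv(x):
    """Anisotropic total variation of a 2-D tensor."""
    return (x[1:, :] - x[:-1, :]).abs().mean() + (x[:, 1:] - x[:, :-1]).abs().mean()

def separate_layers(o, lam_b=0.1, lam_s=0.05, steps=500, lr=0.05):
    """min_{B,S} ||O - B - S||^2 + lam_b * TV(B) + lam_s * ||S||_1 (illustrative priors)."""
    b = o.clone().requires_grad_(True)            # initialize the background with the observation
    s = torch.zeros_like(o, requires_grad=True)   # initialize the streak layer with zeros
    opt = torch.optim.Adam([b, s], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((o - b - s) ** 2).mean() + lam_b * tv(b) + lam_s * s.abs().mean()
        loss.backward()
        opt.step()
        with torch.no_grad():                     # keep both layers in a valid intensity range
            b.clamp_(0.0, 1.0)
            s.clamp_(0.0, 1.0)
    return b.detach(), s.detach()

# usage: b_hat, s_hat = separate_layers(torch.rand(64, 64))
```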

Sparse Coding Methods Sparse coding (SC) [56] represents input vectors as sparse linear combinations of basis vectors. The collection of these basis vectors is called a dictionary, which is used to reconstruct a certain type of signal, e.g. rain streaks or background signals in the deraining problem. Kang et al. [4] make the first attempt at single-image deraining via image decomposition using morphological component analysis. The initially extracted high-frequency components of rain images are further decomposed into rain and non-rain components by dictionary learning and sparse coding. This pioneering work successfully removes sparse light rain streaks. However, it relies heavily on bilateral-filter pre-processing and thus produces blurred background details.
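A minimal scikit-learn sketch of the dictionary-learning step described above: a dictionary is learned on high-frequency patches, and each patch is then sparsely coded over it. The patch extraction, dictionary size, and the way atoms would be labeled as rain or non-rain are all simplified assumptions that only illustrate the mechanism, not the exact pipeline of [4].

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_hf_dictionary(hf_image, patch_size=8, n_atoms=64, n_patches=2000, seed=0):
    """Learn an over-complete dictionary on high-frequency patches and sparsely code them."""
    patches = extract_patches_2d(hf_image, (patch_size, patch_size),
                                 max_patches=n_patches, random_state=seed)
    x = patches.reshape(len(patches), -1).astype(np.float64)
    x = x - x.mean(axis=1, keepdims=True)             # remove the patch DC component
    dico = MiniBatchDictionaryLearning(n_components=n_atoms,
                                       transform_algorithm="omp",
                                       transform_n_nonzero_coefs=5,
                                       random_state=seed)
    codes = dico.fit(x).transform(x)                  # sparse codes of each patch
    return dico.components_, codes

# In [4], atoms with strong directionality (rain-like) and the remaining atoms would
# then be used to reconstruct the rain and non-rain parts of the HF layer separately.
```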

In a successive work, Luo et al. [5] enforce the sparsity of rain and introduce a mutual exclusivity property into a discriminative sparse coding (DSC) framework to facilitate accurate separation of the rain and non-rain layers from their non-linear composite. Benefiting from the mutual exclusivity property, DSC preserves clean texture details but still leaves residual rain streaks, particularly for large and dense rain streaks.

To further improve the modeling capacity for deraining, Zhu et al. [7] construct an iterative layer separation process that removes rain streaks from the background layer and removes texture details from the rain streak layer using layer-specific priors. The method obtains performance on some synthetic datasets comparable to deep-learning-based methods of the same period, e.g. JORDER [1] and DDN [18]. While it is good at removing light rain streaks that share the same direction within an image, it tends to fail in heavy rain cases, where rain streaks may appear in different directions.

To model the group directionality and sparsity of rain streaks, Deng et al. [17] formulate a global sparse model, the directional group sparse model (DGSM), which includes three sparsity terms encoding the intrinsic directional and structural knowledge of rain streaks. A unidirectional total variation is introduced, and the method requires no training and has much lower computational cost. It effectively removes blurred rain streaks but fails to clean sharp rain streaks.

Gaussian Mixture Model Li et al. [6] apply Gaussian mixture models (GMMs) to model the layer priors of the rain and background layers, which accommodate multiple scales and orientations of rain streaks. A rain patch selected from the input image with no background interference is used to train the GMM of the rain layer. Total variation is utilized to remove small sparkle-like rain. The method effectively removes rain streaks of small and moderate scales, but fails to handle large and sharp rain streaks.
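A small sketch of this patch-prior idea, assuming patches have already been extracted: one GMM is fitted to patches from a user-selected rain-dominated region and another to external clean-image patches, and their likelihoods can then serve as layer priors in an optimization such as Eq. (14). The number of components and the plain maximum-likelihood fit are illustrative choices, not the exact settings of [6].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_patch_gmm(patches, n_components=20, seed=0):
    """Fit a GMM prior to a set of flattened image patches (rows = patches)."""
    x = patches - patches.mean(axis=1, keepdims=True)   # remove DC, keep texture
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed)
    return gmm.fit(x)

def layer_log_prior(gmm, patches):
    """Per-patch log-likelihood under the fitted prior (higher = more layer-like)."""
    x = patches - patches.mean(axis=1, keepdims=True)
    return gmm.score_samples(x)

# usage: rain_gmm = fit_patch_gmm(rain_patches); bg_gmm = fit_patch_gmm(clean_patches)
# a patch is treated as rain-dominated when rain_gmm assigns it the higher likelihood
```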

3.3.2 Deep Learning Based Methods

Deep CNNs The era of deep-learning-based deraining starts in 2017. Yang et al. [1] construct a joint rain detection and removal network (JORDER) that pays particular attention to heavy rain removal, i.e. removing overlapping rain streaks and rain streak accumulation. The network detects rain locations by predicting and modulating binary rain masks, and adopts a recurrent framework to remove rain streaks and clear up accumulation progressively. The method achieves impressive results in heavy rain cases. However, it might falsely remove vertical textures and produce under-exposed illumination.

In the same year, Fu et al. [18, 19] made the first attempt to remove rain streaks via a deep detail network (DetailNet). The network takes as input only the high-frequency detail layer and predicts the residual between the rainy and clean images. Removing the background interference from the network input is demonstrated to be beneficial, making the training phase easier and more stable. However, the method still cannot handle large and sharp rain streaks.
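The PyTorch sketch below captures the two ideas just described: feed the network only a high-frequency detail layer (here obtained with a simple box-filter base/detail split, whereas [18, 19] use guided filtering) and let a small residual CNN predict the correction that is added back to the input. Layer counts and widths are placeholders, far smaller than the published network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def split_base_detail(x, k=15):
    """Approximate base/detail split with a box filter (a stand-in for guided filtering)."""
    base = F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)
    return base, x - base

class TinyDetailNet(nn.Module):
    """Predicts a residual from the detail layer; output = rainy input + residual."""
    def __init__(self, channels=3, width=16, depth=4):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, rainy):
        _, detail = split_base_detail(rainy)
        residual = self.body(detail)      # trained to approximate (clean - rainy)
        return rainy + residual

# usage: out = TinyDetailNet()(torch.rand(1, 3, 64, 64))   # same shape as the input
```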

Following Yang et al. [1] and Fu et al. [18, 19], many methods based on deep CNNs [20, 10, 8, 21, 22, 23] have been proposed, employing more advanced network architectures or injecting new rain-related priors. These methods progressively achieve better results both quantitatively and qualitatively. However, due to the limitation of their learning paradigm, namely training on synthetic rain images and non-rain ground truths, they might fail on real rain conditions that have never been seen during training.

Fig. 5: Summary of GAN-based rain removal methods.
Fig. 6: Summary of semi/unsupervised learning-based rain removal methods.

Fig. 7: Summary of the side information and priors for single-image rain removal.
Fig. 8: Summary of the network improvement for single-image deraining.
Fig. 9: The basic block improvement route for single-image rain removal.

Generative Adversarial Networks Adversarial learning is introduced to reduce the domain gap between generated results and real clean images, and to capture visual properties that cannot be modeled by traditional metrics and synthesized data. The typical architecture consists of two parts, a generator and a discriminator, where the discriminator attempts to assess whether a generated result is real or fake, providing additional feedback that regularizes the generator to produce more visually pleasing results. Zhang et al. [24] directly apply a conditional generative adversarial network (CGAN) to the single-image rain removal task, as shown in Fig. 5 (a). CGAN is capable of capturing visual properties beyond signal fidelity, and produces results with better illumination, color, and contrast distributions. However, CGAN sometimes generates visual artifacts when the background of a testing rain image differs from those in the training set.

Li et al. [13] propose a single-image deraining method that combines a physics-driven network with an adversarial refinement network, as shown in Fig. 5 (c). The first stage learns from synthesized data and estimates physics-related factors, i.e. rain streaks, the transmission, and the atmospheric light. At the second refinement stage, a depth-guided GAN compensates for details lost and suppresses artifacts introduced in the first stage. By learning from real rain data, some visual properties of the results are significantly improved, e.g. rain accumulation is removed more thoroughly and the luminance distribution is more balanced. However, some fine-grained details in real rain images, e.g. the diversified appearances of rain streaks, cannot be well captured by GAN-based methods.

Semi/Unsupervised Learning Methods Recently, semi-supervised and unsupervised learning methods attempt to improve generality, scalability, and practicality by learning directly from real rain data. Wei et al. [25] propose a semi-supervised learning method that makes use of the priors in both synthesized paired data and unpaired real data, as shown in Fig. 6 (a). In this method, the residual between an input rain image and its expected network output is formulated as a specific parametrized rain streak distribution. Models trained on synthesized paired rain images are adapted to handle diversified rain in real scenarios with the guidance of this rain streak distribution model; hence, the model generalizes better to real rain images and is capable of handling more real cases. However, the generalization process also dilutes the knowledge extracted from the paired training set, and the model becomes less effective in handling heavy and dense rain.

In [12], an unsupervised deraining generative adversarial network (UD-GAN) is proposed by introducing self-supervised constraints, i.e. intrinsic priors extracted from unpaired rainy and clean images, as shown in Fig. 6 (b). Two collaboratively optimized modules are designed: one detects the difference between real rainy images (or real background images) and generated rainy images (or generated background images), while the other adjusts the luminance of the generated results, making them more visually pleasing. The method is capable of removing real rain from rain images, yet inevitably introduces detail contamination when the rain streaks are dense.

Benchmark Li et al. [26] provide an extensive study and evaluation of existing single-image deraining algorithms on a newly proposed large-scale dataset that includes both synthetic and real-world rainy images with various rain types, i.e. rain streaks, raindrops, and mist. The benchmark also covers a wide range of evaluation criteria, reporting the results of different methods under full-reference and no-reference objective metrics, subjective evaluation, and task-specific metrics.

3.3.3 Adherent Raindrop Removal

Raindrops adhering to the camera lens can severely degrade the visibility of the background scene in an image. The aim of adherent raindrop removal is to transform a raindrop-degraded image into a clean one. Deraining is different from adherent raindrop removal, since rain images do not always suffer from adherent raindrop degradation, and vice versa: adherent raindrop images do not always suffer from the degradation of rain streaks or rain accumulation.

Nevertheless, we discuss it briefly for the sake of completeness of the survey. In [57], Yamashita et al. develop a stereo system to detect and remove raindrops. A successive work [58] builds on an image sequence instead of stereo video. Later, in [59], You et al. propose a motion-based method to detect raindrops, and apply video completion to remove the detected regions. Eigen et al. [27] make the first attempt to tackle the problem of single-image raindrop removal: a three-layer CNN is trained with pairs of raindrop-degraded images and the corresponding clean ones. It can handle relatively sparse and small raindrops as well as dirt; however, it fails to produce clean results for large and dense raindrops.

Recently, Qian et al. [28] develop an attentive GAN (AttGAN) that injects visual attention into both the generative and discriminative networks, as shown in Fig. 5 (b). The visual attention not only guides the discriminative network to focus on the plausibility and local consistency of the restored raindrop regions, but also makes the generative network pay more attention to restoring the raindrop regions based on the related information and the surrounding structures.

4 Technical Improvement Review

In this section, we summarize existing deep-learning methods from the perspective of network designs, basic blocks, datasets, and loss functions. These factors significantly influence a network's learning capacity and thus determine its deraining performance.

4.1 Network Architecture

Starting from DetailNet [19] (Fig. 9 (a)), a cascaded CNN with the high-frequency component as its input, successive methods aim to construct more effective network architectures that exploit rain-related properties, inject auxiliary variables, and borrow typical signal structures to facilitate image deraining. In this part, we review these improvements systematically.

Rain-Related Priors Rain-related priors are physical factors that are potentially correlated with rain image generation, e.g. rain masks and rain densities. By injecting these priors, a network is expected to better perceive the contextual information of rain and to better separate rain streaks from the background layers.

In [1], a joint rain detection and removal network (JORDER), shown in Fig. 7 (b), is constructed to detect rain locations, estimate rain densities, and predict rain sequentially, which boosts the capacity of the network to process rain and non-rain regions differently. Li et al. [20] focus on the scale diversity of rain streaks: a scale-aware network consisting of parallel sub-networks, shown in Fig. 7 (c), is built to be aware of rain streaks at different scales, producing better rain removal performance in real applications. Zhang et al. [22] propose a density-aware rain removal method (DID-MDN), shown in Fig. 7 (d), that automatically detects the rain density as guidance for the subsequent rain removal.
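A minimal sketch of how such rain-related priors can be injected: a small head first predicts a rain mask from the input, and the mask is concatenated back with the image to guide the removal branch. This mirrors the mask-guidance idea in spirit only; the actual JORDER and DID-MDN architectures are considerably more elaborate.

```python
import torch
import torch.nn as nn

class MaskGuidedDerain(nn.Module):
    """Predict a soft rain mask, then condition the removal branch on it."""
    def __init__(self, channels=3, width=16):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 3, padding=1), nn.Sigmoid())      # rain mask in [0, 1]
        self.derain = nn.Sequential(
            nn.Conv2d(channels + 1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1))

    def forward(self, rainy):
        mask = self.mask_head(rainy)
        residual = self.derain(torch.cat([rainy, mask], dim=1))
        return rainy + residual, mask    # the mask gets its own supervision during training

# for synthetic data, the mask output can be supervised with ground-truth rain masks
# in addition to the image reconstruction loss
```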

Auxiliary Variables Some information is not directly part of the rain image model but captures visual properties of the background scene, and can be useful for image deraining. Hu et al. [16] analyze the complex visual effects in real rain and formulate a rain imaging model related to the scene depth for rain streak and fog synthesis. An end-to-end deep neural network, shown in Fig. 7 (e), is then developed to extract depth-attentional features and to regress a residual map that predicts the clean image. In [23], a dual convolutional neural network, shown in Fig. 7 (f), is presented in which two branches estimate two parts of the target signal: structures and details. In Fig. 7 (g), the model of [29] learns the rain streaks guided by a per-pixel uncertainty map.

Signal Structures Some signal structure ideas from classical signal processing, e.g. multi-scale structures, Laplacian pyramids, and wavelet transforms, can make deraining more effective. A scale-free network [30], shown in Fig. 8 (a), pays attention to the scale variety of rain streaks in real scenes and constructs a scale-free deraining architecture by unrolling a wavelet transform into a recurrent neural network that can handle various kinds of rain at different scales. Guided by the hierarchical representation of the wavelet transform, a two-stage recurrent network is built: 1) rain removal on the low-frequency component; 2) recurrent detail recovery on the high-frequency components guided by the recovered low-frequency component.

PyramidDerain [11] pursues a lightweight pyramid of networks, shown in Fig. 8 (b), to remove rain from a single image. A Gaussian-Laplacian image pyramid decomposition is combined with the deep neural network. The learning problem at each pyramid level is largely simplified, so the resulting network is shallow and has few parameters; the model is very lightweight and achieves comparable state-of-the-art performance with only 8K parameters.

Li et al. [31] propose a recurrent network to remove rain streaks progressively, as shown in Fig. 8 (c). The intermediate result of the previous recurrence is taken as the input of the next one, and features are also forwarded and fused by RNN units, e.g. GRUs and LSTMs, across recurrences. Ren et al. [9] utilize recursive computation to realize more effective processing, as shown in Fig. 8 (d): PReNet performs stage-wise operations that process the input and intermediate results to generate the clean output image progressively.
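A compact sketch of the progressive idea shared by these recurrent methods: the same block is applied for several stages, each stage receiving the original rainy image concatenated with the current estimate, with a hidden feature map carried across stages. The plain convolutional recurrence below is a simplification of the LSTM/GRU units used in the papers, and the layer widths are placeholders.

```python
import torch
import torch.nn as nn

class ProgressiveDerainer(nn.Module):
    """Stage-wise deraining: refine the estimate over several stages with a shared block."""
    def __init__(self, channels=3, width=16, stages=4):
        super().__init__()
        self.stages, self.width = stages, width
        self.inp = nn.Conv2d(2 * channels, width, 3, padding=1)
        self.mix = nn.Conv2d(2 * width, width, 3, padding=1)  # fuse features with the hidden state
        self.out = nn.Conv2d(width, channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, rainy):
        estimate = rainy
        hidden = torch.zeros(rainy.size(0), self.width, rainy.size(2), rainy.size(3),
                             device=rainy.device)
        outputs = []
        for _ in range(self.stages):
            feat = self.act(self.inp(torch.cat([rainy, estimate], dim=1)))
            hidden = self.act(self.mix(torch.cat([feat, hidden], dim=1)))
            estimate = rainy + self.out(hidden)   # each stage predicts a residual
            outputs.append(estimate)
        return outputs    # deep supervision can be applied to every stage

# usage: preds = ProgressiveDerainer()(torch.rand(1, 3, 64, 64)); final = preds[-1]
```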

Method Category Rain Model
Variables or Priors
           Key Idea Publication
Image Decomposition Sparse Representation ACM
Low/high-frequency;
Rain/No-rain dictionaries.
Rain images are decomposed into low-frequency and high-frequency (HF)
parts. Then, the non-rain component is removed from the HF part
by dictionary learning and sparse coding.
Kang et al. 2012 [4]
DSC SBM
Rain/No-rain codes;
Rain/No-rain dictionaries.
Patches of two layers are sparsely approximated by very high
discriminative codes over a learned dictionary with strong
mutual exclusivity property.
Luo et al. 2015 [5]
Bi-Layer Optimization ACM
Centralized sparse prior;
Rain direction prior;
Rain layer prior.
A joint optimization process is used that alternates between
removing rain-streak details from the estimated background and
removing non-streak details from the estimated rain streak layer.
Zhu et al. 2017 [7]
Hierarchical Deraining ACM
High/low-frequency;

Sensitivity of variance

across color channels;
Principal direction of
an image patch.
A three-layer hierarchical scheme is designed. The first layer

uses sparse coding to classify the high-frequency part into

rain/snow and non-rain/snow components. The second layer
relies on guided filtering. The third layer enhances the visual
quality with the sensitivity of variance across color channels.
Wang et al. 2017 [32]
DGSM ACM
Unidirectional TV;
Rotation Angle.
A global sparse model is formulated to encode the intrinsic
directional and structural knowledge of rain streaks and the
properties of the image background.
Deng et al. 2018 [17]
LP GMM ACM
Gaussian mixture model;
Total variation.
Two patch-based priors for the background and rain layers
are built based on Gaussian mixture models and can
accommodate multiple orientations and scales of rain streaks.
Li et al. 2016 [6]
CNN Deep CNN RIM Clean image.
A three-layer CNN is constructed to learn the mapping from
corrupted image patches to clean ones.
Eigen et al. 2013 [27]
JORDER
JORDER-E
HRM
Binary rain mask;
Rain intensity;
Residual.
A multi-task architecture is constructed to learn the binary rain
streak map, the appearance of rain streaks, and the clean
background. A recurrent network is also built to remove rain
streaks and clears up the rain accumulation progressively.
Yang et al. 2017 [1]
Yang et al. 2019 [33]
DetailNet ACM Residual.
A deep detail network taking as input the high frequency detail
and predicts the residual between the input image and the ground
truth image.
Fu et al. 2017 [18]
Fu et al. 2017 [19]
Scale-Aware HRM
Rain streak;
Transmission map.
Parallel sub-networks are built to predict different scales of rain
streaks and the transmission maps.
Li et al. 2017 [20]
NLEDN ACM Residual.
An auto-encoder network is built with non-locally enhanced
dense blocks, where a non-local feature map weighting follows
four densely connected convolution layers.
Li et al. 2018 [10]
Residual-Guide ACM Residual.
The residuals generated from shallower blocks are used to guide
deeper blocks. The negative residual is predicted coarse to fine
and the outputs of different blocks are fused finally.
Fan et al. 2018 [8]
RESCAN ACM
Residual;
Intermediate results.
A recurrent network is constructed and the rain removal result
of the previous stage is fed into the next stage. The information
is flowed across stages at the feature level.
Li et al. 2018 [21]
DID-MDN ACM
Rain density;
Residual.
A multi-path densely connected network is constructed to
automatically detect the rain-density to guide the rain removal.
Zhang et al. 2018 [22]
DualCNN ACM
Structure layer;
Detail layer.
A dual CNN estimates the two parts of the target signal:
structures and details for a series of low-level vision tasks.
Pan et al. 2018 [23]
DAF-Net DARM
Depth;
Attention;
Residual.
The work creates a RainCityscapes dataset related to the scene
depth. A deep network is developed based on a depth-guided
attention mechanism to predict the residual map.
Hu et al. 2019 [16]
Spatial Attention +
Dataset
ACM
Spatial attention;
Residual.
A high-quality dataset including 29.5K paired real rain and
pseudo real non-rain images is constructed. Then, a novel
spatial attentive network is built to effectively learn
discriminative features for rain removal from local to global.
Wang et al. 2019 [34]
PReNet ACM
Residual;
Intermediate results.
The network utilizes recursive computation at two levels:
1) Progressive ResNet is built by repeatedly unfolding a
shallow ResNet; 2) The ResNet performs stage-wise operations
processing the input and intermediate results progressively.
Ren et al. 2019 [9]
Scale-Free HRM
Residual;
Wavelet decomposition
results.
A recurrent network performs two-stage deraining: 1) rain
removal on the low-frequency component; 2) recurrent detail
recovery on high-frequency components guided by the
recovered low-frequency component.
Yang et al. 2019 [30]
PyramidDerain ACM
Gaussian-Laplacian
pyramid.
The proposed network combines Gaussian Laplacian image
pyramid decomposition and the deep neural network. Recursive
and residual network structures are employed to aggregate the
features at different layers.
Fu et al. 2019 [11]
UMRL ACM
Confident map.
The network is guided based on the confidence measure about
the estimate. The cycle spinning is introduced to remove
artifacts and improve the deraining performance.
Yasarla et al. 2019 [29]
AttGAN GAN RIM
Attention map;
Clean image.
Visual attention is injected into both the generative and
discriminative networks for learning to attend raindrop regions
and percept their surroundings.
Qian et al. 2018 [28]
CGAN ACM Clean image.
The work directly applies a multi-scale conditional generative
adversarial network to address single image de-raining task.
Zhang et al. 2019 [24]
HeavyRainRestorer HRM
Transmission map;
Atmospheric light;
Rain streak;
Clean image.
A two-stage network is built: a physics-driven model followed
by a depth-guided generative adversarial refinement.
Li et al. 2019 [13]
Semi-supervised CNN Semi/Un- Supervised ACM
Rain streak;
Residual.
A semi-supervised learning method formulates the residual as a
specific parametrized rain streak distribution between an input
rainy image and its expected network output.
Wei et al. 2019 [25]
UD-GAN ACM
Rain streak;
Residual.
An Unsupervised Deraining Generative Adversarial Network is
built to introduce self-supervised constraints, the intrinsic priors
extracted from unpaired rainy and clean images.
Jin et al. 2019 [12]
Benchmark Benchmark
ACM +
RIM
It provides extensive study and evaluation of existing single
image deraining algorithms with a new proposed large-scale
dataset including both synthetic and real-world rainy images of
various rain types.
Li et al. 2019 [26]
TABLE II: An overview of single-image rain removal methods.

4.2 Basic Blocks

With the development of deep-learning-based methods, newly proposed methods tend to use more complex basic blocks with more powerful modeling capacity, which are then stacked into more complex deraining networks.

Existing Advanced Networks DetailNet [18, 19] (Fig. 9 (a) and (b)) introduces residual learning and a cascaded CNN for rain removal. AttGAN [28] (Fig. 9 (d)) utilizes U-Net as the backbone of its generator, which effectively fuses information from different scales to obtain global context while maintaining local details. RESCAN [31] (Fig. 9 (h)) introduces channel-wise attention to adjust the relative weighting among channels and better separate rain streaks from background layers. The multi-stream dense network [22] (Fig. 9 (g)) combines dense blocks and convolutional networks. The residual dense network [30] (Fig. 9 (i)) integrates dense blocks into residual networks. In [8] (Fig. 9 (k) and (e)), basic blocks are connected in a recursive way, where the input feature is also forwarded to the intermediate features of the network. In [9] (Fig. 9 (f)), residual blocks are likewise aggregated recursively, and LSTMs connect different recurrences.
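As an example of the channel-wise attention mentioned for RESCAN, here is a standard squeeze-and-excitation residual block in PyTorch; RESCAN's actual block additionally involves dilated convolutions and recurrent connections, so this shows the attention mechanism in isolation only.

```python
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    """Residual block with squeeze-and-excitation channel re-weighting."""
    def __init__(self, channels=32, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                  # squeeze: per-channel global average
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())  # excitation weights

    def forward(self, x):
        y = self.body(x)
        return x + y * self.se(y)    # re-weight the channels, then add the skip connection

# usage: SEResidualBlock(32)(torch.rand(1, 32, 64, 64))   # output keeps the input shape
```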

Multi-Path Architectures One common architecture is the multi-path network. As shown in Fig. 9 (c), (g) and (l), such networks have paths with different properties, e.g. kernel sizes, dilation factors, and filter directions, to gather different kinds of information and facilitate rain removal. In Fig. 9 (c) and (e), different paths have different receptive fields, and thus can capture global information while maintaining local structural details. In Fig. 9 (l), spatial redundancies are aggregated from different directions to form visual attention.

Progressive Architectures In [8, 31, 9, 11], blocks are aggregated in a nested, recursive way, as shown in Fig. 9 (f), (k), (e) and (i). The networks perform stage-wise operations that process the input and intermediate results to generate the clean output images progressively. Inter-stage recursive computation across blocks is sometimes adopted to propagate information.

Signal Structures The non-locally enhanced encoder-decoder network [10], shown in Fig. 9 (j), incorporates non-local operations into the design of an end-to-end network for deraining. The non-local operation computes the feature at a spatial position as a weighted sum of the features over a range of positions. In [34], a spatial attentive module, shown in Fig. 9 (l), employs recurrent neural networks with ReLU and identity-matrix initialization to accumulate global contextual information in four directions; another branch captures the spatial contexts and selectively highlights the transformed rain features.
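The non-local operation described above can be written compactly as below (a standard embedded-Gaussian non-local block); NLEDN [10] wraps such operations inside densely connected encoder-decoder blocks, which are not reproduced here.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Feature at each position = softmax-weighted sum of features at all positions."""
    def __init__(self, channels=32, reduction=2):
        super().__init__()
        inner = channels // reduction
        self.theta = nn.Conv2d(channels, inner, 1)
        self.phi = nn.Conv2d(channels, inner, 1)
        self.g = nn.Conv2d(channels, inner, 1)
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)         # (b, hw, inner)
        k = self.phi(x).flatten(2)                           # (b, inner, hw)
        v = self.g(x).flatten(2).transpose(1, 2)             # (b, hw, inner)
        attn = torch.softmax(q @ k, dim=-1)                  # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)  # aggregate features
        return x + self.out(y)                               # residual connection

# note: the full hw x hw affinity matrix is memory-hungry; in practice such blocks
# are applied at coarse resolutions or restricted to local windows
```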

Side information Methods
Rain Mask JORDER [1]
Rain Density JORDER [1], DID-MDN [22]
Depth DAF-Net [16], HeavyRainRestorer [13]
Attention DAF-Net [16], SPA-Net [34], AttGAN [28]
Intermediate Results JORDER-E [33], PReNet [9],
Bands Results Scale-Free Rain Removal [30], PyramidNet [11]
TABLE III: Summary of side information used in previous works.
Loss Function Methods
MSE (L2)
JORDER [1, 33] DetailNet [18, 19], DID-MDN [22],
DAF-Net [16], CGAN [24], DualCNN [23],
PReNet [9], RESCAN [31], Scale-Free [35],
Residual guided net [8], Semi-Supervised [12],
Scale-Aware [35]
MAE (L1)
PyramidDerain [11], NLEDN [10], SPA-Net [34],
UD-GAN [12]
SSIM
PyramidDerain [11], PReNet [9], SPA-Net [34],
Residual guided net [8]
Adversarial AttGAN [28], CGAN [24]
Perceptual AttGAN [28], DID-MDN [22], CGAN [24]
Multi-Scale AttGAN [28]
Variable
JORDER [1, 33], Semi-Supervised [12],
DID-MDN [22], DAF-Net [16], AttGAN [28],
Scale-Aware [35], SPA-Net [34], UD-GAN [12]
TABLE IV: Summary of loss functions used in existing works.
Dataset
Number
(#train/#test)
            Highlight Rain Model Publication
Rain12 12
Only for testing.
ACM Li et al. [6]
Rain100L 1,800/100
Synthesized with only one type of rain streaks (light rain case).
ACM Yang et al. [1]
Rain100H 1,800/100
Synthesized with five types of rain streaks (heavy rain case).
ACM Yang et al. [1]
Rain800 700/100
Clean images are selected from BSD500 and UCID [36].
ACM Zhang et al. [24]
Rain14000 9,100/4,900
1,000 clean images are used to synthesize 14,000 rain images.
ACM Fu et al. [18]
Rain12000 12,000/4,000
The data has three kinds of densities.
ACM Zhang et al. [22]
RealDataset 28,500/1,000
The ground truth data is synthesized based on temporal redundancy
and visual properties.
ACM Wang et al. [34]
NYU-Rain 13,500/2,700
Background images and the depth information are selected from
NYU-Depth V2 [37].
HRM Li et al. [13]
Outdoor-Rain 9,000/1,500
The background images are collected from [28], and the depth
information is estimated by [38].
HRM Li et al. [13]
RainCityscapes 9,432/1,188
The rain-free images are selected from the training and validation
sets of Cityscapes [39]. Rain patches are selected from [6].
DARM Hu et al. [16]
MPID 1,561/419
The MPID dataset covers a much larger diversity of rain models,
including both synthetic and real-world images, serving both human
and machine visions.
ACM + HRM Li et al. [26]
TABLE V: Summary of datasets used in previous works.

4.3 Datasets

There are a few benchmarking datasets for image deraining, as introduced in Table V:

  • Rain12 [6]. It includes 12 synthesized rain images with only one type of rain streaks.

  • Rain100L and Rain100H [1]. These two datasets include the synthesized rain images with only one type and five types of rain streaks, respectively.

  • Rain800 [24]. The training set in this database consists of 700 images, where 500 images are randomly chosen from the first 800 images of the UCID dataset [36] and 200 images are randomly chosen from the BSD500 training set [40]. The testing set consists of 100 images, where 50 images are randomly chosen from the last 500 images of the UCID dataset and 50 images are randomly chosen from the BSD500 testing set.

  • Rain14000 [18]. It includes 14,000 rainy images synthesized from 1,000 clean images collected from the UCID dataset [36], the BSD dataset [40], and Google image search.

  • Rain12000 [22]. It consists of 12,000 images in the training set, where each image is assigned a label based on its corresponding rain-density level (i.e. light, medium and heavy). There are 4,000 images per rain-density level in the dataset. The synthesized testing set includes 1,200 images.

  • RealDataset [34]. It includes 29,500 rain/rain-free image pairs that cover a wide range of natural rain scenes, where the rain-free images are synthesized based on temporal redundancy and visual properties.

  • NYU-Rain [13]. It is a synthetic rain dataset that takes images from NYU-Depth V2 [37] as backgrounds and uses the provided depth information to generate rain streak and accumulation layers. The dataset also considers the image blurring present in rain images. It contains 16,200 image samples, of which 13,500 are used as the training set.

  • Outdoor-Rain [13]. The background images are collected from [28], and the depth information used to synthesize accumulation is produced by [38]. The dataset includes 9,000 training images and 1,500 validation images.

  • MPID [26]. The training set includes 2400 synthetic rain streak image pairs, 861 synthetic raindrop image pairs, and 700 synthetic rain and mist image pairs. The testing set includes 200 synthetic rain streak image pairs, 149 synthetic raindrop image pairs, and 70 synthetic rain and mist image pairs, as well as 50 real rain streak images, 58 real raindrop images, and 30 real rain and mist images. The testing set also includes 2,496 and 2,048 real captured images in the driving and surveillance video conditions with human annotated object bounding boxes.

  • RainCityscapes [16]. 262 training images and 33 testing images from the training and validation sets of Cityscapes [39] are selected as the clean background images. Rain patches are selected from [6]. There are in total 9,432 training images and 1,188 testing images.

4.4 Loss Functions

In existing methods, several loss functions are proposed to regularize the training of the deraining network.

Signal Fidelity-Driven Metrics Most studies use signal fidelity-driven metrics as loss functions, such as the mean squared error (MSE, L2), the mean absolute error (MAE, L1), and SSIM [41]. They are defined as follows:

L_MSE = (1/N) Σ_x ||B̂(x) − B(x)||²,   (15)

L_MAE = (1/N) Σ_x ||B̂(x) − B(x)||₁,   (16)

SSIM(B̂, B) = ((2 μ_B̂ μ_B + C₁)(2 σ_B̂B + C₂)) / ((μ_B̂² + μ_B² + C₁)(σ_B̂² + σ_B² + C₂)),   (17)

where B and B̂ are the ground truth and predicted clean images, N is the number of pixels, μ_B̂ and μ_B are the averages of B̂ and B, σ²_B̂ and σ²_B are their variances, σ_B̂B is their covariance, and C₁ and C₂ are two constants that stabilize the division with a weak denominator.

Rain-Related Loss The rain-related variable prediction loss requires some outputs of the network to predict rain-related variables. For example, in [1], the rain streak and the binary rain mask are each connected to a corresponding loss:

L_rain = Σ_x ||Ŝ(x) − S(x)||² − Σ_x [M(x) log M̂(x) + (1 − M(x)) log(1 − M̂(x))],   (18)

where S, Ŝ, M, and M̂ are the ground truth rain streak, the predicted rain streak, the ground truth rain mask, and the predicted rain mask, respectively, and x indexes the spatial pixel location.

Signal Structure Loss The multi-scale loss [28] constrains the deraining network at different scales:

L_MS = Σ_s λ_s ||B̂_s − B_s||²,   (19)

where s indexes the scale, λ_s is the weight at scale s, and B̂_s and B_s are the down-sampled versions of B̂ and B with the corresponding scaling factor.

Perception-Driven Loss It is beneficial to apply perceptual and adversarial losses [28] to improve the perceptual quality of the generated results. The perceptual loss is formulated as:

L_per = ||φ(B̂) − φ(B)||²,   (20)

where φ(·) is a pretrained CNN transformation. The adversarial loss used for the deraining network is represented as:

L_adv = − log D(B̂),   (21)

where D is a discriminator network that differentiates the generated B̂ from the ground truth B.

A summary of loss functions used in previous works is given in Table IV.
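The sketch below shows how several of the losses in Table IV are typically combined in training code, assuming a deraining generator producing `pred` and an optional discriminator `disc` already exist; the weights, the VGG layer used for the perceptual term, and the cross-entropy adversarial form are illustrative choices rather than the settings of any particular paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# frozen VGG features for the perceptual term (ImageNet input normalization omitted for brevity)
vgg_features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def deraining_loss(pred, target, disc=None, w_l1=1.0, w_ms=0.5, w_per=0.1, w_adv=0.01):
    """Composite loss: L1 + multi-scale MSE + perceptual + optional adversarial term."""
    loss = w_l1 * F.l1_loss(pred, target)
    for s in (2, 4):                              # multi-scale fidelity, in the spirit of Eq. (19)
        loss = loss + w_ms * F.mse_loss(F.avg_pool2d(pred, s), F.avg_pool2d(target, s))
    loss = loss + w_per * F.mse_loss(vgg_features(pred), vgg_features(target))   # Eq. (20)
    if disc is not None:                          # generator-side adversarial term, cf. Eq. (21)
        logits = disc(pred)
        loss = loss + w_adv * F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return loss
```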

Fig. 10: The objective results of different methods on (a) Rain100L, (b) Rain100H, and (c) Rain1400. Top panel: PSNR. Bottom panel: SSIM. All methods are sorted by year. A red curve connects the top performance from 2015 to 2019, showing that the objective performance gains converge gradually.
Methods Input ID DSC LP DetailNet Heavy DID-MDN RESCAN JCAS JORDER-E PReNet SPANet
NIQE 5.38 4.46 4.55 5.46 5.34 4.97 5.07 3.78 4.97 3.93 4.78 3.92
PIQE 38.36 39.41 40.04 64.86 36.50 52.81 38.97 24.28 35.64 29.77 45.72 30.40
BRISQUE 34.00 31.44 32.51 41.01 32.99 37.32 30.12 25.70 32.91 26.51 30.28 28.38
ILNIQE 31.09 31.20 29.16 41.78 30.72 31.84 26.32 30.63 26.53 28.46 29.70 30.58
SSEQ 24.91 28.66 29.02 44.84 22.83 32.76 26.41 21.94 17.94 23.27 26.48 25.15
SR-Metric 7.90 7.73 7.46 4.76 8.12 7.26 8.06 7.90 8.29 7.82 7.66 7.85
ENIQA 0.1508 0.2012 0.1886 0.2631 0.1323 0.2091 0.1331 0.1347 0.1166 0.1394 0.1499 0.1445
BIQAA 0.0107 0.0036 0.0051 0.0031 0.0123 0.0069 0.0085 0.0041 0.0165 0.0040 0.0043 0.0078
BIQI 42.84 11.75 34.12 18.58 40.05 -5.15 37.05 35.40 29.81 34.46 31.02 24.34
BLIINDS-II 22.55 19.80 22.00 22.85 21.78 25.33 20.60 12.58 20.13 12.53 24.03 16.08
FRISQUE 54.99 54.32 52.95 38.40 58.97 52.75 25.29 57.56 66.70 58.49 58.95 58.54
MOS - 0.1042 0.2232 0.2440 1.8377 1.0974 0.4004 1.2413 0.3527 3.1366 2.3252 1.0
Subjective - 0.1466 0.2804 0.3050 1.8897 1.2510 0.4825 1.3037 0.4233 3.1988 2.3802 1.0
TABLE VI: The no-reference metric results of different methods. Heavy denotes HeavyRainRestorer. Red, blue, and green denote the best, second best, and third best results.

5 Performance Summary

We select and evaluate a number of deraining algorithms from different categories introduced in recent years:

  1. Image Decomposition, ID [4],

  2. Discriminative Sparse Coding, DSC [5],

  3. Gaussian mixture model Layer Prior, LP [6],

  4. Joint Convolutional Analysis and Synthesis Sparse Representation, JCAS [42],

  5. Deep Detail Network, DetailNet [19], DDN [18],

  6. Directional Global Sparse Model, DGSM [17],

  7. Recurrent Squeeze-and-Excitation Context Aggregation Net, RESCAN [31],

  8. Progressive Recurrent Network, PReNet [9],

  9. Enhanced JOint Rain DEtection and Removal, JORDER-E [33],

  10. Heavy Rain Image Restoration, HeavyRainRestorer [13],

  11. Spatial Attentive Network, SPANet [34],

  12. Semi-supervised Image rain Removal, SSIR [25], and

  13. Density-aware Image De-raining using Multi-stream Dense Network, DID-MDN [22].

LP is built on a Gaussian mixture model. ID, DSC, and JCAS are based on sparse coding. JORDER-E, DetailNet, DDN, DID-MDN, PReNet, SPANet, and RESCAN are deep-learning-based methods. HeavyRainRestorer integrates a deep CNN and generative adversarial learning for rain removal.

In the comparison experiment, JORDER-E, DetailNet, PReNet, and RESCAN are trained on Rain100H. SPANet is trained on RealDataset. DID-MDN is trained on Rain800. HeavyRainRestorer is trained on NYU-Rain and Outdoor-Rain.

Peak signal-to-noise ratio (PSNR) and SSIM [43] are used as full-reference metrics for performance evaluation. Several no-reference metrics are also used, including the Naturalness Image Quality Evaluator (NIQE) [44], the Perception-based Image Quality Evaluator (PIQE) [45], the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [46], the Integrated Local NIQE (IL-NIQE) [47], Spatial-Spectral Entropy based Quality (SSEQ) [48], the SR metric [49], Entropy-based No-reference Image Quality Assessment (ENIQA) [50], Blind Image Quality Assessment through Anisotropy (BIQAA) [51], the Blind Image Quality Index (BIQI) [52], and the BLind Image Integrity Notator using DCT Statistics (BLIINDS-II) [53]. These metrics measure the visual quality of the results of different methods in a general way, considering human perception of lightness distortion, texture preservation, spatial-domain statistics, naturalness preservation, etc.
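For the full-reference metrics, standard scikit-image implementations are typically sufficient; the snippet below shows the calls we assume, with `data_range` set explicitly for floating-point images in [0, 1].

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(derained, ground_truth):
    """Full-reference evaluation of one derained result against its ground truth."""
    psnr = peak_signal_noise_ratio(ground_truth, derained, data_range=1.0)
    ssim = structural_similarity(ground_truth, derained, data_range=1.0,
                                 channel_axis=-1)    # last axis holds the RGB channels
    return psnr, ssim

# usage: psnr, ssim = evaluate_pair(np.clip(pred, 0, 1), gt)   # both float arrays in [0, 1]
```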

5.1 Quantitative Evaluation

We compare the quantitative results of different rain removal methods in Fig. 10; the objective metric values are cited from [54]. To observe the trend of objective performance across years, the methods in Fig. 10 are ordered by year. Several interesting findings emerge from this figure. First, most deep learning-based methods achieve significantly superior performance to the non-deep methods; for example, DDN outperforms them by more than 3 dB, 7 dB, and 0.7 dB on Rain100L, Rain100H, and Rain1400, respectively. Second, the best performances of different methods gradually converge: the gaps among RESCAN, PReNet, and JORDER-E are quite small.

5.2 Qualitative Evaluation

We also show the visual results of different methods in Fig. LABEL:fig:sub_results. The input images shown in the figure are diverse and difficult to handle, including large rain streaks and dense rain accumulation. The top two panels clearly show that JORDER-E (Fig. 10 (j)) and PReNet (Fig. 10 (k)) handle large rain streaks better than the other methods. The bottom three panels show that JORDER-E (Fig. 10 (j)) and HeavyRainRestorer (Fig. 10 (f)) achieve better results in removing rain accumulation and enhancing visibility.

We further use the no-reference quality assessment metrics to compare different methods, and evaluate their consistency with subjective results in the form of a Mean Opinion Score (MOS). In total, 20 image samples are used to evaluate eleven methods: the 20 rain images are processed by the eleven methods, and the results are rated by human annotators. We adopt paired comparison in the subjective experiment. Forty participants are invited, most of them research students, staff, or faculty members in computer vision and image processing. Each participant provides subjective judgments for 550 image pairs.

The comparison results are visualized in Fig. LABEL:fig:visual_all. Based on the compared pairs, we fit a Bradley-Terry model [55] to estimate an MOS score for each method so that the methods can be ranked. We can infer the MOS score for each input sample and then combine the results of different samples via the geometric mean, which is denoted as MOS in Table VI. Alternatively, we can directly infer the MOS score from the accumulated ranking results over all samples, which is denoted as the Subjective value in Table VI. It is observed that DetailNet, JORDER-E, and PReNet obtain slightly higher MOS and Subjective values overall. In general, the methods published in 2019 are on average superior to previous methods. However, their superiority in the subjective comparison is not as significant as in the objective one, which reflects the disagreement between optimizing objective metrics on synthesized data and achieving better visual quality on real images, caused by the domain gap between real rain images and synthesized data.
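For concreteness, the following is a minimal sketch of how Bradley-Terry scores can be fitted from pairwise win counts by the classical minorization-maximization (MM) updates; the win matrix is a hypothetical toy example, not the actual subjective data of this survey.

```python
# Sketch of Bradley-Terry score estimation from pairwise comparisons via MM updates.
# wins[i, j] = number of times method i was preferred over method j.
import numpy as np

def fit_bradley_terry(wins, iters=200, eps=1e-9):
    m = wins.shape[0]
    n = wins + wins.T                  # total comparisons between each pair
    w = wins.sum(axis=1)               # total wins of each method
    p = np.ones(m)                     # initial ability scores
    for _ in range(iters):
        denom = np.zeros(m)
        for i in range(m):
            for j in range(m):
                if i != j and n[i, j] > 0:
                    denom[i] += n[i, j] / (p[i] + p[j])
        p = (w + eps) / (denom + eps)
        p /= p.sum()                   # fix the scale (scores sum to 1)
    return p

if __name__ == "__main__":
    # Hypothetical 3-method toy example, not the survey's subjective data.
    wins = np.array([[0, 12, 15],
                     [8,  0, 11],
                     [5,  9,  0]], dtype=float)
    print(np.round(fit_bradley_terry(wins), 3))
```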

We also observe that none of the no-reference quality assessment metrics is well aligned with the MOS and Subjective values. We report the Spearman rank-order correlation coefficient (SROCC), Kendall rank-order correlation coefficient (KROCC), and Pearson linear correlation coefficient (PLCC) in Table VII, where a larger absolute value indicates that a metric is more consistent with human perception (a minimal computation sketch is given after the table). Even the best values are only 0.2216, 0.1473, and 0.1864 for SROCC, KROCC, and PLCC, respectively. This demonstrates that existing metrics are not well suited to measuring rain removal performance, and there is great potential for future work on deraining performance evaluation.

Metric | SROCC | KROCC | PLCC
NIQE | 0.0780 | 0.0461 | 0.0700
PIQE | 0.2118 | 0.1437 | 0.1215
BRISQUE | 0.1896 | 0.1297 | 0.1508
ILNIQE | 0.0778 | 0.0508 | 0.1458
SSEQ | 0.2216 | 0.1473 | 0.1257
SR-Metric | 0.1132 | 0.0760 | 0.1129
ENIQA | 0.1333 | 0.0932 | 0.1487
BIQAA | 0.1365 | 0.0927 | 0.1383
BIQI | 0.2001 | 0.1299 | 0.1558
BLIINDS-II | 0.1705 | 0.1208 | 0.1752
FRISQUE | 0.2083 | 0.1407 | 0.1864
TABLE VII: The correlation of each quality assessment metric with the subjective scores.
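A minimal sketch of how such correlations can be computed with SciPy is given below; the two arrays are hypothetical placeholders standing in for a metric's scores and the corresponding MOS values, not the survey's data.

```python
# Sketch of the correlation analysis between a no-reference metric and MOS.
# The arrays below are hypothetical placeholders.
import numpy as np
from scipy.stats import spearmanr, kendalltau, pearsonr

metric_scores = np.array([4.2, 3.8, 5.1, 4.9, 3.5, 4.0])   # e.g., NIQE per image
mos_values    = np.array([3.1, 3.6, 2.2, 2.5, 3.9, 3.3])   # subjective ratings

srocc, _ = spearmanr(metric_scores, mos_values)
krocc, _ = kendalltau(metric_scores, mos_values)
plcc, _ = pearsonr(metric_scores, mos_values)
print(f"SROCC={srocc:.4f}, KROCC={krocc:.4f}, PLCC={plcc:.4f}")
```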
Methods | ID | DSC | LP | UGSM | JCAS | DetailNet | DID-MDN
Time (seconds) | 283.69 | 398.66 | 1177.3 | 2.51 | 188.28 | 0.61 | 0.53
CPU/GPU | C | C | C | C | C | G | G
Params | - | - | - | - | - | 57,369 | 372,839
Methods | JORDER-E | RESCAN | ID-CGAN | SPANet | URML | HeavyRainRestorer | PReNet
Time (seconds) | 0.13 | 0.61 | 0.50 | 1.72 | 2.02 | 0.73 | 0.11
CPU/GPU | G | G | G | G | G | G | G
Params | 4,169,024 | 149,823 | 263,686 | 283,716 | 984,356 | 40,627,038 | 168,963
TABLE VIII: The model complexity and running time of different methods. The size of the input image is . C and G denote CPU and GPU, respectively.

5.3 Computational Complexity

Table VIII compares the model complexity and running time of different state-of-the-art methods. All sparse coding based methods are implemented in MATLAB and tested on a CPU, following the original settings of their released codes, while the other methods are accelerated on a GPU. ID-CGAN is implemented in Torch7, and the rest are implemented in PyTorch. It is observed that JORDER-E and HeavyRainRestorer use many more parameters than the other methods, and URML's parameter count is also large. Considering both performance and parameter number, PReNet stands out as an impressive method quantitatively and qualitatively while keeping a lightweight framework.
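As an illustration of how the parameter counts and GPU running times in Table VIII are typically measured, here is a minimal PyTorch sketch; `model` stands for any deraining network, and the input size, warm-up count, and stand-in network are illustrative assumptions rather than the survey's actual protocol.

```python
# Sketch of measuring parameter count and average inference time for a network.
import time
import torch

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def time_inference(model, input_size=(1, 3, 480, 320), runs=20):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(3):                      # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.time() - start) / runs

# Example with a trivial stand-in network (not an actual deraining model):
net = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1),
                          torch.nn.ReLU(),
                          torch.nn.Conv2d(16, 3, 3, padding=1))
print("Params:", count_parameters(net))
print("Avg. time per image (s):", time_inference(net))
```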

6 Future Directions

6.1 Integration of Physics Model and Real Images

Many existing learning-based methods rely on synthetic rain images to train their networks, since obtaining paired rain images and their exact clean ground truths is intractable. While such a training scheme shows some degree of success, to improve performance further we need to incorporate both real rain images and clean real background images in the training process; they do not need to be paired. If we do not include real rain images in the training stage, the network will never be exposed to real rain, impeding its effectiveness at test time. Incorporating unpaired real images, however, can pose problems, mainly because there is no loss that can provide an error signal for the network to learn from. To address this problem, we may rely on physics-based constraints. The attempts in [13, 24] have demonstrated the feasibility of this direction. Specifically, with adversarial learning, image-level labels can be converted into pixel-level prior knowledge. In the future, more work is expected in this direction, combining physical models and real rain images for rain removal.
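One possible realization of this idea is to combine a supervised loss on synthetic pairs with an adversarial term on unpaired real rain images. The following is a highly simplified sketch of such a training step; the network names, loss weights, and data flow are assumptions for illustration only, not the exact formulation of [13] or [24].

```python
# Simplified sketch of joint training on synthetic pairs (supervised) and
# unpaired real rain images (adversarial). All names and weights are illustrative.
import torch
import torch.nn.functional as F

def training_step(derain_net, discriminator, opt_g, opt_d,
                  syn_rain, syn_clean, real_rain, lambda_adv=0.01):
    # --- Supervised loss on synthetic paired data ---
    pred_syn = derain_net(syn_rain)
    loss_sup = F.l1_loss(pred_syn, syn_clean)

    # --- Adversarial loss on unpaired real rain images ---
    pred_real = derain_net(real_rain)
    d_fake = discriminator(pred_real)
    loss_adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    opt_g.zero_grad()
    (loss_sup + lambda_adv * loss_adv).backward()
    opt_g.step()

    # --- Discriminator: clean images vs. derained real rain images ---
    d_real = discriminator(syn_clean)
    d_fake = discriminator(pred_real.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_sup.item(), loss_adv.item(), loss_d.item()
```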

6.2 Rain Modeling

Current synthetic rain models cover only limited types of rain streaks, e.g. a narrow range of scales, shapes, and directions. In practice, however, the appearance of rain streaks is diverse, owing to the many factors that influence rain conditions, e.g. 3D environments, distances, and wind direction/speed. When the distribution of captured rain streaks differs from that of the synthetic training images, current methods tend to fail to remove the rain properly. The studies [25, 12] attempt to model rain appearance via generative models and unpaired learning. However, their generated rain images are visibly neither as diverse as real rain nor realistic enough, and the latter also causes problems, since it implies significant gaps between synthetic and real rain images.
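To make this limitation concrete, a widely used additive synthesis pipeline roughly motion-blurs sparse noise along one direction and adds the resulting streak layer to the clean image, so all streaks in an image share a single scale and orientation. The sketch below illustrates this common recipe; the parameters are illustrative and do not correspond to any specific dataset.

```python
# Rough sketch of additive rain synthesis: O = B + S, where the streak layer S
# is produced by motion-blurring sparse noise along one direction.
# Parameters (length, angle, density, intensity) are illustrative only.
import numpy as np
from scipy.ndimage import convolve, rotate

def synthesize_rain(clean, length=25, angle=70, density=0.02, intensity=0.8):
    h, w = clean.shape[:2]
    noise = (np.random.rand(h, w) < density).astype(np.float64)
    kernel = np.zeros((length, length))
    kernel[length // 2, :] = 1.0 / length          # horizontal motion-blur kernel
    kernel = rotate(kernel, angle, reshape=False)  # orient the streaks
    streaks = np.clip(convolve(noise, kernel) * intensity, 0, 1)[..., None]
    return np.clip(clean + streaks, 0, 1), streaks

# Usage: rainy, streak_layer = synthesize_rain(clean_image)  # clean_image in [0, 1]
```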

6.3 Evaluation Methodology of Rain Removal Results

With the explosive growth of work on rain removal, it remains challenging to measure whether a method is sufficiently good. As shown in Sec. 5.2, existing quality assessment methods are still far from capturing the real visual perception of humans. Thus, this is a direction to which the community can pay more attention. The quality assessment of rain removal methods can be considered from two aspects. First, for human vision, a metric should be designed to model the typical distortions caused by rain and deraining methods, and to describe human preferences among different deraining results. Effective quality assessment metrics can be used to measure the superiority of different methods; they can also serve as guidance to inspire the development of new methods, or as a constraint incorporated directly into an end-to-end optimized framework. Second, for machine vision, we could consider the performance of high-level vision tasks in rain conditions. The MPID dataset makes a preliminary attempt by constructing task-driven evaluation sets for traffic detection, but the captured rain degradation is not diverse. In the future, larger-scale task-driven evaluation sets covering more applications and more diverse rain conditions are expected.
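As a toy illustration of putting a quality term directly into an end-to-end framework, one could add a differentiable surrogate to the fidelity loss; total variation is used below only as a stand-in for a future learned, perception-aligned quality metric, so this is an assumption-laden sketch rather than an existing deraining loss.

```python
# Toy sketch: adding a differentiable quality regularizer to the fidelity loss.
# Total variation here is only a stand-in for a learned quality metric.
import torch
import torch.nn.functional as F

def total_variation(x):
    # x: (N, C, H, W); mean absolute difference between neighboring pixels.
    tv_h = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    tv_w = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return tv_h + tv_w

def derain_loss(pred, target, lambda_q=0.05):
    return F.l1_loss(pred, target) + lambda_q * total_variation(pred)
```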

6.4 More Related Tasks and Real Applications

When existing rain removal/deraining methods are applied in real applications, a few factors should be considered. First, computational complexity is a bottleneck: existing methods are far from meeting the requirement of real-time processing (30 fps), and accelerating them to achieve real-time processing is a meaningful direction. Second, real rain images usually contain more complicated degradations. For example, surveillance videos are compressed and thus also include compression distortions, e.g. blocking artifacts; effective rain removal methods need to handle these issues as well. Third, there are scenarios involving composite degradations, e.g. night-time rain or a mixture of raindrops and rain streaks. It would be interesting to detect the degradation types and handle them adaptively in a unified framework.

7 Concluding Remarks

This survey reviews single-image deraining methods from both model-based and data-driven perspectives. According to their basic models, model-based methods are categorized into sparse coding and GMM approaches, while data-driven (deep-learning) methods are classified into deep CNN, generative adversarial network, and semi-/unsupervised learning methods. A comprehensive review of previous approaches is conducted under each category. The evolution of these categories suggests that deep-learning based methods are becoming dominant, and that existing approaches trained on synthesized data might not provide desirable results in real cases. Future research will embrace efforts to build semi-/unsupervised methods, conduct domain generalization, create reasonable quality evaluation metrics for rain removal, and explore the potential of applying deraining in real applications.

References

  • [1] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, July 2017.
  • [2] W. Yang, J. Liu, and J. Feng, “Frame-consistent recurrent video deraining with dual-level flow,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [3] K. Garg and S. K. Nayar, “Detection and removal of rain from videos,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, vol. 1, 2004, pp. I–528.
  • [4] L. W. Kang, C. W. Lin, and Y. H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Trans. on Image Processing, vol. 21, no. 4, pp. 1742–1755, April 2012.
  • [5] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in Proc. IEEE Int’l Conf. Computer Vision, 2015, pp. 3397–3405.
  • [6] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016, pp. 2736–2744.
  • [7] L. Zhu, C. Fu, D. Lischinski, and P. Heng, “Joint bi-layer optimization for single-image rain streak removal,” in Proc. IEEE Int’l Conf. Computer Vision, Oct 2017, pp. 2545–2553.
  • [8] Z. Fan, H. Wu, X. Fu, Y. Huang, and X. Ding, “Residual-guide network for single image deraining,” in Proc. ACM Int’l Conf. Multimedia, 2018, pp. 1751–1759.
  • [9] D. Ren, W. Zuo, Q. Hu, P. Zhu, and D. Meng, “Progressive image deraining networks: A better and simpler baseline,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [10] G. Li, X. He, W. Zhang, H. Chang, L. Dong, and L. Lin, “Non-locally enhanced encoder-decoder network for single image de-raining,” in Proc. ACM Int’l Conf. Multimedia, 2018, pp. 1056–1064.
  • [11] X. Fu, B. Liang, Y. Huang, X. Ding, and J. Paisley, “Lightweight pyramid networks for image deraining,” IEEE Trans. on Neural Networks and Learning Systems, pp. 1–14, 2019.
  • [12] X. Jin, Z. Chen, J. Lin, Z. Chen, and W. Zhou, “Unsupervised single image deraining with self-supervised constraints,” in Proc. IEEE Int’l Conf. Image Processing, Sep. 2019, pp. 2761–2765.
  • [13] R. Li, L.-F. Cheong, and R. T. Tan, “Heavy rain image restoration: Integrating physics model and conditional adversarial learning,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [14] K. Garg and S. K. Nayar, “Vision and rain,” Int. J. Comput. Vision, vol. 75, no. 1, pp. 3–27, October 2007.
  • [15] J. Liu, W. Yang, S. Yang, and Z. Guo, “Erase or fill? deep joint recurrent rain removal and reconstruction in videos,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2018, pp. 3233–3242.
  • [16] X. Hu, C.-W. Fu, L. Zhu, and P.-A. Heng, “Depth-attentional features for single-image rain removal,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [17] L.-J. Deng, T.-Z. Huang, X.-L. Zhao, and T.-X. Jiang, “A directional global sparse model for single image rain removal,” Applied Mathematical Modelling, vol. 59, pp. 662 – 679, 2018.
  • [18] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Removing rain from single images via a deep detail network,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, July 2017.
  • [19] ——, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE Trans. on Image Processing, vol. 26, no. 6, pp. 2944–2956, June 2017.
  • [20] R. Li, L.-F. Cheong, and R. T. Tan, “Single Image Deraining using Scale-Aware Multi-Stage Recurrent Network,” arXiv e-prints, p. arXiv:1712.06830, Dec 2017.
  • [21] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, “Recurrent squeeze-and-excitation context aggregation net for single image deraining,” in Proc. IEEE European Conf. Computer Vision, 2018, pp. 262–277.
  • [22] H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2018.
  • [23] J. Pan, S. Liu, D. Sun, J. Zhang, Y. Liu, J. Ren, Z. Li, J. Tang, H. Lu, Y.-W. Tai, and M.-H. Yang, “Learning dual convolutional neural networks for low-level vision,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2018.
  • [24] H. Zhang, V. Sindagi, and V. M. Patel, “Image De-raining Using a Conditional Generative Adversarial Network,” arXiv e-prints, p. arXiv:1701.05957, Jan 2017.
  • [25] W. Wei, D. Meng, Q. Zhao, Z. Xu, and Y. Wu, “Semi-supervised transfer learning for image rain removal,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [26] S. Li, I. B. Araujo, W. Ren, Z. Wang, E. K. Tokuda, R. H. Junior, R. Cesar-Junior, J. Zhang, X. Guo, and X. Cao, “Single image deraining: A comprehensive benchmark analysis,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [27] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in Proc. IEEE Int’l Conf. Computer Vision, December 2013.
  • [28] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu, “Attentive generative adversarial network for raindrop removal from a single image,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2018.
  • [29] R. Yasarla and V. M. Patel, “Uncertainty guided multi-scale residual learning-using a cycle spinning cnn for single image de-raining,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [30] W. Yang, J. Liu, S. Yang, and Z. Guo, “Scale-free single image deraining via visibility-enhanced recurrent wavelet learning,” IEEE Trans. on Image Processing, vol. 28, no. 6, pp. 2948–2961, June 2019.
  • [31] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, “Rescan: Recurrent squeeze-and-excitation context aggregation net,” in Proc. IEEE European Conf. Computer Vision, Oct. 2018.
  • [32] Y. Wang, S. Liu, C. Chen, and B. Zeng, “A hierarchical approach for rain or snow removing in a single color image,” IEEE Trans. on Image Processing, vol. 26, no. 8, pp. 3936–3950, Aug 2017.
  • [33] W. Yang, R. T. Tan, J. Feng, J. Liu, S. Yan, and Z. Guo, “Joint rain detection and removal from a single image with contextualized deep networks,” IEEE Trans. on Pattern Analysis and Machine Intelligence, pp. 1–1, 2019.
  • [34] T. Wang, X. Yang, K. Xu, S. Chen, Q. Zhang, and R. W. Lau, “Spatial attentive single-image deraining with a high quality real rain dataset,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2019.
  • [35] R. Li, L.-F. Cheong, and R. T. Tan, “Single Image Deraining using Scale-Aware Multi-Stage Recurrent Network,” ArXiv e-prints, December 2017.
  • [36] G. Schaefer and M. Stich, “UCID: an uncompressed color image database,” 2003.
  • [37] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Proc. IEEE European Conf. Computer Vision, 2012.
  • [38] C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2017.
  • [39] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2016.
  • [40] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, May 2011.
  • [41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
  • [42] S. Gu, D. Meng, W. Zuo, and L. Zhang, “Joint convolutional analysis and synthesis sparse representation for single image layer separation,” in Proc. IEEE Int’l Conf. Computer Vision, Oct 2017, pp. 1717–1725.
  • [43] A. C. Brooks, X. Zhao, and S. Member, “Structural similarity quality metrics in a coding context: Exploring the space of realistic distortions,” IEEE Trans. on Image Processing, pp. 1261–1273, 2008.
  • [44] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, March 2013.
  • [45] N. Venkatanath, D. Praneeth, B. M. Chandrasekhar, S. S. Channappayya, and S. S. Medasani, “Blind image quality evaluation using perception based features,” in Proc. IEEE National Conf. Communications, 2008.
  • [46] A. Mittal, A. K. Moorthy, and A. C. Bovik, “Blind/referenceless image spatial quality evaluator,” in Conf. Record of Asilomar Conf. on Signals, Systems and Computers, Nov 2011, pp. 723–727.
  • [47] L. Zhang, L. Zhang, and A. C. Bovik, “A feature-enriched completely blind image quality evaluator,” IEEE Trans. on Image Processing, vol. 24, no. 8, pp. 2579–2591, Aug 2015.
  • [48] L. Liu, B. Liu, H. Huang, and A. C. Bovik, “No-reference image quality assessment based on spatial and spectral entropies,” Signal Processing: Image Communication, vol. 29, no. 8, pp. 856 – 863, 2014.
  • [49] C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang, “Learning a no-reference quality metric for single-image super-resolution,” Comput. Vis. Image Underst., vol. 158, pp. 1–16, May 2017.
  • [50] X. Chen, Q. Zhang, M. Lin, G. Yang, and C. He, “No-reference color image quality assessment: from entropy to perceptual quality,” EURASIP Journal on Image and Video Processing, vol. 2019, no. 1, p. 77, Sep 2019. [Online]. Available: https://doi.org/10.1186/s13640-019-0479-7
  • [51] S. Gabarda and G. Cristóbal, “Blind image quality assessment through anisotropy,” J. Opt. Soc. Am. A, vol. 24, no. 12, pp. B42–B51, Dec 2007.
  • [52] L. Zhang, L. Zhang, and A. C. Bovik, “A feature-enriched completely blind image quality evaluator,” IEEE Trans. on Image Processing, vol. 24, no. 8, pp. 2579–2591, Aug 2015.
  • [53] M. A. Saad, A. C. Bovik, and C. Charrier, “A dct statistics-based blind image quality index,” IEEE Signal Processing Letters, vol. 17, no. 6, pp. 583–586, June 2010.
  • [54] H. Wang, Y. Wu, M. Li, Q. Zhao, and D. Meng, “A Survey on Rain Removal from Video and Single Image,” arXiv e-prints, p. arXiv:1909.08326, Sep 2019.
  • [55] R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs: The method of paired comparisons,” Biometrika, vol. 39, no. 3-4, pp. 324–345, 1952.
  • [56] M. Elad and M. Aharon, “Image denoising via learned dictionaries and sparse representation,” in Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, June 2006, pp. 895–900.
  • [57] A. Yamashita, Y. Tanaka, and T. Kaneko, “Removal of adherent waterdrops from images acquired with stereo camera,” in IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems, Aug 2005, pp. 400–405.
  • [58] A. Yamashita, I. Fukuchi, and T. Kaneko, “Noises removal from image sequences acquired with moving camera by estimating camera motion from spatio-temporal information,” in IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems, Oct 2009, pp. 3794–3801.
  • [59] S. You, R. T. Tan, R. Kawakami, Y. Mukaigawa, and K. Ikeuchi, “Adherent raindrop modeling, detection and removal in video,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 38, no. 9, pp. 1721–1733, Sep. 2016.