Images captured in rainy weather suffer from severe visibility degradation, which can substantially harm many computer vision tasks, including object detection and tracking, autonomous driving, and semantic segmentation. Image deraining, which seeks to recover the clean image from its complex entanglement with rain streaks, is therefore an essential prerequisite for many vision applications. The problem is, however, highly challenging and ill-posed, since the underlying background is entirely unknown.
Many efforts have been dedicated to this problem. Early investigations were mainly based on image priors. One prior closely related to deraining is that the main structures of an image are usually of low frequency, while details such as rain streaks are often of high frequency. Accordingly, the pioneering work on image deraining first adopts bilateral filtering to decompose an image into a low-frequency map and a high-frequency map, and then removes rain streaks from the high-frequency map using dictionary learning. Later, more approaches were proposed based on this prior, as well as on other image priors such as Gaussian mixture models and low-rank representations. However, these hand-crafted priors cannot disentangle image structures, particularly fine ones, from rain streaks, and therefore yield unsatisfactory clean images.
Recently, the performance of image deraining has been boosted by deep convolutional neural networks (CNNs), which capture a variety of image characteristics by learning a complex model from massive data. The first deep learning-based deraining framework performs a frequency-based decomposition and then directly adopts a three-layer CNN to extract rain streaks from the high-frequency maps. Later, more CNN models were proposed that introduce either extra network modules or task-specific auxiliary information [8, 26] to guide the learning process, attempting to capture more powerful features for distinguishing image structures from rain streaks. However, these models still have several shortcomings. First, they perform frequency decomposition only at the image level, so rain streaks mistakenly assigned to the low-frequency maps are difficult to remove, and delicate structures assigned to the high-frequency maps are difficult to recover in the clean images. Second, there is no interactive mechanism between the low-frequency and high-frequency maps during training. Third, traditional convolutional filters treat all directions equally, whereas the rain streaks in an image usually head in one direction, i.e., the wind direction; most current solutions ignore this property.
To comprehensively address these shortcomings, in this paper we propose a novel dual-branch network for single image deraining. Compared with existing solutions, the proposed network has three compelling characteristics. First, unlike previous algorithms, we perform frequency decomposition at the feature level instead of the image level. In this way, the network generates low-frequency and high-frequency maps from feature maps at different layers, which allows these maps to be continuously refined during training (see Fig. 1). Second, we establish communication channels between the two branches, promoting information propagation between the low-frequency and high-frequency maps during training. This mechanism not only helps separate more rain streaks from the low-frequency maps to facilitate deraining, but also extracts more delicate features from the high-frequency maps and adds them back to the low-frequency maps, enhancing clean image reconstruction. Third, most existing methods employ convolutional filters that treat all directions equally and ignore the fact that the rain streaks in an image always head in the wind direction, which makes them sub-optimal for image deraining. To take full advantage of this phenomenon, we propose a novel cross-median filter that captures the direction of rain streaks, aiming to produce more representative features that thoroughly purge the input image of rain streaks. We extensively evaluate the proposed network on three widely used image deraining datasets. Experimental results demonstrate its effectiveness: it outperforms state-of-the-art approaches on most metrics. Our contributions can be summarized as follows:
We propose a novel dual-branch network for single image deraining that conducts frequency decomposition at the feature level instead of the image level, so as to gradually and iteratively refine both low-frequency and high-frequency maps during training.
We propose a new mechanism that promotes interactions between low-frequency and high-frequency maps, facilitating both rain streak removal and fine-feature recovery; we further propose a novel direction-aware filter that captures rain streaks more efficiently and effectively during training.
We achieve state-of-the-art single image deraining performance on three widely used datasets.
2 Related Work
2.1 Conventional Methods
Mainstream conventional methods model image deraining as an image decomposition problem, where a rainy image is decomposed into a clean background layer and a rain streak layer. Subsequent works follow this strategy and introduce more prior knowledge, such as depth of field and color variance, to better extract rain streaks from the detail layer. In addition, many other image priors have been exploited for image deraining. Low-rank representation was first adopted to describe the non-local similarity among rain patches, and this idea was further explored in later work. The differences between the rain streak layer and the background layer have been exploited in a discriminative sparse coding method. Gaussian mixture models learned on small patches, which can accommodate a variety of background and rain streak appearances, have also been used for rain removal. Finally, analysis sparse representation and synthesis sparse representation have been combined to better separate rain streaks from image textures.
2.2 Deep Learning-based Methods
Some deep learning-based methods adopt a multi-stage strategy using recurrent neural networks to progressively recover the clean image [11, 18]. One approach recovers low-frequency image structures and high-frequency image details separately using two parallel network branches, and another uses an additional network branch to recover lost details. GANs are also exploited [17, 27] to refine deraining results for more visually appealing effects. Besides, a dataset describing heavy rainy scenes has been built using depth images to associate rain streaks with rainy haze. A real rain dataset has been constructed from video-based deraining results, together with a directional IRNN that learns spatial attention to guide the network. MPID, a comprehensive benchmark, has been presented for evaluating various deraining methods. One work was the first to utilize CycleGAN for single image deraining. To remove rain streaks of different scales, a fractal band learning network trained with self-supervision has been designed for scale-robust rain streak removal.
In this section, we introduce our proposed method, which is built on frequency decomposition. The rain-free label image is first decomposed into a low-frequency structure map and a high-frequency detail map. Our goal is to accurately predict both maps from a single rainy input, so as to recover a high-fidelity derained image with abundant details and minimal distortion. Our solution performs feature-level frequency decomposition along and across a parallel network architecture: each branch learns to extract a particular frequency towards its decomposed label, while components of the other frequency are continuously delivered across the branches. We first describe the multiple labels and their loss functions, and then detail the direction-aware Cross-Median Filter (dCMF) and the interactive adapter, explaining how we isolate the different frequencies and enhance communication between branches.
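The label decomposition into a structure map and a detail map can be sketched as below. This is a minimal illustration, assuming a box (mean) filter as the low-pass filter, since neither the filter type nor its size is specified; `box_blur` and `decompose_label` are hypothetical helper names, and images are plain nested lists to keep the example self-contained.

```python
# Hypothetical sketch: split a clean label image into a low-frequency
# structure map and a high-frequency detail map. The box filter and its
# radius are illustrative choices, not necessarily the filter used here.


def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2*radius+1)^2 window, borders replicated."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border rows
                    xx = min(max(x + dx, 0), w - 1)  # replicate border columns
                    total += img[yy][xx]
            out[y][x] = total / (2 * radius + 1) ** 2
    return out


def decompose_label(label, radius=1):
    """Return (structure, detail); structure + detail reproduces label up to rounding."""
    structure = box_blur(label, radius)
    detail = [[p - s for p, s in zip(row_p, row_s)]
              for row_p, row_s in zip(label, structure)]
    return structure, detail


# Usage: a vertical step edge keeps its smooth part in `structure`,
# while `detail` is non-zero only next to the edge.
label = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
structure, detail = decompose_label(label)
```

Because the detail map is defined as the residual of the low-pass result, the two targets always sum back to the label, so the branches can be supervised separately without losing information.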
3.1 Label Decomposition and Loss Functions
Label decoupling in low-level vision is a strategy that models the final task as a composition of several easier sub-tasks [21, 15], which we argue is particularly effective for image deraining owing to the complex entanglement of rain streaks and image contents. In our case, the label image is decomposed into low- and high-frequency parts using a low-pass image filter, where the high-frequency part contains abundant image details and the low-frequency part characterizes the main image structures. We employ two network branches to deal with structures and details, respectively. For the detail branch, we minimize the distance between the detail map and the output of the high-frequency branch so as to preserve gradient discontinuities, while for the structure branch we enforce a loss that encourages global smoothness.
These two losses supervise the detail branch and the structure branch, respectively. Furthermore, to ensure fidelity and structural integrity in the composited derained image, we combine a pixel-wise reconstruction loss with an SSIM loss to constrain the final output of the whole network, which serves as the prediction of the rain-free background. The overall loss is a weighted sum of these three losses, with all three weighting parameters fixed to 1 in our experiments.
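One way to write these losses is sketched below, as an illustration under stated assumptions rather than the exact formulation. The notation is ours: $B$ is the rain-free label, $G$ a low-pass filter, $\hat{B}_D$ and $\hat{B}_S$ the outputs of the detail and structure branches, $\hat{B}$ the final output, and $\lambda_D, \lambda_S, \lambda_B$ the three weights. The $\ell_1$ distance for the detail branch, the squared $\ell_2$ distance for the structure branch, and the $\ell_1$ term paired with SSIM are assumptions, chosen because $\ell_1$ tolerates sparse discontinuities while $\ell_2$ strongly penalizes large deviations and thus favors smooth solutions.

```latex
% Notation and norm choices are illustrative assumptions.
\begin{aligned}
B_S &= G * B, \qquad B_D = B - B_S, \\
\mathcal{L}_D &= \bigl\lVert \hat{B}_D - B_D \bigr\rVert_1, \qquad
\mathcal{L}_S = \bigl\lVert \hat{B}_S - B_S \bigr\rVert_2^2, \\
\mathcal{L}_B &= \bigl\lVert \hat{B} - B \bigr\rVert_1 + \bigl(1 - \mathrm{SSIM}(\hat{B}, B)\bigr), \\
\mathcal{L} &= \lambda_D \mathcal{L}_D + \lambda_S \mathcal{L}_S + \lambda_B \mathcal{L}_B,
\qquad \lambda_D = \lambda_S = \lambda_B = 1.
\end{aligned}
```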
3.2 Feature-level Frequency Decomposition
As indicated in prior work, feature maps are also composed of different frequencies. In low-level vision, as can be observed in Fig. 1, low- and high-frequency feature maps reflect image structures and details, respectively, which are complementary and tightly correlated. To enhance communication across frequencies, we propose a direction-aware Cross-Median Filter (dCMF) that explicitly extracts low-frequency components from an entanglement of background features and rain streak patterns with varying falling directions, and an interactive adapter that implicitly enhances feature decomposition through interactive connections.
Direction-aware Cross-Median Filter: The direction-aware Cross-Median Filter (dCMF) aims to separate different frequencies of rain-affected features along the communication paths across the branches. As shown in Fig. 2, in the HILO module, dCMF extracts low-frequency components, which are reweighted by channel-wise Squeeze-and-Excitation attention and then sent to the structure learning branch, while in the LIHO module, the residual of the dCMF filtering result is delivered to the high-frequency branch.
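The routing in the two communication modules can be sketched as follows. This is a hypothetical illustration: a plain 3x3 median filter stands in for dCMF, features are lists of single-channel 2D maps, the SE weights `w1` and `w2` are placeholder parameters rather than trained ones, and reading HILO as "high in, low out" and LIHO as "low in, high out" is our interpretation of the module names.

```python
import math
from statistics import median

# Hypothetical sketch of the HILO / LIHO routing. A plain 3x3 median filter
# stands in for dCMF; the SE weights are illustrative placeholders.


def median3x3(ch):
    """Stand-in low-pass filter: 3x3 median with replicated borders."""
    h, w = len(ch), len(ch[0])
    return [[median(ch[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)]
            for y in range(h)]


def squeeze_excite(feats, w1, w2):
    """SE channel attention: GAP -> FC -> ReLU -> FC -> sigmoid -> rescale.

    feats: C channels; w1: (C // r) x C matrix; w2: C x (C // r) matrix.
    """
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feats]  # squeeze
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden))))
             for row in w2]
    return [[[g * v for v in r] for r in ch] for g, ch in zip(gates, feats)]


def hilo(detail_feats, w1, w2):
    """High in, low out: low-pass detail-branch features, reweight, pass to the structure branch."""
    return squeeze_excite([median3x3(ch) for ch in detail_feats], w1, w2)


def liho(structure_feats):
    """Low in, high out: the residual of the low-pass result goes to the detail branch."""
    return [[[v - lo for v, lo in zip(rv, rl)] for rv, rl in zip(ch, median3x3(ch))]
            for ch in structure_feats]
```

For a feature map containing a one-pixel-wide vertical streak, the median output is empty, so `hilo` forwards nothing of the streak to the structure branch while `liho` routes the whole streak to the detail branch.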
Our key observation in designing dCMF is that rain patterns, not only in rainy images but also in their feature maps, have relatively high intensity and a globally consistent orientation, which makes them distinguishable from background patterns of similar scales (see Fig. 1). To take full advantage of this prior, we make three modifications to a naive low-pass image filter. First, we adopt median pooling rather than averaging, to suppress the extreme values brought by rain streaks in feature maps. Second, we replace 2D filtering kernels with 1D directional lines. Intuitively, rain mainly falls in the vertical direction and leaves traces of globally vertical streaks. Consider a 1D filtering line centered on a single rain streak, and denote the angle between them as θ. The filtering kernel is least affected by this rain streak, and by other rain streaks in the image that follow the same direction, when θ = 90°, since a perpendicular line crosses each streak at the fewest pixels. However, real scenes are much more complicated: the direction of rain streaks can be affected by many factors, such as wind and obstacles. To take this complexity into account, we enumerate different orientations of the 1D kernels, as shown in the first column of Fig. 3 (a), extending them to a set of orientations, and leverage a self-attention mechanism to learn the importance of each direction. To enforce attention on both feature channels and different groups of filtering results, we adopt a strategy similar to that of prior work: three groups of filtering results are aggregated through addition and global average pooling to compute an individual attention for each group, and the weighted sum of the groups constitutes the final output of dCMF. Third, each 1D kernel in the set is followed by a crisscross counterpart to constitute a complete Cross-Median Filter (CMF).
Each CMF has exactly the same receptive field as the corresponding 2D kernel, but is more robust against rain streaks, since the second kernel operates on feature maps in which rain patterns have already been greatly reduced.
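The directional filtering and group fusion described above can be sketched for a single feature channel as follows. This is a hypothetical illustration: the three orientation pairs, the 3-pixel line length, and the `logits_fn` callable that stands in for the learned attention layers are our assumptions, and the softmax across groups follows a selective-kernel-style fusion as we read the description.

```python
import math
from statistics import median

# Hypothetical sketch of a direction-aware Cross-Median Filter on one channel.
# Orientation pairs, radius and the attention stand-in are assumptions.

# Each pair: a first 1D median line, then its crisscross (perpendicular) line.
ORIENTATION_PAIRS = [
    ((0, 1), (1, 0)),    # horizontal line, then vertical
    ((1, 1), (1, -1)),   # 45-degree diagonal, then 135-degree
    ((1, -1), (1, 1)),   # 135-degree diagonal, then 45-degree
]


def line_median(ch, step, radius=1):
    """Median over a 1D line of 2*radius+1 pixels along step=(dy, dx), borders replicated."""
    h, w = len(ch), len(ch[0])
    dy, dx = step

    def at(y, x):
        return ch[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[median(at(y + k * dy, x + k * dx) for k in range(-radius, radius + 1))
             for x in range(w)]
            for y in range(h)]


def cross_median(ch, pair, radius=1):
    """CMF: the perpendicular line operates on the already-filtered map."""
    first, second = pair
    return line_median(line_median(ch, first, radius), second, radius)


def dcmf(ch, logits_fn, radius=1):
    """Fuse orientation groups with softmax attention computed from pooled features."""
    groups = [cross_median(ch, pair, radius) for pair in ORIENTATION_PAIRS]
    h, w = len(ch), len(ch[0])
    # Aggregate the groups by addition, then apply global average pooling.
    pooled = sum(v for g in groups for row in g for v in row) / (h * w)
    logits = logits_fn(pooled)            # one logit per orientation group
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    attn = [e / total for e in exps]      # softmax across groups
    return [[sum(a * g[y][x] for a, g in zip(attn, groups)) for x in range(w)]
            for y in range(h)]
```

With uniform logits, a one-pixel vertical streak is removed entirely, while a step edge passes through unchanged, because the median of a 3-pixel line crossing a monotone edge is its centre value.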
Interactive Adapters: For frequency exchange across branches, dCMF leverages prior knowledge of rain streaks to explicitly compute low-frequency components, while the interactive adapter uses learnable convolutional kernels as the frequency filter. The behavior of the interactive adapter is guided by the decomposed labels, as a complement to dCMF-based frequency decomposition. As shown in Fig. 2, we adopt asymmetric convolution blocks (ACB) to integrate features from different branches and automatically adjust the information exchange between them. The basic function of the interactive adapter takes the input features of the detail branch and the structure branch, processes them with ACB units, and produces an output feature for the detail branch. For each interactive adapter, this function is computed twice, the second time with the corresponding inputs substituted to obtain the second output feature. Owing to the symmetry of the interactive adapters, the corresponding function for the structure branch follows directly. Through these dual interaction functions, redundant information can be efficiently transferred to the other path, which encourages the exploration of new features. In the adapter, computation proceeds in parallel in both branches and along the interactive paths, which allows accurate decomposition in an information-intensive yet computationally efficient way.
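The sketch below assumes a simple additive form for the adapter, in which each branch output is the sum of an ACB applied to its own features and an ACB applied to the other branch's features. Single-channel maps, zero padding, the omission of batch normalization, and the parameter names (`d_to_d`, `s_to_d`, `s_to_s`, `d_to_s`) are hypothetical. The block also shows the ACB property that, by linearity, the parallel 3x3, 1x3 and 3x1 kernels collapse into a single 3x3 kernel.

```python
# Hypothetical sketch of an interactive adapter built from asymmetric
# convolution blocks (ACB). The additive cross-branch form is an assumption.


def conv2d(img, kernel):
    """'Same' 2D cross-correlation with zero padding; kernel dims must be odd."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def add_maps(*maps):
    """Element-wise sum of equally sized 2D maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def acb(img, k_sq, k_hor, k_ver):
    """ACB: parallel 3x3, 1x3 and 3x1 convolutions whose outputs are summed."""
    return add_maps(conv2d(img, k_sq),
                    conv2d(img, [k_hor]),                # 1x3 kernel
                    conv2d(img, [[v] for v in k_ver]))   # 3x1 kernel


def fuse_acb_kernel(k_sq, k_hor, k_ver):
    """Collapse the three ACB branches into one equivalent 3x3 kernel."""
    fused = [row[:] for row in k_sq]
    for j in range(3):
        fused[1][j] += k_hor[j]   # 1x3 kernel sits on the centre row
    for i in range(3):
        fused[i][1] += k_ver[i]   # 3x1 kernel sits on the centre column
    return fused


def interactive_adapter(x_d, x_s, params):
    """Each branch output mixes its own features with the other branch's via separate ACBs."""
    out_d = add_maps(acb(x_d, *params["d_to_d"]), acb(x_s, *params["s_to_d"]))
    out_s = add_maps(acb(x_s, *params["s_to_s"]), acb(x_d, *params["d_to_s"]))
    return out_d, out_s
```

The kernel fusion means the asymmetric branches add training-time flexibility without extra inference cost, which is consistent with the efficiency argument above.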
4 Experiments and Results
In this section, we evaluate our method on three synthetic datasets: Rain200L, Rain200H and Rain800. Since rain-free ground truth is not available for real-world images, we additionally performed a user study on several real-world datasets. Please refer to the supplementary materials for more details.
4.1 Comparison with the State of the Art
The quantitative PSNR and SSIM results are shown in Tab. 1. As can be observed, in most cases our method obtains higher PSNR and SSIM than the other methods on the synthetic datasets. Visual comparisons are shown in Fig. 4, from which one can observe that our method better retains image structures and preserves image details.
Furthermore, a visual evaluation on a series of real-world rainy images is provided in Fig. 5, which shows that our method not only removes real rain streaks but also better preserves image structures and details. Challenging areas, such as the textures of the pillars and the border of the wall, are well preserved by our method.
4.2 Ablation Study
Ablation Study on Different Components: In Tab. 2, we report quantitative results to validate the effectiveness of the dual-branch architecture, the interactive adapter, and the direction-aware Cross-Median Filter in the HILO and LIHO modules.
BL: The Baseline (BL) uses a single residual-network branch to learn a rainy-to-derained mapping.
DBL: The Dual Baseline (DBL) uses two identical branches without interaction for single image rain removal, which learn the detail image and the structure image, respectively.
DBL+I: DBL with its residual blocks replaced by interactive adapters (i.e., our network with HILO and LIHO removed).
DBL+I+O: HILO, LIHO and OC added to DBL (i.e., our network with the interactive adapter replaced by Octave Convolution (OC)).
Analysis on dCMF: To validate the effectiveness of the proposed dCMF in HILO and LIHO, we remove it from our method; the result can be found in the 'DBL+I' column of Tab. 2. In addition, we replace dCMF with an ordinary 2D-kernel median filter and a Gaussian filter, as shown in Tab. 3. Both results show a clear advantage of dCMF in the deraining task, and visual inspection also confirms that deraining results without dCMF suffer from heavier degradation.
Analysis on Interactive Adapter: To further analyze the necessity of the interactive adapter, we replace it in our network with another frequency decomposition method, octave convolution (OC). The result in Tab. 2 demonstrates not only that the interactive blocks between the two branches improve the performance of the network, but also that our interactive adapter outperforms octave convolution in the deraining task.
4.3 Running Time
We compare the running time of our method with that of different approaches on Rain200H. As shown in Tab. 4, our method is not the fastest, but it strikes a reasonable balance between performance and efficiency.
We propose an interactive dual-branch network in which features of different frequencies are learned and exchanged to enhance the performance of single image deraining. The communication between the high- and low-frequency branches relies on two key designs: (1) instead of using convolutional filters that treat all directions equally, we propose the direction-aware Cross-Median Filter to thoroughly purge rain patterns during frequency decomposition; (2) we present the interactive adapter to enhance feature learning and interaction towards the decomposed labels.
This work was supported by the National Natural Science Foundation of China (Nos. 62032011, 61502137).
- (2013) A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In ICCV, pp. 1968–1975.
- Drop an octave: reducing spatial redundancy in convolutional neural networks with octave convolution. In ICCV, pp. 3435–3444.
- (2020) Detail-recovery image deraining via context aggregation networks. In CVPR, pp. 14548–14557.
- (2017) Removing rain from single images via a deep detail network. In CVPR, pp. 3855–3863.
- (2011) Single-frame-based rain removal via image decomposition. In ICASSP, pp. 1453–1456.
- (2017) Joint convolutional analysis and synthesis sparse representation for single image layer separation. In ICCV, pp. 1708–1716.
- (2019) Robust low-rank subspace segmentation with finite mixture noise. PR 93, pp. 55–67.
- (2019) Depth-attentional features for single-image rain removal. In CVPR, pp. 8022–8031.
- (2019) ACNet: attention based network to exploit complementary features for RGBD semantic segmentation. In ICIP, pp. 1440–1444.
- (2019) Single image deraining: a comprehensive benchmark analysis. In CVPR, pp. 3838–3847.
- (2018) Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV, pp. 254–269.
- (2019) Selective kernel networks. In CVPR, pp. 510–519.
- (2016) Rain streak removal using layer priors. In CVPR, pp. 2736–2744.
- (2015) Removing rain from a single image via discriminative sparse coding. In ICCV, pp. 3397–3405.
- (2018) Learning dual convolutional neural networks for low-level vision. In CVPR, pp. 3070–3079.
- (1990) Scale-space and edge detection using anisotropic diffusion. IEEE TPAMI 12 (7), pp. 629–639.
- Removing rain based on a cycle generative adversarial network. In ICIEA, pp. 621–626.
- (2019) Progressive image deraining networks: a better and simpler baseline. In CVPR, pp. 3937–3946.
- (2019) Spatial attentive single-image deraining with a high quality real rain dataset. In CVPR, pp. 12270–12279.
- (2017) A hierarchical approach for rain or snow removing in a single color image. IEEE TIP 26 (8), pp. 3936–3950.
- (2020) Label decoupling framework for salient object detection. In CVPR, pp. 13025–13034.
- (2017) Deep joint rain detection and removal from a single image. In CVPR, pp. 1357–1366.
- (2020) Towards scale-free rain streak removal via self-supervised fractal band learning. In AAAI, pp. 12629–12636.
- (2019) Gradual network for single image de-raining. In ACM MM, pp. 1795–1804.
- (2017) Convolutional sparse and low-rank coding-based rain streak removal. In WACV, pp. 1259–1267.
- (2018) Density-aware single image de-raining using a multi-stream dense network. In CVPR, pp. 695–704.
- (2019) Image de-raining using a conditional generative adversarial network. IEEE TCSVT.
- (2019) Singe image rain removal with unpaired information: a differentiable programming perspective. In AAAI, Vol. 33, pp. 9332–9339.