Semantic Guided Single Image Reflection Removal

07/27/2019 · Yunfei Liu et al. · Beihang University · University of Illinois at Urbana-Champaign

Reflection is common in images captured through glass windows; it is not only visually disturbing but also degrades the performance of other computer vision algorithms. Single image reflection removal is an ill-posed problem because the color at each pixel needs to be separated into two values, i.e., the desired clear background and the reflection. To solve it, existing methods propose priors such as smoothness and color consistency. However, these low-level priors are not reliable in complex scenes: for instance, when capturing a real outdoor scene through a window, both the foreground and the background contain smooth and sharp areas and a variety of colors. In this paper, inspired by the fact that humans can separate the two layers easily by recognizing the objects, we use object semantics as guidance to encourage pixels of the same semantic object to belong to the same layer. Extensive experiments on different datasets show that adding semantic information brings a significant improvement to reflection separation. We also demonstrate applications of the proposed method to other computer vision tasks.


1 Introduction

When taking a photo of objects behind a glass window, unwanted reflection often appears. It is not only visually disturbing but may also affect the performance of other computer vision algorithms (e.g., object detection, scene parsing). To solve this problem, reflection removal has been explored by a number of existing works [15, 16, 17, 10, 25]. Single image reflection removal is a challenging problem: it takes only a single image as input and aims to separate it into two outputs, the clear background and the reflection. Specifically, given an input image with reflection, denoted as I, we need to separate it into the background B and the reflection R [16, 10]:

I = B + R.     (1)
Figure 1: Comparison of single image reflection removal with and without semantic guidance. In this case, the state-of-the-art method [31] fails to separate the layers correctly (upper row), while our semantic guided method obtains a clearer separation (bottom row).

Apparently, Eqn. 1 is ill-posed because there are two unknowns (B and R) for a single known (I). To obtain meaningful solutions, existing methods either introduce various low-level priors or use a deep neural network. For example, Li and Brown [16] adopt a smoothness prior which assumes the reflection is always smoother than the background. Shih et al. [20] propose the ghosting prior, which models the double reflection caused by the two sides of a glass pane. However, such priors are all based on low-level cues which are not robust in most real scenes. Later, Fan et al. [10] trained a two-stage deep learning approach with low-level losses on color and edges to learn the mapping between mixture images and clean images. Recently, Zhang et al. [31] extract features from the first few layers of a pre-trained VGG-19 network and treat them as perceptual features. Low-level information is insufficient for reflection separation when there is low-level appearance ambiguity.

As shown in Fig. 1, both the reflection (bus) and the background (child) contain a variety of complex textures and colors which share similar statistics. Therefore, the reflection cannot be easily removed by existing methods.

In this paper, we are inspired by human cognition: we humans can easily separate visual appearance into reflection and background. We notice that human vision achieves such capability by understanding objectness. In Fig. 1, we understand that the head, torso, hands and legs all belong to the person and therefore belong to the same layer. This enables us to know that the red and blue coat belongs to the background while the light black and white components belong to the reflection.

Implementing this idea is not trivial, because understanding the semantics in an image with reflection and later using them to separate the appearance is a “chicken and egg” problem. In other words, a naive semantic estimation network is not guaranteed to work robustly in the presence of reflection, while a cleaner image would benefit the estimation of semantics. To solve this, as in all existing works, we assume the intensity of the background image is stronger than that of the reflection. We then propose the multi-task Semantic guided Reflection Removal Network (SRRN): we simultaneously learn semantic estimation and reflection removal, and thereby resolve the “chicken and egg” problem. In particular, for our implementation of multi-task learning, we let the semantic task and the reflection removal task share the same encoder and hidden parameters. Furthermore, we explicitly let the semantics guide the reflection removal, which directly reflects our idea.

To evaluate the effectiveness of SRRN, we conducted systematic experiments on three datasets: first, the real image dataset proposed by Zhang et al. [31]; second, the real benchmark proposed by Wan et al. [23]; third, our synthetic dataset. The experiments show consistent and significant performance improvements on all three datasets. Rigorous experiments also show that our implementation of multi-task learning outperforms the baselines.

Contributions. We summarize our contributions as follows:

  • To the best of our knowledge, we are the first to use an object semantic prior for reflection removal, and to jointly solve semantic segmentation and reflection removal from a single image.

  • We propose a novel multi-task, end-to-end network structure for single image reflection removal (main task) with semantic guidance (sub-task).

  • We demonstrate the consistent effectiveness of the method through systematic experiments on two existing datasets and our new dataset.

2 Related Work

Multiple-view methods. Many works solve Eqn. 1 with multiple inputs. Methods [22, 15, 11] assume that the reflection and background layers lie at different depth planes and can be separated by multi-view depth estimation. To align multiple inputs, optical flow has been adopted for reflection removal [27, 29]. Simon and Park [21] remove reflections from in-vehicle black-box videos to obtain a cleaner view of the scene outside the car. Recently, low-rank matrix completion [17] has also been used for reflection removal in videos.

Multiple-modality methods. Another group of works uses a flash/no-flash image pair to remove reflections [1]. Schechner et al. [19] use a set of images with different focal lengths and remove reflections by solving for the depth of the different layers. Other works [14, 26] explore polarization and take multiple images to solve for the optimal separation through an angle filter.

Non-CNN single-image methods. Eqn. 1 is not directly solvable for a single image. To tackle this, Li and Brown [16] assume that the reflection layer is blurrier than the background layer and model the two layers with different gradient distributions for separation. Shih et al. [20] exploit the ghosting effect in the reflection layer and design a GMM model for reflection removal. Arvanitopoulos et al. [2] perform reflection suppression through a relative gradient prior between the layers. Sandhan and Jin [18] use the symmetry of the human face to remove reflections on eyeglasses. Yun and Sim [30] propose an algorithm to remove virtual points in large-scale 3D point clouds using reflection symmetry and geometric similarity.

CNN-based single-image methods. Fan et al. [10] propose the Cascade Edge and Image Learning Network (CEIL Net) for reflection removal, in which the background's edges are predicted first and then used to guide the reflection separation. Wan et al. utilize existing prior information to design a benchmark [23] for reflection removal, and then train an end-to-end model called CRRN [24] to separate the layers. Yang et al. [28] present the Bidirectional Network (BDN) to predict the background and reflection layers sequentially; the two layers constrain each other and are refined from coarse to fine. Baslamisli et al. [3] design a reflection and Retinex model based on CNNs to decompose intrinsic images in a two-stage method. Zhang et al. [31] propose a perceptual loss extracted from the first layers of VGG, combined with a feature loss, an adversarial loss and an exclusion loss. The main difference between the perceptual loss and ours is that we explicitly utilize high-level semantic information to guide reflection removal during training.

3 Semantic Guided Reflection Removal

Figure 2: Methods based on low-level priors (e.g., relative smoothness (LB14 [16], AN17 [2]) and ghosting cues (SY15 [20])) fail to remove reflections on the person, because the reflection layer is not smoother than the background and there are no ghosting cues in the reflection layer. Deep learning based methods (FY17 [10], ZN18 [31] and YG18 [28]) also cannot remove such reflections. With semantic information enforcing that different parts of the person belong to the same layer, the reflection is removed properly.

3.1 A Case Study on Prior Based Methods

Before introducing our proposed method, we perform a study on existing methods to see their limitations. An example is illustrated in Fig. 2, where a human face is occluded by reflection interference. In this real case, all existing methods based on low-level features, such as the smoothness prior [16] and the ghosting effect [20], fail to remove the reflection. The worst result is from [2], which severely smooths the image content. Even recent CNN based methods [10, 31, 28] cannot handle this case well. This reveals that neither the prior-based nor the direct image-to-image training based methods are general enough for the reflection separation problem. However, even with this reflection, we find that the semantic information can still be reliably estimated, as shown in the last row. This is likely because semantic estimation gathers more global information and is able to recognize the human upper body as a whole. With the help of this semantic information, our method (details are presented later) generates the cleanest reflection separation (more comparisons can be found in Sec. 5).

3.2 Study on Semantic Information with Reflection Interference

Figure 3: Relationship between mIoU (semantic maps are generated by DeeplabV3+ [6]) and the reflectance intensity α, based on 5000 test cases, with one visual example. CI denotes the confidence interval. Note that semantic estimation becomes sensitive when the reflectance intensity is large, but remains robust for observations with low reflectance intensity.

Semantic estimation is not guaranteed to remain robust in the presence of reflection. Following the above study, we further validate the robustness of semantic segmentation against different reflectance intensities. We randomly sample images from the Pascal VOC dataset [9], where ground-truth semantic labels are provided for 21 categories. Based on these, we synthesize images with reflection by linearly blending two images with a blending weight α, where a larger α simulates a stronger reflectance intensity. In total, we sample and generate sets of images with different values of α. Fig. 3 illustrates the relationship between the semantic segmentation quality and the reflectance intensity. The semantic estimation is robust for small α, but the mIoU drops rapidly as α grows large. We observe that current semantic estimation does not work well when the features are completely occluded, because the reflectance intensity is too strong and non-transmitted reflections with low transmittance occur.
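To make the protocol concrete, a minimal sketch of this robustness study is given below. It assumes a pre-trained segmentation function segment(image) returning per-pixel class labels (the paper uses DeeplabV3+ [6]); the blending form I = (1 − α)B + αR and all helper names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

NUM_CLASSES = 21  # Pascal VOC categories

def blend(background, reflection, alpha):
    # Linear blending: a larger alpha simulates a stronger reflectance intensity.
    # The exact blending formula used in the paper is not given; this is an assumption.
    return (1.0 - alpha) * background + alpha * reflection

def mean_iou(pred, gt, num_classes=NUM_CLASSES):
    # mIoU over the classes that appear in either the prediction or the ground truth.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def robustness_curve(samples, alphas, segment):
    # samples: list of (background, reflection, gt_label) triples;
    # segment: a pre-trained semantic segmentation model, e.g. DeeplabV3+.
    return {a: float(np.mean([mean_iou(segment(blend(b, r, a)), gt)
                              for b, r, gt in samples]))
            for a in alphas}
```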

Figure 4: An overview of our proposed SRRN. For the input image I, we first extract features via the Feature Extractor, then estimate the semantic map of the background through the Semantic Module (the orange branch). Next, the semantic information is used to guide the Reconstruction Module (the green branch). Finally, B and R are predicted as the background and the reflection.

3.3 Multi-task Learning for Simultaneous Reflection Removal and Semantic Estimation

From the two studies presented in Sec. 3.1 and Sec. 3.2, we can see the benefit of using semantic information in the reflection removal task, and we confirm that semantic segmentation is relatively robust to reflection interference of moderate intensity. Our method targets these common cases, leaving the rare extreme cases as future work.

Given an input image I with reflection interference, we perform two tasks: (1) semantic estimation: extracting the background semantic map S from the input I; and (2) layer reconstruction: recovering the background layer B (and the reflection layer R) from the input I together with the semantic information S obtained in the first task, denoted as:

S = f_sem(I),    (B, R) = f_rec(I, S).     (2)

Using multi-task learning, we train a convolutional neural network (CNN) to perform these two tasks together.

Network architecture. The SRRN layout is illustrated in Fig. 4; it contains a Feature Extraction module, a Layer Reconstruction module, and a Semantic Estimation module. The Feature Extraction module extracts features for the subsequent tasks. We use ResNet-101 [12] as the Feature Extractor and append the Atrous Spatial Pyramid Pooling (ASPP) module from Deeplab [5] to capture information at different scales. The Semantic Estimation module estimates the semantic information, which is further used in the reflection removal task. The final Layer Reconstruction module utilizes both the extracted features and the semantic information to recover the background and reflection layers. Fully convolutional layers are used to perform these two tasks. We also use skip connections (green arrows) between the Feature Extractor and the Reconstruction Module to forward and fuse lower-level features.
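A minimal PyTorch sketch of this layout is shown below, assuming a simplified ASPP, a single 1x1 semantic head, a 6-channel reconstruction head (3 channels for B, 3 for R), and one low-level skip connection; the exact channel sizes and module designs of SRRN are not specified in the text, so everything here is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SimpleASPP(nn.Module):
    # Simplified Atrous Spatial Pyramid Pooling: parallel dilated convs, then a 1x1 projection.
    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class SRRNSketch(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        backbone = torchvision.models.resnet101(weights=None)  # the paper uses ImageNet pre-trained weights
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        self.aspp = SimpleASPP(2048)
        self.sem_head = nn.Conv2d(256, num_classes, 1)            # Semantic Module
        self.rec_head = nn.Sequential(                            # Reconstruction Module
            nn.Conv2d(256 + num_classes + 256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 6, 3, padding=1))                      # 3 channels for B, 3 for R

    def forward(self, x):
        low = self.layer1(self.stem(x))                           # low-level features (skip connection)
        feat = self.aspp(self.layer4(self.layer3(self.layer2(low))))
        sem = self.sem_head(feat)                                 # semantic logits
        # Semantic guidance: concatenate semantic logits with features for reconstruction.
        feat_up = F.interpolate(feat, size=low.shape[-2:], mode='bilinear', align_corners=False)
        sem_up = F.interpolate(sem, size=low.shape[-2:], mode='bilinear', align_corners=False)
        out = self.rec_head(torch.cat([feat_up, sem_up, low], dim=1))
        out = F.interpolate(out, size=x.shape[-2:], mode='bilinear', align_corners=False)
        sem_full = F.interpolate(sem, size=x.shape[-2:], mode='bilinear', align_corners=False)
        return out[:, :3], out[:, 3:], sem_full                   # B, R, semantic logits
```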

Loss function design. As we jointly perform two tasks, the final loss function is built on both tasks together:

L = L_B + L_R + (1/σ²) L_S,     (3)

where L_B, L_R and L_S are the losses enforced on B, R and S, respectively, and σ is the sub-task's observation-noise parameter: a large σ decreases the contribution of L_S, and vice versa. Detailed definitions of L_B, L_R and L_S are provided in Sec. 4.2.
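One common way to realize such a σ-weighted multi-task loss is with a learnable log-variance, sketched below; the exact form of Eqn. 3 may differ, so this is an assumption for illustration.

```python
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    # One possible form of Eqn. 3: the main-task losses plus the semantic loss
    # down-weighted by a learnable observation-noise parameter sigma.
    def __init__(self):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(1))  # log(sigma), for numerical stability

    def forward(self, loss_b, loss_r, loss_s):
        sigma_sq = torch.exp(2.0 * self.log_sigma)
        # A larger sigma decreases the contribution of the semantic sub-task.
        return loss_b + loss_r + loss_s / sigma_sq + self.log_sigma
```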

4 Implementation Detail

4.1 Multi-task Information Fusion Study

Figure 5: Different settings of the semantic guided reflection removal model. #1: the semantic segmentation task is trained on its own and guides reflection removal. #2: the two tasks share the encoder and some other hidden parameters, followed by task-specific branches. #3: our final model, in which the semantic estimation branch provides guidance to the reflection removal branch.

We design three different models to implement multi-task information fusion between semantic estimation and reflection removal.

Basic guidance. As shown on the left of Fig. 5, in the basic semantic guided reflection removal model, the semantic map is estimated first, and then its features are merged into the reflection removal branch.

Representation sharing without fusion. To perform reflection removal and semantic estimation simultaneously, we let the two tasks share a representation, followed by task-specific branches. In this way, semantic segmentation and reflection removal are trained simultaneously. Experiments show that the results of this version are comparable to the state of the art (see Sec. 5 for details).

Final pipeline. To further boost performance, we combine the two structures above into the final pipeline of our SRRN, as illustrated on the right of Fig. 5. The effectiveness of SRRN is demonstrated in Sec. 5.2.

4.2 Loss Detail

Our background loss L_B penalizes the difference between the currently estimated background B̂ and the corresponding ground truth B:

L_B = ||B̂ − B||_1 + (1 − SSIM(B̂, B)) + ||C(B̂) − C(B)||_F.     (4)

Following [24], we use SSIM (the structural similarity index [32]) together with the L1 norm. C(·) denotes the Canny operator [8], which is used to constrain the difference between B̂ and B at the gradient level, and ||·||_F is the matrix Frobenius norm.
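A sketch of one way to assemble L_B from these ingredients is given below. Since the Canny operator is not differentiable, the gradient term here uses a Sobel approximation, the SSIM term is a simplified global (un-windowed) variant, and the weights w1–w3 are placeholders; none of this is the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified (un-windowed) SSIM between two image batches.
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def sobel_edges(img):
    # Differentiable stand-in for the Canny operator: per-channel Sobel gradient magnitude.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device, dtype=img.dtype)
    ky = kx.t()
    c = img.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(img, kx, padding=1, groups=c)
    gy = F.conv2d(img, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def background_loss(b_pred, b_gt, w1=1.0, w2=1.0, w3=1.0):
    l1 = torch.abs(b_pred - b_gt).mean()                                      # pixel-level L1
    ssim_term = 1.0 - global_ssim(b_pred, b_gt)                               # structural term
    edge_term = torch.norm(sobel_edges(b_pred) - sobel_edges(b_gt), p='fro')  # gradient-level term
    return w1 * l1 + w2 * ssim_term + w3 * edge_term
```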

As the reflection in the input contains less information than the background, the reflection loss L_R is simply the L1 distance between the estimate R̂ and the ground truth R:

L_R = ||R̂ − R||_1.     (5)

For the semantic task, we use the cross entropy loss:

L_S = −(1/N) Σ_i Σ_c y_{i,c} log ŷ_{i,c},     (6)

where N is the training batch size, the inner summation runs over the classes c, ŷ is the prediction and y is the ground-truth label.
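The remaining two terms are standard and can be written directly (a minimal sketch, assuming the semantic branch outputs raw logits and the labels are class indices):

```python
import torch
import torch.nn.functional as F

def reflection_loss(r_pred, r_gt):
    # Eqn. 5: plain L1 distance on the reflection layer.
    return torch.abs(r_pred - r_gt).mean()

def semantic_loss(sem_logits, sem_labels):
    # Eqn. 6: per-pixel cross entropy, averaged over the batch.
    # sem_logits: (N, C, H, W); sem_labels: (N, H, W) with class indices.
    return F.cross_entropy(sem_logits, sem_labels)
```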

To prevent over-fitting, we follow the settings in [4] and add regularization on the parameters; the final loss is organized as follows:

L_final = (1/σ_B²) L_B + (1/σ_R²) L_R + (1/σ_S²) L_S + λ Σ_{j=1..P} ||θ_j||²_2,     (7)

where λ weights the L2 regularization, θ_j are the trainable parameters, P is the total number of trainable parameters in SRRN, and each σ is the corresponding loss's variance. In our experiments these values are set to balance the individual loss terms. We employ ResNet-101 pre-trained on ILSVRC-2012-CLS [7], whose parameters are frozen, and train the ASPP module, the Reconstruction Module and the Semantic Module of SRRN. Convolution weights are initialized as in CZ18 [6]. A momentum optimizer [13] is employed, with a cyclic learning rate initially set to 0.007 and decayed every 30,000 iterations until it reaches 0.0001.

4.3 Training Data Generation

Figure 6: The variety of our generated images. We generate reflection-contaminated observations with (1) different reflection intensities and (2) different contents to make the SRRN robust to real images.

For each sample in our training set, we require four items: (1) an image with reflection I, (2) the clear background B, (3) the reflection R, and, most importantly, (4) the semantic labels S.

To build a dataset for training the proposed model, we make use of two existing datasets and also synthesize our own data. First, we use the dataset proposed by Zhang et al. [31], which contains 110 real image sets with I and B provided; we then generate S. We use a state-of-the-art semantic segmentation method, DeeplabV3+ [6], to generate the semantic labels from the clear background, and we further manually fix errors in the generated labels to obtain high-quality ground truth. It covers 21 categories and is treated as ground truth in our study.

Second, we use the dataset proposed by Wan et al. [24]; it contains 454 image sets, each providing I, B and R. We then generate the semantic labels in the same way as described above.

Since these existing datasets contain only 564 images in total, we additionally generate a synthetic dataset with semantic labels for reflection removal. We use clear images from Pascal VOC [9] as the background and the reflection, with the semantic ground truth provided by Pascal VOC, and blend the background image and the reflection image together to form the input image. In total, we generate 5000 image sets. Fig. 6 illustrates the generated images, and Table 1 gives a brief summary of all the datasets.

Dataset source       Volume   R        S
Zhang et al. [31]    110      w/o GT   w/o GT
Wan et al. [24]      454      GT       w/o GT
Ours                 5000     GT       GT
Table 1: Brief summary of all the datasets. GT means the dataset provides the corresponding ground truth; w/o GT means we generate R (or S) as ground truth (described in Sec. 4.3).

Our final dataset is the combination of all three datasets. For each dataset, we randomly choose 80% of the images as the training set. Images are randomly cropped before being fed into the network.
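For concreteness, the sketch below shows how one such training quadruple (I, B, R, S) could be synthesized from two Pascal VOC images; the blending weight, the Gaussian blur on the reflection layer, and the helper names are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from PIL import Image, ImageFilter

def make_quadruple(bg_path, refl_path, label_path, alpha=0.3, blur_radius=2.0):
    # Build one (I, B, R, S) sample from two Pascal VOC images.
    # alpha and the Gaussian blur on the reflection are illustrative choices.
    bg_img = Image.open(bg_path).convert('RGB')
    B = np.asarray(bg_img, dtype=np.float32) / 255.0
    refl_img = Image.open(refl_path).convert('RGB').resize(bg_img.size)
    refl_img = refl_img.filter(ImageFilter.GaussianBlur(blur_radius))
    R_full = np.asarray(refl_img, dtype=np.float32) / 255.0
    I = (1.0 - alpha) * B + alpha * R_full   # linear blending of background and reflection
    R = alpha * R_full                       # the reflection component contained in I
    S = np.asarray(Image.open(label_path))   # VOC semantic labels of the background image
    return I, B, R, S
```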

5 Experiments

In this section, we first evaluate our approach on single image reflection removal quantitatively and qualitatively against previous methods [16, 10, 31, 28, 24], demonstrating state-of-the-art performance; for the numerical analysis, we employ the peak signal-to-noise ratio (PSNR) and SSIM as evaluation metrics. Second, we analyse the effect of the different parts of SRRN. Next, we conduct additional experiments on how the reflectance intensity affects the final performance of semantic segmentation and reflection removal. Finally, we show additional applications of our model and discuss failure cases.
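For reference, the two metrics can be computed with scikit-image as sketched below (assuming images scaled to [0, 1]; the authors do not specify their evaluation code):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, gt):
    # pred, gt: HxWx3 float arrays scaled to [0, 1].
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)  # scikit-image >= 0.19
    return psnr, ssim
```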

5.1 Comparison with Previous Works

Figure 7: Visual comparison of the recovered background layers of our method and four previous methods (columns: Input, AN17 [2], FY17 [10], YG18 [28], ZN18 [31], Ours), evaluated on the synthetic dataset. Regions that disturb the semantic integrity are highlighted with rectangles in different colors.

Figure 8: Visual comparison between two previous works and our method on real images collected from the web (columns: Input; B of FY17 [10]; B and R of ZN18 [31]; and the outputs of our method). Reflection-contaminated regions are highlighted with bounding boxes for better visualization.

We make qualitative and quantitative comparisons with prior works on our dataset. We compare our method with the layer separation method of Li and Brown [16] and the reflection suppression method of Arvanitopoulos et al. [2], both with their default parameters. We re-trained the method of Zhang et al. [31] and fine-tuned CEILNet [10] from its released pre-trained model on our training set. We use the pre-trained BDN [28] directly to evaluate on our validation set because its training code has not been released. We sent the test images to the authors of CRRN [24], who kindly provided their results. The quantitative and qualitative comparisons are presented below.

                   Background          Reflection
Method         SSIM     PSNR      SSIM     PSNR      Runtime (s)
Input          0.801    19.02     N/A      N/A       N/A
LB14 [16]      0.763    17.77     0.231    16.58     0.475
AN17 [2]       0.786    19.28     0.285    15.74     99.3
FY17 [10]      0.820    21.65     N/A      N/A       0.095
WS18 [24]      0.812    19.03     N/A      N/A       0.619
YG18 [28]      0.800    20.03     0.221    9.75      0.024
ZN18 [31]      0.849    22.16     0.463    18.50     0.332
Ours           0.860    23.09     0.559    20.19     0.061
Table 2: Quantitative comparison and runtime of our method and prior works on the synthetic dataset. In the runtime column (seconds), results marked with * are measured on the GPU and the others on the CPU.

                   Background          Reflection
Method         SSIM     PSNR      SSIM     PSNR

Input          0.783    19.86     N/A      N/A
FY17 [10]      0.832    22.04     N/A      N/A
WS18 [24]      0.725    18.98     N/A      N/A
YG18 [28]      0.766    18.97     0.065    7.25
ZN18 [31]      0.852    23.14     0.420    21.60
Ours           0.886    25.53     0.654    28.51

Input          0.869    22.15     N/A      N/A
FY17 [10]      0.873    21.87     N/A      N/A
WS18 [24]      0.820    18.87     N/A      N/A
YG18 [28]      0.858    21.71     0.256    8.92
ZN18 [31]      0.881    22.39     0.266    17.84
Ours           0.898    22.76     0.479    21.07

Table 3: Quantitative comparison between our results and other CNN-based methods on the two real datasets (upper and lower blocks). The numbers show that our method outperforms the state of the art. We also provide a trivial baseline that takes the input image as the resulting background.

Quantitative comparison: As shown in Table 2, we compare our SRRN with previous works. Note that for the methods [10, 24] we only report background results because they only produce the background layer. Results on the two real datasets are shown in Table 3.

Qualitative comparison: We qualitatively compare the results of our proposed method against previous state-of-the-art methods on synthetic and real-world images with reflection. We present the results on synthetic data in Fig. 7 and on real data in Fig. 8.

Next, we measure the running time of prior works and of our method, reported in the last column of Table 2. Different approaches are tested with an Intel i7-7700 CPU and a GPU card. A comprehensive comparison is illustrated in Fig. 9.
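A sketch of how such per-image runtimes can be measured is given below; GPU timing requires explicit synchronization, and the warm-up/repeat counts here are arbitrary choices.

```python
import time
import torch

def time_model(model, image, device='cuda', warmup=5, runs=20):
    # Average forward-pass time per image; synchronize for fair GPU timing.
    model = model.to(device).eval()
    x = image.to(device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs  # seconds per image
```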

Figure 9: Similarity and speed comparison with previous methods: SSIM/PSNR versus single-image runtime; numbers are taken from Table 2. YG18 [28] has the fastest runtime, while our method offers the best overall performance among the compared methods.

5.2 Ablation Study

In this section, to verify the effectiveness of semantic guidance, we re-train the network under three ablations: without semantic information (w/o S), without semantic guidance fusion as shown in the middle of Fig. 5 (w/o fusion), and with the ground-truth semantic map, to explore the relationship between the quality of the semantic map and reflection removal. Furthermore, we conduct an ablation on one term of Eqn. 4.

As shown in Fig. 10, we observe that with semantic guidance, the layers are separated cleanly even where the color or structure is ambiguous. We list the numerical results in Table 4 to show the effectiveness of our SRRN; the results show that SRRN performs well without requiring extremely high-quality semantic information.

Figure 10: Visual comparison of training with and without semantic information. In the first row, we remove the semantic task entirely from SRRN and noticeable residuals remain on the dog. The second row shows color-degradation artifacts without semantic guidance. Our complete model in the third row produces a better and cleaner prediction.
                         Background          Reflection
Method               SSIM     PSNR      SSIM     PSNR
Input                0.801    19.02     N/A      N/A
w/o                  0.820    20.98     0.317    15.46
w/o                  0.833    21.91     0.451    18.53
w/o fusion           0.854    22.97     0.513    19.33
SRRN                 0.860    23.09     0.559    20.19
with ground-truth S  0.867    23.85     0.571    19.71
Table 4: Controlled experiment of our method on our synthetic dataset. The numbers show that the final SRRN's performance is very close to the case in which the ground-truth semantic map is provided. The different cases are illustrated in Fig. 5.

5.3 Exploration of Performance vs. Reflectance Intensity

In this section, we study the relationship between SRRN's performance and the reflectance intensity. We generate a series of image quadruples (I, B, R, S) with different values of α, and compare the resulting mIoU of DeeplabV3+ [6] and the SSIM/PSNR of our baseline [31] on these images. As presented in Fig. 11, the proposed SRRN scores higher than the baseline in most cases across different values of α. Furthermore, SRRN is more robust to different reflectance intensities, as illustrated in Fig. 12.

Figure 11: Layer separation results on images with different reflection intensities.
Figure 12: Robustness of the semantic branch of SRRN. Compared to CZ18 [6], our proposed model achieves robust performance on images with different intensities of reflection obstruction.

5.4 Extended Applications

We extend our method to two other image enhancement tasks, image dehazing and color enhancement, using the trained SRRN without any fine-tuning on dehazing or color enhancement datasets. Both tasks can be treated as image layer separation, in which the semantic segmentation module provides guidance for reconstructing color and structure priors. For image dehazing, we aim to remove the haze layer, which causes visibility degradation due to particle-scattered light. For color enhancement, we aim to recover the scene colors from color shifting, contrast loss and saturation attenuation. The results are presented in Fig. 13.

Figure 13: Extension applications on image dehazing and color enhancement. For each column, from top to bottom: input, our predicted enhanced layer. Best viewed on screen with zoom.

5.5 Failure Cases and Discussion

Although SRRN achieves the state of the art on these three datasets, there are still challenging cases, illustrated in Fig. 14. One challenging scenario is when the reflection in the input is too strong and the background is heavily contaminated, so that our model may not separate the layers successfully. Note that such reflections cannot be totally removed by any of the compared methods, but our result is still superior to [31] (e.g., the person in the background is more distinguishable and the reflection layer is cleaner).



Figure 14: Partial failure cases of our method due to complex reflection or strong reflection with weak transmitted light. Columns show the input, the outputs of [31], and the outputs of our method.

6 Conclusion

In this paper, we have presented an approach that uses semantic cues for single image reflection separation. Unlike prior works that use only low-level information, we employ semantic information as guidance to extract the background and reflection layers. We design a deep encoder-decoder network for image feature extraction and use a semantic segmentation branch in parallel; with the two kinds of information fused together, our separation network can correctly separate the background layer and the reflection layer. We evaluate our method against prior works extensively on three different datasets, and the comparison shows that our approach outperforms existing methods both quantitatively and visually on all three datasets.

References

  • [1] A. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flash-exposure sampling. TOG, 24(3):828–835, 2005.
  • [2] N. Arvanitopoulos, R. Achanta, and S. Susstrunk. Single image reflection suppression. In CVPR, 2017.
  • [3] A. S. Baslamisli, H.-A. Le, and T. Gevers. Cnn based learning using reflection and retinex models for intrinsic image decomposition. In CVPR, 2018.
  • [4] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. TPAMI, 40(4):834–848, 2016.
  • [5] L. C. Chen, G. Papandreou, F. Schroff, and H. Adam. Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587v3, 2017.
  • [6] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.
  • [7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.
  • [8] L. Ding and A. Goshtasby. On the canny edge detector. Pattern Recognition, 34(3):721–725, 2001.
  • [9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
  • [10] Q. Fan, J. Yang, G. Hua, B. Chen, and D. Wipf. A generic deep architecture for single image reflection removal and image smoothing. In ICCV, 2017.
  • [11] X. Guo, X. Cao, and Y. Ma. Robust separation of reflection from multiple images. In CVPR, 2014.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [13] I. Sutskever, J. Martens, G. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
  • [14] N. Kong, Y. W. Tai, and J. S. Shin. A physically-based approach to reflection separation: from physical modeling to constrained optimization. TPAMI, 36(2):209–221, 2014.
  • [15] Y. Li and M. S. Brown. Exploiting reflection change for automatic reflection removal. In ICCV, 2013.
  • [16] Y. Li and M. S. Brown. Single image layer separation using relative smoothness. In CVPR, 2014.
  • [17] A. Nandoriya, M. Elgharib, C. Kim, M. Hefeeda, and W. Matusik. Video reflection removal through spatio-temporal optimization. In ICCV, 2017.
  • [18] T. Sandhan and Y. C. Jin. Anti-glare: Tightly constrained optimization for eyeglass reflection removal. In CVPR, 2017.
  • [19] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of transparent layers using focus. In ICCV, 1998.
  • [20] Y. Shih, D. Krishnan, F. Durand, and W. T. Freeman. Reflection removal using ghosting cues. In CVPR, 2015.
  • [21] C. Simon and I. K. Park. Reflection removal for in-vehicle black box videos. In CVPR, 2015.
  • [22] S. N. Sinha, J. Kopf, M. Goesele, D. Scharstein, and R. Szeliski. Image-based rendering for scenes with reflections. TOG, 31(4):1–10, 2012.
  • [23] R. Wan, B. Shi, L. Y. Duan, A. H. Tan, and A. C. Kot. Benchmarking single-image reflection removal algorithms. In IEEE ICCV, 2017.
  • [24] R. Wan, B. Shi, L.-Y. Duan, A.-H. Tan, and A. C. Kot. Crrn: Multi-scale guided concurrent reflection removal network. In CVPR, 2018.
  • [25] P. Wieschollek, O. Gallo, J. Gu, and J. Kautz. Separating reflection and transmission images in the wild. In ECCV, September 2018.
  • [26] P. Wieschollek, O. Gallo, J. Gu, and J. Kautz. Separating reflection and transmission images in the wild. In ECCV, 2018.
  • [27] T. Xue, M. Rubinstein, C. Liu, and W. T. Freeman. A computational approach for obstruction-free photography. TOG, 34(4):1–11, 2015.
  • [28] J. Yang, D. Gong, L. Liu, and Q. Shi. Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal. In ECCV, 2018.
  • [29] J. Yang, H. Li, Y. Dai, and R. T. Tan. Robust optical flow estimation of double-layer images under transparency or reflection. In CVPR, 2016.
  • [30] J.-S. Yun and J.-Y. Sim. Reflection removal for large-scale 3d point clouds. In CVPR, 2018.
  • [31] X. Zhang, R. Ng, and Q. Chen. Single image reflection separation with perceptual losses. In CVPR, 2018.
  • [32] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.