Fast Single Image Reflection Suppression via Convex Optimization

03/10/2019 ∙ by Yang Yang, et al. ∙ The University of Iowa, Tencent, The Hong Kong University of Science and Technology

Removing undesired reflections from images taken through glass is of great importance in computer vision. It serves as a means to enhance image quality for aesthetic purposes as well as to preprocess images in machine learning and pattern recognition applications. We propose a convex model to suppress the reflection from a single input image. Our model implies a partial differential equation with gradient thresholding, which is solved efficiently using the Discrete Cosine Transform. Extensive experiments on synthetic and real-world images demonstrate that our approach achieves desirable reflection suppression results and dramatically reduces the execution time compared to the state of the art.


1 Introduction

The work of W. Xu was supported in part by Simons Foundation 318608 and in part by NSF DMS-1418737.

Images taken through glass usually contain unpleasant reflections, and it is highly desirable to remove them. In particular, with the growing popularity of portable digital devices such as smartphones and tablets, many such images are taken in everyday life. A fast and user-friendly image reflection suppression technology is therefore of great practical significance: such images could be processed on portable devices in seconds, with the best dereflected result selected in real time according to the user’s visual perception.

Given an input reflection-contaminated image $\mathbf{Y}$, traditional approaches that attempt to remove the reflection focus on separating the image into the transmission layer $\mathbf{T}$ (the true background) and the reflection layer $\mathbf{R}$ [3], i.e., the following assumption is made

$$\mathbf{Y} = \mathbf{T} + \mathbf{R} \tag{1}$$

where $\mathbf{T}$ and $\mathbf{R}$ are unknowns. This problem is highly ill-posed since the number of unknowns is twice the number of conditions, so multiple ways of separation are possible. Different priors and assumptions have been introduced to narrow down the range of valid solutions, each with its own limitations.

Figure 1: Original image: a real-world image taken through the window on a train. Notice the reflection of the seat and the lights in the train. Dereflected image: the result after the reflection suppression by our proposed method. Image size: . Execution time: 1.15s. https://github.com/yyhz76/reflectSuppress

Instead of separating the image into two layers, suppressing the reflection in a single input image, as proposed in Arvanitopoulos et al. [2], is more practical. In most cases, people are more interested in the transmission layer of an image. Also, perfect layer separation of a single image is in general difficult. The separated layers using existing approaches more or less contain misclassified information, especially when the reflection is sharp and strong, which might yield dark dereflected outputs. This is caused by the removal of a large portion of the energy which concentrates in the reflection layer (See Sec. 3).

Most image reflection removal approaches so far emphasize the quality of the dereflection. They can only handle relatively small images and are often computationally inefficient. With the rapid development of portable device technologies, megapixel smartphone images are very common nowadays, so the efficiency of such methods also needs to be improved to handle large images. We propose an image reflection suppression approach that is highly efficient: it is able to process large smartphone images in seconds, yet achieves dereflection quality competitive with state-of-the-art approaches. Fig. 1 is an example of our approach applied to a smartphone image.

1.1 Related Work

Prior research in image reflection removal can be categorized by the number of input images. One branch relies on multiple input images that are closely related to each other. The other branch only has one image as input.

1.1.1 Multiple Image Reflection Removal

The multiple images used for reflection removal are usually related to each other in certain aspects. For example, Schechner et al. [16], Farid and Adelson [5], and Kong et al. [9] separate transmission and reflection layers by taking images of objects at different angles through polarizers. Agrawal et al. [1] use images taken with and without flash to reduce reflection. Approaches based on different characteristics of fields in transmission and reflection layers have also been proposed [6, 11, 7, 23, 18, 8]. Xue et al. [23] utilize the difference of motion fields to separate layers. Li and Brown [11] use SIFT-flow to align multiple images and separate layers according to the variation of gradient fields across images. Similarly, Han and Sim [8] extend this idea by computing gradient reliability at each pixel and recovering the transmission gradients by solving a low-rank matrix completion problem. Reflection removal using multiple images generally achieves better performance than that using a single image since information across images can be exploited to improve layer separation results. However, these approaches usually require special settings such as images taken from certain angles and locations, or special devices such as polarizers and flashes, which significantly limits their practicality.

1.1.2 Single Image Reflection Removal

On the other hand, several approaches attempt to remove reflection from a single input image. Although a single input image is more likely to be encountered in everyday life, this case is in fact more challenging than the multiple-image case due to the lack of additional inter-image information. Existing approaches rely on different prior assumptions on transmission and reflection layers. Levin and Weiss [10] employ the gradient sparsity prior with user-assisted labels to distinguish between layers. Li and Brown [12] exploit the relative smoothness of different layers to separate them using a probabilistic framework. Shih et al. [17] explore the removal of reflection from double-pane glass with ghosting artifacts. Wan et al. [19] utilize multi-scale depth of field to classify edges into different layers.

Instead of separating layers, Arvanitopoulos et al. [2] propose to suppress the reflection in a single input image using a Laplacian-based data fidelity term and a gradient sparsity prior. This achieves desirable dereflection quality but is inefficient, because their model is non-convex and a large number of iterations is needed to reach a desirable result. Other recent methods include deep learning strategies (Fan et al. [4]) and nonlocal similar patch search (Wan et al. [20]). However, these require either extra network training time or external image datasets.

1.2 Our Contribution

In this paper, we propose an approach for single image reflection suppression that achieves desirable performance in terms of both efficiency and dereflection quality. Our contributions, which together account for the high efficiency of our approach, are summarized as follows:

  • Our proposed model is convex. The solution is guaranteed to be the global optimum of the model.

  • The optimal solution is in closed form and doesn’t rely on iterative algorithms. It is obtained through solving a partial differential equation, which can be done efficiently using Discrete Cosine Transform.

  • Our method doesn’t require any external dataset or training time as in the aforementioned neural network approaches.

2 Our Proposed Model

2.1 Notations

Throughout the paper, we use bold letters such as $\mathbf{T}$, $\mathbf{Y}$, $\mathbf{K}$ to denote matrices. Plain letters with subscripts such as $T_{i,j}$ denote the element of $\mathbf{T}$ at the intersection of the $i$-th row and the $j$-th column. Elementwise multiplication between matrices is denoted by $\circ$ and convolution is denoted by $*$.

Figure 2: Comparison of the proposed model with [2] on a 2D synthetic toy example. Panels: (a) Transmission layer; (b) Reflection layer; (c) Synthetic blend; (d) [2], $\lambda$ = 0.05, execution time: 382s; (e) Proposed, execution time: 0.63s. The proposed model removes the reflection layer content (i.e., the letter ‘R’) more thoroughly. It also retains more transmission layer texture content. The proposed model runs about 600 times faster than [2] (execution times averaged over 20 repeated runs). Image size . Texture images from [21].

2.2 Model Formulation

Our proposed model relies on the assumption that the camera focuses on the transmission layer (i.e., the objects behind the glass) so that sharp edges appear mostly in this layer. On the other hand, the reflection layer (i.e., the reflection off the surface of the glass) is less in focus so that edges in this layer are mostly weaker than those in the transmission layer. This is often true in real-world scenarios since the distance from the camera to the object in focus is different from that to the glass. We formally express our assumption using the following equation, as mentioned in [2]:

$$\mathbf{Y} = w\,\mathbf{T} + (1 - w)\,(\kappa * \mathbf{R}) \tag{2}$$

where $\mathbf{Y}$ is the input camera image, $\mathbf{T}$ is the transmission layer and $\mathbf{R}$ is the reflection layer. $w$ is a parameter that measures the weight between the two layers. $\kappa$ is a Gaussian blurring kernel.
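The blending assumption in Eq.(2) can be illustrated with a short sketch. This is a minimal NumPy example, not the authors' code: the function names, the 3-sigma kernel truncation, the symmetric padding and the values of `w` and `sigma` are all assumptions made for illustration, and images are taken to be grayscale arrays with values in [0, 1].

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (an assumption)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with symmetric padding, so the output keeps the input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="symmetric")
    # Filter along axis 0, then along axis 1; 'valid' removes the padding again.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, tmp)

def blend(t, r, w, sigma):
    """Synthetic blend: w * T + (1 - w) * (Gaussian kernel convolved with R)."""
    return w * t + (1.0 - w) * gaussian_blur(r, sigma)

# Usage with hypothetical layers and illustrative parameter values.
rng = np.random.default_rng(0)
transmission = rng.random((32, 32))
reflection = rng.random((32, 32))
blended = blend(transmission, reflection, w=0.7, sigma=2.0)
```

Here `sigma` is the standard deviation of the kernel. Blurring only the reflection layer mirrors the assumption that the camera focuses on the transmission layer.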

Our proposed model is inspired by [2], where the original model minimizes a data fidelity term that measures the difference on the edges between the output and input images (See Eq.(6) in [2]). The edge information of an image is obtained by applying the Laplacian operator $\mathcal{L}(\cdot)$. In addition, an $\ell_0$ prior of the image gradient is added to the objective function. It encourages smoothing of the image while maintaining the continuity of large structures. The Laplacian-based data fidelity term better enforces consistency in structures of fine details in the transmission layer compared to a more straightforward data fidelity term $\|\mathbf{T} - \mathbf{Y}\|_2^2$ (footnote 1: the data fidelity term $\|\mathbf{T} - \mathbf{Y}\|_2^2$ combined with the $\ell_0$ prior is used in image smoothing; a detailed discussion can be found in [22]). The model in [2] removes more gradients as the regularization parameter $\lambda$ increases, which is the consequence of using the $\ell_0$ prior. Essentially, it sets a threshold on the gradients of the input image and removes the gradients whose magnitudes are smaller than the given threshold. The gradient-thresholding step appears as a closed-form solution in each iteration of their algorithm (See Eq.(12) in [2]). We fuse this idea into our model formulation, but in a different way. Rather than solving the minimization problem and thresholding the gradient of the solution, we adopt the idea from [14, 13] and put the gradient-thresholding step directly into the objective function. We hence propose the following model:

$$\min_{\mathbf{T}} \; \frac{1}{2}\left\| \mathcal{L}(\mathbf{T}) - \operatorname{div}\left(\delta_h(\nabla \mathbf{Y})\right) \right\|_2^2 + \frac{\epsilon}{2}\left\| \mathbf{T} - \mathbf{Y} \right\|_2^2 \tag{3}$$

where

$$\mathcal{L}(\mathbf{Y}) = \nabla_{xx}\mathbf{Y} + \nabla_{yy}\mathbf{Y} \tag{4}$$

$$\delta_h(X_{i,j}) = \begin{cases} X_{i,j}, & \text{if } \|X_{i,j}\|_2 \ge h \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

The data fidelity term imposes the gradient-thresholding step on the input image $\mathbf{Y}$ before taking the divergence of $\nabla \mathbf{Y}$. The gradients whose magnitudes are less than $h$ will become zero. Since the data fidelity term only contains a second order term of the variable $\mathbf{T}$, the second term $\frac{\epsilon}{2}\|\mathbf{T} - \mathbf{Y}\|_2^2$ is added to guarantee the uniqueness of the solution (see Sec. 2.3 for details), where $\epsilon$ is taken to be a very small value so as not to affect the performance of the data fidelity term.
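The gradient-thresholding step described above can be illustrated on a tiny image. This is a minimal sketch, assuming forward differences with a zero gradient at the last row and column; the function names and the threshold value `h=0.1` are illustrative choices, not taken from the paper.

```python
import numpy as np

def forward_gradient(y):
    """Forward differences along axis 0 and axis 1; the last row/column gets zero."""
    g0 = np.zeros_like(y)
    g1 = np.zeros_like(y)
    g0[:-1, :] = y[1:, :] - y[:-1, :]
    g1[:, :-1] = y[:, 1:] - y[:, :-1]
    return g0, g1

def threshold_gradient(g0, g1, h):
    """Keep a pixel's gradient vector only if its Euclidean norm is at least h."""
    keep = np.hypot(g0, g1) >= h
    return g0 * keep, g1 * keep

# A weak edge (step of 0.05, standing in for a blurred reflection) and a
# strong edge (step of 1.0, standing in for a sharp transmission edge).
y = np.zeros((4, 6))
y[:, 2:] += 0.05
y[:, 4:] += 1.0

g0, g1 = forward_gradient(y)
t0, t1 = threshold_gradient(g0, g1, h=0.1)
# After thresholding, only the strong edge between columns 3 and 4 survives.
```

The weak step is discarded while the strong step is kept unchanged, which is the behavior the model relies on when reflection edges are weaker than transmission edges.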

Fig. 2 is a toy example demonstrating the effect of our proposed model on synthetic images. We created the transmission layer (Fig. 2(a)) consisting of a letter ‘T’ and background wooden grain texture. The reflection layer (Fig. 2(b)) consists of a letter ‘R’ and the background sand beach texture. These two layers are then blended (Fig. 2(c)) according to Eq.(2) with blending weight and the standard deviation of the Gaussian blurring kernel is set to . We compare the result of [2] (Fig. 2(d)) with our proposed model (Fig. 2(e)). As can be seen, our proposed model outperforms [2] both in the quality of dereflection and the execution time. Our proposed method removes the letter ‘R’ in the reflection layer while largely preserving the wooden grains in the transmission layer. In contrast, the approach in [2] doesn’t remove the letter ‘R’ as thoroughly as ours and a lot more wooden grains are lost. Further increasing the parameter in [2] will remove more of the letter ‘R’ but at the same time even more wooden grains will be lost as well. In addition, our proposed model executes much faster than the approach in [2].

2.3 Solving the Model

Unlike the model proposed in [2], which is non-convex due to the presence of the $\ell_0$ term, our proposed model (3) is convex with respect to the target variable $\mathbf{T}$. Therefore, the optimal solution can be obtained by solving a system of equations, which guarantees the optimality of the solution and contributes to the fast execution time compared to iterative methods that are common among existing approaches (See Sec. 3 for details).

The gradient of the objective function (3) is given by

$$\nabla_{\mathbf{T}} = \mathcal{L}\left( \mathcal{L}(\mathbf{T}) - \operatorname{div}\left(\delta_h(\nabla \mathbf{Y})\right) \right) + \epsilon(\mathbf{T} - \mathbf{Y}) \tag{6}$$

Setting the gradient to zero, we obtain the following equation

$$\left( \mathcal{L}^2 + \epsilon \right)\mathbf{T} = \mathcal{L}\left( \operatorname{div}\left(\delta_h(\nabla \mathbf{Y})\right) \right) + \epsilon\mathbf{Y} \tag{7}$$

This equation is a variation of the 2D Poisson’s equation. We associate it with the Neumann boundary condition since we assume a mirror extension at the boundary of the image, which implies zero gradient on the boundary. This boundary value problem can hence be solved via the Discrete Cosine Transform (DCT). Let $\mathcal{F}_c$ and $\mathcal{F}_c^{-1}$ denote the two dimensional DCT and its inverse. We introduce the following result
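The step from the objective (3) to its zero-gradient equation can be written out as follows. This is a sketch under two stated assumptions: the discrete Laplacian with mirror (Neumann) boundary extension is a symmetric linear operator, and $J$ and $\mathbf{g}$ are shorthand symbols introduced here for illustration only.

```latex
\begin{aligned}
J(\mathbf{T}) &= \tfrac{1}{2}\left\|\mathcal{L}(\mathbf{T}) - \mathbf{g}\right\|_2^2
                + \tfrac{\epsilon}{2}\left\|\mathbf{T} - \mathbf{Y}\right\|_2^2,
  \qquad \mathbf{g} = \operatorname{div}\left(\delta_h(\nabla \mathbf{Y})\right) \\
\nabla_{\mathbf{T}} J &= \mathcal{L}^{\top}\left(\mathcal{L}(\mathbf{T}) - \mathbf{g}\right)
                + \epsilon\left(\mathbf{T} - \mathbf{Y}\right)
              = \mathcal{L}\left(\mathcal{L}(\mathbf{T}) - \mathbf{g}\right)
                + \epsilon\left(\mathbf{T} - \mathbf{Y}\right) \\
\nabla_{\mathbf{T}} J = 0 \;&\Longleftrightarrow\;
  \left(\mathcal{L}^2 + \epsilon\right)\mathbf{T} = \mathcal{L}(\mathbf{g}) + \epsilon\mathbf{Y}
\end{aligned}
```

Because $\mathbf{g}$ depends only on the input, the objective is quadratic in $\mathbf{T}$. Its Hessian $\mathcal{L}^2 + \epsilon I$ is positive definite for any $\epsilon > 0$, so the stationary point is the unique global minimizer.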

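Theorem 2.1.

The discretization of 2D Poisson’s equation

$$\mathcal{L}(\mathbf{T}) = \mathbf{f} \tag{8}$$

with Neumann boundary condition on an $M \times N$ grid is solved by

$$T_{m,n} = \mathcal{F}_c^{-1}\left( \frac{[\mathcal{F}_c(\mathbf{f})]_{m,n}}{K_{m,n}} \right) \tag{9}$$

where $K_{m,n} = 2\left( \cos\left(\frac{m\pi}{M}\right) + \cos\left(\frac{n\pi}{N}\right) - 2 \right)$, $0 \le m \le M-1$, $0 \le n \le N-1$.

See [15] for a proof of this conclusion. Essentially it says that after taking the DCT, the left side of Eq.(8) becomes elementwise multiplication, i.e., $\mathcal{F}_c(\mathcal{L}(\mathbf{T})) = \mathbf{K} \circ \mathcal{F}_c(\mathbf{T})$, so the above conclusion follows. It is worth mentioning that the solution (9) has a singularity at $(m, n) = (0, 0)$. To guarantee a unique solution, an extra condition (for example, the value at $(m, n) = (0, 0)$) must be specified beforehand.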
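The statement that the DCT turns the Neumann Laplacian into an elementwise multiplication can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses the 5-point Laplacian with replicated (mirror) boundary values, an orthonormal DCT-II built as an explicit matrix to stay NumPy-only, and it fixes the free constant by choosing the zero-mean solution. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ v transforms v, and C.T inverts it."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def neumann_laplacian(u):
    """5-point Laplacian with mirror extension (zero normal gradient at the border)."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def laplacian_eigenvalues(m, n):
    """K[m, n] = 2 * (cos(m*pi/M) + cos(n*pi/N) - 2)."""
    return 2.0 * (np.cos(np.pi * np.arange(m) / m)[:, None]
                  + np.cos(np.pi * np.arange(n) / n)[None, :] - 2.0)

def solve_poisson_neumann(f):
    """Solve L(T) = f by dividing DCT coefficients by K; returns the zero-mean solution."""
    m, n = f.shape
    cm, cn = dct_matrix(m), dct_matrix(n)
    k = laplacian_eigenvalues(m, n)
    k[0, 0] = 1.0                 # avoid dividing by zero at the singular entry
    t_hat = (cm @ f @ cn.T) / k
    t_hat[0, 0] = 0.0             # extra condition: fix the undetermined constant
    return cm.T @ t_hat @ cn

rng = np.random.default_rng(0)
t_true = rng.random((6, 9))
f = neumann_laplacian(t_true)
t = solve_poisson_neumann(f)      # matches t_true up to an additive constant
```

The explicit matrix form costs more than a fast transform, so for large images a fast DCT routine such as `scipy.fft.dctn` would be the natural replacement.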

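We apply Theorem 2.1 to solve Eq.(7). Notice that after taking the DCT on both sides, the equation becomes

$$\left( \mathbf{K} \circ \mathbf{K} + \epsilon\mathbf{E} \right) \circ \mathcal{F}_c(\mathbf{T}) = \mathcal{F}_c(\mathbf{P}) \tag{10}$$

where $\mathbf{P}$ denotes the right hand side of Eq.(7) and $\mathbf{E}$ is a matrix of all 1’s. Therefore, the solution to Eq.(7) is

$$T_{m,n} = \mathcal{F}_c^{-1}\left( \frac{[\mathcal{F}_c(\mathbf{P})]_{m,n}}{K_{m,n}^2 + \epsilon} \right) \tag{11}$$

where $\mathbf{K}$ is the same as in Theorem 2.1. The uniqueness of the solution is automatically guaranteed because of the presence of $\epsilon$ in the denominator, which is the consequence of adding the $\frac{\epsilon}{2}\|\mathbf{T} - \mathbf{Y}\|_2^2$ term in Eq.(3). Our algorithm is summarized as follows: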

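Algorithm 1 Image Reflection Suppression via Gradient Thresholding and Solving PDE

Input: $\mathbf{Y}$, $h$, $\epsilon$

  Compute $\mathbf{P}$, the right hand side of Eq.(7).
  Compute $\mathbf{T}$ via Eq.(11).
  return $\mathbf{T}$.

Output: $\mathbf{T}$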
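The closed-form procedure can be sketched end to end in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' MATLAB implementation: it handles a single grayscale channel with values in [0, 1], uses forward differences with their negative adjoint as the divergence (so that divergence of gradient is the Neumann Laplacian), builds the orthonormal DCT-II as an explicit matrix, and picks `eps=1e-8` purely as an example of a very small value. Function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ v transforms v, and C.T inverts it."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def grad(u):
    """Forward differences along both axes, zero at the last row/column."""
    g0 = np.zeros_like(u)
    g1 = np.zeros_like(u)
    g0[:-1, :] = u[1:, :] - u[:-1, :]
    g1[:, :-1] = u[:, 1:] - u[:, :-1]
    return g0, g1

def div(g0, g1):
    """Negative adjoint of grad, so that div(grad(u)) is the Neumann Laplacian."""
    d = np.zeros_like(g0)
    d[:-1, :] += g0[:-1, :]
    d[1:, :] -= g0[:-1, :]
    d[:, :-1] += g1[:, :-1]
    d[:, 1:] -= g1[:, :-1]
    return d

def laplacian(u):
    return div(*grad(u))

def suppress_reflection(y, h, eps=1e-8):
    """Threshold the gradients of y, then solve (L^2 + eps) T = L(div(.)) + eps*y via DCT."""
    g0, g1 = grad(y)
    keep = np.hypot(g0, g1) >= h                  # hard threshold on gradient magnitude
    p = laplacian(div(g0 * keep, g1 * keep)) + eps * y
    m, n = y.shape
    cm, cn = dct_matrix(m), dct_matrix(n)
    k = 2.0 * (np.cos(np.pi * np.arange(m) / m)[:, None]
               + np.cos(np.pi * np.arange(n) / n)[None, :] - 2.0)
    t_hat = (cm @ p @ cn.T) / (k ** 2 + eps)      # elementwise division in DCT domain
    return cm.T @ t_hat @ cn

# Usage on a hypothetical grayscale image.
rng = np.random.default_rng(0)
image = rng.random((12, 10))
result = suppress_reflection(image, h=0.5, eps=1e-4)
```

With `h=0` nothing is thresholded and the input is reproduced, which is a convenient sanity check. A color image could be processed one channel at a time, and any rescaling or clipping of the output range is left out of this sketch.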

3 Experiments

All experiments are implemented using MATLAB 2017a on a PC with 8-core Intel i7-8550U 1.80GHz CPU and 16 GB memory. We compare our method with the state-of-the-art approaches of Arvanitopoulos et al. [2], Li and Brown [12] and Wan et al. [19], using the original MATLAB source code provided by the authors. These approaches are selected for comparison since they require only a single image as input. Other single image reflection removal approaches mentioned in Sec. 1.1.2 require either external image datasets [4, 20] or additional conditions (user labels [10], double-pane glass and ghosting cues [17]). We use PSNR and SSIM (adopted in [2]) together with execution time as metrics to evaluate the performance of the selected approaches. The execution times reported throughout this paper are all averaged over 20 repeated runs.
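For reference, PSNR follows the standard definition based on the mean squared error. The sketch below is a generic NumPy version; the peak value of 1.0 and the treatment of the image as a single array are assumptions, since the exact conventions used in the experiments are not stated here.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")       # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
clean = np.zeros((4, 4))
noisy = clean + 0.1
score = psnr(clean, noisy)
```

A higher value means the dereflected output is closer to the ground-truth transmission layer.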

The parameter $h$ in (3) represents the level of the gradient thresholding. The gradients whose magnitudes are less than $h$ will be smoothed out. Fig. 3 shows the effect of increasing $h$. The larger $h$ is, the more reflection components and transmission layer details are removed. Similar to the regularization parameter $\lambda$ in [2]’s approach, the value of $h$ that produces the best visual result depends on the strength of the reflection in each input image, since the best visual result is a balance between the preservation of transmission details and the suppression of reflection. Typically, values within the interval yield desirable results. As will be demonstrated below, finding the best parameter for each image is almost instantaneous.

Figure 3: The effect of increasing the threshold parameter $h$ in the proposed reflection suppression model. The first panel is the input; the remaining three panels are results of the proposed model. Increasing the parameter removes more reflection as well as some details from the transmission layer. Best viewed on screen.

3.1 Synthetic Images

We blend two pairs of images of size pixels in Fig. 4 according to the assumption (2), where and represent transmission and reflection layers, respectively. The variance of the Gaussian blurring kernel $\kappa$ is fixed to and two blending weights are used. For parameters in other models, we use the default values as reported in their papers ( in [12], in [19], in [2]). In our proposed model, we fix and .

Figure 4: Images used as transmission layers () and reflection layers () for the synthetic experiments. is blended with . is blended with .

The images before and after the reflection suppression are shown in Fig. 5. The method of Li and Brown [12] tends to produce dark images with false colors. This is partially due to the fact that the energy from the reflection layer accounts for a large portion of our synthetic images. Removing the reflection results in significant energy loss and hence produces dark outputs. The method of Wan et al. [19] removes most of the reflection but oversmooths transmission layer details (for example, the top edge of Lena’s hat in the mirror and the bottom edge of the green pepper, especially in cases (See Fig. 4 and Fig. 4)). Arvanitopoulos et al.’s approach [2] produces outputs that are the closest to those of our proposed method. However, as shown in Tables 1 and 2, our outputs achieve better performance in terms of PSNR, SSIM and execution time in all cases. In particular, our method is faster than all the others by a significant margin.

3.2 Real-World Images

The size of the real-world images used here is pixels. We captured these images directly using a smartphone. Default parameter settings are used in the method of Li and Brown [12]. As for the method of Arvanitopoulos et al. [2], we tune the regularization parameter $\lambda$ for each input image to get the best visual result, since the outcome is much more sensitive to parameter tuning compared to Li and Brown’s approach. In our proposed model (3), the parameter $h$ is tuned for each input image for the same reason. However, parameter tuning in our model is almost instantaneous, which will be demonstrated below. The parameter $\epsilon$ is empirically fixed to .

The execution times on the real-world images demonstrate the advantage of the proposed model: it is much faster than other state-of-the-art algorithms (footnote 2: at such picture size, the approach in Wan et al. [19] reports an out-of-memory error, indicating that it is not suitable for large-sized images). Typically it takes less than 1.5 seconds to output the dereflected images. Moreover, the dereflection quality is also better than that of other methods, as demonstrated in Fig. 6 (notice the difference in the zoomed-in boxes). Our proposed method not only suppresses the reflection satisfactorily but also maintains as many transmission details as possible. Being fast and effective, our proposed method has the potential of being implemented directly on portable devices such as smartphones and tablets. The high efficiency makes it possible for a mobile device user to adjust the parameter $h$ easily (for example, via moving a slider on the phone screen) to get an immediate response and select the best dereflected image according to the user’s visual perception (See Fig. 7).

However, our model also has limitations when the model assumption (2) is violated. If the reflection layer contains sharp edges, the corresponding gradients at the edge pixels will be large. Therefore, increasing the threshold parameter won’t remove these reflection edges before losing some gentle transmission layer details. Failure cases are shown in Fig. 8, where none of the methods in comparison completely removes the reflection. That being said, our proposed method still retains more details even if edges in the transmission layer are not sharp enough, for example, in dark images like Fig. 7.

Figure 5: Comparison of reflection suppression on synthetic images (four rows). Column 1: Blended images. Column 2: Li and Brown [12]’s results. Column 3: Wan et al. [19]’s results. Column 4: Arvanitopoulos et al. [2]’s results. Column 5: our proposed results. Best viewed on screen.
Methods: Li and Brown [12], Wan et al. [19], Arvanitopoulos et al. [2], Proposed.

Image     [12] PSNR  [12] SSIM  [19] PSNR  [19] SSIM  [2] PSNR  [2] SSIM  Proposed PSNR  Proposed SSIM
Fig. 4    16.08      0.549      19.81      0.874      20.87     0.896     20.99          0.903
Fig. 4    13.46      0.344      16.65      0.700      16.80     0.716     16.93          0.736
Fig. 4    16.64      0.762      17.10      0.840      19.42     0.896     19.44          0.897
Fig. 4    13.55      0.574      14.54      0.751      15.10     0.787     15.14          0.789

Table 1: Comparison of PSNR and SSIM of reflection suppression methods on synthetic images in Fig. 5. Image size: pixels

4 Conclusion and Future Work

We proposed an efficient approach for single image reflection suppression. It is formulated as a convex problem, which is solved via gradient thresholding and solving a variation of 2D Poisson’s equation using DCT. We validated the effectiveness and efficiency of our approach through experiments on synthetic and real-world images. It is able to output desirable dereflected smartphone images in seconds. However, single image reflection suppression remains a challenging problem as there are still cases where current approaches fail to completely remove the reflection. Future work includes designing effective and efficient algorithms to handle sharp and strong reflections for large images.

Figure 6 (four rows, Input 1 to Input 4; columns: Input, [12], [2], Proposed): Comparison of reflection suppression methods on real-world images taken at various scenes. The method of Li and Brown [12] yields images that appear darker than the original input. Some reflection edges are not completely removed (e.g. upper left corner in Fig. 5 and Fig. 5). The method of Arvanitopoulos et al. [2] achieves better color reproduction but suffers from some loss of details in the transmission layer (e.g. the top corner of the building in Fig. 6, the vegetation in Fig. 5, the disk on the glass in Fig. 5). Our proposed method retains the most transmission layer details with superior reflection layer suppression among these methods. Best viewed on screen.

Image     [12]    [19]    [2]      Proposed
Fig. 4    12.06   49.31   185.32   0.19
Fig. 4    11.68   49.24   185.82   0.18
Fig. 4     7.25   48.74   185.51   0.19
Fig. 4     7.69   47.86   185.83   0.19

Table 2: Execution times (sec) of reflection suppression methods on synthetic images in Fig. 5. Image size: pixels
Figure 7: A slider demo simulated in MATLAB. As we move the slider to the right, the value of $h$ increases and the reflection is gradually suppressed. The response time is less than 1.5 seconds for smartphone images of size . Best viewed on screen.
Figure 8 (two rows, Input 1 and Input 2; columns: Input, [12], [2], Proposed): Failure cases of our proposed method. Failure is likely to occur when edges in the reflection layer are sharp and strong. This limitation is also observed in the other two methods. In Row 1, the reflection of the fluorescent lamps outside the room is almost as sharp as the real ones inside, which makes it hard to distinguish between them. In Row 2, although our proposed method fails to completely remove the reflection (the inside of a bus), it retains more transmission details than [2] as shown in the zoomed-in regions. The method in [12] again produces dark outputs. Best viewed on screen.

References

  • [1] A. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flash-exposure sampling. ACM Transactions on Graphics (TOG), 24(3):828–835, 2005.
  • [2] N. Arvanitopoulos Darginis, R. Achanta, and S. Süsstrunk. Single image reflection suppression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), number EPFL-CONF-227363, 2017.
  • [3] H. Barrow and J. Tenenbaum. Recovering intrinsic scene characteristics. Comput. Vis. Syst, 2, 1978.
  • [4] Q. Fan, J. Yang, G. Hua, B. Chen, and D. Wipf. A generic deep architecture for single image reflection removal and image smoothing. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
  • [5] H. Farid and E. H. Adelson. Separating reflections and lighting using independent components analysis. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., volume 1, pages 262–267. IEEE, 1999.
  • [6] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
  • [7] X. Guo, X. Cao, and Y. Ma. Robust separation of reflection from multiple images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2187–2194, 2014.
  • [8] B.-J. Han and J.-Y. Sim. Reflection removal using low-rank matrix completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [9] N. Kong, Y.-W. Tai, and J. S. Shin. A physically-based approach to reflection separation: from physical modeling to constrained optimization. IEEE transactions on pattern analysis and machine intelligence, 36(2):209–221, 2014.
  • [10] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9), 2007.
  • [11] Y. Li and M. S. Brown. Exploiting reflection change for automatic reflection removal. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 2432–2439. IEEE, 2013.
  • [12] Y. Li and M. S. Brown. Single image layer separation using relative smoothness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2752–2759, 2014.
  • [13] W. Ma, J. M. Morel, S. Osher, and A. Chien. An L1-based variational model for Retinex theory and its applications to medical images. In CVPR, 2011.
  • [14] W. Ma and S. Osher. A TV Bregman iterative model of Retinex theory. Inverse Problems and Imaging, 6(4):697–708, 2012.
  • [15] W. H. Press. Numerical recipes 3rd edition: The art of scientific computing. Cambridge university press, 2007.
  • [16] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 814–819. IEEE, 1999.
  • [17] Y. Shih, D. Krishnan, F. Durand, and W. T. Freeman. Reflection removal using ghosting cues. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3193–3201, 2015.
  • [18] C. Sun, S. Liu, T. Yang, B. Zeng, Z. Wang, and G. Liu. Automatic reflection removal using gradient intensity and motion cues. In Proceedings of the 2016 ACM on Multimedia Conference, pages 466–470. ACM, 2016.
  • [19] R. Wan, B. Shi, T. A. Hwee, and A. C. Kot. Depth of field guided reflection removal. In Image Processing (ICIP), 2016 IEEE International Conference on, pages 21–25. IEEE, 2016.
  • [20] R. Wan, B. Shi, A.-H. Tan, and A. C. Kot. Sparsity based reflection removal using external patch search. In Multimedia and Expo (ICME), 2017 IEEE International Conference on, pages 1500–1505. IEEE, 2017.
  • [21] A. G. Weber. The usc-sipi image database version 5. USC-SIPI Report, 315:1–24, 1997.
  • [22] L. Xu, C. Lu, Y. Xu, and J. Jia. Image smoothing via L0 gradient minimization. In ACM Transactions on Graphics (TOG), volume 30, page 174. ACM, 2011.
  • [23] T. Xue, M. Rubinstein, C. Liu, and W. T. Freeman. A computational approach for obstruction-free photography. ACM Transactions on Graphics (TOG), 34(4):79, 2015.