Dim the Lights! – Low-Rank Prior Temporal Data for Specular-Free Video Recovery

12/17/2019 ∙ Samar M. Alsaleh, et al. ∙ University of Cambridge ∙ George Washington University ∙ Université Clermont Auvergne

The appearance of an object is significantly affected by the illumination conditions in the environment. This is more evident with strongly reflective objects, as they suffer from more dominant specular reflections, which cause information loss and discontinuity in the image domain. In this paper, we present a novel framework for specular-free video recovery with special emphasis on dealing with complex motions coming from objects or the camera. Our solution is a two-step approach that allows for both detection and restoration of the damaged regions in video data. We first propose a spatially adaptive detection term that searches for the damaged areas. We then introduce a variational solution for specular-free video recovery that exploits spatio-temporal correlations by representing prior data in a low-rank form. We demonstrate that our solution avoids major drawbacks of existing approaches while improving performance in both detection accuracy and inpainting quality. Finally, we show that our approach can be applied to other problems such as object removal.


1 Introduction

In the science of vision, light is what enables our biological vision system to see our surroundings and identify different objects, regardless of the many unpredictable changes in a realistic environment, including lighting conditions. This adaptation, however, does not extend to computer vision systems, as they still struggle to process illumination changes as robustly as humans do and to counter their side effects. This is more evident with objects that have strong reflectivity, as light changes result in more dominant specular reflections, which cause information loss and discontinuity in the image domain Tan et al. (2004); Yang et al. (2015).

Images and videos captured in the real world are expected to contain specular reflections due to the inhomogeneous nature of many materials such as plastics, metals and human skin Tan et al. (2004). Based on the dichromatic reflection model Shafer (1985), these materials, like many other natural ones, exhibit two reflection components: diffuse and specular. These reflection components are formed by different physical light-surface interactions. Diffuse reflection is caused by the subsurface scattering of light at many angles and is a direct representation of the shape of an object. Specular reflection, on the other hand, appears only at some locations on an object's surface and exhibits less scattering, causing it to have strong intensity Umeyama and Godin (2004). This reflection component depends heavily on the local orientation and degree of roughness of the surface and, therefore, captures important scene information, such as surface shape and light source characteristics Fleming et al. (2004); Yang et al. (2015).

At the practical level, the behavior and presence of specular reflections often cause significant inaccuracies, or even failure, of common vision algorithms such as segmentation Deng et al. (1999); Yang et al. (2010), visual recognition Chen et al. (2006); Li et al. (2017), stereo matching Heo et al. (2011), medical image analysis Kong et al. (2016); Aviles et al. (2017), tracking Zhang et al. (2006), and scene reconstruction Furukawa and Ponce (2010). For all these reasons, removing specular reflections and restoring the affected regions has become crucial to the practicality, accuracy and robustness of computer vision systems. The problem of dealing with specular artifacts can be addressed using either a single image or a video sequence. Although both perspectives have been explored by the community, the question of how to efficiently exploit spatio-temporal correlations in complex sequences while keeping computational complexity low remains open, and is therefore worthy of exploration.

Contributions. We present a novel framework for specular-free video recovery that is equally effective for static and moving cameras and in the presence of object motion. Our two-step solution is recast as an optimization problem and allows for fully automatic detection and restoration of specular regions in video data. Our main contributions are:

  • We propose a computationally tractable solution based on two main components:

    • A computationally light yet accurate detection approach based on a set of adaptive conditions.

    • A variational framework that exploits spatio-temporal correlations through a low-rank representation.

  • We validate our approach exhaustively, using different complex scenes, and show that it achieves a better approximation of the damaged area than existing techniques in the literature.

2 Related Work

The problem of specular-free video recovery can be broken down into two sub-problems: (i) detection of the damaged regions and (ii) recovery of the missing information. In this section, we review the body of literature on both sub-problems.

Specularity Detection. Generally speaking, the reflection removal problem can be seen as the problem of separating two linearly superposed components into two intrinsic images: a diffuse and a reflection image. Different works in the body of literature have been reported to solve this problem, with solutions relying on either multiple or single images. The former rely on the use of images of the same scene taken under different lighting Lin et al. (2002), from different viewpoints Lin et al. (2002), or with an additional polarizing filter Nayar et al. (1997). Nevertheless, the necessity of having multiple images with specific varying conditions, or of having specific hardware assistance, limits their applicability to general cases. The latter overcome this by utilizing a single image and relying on neighborhood Tan and Ikeuchi (2005); Mallick et al. (2005, 2006); Shen et al. (2008); Akashi and Okatani (2014) or color space Ortiz and Torres (2006); Kim et al. (2013); Nguyen et al. (2014); Yang et al. (2015); Yamamoto et al. (2017) analysis and propagation. These approaches, however, cannot handle large highlight areas and might lead to false specularity detections.

Multiple-Image/Video Restoration. Whether for an image or a video, inpainting and restoration of damaged areas is an ill-posed inverse problem that has no well-defined unique solution Guillemot and Le Meur (2014). Although the vast amount of redundant information that exists in multiple images and video sequences is advantageous to the inpainting process, it also raises challenges related to computational complexity and temporal coherency. While many literature reviews attempt to classify the solutions to video inpainting, they can be broadly viewed as variations of the exemplar-based concept originally presented in Criminisi et al. (2004), in which missing areas are filled by propagating information from known regions. The variations come in terms of the exemplar's shape, regular (such as a patch) vs. irregular (such as a segment), and the exemplar's best-match search, local Patwardhan et al. (2007); Strobel et al. (2014); Daisy et al. (2015); Huang et al. (2016) vs. global Wexler et al. (2007); Granados et al. (2012a); Newson et al. (2014); Ebdelli et al. (2015); Le et al. (2017). Although these works achieve promising results, they suffer from unpleasant artifacts and rely on a set of minimization procedures, sometimes up to five, which results in high computational demands.

3 Specular-Free Video Recovery

In this section, we describe the two key parts of our proposed approach illustrated in Figure 1.

Figure 1: Overview of our approach. (From left to right) prior temporal data is aligned and then used to construct a search space that, together with the result of our detection process, is used to find the optimal information for the damaged regions.

3.1 Detecting Specular Reflections

Consider an image sequence $\mathcal{I} = \{I_1, \dots, I_T\}$, with $T$ frames where $I_t : \Omega \to \mathbb{R}^3$, $t = 1, \dots, T$, and $\Omega$, an open bounded Lipschitz subset of $\mathbb{R}^2$ of integer width $w$ and height $h$. Our first stage is to detect the damaged regions and relies on the following assumptions, which appear to be realistic for complex scenes with specular reflections:

(A.1) The reflection artifacts appear only at some locations of the image.
(A.2) The reflections are stronger than, or as intense as, the target scene, exhibiting less scattering.

We thus start by computing the amount of color dispersion at the current $t$-frame, $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $x_i$ and $\bar{x}$ denote the $i$-th pixel value and the mean value of the observations on the current frame respectively, and $N$ the total number of pixels in the image. We then use this information as a guideline to label pixels into two classes: specular region ($\Omega_S$), i.e., damaged regions, and non-specular region ($\Omega_{NS}$) such that $\Omega_S \cup \Omega_{NS} = \Omega$. To this end, for a given position $(x, y)$, the label assignation is as follows:

# DETECTING SPECULAR AND NON-SPECULAR REGIONS
Set τ ← argmax of the intra-class variance   # compute optimal τ
# COMPUTING SPECULAR REGIONS
if min(μ(x, y), σ) ≥ τ:
    (x, y) ∈ Ω_S
# COMPUTING NON-SPECULAR REGIONS
else:
    (x, y) ∈ Ω_NS

where $\mu(x, y)$ is the mean of the RGB channel values, computed as $\mu(x, y) = \big(I_R(x, y) + I_G(x, y) + I_B(x, y)\big)/3$, $\min(a, b)$ denotes the minimum value between $a$ and $b$, and $\tau$ is optimally determined by maximizing the intra-class variance, following the philosophy of Jenks (1967); Otsu (1975). It is worth noticing that for every frame $I_t$, we generate a specularity characteristic function $\chi_t : \Omega \to \{0, 1\}$ such that $\chi_t(x, y) = 1$ if $(x, y) \in \Omega_S$ and $\chi_t(x, y) = 0$ if $(x, y) \in \Omega_{NS}$. For notation convenience, we will refer to $\chi$ for this binary map in the remainder of the text.
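
To make the step concrete, a minimal NumPy sketch of this per-frame adaptive detection is given below. It is only a sketch under our reading of the rule above: `otsu_threshold` and `detect_specular` are illustrative names, and the exact thresholding rule of the method is the one stated in the text.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Pick the threshold maximizing the intra-class separability (Otsu, 1975)."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_tau, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:i] * centers[:i]).sum() / w0
        mu1 = (hist[i:] * centers[i:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if between > best_var:
            best_var, best_tau = between, centers[i]
    return best_tau

def detect_specular(frame):
    """Return the binary specularity map chi for one RGB frame in [0, 1]."""
    mu = frame.mean(axis=2)            # per-pixel mean of the RGB channels
    sigma = frame.std()                # colour dispersion of the current frame
    tau = otsu_threshold(mu.ravel())   # adaptive, per-frame threshold
    return np.minimum(mu, sigma) >= tau
```

Because $\tau$ is recomputed for every frame, the detection adapts to changes of illumination over time.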

3.2 Specular-Free Video Recovery

Recently, low-rank data representation has attracted great attention in many areas, including computer vision and machine learning. This is mainly because it allows keeping the relevant information in a low-dimensional space Haeffele et al. (2014) and enables recovery of a low-rank matrix from a set of sampled entries Candes and Recht (2012). Our motivation for promoting low-rank data representation is two-fold: firstly, to significantly decrease the computation time, and secondly, to achieve better performance while reducing artifacts and increasing robustness to outliers. Our approach relies on this further assumption:

(A.3) For a given image sequence $\mathcal{I}$, we consider that there are $K$ undamaged frames.
In this work, to obtain a good approximation for the damaged regions, we project the computed nearest-neighbour patch with respect to the damaged area.

Figure 2: Prior temporal data is arranged in a Casorati matrix in order to exploit the correlation between the data and find the low-rank component $\mathcal{C}_r$.

Let $\mathcal{P} = \{I_{t-K}, \dots, I_{t-1}\}$ be a set of $K$ past frames with respect to the current one $I_t$, where $K < T$. We start by constructing the following Casorati matrix $\mathcal{C} \in \mathbb{R}^{wh \times K}$, illustrated in Figure 2, in the form:

$$\mathcal{C} = \begin{pmatrix} I_{t-K}(1) & \cdots & I_{t-1}(1) \\ \vdots & \ddots & \vdots \\ I_{t-K}(wh) & \cdots & I_{t-1}(wh) \end{pmatrix} \qquad (1)$$

for each color channel, where $I_k(j)$ is the scalar value at pixel location $j$ in frame $k$ for the respective color channel. The idea is to exploit the strong correlation between the columns of this matrix to create a low-rank representation of the sequence (see Figure 2). To do so, we rely on the well-established Singular-Value Decomposition (SVD, Strang (2005)) $\mathcal{C} = U \Sigma V^T$, with $U \in \mathbb{R}^{wh \times wh}$ and $V \in \mathbb{R}^{K \times K}$ orthogonal matrices, and $\Sigma$ a diagonal matrix composed of the singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_K \geq 0$. Now, let $r < \operatorname{rank}(\mathcal{C})$. We thus aim at solving the following problem:

$$\min_{X} \; \|\mathcal{C} - X\|_F \quad \text{subject to} \quad \operatorname{rank}(X) \leq r, \qquad (2)$$

with $\|\cdot\|_F$ the Frobenius matrix norm, invariant to rotation, whose minimizer is the truncated SVD $\mathcal{C}_r = \sum_{i=1}^{r} \sigma_i u_i v_i^T$. Therefore, instead of computing the SVD of a large and dense matrix, we aim at retrieving the set of $r$ dominant singular values with the associated right and left singular vectors, in order to keep the most relevant information in a subspace smaller than the original one while eliminating the subspace where noise lies. This makes our method more robust to noise and outliers and reduces artifacts. Finally, using the definition of the Casorati matrix, one is able to recover the low-rank image sequence:

$$\tilde{I}_k(j) = (\mathcal{C}_r)_{jk}, \qquad (3)$$

with $j = 1, \dots, wh$ and $k = 1, \dots, K$, giving $\tilde{\mathcal{P}} = \{\tilde{I}_{t-K}, \dots, \tilde{I}_{t-1}\}$. The following computations will be done using this low-rank sequence.
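
As an illustration of Eqs. (1)-(3), the following NumPy sketch builds the Casorati matrix for one color channel and returns its best rank-$r$ approximation; the function name and the fixed rank argument `r` are our assumptions (the method keeps the dominant singular values per sequence).

```python
import numpy as np

def low_rank_sequence(frames, r):
    """frames: array of shape (K, h, w), one colour channel of the K past
    frames. Returns the rank-r approximation of the sequence (Eqs. 1-3)."""
    K, h, w = frames.shape
    C = frames.reshape(K, h * w).T                # Casorati matrix, (w*h) x K
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    s[r:] = 0.0                                   # keep the r dominant singular values
    C_r = (U * s) @ Vt                            # best rank-r approx. in Frobenius norm
    return C_r.T.reshape(K, h, w)                 # back to an image sequence
```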

Our next step is to reconstruct a common search space to recover the lost information in the region $\Omega_S$ using $\tilde{\mathcal{P}}$. This space is constructed by registering all the images in $\tilde{\mathcal{P}}$, that is to say, by finding optimal diffeomorphic transformations $\varphi_k = \mathrm{Id} + u_k$ characterized by the deformation fields $u_k$ such that the images in $\tilde{\mathcal{P}}$ are aligned. This task can be cast as the following optimization problem in a variational framework, for each $k$:

$$\min_{u_k} \; F(u_k) + \alpha R(u_k) + \beta T(u_k), \qquad (4)$$

where $\alpha$ and $\beta$ are positive weighting parameters.

$F$ is a discrepancy measure motivated by robust statistics and M-estimators such as Huber's, to deal with outliers and increase accuracy and robustness:

$F(u_k) = \int_\Omega \rho\big(\tilde{I}_t(x) - \tilde{I}_k(x + u_k(x))\big)\,dx$, with $\rho$ a likelihood-type estimator, which is computed as:

$$\rho(s) = \begin{cases} \dfrac{c^2}{6}\Big(1 - \big(1 - (s/c)^2\big)^3\Big) & \text{if } |s| \leq c,\\[4pt] \dfrac{c^2}{6} & \text{otherwise}. \end{cases} \qquad (5)$$

This particular choice of the Tukey estimator is driven by its hard rejection of outliers Stewart (1999).
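
For reference, Eq. (5) transcribes directly into NumPy as below; the cut-off value `c = 4.685` is the usual choice for the Tukey biweight and is our assumption, since the paper does not restate it.

```python
import numpy as np

def tukey_rho(r, c=4.685):
    """Tukey biweight loss: quadratic-like near 0, constant beyond |r| = c,
    so gross outliers receive zero influence (hard rejection)."""
    r = np.asarray(r, dtype=float)
    inside = np.abs(r) <= c
    rho = np.full_like(r, c**2 / 6.0)   # saturated value outside [-c, c]
    rho[inside] = (c**2 / 6.0) * (1.0 - (1.0 - (r[inside] / c) ** 2) ** 3)
    return rho
```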

$R$, in turn, is a regularization term for the deformation field, needed to practically solve the original highly ill-posed registration problem. The following Tikhonov penalizer is considered to stabilize the energy functional and to smooth the deformation field: $R(u_k) = \frac{1}{2}\int_\Omega \|\nabla u_k\|^2\,dx$.

The last component ensures physically and mechanically meaningful deformations and deals with topology preservation. This is of great importance since it guarantees the non-destruction and non-creation of structures during the registration process and controls the degree of expansions and contractions allowed. This translates as a positivity constraint on the Jacobian determinant, denoted $\det(\nabla\varphi_k)$. In this work, we use the following topology-preserving regularizer as in Aviles et al. (2017), $T(u_k) = \int_\Omega \psi\big(\det(\nabla\varphi_k)\big)\,dx$, with:

$$\psi(v) = \big[\min(v - \epsilon,\, 0)\big]^2 + \gamma \big[\max(v - 1/\epsilon,\, 0)\big]^2, \qquad (6)$$

where $\epsilon$ is a margin of acceptance for values of $\det(\nabla\varphi_k)$ close to $0$, restricting the range of expansion and contraction allowed, and $\gamma$ balances the two terms. The first component heavily penalizes negative values of $\det(\nabla\varphi_k)$, and thus prevents violations of topology preservation and overlapping, while the second one penalizes large expansions to achieve more realistic deformations.
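
On a discrete pixel grid, the two penalties of Eq. (6) can be sketched as follows; the finite-difference evaluation of $\det(\nabla\varphi)$ and the default values of `eps` and `gamma` are assumptions of this sketch, not values from the paper.

```python
import numpy as np

def jacobian_det(phi_x, phi_y):
    """det(grad phi) of a 2-D deformation phi = (phi_x, phi_y) on a pixel grid."""
    dxx, dxy = np.gradient(phi_x)   # derivatives of the x-component
    dyx, dyy = np.gradient(phi_y)   # derivatives of the y-component
    return dxx * dyy - dxy * dyx

def topology_penalty(phi_x, phi_y, eps=0.1, gamma=1.0):
    """Penalize near-zero/negative Jacobians (folding) and large expansions."""
    J = jacobian_det(phi_x, phi_y)
    collapse = np.minimum(J - eps, 0.0) ** 2                  # J below the margin eps
    expansion = gamma * np.maximum(J - 1.0 / eps, 0.0) ** 2   # excessive growth
    return (collapse + expansion).sum()
```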

However, this penalization does not guarantee that $\det(\nabla\varphi_k)$ remains positive everywhere. Since the deformations must remain injective to prohibit self-penetration and distortions, we propose to add a regridding method Christensen et al. (1996). This technique has the advantage of being easy to implement and of being performed simultaneously with the resolution, without slowing the computations. The idea is to reinitialize the registration process as soon as $\det(\nabla\varphi_k)$ becomes too small, by taking as the new moving image the previously computed deformed image. The final deformation is the composition of all the deformations. The algorithm is summarized next:

# pseudocode for the regridding step
Initialization: φ⁰ ← Id, n ← 0.
for each iteration of the registration:
    if min det(∇φⁿ) < tol:
        take the current deformed image as the new moving image
        n ← n + 1
        reinitialize φⁿ ← Id
    continue the registration from the new moving image
φ_final ← φⁿ ∘ … ∘ φ⁰.

Eventually, the deformation model plays a crucial role in the accuracy and the speed of the method, and defines the representation of $u_k$. In order to reduce the computational cost, we consider free-form deformations Rueckert et al. (1999); Sederberg and Parry (1986), in which a rectangular lattice, initialized as uniformly spaced points, is superimposed on the image pixel grid and is deformed, while the deformation on the finer pixel grid is recovered using a summation of tensor products of cubic B-splines, chosen for their local support and smoothness. We therefore parameterize $u_k$ as follows:

$$u_k(x, y) = \sum_{l=0}^{3}\sum_{m=0}^{3} B_l(v)\, B_m(w)\, c_{i+l,\, j+m}, \qquad (7)$$

with $i = \lfloor x/\delta_x \rfloor - 1$, $j = \lfloor y/\delta_y \rfloor - 1$, $v = x/\delta_x - \lfloor x/\delta_x \rfloor$ and $w = y/\delta_y - \lfloor y/\delta_y \rfloor$, where $B_l$ are the basis spline functions and $c_{i,j}$ are the control points constituting the lattice of spacings $\delta_x, \delta_y$. With this formulation, explicit expressions of the derivatives of $u_k$ and of $\det(\nabla\varphi_k)$ can easily be derived, and the problem amounts to finding the control points that reconstruct the deformation field $u_k$.

To achieve small computational cost with accurate results, we use the Levenberg-Marquardt optimization method to solve this problem in a multi-level setting.
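
A minimal sketch of evaluating Eq. (7) at a single pixel is given below, assuming a uniform lattice with spacing `delta` and a control-point array `ctrl` of shape (nx, ny, 2); both names, and the lack of boundary handling, are assumptions of the sketch.

```python
import numpy as np

def bspline_basis(t):
    """The four cubic B-spline basis functions evaluated at t in [0, 1)."""
    return np.array([(1 - t) ** 3 / 6.0,
                     (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0,
                     (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0,
                     t ** 3 / 6.0])

def ffd_displacement(x, y, ctrl, delta):
    """Displacement at pixel (x, y) from a lattice `ctrl` of shape (nx, ny, 2)."""
    i, v = int(x // delta) - 1, x / delta - x // delta
    j, w = int(y // delta) - 1, y / delta - y // delta
    Bv, Bw = bspline_basis(v), bspline_basis(w)
    disp = np.zeros(2)
    for l in range(4):
        for m in range(4):
            disp += Bv[l] * Bw[m] * ctrl[i + l, j + m]   # tensor-product sum
    return disp
```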

The final stage of our approach consists of finding the best approximation for the damaged regions $\Omega_S$ from $\tilde{\mathcal{P}}$. This process is achieved by iteratively minimizing patch distances in the form:

$$\min_{\phi} \; \sum_{p \in \Omega_S} d\big(W_p,\, W_{p + \phi(p)}\big), \qquad (8)$$

where $d$ is a distance measure between patches $W$ and $\phi$ is the shift-map between the damaged and search-space pixels.
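
Conceptually, Eq. (8) is a nearest-neighbour patch query over the aligned search space; a brute-force NumPy sketch for one grayscale frame follows (a practical implementation would restrict the search window and mask damaged pixels inside the query patch; `best_shift` is an illustrative name).

```python
import numpy as np

def best_shift(damaged, search, p, half=3):
    """Brute-force shift-map query: SSD between the patch around damaged
    pixel p = (y, x) and every valid patch of the aligned search space."""
    y, x = p
    patch = damaged[y - half:y + half + 1, x - half:x + half + 1]
    best, best_d = None, np.inf
    h, w = search.shape[:2]
    for yy in range(half, h - half):
        for xx in range(half, w - half):
            cand = search[yy - half:yy + half + 1, xx - half:xx + half + 1]
            d = np.sum((patch - cand) ** 2)          # SSD distance measure
            if d < best_d:
                best_d, best = d, (yy - y, xx - x)   # store the shift, not the position
    return best
```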

4 Experimental Results

This section describes the set of experiments that we conducted to evaluate our solution.

Figure 3: Sample frames of the different datasets we used to evaluate our proposed solution.

4.1 Data Description

We evaluate our approach using ten datasets coming from different sources:

Vision from Graphics Data. We generated five datasets for our experiments, four of which are synthetic-to-physical and one fully rendered. Following common practice for graphics data (e.g., Netz and Osadchy (2013)), for the first group we used 3D object models from the AIM@SHAPE repository (http://visionair.ge.imati.cnr.it) that we texturized, 3D printed and painted with red glossy paint. We then created the video sequences by positioning the objects on a moving station with a non-fixed source of light and with different types of lighting. For the remaining dataset, we created video sequences directly from the 3D model with rotational motion of the object and changes of lighting. Our main motivation for following this procedure is that we can generate the specular-free versions for quantitative analysis purposes.

Medical Imaging Data. The medical domain is a very challenging scenario in which the problem of removing specular reflections from video sequences is of great interest. It is indeed a common preliminary stage in medical image processing, followed by subsequent tasks such as tracking, stereo reconstruction and segmentation, whose robustness and accuracy heavily depend on having a consistent surface appearance Stoyanov et al. (2010). In this setting, one can find large and very small damaged areas, a moving camera and non-fixed light conditions (significant changes in illumination over time). Therefore, to test robustness under these conditions, we use three in-vivo datasets that come from endoscopic video sequences (http://hamlyn.doc.ic.ac.uk/vision/).

Entertainment Data. To demonstrate the effectiveness and generalization ability of our approach, we evaluate our method on two datasets from the entertainment industry. We obtained two video sequences from popular movies with the object-removal task in mind. For both datasets, we created masks for the objects to be removed from the scene and used them to evaluate the performance of the inpainting part of our approach on standard entertainment data.

Further details on the datasets can be seen in Figure 3. All results and comparisons were run under the same conditions using an Intel Core i7-6700 at 3.40 GHz with 64 GB RAM and an Nvidia GeForce GT 610.

4.2 Evaluation Scheme

To validate our approach, we design a two-part evaluation scheme, where the protocol for each part is as follows:

E1. Specular Reflection Detection. To demonstrate the advantages of our algorithmic approach, we first offer a visual comparison of our method against TAN05 Tan and Ikeuchi (2005), SH09 Shen and Cai (2009), AK14 Akashi and Okatani (2014), and YAM17 Yamamoto et al. (2017), whose philosophy is closely related to ours. To further support the visual results, we performed a quantitative analysis based on the Dice's coefficient, along with the CPU time, for two datasets with ground truth annotations.

E2. Video Recovery Approach. To evaluate the global performance of our approach, we compared it against one of the state-of-the-art methods in the area, NW14 Newson et al. (2014). Although the approaches of Wexler et al. (2004); Patwardhan et al. (2005); Granados et al. (2012b) are indeed interesting, they rely on specific modeling hypotheses, such as a static background, color distances and homographic transformations, that would make comparisons unfair, whereas NW14 Newson et al. (2014) is a more general approach. With this motivation in mind, we offer a detailed comparison against NW14 Newson et al. (2014). Finally, we present a case study where our approach can be applied: the task of object removal.

4.3 Results and Discussion

In this section, we describe our findings following the scheme described in subsection 4.2.

Figure 4: Labeling results of our approach compared against four state-of-the-art methods on two of the datasets.
Dataset   Dice's Coefficient   Accuracy   Precision   Error Rate (%)
Torso     0.7230               0.9978     0.6966      2.2
Kitten    0.8214               0.9983     0.8304      1.17
Duck      0.8741               0.998      0.835       1.15
Buddah    0.7039               0.9969     0.6642      3.1
Heart 1   0.8179               0.994      0.9883      2.6
Heart 2   0.9175               0.9985     0.9527      1.5
Kidney    0.8771               0.9981     0.9535      1.9
Table 1: Quantitative evaluation of our detection process (evaluation measures averaged per dataset).
Dice’s Coefficient CPU Time
Heart Dragon Heart Dragon
Auto
TAN05Tan and Ikeuchi (2005) 0.38 0.26 9.94 42.01
SH09 Shen and Cai (2009) 0.1 0.1 3.4e 3.2e
AK14 Akashi and Okatani (2014) 0.1 0.2 34.54 32.87
YAM17 Yamamoto et al. (2017) 0.68 0.44 2.6e 1.7e
Ours 0.95 0.88 3.4e 3.2e
Table 2: Performance comparison against state-of-the-art methods. Auto denotes if parameters selection is automatic or manual.
[Table 3 reports the CPU time [s/frame], the minima reached, and the [min, max] number of iterations for the Torso, Kitten, Duck, Buddah and Heart1 datasets, under both full-rank and low-rank representations.]
Table 3: Performance evaluation of our global solution.

E1. Detection Approach. We start by evaluating our detection approach using the four performance metrics reported in Table 1. These numerical results were obtained using both the output of our detection solution and the ground truth of the corresponding dataset. Our quantitative analysis starts by computing the Dice's coefficient to measure how similar the detection result is to the ground truth. We can see that it is greater than 0.70 for all datasets, with an overall average of 0.82. In terms of accuracy, we report values greater than 0.99 for all datasets, while in terms of precision our approach ranges from 0.66 to 0.99 with an overall average of 0.85. As can be seen in the last column of Table 1, the error rate of our detection approach ranges from 1.15% to 3.1%, with a global average error of 1.95%.
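
For completeness, the four metrics of Table 1 can be computed from a binary detection map `pred` and ground truth `gt` as sketched below; the paper does not restate these formulas, so this sketch assumes the usual definitions.

```python
import numpy as np

def detection_metrics(pred, gt):
    """Dice, accuracy, precision and error rate for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # true positives
    tn = np.sum(~pred & ~gt)    # true negatives
    fp = np.sum(pred & ~gt)     # false positives
    fn = np.sum(~pred & gt)     # false negatives
    dice = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    error_rate = 100.0 * (fp + fn) / (tp + tn + fp + fn)
    return dice, accuracy, precision, error_rate
```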

We then evaluate our approach by comparing its performance against TAN05 Tan and Ikeuchi (2005), SH09 Shen and Cai (2009), AK14 Akashi and Okatani (2014), and YAM17 Yamamoto et al. (2017). It is important to note that all the compared methods have built-in fixed parameters that we had to adjust manually to achieve the reported output; not doing this manual adjustment would result in a massive over-segmentation of the specular region. We ran our detection approach and report the performance comparison on two different datasets in Table 2. We chose to report the Dice's coefficient as it is the metric that best indicates how close the results are to the ground truth.

In terms of computational time, unlike our proposed method, all the compared approaches require at least one iterative operation to achieve the detection. As such, they incur higher computational times, while we achieved the lowest computational complexity with a minimum of 3.2e−3 seconds per frame.

This performance improvement, both in accuracy and speed, is due to the fact that our method is designed as a simple yet effective way of isolating specular regions, combined with its ability to automatically adapt to the specifications of each frame being processed, taking into account the changes of illumination in the scene.

This is further supported by Figure 4, where we show output samples of the ground truth and all compared approaches on the same two datasets. Visual inspection shows that the outputs generated by our approach agree with the theory, in which a spatially adaptive detection is more robust than fixed-parameter detection approaches. The zoomed-in views show that our approach yielded results that are more consistent with the ground truth, with specular reflections precisely detected. The other methods failed to obtain reliable results across all frames as they either over- or under-segment the specular region.

Figure 5: Specular-free results of our approach compared against NW14 Newson et al. (2014).

E2. Restoration Approach. We compared our approach against Newson's algorithm Newson et al. (2014), which is, to the best of our knowledge, one of the most robust approaches. We begin with a visual evaluation of our method against NW14 Newson et al. (2014) using three of the datasets. To this purpose, in Figure 5, we display representative frames covering different cases, including different sizes of damaged areas, different lighting (e.g., white and incandescent), and rigid and non-rigid objects. For example, the Torso dataset shows large areas that need to be recovered, while the heart dataset represents deformable objects. Another case can be seen with the dragon dataset, where the scene is lit by a different kind of light, incandescent light.

Although both approaches give visually good results, in many cases the method of NW14 Newson et al. (2014) tends to over-restore areas on the boundaries of the target regions, producing noticeable artifacts in which pixels inside and outside the boundaries of the damaged region are mistakenly assigned black/grey values. By contrast, our approach allows for a visually more pleasing video restoration with better preservation of details and texture. Overall, the zoomed-in views in the figure show that our approach consistently results in a smoother inpainting of the damaged area that blends nicely with the surrounding region of the frame.

Figure 6: Our approach can be used for other applications such as object removal. We display two output samples using the entertainment datasets. From left to right: a set of past frames is used to capture the target object (see the white highlighted objects) to be removed and to gain temporal consistency. The last column displays the result of our algorithmic approach, in which we successfully removed both Dori and Rabbit.

In terms of computational time, Table 3 evaluates the performance of our approach. We first point out the repercussion of promoting low rank on the computational time of our solution: promoting low rank decreased the computational time to, on average, six times less than the time required when using full-rank data. Moreover, the last column of the table shows that using low rank allows our solution to reach a better minimum with fewer iterations. On average, full rank demanded 23 iterations per frame while low rank only needed 18.

We ran our approach and that of NW14 Newson et al. (2014) on the same computer and under similar conditions; our approach required an average of 0.69 s/frame while NW14 Newson et al. (2014) needed an average of 33.155 s/frame to perform the restoration.

Further Applications: Finding Dori. To show the generalization capabilities of our approach, we demonstrate how it can be used for applications such as object removal. Visual results are displayed in Figure 6, where on closer inspection one can see that the target objects, Dori and Rabbit, are successfully removed, offering pleasing visual results.

5 Conclusion

In this work, we addressed the challenging problem of specular-free video recovery. We proposed a new framework in which two contributions are introduced. The first is a spatially adaptive detection approach that searches for specular regions, allowing for a better bounding of the damaged areas. The second is a variational solution for efficiently restoring the damaged areas that exploits spatio-temporal correlations by representing prior data in a low-rank form. We showed that this combination improves upon the state of the art in terms of reducing over-restoration, yielding visually more pleasing results with fewer artifacts. Finally, we showed that our work can be applied to other tasks such as object removal.

References

  • Y. Akashi and T. Okatani (2014) Separation of reflection components by sparse non-negative matrix factorization. In Asian Conference on Computer Vision, pp. 611–625. Cited by: §2, §4.2, §4.3, Table 2.
  • A. I. Aviles, S. M. Alsaleh, J. K. Hahn, and A. Casals (2017) Towards retrieving force feedback in robotic-assisted surgery: a supervised neuro-recurrent-vision approach. IEEE Transactions on Haptics 10 (3), pp. 431–443. Cited by: §1.
  • A. I. Aviles, T. Widlak, A. Casals, M. M. Nillesen, and H. Ammari (2017) Robust cardiac motion estimation using ultrafast ultrasound data: a low-rank topology-preserving approach. Physics in Medicine and Biology 62 (12), pp. 4831–4851. Cited by: §3.2.
  • E. Candes and B. Recht (2012) Exact matrix completion via convex optimization. Communications of the ACM 55 (6), pp. 111–119. Cited by: §3.2.
  • T. Chen, W. Yin, X. S. Zhou, D. Comaniciu, and T. S. Huang (2006) Total variation models for variable lighting face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (9), pp. 1519–1524. Cited by: §1.
  • G. E. Christensen, R. D. Rabbitt, and M. I. Miller (1996) Deformable templates using large deformation kinematics. IEEE Transactions on Image Processing 5 (10), pp. 1435–1447. Cited by: §3.2.
  • A. Criminisi, P. Pérez, and K. Toyama (2004) Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing 13 (9), pp. 1200–1212. Cited by: §2.
  • M. Daisy, P. Buyssens, D. Tschumperlé, and O. Lézoray (2015) Exemplar-based video completion with geometry-guided space-time patch blending. In SIGGRAPH Asia Technical Briefs, pp. 3. Cited by: §2.
  • Y. Deng, B. S. Manjunath, and H. Shin (1999) Color image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, pp. 446–451. Cited by: §1.
  • M. Ebdelli, O. Le Meur, and C. Guillemot (2015) Video inpainting with short-term windows: application to object removal and error concealment. IEEE Transactions on Image Processing 24 (10), pp. 3034–3047. Cited by: §2.
  • R. W. Fleming, A. Torralba, and E. H. Adelson (2004) Specular reflections and the perception of shape. Journal of Vision 4 (9), pp. 10–10. Cited by: §1.
  • Y. Furukawa and J. Ponce (2010) Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (8), pp. 1362–1376. Cited by: §1.
  • M. Granados, K. I. Kim, J. Tompkin, J. Kautz, and C. Theobalt (2012a) Background inpainting for videos with dynamic objects and a free-moving camera. In European Conference on Computer Vision, pp. 682–695. Cited by: §2.
  • M. Granados, J. Tompkin, K. Kim, O. Grau, J. Kautz, and C. Theobalt (2012b) How not to be seen—object removal from videos of crowded scenes. In Computer Graphics Forum, Vol. 31, pp. 219–228. Cited by: §4.2.
  • C. Guillemot and O. Le Meur (2014) Image inpainting: overview and recent advances. IEEE Signal Processing Magazine 31 (1), pp. 127–144. Cited by: §2.
  • B. Haeffele, E. Young, and R. Vidal (2014) Structured low-rank matrix factorization: optimality, algorithm, and applications to image processing. In Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 2007–2015. Cited by: §3.2.
  • Y. S. Heo, K. M. Lee, and S. U. Lee (2011) Robust stereo matching using adaptive normalized cross-correlation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (4), pp. 807–822. Cited by: §1.
  • J. Huang, S. B. Kang, N. Ahuja, and J. Kopf (2016) Temporally coherent completion of dynamic video. ACM Transactions on Graphics 35 (6), pp. 196. Cited by: §2.
  • G. F. Jenks (1967) The data model concept in statistical mapping. International yearbook of cartography, pp. 186–190. Cited by: §3.1.
  • H. Kim, H. Jin, S. Hadap, and I. Kweon (2013) Specular reflection separation using dark channel prior. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1460–1467. Cited by: §2.
  • H. Kong, Z. Lai, X. Wang, and F. Liu (2016) Breast cancer discriminant feature analysis for diagnosis via jointly sparse learning. Neurocomputing 177, pp. 198–205. Cited by: §1.
  • T. Le, A. Almansa, Y. Gousseau, and S. Masnou (2017) Motion-consistent video inpainting. In IEEE International Conference on Image Processing (ICIP). Cited by: §2.
  • C. Li, S. Lin, K. Zhou, and K. Ikeuchi (2017) Specular highlight removal in facial images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3107–3116. Cited by: §1.
  • S. Lin, Y. Li, S. B. Kang, X. Tong, and H. Shum (2002) Diffuse-specular separation and depth recovery from image sequences. In European conference on computer vision, pp. 210–224. Cited by: §2.
  • S. P. Mallick, T. Zickler, P. N. Belhumeur, and D. J. Kriegman (2006) Specularity removal in images and videos: a pde approach. In European Conference on Computer Vision, pp. 550–563. Cited by: §2.
  • S. P. Mallick, T. E. Zickler, D. J. Kriegman, and P. N. Belhumeur (2005) Beyond lambert: reconstructing specular surfaces using color. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, pp. 619–626. Cited by: §2.
  • S. K. Nayar, X. Fang, and T. Boult (1997) Separation of reflection components using color and polarization. International Journal of Computer Vision 21 (3), pp. 163–186. Cited by: §2.
  • A. Netz and M. Osadchy (2013) Recognition using specular highlights. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (3), pp. 639–652. Cited by: §4.1.
  • A. Newson, A. Almansa, M. Fradet, Y. Gousseau, and P. Pérez (2014) Video inpainting of complex scenes. SIAM Journal on Imaging Sciences 7 (4), pp. 1993–2019. Cited by: §2, Figure 5, §4.2, §4.3, §4.3, §4.3.
  • T. Nguyen, Q. Vo, S. Kim, H. Yang, and G. Lee (2014) A novel and effective method for specular detection and removal by tensor voting. In IEEE International Conference on Image Processing (ICIP), pp. 1061–1065. Cited by: §2.
  • F. Ortiz and F. Torres (2006) Automatic detection and elimination of specular reflectance in color images by means of MS diagram and vector connected filters. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 36 (5), pp. 681–687. Cited by: §2.
  • N. Otsu (1975) A threshold selection method from gray-level histograms. Automatica 11 (285-296), pp. 23–27. Cited by: §3.1.
  • K. A. Patwardhan, G. Sapiro, and M. Bertalmio (2005) Video inpainting of occluding and occluded objects. In IEEE International Conference on Image Processing, Vol. 2, pp. II–69. Cited by: §4.2.
  • K. A. Patwardhan, G. Sapiro, and M. Bertalmío (2007) Video inpainting under constrained camera motion. IEEE Transactions on Image Processing 16 (2), pp. 545–553. Cited by: §2.
  • D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes (1999) Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18 (8), pp. 712–721. Cited by: §3.2.
  • T. W. Sederberg and S. R. Parry (1986) Free-form deformation of solid geometric models. SIGGRAPH Computer Graphics 20 (4), pp. 151–160. Cited by: §3.2.
  • S. A. Shafer (1985) Using color to separate reflection components. Color Research & Application 10 (4), pp. 210–218. Cited by: §1.
  • H. Shen and Q. Cai (2009) Simple and efficient method for specularity removal in an image. Applied optics 48 (14), pp. 2711–2719. Cited by: §4.2, §4.3, Table 2.
  • H. Shen, H. Zhang, S. Shao, and J. H. Xin (2008) Chromaticity-based separation of reflection components in a single image. Pattern Recognition 41 (8), pp. 2461–2469. Cited by: §2.
  • C. V. Stewart (1999) Robust parameter estimation in computer vision. SIAM review 41 (3), pp. 513–537. Cited by: §3.2.
  • D. Stoyanov, M. V. Scarzanella, P. Pratt, and G. Yang (2010) Real-time stereo reconstruction in robotically assisted minimally invasive surgery. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 275–282. Cited by: §4.1.
  • G. Strang (2005) Linear Algebra and Its Applications. Thomson Brooks/Cole, Belmont, CA, USA. Cited by: §3.2.
  • M. Strobel, J. Diebold, and D. Cremers (2014) Flow and color inpainting for video completion. In German Conference on Pattern Recognition, pp. 293–304. Cited by: §2.
  • R. T. Tan and K. Ikeuchi (2005) Separating reflection components of textured surfaces using a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2), pp. 178–193. Cited by: §2, §4.2, §4.3, Table 2.
  • R. T. Tan, K. Nishino, and K. Ikeuchi (2004) Separating reflection components based on chromaticity and noise analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (10), pp. 1373–1379. Cited by: §1, §1.
  • S. Umeyama and G. Godin (2004) Separation of diffuse and specular components of surface reflection by use of polarization and statistical analysis of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (5), pp. 639–647. Cited by: §1.
  • Y. Wexler, E. Shechtman, and M. Irani (2004) Space-time video completion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, pp. I–I. Cited by: §4.2.
  • Y. Wexler, E. Shechtman, and M. Irani (2007) Space-time completion of video. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (3). Cited by: §2.
  • T. Yamamoto, T. Kitajima, and R. Kawauchi (2017) Efficient improvement method for separation of reflection components based on an energy function. In 2017 IEEE International Conference on Image Processing (ICIP), pp. 4222–4226. Cited by: §2, §4.2, §4.3, Table 2.
  • Q. Yang, J. Tang, and N. Ahuja (2015) Efficient and robust specular highlight removal. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (6), pp. 1304–1311. Cited by: §1, §1, §2.
  • Q. Yang, S. Wang, and N. Ahuja (2010) Real-time specular highlight removal using bilateral filtering. In European conference on computer vision, pp. 87–100. Cited by: §1.
  • J. Zhang, L. McMillan, and J. Yu (2006) Robust tracking and stereo matching under variable illumination. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, pp. 871–878. Cited by: §1.