I Introduction
Image scaling plays an important role in image processing, especially when coarse-to-fine schemes are used. A major drawback of coarse-to-fine algorithms is error propagation. For this reason, various interpolation methods have been developed, from the simple and popular bilinear and bicubic interpolation algorithms to the wide class of polynomial and sinc interpolations [28].
As a rule, scaling deteriorates the quality of the restored image due to various interpolation artefacts related to the loss of high-frequency information. The difficulty of the sparse data interpolation problem increases severely when the data sparsity is irregular or contains considerable gaps to be filled. There are two main approaches to this specific variation of the sparse data interpolation task. One approach is to combine a Delaunay triangulation [12] with barycentric interpolation [5]. Another is the so-called Nadaraya-Watson estimation [23], where the desired value at any uncertain point is expressed as a sum of known samples weighted by their proximity, using a Gaussian kernel of the distance between the point being interpolated and its neighboring points with known values. All these methods either fail to recover high-frequency information or lead to edge-smoothing artefacts in the image space.
Fortunately, if the sparse data comes with additional information correlated with the restored function, the sparse data interpolation problem can be solved more accurately with the class of edge-preserving filters. The large class of edge-preserving smoothing filters [3, 21, 22] has received considerable attention in image processing, computer graphics, and computer vision. These filters have been applied to a wide variety of applications such as edit propagation [2, 8], denoising [4, 7], stereo matching [14], optical flow [25], video abstraction, and demosaicing [18, 24]. In general, the classic bilateral kernel is a function of the Euclidean distance in the joint multidimensional space of color and spatial coordinates. The computational complexity of a brute-force implementation of this kind of kernel is highly demanding. Several fast algorithms with high approximation quality were proposed in recent years [1, 11, 16, 17]. However, the computational complexity of these fast realizations usually depends on the filter parameters, which makes this kind of filter less flexible and still demanding for several sets of parameters [1, 11].
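To make the Nadaraya-Watson scheme above concrete, here is a minimal sketch for scattered 2D data. The function name, the toy samples, and the bandwidth `sigma` are illustrative assumptions, not details taken from [23]:

```python
import numpy as np

def nadaraya_watson(known_xy, known_vals, query_xy, sigma=5.0):
    """Nadaraya-Watson estimate: each query value is a Gaussian-weighted
    average of the known samples, weighted by spatial proximity."""
    # Pairwise squared Euclidean distances, shape (n_query, n_known).
    d2 = ((query_xy[:, None, :] - known_xy[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian proximity weights
    return (w @ known_vals) / w.sum(axis=1)   # normalized weighted sum

# Toy data: three known samples; the query coincides with the first
# sample, so the estimate is dominated by its value.
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
vals = np.array([1.0, 2.0, 3.0])
est = nadaraya_watson(pts, vals, np.array([[0.0, 0.0]]), sigma=1.0)
```

Note that the weights depend only on spatial distance, which is exactly why this estimator smooths across image edges.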
The topology of the standard bilateral filter with a Gaussian kernel does not fit the stereo matching and optical flow estimation problems well. The problem arises because the disparity or motion vector values of two nearby pixels can differ considerably while their color values in the cover image are similar. In this case the bilateral filter blurs both output values. This ability to act across color edges (e.g., the bilateral distance between two white pixels separated by a thin black line is considerably smaller than the geodesic distance) is useful for color-based segmentation, but can produce estimation artefacts for stereo and motion estimation. Therefore, edge-preserving filters based on the geodesic distance measure of a cover stereo or motion image are more relevant to the above-mentioned problems, and also faster than the classic bilateral filters. We have to note that only a small part of image edges corresponds to motion boundaries, and this issue can cause inaccuracy in several non-confident regions, in particular for pixels that are isolated from pixels with known values. Nevertheless, edge-preserving interpolation is still better than interpolation that does not use the color information of the cover image.
This paper is partly inspired by the work of Revaud et al. [19], where a sparse data interpolation method called EpicFlow was proposed. The main idea is to use the geodesic distance to estimate the influence of known pixel values in the neighborhood of a recovered pixel value. The interpolation method was proposed for and applied to optical flow estimation. The EpicFlow sparse data interpolation approach is used in several state-of-the-art optical flow estimation algorithms (e.g., the DCFlow method [26]).
However, this interpolation technique provides a heuristic problem solution that consists of three heuristic algorithmic steps: main edge extraction with the "structured edge detector" (SED) [9]; Voronoi cell segmentation; and geodesic distance field approximation using Dijkstra's algorithm [20]. Each step includes its own set of parameters that are only weakly connected to each other. Although the full interpolation pipeline is faster than a direct geodesic distance implementation [13], it is still computationally demanding. A theoretical basis for edge-preserving filtering with the geodesic distance was proposed in [10] and further extended in [27]. The recent approach in [15] improves the filter approximation in the sense of geodesic-distance-based filtering accuracy and suppression of collateral artefacts.
Consequently, we propose a simple and fast geodesic-based interpolation using a bilateral filter with a geodesic distance kernel, built on the method of [15], where a fast and accurate approximation to the ideal filter with a geodesic kernel is proposed. We have to note that the filter in [15] was initially proposed for the denoising problem, and we adapt it to the sparse data interpolation problem. Finally, the proposed approach is faster, more general, and has a clearer theoretical motivation than the baseline algorithm [19]. We applied our interpolation method to the sparse optical flow data obtained by the DCFlow method [26] and compared it with the result of the interpolation in [19] on the same sparse data set. Formally, we included our interpolation method in the pipeline of the DCFlow method [26] and compared it with the result of the same DCFlow pipeline with the EpicFlow interpolation (EFI) instead of ours. The comparison shows that our algorithm makes the fast version of the DCFlow pipeline [26] more accurate than the EFI technique while being considerably faster.
II Problem definition
The interpolation problem can be defined via a more general confidence-mapping solver, because this strict definition makes the proposed algorithm clearly motivated. In this case one aims to minimize the global mean squared error between the known input data and the desired output solution as follows:
E(x) = \sum_{p \in \mathcal{V}} \sum_{q \in \mathcal{V}} w(p,q)\, c(q) \left( x(p) - y(q) \right)^2,   (1)
where \mathcal{V} corresponds to the pixels or vertices and \mathcal{E} to the edges of the image graph \mathcal{G} = (\mathcal{V}, \mathcal{E}). The variable y is defined on all the known values of the input sparse data with nonzero confidence, and x is the desired output function that has to be recovered. Usually the confidence weights c(q) belong to the interval [0, 1]. In the case of sparse data interpolation the weights are exactly equal to 1 at known values and 0 elsewhere, or formally:

c(q) = \begin{cases} 1, & y(q)\ \text{known}, \\ 0, & \text{otherwise}. \end{cases}
The weights w(p,q) in Eq. (1) define the influence of a known input value y(q) on the desired output value x(p). For conventional interpolation, where the only known information is the values y(q), this influence usually depends only on the Euclidean or barycentric distance between the pixels p and q in the image plane. However, in the case of optical flow or stereo matching we can use the cover image I, whose values usually strongly correlate with the optical flow or disparity map values. It is then reasonable to define a bilateral affinity space where the distance also depends on the cover image values I(p) and I(q).
Consequently, we define the sparse data interpolation problem via the general functional minimization in Eq. (1). We extend the sparse data set to the full image graph domain by combining it with the confidence factors in the form \tilde{y}(q) = c(q)\, y(q).
It then follows that the solution of Eq. (1) can be obtained in closed form as a fraction of two standard non-normalized bilateral filters:
x(p) = \frac{\sum_{q \in \mathcal{V}} w(p,q)\, c(q)\, y(q)}{\sum_{q \in \mathcal{V}} w(p,q)\, c(q)}.   (2)
Finally, the formula in Eq. (2), represented as a ratio of bilateral filters, is the desired closed-form solution of the general sparse data interpolation problem in Eq. (1).
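As a brute-force sketch of the closed-form solution in Eq. (2), the snippet below interpolates a 1D signal, with a plain Gaussian bilateral kernel standing in for the affinity w(p,q); the function name and all parameter values are illustrative assumptions:

```python
import numpy as np

def interpolate_sparse(y, c, feat, sigma_s=3.0, sigma_r=0.1):
    """Closed-form sparse interpolation x = (W @ (c*y)) / (W @ c),
    i.e. a ratio of two non-normalized bilateral filter sums.
    y: known values (zeros where unknown); c: 0/1 confidence;
    feat: per-pixel guide features (e.g. color); all flattened 1D."""
    n = y.shape[0]
    pos = np.arange(n, dtype=float)
    # Bilateral affinity: spatial proximity times guide-value similarity.
    w_s = np.exp(-((pos[:, None] - pos[None, :]) ** 2) / (2 * sigma_s ** 2))
    w_r = np.exp(-((feat[:, None] - feat[None, :]) ** 2) / (2 * sigma_r ** 2))
    W = w_s * w_r
    num = W @ (c * y)                  # numerator bilateral sum
    den = W @ c                        # denominator (confidence-only) sum
    return num / np.maximum(den, 1e-12)

# Guide signal with a step edge; only the two endpoint samples are known.
guide = np.array([0., 0., 0., 1., 1., 1.])
y     = np.array([2., 0., 0., 0., 0., 5.])
c     = np.array([1., 0., 0., 0., 0., 1.])
x = interpolate_sparse(y, c, guide)
```

Because the range term makes the affinity across the step edge negligible, each known sample propagates only within its own side of the edge instead of mixing across it.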
The application of the classical bilateral filters has several drawbacks, so we propose an interpolation method that is based on the geodesic distance affinity space.
The geodesic distance is a generalization of the straight line distance in the Euclidean space to the distance measure in a curved space. In the case of images, the geodesic distance is defined as the shortest path on the surface between two points. Here the surface is formed by the image value function defined on the 2D spatial domain.
For the geodesic distance based filter, the weights are usually chosen (e.g., [10]) as follows:
w(p,q) = \exp\left( - d(p,q) / \sigma \right),   (3)
which makes the filter recursive. Thus the weight w(p,q) defines a geodesic distance based affinity between any two image pixels, and the variable d(p,q) is the geodesic distance between the image pixels p and q, which for an image I can be defined on the discrete grid graph as
d(p,q) = \min_{P_{pq}} \sum_{i=0}^{n-1} \left( \varepsilon + \left| I(r_{i+1}) - I(r_i) \right| \right),   (4)
where P_{pq} = \{ r_0 = p, r_1, \dots, r_n = q \} is a path between the two graph vertices and \varepsilon is the spatial interval related to the discretization. Note that the parameters \sigma and \varepsilon in Eqs. (3) and (4) approximately correspond to the spatial and range parameters of the classic bilateral filter with the Gaussian kernel as follows
\sigma_s \approx \sigma / \varepsilon, \qquad \sigma_r \approx \sigma.   (5)
From Eq. (4) one can derive the following useful relation
w(p,q) = w(p,r)\, w(r,q) \quad \text{for any vertex } r \text{ on the shortest path between } p \text{ and } q,   (6)

which follows from the additivity of the path sum in Eq. (4) and is the key to the recursive computation of the filter sums.
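For reference, the geodesic distance of Eq. (4) can be computed exactly (though slowly) with Dijkstra's algorithm on the 4-connected grid graph; this is the kind of direct computation that fast approximations such as [15] replace. A sketch for a grayscale image, with the per-step edge cost eps + |I(a) - I(b)| following Eq. (4):

```python
import heapq
import numpy as np

def geodesic_distance_field(img, seed, eps=1.0):
    """Exact geodesic distances from pixel `seed` to every pixel of a
    grayscale image via Dijkstra on the 4-connected grid graph.
    Edge cost between neighbors a, b: eps + |img[a] - img[b]|."""
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if d > dist[i, j]:
            continue  # stale heap entry, already improved
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w:
                nd = d + eps + abs(img[i, j] - img[ni, nj])
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    heapq.heappush(heap, (nd, (ni, nj)))
    return dist

# A sharp vertical edge in the image makes pixels across it
# geodesically far, even though they are spatially close.
img = np.zeros((3, 4)); img[:, 2:] = 10.0
d = geodesic_distance_field(img, (1, 0))
```

Here the step from column 1 to column 2 costs 1 + 10 = 11, so crossing the edge dominates the distance, which is exactly the edge-preserving behavior the filter exploits.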
In [15] it is shown that both filter sums in Eq. (2) can be calculated recursively using specific calculation trees (called optimal trees in that paper). The full calculation tree, which corresponds to this fast algorithm [15], is composed of four quadrant domains (or branches of the tree). One of the four branches is illustrated inside our algorithm scheme in Fig. 1; it represents the sum of weights (the denominator of Eq. (2)) for the first quadrant, while the same recursion applied to the confidence-weighted values gives the numerator sum of Eq. (2) for this quadrant. The pipeline of our algorithm is shown in Fig. 1.
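In 1D the recursive idea is easy to see: along a line, the geodesic weight factorizes over intermediate pixels (Eq. (6)), so each one-directional filter sum becomes a first-order recursion. The sketch below is our own 1D illustration of this principle, not the exact four-quadrant tree scheme of [15]:

```python
import numpy as np

def geodesic_filter_1d(y, c, guide, sigma=1.0, eps=0.1):
    """Sparse interpolation along a 1D signal with the geodesic kernel
    w(p,q) = exp(-d(p,q)/sigma), computed recursively: the causal and
    anti-causal sums each reuse the neighbor's sum scaled by the
    single-step weight a_i = exp(-(eps + |g_i - g_{i-1}|)/sigma)."""
    n = len(y)
    a = np.exp(-(eps + np.abs(np.diff(guide))) / sigma)  # step weights
    num = (c * y).astype(float)
    den = c.astype(float)
    num_l, den_l = num.copy(), den.copy()   # left-to-right sums
    for i in range(1, n):
        num_l[i] += a[i - 1] * num_l[i - 1]
        den_l[i] += a[i - 1] * den_l[i - 1]
    num_r, den_r = num.copy(), den.copy()   # right-to-left sums
    for i in range(n - 2, -1, -1):
        num_r[i] += a[i] * num_r[i + 1]
        den_r[i] += a[i] * den_r[i + 1]
    # Combine both directions; the center term was counted twice.
    return (num_l + num_r - num) / (den_l + den_r - den)

# Single known sample with a constant guide: the recursion propagates
# its value everywhere, so every output equals 3.0.
guide = np.zeros(5)
x = geodesic_filter_1d(np.array([3., 0., 0., 0., 0.]),
                       np.array([1., 0., 0., 0., 0.]), guide)
```

With a large guide discontinuity, the step weight across it becomes negligible and known values stop propagating over the edge, the 1D analogue of the edge-preserving behavior described above.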
III Experiments
Table I. Endpoint error (EPE) per MPI Sintel training sequence for the DCFlow (DCF) pipeline with the proposed interpolation (Ours) and with the EpicFlow interpolation (EFI), for non-occluded (nocc) and all pixels, and the per-frame interpolation time in seconds.

Sequence    DCF+Ours  DCF+EFI  DCF+Ours  DCF+EFI  Time(s)  Time(s)
              (nocc)   (nocc)     (all)    (all)     Ours      EFI
average         6.91     7.25      9.86     9.95     0.32     2.71
alley1          1.35     1.38      1.60     1.64     0.32     3.53
alley2          1.47     1.53      1.68     1.75     0.32     3.49
ambush2         28.0     29.3      37.0     38.0     0.32     1.20
ambush4         28.3     29.1      32.9     32.0     0.32     1.49
ambush5         9.74     9.42      16.5     15.5     0.32     2.15
ambush6         15.4     18.7      19.5     21.6     0.32     1.56
ambush7         1.68     1.46      2.75     2.40     0.32     2.37
bamboo1         1.75     1.80      1.95     2.01     0.32     3.39
bamboo2         2.89     2.92      4.66     4.31     0.32     3.33
bandage1        1.46     1.59      2.09     2.29     0.32     3.22
bandage2        1.47     1.55      1.64     1.78     0.32     3.34
cave2           12.7     12.5      18.2     18.7     0.32     2.43
cave4           9.87     9.97      12.4     12.6     0.32     2.70
market2         2.28     2.25      3.14     3.21     0.32     2.27
market5         14.1     16.0      27.4     29.3     0.32     1.97
market6         5.68     4.82      11.3     8.08     0.32     2.51
mountain1       1.59     1.50      2.25     1.95     0.32     1.86
shaman2         0.82     0.95      0.87     0.99     0.32     3.47
shaman3         1.18     1.48      1.27     1.59     0.32     3.46
sleeping1       0.92     1.22      0.92     1.22     0.32     3.50
sleeping2       1.09     1.16      1.09     1.16     0.32     3.71
temple2         5.33     5.36      8.77     9.84     0.32     2.73
temple3         9.77     10.6      17.0     17.1     0.32     1.64
The experiments have been designed to demonstrate the potential of the proposed approach. They are divided into two parts. First, several experiments give a qualitative and quantitative comparison between the proposed edge-preserving interpolation and a traditional interpolation method; we also compare our algorithm with the bilateral-kernel interpolation technique. Second, we analyze the advantage of our approach over the EFI method [19] for the optical flow problem. In our experiments we used the MPI Sintel data set [6] and the fast version of the DCFlow pipeline [26].
For the first, second and third experiments, we use two example stereo images and their corresponding ground truth disparity maps: Motorcycle and Art, with disparity ranges 1-140 and 1-128 respectively. Featureless image regions are a problem for stereo matching and optical flow computation, and feature-based sampling could lead to more accurate results; in contrast, regular downsampling is simpler and more popular. For our first three model experiments we therefore use three different downsampling methods: a pure feature-based sampling; a trade-off downsampling algorithm that includes both feature-based and regular downsampling techniques; and a pure regular-grid downsampling.
The first sparse data set is obtained by downsampling the disparity values of a disparity map mostly near the visual edges of the image. In other words, we label a pixel as known if the norm of the image gradient exceeds a threshold, and as unknown elsewhere. The threshold is chosen in such a way that the density of known pixels equals a chosen constant. The second data set for our model experiment is obtained by downsampling the pixel values of a disparity map almost uniformly: we partition the full original image into equal squared patches and label a pixel as known if the gradient value of the corresponding image pixel reaches the maximum inside the current patch, and as unknown elsewhere. The sparse data for both experiments, obtained using Motorcycle and Art and the relevant ground truth disparity maps, is illustrated in Figs. 2 and 3(a) and (e), with density equal to 4% for (a) and 1% for (e). The sparse data is then interpolated by the proposed edge-preserving algorithm, using the corresponding stereo images for the affinity space calculation. We also interpolate the same sparse data with the Nadaraya-Watson estimation [23] and the classic bilateral filter. The quantitative evaluation is given under each interpolated disparity map, where the interpolation error relative to the ground truth is measured by the root-mean-squared error (RMSE) metric. The density of the known pixels in the first data set is distributed quite non-uniformly due to the image edge distribution. Thus the standard methods (for example, the Nadaraya-Watson estimation) lose almost all edge information after interpolation: Fig. 2(b) and (f). In contrast, our method and the interpolation with the bilateral kernel keep the meaningful edges, as illustrated in Fig. 2(d), (i), (c), (g) respectively. However, our method is more accurate than the bilateral kernel interpolation, and it does not suffer from artefacts due to color communication across image edges, as the bilateral interpolation does.
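The edge-based sampling rule of the first data set can be sketched as follows; choosing the threshold via a quantile so the mask hits the target density is our own shorthand for the threshold selection described above:

```python
import numpy as np

def feature_based_mask(img, density=0.04):
    """Mark as 'known' the pixels with the largest image-gradient norm,
    choosing the threshold so that a `density` fraction of all pixels
    is selected."""
    gy, gx = np.gradient(img.astype(float))
    grad = np.hypot(gx, gy)                    # gradient norm per pixel
    thresh = np.quantile(grad, 1.0 - density)  # keep the top `density` share
    return grad >= thresh

# Toy ramp image: the steepest columns are sampled first.
img = np.tile(np.arange(10.0) ** 2, (10, 1))
mask = feature_based_mask(img, density=0.1)
```

The complementary near-uniform scheme of the second data set would instead take the per-patch argmax of the same gradient map, one known pixel per patch.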
For the second experiment, the qualitative evaluation shows that the Nadaraya-Watson estimation [23] demonstrates better visual performance and does not lose as much essential information as in the first experiment; nevertheless, the result shown in Fig. 3(b) and (f) is still considerably worse than the edge-preserving approach. Our method and the interpolation with the bilateral kernel keep the base edges, as illustrated in Fig. 3(d), (i), (c), (g) respectively.
The third experiment uses known pixels of the Motorcycle disparity map spread regularly (with a fixed sampling step) and evaluates the accuracy of the interpolation with respect to the density of known pixels. The results are illustrated in Fig. 4, where the domain variable is equal to the sampling step. One can see that the interpolation error is almost linear with respect to the inverse root of the density for all three considered interpolation methods: Nadaraya-Watson [23], the bilateral kernel (BK), and the proposed one.
[Fig. 2. First (feature-based) sampling experiment: (a), (e) sparse data; (b), (f) Nadaraya-Watson [23], RMSE 6.25 / 8.39; (c), (g) bilateral kernel, RMSE 3.77 / 4.47; (d), (i) ours, RMSE 3.31 / 3.78.]
[Fig. 3. Second (near-uniform) sampling experiment: (a), (e) sparse data; (b), (f) Nadaraya-Watson [23], RMSE 5.59 / 7.99; (c), (g) bilateral kernel, RMSE 3.58 / 7.41; (d), (i) ours, RMSE 3.30 / 6.45.]

[Fig. 5. (a) sparse data; (b) affinity image (current frame); (c) EFI [19]; (d) proposed algorithm; (e) ground truth.]
The next experiment demonstrates the advantage of the proposed geodesic distance based interpolation applied to optical flow estimation in comparison with the popular interpolation method [19]. The sparse data set for this experiment is the result of optical flow estimation obtained by another state-of-the-art algorithm [26] on the MPI Sintel training data set [6]. Formally, we include our interpolation method in the pipeline of the DCFlow method [26] and compare it with the result of the same DCFlow pipeline with the EFI instead of ours. The MPI training data set includes 23 different video sequences; in turn, each sequence consists of up to 50 frames with known ground truth optical flow. Fig. 5 illustrates the result of interpolating the sparse data of Fig. 5(a) with two methods: Fig. 5(c) shows the EFI [19] result, Fig. 5(d) the proposed algorithm, and Fig. 5(e) the ground truth. Fig. 5(b) illustrates the affinity image, i.e., the current frame that corresponds to the estimated motion vector map. We can see that the EFI technique produces a false segmentation in the image scene background, thus decreasing the accuracy of the interpolation step. Note that our method provides smoother results.
A quantitative comparison of our method and the EFI is given in Table I for every sequence of the MPI training data set. Our method provides quantitatively better results than the EFI in interpolation accuracy for the endpoint error metric, averaged over all sequences: 9.86 vs. 9.95 (all pixels mask) and 6.91 vs. 7.25 (non-occluded pixels mask). For our algorithm, the parameters \sigma and \varepsilon were fixed to the same values for all sequences.
Our algorithm is also considerably faster than the EFI approach, as illustrated in Table I (bottom part). From the table one can see that the computational time per frame of the EFI depends on the considered sequence. The reason is that the computational time of the EFI directly depends on the sparsity (the density of known pixels per image). We found that this time is equal to 3.2 sec for one frame of size 1024x436 at a 1/9 sparsity factor, 0.5 sec at a 1/100 sparsity factor, and 0.3 sec at a 1/1000 sparsity factor. In contrast, our method has O(1) complexity per pixel, and its computational time per frame depends only on the size of the processed image.
IV Conclusions
In this work we developed a fast and flexible sparse data interpolation algorithm using the geodesic distance affinity filter [15] and applied the derived pipeline to the optical flow estimation problem. Moreover, our approach is more general and faster, and has a clearer theoretical motivation, in comparison with the EFI approach.
Acknowledgements
This work has been supported by the Spanish projects TIN2015-65464-R and TIN2016-79717-R (MINECO/FEDER), and by the COST Action IC1307 iV&L Net (European Network on Integrating Vision and Language), supported by COST (European Cooperation in Science and Technology).
References
 [1] A. Adams, J. Baek, and M. Davis. Fast high-dimensional filtering using the permutohedral lattice. Computer Graphics Forum, 29(2):753–762, 2010.
 [2] X. An and F. Pellacini. AppProp: All-pairs appearance-space edit propagation. ACM Transactions on Graphics, 27(3):40, 2008.
 [3] V. Aurich and J. Weule. Non-linear Gaussian filters performing edge preserving diffusion. In Mustererkennung 1995, pages 538–545. Springer, 1995.
 [4] E. Bennett, J. Mason, and L. McMillan. Multispectral bilateral video fusion. IEEE Trans. Image Processing, 16(5):1185–1194, 2007.
 [5] J.-P. Berrut and L. N. Trefethen. Barycentric Lagrange interpolation. SIAM Review, 46(3):501–517, 2004.
 [6] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In A. Fitzgibbon et al. (Eds.), editor, European Conf. on Computer Vision (ECCV), Part IV, LNCS 7577, pages 611–625. SpringerVerlag, Oct. 2012.
 [7] X. Chen, S. Bing Kang, J. Yang, and J. Yu. Fast patch-based denoising using approximated patch geodesic paths. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1211–1218, 2013.
 [8] A. Criminisi, T. Sharp, C. Rother, and P. Pérez. Geodesic image and video editing. ACM Trans. Graph., 29(5):134, 2010.
 [9] P. Dollár and C. L. Zitnick. Structured forests for fast edge detection. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1841–1848. IEEE, 2013.
 [10] E. Gastal and M. Oliveira. Domain transform for edgeaware image and video processing. ACM Transactions on Graphics, 30(4):69, 2011.
 [11] E. Gastal and M. Oliveira. Adaptive manifolds for realtime highdimensional filtering. ACM Transactions on Graphics, 31(4):33, 2012.
 [12] D. T. Lee and B. J. Schachter. Two algorithms for constructing a Delaunay triangulation. International Journal of Computer & Information Sciences, 9(3):219–242, 1980.
 [13] M.-Y. Liu, O. Tuzel, and Y. Taguchi. Joint geodesic upsampling of depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 169–176, 2013.
 [14] X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang. On building an accurate stereo matching system on graphics hardware. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 467–474. IEEE, 2011.
 [15] M. Mozerov and J. van de Weijer. Improved recursive geodesic distance computation for edge preserving filter. IEEE Transactions on Image Processing, 26(8):3696–3706, 2017.
 [16] M. G. Mozerov and J. van de Weijer. Global color sparseness and a local statistics prior for fast bilateral filtering. Image Processing, IEEE Transactions on, 24(12):5842–5853, 2015.
 [17] S. Paris and F. Durand. A fast approximation of the bilateral filter using a signal processing approach. In Proc. European Conf. on Computer Vision, pages 568–580, 2006.
 [18] R. Ramanath and W. E. Snyder. Adaptive demosaicking. J. Electron. Imag., 12(4):633–642, 2003.
 [19] J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid. EpicFlow: Edge-preserving interpolation of correspondences for optical flow. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1164–1172, 2015.
 [20] S. Skiena. Dijkstra's algorithm. In Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica, pages 225–227. Addison-Wesley, Reading, MA, 1990.
 [21] S. M. Smith and J. M. Brady. SUSAN—A new approach to low level image processing. International Journal of Computer Vision, 23(1):45–78, 1997.
 [22] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Computer Vision, 1998. Sixth International Conference on, pages 839–846. IEEE, 1998.
 [23] O. Weber, Y. S. Devir, A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Parallel algorithms for approximation of distance maps on parametric surfaces. ACM Transactions on Graphics (TOG), 27(4):104, 2008.
 [24] H. Winnemöller, S. Olsen, and B. Gooch. Realtime video abstraction. ACM Transactions on Graphics, 25(3):1221–1226, 2006.
 [25] J. Xiao, H. Cheng, H. Sawhney, C. Rao, and M. Isnardi. Bilateral filteringbased optical flow estimation with occlusion detection. In Proc. European Conf. on Computer Vision, pages 211–224, 2006.
 [26] J. Xu, R. Ranftl, and V. Koltun. Accurate optical flow via direct cost volume processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. arXiv:1704.07325.
 [27] Q. Yang. Recursive bilateral filtering. Proc. European Conf. on Computer Vision, pages 399–413, 2012.
 [28] L. Yaroslavsky. Digital holography and digital image processing: principles, methods, algorithms. Springer Science & Business Media, 2013.