1 Introduction
Optical flow is the apparent motion field between two consecutive frames of a video. More generally, it can be defined as a dense correspondence field between an arbitrary pair of images. There are two large families of methods for computing image correspondences: local and global methods. Local methods establish a point correspondence by minimizing a distance measure between the matching neighborhoods gabriele_buades ; kukjin . They provide a sparse correspondence field since not all the image points are discriminative enough to guarantee a single correspondence. On the other hand, global or variational methods Horn_Schunk:1981 ; alvarez2000reliable ; Bro04a ; Zach ; Zimmeretal09 ; strekalovskiyetalsiims14
provide a dense solution by minimizing a global energy. Recent work on optical flow estimation
bailer2015flowICCV ; chen2013large ; fortun2014aggregation ; kennedy2015optical ; menze2015discrete ; yang2015dense ; EpicFlow ; leordeanu2013locally is mostly focused on solving the major challenges that appear in realistic scenarios and outdoor scenes, such as large displacements, motion discontinuities, illumination changes, and occlusions.In this article we introduce a new strategy to compute good local minima of any optical flow energy functional which allows to capture large displacements. The method relies on a discrete set of matches between the two input images, which can be extremely sparse and contaminated by outliers, and is used as a guide to find a local minimum of the chosen energy functional. The novelties include how these matches are used in the optimization problem. The proposed method is a twostep minimization process of the optical flow energy. The first step computes a good and dense local minimum by propagating the initial matches with a region growing strategy. The propagation is driven by the minimization of the energy on patches and the order of update is established by the value of the local version of the energy, i.e. the energy restricted to a square patch of the image domain. It can be thought as a grouped coordinate descent of the target functional, where the variables are defined by the optical flow values in a patch, and the functional is minimized on the patch while fixing the rest of flow values. Coordinate descent methods provide efficient and fast optimization schemes even for hugescale problems Nesterov2012 . The basic idea is to minimize a multivariable objective function by optimizing over a single coordinate while holding others fixed. The sweep pattern is the name used to define how the alternating minimization process moves along all the coordinates. In order to obtain a good minimum and accelerate convergence, we follow an adaptive choice of the sweep pattern driven by a seed growing algorithm based on the value of the energy after local minimization in a patch centered at that coordinate. This information is stored in a priority queue from which the coordinate is selected and the adaptive sweep pattern is iteratively created. The result of the first step is a dense optical flow which is further refined in the second step: A global minimization of the energy taking as initial condition the flow obtained in the first stage. Both minimization steps work with the full resolution of the image directly.
Our method allows to choose the energy functional. For instance, by choosing it with desirable properties such as robustness to illumination changes, occlusions and motion discontinuities, those properties are inherited by our method. We can compare our method to the common coarsetofine (also called multiscale) approach or to the strategy of including the information of sparse matches by incorporating an extra term in the energy that penalizes deviations from the flows given by the matches Brox_Malik_LD ; Stoll2013 ; Weinzaepfel2013 ; both of them also find local minima of arbitrary optical flow energy functionals. We find that we consistently outperform the multiscale strategy for a variety of input sparse matches and energy functionals. We present qualitative and quantitative results on several datasets such as Middlebury Middlebury , MPISintel Sintel , KITTI 2012 Kitti , and KITTI 2015 Menze2015CVPR datasets. The performance is better than LDOF Brox_Malik_LD and comparable to DeepFlow Weinzaepfel2013 while being more robust, less dependent on the density of seeds and based only on a single energy with two terms (no need of extra parameters). In our case, the ability to find large displacements requires only that at least one match is correctly given for each object in motion, as shown in the experiment of Figure 1. Moreover, we find that our minimization strategy is very robust to the presence of many incorrect keypoints in the input, as illustrated in Figure 3, where there are only 2 correct matches and 508 outliers. In contrast, the alternative strategy of including sparse matches in the energy functional with an additional term is not robust to a small number of matches (see Figure 10) nor to a large percentage of wrong matches or high levels of noise (see Table 6).
The remainder of the paper is organized as follows. In Section 2 we revise previous work on optical flow estimation. Section 3 presents our large displacement optical flow method (which we denote by FALDOI, as an acronym made of the first letter of the words ”large displacement optical flow method by an astute initialization”). Section 4 presents results from several energy functionals and an analysis of the properties and performance of our strategy depending on the density of the initial set of discrete matches. Finally, the conclusions are summarized in Section 5.
2 Related work
The seminal work of Horn and Schunk Horn_Schunk:1981 has inspired many subsequent works also based on an energybased formulation of the optical flow problem and focused on different kind of limitations and their improvements. Initial progress was devoted to the use of more robust data terms and TotalVariationbased regularizers in order to obtain sharper solutions Bro04a ; black1996robust ; sun2010secrets ; Zach . The correct preservation of motion discontinuities is a key issue that has motivated many proposals in the smoothness term: Starting from the use of decreasing functions of the image gradients chen2013large ; sanchez2014preserving ; xu2012motion
, diffusion tensors
nagel1986investigation ; werlberger2009anisotropic ; zimmer2011optic , coupled regularization of the flow channels strekalovskiyetalsiims14 ; Palomares2014 or secondorder regularization ranftl2014non , to the more recent nonlocal regularization terms krahenbuhl2012efficient ; sun2014quantitative ; werlberger2010motion . The classical hypothesis for defining the data term has been the brightness constancy assumption but this is very limiting in realistic scenarios where illumination changes as well as occlusions may appear. Robustness against additive illumination changes can be obtained by using the gradient constancy assumption papenberg2006highly ; while advanced data terms based on patch measures present more general invariances, in particular, the Normalized Cross Correlation Steinbrueckeretalvmv09 ; vogel2013evaluation ; werlberger2010motion is invariant to linear brightness changes and the Census transform is invariant under monotonically increasing rescalings hafner2013census ; muller2011illumination ; stein2004efficient . Patchbased data terms are used in the stateoftheart methods, including stereo and optical flow problems. As they are more informative and allow to better characterise the local image structure they result in more accurate flow estimations. On the other hand, occlusions represent an important problem for optical flow estimation methods and some works include a characterization of the occlusion areas in the energy, alvarez2002symmetrical ; ayvaci2012sparse ; BGLC ; fortun2014aggregation , while others estimate the occlusion areas based on a triangulation of the image kennedy2015optical .Energybased methods are called global methods since they find correspondences by minimizing an energy defined on the whole image. Traditionally, global methods include a linearization of the warped image in the data term to make the optimization problem more tractable. The linear approximation is only valid for small displacements, that is why these methods are usually embedded in a coarsetofine warping scheme in order to better capture large displacements. However, they still fail to correctly handle large motions of small objects (not present at coarser scales). On the other hand, local methods are much better adapted to capture large movements. The work of Álvarez et al. alvarez2000reliable
was probably the first to note that the standard coarsetofine approach may not be enough and proposed to modify it by using a linear scalespace focusing strategy from coarse to fine scales in order to avoid convergence to incorrect local minima. More recently, Steinbrücker et al.
steinbrucker2009large proposed an algorithm that does not require the coarsetofine warping strategy. It was one of the first attempts to capture large motions with variational methods, where the data term and the regularizer are decoupled by a quadratic relaxation technique and the optimization problem is directly solved at the finest scale by alternating two global minimizations while decreasing the decoupling parameter. One of the minimization problems is convex; the other one is nonconvex but it is a pointwise optimization problem, so its solution is found using an exhaustive search for every pixel, which can be performed in parallel. Another early work is the proposal by Brox and Malik Brox_Malik_LD , that incorporates sparse matches into the variational model by adding an extra term that penalizes deviations of the estimated optical flow from the flow given by the matches obtained by HOG descriptors dalal2005histograms at some sparse locations. The extended variational method is solved with a coarsetofine strategy as usual. DeepFlow Weinzaepfel2013 and SparseFlow timofte2015sparse follow the same strategy as proposed by Brox and Malik Brox_Malik_LD , the difference among them lying in the way they find the matches that define the matching term in the variational approach. The descriptor matching proposed by Weinzaepfel et al. Weinzaepfel2013 is inspired by nonrigid 2D warping and deep convolutional networks and permits a more dense and deformable matching compared to the popular HOG/SIFTlike descriptors dalal2005histograms ; SIFT . The matching algorithm proposed by Timofte et al. timofte2015sparse is also robust to nonrigid deformations and is based on the compressed sensing theory. The proposal by Xu et al. xu2012motion also uses a coarsetofine approach but the initial flow in each level of the pyramid is modified. At each level, different candidate flows are considered: the flow propagated from the previous level, sparse feature matching (computed by SIFT), and dense nearest neighbor patch matching. Then, for each pixel, the optimal flow is selected among the different candidates by solving a discrete optimization problem.Other works that do not use a coarsetofine strategy are based on correspondences obtained by a nearest neighbor algorithm; one is based on a purely local method bao2014fast while another chen2013large refines the initial dense (and noisy) correspondence field using a motion segmentation and the minimization of a global energy. Some recent methods are based on a sparsetodense approach; they start from a sparse set of correspondences that capture large displacements and these are densified by edgeaware interpolation techniques leordeanu2013locally ; EpicFlow . Later, the densified flow is refined by minimizing a global energy at the finest scale. Other works menze2015discrete ; yang2015dense also include a final refinement of the flow by minimizing a global continuous energy in the original image scale directly; either using a discrete inference problem in a conditional random field as a first step menze2015discrete
, or a piecewise adaptive parametric model embedded in an energy function that combines both continuous and discrete variables
yang2015dense . The recent proposal of Fortun et al. fortun2014aggregation solves a discrete optimization of a global energy in the original scale where the different flow candidates are obtained in a previous step by patchbased parametric motion estimation; this method also makes a special treatment of occlusions. On the other hand, an almost purely databased dense optical flow is indeed possible, by working with approximate nearest neighbor fields with a hierarchical search strategy and an advanced outlier filtering bailer2015flowICCV .Here, we also propose to get rid of the coarsetofine strategy and directly minimize the energy at the finest scale with the help of some sparse correspondences that capture the large displacement motions. The main difference with respect to the previous works leordeanu2013locally ; EpicFlow is the way of combining the sparse matches with the variational approach: the initial matches are densified by a region growing process driven by the minimization of a local version of the target energy functional. Thus, the same variational tool is used both for the densification step and for the final global optimization problem. In contrast, the sparsetodense interpolation of the sparse matches in leordeanu2013locally ; EpicFlow is based on a regularity measure that takes into account occlusion and boundary constraints. In this way, since in our case we are always taking into account a data term and a regularizer, together with a smart local minimization of the energy guided by the best matches, our sparsetodense intermediate step is more robust to outliers in the initial set of sparse correspondences and is able to correctly recover a dense smooth motion in a certain region with just a single correspondence in it and without the need of using edge information (see Figures 2 and 3). The outliers in this experiment were randomly introduced to test our claim against a huge percentage of them. Nevertheless, the robustness of our iterated algorithm to usual outliers is shown in Figure 4, the different Tables and remaining experiments and it relies on pruning strategies based on saliency and forwardbackward optical flow consistency.
3 Proposed minimization strategy
In this section we present a minimization strategy that can be applied to any optical flow energy functional and which is founded on estimating a good local minimum with the help of a discrete set of matches. It is able to benefit both from the sparse techniques, which handle arbitrarily large displacements, and from the continuous optimization of a variational formulation, which yields dense flow fields with subpixel accuracy. The basic idea of the method is to assume that at least some matches are correct, and propagate the correct information from those seeds driven by the minimimization of the energy around them. This is ilustrated in Figure 1, where each region of smooth movement has at least one correct seed.
The sparse set of initial correspondences (we will refer to them as seeds) is used as a reference or guide to recover a dense flow field. This is done by iteratively growing the seeds by a local (patchbased) minimization of the functional in a proper order (detailed in Sect. 3.1.1 and Sect. 3.1.2). This dense optical flow is then refined by a global minimization of the energy. The algorithm always works at the finest scale of the image. A pseudocode of the whole strategy is presented in Algorithm 1 while Algorithm 3 presents a slightly modified version which is explained in Sect. 3.1.2.
In order to detail the proposed approach, we introduce some notation and assumptions. Let us denote two consecutive image frames of a video sequence as . As usual, we assume that the image domain is a rectangle in . In order to compute the optical flow between and , we use a discrete set of matches , , and an energy , defined from and . Minimizing on an appropriate set, a minimum of represents an optical flow between and . We assume that the discrete matches in have been computed with a sparse matching algorithm in such a way that is thought as belonging to the image domain of and as belonging to the image domain of . From these discrete matches, we compute the initial set of seeds, denoted here by , by defining , . Each seed in the finite set of initial seeds stores the corresponding pixel , its related optical flow and the local energy of the functional around it (see Algorithm 1).
3.1 Computing a good local minimum
This step can be interpreted as an adaptive grouped coordinated descent approach driven by the lowest local values of the energy on patches centered at pixels. It is inspired from the match propagation principle MatchPropag , where a set of initial sparse matches, the seeds, are propagated to neighboring pixels using a similar technique to the region growing strategy. This principle was used in the work of Kannala_Brandt in order to obtain a quasidense disparity map, assuming that the seeds and their neighboring pixels may present similar disparity values. Moreover, it shares ideas with the coordinate descent methods, which are optimization techniques in multivariable functions. They minimize the objective function solving a sequence of one variable minimization problems. Each subproblem improves the estimate of the solution by minimizing along a selected coordinate while all other coordinates are fixed. Generally, each coordinate is visited several times to reach a minimum. The sweep pattern is the name used to define how the alternating minimization process moves along all the coordinates. If there is a fixed order to visit the coordinates, this is called pathwise coordinate descent Frie_Tibs or cyclic coordinate. In our case, the election of the sweep pattern has a fundamental role during the minimization process; we will follow an adaptive choice of the sweep pattern driven by a seed growing algorithm based on the value of the local minimization of the energy in a patch centered at that coordinate (or pixel in our case). The sweep process is managed through a priority queue where the potential candidates (optical flow) for each coordinate are inserted. Each candidate presents a related energy that is used to determine its position in the queue: Candidates with less associated local energy will be at the top of the priority queue. The initial seeds are inserted with zero energy.
In the following we will first present the baseline algorithm for the local minimization step, where every pixel is visited just once. Then we explain how it is iterated in order to revisit the pixels several times and gain more robustness against occlusions and resistant outliers of the sparse matcher.
3.1.1 Baseline algorithm: faldoi
Initially, the seeds are inserted to the priority queue (containing, as said before, the value of the optical flow and the energy on a local patch centered at the pixels) with zero energy. Along the minimization process, new candidates will be added to the queue. This collection of potential candidates for each coordinate will be sorted based on their local energies. Whenever an element is removed from the queue to fix a coordinate, we are selecting the candidate with the lowest associated local energy. The aforementioned process is repeated until the priority queue is empty and a dense optical flow is obtained from the initial seeds. There may be several candidates for the same pixel in the queue; when a candidate is extracted from the queue to fix it, if its corresponding pixel has already been fixed (by a candidate with lowest energy) nothing is done. Then, after a certain extractions the queue is emptied.
Each time that a coordinate is extracted from the queue to fix it, a local minimization of the energy is solved on a square patch centered around the pixel previously fixed. Then, the estimated optical flow values of its neighbors are inserted as potential candidates into the priority queue with the energy (after local minimization) of the patch centered at the fixed coordinate.
Whenever the energy is minimized in a local patch centered at the coordinate that has been fixed we need to provide an initial flow in the unknown values in (where is the set of locations where the optical flow has not been fixed yet). The initialization is just an interpolation of the already fixed optical flow values in the patch through the Laplace equation resulting from the minimization of the following Dirichlet energy
where , , and contains the optical flow of the already fixed coordinates in ( denotes the complementary set of ). The EulerLagrange equation derived from the previous energy is the Laplace equation, which is solved by gradient descent with Neumann boundary conditions. Other intrapatch interpolations are allowed (e.g., based on the bilateral filter or even a constant interpolation based on the flow values on ).
The details of the energy minimization process are given in Appendix A and Algorithm 4. In practice, we consider patches of pixels (although bigger patches can be used, it increases the computational cost which is proportional to the size of the patch) and we perform 10 iterations of the minimization process, in every local patch (line 7 of Algorithm 2 ), of a version of where the warped image has been linearized with a single warping.
3.1.2 Iterated faldoi
Given the fact that, generally, in coordinate descent methods each coordinate has to be visited several times in order to reach a minimum, we propose to revisit the coordinates and thus perform several iterations of the baseline algorithm presented in Sect. 3.1.1 with a forwardbackward pruning of the flow values between two consecutive iterations in order to gain more robustness against resistant outliers.
More precisely, between two consecutive iterations of the coordinate descent completing a sweep pattern, we perform a pruning based on a forwardbackward consistency check. The goal is to remove wrong matches, specially in occlusion areas and also due to selfsimilarity. The latter may appear due to outliers in the initial seeds that have expanded by the region growing scheme. Our algorithm computes both the forward, , and the backward, , optical flows between any two frames and . Then, a forwardbackward consistency check of these flows is performed and the forward flow values at points not verifying are removed, where is a small constant (it is set to in our experiments). In the same way, the backward flow at points not verifying are removed.
In the iterated faldoi, the sweeps following the first one are slightly different from the baseline sweep of Sect. 3.1.1 because they start not from a sparse set of optical flow values but from a dense motion field with holes that arise after the forwardbackward pruning. The two main differences with respect to the main sweep are:
1. Initialization of the queue. The seeds that survive to the forwardbackward consistency check are inserted into a new priority queue with zero local energy. The rest of optical flow values that also survive to the forwardbackward pruning are added with its associated local energy.
2. Initialization of the optical flow of a patch before local minimization. Every time a local patch is minimized an initial flow in the patch is needed. We make a distinction between the pixels that passed the forwardbackward consistency test and those that didn’t. In the pixels that passed the test the initial flow is the most updated flow value by the last local minimization in that pixel. For that, we use an auxiliary flow which is updated after every local minimization. On the other hand, in the pixels that didn’t pass the test we fillin the initial flow by iterating a bilateral filtering in the local patch. This intrapatch initialization could be also used in the first baseline sweep of Sect. 3.1.1. Nevetherless, in that very first interpolation, for efficiency reasons we opted for the rougher Laplace interpolation explained above.
A pseudocode of the iterated faldoi strategy is presented in Algorithm 3. Figure 4 shows the importance of the iterations of the sweep pattern in the local minimization step. The optical flow is improved from one iteration to the following, in particular, the motion boundaries are progressively better recovered and the effect of the piecewise constant flow is lost. Moreover, this iterated faldoi strategy adds robustness to outliers present in the initial seeds due to the forwardbackward consistency check between two consecutive iterations. The initial outlier regions are reduced, after six iterations to a single region, which is completely lost after the global iteration step. In this experiment the seeds have been computed using Deep Matching, which gives more outliers than SIFT (which was used for the experiment with the same pair of images in Figure 2). Let us remark that in this example, we did not include the initial pruning of seeds based on the low saliency of patches that is used, e.g., in EpicFlow .
3.2 Global Minimization
The last step is a global minimization of , taking as initial condition the dense solution provided by the previous step, the local minimization guided by the initial set of matches. Let us remark that we only minimize the global energy at the finest scale, that is, we avoid the multiscale approach. During both steps, local and global, we minimize the same energy functional following the same numerical scheme, which is detailed in Algorithm 4 and Appendix A. For the global minimization we perform 4 warpings of the energy.
4 Results
Our proposal assumes that two ingredients are given: a discrete set of correspondences between two frames of a video (the seeds) and an optical flow energy functional, namely, . Section 4.1 presents the different energies we have used in this paper and Section 4.2 briefly discusses several possibilities for the initial sparse correspondences. Later on, in Sect. 4.3 we provide a quantitative and qualitative comparison among the several possibilities for the energy terms, as well as a comparison between our method and several stateoftheart methods, including methods based on the combination of sparse matches and variational techniques Brox_Malik_LD ; leordeanu2013locally ; EpicFlow ; Weinzaepfel2013 .
4.1 Different possibilities for the energy
The proposed framework is independent from the energy functional and we validate it by using several energies. Following most of the optical flow variational approaches in the literature, the different possibilities will share the common feature of being made of two terms: a data fidelity term , and a regularization term ,
(1) 
where , is the image domain, are two consecutive frames ( for gray level images and for color images), and represents the motion field between them.
Let us also remark that our method allows the inclusion of other terms as, e.g., a third term dealing with the occlusions, as in BGLC ; ayvaci2012sparse , which will not be considered here for the sake of simplicity in the presentation.
With the aim of having a robust data cost, specially under illumination changes, we use a convex and continuous approximation of the data cost based on the Census transform hafner2013census ; stein2004efficient ; zabih1994non , approximated by a sum of centralized absolute differences, denoted as vogel2013evaluation , and defined as:
(2) 
where
denotes the characteristic function of a square of size
centered at the origin ( in our experiments, as in vogel2013evaluation ).On the other hand, we also consider the classical pointwise data term that imposes the wellknown brightness constancy assumption, namely,
(3) 
It is important to correctly preserve the motion boundaries in order to obtain an accurate optical flow. Motion boundaries are usually aligned with image boundaries, this aspect has motivated the use of edge detectors (e.g. dollar2013structured ; leordeanu2012efficient ) in previous optical flow methods leordeanu2013locally ; menze2015discrete ; EpicFlow . Instead, we opt for using the NonLocal Total Variation () regularizer. was used for optical flow estimation in werlberger2010motion , where the authors show its ability to better recover motion boundaries, in particular in lowtextured areas, occluded regions, and in small objects (when used in a coarsetofine scheme). In our case, a nonlocal regularizer which better captures motion discontinuities is also very useful in the local minimization step where the initial correspondences are densified. We apply the regularizer to each flow channel independently:
(4) 
where denotes the Euclidean distance between the color values at and in the Lab space, is the Euclidean distance between points and , is a normalizing constant, and , are constant parameters (set to and ).
As before, with the purpose of testing the proposed minimization procedure against different energies, we consider other regularization terms such as the classical coupled Total Variation (called in strekalovskiyetalsiims14 ), namely,
(5) 
Appendix A details the optimization algorithms for all the energies. Let us now just say that the data and the regularization terms are decoupled and standard methods such as primaldual, thresholding or median schemes are used.
4.2 Different possibilities for the initial set of seeds
In this work, the initial seeds are computed from one of the sparse matchers in the literature. There are several methods that provide a set of sparse matches between two different images containing common objects, representing two views of a scene or, as in our case, two frames of a video. Some of them are based in the estimation of distinctive point location and matching SIFT ; Mikolajczyk2005 . Being based on local properties, they can be used to estimate arbitrarily large displacements. In our experiments, we will use SIFT SIFT or Deep Matching Weinzaepfel2013 . Although SIFT is very effective, it has some disadvantages when dealing with small textureless objects or with non rigid deformations. The recent Deep Matching algorithm handles these problems and generates a more dense set of matchings. We have compared these two different methods, SIFT and Deep Matching, to compute the initial set of matches (the seeds). The first column of Figures 5 and 6 show some examples of the set of seeds obtained by the SIFT and Deep Matching algorithms. We use the SIFT implementation of Sift_Ipol , with its default parameters. Also, we use the Deep Matching implementation with the default parameters presented in Weinzaepfel2013 .
With the aim of avoiding outliers in homogeneous areas, we initially remove the seeds having a low local saliency, which is determined by the minimum eigenvalue of the autocorrelation matrix computed locally. If the saliency is below a threshold (set to
in our experiments), these seeds are removed. This pruning is also used by other authors (see, e.g., EpicFlow ).In any case, we claim that our optical flow strategy is able to recover the dense motion regardless of the number of initial seeds; the only condition is to have at least one correct seed in every region where the motion is smooth. As a proof of concept, in Fig. 1 we use a single seed per , which has been extracted from the ground truth flow. The estimated optical flow compares favourably to the ground truth even in these cases where the initial set of matches is extremely sparse. In contrast, other sparsetodense methods like EpicFlow that produce stateoftheart results are not capable to estimate the optical flow in this challenging situation. Another example is shown in Fig. 5 where similar optical flow results are obtained independently of the cardinality of the set of seeds, which have been computed using either the SIFT (Fig. 5, second row) or DeepMatching Weinzaepfel2013 (Fig. 5, third row) algorithms. Let us notice that DeepMatching produces in general more matches than SIFT. On the other hand, Fig. 6 shows a situation where the lack of seeds in some regions (second row, seeds provided by SIFT matches) produces an incorrect flow estimation in these areas. A much better estimation is obtained when there is at least one seed in these regions, as it happens when we use as seeds the matches provided by DeepMatching (Fig. 6, third row).
4.3 Experimental results
The proposed method has been tested on several publicly available databases: Middlebury Middlebury , MPISintel Sintel , KITTI 2012 Kitti , and KITTI 2015 Menze2015CVPR , as well as on proof of concept examples, chosen in order to obtain a better understanding of the fundamental parts of the proposed strategy. Let us remark that all results have been obtained by using the grayscale versions of the original color frames. The color version is only used to compute the seeds in case they come from the Deep Matching algorithm and to compute the local regularization weights in case of the NonLocal Total Variation as regularization term. All the results in this section have been obtained with the iterated faldoi minimization strategy presented in Sect. 3.1.2. We have fixed the parameters for all the experiments, more details are given in Appendix B.
First, we present a comparison of our strategy against the multiscale approach using different functionals. In order to assure that the proposed framework is a real alternative to the coarsetofine warping strategy and, therefore, also valid for sequences that do not have large displacements of small objects, experiments on the Middlebury optical flow data set Middlebury have been performed for both approaches. Table 1 shows how our framework achieves better results in all the samples in the dataset and for both energy functionals,  and , even if the difference is slight is some cases. Some qualitative results are shown for images of the MPI Sintel database that contain large displacements in Figure 8 where the initial seeds have been computed using DeepMatching. The first row shows, from left to right, two consecutive frames and the optical flow ground truth (color coded). The image (d) of the second row shows the ground truth occlusion map (the occluded points are shown in white). The multiscale results obtained using the  and the  energies are shown in (e) and (f), respectively, while the corresponding results obtained by our minimization strategy (iterated faldoi) are shown in (h) and (i), respectively. Finally, image (g) displays the result of our minimization faldoi strategy applied to the  energy functional. As it can be observed, the use of an advanced data term based on patches reduces the outlier area and better recovers the human shape. After adding the nonlocal regularizer the motion boundaries are more accurate and the outlier is removed.
Middlebury  Dim.  Hyd.  Rub.  Gro2  Gro3  Urb2  Urb3  Ven 

, multiscale  
, our min.  
, multiscale  
, our min. 
Table 2 provides a quantitative comparison of different estimations of the optical flow obtained with different energy functionals ( and ) and different set of seeds (SIFT seeds and Deep Matching seeds). We use a subset of 10% of frames randomly selected from the final training set of the MPI Sintel database. Let us notice that the energy based on the nonlocal Total Variation and a smooth approximation of the Census transform () is the one which achieves the best results, both quantitative and qualitatively. On the other hand, Figure 7 shows the energy values using both approaches, multiscale and our proposal (with Deep Matching seeds) over the same functionals. Notice how in general our method gives a lower or similar energy than the multiscale method.
training set. It is the standard boxplot representation. Red dots are outliers, each box shows the first, second and third quartiles and the whiskers length corresponds to 1.5 interquartile range (IQR).
Functional, minimization  EPE all  EPE mat.  EPE unmat.  s010  s1040  s40+ 

, multiscale  6.6453  5.1181  15.8828  2.9261  8.0172  45.3955 
, multiscale  8.4243  7.0860  16.8037  2.8129  9.3389  50.4899 
, our min. (SIFT)  6.7307  5.4784  14.8439  3.2928  8.3336  47.2940 
, our min. (SIFT)  7.3337  6.1058  14.7972  4.4182  10.1687  49.8888 
, our min. (Deep)  4.7688  3.5043  13.5503  2.3861  6.4305  32.3350 
, our min. (Deep)  4.0158  2.7901  12.5350  2.0757  5.4551  31.5814 
, our min. (Deep)  4.0060  2.7845  12.2698  2.3115  5.2989  30.8537 
Table 3 and Figure 9 show a quantitative and a qualitative comparison, respectively, among our method and several stateoftheart methods, including methods based on the combination of sparse matches and variational techniques Brox_Malik_LD ; Weinzaepfel2013 via an extra coupling term and that can also be adapted to any energy. Our approach outperforms LDOF Brox_Malik_LD in both Clean and Final versions of the MPISintel dataset. Compared to DeepFlow Weinzaepfel2013 we get better results in the Clean version and in the Final version we get an EPE which is only a tenth worse, thus we can say that the results are comparable in the Sintel database. Our approach gives better results than S2DMatching leordeanu2013locally , which is another sparsetodense technique that uses edge information and occlusion estimation in the densification process. The images in Fig. 9 have been taken from the MPISintel webpage. Compared to Brox_Malik_LD ; menze2015discrete ; Weinzaepfel2013 and EpicFlow EpicFlow (also a sparsetodense technique that achieves good positions in the Sintel table) our result better recovers the silhouettes of the girl (elbows, armpit, right hip and lower part of the legs) and the birds without using precomputed edge information as EpicFlow , although it produces some halos around the birds and the topleft corner of the image.
EPE all  EPE mat.  EPE unmat.  s010  s1040  s40+  
bailer2015flowICCV  5.810  2.621  31.799  1.157  3.739  33.890 
menze2015discrete  6.077  2.937  31.685  1.074  3.832  36.339 
EpicFlow  6.285  3.060  32.564  1.135  3.727  38.021 
Weinzaepfel2013  7.212  3.336  38.781  1.284  4.107  44.118 
7.337  3.580  37.904  1.487  4.355  43.526  
leordeanu2013locally  7.872  3.918  40.093  1.172  4.695  48.782 
Brox_Malik_LD  9.116  5.037  42.344  1.485  4.839  57.296 
menze2015discrete  3.567  1.108  23.626  0.703  2.277  20.906 
bailer2015flowICCV  3.748  1.056  25.700  0.546  2.110  23.602 
EpicFlow  4.115  1.360  26.595  0.712  2.117  25.859 
4.927  1.542  32.535  1.047  2.647  29.719  
Weinzaepfel2013  5.377  1.771  34.751  0.960  2.730  33.701 
 leordeanu2013locally  6.510  2.792  36.785  0.622  3.012  44.187 
Brox_Malik_LD  7.563  3.432  41.170  0.936  2.908  51.696 
LDOF Brox_Malik_LD ,  Discrete Flow menze2015discrete , 
DeepFlow Weinzaepfel2013 ,  EpicFlow EpicFlow , 
Our proposal,  Ground truth 
Table 4 shows a quantitative comparison among our method and several methods on the KITTI 2012 database Kitti . As expected, the performance of our approach is better than LDOF Brox_Malik_LD , and comparable to CRTflow DBLP:journals/ijcv/DemetzHW15 , EpicFlow EpicFlow and DeepFlow Weinzaepfel2013 . KITTI 2015 Menze2015CVPR can be used to evaluate optical flow and scene flow algorithms. Table 5 only contains results for optical flow methods.
EPEnoc  EPEall  OutNoc3  OutAll 3  

bailer2015flowICCV  1.4  3.5  5.77  14.01 
menze2015discrete  1.3  3.6  6.23  16.63 
Weinzaepfel2013  1.5  5.8  7.22  17.79 
EpicFlow  1.5  3.8  7.88  17.08 
1.7  5  8.81  19.93  
DBLP:journals/ijcv/DemetzHW15  2.7  6.5  9.43  18.72 
Brox_Malik_LD  5.6  12.4  21.93  31.39 
Flbg  Flbf  Flall  

DBLP:journals/corr/GadotW15  19.98  30.24  21.69 
menze2015discrete  21.53  26.68  22.38 
Hu_2016_CVPR  22.32  27.79  23.23 
EpicFlow  25.81  33.56  27.10 
27.08  31.51  27.81  
Weinzaepfel2013  27.96  35.28  29.18 
sun2014quantitative  39.90  53.59  42.18 
In the approach of Brox et al. Brox_Malik_LD or Weinzaepfel et al. Weinzaepfel2013 the matches are precomputed and then added as a constraint to the energy term. Thanks to these matches the motion of small objects that disappear at the coarser scales is recovered. However, these approaches need a minimum density of sparse matches over the area of the small object in order to correctly capture large displacements, even if the matches are weighted strong enough and enough iterations are performed. By contrast, our proposal only needs one single seed per area motion to recover the whole motion field. This is illustrated in Figure 10 were our proposal is able to recover the four large displacements with just a single seed in each region while DeepFlow doesn’t succeed. We also include the result obtained with LDOF Brox_Malik_LD , but in this case it has been obtained with their own seeds (using their originally binary code).
Recent optical flow datasets as MPISintel Sintel contain different and challenging effects, such as illumination changes, large displacements, blur, etc. In MPISintel these effects have been artificially created to provide naturalistic video sequences. However, the evaluation on these datasets does not take into account the robustness to the shot noise that appears in any real sequence, being noise one of the main limitations to any imaging system. Thus, we evaluate the robustness of several approaches to noise. To this goal, we corrupt the clean images from Sintel
with additive white Gaussian noise of standard deviation
, for several values of . Sparsetodense techniques are very dependent of the initial seeds that are used to obtain a dense optical flow. It is important to note that our proposal works even with an extremely sparse set of initial seeds. This fact allows us to choose the best matching method according to the image peculiarities, without caring that much about the density of the correspondences. In particular, for highly noisy images SIFT correspondences are more robust than DeepMatching ones. Table 6 shows a comparative of our method with different optical flow estimation methods for different levels of Gaussian noise. We use the  functional with SIFT matches. As shown in the table, the multiscale method provides the best optical flow estimation in noisy images. For noise levels of standard deviation greater or equal to 10 and 20 respectively, our method produces better results than EpicFlow and DeepFlow.10  20  30  40  

without matches (pure multiscale)  0.7468  1.0353  1.3832  1.4808 
Weinzaepfel2013  0.7766  1.5796  2.6128  4.3918 
EpicFlow  1.1418  1.6654  2.3386  3.2328 
() with SIFT matches  0.9486  1.5634  2.1090  2.9830 
5 Conclusions
We have proposed and provided an indepth analysis of a method based on a variational formulation of the optical flow problem. The method works at the original scale of the image and finds a good local minimum of any energy functional using an adaptive coordinate descent strategy guided by a sparse set of initial matches. This is a general technique that consistently outperforms the multiscale strategy for the same energy functional. With respect to alternative techniques that also include sparse matches in any energy functional, our performance is comparable to DeepFlow Weinzaepfel2013 and superior to LDOF Brox_Malik_LD while being more robust to a low density of matches, high levels of noise and outliers in the matches. The only requirement is that at least one correct match is given for each object in motion. For best overall results, we propose to use an energy with advanced data and regularization terms. Namely, we chose a smooth variant of the Census transform with a nonlocal regularization, providing robustness to illumination changes and occlusions while handling motion discontinuities. We present accurate quantitative and qualitative results that are comparable with stateoftheart methods. As future work we plan to model the occlusions in the functional to reduce the halo effect in occluded regions.
Appendix A Minimizing the energy
The numerical minimization algorithm for the general energy (1) is obtained in this paper by decoupling both terms. We linearize the image near a given optical flow and make the following approximation , where
and , denote the partial derivatives of with respect to and respectively. Let us recall that the two data terms that we have considered in Sect. 4.1 depend on and ; we will denote as the same data term but depending on and . In order to decouple the fidelity term and the regularization term in (1), we introduce an auxiliary variable representing the optical flow and we penalize its deviation from . Thus, the energy to minimize is
(6) 
depending on the two variables , where . The decoupled energy (6) can be minimized by an alternating minimization procedure; alternatively fixing one variable and minimizing with respect to the other one. Sect. 4.1 presents the different possibilities for the energy.

For fixed, let us consider each of the two different regularization terms, and , presented in Sect. 4.1.

In the case of , we reformulate the problem as a minmax problem incorporating the dual variables. Then, our minimization problem can be solved as a saddlepoint problem. Following the notation of Osher et al. gilboa2008nonlocal , for fixed, we solve
(7) for , and is the dual variable defined on . Let us explain it in detail. First, it is necessary to extend the notion of derivatives to a nonlocal framework. The nonlocal derivative can be written as
(8) where is a positive measure between two points . By taking such that , the nonlocal gradient
is defined as the vector of all partial derivatives:
(9) Now, by writing for , the nonlocal divergence is defined as the adjoint of the nonlocal gradient:
(10) Proposition 1
The solution of (7) is given by the following iterative scheme
(11) (12) (13) where is the primal variable and is the dual variable.

In the case of we use the primaldual algorithm that Chambolle proposed to minimize the ROF model Chambolle2010 and which is based on a dual formulation of the . Then, our minimization problem can be solved as a saddlepoint problem. For fixed, we solve
(14) where the dual variables are and satisfy .
Proposition 2


For fixed, let us consider each of the two different data terms, and , presented in Sect. 4.1.

Case , Li and Osher li2009 present a simple algorithm to find the optimal value of the function when the are nonnegative and is strictly convex. If is also differentiable and is bijective, it is possible to obtain an explicit formula in terms of the median. For fixed, we solve
(18) Following the ideas of vogel2013evaluation , we solve the discrete version of this problem. Due to the isotropy of the quadratic term, the optimal solution of can be obtained solving a one dimensional problem. In particular, setting being an orthogonal vector to the gradient, where is our new variable. Then, we minimize over
(19) where
Proposition 3
The minimum of (19) with respect to is
(20) where and for all the discrete neighbors (corresponding to above), where is the number of points in the discrete neighborhood.

Case . Notice that this term is a particular case of the previous data term. The functional to minimize
(21) where , does not depend on spatial derivatives on . Then, a simple thresholding step gives an explicit solution Zach .
Proposition 4
The minimum of (21) with respect to is
(22)

Appendix B Implementation details
Our code is written in C. The numerical scheme to solve the functional , in both steps, is based on the implementation of Palomares2014 . Image warpings use bicubic interpolation. The image gradient is computed using centeredderivatives. Input images have been normalized between . The algorithm parameters are initialized with the same default setting for all the experiments. Both time steps are set to to ensure convergence. As stopping criterion, the optical flow method uses the infinitenorm between two consecutive values of with a threshold of . The coupling parameter is set to . The smoothness term weight is set to for the  functional and for the  one, as suggested by vogel2013evaluation , where is the cardinality of the neighborhood considered in the term (we use a neighborhood of in the data term and then ) and we fixed and for the spatial an color domain of the term. For the iterated faldoi strategy we set to 3. The size of the patch in the local minimization is . The complexity of our algorithm is , where is the number of pixels of the image frame. The basic faldoi algorithm takes around 20 seconds for  energy and around 10 minutes for  over an image. Notice that our algorithm using  energy is very slow, especially because our implementation does not use parallized code. As  can be easily parallelized, it should take the same time for both functionals using a GPU implementation.
References

(1)
Alvarez, L., Deriche, R., Papadopoulo, T., Sánchez, J.: Symmetrical dense
optical flow estimation with occlusions detection.
In: European Conference on Computer Vision, pp. 721–735 (2002)
 (2) Alvarez, L., Weickert, J., Sánchez, J.: Reliable estimation of dense optical flow fields with large displacements. International Journal of Computer Vision 39(1), 41–56 (2000)
 (3) Ayvaci, A., Raptis, M., Soatto, S.: Sparse occlusion detection with optical flow. International Journal of Computer Vision 97(3), 322–338 (2012)
 (4) Bailer, C., Taetz, B., Stricker, D.: Flow fields: Dense correspondence fields for highly accurate large displacement optical flow estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4015–4023 (2015)
 (5) Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., Szeliski, R.: A database and evaluation methodology for optical flow. International Journal on Computer Vision 92(1), 1–31 (2011)
 (6) Ballester, C., Garrido, L., Lazcano, V., Caselles, V.: A TVL1 Optical Flow Method with Occlusion Detection, Lectures Notes in Computer Science, vol. 7476, p. 31–40 (2012)
 (7) Bao, L., Yang, Q., Jin, H.: Fast edgepreserving patchmatch for large displacement optical flow. Image Processing, IEEE Transactions on 23(12), 4996–5006 (2014)
 (8) Black, M.J., Anandan, P.: The robust estimation of multiple motions: Parametric and piecewisesmooth flow fields. Computer vision and image understanding 63(1), 75–104 (1996)
 (9) Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: European Conference on Computer Vision, vol. 3024, pp. 25–36 (2004)
 (10) Brox, T., Malik, J.: Large displacement optical flow: descriptor matching in variational motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(3), 500–513 (2011)
 (11) Buades, A., Facciolo, G.: Reliable multiscale and multiwindow stereo matching. SIAM J. Imaging Sciences 8(2), 888–915 (2015)
 (12) Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J.: A naturalistic open source movie for optical flow evaluation. In: European Conference on Computer Vision, pp. 611–625 (2012)
 (13) Chambolle, A., Pock, T.: A firstorder primaldual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(1), 120–145 (2011)

(14)
Chen, Z., Jin, H., Lin, Z., Cohen, S., Wu, Y.: Large displacement optical flow
from nearest neighbor fields.
In: Computer Vision and Pattern Recognition, IEEE Conf. on, pp. 2443–2450 (2013)
 (15) Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)
 (16) Demetz, O., Hafner, D., Weickert, J.: Morphologically invariant matching of structures with the complete rank transform. International Journal of Computer Vision 113(3), 220–232 (2015)
 (17) Dollár, P., Zitnick, C.L.: Structured forests for fast edge detection. In: Computer Vision (ICCV), IEEE International Conference on, pp. 1841–1848 (2013)
 (18) Fortun, D., Bouthemy, P., Kervrann, C.: Aggregation of local parametric candidates with exemplarbased occlusion handling for optical flow. Computer Vision and Image Understanding
 (19) Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Tech. rep., Annals of Applied Statistics (2007)
 (20) Gadot, D., Wolf, L.: Patchbatch: a batch augmented loss for optical flow. CoRR abs/1512.01815 (2015). URL http://arxiv.org/abs/1512.01815
 (21) Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
 (22) Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Multiscale Modeling & Simulation 7(3), 1005–1028 (2008)
 (23) Hafner, D., Demetz, O., Weickert, J.: Why is the census transform good for robust optic flow computation? In: ScaleSpace and Variational Methods in Computer Vision, pp. 210–221 (2013)
 (24) Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
 (25) Hu, Y., Song, R., Li, Y.: Efficient coarsetofine patchmatch for large displacement optical