On Optical Flow Models for Variational Motion Estimation

12/01/2015 ∙ by Martin Burger, et al. ∙ University of Münster 0

The aim of this paper is to discuss and evaluate total variation based regularization methods for motion estimation, with particular focus on optical flow models. In addition to standard L^2 and L^1 data fidelities we give an overview of different variants of total variation regularization obtained from combination with higher order models and a unified computational optimization approach based on primal-dual methods. Moreover, we extend the models by Bregman iterations and provide an inverse problems perspective to the analysis of variational optical flow models. A particular focus of the paper is the quantitative evaluation of motion estimation, which is a difficult and often underestimated task. We discuss several approaches for quality measures of motion estimation and apply them to compare the previously discussed regularization approaches.



There are no comments yet.


page 21

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Motion estimation is a crucial task in many different areas. On the one hand, motion estimation is important in medical and biological contexts where the goal is e.g. to track moving cells or to detect the motion of organs. On the other hand, it is also important in the automotive sector. Nowadays, there are many approaches that use motion estimation to make driving saver, by detecting both dangers from the outside of a car and also inattentiveness of a driver, which expresses e.g. in slower movement or closing eyes due to tiredness. These are only two of very many further applications of motion estimation.
Motion estimation generally arises in the context of image sequences , depending on a spatial position and a time . For real applications there exists only the discrete counterpart of , which is a set of images recorded at time steps . There also exist a variety of characteristics we have to consider when estimating motion:

  • A digital image can suffer from low resolution, low contrast, different gray levels and noise.

  • The temporal resolution is strongly connected to the underlying motion. For too large time steps we might loose correspondence between consecutive images (e.g. a very fast car might only be visible in one image).

  • A natural image often contains a set of moving objects with different speeds. A sufficient model should simultaneously be able to detect small and large movements in the same sequence. On the other hand, for the static background no motion should be detected.

  • For many biological applications, we have to consider the fact that the illumination is constant, but fluorescence of the observed object can be inhomogeneous in space, or might even underlie changes over time.

1.0.1 Optical Flow and Real Motion

When looking at image sequences and moving objects we directly speak of motion. This is a false implication since only projections of the real 3-dimensional motion fields are recorded by an image recording device (and in particular the human eye).
To emphasize this fact, we consider a camera recording traffic on a highway. The camera is only able to follow 2-dimensional paths on the image domain , which is the projection of the real 3-dimensional path . Thus, already one degree of information gets lost here. This problem is even worse since we are not able to measure the 2-dimensional motion field directly. On images only the apparent motion (or optical flow) is visible, that is displacements of intensities. Unfortunately the apparent motion and the 2-dimensional motion field are two fundamentally different properties (a detailed discussion of this problem can be found in [32]). The Barber’s pole example is often used to underline this difference. The pole simply rotates counterclockwise, but optical devices (e.g. camera, eye) can only detect gray values tending upwards and consequently the optical flow points upwards.
The aim of an optical flow model is to detect the real motion only from the intensities of the given images. Performing flow estimation we have to deal with an inverse problem. We need to find the right motion that leads to the actual appearance of a specific image.

2 Models

One of the most common techniques to formally link intensity variations and motion is the optical flow constraint. Based on the assumption that the image intensity is constant along a trajectory with

we get using the chain-rule


The last equation is generally known as the optical flow constraint. This optical flow constraint can also be regarded as a linear inverse problem. Therefore, we rewrite (1) by changing the positions of and to get


We define via

and (3)

Hence, the constraint fulfills the inverse problem , with or depending on the norm of the data fidelity. A more detailed introduction into the optical flow problem can be found in for example in [1] or [5]. We will discuss this issue in more detail in the following sections.

2.1 Variational Models with Gradient Regularization

The optical flow constraint constitutes in every point one equation, but in the context of motion estimation from images we usually have two or three spatial dimensions. Consequently, the problem is massively underdetermined. However, it is possible to estimate the motion using a variational model

where represents the so-called data term and incorporates the optical flow constraint in a suitable norm. The second part models additional a-priori knowledge on and is denoted as regularizer. The parameter regulates between data term and regularizer.
Possible choices for the data term are


The quadratic L norm can be interpreted as solving the optical flow constraint in a least-squares sense inside the image domain . On the other hand, taking the L

norm enforces the optical flow constraint linearly and is able to handle outliers more robust.

The regularizer has to be chosen such that the a-priori knowledge is modeled in a reasonable way. If the solution is expected to be smooth, a quadratic L norm on the gradient of is chosen and we have

Another possible approach is to choose the total variation (TV) of if we expect piecewise constant parts of motion. In the finite dimensional setting this can be written as



results in the very well known model of Horn and Schunck [18], 1981. With efficient primal-dual schemes to minimize L norms [12], L-TV optical flow models with


became very popular [36, 25]. This model was further improved by Werlberger et el [35], where the classical TV regularizer was replaced with a Huber norm. In this context, let us refer to a recent survey by Becker, Petra and Schnörr [5].

2.2 Extension of the Regularizer

The previously described regularizations are able to produce either smooth velocity fields (L regularization) or sharp edges (TV regularization). However, in realistic applications we often need a combination of both characteristics. To calculate velocity fields that combine smoothness and sharp edges, we need to extent the regularization term.
A possibility to In the following approach we want to make use of the potential of TV regularization to calculate sharp edges. To incorporate the possibility to get smooth parts, we need an additional primal variable , which will be needed for a second regularization term that allows for smooth transitions. We link the variables and by forcing

If we penalize this difference with an L norm and minimize it for , we get a simple TV regularization, which is shifted by . This way we ensure to keep characteristics of a TV regularization for our results. To combine this with the characteristics of another regularization technique, we apply a second regularizer to and also minimize everything for . This procedure leads us to the following extended regularizer:


The term is the additional term, which should be chosen in a way that it produces smooth flow fields. The weighting parameters and on the one hand determine the relation between the regularization parts and the optical flow constraints. On the other hand they also determine the relation between the two regularization parts itself and this way adjust the proportion of smooth parts and edges in the result. If is large, the constant parts and edges outweigh the smooth parts; if it is small, the smooth parts outweigh the constant parts and edges. Note that a functional of the form (4) leads to a regularization of via


Similar to the TV term in a standard model, we can choose between different evaluations of the norm in the above formulation. Since this is a shifted TV norm, we once more have the possibility to use the anisotropic variant.

2.2.1 L-Tv/l Optical Flow Model

As we have already seen before, a regularization with the L norm leads to smooth approximations of a flow field. In the following approach, we use L of as a norm for the additional regularization term , which leads us to


Together with

we get an L-TV/L model. This model is a combination between the previously described Horn-Schunck model and the L-TV model and thus contains characteristics of both kinds of flow fields. The regularizer of this model is very similar to the Huber function [19], which is defined as

The anisotropic variant of an optical flow model with Huber regularization has also been discussed in [35]. For another formulation of a related regularizer see e.g. [33] or [8].
Even if the Huber function can be solved directly, we want to be able to compare the different models by using the same algorithm. Therefore, later on we we will use the same the algorithm as for the other approaches.

2.2.2 L-TV/TV Optical Flow Model

Another possibility to consider smoothness in a flow estimation is to incorporate a higher order derivative. In the setting of (4), we can choose as TV of , which leads us to


Since the first term of the regularization leads to being close to the gradient of , the last term considers a term, which is close to the second gradient of . For the additional TV term in this model, we can again choose between the isotropic and the anisotropic variant, i.e. equivalently to those in Section 2.1. Overall, this leads to a huge variety of possible evaluations for this model.
By using

as a data term we get an L-TV/TV model, which we will later use for the numerical realization.
The regularizer we get with the described approach is a primal realization of a specific case of a Total Generalized Variation (TGV) regularizer [7]. See [28, 27] for further analysis of the TGV regularizer applied to motion estimation models.

2.3 Bregman Iterations

A well-known drawback of TV regularization in combination with L data fidelity is the loss of contrast, which means that the difference between intensities in the result is lowered. However, edges are well recovered. Concerning optical flow, this shows up in a reduced velocity of motion. A possibility to overcome this drawback is to apply contrast-enhancing Bregman iterations [23]. For the case of higher order regularization term additional systematic bias, e.g. related to slopes, is corrected in the same way (cf. [6]). For this sake, we need the Bregman Distance of the regularizer , which is defined as

for . This distance replaces the former regularization term in the respective models. This way it is possible to re-enhance the contrast. Thanks to the condition that is in the subgradient of , it is ensured, that the edges stay at the correct positions. The Bregman iteration iteratively computes as a solution of


Afterwards, the Bregman variable is updated by setting

Applying Bregman iterations to the optical flow problem with L data fidelity can equivalently adding the Bregman variable to the data. For the n-th Bregman iteration we can use the extended data term , which results in an additive constant for the optimality condition. So far Bregman iterations have not been investigated as an iterative regularization method for optical flow. A previous investigation [17] was only considering (split) Bregman methods as a minimization method for the original variational model, not investigating the possibility to reduce systematic bias. As we shall see below (see Figure 3) the Bregman iteration can yield improvements in parts where other variational methods underestimate motion (see e.g. the right block in the rubber whale data set below) and produce competitive results (see Table 2). It might thus nicely complement other approaches.

In the case of an data fidelity, the simple modification of the data term is not possible and the iteration needs to be carried out via the original form (8). Due to the additional linear term in comparison to the other only linear terms in the functional, it is not guaranteed that the functional is coercive, hence the well-posedness of the iteration is not guaranteed in the infinite-dimensional setting. In any case, it is not sure that the Bregman iteration yields additional benefit, since the L data fidelity does not introduce systematic bias. We leave this issue to future research.

3 Analysis

In the following we briefly comment on some aspects in the analysis of variational problems for optical flow, with special focus on particular aspects related to the recent extensions of total variation. We are able to generalize existing results for the Horn-Schunck and -TV to other data terms, focusing in particular on . This can be done under the same conditions on the image gradient as previously used, but the proof needs some adaption to take care of the constant functions in the null-space of the gradient. The coercivity related to the latter needs a different proof compared to previous results (cf. [30]), while on the other hand we give a more concise proof as in [16] and avoid too strong regularity conditions. The understanding of the basic structure of the proof then immediately allows a generalization to existence results for the different recent variants of total variation regularization, whose analysis has not been studied in the optical flow setting. Finally, we will discuss another problem hardly studied in the optical flow literature previously, namely the quantitative estimation of errors in the motion estimate when changing image data.

3.1 Existence of Minimizers

The basic issue of existence of minimizers for optical flow problems has been considered by various authors (cf. e.g. [1, 16]) using methods of calculus of variations. With the assumption and it is straightforward to see that the operator and data defined in (3) are bounded for and with standard convexity arguments lower semicontinuity of the data fidelities follow (for assumptions can even be relaxed). The associated data terms can all be shown to be L-coercive on subspaces of in the case of the Horn-Schunck model and of

for the other models. The subspaces are given by functions with mean value zero for standard models, while they are given by functions of vanishing mean and first moments in the case of the TV-TV and TV-L

regularizer. Hence, in order to obtain coercivity and thus existence of minimizers it is a natural question whether the data term yields coercivity on the whole space, i.e. boundedness of the data fidelity further implies bounds on the mean (and potentially the first moments) of . Obviously this cannot be true without further assumptions on (which is seen immediately for constant ). In the case of the Horn-Schunck model this question was answered in [30], who showed coercivity for the original Horn-Schunck model (see also [20] for a detailed discussion). His proof heavily uses the quadratic structure of the data fidelity as well as the fact that functions in the kernel of the regularization term are constant, hence it can be generalized to the L-TV model, but neither to extensions of the TV regularization nor to L-fidelities. However, the result can easily be generalized as we show in the following:

Lemma 3.1.

Let , arbitrary, be a bounded domain and let be a sequence in , . Let and such that the functions

are linearly independent. If there exists a sequence of vectors

such that is bounded in and

then is bounded in


Under the above assumptions it obviously suffices to prove that the sequence is bounded in .We have

The boundedness of of imply that is bounded in . Now assume that is unbounded, then there exists a subsequence (again denoted by with growing to infinity in a monotone way. Then is well-defined and bounded, moreover tends to zero in . Hence, has a convergent subsequence with limit satisfying , in particular . By continuity of the norm we find

which contradicts the linear independence of the functions and thus yields the assertion. ∎

With the Poincare-Wirtinger inequality and the embedding of into for and in arbitrary dimension, we obtain the coercivity of the total variation on functions of mean value zero. Now we choose equal to the mean value of and obtain coercivity of the respective L-TV model in . With a standard (weak) lower semicontinuity argument for the convex functionals we thus deduce the following existence result:

Theorem 3.1.

Let be a bounded domain and let either or . Let and such that the functions are linearly independent. Then there exists a minimizer for the L-TV model in the cases for arbitrary and for .

Lemma 3.2.

Let , arbitrary, be a bounded domain and let be a sequence in , . Let and such that the functions are linearly independent. If there exists a sequence of vectors and matrices such that is bounded in and

then is bounded in


Under the above assumptions it obviously suffices to prove that the sequence is bounded in and the sequence is bounded in . We have

The boundedness of of imply that is bounded in . A limiting procedure as in the proof of Lemma 3.1 yields a nontrivial limit respectively such that

which contradicts the linear independence assumption and thus yields the assertion. ∎

Using the coercivity of the effective functionals on the space of BV-functions with vanishing mean and first moments (cf. [6] in the TV/TV case and [11] in the TV/L case) and a similar proof as above we obtain the following result:

Theorem 3.2.

Let be a bounded domain and let either or . Let and such that the functions are linearly independent. Then there exists a minimizer for the L-TV/TV model and the L-TV/L model in the cases for arbitrary and for .

A major restriction of the existing existence analysis is the fact that some regularity of and needs to be assumed, which might not be met by solutions of transport equations, in particular for noisy versions of the images (note however that many results in the literature assume even stronger regularity, e.g. images of class as in [16]). A detailed analysis of such issues remains a relevant question for future work. The case of stochastic noise in the image data might even enforce different approaches, e.g. Bayesian models (cf. [31]).

Let us finally mention that uniqueness of minimizers cannot be expected for total variation models, since neither the regularization nor the data term are strictly convex (due to the nullspace of the map ). In the case of an L data fidelity one can at least derive uniqueness of the component normal to the image level sets.

3.2 Quantitative Estimates

Quantitative estimates have hardly been investigated for optical flow models in the past, which is probably due to the nonuniqueness of the flow reconstruction. Hence, an estimate to a ground truth seems out of reach in most cases. Nonetheless, it seems interesting to take a look at the optical flow problem from the perspective of error estimation for inverse problems (cf.

[10]). Let us consider the case of L data fidelities for simplicity, we start from the optimality condition


The obvious first question to ask is the robustness of solutions for errors in the data , a second question is related to the behaviour as tends to zero. Let us first sketch the derivation of quantitative error estimates in the case of data perturbations, for this sake let be the solution of the variational model with data and . Then we have

A duality product with and an elementary calculation yields

Using appropriate duality estimates on the right-hand side one immediately obtains estimates for the Bregman distances and the error in the velocity component parallel to in terms of errors in the data. To do so, let us further rewrite the right-hand side to obtain

and with Young’s inequality we find

The two error terms on the right-hand side can be interpreted well. The first term measures the difference of the image data in a norm related to the optical flow model, it can further be related to the consistency in the optical flow constraint with estimated velocity . The second term creates more trouble, since due to it can contain terms in the direction perpendicular to , in which case it cannot be related to the first term on the left-hand side. Note that when viewing the optical flow estimation as an inverse problem this term is indeed related to the model error, i.e. the error in the forward operator . Even if this term can be estimated somehow by the Bregman distances on the left-hand side (e.g. in the case of the Horn-Schunck functional simply by Young’s inequality), it will lead to an error term of order . With the inherent smallness of we hence conclude that errors in the image gradient will strongly influence the result. This can e.g. be interpreted in terms of the accuracy in the discrete approximation of the image gradient. The quality in the estimated flow indeed depends heavily on the gradient approximation in numerical results. This is illustrated in Table 2 via an evaluation of errors for different choices of difference quotients for .

In the case of negligible error in the image gradient we can give an improved estimate:

Proposition 3.1.

Let and be weakly differentiable image data such that and . Then the estimate


holds for and being the solution of

with data respectively .

Note that we see in any estimate that the velocity parallel to the image gradient (normal to level lines) can be estimated robustly from the image error (with constants independent of ), while we obtain hardly information perpendicular to the gradient. This is just a mathematical consequence of the aperture problem in motion estimation. In the perpendicular direction we obtain some stability of the velocity via the Bregman distance terms, but note that the error will grow inversely proportional to . We mention that to leading order in a small time step we have

Hence, the error measure is strongly related to the end-point error frequently used in practical evaluations of motion estimation, an issue we will discuss in detail below.

The limit is the one usually studied in inverse problems with strong emphasis on the appropriate treatment of instabilities. In the case of motion estimation the underdeterminedness is the more crucial issue. Thus, the key question is to understand what solutions are obtained in the limit, respectively which kind of flows can be reconstructed well. The limiting solutions are determined by the constrained problem


The solutions to be reconstructed well are those satisfying a source condition (cf. [10]), i.e. the optimality condition for the constrained problem in a Lagrangian duality setting. This is equivalent to


for some . Under this condition one can construct data , such that the constrained solution also satisfies the unconstrained variational problem with regularization parameter , thus error estimation reduces to the above case. From (12) we can give a mathematical reason why it is it is useful to have images with a large gradient, which corresponds to the usual intuition. The data perturbation is a multiple of , hence error estimates depend on the norm of , which is however inversely proportional to the norm of . A more detailed study of (12) depends on the specific regularization terms and is an interesting question for further research.

4 Numerical Solution

In the following we discuss the numerical solution of the variational models discussed above. We start with a unified discussion of primal-dual iterative methods and then proceed to discretization issues.

4.1 Primal-Dual Algorithm

The concept of duality offers an efficient way of minimizing the introduced variational models. Let us for the numerical application consider two finite dimensional vector spaces and equipped with a scalar product and a norm . We furthermore consider a continuous linear operator . Now, class of variational problems in this section can be written as


with are proper, convex and lower semi-continuous functionals. In our application, usually incorporates the data term, whereas denotes the regularizer. In the following, we denote Equation as the primal problem. The primal problem is often hard to minimize or yields very slow algorithms. Instead of the primal problem, we can equivalently optimize the so-called primal-dual problem:


Here refers to the convex conjugate of , which can be easily calculated for the observed regularizers. The very well known algorithm of Chambolle and Pock dedicates to this class of problems [12]. For , a pair and initial value we obtain the following iterative scheme to solve the saddle-point problem :


Here, denotes the proximal or resolvent operator

which can be interpreted as a compromise between minimizing and being close to the input argument . Suitable choices of and were chosen according to [26] and are given in Section 4.2 for each algorithm. The primal-dual methods are efficient if the proximal problem can be solved efficiently. In [12] the authors demonstrated an application of the primal-dual algorithm to a L-TV optical flow problem. In the following we will extend this application to each of the introduced models and provide solutions to the occurring proximal problems.
In Table 4.1 we transferred our models to the notation of the primal problem (13). Let us begin with the proximal problems for the primal part , for a more detailed description we refer to [13].

L data term:

For the L data term we have to solve for the following proximal problem:

This yields a linear optimality system, which can be directly calculated. The solution is given by , where is a linear function also incorporating the scalar terms and .

L data term:

In case of the L optical flow term, the proximal problem reads

Defining the affine linear function , the problem above represents a affine-linear L optimization problem. It can be shown that the solution is given by

Model G(v) F(Kv) K
Model G(v,w) F(K(v,w)) K
L regularization:

Denoting the proximal problems for the dual part requires calculating the convex conjugate of the dual part first. For the L regularization term we calculate and obtain the proximal problem

Consequently, the proximal problem can be solved straightaway.

TV regularization:

In case of the total variation and the higher order regularization we always have L norms involved. Similar to the standard TV regularization () we have , where denotes the indicator function of the L unit ball. The corresponding proximal problem then reads

The solution is given by a point-wise projection of onto the unit ball corresponding to the chosen vector norm. The solution is will be discussed in more detail in Section 4.2.

4.2 Discretization and Parameters

In the context of discretization we have to consider two different aspects. On the one hand we have to approximate the image derivatives and , which are passed to the algorithm as static variables afterwards. One the other hand, we have to discretize the operators and coming from the primal-dual iteration.
Starting with the image discretization, we consider the image domain to consist of the regular grid:

Since we are interested in the velocity field between and we have to evaluate the derivatives of at . In our observation, a mixed scheme consisting of a forward difference for the time derivative and central differences for the spatial derivatives turned out to give the best results. Stability in terms of is not required, due to the fact that they act as constants in the model. Consequently, we obtain the following scheme:

For the sake of simplicity, we omitted the time-dependency in the derivatives.
The operators in our model only consist of gradients for the spatial regularization and the corresponding divergences. We propose a forward discretization with Neumann boundary conditions for the gradient and consequently obtain a backward difference for the divergence to keep a discrete adjoint structure. The resulting scheme reads

The primal-dual algorithms moreover requires suitable parameters and such that . It has been shown in [26] that a diagonal preconditioning technique can be used to find without calculating . According to [26], Lemma 2, we choose for the gradient models from Section 2.1 and for the extended models from Section 2.2.
In the previous section, the solution of a proximal problem involving an indicator function of the L unit ball of was given as a point-wise projection. To conceive this, let us begin by writing down the convex set corresponding to :

Here, refers to the discrete maximum over all elements in the image domain:

An interesting aspect is how to evaluate the vector norm . Choosing

results in the so-called anisotropic variant, whereas

is denoted as the fully isotropic one. The anisotropic total variation is more suitable when dealing with quadratic structures, the isotropic variant better fits to circular structures. Depending on the chosen vector norm for (anisotropic, fully isotropic), we obtain a different dual inner norm . For the anisotropic case, that refers to the vectorial L norm, we get the maximum norm for the dual variable:

Since the fully isotropic case refers to the self-dual Euclidean norm, we get in this case

Transferring this to the optimization problem above, we get in the anisotropic case a the point-wise projection of onto :

and for the isotropic case a point-wise projection of onto the L unit ball:

For isotropic and anisotropic variants of the extended models, the norms are achieved equivalently. However, it has to be considered that for the L-TV/TV model there are independent norms for both parts of the regularization. Using the partially isotropic variant also results in independent norms for the different components of . For the sake of simplicity we will restrict ourselves to the fully isotropic cases for all appearing total variations in the numerical realizations.

5 Results

5.1 Error Measures for Velocity Fields

Finding error measures for velocity fields is a delicate problem which can be motivated by the following simple example. Consider two images, each consisting of only pixels of the following form:


From our point of view we seek for a velocity field that transforms into . Unfortunately, it is unclear which velocity field underlies this motion. Examples of possible solutions are

  1. Just move the pixels at position and by 1 to the bottom, hence

  2. Move the pixels at position and by 1 to the bottom and change their position:

  3. Move the whole image by 1 to the bottom which gives

  4. Move the pixels at position and by 1 to the bottom and exchange some pixels in the third column

All these velocity fields produce the same result, that is they all fulfill the optical flow constraint . This results from the fact that for given images the optical flow constraint states for every point only one equation for a 2-dimensional velocity field . The solution is unique iff there exists a unique bijection between and . This requires both images to consist of the same intensity values, which furthermore have to be unique. Unfortunately, this assumption is far from reality, since regions of constant intensity are characteristic for background or objects. Hence, for practical problems we have a highly underdetermined system and consequently a huge variety of possible underlying velocity fields .
As we will see later an error measure always explicitly or implicitly favors one of these possible results over the others. Those error measures, which explicitly prefer one result, expect a given ground truth field and measure a distance from the calculated velocity field to the given ground truth field. This strategy is questionable because in real world examples we usually do not have a ground truth velocity field, and artificial examples can usually be generated by several different ground truth fields. Hence, setting the ground truth directly favors a subjective result. This is at least from the mathematical viewpoint questionable.

5.1.1 Absolute Endpoint Error

Despite the fact that explicit error measures might be problematic they are often used in the literature and we will also use them to evaluate our algorithms. In [2] two explicit error measures have been presented. The most intuitive one is the average endpoint error (aee), proposed in [24], which is the vector-wise Euclidean norm of the difference vector . The difference is divided by and we have