 # Statistical Inverse Formulation of Optical Flow with Uncertainty Quantification

Optical flow refers to the visual motion observed between two consecutive images. Since the degree of freedom is typically much larger than the constraints imposed by the image observations, the straightforward formulation of optical flow inference is an ill-posed problem. By setting some type of additional "regularity" constraints, classical approaches formulate a well-posed optical flow inference problem in the form of a parameterized set of variational equations. In this work we build a mathematical connection, focused on optical flow methods, between classical variational optical flow approaches and Bayesian statistical inversion. A classical optical flow solution is in fact identical to a maximum a posteriori estimator under the assumptions of linear model with additive independent Gaussian noise and a Gaussian prior distribution. Unlike classical approaches, the statistical inversion approach to optical flow estimation not only allows for "point" estimates, but also provides a distribution of solutions which can be used for ensemble estimation and in particular uncertainty quantification.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Optical flow reflects the visual motion between consecutive images. Determination of optical flow is important for applications ranging from machine learning and computer vision



, artificial intelligence and robotics

[4, 25], to scientific applications from oceanography to weather forecasting [6, 5, 10, 19, 20], to name a few. In classical approaches optical flow is determined by solving a variational optimization problem which requires inclusion of a regularization term in additional to data fitting for the problem to be well-posed [1, 16, 35]. The choice of regularization parameter turns out critical for the satisfactory inference of optical flow. Despite the many (competing) methods of selecting the regularization parameter, none seems to be most “natural” comparing to the others [15, 30, 34]

. Another important feature of classical optical flow approaches is that they produce a single solution (by design) as a “point estimate”. In practice, the magnitude of the flow as well as measurement error and noise can vary significantly from one part of the images to another, and thus the at a given pixel the inferred flow vector can be associated with either large or small uncertainty. Although not captured in classical optical flow, such “uncertainty” would provide valuable information if attainable as part of the solution.

In this work we adopt a statistical inversion framework for the computation of optical flow. In this framework the estimation of optical flow is reformulated from the Bayesian perspective as a statistical inversion problem. From this new, statistical perspective, different types of information black and model assumptions are collectively fused to give rise to a posterior distribution of the unknowns of interest. The posterior distribution, which is typically sampled using some appropriately designed Markov chain Monte Carlo scheme, can be further used to derive various estimates. Importantly, unlike the point estimate of optical flow obtained by classical variational approaches, the proposed statistical inversion approach is a methodological way of uncertainty propagation to produce a distribution of candidate optical flow fields from which various statistical properties can be extracted, including ensemble average estimation and uncertainty quantification.

The rest of the paper is organized as follows. In Section 2 we review the standard optical flow setup and discuss how it can be treated as an inverse problem defined in finite dimensions. In Section 3 we we review the basics of inverse problems, including the standard least squares approach and Tikhonov regularization. Then, in Section 4 we present a statistical inversion approach for optical flow, and discuss the choice of priors, models, sampling procedure and computational algorithms. In Section 5 we showcase the utility statistical inversion approach for optical flow using several benchmark examples of flow fields and noisy image data. Finally, Section 6 contains a conclusion and discussion of several unsolved and unattempted issues that might be of interest for future research.

## 2 Optical Flow as an Inverse Problem

Given a sequence of consecutively captured images, the visual relative motion between them is commonly referred to as optical flow, which can often provide insight about the actual physical motion. The inference of optical flow is an outstanding scientific question, which requires making assumptions about the underlying motion as well as the measurement process.

### 2.1 Problem Setup

Consider two single-channeled (typically grayscale) digital images (“pictures”) taken from the same scene at two nearby time instances. The image data are represented by two matrices and . Thus each image contains pixels, defined on a common two-dimensional subspace . The goal of optical flow estimation is to infer a flow field, defined by matrices and where represents the optical “velocity” occurring at the -th pixel inferred from the two images. The image data and are often regarded as sampled data from smooth functions and , with and where are grid points from a spatial domain . Thus and can be viewed as discrete spatial samples of a smooth 2D flow field , which is defined on that captures the visual optical motion occurring between the two observed images.

### 2.2 Variational Approach of Inferring Optical Flow

The classical variational approach of optical flow starts by defining an “energy” functional whose minimization yields an estimation of the optical flow field . One of the most widely used functional was proposed by Horn and Schunck in 1981 , given by

 E(U,V)=∫∫Ω(FxU+FyV+Ft)2dxdy+α∫∫Ω(∥∇U∥2+∥∇V∥2)dxdy, (1)

where and are smooth functions defined over which represent a candidate flow field. In the Horn-Schunck functional, the first term is often referred to as data fidelity as it measures the deviation of the total image intensity from being conservative, that is, how much does the model fit deviates from the condition

 dF/dt=FxU+FyV+Ft=0. (2)

The second term measures the solution regularity by penalizing solutions that have large spatial gradients and is called the regularization term. The relative emphasis of smoothness as compared to “fitting” the total image intensity conservation equation (2) is controlled by the positive scalar which is called a regularization parameter. The main role of the regularization term is to ensure that the minimization of the functional is a well posed problem. Without the regularization term, the problem is ill-posed.

Given , the functions and that minimize the Horn-Schunck functional (1) satisfy the Euler-Lagrange equations

 {Fx(FxU+FyV+Ft)=α(Uxx+Uyy),Fy(FxU+FyV+Ft)=α(Vxx+Vyy), (3)

which are typically solved by some iterative scheme over a finite set of spatial grid points [16, 29] to produce an estimation of the optical flow. Alternatively, one could also discretize the functional (1) itself to yield a finite-dimensional inverse problem as discussed in Section 2.3 below with solution strategy reviewed in Section 3.

### 2.3 Finite-dimensional representation of the Variational Optical Flow Functional

As discussed in the previous section, the classical variational approach of optical flow works by first formulating and minimizing a functional over smooth vector fields, and then evaluating the obtained vector field at the grid points on which the original image data are given. Here we approach the problem from a different route, by first discretizing the functional (1) to convert the functional minimization (an infinite-dimensional problem) into a finite-dimensional linear inverse problem, and then solving the inverse problem to yield solutions which give the values of a vector field defined over the same grid points as the image data.

The remainder of this section will be focused on the conversion of the functional (1

) into a finite-dimensional function defined on a set of uniformly distributed grid points

 {(xi,yj)}i=1,2,…,nx;j=1,2,…,ny, (4)

where and are the spacing in the direction and the direction, respectively. The conversion will be achieved by approximating the integrals in Eq. (1) with appropriately derived summations over the grid points. For notational convenience, we use a bold lowercase variable to denote the vectorization of a matrix. For example, the boldface vector denotes the column vector obtained by “vertically stacking” the columns of a matrix in order , where denotes the -th column of . That is,

 q=vec(Q)=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣→Q1→Q2⋮→Qn⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦. (5)

First let us consider the data fidelity term: . The spatial derivatives and can be approximated by a finite difference scheme. For example, the simple forward difference yields the approximations:

 (6)

We next express these derivatives as operations on the column vector . To do this, we define matrix as

 S(ij)k={−δij+δi+1,j,~{}% if i

The forward difference applied to can be represented as

 {fx≈Qxf,fy≈Qyf, (8)

where

 ⎧⎨⎩Qx≡1Δx[In⊗Sm],Qy≡1Δy[Sn⊗Im]. (9)

The temporal derivative can be estimated from the data by the direct pixel-wise difference between the two images, to yield

 ft≈g−f. (10)

With these definitions and approximations, we obtain a discretized version of the conservation equation (2) expressed as a finite-dimensional linear system:

 Ax=b, (11)

where

 ⎧⎪ ⎪⎨⎪ ⎪⎩A=[diag(fx),diag(fy)],x=[u⊤,v⊤]⊤,b=−ft. (12)

Here and represent the diagonal matrices whose diagonal elements are given by the entries of the vector and , respectively. From this connection we approximate the first integral in the functional (1) as where denotes the standard Euclidean norm.

Next we develop a finite-dimensional approximation of the regularization term in the functional (1). This requries discretization of and . Using a similar forward difference to approximate the partial derivatives, we obtain

 ⎧⎨⎩∇U≈1Δx[U(x+Δx,y)−U(x,y)]+1Δy[U(x,y+Δy)−U(x,y)],∇V≈1Δx[V(x+Δx,y)−V(x,y)]+1Δy[V(x,y+Δy)−V(x,y)]. (13)

For the vectorized variables and , we have

 {∇u=ux+uy≈(Qx+Qy)u,∇v=vx+vy≈(Qx+Qy)v, (14)

where and are defined in Eq. (9). Consequently, we obtain the approximation of the second integral in the Horn-Schunck functional (1) as

 {∫∫Ω∥∇U∥2dxdy≈u⊤xux+u⊤yuy≈u⊤[Q⊤xQx+Q⊤yQy]u,∫∫Ω∥∇V∥2dxdy≈v⊤xuy+v⊤yvy≈v⊤[Q⊤xQx+Q⊤yQy]v, (15)

which then gives

 ∫∫Ω(∥∇u∥2+∥∇v∥2)dxdy≈x⊤Qx, (16)

where the matrix

 Q=I2⊗[Q⊤xQx+Q⊤yQy]. (17)

Therefore, the variational optical flow formulation (1) can be reformulated at a finite spatial resolution as an inverse problem, where, for a given parameter value , the corresponding solution is given by solving the following (regularized) optimization:

 minx(∥Ax−b∥2+αx⊤Qx). (18)

This is a standard approach in inverse problems, formulated as a least squares problem with Tikhonov regularization, with more details to be presented in the next section.

## 3 Inverse Problem in Finite Dimensions

Although there has been a great deal of progress on the mathematical characterization of inverse problems in the field of functional analysis, a practical problem often concerns finding a solution in a finite-dimensional space. At a fundamental level, the most common inverse problem stems from a linear model [15, 17, 34]

 b=Ax+η, (19)

where is a column vector of observed data, is a (known) matrix representing the underlying model, the column vector denotes (additive) noise, and is the vector of unknowns to be inferred.

Given and , the problem of inferring in Eq. (19) is called an inverse problem because rather than direct “forward” computation from the model, it requires a set of indirect, “backward”, or “inverse” operations to determine the unknowns . Depending on the rank and conditioning of the matrix , the problem may be ill-posed or ill-conditioned. In classical approaches, these issues are dealt with by adjusting the original problem to a (slightly) modified optimization problem as discussed in Section 3.1 whose solution is meant to represent the original, as discussed in Section 3.2.

We note that in the classical setting a solution to the inverse problem is a vector as a result of solving an optimization problem. Such a solution is referred to as an point estimate because it gives one solution vector without providing any information about how reliable (or uncertain) the solution is [15, 34]. On the other hand, the statistical inversion approach to inverse problems provides an ensemble of solutions—defined by the posteriori distribution which not only point estimates can be made but also their uncertainty quantification [11, 17].

### 3.1 Least squares solution

The classical least squares solution to the inverse problem is given by 

 xℓ2=A†b, (20)

where denotes the pseudo-inverse of

which can be obtained from the singular value decomposition of

. Depending on the rank of , the least squares solution is associated with one of the minimization problems:

 {minAx=b∥x∥2,if %rank(A)

Here denotes the (Euclidean) norm. Let the true solution to Eq. (19) be , that is, . It follows that

 xℓ2−x∗=A†η. (22)

In practice, even when the matrix has full column rank (), the discrepancy between the true and least squares solutions is typically dominated by noise when some singular values of are close to zero, rendering

an ill-conditioned matrix and the solution

unstable and sensitive to small changes in data .

### 3.2 Tikhonov Regularization

A powerful approach to resolve the instabilities due to noise and the near-singularity of is to regularize the problem. In the classical Tikhonov regularization one adds a quadratic regularization term for some prescribed matrix to penalize non-smoothness, giving rise to a regularized optimization problem [31, 32, 33, 34]:

 minx(∥Ax−b∥22+αxTLx). (23)

In the regularized problem the positive scalar parameter controls the weight of regularization and is typically a symmetric positive definite matrix, both of which need to be chosen appropriately for the problem to be uniquely defined [31, 32, 33, 34]. We refer to the two terms and in (23) as data fidelity and solution regularity, respectively. In a simplistic description, they can be described as “selecting” a solution that balances the desire to “solve” and to be “regular” as measured by . The regularization parameter therefore dictates the extent to which the compromise is made between the two.

For a given parameter , we denote the corresponding regularized solution by

 xα=argminx{∥Ax−b∥2+αx⊤Lx}. (24)

By standard vector calculus, it can be shown that is in fact a solution to the modified linear system

 (A⊤A+αL)xα=A⊤b, (25)

which is typically well-posed for appropriate choices of and . When the matrices are large and sparse, Eq. (25) is often solved by iterative methods rather than a direct matrix inversion since the latter tends to be numerically costly and unstable .

In Tikhonov regularization, a key issue is how to choose the regularization parameter appropriately. In theory, a “good” regularizer has the property that in the absence of noise, the solution to the regularized problem converges to the true solution when the regularization parameter . However, in practice, under the presence of noise, it is always a challenge to try to determine a good value for . If is too small, the instability and sensitivity of the original problem would still persist; whereas for too large of an the solution will be over-regularized and not fit the data well. A good balance is thus paramount. Despite the existence and ongoing development of many competing methods for selecting most of which focus on asymptotical optimality as the number of data points approach infinity, none of them stands out as blacka best “natural” choice unless specific priori information about the noise in the data are available (see Chapter 7 of Ref. 

). In the following section we discuss how the problem of selecting an exact regularization parameter is no longer required from a Bayesian perspective; instead, it suffices to start with some loose range of values represented by a probability distribution, unless more specific knowledge is available about the problem in which case the distribution can be chosen to reflect such information.

## 4 Statistical Inversion Approach

The statistical inversion approach to an inverse problem starts by treating all variables as random variables (e.g.,

, and in Eq. (19)), and representing our knowledge (or absence of knowledge) of the unknowns as prior distributions. With observational data and a forward model (e.g., matrix in Eq. (19)), the “inversion” leads to an ensemble of points together with a posterior distribution from which various statistical estimates can be made [21, 11, 17, 8]. The key in the inversion is to use the Bayes rule to express the posterior distribution , which is the conditional distribution of the “solution vector” given the observed data , as [11, 17]

 p(x|b)=1p(b)p(b|x)⋅p(x). (26)

Here the likelihood function

is the probability density function (pdf) of the random variable

given which is determined by the underlying model; is the priori distribution of ; and acts as a normalization constant which does not affect the solution procedure or the final solution itself.

Thus, in the statistical inversion formulation, each candidate solution is associated with the probability that is determined (up to a normalization constant ) once the likelihood function and the prior distribution are given. For a given inverse problem, the likelihood function can be obtained using the underlying model such as Eq. (19) including the noise distribution. On the other hand, the prior distribution is typically constructed according to some prior knowledge of the solution. Unlike standard approaches which produce “point estimates”, in statistical inversion it is the posterior distribution that is considered as the “solution” to the inverse problem. Based on the posterior distribution, one can further extract useful information such as point estimates and uncertainty quantification, by sampling from the distribution. Efficient sampling methods will be reviewed toward the end of this section.

The unique feature of enabling information fusion and uncertainty quantification has made the statistical inversion approach to inverse problems an attractive venue for the development of new theory and applications. In image processing applications, it has been utilized for many problems such as image denoising and deblurring [2, 18], sparse signal reconstruction , and more recently attempted for optical flow computation . In particular, we note that our approach, although different in many of the technical aspects, shares a similar statistical inversion perspective as Ref. .

### 4.1 Statistical interpretation of optical flow obtained by Tikhonov regularization

Under the statistical inversion framework, solution to an inverse problem is the posterior distribution, from which point estimates can be further obtained. Among these point estimates, a particularly common one is the maximum a posteriori (MAP) estimator, which is defined as

 x\scriptsize MAP = argmaxxp(x|p)=argminx{−lnp(x|b)} (27) = argminx{−lnp(b|x)−lnp(x)}. (28)

As noted in Refs. [2, 17], the MAP estimator given by Eq. (28) produces a vector that is identical to the solution of a Tikhonov regularization specified in Eq. (24) upon appropriate choice of the model and prior pdf. In particular, consider the model given by Eq. (19

) with independent and identically distributed (iid) Gaussian noise of variance

. It follows that

 p(b|x)=p(η)∝exp(−λ2∥Ax−b∥2), (29)

where the symbol “” means “proportional to”. Under a Gaussian prior,

 p(x)∝exp(−δ2x⊤Lx), (30)

the term in the MAP estimator becomes

 −lnp(x|b)∝∥Ax−b∥2+(δ/λ)x⊤Lx. (31)

The choice of in the Tikhonov regularization then yields a solution that equals the vector given by the same MAP estimator. Thus, with these assumptions of the form of the noise, the distribution of the prior, there is a logical bridge between two different philosophies for inverse problems, in that a solution from Tikhonov regularization can be interpreted as an MAP estimate of the posteriori distribution under appropriate choices of prior distributions and likelihood functions.

Furthermore, and crucially, as shown in the next section, more important information exists in the statistical inversion framework. Specifically, by sampling from the posterior distribution the statistical inversion approach allows for not only a point estimate but also other statistical properties associated with the solution, in particular spread around a point estimate which enables uncertainty quantification.

### 4.2 Choice of Priors and Hyperpriors

In general, the statistical inversion framework allows flexibility in choosing prior distributions about unknowns and noise. When these prior distributions themselves contain unknown parameters, these parameters can themselves be thought of as random variables which follow hyperprior distributions. In this paper we will consider some elementary choice of the priors and hyperpriors, with the main purpose of showing how to setup the procedure for sampling the posterior distribution.

We focus on (multivariate) Gaussian priors for both the unknown and noise . In the absence of knowledge of how “spread-out” the prior distributions are, we use additional parameters for these priors, where these parameters are taken to be random variables drawn from hyperprior distributions. This type of approach has been previously adopted and implemented in image denoising applications . In particular, we consider a specific class of the noise and prior distributions:

 ⎧⎪⎨⎪⎩likelihood function:~{}p(b|x,λ)∝λm/2exp(−λ2∥Ax−b∥2),prior distribution:~{}p(x|δ)∝δn/2exp(−δ2x⊤Lx). (32)

Here the noise is assumed to be additive, Gaussian, and independent of the measured data, with variance , giving rise to the form of the likelihood function. On the other hand, the prior distribution is considered to be Gaussian with covariance matrix (matrix is referred to as the precision matrix). We choose where is given by Eq. (17) which corresponds to a spatial regularization measure. As it turns out, this choice of is closely related to the selection of prior according to a spatial Gaussian Markov random field which is common in tackling spatial inverse problems [3, 14].

As we pointed out, to completely specify the posterior distribution, we also need to choose prior distributions for the parameters and . These are often called hyperpriors. Following Ref. , we choose the priors for and

to be Gamma distributions, as

 {p(λ)∝λαλ−1exp(−βλλ),p(δ)∝δαδ−1exp(−βδδ). (33)

Such choice ensures that and are conjugate hyper-priors, and makes the design of sampling of posterior distributions more convenient. In the absence of knowledge of the values of and , one needs to choose the values of , , , and to ensure the distributions and to be “wide”, allowing the Markov chain to explore a potentially larger region of the parameter space. Following Ref. [2, 3, 14], we set and unless otherwise noted. We tested other choice of parameters as well and they mainly affect the length of the transient in the MCMC sampling process and do not seem to have a strong influence on the asymptotic outcome. Details of these will be discussed in Section 5 along with numerical examples.

### 4.3 Sampling from the Posterior Distribution

In the statistical inversion formalism, once the form of the posterior distribution is derived, the remaining part of the work is devoted to efficient sampling from the posterior distribution. Typically a Markov chain Monte Carlo (MCMC) sampling approach is adopted. The main idea is to generate a sequence of samples according to a prescribed Markov chain whose unique stationary distribution is the desired posterior distribution.

Note that under the Gaussian priors and gamma hyperpriors as we discussed above, the full conditional distributions that relate to the posterior distribution are given by

 (34)

In other words,

 ⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩x|λ,δ,b∼N((λA⊤A+δL)−1λA⊤b,(λA⊤A+δL)−1),λ|x,δ,b∼Γ(m/2+αλ,12∥Ax−b∥2+βλ),δ|x,λ,b∼Γ(n/2+αδ,12x⊤Lx+βδ). (35)

We adopt the block Gibbs sampler developed in Refs. [2, 17] as a specific MCMC procedure to sample the posterior distribution. In theory the sample distribution asymptotically converges to the true posterior distribution. The approach contains the following steps.

1. Initialize and , and set .

2. Sample .

3. Sample .

4. Sample .

Here the computational burden is mainly due to Step 1, which requires drawing samples from a multivariate Gaussian variable, which is equivalent to solving the following linear system at each iteration for :

 (λkA⊤A+δkL)xk=λkA⊤b+w, where w∼N(0,λkA⊤A+δkL). (36)

For large matrices, instead of a direct solve using Gauss elimination, an iterative method is usually preferred. Among the various notable iterative methods such as Jacobi, Gauss-Seidel (G-S), and conjugate gradient (CG) , we adopted the CG for all the numerical experiments as reported in this paper, with a starting vector of all zeros, maximum of iterations, and error tolerance of . Note that the computational bottleneck in solving Eq. (36) can in principle be tackled using more efficient methods, for example, by exploring history of solutions with similar parameters to provide “good” initial guess or by utilizing the Karhunen-Loève expansion to reduce the dimensionality of the problem.

## 5 Results: Bayesian Optical Flow from Statistical Inversion

### 5.1 Benchmark Flow Fields and Noisy Image Pairs

To benchmark the proposed statistical inversion approach for Bayesian optical flow, we consider qualitatively different flow fields that span a broad diversity of possibilities. The flow fields, which are all defined on the normalized spatial domain , are generally not divergence-free. In particular, the flow fields are defined as follows.

Flow field 1: .

Flow field 2: .

Flow field 3: .

Flow field 4: .

Flow field 5: .

We first consider synthetic image pairs, where the first image is constructed by the equation

 F(x,y)=12[cos(πx)cos(πy)+1]. (37)

Then, for each optical flow field, we generate the second image using the equation

 g=f−fxu−fyv+η, (38)

where and represent the vectorization of and , respectively,

denotes (multivariate) noise whose individual components are independently drawn from a Gaussian distribution with zero mean and fixed standard deviation

, and spatial derivatives are numerically implemented using the forward difference scheme. All synthetic images are constructed at the fixed resolution of -by- pixels, with uniform spacing in both directions, represented by a set of -by- matrices.

### 5.2 Bayesian Optical Flow with Uncertainty Quantification

For each image pair , we adopt the MCMC-Gibbs sampling procedure and corresponding choice of prior pdf and hyerpriors presented in Section 4 to obtain an empirical posterior distribution as the solution of the statistical inversion problem, which we refer to as Bayesian optical flow. Unlike classical optical flow which provides a point estimate, the Bayesian optical flow can be thought of as an ensemble of flow fields each associated with some probability. From such an ensemble and associated probability distribution (the posterior distribution), we can further extract useful information. For example, the mean flow field can be computed using the sampling mean of the posterior distribution, which is shown in Fig. 1 through Fig. 6 to compare with the true underlying flow field. In particular, in each figure, the top row (a1-a3) shows the image data of the first image (a1), and the second image generated from Eq. (38) with no noise (a2) and with noise under standard deviation (a3), respectively. The middle rows (b1-b3) show the true optical flow field (b1) compared with the inferred “mean” optical flow fields together with uncertainty quantification from the MCMC samples (b2-b3). The uncertainty regions are computed as follows. At each point , we construct a 2d normal pdf by using the sample mean and sample covariance estimated from the MCMC samples after discarding the initial transients. This allows us to obtain a “mean” optical flow at point defined as . Uncertainty is quantified by computing a confidence region that contains

probability mass of the fitted multivariate normal distribution given by



 (z−μ)⊤Σ−1(z−μ)≤χ22(q). (39)

Here denotes the

-th quantile of the Chi-squared distribution with two degrees of freedom, that is,

where is the cdf of . These confidence regions (shaded ellipses) are shown for the zoomed-in plots for the inferred optical flow fields. Finally, the last row (c1-c3) in each figure shows how the MCMC procedure produces a distribution of the effective regularization parameter . Panel (c1) shows the change of over time in the MCMC sampling procedure, indicating convergence to a stationary distribution typically after a quick initial transient. The remaining panels (c2) and (c3) show the distribution of after discarding the initial transient, for both the case of no noise (c2) and the case with noise (c3).

We point out a few observations from the numerical experiments. First, the estimated mean optical flow compare reasonably well with the true flow in all examples of the qualitatively different optical flow fields, supporting the utility of the proposed statistical inversion approach [see panels (b1-b3) in all figures]. Secondly, we again point out that the MCMC procedure used in our statistical inversion approach to optical flow does not require an active prior choice of the regularization parameter. The MCMC samples seems to quickly converge to a stationary distribution for the effective parameter [panel (c1) in all figures], from which the distributions of parameters and solutions can be determined. Finally, comparing to the noise free images, the estimation of optical flow becomes less accurate when noise is added. It is worth mentioning that the statistical inversion approach in fact allows us to “predict” this difference without knowing the ground-truth optical flow, by quantifying and comparing the uncertainty of solutions [panels (b2) and (b3) in all figures]. Note that although here we only show the Bayesian optical flow results based on Gaussian noise, we have also performed simulations using uniform noise and Laplace-distributed noise, and found that the results are quite similar to what is shown in Fig. 1 to Fig. 6.

### 5.3 Simulations on Real Images

Finally, we conduct numerical experiments for the computation of Bayesian optical flow from using real images together with the benchmark flow fields. In each example, we consider matrix defined by a real image from the Middlebury dataset (http://vision.middlebury.edu/stereo/, also see ), resized to a fixed size of -by- pixels in grayscale, with intensity normalized so that each pixel intensity is in the range . We consider a total of six such images, as shown in Fig. 6.

For each real image and a given flow field , we generate a resulting second image according to the flow equation (38), where the noise is taken to be a multivariate Gaussian distribution with zero mean and covariance matrix with . This choice of standard deviation ensures a non-negligible effect on the pixel intensities of the image pairs, since each . These “noisy second images” are shown in each of the first column of Fig. 7 to Fig. 12. For comparison, we also consider what we call a “true second image”, denoted as , which is defined by the same flow equation (38) but in the absence of noise. These true second images are shown in the second column in Fig. 7 through Fig. 12, for each one of the image example and flow field. Thus, each baseline image and flow field gives rise to a noisy image pair , and there is a total of 30 such pairs given the 6 real images and 5 flow fields.

For each noisy image pair , we use the same methodology with the same choice of priors and hyperpriors as in Sec. 5.2 to obtain a Bayesian optical flow from sampling the posterior distribution–in practice because of the randomness of the MCMC process, we observe that sometimes the sample distribution does not “converge” to a stationary distribution even after thousands of iterations; when this occurs we simply restart the MCMC from a different initial seed. Recall that different from a standard regularization approach which require careful choice of the regularization parameter, here such parameter is itself treated as a random variable that has its own prior which can be taken to be a “wide” distribution in the absence of additional knowledge. From the posterior distribution, we take the mean as an estimate of the average flow field. To test the usefulness of the reconstructed flow field, we use it together with the first image to obtain an estimated second image from equation (38), setting the noise term to be zero. The estimated second image is shown for each example in the third column of Fig. 7 through Fig. 12. Interestingly, the estimated seems to not only resemble the given noisy image , but even more similar to the noiseless second image . This observation is confirmed quantitatively, as shown in the last column of Fig. 7 through Fig. 12, where the values of are plotted against both those of and of . These results confirm the validity of Bayesian optical flow obtained by a statistical inversion approach, and, additional suggests that accurate reconstruction of optical flow can be potentially useful for image smoothing and denoising applications.

## 6 Discussion and Conclusions

In this paper we take a statistical inversion perspective to the optical flow inference problem. From this perspective all relevant variables in an otherwise standard inverse problem are treated as random variables, and the key is to form an efficient process to construct and sample the posterior distribution by utilizing knowledge about the form of model, noise, and other prior information. From a Bayesian perspective, the various priors and the forward model combine to produce a posterior distribution describing the propagation of prior information in context of the problem. We have shown that optical flow estimates given by traditional variational approaches such as the seminal work developed by Horn and Schunck  can in fact be interpreted in the statistical inversion framework under particular choices of models, noise, and priors. Thus we recap that there are major advantages over the classical variational calculus approach to inverse problems where by necessity the ill-posedness is dealt with by adding an ad hoc regularity term that hopefully agrees with expected physical interpretation. From a Bayesian perspective, the ill-posedness is dealt with naturally under the statistical inversion framework by restating as a well-posed extended problem in a larger space of probability distributions [11, 17]. This therefore naturally removes a key difficulty of having to choose exactly an appropriate regularity parameter encountered in classical methods. Instead, in contrast to classical optical flow methods which only yield single solutions as “point estimates”, the statistical inversion approach produces a distribution of points which can be sampled in terms of most appropriate estimators and also for uncertainty quantification. Specifically in the context of an optic flow problem, we expect a distribution of regularity parameters, and correspondingly a distribution of optical flow vectors at each pixel.

In this paper we focused on a statistical inversion formulation with the choice of priors inspired from the classical Horn-Schunck functional. Although such choices appear to be reasonable for the synthetic flow fields considered herein, real images and optical flow fields are much more complicated and, consequently, require additional efforts in forming the appropriate priors. Such priors can come from previous knowledge of images and flow fields from similar systems, taken under similar scenes, or from other physical measurements. In addition, different noise distributions should be considered, and these will require modifying the standard Horn-Schunck framework, which assumed rigid body motion and conservation of brightness. In particular, with regard to the regularization term, one example is to consider different -norms other than the standard -norm, and possibly with a kernel to weigh in additional information about the physical embedding of objects . Other data fidelity terms [5, 19, 6] can be formulated to describe the physics of the underlying application, for example for fluid and oceanographic problems where a stream function, or even a quasi-static approximation to assume such physics as corriolis can be used; or in scenarios where divergence-free flows are estimated by utilizing vorticity-velocity formalism or more general data assimilation tools [7, 13, 36]. Likewise, regularity in time and multiple time steps may be appropriate , as these correspondingly more complex formulations nonetheless come back to a linear inverse problem tenable in the framework of this paper. Specifically within the statistical inversion framework developed in the current paper, all of these could be recast so that the data fidelity term may be incorporated and should not require significant modification of the general framework as introduced here, which we plan in future work. Likewise, other numerical differentiation and integration schemes can be used as well in place of the simple forward difference used here.

## References

•  G. Aubert, R. Deriche and P. Kornprobst, Computing optical flow via variational techniques, SIAM J. Appl. Math., 60 (1999), 156–182.
•  J. M. Bardsley, MCMC-based image reconstruction with uncertainty quantification, SIAM J. Sci. Comput., 34 (2012), A1316–A1332.
•  J. M. Bardsley, Gaussian Markov random field priors for inverse problems, Inverse Problems and Imaging, 7 (2013), 397–416.
•  J. L. Barron, D. J. Fleet and S. S. Beauchemin, Performance of optical flow techniques, International Journal of Computer Vision, 12 (1994), 43–77.
•  R. Basnayake and E. M. Bollt, A Multi-Time Step Method to Compute Optical Flow with Scientific Priors for Observations of a Fluidic System, in Ergodic Theory, Open Dynamics, and Coherent Structures (eds. G. Froyland and W. Bahsoun), Springer, 2014.
•  R. Basnayake, A. Luttman and E. M. Bollt, A lagged diffusivity method for computing total variation regularized fluid flow, Contemp. Math, 586 (2013), 57–64.
•  D. Béréziat and I. Herlin, Solving ill-posed image processing problems using data assimilation, Numerical Algorithms, 56 (2011), 219–252.
•  L. Cavalier, Nonparametric statistical inverse problems, Inverse Problems, 24 (2008), 034004.
•  G. Chantas, T. Gkamas and C. Nikou, Variational-Bayes optical flow, J. Math. Imaging Vision, 50 (2014), 199–213.
•  I. Cohen and I. Herlin, Optical flow and phase portrait methods for environmental satellite image sequences, in Proc. Europ. Conf. Computer Vision, Cambridge, U.K., 1996, pp. 141–150.
•  A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari and D. B. Rubin, Bayesian Data Analysis, 3 edition, Chapman & Hall/CRC, 2014.
•  G. H. Golub and C. F. Van Loan, Matrix Computations, 4 edition, Johns Hopkins University Press, 2012.
•  I. Herlin and D. Béréziat, Divergence-Free Motion Estimation, European Conference on Computer Vision (ECCV 2012), 15–27.
•  D. Higdon, A primer on space-time modelling from a Bayesian perspective, in Los Alamos National Laboratory, Statistical Sciences Group, Technical Report, LA-UR-05-3097.
•  P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.
•  B. K. P. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, 17 (1981), 185–203.
•  J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Springer, 2005.
•  M. Lebrun, A. Buades and J. M. Morel, A nonlocal Bayesian image denoising algorithm, SIAM J. Imaging Sci., 6 (2013), 1665–1688.
•  A. Luttman, E. Bollt, R. Basnayake and S. Kramer, A stream function approach to optical flow with applications to fluid transport dynamics, Proc. Appl. Math. Mech., 11 (2011), 855–856.
•  A. Luttman, E. M. Bollt, R. Basnayake, S. Kramer and N. B. Tufillaro, A framework for estimating potential fluid flow from digital imagery, Chaos, 23 (2013), 033134.
•  F. A. O’Sullivan, A statistical perspective on ill-posed inverse problems, Statistical Science, 1 (1986), 502–527.
•  Y. Saad, Iterative methods for sparse linear systems, 2 edition, SIAM, 2003.
•  M. W. Seeger and H. Nickisch,

Large scale Bayesian inference and experimental design for sparse linear models,

SIAM J. Imaging Sci., 1 (2011), 166–199.
•  M. Siotani, Tolerance regions for a multivariate normal population, Annals of the Institute of Statistical Mathematics, 16 (1964), 135–153.
•  K. Souhila and A. Karim, Optical flow based robot obstacle avoidance, International Journal of Advanced Robotic Systems, 4 (2007), 13–16.
•  D. Sun, S. Roth and M. J. Black, Secrets of optical flow estimation and their principles,

Proc. IEEE Conf. Computer Vision and Pattern Recognition

, (2010).
•  J. Sun, F. J Quevedo and E. Bollt, Data Fusion Reconstruction of Spatially Embedded Complex Networks, arXiv preprint arXiv:1707.00731, (2017).
•  R. Szeliski, Computer Vision, Springer 2011.
•  L. Le Tarnec, F. Destrempes, G. Cloutier and D. Garcia, A proof of convergence of the Horn-Schunck optical flow algorithm in arbitrary dimension, SIAM J. Imaging Sci., 7 (2014), 277–293.
•  A. M. Thompson, J. C. Brown, J. W. Kay and D. M. Titterington, A study of methods of choosing the smoothing parameter in image restoration by regularization, IEEE Trans. Pattern Anal. Mach. Intell., 13 (1991), 326–339.
•  A. N. Tikhonov, Regularization of incorrectly posed problems, Soviet Mathematics Doklady, 4 (1963), 1624–1627.
•  A. N. Tikhonov and V. Arsenin, Solutions of Ill-Posed Problems, Wiley, New York, 1977.
•  A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov and A. G. Yagola, Numerical Methods for the Solution of Ill-Posed Problems, Kluwer Academic Publishers, 1990.
•  C. R. Vogel, Computational Methods for Inverse Problems, SIAM, Philadelphia, 2002.
•  J. Weickert and C. Schnörr, Variational optic flow computation with a spatio-temporal smoothness constraint, Journal of Mathematical Imaging and Vision, 14 (2001), 245–255.
•  S. Zhuk, T. Tchrakian, A. Akhriev, S. Lu and H. Hamann, Where computer vision can aid physics: dynamic cloud motion forecasting from satellite images, arXiv preprint arXiv:1710.00194, (2017).