1 Introduction
Inverse problems are ubiquitous in science and engineering. They are characterized by 1) the estimation of parameters in a mathematical model from measurements of the model output, and 2) a high-dimensional parameter space, typically resulting from the discretization of a function defined on a computational domain. For typical inverse problems, the problem of estimating model parameters from measurements is ill-posed, which motivates the use of regularization in the deterministic setting and the choice of a prior probability density in the Bayesian setting. In this paper, we consider linear models of the form

(1)    b = A x + ε,    ε ~ N(0, λ^{-1} I),

where b ∈ R^m is the vector of measurements, A ∈ R^{m×n} is the forward model matrix, x ∈ R^n is the vector of unknown parameters, and ε ~ N(0, λ^{-1} I) means that ε is an m-dimensional Gaussian random vector with mean 0 and covariance matrix λ^{-1} I, with I denoting the identity. The random vector b in (1) has probability density function

(2)    p(b | x, λ) ∝ λ^{m/2} exp( -(λ/2) ||A x - b||^2 ),

where ∝ denotes proportionality, and we include λ^{m/2} from the normalization constant because it will be needed in the hierarchical model described below. The maximizer of p(b | x, λ) with respect to x is known as the maximum likelihood estimator, and we denote it by x_ML. As stated above, due to ill-posedness, x_ML is unstable with respect to errors in b, i.e., small changes in b result in large relative changes in x_ML.
There are various methods to stabilize the solution of inverse problems, but they all involve some form of regularization. In this paper, we take the Bayesian approach [1], which requires the definition of a prior probability density function on x. We make the assumption that the prior is Gaussian of the form x ~ N(0, (δ L)^{-1}), which has probability density function

(3)    p(x | δ) ∝ δ^{n/2} exp( -(δ/2) x^T L x ),

where δ L is the precision (inverse-covariance) matrix, and we include δ^{n/2} from the normalization constant for the hierarchical model described below.
Now that we have defined the prior (3) and the likelihood (2), using Bayes' law, we multiply them together to obtain the posterior density function

(4)    p(x | b, λ, δ) ∝ p(b | x, λ) p(x | δ) ∝ exp( -(λ/2) ||A x - b||^2 - (δ/2) x^T L x ),

whose maximizer, x_MAP, is known as the maximum a posteriori (MAP) estimator. The MAP estimator can be equivalently expressed

x_MAP = arg min_x { (1/2) ||A x - b||^2 + (α/2) x^T L x },    α = δ/λ.
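As a concrete illustration, the MAP estimator solves the regularized normal equations (A^T A + α L) x = A^T b with α = δ/λ. A minimal numerical sketch (Python; the tiny matrices are illustrative stand-ins, not the deblurring operators used later in the paper):

```python
import numpy as np

def map_estimator(A, b, L, alpha):
    """Solve (A^T A + alpha * L) x = A^T b, the stationarity condition of
    (1/2)||Ax - b||^2 + (alpha/2) x^T L x, with alpha = delta / lambda."""
    return np.linalg.solve(A.T @ A + alpha * L, A.T @ b)

# tiny stand-in problem: A = I and L = I give x_MAP = b / (1 + alpha)
b = np.array([1.0, 2.0])
x = map_estimator(np.eye(2), b, np.eye(2), alpha=1.0)
```

With α = 1, A = I, and L = I the solve reduces to (2 I) x = b, so x = b/2, which is a quick way to check the routine.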
1.1 The Matérn Class of Covariance Matrices and Whittle–Matérn Priors
It remains to define the prior precision matrix L in (3), or equivalently, the covariance matrix C = L^{-1}. The Matérn class of covariance matrices has garnered much praise [2] for its flexibility in capturing many covariance structures and its allowance of direct control of the degree of correlation in the vector x [3]. The Matérn covariance matrix is defined by the Matérn covariance function, which was first formulated by Matérn in 1947 [4],

(5)    c(r) = σ^2 (2^{1-ν} / Γ(ν)) (κ r)^ν K_ν(κ r),

where r is the separation distance, K_ν is the modified Bessel function of the second kind of order ν [5], Γ is the gamma function, κ is the scale parameter, ν is the smoothness parameter, and σ^2 is the marginal variance. Omitting σ^2 gives the Matérn correlation function. In the isotropic case, when the covariance depends only on the distance between elements, given the covariance parameters κ and ν, one can obtain the covariance matrix C of a vector x with spatial positions s_1, …, s_n by letting

C_ij = c(||s_i - s_j||),

where c is defined by (5). The precision matrix L is then obtained by inverting C.
The parameters of the Matérn covariance function are not as straightforward to interpret as the parameters of some other covariance functions. When ν is small, the spatial process is said to be rough, and when it is large, the process is smooth [3, 6]. Figure 1 shows how the covariance function behaves for different values of ν and κ: in one panel κ is fixed and ν varies, while in the other ν is fixed and κ varies. Note that as ν increases, the behavior at small lags changes, leading to more correlation at smaller distances and a larger practical range, which is defined to be the distance at which the correlation is equal to 0.05. Meanwhile, as κ increases, or ν decreases, the decay rate of the covariance increases considerably. It is typical to think of κ as an inverse range parameter, since as κ decreases (increases), the practical range increases (decreases). However, for the Matérn covariance, the smoothness parameter ν also affects the practical range. In [7], the range approximation ℓ ≈ √(8ν)/κ is used.
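The behavior just described is easy to probe numerically. The sketch below (Python with SciPy; the function name and parameter values are ours) evaluates the Matérn covariance in the κ-parameterization of (5) and checks the well-known special case ν = 1/2, for which the correlation reduces to exp(-κr):

```python
import numpy as np
from scipy.special import kv, gamma

def matern_cov(r, sigma2, kappa, nu):
    """Matérn covariance (5): c(r) = sigma2 * 2^(1-nu)/Gamma(nu) * (kappa r)^nu K_nu(kappa r)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    c = np.full(r.shape, sigma2)          # c(0) = sigma^2, the marginal variance
    m = r > 0
    kr = kappa * r[m]
    c[m] = sigma2 * 2.0**(1 - nu) / gamma(nu) * kr**nu * kv(nu, kr)
    return c

# nu = 1/2 recovers the exponential covariance sigma2 * exp(-kappa r)
r = np.linspace(0.01, 1.0, 50)
err = np.max(np.abs(matern_cov(r, 1.0, 5.0, 0.5) - np.exp(-5.0 * r)))
```

Plotting `matern_cov` for several (κ, ν) pairs reproduces the qualitative behavior of Figure 1: larger ν lifts the correlation at small lags, while larger κ speeds up the decay.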
Despite the benefits of using the Matérn class of covariance matrices, its use is problematic for inverse problems because computing the precision matrix L = C^{-1}, which is what appears in the posterior (4), requires inverting a dense matrix. Fortunately, the Matérn covariance function has a direct connection to a class of elliptic SPDEs [7] whose numerical discretization yields sparse precision matrices that are computationally feasible to work with even when n is large. Connections of this type were first shown to exist by Whittle in [8], where he showed the connection held for a special case of the Matérn covariance class. Hence, priors that depend on this connection are often referred to as Whittle–Matérn priors. The connection between the general Matérn covariance function and SPDEs has been used in a wide range of applications for defining computationally feasible priors for high-dimensional problems [9, 10, 11]. Moreover, work has been done in establishing convergence theorems for, and lattice approximations of, these Whittle–Matérn priors [12].
The remainder of the paper is organized as follows. In Section 2, we describe in detail the connection between zero-mean Gaussian processes with the Matérn covariance operator and those that arise as solutions of a class of elliptic SPDEs. In Section 3, we show how to estimate the hyperparameters in the Whittle–Matérn prior using the semivariogram method, and then we show how to use this approach to define the prior when solving a Bayesian inverse problem. In Section 4, we instead assume hyperpriors on the hyperparameters and then use MCMC to sample from the resulting posterior distribution, which yields a distribution over – rather than point estimates of – the hyperparameters. For both approaches, we present numerical tests on one- and two-dimensional image deblurring test cases. We end with conclusions in Section 5.
2 Whittle–Matérn Class Priors via SPDEs
In this section, we will show that the Whittle–Matérn class of priors can be specified as the solution of the SPDE

(6)    (κ^2 - Δ)^β x(s) = W(s),    s ∈ R^d,

where Δ is the Laplacian operator in one or two dimensions (i.e., d = 1 or 2), and W(s) is spatial Gaussian white noise, which we define below. Although this connection has been shown to exist [8, 7, 10], here we provide a significantly more detailed derivation of this result than we have seen elsewhere.

2.1 Preliminary Definitions
Before deriving the solution of (6), we need some preliminary definitions.
2.1.1 Gaussian Fields
A stochastic process x(s), with s ∈ D ⊂ R^d, is a Gaussian field [13] if for any k and any locations s_1, …, s_k ∈ D, (x(s_1), …, x(s_k)) is a normally distributed random vector with mean (μ(s_1), …, μ(s_k)), where μ(s) = E[x(s)] and E denotes expected value, and covariance matrix C_ij = c(s_i, s_j), for i, j = 1, …, k. The covariance function is defined c(s, s') = E[(x(s) - μ(s))(x(s') - μ(s'))]. It is necessary that the covariance function is positive definite, i.e., for any k and any distinct s_1, …, s_k ∈ D, the covariance matrix defined above is positive definite. The Gaussian field is called stationary if the mean is constant and the covariance function satisfies c(s, s') = c(s - s'), and isotropic if c(s, s') = c(||s - s'||).

2.1.2 White Noise
The term white noise [14, 15] comes from light. White light is a homogeneous mix of wavelengths, as opposed to colored light, which is a heterogeneous mix of wavelengths. In a similar way, white noise contains a homogeneous mix of all the different basis functions. The mixing of these basis functions is determined by a random process. When this random process is Gaussian, we have Gaussian white noise. Consider a domain D ⊂ R^d and let {φ_i} be an orthonormal basis of L^2(D), where ⟨φ_i, φ_j⟩ = δ_ij. Then Gaussian white noise is defined by

W(s) = Σ_i ξ_i φ_i(s),    ξ_i ~ iid N(0, 1).

If we are dealing with spatial Gaussian white noise, then s refers to location. With this definition, it is clear that Gaussian white noise has mean zero: E[W(s)] = Σ_i E[ξ_i] φ_i(s) = 0. Moreover, one can show that E[W(s) W(s')] = δ(s - s'), where δ is the Dirac delta function [16]. A well-known and very important property of the Dirac delta function is that it satisfies the sifting property:

∫_D f(s') δ(s - s') ds' = f(s).
2.1.3 Green’s Functions
We now consider differential equations of the form A u = f on R^d, where A is a linear differential operator. A Green's function, G(s, s'), of A is any solution of A G(s, s') = δ(s - s') [17], where δ is the Dirac delta function. The Green's function can be used to solve A u = f; specifically,

A ∫ G(s, s') f(s') ds' = ∫ A G(s, s') f(s') ds' = ∫ δ(s - s') f(s') ds' = f(s),

where the first equality holds because A is linear and acts only upon the first argument, s, of G, and the last equality follows from the sifting property [18]. The solution of A u = f is therefore given by

(7)    u(s) = ∫ G(s, s') f(s') ds'.
2.2 The Gaussian Field Solution of the SPDE (6)
First, we derive the Green's function for (6), which is the solution of

(8)    (κ^2 - Δ)^β G(s, s') = δ(s - s').

Using (7), the solution to (6) is given by

(9)    x(s) = ∫ G(s, s') W(s') ds',

making x(s) a Gaussian field since it is a linear transformation of Gaussian white noise. To derive the Green's function G in (9), we first define u = s - s'. Then (8) implies

(10)    (κ^2 - Δ)^β G(u) = δ(u).

Taking the Fourier transform of both sides of (10) yields [19, 20]

(κ^2 + ||ω||^2)^β Ĝ(ω) = 1,

and thus

(11)    Ĝ(ω) = (κ^2 + ||ω||^2)^{-β},

which is the Fourier transform of the Green's function.
We can now compute the mean and covariance of the Gaussian field, x(s), defined by (9). Since the Green's function is a strictly positive, symmetric, and rapidly decaying function, we can apply Fubini's theorem [21] to obtain the mean of x(s):

E[x(s)] = E[ ∫ G(s, s') W(s') ds' ] = ∫ G(s, s') E[W(s')] ds' = 0.

Since x(s) has mean zero, the covariance is given by

c(s, s'') = E[x(s) x(s'')] = ∫∫ G(s, s') G(s'', u) E[W(s') W(u)] ds' du = ∫ G(s, s') G(s'', s') ds'.

If we define A = (κ^2 - Δ)^β, the previous result implies that, since A is linear and acts only upon the first argument of G, A_s c(s, s'') = ∫ δ(s - s') G(s'', s') ds' = G(s'', s), and applying A once more in s'' yields

(12)    A_s A_{s''} c(s, s'') = δ(s - s'').

Next, we assume stationarity so that the covariance only depends on the relative locations of the points, i.e., c(s, s'') = c(u) with u = s - s''. Then (12) can be expressed

(κ^2 - Δ)^{2β} c(u) = δ(u).

If we take the Fourier transform of both sides of this equation, and appeal to (11), we obtain

ĉ(ω) = (κ^2 + ||ω||^2)^{-2β}.

Since the Laplacian, Δ, is invariant under rotations and translations, we have radial symmetry, which is analogous to isotropy in the covariance. Thus we can let ρ = ||ω|| and r = ||u|| to obtain the equivalent expression

ĉ(ρ) = (κ^2 + ρ^2)^{-2β}.
To transform back to the original (r) space, we use the Hankel transform and its relationship to the radially symmetric Fourier transform, i.e.,

(13)    ĉ(ρ) = (2π)^{d/2} ρ^{1-d/2} ∫_0^∞ c(r) J_{d/2-1}(ρ r) r^{d/2} dr,

where c is the original (untransformed) covariance function and J_{d/2-1} is the Bessel function of the first kind of order d/2 - 1; see [22, Section 2] for a proof. Note that if we set f(r) = c(r) r^{d/2-1}, then (13) implies

(2π)^{-d/2} ρ^{d/2-1} ĉ(ρ) = ∫_0^∞ f(r) J_{d/2-1}(ρ r) r dr,

which is the Hankel transform of f [23]. Hence, using the inverse Hankel transform, we obtain

(14)    c(r) = (2π)^{-d/2} r^{1-d/2} ∫_0^∞ ĉ(ρ) J_{d/2-1}(r ρ) ρ^{d/2} dρ.

Plugging the expression ĉ(ρ) = (κ^2 + ρ^2)^{-2β} into (14) yields

c(r) = (2π)^{-d/2} r^{1-d/2} ∫_0^∞ (κ^2 + ρ^2)^{-2β} J_{d/2-1}(r ρ) ρ^{d/2} dρ.

Finally, using the integral identity [24, Eq. 20, p. 24, vol. II]

∫_0^∞ (ρ^2 + κ^2)^{-2β} J_μ(r ρ) ρ^{μ+1} dρ = ( r^{2β-1} κ^{μ+1-2β} / (2^{2β-1} Γ(2β)) ) K_{μ+1-2β}(κ r),

with μ = d/2 - 1, and some algebra, we obtain

(15)    c(r) = ( κ^{d-4β} / ( (4π)^{d/2} 2^{2β-d/2-1} Γ(2β) ) ) (κ r)^{2β-d/2} K_{2β-d/2}(κ r).

Using the fact that K_ν = K_{-ν}, and defining ν = 2β - d/2 with 2β > d/2, it can be shown that (15) is exactly the Matérn covariance function (5), with smoothness parameter ν and marginal variance

σ^2 = Γ(ν) / ( Γ(ν + d/2) (4π)^{d/2} κ^{2ν} ).

Thus, we have proved the following theorem.

Theorem 1. Let x(s) solve the SPDE (6) on R^d with exponent β = (ν + d/2)/2, ν > 0. Then x is a zero-mean Gaussian field whose covariance function is the Matérn covariance function (5), with smoothness parameter ν and marginal variance σ^2 = Γ(ν) / (Γ(ν + d/2) (4π)^{d/2} κ^{2ν}).
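As a sanity check on the derivation, consider d = 1 and ν = 1/2, so that 2β = 1. Then ĉ(ρ) = (κ^2 + ρ^2)^{-1}, and inverting the transform should return the exponential (ν = 1/2 Matérn) covariance e^{-κr}/(2κ), whose marginal variance 1/(2κ) agrees with the formula σ^2 = Γ(ν)/(Γ(ν + d/2)(4π)^{d/2} κ^{2ν}). A quick numerical check (Python with SciPy; in one dimension the radially symmetric inversion reduces to a cosine transform):

```python
import numpy as np
from scipy.integrate import quad

kappa, r = 3.0, 0.4
# c(r) = (1/pi) * int_0^inf cos(rho*r) / (kappa^2 + rho^2) d rho (1-D Fourier inversion)
c_num, _ = quad(lambda rho: 1.0 / (kappa**2 + rho**2),
                0, np.inf, weight="cos", wvar=r)
c_num /= np.pi
c_exact = np.exp(-kappa * r) / (2 * kappa)   # Matérn, nu = 1/2, sigma^2 = 1/(2 kappa)
```

The `weight="cos"` option makes `quad` use a Fourier-integral rule, which handles the oscillatory infinite-range integrand accurately.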
2.3 The Effect of a Finite Domain and Boundary Conditions
The proof of Theorem 1 above assumed that the domain was all of R^d. However, when solving inverse problems, x is restricted to a finite domain Ω. In such cases, boundary conditions must be assumed, and the equivalence between the Gaussian fields defined by the SPDE (6) and those defined by the Matérn covariance function no longer holds. To see this, consider the case where d = 1 and Ω = [0, 1] with Dirichlet (zero) boundary conditions, x(0) = x(1) = 0. Additionally, we assume ν = 3/2, so that the exponent β = (ν + d/2)/2 of the differential operator is equal to one, making the discretization straightforward. In this case, (6) simplifies to

(κ^2 - d^2/ds^2) x(s) = W(s),    s ∈ (0, 1).

Using a uniform mesh on [0, 1] with a step size of h, so that x = (x(s_1), …, x(s_n)), yields the numerical discretization

(κ^2 I + L) x = δ^{-1/2} w,    w ~ N(0, I),

where L is the standard discretization of -d^2/ds^2 on a uniform mesh [25], and δ is the scaling parameter for the prior. Then the probability density for x is given by

x ~ N( 0, δ^{-1} (κ^2 I + L)^{-2} ),

or equivalently,

(16)    p(x | δ) ∝ δ^{n/2} exp( -(δ/2) x^T (κ^2 I + L)^2 x ).
We generate 100,000 samples from (16) for each of three values of κ (including κ = 5 and κ = 10), calculate the empirical correlation between the samples, and compare this with the theoretical correlation defined by the Matérn covariance function. We plot the results in Figure 2, together with the Matérn correlation function. Recall that κ is inversely related to the degree of correlation in the prior, i.e., larger κ corresponds to lower correlation. The plots on the top in Figure 2 suggest that the larger the correlation is (and the smaller κ is), the more impact the boundary conditions have, which makes intuitive sense.
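The experiment just described can be reproduced in a few lines. The sketch below (Python; the mesh size, κ, and sample count are our illustrative choices, not the paper's) draws samples whose precision matrix is proportional to (κ^2 I + L)^2, as in (16), and compares the empirical correlation at the center of the domain, where boundary effects are weak for this κ, against the ν = 3/2 Matérn correlation (1 + κr)e^{-κr}:

```python
import numpy as np

n, kappa = 200, 20.0
h = 1.0 / n
rng = np.random.default_rng(1)

# L: standard Dirichlet discretization of -d^2/ds^2, scaled by 1/h^2
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
B = kappa**2 * np.eye(n) + L

# x = B^{-1} w has precision proportional to B^2, as in (16); the overall
# scaling (delta and the white-noise scaling) does not affect correlations
x = np.linalg.solve(B, rng.standard_normal((n, 20000)))

i = n // 2                              # a point far from the boundary
lags = np.arange(1, 6)
emp = np.array([np.corrcoef(x[i], x[i + k])[0, 1] for k in lags])
thy = (1 + kappa * lags * h) * np.exp(-kappa * lags * h)   # Matérn, nu = 3/2
```

For this fairly large κ the interior empirical correlations track the Matérn correlation closely; shrinking κ (or moving `i` toward the boundary) reproduces the boundary-induced mismatch seen in Figure 2.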
Fortunately, we can restore the connection between the Gaussian fields defined by the SPDE and by the Matérn covariance function, which is desirable because then the hyperparameters in the SPDE can be estimated using the semivariogram method described in Section 3. The restoration requires extending the computational domain: in one dimension, we replace [0, 1] by an extended domain [-a, 1 + a] with a > 0. We then generate realizations on the extended domain and compute the empirical correlation only for points in [0, 1]. The results are plotted on the bottom in Figure 2, where it is clear that the empirical correlations are nearly indistinguishable from those obtained using the Matérn correlation function for each value of κ. We note that as κ decreases, the extension necessary to preserve the connection rises sharply. However, a very small κ implies that correlation is likely to persist across most of the region (depending on the ν value), which rarely occurs in practice, making a moderate κ a reasonable assumption.

To determine a value of a that extends the domain far enough to restore the Matérn/SPDE connection, but not so far as to introduce unnecessary computational cost, we look to the Matérn correlation function itself. We want to extend the domain far enough that all values in [0, 1] are nearly uncorrelated with the values at the end of the extended domain. In tests, it was found that we should always extend the domain at least slightly. If we let r_0.1 denote the distance at which the Matérn correlation is approximately equal to 0.10, then our tests showed that choosing the extension as a small multiple of r_0.1, rounded up to the mesh using the ceiling function, restores the connection to the Matérn covariance for the larger values of κ; the smaller the value of κ, the larger the extension required, as shown in Figure 2.
The same results hold in two dimensions. Consider the case where d = 2 and Ω = [0, 1]^2 with Dirichlet (zero) boundary conditions, x(s) = 0 for s ∈ ∂Ω. In two dimensions, L in (16) is replaced by I ⊗ L_1 + L_1 ⊗ I, where ⊗ denotes the Kronecker product and L_1 is the one-dimensional matrix above, which corresponds to the standard discretization of the negative Laplacian for a uniform grid on [0, 1]^2. We now assume ν = 1, so the exponent β of the differential operator is again equal to one. Since we are assuming isotropy, we need only consider the distance between points when calculating the correlation. The distance, however, can extend as far as √2 in our domain since we are working in the unit square. Following the same procedure as in the one-dimensional case, we obtain the results shown in Figure 3. The plot on the left shows the disconnect between the true and empirical correlation when using the domain [0, 1]^2, and the plot on the right shows the reconnection on the extended domain.
For the above discussion, we focused on zero boundary conditions. Similar results hold if periodic boundary conditions are assumed, in which case κ^2 I + L can be diagonalized by the fast Fourier transform (FFT) [25], assuming that n is a power of 2. The FFT-based diagonalization can be exploited to greatly reduce computational cost; thus, when extending the domain in two dimensions, it is advantageous to use periodic boundary conditions on the extended domain so that the discretized operator can be diagonalized by the FFT.
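To illustrate why periodic boundary conditions are computationally attractive: with periodic conditions the discretized operator is circulant, and a circulant matrix is diagonalized by the discrete Fourier transform, so systems with κ^2 I + L can be solved in O(n log n). A one-dimensional sketch (Python; the sizes are illustrative):

```python
import numpy as np

n, kappa = 64, 10.0
h = 1.0 / n

# first column of the circulant (periodic) discretization of -d^2/ds^2
c = np.zeros(n)
c[0], c[1], c[-1] = 2.0 / h**2, -1.0 / h**2, -1.0 / h**2
Lmat = np.column_stack([np.roll(c, k) for k in range(n)])

eigs = np.fft.fft(c).real            # eigenvalues of a circulant matrix

rng = np.random.default_rng(0)
b = rng.standard_normal(n)

# FFT solve of (kappa^2 I + L) x = b in O(n log n)
x_fft = np.fft.ifft(np.fft.fft(b) / (kappa**2 + eigs)).real
# dense direct solve, for comparison
x_dir = np.linalg.solve(kappa**2 * np.eye(n) + Lmat, b)
```

The same idea extends to two dimensions with a block-circulant operator and the 2-D FFT, which is what makes the periodic extended-domain approach cheap.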
Finally, in our numerical experiment above, we chose specific values of ν, but other values of ν can be chosen. The general form of the prior density, with ν included as a hyperparameter, is

(17)    p(x | δ, κ, ν) ∝ δ^{n/2} exp( -(δ/2) x^T (κ^2 I + L)^{2β} x ),

where 2β = ν + d/2. If 2β is a noninteger, a fractional power of κ^2 I + L must be computed, which is possible, generally speaking, if we have a diagonalization of κ^2 I + L in hand, but the resulting precision matrix is not sparse. Such a diagonalization is typically computable in one-dimensional examples, even with dense matrices. In two dimensions, however, an efficient diagonalization is possible only if periodic boundary conditions are assumed. We will restrict the exponent 2β to be an integer in this paper to preserve the sparsity of the precision matrix.
2.4 Computing MAP Estimators for Whittle–Matérn Priors
Using Bayes' law, we multiply the prior (17) by the likelihood (2) to obtain the posterior density function

p(x | b, λ, δ) ∝ exp( -(λ/2) ||A x - b||^2 - (δ/2) x^T (κ^2 I + L)^{2β} x ).

The maximizer of p(x | b, λ, δ) is known as the MAP estimator, and it can be computed by solving

(18)    x_α = arg min_x { (1/2) ||A x - b||^2 + (α/2) x^T (κ^2 I + L)^{2β} x },

where α = δ/λ. Assuming we know κ and ν, α can be estimated using a regularization parameter selection method. We use generalized cross validation (GCV):

(19)    α_GCV = arg min_α  m ||A x_α - b||^2 / [ trace( I - A (A^T A + α (κ^2 I + L)^{2β})^{-1} A^T ) ]^2.

In practice, ν is often fixed [26, 9] and κ is estimated manually. This can be time consuming, subjective, and unintuitive, so we present a method for selecting these parameters next.
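For the simplest case, prior precision equal to the identity (the Tikhonov setting), the GCV function (19) can be evaluated cheaply from the SVD of A, since the filter factors are σ_i^2/(σ_i^2 + α). A sketch (Python; the random test matrix and the α grid are our illustrative choices):

```python
import numpy as np

def gcv(alpha, A, b):
    """GCV(alpha) = m ||b - A x_alpha||^2 / trace(I - A (A^T A + alpha I)^{-1} A^T)^2,
    evaluated via the SVD of A (prior precision taken as I here for simplicity)."""
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    beta = U.T @ b
    f = s**2 / (s**2 + alpha)                 # Tikhonov filter factors
    res2 = np.sum(((1 - f) * beta[:len(s)])**2) + np.sum(beta[len(s):]**2)
    return m * res2 / (m - np.sum(f))**2

# pick alpha by minimizing GCV over a logarithmic grid
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 20)) / np.sqrt(30)
x_true = rng.standard_normal(20)
b = A @ x_true + 0.01 * rng.standard_normal(30)
grid = np.logspace(-8, 2, 60)
alpha_gcv = grid[np.argmin([gcv(a, A, b) for a in grid])]
```

For a general prior precision (κ^2 I + L)^{2β}, the same formula applies after a change of variables (or using the generalized SVD), at the cost of sparsity.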
3 The Semivariogram Method for Estimating κ and ν
In the inverse problem formulation above, the components of the vector x correspond to values of an unknown function at numerical mesh points within a spatial region Ω. This motivates using methods from spatial statistics to estimate the Whittle–Matérn prior hyperparameters κ and ν. One such method uses a variogram, and a corresponding semivariogram [27], which requires the assumption of intrinsic stationarity, i.e., that the elements of x have constant mean and the variance of the difference between the elements is constant throughout the region. This is a weaker assumption than is required by many other parameter estimation tools, which is one of the reasons variograms have become popular in spatial statistical applications [28], and it is the reason we will use semivariograms here.
The semivariogram is defined by

γ(h) = (1/2) Var( x(s + h) - x(s) ),

where h is the spatial lag and x(s) is a spatial process. Due to our stationarity assumption, E[x(s + h) - x(s)] = 0, which we use to derive the following alternative expression for γ(h):

γ(h) = (1/2) E[ (x(s + h) - x(s))^2 ] = Var(x(s)) - Cov( x(s + h), x(s) ).

Thus, the semivariogram simplifies to the difference between the variance in the region and the covariance between two points separated by h. The variogram is formally defined as 2γ(h); hence the terms variogram and semivariogram are often used interchangeably. To remain consistent, we will continue to refer to γ(h) as a semivariogram throughout the paper.
We now need a way to estimate the semivariogram from given data. For this, we use what is known as the sample, or empirical, semivariogram. Assuming that the process is isotropic, so that γ depends only on the distance h = ||h||, the empirical semivariogram can be expressed

(20)    γ̂(h) = ( 1 / (2 N(h)) ) Σ_{(i,j) : ||s_i - s_j|| = h} ( x_i - x_j )^2,

where x is a realization of the spatial process, and N(h) is the number of pairs of points that are separated by a distance h. The γ̂(h) values are often referred to as the semivariance values. In a typical semivariogram, the semivariance values increase as h increases, since points tend to be less similar the further apart they are, which increases the variance of their differences.
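In practice the exact-distance condition in (20) is replaced by distance bins. A minimal one-dimensional sketch of the binned estimator (Python; the function name and binning are ours):

```python
import numpy as np

def empirical_semivariogram(z, coords, bin_edges):
    """Empirical semivariogram (20): for each distance bin, average
    0.5 * (z_i - z_j)^2 over all pairs whose separation falls in the bin."""
    d = np.abs(coords[:, None] - coords[None, :])     # pairwise 1-D distances
    dz2 = (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)                 # count each pair once
    d, dz2 = d[iu], dz2[iu]
    h, gam = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        m = (d >= lo) & (d < hi)
        if m.any():
            h.append(d[m].mean())
            gam.append(0.5 * dz2[m].mean())
    return np.array(h), np.array(gam)

# sanity check on a deterministic linear "field": pairs at lag k give 0.5 * k^2
coords = np.arange(10, dtype=float)
h, gam = empirical_semivariogram(coords, coords, np.array([0.5, 1.5, 2.5]))
```

For two-dimensional grids the same estimator applies with Euclidean pairwise distances; that is the omnidirectional semivariogram used in Section 3.1.2.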
Although the empirical semivariogram is useful in obtaining semivariance values from data, it is not ideal for modeling data for various reasons (see [28] for details); thus it is typical to fit a semivariogram model to the empirical semivariogram. Since our prior distribution for x has a Matérn covariance, we will use the theoretical Matérn semivariogram model [4, 2] given by

(21)    γ(h) = τ + (σ^2 - τ) ( 1 - ρ(h; κ, ν) ),    h > 0,

where τ is the nugget, σ^2 is the sill, and ρ(h; κ, ν) is the Matérn correlation function obtained by omitting σ^2 in (5). The nugget is the term given to the semivariance value at a distance just greater than zero, and the sill is the total variance contribution, or the semivariance value where the model levels out. The sill, σ^2, is also the variance parameter in the Matérn covariance function (5). We can estimate τ, σ^2, κ, and ν by fitting the semivariogram model to the empirical semivariogram.
There are a number of ways to fit the semivariogram model to the empirical semivariogram. We use weighted least squares, as is commonly done [28], choosing the parameter vector θ = (τ, σ^2, κ, ν) that minimizes

(22)    W(θ) = Σ_k N(h_k) [ ( γ̂(h_k) - γ(h_k; θ) ) / γ(h_k; θ) ]^2.

To minimize W(θ), we adapt the MATLAB codes from [29, 30]. More specifically, we adapt [29] for computing the empirical semivariance and we adapt [30] for minimizing W(θ). Although it is possible to optimize both κ and ν continuously, we will require the exponent 2β = ν + d/2 to be an integer.
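The paper's fit uses adapted MATLAB codes; the following is an independent sketch of the same weighted least squares criterion (22) in Python with SciPy, with ν fixed and the nugget convention of (21) as stated above. On noiseless synthetic semivariances it should recover the generating parameters:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import kv, gamma as gamma_fn

def matern_corr(h, kappa, nu):
    """Matérn correlation: 2^(1-nu)/Gamma(nu) * (kappa h)^nu K_nu(kappa h)."""
    kh = np.maximum(kappa * np.asarray(h, dtype=float), 1e-12)
    return 2.0**(1 - nu) / gamma_fn(nu) * kh**nu * kv(nu, kh)

def semivariogram_model(h, tau, sill, kappa, nu):
    # gamma(h) = tau + (sill - tau) * (1 - rho(h)); the nugget convention of (21)
    return tau + (sill - tau) * (1 - matern_corr(h, kappa, nu))

def fit_wls(h, gam_emp, counts, nu):
    """Cressie-style WLS (22): minimize sum_k N(h_k) ((emp_k - model_k)/model_k)^2
    over (tau, sill, kappa), with the smoothness nu held fixed."""
    def resid(p):
        tau, sill, kappa = p
        g = semivariogram_model(h, tau, sill, kappa, nu)
        return np.sqrt(counts) * (gam_emp - g) / g
    out = least_squares(resid, x0=[0.01, 1.0, 5.0],
                        bounds=([0.0, 1e-6, 1e-3], [np.inf, np.inf, np.inf]))
    return out.x

# noiseless synthetic check: semivariances generated with (tau, sill, kappa) = (0.05, 1.2, 10)
h = np.linspace(0.02, 0.3, 15)
gam = semivariogram_model(h, 0.05, 1.2, 10.0, 1.5)
tau_hat, sill_hat, kappa_hat = fit_wls(h, gam, np.full(15, 100.0), nu=1.5)
```

In the paper's method the fit is repeated over the admissible discrete ν values, keeping the one with the smallest weighted residual.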
For an illustration, we look to the one-dimensional deblurring example in Section 2.3.2 of [25] and construct the empirical semivariogram (20) from the data b. Fitting the model yields optimized values of κ and ν, the corresponding practical range, and estimates of the sill and nugget. A plot of the resulting fitted Matérn semivariogram model together with the empirical semivariogram is given in Figure 4.

The values of κ and ν obtained by fitting the Matérn semivariogram model to b, as described in the previous paragraph, can be used to define the Whittle–Matérn prior (17). The sill, σ^2, and the nugget, τ, are not especially useful outside of fitting the semivariogram model because they do not correspond to any parameter in (17); they are helpful only in determining the best estimates for κ and ν. Any contribution these parameters may have made to the prior distribution will be accounted for in the regularization parameter, α. Therefore, after fitting the semivariogram models, σ^2 and τ are discarded.

With estimates for κ and ν in hand, the MAP estimator, x_α, can then be computed as in Section 2.4, from which we can re-estimate κ and ν by fitting the Matérn semivariogram model to the empirical semivariogram values of x_α. Repeating this process iteratively yields the following algorithm.
The Semivariogram Method for MAP Estimation with Whittle–Matérn Prior:

0. Estimate κ and ν by fitting a Matérn semivariogram model to the data b.

1. Compute the MAP estimator x_α as in Section 2.4, with α chosen by GCV (19).

2. Update κ and ν by fitting a Matérn semivariogram model to x_α.

3. Return to step 1 and repeat until κ and ν stabilize.
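The steps above can be sketched end to end. The following is a structural sketch only, not the paper's implementation: the forward operator is the identity (denoising rather than deblurring), GCV is replaced by a fixed α, ν is held at 3/2, and the WLS fit is replaced by a simple grid search for κ with zero nugget. All numerical values are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# tiny 1-D "denoising" stand-in for the iteration structure (A = I)
n = 200
h = 1.0 / n
s = np.arange(n) * h
x_true = np.sin(2 * np.pi * s) + 0.5 * np.sin(6 * np.pi * s)
b = x_true + 0.05 * rng.standard_normal(n)

Lmat = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
        - np.diag(np.ones(n - 1), -1)) / h**2

def emp_semivariogram(z, max_lag):
    lags = np.arange(1, max_lag + 1)
    return lags * h, np.array([0.5 * np.mean((z[k:] - z[:-k])**2) for k in lags])

def fit_kappa(z, max_lag=30):
    """Steps 0/2 stand-in: grid-search kappa against the nu = 3/2 Matérn
    semivariogram model with zero nugget (a stand-in for the WLS fit)."""
    hh, g = emp_semivariogram(z, max_lag)
    sill = np.var(z)
    grid = np.linspace(1.0, 100.0, 400)
    def sse(kappa):
        rho = (1 + kappa * hh) * np.exp(-kappa * hh)   # Matérn correlation, nu = 3/2
        return np.sum((g - sill * (1 - rho))**2)
    return grid[np.argmin([sse(k) for k in grid])]

def map_step(b, kappa, alpha):
    """Step 1 stand-in: MAP estimate with prior precision (kappa^2 I + L)^2."""
    P = kappa**2 * np.eye(n) + Lmat
    P = P @ P                                          # exponent 2*beta = 2
    return np.linalg.solve(np.eye(n) + alpha * P, b)

kappa = fit_kappa(b)                                   # step 0
for _ in range(3):                                     # steps 1-3
    x_map = map_step(b, kappa, alpha=1e-9)
    kappa = fit_kappa(x_map)
```

The point of the sketch is the alternation: each MAP estimate gives a smoother field, whose refit κ then feeds the next MAP solve, mirroring steps 1-3 of the algorithm.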
3.1 Numerical Experiments
We now implement the semivariogram method on deblurring examples in both one and two dimensions. Recall that the connection between the Matérn covariance and the Whittle–Matérn prior depends on a stationarity assumption, which the following examples may not exhibit. For simplicity, we will still assume stationarity, and we acknowledge that future work should be done for the case when stationarity or isotropy is absent.
3.1.1 One-Dimensional Results
When implementing the semivariogram method for the deblurring problem in Section 2.3.2 of [25], we fit the semivariograms using 15 approximately equally spaced distances up to a maximum distance of nearly 30% of the way across the region, yielding the 15 semivariance values that we fit using the Matérn semivariogram model. We chose this cutoff because it balances the need to capture the covariance structure at short distances, which are well known to be the most important [28], with those at longer distances.

We implement three iterations of the semivariogram method, obtaining estimates of the Whittle–Matérn parameters κ and ν and of the regularization parameter α. A plot of the resulting x_α is given in Figure 5. It is plotted together with the true image on the left and with the Tikhonov reconstruction on the right. The relative error between the true image and the reconstruction is smaller for the Whittle–Matérn prior than for the Tikhonov reconstruction. When using periodic boundary conditions instead of zero boundary conditions, we obtain nearly identical parameter estimates and relative error.

In addition to giving a good estimate for x, the semivariogram method has another advantage: fitting a semivariogram to b in step 0 yields estimates of κ and ν that we use to determine how far we need to extend our domain in order to maintain the connection between the SPDE solution and the Matérn covariance. In the above example, the parameter values computed in step 0 suggested that only a slight extension of the computational domain was needed in order to maintain the connection.
3.1.2 Two-Dimensional Results
We now consider a two-dimensional image deblurring test case similar to that in [25, Section 3.1.3]. We assume periodic boundary conditions on the extended domain, but due to the restriction from the extended domain to the original domain, circulant structure is lost in the forward model matrix, and hence linear system solves must be done using an iterative method. As in [25, Section 3.1.3], we use preconditioned conjugate gradient (PCG) iteration, both for computing α using GCV and for computing x_α. We attempt to deblur a 128×128 image of Main Hall on the University of Montana (UM) campus. To do this, we begin with a 256×256 image, given in Figure 6, and then restrict to the center 128×128 image. This smaller image in the middle will be thought of as being on the original domain, and the larger, full image will then be defined on the extended domain.

To obtain b, we first perform the blurring operation on the full 256×256 true image plotted in Figure 6. Since this is a color image, the deblurring process is done individually for the red, green, and blue intensity arrays. We then restrict to the central 128×128 pixels (with boundaries denoted in Figure 6) to obtain the blurred image plotted on the left in Figure 7. We seek an estimate of x in the same central subregion. Omnidirectional semivariograms with 25 approximately equally spaced grid points are used: we add ten additional grid points, and we reduce the maximum distance considered from one-fifth to one-tenth of the maximum, because there are many more pairs of points in the two-dimensional case. The semivariogram method is used to obtain estimates of κ and ν for each color band, together with a regularization parameter α for each of the red, green, and blue intensities. We also compute the Tikhonov solution, as defined in [25, Section 3.1.3], with α chosen by GCV. The two solutions are plotted in Figure 7, where it appears that the solution with the Whittle–Matérn prior, plotted on the right, is better able to reconstruct the true image than is the Tikhonov solution, plotted in the middle.
Table 1:

                 True Image   Identity Covariance   Whittle–Matérn Covariance
  mean           0.418        0.419                 0.418
  s              0.216        0.307                 0.224
  Min            0.000
  q1             0.243        0.199                 0.244
  Median         0.455        0.396                 0.429
  q3             0.545        0.623                 0.562
  Max            1.00         1.79                  1.17
  correlation                 0.705                 0.950
  Residual MAE                0.169                 0.055
  Residual MSE                0.047                 0.005
A more objective (nonvisual) comparison of quality for the two reconstructions is presented in Table 1, which contains the following statistics calculated for the elements of the color image: the mean, the standard deviation (s), the first and third quartiles (q1 and q3), the correlation between the true and estimated values, the maximum, minimum, and median, and finally, the mean absolute error (MAE) and the mean squared error (MSE) of the residuals. From the table, it is clear that the distribution of color intensities when using the Whittle–Matérn prior is much more accurate than when using the identity covariance with the Tikhonov solution. Note, specifically, that the median and the first and third quartiles line up well with the true percentiles. Additionally, the variance is lower and the correlation between true and estimated values is much higher for the Whittle–Matérn solution. The MAE and MSE of the residuals are also much lower for this solution. Therefore, all evidence points to the conclusion that the Whittle–Matérn prior, with κ and ν estimated using the semivariogram method, yields a better solution than the Tikhonov solution.

In both of the previous examples, it is probable that both the Tikhonov and Whittle–Matérn reconstructions could be improved with a more optimal value of α. We did not fine-tune the selection procedure and simply used the value chosen by GCV.
4 Hierarchical Modelling and Markov chain Monte Carlo
In the previous examples, we calculated point estimates of κ and ν using semivariograms and of α using GCV. Although these estimates are useful by themselves, an approach that allows for the quantification of uncertainty in the parameters is desirable.

Sampling x and the hyperparameters simultaneously with a fully Bayesian approach can be done using the Metropolis algorithm or, as is suggested in [31], via slice sampling. The problem is that these variables, especially δ and κ, are so highly correlated that the sampling can be wildly inefficient [32] without careful selection of the prior distributions and considerable adjusting of the proposal distributions [26]. In our tests in one dimension, the effective sample size (ESS) [25] was more than 100 times smaller than the number of samples generated.

Due to the inefficiency of sampling from this posterior, we will fix ν, as is commonly done [26, 9], setting it equal to the estimated value determined by the semivariogram method. Since ν, d, and hence the exponent 2β = ν + d/2 are fixed in these examples, we will simplify notation and refer to the precision matrix (κ^2 I + L)^{2β} as P(κ). We now focus on quantifying the uncertainty in λ, δ, and κ. To this end, we employ Markov chain Monte Carlo (MCMC) methods to obtain samples from the posterior.
First, we assume λ and δ are random variables with Gamma-distributed hyperpriors, as in [25]:

p(λ) ∝ λ^{α_λ - 1} exp( -β_λ λ ),    p(δ) ∝ δ^{α_δ - 1} exp( -β_δ δ ),

where α_λ, β_λ, α_δ, and β_δ are fixed. Gamma hyperpriors lead to Gamma conditional posterior densities for λ and δ (a property known as conjugacy), which can be sampled from directly. Since no such conjugacy exists for κ, we assign it a uniform hyperprior, κ ~ U(0, κ_max), where κ_max is a selected upper bound for κ. We assign the uniform hyperprior in an effort to be as data-driven as possible. By Bayes' law, the posterior density function is given by

p(x, λ, δ, κ | b) ∝ λ^{m/2 + α_λ - 1} δ^{n/2 + α_δ - 1} det( P(κ) )^{1/2} exp( -(λ/2) ||A x - b||^2 - (δ/2) x^T P(κ) x - β_λ λ - β_δ δ ),

where 0 < κ < κ_max. It can be shown that the conditional probability distributions for λ and δ are given by

λ | x, δ, κ, b ~ Gamma( m/2 + α_λ, (1/2) ||A x - b||^2 + β_λ ),
δ | x, λ, κ, b ~ Gamma( n/2 + α_δ, (1/2) x^T P(κ) x + β_δ ),

and the probability density for κ is

p(κ | x, λ, δ, b) ∝ det( P(κ) )^{1/2} exp( -(δ/2) x^T P(κ) x ),    0 < κ < κ_max.
Since no conjugacy exists for κ, we will use the Metropolis–Hastings algorithm to sample from p(κ | x, λ, δ, b). For the proposal density, we use a lognormal proposal, i.e.,

(23)    ln κ* = ln κ + σ_p ξ,    ξ ~ N(0, 1),

where the proposal standard deviation σ_p is chosen offline. A proposed sample κ* is then accepted with probability

a = min{ 1, [ p(κ* | x, λ, δ, b) κ* ] / [ p(κ | x, λ, δ, b) κ ] },

where the factor κ*/κ accounts for the asymmetry of the lognormal proposal. That is, we set the next state equal to κ* with probability a, and otherwise we keep the current value of κ.
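A runnable Metropolis-within-Gibbs sketch of the sampler just described, on a tiny stand-in problem (A = I denoising; the hyperprior values, initial state, proposal scale σ_p, and κ_max are our illustrative choices, not the paper's). The Gamma conditionals are sampled directly, and κ is updated with the lognormal proposal (23), including the κ*/κ factor in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(4)

# tiny 1-D denoising model: b = x + noise, prior precision delta * P(kappa),
# with P(kappa) = (kappa^2 I + L)^2 (nu = 3/2, d = 1)
n = 50
h = 1.0 / n
Lm = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
      - np.diag(np.ones(n - 1), -1)) / h**2
A = np.eye(n)
b = np.sin(2 * np.pi * np.arange(n) * h) + 0.1 * rng.standard_normal(n)

a_lam = a_del = 1.0
b_lam = b_del = 1e-4                  # Gamma hyperprior parameters (assumed values)
kappa_max, sigma_p = 100.0, 0.3       # uniform bound on kappa and proposal scale

def P(kappa):
    B = kappa**2 * np.eye(n) + Lm
    return B @ B

def log_p_kappa(kappa, x, delta):
    if not (0.0 < kappa < kappa_max):
        return -np.inf
    B = kappa**2 * np.eye(n) + Lm
    # log det(P(kappa))^{1/2} = log det(kappa^2 I + L)
    return np.linalg.slogdet(B)[1] - 0.5 * delta * x @ P(kappa) @ x

lam, delta, kappa = 100.0, 1.0, 20.0
samples = []
for it in range(200):
    # x | lam, delta, kappa, b ~ N(Q^{-1} lam A^T b, Q^{-1}), Q = lam A^T A + delta P
    Q = lam * A.T @ A + delta * P(kappa)
    R = np.linalg.cholesky(Q)
    x = np.linalg.solve(Q, lam * A.T @ b) + np.linalg.solve(R.T, rng.standard_normal(n))
    # conjugate Gamma updates (numpy's gamma takes shape and scale = 1/rate)
    lam = rng.gamma(n / 2 + a_lam, 1.0 / (0.5 * np.sum((A @ x - b)**2) + b_lam))
    delta = rng.gamma(n / 2 + a_del, 1.0 / (0.5 * x @ P(kappa) @ x + b_del))
    # Metropolis-Hastings step for kappa with the lognormal proposal (23)
    kappa_star = np.exp(np.log(kappa) + sigma_p * rng.standard_normal())
    log_a = (log_p_kappa(kappa_star, x, delta) - log_p_kappa(kappa, x, delta)
             + np.log(kappa_star) - np.log(kappa))     # lognormal asymmetry
    if np.log(rng.uniform()) < log_a:
        kappa = kappa_star
    samples.append((lam, delta, kappa))

samples = np.array(samples)
```

In a real run one would discard burn-in, monitor the κ acceptance rate to tune σ_p, and compute effective sample sizes; for large n the Cholesky-based x-update would be replaced by the sparse or FFT-based solvers discussed in Section 2.3.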