1 Motivation and Introduction
Inverse problems involve the use of experimental data or measurements to make inferences about parameters (e.g., initial or boundary conditions) governing the model. Inverse problems have a wide range of applications, such as in materials science, geophysics, and medicine. The Bayesian approach to inverse problems is prevalent because of its ability to incorporate prior knowledge of the parameters and to quantify the uncertainty associated with the parameter reconstructions. However, many computational challenges persist in the implementation of the Bayesian approach to large-scale inverse problems.
Stuart [30] advocated the use of Gaussian priors within Bayesian inverse problems with covariance operators of the form $\mathcal{C} = \mathcal{A}^{-\alpha}$, where $\mathcal{A}$ is an elliptic differential operator, the exponent $\alpha > d/2$, and $d$ is the spatial dimension. This choice ensures that the covariance operator is trace-class and, under appropriate conditions on the likelihood, the resulting inverse problem is well-posed. A computational framework for infinite-dimensional Bayesian inverse problems based on Stuart's framework was developed in [11]. For computational convenience, the authors used the covariance operator $\mathcal{C} = \mathcal{A}^{-2}$, since the exponent $\alpha = 2$ satisfies the requirement $\alpha > d/2$ for up to three spatial dimensions. This choice has two advantages: first, the exponent is integer-valued, and second, it is easy to obtain a factored form of the covariance operator. For these reasons, this approach is attractive from a computational standpoint; it has since become popular and has resulted in scalable software implementations [31, 21]. A similar approach can also be found in [25], in which the authors also used the Whittle–Matérn priors for Bayesian inverse problems. However, the choice of $\alpha = 2$ for computational convenience makes it restrictive in practice, and in many instances, it can lead to oversmoothing of the reconstructions. It is, therefore, desirable to have an efficient computational method for handling arbitrary exponents $\alpha > d/2$, which allows the user to choose the appropriate prior covariance based on prior knowledge or to estimate it from data in a hierarchical Bayesian setting.
The use of Whittle–Matérn priors in Bayesian inverse problems has close connections with developments in Gaussian random fields in spatial and computational statistics. In [24], the authors considered the stochastic partial differential equation (SPDE) approach to Gaussian random fields. They showed an explicit connection between Gaussian and Gauss–Markov random fields based on the Matérn covariance family and provided explicit expressions for the precision operator for integer values of the exponent. Based on this connection, they provided extensions beyond the Matérn model and showed how the SPDE approach can be applied on manifolds and for nonstationary random fields. In a review paper [23], 10 years after the publication of [24], the authors trace the recent developments in the SPDE approach to Gaussian and non-Gaussian random fields. Recent work (e.g., [7, 8]) allows for non-integer values of the exponent; see the discussion below.
Goals and Main Contributions
In this paper, we develop an efficient method for representing and computing with Whittle–Matérn Gaussian priors in which the covariance operator is of the form $\mathcal{C} = \mathcal{A}^{-\alpha}$ with a possibly fractional exponent $\alpha > d/2$. Furthermore, we incorporate this efficient representation of covariance operators into the solution of Bayesian inverse problems. The novel and noteworthy features of our contributions are:

We consider a quadrature-based approach for the approximation of the covariance operator for fractional powers. The resulting approximation has the form of a rational function. The action of the covariance operator on a vector can be written as a sequence of shifted linear systems, which we accelerate using Krylov subspace methods. As a result, we have a method for efficiently applying the covariance operator for any admissible exponent $\alpha$. The method is scalable to large problems, and the cost is nearly the same for values of the exponent between consecutive integers. This approach is applicable for stationary (including isotropic and anisotropic) and nonstationary random fields.

We leverage the fast method for covariance matrices to efficiently generate samples in two different ways: by solving the underlying SPDE that defines the prior covariance and by efficiently computing the truncated Karhunen–Loève approximation.

We show how to use this efficient representation of the prior covariance operator in an infinite-dimensional Bayesian problem formulation. We perform a careful discretization of all the relevant quantities and leverage the fast application of the covariance operator to compute the maximum a posteriori (MAP) estimate and the approximate posterior variance.
Through numerical experiments, we demonstrate the computational performance of the solvers, including a X speedup over naïve approaches. We also demonstrate the feasibility of the approach on model and real-data problems from seismic and X-ray tomography, and on an application to a 2D PDE-based inverse diffusion problem.
Related Work
As mentioned earlier, in Bayesian inverse problems the exponent is typically chosen to be an integer. To our knowledge, efficient computation with covariance operators for non-integer values of the exponent remains an outstanding challenge in Bayesian inverse problems. The fractional Laplacian has been used as a regularization operator in a non-Bayesian setting [1, 2]. Notice that [1] uses the Fourier approach, which is limited to periodic settings, and [2] uses an eigendecomposition of the Laplacian that is not scalable to large problems.
There are some techniques in computational and spatial statistics that are relevant to this discussion; however, for a detailed review, see [23]. In [28], a method was proposed for sampling from generalized Matérn fields on compact Riemannian manifolds using the so-called matrix transfer technique. In this method, after discretizing the differential operator, the samples are obtained by applying a fractional matrix function to a random vector. The application of the matrix function is accomplished using contour integral and Krylov subspace methods similar to ours, but our approach differs in how we approximate the fractional operator and also in the choice of Krylov subspace methods. Recent work in [7, 8] develops efficient methods for sampling from the SPDE for certain fractional exponents; these references also use the same quadrature scheme for the integral representation of the fractional inverse as ours. However, our approach generalizes it to all admissible exponents, and the use of Krylov subspace solvers substantially speeds up the solution of the SPDE. In a follow-up work [6], the authors developed the rational SPDE approach, in which they use a rational approximation of the covariance operator by a ratio of polynomials. When the degrees of the polynomials are small, the discretized representations retain the computational benefits of the integer versions. However, the error analysis in [6, Remark 3.4] suggests that, in order to balance the error due to the spatial discretization with the error in the rational approximation, the degrees of the polynomials may need to be large, which leads to a loss in computational efficiency. The paper [18]
also works in a setting similar to ours on bounded domains but solves the linear systems in parallel using multilevel preconditioned iterative solvers with linear complexity in the degrees of freedom; additionally, their approach is also applicable to compact metric spaces. Also relevant to this discussion are
[22, 20], which deal with sampling from random fields on compact Riemannian manifolds. Our efficient approach for representing covariance matrices and sampling from the random field may be applicable to that setting as well. To our knowledge, these techniques have not been used within the context of Bayesian inverse problems, which is the primary focus of this paper.
2 Background
In Section 2.1, we review the SPDE approach to Gaussian random fields, which forms the basis for the prior distributions in Bayesian inverse problems. In Section 2.2, we review the appropriate definitions of fractional elliptic operators, and we show how to discretize them in Section 2.3.
2.1 SPDE approach to Whittle–Matérn Gaussian Random Fields
We follow the Whittle–Matérn random field approach, which models the random fields as stationary Gaussian random fields with the covariance kernel
$$c(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} (\kappa r)^{\nu} K_{\nu}(\kappa r),$$
where $\nu > 0$ is the smoothness parameter, whose value controls the mean-square differentiability of the process, $K_{\nu}$ is the modified Bessel function of the second kind of order $\nu$, and $\kappa > 0$ is a scaling parameter that governs the length scale. When $\nu = 1/2$, this corresponds to the exponential covariance function, and as $\nu \to \infty$, it approaches the Gaussian kernel. On $\mathbb{R}^d$, samples from the random field with the covariance function $c$ correspond to the solution of the stochastic partial differential equation (SPDE)
(1) $(\kappa^2 - \Delta)^{\alpha/2}\, u = \mathcal{W} \quad \text{in } \mathbb{R}^d, \qquad \alpha = \nu + d/2,$
where $\mathcal{W}$
is a spatial Gaussian white noise process with unit variance on
$\mathbb{R}^d$, and the marginal variance is $\sigma^2 = \Gamma(\nu) / \big( \Gamma(\nu + d/2) (4\pi)^{d/2} \kappa^{2\nu} \big)$. However, to implement this on a bounded region $\mathcal{D} \subset \mathbb{R}^d$, we need to impose additional boundary conditions. The choice of the boundary conditions may affect the correlation near the boundaries. To mitigate this effect, one may solve the SPDE on a slightly larger domain that encloses $\mathcal{D}$; other approaches have been proposed in [25, 15]. The covariance operator corresponding to the covariance kernel is $\mathcal{C} = (\kappa^2 - \Delta)^{-\alpha}$. In this paper, we work with covariance operators of the form
(2) $\mathcal{C} = \mathcal{A}^{-\alpha}, \qquad \mathcal{A} = \kappa^2 - \Delta, \qquad \alpha > d/2,$
and with corresponding zero Neumann boundary conditions. The condition $\alpha > d/2$ ensures that the covariance operator is a self-adjoint, trace-class operator [30].
Furthermore, since it is more convenient to work with integer values of the exponent, the constraint $\alpha > d/2$ forces us to choose $\alpha \geq 1$ for one spatial dimension and $\alpha \geq 2$ for two or three spatial dimensions. However, in the context of Bayesian inverse problems, this choice of $\alpha$ can result in oversmoothing of the solution and poor edge-preserving behavior. Furthermore, it is common to use smoothness parameters such as $\nu = 1/2$ or $\nu = 3/2$; for example, $\nu = 1/2$ corresponds to the well-known exponential covariance kernel. However, when $\nu = 1/2$ and $d = 2$, the exponent $\alpha = \nu + d/2 = 3/2$ is fractional, which cannot be tackled using standard approaches. It is desirable to develop a numerical approach that can exploit the full power of these parameterized priors, without the constraint of integer powers. This motivates the use of fractional operators, which we discuss in Section 2.2.
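As a concrete reference for the limiting cases mentioned above, the following sketch evaluates the Matérn covariance kernel and checks that $\nu = 1/2$ recovers the exponential kernel. The parameter names (`nu`, `kappa`, `sigma2`) are my own, and the marginal variance is simply normalized to `sigma2`; this is an illustrative evaluation, not the paper's implementation.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(r, nu, kappa, sigma2=1.0):
    """Matern kernel c(r) = sigma2 * 2^(1-nu)/Gamma(nu) * (kappa r)^nu * K_nu(kappa r).

    nu: smoothness, kappa: inverse length scale, sigma2: marginal variance.
    """
    r = np.asarray(r, dtype=float)
    rr = np.where(r == 0.0, 1.0, r)  # placeholder to avoid 0 * inf at the origin
    c = sigma2 * 2.0 ** (1.0 - nu) / gamma(nu) * (kappa * rr) ** nu * kv(nu, kappa * rr)
    return np.where(r == 0.0, sigma2, c)

# nu = 1/2 recovers the exponential kernel sigma2 * exp(-kappa r)
r = np.linspace(0.01, 2.0, 50)
assert np.allclose(matern_cov(r, 0.5, 2.0), np.exp(-2.0 * r), rtol=1e-10)
```

For large `nu` (e.g., `nu = 50`) the kernel visibly approaches the Gaussian shape, matching the limiting behavior described above.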
Following [24, Section 3.2] and [23, Section 2.6], it is also straightforward to model nonstationary effects by considering the SPDE
(3) $\big( \kappa^2(x) - \nabla \cdot (\Theta(x) \nabla) \big)^{\alpha/2}\, \big( \tau(x)\, u(x) \big) = \mathcal{W}(x),$
where $\kappa$, $\tau$, and $\Theta$ are now spatially varying functions and $\mathcal{W}$ is a spatial Gaussian white noise process with unit variance. Assuming that $\alpha > d/2$, the corresponding covariance operator (with zero Neumann boundary conditions) is
(4) $\mathcal{C} = \mathcal{A}^{-\alpha}, \qquad \mathcal{A} = \kappa^2(x) - \nabla \cdot (\Theta(x) \nabla).$
Note that this includes (2) as a special case.
2.2 Fractional Laplacian
For $\kappa > 0$, let us use the notation $\mathcal{A}$ to denote the realization in $L^2(\mathcal{D})$ of $\kappa^2 - \Delta$
with zero Neumann boundary conditions. Then there exists a nondecreasing sequence of eigenvalues
$\{\lambda_j\}_{j \geq 1}$ satisfying $\lambda_j \to \infty$ as $j \to \infty$. The corresponding eigenfunctions are denoted
$\{\varphi_j\}_{j \geq 1}$. It is well known that $\{\varphi_j\}_{j \geq 1}$ forms an orthonormal basis of $L^2(\mathcal{D})$. Toward this end, for any $s \geq 0$, we can define the fractional-order Sobolev space as
$$\mathbb{H}^s(\mathcal{D}) = \Big\{ u \in L^2(\mathcal{D}) : \sum_{j=1}^{\infty} \lambda_j^{s} \langle u, \varphi_j \rangle^2 < \infty \Big\}.$$
We are now ready to define the fractional powers of $\mathcal{A}$; see [3, Section 8].
Definition 1.
Let $u = \sum_{j=1}^{\infty} u_j \varphi_j$ with $\partial u / \partial n = 0$ on $\partial \mathcal{D}$, where $n$ is the outward normal vector. Then the fractional power of $\mathcal{A}$ is given by
$$\mathcal{A}^{s} u = \sum_{j=1}^{\infty} \lambda_j^{s}\, u_j\, \varphi_j.$$
Notice that $\mathcal{A}^{s}$ can be extended to an operator mapping from $\mathbb{H}^{s}(\mathcal{D})$ to $\mathbb{H}^{-s}(\mathcal{D})$, where $\mathbb{H}^{-s}(\mathcal{D})$ is the dual of $\mathbb{H}^{s}(\mathcal{D})$. We further emphasize that the boundary condition can be dropped in the above definition; see [3, Def. 2.2]. We also refer to [3] for extensions to nonzero Neumann conditions.
Finally, we can extend the above definition to the elliptic operator in (3). In this case, we need to assume the coefficients to be bounded in $\mathcal{D}$ and $\Theta$ to be symmetric and elliptic.
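At small scales, the spectral definition above can be realized directly by an eigendecomposition of a discretized operator. The finite-difference stencil and names below are illustrative stand-ins (not the paper's finite element discretization), assuming a shifted Neumann Laplacian on $[0,1]$.

```python
import numpy as np

def shifted_neumann_laplacian(n, kappa=1.0):
    """Finite-difference matrix for kappa^2 - d^2/dx^2 on [0,1], zero Neumann BCs."""
    h = 1.0 / (n - 1)
    L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    L[0, 0] = L[-1, -1] = 1.0 / h**2  # one-sided stencil at the Neumann boundaries
    return kappa**2 * np.eye(n) + L

def fractional_power(A, s):
    """A^s for a symmetric positive definite A via the spectral definition."""
    lam, V = np.linalg.eigh(A)
    return (V * lam**s) @ V.T

A = shifted_neumann_laplacian(50, kappa=2.0)
half = fractional_power(A, 0.5)
assert np.allclose(half @ half, A, atol=1e-6)  # (A^{1/2})^2 = A
```

This dense route costs $O(n^3)$ and is only a sanity check; the point of the paper's approach is to avoid it at scale.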
2.3 Discretization
We show how to discretize the appropriate quantities in a finite-dimensional space. The discussion here closely mimics the derivation in [11].
Finite dimensional parameter space
Let $\{\phi_j\}_{j=1}^{n}$ be a basis for the finite-dimensional space $V_h \subset L^2(\mathcal{D})$, where the subscript $h$ refers to a mesh discretization parameter. Assume that the basis functions corresponding to the nodal points $\{x_i\}_{i=1}^{n}$ satisfy $\phi_j(x_i) = \delta_{ij}$ for $1 \leq i, j \leq n$. We can then approximate the inversion parameter within this finite-dimensional subspace as $m_h = \sum_{j=1}^{n} m_j \phi_j$, and we denote the vector of coefficients in the expansion by $\mathbf{m} = (m_1, \dots, m_n)^\top$.
For two functions $u_h, v_h \in V_h$, the inner product can be approximated as $\langle u_h, v_h \rangle = \mathbf{u}^\top \mathbf{M} \mathbf{v}$, where $\mathbf{M}$ is the mass matrix with entries
$M_{ij} = \int_{\mathcal{D}} \phi_i(x)\, \phi_j(x)\, dx, \qquad 1 \leq i, j \leq n.$
We denote by $\mathbb{R}^n_{\mathbf{M}}$ the vector space $\mathbb{R}^n$ endowed with the inner product $\langle \mathbf{u}, \mathbf{v} \rangle_{\mathbf{M}} = \mathbf{u}^\top \mathbf{M} \mathbf{v}$, to distinguish it from the usual Euclidean space.
Let $\mathcal{A}$ be as defined above. Then the matrix approximation of $\mathcal{A}$, denoted $\mathbf{A}$, is obtained as follows. First, we construct the stiffness matrix $\mathbf{K}$ with entries
$K_{ij} = \langle \phi_i, \mathcal{A} \phi_j \rangle, \qquad 1 \leq i, j \leq n.$
Here $\{\phi_j\}_{j=1}^{n}$ are the finite-dimensional basis functions, and $\mathbf{e}_j$ is the canonical basis vector of $\mathbb{R}^n$ corresponding to the basis function $\phi_j$. This gives us the matrix representation $\mathbf{A} = \mathbf{M}^{-1} \mathbf{K}$. The matrix transpose of $\mathbf{A}$, denoted $\mathbf{A}^\top$, has entries $[\mathbf{A}^\top]_{ij} = [\mathbf{A}]_{ji}$, but the adjoint of $\mathbf{A}$ with respect to the $\mathbf{M}$ inner product, denoted $\mathbf{A}^*$, satisfies $\mathbf{A}^* = \mathbf{M}^{-1} \mathbf{A}^\top \mathbf{M}$.
Discretization of the Covariance operator
In the following discussion, we first assume that $\alpha \in (0, 1)$ and $\mathcal{A} = \kappa^2 - \Delta$, with zero Neumann boundary conditions. To apply the fractional operator $\mathcal{A}^{-\alpha}$, we use sinc quadrature to approximate the integral representation of the fractional inverse [9]. Let $u_\ell$ solve
$\big( I + e^{2 y_\ell} \mathcal{A} \big)\, u_\ell = b, \qquad \ell = -m, \dots, m,$
and compute the approximation to $\mathcal{A}^{-\alpha} b$ as
$\mathcal{A}^{-\alpha} b \approx \frac{2 \delta \sin(\pi \alpha)}{\pi} \sum_{\ell = -m}^{m} e^{2 \alpha y_\ell}\, u_\ell,$
where $y_\ell = \ell \delta$ and $\delta > 0$ is the quadrature step size. The number of terms in the expansion was chosen to balance the error in the quadrature with the finite element discretization error. Alternatively, in matrix notation, we can write
$\mathbf{A}^{-\alpha} \mathbf{b} \approx \sum_{\ell = -m}^{m} w_\ell\, (\mathbf{M} + z_\ell \mathbf{K})^{-1} \mathbf{M} \mathbf{b},$
where $\mathbf{K}$ is the stiffness matrix with entries
$K_{ij} = \int_{\mathcal{D}} \big( \nabla \phi_i \cdot \nabla \phi_j + \kappa^2 \phi_i \phi_j \big)\, dx$
and $\mathbf{M}$ is the mass matrix with entries $M_{ij} = \int_{\mathcal{D}} \phi_i \phi_j \, dx$ for $1 \leq i, j \leq n$. The quadrature weights and nodes are
(5) $w_\ell = \frac{2 \delta \sin(\pi \alpha)}{\pi}\, e^{2 \alpha y_\ell}, \qquad z_\ell = e^{2 y_\ell}, \qquad y_\ell = \ell \delta.$
Based on the preceding discussion, the matrix representation of the discretized version of $\mathcal{A}^{-\alpha}$, for $\alpha \in (0,1)$, is
$\mathbf{A}^{-\alpha} \approx \sum_{\ell=-m}^{m} w_\ell\, (\mathbf{M} + z_\ell \mathbf{K})^{-1} \mathbf{M}.$
Let $\mathbb{N}$ denote the set of natural numbers (excluding zero). Now consider the covariance operator $\mathcal{C} = \mathcal{A}^{-\alpha}$, where $\alpha > d/2$ and $\alpha \notin \mathbb{N}$. We write $\alpha = q + \alpha_f$, where $q = \lfloor \alpha \rfloor$ is the integer part of $\alpha$ and $\alpha_f \in (0, 1)$ is the fractional part. Following [8, Remark 3.6], the discretized representation of the covariance operator is
(6) $\mathbf{C} = \mathbf{A}^{-q} \sum_{\ell=-m}^{m} w_\ell\, (\mathbf{M} + z_\ell \mathbf{K})^{-1} \mathbf{M}, \qquad \mathbf{A} = \mathbf{M}^{-1} \mathbf{K}.$
Note that we can interchange the order of the fractional and the integer parts since terms like $\mathbf{A}^{-1}$ and $(\mathbf{M} + z_\ell \mathbf{K})^{-1} \mathbf{M}$ commute. Furthermore, the matrix $\mathbf{C}$ is self-adjoint with respect to the $\mathbf{M}$ inner product, i.e., $\mathbf{C}^* = \mathbf{C}$. An important point to note here is that $\mathbf{C}$ is not formed explicitly. The application of the matrix $\mathbf{C}$ to a vector can be obtained by $2m + 1$ independent linear system solves, which can be readily parallelized. However, we will show how to compute the action of $\mathbf{C}$ more efficiently using Krylov subspace methods for shifted linear systems (see Section 3.2).
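A minimal dense sketch of the sinc-quadrature idea discussed above, under the simplifying assumption that the mass matrix is the identity (so each quadrature term reduces to a single shifted solve). The step size and truncation level are illustrative choices of mine, and the dense `solve` stands in for the Krylov solvers used in the paper.

```python
import numpy as np

def frac_solve_sinc(A, b, alpha, step=0.2, m=120):
    """Approximate A^{-alpha} b for 0 < alpha < 1 by sinc quadrature:
    A^{-alpha} ~ (2*step*sin(pi*alpha)/pi) * sum_l exp(2*alpha*y_l) (I + exp(2*y_l) A)^{-1},
    with nodes y_l = l*step. Each term costs one shifted linear solve."""
    n = A.shape[0]
    x = np.zeros(n)
    for ell in range(-m, m + 1):
        y = ell * step
        x += np.exp(2.0 * alpha * y) * np.linalg.solve(
            np.eye(n) + np.exp(2.0 * y) * A, b)
    return (2.0 * step * np.sin(np.pi * alpha) / np.pi) * x

# validate against the spectral definition on a small SPD matrix
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((30, 30)))[0]
A = Q @ np.diag(np.linspace(1.0, 10.0, 30)) @ Q.T
b = rng.standard_normal(30)
lam, V = np.linalg.eigh(A)
ref = (V * lam**-0.6) @ (V.T @ b)
assert np.allclose(frac_solve_sinc(A, b, 0.6), ref, atol=1e-6)
```

The loop makes the cost structure explicit: $2m+1$ shifted solves that are independent of one another, which is exactly the structure the shifted Krylov methods of Section 3 exploit.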
3 Efficient computations with the covariance operator
In this section, we review the multipreconditioned GMRES method for shifted linear systems (MPGMRES-Sh) [4] for solving sequences of shifted linear systems (Section 3.1). We use this solver to accelerate the computations with the covariance operator and the generation of samples from the SPDE (Section 3.2). Finally, in Section 3.3, we describe how to efficiently compute the truncated Karhunen–Loève expansion.
3.1 Krylov methods for shifted systems
The dominant cost of applying the fractional operator is the solution of a sequence of shifted linear systems. We first briefly review the MPGMRES-Sh approach [4]. This is an efficient method for solving shifted linear systems of the form
$(\mathbf{A} + \sigma_j \mathbf{I})\, \mathbf{x}_j = \mathbf{b}, \qquad j = 1, \dots, n_\sigma,$
where $\{\sigma_j\}_{j=1}^{n_\sigma}$ is a set of shifts. This solver can be used in multiple ways to accelerate computations involving Gaussian random fields.
In the MPGMRES-Sh approach, we select a small set of shifts $\{\tau_i\}_{i=1}^{n_p}$ that determine the preconditioners $\mathbf{P}_i = \mathbf{A} + \tau_i \mathbf{I}$, $i = 1, \dots, n_p$. The method uses these multiple preconditioners to build a basis that defines a single search space for all the shifted systems. In the discussion below, for simplicity, we drop the index on the preconditioners. At the $j$th iteration, given the matrix $\mathbf{V}^{(j)}$, the method builds the matrix $\mathbf{Z}^{(j)}$ by applying each preconditioner to $\mathbf{V}^{(j)}$, that is,
$\mathbf{Z}^{(j)} = \begin{bmatrix} \mathbf{P}_1^{-1} \mathbf{V}^{(j)} & \cdots & \mathbf{P}_{n_p}^{-1} \mathbf{V}^{(j)} \end{bmatrix}.$
The method is initialized with $\mathbf{V}^{(1)} = \mathbf{b} / \|\mathbf{b}\|_2$. We define the matrices that collect the columns of $\mathbf{V}^{(1)}, \dots, \mathbf{V}^{(j)}$ and $\mathbf{Z}^{(1)}, \dots, \mathbf{Z}^{(j)}$ as $\mathbf{V}_j$ and $\mathbf{Z}_j$, respectively. Using these relationships, we can derive the multipreconditioned Arnoldi relationship for a shift $\sigma$:
$(\mathbf{A} + \sigma \mathbf{I})\, \mathbf{Z}_j = \mathbf{V}_{j+1} \big( \overline{\mathbf{H}}_j + \sigma \overline{\mathbf{E}}_j \big).$
Here the matrices $\overline{\mathbf{H}}_j$ and $\overline{\mathbf{E}}_j$ are built from the orthogonalization coefficients and from block-diagonal combinations involving the preconditioner shifts, where $\otimes$ denotes the Kronecker product and $\mathrm{blkdiag}(\cdot)$ denotes a block-diagonal matrix composed of the given subblocks. The matrix $\overline{\mathbf{H}}_j$ is block upper-Hessenberg.
The main point here is that the solution for the shifted system corresponding to the shift $\sigma_j$ is obtained as $\mathbf{x}_j = \mathbf{Z}_m \mathbf{y}_j$, where $\mathbf{y}_j$ minimizes the residual and is obtained by solving the least squares problem
$\min_{\mathbf{y}} \big\| \|\mathbf{b}\|_2\, \mathbf{e}_1 - \big( \overline{\mathbf{H}}_m + \sigma_j \overline{\mathbf{E}}_m \big)\, \mathbf{y} \big\|_2.$
The details of this algorithm are given in [4, Algorithm 2].
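The full MPGMRES-Sh algorithm is given in [4]. As a simplified illustration of the key property it exploits, namely the shift invariance of Krylov subspaces, the following sketch solves all shifted systems from a single unpreconditioned Arnoldi basis. All names here are mine; no preconditioning is used, so this is a toy baseline rather than the authors' method.

```python
import numpy as np

def arnoldi(A, b, m):
    """m steps of Arnoldi: A V_m = V_{m+1} Hbar, with orthonormal columns V."""
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

def shifted_gmres(A, b, sigmas, m):
    """Solve (A + sigma I) x = b for every shift from ONE Krylov basis,
    using the invariance K_m(A, b) = K_m(A + sigma I, b)."""
    V, H, beta = arnoldi(A, b, m)
    e1 = np.zeros(m + 1)
    e1[0] = beta
    Ibar = np.vstack([np.eye(m), np.zeros((1, m))])
    # per-shift work is only a small (m+1) x m least squares problem
    return {s: V[:, :m] @ np.linalg.lstsq(H + s * Ibar, e1, rcond=None)[0]
            for s in sigmas}

rng = np.random.default_rng(1)
S = rng.standard_normal((80, 80))
A = np.eye(80) + 0.4 * (S + S.T) / np.linalg.norm(S + S.T, 2)  # SPD, well conditioned
b = rng.standard_normal(80)
xs = shifted_gmres(A, b, [0.1, 1.0, 10.0], m=40)
for s, x in xs.items():
    assert np.linalg.norm((A + s * np.eye(80)) @ x - b) < 1e-8
```

Note how the expensive matvecs in `arnoldi` are paid once, while each additional shift costs only a small projected least squares solve, mirroring the cost discussion below.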
Computational Cost
Assume that the cost of a matvec involving $\mathbf{A}$ is $T_{\mathrm{mv}}$ and the cost of applying a preconditioner is $T_{\mathrm{prec}}$. If $m$ iterations of MPGMRES-Sh are performed, then the total cost of building the basis is $\mathcal{O}\big( m (T_{\mathrm{mv}} + n_p T_{\mathrm{prec}}) \big)$, plus the cost of solving the projected least squares problems.
As we shall see in the numerical experiments, the number of iterations is low. Note that the cost of building the basis is independent of the number of shifted systems, whereas the cost of the projected systems depends on the number of shifted systems $n_\sigma$.
3.2 Applying the covariance operator
We now show how to compute the application of the covariance operator $\mathbf{C}$ to a vector $\mathbf{v}$. Let the matrices $\mathbf{M}$, $\mathbf{K}$, and $\mathbf{A} = \mathbf{M}^{-1} \mathbf{K}$ be as defined in Section 2.3.
Let $\alpha > d/2$. If $\alpha \in \mathbb{N}$, then it is straightforward to compute an application of the covariance operator, which requires $\alpha$ sequential solves involving $\mathbf{A}$. Henceforth, we assume that $\alpha \notin \mathbb{N}$. Let $q = \lfloor \alpha \rfloor$ be the integer part of $\alpha$ and let $\alpha_f \in (0, 1)$ be the fractional part. The application of the covariance operator to a vector $\mathbf{v}$ can be computed as
(7) $\mathbf{C} \mathbf{v} = \mathbf{A}^{-q} \sum_{\ell=-m}^{m} \frac{w_\ell}{z_\ell} \Big( z_\ell^{-1} \mathbf{M} + \mathbf{K} \Big)^{-1} \mathbf{M} \mathbf{v}.$
In writing this expression, we have replaced the weights $w_\ell$ with $w_\ell / z_\ell$. This form is completely equivalent to (6) but makes it easier to precondition. To turn this expression into an efficient algorithm, we perform the following steps. In the offline phase, we determine the quadrature weights $w_\ell$, the nodes $z_\ell$, and the preconditioner shifts $\{\tau_i\}_{i=1}^{n_p}$. Furthermore, we compute factorizations of $\mathbf{M}$ and of the preconditioners. In the online stage, we compute the action of $\mathbf{M}$ on $\mathbf{v}$ to obtain the right-hand side. We then apply MPGMRES-Sh to the shifted systems $\big( \mathbf{K} + z_\ell^{-1} \mathbf{M} \big) \mathbf{u}_\ell = \mathbf{M} \mathbf{v}$ for $\ell = -m, \dots, m$ to obtain the solutions $\{\mathbf{u}_\ell\}$. Finally, we compute $\mathbf{C} \mathbf{v} = \mathbf{A}^{-q} \sum_{\ell} (w_\ell / z_\ell)\, \mathbf{u}_\ell$. The details are given in Algorithm 1.
When the integer part $q$ is large, the $q$ solves with $\mathbf{A}$ can become expensive; however, in practice, a large value of $\alpha$ is typically not used, since the resulting field is extremely smooth. If an application requires the use of a large value of $\alpha$, techniques from [19, Section 4.1] may be used.
Alternative approaches
In the expression for the application of the covariance operator defined in (7), we used a slight reformulation of the fractional part, which is obtained by factoring out $z_\ell$ in each summand. We also investigated two other formulations for computing $\mathbf{C} \mathbf{v}$. The first is perhaps the more natural version
$\mathbf{C} \mathbf{v} = \mathbf{A}^{-q} \sum_{\ell=-m}^{m} w_\ell\, (\mathbf{M} + z_\ell \mathbf{K})^{-1} \mathbf{M} \mathbf{v}.$
While mathematically equivalent to the approach used in (7), we found that it was easier to precondition the formulation in (7). The second approach is to decompose the sum into the terms with negative and nonnegative indices and to treat the two halves separately.
The advantage of this approach is that both sets of weights are at most one in magnitude on their respective index ranges, and similarly for the nodes. However, the downside is that each application of the fractional operator then requires two calls to MPGMRES-Sh. For this reason, we prefer the formulation in (7).
3.2.1 Generating samples from the SPDE
We can adapt the efficient approach in Algorithm 1 to generate samples from the SPDE
(8) $\mathcal{A}^{\alpha/2}\, u = \mathcal{W} \quad \text{in } \mathcal{D},$
with zero Neumann boundary conditions. We follow the approach in [7] to solve the SPDE. Let $\mathbf{M} = \mathbf{L} \mathbf{L}^\top$ be the Cholesky factorization of the mass matrix and let $\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})$. We can compute a solution to the SPDE by applying the discretized fractional inverse $\mathbf{A}^{-\alpha/2}$ to the discretized white noise $\mathbf{M}^{-1} \mathbf{L} \mathbf{z}$,
where, as before, we split the exponent $\alpha/2$ into its integer and fractional parts and apply the sinc quadrature to the fractional part; the number of shifted systems is again $2m + 1$.
We can readily modify Algorithm 1 to efficiently solve this problem. In particular, we can leverage the MPGMRES-Sh approach to accelerate the solution of the SPDE. If we instead use zero Dirichlet boundary conditions for $\mathcal{A}$, the error analysis in [7, Theorem 2.1] applies directly.
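The sampling recipe above can be sketched at small scale as follows. This is a toy stand-in under two loudly stated assumptions: the mass matrix is taken to be the identity, and a dense eigendecomposition replaces the shifted Krylov machinery of Algorithm 1.

```python
import numpy as np

def sample_whittle_matern(A, alpha, rng):
    """Draw u = A^{-alpha/2} z with z ~ N(0, I), so that Cov(u) = A^{-alpha}.
    Dense eigendecomposition stands in for the shifted Krylov solver;
    the mass matrix is assumed to be the identity."""
    lam, V = np.linalg.eigh(A)
    z = rng.standard_normal(A.shape[0])
    return (V * lam ** (-alpha / 2.0)) @ (V.T @ z)

rng = np.random.default_rng(2)
B = rng.standard_normal((20, 20))
A = B @ B.T / 20.0 + np.eye(20)               # SPD stand-in for the discretized operator
lam, V = np.linalg.eigh(A)
half = (V * lam ** -0.75) @ V.T               # sampling factor for alpha = 1.5
cov = (V * lam ** -1.5) @ V.T                 # target covariance A^{-1.5}
assert np.allclose(half @ half, cov, atol=1e-10)  # the factor squares to the covariance
u = sample_whittle_matern(A, 1.5, rng)
assert u.shape == (20,)
```

Note the fractional exponent $\alpha = 3/2$ here poses no difficulty, which is the point of the paper's construction.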
3.3 KarhunenLoève expansions
In this section, we discuss an approach to generate samples from the Gaussian measure $\mathcal{N}(\mu, \mathcal{C})$ on a Hilbert space $\mathcal{H}$, where $\mu$ is the mean function and $\mathcal{C}$ is a self-adjoint, positive semidefinite covariance operator. This approach is based on the Karhunen–Loève (KL) expansion of the stochastic process. In this approach, we consider an orthonormal set of eigenpairs $\{(\lambda_j, \varphi_j)\}_{j \geq 1}$ of $\mathcal{C}$, where the eigenvalues are arranged in decreasing order as $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$. Furthermore, let
$\{\xi_j\}_{j \geq 1}$ be an independent and identically distributed (i.i.d.) sequence of random variables with
$\xi_j \sim \mathcal{N}(0, 1)$. Then by [30, Theorem 6.19], the sample defined by the KL expansion
$u = \mu + \sum_{j=1}^{\infty} \sqrt{\lambda_j}\, \xi_j\, \varphi_j$
is distributed according to $\mathcal{N}(\mu, \mathcal{C})$. In practice, for certain applications, the eigenvalues exhibit rapid decay and, therefore, the expansion can be approximated by the truncated representation $u_r = \mu + \sum_{j=1}^{r} \sqrt{\lambda_j}\, \xi_j\, \varphi_j$. In the context of Bayesian inverse problems, rather than estimating the field, one can estimate the coefficients of the KL expansion. This can be computationally beneficial because of the reduction in input dimensionality.
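In finite dimensions, the truncated KL construction amounts to a few lines; the sketch below uses a dense eigendecomposition (the names `mu`, `C`, `r` are mine, and at scale the eigenpairs would come from the Krylov eigensolvers discussed below rather than `eigh`).

```python
import numpy as np

def truncated_kl_sample(mu, C, r, rng):
    """u ~ N(mu, C), truncated to the r leading eigenpairs of C:
    u = mu + sum_{j<=r} sqrt(lam_j) xi_j phi_j with xi_j ~ N(0, 1) i.i.d."""
    lam, V = np.linalg.eigh(C)          # ascending order
    lam, V = lam[::-1], V[:, ::-1]      # largest eigenvalues first
    xi = rng.standard_normal(r)
    return mu + V[:, :r] @ (np.sqrt(np.maximum(lam[:r], 0.0)) * xi)

rng = np.random.default_rng(3)
B = rng.standard_normal((15, 15))
C = B @ B.T + np.eye(15)
# with no truncation (r = n) the KL factor reproduces the full covariance
lam, V = np.linalg.eigh(C)
F = V @ np.diag(np.sqrt(lam)) @ V.T
assert np.allclose(F @ F, C, atol=1e-8)
u = truncated_kl_sample(np.zeros(15), C, r=5, rng=rng)
assert u.shape == (15,)
```

Truncating at `r` keeps only the directions carrying most of the prior variance, which is exactly the dimensionality reduction mentioned above.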
Galerkin approximation
In this discussion, we derive the expressions for the truncated KL expansion for the Gaussian measure $\mathcal{N}(\mu, \mathcal{C})$ and then discuss the implementation details for computing it. Consider a finite-dimensional subspace $V_h \subset \mathcal{H}$ with $\dim(V_h) = n$, and let $\{\phi_j\}_{j=1}^{n}$ be a basis for the subspace. The eigenvalue problem $\mathcal{C} \varphi = \lambda \varphi$ is then replaced by a finite-dimensional eigenvalue problem. We seek a solution $\varphi_h = \sum_{j=1}^{n} c_j \phi_j$ by considering the Galerkin projection.
Following the discussion in Section 2.3, we can write the projected problem in terms of the mass matrix $\mathbf{M}$ and the discretized covariance matrix $\mathbf{C}$. Therefore, we now have the generalized Hermitian eigenvalue problem (GHEP)
(9) $\mathbf{M} \mathbf{C}\, \mathbf{c} = \lambda\, \mathbf{M}\, \mathbf{c},$
where $\mathbf{M} \mathbf{C}$ is symmetric, since $\mathbf{C}$ is self-adjoint with respect to the $\mathbf{M}$ inner product, and $\mathbf{c}$
is the generalized eigenvector. Let $\lambda_1 \geq \cdots \geq \lambda_r$
be the $r$ largest generalized eigenvalues and let $\mathbf{c}_1, \dots, \mathbf{c}_r$ be the corresponding eigenvectors. Then a sample from the process can be generated as
(10) $\mathbf{u} = \boldsymbol{\mu} + \sum_{j=1}^{r} \sqrt{\lambda_j}\, \xi_j\, \mathbf{c}_j, \qquad \xi_j \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1).$
Efficient implementation
We want to compute the truncated KL expansion for the measure $\mathcal{N}(\mu, \mathcal{C})$. The GHEP (9) can be solved using any standard Krylov-based eigensolver (e.g., Lanczos) that does not form the matrix $\mathbf{C}$ or $\mathbf{M}$ explicitly but instead relies on matvecs involving these matrices. The matvecs involving $\mathbf{C}$ can be efficiently computed using the MPGMRES-Sh approach, as described in Algorithm 1.
An alternative approach to computing the truncated KL expansion is to work with the eigenpairs of the operator $\mathcal{A}$, which is attractive at first glance, since it avoids the complications of computing the fractional part. However, the approach based on the covariance operator directly is advantageous for two reasons. First, to compute the truncated KL expansion, we need to target the eigenpairs of $\mathcal{C}$ corresponding to the largest eigenvalues, which correspond to the eigenpairs of $\mathcal{A}$ with the smallest eigenvalues; the latter are typically much harder to compute. Second, raising the operator to a negative power acts as a spectral transformation that enhances the eigenvalue gaps.
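A small-scale illustration of solving such a generalized eigenvalue problem with a matrix-free eigensolver: SciPy's `eigsh` accepts a `LinearOperator` for the matvec with the (symmetric) covariance-like matrix and an SPD matrix `M` for the generalized problem. The matrices here are illustrative stand-ins, not the paper's discretization.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(4)
n, r = 60, 4
B = rng.standard_normal((n, n))
C = B @ B.T + np.eye(n)                                     # symmetric PD stand-in
M = np.eye(n) + 0.05 * (np.eye(n, k=1) + np.eye(n, k=-1))   # SPD mass-like matrix

# matrix-free matvec, as a Krylov eigensolver would use it
Cop = LinearOperator((n, n), matvec=lambda v: C @ v, dtype=float)
vals, vecs = eigsh(Cop, k=r, M=M, which='LM')  # r largest generalized eigenpairs

ref = eigh(C, M, eigvals_only=True)[-r:]       # dense reference solution
assert np.allclose(np.sort(vals), ref, rtol=1e-6)
```

In the paper's setting, the `matvec` would call the MPGMRES-Sh-based application of the covariance matrix instead of a dense product.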
4 Application to Bayesian Inverse Problems
In this section, we show how to leverage the efficient representation of the covariance operator. In Section 4.1, we review the necessary background on Bayesian inverse problems. In Section 4.2, we derive the discretized posterior distribution that uses the covariance matrix in (6) as the prior covariance, and in Section 4.3, we show how to adapt Generalized Hybrid Iterative Methods (GenHyBR) to efficiently compute the MAP estimate.
4.1 Bayesian Inverse Problems
Let $m$ be the inversion parameter that we wish to recover from the measurements $\mathbf{d}$, which are related through the measurement equation
(11) $\mathbf{d} = \mathcal{F}(m) + \boldsymbol{\eta},$
where $\mathcal{F}$ is the parameter-to-observable map, or the measurement operator, and $\boldsymbol{\eta}$ is the measurement noise, which is assumed to be Gaussian with zero mean and covariance $\boldsymbol{\Gamma}_{\mathrm{noise}}$, which we write as
$\boldsymbol{\eta} \sim \mathcal{N}(0, \boldsymbol{\Gamma}_{\mathrm{noise}})$. Therefore, the likelihood probability density function takes the form
$\pi(\mathbf{d} \mid m) \propto \exp\Big( -\tfrac{1}{2} \big\| \mathcal{F}(m) - \mathbf{d} \big\|^2_{\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}} \Big).$
We assume that $m$ is endowed with a Gaussian prior with mean function $\mu$ and covariance operator $\mathcal{C}$, i.e., $m \sim \mathcal{N}(\mu, \beta^{-1} \mathcal{C})$. Here $\beta$ acts as a regularization parameter that may be known, or it can be estimated as part of the inversion process (this is what we do in this paper). We take the operator $\mathcal{C}$ to be as defined in Section 2.1.
By the choice of the likelihood and the prior distribution, using the infinite-dimensional Bayes formula, the Radon–Nikodym derivative of the posterior probability measure $\mu_{\mathrm{post}}$
with respect to the prior measure $\mu_{\mathrm{prior}}$ takes the form
$\frac{d \mu_{\mathrm{post}}}{d \mu_{\mathrm{prior}}} = \frac{1}{Z} \exp\Big( -\tfrac{1}{2} \big\| \mathcal{F}(m) - \mathbf{d} \big\|^2_{\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}} \Big),$
where $Z$ is a normalization constant. The maximum a posteriori (MAP) estimator maximizes the posterior distribution and can be obtained as the solution to the optimization problem
$\min_{m} \; \tfrac{1}{2} \big\| \mathcal{F}(m) - \mathbf{d} \big\|^2_{\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}} + \tfrac{1}{2} \big\| m - \mu \big\|^2_{\mathcal{C}^{-1}}.$
If $\mathcal{F}$ is a linear operator, then the posterior distribution is Gaussian; we exploit this fact in this paper.
4.2 Discretized posterior distribution
The covariance matrix after discretization takes the same form as in (6). It is easy to verify that the resulting matrix $\mathbf{C}$ is self-adjoint with respect to the $\mathbf{M}$ inner product; equivalently, $\mathbf{M} \mathbf{C}$ is symmetric with respect to the standard Euclidean inner product and positive definite. This gives us a discretized representation of the prior distribution, i.e., $\mathbf{m} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$.
For completeness, we give expressions for the posterior distribution and the MAP estimate for the discretized problem. This uses the components developed in Section 2.3. The posterior probability density function takes the form
$\pi_{\mathrm{post}}(\mathbf{m}) \propto \exp\Big( -\tfrac{1}{2} \big\| \mathbf{F}(\mathbf{m}) - \mathbf{d} \big\|^2_{\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}} - \tfrac{1}{2} \big\| \mathbf{m} - \boldsymbol{\mu} \big\|^2_{\mathbf{M} \mathbf{C}^{-1}} \Big).$
The MAP estimator can be obtained by solving the optimization problem
$\min_{\mathbf{m} \in \mathbb{R}^n} \; \tfrac{1}{2} \big\| \mathbf{F}(\mathbf{m}) - \mathbf{d} \big\|^2_{\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}} + \tfrac{1}{2} \big\| \mathbf{m} - \boldsymbol{\mu} \big\|^2_{\mathbf{M} \mathbf{C}^{-1}}.$
If the forward operator is linear, i.e., $\mathbf{d} = \mathbf{F} \mathbf{m} + \boldsymbol{\eta}$, then the posterior distribution is Gaussian, $\mathcal{N}(\mathbf{m}_{\mathrm{MAP}}, \boldsymbol{\Gamma}_{\mathrm{post}})$, where
(12) $\boldsymbol{\Gamma}_{\mathrm{post}} = \big( \mathbf{F}^* \boldsymbol{\Gamma}_{\mathrm{noise}}^{-1} \mathbf{F} + \mathbf{C}^{-1} \big)^{-1}, \qquad \mathbf{m}_{\mathrm{MAP}} = \boldsymbol{\Gamma}_{\mathrm{post}} \big( \mathbf{F}^* \boldsymbol{\Gamma}_{\mathrm{noise}}^{-1} \mathbf{d} + \mathbf{C}^{-1} \boldsymbol{\mu} \big),$
and $\mathbf{F}^*$ is the adjoint operator of $\mathbf{F}$ with respect to the $\mathbf{M}$ inner product and takes the form $\mathbf{F}^* = \mathbf{M}^{-1} \mathbf{F}^\top$. Note that $\boldsymbol{\Gamma}_{\mathrm{post}}^{-1}$ is self-adjoint with respect to the $\mathbf{M}$ inner product, so that $\mathbf{M} \boldsymbol{\Gamma}_{\mathrm{post}}^{-1}$ is symmetric positive definite.
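For a linear forward operator, the posterior formulas admit a direct small-scale check. The sketch below works in Euclidean inner products (i.e., it ignores the mass-matrix weighting, taking M to be the identity), and the variable names (`F`, `Gn`, `Gp`, `mu`, `d`) are mine; it cross-validates the normal-equations form of the mean against the equivalent Kalman-gain form.

```python
import numpy as np

def gaussian_posterior(F, Gn, Gp, mu, d):
    """Linear-Gaussian posterior (Euclidean version of (12)):
    Gpost = (F^T Gn^{-1} F + Gp^{-1})^{-1},
    m_map = Gpost (F^T Gn^{-1} d + Gp^{-1} mu)."""
    H = F.T @ np.linalg.solve(Gn, F) + np.linalg.inv(Gp)
    Gpost = np.linalg.inv(H)
    m_map = Gpost @ (F.T @ np.linalg.solve(Gn, d) + np.linalg.solve(Gp, mu))
    return m_map, Gpost

rng = np.random.default_rng(5)
F = rng.standard_normal((5, 8))          # 5 measurements, 8 parameters
Gn = 0.1 * np.eye(5)                     # noise covariance
B = rng.standard_normal((8, 8))
Gp = B @ B.T + np.eye(8)                 # SPD prior covariance
mu = rng.standard_normal(8)
d = rng.standard_normal(5)
m_map, Gpost = gaussian_posterior(F, Gn, Gp, mu, d)

# equivalent Kalman-gain form of the posterior mean
K = Gp @ F.T @ np.linalg.inv(F @ Gp @ F.T + Gn)
assert np.allclose(m_map, mu + K @ (d - F @ mu), atol=1e-8)
assert np.allclose(Gpost, Gpost.T)       # posterior covariance is symmetric
```

The Kalman form involves only solves with the data-space matrix F Gp F^T + Gn, which previews the change of variables used in Section 4.3.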
4.3 GenHyBR for computing the MAP estimate
We now derive an alternate expression for the MAP estimate in (12). Multiplying both sides of (12) by $\boldsymbol{\Gamma}_{\mathrm{post}}^{-1}$, we obtain the normal equations for $\mathbf{m}_{\mathrm{MAP}}$.
We then use the prior covariance $\mathbf{C}$ and a change of variables to avoid applying $\mathbf{C}^{-1}$. To obtain the MAP estimate, we solve the linear system for $\mathbf{s}$
(13) $\big( \mathbf{F} \mathbf{C} \mathbf{F}^* + \boldsymbol{\Gamma}_{\mathrm{noise}} \big)\, \mathbf{s} = \mathbf{d} - \mathbf{F} \boldsymbol{\mu},$
and then compute $\mathbf{m}_{\mathrm{MAP}} = \boldsymbol{\mu} + \mathbf{C} \mathbf{F}^* \mathbf{s}$. This form of the MAP estimator is the basis for our computationally efficient procedure. Note that $\mathbf{F} \mathbf{C} \mathbf{F}^* + \boldsymbol{\Gamma}_{\mathrm{noise}}$ is a symmetric positive definite matrix with respect to the Euclidean inner product. The application of $\mathbf{C}$ to a vector can be performed efficiently using a slight modification of Algorithm 1. Furthermore, note that $\mathbf{C}$ is symmetric with respect to the $\mathbf{M}$ inner product.
We can reformulate the equations in such a way that we can use the generalized Golub–Kahan bidiagonalization (genGK) approach [13] to efficiently compute the MAP estimate. We initialize the iterations with appropriate starting vectors derived from the residual $\mathbf{d} - \mathbf{F} \boldsymbol{\mu}$. At step $k$ of this approach, the bidiagonalization generates the next pair of basis vectors via short-term recurrences,
where the scalar coefficients are chosen such that the generated vectors are normalized in the appropriate inner products. These iterates can be collected to form the matrices $\mathbf{U}_{k+1}$, $\mathbf{V}_{k}$, and the bidiagonal matrix $\mathbf{B}_{k}$.
This can be rearranged to obtain the following relations, which are typically accurate up to machine precision,
and the orthogonality relations hold in the corresponding weighted inner products. The columns of $\mathbf{V}_{k}$ form a basis for a Krylov subspace, where the Krylov subspace is defined as