Log In Sign Up

Efficient algorithms for Bayesian Inverse Problems with Whittle–Matérn Priors

This paper tackles efficient methods for Bayesian inverse problems with priors based on Whittle–Matérn Gaussian random fields. The Whittle–Matérn prior is characterized by a mean function and a covariance operator that is taken as a negative power of an elliptic differential operator. This approach is flexible in that it can incorporate a wide range of prior information including non-stationary effects, but it is currently computationally advantageous only for integer values of the exponent. In this paper, we derive an efficient method for handling all admissible noninteger values of the exponent. The method first discretizes the covariance operator using finite elements and quadrature, and uses preconditioned Krylov subspace solvers for shifted linear systems to efficiently apply the resulting covariance matrix to a vector. This approach can be used for generating samples from the distribution in two different ways: by solving a stochastic partial differential equation, and by using a truncated Karhunen-Loève expansion. We show how to incorporate this prior representation into the infinite-dimensional Bayesian formulation, and show how to efficiently compute the maximum a posteriori estimate, and approximate the posterior variance. Although the focus of this paper is on Bayesian inverse problems, the techniques developed here are applicable to solving systems with fractional Laplacians and Gaussian random fields. Numerical experiments demonstrate the performance and scalability of the solvers and their applicability to model and real-data inverse problems in tomography and a time-dependent heat equation.


page 15

page 16

page 17

page 18

page 19


Non-Stationary Multi-layered Gaussian Priors for Bayesian Inversion

In this article, we study Bayesian inverse problems with multi-layered G...

All-at-once formulation meets the Bayesian approach: A study of two prototypical linear inverse problems

In this work, the Bayesian approach to inverse problems is formulated in...

Variational Inverting Network for Statistical Inverse Problems of Partial Differential Equations

To quantify uncertainties of the inverse problems governed by partial di...

A CCBM-based generalized GKB iterative regularizing algorithm for inverse Cauchy problems

This paper examines inverse Cauchy problems that are governed by a kind ...

Bayesian Level Set Approach for Inverse Problems with Piecewise Constant Reconstructions

There are several challenges associated with inverse problems in which w...

Solving linear Bayesian inverse problems using a fractional total variation-Gaussian (FTG) prior and transport map

The Bayesian inference is widely used in many scientific and engineering...

1 Motivation and Introduction

Inverse problems involve the use of experimental data or measurements to make inferences about parameters (e.g., initial or boundary conditions), governing the model. Inverse problems have a wide range of applications such as in material science, geophysics, and medicine. The Bayesian approach to inverse problems is prevalent because of its ability to incorporate prior knowledge of the parameters and its ability to quantify the uncertainty associated with the parameter reconstructions. However, many computational challenges persist in the implementation of the Bayesian approach to large-scale inverse problems.

Stuart [30] advocated the use of Gaussian priors within Bayesian inverse problems with covariance operators of the form where is an elliptic differential operator, the exponent , and is the spatial dimension. This choice ensures that the covariance operator is trace-class and under appropriate conditions on the likelihood, the resulting inverse problem is well-posed. A computational framework for infinite-dimensional Bayes inverse problems based on Stuart’s framework was developed in [11]. For computational convenience, the authors used the covariance operator since it satisfies the requirement for up to three spatial dimensions. This choice has two advantages: first, it is integer-valued, and second, it is easy to obtain a factored form of the covariance operator. For these reasons, this approach is attractive from a computational standpoint, and has since become popular and has resulted in scalable software implementations [31, 21]. A similar approach can also be found in [25], in which the authors also used the Whittle–Matérn priors for Bayesian inverse problems. However, the choice of for computational convenience makes it restrictive in practice, and in many instances, it can lead to over-smoothing of reconstructions. It is, therefore, desirable to have an efficient computational method for handling exponents that allows the user to choose the appropriate prior covariance based on prior knowledge or estimate it from data in a hierarchical Bayesian setting.

The use of Whittle–Matérn priors in Bayesian inverse problems has close connections with the developments in Gaussian random fields in spatial and computational statistics. In [24], the authors considered the stochastic partial differential equation (SPDE) approach to Gaussian random fields. They showed an explicit connection between Gaussian and Gauss-Markov random fields based on the Matérn covariance family and provide explicit expressions for the precision operator for integer values of . Based on this connection, they provided extensions beyond the Matérn model and showed how the SPDE approach can be applied on manifolds, for nonstationary random fields. In a review paper [23], 10 years since the publication of [24], the authors trace the recent developments in the SPDE approach to Gaussian and non-Gaussian random fields. Recent work (e.g., [7, 8]) allows for noninteger values of the exponent; see discussion below.

Goals and Main Contributions

In this paper, we want to develop an efficient method for representing and computing with Whittle–Matérn Gaussian random priors in which the covariance operator is of the form and . Furthermore, we also want to incorporate this efficient representation of covariance operators while solving Bayesian inverse problems. The novel and noteworthy features of our contributions are:

  1. We consider a quadrature based approach for the approximation of the covariance operator for fractional powers. The resulting approximation has the form of a rational function. The action of the covariance operator on a vector can be written as a sequence of shifted linear systems, which we accelerate using Krylov subspace methods. As a result, we have a method for efficiently applying the covariance operator for any exponent . The method is scalable to large problems and the cost is nearly the same for values of between consecutive integers. This approach is applicable for stationary (including isotropic and anisotropic) and non-stationary random fields.

  2. We leverage the fast method for covariance matrices to efficiently generate samples in two different ways: by solving the underlying SPDE that defines the prior covariance and by efficiently computing the truncated Karhunen-Loéve approximation.

  3. We show how to use this efficient representation of the prior covariance operator in an infinite-dimensional Bayesian problem formulation. We perform careful discretization of all the relevant quantities and leverage the fast applications of the covariance operator to compute the maximum a posteriori (MAP) estimate and approximate posterior variance.

  4. Although the focus of this paper is on Bayesian inverse problems, the computational techniques developed here are applicable to solving systems with fractional Laplacians [9, 32] and Gaussian random fields.

Through numerical experiments, we demonstrate the computational performance of the solvers including a X speedup over naïve approaches. We also demonstrate the feasibility of the approach on model and real-data problems from seismic and X-ray tomography, and an application to a 2D PDE-based inverse diffusion equation.

Related Work

As mentioned earlier, in Bayesian inverse problems, the exponent is typically chosen to be . To our knowledge, in Bayesian inverse problems, the problem of efficient computations with covariance operators for non-integer values of is still an outstanding challenge. The fractional Laplacian with has been used as a regularization operator in a non-Bayesian setting [1, 2]. Notice that, [1] uses the Fourier approach which is limited to periodic settings and [2] uses an eigendecomposition of the Laplacian that is not scalable to large problems.

There are some techniques in computational and spatial statistics that are relevant to this discussion; however for a detailed review, see [23]. In [28], a method was proposed for sampling from generalized Matérn fields on compact Riemannian manifolds using the so-called matrix transfer technique. In this method, after discretizing the differential operator, the samples are obtained by applying the matrix function () on a random vector. The application of the matrix function is accomplished using contour integral and Krylov subspace methods similar to ours. But our approach differs in how we approximate the fractional operator and also in the choice of Krylov subspace methods. Recent work in [7, 8] develop efficient methods for sampling from the SPDE for certain fractional exponents (in our notation, ); these references also use the same quadrature scheme for the integral representation of the fractional inverse as ours. However, our approach generalizes it to all exponents and the use of Krylov subspace solvers substantially speeds up the solution of the SPDE. In a follow-up work [6], the authors developed the rational SPDE approach in which they use a rational approximation of the form (in our notation) , where are polynomials. When the degrees of the polynomials are small, the discretized representations retain the computational benefits of the integer versions. However, the error analysis in [6, Remark 3.4] suggests in order to balance the errors due to the spatial discretization and the error in the rational approximation, the degrees of the polynomials may be large, which leads to a loss in computational efficiency. The paper in [18]

also works in a similar setting as ours in bounded domains but solves the linear systems in parallel using multilevel preconditioned iterative solvers with linear complexity in the degrees of freedom; additionally, their approach is also applicable to compact metric spaces. Also relevant to this discussion are 

[22, 20] which deal with sampling from random fields on compact Riemannian manifolds. Our efficient approach for representing covariance matrices and sampling from the random field may be applicable to that setting as well. To our knowledge, these techniques have not been used within the context of Bayesian inverse problems, which is the primary focus of this paper.

2 Background

In Section 2.1, we review the SPDE approach to Gaussian random fields which form the basis for the prior distributions in Bayesian inverse problems. In Section 2.2, we review the appropriate definitions of fractional elliptic operators and show how to discretize them in Section 2.3.

2.1 SPDE approach to Whittle–Matérn Gaussian Random Fields

We follow the Whittle–Matérn random field approach, which models the random fields as stationary Gaussian random fields with the covariance kernel

where is the smoothness parameter, the integer value of which controls the mean-square differentiability of the process, is the modified Bessel function of the second kind of order , and is a scaling parameter that governs the length scale. When , this corresponds to the exponential covariance function, and as , this corresponds to the Gaussian kernel. On , samples from the random field with the covariance function correspond to the solution of the stochastic partial differential equation (SPDE)



is a spatial Gaussian white noise process with unit variance on

, and the marginal variance is

However, to implement this on a bounded region , we need to impose additional boundary conditions. The choice of the boundary conditions may affect the correlation near the boundaries. To mitigate this effect, one may solve the SPDE on a slightly larger domain that encloses ; other approaches have been proposed in [25, 15]. The covariance operator corresponding to the covariance kernel is . In this paper, work with covariance operators of the form


and with corresponding zero Neumann boundary conditions. The definition ensures that the covariance operator is a self-adjoint, trace-class operator [30].

Furthermore, since it more convenient to work with integer values of the exponent , the constraint forces us to choose for one spatial dimension and for two or three spatial dimensions. However, in the context of Bayesian inverse problems this choice of can result in oversmoothing of the solution and poor edge-preserving behavior of the solution. Furthermore, it is common to use parameters such as , etc; for example corresponds to the well-known exponential covariance kernel. However, when , which cannot be tackled using standard approaches. It is desirable to develop a numerical approach that can exploit the full power of these parameterized priors, without the constraints of integer powers. This motivates the use of fractional operators which we discuss in Section 2.2.

Following [24, Section 3.2] and [23, Section 2.6], It is also straightforward to model non-stationary effects by considering the SPDE


where , , and are now spatially varying functions and is a spatial Gaussian white noise process with unit variance. Assuming that , the corresponding covariance operator (with zero Neumann boundary conditions) is


Note that this assumes (2) as a special case.

2.2 Fractional Laplacian

For , let us use the notation to denote the realization in of

with zero Neumann boundary conditions. Then there exists a sequence of eigenvalues

satisfying with

. The corresponding eigenfunctions

. It is well-known that, forms the orthonormal basis of . Towards, this end, for any , we can define the fractional order Sobolev space as

We are now ready to define the fractional powers of , see [3, Section 8].

Definition 1.

Let with on , where is the outward normal vector. Then the fractional power of is given by

Notice that can be extended to an operator mapping from to where is the dual of . We further emphasize that the condition can be dropped in the above definition, see [3, Def. 2.2]. We also refer to [3], for extensions to nonzero Neumann conditions.

Finally, we can extend the above definition to in (3). In this case, we need to assume and to be symmetric, bounded in and elliptic.

2.3 Discretization

We show how to discretize the appropriate quantities in a finite dimensional space. The discussion here closely mimics the derivation in [11].

Finite dimensional parameter space

Let be a basis for the finite dimensional space , with where the subscript refers to a mesh discretization parameter. Assume that the basis functions corresponding to the nodal points satisfy for . We can then approximate the inversion parameter within this finite dimensional subspace as , and we denote the vector of coefficients in the expansion

For two functions , the inner product can be approximated as , where is the mass matrix with entries

We denote to be the vector space with the inner product , to distinguish from the usual Euclidean space.

Let . Then the matrix approximation of , denoted , is obtained as follows. First, we construct a matrix with entries

Here are the finite dimensional basis vectors and is the canonical basis vector for corresponding to the basis function . This gives us the matrix representation . The matrix transpose of , denoted , has entries , but the adjoint of , denoted , satisfies .

Discretization of the Covariance operator

In the following discussion, we first assume that and , with zero Neumann boundary conditions. To apply the fractional operator , we use sinc quadrature to approximate the integral [9]. Let solve

and compute the approximation to as

where , and . The number of terms in the expansion was chosen to balance the errors in the quadrature with the finite element discretization error. Alternatively, in matrix notation we can write

where is the stiffness matrix with entries

and is the mass matrix with entries for . The quadrature weights and nodes are


Based on the preceding discussion, the matrix representation of the discretized version of is

Let denote the set of natural numbers (excluding zero). Now consider the covariance operator where and . We write where be the integer part of and is the fractional part. Following [8, Remark 3.6], the discretized representation of the covariance operator is


Note that we can interchange the order of the fractional and the integer part since terms like and commute. Furthermore, the matrix is self-adjoint, i.e., . An important point to note here is that is not formed explicitly. The application of the matrix to a vector can be obtained by independent linear system solves which can be readily parallelized. However, we will show how to compute the action of efficiently using Krylov subspace methods for shifted linear systems (see Section 3.2).

3 Efficient computations with the covariance operator

In this section, we review the Multipreconditioned GMRES method for shifted linear systems [4] (MPGMRES-Sh) for solving sequences for shifted linear systems (Section 3.1). We use this solver to accelerate the computations with the covariance operator and generating samples from the SPDE (Section 3.2). Finally, in Section 3.3, we also describe how to efficiently compute the truncated Karhunen-Loève expansion.

3.1 Krylov methods for shifted systems

The dominant cost of applying the fractional operator is the solution of a system of shifted linear equations. We first briefly review the MPGMRES-Sh approach [4]. This is an efficient method for solving shifted linear systems of the form

where and are a set of shifts. This solver can be used in multiple ways to accelerate computations involving Gaussian random fields.

In the MPGMRES-Sh approach, we select a set of shifts that determine the preconditioners The method uses multiple preconditioners to build a basis that we will use to define a search space for all the shifted systems. In the below discussion, for simplicity, we drop the index on . At the -th iteration, given the matrix the method builds the matrix as

The method is initialized with . We define the matrices that collect the columns and as and . Using these relationship we can derive the multipreconditioned Arnoldi relationship for a shift

Here the matrices and are defined as follows. Let denote the the Kronecker product and define

and for . Here means a block diagonal matrix composed of the given subblocks. With these definitions in place, then and The matrix is block upper-Hessenberg.

The main point here is that the solution for each shifted system corresponding to shift is obtained as , where minimizes the residual and is obtained by solving the least squares problem

The details of this algorithm are given in [4, Algorithm 2].

Computational Cost

Assume that the cost of matvecs involving and is and the cost of applying the preconditioner is . If iterations of the MPGMRES-Sh are performed, then the total cost is

As we shall see in the numerical experiments, the number of iterations low (typically ). Note that the cost of building the basis is independent of the number of shifted systems, whereas the cost of the projected system depends on the number of shifted systems .

3.2 Applying the covariance operator

We now show how to compute the application of the covariance operator to a vector . Let the matrices and be as defined in Section 2.3.

Let be a positive number. If , then it is straightforward to compute an application of the covariance operator which requires sequential applications of solves involving . Henceforth, we assume that . Let be the integer part of and let be the fractional part. We can apply the covariance operator to a vector , this can be computed as


In writing this expression, we have replaced the weights with . This form is completely equivalent to (6) but makes it easier to precondition. To turn this expression into an efficient algorithm, we perform the following steps. In the offline phase, we determine the quadrature weights and nodes , preconditioner shifts . Furthermore, we compute factorizations of and the preconditioners . In the online stage, we compute the action of on times to obtain . We then apply MPGMRES-Sh with , , and shifts for to obtain the solutions . Finally, we compute . The details are given in Algorithm 1.

When the integer part is large, the number of solves with can become expensive; however, in practice, a value of is typically not used since the resulting field is extremely smooth. If an application requires the use of a large value of techniques from [19, Section 4.1] may be used.

Alternative approaches

In the expression for the application of the covariance defined in (7), we used a slight reformulation for the fractional part, which is obtained by factoring out in each summand. We also investigated two other formulations for computing . The first is perhaps the more natural version

While mathematically equivalent to the approach used in (7), we found that it was easier to precondition the formulation in (7). The second approach is to decompose

The advantage of this approach is that in this instance both the sets of weights, and , are at most one in their respective intervals and ; similarly the nodes and are also at most in their respective intervals. However, the downside is that for apply the fractional operator, we need two solves with MPGMRES-Sh. For this reason, we prefer the formulation in (7).

In the offline stage Algorithm 1, we factorize the preconditioners. However, this may not be feasible for very large-scale problems, especially in three spatial dimensions. In such cases, we can apply the preconditioners using iterative solvers. See [4, Section 4.2.2].

0:  matrices and , vector , scalar , preconditioner shifts
1:  {Offline stage: Precomputation}
2:  Split where and
3:  Determine quadrature weights and nodes for
4:  Factorize for and
5:  {Online stage: Solve}
6:  Apply the integer part:
7:  Solve the sequence and compute .
8:  return  Return
Algorithm 1 Applying the covariance operator

3.2.1 Generating samples from the SPDE

We can adapt the efficient approach in Algorithm 1 to generate samples from the SPDE


with zero Neumann boundary conditions. We follow the approach in [7] to solve the SPDE. Let be the Cholesky factorization of and let . For , we can compute a solution to the SPDE as

where , and the number of shifted systems , where

We can readily modify Algorithm 1 to efficiently solve this problem. In particular, we can leverage the use of the MPGMRES-Sh approach to accelerate the solution to the SPDE. If we instead use zero Dirichlet boundary conditions for , the error analysis in [7, Theorem 2.1] applies directly.

3.3 Karhunen-Loève expansions

In this section, we discuss an approach generate samples from the Gaussian measure on a Hilbert space , where is a the mean function and is a self-adjoint, positive semidefinite covariance operator. This approach is based on the Karhunen-Loève (KL) expansion of the stochastic process. In this approach we consider an orthonormal set of eigenpairs for where the eigenvalues are arranged in decreasing order as . Furthermore, let

be an independent and identically distributed (i.i.d.) sequence of random variables with

. Then by [30, Theorem 6.19], the sample defined by the KL expansion

is distributed according to . In practice, for certain applications, the eigenvalues exhibit rapid decay and, therefore, the expansion can be approximated by the truncated representation . In the context of Bayesian inverse problems, rather than estimating the field, one can estimate the coefficients of the KL expansion. This can be computationally beneficial because of the input dimensionality reduction.

Galerkin approximation

In this discussion, we will derive the expressions for the truncated KL expansion for the Gaussian measure . We will then discuss implementation details for computing the truncated KL expansion for the Gaussian measure . Consider a finite dimensional subspace with and let be a basis for the subspace. The eigenvalue problem is then replaced by the finite dimensional eigenvalue problem , where and . We seek a solution , by considering the Galerkin projection

Following the discussion in Section 2.3, we can write and , where is the discretized covariance matrix. Therefore, we now have the generalized Hermitian eigenvalue problem (GHEP)



is the generalized eigenvector. Let

be the generalized eigenvalues and let be the corresponding eigenvectors. Then a sample from the process can be generated as

Efficient implementation

We want to compute the truncated KL for the measure The GHEP (9) can be solved using any standard Krylov-based eigensolver (e.g., Lanczos) that does not form the matrix or explicitly but instead relies on forming the matvecs involving the matrices. The matvecs involving can be efficiently computed using the MPGMRES-Sh approach as described in Algorithm 1.

An alternative approach to computing the truncated KL expansion is to work with the eigenpairs of the operator , which is attractive at first glance, since it avoids the complications with computing the fractional part. However, the approach based on the covariance operator directly is advantageous for two reasons. First, to compute the truncated KL expansion, we need to target the eigenpairs of corresponding to the largest eigenvalues, which corresponds to the eigenpairs of corresponding to the smallest eigenvalues, which is typically much harder. Second, for the operator acts as a spectral transformation that enhances the eigenvalue gaps.

4 Application to Bayesian Inverse Problems

In this section, we show how to leverage the efficient representation of the covariance operator. In Section 4.1, we review the necessary background for Bayesian inverse problems. In Section 4.2, we derive the discretized posterior distribution that uses the covariance matrix in (6) as the prior covariance and in Section 4.3, we show how to adapt Generalized Hybrid Iterative Methods (GenHyBR) appropriately to efficiently compute the MAP estimate.

4.1 Bayesian Inverse Problems

Let be the inversion parameter that we wish to recover from the measurements , which are related through the measurement equation


where is the parameter-to-observable map or the measurement operator, and is the measurement noise which is assumed to be Gaussian with zero mean and covariance , which we write as

. Therefore, the likelihood probability density function takes the form

We assume that is endowed with a Gaussian prior with mean function and covariance operator , i.e., . Here acts a regularization parameter that may be known, or can be estimated as part of the inversion process (this is what we do in this paper). For the operator we take it to be defined in Section 2.1.

By the choice of the likelihood and the prior distribution, using the infinite dimensional Bayes formula, the Radon-Nikodym derivative of the posterior probability measure

with respect to the prior measure , takes the form

where is a normalization constant. The maximum a posteriori (MAP) estimator maximizes the posterior distribution and can be obtained by the solution to the optimization problem

If is a linear operator, then the posterior distribution is Gaussian; we exploit this fact in this paper.

4.2 Discretized posterior distribution

The covariance matrix after discretization takes the same form as in (6). It is easy to verify that the resulting matrix is self-adjoint; i.e., and is symmetric with respect to the standard Euclidean inner product and positive definite. This gives us a discretized representation of the prior distribution , i.e.,

For completeness, we give expressions for the posterior distribution, and the MAP estimate, for the discretized problem. This uses the components developed in Section 2.3

. The posterior probability density function takes the form

The MAP estimator can be obtained by solving the optimization problem

If the forward operator is linear , then the posterior distribution is Gaussian with , where


and is the adjoint operator of , and takes the form . Note that is self-adjoint with respect to the inner product, so that is symmetric positive definite.

4.3 GenHyBR for computing the MAP estimate

We now derive an alternate expression for the MAP estimate (12). Multiplying both sides of (12) with , we get

We use , and the change of variables . Then to obtain the MAP estimate, we solve the linear system for


and then compute . This form of the MAP estimator will be the basis for our computationally efficient procedure. Note that is a symmetric positive definite matrix with respect to the Euclidean inner product; we denote this by for convenience. The application of to a vector can be performed efficiently, using a slight modification of Algorithm 1. Furthermore, note that is symmetric with respect to the inner product.

We can reformulate the equations in such a way that we can use the generalized Golub-Kahan bidiagonalization (gen-GK) approach [13] for efficiently estimating the MAP estimate. We initialize the iterations with , , and . At step in this approach,

where are chosen such that for . These iterates can be collected to form the matrices , , and the bidiagonal matrix

This can be rearranged to obtain the relations, which are typically accurate up to machine precision

and the orthogonality relations and . The columns of form a basis for the Krylov subspace , where the Krylov subspace is defined as