Theoretical Guarantees for the Statistical Finite Element Method

The statistical finite element method (StatFEM) is an emerging probabilistic method that allows observations of a physical system to be synthesised with the numerical solution of a PDE intended to describe it in a coherent statistical framework, to compensate for model error. This work presents a new theoretical analysis of the statistical finite element method, demonstrating that it has similar convergence properties to the finite element method on which it is based. Our results constitute a bound on the Wasserstein-2 distance between the ideal prior and posterior and the StatFEM approximation thereof, and show that this distance converges at the same mesh-dependent rate as finite element solutions converge to the true solution. Several numerical examples are presented to demonstrate our theory, including an example which tests the robustness of StatFEM when extended to nonlinear quantities of interest.


1 Introduction

Mathematical models based on PDEs play a central role in science and engineering. Since PDEs do not generally yield closed form solutions, approximate solutions are typically obtained via numerical methods. The finite element method (FEM) [strang1973analysis] is one such scheme, which approximates the solution by solving a discretised problem over a finite mesh of the PDE domain. FEM-based methods have been widely studied, and as such, rigorous estimates of the error introduced through discretisation have been obtained for various families of PDEs. In particular, these results characterise the convergence of the error to zero as a function of the size of the mesh.

Mismatch between the model and the real system can occur for a variety of reasons: for example, if the geometry of the domain is not fully resolved, if underlying processes are not completely understood, or if approximations must be introduced to reduce the model to a closed system of PDEs. Stochasticity is often introduced in models to reflect the uncertainty arising from this lack of knowledge, or simply to capture intrinsically noisy processes. In the context of PDEs, stochasticity may appear in various forms: in the initial and/or boundary conditions, in the forcing terms, in the coefficients of the PDE, or even in the geometry of the domain on which the PDE is defined. The resulting governing equations, known as stochastic partial differential equations (SPDEs), can be reformulated probabilistically as a probability distribution over the space of admissible solutions. The development and analysis of numerical methods for SPDEs is a very active area of research; see [zhang2017numerical] for a review.

The increasing prevalence of instrumentation within modern engineering systems has driven the need for tools able to combine sensor data with predictions from PDE models in a manner which can mitigate any model misspecification. This technology forms an integral part of the modern notion of a digital twin, as a means of providing effective continuous monitoring of complex engineering assets.

The recently proposed statistical finite element method (StatFEM) framework [girolami2019statistical, Girolami2021] seeks to address this need through a statistical formulation of FEM, which permits numerical approximations of a PDE to be combined with sensor observations in a principled manner through the Bayesian formalism. Due to its general applicability and its ability to provide quantification of uncertainty without expensive Monte Carlo simulations, it has established itself as an effective means of assimilating data into the large-scale PDEs which arise in civil, marine and aerospace engineering. As StatFEM is built on FEM, it is itself subject to numerical discretisation error, which will propagate through to any predictions made by the model. Obtaining rigorous bounds on the bias introduced by discretisation error on StatFEM predictions and their associated uncertainty is a critical gap in the literature that this work seeks to address.

1.1 Related Work

This paper is focused on providing a detailed error analysis for the StatFEM method first introduced in [girolami2019statistical, Girolami2021]. For clarity of ideas, we limit our attention to linear elliptic problems. Adapting these results to time-dependent and/or non-linear PDEs is left to future research.

There has been extensive research on deriving numerical methods for solving various classes of SPDEs. For detailed reviews of such methods the reader is referred to [lord2014introduction, MATTHIES1997283, sudret2000stochastic, STEFANOU20091031, xiu2010numerical, aldosary2018structural]. Some of these methods, for example the stochastic Galerkin finite element method (SG-FEM) and its many variants, also find their roots in the classical finite element method, but differ fundamentally from StatFEM. More specifically, SG-FEM methods reformulate the SPDE as a deterministic PDE on a higher dimensional domain, where additional variables are introduced to characterise the underlying noise process. In the StatFEM approach, on the other hand, the approximation is itself intrinsically stochastic. While SG-FEM methods are applicable to a wider class of SPDEs, their scalability is limited by the curse of dimensionality resulting from the augmented noise dimensions. Additionally, while the StatFEM formulation permits fast assimilation of measurement data, this is not straightforward for SG-FEM and related methods.

In the area of spatial statistics, Gaussian random fields are constructed as the solution of appropriately chosen SPDEs [lindgren2011explicit]. The solution is approximated through a finite element discretisation of the weak form of the underlying SPDE. The resulting discretised solution is subsequently employed as a prior within a Bayesian inference problem for a spatial or spatio-temporal model [sigrist2015stochastic]. For non-Gaussian likelihoods, fast approximate Bayesian inference for the resulting latent Gaussian model is possible using an integrated nested Laplace approximation (INLA) [martino2014integrated]. The construction of the StatFEM model is strongly connected to this statistical viewpoint of SPDEs. Indeed, in both cases the spatial priors are Gaussian Markov random fields (GMRFs). In the case of StatFEM the mean and covariance of the GMRF reflect the behaviour of the underlying deterministic PDE, while in the SPDE case the covariance is tailored to ensure a specific degree of spatial decorrelation of the underlying latent Gaussian random field.

Conceptually, StatFEM also bears some similarity to probabilistic numerical (PN) methods [hennig2015probabilistic], both being statistical reformulations of numerical methods. Indeed, PN methods have also been developed for approximating solutions of ODEs [tronarp2019probabilistic] as well as PDEs [conrad2017statistical, chkrebtii2016bayesian, cockayne2016probabilistic]. The two methodologies share common foundations, particularly Bayesian probabilistic numerical (BPN) methods [cockayne2019bayesian], which are formulated as conditioning a prior distribution of potential solutions on partial evaluations of the underlying equations. StatFEM also builds on the Bayesian formalism, employing conditioning to incorporate new information, specifically noisy sensor observations, starting from a prior which characterises the distribution of approximate PDE solutions. However, these methods fundamentally differ in what they seek to achieve: PN methods seek to recast numerical discretisation error as uncertainty, whereas the uncertainty arising in StatFEM primarily seeks to account for model error.

1.2 Contributions

The novel contributions of the paper are as follows:

  • We study the bias introduced by discretisation error in the StatFEM model, and derive rigorous upper bounds for the bias introduced in the prior and posterior in terms of the properties of the underlying finite element mesh.

  • We conduct a thorough simulation study confirming our theoretical results on a number of illustrative FEM test problems.

  • We empirically explore the limits of our theory by studying the bias induced by StatFEM in a scenario where the assumptions underpinning our theoretical results are no longer valid.

1.3 Structure of the Paper

The remainder of the paper is structured as follows. In Section 1.4 we introduce notation required for the rest of the paper. In Section 2 we reintroduce StatFEM, deriving it from first principles. Section 3 contains our main theoretical contributions: error bounds for the StatFEM prior, and for the posterior when this prior is conditioned on sensor observations. We conduct simulations verifying our theoretical results in Section 4, and provide some concluding remarks in Section 5. The supplementary material contains the proofs required for the paper.

1.4 Notation

In this section we introduce required notation for the remainder of the paper. Given a normed space $X$ we denote the associated norm by $\|\cdot\|_X$. Given two normed spaces $X, Y$ we denote by $\mathcal{L}(X, Y)$ the set of all bounded linear operators from $X$ to $Y$. We denote the dual of $X$ by $X^*$. When $H$ is a Hilbert space we denote the associated inner product by $\langle \cdot, \cdot \rangle_H$. We will sometimes drop the subscript for the norm and inner product when there is no risk of confusion over which spaces we are referring to.

Let $H, H_1, H_2$ be Hilbert spaces. Given an operator $T \in \mathcal{L}(H_1, H_2)$ we denote the adjoint of $T$ by $T^*$. Given $T \in \mathcal{L}(H, H)$, we say that $T$ is symmetric if $T = T^*$ and positive definite if it is symmetric and $\langle Tx, x \rangle_H > 0$ for all nonzero $x \in H$. Recall that the trace of $T$ can be defined as $\operatorname{tr}(T) = \sum_i \langle T e_i, e_i \rangle_H$ where $\{e_i\}_{i \ge 1}$ is any orthonormal basis of $H$. We will say that $T$ is trace-class if $\operatorname{tr}(|T|) < \infty$. For an operator $T$ we will define the operator norm to be $\|T\|_{\mathrm{op}} = \sup_{\|x\|=1} \|Tx\|$, the trace norm to be $\|T\|_{\mathrm{TR}} = \operatorname{tr}(|T|)$, while the Hilbert-Schmidt norm is defined to be $\|T\|_{\mathrm{HS}} = (\sum_i \|T e_i\|^2)^{1/2}$. Recall the well known inequality $\|T\|_{\mathrm{op}} \le \|T\|_{\mathrm{HS}} \le \|T\|_{\mathrm{TR}}$.

Let $D \subset \mathbb{R}^d$ be an open domain with boundary $\partial D$ and let $\bar{D}$ be its closure. We denote by $C(D)$ and $C(\bar{D})$ the spaces of continuous functions on $D$ and $\bar{D}$, respectively. Denote by $L^p(D)$ the spaces of $p$-integrable functions on $D$ and by $H^k(D)$ the Sobolev space with square-integrable derivatives up to and including order $k$. We use the notation $H_0^1(D)$ and $H_0^k(D)$ to denote the elements of $H^1(D)$ and $H^k(D)$, respectively, which are zero on $\partial D$. For clarity we will often use the notation $\|\cdot\|_k$, $|\cdot|_k$ to denote the norm and semi-norm associated to $H^k(D)$.

1.5 Relevant Probability-Theoretic Concepts

In this section we briefly introduce and discuss the relevant probability-theoretic concepts we will require. To this end, for a space $X$ let $\mathcal{B}(X)$ denote the Borel sigma-algebra of $X$, and for a measurable space $(X, \mathcal{B}(X))$ let $\mathcal{P}(X)$ denote the set of all Borel probability measures on $X$.

Given a positive definite kernel function $k(\cdot, \cdot)$ on the domain $D$, and a function $m: D \to \mathbb{R}$, we say $u$ is a Gaussian process with mean function $m$ and covariance function $k$ if for every finite collection of points $x_1, \dots, x_n \in D$ the random vector $(u(x_1), \dots, u(x_n))^\top$ is a multivariate Gaussian random variable with mean vector $(m(x_1), \dots, m(x_n))^\top$ and covariance matrix $[k(x_i, x_j)]_{i,j=1}^n$. The mean function and kernel/covariance function completely determine the Gaussian process. We write $u \sim \mathcal{GP}(m, k)$ to denote the Gaussian process with mean function $m$ and covariance function $k$. We refer the reader to [Rasmussen06gaussianprocesses] for further details.
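To make this finite-dimensional picture concrete, the following sketch draws joint samples of a GP on a grid of points; the RBF kernel, its hyperparameters, and the jitter term are illustrative choices of ours, not quantities fixed by the paper.

```python
import numpy as np

def rbf_kernel(X, Y, variance=1.0, lengthscale=0.2):
    """Squared-exponential kernel k(x, y) = s^2 exp(-(x - y)^2 / (2 l^2))."""
    sq_dists = (X[:, None] - Y[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

# Any finite collection of points yields a multivariate Gaussian with mean
# vector m(x_i) and covariance matrix k(x_i, x_j); sampling that Gaussian
# gives realisations of the GP restricted to the grid.
x = np.linspace(0.0, 1.0, 200)
mean = np.zeros_like(x)                        # m = 0 for illustration
K = rbf_kernel(x, x) + 1e-10 * np.eye(len(x))  # jitter for numerical stability
samples = np.random.default_rng(0).multivariate_normal(mean, K, size=5)
```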

Gaussian processes that take values in a separable Hilbert space $H$ can be associated canonically with a Gaussian measure on the space $H$. Gaussian measures on infinite-dimensional Hilbert spaces can be viewed as a generalisation of their finite-dimensional counterpart on $\mathbb{R}^n$, and are defined by a mean element $m \in H$ and covariance operator $C$, which is a positive-definite trace class operator. See [da2014stochastic] and [bogachev1998gaussian] for more information. Given a Gaussian process $u \sim \mathcal{GP}(m, k)$ on $D$, this can be associated with the Gaussian measure on $L^2(D)$ with mean $m$ and covariance operator $C$, associated with the kernel $k$, defined by

$$(C\phi)(x) = \int_D k(x, y)\,\phi(y)\,\mathrm{d}y, \qquad \phi \in L^2(D).$$

Given any $m \in H$ and $C$ which is positive-definite and trace class, there exists a Gaussian measure with mean $m$ and covariance operator $C$ [da2014stochastic, Proposition 2.18].

For the purposes of studying the error in StatFEM we will need to introduce a metric on the space of distributions. The probability metric we will work with in this paper is the Wasserstein distance; this choice will be justified in Section 3. Given two probability measures $\mu, \nu \in \mathcal{P}(X)$ on a normed space $X$, having the first two moments finite, the Wasserstein-2 distance between the two measures, $W_2(\mu, \nu)$, is defined as:

$$W_2(\mu, \nu)^2 = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times X} \|x - y\|_X^2 \, \mathrm{d}\gamma(x, y) \qquad (1)$$

where $\Gamma(\mu, \nu)$ is the set of couplings of $\mu$ and $\nu$, i.e. the set of probability measures on $X \times X$ whose marginals are $\mu$ and $\nu$ respectively. Throughout we will abbreviate $W := W_2$.

It will also be useful to note that the Wasserstein distance between the measures above can be linked to the Wasserstein distance between the centered measures. Let $m_\mu$ denote the expectation of $\mu$; then the centered measure $\bar{\mu}$ is defined by $\bar{\mu}(A) = \mu(A + m_\mu)$ for all $A \in \mathcal{B}(X)$. Denoting the means of $\mu, \nu$ by $m_\mu, m_\nu$ respectively we have from [cuesta1996lower]:

$$W(\mu, \nu)^2 = \|m_\mu - m_\nu\|_X^2 + W(\bar{\mu}, \bar{\nu})^2 \qquad (2)$$

We also recall that there is an explicit expression for the Wasserstein distance between two Gaussians. Let $\mu_i = N(m_i, \Sigma_i)$, $i = 1, 2$, be two Gaussian measures on a separable Hilbert space $H$. Then from [masarotto2019procrustes], we have

$$W(\mu_1, \mu_2)^2 = \|m_1 - m_2\|_H^2 + \operatorname{tr}\left[\Sigma_1 + \Sigma_2 - 2\left(\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2}\right)^{1/2}\right] \qquad (3)$$
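In finite dimensions Eq. 3 can be evaluated directly; the sketch below does so with numpy/scipy and is a minimal illustration under our own naming, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gaussian(m1, S1, m2, S2):
    """Wasserstein-2 distance between N(m1, S1) and N(m2, S2), cf. Eq. (3)."""
    root_S1 = sqrtm(S1)
    cross = np.real(sqrtm(root_S1 @ S2 @ root_S1))  # discard round-off imaginaries
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(w2_sq, 0.0))  # clip tiny negatives from round-off
```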

2 The Statistical Finite Element Method

In this section we will present the StatFEM model. We begin by reviewing FEM in Section 2.1. Then, in Section 2.2 we will derive the prior distribution over the solution to the PDE that is used in StatFEM. Finally, in Section 2.3 we will condition this prior distribution on sensor observations to obtain the StatFEM posterior distribution.

2.1 The Finite Element Method

In this section we will briefly review the necessary concepts from FEM needed for StatFEM, focusing on elliptic PDEs. We begin by establishing notation. Consider the following linear elliptic boundary value problem (BVP):

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{ in } D, \qquad u = 0 \ \text{ on } \partial D \qquad (4)$$

for some positive, spatially-dependent conductivity $\kappa$ and forcing $f$. We make the following assumptions on the data $D$ and $\kappa$:

Assumption 1 (domain regularity).

The dimension of the domain is restricted to $d \le 2$. The domain $D$ is convex polygonal, i.e. it is convex and its boundary $\partial D$ is a polygon. Note that this implies that the domain is bounded.

Assumption 1 is made for simplicity; we note that it should be possible to relax some of these conditions, though this is not investigated in this work.

Assumption 2 (parameter regularity).

The parameter $\kappa$ is a continuous function on $\bar{D}$, i.e. $\kappa \in C(\bar{D})$. Further, there exists a constant $\kappa_{\min} > 0$ such that:

$$\kappa(x) \ge \kappa_{\min} > 0 \quad \text{for all } x \in \bar{D} \qquad (5)$$

Inequality Eq. 5 is often referred to as (uniform) ellipticity of the parameter.

We consider the weak solution of the BVP Eq. 4; namely, we seek $u \in H_0^1(D)$ which satisfies the variational formulation of Eq. 4, i.e.

$$\int_D \kappa \nabla u \cdot \nabla v \, \mathrm{d}x = \int_D f v \, \mathrm{d}x \quad \text{for all } v \in H_0^1(D). \qquad (6)$$

Under Assumptions 2 and 1, standard regularity results [evans10, section 6.2.2] show that, for each $f \in L^2(D)$, there exists a unique weak solution $u \in H_0^1(D)$ satisfying Eq. 6. Further, Assumptions 2 and 1 guarantee stronger regularity of the weak solution. From [larsson2008partial, section 3.7] it holds that $u \in H^2(D)$ and that there exists a constant $C > 0$ such that $\|u\|_2 \le C \|f\|_{L^2(D)}$. Note that this extra regularity can be generalised to elliptic problems in higher dimensions [grisvard2011elliptic, Theorem 3.2.1.2].

We can reformulate the variational formulation Eq. 6 as

$$a(u, v) = \langle f, v \rangle \quad \text{for all } v \in H_0^1(D) \qquad (7)$$

where the bilinear form $a : H_0^1(D) \times H_0^1(D) \to \mathbb{R}$ is defined by

$$a(u, v) = \int_D \kappa \nabla u \cdot \nabla v \, \mathrm{d}x. \qquad (8)$$

Assumption 2 on $\kappa$ ensures that the bilinear form $a$ is symmetric, bounded and coercive [evans10].

The Galerkin approximation of the variational problem (7) seeks an approximate solution $u_h$ within a finite dimensional subspace $V_h$ of $H_0^1(D)$ which satisfies

$$a(u_h, v_h) = \langle f, v_h \rangle \quad \text{for all } v_h \in V_h.$$

The construction of the finite element method arises from a specific Galerkin approximation, where we introduce a triangulation of the domain, $\mathcal{T}_h = \{T_1, \dots, T_{n_t}\}$, where the $T_i$ are pairwise disjoint triangular elements with $\bar{D} = \bigcup_i \bar{T}_i$, and $h$ is the diameter of the largest element. This triangulation is usually referred to as the FE mesh and we will refer to $h$ as the mesh width. The discrete function space $V_h$ is then chosen to be piecewise polynomial functions defined on this mesh. The approximate weak solution $u_h$ can be expressed as a linear combination of basis functions $\phi_1, \dots, \phi_{n_u}$ of $V_h$. In this work, we assume that $V_h$ is the space of continuous piecewise linear functions, with the basis chosen to be a nodal basis which satisfies $\phi_i(x_j) = \delta_{ij}$, where the $x_j$ are the internal vertices of the mesh.

By exploiting the linearity of both the bilinear form and the inner product in Eq. 7, the FEM discretisation of the variational formulation of our BVP yields the following discrete system of linear equations:

$$A U = F \qquad (9)$$

where $A \in \mathbb{R}^{n_u \times n_u}$ is the stiffness matrix whose $(i, j)$-th element is given by:

$$A_{ij} = a(\phi_i, \phi_j) = \int_D \kappa \nabla \phi_i \cdot \nabla \phi_j \, \mathrm{d}x. \qquad (10)$$

The vector $U \in \mathbb{R}^{n_u}$ is the vector of degrees of freedom (DOF) and $F \in \mathbb{R}^{n_u}$ is the load vector whose $i$-th entry is given by:

$$F_i = \langle f, \phi_i \rangle = \int_D f \phi_i \, \mathrm{d}x. \qquad (11)$$

The stiffness matrix, $A$, is a real symmetric matrix which is invertible by Assumption 2 [lord2014introduction, see Theorem 2.16]. Thus, the system of linear equations Eq. 9 can be solved for $U$, yielding the finite element approximation $u_h = \Phi^\top U$, where $\Phi = (\phi_1, \dots, \phi_{n_u})^\top$ is the vector of the FE basis functions. The mass matrix $M$, which is defined by

$$M_{ij} = \langle \phi_i, \phi_j \rangle = \int_D \phi_i \phi_j \, \mathrm{d}x, \qquad (12)$$

will also play an important role in the error analysis of Section 3.
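For concreteness, the following sketch assembles the matrices of Eqs. 9-12 for a one-dimensional instance $-(\kappa u')' = f$ on $(0, 1)$ with homogeneous Dirichlet conditions, using hat functions on a uniform mesh; the midpoint quadrature rule and the helper name assemble_fem_1d are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def assemble_fem_1d(n_cells, kappa, f):
    """Assemble stiffness A (Eq. 10), mass M (Eq. 12) and load F (Eq. 11) for
    -(kappa u')' = f on (0, 1), u(0) = u(1) = 0, with piecewise linear hats."""
    h = 1.0 / n_cells
    nodes = np.linspace(0.0, 1.0, n_cells + 1)
    mids = 0.5 * (nodes[:-1] + nodes[1:])
    n = n_cells - 1                               # number of interior DOFs
    A = np.zeros((n, n)); M = np.zeros((n, n)); F = np.zeros(n)
    for e in range(n_cells):                      # element [x_e, x_{e+1}]
        ke = kappa(mids[e]) / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
        me = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        fe = 0.5 * h * f(mids[e]) * np.ones(2)    # midpoint rule for the load
        for a, i in enumerate((e - 1, e)):        # local -> interior global index
            if not 0 <= i < n:
                continue                          # boundary node carries no DOF
            F[i] += fe[a]
            for b, j in enumerate((e - 1, e)):
                if 0 <= j < n:
                    A[i, j] += ke[a, b]; M[i, j] += me[a, b]
    return A, M, F, nodes

A, M, F, nodes = assemble_fem_1d(32, kappa=lambda x: 1.0, f=np.sin)
U = np.linalg.solve(A, F)                         # DOF vector, Eq. (9)
```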

It will be important for our work to have access to a bound on the error between the true solution $u$ and the FEM approximation $u_h$ in terms of $h$. To this end consider a family of triangulations $\{\mathcal{T}_h\}_{h > 0}$, indexed by the mesh width $h$, with $h \to 0$. We will require a further technical assumption on this family of FE meshes:

Assumption 3.

The meshes under consideration remain regular in the sense that, as we decrease $h$ to $0$, the angles of all triangles are bounded below independently of $h$.

Under the assumptions above we have the following classical error bound [larsson2008partial, see chapter 5]:

Theorem 1.

Let $u$ be the unique weak solution to the BVP Eq. 4 and let $u_h$ be the FEM approximation. Assume that $f \in L^2(D)$ and that Assumptions 3, 2 and 1 hold. We then have the following error bound, for sufficiently small $h$:

$$\|u - u_h\|_{H^1(D)} \le C h \|f\|_{L^2(D)} \qquad (13)$$

where $C$ is a positive constant independent of $h$.

From this error bound a standard duality argument yields the following error bound [larsson2008partial, see chapter 5], which will play an important role in the error analysis of Section 3.

Theorem 2.

Let $u$ be the unique weak solution to the BVP Eq. 4 and let $u_h$ be the FEM approximation. Assume that $f \in L^2(D)$ and that Assumptions 3, 2 and 1 hold. We then have the following error bound, for sufficiently small $h$:

$$\|u - u_h\|_{L^2(D)} \le C h^2 \|f\|_{L^2(D)} \qquad (14)$$

where $C$ is a positive constant independent of $h$.

2.2 The StatFEM Prior

In this section we present the StatFEM approximation for the solution of a stochastically-forced elliptic PDE. The resulting probability distribution will subsequently be employed as a prior to be combined with sensor data through Bayesian updates. To this end, we consider the BVP Eq. 4 where $f$ is chosen to be a Gaussian process, $f \sim \mathcal{GP}(\bar{f}, k)$, where $\bar{f} \in L^2(D)$ and where we assume the covariance is regular enough so that realisations of $f$ lie almost surely in $L^2(D)$. It is sufficient to take a kernel $k$ which is continuous [scheuerer2010regularity, see Corollary 1]. This is satisfied for example by the RBF kernel or any kernel in the Matérn family. Another alternative sufficient condition would be to choose the kernel so that the associated covariance operator is an appropriate negative power of the Laplacian. This is outlined in detail in [stuart2010inverse, see section 6].

In our analysis we shall assume that the conductivity coefficient $\kappa$ is deterministic, unlike [Girolami2021] where this is assumed to be stochastic. Since Eq. 4 is well-posed, the solution can be expressed as $u = \mathcal{A}^{-1} f$, where $\mathcal{A}^{-1}$ is the inverse of the differential operator $\mathcal{A} = -\nabla \cdot (\kappa \nabla \cdot)$. The pushforward of the probability measure for $f$ through $\mathcal{A}^{-1}$ induces a probability distribution of solutions supported on $L^2(D)$. Since the inverse operator is linear, the resulting distribution for $u$ will also be a Gaussian measure supported on $L^2(D)$, with mean $\mathcal{A}^{-1}\bar{f}$ and covariance operator $\mathcal{A}^{-1} C_f \mathcal{A}^{-*}$, where $C_f$ is the covariance operator of $f$ on $L^2(D)$. Indeed, $u$ can be expressed as a Gaussian process [owhadi2015bayesian, Proposition 3.1]

$$u \sim \mathcal{GP}(m, c) \qquad (15)$$

where

$$m(x) = \mathcal{A}^{-1}\bar{f}(x), \qquad c(x, y) = \mathcal{A}_x^{-1} \mathcal{A}_y^{-1} k(x, y).$$

Here $\mathcal{A}_x^{-1}$ denotes the action of the operator on the first argument of the kernel and $\mathcal{A}_y^{-1}$ its action on the second argument.

We intend to use Eq. 15 as a prior, which will later be combined with sensor observations through sequential Bayesian updates. An important observation is that the mean and covariance of the GP defined in Eq. 15 cannot be computed exactly for general problems, which precludes the calculation of marginal distributions or realisations of the GP. To overcome this issue, another GP, which employs a finite element approximation of the mean and covariance functions, will be introduced.

As discussed in Section 2.1, FEM approximates the solution to our elliptic BVP as a linear combination of the FE basis functions. In particular, the FEM approximation to the solution of our elliptic BVP can be expressed as follows:

$$u_h = \Phi^\top A^{-1} F(f) =: \mathcal{A}_h^{-1} f \qquad (16)$$

where the operator $\mathcal{A}_h^{-1}$ is defined as $\mathcal{A}_h^{-1} = \Phi^\top A^{-1} F$. Here the operator $F$ takes a function $f$ to the load vector $F(f)$, defined in Eq. 11, and $A$ is the stiffness matrix defined in Eq. 10. The StatFEM approximation to (15) is obtained by formally replacing the solution operator $\mathcal{A}^{-1}$ with $\mathcal{A}_h^{-1}$ in the mean and covariance terms. Due to the linearity of the approximate solution map, the resulting process is still a Gaussian process, with transformed mean and covariance.

Proposition 1.

The StatFEM prior from [girolami2019statistical, Girolami2021] is a Gaussian process of the form

$$u_h \sim \mathcal{GP}(m_h, c_h) \qquad (17)$$

where the mean and covariance function are defined by:

$$m_h(x) = \Phi(x)^\top A^{-1} \bar{F} \qquad (18)$$
$$c_h(x, y) = \Phi(x)^\top A^{-1} Q A^{-1} \Phi(y) \qquad (19)$$

where $\bar{F}_i = \langle \bar{f}, \phi_i \rangle$ and $Q_{ij} = \int_D \int_D \phi_i(x) \, k(x, y) \, \phi_j(y) \, \mathrm{d}x \, \mathrm{d}y$.
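A minimal sketch of Eqs. 18-19 on the 1-D problem assembled earlier (it reuses assemble_fem_1d and rbf_kernel from the sketches above): we approximate $Q$ by interpolating the kernel in the FE basis, which gives $Q \approx M K M$ with $K$ the kernel Gram matrix at the interior nodes. This quadrature shortcut, like the variable names, is our assumption rather than the paper's prescription.

```python
import numpy as np

A, M, F_bar, nodes = assemble_fem_1d(32, kappa=lambda x: 1.0,
                                     f=lambda x: 1.0)  # load for the mean forcing
x_int = nodes[1:-1]                 # interior mesh nodes
K_nodes = rbf_kernel(x_int, x_int)  # forcing kernel Gram matrix
Q = M @ K_nodes @ M                 # Q_ij ~ integral of phi_i k phi_j

A_inv = np.linalg.inv(A)            # fine at this size; prefer solves at scale
mean_dofs = A_inv @ F_bar           # nodal prior mean, Eq. (18)
cov_dofs = A_inv @ Q @ A_inv.T      # nodal prior covariance, Eq. (19)
```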

The question of how much error is introduced by replacing $\mathcal{A}^{-1}$ with $\mathcal{A}_h^{-1}$ can be formulated as quantifying the distance between the probability measures associated with each Gaussian process. To this end, we define $\mu$ and $\mu_h$ to be the Gaussian probability measures supported on $L^2(D)$ which are associated with the GPs (15) and (17), respectively. We can write $\mu = N(m, C)$ and $\mu_h = N(m_h, C_h)$, where $C$ and $C_h$ are the covariance operators associated with the kernels $c$ and $c_h$ respectively.

Remark 1.

Let $S, S_h$ be the supports of $\mu, \mu_h$ respectively. Since the image of the approximate solution operator $\mathcal{A}_h^{-1}$ is the span of the FE basis functions it follows that, for all $h > 0$, $S_h$ is a finite dimensional subspace of $L^2(D)$. Further, under Assumptions 2 and 1 we have that $S$ is an infinite dimensional subspace of $L^2(D)$. In the case that $S$ and $S_h$ are disjoint we clearly have that $\mu$ and $\mu_h$ are mutually singular. Consider now the case that $S$ and $S_h$ are not disjoint and $S_h \not\subseteq S$. This means that $S$ and $S_h$ have a non-empty intersection, and that there exists a subset of $S_h$ not in the intersection. Further, $\mu$ gives 0 measure to this subset, while it has positive measure under $\mu_h$. This implies that $\mu$ and $\mu_h$ are mutually singular. Finally, in the case that $S_h \subseteq S$ we can write $S$ as the orthogonal direct sum of $S_h$ and its complement. Since $\mu_h$ gives full measure to $S_h$ it must give measure 0 to the orthogonal complement of $S_h$, which has positive measure under $\mu$. Hence, we again have that $\mu$ and $\mu_h$ are mutually singular. In all cases, the mutual singularity follows from the Feldman-Hajek theorem [da2014stochastic, Theorem 2.25].

2.3 Posterior Updates of the StatFEM Approximation

We now consider the problem of inferring the solution $u$ of the BVP (4) for an unknown forcing term based on noisy observations of the form

$$y = \mathcal{O} u + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I) \qquad (20)$$

where $\mathcal{O}$ is a linear observation map. Following the Bayesian formalism [stuart2010inverse], the solution of this inverse problem is characterised by the posterior distribution $\mu^y$ of $u$ given the observations $y$,

$$\frac{\mathrm{d}\mu^y}{\mathrm{d}\mu}(u) \propto \exp\left(-\ell(u; y)\right) \qquad (21)$$

where $\ell(u; y) = \frac{1}{2\sigma^2}\|y - \mathcal{O}u\|^2$ is the negative log-likelihood, and $\sigma$ is the observational noise standard deviation. In practice, we would work with an alternative posterior distribution, where the prior distribution $\mu$ is replaced with the prior $\mu_h$ obtained from a finite element approximation. To understand the consequences of making this approximation, we must quantify the distance (in an appropriate sense) between the posterior probability measures (21) and

$$\frac{\mathrm{d}\mu_h^y}{\mathrm{d}\mu_h}(u) \propto \exp\left(-\ell(u; y)\right). \qquad (22)$$

We shall make the following assumption on the observation map $\mathcal{O}$.

Assumption 4.

The observation map $\mathcal{O}$ is a bounded linear operator with full range.

Remark 2.

Note that the assumptions on the covariance structure of the forcing guarantee that solutions to our noisy elliptic BVP lie in $H^2(D)$ almost surely. Thus, the support of the true prior lies in $H^2(D)$. By the Sobolev Embedding Theorem, for dimensions 1, 2 one has that $H^2(D) \subset C(\bar{D})$, so that $\mathcal{O}$ is well-defined for functions drawn from the true prior. Draws from the StatFEM prior are trivially in $C(\bar{D})$ and so $\mathcal{O}$ is well-defined for functions drawn from this prior as well. Note that the condition on the range of $\mathcal{O}$ will always be true for some $n_y$.

A property which will be very important for the subsequent error analysis of Section 3 is that the posterior distributions $\mu^y$ and $\mu_h^y$ remain Gaussian measures. This can be viewed as the infinite dimensional version of the analogous result for multivariate Gaussians based on “completing the square”.

Proposition 2.

Suppose that the observation map $\mathcal{O}$ satisfies Assumption 4 and let $\dagger$ denote either the true quantities ($\dagger$ omitted) or the StatFEM quantities ($\dagger = h$). Then $\mu^y$ and $\mu_h^y$ are Gaussian measures with mean and covariance given by

$$m_\dagger^y = m_\dagger + C_\dagger \mathcal{O}^* \left(\mathcal{O} C_\dagger \mathcal{O}^* + \sigma^2 I\right)^{-1} (y - \mathcal{O} m_\dagger) \qquad (23)$$
$$C_\dagger^y = C_\dagger - C_\dagger \mathcal{O}^* \left(\mathcal{O} C_\dagger \mathcal{O}^* + \sigma^2 I\right)^{-1} \mathcal{O} C_\dagger \qquad (24)$$

respectively.

3 Discretisation Error Analysis

Intuitively we expect that $\mu_h \to \mu$ as $h \to 0$ in an appropriate sense, because of the convergence properties of the finite element discretisation. Similarly, we expect that the approximate posterior, $\mu_h^y$, will converge to the true posterior, $\mu^y$, in an appropriate sense, as $h \to 0$.

To establish these properties we must select an appropriate metric for the distance between the priors and posteriors. A challenge with selecting this metric is that, as explained in Remark 1, the measures $\mu$ and $\mu_h$ are generally mutually singular. This complicates the typical stability analysis that arises when studying Bayesian inverse problems, based on the Kullback-Leibler divergence or Hellinger distance [cotter2009bayesian]. In this paper we shall adopt an alternative approach, employing the Wasserstein-2 metric to quantify the influence of discretisation error on the posterior distribution. This is advantageous in this context for several reasons: firstly, it does not depend on the existence of a common dominating measure. Secondly, unlike other divergences, it is robust under vanishing noise: in this limit, the Wasserstein distance between the two probability measures will converge to the deterministic discretisation error between the respective two means. Finally, while computing the Wasserstein distance is intractable for general measures, there exists a closed form for the Wasserstein-2 distance between Gaussian measures, valid in both finite and infinite dimensions. See [masarotto2019procrustes] for a statement of this, [dowson1982frechet, olkin1982distance] for proofs in the finite dimensional case, and [cuesta1996lower] for a proof in the infinite dimensional case.

The results in this section will focus on bounding the Wasserstein distance between the priors $\mu$ and $\mu_h$, and the posteriors $\mu^y$ and $\mu_h^y$, as a function of $h$. The strategy employed to obtain these bounds involves exploiting an important connection between the Wasserstein distance between Gaussian measures on Hilbert spaces and the Procrustes metric between the respective covariance operators, as detailed in [masarotto2019procrustes]. We now present these results in the following two theorems.

Theorem 3 (Prior Error Analysis).

Assuming that $\bar{f} \in L^2(D)$, and that the domain, $D$, and parameter, $\kappa$, satisfy Assumptions 3, 2 and 1, there exists a constant $C > 0$, independent of $h$, such that, for sufficiently small $h$ we have:

$$W(\mu, \mu_h) \le C h^2. \qquad (25)$$

The next theorem describes how this error in the prior propagates forward to the error between arbitrary linear functionals of the true and StatFEM posteriors, again as a function of the mesh width $h$.

Theorem 4 (Posterior Error Analysis).

Let $\psi$ be a bounded linear functional on $L^2(D)$. Assuming that $\bar{f} \in L^2(D)$, and that the domain, $D$, the parameter, $\kappa$, and the observation map, $\mathcal{O}$, satisfy Assumptions 4, 3, 2 and 1, there exists a constant $C > 0$, independent of $h$, such that, for sufficiently small $h$ we have:

$$W(\psi_{\#}\mu^y, \psi_{\#}\mu_h^y) \le C h^2 \qquad (26)$$

where $\psi_{\#}\nu$ denotes the pushforward of a measure $\nu$ through $\psi$.

Remark 3.

Theorem 4 provides a bound on $W(\psi_{\#}\mu^y, \psi_{\#}\mu_h^y)$, for arbitrary bounded linear functionals $\psi$, rather than on $W(\mu^y, \mu_h^y)$. Naturally a bound on the latter quantity is stronger and more of interest, but deriving such a bound is technically challenging. Nevertheless the empirical results in Section 4 suggest that the stronger result may hold, and as a result this will be the focus of future research.

Theorem 3 shows that we have control over the Wasserstein distance between the full priors, while Theorem 4 shows that, in the posterior case, we have control over linear quantities of interest. Nonlinear quantities of interest are also of practical importance, and we will explore empirically whether they are controlled in the numerical experiments of Section 4.

4 Experiments

In this section we will present some numerical experiments demonstrating the theoretical results from Section 3. In Section 4.1 we will consider a one-dimensional example in which we have access to an explicit formula for the Green's function, while in Section 4.2 we will consider a more challenging two-dimensional problem in which the solution operator is not available. In both scenarios we will aim to demonstrate the StatFEM prior and posterior bounds. Note that while Theorem 4 shows only that $W(\psi_{\#}\mu^y, \psi_{\#}\mu_h^y)$ is $O(h^2)$, our experiments suggest that for the problems examined here it also holds that $W(\mu^y, \mu_h^y)$ is $O(h^2)$. Finally, in Section 4.3 we explore whether our convergence results extend beyond the linear setting by calculating the Wasserstein distance between the prior and posterior distributions of a nonlinear quantity of interest, given by the maximum of the solution. The code for these numerical experiments is available online at https://github.com/YanniPapandreou/statFEM. Additional implementation details for the experiments are provided in the supplementary material.

4.1 One-dimensional Poisson Equation

In this section we will present a numerical experiment involving the following one-dimensional Poisson equation:

$$-u''(x) = f(x), \quad x \in (0, 1), \qquad u(0) = u(1) = 0 \qquad (27)$$

where the forcing $f$ is distributed as,

$$f \sim \mathcal{GP}(\bar{f}, k) \qquad (28)$$

with mean and covariance function given by:

$$\bar{f}(x) = \bar{f}_0, \text{ a constant,} \qquad (29)$$
$$k(x, y) = \sigma_f^2 \exp\left(-\frac{(x - y)^2}{2\ell^2}\right). \qquad (30)$$

The amplitude $\sigma_f$ and lengthscale $\ell$ were held fixed for this experiment. Note that the BVP Eq. 27 is an instance of the general problem Eq. 4 with $D = (0, 1)$, $\kappa \equiv 1$, and $f$ as above. The Green's function, $G(x, s)$, for the BVP Eq. 27 is easily computed to be:

$$G(x, s) = x(1 - s)\Theta(s - x) + s(1 - x)\Theta(x - s) \qquad (31)$$

where $\Theta$ is the Heaviside step function. As discussed in Section 2.2, the solution to problem Eq. 27 will be the Gaussian process defined by Eq. 15. Since we readily have the Green's function available for our 1D problem, we can explicitly write down this distribution as:

$$u \sim \mathcal{GP}(m, c) \qquad (32)$$

where the mean and covariance functions are given by:

$$m(x) = \int_0^1 G(x, s)\,\bar{f}(s)\,\mathrm{d}s \qquad (33)$$
$$c(x, y) = \int_0^1 \int_0^1 G(x, s)\,k(s, t)\,G(y, t)\,\mathrm{d}s\,\mathrm{d}t \qquad (34)$$

Note that we have explicitly computed the integral for the mean $m$, but not for the covariance $c$. The double integral for the covariance is approximated numerically using quadrature to accurately compute it on a reference grid.
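As a sketch of this quadrature step, the true covariance Eq. 34 can be approximated pointwise with scipy; the kernel hyperparameters below are placeholders of ours, standing in for whichever values the experiment uses.

```python
import numpy as np
from scipy.integrate import dblquad

def greens(x, s):
    """Green's function of -u'' = f on (0, 1), u(0) = u(1) = 0, Eq. (31)."""
    return x * (1.0 - s) if s >= x else s * (1.0 - x)

def k_rbf(s, t, sigma_f=1.0, ell=0.2):  # placeholder hyperparameters
    return sigma_f**2 * np.exp(-0.5 * (s - t) ** 2 / ell**2)

def true_cov(x, y):
    """c(x, y) = double integral of G(x,s) k(s,t) G(y,t), Eq. (34), by quadrature."""
    integrand = lambda t, s: greens(x, s) * k_rbf(s, t) * greens(y, t)
    value, _ = dblquad(integrand, 0.0, 1.0, 0.0, 1.0)
    return value
```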

4.1.1 Prior results for 1-D example

Since we can straightforwardly evaluate the true prior numerically, we can now demonstrate the error bound for the StatFEM prior given in Theorem 3. In order to do this the following remark, taken from [convergenceRateSite], will be useful.

Remark 4.

When we have access to the true distributions we can estimate the rate of convergence of the StatFEM priors and posteriors as follows. Let $\nu, \nu_h$ denote the true and StatFEM prior (or posterior) respectively. If we assume that we have a rate of convergence $p$, i.e.,

$$W(\nu, \nu_h) = O(h^p), \qquad (35)$$

then we have that there is a constant $C > 0$, independent of $h$, such that, for sufficiently small $h$,

$$W(\nu, \nu_h) \approx C h^p, \qquad (36)$$

from which it follows that,

$$\log W(\nu, \nu_h) \approx p \log h + \log C. \qquad (37)$$

From this we can see that we can estimate the rate of convergence, $p$, when we know the true distributions, by computing the Wasserstein distance for a range of sufficiently small $h$-values, plotting these values on a log-log plot, and estimating the slope of the line of best fit.
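A sketch of this slope estimation; the synthetic $h$-values and distances below are ours, purely to show the mechanics of Eq. 37.

```python
import numpy as np
from scipy.stats import linregress

def estimate_rate(h_values, distances):
    """Estimate p (and C) in W ~ C h^p by regressing log W on log h, Eq. (37)."""
    fit = linregress(np.log(h_values), np.log(distances))
    return fit.slope, np.exp(fit.intercept)

# Sanity check on synthetic data decaying exactly like h^2: recovers p = 2.
h = np.array([0.2, 0.1, 0.05, 0.025, 0.0125])
p, C = estimate_rate(h, 0.7 * h**2)
```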

As outlined in Remark 4 above, we compute, for a range of sufficiently small $h$-values, an approximation of the Wasserstein distance between the two priors and then plot these results on a log-log scale. We take a reference grid of equally spaced points in $[0, 1]$ to compare the covariances. With these choices we obtain the plot shown in Fig. 1 below. From this plot we can see that the results indeed lie along a straight line, whose slope is estimated, using scipy's built-in linear regression function, to be very close to 2.

Figure 1: Log-log plot of the Wasserstein distance against $h$ for the priors in the 1-D example. The reference line has a slope of 2; the slope of the fitted line agrees visually with the reference line.

4.1.2 Posterior results for 1-D example

For this example we will consider a specific case of the general Bayesian inverse problem outlined in Section 2.3, corresponding to taking the observation operator to be a pointwise evaluation operator. More specifically, we suppose that we have sensor data which correspond to noisy observations of the value of the solution at some points $x_1, \dots, x_{n_y} \in D$. We thus take $\mathcal{O}$ to be the operator which maps a function $u$ to the vector $(u(x_1), \dots, u(x_{n_y}))^\top$. For this choice of $\mathcal{O}$ the update equations for the posterior means and covariances given in Eqs. 24 and 23 respectively can be simplified to a form which is more amenable to computation. These forms, for $\dagger$ denoting either the true or the StatFEM quantities, are:

$$m_\dagger^y(x) = m_\dagger(x) + c_\dagger(x, X)\, B_\dagger^{-1}\, (y - m_\dagger(X)) \qquad (38)$$
$$c_\dagger^y(x, x') = c_\dagger(x, x') - c_\dagger(x, X)\, B_\dagger^{-1}\, c_\dagger(X, x') \qquad (39)$$
$$B_\dagger = c_\dagger(X, X) + \sigma^2 I \qquad (40)$$
$$c_\dagger(x, X) = \left(c_\dagger(x, x_1), \dots, c_\dagger(x, x_{n_y})\right) \qquad (41)$$

where $c_\dagger$ is the covariance function associated with $C_\dagger$ and where the matrix $c_\dagger(X, X)$ has $(i, j)$-th entry $c_\dagger(x_i, x_j)$. The specific sensor observations used in the experiments were obtained by simulating a trajectory from the true prior evaluated at the locations $x_1, \dots, x_{n_y}$.
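The sketch below applies this conditioning to a GP discretised on a grid (for instance the nodal mean and covariance of the StatFEM prior computed earlier); the function name and index-based handling of the observation locations are our own conventions.

```python
import numpy as np

def gp_posterior(m_prior, C_prior, obs_idx, y, sigma):
    """Condition a gridded GP N(m_prior, C_prior) on noisy point observations
    y = u(x_obs) + N(0, sigma^2 I), cf. Eqs. (38)-(41). obs_idx holds the
    positions of the sensors within the grid."""
    C_ox = C_prior[obs_idx, :]                       # Cov(u(x_obs), u(grid))
    B = C_prior[np.ix_(obs_idx, obs_idx)] + sigma**2 * np.eye(len(obs_idx))
    gain = np.linalg.solve(B, C_ox)                  # B^{-1} Cov(obs, grid)
    m_post = m_prior + gain.T @ (y - m_prior[obs_idx])
    C_post = C_prior - C_ox.T @ gain
    return m_post, C_post
```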

Since we have the true posterior in an explicit form, we can now demonstrate that the error bound for the full StatFEM posterior agrees with the error bound for linear quantities of interest from the posterior given in Theorem 4. We demonstrate this error bound for 4 different levels of sensor noise, $\sigma \in \{0.00005, 0.0001, 0.01, 0.1\}$. These levels are chosen so that the lower values are comparable to the prior variances. We take sensor readings equally spaced across the domain. For each level of sensor noise we again follow the argument outlined in Remark 4 and compute, for a range of sufficiently small $h$-values, an approximation of the Wasserstein distance between the two posteriors, and plot these results on a log-log scale. We take a reference grid of equally spaced points in $[0, 1]$ to compare the covariances, and we take 28 $h$-values. With these choices we obtain the plot shown in Fig. 2 below. We also present the estimates for the slopes and intercepts of the lines of best fit in Table 1.

Figure 2: Log-log plot of the Wasserstein distance against $h$ for the posteriors in the 1-D example, for varying levels of sensor noise. The reference line has a slope of 2. Notice how the best fit lines are all parallel to the reference line.
σ         slope     intercept
0.00005   2.0162    -2.9227
0.00010   2.0102    -2.9486
0.01000   1.9912    -2.7739
0.10000   1.9940    -2.2309

Table 1: Estimates of the slopes and intercepts for the different levels of sensor noise σ. All of the slopes are around 2.

We can see from Fig. 2 and Table 1 that we obtain lines of best fit which are all parallel to each other, each with a slope of around 2. The intercepts differ slightly for each level of sensor noise, as is to be expected, since the constant in Theorem 4 depends upon $\sigma$. These results show that the error bound for the full posterior distributions is also of order $h^2$, suggesting that it is possible to extend Theorem 4 to the Wasserstein distance between the full posteriors. Proving this is left to future research.

4.2 Two-dimensional Poisson Equation

In this section we will present a numerical example involving the following two-dimensional Poisson equation:

$$-\Delta u(x) = f(x), \quad x \in D = (0, 1)^2, \qquad u = 0 \ \text{ on } \partial D \qquad (42)$$

where the forcing $f$ is distributed as,

$$f \sim \mathcal{GP}(\bar{f}, k) \qquad (43)$$

with mean and covariance given by

$$\bar{f}(x) = \bar{f}_0, \text{ a constant,} \qquad (44)$$
$$k(x, y) = \sigma_f^2 \exp\left(-\frac{\|x - y\|^2}{2\ell^2}\right). \qquad (45)$$

Again the amplitude and lengthscale were held fixed. Note that the BVP Eq. 42 is an instance of the general problem Eq. 4 with $D = (0, 1)^2$, $\kappa \equiv 1$, and $f$ as above. For this problem we use the approach outlined in Remark 5 below to demonstrate the error bounds. We present the error estimates for the prior and posterior separately in the following sections.

4.2.1 Prior results for 2-D example

Following the notation of Theorem 3, we denote the StatFEM prior by $\mu_h$. Since we do not have access to the true distributions here, we will need an alternative approach to that of Remark 4. This approach is outlined in the remark below, taken from [convergenceRateSite]:

Remark 5.

When we do not have access to the true distributions we can proceed as follows. Again assuming that we have a rate of convergence, $p$, as given by Eq. 35, we can obtain an estimate as follows. First we utilise the triangle inequality to obtain,

$$W(\nu_h, \nu_{h/2}) \le W(\nu, \nu_h) + W(\nu, \nu_{h/2}). \qquad (46)$$

Together with Eq. 36 this yields,

$$W(\nu_h, \nu_{h/2}) \lesssim C h^p \left(1 + 2^{-p}\right). \qquad (47)$$

Similarly we have,

$$W(\nu_{h/2}, \nu_{h/4}) \lesssim C (h/2)^p \left(1 + 2^{-p}\right). \qquad (48)$$

Dividing the two above equations yields,

$$\frac{W(\nu_h, \nu_{h/2})}{W(\nu_{h/2}, \nu_{h/4})} \approx 2^p, \qquad (49)$$

from which it follows that,

$$\log_2\left(\frac{W(\nu_h, \nu_{h/2})}{W(\nu_{h/2}, \nu_{h/4})}\right) \approx p. \qquad (50)$$

From the above equation we can see that we can obtain an estimate of the rate of convergence by computing, for a range of sufficiently small $h$-values, the base-2 logarithm of the ratio $W(\nu_h, \nu_{h/2}) / W(\nu_{h/2}, \nu_{h/4})$ and seeing what these logarithms tend to as $h$ gets very small.
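A sketch of this estimator; the synthetic distances below merely confirm that successively halved meshes with $W \propto h^p$ return log-ratios equal to $p$.

```python
import numpy as np

def log2_ratio_rates(pair_dists):
    """Given W(nu_h, nu_{h/2}) over successively halved h, return the base-2
    logs of consecutive ratios; these approach the rate p (Eqs. 49-50)."""
    pair_dists = np.asarray(pair_dists)
    return np.log2(pair_dists[:-1] / pair_dists[1:])

h = 0.2 / 2 ** np.arange(6)            # h, h/2, h/4, ...
rates = log2_ratio_rates(0.5 * h**2)   # synthetic W ~ h^2 -> all entries 2.0
```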

As outlined in Remark 5 above, we will compute, for a range of sufficiently small $h$-values, the base-2 logarithm of the ratio $W(\mu_h, \mu_{h/2}) / W(\mu_{h/2}, \mu_{h/4})$ and see what these logarithms tend to as $h$ approaches 0. From the results of Theorem 3 we expect the logarithms to converge to 2.

We take a reference grid of equally spaced points in $D$ to compare the covariances. We compute the logarithms for 59 $h$-values. Note that these $h$-values are not equally spaced; instead they are arranged according to a refinement strategy which was utilised for efficient use of memory. With these choices we obtain the plot shown in Fig. 3.

Figure 3: Plot of the base-2 logarithm of the Wasserstein ratio against $h$, in the prior case, for our 2-D example.

From this plot we can see that the logarithms seem to be approaching 2 as $h \to 0$, as our theory suggests. However, the results are somewhat oscillatory, which is to be expected for the approach outlined in Remark 5. Nevertheless the results seem to be oscillating around 2. To illustrate this further we smooth the above results by first discarding the results for $h$-values above a fixed cutoff and then computing a cumulative average of the ratios before applying the base-2 logarithm. Note that this cumulative average is taken by starting with the results for large $h$ first. These smoothed results are shown in Fig. 4.

Figure 4: Plot of the smoothed base-2 log-ratios, in the prior case. The horizontal axis indicates the smallest $h$-value taken in the cumulative averages.

From Fig. 4 we can see that, after discarding the values corresponding to the larger $h$-values, the base-2 logarithms of the rolling averages converge to a value slightly greater than 2. Thus, we obtain results in close agreement with Theorem 3.

4.2.2 Posterior results for 2-D example

For this example we will consider the same specific case of the general Bayesian inverse problem which we chose in Section 4.1.2. That is, we will again take the observation operator to be a pointwise evaluation operator. Since we do not have the true prior in an explicit form, we also do not have the true posterior, and as such we will utilise the argument in Remark 5 to demonstrate the StatFEM posterior error bound. To generate sensor readings, since we do not have access to the true prior, we instead simulated trajectories from the StatFEM prior on a very fine mesh.

We take a reference grid of equally spaced points in $D$ to compare the covariances. We set the sensor noise level $\sigma$ to be comparable to the prior variances, and take sensor readings equally spaced across the domain. We follow the notation of Theorem 4 and denote the StatFEM posterior by $\mu_h^y$. We compute the logarithms for 53 $h$-values, which are spread out according to the refinement strategy mentioned in Section 4.2.1. With these choices we obtain the plot shown in Fig. 5 below.

Figure 5: Plot of the base-2 logarithm of the Wasserstein ratio against $h$, in the posterior case, for our 2-D example.

From this plot we can see that the logarithms seem to be approaching 2 as $h$ gets small. However, just as in Section 4.2.1, the results are oscillatory, though they appear to oscillate around 2. We thus smooth the results as explained in Section 4.2.1, with the same cutoff point. These smoothed results are shown in Fig. 6. From this figure we can see that, by discarding the values corresponding to the larger $h$-values, the base-2 logarithms of the rolling averages converge to a value slightly greater than 2. Thus, just as in the 1D example, we see results which suggest that it is possible to extend Theorem 4 to the case of the full posteriors.

Figure 6: Plot of the smoothed base-2 log-ratios, in the posterior case. The horizontal axis indicates the smallest $h$-value taken in the cumulative averages.

4.3 Distribution of the maximum

In this section we will present a numerical experiment involving the same one-dimensional Poisson equation Eq. 27. The forcing will follow the same distribution as given in Eq. 28, but we will adjust the kernel lengthscale to a smaller value, as with a larger lengthscale the maximum of the StatFEM prior and posterior were found to converge too quickly to demonstrate interesting behaviour. Since the solution, $u$, is random, the maximum, $\max_{x \in [0, 1]} u(x)$, is a univariate random variable. Since taking the maximum is not a linear functional, we have no guarantee that the distribution of the maximum will be Gaussian. More importantly, our results from Theorem 4 will not necessarily hold for this example.

We will investigate the errors for the true and StatFEM prior/posterior distributions for the maximum via a Monte Carlo approach, as we now outline in the following remark.

Remark 6.

Let $\nu, \nu_h$ denote the true and StatFEM prior (or posterior) for the maximum of $u$ respectively. We are interested in investigating the rate of convergence of $W(\nu, \nu_h)$ as $h \to 0$. This is done as follows: in both the prior and posterior case we will first fix a reference grid and obtain samples of trajectories of the solution on this grid from the true distribution. We then take the maximum values of these trajectories to obtain an (approximate) sample from the true distribution for the maximum. For a range of $h$-values we will then simulate trajectories of the solution on the same reference grid from the StatFEM distribution and then take the maximum values of these to obtain an (approximate) sample for the StatFEM case. We will then compute an approximation of the Wasserstein distance between these two empirical distributions for each $h$-value by utilising the Python package POT [flamary2021pot]. We then investigate the rate of convergence as outlined previously in Remark 4.
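A sketch of this Monte Carlo comparison using POT; the array names are ours, and we assume POT's 1-d solver, which returns the $p$-Wasserstein distance raised to the power $p$, hence the square root below.

```python
import numpy as np
import ot  # Python Optimal Transport, POT [flamary2021pot]

def w2_of_maxima(traj_true, traj_statfem):
    """Wasserstein-2 distance between the empirical distributions of the
    maximum; each row of the inputs is one sampled trajectory on the grid."""
    max_true = traj_true.max(axis=1)      # sample of max_x u(x), true prior
    max_h = traj_statfem.max(axis=1)      # sample of the StatFEM maximum
    return np.sqrt(ot.wasserstein_1d(max_true, max_h, p=2))
```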

The results for the StatFEM priors and posteriors will be presented separately in the following sections.

4.4 Prior results for maximum example

As outlined in Remarks 4 and 6, we compute, for a range of sufficiently small $h$-values, a Monte Carlo approximation to the Wasserstein distance between the two priors for the maximum. We take a reference grid of equally spaced points in $[0, 1]$ and simulate trajectories from both priors to obtain approximate samples of the maximum. With these choices we obtain the plot shown in Fig. 7 below. From this plot we can see that we indeed have convergence of the Wasserstein distance for this non-linear quantity of interest to 0, but at a different rate than that given in Theorem 3: the slope is estimated to be approximately 1.35.

Figure 7: Log-log plot of the Wasserstein distance against $h$ for the priors for the maximum. The reference line has a slope of 1.35; the slope of the fitted line agrees visually with the reference line.

4.5 Posterior results for maximum example

For the posterior case we take a reference grid of equally spaced points in $[0, 1]$ to simulate trajectories, together with a range of $h$-values. We take sensor readings equally spaced across the interval. This is repeated for four different levels of sensor noise, $\sigma \in \{0.0001, 0.0005, 0.001, 0.01\}$, again chosen so that the lower noise levels are comparable to the prior variances. For each level of noise we follow the argument outlined in Remarks 4 and 6. With these choices we obtain the plot shown in Fig. 8. We also present the estimates for the slopes and intercepts of the lines of best fit in Table 2.

Figure 8: Log-log plot of the Wasserstein distance against $h$ for the posteriors for the maximum, for different levels of sensor noise.
σ         slope     intercept
0.00010   2.1862    -2.3203
0.00050   1.9552    -3.0104
0.00100   1.6573    -3.7885
0.01000   1.3131    -3.2989

Table 2: Estimates of the slopes and intercepts for the different levels of sensor noise σ.

From Fig. 8 and Table 2 we can see that for each level of sensor noise we do indeed have convergence of the Wasserstein distance to 0, although the estimated rates now vary with the noise level, decreasing from around 2 at the lowest noise levels towards the prior rate of roughly 1.35 as the noise increases.