 # Probabilistic Numerical Methods for PDE-constrained Bayesian Inverse Problems

This paper develops meshless methods for probabilistically describing discretisation error in the numerical solution of partial differential equations. This construction enables the solution of Bayesian inverse problems while accounting for the impact of the discretisation of the forward problem. In particular, this drives statistical inferences to be more conservative in the presence of significant solver error. Theoretical results are presented describing rates of convergence for the posteriors in both the forward and inverse problems. This method is tested on a challenging inverse problem with a nonlinear forward model.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Partial differential equations (PDEs) are challenging problems which often have no analytical solution and must be solved numerically. In the style of Probabilistic Numerics (PN) , in this work we describe methods for probabilistically modelling the uncertainty in the true solution arising from the numerical approximation. This uncertainty can be thought of as arising from finite computation, as formalised in the Information Complexity literature ; in solving a problem numerically, we are forced to discretise some aspect of it. In the present work we model this uncertainty as arising from taking a finite number of evaluations of the forcing terms of the system of PDEs.

One of the core principles of probabilistic numerics is that, in complex procedures in which multiple numerical approximations must be composed to produce a final result, the uncertainty from each procedure can combine in a nontrivial way which can lead to incorrect inferences. The example we take here is that of PDE constrained Bayesian inverse problems, in which we wish to estimate parameters of a PDE model in a Bayesian framework, based on observations of a system which is believed to be described by the underlying PDE. In such problems it has been shown that employing an inaccurate PDE solver in the sampling can lead to incorrect inferences in the inverse problem

.

There has been recent interest in construction of probabilistic solvers for PDEs. Work by  constructs a nonparametric posterior distribution for ODEs and PDEs by injecting noise into standard numerical solvers in such a way as to maintain the convergence properties of these solvers. In , the authors discuss a meshless method which is similar to the method discussed herein by modelling the forcing of the PDE. This is developed in , which discusses a methodology for probabilistic solution of PDEs by an hierarchical game-theoretic argument. These latter two approaches do not examine application to inverse problems, however.

Work from  discusses the interpretation of symmetric collocation as the mean function of a Gaussian process prior after conditioning on observed values of the forcing, but applies this methodology predominantly to stochastic differential equations.

### 1.1 Structure of the Paper

We begin by introducing the concept of a probabilistic meshless method and giving some theoretical results related to it. We then show how the posterior measure over the forward solution of the PDE can be propagated to the posterior measure over parameters in a Bayesian inverse problem. Finally we present some numerical results for a challenging nonlinear inverse problem given by the steady-state Allen–Cahn equations.

Proofs for the presented theorems are omitted, and can be found in .

## 2 The Probabilistic Meshless Method

We now introduce the concept of a probabilistic meshless method (PMM). Consider an open, bounded subset of with Lipschitz boundary . We seek a solution , some Hilbert space of functions defined over , of the following system of operator equations

 Au(x)=g(x) x∈D Bu(x)=b(x) x∈∂D. (1)

Here and with and . is associated with a partial differential operator and is associated with the boundary conditions of the system. For notational simplicity we restrict attention to systems of two operators, however the methods discussed can be generalised to an arbitrary number of operator equations.

We proceed in a Bayesian setting by placing a prior measure on , and determining its posterior distribution based on a finite number of observations of the system given in Eq. 1. In this work we focus on the most direct observations of said system; namely, we choose sets of design points , for , . We then evaluate the right-hand-side corresponding to each of the operators in the system at these points; , .

It remains to specify our prior distribution. Here we choose a Gaussian process prior . Recall that a Gaussian Process is characterised by its mean function and its covariance function , and the property that, if ) then for any set of points ,

 u(X) ∼N(μ,Σ) i =m(xi) ij =k(xi,xj)

As is common in the literature we will use a centred Gaussian process prior; . Define

 L=[AB]¯L=[¯A¯B]

and furthermore for sets , , , let denote the Gram matrix of applied to and ; . Similarly , etc. Then

 L¯LK(X0,X0) =[A¯AK(XA0,XA0)A¯BK(XA0,XB0)B¯AK(XB0,XA0)B¯BK(XB0,XB0)] LK(X0,X) =[AK(XA0,X)BK(XB0,X)] ¯LK(X,X0) =[¯AK(X,XA0)¯BK(X,XB0)]

Here is to be interpreted as a set of points at which we evaluate those functions drawn from the posterior distribution, in contrast with which is the set of points at which evaluations of the forcing terms are taken.

###### Proposition 1 (Probabilistic Meshless Method).

Assume and are linear operators. Then the posterior distribution over the solution of the PDE, conditional on the data is such that, for we have

 u(X) ∼N(μ,Σ) μ =¯LK(X,X0)[L¯LK(X0,X0)]−1[g⊤b⊤]⊤ (2) Σ

Note that the mean function in Eq. 2 is the same as the numerical solution to the PDE that would be obtained using the method of symmetric collocation .

Thus far we not discussed the choice of prior covariance . There are several interesting choices in the literature. Work in  proposes use of a covariance which encodes information about the system through its Green’s function;  examined the properties of this choice in more detail. However, reliance on the Green’s function, which is not in general available in closed-form for complex systems, is a significant drawback. In practice we will generally posit a prior covariance directly by examining the system in question and selecting a prior which encodes a suitable level of differentiability.

We now present a theoretical result describing the rate of convergence of the posterior measure . Denote by the differential order of the PDE; that is, the maximum number of derivatives of required. Furthermore denote by the smoothness of the prior; the number of weak derivatives that almost surely exist under the prior measure. Lastly, define to be the “fill distance” of the design points :

 h=supx∈Dminx′∈X0∥∥x−x′∥∥2
###### Theorem 2 (Rate of Convergence).

For a ball of radius centred on the true solution of the system (1):

 Πg,bu(Bϵ(u0)c)=O(h2β−2ρ−dϵ)

where denotes the set complement.

### 2.1 Illustrative Example: The Forward Problem

We conclude this section by examining the performance of the probabilistic meshless method for a simple 1-dimensional PDE. Consider the system

 −∇2u(x) =sin(2πx) x∈(0,1) u(x) =0 x=0,1

the solution to which can be computed by direct integration to be . We compute the PMM solution to this PDE with varying number of design points. In this setting the Green’s function for the system is available explicitly, and so we used its associated prior covariance as suggested in ; full details are available in .

Samples from the posterior distribution can be seen in Fig. 1; note how, even with 20 design points, there is still significant posterior uncertainty. Convergence plots as the number of design points is increased are shown in Fig. 2 Figure 1: Samples from the posterior distribution over the unkown solution to a one-dimensional PDE, with mA=10 (left) and mA=40 (right). Figure 2: Convergence of mean function (left) and posterior covariance trace (right) as the number of design points mA is increased.

## 3 Application to Bayesian Inverse Problems

We now turn to an examination of how the PMM, constructed in the previous section, can be applied in Bayesian inverse problems. We now have a system in which we assume the operator depends upon some parameter , which we emphasise in the below system:

 Aθu(x)=g(x) x∈D Bu(x)=b(x) x∈∂D.

In a Bayesian inverse problem we place a prior distribution over , , and seek to determine its posterior distribution based on data collected at locations , . Further details on Bayesian inverse problems can be found in .

Such a posterior distribution is usually intractable and must be investigated by sampling, which involves solution of the underlying system of PDEs as the sampler visits different values of . We assume that the data is obtained by direct observation of the solution at these locations, corrupted with Gaussian noise

 yi=u(xi)+ξi

where . Our likelihood is thus given by

 p(y|θ,u)=N(y;u,Γ) (3)

where

are each vectors in

, with and .

Since the solution to the PDE system is inaccessible it is common to replace with an approximation obtained by some numerical scheme. We instead use the PMM as the forward solver, obtaining a measure describing our uncertainty. We may them marginalise in Eq. 3 over this measure to obtain

 pPN(y|θ) =∫p(y|θ,u)Πg,bu(du) =N(y;μ(θ),Γ+Σ(θ)) (4)

where are as in Prop. 1, and we have emphasised the dependence on . This is thus similar to the standard approach of replacing with in Eq. 3, but we compensate for the inaccuracy of the forward solver with an additive covariance term incorporating the uncertainty in the posterior distribution for the forward problem.

We now present a result which guarantees consistency in the inverse problem when we replace the likelihood in Eq. 3 with that in Eq. 4.

###### Proposition 3.

(Inverse Problem Consistency) Let be the posterior distribution which uses the PN likelihood given in Eq. 4. Assume that the posterior distribution contracts such that as , a Dirac measure centred on the true value of , . Then contracts such that provided

 h=o(n−1/(β−ρ−d/2))

### 3.1 Illustrative Example: The Inverse Problem

We now return to the previous illustrative example to demonstrate the use of a probabilistic solver in the inverse problem. Consideronsider the system

 −∇⋅θ∇u(x) =sin(2πx) x∈(0,1) u(x) =0 x=0,1

with the goal of inferring the parameter . Data was generated from the explicit solution to this problem with at locations , and corrupted with Gaussian noise with distribution .

In Fig. 3 we compare posterior distributions for generated with the PMM versus the standard approach of plugging a numerical solution of the PDE into the likelihood and ignoring discretisation error. The numerical method used in the standard approach was symmetric collocation, the most natural comparison. Note that when using collocation the posterior distributions are peaked and biased for small , and that the posterior uncertainty does not appear to depend on the number of design points. Conversely when using the probabilistic method we see that for small the posterior distributions are wide and flat, while as increases the distributions peak and centre on the true value of . Figure 3: Posterior distributions over θ with varying numbers of design points, on the left using the PMM, and on the right the standard approach of using a plug-in estimate for the PDE solution, here given by symmetric collocation.

Thus, with a standard numerical method the posteriors over

do not take into account the quality of the numerical solver used; for poor forward solvers based on coarse discretisations, the posteriors produced are as confident as those produced with a fine, accurate numerical solver. With a probabilistic forward solver the variance in the forward solver is propagated into the inverse problem, resulting in robust inferences even when the discretisation is coarse.

## 4 A Nonlinear Example

We now present an application of the methods discussed herein to a nonlinear partial differential equation known as the steady-state Allen–Cahn system, a model from mathematical physics describing the motion of boundaries between phases in iron alloys. This is given by

 −δ∇2u+δ−1(u3−u) =0 x∈(0,1)2 u =+1 x1∈{0,1},x2∈(0,1) u =−1 x2∈{0,1},x1∈(0,1) (5)

We phrase this as an inverse problem for determining . This system is noteworthy for the fact that it does not admit a unique solution; the three solutions to this system for are shown in Fig. 4. These were generated using the deflation technique described in .

Since this is a nonlinear system the posterior distribution will not be Gaussian, and we must resort to sampling techniques to explore the posterior distribution. In brief, we introduce a latent function and rearrange the system as follows:

 −δ∇2u−δ−1u =z (6) δ−1u3 =−z (7)

Note that by adding Eq. 6 and Eq. 7 we return to the original equation describing the interior dynamics given in Eq. 5. However Eq. 7 is monotonic and thus invertible; by inverting this we arrive at a new system:

 −δ∇2u−δ−1u =z u =(−δz)1/3

This system is equivalent to the original system but, importantly, is linear. Thus by the introduction of we are able to arrive at a new system which can be solved using the PMM.

It remains to describe , a latent function whose value is unknown. We seek to marginalise in the likelihood

 p(y|δ)=∫p(z|δ)∫p(y|u)Πg,b,zu(du)dz (8)

where is now additionally conditioned on a known value for . This integral is intractable. However by sampling from the posterior distribution over

by pseudo-marginal MCMC it is sufficient to produce an unbiased estimate of this quantity. This is accomplished by importance sampling; we assume an improper prior

and approximate Eq. 8 by the Monte-Carlo estimate

 p(y|δ)≈1MM∑i=1∫p(y|u)Πg,b,ziu(du)r(zi|y,δ)

for .

The importance distribution is chosen by solving the original system in Eq. 5 using the techniques described in , with a coarse finite-element solver. This gives estimates for the solution given a value of . By applying Eq. 7 to these estimates we obtain estimates of three values of ; .

To handle the multimodality in the solutions we extend the state-space of the inverse problem to include the solution index

. The importance distribution is constructed as a Gaussian distribution

 z∼GP(^zj,k)

with thus the appropriate multivariate Gaussian density after the field for has been discretised. Discretisation points are necessarily chosen to match , the design points for in the interior of the domain.

For application of the PMM we choose a squared-exponential prior covariance

 k(x,x′)=exp(−∥x−x′∥222ℓ2)

which is known to describe infinitely-differentiable functions. This choice is motivated by the high differential order required by the PDE; since we must be able to apply both the operator and the adjoint to the kernel, in this case we require that the covariance be twice differentiable in each argument, which amounts to a four-times differentiable covariance if the covariance chosen is isotropic.

The length-scale hyper-parameter was incorporated into the MCMC procedure, endowed with a half-Cauchy hyper-prior as recommended in . The parameter of interest was endowed with a uniform prior over the interval , in which the PDE was empirically found to consistently have three solutions.

Posterior distributions for generated using this methodology are shown in Fig. 5; these are compared with posterior distributions generated using a finite-element forward solver. In the finite-element case we see a more extreme version of the bias shown in Fig. 3 for coarse grids, whereas when using a probabilistic forward solver the posteriors are once again wider to account for an inaccurate forward solver.

We should also comment on the comparison to the finite-element method here; in the previous example comparison was to the symmetric collocation method for solving PDEs; in this case the comparison is more direct as the solution for the PDE in symmetric collocation is simply the posterior mean from the PMM. In this case we use a finite-element solver both to highlight the fact that the behaviour witnessed when using symmetric collocation is not unique to that solver, and because in existing methods for finding the multiple solutions to the Allen-Cahn equation the base numerical method applied is the finite-element method. Furthermore we note that as the underlying numerical method becomes arbitrarily accurate, the posterior inferences made in the inverse problem should be invariant to the forward solver used. Figure 5: Posterior distributions for δ obtained by use of the technique described herein (left) versus a standard Finite Element solver that does not model discretisation error (right).

## 5 Discussion

We have shown how to construct probabilistic models for the solution of partial differential equations, which quantify the uncertainty arising from numerical discretisation of the system. We have further shown how the uncertainty in the forward problem can be propagated into posteriors over parameters in inverse problems. This allows robust inferences to be made in inverse problems, even when the numerical scheme used to solve the forward problem is inaccurate, which is useful in cases where obtaining highly accurate solutions is computationally expensive, or where we are willing to tolerate less certain inferences in exchange for fast computation. In particular we have illustrated how this might be used to make inferences in nonlinear systems where a variety of phenomena, such as a non-unique solution could cause a numerical solver to fail.

Immediate extensions to this work lie in examining evolutionary systems in which the solution is additionally a function of time; the added complexity from the additional dimension demands more focussed attention. We also seek to examine a more generic approach for sampling from posterior distributions for nonlinear PDEs. Furthermore we note that the observations we have chosen for the forward problem are only one possible choice; another attractive option is given by Galerkin schemes for approximating PDEs, by choosing our observations to be Galerkin projections.

Lastly we seek to explore other choices of prior. The Gaussian measure is an unrealistic option in general, as it penalises extreme values and prevents encoding such simple properties as positivity of solutions.

## 6 Acknowledgements

TJS was supported by the Free University of Berlin within the Excellence Initiative of the German Research Foundation (DFG). MG was supported by EPSRC [EP/J016934/1, EP/K034154/1], an EPSRC Established Career Fellowship, the EU grant [EU/259348] and a Royal Society Wolfson Research Merit Award.

The authors would like to thank John Skilling for useful discussion, Patrick Farrell for providing code used in generating these results and François-Xavier Briol for helpful feedback. In addition they express gratitude to the developers of the Python libraries Autograd and GPyOpt.

## References

• Cialenco et al.  Igor Cialenco, Gregory E Fasshauer, and Qi Ye. Approximation of stochastic partial differential equations by a kernel-based collocation method. International Journal of Computer Mathematics, 89(18):2543–2561, December 2012.
• Cockayne et al.  Jon Cockayne, Chris Oates, Tim Sullivan, and Mark Girolami. Probabilistic Meshless Methods for Partial Differential Equations and Bayesian Inverse Problems. arXiv:1605.07811v1, May 2016.
• Conrad et al.  Patrick R Conrad, Mark Girolami, Simo Särkkä, Andrew Stuart, and Konstantinos Zygalakis.

Statistical analysis of differential equations: introducing probability measures on numerical solutions.

Statistics and Computing, 2016.
• Farrell et al.  Patrick E Farrell, Asgeir Birkisson, and Simon W Funke. Deflation techniques for finding distinct solutions of nonlinear partial differential equations. SIAM Journal on Scientific Computing, 37(4):A2026–A2045, 2015.
• Fasshauer  Gregory E Fasshauer.

Solving differential equations with radial basis functions: multilevel methods and smoothing.

Advances in Computational Mathematics, 11(2-3):139–159, 1999.
• Gelman  A Gelman. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian analysis, 1(3):515–534, 2006.
• Hennig et al.  Philipp Hennig, Michael A Osborne, and Mark Girolami. Probabilistic numerics and uncertainty in computations. Proc R Soc A, 471(2179):20150142, July 2015.
• Owhadi [2015a] Houman Owhadi. Bayesian numerical homogenization. Multiscale Modeling & Simulation, 13(3):812–828, 2015a.
• Owhadi [2015b] Houman Owhadi. Multigrid with rough coefficients and multiresolution operator decomposition from Hierarchical Information Games. arXiv:1503.03467v4, March 2015b.
• Stuart  Andrew M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer., 19:451–559, 2010. ISSN 0962-4929.
• Woźniakowski  Henryk Woźniakowski. What is information-based complexity? Essays on the complexity of continuous problems, pages 89–95, 2009.