 # Data-Driven Forward Discretizations for Bayesian Inversion

This paper suggests a framework for the learning of discretizations of expensive forward models in Bayesian inverse problems. The main idea is to incorporate the parameters governing the discretization as part of the unknown to be estimated within the Bayesian machinery. We numerically show that in a variety of inverse problems arising in mechanical engineering, signal processing and the geosciences, the observations contain useful information to guide the choice of discretization.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Models used in science and engineering are often described by problem-specific input parameters that are estimated from indirect and noisy observations. The inverse problem of input reconstruction is defined in terms of a forward model from inputs to observable quantities, which in many applications needs to be approximated by discretization. A broad class of examples motivating this paper is the reconstruction of input parameters of complex differential equations. The choice of forward model discretization is particularly important in Bayesian formulations of inverse problems: discretizations need to be cheap since statistical recovery may involve millions of evaluations of the discretized forward model; they also need to be accurate enough to enable input reconstruction. The goal of this paper is to suggest a simple data-driven framework to build forward model discretizations to be used in Bayesian inverse problems. The resulting discretizations are data-driven in that they finely resolve regions of the input space where the data are most informative, while keeping the cost moderate by coarsely resolving regions that are not informed by the data.

To be concrete and explain the idea, let us consider the inverse problem of recovering an unknown from data related by

 y=G(u)+η, (1.1)

where denotes the forward model from inputs to observables, and represents model error and observation noise. We will follow a Bayesian approach, viewing

as a random variable

[kaipio2006statistical, AS10, sanz2018inverse] with prior distribution The Bayesian solution to the inverse problem is the posterior distribution of given the data

which by an informal application of Bayes theorem is characterized by

 (1.2)

with A common computational bottleneck arises when the forward model and hence the likelihood are intractable, meaning that it is impossible or too costly to evaluate. This paper introduces a framework to tackle this computational challenge by employing data-driven discretizations of the forward model. The main idea is to include the parameters that govern the discretization as part of the unknown to be estimated within the Bayesian machinery. More precisely, we consider a family of approximate forward models and put a prior over both unknown inputs and forward discretization parameters to define a joint posterior

 (1.3)

While this structure underlies many hierarchical formulations of Bayesian inverse problems, in this paper the hyper-parameter determines the choice of discretization of the forward model

Including the learning of the numerical discretizations of the forward map as part of the inference agrees with the Bayesian philosophy of treating unknown quantities as random variables, and is also in the spirit of recent probabilistic numerical methods [cockayne2019bayesian]; rather than implicitly assuming that a true hidden numerical discretization of the forward model generates the data, a Bayesian would acknowledge the uncertainty in the choice of a suitable discretization and let the observed data inform such a choice. Moreover, the Bayesian viewpoint has two main practical advantages. First, data-informed grids will typically be coarse in regions of the input space that are not informed by the data, allowing successful input reconstruction at a reduced computational cost. Second, the posterior contains useful uncertainty quantification on the discretizations. This additional uncertainty information may be exploited to build a high-fidelity forward model to be employed within existing inverse problem solvers, either in Bayesian or classical settings.

### 1.1 Related Work

The Bayesian formulation of inverse problems provides a flexible and principled way to combine data with prior knowledge. However, in practice it is rarely possible to perform posterior inference with the model of interest (1.2) due to various computational challenges. In this paper we investigate the construction of computable data-driven forward discretizations of intractable likelihoods arising in the inversion of complex differential equations. Other intertwined obstacles for posterior inference are:

• Sampling cost.

With few exceptions, exact posterior inference can only be performed by employing sampling algorithms. Markov chain Monte Carlo and particle-based methods are popular, but vanilla implementations require repeated evaluation of the forward model

which may be costly.

• Large input dimension. The unknown parameter may be high, or even infinite dimensional. While the convergence rate of certain sampling schemes may be independent of the input dimension [cotter2013mcmc, agapiou2015importance, trillos2017consistency], the computational and memory cost per sample may increase prohibitively with dimension.

• Model error. The forward model is only an approximation of the real relationship between input and observable output variables. Model discrepancy can damage input recovery.

• Complex geometry. The unknown may be a function defined on a complex, perhaps unknown domain that needs to be approximated.

All these challenges have long been identified [sacks1989design, kennedy2001bayesian, kaipio2006statistical, kaipio2007statistical], giving rise to a host of methods for sampling, parameter reduction, model reduction, enhanced model error techniques and geometric methods for inverse problems. We focus on the model-reduction problem of building forward discretizations, but the methodology proposed in this paper can be naturally combined with existing techniques that address complementary challenges. For instance, our forward model discretizations may be used within multilevel MCMC methods [giles2008multilevel] or within two-stage sampling methods [green2001delayed, tierney1999some, christen2005markov, cui2015data, efendiev2006preconditioning], and thus help to reduce the sampling cost. Also, forward model discretizations may be combined with parameter reduction and model adaptation techniques, as in [lieberman2010parameter, li2018model]. It is important, however, to distinguish between the parameter and model reduction problems. While the former aims to find suitable small-dimensional representations of the input

, the latter is concerned with effectively reducing the number of degrees of freedom used to compute the forward model

In regards to model error, our framework may be thought of as incorporating Bayesian model choice to the Bayesian solution of inverse problems by viewing each forward model discretization as a potential model. Following this interpretation, the a posteriori

choice of forward discretization may in principle be determined using Bayes factors. Lastly, learning appropriate discretizations of forward models is particularly important for inverse problems set in complex, possibly uncertain geometries

[trillos2017consistency, garcia2018continuum, harlim2019kernel].

Many approaches to computing forward map surrogates and reduced-order models have been proposed; we refer to [frangos2010surrogate] for an extended survey, and to [peherstorfer2018survey] for a broader discussion of multi-fidelity models in other outer-loop applications. Most methods fall naturally into one of three categories:

1. Projection-based methods: the forward model equations are described in a reduced basis that is constructed using few high-fidelity forward solves (called snapshots). Two popular ways to construct the reduced basis are proper orthogonal decomposition (POD) and reduced order basis. In the inverse problem context, data-informed construction of snapshots [cui2015data] allows to approximate the posterior support with fewer high-fidelity forward runs. To our knowledge, there is little theory to guide the required number or location of snapshots to meet a given error tolerance.

2. Spectral methods: polynomial chaos [xiu2002wiener] is a popular method for forward propagation of uncertainty, that has more recently been used to produce surrogates for intractable likelihoods [marzouk2007stochastic]. The paper [marzouk2009stochastic] translates error in the likelihood approximation to Kullback-Leibler posterior error. A drawback of these methods is that they are only practical when the random inputs can be represented by a small number of random variables. Recent interest lies in adapting the spectral approximations to observed data [li2014adaptive].

3. Gaussian processes and neural networks: some of the earliest efforts to allow for Bayesian inference with complex models suggested to use Gaussian processes

[rasmussen2006gaussian] to construct surrogate likelihood models [sacks1989design, kennedy2001bayesian]. The accuracy of the resulting approximations has been studied in [stuart2017posterior], which again requires a suitable representation of the input space. Finally, representation of the likelihood using neural networks in combination with generalized polynomial chaos expansions has been investigated in [schwab2019deep].

This paper focuses on grid-based discretizations of static inverse problems arising in mechanical engineering, signal processing and the geophysical sciences. However, the proposed framework may be used in conjunction with other reduced-order models, in dynamic data assimilation problems, and in other applications. Finally, we mention that for classical formulations of certain specific inverse problems, optimal forward discretization choices have been proposed [borcea2005continuum, becker2005mesh].

### 1.2 Outline and Contributions

Section 2 reviews the Bayesian formulation of inverse problems. Section 3

describes the main framework for the Bayesian learning of forward map discretizations. We will consider two ways to parametrize discretizations: in the first, the grid points locations are learned directly, and in the second we learn a probability density from which to obtain the grid. In Section

4 we discuss a general approach to sampling the joint posterior over unknown input and discretization parameters, which consists of a Metropolis-within-Gibbs that alternates between a reversible jump Markov chain Monte Carlo (MCMC) algorithm to update the discretization parameters and a standard MCMC to update the unknown input. Section 5 demonstrates the applicability, benefits, and limitations of our approach in a variety of inverse problems arising in mechanical engineering, signal processing and source detection, considering Euler discretization of ODEs, Euler-Maruyama discretization of SDEs, and finite element methods for PDEs. We conclude in Section 6 with some open questions for further research.

## 2 Background: Bayesian Formulation of Inverse Problems

Consider the inverse problem of recovering an unknown from data related by

 y=G(u)+η, (2.1)

where is a space of admissible unknowns and is a random variable whose distribution is known to us, but not its realization. In many applications, the forward model can be written as the composition of forward and observation maps, . The forward map represents a complex mathematical model that assigns outputs to inputs . For instance, may be the parameters of a differential equation, and may be its analytical solution. The observation map establishes a link between outputs and observable quantities, e.g. by point-wise evaluation of the solution.

In the Bayesian formulation of the inverse problem (2.1), one specifies a prior distribution on and seeks to characterize the posterior distribution of given If the input space is finite dimensional, , the prior distribution can be defined through its Lebesgue density The noise distribution of in gives the likelihood . In this work we assume, for concreteness, that is a zero-mean Gaussian with covariance , so that

 (2.2)

where . Using Bayes’ formula, the posterior density is given by

 pu|y(u)=1Zpy|u(y|u)pu(u),Z=∫Upy|u(y|u)pu(u)du (2.3)

with multiplicative constant depending on .

For many inverse problems of interest, the unknown is a function and the input space is an infinite-dimensional Banach space. In such a case, the prior cannot be specified in terms of its Lebesgue density, but rather as a measure supported on Provided that is measurable and that the posterior measure is still defined, in analogy to (2.3), as a change of measure with respect to the prior

 (2.4)

We refer to [AS10] and [trillos2016bayesian] for further details. The posterior contains, in a precise sense [duke], all the information on available in the data and the prior . This paper is concerned with inverse problems where arises from a complex model that cannot be evaluated pointwise; we then seek to approximate the idealized posterior finding a compromise between accuracy and computational cost.

A simple but important observation is that approximating accurately is not necessary in order to approximate accurately. It is only necessary to approximate since appears in the posterior only through While producing discretizations to complex models has been widely studied in numerical analysis, here we investigate how to approximate with the specific goal of approximating the posterior incorporating prior and data knowledge into the discretizations. For some inverse problems the observation operator also needs to be discretized, leading to similar considerations.

## 3 Bayesian Discretization of the Forward Model

Suppose that is the solution map to a differential equation that cannot be solved in closed form, and is point-wise evaluation of the solution. Standard practice in computing the Bayesian solution to the inverse problem involves using an a priori fixed discretization, e.g., by discretizing the domain of the differential equation into a fine grid. Provided that the grid is fine enough, the posterior defined with the discretized forward map can approximate well the one in (2.4). However, the discretizations are usually performed on a fine uniform grid which may lead to unnecessary waste of computational resources. Indeed, it is expected that the choice of discretization should be problem dependent, and should be informed both by the observation locations (which are often not uniform in space) and by the value of the unknown input parameter that we seek to reconstruct. Thus we seek to learn jointly the unknown input and the discretization of the forward map.

We will consider a parametric family of discretizations. Precisely, we let

 A:={a=(k,θ):k∈K⊂{1,2,…},θ∈Dk⊂Rdk}, (3.1)

and each pair will parameterize a discretized forward model For given , represents the degrees of freedom in the discretization, and is the -dimensional model parameter of the discretization. In analogy with Bayesian model selection frameworks, may be interpreted as indexing the discretization model. We focus on the model reduction rather than the parameter reduction problem, and assume that all approximation maps share the same input and output spaces and We will illustrate the flexibility of this framework using grid-based approximations and density-based discretizations.

###### Example 3.1 (Grid-based discretizations).

Here the first component of each element represents the number of points in a grid, and the second component represents the locations of the grid points. The set contains all allowed grid sizes, is the temporal or spatial domain of the equation being discretized, and .

###### Example 3.2 (Density-based discretizations).

Here the first component of each element represents again the number of points in a grid, and the second component parametrizes a probability density by a parameter of fixed dimension, independent of . Given we may obtain for instance a finite element grid by employing MacQueen’s method [du2002] to generate a centroidal Voronoi tessellation, and use its corresponding Delaunay triangulation. Intuitively can be regarded as the spatial density of a non-uniform grid. The space represents, as before, all the allowed number of grid points.

###### Example 3.3 (Other discretizations).

As mentioned in the introduction, other discretizations and model reduction techniques could be considered within the above framework, including projection-based approximations, Gaussian processes, and graph-based methods. However, in our numerical experiments we will focus on grid-based and density-based discretizations.

We consider a product prior on given by

 qu,a(u,a)=qu(u)qa(a), (3.2)

where is as in the original, idealized inverse problem (2.1). In general, conditioning on may or may not provide useful information about how to approximate When it does, this can be infused into the prior by letting the conditional distribution of given depend on For simplicity we restrict ourselves to the product structure (3.2).

The examples above and the structure of the space defined in equation (3.1) suggest to define hierarchically a prior over

 qa(a)=qk,θ(k,θ)=qk(k)qθ|k(θ|k), (3.3)

where is a probability mass function that penalizes expensive discretizations that employ large number of degrees of freedom, and denotes the conditional distribution of given in

We define the likelihood of observing data given by

 (3.4)

where . The discretized forward maps will be chosen so that evaluating is possible.

We first consider the case where is a singleton, and has a Euclidean space structure. Then, by Bayes’ formula,

 qu,a|y(u,a)=1~Zqy|u,a(y|u,a)qu(u)qa(a),~Z=∫U×Aqy|u,a(y|u,a)qu(u)qa(a)duda, (3.5)

where is a normalizing constant. The first marginal of , which we denote as , constitutes a data-informed approximation of the posterior from the full, idealized inverse problem (2.3). We have the following result:

###### Proposition 3.4.

Let be defined as in (2.2) and be defined as in (3.4). If is bounded and, for -almost any , then

 dTV(qu|y(u),pu|y(u))

for some constant independent of .

###### Proof.

Integrating both sides of the first equation in (3.5) with respect to :

 qu|y(u)=1~Z(∫Aqy|u,a(y|u,a)qa(a)da)qu(u)=:1~Z~gy(u)qu(u).

Compare this to equation (2.3) and write , we have

where the last inequality follows from the Lipschitz continuity of the exponential function, boundedness of , and equivalence of norm in . This implies that and hence . Then the statement follows from a slight modification of Theorem 1.14 in [sanz2018inverse] and the definition of TV distance. ∎

Now we are ready to extend the above results to infinite-dimensional input space . We define a prior measure on given by , where is as in the idealized inverse problem. The posterior measure on conditioning on will still be denoted by .

###### Proposition 3.5.

Suppose that is a separable Banach space with , is continuous. Then the posterior measure of given is absolutely continuous with respect to the prior on and has Radon-Nikodym derivative:

 dνu,a|ydνu,a(u,a)∝exp(−Ψ(u,a;y)). (3.7)
###### Proof.

By the disintegration theorem (which holds for arbitrary Radon measures on separable metric spaces –see [dellacherie2011probabilities] chapter 3, page 70) for all measurable subsets , and , we can write in two different ways:

 ∫Y′νu,a|y(U′×A′|y)dνy(y)=νu,a,y(U′,A′,Y′)=∫U′×A′νy|u,a(Y′|u,a)dνu,a(u,a).

In particular,

given our assumptions on the noise model. We can then use Tonelli’s theorem to swap the order of the integrals and obtain

Given that is arbitrary, we conclude that is absolutely continuous with respect to the Lebesgue measure with density:

On the other hand,

 ∫U′×A′νy|u,a(Y′|u,a)dνu,a(u,a)

applying Tonelli’s theorem once again to obtain the last equality. Since was arbitrary, it follows that for -a.e. we have

In turn, from the arbitrariness of it follows that, for -a.e. the measure is absolutely continuous with respect to and its Radon-Nykodym derivative is proportional (naturally, the constant of proportionality depends on ) to

 exp(−12∥y−Ga(u)∥2Γ)

as claimed.

As in the finite dimensional case, we have the following result:

###### Proposition 3.6 (Well-posedness of Posterior).

Under the same assumption as in Proposition 3.5, suppose further that for -almost any , , and is bounded. Then we have

 dTV(νu|y,μu|y)

for some constant independent of

###### Remark 3.7.

In the context of grid-based forward approximations, the condition ‘ -almost surely’ can be interpreted as ‘almost any draw from the approximation parameter space can produce an approximation of the forward model with error at most ’. This is often the case, for example, when the grids are finer than some threshold under regularity conditions on the input space.

## 4 Sampling the Posterior

The structure of the joint posterior over unknowns and approximations suggests using a Metropolis-within-Gibbs sampler, which constructs a Markov chain by alternatingly sampling each coordinate:

In the above, and are Metropolis-Hastings Markov kernels that are reversible with respect to and We remark that the kernel involves evaluation of the forward model approximation but not of the intractable full model While the choice and design of the kernels and will clearly be problem-specific, and here we consider a standard method appropriate for the case where the input space is a space of functions to define

Before describing how to sample the full conditionals and of and it is useful to note that they satisfy the following expressions:

 (4.1)

### 4.1 Sampling the Full Conditional u|y,a

For given and , we can sample from using pCN [beskos2008mcmc], with proposal

 ~u:=√1−β2u+βξ,ξ∼μu,

and acceptance probability

 α(u,~u):=min{1,exp(−Ψ(~u,a;y)+Ψ(u,a;y))}.

Other discretization-invariant MCMC samplers [cui2016dimension, rudolf2015generalization] could also be used to update , but pCN is a straightforward and effective choice in the examples considered here.

### 4.2 Sampling the Full Conditional a|y,u

#### 4.2.1 Sampling Grid-based Discretizations

We will use a Markov kernel written as a mixture of two kernels, i.e.

 Lu,y(a|⋅)=ζLu,y1(a|⋅)+(1−ζ)Lu,y2(a|⋅),

each of which is induced by a different Metropolis-Hastings algorithm. The proposal mechanism for each of the kernels corresponds to a different type of movement, described next:

1. For we use Metropolis-Hastings to sample from the distribution using the following proposal: given with we set (i.e. the number of grid points stays the same) and let be defined by

 ~θi=θi,i=1,…,k−1,

and sample from a distribution on with density (w.r.t. Lebesgue measure on ) . In principle the density used to sample may depend on .

2. For we use Metropolis-Hastings to sample from the distribution using the following proposal: given we sample where is a Markov kernel on , and then generate according to

• If let for all and then sample independently from the density .

• If let for all .

###### Remark 4.1.

We notice that the proposals described above are particular cases of the ones used in reversible jump Markov chain Monte Carlo [green1995reversible].

For the Metropolis-Hastings algorithm associated to the acceptance probability takes the form

 α1(a,~a)=min{1,exp(−Ψ(u,~a;y)+Ψ(u,a;y))ρ~θ(θk)ρθ(~θk)},

where recall and and is defined similarly.

For the Metropolis-Hastings algorithm associated to the acceptance probability takes the form

 α2(a,~a)=min{1,σ(k|~k)νk(~k)σ(~k|k)νk(k)exp(−Ψ(u,~a;y)+Ψ(u,a;y))H(k,θ,~k,~θ)},

where

We notice that since each of the kernels and is defined by a Metropolis-Hastings algorithm, they leave the target invariant, and hence so does the kernel .

###### Remark 4.2.

If in the above the distribution is, regardless of

, the uniform distribution on

, then the acceptance probabilities reduce, respectively, to

 α1(a,~a)=min{1,exp(−Ψ(u,~a;y)+Ψ(u,a;y))},

and

#### 4.2.2 Sampling Density-based Discretizations

Since in this case the dimension of is fixed, the calculation of the the acceptance probabilities is straightforward and the details are omitted. We refer to Subsection 5.3 for a numerical example.

## 5 Numerical Examples

In this section we demonstrate the applicability of our framework and sampling approach in a variety of inverse problems. Our aim is illustrating the benefits and potential limitations of the methods; for this reason we consider inverse problems for which we have intuitive understanding of where the discretizations should concentrate, thus validating the performance of the proposed approach. Before discussing the numerical results, we summarize the main goals and outcomes of each set of experiments:

• In Subsection 5.1 we consider an inverse problem in mechanics [bigoni2019greedy], for which some observation settings highly influence the best choice of discretization while others inform it mildly. Our numerical results show that the gain afforded by grid learning is most clear whenever the observation locations highly influence the choice of discretization. We employ grid-based discretizations as described in Example 3.1 with an Euler discretization of the forward map. We also illustrate the applicability of the method in both finite and infinite-dimensional representations of the unknown parameter, showing a more dramatic effect in the latter.

• In Subsection 5.2 we consider an inverse problem in signal processing [hairer2011signal], with a choice of observation locations that determine where the discretization should concentrate. Our numerical results show that the grids adapt to the expected region, and that the degrees of freedom in the discretization necessary to reconstruct the unknown is below that necessary to satisfy stability of the numerical method with uniform grids. We employ grid-based discretizations as described in Example 3.1 with an Euler-Maruyama discretization of the forward map.

• In Subsection 5.3 we consider an inverse problem in source detection, where the true hidden unknown determines how best to discretize the forward model. Our numerical results show that the grids adapt as expected. We employ density-based discretizations as described in Example 3.2 with a finite element discretization of the forward model.

### 5.1 Euler Discretization of ODEs: Estimation of the Young’s Modulus of a Cantilever Beam

We consider an inhomogeneous cantilever beam clamped on one side () and free on the other (). Let denote its Young’s modulus and let be a load applied onto the beam. Timoshenko’s beam theory gives the displacement of the beam and the angle of rotation

through the coupled ordinary differential equations

 ⎧⎪⎨⎪⎩ddx[u(x)2(1+r)(φ(x)−ddxz(x))]=M(x)κA,ddx(u(x)Iddxφ(x))=κAu(x)2(1+r)(φ(x)−ddxz(x)), (5.1)

where , , , are physical constants. Following [bigoni2019greedy], we consider the inverse problem of estimating the Young’s modulus from sparse observations of the displacement .

Let be the solution map to equations (5.1). Let be the locations of the observation sensors, leading to the observation operator defined coordinate-wise by

 Oi(z):=∫L0zφidx,φi(x):=1γiexp(−(si−x)2/(2δ2)),1≤i≤m,

where and is the normalizing constant such that . Data are generated according to the model

 y=O∘F(u)+η=:G(u)+η.

Notice that for system (5.1) with proper boundary conditions specified at , the displacement at any point depends only on the values Thus, we expect suitable discretizations of the forward model to refine finely only the region where is the right-most observation location. We will discuss this in detail in section 5.1.2.

#### 5.1.1 Forward Discretization

To solve system (5.1) we employ a finite difference method. A family of numerical solutions can be parameterized by the set

 A:={a=(k,θ):k∈K⊂{1,2,…},θ=[x1,…,xk]∈[0,L]k},

where is the number of grid points and are the grid locations. Precisely, for , we first reorder so that

 0=:x0≤x1≤⋯≤xk≤xk+1:=L

and we let

 Fa:u↦za

be the linearly interpolated Euler finite difference solution to (

5.1), discretized using the ordered grid . We also discretize the observation operator using an Euler forward method, defined by

 Oai(za)=k∑j=0za(xj)φi(xj)(xj+1−xj).

Finally is approximated by

#### 5.1.2 Implementation Details and Numerical Results

For our numerical experiments we consider a beam of length , width and thickness . We use a Poisson ratio and Timoshenko shear coefficient . represents the cross-sectional area of the beam and

is the second moment of inertia. We run a virtual experiment of applying a point mass of

at the end of the beam, as seen in blue in Figures 0(a) and 1(a). We assume that the observations are gathered with error

We first assume that the beam is made of 5 segments of different kinds of steel, each of length , with corresponding Young’s moduli . The prior on is given by where

denotes the all-ones vector. We also assume that the number of grid points

is fixed to be , i.e., the prior is a point mass. The grid locations are assumed to be a priori uniformly distributed in . Results are reported in Figure 1. We next assume that the Young’s modulus varies continuously with . We set a Gaussian process prior on defined by with . For this case we assume that the prior on the number of grid points

follows a Poisson distribution with mean 60, i.e.,

Poisson, and the grid locations still have a uniform prior given .The true Young’s modulus underlying the data and the reconstruction results are reported in Figure 2. Sampling is performed, both in the discrete and continuous settings, updating and alternately for a total number of iterations. Figure 3: The reconstruction error with fixed-grid discretization (blue) and with data-driven grid discretization (orange). Red triangles are observations locations, while blue crosses are the grid points used in the fixed-grid discretization.

In both Figures 1 and 2 two settings are considered. In the first one observations are concentrated on the right side of the beam and in the second on the left. For reference, Figure 4 shows idealized posteriors considered, obtained with very fine discretizations, for each of the settings. Notice that for system (5.1) with proper boundary conditions specified at , the displacement at any point depends only on the values of Young’s moduli with This implies that when observations are gathered on the left side of the beam, the posterior on agrees with the prior on the right-side, and no resources should be on discretizing the forward map on that region. In that case our adaptive data-driven discretizations are strongly concentrated on the left, as shown in Figures 0(g), 0(h) and 1(g), 1(h). However, when observations are gathered on the right side of the beam, the data is informative on for all In such case, Figures 0(b), 0(c) and 1(b), 1(c) show that the data-driven discretizations are concentrated on the right, but less heavily so. See Tables 1, 2 in the appendix for a more detailed description of the grid points distribution in both cases. Also, our results indicate that using data-driven discretizations will lead to a better estimation of the true Young’s modulus, compared to fixed-grid discretizations.

Let be the output of the Gibbs sampling algorithm at iteration . The reconstruction error is defined as follows:

 er= ⎷N∑n=1∣∣Ga(n)(u(n))−G(u)∣∣2, (5.2)

where is approximately calculated on a very fine grid. In Figure 3 we plot the reconstruction error for the second experiment where the Young’s moduli is continuous and observations are gathered on the right-side. With fixed-grid discretization, the reconstruction error is small where the discretization matches the observation points. With adaptive data-driven discretizations the grid points will adaptively match the observation points in order to produce less error.

### 5.2 Euler-Maruyama Discretization of SDEs: a Signal Processing Application

Let be globally Lipschitz continuous and consider the SDE

 dz(t)=f(z)dt+du,0

where denotes -dimensional Brownian motion. We aim to recover from observations of the solution . We suppose that the observations are given by

 yi=z(ti)+ηi,i=1,…,m, (5.4)

where

is assumed to follow a centered Gaussian distribution with covariance

and

 0

are given observation times. Following [hairer2011signal], we cast the problem in the setting of Section 2. First note that the solution to the integral equation

 z(t)=∫t0f(z(s))ds+u(t),0≤t≤T, (5.5)

defines a map

 F:C([0,T],Rd) →C([0,T],Rd) (5.6) u ↦z. (5.7)

Thus we set the input and output space to be

Next we define an observation operator

 O:C([0,T],Rd) →Rm z ↦[z(s1),…,z(sm)]′

and set We put as prior on the standard -dimensional Wiener measure, that we denote Then the posterior distribution is given by Equation (2.4), which if may be rewritten as

 (5.8)

Note that the likelihood does not involve evaluation of at times and hence changing the definition of for does not change the posterior measure. Thus, we expect suitable discretizations of the forward model to refine finely only times up to the right-most observation.

#### 5.2.1 Forward Discretization

For most nonlinear drifts , the integral equation (5.2.1) cannot be solved in closed form and a numerical method needs to be employed. A family of numerical solutions may be parameterized by the set

 A:={a=(k,θ):k∈K⊂{1,2,…},θ=[t1,…,tk]′∈[0,T]k}.

Precisely, for we define an approximate, Euler-Maruyama solution map

 Fa:C([0,T],Rd) →C([0,T],Rd) u ↦za

as follows. First, we reorder the ’s so that

 0=:t0≤t1≤…≤tk≤tk+1:=T.

Then we define as , and

 zaj+1=zaj+(tj+1−tj)f(zaj)+u(tj+1)−u(tj),1≤j≤k. (5.9)

Finally, for we define by linear interpolation of and

Having defined the parameter space we now describe a choice of prior distribution on and the resulting combined prior on First we choose a prior on the number of grid-points. Given the number of grid points , assuming no knowledge on appropriate discretizations for the SDE (5.3) we put a uniform prior on grid locations

###### Remark 5.1.

More information could be put into the prior. In particular it seems natural to impose that grids are finer at the beginning of the time interval.

#### 5.2.2 Implementation Details and Numerical Results

For our numerical experiments we considered the SDE (5.3) with and double-well drift

 f(t)=10t(1−t2)/(1+t2). (5.10)

We generated synthetic observation data by solving (5.3) on a very fine grid, and then perturbing the solution at uniformly distributed times so that the last observation corresponds to time The observation noise was taken to uncorrelated, with The motivation for choosing this example is that there is certain intuition as to where one would desire the discretization grid-points to concentrate. Indeed, since all the observations are in the interval it is clear from Equation (5.8) that any discretization points will not contribute to better approximate In other words, those grid points would help in approximating but not in approximating .

We report our results for a small grid-size Similar but less dramatic effect was seen for larger grid size. Precisely, we chose our set of admissible grids to be given by

 [t0,…,t25]′:0.01=t0≤t1≤…≤t25=10.

For implementation purposes, elements in the space were represented as vectors in containing their values on a uniform grid of step-size . We run these algorithms with parameter choices

The experiments show a successful reconstruction of the SDE path. Moreover, the grids concentrate in in agreement with our intuition and the uncertainty quantification is satisfactory. In contrast, we see that when using the same number of grid points but on a uniform grid the Euler-Maruyama scheme is unstable, leading to a collapse of the MCMC algorithm. Then, the posterior constructed with a uniform grid completely fails at reconstructing the SDE path, and the uncertainty quantification is overoptimistic due to poor mixing of the chain.

### 5.3 Finite Element Discretization: Source Detection

Consider the boundary value problem

 {−Δz(x)=δ(x−u),x∈D,z(x)=0,x∈∂D, (5.11)

where is the unit square and is the Dirac function at the origin. We aim to recover the source location from sparse observations

 yi=z(si)+ηi,i=1,…,m, (5.12)

where follows a centered Gaussian distribution with covariance and are observation locations. To cast the problem in the setting of Section 2, we let be given by Green’s function for the Laplacian on the unit square (which does not admit an analytical formula but can be computed e.g. via series expansions [melnikov2006computability]) and be defined by point-wise evaluation at the observation locations. The prior on is the uniform distribution in the unit square , which we denote . Since is finite dimensional, the posterior has Lebesgue density given by Equation (2.3). We find this problem to be a good test, as there is a clear understanding that the data-driven mesh should concentrate around the source.

#### 5.3.1 Forward Discretization

To solve equation (5.11) numerically we employ the finite element method. The use of uniform grid is here wasteful, as the mesh should ideally concentrate around the unknown source

We will use grids obtained as the Delauny triangulation of central Voronoi tessellations and generators , which can be calculated as the solution of the optimization problem, parameterized by a probability density on :

 min{xi}∈D,{Vi}k∑i=1∫Viρ(x)∥x−xi∥2dx. (5.13)

For a fixed density and integer we denote the optimal grid points by . Then the approximated solution map

 Fa:D →H1(D) (5.14) u ↦za (5.15)

is exactly given by the finite element solution of equation (5.11) with respect to (the Delauny triangulation of) the grid points . Details on the creation of grid for prescribed parameters and will be discussed below.

In the spirit of adapting the grid to favor the ones maximizing the model evidence, we constrain to belong to a family of parametric densities where

is the product measure of two Beta distributions. Therefore in this case

and .

#### 5.3.2 Implementation Details and Numerical Results

We solved equation (5.11) on a very fine grid with the true point source . The observation locations were . Observation noise was uncorrelated with , .

In this example the prior of is the uniform distribution on and is initialized at . The parameters are also set to have a uniform prior . We initialize , which corresponds to (near) uniform grid in .

We compare our algorithm to the traditional method where we fix a uniform grid in and run the MCMC algorithm only on . We found out that with the same number of grid points, our data-driven approach gives a posterior distribution that is more concentrated around the true location of the point source, as shown in Figure 5(a) and 5(b). Also, Figure 5(c) shows that the adaptive discretization is concentrated at the top right corner of the region, where the hidden point source is located.

Next we show that data-driven discretizations of the forward map can be employed to provide improved uncertainty quantification of the PDE solution, and not only to better reconstruct the unknown input. To illustrate this, we approximate the pushforward distribution in three different ways, as shown in Figure 7. Let denote the output of the Gibbs sampling algorithm at iteration . We first consider the traditional method where the grid is fixed and uniform, that is, is fixed. Then the pushforward distribution can be well-approximated by , for large enough. We then consider the same setting except that is replaced by a forward map computed in an extremely fine and adapted grid and the pushforward is approximated by . Finally we consider a data-driven setting stemming from our algorithm, where the pushforward distribution is approximated by

. We see that our algorithm reconstructs well the solution to the PDE, with a more accurate mean and a smaller variance. (a) Fixed grid. Figure 7: The mean, 10 and 90-percentile of the pushforward distribution F♯(qu|y) under three different settings: (1) Both the posterior qu|y and its pushforward F♯(qu|y) are computed on a fixed and uniform grid; (2) The posterior qu|y is computed on a fixed and uniform grid, and its pushforward F♯(qu|y) is calculated using a (nearly) exact solver; (3) Both the posterior and its pushforward are computed on a data-driven grid.

## 6 Conclusions and Open Directions

• We have shown that, in a variety of inverse problems, the observations contain useful information to guide the discretization of the forward model, allowing a better reconstruction of the unknown than using uniform grids with the same number of degrees of freedom. Despite these results being promising, it is important to note that updating the discretization parameters may be costly in itself, and may result in slower mixing of the MCMC methods. For this reason, we envision that the proposed approach may have more potential when the computational solution of the inverse problem is very sensitive to the discretization of the forward map and discretizing said map is expensive. We also believe that density-based discretizations may help in alleviating the cost of discretization learning.

• An interesting avenue of research stemming from this work is the development of prior discretization models that are informed by numerical analysis of the forward map , while recognizing the uncertainty in the best discretization of the forward model Moreover, more sophisticated prior models beyond the product structure considered here should be investigated.

• Topics for further research include the development of new local proposals and sampling algorithms for grid-based discretizations, and the numerical implementation of the approach in computationally demanding inverse problems beyond the proof-of-concept ones considered here.

## Acknowledgement

The work of NGT and DSA was supported by the NSF Grant DMS-1912818/1912802. The work of DB and YMM was supported by NSF Grant DMS-1723011.