1 Introduction
Models used in science and engineering are often described by problem-specific input parameters that are estimated from indirect and noisy observations. The inverse problem of input reconstruction is defined in terms of a forward model from inputs to observable quantities, which in many applications needs to be approximated by discretization. A broad class of examples motivating this paper is the reconstruction of input parameters of complex differential equations. The choice of forward model discretization is particularly important in Bayesian formulations of inverse problems: discretizations need to be cheap, since statistical recovery may involve millions of evaluations of the discretized forward model; they also need to be accurate enough to enable input reconstruction. The goal of this paper is to suggest a simple data-driven framework to build forward model discretizations to be used in Bayesian inverse problems. The resulting discretizations are data-driven in that they finely resolve regions of the input space where the data are most informative, while keeping the cost moderate by coarsely resolving regions that are not informed by the data.
To be concrete and explain the idea, let us consider the inverse problem of recovering an unknown $u$ from data $y$ related by

(1.1)  $y = \mathcal{G}(u) + \eta,$

where $\mathcal{G}$ denotes the forward model from inputs to observables, and $\eta$ represents model error and observation noise. We will follow a Bayesian approach, viewing $u$ as a random variable [kaipio2006statistical, AS10, sanz2018inverse] with prior distribution $\pi(u)$. The Bayesian solution to the inverse problem is the posterior distribution of $u$ given the data $y$, which by an informal application of Bayes' theorem is characterized by

(1.2)  $\pi(u \mid y) \propto L(u; y)\, \pi(u),$

with $L(u; y)$ the likelihood of the data $y$ given $u$. A common computational bottleneck arises when the forward model, and hence the likelihood, is intractable, meaning that it is impossible or too costly to evaluate. This paper introduces a framework to tackle this computational challenge by employing data-driven discretizations of the forward model. The main idea is to include the parameters that govern the discretization as part of the unknown to be estimated within the Bayesian machinery. More precisely, we consider a family $\{\mathcal{G}_a\}_{a \in \mathcal{A}}$ of approximate forward models and put a prior over both the unknown input $u$ and the forward discretization parameter $a$ to define a joint posterior
(1.3)  $\pi(u, a \mid y) \propto L_a(u; y)\, \pi(u)\, \pi(a).$

While this structure underlies many hierarchical formulations of Bayesian inverse problems, in this paper the hyperparameter $a$ determines the choice of discretization $\mathcal{G}_a$ of the forward model $\mathcal{G}$.
Including the learning of the numerical discretizations of the forward map as part of the inference agrees with the Bayesian philosophy of treating unknown quantities as random variables, and is also in the spirit of recent probabilistic numerical methods [cockayne2019bayesian]; rather than implicitly assuming that a true hidden numerical discretization of the forward model generates the data, a Bayesian would acknowledge the uncertainty in the choice of a suitable discretization and let the observed data inform such a choice. Moreover, the Bayesian viewpoint has two main practical advantages. First, data-informed grids will typically be coarse in regions of the input space that are not informed by the data, allowing successful input reconstruction at a reduced computational cost. Second, the posterior contains useful uncertainty quantification on the discretizations. This additional uncertainty information may be exploited to build a high-fidelity forward model to be employed within existing inverse problem solvers, either in Bayesian or classical settings.
1.1 Related Work
The Bayesian formulation of inverse problems provides a flexible and principled way to combine data with prior knowledge. However, in practice it is rarely possible to perform posterior inference with the model of interest (1.2) due to various computational challenges. In this paper we investigate the construction of computable data-driven forward discretizations of intractable likelihoods arising in the inversion of complex differential equations. Other intertwined obstacles for posterior inference are:

Sampling cost. With few exceptions, exact posterior inference can only be performed by employing sampling algorithms. Markov chain Monte Carlo and particle-based methods are popular, but vanilla implementations require repeated evaluation of the forward model, which may be costly.
Large input dimension. The unknown parameter may be high- or even infinite-dimensional. While the convergence rate of certain sampling schemes may be independent of the input dimension [cotter2013mcmc, agapiou2015importance, trillos2017consistency], the computational and memory cost per sample may increase prohibitively with dimension.

Model error. The forward model is only an approximation of the real relationship between input and observable output variables. Model discrepancy can damage input recovery.

Complex geometry. The unknown may be a function defined on a complex, perhaps unknown domain that needs to be approximated.
All these challenges have long been identified [sacks1989design, kennedy2001bayesian, kaipio2006statistical, kaipio2007statistical], giving rise to a host of methods for sampling, parameter reduction, model reduction, enhanced model error techniques, and geometric methods for inverse problems. We focus on the model-reduction problem of building forward discretizations, but the methodology proposed in this paper can be naturally combined with existing techniques that address complementary challenges. For instance, our forward model discretizations may be used within multilevel MCMC methods [giles2008multilevel] or within two-stage sampling methods [green2001delayed, tierney1999some, christen2005markov, cui2015data, efendiev2006preconditioning], and thus help to reduce the sampling cost. Also, forward model discretizations may be combined with parameter reduction and model adaptation techniques, as in [lieberman2010parameter, li2018model]. It is important, however, to distinguish between the parameter and model reduction problems. While the former aims to find suitable low-dimensional representations of the input, the latter is concerned with reducing the number of degrees of freedom used to compute the forward model.

With regard to model error, our framework may be thought of as incorporating Bayesian model choice into the Bayesian solution of inverse problems by viewing each forward model discretization as a potential model. Following this interpretation, the a posteriori choice of forward discretization may in principle be determined using Bayes factors. Lastly, learning appropriate discretizations of forward models is particularly important for inverse problems set in complex, possibly uncertain geometries [trillos2017consistency, garcia2018continuum, harlim2019kernel].

Many approaches to computing forward map surrogates and reduced-order models have been proposed; we refer to [frangos2010surrogate] for an extended survey, and to [peherstorfer2018survey] for a broader discussion of multifidelity models in other outer-loop applications. Most methods fall naturally into one of three categories:

Projection-based methods: the forward model equations are described in a reduced basis that is constructed using a few high-fidelity forward solves (called snapshots). Two popular ways to construct the reduced basis are proper orthogonal decomposition (POD) and reduced basis methods. In the inverse problem context, data-informed construction of snapshots [cui2015data] makes it possible to approximate the posterior support with fewer high-fidelity forward runs. To our knowledge, there is little theory to guide the required number or location of snapshots to meet a given error tolerance.

Spectral methods: polynomial chaos [xiu2002wiener] is a popular method for forward propagation of uncertainty that has more recently been used to produce surrogates for intractable likelihoods [marzouk2007stochastic]. The paper [marzouk2009stochastic] translates error in the likelihood approximation into Kullback–Leibler posterior error. A drawback of these methods is that they are only practical when the random inputs can be represented by a small number of random variables. Recent interest lies in adapting the spectral approximations to observed data [li2014adaptive].

Gaussian processes and neural networks: some of the earliest efforts to allow for Bayesian inference with complex models suggested using Gaussian processes [rasmussen2006gaussian] to construct surrogate likelihood models [sacks1989design, kennedy2001bayesian]. The accuracy of the resulting approximations has been studied in [stuart2017posterior], which again requires a suitable representation of the input space. Finally, representation of the likelihood using neural networks in combination with generalized polynomial chaos expansions has been investigated in [schwab2019deep].
This paper focuses on grid-based discretizations of static inverse problems arising in mechanical engineering, signal processing and the geophysical sciences. However, the proposed framework may be used in conjunction with other reduced-order models, in dynamic data assimilation problems, and in other applications. Finally, we mention that for classical formulations of certain specific inverse problems, optimal forward discretization choices have been proposed [borcea2005continuum, becker2005mesh].
1.2 Outline and Contributions
Section 2 reviews the Bayesian formulation of inverse problems. Section 3 describes the main framework for the Bayesian learning of forward map discretizations. We will consider two ways to parametrize discretizations: in the first, the grid-point locations are learned directly, and in the second we learn a probability density from which to obtain the grid. In Section 4 we discuss a general approach to sampling the joint posterior over unknown input and discretization parameters, which consists of a Metropolis-within-Gibbs scheme that alternates between a reversible jump Markov chain Monte Carlo (MCMC) algorithm to update the discretization parameters and a standard MCMC method to update the unknown input. Section 5 demonstrates the applicability, benefits, and limitations of our approach in a variety of inverse problems arising in mechanical engineering, signal processing and source detection, considering Euler discretization of ODEs, Euler–Maruyama discretization of SDEs, and finite element methods for PDEs. We conclude in Section 6 with some open questions for further research.
2 Background: Bayesian Formulation of Inverse Problems
Consider the inverse problem of recovering an unknown $u \in \mathcal{U}$ from data $y$ related by

(2.1)  $y = \mathcal{G}(u) + \eta,$

where $\mathcal{U}$ is a space of admissible unknowns and $\eta$ is a random variable whose distribution is known to us, but not its realization. In many applications, the forward model can be written as the composition of forward and observation maps, $\mathcal{G} = \mathcal{O} \circ \mathcal{F}$. The forward map $\mathcal{F}$ represents a complex mathematical model that assigns outputs $\mathcal{F}(u)$ to inputs $u$. For instance, $u$ may be the parameters of a differential equation, and $\mathcal{F}(u)$ may be its analytical solution. The observation map $\mathcal{O}$ establishes a link between outputs and observable quantities, e.g. by pointwise evaluation of the solution.
In the Bayesian formulation of the inverse problem (2.1), one specifies a prior distribution on $u$ and seeks to characterize the posterior distribution of $u$ given $y$. If the input space is finite dimensional, the prior distribution can be defined through its Lebesgue density $\pi(u)$. The distribution of the noise $\eta$ in (2.1) gives the likelihood $L(u; y)$. In this work we assume, for concreteness, that $\eta$ is a zero-mean Gaussian with covariance $\Gamma$, so that

(2.2)  $L(u; y) = \exp\left( -\tfrac{1}{2} \| y - \mathcal{G}(u) \|_{\Gamma}^2 \right),$

where $\| \cdot \|_{\Gamma} := \| \Gamma^{-1/2} \cdot \|$. Using Bayes' formula, the posterior density is given by

(2.3)  $\pi(u \mid y) \propto L(u; y)\, \pi(u),$

with multiplicative constant depending only on $y$.
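In computations it is often convenient to work with the unnormalized log-posterior. The following minimal Python sketch evaluates $\log L(u; y) + \log \pi(u)$ for the Gaussian noise model (2.2); `forward`, `log_prior` and the noise covariance are problem-specific placeholders, not objects defined in this paper.

```python
import numpy as np

def log_posterior(u, y, forward, log_prior, noise_cov):
    """Unnormalized log-posterior log L(u; y) + log pi(u) for additive
    Gaussian noise, as in (2.2)-(2.3). The arguments `forward`,
    `log_prior` and `noise_cov` stand in for a concrete problem."""
    residual = y - forward(u)
    # ||y - G(u)||_Gamma^2 = residual^T Gamma^{-1} residual
    misfit = residual @ np.linalg.solve(noise_cov, residual)
    return -0.5 * misfit + log_prior(u)
```

MCMC samplers only need this unnormalized quantity, since the constant depending on $y$ cancels in acceptance ratios.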
For many inverse problems of interest, the unknown is a function and the input space is an infinite-dimensional Banach space. In such a case, the prior cannot be specified in terms of its Lebesgue density, but rather as a measure $\mu$ supported on $\mathcal{U}$. Provided that $\mathcal{G}$ is measurable, the posterior measure $\mu^y$ is still defined, in analogy to (2.3), as a change of measure with respect to the prior:

(2.4)  $\frac{d\mu^y}{d\mu}(u) \propto L(u; y).$

We refer to [AS10] and [trillos2016bayesian] for further details. The posterior contains, in a precise sense [duke], all the information on $u$ available in the data $y$ and the prior $\mu$. This paper is concerned with inverse problems where $\mathcal{G}$ arises from a complex model that cannot be evaluated pointwise; we then seek to approximate the idealized posterior, finding a compromise between accuracy and computational cost.
A simple but important observation is that approximating $\mathcal{F}$ accurately is not necessary in order to approximate the posterior accurately. It is only necessary to approximate $\mathcal{G} = \mathcal{O} \circ \mathcal{F}$, since $\mathcal{F}$ appears in the posterior only through $\mathcal{G}$. While producing discretizations of complex models has been widely studied in numerical analysis, here we investigate how to approximate $\mathcal{G}$ with the specific goal of approximating the posterior, incorporating prior and data knowledge into the discretizations. For some inverse problems the observation operator also needs to be discretized, leading to similar considerations.
3 Bayesian Discretization of the Forward Model
Suppose that $\mathcal{F}$ is the solution map to a differential equation that cannot be solved in closed form, and that $\mathcal{O}$ is pointwise evaluation of the solution. Standard practice in computing the Bayesian solution to the inverse problem involves using an a priori fixed discretization, e.g., discretizing the domain of the differential equation into a fine grid. Provided that the grid is fine enough, the posterior defined with the discretized forward map can approximate well the one in (2.4). However, the discretizations are usually performed on a fine uniform grid, which may lead to an unnecessary waste of computational resources. Indeed, the choice of discretization should be problem dependent, and should be informed both by the observation locations (which are often not uniform in space) and by the value of the unknown input parameter that we seek to reconstruct. Thus we seek to learn jointly the unknown input and the discretization of the forward map.
We will consider a parametric family of discretizations. Precisely, we let

(3.1)  $\mathcal{A} = \{ (n, t) : n \in N,\ t \in \mathcal{A}_n \},$

and each pair $a = (n, t) \in \mathcal{A}$ will parameterize a discretized forward model $\mathcal{G}_a$. For given $a = (n, t)$, $n$ represents the degrees of freedom in the discretization, and $t$ is the model parameter of the discretization, whose dimension may depend on $n$. In analogy with Bayesian model selection frameworks, $n$ may be interpreted as indexing the discretization model. We focus on the model reduction rather than the parameter reduction problem, and assume that all approximation maps $\mathcal{G}_a$ share the same input and output spaces as $\mathcal{G}$. We will illustrate the flexibility of this framework using grid-based approximations and density-based discretizations.
Example 3.1 (Grid-based discretizations).
Here the first component $n$ of each element $a = (n, t) \in \mathcal{A}$ represents the number of points in a grid, and the second component $t = (t_1, \ldots, t_n)$ represents the locations of the grid points. The set $N$ contains all allowed grid sizes, $D$ is the temporal or spatial domain of the equation being discretized, and $\mathcal{A}_n = D^n$.
Example 3.2 (Density-based discretizations).
Here the first component $n$ of each element $a = (n, t) \in \mathcal{A}$ again represents the number of points in a grid, and the second component $t$ parametrizes a probability density $\rho_t$ through a parameter of fixed dimension, independent of $n$. Given $a = (n, t)$, we may obtain, for instance, a finite element grid with $n$ nodes by employing MacQueen's method [du2002] to generate a centroidal Voronoi tessellation, and use its corresponding Delaunay triangulation. Intuitively, $\rho_t$ can be regarded as the spatial density of a non-uniform grid. The space $N$ represents, as before, all the allowed numbers of grid points.
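As an illustration of how such a grid might be generated, the following is a minimal sketch of the probabilistic MacQueen iteration for centroidal Voronoi tessellations. It assumes only a sampler for the density $\rho_t$; the function names are illustrative, not taken from [du2002].

```python
import numpy as np

def macqueen_cvt(sample_rho, n, iters=2000, seed=0):
    """Probabilistic MacQueen iteration: returns n generators of an
    approximate centroidal Voronoi tessellation of the density rho,
    given only a sampler `sample_rho(rng)` for rho."""
    rng = np.random.default_rng(seed)
    z = np.array([sample_rho(rng) for _ in range(n)])  # initial generators
    counts = np.ones(n)
    for _ in range(iters):
        w = sample_rho(rng)                            # random point from rho
        i = np.argmin(np.sum((z - w) ** 2, axis=1))    # nearest generator
        z[i] = (counts[i] * z[i] + w) / (counts[i] + 1.0)  # running mean
        counts[i] += 1.0
    return z
```

The generators can then be fed to a Delaunay triangulator to obtain a finite element mesh whose resolution follows $\rho_t$.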
Example 3.3 (Other discretizations).
As mentioned in the introduction, other discretizations and model reduction techniques could be considered within the above framework, including projection-based approximations, Gaussian processes, and graph-based methods. However, in our numerical experiments we will focus on grid-based and density-based discretizations.
We consider a product prior on $(u, a)$ given by

(3.2)  $\pi(u, a) = \pi(u)\, \pi(a),$

where $\pi(u)$ is as in the original, idealized inverse problem (2.1). In general, conditioning on $u$ may or may not provide useful information about how to approximate $\mathcal{G}$. When it does, this can be infused into the prior by letting the conditional distribution of $a$ given $u$ depend on $u$. For simplicity we restrict ourselves to the product structure (3.2).
The examples above and the structure of the space $\mathcal{A}$ defined in equation (3.1) suggest defining a prior over $a = (n, t)$ hierarchically,

(3.3)  $\pi(a) = \pi(n)\, \pi(t \mid n),$

where $\pi(n)$ is a probability mass function that penalizes expensive discretizations that employ a large number of degrees of freedom, and $\pi(t \mid n)$ denotes the conditional distribution of $t$ given $n$.
We define the likelihood of observing the data $y$ given $(u, a)$ by

(3.4)  $L_a(u; y) = \exp\left( -\tfrac{1}{2} \| y - \mathcal{G}_a(u) \|_{\Gamma}^2 \right),$

where $\mathcal{G}_a$ is the discretized forward model determined by $a$. The discretized forward maps will be chosen so that evaluating $L_a(u; y)$ is possible.
We first consider the case where $N$ is a singleton and the input space has a Euclidean space structure. Then, by Bayes' formula,

(3.5)  $\pi(u, a \mid y) = \frac{1}{Z}\, L_a(u; y)\, \pi(u)\, \pi(a),$

where $Z$ is a normalizing constant. The first marginal of $\pi(u, a \mid y)$, which we denote by $\pi_{\mathcal{A}}(u \mid y)$, constitutes a data-informed approximation of the posterior from the full, idealized inverse problem (2.3). We have the following result:
Proposition 3.4.
Suppose that $\mathcal{G}$ is bounded and that $\| \mathcal{G}(u) - \mathcal{G}_a(u) \| \le \varepsilon$ for all $u$ and $a$. Then
$d_{\rm TV}\bigl( \pi(\cdot \mid y),\, \pi_{\mathcal{A}}(\cdot \mid y) \bigr) \le C \varepsilon$
for some constant $C$ independent of $\varepsilon$.
Proof.
Integrating both sides of the first equation in (3.5) with respect to $a$ gives
$\pi_{\mathcal{A}}(u \mid y) = \frac{1}{Z}\, \pi(u) \int_{\mathcal{A}} L_a(u; y)\, \pi(a)\, da.$
Comparing this to equation (2.3), we have
$| L(u; y) - L_a(u; y) | \le C\, \| \mathcal{G}(u) - \mathcal{G}_a(u) \| \le C \varepsilon,$
where the last inequality follows from the Lipschitz continuity of the exponential function on bounded sets, the boundedness of $\mathcal{G}$, and the equivalence of norms in finite dimensions. This implies that the unnormalized densities of the two posteriors are uniformly $C\varepsilon$-close, and hence so are their normalizing constants. The statement then follows from a slight modification of Theorem 1.14 in [sanz2018inverse] and the definition of the TV distance. ∎
Now we are ready to extend the above results to an infinite-dimensional input space $\mathcal{U}$. We define a prior measure on $\mathcal{U} \times \mathcal{A}$ given by the product of $\mu$ and $\pi(a)$, where $\mu$ is as in the idealized inverse problem. The posterior measure on $\mathcal{U}$ obtained by conditioning on the data $y$ will still be denoted by $\mu^y_{\mathcal{A}}$.
Proposition 3.5.
Suppose that $\mathcal{U}$ is a separable Banach space with $\mu(\mathcal{U}) = 1$, and that $(u, a) \mapsto \mathcal{G}_a(u)$ is continuous. Then the posterior measure $\mu^y_{\mathcal{A}}$ of $u$ given $y$ is absolutely continuous with respect to the prior $\mu$ on $\mathcal{U}$ and has Radon–Nikodym derivative:

(3.7)  $\frac{d \mu^y_{\mathcal{A}}}{d \mu}(u) \propto \int_{\mathcal{A}} L_a(u; y)\, \pi(a)\, da.$
Proof.
By the disintegration theorem (which holds for arbitrary Radon measures on separable metric spaces; see [dellacherie2011probabilities], Chapter 3, page 70), for all measurable subsets of the relevant spaces we can write the joint distribution in two different ways:
In particular,
given our assumptions on the noise model. We can then use Tonelli’s theorem to swap the order of the integrals and obtain
Given that is arbitrary, we conclude that is absolutely continuous with respect to the Lebesgue measure with density:
On the other hand,
applying Tonelli’s theorem once again to obtain the last equality. Since was arbitrary, it follows that for a.e. we have
In turn, from the arbitrariness of the sets considered it follows that, for a.e. $y$, the posterior measure is absolutely continuous with respect to $\mu$ and its Radon–Nikodym derivative is proportional (the constant of proportionality naturally depends on $y$) to
as claimed.
∎
As in the finite-dimensional case, we have the following result:
Proposition 3.6 (Well-posedness of Posterior).
Under the same assumptions as in Proposition 3.5, suppose further that for almost any $a$, $\sup_u \| \mathcal{G}(u) - \mathcal{G}_a(u) \| \le \varepsilon$, and that $\mathcal{G}$ is bounded. Then we have

(3.8)  $d_{\rm TV}\bigl( \mu^y,\, \mu^y_{\mathcal{A}} \bigr) \le C \varepsilon$

for some constant $C$ independent of $\varepsilon$.
Remark 3.7.
In the context of grid-based forward approximations, the condition '$\| \mathcal{G}(u) - \mathcal{G}_a(u) \| \le \varepsilon$ almost surely' can be interpreted as: almost any draw from the approximation parameter space can produce an approximation of the forward model with error at most $\varepsilon$. This is often the case, for example, when the grids are finer than some threshold, under regularity conditions on the input space.
4 Sampling the Posterior
The structure of the joint posterior over unknowns and approximations suggests using a Metropolis-within-Gibbs sampler, which constructs a Markov chain by alternatingly sampling each coordinate:

$u^{(k+1)} \sim Q_u\bigl( u^{(k)}, \cdot \mid a^{(k)} \bigr), \qquad a^{(k+1)} \sim Q_a\bigl( a^{(k)}, \cdot \mid u^{(k+1)} \bigr).$

In the above, $Q_u$ and $Q_a$ are Metropolis–Hastings Markov kernels that are reversible with respect to the full conditionals $\pi(u \mid a, y)$ and $\pi(a \mid u, y)$, respectively. We remark that the kernel $Q_a$ involves evaluation of the forward model approximation $\mathcal{G}_a$ but not of the intractable full model $\mathcal{G}$. While the choice and design of the kernels $Q_u$ and $Q_a$ will clearly be problem-specific, here we consider a standard method, appropriate for the case where the input space is a space of functions, to define $Q_u$.
Before describing how to sample the full conditionals of $u$ and $a$, it is useful to note that they satisfy the following expressions:

(4.1)  $\pi(u \mid a, y) \propto L_a(u; y)\, \pi(u), \qquad \pi(a \mid u, y) \propto L_a(u; y)\, \pi(a).$
4.1 Sampling the Full Conditional $\pi(u \mid a, y)$
For given $a$ and $y$, we can sample from $\pi(u \mid a, y)$ using pCN [beskos2008mcmc], with proposal

$\tilde{u} = \sqrt{1 - \beta^2}\, u + \beta\, \xi, \qquad \xi \sim \mu, \quad \beta \in (0, 1],$

and acceptance probability

$\min\left\{ 1,\ \frac{L_a(\tilde{u}; y)}{L_a(u; y)} \right\}.$
Other discretization-invariant MCMC samplers [cui2016dimension, rudolf2015generalization] could also be used to update $u$, but pCN is a straightforward and effective choice in the examples considered here.
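A single pCN update may be sketched as follows, assuming a Gaussian (or Wiener) prior that we can sample from; all names are illustrative.

```python
import numpy as np

def pcn_step(u, beta, log_likelihood, sample_prior, rng):
    """One preconditioned Crank-Nicolson step. `log_likelihood(u)`
    returns log L_a(u; y) and `sample_prior(rng)` draws from the
    Gaussian prior; both are problem-specific placeholders."""
    xi = sample_prior(rng)
    u_prop = np.sqrt(1.0 - beta ** 2) * u + beta * xi   # pCN proposal
    log_alpha = log_likelihood(u_prop) - log_likelihood(u)
    if np.log(rng.uniform()) < log_alpha:
        return u_prop, True
    return u, False
```

Because the proposal is prior-reversible, the acceptance ratio involves only the likelihood, and the mixing rate is robust under mesh refinement of the representation of $u$.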
4.2 Sampling the Full Conditional $\pi(a \mid u, y)$
4.2.1 Sampling Gridbased Discretizations
We will use a Markov kernel written as a mixture of two kernels, i.e.

$Q_a = \tfrac{1}{2} Q_1 + \tfrac{1}{2} Q_2,$

each of which is induced by a different Metropolis–Hastings algorithm. The proposal mechanism for each of the kernels corresponds to a different type of movement, described next:

For $Q_1$ we use Metropolis–Hastings to sample from the distribution $\pi(a \mid u, y)$ using the following proposal: given $a = (n, t)$ with $t = (t_1, \ldots, t_n)$, we set $\tilde{n} = n$ (i.e. the number of grid points stays the same), choose an index $i \in \{1, \ldots, n\}$ uniformly at random, and let $\tilde{t}$ be defined by $\tilde{t}_j = t_j$ for $j \ne i$; we then sample $\tilde{t}_i$ from a distribution on $D$ with density (w.r.t. Lebesgue measure on $D$) $q$. In principle the density $q$ used to sample $\tilde{t}_i$ may depend on $t$.

For $Q_2$ we use Metropolis–Hastings to sample from the distribution $\pi(a \mid u, y)$ using the following proposal: given $a = (n, t)$ we sample $\tilde{n} \sim K(n, \cdot)$, where $K$ is a Markov kernel on $N$, and then generate $\tilde{t}$ according to:

If $\tilde{n} = n + 1$, let $\tilde{t}_j = t_j$ for all $j \le n$, and then sample $\tilde{t}_{n+1}$ independently from the density $q$.

If $\tilde{n} = n - 1$, let $\tilde{t}_j = t_j$ for all $j \le n - 1$.

Remark 4.1.
We notice that the proposals described above are particular cases of those used in reversible jump Markov chain Monte Carlo [green1995reversible].
For the Metropolis–Hastings algorithm associated to $Q_1$, the acceptance probability takes the form

$\alpha_1(a, \tilde{a}) = \min\left\{ 1,\ \frac{L_{\tilde{a}}(u; y)\, \pi(\tilde{t} \mid n)\, q(t_i)}{L_{a}(u; y)\, \pi(t \mid n)\, q(\tilde{t}_i)} \right\},$

where we recall that $a = (n, t)$ and $\tilde{a} = (n, \tilde{t})$, and $t_i$, $\tilde{t}_i$ denote the resampled coordinate before and after the proposal.

For the Metropolis–Hastings algorithm associated to $Q_2$, the acceptance probability takes the form

$\alpha_2(a, \tilde{a}) = \min\{ 1,\ r(a, \tilde{a}) \},$

where

$r(a, \tilde{a}) = \frac{L_{\tilde{a}}(u; y)\, \pi(\tilde{n})\, \pi(\tilde{t} \mid \tilde{n})\, K(\tilde{n}, n)}{L_{a}(u; y)\, \pi(n)\, \pi(t \mid n)\, K(n, \tilde{n})} \times \begin{cases} q(\tilde{t}_{n+1})^{-1} & \text{if } \tilde{n} = n + 1, \\ q(t_{n}) & \text{if } \tilde{n} = n - 1. \end{cases}$
We notice that since each of the kernels and is defined by a MetropolisHastings algorithm, they leave the target invariant, and hence so does the kernel .
Remark 4.2.
If in the above the distribution $q$ is, regardless of $t$, the uniform distribution on $D$ (and the conditional priors $\pi(t \mid n)$ are likewise uniform), then the acceptance probabilities reduce, respectively, to

$\min\left\{ 1,\ \frac{L_{\tilde{a}}(u; y)}{L_{a}(u; y)} \right\} \quad \text{and} \quad \min\left\{ 1,\ \frac{L_{\tilde{a}}(u; y)\, \pi(\tilde{n})\, K(\tilde{n}, n)}{L_{a}(u; y)\, \pi(n)\, K(n, \tilde{n})} \right\}.$
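For concreteness, a dimension-changing update in the spirit of the moves above can be sketched as follows. This is a hedged example, not the paper's exact kernel: it assumes a uniform proposal density $q$ on $[0, t_{\max}]$, a uniform conditional prior on grid locations, a symmetric move between $n$ and $n \pm 1$, and a Poisson($\lambda$) prior on the number of grid points.

```python
import numpy as np

def birth_death_step(t, log_like, lam, t_max, rng):
    """Birth/death move on a grid t, in the spirit of reversible jump
    MCMC [green1995reversible]. Assumes uniform q and grid prior (so
    those densities cancel) and a Poisson(lam) prior on n.
    `log_like(t)` evaluates log L_a(u; y) on the grid t."""
    n = len(t)
    if rng.uniform() < 0.5 and n > 1:            # death: drop a point
        i = rng.integers(n)
        t_prop = np.delete(t, i)
        # Poisson prior ratio pi(n-1)/pi(n) = n / lam
        log_alpha = log_like(t_prop) - log_like(t) + np.log(n / lam)
    else:                                        # birth: add a uniform point
        t_prop = np.append(t, rng.uniform(0.0, t_max))
        # Poisson prior ratio pi(n+1)/pi(n) = lam / (n+1)
        log_alpha = log_like(t_prop) - log_like(t) + np.log(lam / (n + 1))
    if np.log(rng.uniform()) < log_alpha:
        return t_prop
    return t
```

The Poisson prior ratio is what penalizes grids with many degrees of freedom, as intended by the hierarchical prior (3.3).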
4.2.2 Sampling Densitybased Discretizations
Since in this case the dimension of $t$ is fixed, the calculation of the acceptance probabilities is straightforward and the details are omitted. We refer to Subsection 5.3 for a numerical example.
5 Numerical Examples
In this section we demonstrate the applicability of our framework and sampling approach in a variety of inverse problems. Our aim is to illustrate the benefits and potential limitations of the methods; for this reason we consider inverse problems for which we have an intuitive understanding of where the discretizations should concentrate, allowing us to validate the performance of the proposed approach. Before discussing the numerical results, we summarize the main goals and outcomes of each set of experiments:

In Subsection 5.1 we consider an inverse problem in mechanics [bigoni2019greedy], for which some observation settings highly influence the best choice of discretization while others inform it only mildly. Our numerical results show that the gain afforded by grid learning is clearest when the observation locations highly influence the choice of discretization. We employ grid-based discretizations as described in Example 3.1 with an Euler discretization of the forward map. We also illustrate the applicability of the method for both finite- and infinite-dimensional representations of the unknown parameter, showing a more dramatic effect in the latter.

In Subsection 5.2 we consider an inverse problem in signal processing [hairer2011signal], with a choice of observation locations that determines where the discretization should concentrate. Our numerical results show that the grids adapt to the expected region, and that the number of degrees of freedom necessary to reconstruct the unknown is below that necessary for stability of the numerical method with uniform grids. We employ grid-based discretizations as described in Example 3.1 with an Euler–Maruyama discretization of the forward map.

In Subsection 5.3 we consider an inverse problem in source detection, where the true hidden unknown determines how best to discretize the forward model. Our numerical results show that the grids adapt as expected. We employ density-based discretizations as described in Example 3.2 with a finite element discretization of the forward model.
5.1 Euler Discretization of ODEs: Estimation of the Young’s Modulus of a Cantilever Beam
We consider an inhomogeneous cantilever beam clamped on one side ($x = 0$) and free on the other ($x = L$). Let $E(x)$ denote its Young's modulus and let $q$ be a load applied onto the beam. Timoshenko's beam theory gives the displacement $w$ of the beam and the angle of rotation $\phi$ through the coupled ordinary differential equations

(5.1)  $\frac{d}{dx}\left( E(x)\, I\, \frac{d\phi}{dx} \right) + \kappa A G \left( \frac{dw}{dx} - \phi \right) = 0, \qquad \kappa A G\, \frac{d}{dx}\left( \frac{dw}{dx} - \phi \right) = -q,$

where $I$, $A$, $\kappa$, $G$ are physical constants: the second moment of inertia, the cross-sectional area, the Timoshenko shear coefficient, and the shear modulus. Following [bigoni2019greedy], we consider the inverse problem of estimating the Young's modulus $E$ from sparse observations of the displacement $w$.
Let $\mathcal{F} : E \mapsto (w, \phi)$ be the solution map to equations (5.1). Let $x_1 < \cdots < x_K$ be the locations of the observation sensors, leading to the observation operator $\mathcal{O}$ defined coordinatewise by a local average of the displacement around each sensor,

$[\mathcal{O}(w)]_k = \int_0^L w(x)\, \delta_\epsilon(x - x_k)\, dx, \qquad 1 \le k \le K,$

where $\delta_\epsilon$ is a mollified Dirac delta whose normalizing constant is such that $\int \delta_\epsilon = 1$. Data are generated according to the model $y = \mathcal{O}\bigl( \mathcal{F}(E) \bigr) + \eta$.

Notice that for system (5.1) with proper boundary conditions, the displacement at any point $x$ depends only on the values $\{ E(s) : s \le x \}$. Thus, we expect suitable discretizations of the forward model to finely resolve only the region to the left of the rightmost observation location $x_K$. We will discuss this in detail in Subsection 5.1.2.
5.1.1 Forward Discretization
To solve system (5.1) we employ a finite difference method. A family of numerical solutions can be parameterized by the set

$\mathcal{A} = \{ (n, t) : n \in N,\ t \in [0, L]^n \},$

where $n$ is the number of grid points and $t = (t_1, \ldots, t_n)$ are the grid locations. Precisely, for $a = (n, t)$, we first reorder $t$ so that

$0 \le t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)} \le L,$

and we let $\mathcal{F}_a(E)$ be the linearly interpolated Euler finite difference solution to (5.1), discretized using the ordered grid. We also discretize the observation operator using an Euler forward method, giving $\mathcal{O}_a$. Finally, $\mathcal{G} = \mathcal{O} \circ \mathcal{F}$ is approximated by $\mathcal{G}_a = \mathcal{O}_a \circ \mathcal{F}_a$.
5.1.2 Implementation Details and Numerical Results
For our numerical experiments we consider a beam of length $L$, given width and given thickness. We use a fixed Poisson ratio and Timoshenko shear coefficient $\kappa$; $A$ represents the cross-sectional area of the beam and $I$ is the second moment of inertia. We run a virtual experiment of applying a point mass at the end of the beam, as seen in blue in Figures 0(a) and 1(a). We assume that the observations are gathered with Gaussian observational error.

We first assume that the beam is made of 5 segments of different kinds of steel, each of length $L/5$, with corresponding Young's moduli $E_1, \ldots, E_5$. The prior on $(E_1, \ldots, E_5)$ is a Gaussian centered at a multiple of the all-ones vector. We also assume that the number of grid points $n$ is fixed, i.e., the prior on $n$ is a point mass. The grid locations are assumed to be a priori uniformly distributed in $[0, L]$. Results are reported in Figure 1.

We next assume that the Young's modulus varies continuously with $x$, and set a Gaussian process prior on $E$. For this case we assume that the prior on the number of grid points follows a Poisson distribution with mean 60, i.e., $n \sim$ Poisson(60), and the grid locations still have a uniform prior given $n$. The true Young's modulus underlying the data and the reconstruction results are reported in Figure 2. Sampling is performed, both in the discrete and continuous settings, updating $E$ and $a$ alternately for a fixed total number of iterations.

In both Figures 1 and 2, two settings are considered. In the first, observations are concentrated on the right side of the beam, and in the second on the left. For reference, Figure 4 shows the idealized posteriors, obtained with very fine discretizations, for each of the settings. Notice that for system (5.1) with proper boundary conditions, the displacement at any point $x$ depends only on the values of the Young's modulus at locations $s \le x$. This implies that when observations are gathered on the left side of the beam, the posterior on the right side agrees with the prior, and no resources should be spent on discretizing the forward map in that region. In that case our adaptive data-driven discretizations are strongly concentrated on the left, as shown in Figures 0(g), 0(h) and 1(g), 1(h). However, when observations are gathered on the right side of the beam, the data are informative on $E(s)$ for all $s$ up to the rightmost sensor. In this case, Figures 0(b), 0(c) and 1(b), 1(c) show that the data-driven discretizations are concentrated on the right, but less heavily so. See Tables 1 and 2 in the appendix for a more detailed description of the grid point distribution in both cases. Also, our results indicate that using data-driven discretizations leads to a better estimation of the true Young's modulus, compared to fixed-grid discretizations.
Let $(E^{(j)}, a^{(j)})$ be the output of the Gibbs sampling algorithm at iteration $j$. The reconstruction error after $J$ iterations is defined as the relative $L^2$ error of the running posterior mean,

(5.2)  $e_J = \frac{\left\| \frac{1}{J} \sum_{j=1}^{J} E^{(j)} - E^{\dagger} \right\|_{L^2}}{\left\| E^{\dagger} \right\|_{L^2}},$

where $E^{\dagger}$ denotes the true Young's modulus and the norm is approximately calculated on a very fine grid. In Figure 3 we plot the reconstruction error for the second experiment, where the Young's modulus is continuous and observations are gathered on the right side. With a fixed-grid discretization, the reconstruction error is small when the discretization matches the observation points. With adaptive data-driven discretizations, the grid points adaptively match the observation points, producing a smaller error.
5.2 EulerMaruyama Discretization of SDEs: a Signal Processing Application
Let $f : \mathbb{R} \to \mathbb{R}$ be globally Lipschitz continuous and consider the SDE

(5.3)  $dx_t = f(x_t)\, dt + dB_t, \qquad t \in [0, T],$

where $B$ denotes a one-dimensional Brownian motion. We aim to recover the driving Brownian path from noisy observations of the solution $x$. We suppose that the observations are given by

(5.4)  $y_k = x_{s_k} + \eta_k, \qquad 1 \le k \le K,$

where $\eta = (\eta_1, \ldots, \eta_K)$ is assumed to follow a centered Gaussian distribution with covariance $\gamma^2 I$, and $0 < s_1 < \cdots < s_K \le T$ are given observation times. Following [hairer2011signal], we cast the problem in the setting of Section 2. First note that the solution to the integral equation
(5.5)  $x_t = x_0 + \int_0^t f(x_s)\, ds + B_t$

defines a map

(5.6)  $\mathcal{F} : C([0, T]) \to C([0, T]),$
(5.7)  $B \mapsto x.$

Thus we set the input and output spaces to be $\mathcal{U} = \mathcal{Y} = C([0, T])$. Next we define an observation operator $\mathcal{O}(x) = (x_{s_1}, \ldots, x_{s_K})$ and set $\mathcal{G} = \mathcal{O} \circ \mathcal{F}$. We put as prior on $B$ the standard one-dimensional Wiener measure, which we denote $\mathbb{W}$. Then the posterior distribution is given by Equation (2.4), which may be rewritten as

(5.8)  $\frac{d\mu^y}{d\mathbb{W}}(B) \propto \exp\left( -\frac{1}{2\gamma^2} \sum_{k=1}^{K} \bigl| y_k - x_{s_k} \bigr|^2 \right), \qquad x = \mathcal{F}(B).$
Note that the likelihood does not involve evaluation of $x$ at times $t > s_K$, and hence changing the definition of $\mathcal{F}(B)$ at those times does not change the posterior measure. Thus, we expect suitable discretizations of the forward model to finely resolve only the times up to the rightmost observation.
5.2.1 Forward Discretization
For most nonlinear drifts $f$, the integral equation (5.5) cannot be solved in closed form and a numerical method needs to be employed. A family of numerical solutions may be parameterized by the set

$\mathcal{A} = \{ (n, t) : n \in N,\ t \in [0, T]^n \}.$

Precisely, for $a = (n, t)$ we define an approximate Euler–Maruyama solution map $\mathcal{F}_a$ as follows. First, we reorder the $t_i$'s so that

$0 = t_{(0)} \le t_{(1)} \le \cdots \le t_{(n)} \le T.$

Then we define $x_{t_{(0)}} := x_0$ and, for $i = 0, \ldots, n-1$,

(5.9)  $x_{t_{(i+1)}} = x_{t_{(i)}} + f\bigl( x_{t_{(i)}} \bigr) \bigl( t_{(i+1)} - t_{(i)} \bigr) + \bigl( B_{t_{(i+1)}} - B_{t_{(i)}} \bigr).$

Finally, for $s \in [0, T]$ we define $\mathcal{F}_a(B)_s$ by linear interpolation of the values $x_{t_{(0)}}, \ldots, x_{t_{(n)}}$.
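The approximate solution map may be sketched as follows, with the Brownian path passed as a callable evaluated at the (sorted) grid points; the names are illustrative, and the grid is assumed to include the initial time.

```python
import numpy as np

def euler_maruyama_path(f, B, t, x0=0.0):
    """Euler-Maruyama scheme in the spirit of (5.9) on the sorted grid
    t, driven by a path evaluated through the callable B(s). `f` and
    `B` are problem inputs; names are illustrative."""
    ts = np.sort(np.asarray(t, dtype=float))
    x = np.zeros(len(ts))
    x[0] = x0
    for i in range(len(ts) - 1):
        dt = ts[i + 1] - ts[i]
        x[i + 1] = x[i] + f(x[i]) * dt + (B(ts[i + 1]) - B(ts[i]))
    return ts, x
```

A continuous-time approximation of $x$ is then obtained by linear interpolation of the returned values, e.g. via `np.interp`.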
Having defined the parameter space $\mathcal{A}$, we now describe a choice of prior distribution on $\mathcal{A}$ and the resulting combined prior on $\mathcal{U} \times \mathcal{A}$. First we choose a prior $\pi(n)$ on the number of grid points. Given the number of grid points $n$, and assuming no knowledge of appropriate discretizations for the SDE (5.3), we put a uniform prior on the grid locations $t \in [0, T]^n$.
Remark 5.1.
More information could be put into the prior. In particular it seems natural to impose that grids are finer at the beginning of the time interval.
5.2.2 Implementation Details and Numerical Results
For our numerical experiments we considered the SDE (5.3) with unit diffusion and double-well drift

(5.10)  $f(x) = x - x^3,$

the negative gradient of the double-well potential $V(x) = \tfrac{1}{4} x^4 - \tfrac{1}{2} x^2$. We generated synthetic observation data by solving (5.3) on a very fine grid, and then perturbing the solution at uniformly distributed observation times, the last of which corresponds to time $s_K$. The observation noise was taken to be uncorrelated. The motivation for choosing this example is that there is clear intuition as to where one would want the discretization grid points to concentrate. Indeed, since all the observations are in the interval $[0, s_K]$, it is clear from Equation (5.8) that any discretization points $t_i > s_K$ will not contribute to a better approximation of the posterior. In other words, those grid points would help in approximating $\mathcal{F}$, but not $\mathcal{G}$.
We report our results for a small grid size; a similar but less dramatic effect was seen for larger grid sizes. Precisely, we chose our set of admissible grids to be given by
For implementation purposes, elements in the space were represented as vectors in containing their values on a uniform grid of step size . We ran these algorithms with the parameter choices
The experiments show a successful reconstruction of the SDE path. Moreover, the grids concentrate in the observed portion of the time interval, in agreement with our intuition, and the uncertainty quantification is satisfactory. In contrast, when using the same number of grid points on a uniform grid, the Euler–Maruyama scheme is unstable, leading to a collapse of the MCMC algorithm. Consequently, the posterior constructed with a uniform grid completely fails to reconstruct the SDE path, and the uncertainty quantification is overoptimistic due to poor mixing of the chain.
5.3 Finite Element Discretization: Source Detection
Consider the boundary value problem
(5.11) 
where is the unit square and is the Dirac delta at the unknown source location. We aim to recover the source location from sparse observations
(5.12) 
where follows a centered Gaussian distribution with covariance and are observation locations. To cast the problem in the setting of Section 2, we let be given by Green's function for the Laplacian on the unit square (which does not admit an analytical formula but can be computed, e.g., via series expansions [melnikov2006computability]) and be defined by pointwise evaluation at the observation locations. The prior on is the uniform distribution on the unit square , which we denote . Since is finite-dimensional, the posterior has Lebesgue density given by Equation (2.3). We find this problem to be a good test, as there is a clear understanding that the data-driven mesh should concentrate around the source.
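For reference, one such series expansion is the eigenfunction expansion of the Green's function of the (negative) Laplacian on the unit square; the code below assumes homogeneous Dirichlet boundary conditions, which this excerpt does not spell out:

```python
import numpy as np

def green_unit_square(x, y, n_terms=100):
    """Truncated eigenfunction expansion of the Green's function of the negative
    Laplacian on the unit square with homogeneous Dirichlet boundary conditions:
    G(x, y) = sum_{m,n>=1} 4 sin(m pi x1) sin(n pi x2) sin(m pi y1) sin(n pi y2)
              / (pi^2 (m^2 + n^2))."""
    m = np.arange(1, n_terms + 1)
    sx1, sx2 = np.sin(m * np.pi * x[0]), np.sin(m * np.pi * x[1])
    sy1, sy2 = np.sin(m * np.pi * y[0]), np.sin(m * np.pi * y[1])
    mm, nn = np.meshgrid(m, m, indexing="ij")
    terms = 4.0 * np.outer(sx1 * sy1, sx2 * sy2) / (np.pi ** 2 * (mm ** 2 + nn ** 2))
    return float(terms.sum())
```

The expansion converges slowly near the singularity x = y, so close to the source many terms (or a smarter summation) are needed.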
5.3.1 Forward Discretization
To solve equation (5.11) numerically we employ the finite element method. The use of a uniform grid is wasteful here, as the mesh should ideally concentrate around the unknown source
We will use grids obtained as the Delaunay triangulation of centroidal Voronoi tessellations with generators , which can be calculated as the solution of the following optimization problem, parameterized by a probability density on :
(5.13) 
For a fixed density and integer we denote the optimal grid points by . Then the approximate solution map
(5.14)  
(5.15) 
is exactly given by the finite element solution of equation (5.11) with respect to (the Delaunay triangulation of) the grid points . Details on the creation of grids for prescribed parameters and will be discussed below.
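A sampled Lloyd (k-means style) iteration gives a simple way to approximate density-weighted CVT generators and their Delaunay triangulation. This is one standard construction and not necessarily the one used in the paper; the density bound used for rejection sampling is also an assumption:

```python
import numpy as np
from scipy.spatial import Delaunay

def cvt_generators(density, n_gen=30, n_samples=20_000, n_iter=50, rng=None):
    """Approximate density-weighted centroidal Voronoi tessellation generators
    on the unit square by a sampled Lloyd iteration: draw points from the
    density, assign each to its nearest generator, and move each generator
    to the centroid of its cell."""
    rng = np.random.default_rng() if rng is None else rng
    # Rejection-sample n_samples points from `density` on [0, 1]^2.
    pts = []
    bound = 10.0  # assumed upper bound on the density
    while len(pts) < n_samples:
        cand = rng.uniform(size=(n_samples, 2))
        keep = rng.uniform(size=n_samples) * bound < density(cand)
        pts.extend(cand[keep])
    pts = np.asarray(pts[:n_samples])
    gen = pts[rng.choice(n_samples, n_gen, replace=False)]
    for _ in range(n_iter):
        d2 = ((pts[:, None, :] - gen[None, :, :]) ** 2).sum(-1)
        label = d2.argmin(axis=1)
        for j in range(n_gen):
            cell = pts[label == j]
            if len(cell):
                gen[j] = cell.mean(axis=0)  # Lloyd step: move to cell centroid
    return gen, Delaunay(gen)  # generators and their Delaunay triangulation
```

The returned triangulation could then be handed to a P1 finite element solver for (5.11).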
In the spirit of adapting the grid to favor discretizations that maximize the model evidence, we constrain the density to belong to a family of parametric densities where
is the product measure of two Beta distributions. Therefore in this case
and .
5.3.2 Implementation Details and Numerical Results
We solved equation (5.11) on a very fine grid with the true point source . The observation locations were . Observation noise was uncorrelated with , .
In this example the prior on is the uniform distribution on and is initialized at . The parameters are also given a uniform prior . We initialize , which corresponds to a (near) uniform grid in .
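The parametric family of mesh densities (a product of two Beta densities, Section 5.3.1) can be sketched as below; with all parameters equal to one the density is uniform, matching the (near) uniform grid initialization just described. Function names are illustrative:

```python
import numpy as np
from scipy.stats import beta

def make_mesh_density(a1, b1, a2, b2):
    """Parametric mesh density on the unit square: product of two Beta
    densities in the two coordinates. With all parameters equal to 1 the
    density is uniform; larger first parameters push mass toward 1."""
    def density(p):
        p = np.asarray(p)
        return beta.pdf(p[..., 0], a1, b1) * beta.pdf(p[..., 1], a2, b2)
    return density
```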
We compare our algorithm to the traditional method in which we fix a uniform grid in and run the MCMC algorithm only on . We found that, with the same number of grid points, our data-driven approach gives a posterior distribution that is more concentrated around the true location of the point source, as shown in Figures 5(a) and 5(b). Also, Figure 5(c) shows that the adaptive discretization concentrates at the top right corner of the region, where the hidden point source is located.
Next we show that data-driven discretizations of the forward map can be employed not only to better reconstruct the unknown input, but also to provide improved uncertainty quantification of the PDE solution. To illustrate this, we approximate the pushforward distribution in three different ways, as shown in Figure 7. Let denote the output of the Gibbs sampling algorithm at iteration . We first consider the traditional method where the grid is fixed and uniform, that is, is fixed. Then the pushforward distribution can be well approximated by , for large enough. We then consider the same setting except that is replaced by a forward map computed on an extremely fine and adapted grid, and the pushforward is approximated by . Finally, we consider the data-driven setting stemming from our algorithm, where the pushforward distribution is approximated by
. We see that our algorithm reconstructs the solution of the PDE well, with a more accurate mean and a smaller variance.
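The three pushforward approximations described above share the same Monte Carlo structure: push each (input, discretization) sample from the chain through a forward solver and summarize. A generic sketch, with illustrative names:

```python
import numpy as np

def pushforward_mean_var(samples, forward):
    """Monte Carlo approximation of the pushforward of the posterior through
    a forward map. `samples` is a list of (input, discretization) pairs from
    the Gibbs sampler; `forward` maps such a pair to a solution vector.
    Fixing the discretization recovers the uniform-grid baseline; using the
    sampled discretizations gives the data-driven approximation."""
    sols = np.array([forward(u, theta) for u, theta in samples])
    return sols.mean(axis=0), sols.var(axis=0)
```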
6 Conclusions and Open Directions

We have shown that, in a variety of inverse problems, the observations contain useful information to guide the discretization of the forward model, allowing a better reconstruction of the unknown than uniform grids with the same number of degrees of freedom. Although these results are promising, it is important to note that updating the discretization parameters may be costly in itself and may result in slower mixing of the MCMC methods. For this reason, we envision that the proposed approach has the most potential when the computational solution of the inverse problem is very sensitive to the discretization of the forward map and discretizing said map is expensive. We also believe that density-based discretizations may help alleviate the cost of discretization learning.

An interesting avenue of research stemming from this work is the development of prior discretization models that are informed by numerical analysis of the forward map, while recognizing the uncertainty in the best discretization of the forward model. Moreover, more sophisticated prior models beyond the product structure considered here should be investigated.

Topics for further research include the development of new local proposals and sampling algorithms for gridbased discretizations, and the numerical implementation of the approach in computationally demanding inverse problems beyond the proofofconcept ones considered here.
Acknowledgement
The work of NGT and DSA was supported by NSF Grant DMS-1912818/1912802. The work of DB and YMM was supported by NSF Grant DMS-1723011.