Accelerating Uncertainty Quantification of Groundwater Flow Modelling Using Deep Neural Networks

07/01/2020 · by Mikkel B. Lykkegaard, et al. · University of Exeter

Quantifying the uncertainty in model parameters and output is a critical component in model-driven decision support systems for groundwater management. This paper presents a novel algorithmic approach which fuses Markov Chain Monte Carlo (MCMC) and Machine Learning methods to accelerate uncertainty quantification for groundwater flow models. We formulate the governing mathematical model as a Bayesian inverse problem, considering model parameters as a random process with an underlying probability distribution. MCMC allows us to sample from this distribution, but it comes with some limitations: it can be prohibitively expensive when dealing with costly likelihood functions, subsequent samples are often highly correlated, and the standard Metropolis-Hastings algorithm suffers from the curse of dimensionality. This paper designs a Metropolis-Hastings proposal which exploits a deep neural network (DNN) approximation of the model, to significantly accelerate the Bayesian computations. We modify a delayed acceptance (DA) model hierarchy, whereby proposals are generated by running short subchains using an inexpensive DNN approximation, resulting in a decorrelation of subsequent fine model proposals. Using a simple adaptive error model, we estimate and correct the bias of the DNN approximation with respect to the posterior distribution on-the-fly. The approach is tested on two synthetic examples: an isotropic two-dimensional problem, and an anisotropic three-dimensional problem. The results show that the cost of uncertainty quantification can be reduced by up to 75%, depending on the accuracy of the employed DNN.


1 Introduction

Modelling of groundwater flow and transport is an important decision support tool when, for example, estimating the sustainable yield of an aquifer or remediating groundwater pollution. However, the input parameters for mathematical models of groundwater flow (such as subsurface transmissivity and boundary conditions) are often impossible to determine fully or accurately, and are hence subject to various uncertainties. In order to make informed decisions, it is of critical importance to decision makers to obtain robust and unbiased estimates of the total model uncertainty, which in turn is a product of the uncertainty of these input parameters [1]. A popular way to achieve this, in relation to groundwater flow or any inverse problem in general, is stochastic or Bayesian modelling [2, 3, 4]. In this context, a probability distribution, the prior, is assigned to the input parameters, in accordance with any readily available information. Given some real-world measurements corresponding to the model outputs (e.g. sparse spatial measurements of hydraulic head, Darcy flow or concentration of pollutants), it is possible to reduce the overall uncertainty and obtain a better representation of the model by conditioning the prior distribution on this data. The result is a distribution of the model input parameters given data, which is also referred to as the posterior.

Obtaining samples from the posterior distribution directly is not possible for all but the simplest of problems. A popular approach for generating samples is the Metropolis–Hastings type Markov Chain Monte Carlo (MCMC) method [5]. Samples are generated by a sequential process. First, given a current sample, a new proposal for the input parameters is made using a so-called proposal distribution. Evaluating the model with this new set of parameters, a likelihood is computed – a measure of misfit between the model outputs and the data. The likelihoods of the proposed and current samples are then compared. Based on this comparison, the proposal is either accepted or rejected, and the whole process is repeated, generating a Markov chain of probabilistically feasible input parameters. The key point is that the distribution of samples in the chain converges to the posterior – the distribution of input parameters given the data [5]. This relatively simple algorithm can lead to extremely expensive Bayesian computations for three key reasons. First, each step of the chain requires the evaluation of an (often) expensive mathematical model. Second, the sequential nature of the algorithm means subsequent samples are often highly correlated – even repeated if a step is rejected. Therefore the chains must often be very long to obtain good statistics on the distribution of outputs of the model. Third, without special care, the approach does not generally scale well to large numbers of uncertain input parameters; the so-called curse of dimensionality. Addressing these scientific challenges is at the heart of modern research in MCMC algorithms. As in this paper, there is a particular focus on developing novel and innovative proposal distributions, which seek to de-correlate adjacent samples and limit the computational burden of evaluating expensive models.

Broadly in the literature, simple Darcy type models and other variants of the diffusion equation have long been popular toy example problems for demonstrating MCMC methodologies in the applied mathematics community (see e.g. [6, 7, 8]). There appears to be much less interest in MCMC in the applied groundwater modelling community. This may be because of the computational cost of running MCMC on highly parametrised, expensive models, or the lack of an easy-to-use MCMC software framework, akin to the parameter estimation toolbox PEST [9].

An exciting approach to significantly reduce the computational cost has been proposed in multi-level, multi-fidelity and Delayed Acceptance (DA) MCMC methods. In each case, to alleviate computational cost, a hierarchy of models is established, consisting of a fine model and (possibly multiple) coarse, computationally cheap approximations. Typically, the coarser models are finite element solutions of the PDE on a mesh with a coarser resolution, but, as we show in this paper, they can be taken to be any general approximation, similar to the multi-fidelity philosophy [10]. Independent of the approach, the central idea is the same: to obtain significant efficiency gains by exploiting approximate coarse models to generate ‘good’ proposals cheaply, using additional accept/reject steps to filter out highly unlikely proposals before evaluating the fine, expensive model. Previous studies of two-stage approaches include [11], who modelled multi-phase flow with coarse-level proposals evaluated by a coarse-mesh single-phase flow model (an idea that was developed further in [12]), as well as [13] and [14]. We note that the latter, instead of simply using a coarser discretisation, implemented a data-driven polynomial chaos expansion as a surrogate model. We intend to demonstrate how the development of novel techniques in MCMC and machine learning can be combined to help realise the potential of MCMC in this field.

In this work, we propose a combination of multiple cutting-edge MCMC techniques to allow for efficient inversion and uncertainty quantification of groundwater flow. We propose an improved delayed acceptance (DA) MCMC algorithm, adapted from the approach proposed by [15]. In our case, similarly to multi-level MCMC [7], proposals are generated by computing a subchain using a Deep Neural Network (DNN) as an approximate model – leading to cheaply computed, decorrelated proposals passed on to the fine model. For our first example, the subchain is driven by the preconditioned Crank-Nicolson (pCN) proposal distribution [16] to ensure the proposed Metropolis-Hastings algorithm is robust with respect to the dimension of the uncertain parameter space. For our second example, proposals for the subchains are generated using the Adaptive Metropolis (AM) proposal [17], since the posterior distribution in this case is highly non-spherical and multiple parameters are correlated. Finally, we propose an enhanced error model, in which the DNN is trained by sampling the prior distribution, yet the bias of the approximation is adaptively estimated and corrected on-the-fly by testing the approximations against the full model in an adaptive delayed acceptance setting [18].

2 Preliminaries

In this section we briefly introduce the forward model, defining the governing equations underpinning groundwater flow and their corresponding weak form, enabling us to solve the equations using the finite element method (FEM). We then formulate our model as a Bayesian inverse problem with random input parameters, effectively resulting in a stochastic model, which can be accurately characterised by sampling from the posterior distribution of parameters using MCMC. The simple Metropolis-Hastings MCMC algorithm is then introduced and extended with the preconditioned Crank-Nicolson (pCN) and Adaptive Metropolis (AM) transition kernels.

2.1 Governing equations for groundwater flow

Consider steady groundwater flow in a confined, inhomogeneous aquifer which occupies the domain $\Omega$ with boundary $\partial\Omega$. Assuming that water is incompressible, the governing equations for groundwater flow can be written as the scalar elliptic partial differential equation:

(1)   $-\nabla \cdot \big( T(x)\, \nabla h(x) \big) = g(x) \quad \text{in } \Omega,$

subject to boundary conditions on $\partial\Omega$ defined by the constraint equations

(2)   $h = h_D \ \text{on } \Gamma_D, \qquad q \cdot n = -T \nabla h \cdot n = q_N \ \text{on } \Gamma_N.$

Here $T$ is the heterogeneous, depth-integrated transmissivity, $h$ is hydraulic head, $h_D$ is fixed hydraulic head at boundaries with Dirichlet constraints, $g$ is fluid sources and sinks, $q$ is Darcy velocity, $q_N$ is Darcy velocity across boundaries with Neumann constraints, and $\Gamma_D$ and $\Gamma_N$ define the boundaries comprising the Dirichlet and Neumann conditions, respectively. Following standard FEM practice (see e.g. [19]), eq. (1) is converted into weak form by multiplying by an appropriate test function $v$ and integrating by parts, so that

(3)   $\int_\Omega T\, \nabla h \cdot \nabla v \, dx = \int_\Omega g\, v \, dx + \int_{\Gamma_N} q_N\, v \, ds \quad \text{for all } v \in H^1(\Omega),$

where $H^1(\Omega)$ is the Hilbert space of weakly differentiable functions on $\Omega$. To approximate the hydraulic head solution $h$, we introduce a finite element space $V_h \subset H^1(\Omega)$ on a finite element mesh $\mathcal{T}_h$. This is defined by a basis of piecewise linear Lagrange polynomials $\varphi_i$, associated with each of the finite element nodes. As a result, (3) can be rewritten as a system of sparse linear equations

(4)   $A u = b,$
(5)   $A_{ij} = \int_\Omega T\, \nabla\varphi_j \cdot \nabla\varphi_i \, dx, \qquad b_i = \int_\Omega g\, \varphi_i \, dx + \int_{\Gamma_N} q_N\, \varphi_i \, ds,$

where $A$ and $b$ are the global stiffness matrix and load vector, respectively. The vector $u$ is the solution vector of hydraulic head at each node within the finite element mesh, so that $h \approx \sum_i u_i \varphi_i$. In our numerical experiments, these equations are solved using the open source general-purpose FEM framework FEniCS [20]. While there are well-established groundwater simulation software packages available, such as MODFLOW [21] and FEFLOW [19], FEniCS was chosen because of its flexibility and ease of integration with other software and analysis codes.
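To illustrate how the weak form (3) maps onto a FEniCS solve, the sketch below assembles and solves the Darcy-type problem on a unit square with the legacy dolfin API. It is a minimal sketch only: the mesh resolution, boundary values and the placeholder transmissivity expression are illustrative assumptions, not the exact setup used in the paper.

```python
# Minimal sketch: solve -div(T grad(h)) = g with FEniCS (legacy dolfin API).
# Mesh, boundary values and transmissivity field are illustrative placeholders.
from dolfin import (UnitSquareMesh, FunctionSpace, TrialFunction, TestFunction,
                    DirichletBC, Constant, Expression, Function, solve, dot, grad, dx)

mesh = UnitSquareMesh(50, 50)                       # structured mesh here for brevity
V = FunctionSpace(mesh, "Lagrange", 1)              # piecewise linear Lagrange basis

h, v = TrialFunction(V), TestFunction(V)
T = Expression("exp(sin(3.0*x[0])*cos(3.0*x[1]))", degree=2)   # placeholder positive field
g = Constant(0.0)                                   # no internal sources/sinks

# Dirichlet conditions: head = 1 on the left boundary, 0 on the right;
# homogeneous no-flow Neumann conditions elsewhere are the natural BC of the weak form.
bcs = [DirichletBC(V, Constant(1.0), "on_boundary && near(x[0], 0.0)"),
       DirichletBC(V, Constant(0.0), "on_boundary && near(x[0], 1.0)")]

a = dot(T * grad(h), grad(v)) * dx                  # bilinear form: stiffness matrix A
L = g * v * dx                                      # linear form: load vector b

h_sol = Function(V)
solve(a == L, h_sol, bcs)                           # solve A u = b for nodal heads
```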

2.2 Aquifer transmissivity

The aquifer transmissivity $T$ is not known everywhere on the domain, therefore a typical approach is to model it as a log-Gaussian random field. There exists extensive literature on modelling groundwater flow transmissivity using log-Gaussian random fields (see e.g. [22, 23, 14]). This may not always prove a good model, particularly in cases with highly correlated extreme values and/or preferential flow paths [24, 25], as seen when considering faults and other discontinuities [26, 27]. However, the log-Gaussian distribution remains relevant for modelling transmissivity in a range of aquifers [28, 29, 14].

Our starting point is a covariance operator with kernel $k(x, x')$, which defines the correlation structure of the uncertain transmissivity field. For our numerical experiments, we consider the ARD (Automatic Relevance Determination) squared exponential kernel, a generalisation of the ‘classic’ squared exponential kernel, which allows for handling directional anisotropy:

(6)   $k(x, x') = \exp\left( -\frac{1}{2} \sum_{i=1}^{d} \frac{(x_i - x'_i)^2}{\ell_i^2} \right),$

where $d$ is the spatial dimensionality of the problem and $\ell = (\ell_1, \dots, \ell_d)$ is a vector of length scales corresponding to each spatial dimension. We emphasise that the covariance kernel is a modelling choice, and that different options are available, such as the Matérn kernel, which offers additional control over the smoothness of the field.
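A direct Python transcription of the kernel (6) is shown below; it assumes the kernel itself carries unit variance, with the field variance applied separately through the scalar in the KL expansion (7).

```python
import numpy as np

def ard_squared_exponential(x, y, length_scales):
    """ARD squared exponential kernel k(x, y) for two points in R^d.

    `length_scales` is a length-d vector; equal entries recover the 'classic'
    isotropic squared exponential kernel. The kernel is unit-variance here,
    with the field variance handled by sigma_T in the KL expansion.
    """
    r2 = np.sum(((np.asarray(x) - np.asarray(y)) / np.asarray(length_scales)) ** 2)
    return np.exp(-0.5 * r2)
```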

In our work, transmissivity was modelled as a discrete log-Gaussian random field expanded in an orthogonal eigenbasis of Karhunen-Loève (KL) eigenmodes. To achieve this we construct a covariance matrix $C$, with entries given by $C_{ij} = k(x_i, x_j)$ for each pair of nodal coordinates within the finite element mesh. Once constructed, the $M$ largest eigenvalues $\lambda_i$ and associated eigenvectors $\psi_i$ of $C$ can be computed. The transmissivity at the nodes is then given by

(7)   $T = \exp\left( \mu_T + \sigma_T \sum_{i=1}^{M} \sqrt{\lambda_i}\, \psi_i\, \theta_i \right),$

where the vector $\mu_T$ defines the log of the mean transmissivity field, $\sigma_T$ is a scalar parameterising the variance and $\theta = (\theta_1, \dots, \theta_M)$ is a vector of Gaussian random variables such that $\theta_i \sim N(0, 1)$, as in [30]. The random field can be interpolated from nodal values across $\Omega$ using the shape functions, so that $T(x) \approx \sum_j T_j\, \varphi_j(x)$.

Truncating the KL eigenmodes at the $M$th mode and interpolating the field both have a smoothing effect on the recovered transmissivity fields, which may or may not be desirable, depending on the application. Figure 1 shows some examples of realisations of Gaussian random fields with a squared exponential kernel, which illustrates the effect of the covariance length scale $\ell$ and the number of admitted KL eigenmodes $M$.

Figure 1: A selection of Gaussian random process realisations with a squared exponential kernel using different covariance length scales $\ell$ and numbers of KL eigenmodes $M$. All displayed realisations were generated using the same appropriately truncated random vector $\theta$, with identical eigenvectors for each length scale.
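The following sketch shows one way to generate such realisations in Python: assemble the nodal covariance matrix from the ARD kernel (6), keep the leading eigenpairs, and evaluate the truncated expansion (7). Function names, the dense eigendecomposition and the default mean and variance are illustrative assumptions; for large meshes a sparse or low-rank eigensolver would be preferable.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kl_eigenpairs(nodes, length_scales, n_modes):
    """Assemble the nodal covariance matrix from the ARD kernel (6) and return
    the `n_modes` leading KL eigenpairs. `nodes` is an (n_nodes, d) array."""
    scaled = nodes / np.asarray(length_scales)
    C = np.exp(-0.5 * cdist(scaled, scaled, "sqeuclidean"))
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending order
    idx = np.argsort(eigvals)[::-1][:n_modes]      # keep the largest eigenvalues
    return eigvals[idx], eigvecs[:, idx]

def transmissivity_field(theta, eigvals, eigvecs, mean_log_T=0.0, sigma_T=1.0):
    """Nodal transmissivity from the truncated KL expansion (7)."""
    log_T = mean_log_T + sigma_T * eigvecs @ (np.sqrt(eigvals) * theta)
    return np.exp(log_T)

# Example usage with random coefficients theta ~ N(0, I):
# lam, psi = kl_eigenpairs(mesh_nodes, length_scales=[0.2, 0.2], n_modes=64)
# T_nodes = transmissivity_field(np.random.standard_normal(64), lam, psi)
```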

2.3 The Bayesian inverse problem

To set up the Bayesian inverse problem and thereby quantify the uncertainty in the transmissivity field $T$, the starting point is to define a statistical model which describes the distribution of the mismatch between observations and model predictions. The observations are expressed in a single vector $d_{obs}$, and for a given set of model input parameters $\theta$, the model’s prediction of the data is defined by the forward map $F(\theta)$. The statistical model assumes the connection between model and observations through the relationship

(8)   $d_{obs} = F(\theta) + \epsilon,$

where we take $\epsilon \sim N(0, \Sigma_\epsilon)$, which represents the uncertainty in the connection between model and data, capturing both model mis-specification and measurement noise as sources of this uncertainty.

The backbone of a Bayesian approach is Bayes’ theorem, which allows for computing posterior beliefs of model parameters using both prior beliefs and observations $d_{obs}$. Bayes’ theorem states that the posterior probability of a parameter realisation $\theta$ given data $d_{obs}$ can be computed as

(9)   $\pi(\theta \mid d_{obs}) = \frac{\mathcal{L}(d_{obs} \mid \theta)\, \pi(\theta)}{\pi(d_{obs})},$

where $\pi(\theta \mid d_{obs})$ is referred to as the posterior distribution, $\mathcal{L}(d_{obs} \mid \theta)$ is called the likelihood, $\pi(\theta)$ the prior distribution and

(10)   $\pi(d_{obs}) = \int \mathcal{L}(d_{obs} \mid \theta)\, \pi(\theta)\, d\theta$

is a normalising constant, sometimes referred to as the evidence. In most cases this integral does not have a closed-form solution and is infeasible to estimate numerically in most real-world applications, particularly when the dimension of the unknown parameter space is large and the evaluation of the model (required to compute $\mathcal{L}$) is computationally expensive. A family of methods called Markov Chain Monte Carlo (MCMC) is often employed to approximate the solution [31]. Importantly, MCMC, whilst computationally expensive, allows indirect sampling from the posterior distribution and avoids the explicit need to estimate (10). Moreover, it can be designed to be independent of the dimension of the parameter space and has no embedded unquantifiable bias. In this paper we consider a subclass of MCMC methods called the Metropolis-Hastings algorithm [32, 33, 5], which is described in Algorithm 1. The algorithm generates a Markov chain with a distribution converging to $\pi(\theta \mid d_{obs})$. It is difficult (often impossible) to sample directly from the posterior, hence at each step, at position $\theta_j$ in the chain, a proposal $\theta'$ is made from a simpler known (proposal) distribution $q(\theta' \mid \theta_j)$. An accept/reject step then determines whether the proposal comes from (probabilistically) the posterior distribution or not. This accept/reject step is achieved by essentially computing the ratio of the posterior densities of the current state and the proposal. To do this we exploit Bayes’ theorem. The key observation in MCMC is that the normalising constant $\pi(d_{obs})$ is independent of $\theta$, and so

(11)   $\pi(\theta \mid d_{obs}) \propto \mathcal{L}(d_{obs} \mid \theta)\, \pi(\theta).$

Therefore, when comparing the ratio of the densities, the normalising constant (being independent of $\theta$) cancels.

Algorithm 1: Metropolis-Hastings Algorithm. Given a parameter realisation $\theta_j$ and a transition kernel $q(\theta' \mid \theta_j)$:

1. Generate a proposal $\theta' \sim q(\theta' \mid \theta_j)$.
2. Compute the acceptance probability between the proposal and the previous realisation:
   $\alpha(\theta' \mid \theta_j) = \min\left\{1,\ \dfrac{\mathcal{L}(d_{obs} \mid \theta')\, \pi(\theta')\, q(\theta_j \mid \theta')}{\mathcal{L}(d_{obs} \mid \theta_j)\, \pi(\theta_j)\, q(\theta' \mid \theta_j)}\right\}.$
3. Draw $u \sim \mathrm{Uniform}(0, 1)$. If $u < \alpha$ then set $\theta_{j+1} = \theta'$, otherwise set $\theta_{j+1} = \theta_j$.
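A minimal Python sketch of Algorithm 1 follows, assuming a Gaussian likelihood of the form (12) with iid noise, a standard normal prior on the KL coefficients and a symmetric proposal (so the proposal-density ratio cancels). Function names and the noise level are illustrative assumptions.

```python
import numpy as np

def log_prior(theta):
    """Standard normal prior on the KL coefficients, as used in the examples."""
    return -0.5 * np.dot(theta, theta)

def log_likelihood(d_obs, d_model, sigma_eps):
    """Gaussian log-likelihood (12) with iid noise of standard deviation sigma_eps."""
    r = d_obs - d_model
    return -0.5 * np.dot(r, r) / sigma_eps**2

def metropolis_hastings(forward_map, d_obs, theta0, propose, n_steps, sigma_eps=0.01):
    """Algorithm 1 with a symmetric proposal `propose(theta) -> theta_prop`."""
    theta = np.asarray(theta0, dtype=float)
    log_post = log_likelihood(d_obs, forward_map(theta), sigma_eps) + log_prior(theta)
    chain = [theta]
    for _ in range(n_steps):
        theta_prop = propose(theta)
        log_post_prop = (log_likelihood(d_obs, forward_map(theta_prop), sigma_eps)
                         + log_prior(theta_prop))
        if np.log(np.random.rand()) < log_post_prop - log_post:   # accept/reject step
            theta, log_post = theta_prop, log_post_prop
        chain.append(theta)
    return np.array(chain)
```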

In our model problem, the prior density of the parameters $\pi(\theta)$ represents the available a priori knowledge about the transmissivity of the aquifer. From our statistical model (8) we see that $d_{obs} - F(\theta) = \epsilon \sim N(0, \Sigma_\epsilon)$, hence

(12)   $\mathcal{L}(d_{obs} \mid \theta) \propto \exp\left( -\tfrac{1}{2}\, \big(d_{obs} - F(\theta)\big)^T \Sigma_\epsilon^{-1} \big(d_{obs} - F(\theta)\big) \right).$

Importantly, we note that each step of the Metropolis-Hastings algorithm requires us to compute $\mathcal{L}(d_{obs} \mid \theta')$. This requires the evaluation of the forward map $F(\theta')$, which can be computationally expensive. Moreover, due to the sequential nature of MCMC-based approaches, consecutive samples are correlated and hence many samples are required to obtain good statistics on the outputs.

The proposal distribution $q(\theta' \mid \theta_j)$ is the key element which drives the Metropolis-Hastings algorithm and controls its effectiveness. A common choice is a simple random walk, for which $\theta' = \theta_j + \xi$ with $\xi \sim N(0, \sigma^2 I)$, yet as shown in [34], the basic random walk does not lead to convergence that is independent of the input dimension. Better choices are the preconditioned Crank-Nicolson proposal (pCN, [16]), which has a dimension-independent acceptance probability, or the Adaptive Metropolis algorithm (AM, [17]), which adaptively aligns the proposal distribution to the posterior during sampling. Moreover, unlike the Metropolis-Adjusted Langevin Algorithm (MALA), the No-U-Turn Sampler (NUTS) and Hamiltonian Monte Carlo, none of these proposals rely on gradient information, which can be infeasible to compute for expensive forward models.

To generate a proposal using the pCN transition kernel, one computes

(13)   $\theta' = \sqrt{1 - \beta^2}\, \theta_j + \beta\, \xi,$

where $\xi$ is a random sample from the prior distribution, $\xi \sim N(0, C)$, and $\beta \in (0, 1]$ is a tunable step size. This expression corresponds to the transition kernel $q(\theta' \mid \theta_j) = N\big(\sqrt{1 - \beta^2}\, \theta_j,\ \beta^2 C\big)$. Moreover, for the pCN transition kernel, the acceptance probability simplifies to

(14)   $\alpha(\theta' \mid \theta_j) = \min\left\{1,\ \dfrac{\mathcal{L}(d_{obs} \mid \theta')}{\mathcal{L}(d_{obs} \mid \theta_j)}\right\},$

as given in [7]. Additional details of the derivation of the pCN proposal are provided in Appendix A.
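A pCN proposal for a standard normal prior is a one-liner; the sketch below can be passed as the `propose` argument of the Metropolis-Hastings sketch above, in which case the prior term should be dropped from the acceptance ratio in line with (14). The step size `beta` is a tuning parameter, not a value from the paper.

```python
import numpy as np

def pcn_propose(theta, beta=0.2, rng=np.random):
    """pCN proposal (13): theta' = sqrt(1 - beta^2) * theta + beta * xi,
    with xi drawn from the prior N(0, I) on the KL coefficients."""
    xi = rng.standard_normal(theta.shape)
    return np.sqrt(1.0 - beta**2) * theta + beta * xi
```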

Similarly, to generate a proposal using the AM transition kernel, we draw a random sample

(15)   $\theta' \sim N(\theta_j, C_j),$

where $C_j$ is an iteratively updated covariance structure. Hence, proposals are drawn from a distribution with an initial covariance $C_0$ for a given initial period $j \le j_0$, after which adaptivity is ’switched on’ and the adaptive covariance $C_j$ is used for the remaining samples. The adaptive covariance can be constructed iteratively during sampling using the following recursive formula:

(16)   $C_{j+1} = \frac{j - 1}{j}\, C_j + \frac{s_d}{j} \left( j\, \bar\theta_{j-1} \bar\theta_{j-1}^T - (j + 1)\, \bar\theta_j \bar\theta_j^T + \theta_j \theta_j^T + \epsilon I_d \right),$

where $\bar\theta_j$ is the arithmetic mean of the samples up to step $j$, $s_d$ is a scaling parameter, $d$ is the dimension of the proposal distribution and $\epsilon > 0$ is a small parameter which prevents $C_j$ from becoming singular [17]. This, on the other hand, corresponds to the transition kernel $q(\theta' \mid \theta_j) = N(\theta_j, C_j)$, which is not guaranteed to be ergodic, since it depends on the history of the chain. However, the Diminishing Adaptation condition [35] holds, as adaptation will naturally decrease as sampling progresses.
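The sketch below implements the AM proposal (15) in Python. For clarity it recomputes the (regularised, scaled) empirical covariance of the chain history rather than applying the recursion (16) directly; the two are equivalent up to the scaling, and the defaults $s_d = 2.38^2/d$ and a small $\epsilon$ follow the usual choices in [17]. The initial covariance and adaptation period are assumptions.

```python
import numpy as np

class AdaptiveMetropolisProposal:
    """Adaptive Metropolis proposal: Gaussian random walk whose covariance is
    learned from the chain history after an initial non-adaptive period j0."""

    def __init__(self, dim, initial_cov=None, j0=1000, eps=1e-6):
        self.d = dim
        self.j0 = j0
        self.eps = eps
        self.s_d = 2.38**2 / dim                     # standard AM scaling
        self.C0 = initial_cov if initial_cov is not None else np.eye(dim)
        self.history = []

    def propose(self, theta, rng=np.random):
        self.history.append(np.array(theta))
        if len(self.history) <= self.j0:
            cov = self.C0                            # fixed covariance during burn-in of adaptation
        else:
            # Empirical covariance of the chain so far, regularised to stay non-singular.
            cov = self.s_d * (np.cov(np.array(self.history).T) + self.eps * np.eye(self.d))
        return rng.multivariate_normal(theta, cov)
```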

2.4 Deep Neural Network

The approximate/surrogate model in our experiments is a feed-forward deep neural network (DNN), a type of artificial neural network with multiple hidden layers, as implemented in the open-source neural-network library Keras [36] utilizing the Theano backend [37].

The DNN approximates the forward map, accepting a vector of KL coefficients $\theta$ and returning an approximation of the vector of model outputs – in this paper, a vector of hydraulic heads at given sampling points, i.e. $F^*(\theta) \approx F(\theta)$. Figure 2 shows the graph of one particular DNN employed in our experiments.

Figure 2: Graph showing the structure of a feedforward DNN.

Each edge in Figure 2 is equipped with a weight $w^l_{jk}$, where $l$ is the index of the layer that the weight feeds into, $j$ is the index of the node in that layer and $k$ is the index of the node in the previous layer. These weights can be arranged in matrices $W_l$ for each layer $l$. Similarly, each node is equipped with a bias $b^l_j$, where $l$ is the index of its layer and $j$ is the index of the node, and these biases can be arranged in vectors $b_l$. Data is propagated through the network such that the output $z_l$ of a layer $l$ with activation function $g_l$ is

(17)   $z_l = g_l\big( W_l\, z_{l-1} + b_l \big).$

Activation functions are applied element-wise to their input vectors, so that $g(x) = (g(x_1), \dots, g(x_n))$.

Many different activation functions are available for artificial neural networks, and we here give a short description of the ones employed in our experiments: the sigmoid and the rectified linear unit (‘ReLU’). The transfer function of the nodes in the first layer of each DNN was of the sigmoid type:

(18)   $g(x) = \frac{1}{1 + e^{-x}},$

squashing the input into the interval $(0, 1)$, effectively resulting in a strictly positive output from the first hidden layer. The remaining hidden layers consisted of nodes with the de facto standard hidden-layer activation function for deep neural networks, the rectified linear unit (‘ReLU’): $g(x) = \max(0, x)$.

To fit an artificial neural network to a given set of data, the network is initially compiled using random weights and biases and then trained using a dataset of known inputs and their corresponding outputs. The weights and biases are updated iteratively during training by way of an appropriate optimisation algorithm and a loss function, and, if appropriately set up, will converge towards a set of optimal values, allowing the DNN to predict the response of the forward model to some level of accuracy [38]. Our particular DNNs were trained using the mean squared error (MSE) loss function

$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

for $n$ output variables, and the RMSprop optimiser, a stochastic, gradient-based and adaptive algorithm, suggested by [39] and widely used for training DNNs.
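For concreteness, a Keras sketch of a surrogate of this kind is shown below. It mirrors the layer types described above (sigmoid first layer, ReLU hidden layers, exponential output, MSE loss, RMSprop), but the layer widths are placeholder assumptions since the node counts are not reproduced here, and depending on the Keras version the exponential output activation may need to be supplied as a custom function.

```python
# Sketch of a DNN surrogate in Keras: KL coefficients in, hydraulic heads out.
from keras.models import Sequential
from keras.layers import Dense

def build_surrogate(n_kl, n_outputs, width=128):
    model = Sequential()
    model.add(Dense(width, activation="sigmoid", input_dim=n_kl))  # first hidden layer: sigmoid
    model.add(Dense(width, activation="relu"))                     # ReLU hidden layers
    model.add(Dense(width, activation="relu"))
    model.add(Dense(n_outputs, activation="exponential"))          # strictly positive heads
    model.compile(optimizer="rmsprop", loss="mse")                 # MSE loss + RMSprop optimiser
    return model

# Example usage (placeholder sizes):
# model = build_surrogate(n_kl=32, n_outputs=25)
# model.fit(theta_train, head_train, epochs=200, batch_size=50, validation_split=0.1)
```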

3 Adaptive Delayed Acceptance Proposal using a Deep Neural Network

In this section we describe a modified adaptive delayed acceptance proposal for MCMC, using ideas from multi-level MCMC [7]. The general approach generates proposals by running Markov subchains driven by an approximate model. In our case this approximation is constructed from a DNN of the forward map trained from offline samples of the prior distribution. Finally, we show how the approximate map can be corrected online, by adaptively learning a simple multivariate Gaussian correction to the outputs of the neural network.

3.1 Modified Delayed Acceptance MCMC

Delayed Acceptance (DA) [15] is a technique that exploits a model hierarchy consisting of an expensive fine model and a relatively inexpensive coarse approximation. The idea is simple: a proposal is first evaluated (pre-screened) by the approximate model and immediately discarded if it is rejected. Only if accepted is it subjected to a second accept/reject step using the fine model. In this context, the likelihood of observations given a parameter set is henceforth denoted $\mathcal{L}^*(d_{obs} \mid \theta)$ when evaluated on the approximate model and remains $\mathcal{L}(d_{obs} \mid \theta)$ when evaluated on the fine model. This simple screening mechanism cheaply filters out poor proposals, wasting minimal time evaluating unlikely proposals on the expensive, fine model. In this paper we extend this approach by not evaluating every accepted approximate proposal with the fine model. Instead, a proposal for the fine model is generated by running an approximate subchain until a prescribed number of approximate proposals have been accepted, and only then is it evaluated using the fine model. This modified Delayed Acceptance MCMC algorithm is described in Algorithm 2 and an illustration of the process is given in Figure 3.

Algorithm 2: Modified Delayed Acceptance MCMC.

1. Given a realisation of the approximation parameters $\theta_j$ and the transition kernel $q(\theta' \mid \theta_j)$, generate a proposal for the approximation $\theta'$.
2. Compute the acceptance probability on the approximate model between the proposal and the previous realisation:
   $\alpha^* = \min\left\{1,\ \dfrac{\mathcal{L}^*(d_{obs} \mid \theta')\, \pi(\theta')}{\mathcal{L}^*(d_{obs} \mid \theta_j)\, \pi(\theta_j)}\right\}$ (AM),  $\quad \alpha^* = \min\left\{1,\ \dfrac{\mathcal{L}^*(d_{obs} \mid \theta')}{\mathcal{L}^*(d_{obs} \mid \theta_j)}\right\}$ (pCN).
3. If the proposal is rejected, set $\theta_{j+1} = \theta_j$ and return to (1); otherwise set $\theta_{j+1} = \theta'$ and continue to (4).
4. If the prescribed number of proposals have been accepted in the approximation subchain, continue to (5), otherwise return to (1).
5. Given the latest realisation of the entire parameter set, with the approximation parameters taken from the subchain, and the transition kernel for the fine parameters, generate a proposal for the fine parameters and combine the two to form the full fine-level proposal.
6. Compute the acceptance probability on the fine model between the proposal and the previous realisation, corrected by the inverse of the approximate likelihood ratio:
   $\alpha = \min\left\{1,\ \dfrac{\mathcal{L}(d_{obs} \mid \theta')\, \mathcal{L}^*(d_{obs} \mid \theta_j)}{\mathcal{L}(d_{obs} \mid \theta_j)\, \mathcal{L}^*(d_{obs} \mid \theta')}\right\}$ (with the corresponding prior and proposal terms for the AM and pCN variants).
7. If accepted, set the next fine state to the proposal; otherwise retain the previous fine state.

Figure 3: Illustration of the principle used to offset fine level samples to reduce autocorrelation. The fine model is only evaluated using the full set of proposed parameters after a prescribed number of approximation parameter sets have been evaluated on the approximate model and accepted into the chain.

This way, the autocorrelation of the fine chain is reduced, since proposals are ‘more independent’. This approach is strongly related to a two-level version of multi-level MCMC. Since the fine model likelihood ratio is corrected by the inverse of the approximate likelihood ratio in step 6 of Algorithm 2, detailed balance is satisfied, the resulting Markov chain is guaranteed to come from the true posterior and there is no loss of accuracy, even if the approximate model is severely wrong [15]. To demonstrate that this approach does indeed decrease the autocorrelation in our fine chain MCMC samples, we compute the Effective Sample Size of each MCMC simulation according to the procedures described in [40].
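The sketch below shows one fine-level step of this scheme in Python, for the simplified case where the approximate and fine models share the same parameter vector (the splitting into approximation and fine parameters in step 5 of Algorithm 2 is omitted). The function names, the symmetric/pCN-style proposal and the subchain length are illustrative assumptions.

```python
import numpy as np

def delayed_acceptance_step(theta, log_L_coarse, log_L_fine, log_prior, propose,
                            n_accept=4, rng=np.random):
    """One fine-level step of a modified DA sampler: run a subchain on the coarse
    (DNN) likelihood until `n_accept` proposals have been accepted, then screen the
    result with the fine likelihood, corrected by the inverse coarse likelihood
    ratio so that detailed balance is preserved (the prior terms cancel)."""
    theta_sub = np.array(theta)
    lp_sub = log_L_coarse(theta_sub) + log_prior(theta_sub)
    accepted = 0
    while accepted < n_accept:
        prop = propose(theta_sub)
        lp_prop = log_L_coarse(prop) + log_prior(prop)
        if np.log(rng.rand()) < lp_prop - lp_sub:          # coarse accept/reject
            theta_sub, lp_sub = prop, lp_prop
            accepted += 1

    # Fine-level accept/reject with the coarse-ratio correction.
    log_alpha = (log_L_fine(theta_sub) - log_L_fine(theta)) \
              - (log_L_coarse(theta_sub) - log_L_coarse(theta))
    if np.log(rng.rand()) < log_alpha:
        return theta_sub
    return theta
```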

3.2 Adaptive correction of the approximate posterior

Whilst in theory the modified delayed acceptance proposal described in Section 3.1 will provide a convergent Metropolis-Hastings algorithm, there are cases in which the rate of convergence will be extremely slow. To demonstrate this, the left-hand contour plot in Fig. 4 shows an artificially bad example. In this case the approximate model (red isolines) poorly captures the target likelihood distribution (blue density); there is a clear offset in the distributions, and the scale, shape and orientation of the approximate likelihood is incorrect. If using the modified delayed acceptance algorithm without alteration, it is easy to see that the proposal mechanism would struggle to traverse the whole of the target distribution, since much of it lies in the tails of the approximate likelihood distribution. As a result, in practice we would observe extremely slow convergence to the true posterior; at finite computational times, results would contain a significant bias.

An ad hoc way to overcome this is to apply so-called tempering to the statistical model which drives the subchain. In this technique, the variance of the misfit on the subchain is artificially inflated to capture the uncertainty in the approximate model. The issue in adopting this approach is the difficulty in selecting a robust inflation factor for tempering, particularly in higher dimensions. Furthermore, an isotropic inflation of the approximate posterior will in general be sub-optimal.

Figure 4: Fine/target likelihood (blue) and approximate likelihood (red). (Left) Original likelihood before correction, (middle) corrected likelihood by a constant shift and (right) corrected approximate likelihood by a multivariate Gaussian.

In this paper we instead implement an adaptive enhanced error model (EEM), which overcomes many of these challenges. Moreover, it is easy to implement and has negligible additional computational cost. Let $F^*(\theta)$ denote the approximate forward map of the fine/target model $F(\theta)$. Then, following [41, 18], we apply a trick to the statistical model (8) where we add and subtract the coarse map $F^*(\theta)$. With some rearrangement we obtain the expression

(19)   $d_{obs} = F^*(\theta) + B(\theta) + \epsilon, \qquad \text{where } B(\theta) = F(\theta) - F^*(\theta).$

Here $B(\theta)$ is the bias associated with the approximation at given parameter values $\theta$. We approximate this bias using a multivariate Gaussian distribution, i.e. $B \sim N(\mu_B, \Sigma_B)$, and therefore the likelihood function (12) can be rewritten as

(20)   $\mathcal{L}^*(d_{obs} \mid \theta) \propto \exp\left( -\tfrac{1}{2}\, \big(d_{obs} - F^*(\theta) - \mu_B\big)^T \big(\Sigma_\epsilon + \Sigma_B\big)^{-1} \big(d_{obs} - F^*(\theta) - \mu_B\big) \right).$

The influence of redefining the likelihood is best demonstrated geometrically, as shown in Fig. 4 (middle and right). Firstly, as shown in Fig. 4 (middle), we can make a better approximation by simply adding a shift of the mean bias $\mu_B$ to the original approximate model $F^*(\theta)$. This has the effect of aligning the ‘centre of mass’ of each of the distributions. Secondly, we can learn the covariance structure of the bias. This has the effect of stretching and rotating the approximate distribution to give an even better overall approximation, as shown in Fig. 4 (right). The final mismatch between the approximate and target distribution will be driven by the assumption that the bias can be represented by a multivariate Gaussian, although more complex distributions could be constructed using, for example, Gaussian process regression. Whilst this is an avenue to explore in the future, any such approach would surrender the simplicity of the present approach, which from the results appears particularly effective.

The idea of using an EEM when dealing with model hierarchies originates from [41], who suggested using samples from the prior distribution of parameters to construct the EEM prior to Bayesian inversion, so that

(21)   $\mu_B \approx \frac{1}{N} \sum_{i=1}^{N} B(\theta_i), \qquad \Sigma_B \approx \frac{1}{N - 1} \sum_{i=1}^{N} \big(B(\theta_i) - \mu_B\big)\big(B(\theta_i) - \mu_B\big)^T, \qquad \theta_i \sim \pi(\theta).$

The estimates for $\mu_B$ and $\Sigma_B$ could thus be obtained by sampling the prior distribution and comparing the approximate forward map against the target forward map. However, since the structure of the bias in the prior and the posterior could be significantly different, this approach could result in a suboptimal EEM, and it requires additional fine model evaluations. Furthermore, in our examples, where the approximate model is built from samples from the prior, it is expected that such an approach would underestimate the bias, since the neural network has been explicitly trained to minimise the error with respect to samples from the prior. Instead of estimating the bias using the prior, the posterior bias can be constructed on-line by iteratively updating its mean and covariance using coarse/fine solution pairs from the MCMC samples, as suggested by [42]. In this case, with $B_{j+1} = F(\theta_{j+1}) - F^*(\theta_{j+1})$, we select

(22)   $\mu_{B, j+1} = \frac{1}{j + 1} \big( j\, \mu_{B, j} + B_{j+1} \big),$

(23)   $\Sigma_{B, j+1} = \frac{1}{j + 1} \Big( j\, \Sigma_{B, j} + \big(B_{j+1} - \mu_{B, j+1}\big)\big(B_{j+1} - \mu_{B, j}\big)^T \Big).$

While this approach does not in theory guarantee ergodicity of the chain, the bias distribution will converge as the chain progresses and adaptation diminishes, resulting in a de facto ergodic process after an initial period of high adaptivity. This is a common feature of adaptive MCMC algorithms, as discussed in the classic paper on adaptive Metropolis [17]. Our experiments showed that the bias distribution did indeed converge for every simulation, and that repeated experiments converged towards the same posterior bias distribution. Admitting a bias term in the inverse problem further introduces an issue of identifiability, as highlighted in [43]. Since observations are now modelled as a sum of coarse model output and multiple stochastic terms, the stochastic terms $B$ and $\epsilon$ are generally unidentifiable in the coarse model formulation.
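The adaptive EEM amounts to maintaining a running Gaussian estimate of the DNN bias alongside the DA sampler. The sketch below does this with a standard running mean/covariance estimator and evaluates the corrected coarse likelihood (20); the class name, the iid noise level and the exact form of the recursion are assumptions made for illustration.

```python
import numpy as np

class AdaptiveErrorModel:
    """On-line Gaussian estimate N(mu_B, Sigma_B) of the bias B = F(theta) - F*(theta),
    updated from coarse/fine solution pairs generated during sampling."""

    def __init__(self, n_outputs):
        self.n = 0
        self.mu = np.zeros(n_outputs)
        self.cov = np.zeros((n_outputs, n_outputs))

    def update(self, fine_output, coarse_output):
        b = fine_output - coarse_output                  # observed bias for this sample
        self.n += 1
        delta = b - self.mu
        self.mu += delta / self.n                        # running mean
        self.cov += (np.outer(delta, b - self.mu) - self.cov) / self.n   # running covariance

    def corrected_log_likelihood(self, d_obs, coarse_output, sigma_eps):
        """Coarse-chain log-likelihood (20): bias-corrected mean, inflated covariance.
        The normalising constant is omitted; it cancels in the Metropolis ratio only
        if the error model is held fixed within each accept/reject comparison."""
        r = d_obs - coarse_output - self.mu
        total_cov = sigma_eps**2 * np.eye(len(r)) + self.cov
        return -0.5 * r @ np.linalg.solve(total_cov, r)
```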

4 Results

In this section, we examine the effectiveness of our proposed strategy on two synthetic groundwater flow problems: a two-dimensional problem with an isotropic covariance kernel and a three-dimensional problem with an anisotropic covariance kernel. For both examples, we begin by outlining the model setup, for which we select a ‘true’ transmissivity field and a number of fixed observation points. For the first example, the influence of training size for the DNNs is examined, and the total cost of uncertainty quantification using a selection of DNNs is computed. For the second example we use a single DNN setup and analyse the resulting posterior marginal distributions and the quantity of interest. The first example was completed on commodity hardware – an HP Elitebook 840 G5 with an Intel Xeon E3-1200 quad-core processor, while the second example was completed on a TYAN Thunder FT48T-B7105 GPU server with two Intel Xeon Gold 6252 processors and an NVIDIA RTX 2080Ti GPU.

4.1 Example 1: 2D Unit Square

4.1.1 Model Setup

This example was conducted on a unit square domain, meshed using an unstructured triangular grid comprising 2,601 degrees of freedom. Dirichlet boundary conditions were imposed on the left and right boundaries with hydraulic heads of 1 and 0, respectively. The top and bottom edges were given homogeneous no-flow Neumann boundary conditions. For the forward model, the covariance length scales of the ARD squared exponential kernel were set equal in each direction, effectively resulting in an isotropic covariance kernel, equal to the ‘classic’ squared exponential kernel. The resulting KL decomposition captured most of the total signal energy in the first 32 modes, and nearly all of it in the first 64 modes. Hence, 32 modes were included in the approximate model whilst 64 modes were included in the fine model.

(a) Log-Transmissivity.
(b) Hydraulic head.
Figure 5: “True” transmissivity field, its corresponding solution and sampling points.

Figure 5(a) shows the ‘true’ transmissivity field that we attempt to recover through our MCMC methodology and the corresponding modelled hydraulic head. Synthetic samples for the likelihood function were extracted at 25 points on a regular grid with a horizontal and vertical spacing of 0.2 m (Figure 5(b)).

4.1.2 Deep Neural Network Design, Training and Evaluation

We evaluated a selection of different DNNs to investigate the impact of various network depths and activation functions on the DNN performance. Table 1 shows the layers of the employed DNNs, the number of nodes in each layer and their corresponding activation functions. DNN1 and DNN2 had three hidden layers, while DNN3 and DNN4 had only two, as the second ReLU layer was not included in these networks. The output layer of DNN1 and DNN3 consisted of nodes with an exponential activation function, $g(x) = e^x$, resulting in a strictly positive output. The DNNs with an exponential activation function in the final layer tended overall to lead to the best performance.

Layer    # Nodes            DNN1          DNN2      DNN3          DNN4
Input    KL coefficients    –             –         –             –
1                           Sigmoid       Sigmoid   Sigmoid       Sigmoid
2                           ReLU          ReLU      –             –
3                           ReLU          ReLU      ReLU          ReLU
Output   datapoints         Exponential   Linear    Exponential   Linear

Table 1: Neural network layers and activation functions in the model approximation DNNs.

Each DNN was trained on a set of samples from the prior distribution of parameters $\theta$, in advance of running the MCMC. The DNN samples were drawn from a Latin Hypercube [44] in the interval $[0, 1]$ and transformed to the standard normal distribution using the probit function, such that $\theta_i = \Phi^{-1}(u_i)$. The FEM model was then run for every parameter sample, obtaining for each a vector of model outputs at the sampling points. We trained and tested each DNN on a range of different sample sizes, namely $N_{DNN} \in \{4{,}000,\ 16{,}000,\ 64{,}000\}$, with a 9:1 training/test splitting ratio. Each DNN was then trained for 200 epochs with a batch size of 50 using the RMSprop optimiser [39].
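The training data generation described above can be sketched as follows, using SciPy's Latin Hypercube sampler and probit transform as a stand-in for whichever implementation was actually used; `forward_map` and the sample sizes are placeholders.

```python
import numpy as np
from scipy.stats import qmc, norm

def generate_training_data(forward_map, n_samples, n_kl, seed=0):
    """Latin Hypercube samples on (0, 1)^n_kl, mapped to standard normal KL
    coefficients via the probit function, then pushed through the FEM forward map."""
    sampler = qmc.LatinHypercube(d=n_kl, seed=seed)
    u = sampler.random(n_samples)                 # uniform samples in (0, 1)
    theta = norm.ppf(u)                           # probit transform to N(0, 1)
    heads = np.array([forward_map(t) for t in theta])
    return theta, heads

# theta, heads = generate_training_data(fem_model, n_samples=16000, n_kl=32)
# then split 9:1 into training and test sets before fitting the DNN
```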

Deep Neural Network performance was compared using the RMSE of their respective testing datasets,

(24)   $\mathrm{RMSE} = \sqrt{ \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \big\| F(\theta_i) - F^*(\theta_i) \big\|^2 }.$

The residual RMSE (24) of each DNN was computed to compare the network designs described in Table 1 and to investigate the influence of training dataset size on the DNN performance (Figure 6). As expected, each DNN performed better as the number of samples in the training dataset was increased. In comparison, the DNN design had much less influence on the testing performance, suggesting that the main driver for constructing an accurate surrogate model, within the bounds of the examined DNN designs, was the number of training samples. For the remaining analysis, we chose the overall best performing network design, namely DNN1.

Figure 6: Testing performance (RMSE) of each DNN against the total sample size $N_{DNN}$. Please refer to Table 1 for details of the structure of each DNN.

Further performance analysis consisted of analysing the DNN error between true and predicted heads ($d$ and $d^*$, respectively) for datapoints 0, 8, 16 and 24 (Figure 7). All error distributions were approximately Gaussian, with the errors of some DNNs exhibiting a right skew at sampling point 24. For all DNNs, the sampling points closer to the boundaries (sampling points 0 and 24) had lower errors than those further away, since the heads close to the boundaries were more constrained by the model.

Figure 7: Density plot of the error of the testing dataset for DNN1, for sampling points 0, 8, 16 and 24. Bars show the density of each bin, while the curve shows a Gaussian kernel density estimate.

4.1.3 Uncertainty Quantification

For inversion and uncertainty quantification, we chose a multivariate standard normal distribution as the prior parameter distribution, and used a fixed error covariance $\Sigma_\epsilon$. While computationally convenient, the zero-centred prior in practice favours transmissivity field realisations capable of reproducing the observed heads with as little variation as possible. In total, seven different sampling strategies were investigated, namely single-level ‘vanilla’ MCMC, DA using three DNNs trained and tested on 4,000, 16,000 and 64,000 samples, respectively, and DA with an enhanced error model (DA/EEM) using the same three DNNs. Every simulation was completed using the pCN transition kernel. Each MCMC sampling strategy was repeated with randomly generated random seeds, to ensure that every starting point would converge towards the same stationary distribution and to allow for cross-chain statistics to be computed. Results given in this section pertain to these multi-chain samples rather than individual MCMC realisations, unless otherwise stated.

Our sampling strategies recovered the ground truth with good accuracy. Figure 8 shows the mean and variance of the recovered field from the DA/EEM MCMC. All recovered fields exhibit higher smoothness than the ground truth, which could be attributed to the relatively low number of sampling points and their regular distribution on the domain. None of the chains recovered the local peak in transmissivity on the right side of the domain, since there was too little data to discover this particular feature.

(a) Mean of recovered log-transmissivity.
(b) Variance of recovered log-transmissivity.
Figure 8: Mean and variance of the recovered log-transmissivity fields using DA/EEM MCMC.

While the recovered fields indicate that every MCMC sampling strategy converged towards the desired stationary distribution, they do not reveal the relative efficiency of each strategy. Hence, the Effective Sample Size (ES) was computed for each MCMC realisation. Every DA sampling strategy produced a higher ES than the vanilla pCN sampler, relative to the simulation time, with a clear correlation between DNN testing performance and ES. This was mainly because the better-performing DNNs allowed for a longer coarse chain offset without diverging. Moreover, utilising the EEM produced an even higher ES for every DA chain (Table 2).

Strategy   N_DNN   Offset   N_coarse / N_fine   Acc. rate   Time (min)   ES
Vanilla    —       —        — / 80000           0.23        63.3         84.8
DA         4000    2        85461.9 / 20000     0.27        16.2         64.5
DA/EEM     4000    2        78853.4 / 20000     0.31        15.2         79.0
DA         16000   4        172383.1 / 20000    0.27        18.2         116.3
DA/EEM     16000   4        178978.4 / 20000    0.30        18.4         143.6
DA         64000   8        336447.5 / 20000    0.24        30.1         196.5
DA/EEM     64000   8        377524.4 / 20000    0.30        29.9         235.7

Table 2: Results for various MCMC sampling strategies, given as means of multiple chains. N_DNN is the total number of samples used to construct the DNN. Offset is the modified DA offset length. N_coarse / N_fine is the final length of the coarse and fine chain, respectively, after subtracting burnin. Acc. rate is the fine chain acceptance rate. Time (min) is the total running time of the simulation in minutes. ES is the Effective Sample Size.

4.1.4 Total cost

Since the DA chains required computation of a significant number of fine model solutions and training of a DNN in advance of running the chain, the total cost of each strategy was computed as

(25)   $\mathrm{Cost} = \frac{t_{pre} + t_{DNN} + t_{chain}}{ES},$

where $t_{pre}$ was the time spent precomputing fine model solutions, $t_{DNN}$ the time spent training the respective DNN, $t_{chain}$ the time taken to run the chain and $ES$ the resulting Effective Sample Size (Fig. 9).

Figure 9: Violin plot showing the total cost of each MCMC strategy. Points show individual Markov chains.

The mean cost of every DA chain was lower than that of the vanilla pCN chain, with the chains using the EEM consistently cheaper than their non-EEM counterparts. Moreover, using the EEM reduced the variance of the cost in repeated experiments, allowing each repetition to produce a consistently high ES. The overall cheapest inversion was completed using the DNN trained on 16,000 samples with the EEM, reducing the total cost by 75% relative to the vanilla pCN MCMC. Notice that these results are extremely conservative, in the sense that the entire cost of evaluating every DNN training sample and training the DNN in serial on a CPU was factored into the cost of every repetition, even though the same DNN was used for all the repetitions within each sampling strategy. The precomputation cost can be dramatically reduced by evaluating the DNN samples in parallel and utilising high-performance hardware, such as GPUs, for training the DNN.

4.2 Example 2: 3D Rectangular Cuboid

4.2.1 Model Setup

This example was conducted on a rectangular cuboid domain, meshed using an unstructured tetrahedral grid with 10,416 degrees of freedom (Figure 10). Fixed-head Dirichlet boundary conditions were imposed on two opposite faces of the domain, and no-flow Neumann conditions were imposed on all remaining boundaries.

(a) Log-Conductivity of ground truth.
(b) Hydraulic head of ground truth and location of sampling points.
Figure 10: “True” conductivity field, its corresponding solution and sampling points.

We drew the sampling well locations randomly using the Maximin Latin Hypercube Design [45], and samples of hydraulic head were extracted at each well at a set of fixed datums, measured from the bottom of the domain (Figure 10(b)). The covariance length scales of the ARD squared exponential covariance kernel were set to produce a highly anisotropic random process, with high variation in the vertical direction to simulate geological stratification, some variation in one horizontal direction and little variation in the other (Figure 10(a)). As in the first model, the random process was truncated at 64 KL eigenmodes for the fine model and 32 KL eigenmodes for the coarse model, embodying a large fraction of the total signal energy in each case.

For this example, we first converged the conductivity parameters to the Maximum a posteriori (MAP) estimate using gradient descent, since initial MCMC experiments struggled to converge to the posterior distribution for random initial parameter sets.

4.2.2 Deep Neural Network Design, Training and Evaluation

Training a DNN to accurately emulate the model response for this setup was challenging, and we found no single combination of neural network layers and activation functions that would predict the head at every datapoint with sufficient accuracy. We hypothesise that this limitation could be caused by a strong ill-posedness of the DNN – for a single neural network, the output dimension greatly exceeded the input dimension.

Figure 11: Layout of the multi-DNN design. Each DNN outputs a vector of head predictions at a given datum.

When we instead predicted the heads at each datum using a separate DNN, we found that we could utilise largely the same DNN design as had been employed in the first example. Hence, to predict the head at all datapoints, we utilised five identically designed but independent DNNs (Figure 11), each with four hidden layers and activation functions as indicated in Table 3.

Layer    # Nodes            Activation Function
Input    KL coefficients    –
1                           Sigmoid
2                           ReLU
3                           ReLU
4                           ReLU
Output   wells              Exponential

Table 3: Layers and activation functions in each of the independent DNNs. Each DNN takes all KL coefficients as input and predicts the head at the wells for a given datum.

Each DNN was trained and tested on a dataset of samples with KL coefficients drawn from a Latin Hypercube [44] in the interval $[0, 1]$ and transformed to a normal distribution centred on the MAP estimate of the parameters $\theta_{MAP}$. This was done to increase the density of samples, and thus improve the DNN prediction, at and around the MAP point, which ideally equals the mode of the posterior distribution. The DNNs were then trained for 200 epochs using a batch size of 50, the MSE loss function and the RMSprop optimiser [39]. Figure 12 shows performance plots of each DNN for both the training (top) and the testing (bottom) datasets. While every DNN is clearly moderately biased by the training data, they all performed adequately with respect to the testing data.

Figure 12: Performance of each DNN with respect to the training dataset (top) and the testing dataset (bottom).

4.2.3 Uncertainty Quantification

Similarly to the first example, we chose a multivariate standard normal distribution as the prior distribution of parameters, together with a fixed error covariance $\Sigma_\epsilon$. In this example, we utilised the Adaptive Metropolis (AM) transition kernel for generating proposals. We completed multiple independent simulations, each initialised from a random initial point close to the MAP point $\theta_{MAP}$, with a burnin of 1000. The subchains were run with a short acceptance delay, since longer subchains tended to diverge, leading to sub-optimal acceptance rates on the fine level. The simulations had a mean acceptance rate of 0.26, a mean effective sample size (ES) of 55.2 and a mean autocorrelation length of 181.0. The samples of each independent simulation were pruned according to the respective autocorrelation length, and the remaining samples were pooled together to yield 443 statistically independent samples that were then analysed further.

Figure 13: One- and two-dimensional posterior marginal distributions (diagonal and lower triangle) and scatterplots (upper triangle) of posterior samples pruned according to the autocorrelation length of each chain, for the largest 5 KL eigenmodes. Please note that the axis scales are not equal.

Figure 13 shows the marginal distributions of the six coarsest KL coefficients along with a scatterplot matrix of all the samples remaining after pruning. All the marginal distributions are approximately Gaussian, and the two-parameter marginal distributions are mostly elliptical. It is evident that some of these parameters are mutually correlated. It is worth mentioning that in every independent simulation, the AM proposal kernel managed to capture these correlations.

Moreover, we analysed the hydraulic head as a function of datum along a vertical line in the centre of the domain. Figure 14 shows the head profiles of the ground truth, the MAP point $\theta_{MAP}$, the mean of the independent simulations, and all the samples remaining after pruning. We observe that both the MAP point and the sample mean are fairly close to the ground truth, albeit exhibiting higher smoothness, particularly between the observation depths, where the head is essentially allowed to vary freely. It is also clear that the individual samples encapsulate the ground truth, indicating that the ground truth is indeed contained within the posterior distribution.

Figure 14: Hydraulic head as a function of datum at the centre of the domain. The solid red line shows the hydraulic head of the ground truth, the dashed orange line shows the head of the Maximum a posteriori (MAP) point, the dotted yellow line shows the mean head of the independent simulations, and the thin black lines show the heads of 538 statistically independent samples remaining after pruning according to the autocorrelation length of each chain. The vertical dotted lines show the observation depths.

5 Discussion

In this paper, we have demonstrated the use of a novel Markov Chain Monte Carlo methodology which employs a delayed acceptance (DA) model hierarchy with a deep neural network (DNN) as an approximate model and a FEM solver as a fine model, and generates proposals using the pCN and AM transition kernels. Results from the first example clearly indicate that the use of a carefully designed DNN as a model approximation can significantly reduce the cost of uncertainty quantification, even for DNNs trained on relatively small sample sizes. We have established that offsetting fine model evaluations in the DA algorithm reduces the autocorrelation of the fine chain, resulting in a higher effective sample size which, in turn, improves the statistical validity of the results. In this context, the performance of the DNN is a critical driver when determining a feasible offset length to avoid divergence of the coarse chain. Hence, if a high effective sample size is required, it may be desirable to invest in a well-performing DNN. Moreover, we have shown that an enhanced error model, which introduces an iteratively-constructed bias distribution in the coarse chain likelihood function, further increases the effective sample size and decreases the variance of the cost in repeated experiments. Finally, we observed that for the second example, even when employing a relatively well-performing model approximation, we had to constrain the offset length of the subchains rather strongly to achieve optimal acceptance rates. This can be attributed in part to an apparent non-spherical and correlated posterior distribution, causing the employed proposal kernels to struggle to discover areas of high posterior probability.

We have demonstrated that relatively simple inverse hydrogeological problems can be solved in reasonable time on a commonly available personal computer with no GPU acceleration. This opens the opportunity to apply robust uncertainty quantification during fieldwork and as a decision-support tool for groundwater surveying campaigns. We have also demonstrated the applicability of our approach on a larger-scale three-dimensional problem, utilising a GPU-accelerated high-performance computer (HPC). Aside from the benefit of using an HPC for accelerating the fine model evaluations, utilising the GPU allowed for rapidly training and testing multiple different DNN designs to efficiently establish a well-performing model approximation. There are other obvious ways to further increase the efficiency of the proposed methodology. For example, construction of the DNNs used as coarse models comes with the cost of evaluating multiple models from the prior distribution; unlike the MCMC sampler, however, these prior models are independent, and the fine model evaluations can thus be massively parallelised.

Our methodology was demonstrated in the context of two relatively simple groundwater flow problems with log-Gaussian transmissivity fields parameterised by Karhunen-Loève decompositions. While this model provides a convenient computational structure for our purposes, it may not reflect the full scale transmissivity of real-world aquifers, particularly in the presence of geological faults and other heterogeneities, as discussed in [24]. Future research could address this problem through geological layer stratification using the universal cokriging interpolation method suggested in [46], potentially utilising the open-source geological modelling tool GemPy [47], which allows for simple parametric representation of geological strata. Spatially heterogeneous parameters within each strata could then be modelled hierarchically using a low order log-Gaussian random field to account for within-stratum variation, as demonstrated in [12].

6 Acknowledgements

This work was funded as part of the Water Informatics Science and Engineering Centre for Doctoral Training (WISE CDT) under a grant from the Engineering and Physical Sciences Research Council (EPSRC), grant number EP/L016214/1. TD was funded by a Turing AI Fellowship (2TAFFP\100007). DM acknowledges support from the EPSRC Platform Grant PRISM (EP/R029423/1). The authors have no competing interests. Data supporting the findings in this study are under preparation and will be made available in the Open Research Exeter (ORE, https://ore.exeter.ac.uk/repository/) data repository.

References

  • [1] M. P. Anderson, W. W. Woessner, R. J. Hunt, Applied groundwater modeling: simulation of flow and advective transport, second edition Edition, Academic Press, London ; San Diego, CA, 2015, oCLC: ocn921253555.
  • [2] A. D. Woodbury, T. J. Ulrych, A full-Bayesian approach to the groundwater inverse problem for steady state flow, Water Resources Research 36 (8) (2000) 2081–2093. doi:10.1029/2000WR900086.
    URL http://doi.wiley.com/10.1029/2000WR900086
  • [3] G. Mariethoz, P. Renard, J. Caers, Bayesian inverse problem and optimization with iterative spatial resampling: ITERATIVE SPATIAL RESAMPLING, Water Resources Research 46 (11) (Nov. 2010). doi:10.1029/2010WR009274.
    URL http://doi.wiley.com/10.1029/2010WR009274
  • [4] M. de la Varga, J. F. Wellmann, Structural geologic modeling as an inference problem: A Bayesian perspective, Interpretation 4 (3) (2016) SM1–SM16. doi:10.1190/INT-2015-0188.1.
    URL http://library.seg.org/doi/10.1190/INT-2015-0188.1
  • [5] C. P. Robert, G. Casella, Monte Carlo statistical methods, 2nd Edition, Springer texts in statistics, Springer, New York, NY, 2010, oCLC: 837651914.
  • [6] D. Higdon, H. Lee, C. Holloman, Markov chain Monte Carlo-based approaches for inference in computationally intensive inverse problems, in: Bayesian Statistics, Vol. 7, Oxford University Press, 2003, pp. 181–197.
  • [7] T. J. Dodwell, C. Ketelsen, R. Scheichl, A. L. Teckentrup, A Hierarchical Multilevel Markov Chain Monte Carlo Algorithm with Applications to Uncertainty Quantification in Subsurface Flow, SIAM/ASA Journal on Uncertainty Quantification 3 (1) (2015) 1075–1108. doi:10.1137/130915005.
    URL http://epubs.siam.org/doi/10.1137/130915005
  • [8] G. Detommaso, T. Dodwell, R. Scheichl, Continuous Level Monte Carlo and Sample-Adaptive Model Hierarchies, arXiv:1802.07539 [math]ArXiv: 1802.07539 (Feb. 2018).
    URL http://arxiv.org/abs/1802.07539
  • [9] J. Doherty, Calibration and uncertainty analysis for complex environmental models, 2015, oCLC: 991568728.
  • [10] B. Peherstorfer, K. Wilcox, M. Gunzburger, Survey of multifidelity methods in uncertainty propagation, inference, and optimization, SIAM Review 60 (3) (2018) 550–591.
  • [11] Y. Efendiev, A. Datta‐Gupta, V. Ginting, X. Ma, B. Mallick, An efficient two-stage Markov chain Monte Carlo method for dynamic data integration, Water Resources Research 41 (12) (2005). doi:10.1029/2004WR003764.
    URL https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2004WR003764
  • [12] A. Mondal, Y. Efendiev, B. Mallick, A. Datta-Gupta, Bayesian uncertainty quantification for flows in heterogeneous porous media using reversible jump Markov chain Monte Carlo methods, Advances in Water Resources 33 (3) (2010) 241–256. doi:10.1016/j.advwatres.2009.10.010.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0309170809001729
  • [13] P. Dostert, Y. Efendiev, B. Mohanty, Efficient uncertainty quantification techniques in inverse problems for Richards’ equation using coarse-scale simulation models, Advances in Water Resources 32 (3) (2009) 329–339. doi:10.1016/j.advwatres.2008.11.009.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0309170808002121
  • [14] E. Laloy, B. Rogiers, J. A. Vrugt, D. Mallants, D. Jacques, Efficient posterior exploration of a high-dimensional groundwater model from two-stage Markov chain Monte Carlo simulation and polynomial chaos expansion: Speeding up MCMC Simulation of a Groundwater Model, Water Resources Research 49 (5) (2013) 2664–2682. doi:10.1002/wrcr.20226.
    URL http://doi.wiley.com/10.1002/wrcr.20226
  • [15] J. A. Christen, C. Fox, Markov chain Monte Carlo Using an Approximation, Journal of Computational and Graphical Statistics 14 (4) (2005) 795–810. doi:10.1198/106186005X76983.
    URL http://www.tandfonline.com/doi/abs/10.1198/106186005X76983
  • [16] S. L. Cotter, G. O. Roberts, A. M. Stuart, D. White, MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster, Statistical Science 28 (3) (2013) 424–446, arXiv: 1202.0709. doi:10.1214/13-STS421.
    URL http://arxiv.org/abs/1202.0709
  • [17] H. Haario, E. Saksman, J. Tamminen, An Adaptive Metropolis Algorithm, Bernoulli 7 (2) (2001) 223. doi:10.2307/3318737.
    URL https://www.jstor.org/stable/3318737?origin=crossref
  • [18] T. Cui, C. Fox, M. J. O’Sullivan, A posteriori stochastic correction of reduced models in delayed acceptance MCMC, with application to multiphase subsurface inverse problems, arXiv:1809.03176 [stat]ArXiv: 1809.03176 (Sep. 2018).
    URL http://arxiv.org/abs/1809.03176
  • [19] H.-J. G. Diersch, FEFLOW: Finite Element Modeling of Flow, Mass and Heat Transport in Porous and Fractured Media, Springer Berlin Heidelberg, Berlin, Heidelberg, 2014. doi:10.1007/978-3-642-38739-5.
    URL http://link.springer.com/10.1007/978-3-642-38739-5
  • [20] H. P. Langtangen, A. Logg, Solving PDEs in Python – The FEniCS Tutorial Volume I (2017).
  • [21] A. W. Harbaugh, MODFLOW-2005: the U.S. Geological Survey modular ground-water model–the ground-water flow process, Report (2005). doi:10.3133/tm6A16.
    URL http://pubs.er.usgs.gov/publication/tm6A16
  • [22] R. A. Freeze, A stochastic-conceptual analysis of one-dimensional groundwater flow in nonuniform homogeneous media, Water Resources Research 11 (5) (1975) 725–741. doi:10.1029/WR011i005p00725.
    URL http://doi.wiley.com/10.1029/WR011i005p00725
  • [23] N.-O. Kitterrød, L. Gottschalk, Simulation of normal distributed smooth fields by Karhunen-Loève expansion in combination with kriging, Stochastic Hydrology and Hydraulics 11 (6) (1997) 459–482. doi:10.1007/BF02428429.
    URL http://link.springer.com/10.1007/BF02428429
  • [24] J. Gómez-Hernández, X.-H. Wen, To be or not to be multi-Gaussian? A reflection on stochastic hydrogeology, Advances in Water Resources 21 (1) (1998) 47–61. doi:10.1016/S0309-1708(96)00031-0.
    URL http://linkinghub.elsevier.com/retrieve/pii/S0309170896000310
  • [25] J. Kerrou, P. Renard, H.-J. Hendricks Franssen, I. Lunati, Issues in characterizing heterogeneity and connectivity in non-multiGaussian media, Advances in Water Resources 31 (1) (2008) 147–159. doi:10.1016/j.advwatres.2007.07.002.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0309170807001236
  • [26] D. Russo, M. Bouton, Statistical analysis of spatial variability in unsaturated flow parameters, Water Resources Research 28 (7) (1992) 1911–1925. doi:10.1029/92WR00669.
    URL http://doi.wiley.com/10.1029/92WR00669
  • [27] R. J. Hoeksema, P. K. Kitanidis, Analysis of the Spatial Structure of Properties of Selected Aquifers, Water Resources Research 21 (4) (1985) 563–572. doi:10.1029/WR021i004p00563.
    URL http://doi.wiley.com/10.1029/WR021i004p00563
  • [28] P. Dostert, Y. Efendiev, T. Hou, W. Luo, Coarse-gradient Langevin algorithms for dynamic data integration and uncertainty quantification, Journal of Computational Physics 217 (1) (2006) 123–142. doi:10.1016/j.jcp.2006.03.012.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0021999106001380
  • [29] Y. M. Marzouk, H. N. Najm, Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, Journal of Computational Physics 228 (6) (2009) 1862–1902. doi:10.1016/j.jcp.2008.11.024.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0021999108006062
  • [30] C. Scarth, S. Adhikari, P. H. Cabral, G. H. Silva, A. P. d. Prado, Random field simulation over curved surfaces: Applications to computational structural mechanics, Computer Methods in Applied Mechanics and Engineering 345 (2019) 283–301. doi:10.1016/j.cma.2018.10.026.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0045782518305309
  • [31] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin, Bayesian data analysis, 2nd Edition, Texts in statistical science, Chapman & Hall/CRC, Boca Raton, Fla, 2004.
  • [32] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of State Calculations by Fast Computing Machines, The Journal of Chemical Physics 21 (6) (1953) 1087–1092. doi:10.1063/1.1699114.
    URL http://aip.scitation.org/doi/10.1063/1.1699114
  • [33] W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1) (1970) 97–109.
  • [34] L. Katafygiotis, K. Zuev, Geometric insight into the challenges of solving high-dimensional reliability problems, Probabilistic Engineering Mechanics 23 (2-3) (2008) 208–218. doi:10.1016/j.probengmech.2007.12.026.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0266892007000707
  • [35] G. O. Roberts, J. S. Rosenthal, Examples of Adaptive MCMC, Journal of Computational and Graphical Statistics 18 (2) (2009) 349–367. doi:10.1198/jcgs.2009.06134.
    URL http://www.tandfonline.com/doi/abs/10.1198/jcgs.2009.06134
  • [36] F. Chollet, et al., Keras, https://github.com/fchollet/keras (2015).
  • [37] Theano Development Team, Theano: A Python framework for fast computation of mathematical expressions, arXiv e-prints abs/1605.02688 (May 2016).
    URL http://arxiv.org/abs/1605.02688
  • [38] T. Hastie, R. Tibshirani, J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd Edition, Springer series in statistics, Springer, New York, NY, 2009.
  • [39] G. Hinton, N. Srivastava, K. Swersky, Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent, Coursera, University of Toronto (2012).
  • [40] U. Wolff, Monte Carlo errors with less errors, Computer Physics Communications 176 (5) (2007) 383, arXiv: hep-lat/0306017. doi:10.1016/j.cpc.2006.12.001.
    URL http://arxiv.org/abs/hep-lat/0306017
  • [41] J. Kaipio, E. Somersalo, Statistical inverse problems: Discretization, model reduction and inverse crimes, Journal of Computational and Applied Mathematics 198 (2) (2007) 493–504. doi:10.1016/j.cam.2005.09.027.
    URL https://linkinghub.elsevier.com/retrieve/pii/S0377042705007296
  • [42] T. Cui, C. Fox, M. J. O’Sullivan, Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis Hastings algorithm, Water Resources Research 47 (10) (Oct. 2011). doi:10.1029/2010WR010352.
    URL http://doi.wiley.com/10.1029/2010WR010352
  • [43] J. Brynjarsdóttir, A. O’Hagan, Learning about physical parameters: the importance of model discrepancy, Inverse Problems 30 (11) (2014) 114007. doi:10.1088/0266-5611/30/11/114007.
    URL http://stacks.iop.org/0266-5611/30/i=11/a=114007?key=crossref.7b886360dda7b385609c577ad82450aa
  • [44] M. D. McKay, R. J. Beckman, W. J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21 (2) (1979) 239–245.
    URL http://www.jstor.org/stable/1268522
  • [45] M. D. Morris, T. J. Mitchell, Exploratory designs for computational experiments, Journal of Statistical Planning and Inference 43 (3) (1995) 381–402. doi:10.1016/0378-3758(94)00035-T.
    URL https://linkinghub.elsevier.com/retrieve/pii/037837589400035T
  • [46] C. Lajaunie, G. Courrioux, L. Manuel, Foliation fields and 3d cartography in geology: Principles of a method based on potential interpolation, Mathematical Geology 29 (4) (1997) 571–584. doi:10.1007/BF02775087.
    URL http://link.springer.com/10.1007/BF02775087
  • [47] M. de la Varga, A. Schaaf, F. Wellmann, GemPy 1.0: open-source stochastic geological modeling and inversion, Geoscientific Model Development 12 (1) (2019) 1–32. doi:10.5194/gmd-12-1-2019.
    URL https://www.geosci-model-dev.net/12/1/2019/

Appendix A Preconditioned Crank-Nicolson

The preconditioned Crank-Nicolson (pCN) proposal was developed in [16] and is based on the following Stochastic Partial Differential Equation (SPDE):

\[ \frac{du}{ds} = -\mathcal{K}\,\mathcal{L}\,u + \sqrt{2\mathcal{K}}\,\frac{db}{ds}, \]

where \(\mathcal{L} = C^{-1}\) is the precision operator for the prior distribution \(N(0, C)\), \(b\) is Brownian motion with the identity covariance operator, and \(\mathcal{K}\) is a positive operator. This equation can be discretised using the Crank-Nicolson approach to yield

\[ v = u - \tfrac{1}{2}\,\delta\,\mathcal{K}\,\mathcal{L}\,(u + v) + \sqrt{2\,\mathcal{K}\,\delta}\,\xi_0, \]

for white noise

\[ \xi_0 \sim N(0, I) \]

and a weight \(\delta \in (0, 2)\). If we choose \(\mathcal{K} = I\), we get the plain Crank-Nicolson (CN) proposal:

\[ \left(I + \tfrac{\delta}{2}\,\mathcal{L}\right) v = \left(I - \tfrac{\delta}{2}\,\mathcal{L}\right) u + \sqrt{2\,\delta}\,\xi_0, \]

where \(I\) is the identity operator. If we instead choose \(\mathcal{K} = C\), we get the pCN proposal:

\[ v = \frac{2 - \delta}{2 + \delta}\,u + \frac{\sqrt{8\,\delta}}{2 + \delta}\,\xi_C, \qquad \xi_C = C^{1/2}\,\xi_0 \sim N(0, C). \]
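For concreteness, the following is a minimal sketch of a single pCN Metropolis-Hastings step, written in Python/NumPy. It is an illustration only, not the implementation used in this paper: it assumes a finite-dimensional Gaussian prior \(N(0, C)\) supplied through its Cholesky factor, and the names pcn_step, log_likelihood, C_chol and beta are hypothetical. Because the pCN proposal is reversible with respect to the prior, the prior density cancels and only the likelihood enters the acceptance ratio.

```python
# Illustrative sketch of one pCN Metropolis-Hastings step.
# Assumptions (not from the paper): finite-dimensional prior N(0, C),
# C_chol is a (lower-triangular) Cholesky factor of C, and
# log_likelihood(theta) returns the data log-likelihood.
import numpy as np

rng = np.random.default_rng(seed=0)

def pcn_step(theta, log_like_theta, log_likelihood, C_chol, beta=0.2):
    """Propose theta' = sqrt(1 - beta^2) * theta + beta * xi with xi ~ N(0, C),
    then accept with probability min(1, exp(loglike(theta') - loglike(theta)))."""
    xi = C_chol @ rng.standard_normal(theta.size)          # prior-distributed noise
    proposal = np.sqrt(1.0 - beta**2) * theta + beta * xi
    log_like_prop = log_likelihood(proposal)
    if np.log(rng.uniform()) < log_like_prop - log_like_theta:
        return proposal, log_like_prop                      # accept
    return theta, log_like_theta                            # reject (chain repeats)
```

The step size \(\beta \in (0, 1]\) corresponds to the weight \(\delta\) above through \(\beta = \sqrt{8\delta}/(2 + \delta)\); larger values propose bolder moves at the cost of a lower acceptance rate.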
The pCN proposal is rewritten, conforming to our previous notation: