# Reduced-order modeling for nonlinear Bayesian statistical inverse problems

Bayesian statistical inverse problems are often solved with Markov chain Monte Carlo (MCMC)-type schemes. When the problems are governed by large-scale discrete nonlinear partial differential equations (PDEs), they are computationally challenging because one would then need to solve the forward problem at every sample point. In this paper, the use of the discrete empirical interpolation method (DEIM) is considered for the forward solves within an MCMC routine. A preconditioning strategy for the DEIM model is also proposed to accelerate the forward solves. The preconditioned DEIM model is applied to a finite difference discretization of a nonlinear PDE in the MCMC model. Numerical experiments show that this approach yields accurate forward results. Moreover, the computational cost of solving the associated statistical inverse problem is reduced by more than 70%.


## 1 Introduction

In the last few decades, computational science and engineering has seen a dramatic increase in the need for simulations of large-scale inverse problems [3, 6, 7, 17, 22]. When these problems are governed by partial differential equations (PDEs), the ultimate goal is to recover quantities from limited and noisy observations. One of the major challenges posed by such problems is that they are computationally expensive and, as a result, entail vast storage requirements. Moreover, they are often ill-posed – a feature which can induce uncertainty in the recovered quantities [17].

Generally speaking, an inverse problem can be solved using either a deterministic technique or a Bayesian statistical method. The deterministic approach essentially leads to an optimization problem in which an objective function is minimized to obtain a point estimate for the sought parameter. However, this method does not take into consideration possible uncertainties in the solution of the inverse problem. The Bayesian inference framework, on the other hand, seeks a probability distribution over a set of plausible solutions rather than a single solution. More precisely, the Bayesian statistical approach is a systematic way of modeling the observed data and parameters of interest as random variables to enable the quantification of the uncertainties in the reconstructed quantities [22]. The outcome is the so-called posterior distribution as the solution of the inverse problem, from which one can determine the mean, covariance, and other statistics of the unknown parameters.

The appealing features of the Bayesian statistical method notwithstanding, the resulting posterior distribution quite often does not admit analytic expression. Consequently, efficient numerical methods are required to approximate the solution to the inverse problem. In computational practice, the Markov chain Monte Carlo (MCMC) sampling techniques are probably the commonest methods used to explore the posterior distribution [3, 6, 7, 22, 25, 37]. For large-scale inverse problems involving discretizations of nonlinear PDEs, however, the MCMC schemes can be computationally intractable because they require the solution of high-dimensional discrete nonlinear PDEs at every sample point. This is the main focus of this work; here, we employ state-of-the-art nonlinear model order reduction techniques within MCMC algorithms to reduce the computational complexity of the statistical inverse problem. More specifically, following [4, 9, 15, 18] and the references therein, we apply the POD-Galerkin method and a preconditioned discrete empirical interpolation method (DEIM) to accelerate the forward solves in our MCMC routine.

The outline of the paper is as follows. A general formulation of the discrete nonlinear PDE model, as well as the associated statistical inverse problem we would like to solve, is presented in Section 2. The MCMC scheme for sampling the posterior density of the unknown parameters in the inverse problem is presented in Section 3. Section 4 discusses the reduced-order models which will be employed to facilitate the forward solves within the MCMC algorithm. Finally, Section 5 reports our preconditioning strategy for the forward solves, as well as the numerical experiments.

## 2 Problem formulation

In this section, we present our nonlinear statistical inverse problem of interest. To begin with, we present a general formulation of the governing discrete nonlinear PDE model in Section 2.1. Next, we proceed to discuss in Section 2.2 the associated Bayesian statistical inverse problem.

### 2.1 Parameterized nonlinear full model

We consider here a system of parameterized nonlinear equations that results from spatial discretization of a nonlinear PDE using, for instance, a finite difference or finite element method. More precisely, we are interested in the following forward model

 (1) G(u;ξ):=Au+F(u;ξ)+B=0,

where the vector u denotes the state and the vector ξ denotes the parameters in the parameter domain D. Moreover, the matrix A represents the linear operators in the governing PDE, and B is a source term. The function F(u;ξ) corresponds to all the nonlinear terms in the PDE. The quantity of interest associated with the forward model (1) is expressed as

 (2) y(ξ)=Cu(ξ),

where C is a constant matrix.

In what follows, we are ultimately interested in the case where the state vector u is high-dimensional, and the numerical solution of the nonlinear system (1) is needed at many parameter values.

### 2.2 Bayesian statistical inverse problem

In what follows, we present the Bayesian formulation and solution method for our statistical inverse model problem. We refer to [7, 22] for excellent introductions to statistical inverse problems. Note that the goal of the inverse problem associated with the forward problem (1) is essentially to estimate a parameter vector ξ given some observed data yobs. To achieve this goal, the Bayesian framework assumes that both the parameters of interest ξ and the observed data yobs are random variables. Moreover, the solution to the statistical inverse problem is a posterior probability density π(ξ|yobs), which encodes the uncertainty from the set of observed data and the sought parameter vector. Mathematically, π(ξ|yobs) is described by two density functions: the prior π(ξ) and the likelihood function π(yobs|ξ). More precisely, the prior probability density π(ξ) describes the information which we would like to enforce on the parameters before considering the observed data, while the likelihood function is a conditional probability that models the relationship between the observed data yobs and the set of unknown parameters ξ.

From Bayes’ theorem, the posterior probability density is given by

 (3) π(ξ|yobs)=π(ξ)π(yobs|ξ)π(yobs)∝π(ξ)π(yobs|ξ),

where the quantity π(yobs) is a normalization constant. We follow [7, 17] to derive a computable expression for the posterior probability density π(ξ|yobs). To this end, we assume that the observed data are of the form

 (4) yobs=y(ξ)+ε,

where ξ is the vector of unknown parameters that we wish to estimate, y(ξ) is the so-called nonlinear parameter-to-observable map, and ε is a Gaussian error with zero mean and covariance σ2I; that is, ε ∼ N(0, σ2I). Under the assumption of statistical independence between the noise ε and the unknown parameters ξ, it follows that the likelihood function satisfies (see, e.g., [7, p. 42])

 (5) π(yobs|ξ) ∝ exp[−12σ2(yobs−y(ξ))T(yobs−y(ξ))].

Similar to the likelihood function, the prior for ξ can also be modeled as Gaussian [17]. In this work, however, we assume, for simplicity, that ξ is uniformly distributed over the parameter domain D. Consequently, the posterior probability density in (3) is expressed as

 (6) π(ξ|yobs) ∝{exp[−12σ2(yobs−y(ξ))T(yobs−y(ξ))]if ξ∈D,0otherwise.

Observe from the expression (6) that, due to the nonlinearity of y(ξ), it is quite difficult, if not impossible, to sample directly from π(ξ|yobs) in (6). It is common computational practice to use Markov chain Monte Carlo algorithms to tackle this problem [6, 25].
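Concretely, the unnormalized posterior (6) is usually evaluated in log form. The sketch below illustrates this; the names forward and in_domain are placeholders for the parameter-to-observable map y(ξ) and the indicator of the domain D, and are assumptions rather than code from the paper:

```python
import numpy as np

def log_posterior(xi, y_obs, forward, sigma, in_domain):
    """Unnormalized log-posterior corresponding to eq. (6).

    forward(xi) is the parameter-to-observable map y(xi); forward and
    in_domain are user-supplied placeholders."""
    if not in_domain(xi):
        return -np.inf          # zero prior mass outside the parameter domain D
    r = y_obs - forward(xi)     # data misfit
    return -0.5 * float(r @ r) / sigma**2
```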

## 3 Markov chain Monte Carlo techniques

In this section, we discuss the evaluation of the posterior probability density π(ξ|yobs) as defined in (6). First, note that the evaluation of the posterior probability density could be accomplished via quadrature techniques or Monte Carlo integration techniques. However, these methods can be quite computationally intensive when one is dealing with high-dimensional problems [37]. An alternative method, which we employ in this work, is the Markov chain Monte Carlo (MCMC) technique [3, 6, 7, 37]. Unlike the quadrature and the Monte Carlo integration techniques, the MCMC method uses the attributes of the density to specify parameter values that adequately explore the geometry of the distribution. This is achieved by constructing Markov chains whose stationary distribution is the posterior density.

There are different ways to implement MCMC; the prominent among them include Metropolis-Hastings (MH) [37], Gibbs [3] and Hamiltonian [6] algorithms. The MH algorithm can be conceptualized as a random walk through the parameter space that starts at an arbitrary point and where the next step depends solely on the current position (this is the Markov property of the MCMC). Moreover, each random sample is generated from a proposal distribution (a term that will be made precise in the next section) which is subject to very mild restrictions. The Gibbs algorithm can be thought of as a special case of the MH algorithm in which the proposal distribution is determined by both the component parameter selected and its current location in the parameter space. Similar to the Gibbs algorithm, the Hamiltonian algorithm employs a dynamical proposal distribution. However, the Hamiltonian algorithms do not rely on being able to sample from the marginal posterior distributions of the unknown parameters. Because of its relative ease in implementation, in this study we focus on the MH algorithm which we introduce next.

To numerically sample from the posterior distribution in (6), the Metropolis-Hastings MCMC algorithm proceeds as follows [7, 37]. Let ξ be a fixed parameter sample. Then,

1. Generate a proposed sample ξ∗ from a proposal density q(ξ∗|ξ), and compute y(ξ∗).

2. Set ξ = ξ∗ with probability

 (7) α(ξ∗|ξ)=min[1,π(ξ∗|yobs)q(ξ∗|ξ)π(ξ|yobs)q(ξ|ξ∗)].

Else, keep the current sample ξ.

Note that, at each step of the MH algorithm, the computation of α(ξ∗|ξ) in (7) requires solving the discrete nonlinear system defined by (1) and (2) to obtain y(ξ∗). This happens whenever the proposal ξ∗ falls inside the parameter domain D; otherwise, π(ξ∗|yobs) = 0, and ξ∗ is immediately discarded. In large-scale settings, the repeated solution of the underlying discrete nonlinear PDE can be quite computationally demanding. In the next section, we address this most expensive part of the MH scheme by deriving an efficient reduced-order model that will be used for the forward solve.

Now, denote by {ξi}, i = 1, …, M, the samples generated by the MH algorithm. These parameter samples constitute a Markov chain. Under mild regularity conditions, the Markov chain can be shown to converge in distribution to the posterior density [34]. For a practical implementation of the accept/reject rule in the MH algorithm, one typically computes a random draw θ ∼ U(0,1) and then sets ξ = ξ∗ if α(ξ∗|ξ) ≥ θ, and otherwise retains ξ. For large-scale inverse problems, however, the direct computation of α can be numerically unstable [3]. To circumvent this issue, one can instead work in log form and set ξ = ξ∗ if

 ln α(ξ∗|ξ) ≥ ln θ,

or, otherwise, retain ξ.
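A minimal sketch of the MH iteration with the log-form accept/reject rule reads as follows; log_post, propose, and log_q are user-supplied callables, and all names are illustrative rather than taken from the paper:

```python
import numpy as np

def metropolis_hastings(log_post, propose, log_q, xi0, n_steps, rng):
    """Sketch of the MH iteration with the log-form accept/reject rule.

    log_q(a, b) = log q(a | b); for a symmetric proposal it may return 0."""
    xi, lp = xi0, log_post(xi0)
    chain = [xi0]
    for _ in range(n_steps):
        xi_star = propose(xi, rng)
        lp_star = log_post(xi_star)
        # eq. (7) evaluated in log form to avoid under/overflow
        log_alpha = min(0.0, lp_star + log_q(xi, xi_star) - lp - log_q(xi_star, xi))
        if np.log(rng.uniform()) < log_alpha:
            xi, lp = xi_star, lp_star   # accept
        chain.append(xi)                # else retain the current sample
    return np.array(chain)
```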

Next, note that the choice of the proposal density is an extremely important part of the MH algorithm. We list here some classes of proposal densities that are common in the literature. One class of proposals is known as the independence proposals; they satisfy q(ξ∗|ξ) = q(ξ∗). An example of an independence proposal for MH is the randomize-then-optimize (RTO) proposal, which was introduced in [3]. The paper specifically discusses RTO in the context of nonlinear problems and shows that it can be efficient compared to other proposals. The approach is based on obtaining candidate samples by repeatedly optimizing a randomly perturbed cost function. Although RTO yields relatively high acceptance rates for certain problems, such as those considered in [3], it is quite involved and very computationally expensive.

Another class of proposals consists of the symmetric proposals; that is,

 q(ξ∗|ξi)=q(ξi|ξ∗),i=1,2,…,M.

The consequence of this choice of q is that the acceptance probability becomes

 (8) α(ξ∗|ξi)=min[1,π(ξ∗|yobs)π(ξi|yobs)],i=1,2,…,M.

A typical example is the so-called preconditioned Crank-Nicolson (pCN) proposal density as discussed in [10]. The main challenge with the pCN is that the proposal density typically needs to be tuned by the user to achieve an efficient MCMC method. This requirement limits the usage of pCN in computational practice.

Another example of symmetric proposal density, which we use in this work, is an adaptive Gaussian proposal introduced in [20]. The proposal is defined recursively by

 (9) q(ξ∗|ξi) ∝ exp(−12(ξ∗−ξi−1)TΓi−1(ξ∗−ξi−1)),

where the covariance matrix Γi is computed from the sample history as

 (10) Γi = cov({ξj}_{j=0}^{i−1}) = (1/i) ∑_{j=0}^{i−1} (ξj − ~ξ)(ξj − ~ξ)^T + ϵI,

with ~ξ the mean of the past samples and ϵ a small positive number. In the resulting adaptive Metropolis-Hastings (AMH) algorithm, we use a lognormal proposal density, in which case the covariance (10) is computed directly from the log of the samples. Besides being symmetric, this proposal density is relatively easy to implement. Hence, in our numerical experiments, we implement AMH using the proposal density (9), together with the acceptance probability (8).
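The covariance update (10) amounts to the regularized empirical covariance of the chain history, e.g. as below; the function name and the default value of ϵ are illustrative:

```python
import numpy as np

def adaptive_covariance(history, eps=1e-8):
    """Covariance of eq. (10) for the adaptive Gaussian proposal (9).

    history holds the past samples xi_0, ..., xi_{i-1} as rows; eps is the
    small regularization parameter (value here is illustrative)."""
    i, d = history.shape
    mean = history.mean(axis=0)          # the sample mean ~xi
    centered = history - mean
    # empirical covariance of the chain so far, regularized by eps * I
    return centered.T @ centered / i + eps * np.eye(d)
```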

## 4 Reduced-order modeling

Reduced-order modeling involves deriving a relatively cheap and low-dimensional model from a typically high-fidelity and computationally costly model in such a way as to achieve minimal loss of accuracy. The aim here is to reduce the cost of solving parameterized discrete PDEs at many parameter values. In reduced-order modeling, the computation of the discrete problem (1) proceeds in two steps: an offline step and an online step [4, 8, 14, 15]. In this offline-online paradigm, the offline step constructs the low-dimensional approximation to the solution space, and the online step uses this approximation – the reduced basis – for the solution of a smaller reduced problem. The resulting reduced problem provides an accurate estimate of the solution of the original problem. This section presents two reduced-order modeling techniques, the POD-Galerkin and POD-DEIM-Galerkin approximations, for the general nonlinear system in (1). The key to the success of these techniques is for the cost of simulation (that is, the online operation count) to be independent of the number of full-order degrees of freedom N and of the number of evaluations of the nonlinear term F(u;ξ) in the full-order system.

### 4.1 Constructing reduced-order model

There are several techniques for constructing reduced-order models [4, 8, 14, 15]. Perhaps the commonest of these methods are the projection-based techniques. They rely on Galerkin projection to construct a reduced-order system that approximates the original system from a subspace spanned by a reduced basis of dimension n ≪ N. More precisely, assume that the columns of the matrix Q are the basis vectors and that the solution of the full model (1) is adequately approximated by these vectors, so that

 (11) u=Qur,

where ur is the vector of reduced states. Then, using (11) in (1) and applying Galerkin projection, one obtains the reduced model

 (12) Gr(ur;ξ):=Arur+QTF(ur;ξ)+Br=0,

where Ar = QTAQ and Br = QTB. Similarly, from (2), we get

 (13) yr(ur(ξ))=Crur(ξ),

where Cr = CQ. In this work, we assume that the quantity of interest is the state vector directly, so that C = I and Cr = Q.

The offline step is aimed at constructing the reduced basis Q, and it is the most expensive part of the computation. Furthermore, the quality of the approximation is significantly affected by the reduced basis Q. There are several techniques for constructing the reduced basis; each of these techniques is based on the fact that the solution lies in a low-dimensional space, see, e.g., [8, 19, 21, 32]. We use the proper orthogonal decomposition (POD) method to construct the reduced basis.

Given a set of snapshots of solutions obtained from the parameter space, the POD method constructs an orthogonal (reduced) basis that is optimal in the sense that it minimizes the approximation error with respect to the snapshots [33]. More specifically, the POD method takes a set of s snapshots of the solution, collected in a snapshot matrix SU, and computes the singular value decomposition (SVD)

 (14) SU=¯VΣWT,

where ¯V and W are orthogonal and Σ is a diagonal matrix with the singular values sorted in order of decreasing magnitude. The reduced basis Q is defined as the first n columns of ¯V, with n ≪ N. This produces an orthogonal basis which contains the important components from the snapshot matrix SU. The major shortcoming of POD is that it often requires a large number of snapshots, s, to construct Q. It is possible that the number of solutions of the full model required to find a basis with satisfactory accuracy could be quite large.
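The POD construction (14) reduces to a truncated SVD of the snapshot matrix; a minimal sketch:

```python
import numpy as np

def pod_basis(snapshots, n):
    """POD reduced basis: the n leading left singular vectors of the
    snapshot matrix, as in eq. (14)."""
    # numpy returns the singular values in decreasing order
    Vbar, Sigma, Wt = np.linalg.svd(snapshots, full_matrices=False)
    return Vbar[:, :n]
```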

Despite the fact that the projected nonlinear operator QTF(u;ξ) is of dimension n, it must be assembled at each step of a nonlinear iteration since it depends on the solution u. For a nonlinear solution method such as Newton's method, each nonlinear iteration also requires the construction of the Jacobian matrix JF associated with F and multiplication by Q, where the costs of both operations depend on N. More specifically, note that for the forward model defined in equation (1), the Jacobian matrix is

 (15) JG(u;ξ)=A+JF(u;ξ).

The Jacobian matrix of the reduced model equation (12) is then

 (16) JGr(ur;ξ):=QTJG(ur;ξ)Q=Ar+QTJF(ur;ξ)Q.

This feature makes the POD-Galerkin method inefficient for nonlinear problems. The difficulty can be addressed using the discrete empirical interpolation method (DEIM), which we discuss in Section 4.2.

### 4.2 The discrete empirical interpolation method (DEIM)

The DEIM approximation is a discrete counterpart of the empirical interpolation method (EIM), which was introduced in [4]. It is used for approximating nonlinear parametrized functions through sparse sampling. Following [4, 8, 15, 18], we present next the POD-DEIM-Galerkin method.

Besides the reduced basis Q constructed from (1), in the DEIM framework a separate basis is constructed to represent the nonlinear component of the solution. To construct this basis, one first computes the snapshots of the nonlinear term

 SF=[F(u(ξ(1))),F(u(ξ(2))),⋯,F(u(ξ(s)))],

where u(ξ(j)) denotes the full-model solution at parameter ξ(j), and then the SVD of the snapshot matrix:

 (17) SF=~V~Σ~WT,

where, as before, ~V and ~W are orthogonal, and ~Σ is a diagonal matrix containing the singular values. Then, using the first k columns of ~V, one determines the DEIM basis V.

Next, DEIM uses k distinct interpolation points p1, …, pk and makes use of the DEIM interpolation matrix P = [ep1, …, epk], where epi is the pi-th canonical unit vector, with zeros in all components except the pi-th. In [9], it is shown that the interpolation points and the interpolation matrix can be constructed with a greedy approach using the DEIM basis V. To make the presentation self-contained, we show the construction in Algorithm 1.

Observe that the i-th interpolation point pi can be associated with the basis vector in the i-th column of the DEIM basis V. Moreover, the DEIM interpolant of F is defined by the pair (V, P), which is selected such that the matrix PTV is nonsingular. DEIM then approximates the nonlinear function F(u;ξ) by

 (18) ¯F(u;ξ)=V(PTV)−1PTF(u;ξ),

where PTF(u;ξ) samples the nonlinear function at k components only. By construction, the approximation interpolates F at the selected points, i.e., PT¯F(u;ξ) = PTF(u;ξ).
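The greedy point selection of Algorithm 1 and the DEIM approximation (18) can be sketched as follows; this is a simplified rendering in the spirit of [9], not the paper's code:

```python
import numpy as np

def deim_points(V):
    """Greedy selection of DEIM interpolation points from the basis V."""
    N, k = V.shape
    p = [int(np.argmax(np.abs(V[:, 0])))]     # first point: largest entry of v1
    for j in range(1, k):
        # coefficients that interpolate column j at the current points
        c = np.linalg.solve(V[p, :j], V[p, j])
        r = V[:, j] - V[:, :j] @ c            # interpolation residual
        p.append(int(np.argmax(np.abs(r))))   # next point: largest residual entry
    return np.array(p)

def deim_approx(V, p, f_at_p):
    """Fbar = V (P^T V)^{-1} P^T F, assembled from the k sampled entries."""
    return V @ np.linalg.solve(V[p, :], f_at_p)
```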

Combining the DEIM and POD-Galerkin yields the POD-DEIM-Galerkin reduced system

 (19) Gdeim(ur;ξ):=Arur+¯Fr+Br=0,

where

 (20) ¯Fr(ur;ξ)=QT¯F(u;ξ)=QTV(PTV)−1PTF(Qur;ξ).

The POD-DEIM-Galerkin reduced-model approximation of the quantity of interest is defined as in (13). Moreover, the Jacobian of Gdeim is

 (21) JGdeim(ur;ξ):=Ar+J¯Fr(ur;ξ),

where

 (22) J¯Fr(ur;ξ):=QTJ¯F(ur;ξ)Q=QTV(PTV)−1PTJF(u;ξ)Q.

Solving (19) with Newton's method only requires the evaluation of the entries of the nonlinear function F at the indices associated with the interpolation points given by P, instead of at all N components. Note that, for the costs to be low, each component of F should depend on only a few values of the solution u, and similar requirements are needed for the Jacobian. This condition is valid for a typical (finite difference or finite element) discretization of PDEs [2]. The corresponding computational procedure of the DEIM method is split into an offline phase, where the DEIM reduced system is constructed, and an online phase, where it is evaluated. The offline phase obviously involves high computational costs; however, these costs are incurred only once and are amortized over the online phase. In particular, the construction of the matrix QTV(PTV)−1 highlighted in (22) is part of the offline phase.

In what follows, we refer to the combination of AMH and the DEIM model as the AMH-DEIM scheme. Similarly, we write AMH-Full scheme for the combination of AMH and the full model.

## 5 Numerical experiments

In this section, we present results on the performance of AMH-DEIM and AMH-Full schemes for a statistical inverse problem, as well as the reduced-order (POD-Galerkin and DEIM) models for a nonlinear forward problem. To this end, consider the following nonlinear diffusion-reaction problem posed in a two-dimensional spatial domain [8, 19]

 (23) −∇2u(x1,x2)+F(u(x1,x2);ξ) = 100sin(2πx1)sin(2πx2), F(u;ξ) = (ξ2/ξ1)[exp(ξ1u)−1],

where (x1, x2) denote the spatial variables and ξ = (ξ1, ξ2) ∈ D the parameters, with a homogeneous Dirichlet boundary condition.

Equation (23) is discretized on uniform meshes using centered differences; the number of grid points in each direction, and hence the dimension N of the resulting nonlinear system, is specified for each experiment (unless otherwise stated). We solved the resulting system of nonlinear equations with an inexact Newton-GMRES method as described in [23]. Recall that, for a fixed ξ, an inexact Newton method solves for u in the nonlinear problem (1) by approximating the vector z in the equation for the Newton step

 (24) JG(u(i);ξ)z=(A+JF(u(i);ξ))z=−G(u(i);ξ),

with

 u(i+1)=u(i)+z,i=0,1,2,…,

such that

 (25) ||JG(u(i);ξ)z+G(u(i);ξ)||≤ηi||G(u(i);ξ)||,i=0,1,2,…,

where the parameters ηi are the so-called forcing terms. To realize the inexact Newton condition (25), Newton iterative methods typically apply an iterative method (the GMRES method in this work [35]) to the equation for the Newton step and terminate that iteration when (25) holds. We refer to this linear iteration as the inner iteration, and to the nonlinear iteration as the outer iteration.

If the forcing terms in the inexact Newton method are uniformly (and strictly) less than 1, then the method is locally convergent [11]. For more details on the local convergence theory and the role played by the forcing terms in inexact Newton methods, see, e.g., [1, 13, 23].

The forcing terms are chosen in such a way as to solve the linear equation for the Newton step to just enough precision to make good progress when far from a solution, but also to obtain quadratic convergence when near a solution [23]. Following [23], in our computations we chose

 (26) ηi=min(ηmax,max(ηsafei,0.5τ/||G(u(i);ξ)||)),

where the parameter ηmax is an upper limit on the forcing term and

 (27) ηsafe,i = ηmax, if i = 0;
       min(ηmax, ηres,i), if i > 0 and γη²_{i−1} ≤ 0.1;
       min(ηmax, max(ηres,i, γη²_{i−1})), if i > 0 and γη²_{i−1} > 0.1,

with

 ηresi=γ||G(u(i);ξ)||2/||G(u(i−1);ξ)||2,

the constant γ being somewhat arbitrary, and τ the stopping tolerance for the nonlinear residual. We use the same strategy for the DEIM reduced model, with G in (1) replaced by Gdeim in (19).
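The safeguarded forcing-term choice (26)-(27) can be sketched as follows; the default values of eta_max and gamma below are illustrative placeholders, since the text above does not fix them:

```python
def forcing_term(i, gnorm, gnorm_prev, eta_prev, tau,
                 eta_max=0.9, gamma=0.9):
    """Safeguarded forcing-term choice of eqs. (26)-(27).

    gnorm, gnorm_prev are ||G|| at the current and previous outer iterations;
    the defaults for eta_max and gamma are illustrative, not from the paper."""
    if i == 0:
        eta_safe = eta_max
    else:
        eta_res = gamma * gnorm**2 / gnorm_prev**2   # residual-based estimate
        if gamma * eta_prev**2 <= 0.1:
            eta_safe = min(eta_max, eta_res)
        else:
            eta_safe = min(eta_max, max(eta_res, gamma * eta_prev**2))
    # eq. (26): guard against oversolving near the stopping tolerance tau
    return min(eta_max, max(eta_safe, 0.5 * tau / gnorm))
```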

We compare the performance of the solvers by plotting the relative nonlinear residuals from the full and DEIM models. More precisely, we plot

• the relative nonlinear residual against the number of calls of the function F in the full model defined by (1);

• the relative nonlinear residual against the number of calls of the function ¯Fr in the DEIM model given by (19).

In particular, the nonlinear stopping tolerance τ is chosen small enough that we expect the reduced approximation Qur to be as good as the numerical solution u of the full model.

### 5.1 Preconditioning

Now, as we have mentioned earlier, at each Newton step in both the full and reduced models, we elect to solve the associated linear system with the GMRES method. However, Krylov solvers (including GMRES) are generally slow and require appropriate preconditioners to reduce the associated computational cost [16]. For an arbitrary linear system of equations, we recall here that a preconditioner is, loosely speaking, a matrix M such that (a) the preconditioned system yields a matrix whose eigenvalues are well clustered, and (b) the matrix-vector product Mv is relatively easy to compute for any vector v of appropriate dimension. In the context of Krylov solvers, the consequence of (a) is that the solver converges quickly [16]. In particular, to solve (24), one could precondition the linear system with an approximation to the action of A−1 (that is, an approximate Poisson solver).

In this work, we use the AGgregation-based algebraic MultiGrid iterative method (AGMG) as a preconditioner [26, 27, 28]. AGMG employs an algebraic multigrid method to solve the linear system of equations. The algebraic multigrid method is accelerated either by the conjugate gradient method (CG) or by the generalized conjugate residual method (GCR, a variant of GMRES). AGMG is implemented as a "black box". In particular, since A (the Laplacian, in our case) is symmetric and positive definite, in our numerical experiments we employ the CG option in the software. (For our problem, the CG iteration was stopped when the Euclidean norm of the residual relative to that of the right-hand side was sufficiently small; one V-cycle of AMG with symmetric Gauss-Seidel (SGS) pre- and post-smoothing sweeps is used to approximate the action of A−1.) We apply the preconditioner to the full model (1) as follows:

 (28) u+MF(u;ξ)+MB=0.

Note here that the preconditioner M in (28) is used as a Poisson solver. For semilinear boundary value problems (that is, problems that are linear in the highest-order derivative, here the Laplacian), preconditioning with an exact inverse of the high-order term typically yields optimal convergence of the linear iteration, in the sense that the iteration counts are independent of the mesh spacing [24].

It is a common computational practice to solve the reduced problem using direct methods [4, 8]. However, as observed in [14, 15], the size of the reduced system corresponding to a prescribed accuracy is essentially determined by the dynamics of the problem. Quite often, although the size of the reduced model is significantly smaller than that of the full model, solving the former with direct solvers can still be computationally expensive. In this case, the use of preconditioned iterative methods may be more effective to solve the reduced problem. To this end, we precondition the DEIM model (19) as follows:

 (29) ur+QTMV(PTV)−1PTF(Qur;ξ)+QTMB=0,

where the matrices QTMV(PTV)−1 and QTMB should be precomputed. The preconditioned DEIM model (29) implies that the action of Ar−1 is approximated with QTA−1Q. This preconditioning strategy is inspired by the experimental results in [14], which show that all the eigenvalues of the matrix

 (QTA−1Q)Ar=QTA−1QQTAQ

are bounded below by 1, and that the largest eigenvalues grow only slightly with the spatial dimension. This finding suggests that the condition number of the preconditioned reduced matrix is only weakly dependent on the spatial mesh. Note, however, that in general QTA−1Q ≠ Ar−1, since the matrix Q is tall and rectangular. (An alternative way to implement the action of A−1 is to compute a factorization of A and then apply its factors [15].)

Next, observe that since (23) is discretized on a uniform five-point centered difference mesh, each row of the discrete Laplacian has at most 5 nonzero elements. Thus, the computational cost of a matrix-vector product with A required by an iterative solver in the preconditioned full model (28) is O(N). On the other hand, the cost of the analogous computation with Ar in the preconditioned DEIM model (29) is O(n2). Hence, if n2 ≪ N, the computational complexity of the forward model (23) is significantly reduced when (29) is used in place of (28).

### 5.2 Solving the parameterized nonlinear system

To solve the Bayesian statistical inverse problem, we first discuss how we evaluated the forward model. We generated a reduced basis Q and a DEIM basis V from snapshots of the full model. In all our experiments, we set n = k. The snapshots were generated at grid points in the parameter domain D.

Additionally, we use the random sampling technique described in [15, 18] to construct Q and V. (Another technique for constructing the reduced bases for DEIM is the greedy algorithm [4, 5]; it is significantly more time-consuming than the random sampling approach for the problem considered herein.) This approach proceeds with a random sample of parameters, and a snapshot is taken only if the reduced solution at the current sample fails some error criterion. The basis is initialized using a single snapshot. Then, for each of the parameters, the reduced problem is solved. If the error indicator of the reduced solution is below a prescribed tolerance, the computation proceeds to the next parameter. Otherwise, the full model is solved and the snapshot is used to augment the reduced basis. We measure the quality of the reduced model using the full residual, since it is an easily computable quantity that indicates how well the reduced model approximates the full solution. More specifically, we use the error indicator

 ηξ=||G(u;ξ)||2/||B||2.
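The random-sampling basis construction described above can be sketched as follows; solve_full, solve_reduced, and residual are hypothetical problem-specific callables, not code from the paper:

```python
import numpy as np

def build_basis(solve_full, solve_reduced, residual, params, tol):
    """Sketch of basis construction by random sampling with an error indicator.

    params is the random sample of parameter values; residual(u_approx, xi)
    plays the role of the indicator eta_xi above."""
    u0 = solve_full(params[0])
    Q = (u0 / np.linalg.norm(u0))[:, None]   # initialize with one snapshot
    for xi in params[1:]:
        ur = solve_reduced(Q, xi)
        if residual(Q @ ur, xi) > tol:       # reduced solution fails the criterion
            # solve the full model and augment the basis, re-orthonormalizing via QR
            Q, _ = np.linalg.qr(np.column_stack([Q, solve_full(xi)]))
    return Q
```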

We generated a data set by using (4) with a fixed “true parameter” and a Gaussian noise vector. First, Figure 1 shows the noise-free solution of the forward problem at the “true parameter” using the full model, as well as the solutions obtained with the reduced models. We have also computed the following relative errors:

 reldeim := ||u−Qudeim||2/||u||2 = 3.2603×10−6,
 relredb := ||u−Quredb||2/||u||2 = 1.9851×10−7,
 reldr := ||udeim−uredb||2/||uredb||2 = 3.2543×10−6,

where reldeim denotes the relative error between the full and DEIM solutions, relredb denotes the relative error between the full and POD-Galerkin solutions, and reldr denotes the relative error between the DEIM and POD-Galerkin solutions. Observe here that the POD-Galerkin method yields a slightly more accurate solution than DEIM. However, since, as we noted earlier, the computational cost of solving the forward problem using the POD-Galerkin reduced model grows with N, we use the DEIM reduced model in the rest of our computations.

Next, Figure 2 compares the performance of the preconditioned full model with that of the preconditioned DEIM model. In each of the plots, the size of the full model is fixed, while the reduced basis dimension and the DEIM basis dimension are varied. Each plot shows the relative nonlinear residual against the number of function evaluations required by all inner and nonlinear iterations to compute the solution of the full model in (1), together with the corresponding curve for the DEIM model in (19).

The function-evaluation counts corresponding to nonlinear iterations in these plots are indicated by small circles (for the full models) and by triangles, squares, and stars (for the DEIM models). With one exception, the plots for the DEIM models coincide with those for the full models. From these plots, one can compare both the number of nonlinear iterations and the total cost. Observe that, for a fixed full model size, as the size of the DEIM model increases, the number of nonlinear iterations and function evaluations required to solve both the preconditioned full and DEIM models remains fairly constant.

To test the accuracy of the model reduction techniques for the forward problem, reduced bases and DEIM bases of various sizes were constructed from the snapshots. Using these, we computed and plotted the relative residuals

$$\mathrm{rel}_s := \max_{\xi\in E}\; \|G(Qu_r;\xi)\|_2 \,/\, \|G(u_0;\xi)\|_2,$$

where $E$ is the set of parameters. In Figure 3, the relative residuals are plotted against the basis dimension. The figure shows that, beyond a certain DEIM basis size, the accuracy of the DEIM model does not improve as the dimension is increased further.

Table 1, in combination with Figure 3, shows that the computational time for solving the DEIM models is generally much smaller than that for solving the full model. In particular, although the cost of DEIM increases with the number of interpolation vectors, DEIM attains similar accuracy at a significantly lower cost.


For example, in Table 1, the computational times required by the full model increase markedly with the size of the full problem, while the costs of the DEIM model show only a benign dependence on the DEIM dimension and, as expected, are independent of the full model size. To solve the statistical inverse problem, we fix the reduced basis and DEIM dimensions in all subsequent computations with the AMH-DEIM model in Section 5.3.

###### Remark 1

Several attempts have been made in recent years to improve the performance of standard DEIM as presented in this work; see e.g., [12, 29, 30, 31, 36] and the references therein. A major strategy in this direction is the so-called randomization technique [12, 30, 36]. Originally introduced in [12], the randomized sampling approaches for computing DEIM indices have been analyzed in [30, 36]. In particular, the paper [30] proposes overdetermined DEIM (ODEIM), which employs randomized oversampling to tackle stability issues associated with the use of standard DEIM. We implemented ODEIM and found that although it slightly improves the accuracy of the reduced model, the algorithm yielded increased computational times for solving our forward problem compared to those from standard DEIM as reported in Table 1.
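For reference, the index-selection step of standard DEIM [9], which the randomized variants discussed in the remark modify, can be sketched in NumPy as follows (a minimal sketch, not the implementation used in our experiments):

```python
import numpy as np

def deim_indices(U):
    """Standard DEIM index selection [9]: given a basis U (n x m) for the
    nonlinear term, greedily pick m interpolation rows."""
    n, m = U.shape
    # First index: location of the largest entry of the first basis vector.
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for k in range(1, m):
        Uk = U[:, :k]                                 # basis selected so far
        c = np.linalg.solve(Uk[idx, :], U[idx, k])    # interpolation coefficients
        r = U[:, k] - Uk @ c                          # residual at all rows
        idx.append(int(np.argmax(np.abs(r))))         # next interpolation index
    return np.array(idx)
```

By construction, the residual vanishes at previously selected rows, so the indices are distinct and the interpolation matrix `U[idx, :]` is nonsingular whenever the columns of `U` are linearly independent.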

### 5.3 Solution of the statistical inverse problem

Recall that the goal of solving the inverse problem is to numerically approximate the posterior probability distributions of the parameters of interest, as well as to quantify the uncertainty in them. To this end, we run the AMH algorithm described in Section 3. Here, we use an initial Gaussian proposal and update its covariance using (10) at regular intervals along the chain. After discarding the initial samples due to burn-in (the initial stage of any MCMC chain, during which the elements of the chain move from their starting values toward the region of relatively high probability of the target density; it is important to discard the burn-in samples to avoid computing biased statistics from the chain. The remaining chain is said to be in equilibrium and, provided it is long enough, can be treated as a collection of samples from the target density [3]), we plot the results from the AMH-DEIM model and the AMH-full model.
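A minimal sketch of the adaptive Metropolis-Hastings loop, in the spirit of [20], is given below. The names `logpost`, `adapt_every`, and `sd` are illustrative, the empirical-covariance update stands in for the update formula (10), and in our setting `logpost` would call the (DEIM) forward solver.

```python
import numpy as np

def amh(logpost, x0, n_steps, cov0, adapt_every=100, burn_in=0,
        rng=None, eps=1e-8, sd=None):
    """Adaptive Metropolis-Hastings (sketch). `logpost` evaluates the log
    posterior density; the proposal covariance is adapted from the chain
    history as in [20]."""
    rng = np.random.default_rng(rng)
    d = len(x0)
    sd = sd if sd is not None else 2.4**2 / d       # scaling suggested in [20]
    x, lp = np.asarray(x0, float), logpost(x0)
    C = np.array(cov0, float)
    chain = np.empty((n_steps, d))
    for i in range(n_steps):
        prop = rng.multivariate_normal(x, C)        # Gaussian random-walk proposal
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
        # Periodically adapt the proposal covariance from the chain history;
        # the eps * I term keeps the covariance positive definite.
        if (i + 1) % adapt_every == 0 and i > d:
            C = sd * np.cov(chain[: i + 1].T) + sd * eps * np.eye(d)
    return chain[burn_in:]
```

With a reduced-order forward model, each `logpost` call is cheap, which is precisely where the DEIM model pays off inside the sampler.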

We assess the performance of the AMH-DEIM and AMH-full models in sampling the target density via the following criteria:

• Autocorrelation: The efficiency with which the MCMC method explores the target density is quite important. It is determined by the degree of correlation in the AMH chains. More specifically, suppose $\{\delta_j\}$ is a Markov chain generated by the AMH algorithm, where the $\delta_j$ are identically distributed with variance $\sigma^2$. Suppose also that the covariance is translation invariant, that is, $\mathrm{cov}(\delta_i,\delta_{i+j})$ depends only on the lag $j$. Then, the autocorrelation function (ACF) of the chain is given by

$$\rho_j = \mathrm{cov}(\delta_1, \delta_{1+|j|}) / \sigma^2.$$

For a well-mixing chain, the ACF decays very fast to zero as the lag $j$ increases. Moreover, the integrated autocorrelation time (IACT) of the chain, $\tau_{\mathrm{int}}$, is defined as

$$\tau_{\mathrm{int}}(\delta) := \sum_{j=-(J-1)}^{J-1} \rho_j = \sum_{j=-(J-1)}^{J-1} \mathrm{cov}(\delta_1,\delta_{1+|j|})/\sigma^2 \approx 1 + 2\sum_{j=1}^{J-1}\Bigl(1-\frac{j}{J}\Bigr)\mathrm{cov}(\delta_1,\delta_{1+j})/\sigma^2, \tag{30}$$

where, in practical implementations, $J$ is chosen sufficiently large [3]. If the IACT of a chain is $\tau$, say, then roughly every $\tau$-th sample of the chain is independent.

• Geweke test: This uses the Central Limit Theorem to compute a statistic from the difference between the means of an early portion and a late portion of the chain, which is then used to obtain the probability $p$ of accepting the null hypothesis that the two means are equal. It should be noted that a high value of $p$ suggests that the chain is in equilibrium.

• Confidence interval (CI): This gives the interval within which the samples of the chain lie with a prescribed probability. The midpoint of this interval (which is the mean of these samples) can be used to estimate the true parameter.
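The three diagnostics above can be sketched as follows. This is a minimal NumPy sketch: the Geweke variance uses an i.i.d. approximation rather than a spectral estimate, and the 10%/50% segment fractions follow the common convention; these details are assumptions, not taken from our implementation.

```python
import numpy as np
from math import erf, sqrt

def acf(chain, max_lag):
    """Empirical autocorrelation function rho_j of a scalar chain."""
    x = np.asarray(chain, float) - np.mean(chain)
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[: len(x) - j], x[j:]) / (len(x) * var)
                     for j in range(max_lag + 1)])

def iact(chain, max_lag=200):
    """Integrated autocorrelation time via the truncated sum in (30)."""
    rho = acf(chain, max_lag)
    return 1.0 + 2.0 * np.sum(rho[1:])

def geweke_p(chain, first=0.1, last=0.5):
    """Geweke z-statistic comparing the means of the first and last chain
    segments; returns the two-sided p-value (i.i.d.-variance approximation)."""
    n = len(chain)
    a, b = chain[: int(first * n)], chain[-int(last * n):]
    z = (np.mean(a) - np.mean(b)) / sqrt(np.var(a) / len(a) + np.var(b) / len(b))
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

def conf_interval(chain, level=0.95):
    """Empirical central interval containing `level` of the samples."""
    lo, hi = np.percentile(chain, [50 * (1 - level), 100 - 50 * (1 - level)])
    return lo, hi
```

Each diagnostic is applied componentwise to the parameter chains produced by the sampler.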

In addition to the above statistics, we show the histograms, the scatter plots, and the pairwise plots of the chains from the AMH-DEIM and AMH-full models. Finally, we compare the computational times required by the two models to solve the statistical inverse problem.

The performance of the Markov chains from the AMH-DEIM and AMH-full models is reported in Figures 4, 5, 6, and 7. First, we show the histograms of the posterior distributions of the two parameters in Figure 4, beneath which we also report the confidence intervals (CIs). With AMH-DEIM (right), the midpoints of the two CIs, which are the means of the calculated distributions, can be taken as the AMH-DEIM model’s estimates of the true parameters. Indeed, the Geweke $p$-values obtained with the AMH-DEIM algorithm for the two chains both exceed the acceptance threshold; this suggests that both chains are in equilibrium.

Figure 4 also shows the histograms of the posterior distributions for both parameters with the AMH-full model (left), together with the corresponding CIs and their midpoints. (Note that the histograms from AMH-full and AMH-DEIM are not exactly the same; the difference is due essentially to randomness and not to the accuracy of the forward solutions. If we ran the two models with the same seed of random numbers, they would look identical.) Comparing the midpoints of the CIs shows that the estimates from the AMH-DEIM model are better approximations of the true parameters. The Geweke $p$-values obtained with the AMH-full model for the two chains similarly indicate that the chains are in equilibrium.

Figure 5 shows the ACFs of the two chains obtained with the AMH-full (left) and AMH-DEIM (right) schemes. Note that the ACFs of both chains approach zero smoothly. Observe also that the IACT values of the two chains from the AMH-DEIM scheme suggest that only a modest subsampling of each chain is needed to obtain approximately independent samples; the IACT values from the AMH-full scheme lead to a similar conclusion. Moreover, the Markov chain plots in Figure 6 and the scatter plots in Figure 7, together with the above statistical checks, imply that the computed Markov chains from both the AMH-DEIM and AMH-full schemes “adequately” approximate the posterior probability distributions of the unknown input parameters in the inverse problem.

Undoubtedly, the computational complexity of the reduced models is significantly smaller than that of the full model. This justifies replacing the full nonlinear solver with the reduced solvers; the overall consequence is a reduction in the computational cost of solving the statistical inverse problem with the MCMC (AMH) scheme. In our computations, the preconditioned AMH-DEIM model reduces the computational time of the statistical inverse problem by more than 70% compared with the AMH-full model.

## References

• [1] H.-B. An, Z.-Y. Mo, and X.-P. Liu, A choice of forcing terms in inexact Newton method, Journal of Computational and Applied Mathematics, 200 (2007), pp. 47 – 60.
• [2] H. Antil, M. Heinkenschloss, and D. C. Sorensen, Application of the discrete empirical interpolation method to reduced order modeling of nonlinear and parametric systems, in Reduced Order Methods for Modeling and Computational Reduction, A. Quarteroni and G. Rozza, eds., vol. 8 of Springer MS&A Series, Springer Verlag, Milano, Italia, 2013, pp. 101–136.
• [3] J. M. Bardsley, A. Solonen, H. Haario, and M. Laine, Randomize-then-optimize: A method for sampling from posterior distributions in nonlinear inverse problems, SIAM Journal on Scientific Computing, 36(4) (2014), pp. A1895 – A1910.
• [4] M. Barrault, Y. Maday, N. C. Nguyen, and A. T. Patera, An ‘empirical interpolation’ method: Application to efficient reduced-basis discretization of partial differential equations, Comptes Rendus Mathematique, 339 (2004), pp. 667 – 672.
• [5] S. Boyaval, C. Le Bris, T. Lelievre, Y. Maday, N. C. Nguyen, and A. T. Patera, Reduced basis techniques for stochastic problems, Archives of Computational Methods in Engineering, 17 (2010), pp. 435 – 454.
• [6] T. Bui-Thanh and M. Girolami, Solving large-scale PDE-constrained Bayesian inverse problems with Riemann manifold Hamiltonian Monte Carlo, Inverse Problems, 30 (2014), p. 114014.
• [7] D. Calvetti and E. Somersalo, Introduction to Bayesian Scientific Computing, Springer, 2007.
• [8] S. Chaturantabut, Nonlinear Model Reduction via Discrete Empirical Interpolation, PhD thesis, Rice University, Houston, 2011.
• [9] S. Chaturantabut and D. C. Sorensen, Nonlinear model reduction via discrete empirical interpolation, SIAM Journal on Scientific Computing, 32 (2010), pp. 2737 – 2764.
• [10] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White, MCMC methods for functions: Modifying old algorithms to make them faster, Statistical Science, 28 (2013), pp. 424 – 446.
• [11] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM Journal on Numerical Analysis, 19 (1982), pp. 400 – 408.
• [12] Z. Drmač and S. Gugercin, A new selection operator for the discrete empirical interpolation method: improved a priori error bound and extensions, SIAM Journal on Scientific Computing, 38 (2016), pp. A631 – A648.
• [13] S. C. Eisenstat and H. F. Walker, Choosing the forcing terms in an inexact Newton method, SIAM Journal on Scientific Computing, 17(1) (1996), pp. 16 – 32.
• [14] H. C. Elman and V. Forstall, Preconditioning techniques for reduced basis methods for parameterized elliptic partial differential equations, SIAM Journal on Scientific Computing, 37(5) (2015), pp. S177 – S194.
• [15] H. C. Elman and V. Forstall, Numerical solution of the steady-state Navier-Stokes equations using empirical interpolation methods, Computer Methods in Applied Mechanics and Engineering, 317 (2017), pp. 380 – 399.
• [16] H. C. Elman, D. J. Silvester, and A. J. Wathen, Finite Elements and Fast Iterative Solvers, 2nd ed., Oxford University Press, 2014.
• [17] H. P. Flath, L. C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33 (2011), pp. 407 – 432.
• [18] V. Forstall, Iterative Solution Methods for Reduced-order Models of Parameterized Partial Differential Equations, PhD thesis, University of Maryland, College Park, 2015.
• [19] M. A. Grepl, Y. Maday, N. C. Nguyen, and A. T. Patera, Efficient reduced-basis treatment of nonaffine and nonlinear partial differential equations, Mathematical Modelling and Numerical Analysis, 41(3) (2007), pp. 575 – 605.
• [20] H. Haario, E. Saksman, and J. Tamminen, An adaptive Metropolis algorithm, Bernoulli, 7 (2001), pp. 223 – 242.
• [21] J. S. Hesthaven, G. Rozza, and B. Stamm, Certified Reduced Basis Methods for Parametrized Partial Differential Equations, Springer, 2016.
• [22] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Springer, 2005.
• [23] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, 1995.
• [24] T. A. Manteuffel and S. V. Parter, Preconditioning and boundary conditions, SIAM Journal on Numerical Analysis, 27 (1990), pp. 656 – 694.
• [25] J. Martin, L. C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34 (2012), pp. A1460 – A1487.
• [26] A. Napov and Y. Notay, An algebraic multigrid method with guaranteed convergence rate, SIAM Journal on Scientific Computing, 34 (2012), pp. A1079 – A1109.
• [27] Y. Notay, Aggregation-based algebraic multigrid for convection-diffusion equations, SIAM Journal on Scientific Computing, 34 (2012), pp. A2288 – A2316.
• [28] Y. Notay, An aggregation-based algebraic multigrid method, Electronic Transactions on Numerical Analysis, 37 (2012), pp. 123 – 146.
• [29] B. Peherstorfer, D. Butnaru, K. Willcox, and H.-J. Bungartz, Localized discrete empirical interpolation method, SIAM Journal on Scientific Computing, 36 (2014), pp. A168 – A192.
• [30] B. Peherstorfer, Z. Drmač, and S. Gugercin, Stabilizing discrete empirical interpolation via randomized and deterministic oversampling, arXiv preprint arXiv:1808.10473, (2018).
• [31] B. Peherstorfer and K. Willcox, Online adaptive model reduction for nonlinear systems via low-rank updates, SIAM Journal on Scientific Computing, 37 (2015), pp. A2123 – A2150.
• [32] A. Quarteroni, A. Manzoni, and F. Negri, Reduced Basis Methods for Partial Differential Equations, Springer, 2016.
• [33] S. Ravindran, A reduced-order approach for optimal control of fluids using proper orthogonal decomposition, International Journal for Numerical Methods in Fluids, 34 (2000), pp. 425 – 448.
• [34] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2004.
• [35] Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2003.
• [36] A. K. Saibaba, Randomized discrete empirical interpolation method for nonlinear model reduction, arXiv preprint arXiv:1903.00911, (2019).
• [37] R. C. Smith, Uncertainty Quantification: Theory, Implementation, and Applications, SIAM, 2014.