1 Introduction
Continuous advances in numerical methods and computational technology have made it feasible to simulate large-scale physical phenomena arising in applications such as weather prediction, computer vision, and medical imaging. Mathematical models are widely used in practice to predict the behavioral patterns of such physical processes. In the applications we consider, the mathematical models are typically described by systems of partial differential equations (PDEs). However, parameters that are needed for a full description of the mathematical models, such as initial and boundary conditions, or coefficients, are typically unknown and need to be inferred from experimental data by solving an inverse problem. The acquisition of data is usually a laborious or expensive process and has a certain cost associated with it. Due to budgetary or physical considerations, oftentimes only a limited amount of data can be collected. Even in applications where collecting data is relatively cheap, processing large amounts of data can be computationally cumbersome; moreover, a poor design may waste resources or miss important information regarding the parameters of interest. Therefore, it is important to control the experimental conditions for data acquisition in a way that makes optimal use of resources to accurately reconstruct or infer the parameters of interest. This is known as Optimal Design of Experiments (ODE).
In this article we adopt the Bayesian approach for solving inverse problems. The Bayesian approach has the following ingredients: the data, the mathematical model, the statistical description of the observational noise, and the prior information about the parameters we wish to infer. Bayes' theorem is used to combine these ingredients to produce the posterior distribution, which encapsulates the uncertainty in every aspect of the inverse problem. The posterior distribution can be interrogated in various ways: one can compute the maximum a posteriori probability (MAP) estimate, i.e., the posterior mode of the parameter; draw samples from this distribution; or compute the conditional mean. The Bayesian approach to ODE aims to minimize various measures of uncertainty in the inferred parameters by minimizing certain criteria based on the posterior distribution. Popular examples of design criteria include the Bayesian A- and D-optimal criteria
[1, 2, 3]. For Gaussian posteriors, the A- and D-optimal criteria are defined as the trace and the log-determinant of the posterior covariance matrix, respectively. ODE is an active area of research [4, 1, 3, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. Specifically, in recent years a number of advances have been made on optimal design of experiments for large-scale applications. The articles [25, 8, 9, 26, 27, 15, 18, 21] target A-optimal experimental designs for large-scale inverse problems. Fast algorithms for computing the D-optimal experimental design criterion, given by expected information gain, for nonlinear inverse problems were introduced in [12, 16, 28]. An efficient greedy algorithm for computing Bayesian D-optimal designs, with correlated observations, is introduced in [29]. Choosing a D-optimal experimental design that targets a specific region in the parameter space is discussed in [30], where the optimal design minimizes the marginal uncertainties in the region of interest. The works [19] and [31] address theory and computational methods for Bayesian D-optimal design in infinite-dimensional Bayesian linear inverse problems. Motivated by goal-oriented approaches for parameter dimensionality reduction [32, 33, 34], our paper presents theory and methods for goal-oriented optimal design of experiments (GOODE).
There are two potential drawbacks in the standard Bayesian approach for ODE. First, in certain applications, what may be of interest is not the reconstructed parameter in itself, but some prediction quantity involving the reconstructed parameter. In this situation, it may be desirable to deploy valuable resources to collect experimental data so as to minimize the uncertainty in the end goal, i.e., the prediction, rather than the reconstructed quantity. Second, the reconstructed parameters are often spatial images or infinite-dimensional functions. When discretized on a fine-scale grid, the resulting parameter dimension is very high. The posterior covariance matrix is also very high dimensional; forming and storing this covariance matrix explicitly is computationally infeasible on problems discretized at very fine grid resolutions. Consequently, evaluating the optimal design criteria is challenging. Randomized matrix methods
[35, 36] have been instrumental in addressing such computational challenges. When the dimension of the predictions is smaller than the dimension of the reconstructed parameter, working in the prediction space may be computationally beneficial. These two reasons—the need for designs tailored to predictions and the computational savings offered by targeting low-dimensional prediction quantities—motivate us to propose goal-oriented criteria, and devise efficient algorithms for their computation and optimization. As a motivating application, consider the transport of a contaminant in an urban environment. The inverse problem of interest here seeks to identify the source of the contaminant from measurements collected at sensor locations. The standard Bayesian approach to ODE involves controlling the sensor locations in order to reconstruct the source (represented as a spatial function) with minimized uncertainty over the entire domain. On the other hand, if instead of determining the initial condition, the goal is to predict the average contaminant around a building after a certain amount of time has elapsed, then the experimental design should explicitly account for this goal. This application is explored in detail in Sections 4 and 5. As a preview, in Figure 1 we show optimal sensor placements corresponding to three different goals: roughly speaking, the left, middle, and right figures depict sensor placements that are focused on the first building, the second building, and both buildings, respectively. This shows immediately that incorporating the end goal, i.e., the target prediction, in the ODE problem results in different sensor placements, which may be valuable in practice. More details are provided in Section 5.
The main contributions of this article are as follows. We propose two goal-oriented criteria—A-GOODE and D-GOODE—that are analogues of the A- and D-optimality criteria in classical Bayesian ODE. We investigate the properties of these two criteria by explaining connections to the classical Bayesian ODE criteria, and provide interpretations for the criteria by using tools from information theory. Then, we propose a gradient-based optimization framework for GOODE using these two criteria. To facilitate this, we derive expressions for the gradients of these objective functions, along with a computational recipe for computing them, for large-scale linear inverse problems governed by time-dependent PDEs. Our proposed strategy is implemented within the context of optimal sensor placement for the class of inverse problems under study. We enforce sparsity of the sensor placements using an ℓ₁-norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter. In addition, we present comprehensive numerical results, in the context of initial state inversion in a time-dependent advection-diffusion equation, to test the performance of the criteria and the proposed algorithms. The specific model problem under study is motivated by the inverse problem of contaminant source identification.
This article is organized as follows. Section 2 presents a brief overview of Bayesian inverse problems and optimal design of experiments. Section 3 formulates the goal-oriented optimal design of experiments problem, and presents the computational methods for solving such problems. Section 4 describes the model inverse advection-diffusion problem and the setup of numerical experiments used to test the proposed methods. Numerical results are detailed in Section 5. Concluding remarks are given in Section 6.
2 Background
In this section, we briefly review the Bayesian formulation of an inverse problem, and some basics from Bayesian optimal design of experiments.
2.1 Bayesian inverse problem
Consider the problem of reconstructing an unknown parameter m using noisy data d and a model
(1) d = F m + η,
where η is a centered random variable that models measurement noise. Specifically, we consider a Gaussian noise model η ~ N(0, Γ_noise). We consider the case where d ∈ R^{n_d} is a (finite-dimensional) vector of measurement data, m is an element of an appropriate infinite-dimensional real separable Hilbert space, and F is a continuous linear transformation, which we will refer to as the parameter-to-observable map. In the applications we target, m ∈ L²(𝒟), where 𝒟 is a bounded domain in R^d, with d = 1, 2, or 3. The infinite-dimensional formulation, finite-element discretization, and numerical solution of this problem have been addressed in detail in [37], under the assumption of a Gaussian prior and an additive Gaussian noise model, which is the setting we consider here. To keep the presentation simple, we consider the discretized version of the problem. However, in what follows, we pay close attention to the issues pertaining to discretization of the infinite-dimensional problem and the infinite-dimensional limit. We denote the Gaussian prior for the discretized parameter m ∈ R^n as N(m_pr, Γ_pr), and assume Γ_pr is symmetric positive definite. Letting F denote the discretized parameter-to-observable map, the additive Gaussian noise assumption leads to the Gaussian likelihood,
(2) π_like( d | m ) ∝ exp( −½ ‖ F m − d ‖²_{Γ_noise^{-1}} ), where ‖x‖²_A := x^T A x for a symmetric positive definite matrix A.
That is, d | m ~ N( F m, Γ_noise ). It is well known (see e.g., [38, Chapter 3]) that, in this setting, the posterior is also a Gaussian N( m_post, Γ_post ) with
(3) m_post = Γ_post ( F* Γ_noise^{-1} d + Γ_pr^{-1} m_pr ),   Γ_post = ( F* Γ_noise^{-1} F + Γ_pr^{-1} )^{-1}.
It is also worth mentioning that the posterior mean is the minimizer of the following functional:
(4) J(m) := ½ ‖ F m − d ‖²_{Γ_noise^{-1}} + ½ ‖ m − m_pr ‖²_{Γ_pr^{-1}}.
The Hessian of the above functional is given by
(5) H = H_misfit + Γ_pr^{-1},
where H_misfit := F* Γ_noise^{-1} F denotes the Hessian of the data-misfit term in (4).
Note that for a linear operator A, we use A* to denote its adjoint. The reason we do not simply use the matrix transpose is as follows. The underlying infinite-dimensional Bayesian inverse problem is formulated on L²(𝒟) equipped with the standard L² inner product. As noted in [37], when discretizing the Bayesian inverse problem, we also need to pay attention to the choice of inner products. Namely, the discretized parameter space has to be equipped with a discretized L² inner product. If the finite element method is used to discretize the inverse problem, then the appropriate inner product to use is the Euclidean inner product weighted by the finite-element mass matrix M. That is, for u, v in R^n, we use ⟨u, v⟩_M := u^T M v. In the infinite-dimensional setting, the forward operator is a mapping from L²(𝒟) to R^{n_d}; the domain is endowed with the L² inner product and the codomain is equipped with the Euclidean inner product, which we denote by ⟨·,·⟩. Upon discretization, we work with the discretized forward operator F : R^n → R^{n_d}. For m ∈ R^n and d ∈ R^{n_d}, we have
⟨ F m, d ⟩ = (F m)^T d = m^T F^T d = ⟨ m, M^{-1} F^T d ⟩_M;
from this we note F* = M^{-1} F^T. See [37] for further details on finite-element discretization of Bayesian inverse problems.
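The relation F* = M^{-1} F^T can be illustrated numerically. In the sketch below, a random symmetric positive definite matrix stands in for a finite-element mass matrix (all names and dimensions are illustrative, not taken from the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, nd = 12, 5                      # parameter and data dimensions
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)        # SPD stand-in for a FE mass matrix
F = rng.standard_normal((nd, n))   # discretized forward operator

F_adj = np.linalg.solve(M, F.T)    # F* = M^{-1} F^T

m = rng.standard_normal(n)
d = rng.standard_normal(nd)
lhs = np.dot(F @ m, d)             # <F m, d>, Euclidean inner product
rhs = np.dot(m, M @ (F_adj @ d))   # <m, F* d>_M, mass-weighted inner product
assert np.isclose(lhs, rhs)
```

The assertion verifies the defining adjoint identity ⟨F m, d⟩ = ⟨m, F* d⟩_M for random test vectors.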
2.2 Bayesian optimal design of experiments
Next, we turn to the problem of optimal design of experiments (ODE) for Bayesian linear inverse problems governed by PDEs, which has received a great amount of attention in recent years [39, 11, 14, 12, 40]. In a standard Bayesian experimental design problem, we seek an experimental design that results in minimized posterior uncertainty in the inferred parameter m. The way one chooses to quantify posterior uncertainty leads to the choice of the design criterion [1, 41, 42, 2, 43, 44, 45, 46, 4, 3]. When the posterior distribution is Gaussian, the standard experimental design criteria are defined as functionals of Γ_post(w); here w denotes a generic vector of experimental design parameters. For example, an A-optimal design is found by minimizing the trace of the posterior covariance operator:
(6) Ψ_A(w) := tr( Γ_post(w) ).
On the other hand, a D-optimal design is obtained by minimizing the log-determinant of the posterior covariance:
(7) Ψ_D(w) := log det( Γ_post(w) ).
See e.g., [47, 1, 25, 26] for further details. As discussed in the introduction, the present work is focused on goaloriented optimal design of experiments. This is detailed in the next section.
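For a small synthetic linear-Gaussian problem, the criteria in (6) and (7) can be evaluated directly from the posterior covariance. The matrices below are illustrative choices, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, nd = 20, 8
F = rng.standard_normal((nd, n))        # parameter-to-observable map
Gamma_pr = np.eye(n)                    # prior covariance
Gamma_noise = 0.1 * np.eye(nd)          # noise covariance

# Gamma_post = (F^T Gamma_noise^{-1} F + Gamma_pr^{-1})^{-1}
H = F.T @ np.linalg.inv(Gamma_noise) @ F + np.linalg.inv(Gamma_pr)
Gamma_post = np.linalg.inv(H)

psi_A = np.trace(Gamma_post)                 # A-optimality criterion (6)
psi_D = np.linalg.slogdet(Gamma_post)[1]     # D-optimality criterion (7)

# Observations can only reduce uncertainty relative to the prior
assert psi_A < np.trace(Gamma_pr)
assert psi_D < np.linalg.slogdet(Gamma_pr)[1]
```

The final assertions reflect the fact that the posterior covariance is dominated by the prior covariance in the Loewner order, so both criteria decrease when data are incorporated.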
3 Goal-Oriented Optimal Design of Experiments
Classical Bayesian optimal experimental design constructs experimental designs that result in minimized posterior uncertainty in the inversion parameter m. On the other hand, goal-oriented optimal design of experiments (GOODE) seeks designs that minimize the uncertainty associated with a goal quantity of interest (QoI), which is a function of m. In this article, we consider a goal QoI of the form
(8) z = P m,
where P is a linear operator, which we call the goal operator. The parameter m is inferred by solving an inverse problem, as described in the previous section.
As mentioned before, to keep the presentation simple, we work with discretized quantities. The goal operator P is thus the discretization of a linear transformation that maps the inversion parameter, an element of L²(𝒟), to a goal QoI. We consider the case where the goal QoI is finite-dimensional; i.e., the goal operator has a finite-dimensional range independent of the discretization. This is motivated by many applications in which the end goal is either a scalar or a relatively low-dimensional vector. We denote the dimension of the goal by n_g.
Assuming the Gaussian linear setting presented above, the prior and posterior laws of the goal QoI z defined in (8) can be obtained as follows: the prior law of z is N( P m_pr, Σ_pr ), with
(9) Σ_pr = P Γ_pr P*,
where P* is the adjoint of the goal operator P. The posterior law of the goal QoI z, conditioned on the observations d, is also Gaussian and is given by N( P m_post, Σ_post ), where
(10) Σ_post = P Γ_post P*.
Below, we describe the GOODE criteria under consideration, and present a scalable framework for computing these criteria and their derivatives with respect to the design parameters, which exploits the fact that the end goal is of much lower dimension than the inversion parameter.
We focus on goal-oriented A- and D-optimal experimental designs (see Section 3.1). We also examine these criteria from a decision-theoretic point of view. In the case of goal-oriented Bayesian A-optimality, we show that minimizing the expected Bayes risk of the goal QoI is equivalent to minimizing the trace of the covariance operator Σ_post(w). In the case of goal-oriented Bayesian D-optimality, we show that maximizing the expected information gain for the goal QoI is equivalent to minimizing the log-determinant of Σ_post(w). These results, which are natural extensions of the known results from classical Bayesian optimal experimental design theory, provide further insight into the interpretation of the presented GOODE criteria. In Section 3.2, we describe the precise definition of an experimental design vector that parameterizes a sensor placement and leads to the formulation of a suitable optimization problem. This is followed by a discussion of the optimization problems for finding A- and D-GOODE designs and a computational framework for computing the associated objective functions and their gradients in Section 3.3. Algorithmic descriptions of the proposed methodologies, along with a discussion of the computational cost, are outlined in Section 3.4. In Section 3.5, we further discuss the connections of the GOODE criteria to the corresponding classical experimental design criteria, and outline possible extensions to the case of nonlinear goal operators.
3.1 GOODE criteria
Let w, as before, denote a generic vector of experimental design parameters. (The precise definition of w, in the case of optimal sensor placement problems, will be provided later in this section.) In this section, we will assume that the goal operator P has full row rank, and consider a Bayesian linear inverse problem as formulated in Section 2.1. Here we propose and examine the implications of goal-oriented A- and D-optimal criteria.
Goal-oriented A-optimal criterion
The goal-oriented A-optimal design (A-GOODE) criterion is defined to be the trace of the posterior covariance matrix of the goal QoI:
(11) Φ_A(w) := tr( Σ_post(w) ) = tr( P Γ_post(w) P* ).
Two special cases are worth pointing out here. In the special case where P is a row vector c^T, we get Φ_A(w) = c^T Γ_post(w) c, which is the classical C-optimality criterion. On the other hand, if P = I, then this is nothing but the classical Bayesian A-optimality criterion.
Note that while Γ_post(w) is a high-dimensional operator, the goal operator P in practice usually has a low-dimensional range. Thus, Σ_post(w) = P Γ_post(w) P* is in practice a low-dimensional operator. From a computational point of view, this means that computing the A-GOODE criterion is significantly cheaper than computing the classical A-optimality criterion, which is tr( Γ_post(w) ).
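The dimension reduction, and the two special cases noted above, can be checked on a toy problem; the goal operator P below is a random illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, ng = 30, 3
A = rng.standard_normal((n, n))
Gamma_post = np.linalg.inv(A @ A.T + np.eye(n))  # generic SPD posterior covariance
P = rng.standard_normal((ng, n))                  # low-dimensional goal operator

Sigma_post = P @ Gamma_post @ P.T                 # only ng x ng
phi_A = np.trace(Sigma_post)                      # A-GOODE criterion (11)

# Special case: P = I recovers the classical A-optimality criterion
assert np.isclose(np.trace(np.eye(n) @ Gamma_post @ np.eye(n).T),
                  np.trace(Gamma_post))

# Special case: a single-row P = c^T gives the C-optimality value c^T Gamma_post c
c = P[0]
assert np.isclose(np.trace(c[None, :] @ Gamma_post @ c[None, :].T),
                  c @ Gamma_post @ c)
```

Note that the trace is taken of an ng × ng matrix rather than an n × n one, which is the source of the computational savings when ng ≪ n.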
Minimizing the A-GOODE criterion can be understood as minimizing the average variance of the goal QoI. However, having a measure of the statistical quality of the estimator ẑ = P m_post is also of crucial importance. Theorem 3.1 below addresses this by relating the goal-oriented A-optimal criterion to the expected Bayes risk of ẑ, which is nothing but the mean squared error averaged over the prior distribution.
Theorem 3.1.
Consider a Bayesian linear inverse problem as formulated in Section 2.1. Let ẑ = P m_post be the estimator for the goal QoI z = P m, where P has full row rank. Then,
(12) E [ ‖ ẑ − z ‖² ] = tr( P Γ_post P* ),
where the expectation is with respect to the joint distribution of the parameter m and the data d.
Proof.
See Appendix A. ∎
Note that, for notational convenience, and since the result holds pointwise in w, we have suppressed the dependence on w in the statement of the theorem.
Goal-oriented Bayesian D-optimality criterion
The goal-oriented D-optimal design (D-GOODE) criterion is taken to be the log-determinant of the posterior end-goal covariance Σ_post(w):
(13) Φ_D(w) := log det( Σ_post(w) ).
To explain the motivation for this optimality criterion, we consider the Kullback–Leibler (KL) divergence [48] from the posterior to the prior distribution of the end-goal QoI z:
(14) D_KL( π_post^z ‖ π_pr^z ) = ∫ π_post^z(z) log ( π_post^z(z) / π_pr^z(z) ) dz.
Since both distributions are Gaussian, the KL divergence has a closed-form expression, which will simplify the calculations considerably. The expected information gain is defined as
(15) Φ_EIG(w) := E_d [ D_KL( π_post^z ‖ π_pr^z ) ],
where the expectation is with respect to the marginal distribution of the data d.
The classical Bayesian D-optimality criterion is related to the expected information gain, quantified by the expected KL divergence from the posterior distribution to the prior distribution. A similar relation can be derived for the goal-oriented D-optimality criterion. We present the following result relating Φ_D(w) and the expected information gain Φ_EIG(w):
Theorem 3.3.
Consider a Bayesian linear inverse problem as formulated in Section 2.1. Let z = P m be an end-goal QoI, where P has full row rank. Then,
(16) Φ_EIG(w) = ½ ( log det Σ_pr − log det Σ_post(w) ) = ½ ( log det Σ_pr − Φ_D(w) ).
Proof.
See Appendix D. ∎
The significance of Theorem 3.3 is that it says that minimizing Φ_D(w), with the appropriate constraints on w, amounts to maximizing the expected information gain under the same constraints.
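The identity in Theorem 3.3 can be verified by Monte Carlo on a small problem: averaging the KL divergence from the goal posterior to the goal prior over the prior-predictive data distribution should recover ½(log det Σ_pr − log det Σ_post). All problem matrices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, nd, ng, N = 6, 4, 2, 50_000
F = rng.standard_normal((nd, n))
P = rng.standard_normal((ng, n))
Gamma_pr, Gamma_noise = np.eye(n), 0.5 * np.eye(nd)

Gamma_post = np.linalg.inv(F.T @ np.linalg.inv(Gamma_noise) @ F
                           + np.linalg.inv(Gamma_pr))
S_pr = P @ Gamma_pr @ P.T
S_post = P @ Gamma_post @ P.T
logdet = lambda A: np.linalg.slogdet(A)[1]

eig_closed = 0.5 * (logdet(S_pr) - logdet(S_post))   # right-hand side of (16)

# Monte Carlo: sample data from the prior predictive (m_pr = 0), average the KL
M = rng.multivariate_normal(np.zeros(n), Gamma_pr, N)
D = M @ F.T + rng.multivariate_normal(np.zeros(nd), Gamma_noise, N)
Mu_post = D @ np.linalg.inv(Gamma_noise) @ F @ Gamma_post  # posterior means
Z = Mu_post @ P.T                                          # goal posterior means
S_pr_inv = np.linalg.inv(S_pr)
quad = np.einsum('ij,jk,ik->i', Z, S_pr_inv, Z)
kl = 0.5 * (np.trace(S_pr_inv @ S_post) - ng + quad
            + logdet(S_pr) - logdet(S_post))
eig_mc = kl.mean()

assert abs(eig_mc - eig_closed) < 0.05
```

For the linear Gaussian case only the posterior-mean term varies with the data, which is why the sample average converges quickly.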
Remark 3.4.
Note that in the case of P = I, and in the limit of increasingly fine discretizations, the criterion (13) is meaningless. The reason for this is that in the infinite-dimensional limit
Γ_post is a positive self-adjoint trace-class operator and thus its eigenvalues accumulate at zero. On the other hand, in the case of
P = I, (16) simplifies to Φ_EIG(w) = ½ ( log det Γ_pr − log det Γ_post(w) ), which recovers the known result [19] that for a Gaussian linear Bayesian inverse problem
(17) Φ_EIG(w) = ½ log det ( I + Γ_pr^{1/2} H_misfit(w) Γ_pr^{1/2} ),
and note that the quantity on the right is well defined in the infinite-dimensional limit.
The previous remark shows that (17) is the correct expression for the expected information gain to use in the case of P = I. While the derivation of (17) here is done for the discretized version of the problem, as shown in [19] such an expression for the expected information gain can be derived in the infinite-dimensional Hilbert space setting as well.
3.2 Goal-oriented sensor placement
The experimental conditions we choose to control are the sensor locations in the domain at which data are to be collected. This can be expressed as an optimal design of experiments (ODE) problem, as we now demonstrate. Our strategy is to fix an array of n_s candidate locations for sensors and then select an optimal subset of the candidate sensor locations. In this context, a design is a binary vector w ∈ {0, 1}^{n_s}, where each entry corresponds to whether or not a particular sensor is active. Practical considerations, such as budgetary or physical constraints, limit the number of sensors that can be chosen. In this context, ODE seeks to identify the best k sensor locations out of the n_s candidate sensor locations.
We consider Bayesian inverse problems with time-dependent linear forward models. In this case, F maps the inversion parameter m to spatiotemporal observations of the state at the n_s sensor locations and the n_t observation times. In what follows, we make use of the notation F_k for the forward model that maps m to the equivalent sensor measurements at observation time instance t_k.
In the present formulation, the design w enters the Bayesian inverse problem through the data likelihood:
(18) π_like( d | m ; w ) ∝ exp( −½ ‖ F m − d ‖²_{Γ_noise^{-1}(w)} ),
where Γ_noise^{-1}(w) := Γ_noise^{-1/2} W_Γ Γ_noise^{-1/2}, and W_Γ is a block-diagonal matrix with diagonal blocks diag(w). In particular, W_Γ = I_{n_t} ⊗ diag(w), where I_{n_t} is the n_t × n_t identity matrix and ⊗ is the Kronecker product. The noise covariance is in general a block-diagonal matrix Γ_noise = diag( Γ_1, …, Γ_{n_t} ), where Γ_k is the spatial noise covariance matrix corresponding to the k-th observation time, and n_t is the number of observation time instances. In the present work, we assume that observations are uncorrelated in space and time, and thus Γ_noise is a diagonal matrix. While this assumption is not necessary, it simplifies the formulation considerably. Since W_Γ is also diagonal, we have the convenient relation Γ_noise^{-1}(w) = W_Γ Γ_noise^{-1} = Γ_noise^{-1} W_Γ.
The posterior covariance of the parameter m is
(19) Γ_post(w) = ( F* Γ_noise^{-1}(w) F + Γ_pr^{-1} )^{-1}.
Therefore, the posterior distribution of the goal QoI z, conditioned on the observations d and the design w, is the Gaussian N( P m_post(w), Σ_post(w) ), with
(20) Σ_post(w) = P Γ_post(w) P*.
The above definition of Σ_post(w) will be substituted in the GOODE criteria described above.
Identifying the best k sensor locations out of a set of n_s candidate sensor locations is a combinatorial problem that is computationally intractable even for modest values of n_s and k. A standard approach [49, 25, 8, 9, 15], which we follow in this article, is to relax the binary condition on the design weights and let w_i ∈ [0, 1], i = 1, …, n_s. To ensure that only a limited number of sensors are allowed to be active, we use sparsifying penalty functions to control the sparsity of the optimal designs.
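The relaxation is typically paired with a thresholding step (described next) that converts a continuous solution back into a binary design. A minimal sketch of the top-k rule (the function name is ours):

```python
import numpy as np

def threshold_design(w, k):
    """Binarize a relaxed design: activate the k sensors with largest weights."""
    w_bin = np.zeros_like(w)
    w_bin[np.argsort(w)[-k:]] = 1.0
    return w_bin

w_relaxed = np.array([0.92, 0.08, 0.71, 0.05, 0.33])
w_binary = threshold_design(w_relaxed, k=2)
assert w_binary.tolist() == [1.0, 0.0, 1.0, 0.0, 0.0]
```

Here the two largest relaxed weights are kept, activating exactly k sensors.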
In the present formulation, non-binary weights are difficult to interpret and implement. Thus, some form of thresholding scheme is required to turn a computed optimal design vector into a binary design. In this work, we adopt the following heuristic for thresholding: assuming that only
k sensors are to be placed among the n_s candidate locations, as is common practice, the locations corresponding to the k highest weights can be selected. This means that the corresponding weights are set to one whereas all other weights are set to zero, therefore giving a "near-optimal" solution to the original binary problem [50]. An alternative approach to obtaining binary weights, not considered in this article, is to successively approximate the ℓ₀ "norm" by employing a sequence of penalty functions yielding a binary solution [15]. The issue of sparsification and the choice of sparsifying penalty function is elaborated further in the description of the GOODE optimization problem formulation below and in the numerical results section.
3.3 The optimization problem
The generic form of the goal-oriented experimental design problem involves an optimization problem of the form
(21) min_w  Φ(w) + α P(w)
subject to  w ∈ [0, 1]^{n_s},
where Φ(w) is the specific design criterion, P(w) is a penalty function, and α ≥ 0 is a user-defined penalty parameter that controls the sparsity of the design. In this work, we make the choice to incorporate an ℓ₁ norm to control the sparsity of the design; that is, we set
(22) P(w) = ‖w‖₁ = Σ_{i=1}^{n_s} w_i,
where the last equality follows since the weights are non-negative.
Depending on whether we want goal-oriented A- or D-optimality, Φ is either Φ_A or Φ_D, respectively.
The optimization problem is solved using a gradient-based approach, which requires the derivation of the gradient of the objective. Since the design weights are restricted to the interval [0, 1], the ℓ₁ penalty reduces to the sum of the weights and is thus differentiable, and the gradient of the penalty term in (21) is α·1, where 1 is a vector of ones. In the sequel, we derive expressions for the gradients of Φ_A and Φ_D.
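As a preview of the overall framework, the sketch below takes a projected-gradient step on an A-GOODE objective with ℓ₁ penalty, for an illustrative time-independent toy model with Γ_pr = I and uncorrelated noise Γ_noise^{-1}(w) = diag(w)/σ²; the gradient expression here is derived directly for this toy model, and all matrices and step sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, ns, ng = 25, 10, 2
F = rng.standard_normal((ns, n))    # one sensor reading per candidate location
P = rng.standard_normal((ng, n))    # goal operator
sigma2, alpha = 0.1, 0.5            # noise variance, sparsity penalty

def post_cov(w):
    # Gamma_post(w) with Gamma_pr = I and diagonal weighted noise precision
    return np.linalg.inv(F.T @ np.diag(w / sigma2) @ F + np.eye(n))

def objective(w):
    return np.trace(P @ post_cov(w) @ P.T) + alpha * w.sum()

def gradient(w):
    G = F @ post_cov(w) @ P.T       # ns x ng
    return -(G * G).sum(axis=1) / sigma2 + alpha

w0 = 0.5 * np.ones(ns)
w1 = np.clip(w0 - 1e-3 * gradient(w0), 0.0, 1.0)   # one projected-gradient step
assert objective(w1) < objective(w0)
```

A practical solver would iterate such steps (or use a bound-constrained quasi-Newton method) until convergence, then threshold the relaxed weights.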
3.3.1 Gradient of Φ_A
The gradient of Φ_A with respect to the design w is given by
(23a) ∇_w Φ_A(w) = − Σ_{j=1}^{n_g} Σ_{k=1}^{n_t} g_{jk} ⊙ g_{jk},
where ⊙ is the pointwise Hadamard product, and
(23b) g_{jk} = Γ_k^{−1/2} F_k Γ_post(w) P* e_j.
Here e_j is the j-th coordinate vector in R^{n_g}, and F_k is the forward model that maps the parameter to the equivalent observation at time instance t_k, k = 1, …, n_t. See Appendix B for the derivation of this gradient expression.
In the case where the forward model is time-independent, the expressions simplify to
(24) ∇_w Φ_A(w) = − Σ_{j=1}^{n_g} g_j ⊙ g_j,  with  g_j = Γ_noise^{−1/2} F Γ_post(w) P* e_j,
where e_j is the j-th coordinate vector in R^{n_g}.
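A gradient of the form (24) can be checked against finite differences on a toy time-independent problem (diagonal noise with variance σ², Γ_pr = I; all matrices illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, ns, ng = 15, 6, 2
F = rng.standard_normal((ns, n))
P = rng.standard_normal((ng, n))
sigma2 = 0.2

def phi_A(w):
    Gp = np.linalg.inv(F.T @ np.diag(w / sigma2) @ F + np.eye(n))
    return np.trace(P @ Gp @ P.T)

def grad_phi_A(w):
    Gp = np.linalg.inv(F.T @ np.diag(w / sigma2) @ F + np.eye(n))
    G = F @ Gp @ P.T                      # columns: F Gamma_post P* e_j
    return -(G * G).sum(axis=1) / sigma2  # sum of Hadamard squares over j

w = rng.uniform(0.2, 0.8, ns)
h = 1e-6
fd = np.array([(phi_A(w + h * np.eye(ns)[i]) - phi_A(w - h * np.eye(ns)[i])) / (2 * h)
               for i in range(ns)])
assert np.allclose(fd, grad_phi_A(w), rtol=1e-3, atol=1e-7)
```

Central differences agree with the analytic gradient to within discretization and roundoff error, which is a useful sanity check before embedding the gradient in an optimizer.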
3.3.2 Gradient of Φ_D
We present two alternate ways of computing the gradient that are equivalent, but differ in computational cost depending on the number of sensor-time observations n_s · n_t and the dimension n_g of the goal QoI. In the first formulation we assume that n_g ≤ n_s · n_t. We can compute the gradient as
(25a) ∇_w Φ_D(w) = − Σ_{j=1}^{n_g} Σ_{k=1}^{n_t} h_{jk} ⊙ h_{jk},
where
(25b) h_{jk} = Γ_k^{−1/2} F_k Γ_post(w) P* Σ_post^{−1/2}(w) e_j,
and e_j is the j-th coordinate vector in R^{n_g}. The details are given in Appendix C.1. This formulation is especially suited for large-scale four-dimensional variational (4D-Var) data assimilation applications [51, 52, 53], including weather forecasting and ocean simulations. Note that, in this formulation, evaluating the gradient requires a square root of the goal posterior covariance matrix Σ_post(w).
Evaluating (25) requires solving n_g linear systems with the Hessian. If n_g > n_s · n_t, the following alternative formulation of the gradient of Φ_D will be computationally beneficial:
(26a) [ ∇_w Φ_D(w) ]_i = − Σ_{k=1}^{n_t} v_{ik}^T Σ_post^{−1}(w) v_{ik},  i = 1, …, n_s,
where v_{ik} is now:
(26b) v_{ik} = P Γ_post(w) F_k^* Γ_k^{−1/2} e_i,
and e_i is the i-th coordinate vector in R^{n_s}. The derivation details are given in Appendix C.2. Note that the two gradient expressions are equivalent in exact arithmetic.
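The equivalence of the two gradient forms can be checked numerically on a dense toy problem (single observation time, diagonal noise, Γ_pr = I; all matrices illustrative), and both validated against finite differences of the log-determinant:

```python
import numpy as np

rng = np.random.default_rng(6)
n, ns, ng = 15, 6, 2
F = rng.standard_normal((ns, n))
P = rng.standard_normal((ng, n))
sigma2 = 0.2

def goal_post_cov(w):
    Gp = np.linalg.inv(F.T @ np.diag(w / sigma2) @ F + np.eye(n))
    return Gp, P @ Gp @ P.T

w = rng.uniform(0.2, 0.8, ns)
Gp, S_post = goal_post_cov(w)
G = F @ Gp @ P.T                                   # ns x ng

# Form 1: whiten with a Cholesky factor of S_post^{-1}, then Hadamard squares
R = np.linalg.cholesky(np.linalg.inv(S_post))
K = G @ R
g1 = -(K * K).sum(axis=1) / sigma2

# Form 2: quadratic forms with S_post^{-1}, one per sensor
g2 = -np.einsum('ij,jk,ik->i', G, np.linalg.inv(S_post), G) / sigma2
assert np.allclose(g1, g2)

# Both agree with finite differences of log det S_post(w)
h = 1e-6
fd = np.array([(np.linalg.slogdet(goal_post_cov(w + h * np.eye(ns)[i])[1])[1]
                - np.linalg.slogdet(goal_post_cov(w - h * np.eye(ns)[i])[1])[1]) / (2 * h)
               for i in range(ns)])
assert np.allclose(fd, g1, rtol=1e-3, atol=1e-7)
```

The choice between the two forms is purely a matter of cost: the first touches one system per goal direction, the second one per sensor observation.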
3.4 Implementation and computational considerations
The main bottleneck in solving the A-GOODE and D-GOODE problems lies in the evaluation of the respective objective functions and the associated gradients. Here, we detail the steps of evaluating the objective function and the gradient for both the A-GOODE and D-GOODE problems. These steps are explained in Algorithms 1 and 2, respectively. For simplicity, we drop the penalty terms in the computations presented in the two algorithms.
The A-GOODE problem. Algorithm 1 outlines the main steps of evaluating the A-GOODE objective and the gradient (23). Notice that all loops over the end-goal dimension n_g in Algorithm 1, i.e., steps 1–3, 8–10, and 16–20, are embarrassingly parallel. Evaluating P* e_j, j = 1, …, n_g, is independent of the design w, and can be done offline and saved for later use. Step 2 requires a Hessian solve for each of the n_g vectors P* e_j, which can be done using the preconditioned conjugate gradient (CG) method. Each application of the Hessian requires a forward and an adjoint PDE solve. Using the prior covariance as a preconditioner, each solve requires O(r) CG iterations, i.e., O(r) PDE solves, where r is the numerical rank of the prior-preconditioned data-misfit Hessian; see [54, 55]. Additional forward solutions of the underlying PDE are required in the gradient-computation step of the algorithm. Therefore, the total number of PDE solves for objective and gradient evaluation scales linearly with n_g. These computations can be easily parallelized across n_g cores. Moreover, the applications of the inverse Hessian—the Hessian solves—can be accelerated by using low-rank approximations of the prior-preconditioned data-misfit Hessian [54]. Note that Algorithm 1 also requires n_g independent applications of P and its adjoint, which can be done in parallel.
Remark 3.5.
In Algorithm 1, P is the goal operator, n_g is the dimension of the end-goal, n_t is the number of observation time instances, w is the experimental design, H(w) is the weighted Hessian, Γ_k is the covariance of the measurement noise at time instance t_k, and F_k is the forward model that maps the parameter to the equivalent observation at observation time instance t_k.
The D-GOODE problem. Algorithm 2 describes the main steps for calculating the D-GOODE objective and the gradient expressions (25)–(26). Similar to Algorithm 1, all loops over the end-goal dimension in Algorithm 2, i.e., steps 1–3 and 14–22, are inherently parallel, as is the loop over the observation dimension. With a small end-goal space, the cost of the Cholesky factorization of Σ_post(w) is negligible compared to the PDE solves. The first form of the gradient, i.e., steps 12–24, requires n_g Hessian solves, along with forward PDE solves that can run completely in parallel. The second form of the gradient, i.e., steps 25–41, on the other hand requires a Hessian solve for each sensor-time observation, along with forward and adjoint solves of the underlying system of PDEs. As mentioned before, a low-rank approximation of the prior-preconditioned data-misfit Hessian can be used to accelerate the computations. Checkpointing is utilized in the second form of the gradient, i.e., steps 28–31. The checkpointed solutions are recalled in the adjoint solves in step 33.
Algorithm 2 requires applications of P and its adjoint for the objective function evaluation. As for the gradient, the first form requires additional applications of P*, and the second form additional applications of P. As before, the loops over the end-goal and observation dimensions are embarrassingly parallel.
3.5 Connections to classical ODE criteria and extensions
Here we further discuss the connections of the GOODE criteria to the corresponding classical experimental design criteria and an extension to nonlinear goal operators. We have already mentioned two important connections:

Taking P = I trivially recovers the Bayesian A- and D-optimal design criteria.

In the special case where P is a row vector c^T, we get Φ_A(w) = c^T Γ_post(w) c, which is the Bayesian C-optimality criterion.
Furthermore, if the vector c is randomly drawn from a distribution with mean zero and identity covariance, then the expected value of the C-optimality criterion is precisely the A-optimality criterion: E_c [ c^T Γ_post(w) c ] = tr( Γ_post(w) ).
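This is the familiar randomized trace-estimation identity, and it is easy to check by Monte Carlo (the covariance matrix below is an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 10, 20_000
A = rng.standard_normal((n, n))
Gamma = A @ A.T / n                     # stand-in posterior covariance

C = rng.standard_normal((N, n))         # N independent draws of c ~ N(0, I)
estimates = np.einsum('ij,jk,ik->i', C, Gamma, C)   # c^T Gamma c per draw
assert abs(estimates.mean() - np.trace(Gamma)) < 0.1 * np.trace(Gamma)
```

The sample mean of the quadratic forms converges to the trace, linking the C- and A-optimality criteria through random goal vectors.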