# Goal-Oriented Optimal Design of Experiments for Large-Scale Bayesian Linear Inverse Problems

We develop a framework for goal oriented optimal design of experiments (GOODE) for large-scale Bayesian linear inverse problems governed by PDEs. This framework differs from classical Bayesian optimal design of experiments (ODE) in the following sense: we seek experimental designs that minimize the posterior uncertainty in a predicted quantity of interest (QoI) rather than the estimated parameter itself. This is suitable for scenarios in which the solution of an inverse problem is an intermediate step and the estimated parameter is then used to compute a prediction QoI. In such problems, a GOODE approach has two benefits: the designs can avoid wastage of experimental resources by a targeted collection of data, and the resulting design criteria are computationally easier to evaluate due to the often low dimensionality of prediction QoIs. We present two modified design criteria, A-GOODE and D-GOODE, which are natural analogues of classical Bayesian A- and D-optimal criteria. We analyze the connections to other ODE criteria, and provide interpretations for the GOODE criteria by using tools from information theory. Then, we develop an efficient gradient-based optimization framework for solving the GOODE optimization problems. Additionally, we present comprehensive numerical experiments testing the various aspects of the presented approach. The driving application is the optimal placement of sensors to identify the source of contaminants in a diffusion and transport problem. We enforce sparsity of the sensor placements using an ℓ_1-norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter.

## Authors

• 7 publications
• 12 publications
• 14 publications
• ### A fast and scalable computational framework for goal-oriented linear Bayesian optimal experimental design: Application to optimal sensor placement

Optimal experimental design (OED) is a principled framework for maximizi...
02/12/2021 ∙ by Keyi Wu, et al. ∙ 0

• ### Optimal Experimental Design for Inverse Problems in the Presence of Observation Correlations

Optimal experimental design (OED) is the general formalism of sensor pla...
07/28/2020 ∙ by Ahmed Attia, et al. ∙ 0

• ### A sequential sensor selection strategy for hyper-parameterized linear Bayesian inverse problems

We consider optimal sensor placement for hyper-parameterized linear Baye...
11/23/2020 ∙ by Nicole Aretz-Nellesen, et al. ∙ 0

• ### Optimal design of large-scale Bayesian linear inverse problems under reducible model uncertainty: good to know what you don't know

We consider optimal design of infinite-dimensional Bayesian linear inver...
06/21/2020 ∙ by Alen Alexanderian, et al. ∙ 0

• ### Solving Inverse Problems for Steady-State Equations using A Multiple Criteria Model with Collage Distance, Entropy, and Sparsity

In this paper, we extend the previous method for solving inverse problem...
11/07/2019 ∙ by Herb Kunze, et al. ∙ 0

• ### Solving Optimal Experimental Design with Sequential Quadratic Programming and Chebyshev Interpolation

We propose an optimization algorithm to compute the optimal sensor locat...
12/13/2019 ∙ by Jing Yu, et al. ∙ 0

• ### Inverse heat source problem and experimental design for determining iron loss distribution

Iron loss determination in the magnetic core of an electrical machine, s...
03/23/2020 ∙ by Antti Hannukainen, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Continuous advances in numerical methods and computational technology have made it feasible to simulate large-scale physical phenomena such as weather systems, computer vision, and medical imaging. Mathematical models are widely used in practice to predict the behavioral patterns of such physical processes. In the applications we consider, the mathematical models are typically described by systems of partial differential equations (PDEs). However, parameters that are needed for a full description of the mathematical models, such as initial and boundary conditions, or coefficients, are typically unknown and need to be inferred from experimental data by solving an inverse problem. The acquisition of data is usually a laborious or expensive process and has a certain cost associated with it. Due to budgetary or physical considerations, often times, only a limited amount of data can be collected. Even in applications where collecting data is relatively cheap, processing large amounts of data can be computationally cumbersome, or a poor design may lead to wastage of resources, or may miss out on important information regarding the parameters of interest. Therefore, it is important to control the experimental conditions for data acquisition in a way that makes optimal use of resources to accurately reconstruct or infer the parameters of interest. This is known as Optimal Design of Experiments (ODE).

In this article we adopt the Bayesian approach for solving inverse problems. The Bayesian approach has the following ingredients: the data, the mathematical model, the statistical description of the observational noise, and the prior information about the parameters we wish to infer. Bayes’ theorem is used to combine these ingredients to produce the posterior distribution, which encapsulates the uncertainty in every aspect of the inverse problem. The posterior distribution can be interrogated in various ways: one can compute the peak of this distribution, called the maximum a posteriori probability (MAP) estimate, which estimates the posterior mode of the parameter, draw samples from this distributions, or compute the conditional mean. The Bayesian approach to ODE aims to minimize various measures of uncertainty in the inferred parameters by minimizing certain criteria based on the posterior distribution. Popular examples of design criteria include the Bayesian A- and D-optimal criteria

[1, 2, 3]. For Gaussian posteriors, the A- and D-optimal criteria are defined as the trace and the log-determinant of the posterior covariance matrix, respectively.

ODE is an active area of research [4, 1, 3, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. Specifically, in the recent years a number of advances have been made on optimal design of experiments for large scale applications. The articles [25, 8, 9, 26, 27, 15, 18, 21] target A-optimal experimental designs for large-scale inverse problems. Fast algorithms for computing D-optimal experimental design criterion, given by expected information gain, for nonlinear inverse problems, were introduced in [12, 16, 28]. An efficient greedy algorithm for computing Bayesian D-optimal designs, with correlated observations, is introduced in [29]. Choosing a D-optimal experimental design that targets a specific region in the parameter space is discussed in [30], where the optimal design minimizes the marginal uncertainties in the region of interest. The works [19] and [31] address theory and computational methods for Bayesian D-optimal design in infinite-dimensional Bayesian linear inverse problems. Motivated by goal-oriented approaches for parameter dimensionality reduction [32, 33, 34], our paper presents theory and methods for goal-oriented optimal design of experiments (GOODE).

There are two potential drawbacks in the standard Bayesian approach for ODE. First, in certain applications, what may be of interest is not the reconstructed parameter in itself, but some prediction quantity involving the reconstructed parameter. In this situation, it may be desirable to deploy valuable resources to collect experimental data so as to minimize the uncertainty in the end-goal, i.e. prediction, rather than the reconstructed quantity. Second, the reconstructed parameters are often spatial images or infinite dimensional functions. When discretized on a fine-scale grid, the resulting parameter dimension is very high. The posterior covariance matrix is also very high dimensional; forming and storing this covariance matrix explicitly is computationally infeasible on problems discretized on very fine grid resolutions. Consequently, evaluating the optimal design criteria is challenging. Randomized matrix methods

[35, 36] have been instrumental in addressing such computational challenges. When the dimension of the predictions is smaller than the dimension of the reconstructed parameter, working in the prediction space may be computationally beneficial. These two reasons—the need for designs tailored to predictions and the computational savings offered by targeting low-dimensional prediction quantities—motivate us to propose goal-oriented criteria, and devise efficient algorithms for their computation and optimization.

As a motivating application, consider the transport of a contaminant in an urban environment. The inverse problem of interest here seeks to identify the source of the contaminant from measurements collected at sensor locations. The standard Bayesian approach to ODE involves controlling the sensor locations in order to reconstruct the source (represented as a spatial function) with minimized uncertainty over the entire domain. On the other hand, if instead of determining the initial condition, the goal is to predict the average contaminant around a building after a certain amount of time has elapsed, then the experimental design should explicitly account for this goal. This application is explored in detail in Sections 4 and 5. As a preview, in Figure 1 we show optimal sensor placements corresponding to three different goals: roughly speaking, the left, middle, and right figures depict sensor placements that are focused on the first building, the second building, and both buildings, respectively. This shows immediately, that incorporating the end goal, i.e., the target prediction, in the ODE problem results in different sensor placements, which may be valuable in practice. More details are provided in Section 5.

The main contributions of this article are as follows. We propose two goal oriented criteria—A-GOODE and D-GOODE—that are analogues of the A- and D-optimality criteria in classical Bayesian ODE. We investigate the properties of these two criteria by explaining connections to the classical Bayesian ODE criteria, and provide interpretations for the criteria by using tools from information theory. Then, we propose a gradient-based optimization framework for GOODE using these two criteria. To facilitate this, we derive expressions for the gradient of these objective functions, along with a computational recipe for computing them, for large-scale linear inverse problems governed by time-dependent PDEs. Our proposed strategy is implemented within the context of optimal sensor placement for the class of inverse problems under study. We enforce sparsity of the sensor placements using an -norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter. In addition, we present comprehensive numerical results, in context of initial state inversion in a time-dependent advection-diffusion equation, to test the performance of the criteria and the proposed algorithms. The specific model problem under study is motivated by the inverse problem of contaminant source identification.

This article is organized as follows. Section 2 presents a brief overview of Bayesian inverse problems and optimal design of experiments. Section 3 formulates the goal-oriented optimal design of experiments problem, and presents the computational methods for solving such problems. Section 4 describes the model inverse advection-diffusion problem and the setup of numerical experiments used to test the proposed methods. Numerical results are detailed in Section 5. Concluding remarks are given in Section 6.

## 2 Background

In this section, we briefly review the Bayesian formulation of an inverse problem, and some basics from Bayesian optimal design of experiments.

### 2.1 Bayesian inverse problem

Consider the problem of reconstructing an unknown parameter using noisy data and a model

 y=F(θ)+\boldmathδ, (1)

where

is a centered random variable that models measurement noise. Specifically, we consider a Gaussian noise model

. We consider the case that

is a (finite-dimensional) vector of measurement data,

is an element of an appropriate infinite-dimensional real separable Hilbert space, and

is a continuous linear transformation, which will we will refer to as the parameter-to-observable map. In applications we target

, where is a bounded domain in , with , or . The infinite-dimensional formulation, finite-element discretization, and numerical solution of this problem have been addressed in detail in [37], under the assumption of a Gaussian prior and additive Gaussian noise model, which is the setting we consider here.

To keep the presentation simple, we consider the discretized version of the problem. However, in what follows, we pay close attention to the issues pertaining to discretization of the infinite-dimensional problem and the infinite-dimensional limit. We denote the Gaussian prior for the discretized parameter as , and assume . Letting be the discretized parameter-to-observable map, the additive Gaussian noise assumption leads to the Gaussian likelihood,

 P(y|\boldmathθ)∝exp(−12∥∥F\boldmathθ−y∥∥2Γ−1noise). (2)

That is, . It is well known (see e.g., [38, Chapter 3]) that, in this setting, the posterior is also a Gaussian with

 Γpost=(F∗Γ−1noiseF+Γ−1pr)−1,\boldmathθypost=Γpost(Γ−1pr\boldmathθpr+F∗Γ−1noisey). (3)

It is also worth mentioning that the posterior mean is the minimizer of the following functional

 J(\boldmathθ):=12∥∥F\boldmathθ\boldmathθ−y∥∥2Γ−1noise+12∥∥\boldmathθ−\boldmathθpr∥∥2Γ−1pr. (4)

The Hessian of the above functional, is given by

 H=Hmisfit+Γ−1pr=Γ−1post, (5)

where denotes the Hessian of the data-misfit term in (4).

Note that for the linear operator , we use to denote its adjoint. The reason we do not simply use matrix transpose is as follows. The underlying infinite-dimensional Bayesian inverse problem is formulated on equipped with the standard inner product. As noted in [37], when discretizing the Bayesian inverse problem, we also need to pay attention to the choice of inner products. Namely, the discretized parameter space has to be equipped with a discretized inner product. If finite element method is used to discretize the inverse problem, then, the appropriate inner product to use is the Euclidean inner product weighted by the finite-element mass matrix . That is, for , in , we use . In the infinite-dimensional setting, the forward operator is a mapping from to ; the domain is endowed with the inner product and the co-domain is equipped with the Euclidean inner product, which we denote by . Upon discretization, we work with the discretized forward operator . For and , we have

from this we note . See [37] for further details on finite-element discretization of Bayesian inverse problems.

### 2.2 Bayesian Optimal design of experiments

Next, we turn to the problem of optimal design of experiments (ODE) for Bayesian linear inverse problems governed by PDEs, which has received great amount of attention in recent years [39, 11, 14, 12, 40]. In a standard Bayesian experimental design problem, we seek an experimental design that results in minimized posterior uncertainty in the inferred parameter . The way one chooses to quantify posterior uncertainty leads to the choice of the design criterion [1, 41, 42, 2, 43, 44, 45, 46, 4, 3]. When the posterior distribution is Gaussian, the standard experimental design criteria are defined as functionals of ; here denotes a generic vector of experimental design parameters. For example, the A-optimal design is found by minimizing the trace of the posterior covariance operator

 Ψ\mathupA(\boldmathw):=tr(Γpost(\boldmathw)). (6)

On the other hand, a D-optimal design is obtained by minimizing the log-determinant of the posterior covariance

 Ψ\mathupD(\boldmathw):=logdet(Γpost(\boldmathw))=logdet([H(\boldmathw)]−1). (7)

See e.g., [47, 1, 25, 26] for further details. As discussed in the introduction, the present work is focused on goal-oriented optimal design of experiments. This is detailed in the next section.

## 3 Goal-Oriented Optimal Design of Experiments

Classical Bayesian optimal experimental design constructs experimental designs that result in minimized posterior uncertainty on the inversion parameter . On the other hand, goal-oriented optimal design of experiments (GOODE) seeks designs that minimize the uncertainty associated with a goal quantity of interest (QoI), which is a function of . In this article, we consider a goal QoI of the form

 \boldmathρ=P\boldmathθ, (8)

where is a linear operator, which we call the goal operator. The parameter is inferred by solving an inverse problem, as described in the previous section.

As mentioned before, to keep the presentation simple, we work with discretized quantities. The goal operator is thus the discretization of a linear transformation that maps the inversion parameter, an element of , to a goal QoI. We consider the case where the goal QoI is finite-dimensional; i.e., the goal operator has a finite-dimensional range independent of discretization. This is motivated by many applications in which the end goal is either a scalar or a relatively low-dimensional vector. We denote the dimension of the goal by .

Assuming the Gaussian linear setting presented above, the prior and posterior laws of defined in (8) can be obtained as follows: the prior distribution law of is , with

 \boldmathρpr=P\boldmath% θpr,Σpr=PΓprP∗, (9)

where is the adjoint of the goal operator . The posterior distribution law of the goal QoI , conditioned on the observations , is also Gaussian and is given by , where

 (10)

Below, we describe the GOODE criteria under consideration, and present a scalable framework for computing these criteria and their derivatives with respect to design parameters, which exploits the fact that the end goal is of much lower dimension than .

We focus on goal-oriented A- and D-optimal experimental designs (see Section 3.1). We also examine these criteria from a decision theoretic point of view. In the case of goal oriented Bayesian A-optimality, we show that the minimization of expected Bayes risk of the goal QoI is equivalent to minimizing the trace of the covariance operator . In the case of goal-oriented Bayesian D-optimality, we show that maximizing the expected information gain for the goal QoI is equivalent to minimizing log-determinant of . These results, which are natural extensions of the known results from classical Bayesian optimal experimental design theory, provide further insight on the interpretation of the presented GOODE criteria. In Section 3.2, we describe the precise definition of an experimental design vector that parameterizes a sensor placement and leads to formulation of a suitable optimization problem. This is followed by the discussion of the optimization problem for finding A- and D-GOODE criteria and a computational framework for computing the associated objective functions and their gradients in Section 3.3. Algorithmic descriptions of the proposed methodologies, along with a discussion of the computational cost, are outlined in Section 3.4. In Section 3.5, we further discuss the connections of the GOODE criteria to the corresponding classical experimental design criteria, and outline possible extensions to the cases of nonlinear goal operators.

### 3.1 GOODE criteria

Let, as before, denote a generic vector of experimental design parameters. (The precise definition of , in the case of optimal sensor placement problems, will be provided later in this section.) In this section, we will assume that has full row-rank, and consider a Bayesian linear inverse problem as formulated in Section 2.1. Here we propose and examine the implications of goal-oriented A- and D-optimal criteria.

##### Goal-oriented A-optimal criterion

The goal-oriented A-optimal design (A-GOODE) criterion is defined to be the trace of the posterior covariance matrix of the goal QoI:

 (11)

Two special cases are worth pointing out here. In the special case is a row vector , we get , which is the classical C-optimality criterion. On the other hand, if , then this is nothing but the classical Bayesian A-optimal criterion.

Note that while is a high-dimensional operator, in practice usually has a low-dimensional range. Thus, in practice is a low-dimensional operator. From computational point of view, this means computing the A-GOODE criterion is significantly cheaper than that of the classical A-optimality which is .

Minimizing the A-GOODE criterion can be understood as minimizing the average variance of the goal QoI. However, having a measure of the statistical quality of the estimator

is also of crucial importance. Theorem 3.1 below addresses this by relating the goal-oriented A-optimal criterion and the expected Bayes risk of , which is nothing but the mean squared error averaged over the prior distribution.

###### Theorem 3.1.

Consider a Bayesian linear inverse problem as formulated in Section 2.1. Let be the estimator for the goal QoI, , where has full row-rank, then,

 Eμ\mathuppr[Ey|\boldmathθ[∥∥\boldmathρ\boldmathρypost−\boldmathρ∥∥2]]=tr(Σpost). (12)
###### Proof.

See Appendix A. ∎

Note that for notational convenience, and since the result holds pointwise in , we have suppressed the dependence on in the statement of the theorem.

###### Remark 3.2.

In Theorem 3.1, if we choose , i.e., classical Bayesian A-optimal experimental design for a Gaussian posterior distribution, we recover the known result [2]

 E\boldmathθ[Ey|\boldmathθ[∥∥\boldmathθypost−\boldmathθ∥∥2]]=tr(Γpost).

See also [19] for a derivation of the above expression in the infinite-dimensional setting.

##### Goal-oriented Bayesian D-Optimality criterion

The goal-oriented D-optimal design (D-GOODE) criterion is taken to be the log-determinant of the posterior end-goal covariance :

 ΨGD(\boldmathw):=logdet(Σpost(\boldmathw))=logdet(P[H(%\boldmath$w$)]−1P∗). (13)

To explain the motivation for this optimality criterion, we consider the Kullback-Leibler (KL) divergence [48] from the posterior to prior distribution of the end-goal QoI :

 DKL{Pa(% \boldmathρ|y,\boldmathw)∥Pb(% \boldmathρ)} =DKL{N(% \boldmathρpost(\boldmathw),Σpost(% \boldmathw))∥N(\boldmathρpr,Σpr)}. (14)

Since both distributions are Gaussian, the KL-divergence has a closed form expression, which will simplify the calculations considerably. The expected information gain is defined as

 ΨKL(\boldmathw)=Eμ\mathuppr[Ey|\boldmathθ,\boldmathw[DKL{Pa(\boldmathρ\boldmathρ|y,% \boldmathw)∥Pb(\boldmathρ)}]]. (15)

The classical Bayesian D-optimality criterion is related to the expected information gain, quantified by the expected KL divergence between the posterior distribution and the prior distribution. A similar relation for the goal-oriented D-optimality criterion can be derived. We present the following result that relates and the expected information gain :

###### Theorem 3.3.

Consider a Bayesian linear inverse problem as formulated in Section 2.1. Let be an end-goal QoI, where has full row-rank. Then,

 Eμ\mathuppr[Ey|\boldmathθ[DKL{Pa(%\boldmath$ρ$|y)∥Pb(\boldmathρ)}]]=−12logdet(Σpost)+12logdetΣpr. (16)
###### Proof.

See Appendix D. ∎

The significance of Theorem 3.3 is that it says minimizing with the appropriate constraints on , amounts to maximizing the expected information gain under the same constraints.

###### Remark 3.4.

Note that in the case of , and in the limit as the criterion (13) is meaningless. The reason for this is that in the infinite-dimensional limit

is a positive self-adjoint trace class operator and thus its eigenvalues accumulate at zero. On the other hand in the case of

, (16) simplifies to

 ΨKL(\boldmathw)=−12logdetΓpost(\boldmathw)+12logdetΓpr=12logdet(Γ1/2prHmisfitΓ1/2pr+I),

which recovers the known result [19] that for a Gaussian linear Bayesian inverse problem

 Eμ\mathuppr[Ey|\boldmathθ,\boldmathw[DKL{N(\boldmathθypost(% \boldmathw),Γpost(\boldmathw))∥N(\boldmathθ\boldmathθpr,Γpr)}]]=12logdet(Γ1/2prHmisfitΓ1/2pr+I), (17)

and note that the quantity to the right is well defined in the limit as .

The previous remark shows that (17) is the correct expression for the expected information gain to choose in the case of . While the derivation of (17) here is done for the discretized version of the problem, as shown in [19] such an expression for the expected information gain can be derived in the infinite-dimensional Hilbert space setting also.

### 3.2 Goal-oriented sensor placement

The experimental conditions we choose to control are the sensor locations in the domain at which data are to be collected. This can be expressed as an optimal design of experiments (ODE) problem, as we now demonstrate. Our strategy is to fix an array of candidate locations for sensors and then select an optimal subset of the candidate sensor locations. In this context, a design is a binary vector where each entry corresponds to whether or not a particular sensor is active. Practical considerations, such as budgetary or physical constraints, limit the number of sensors that can be chosen. In this context, ODE seeks to identify the best possible sensor locations out of the possible sensor locations.

We consider Bayesian inverse problems with time-dependent linear forward models. In this case, maps the inversion parameter to spatio-temporal observations of the state at the sensor locations and at observation times. In what follows, we make use of the notation for the forward model that maps to the equivalent sensor measurements at observation time instance .

In the present formulation, enters the Bayesian inverse problem through the data likelihood:

 (18)

where , and is a block diagonal matrix with . In particular, where and is the Kronecker product. The noise covariance is in general a block diagonal matrix , where is the spatial noise covariance matrix corresponding to th observation time, and is the number of observation time instances. In the present work, we assume that observations are uncorrelated in space and time, and thus is a diagonal matrix. While this assumption is not necessary, it simplifies the formulation considerably. Since is also diagonal, we have the convenient relation

The posterior covariance of the parameter is

 Γpost(\boldmathw)=[H(% \boldmathw)]−1=(Hmisfit(\boldmathw% )+Γ−1pr)−1=(F∗WΓF+Γ−1pr)−1. (19)

Therefore, the posterior distribution of the goal QoI , conditioned by the observations , and the design is the Gaussian with

 (20)

The above definition of will be substituted in the GOODE criteria described above.

Identifying the best sensor locations out of a set of candidate sensor locations is a combinatorial problem that is computationally intractable even for modest values of and . A standard approach [49, 25, 8, 9, 15], which we follow in this article, is to relax the binary condition on the design weights and let , . To ensure that only a limited number of sensors are allowed to be active, we use sparsifying penalty functions to control the sparsity of the optimal designs.

In the present formulation, non-binary weights are difficult to interpret and implement. Thus, some form of thresholding scheme is required to make a computed optimal design vector into a binary design. In this work, we adopt the following heuristic for thresholding: assuming that only

sensors are to be placed in the candidate locations, as is common practice, the locations corresponding to highest weights can be selected. This means that the corresponding weights are set to whereas all other sensors are set to , therefore giving a “near-optimal” solution to the original binary problem [50]. An alternative approach, not considered in this article, is to obtain binary weights is to successively approximate the -norm by employing a sequence of penalty functions yielding a binary solution [15]. The issue of sparsification and the choice of sparsifying penalty function is elaborated further in the description of the GOODE optimization problem formulation below and in the numerical results section.

### 3.3 The optimization problem

The generic form of the goal-oriented experimental design problem involves an optimization problem of the form

 min\boldmathw∈R\textscNsΨ(\boldmathw)+αΦ(\boldmathw) (21) subject to 0≤wi≤1,i=1,…,\textscNs,

where, is the specific design criterion, is a penalty function, and is a user-defined penalty parameter that controls sparsity of the design. In this work, we make a choice to incorporate an norm to control sparsity of the design; that is, we set

 Φ(\boldmathw):=∥∥\boldmathw∥∥1. (22)

Depending on whether we want goal-Oriented A- or D-optimality, is either or , respectively.

The optimization problem is solved using a gradient based approach, and therefore, this requires the derivation of the gradient. Since the design weights are restricted to the interval , differentiable, and the gradient of the penalty term in (21) is , where is a vector of ones. In the sequel, we derive expressions for the gradient of and .

The gradient of with respect to the design is given by

 ∇\boldmathwΨGA=−\textscNt∑k=1\textscNgoal∑j=1\boldmathζk,j⊙% \boldmathζk,j, (23a) where ⊙ is the pointwise Hadamard product, and \boldmathζk,j=R−12kF0,k[H(\boldmathw)]−1P∗\boldmathei. (23b)

Here is the coordinate vector in , and is the forward model that maps the parameter to the equivalent observation at time instance , . See Appendix B for derivation of this gradient expression.

In the case the forward model is time-independent, the expressions simplify to

 ∇\boldmathwΨGA=−\textscNgoal∑i=1(Γ−12noiseF[H(\boldmathw)]−1P∗\boldmathei)⊙(Γ−12noiseF[H(% \boldmathw)]−1P∗\boldmathei), (24)

where is the coordinate vector in .

We present two alternate way of computing the gradient that are equivalent, but differ in computational cost depending on the number of sensors and dimension of the goal QoI, i.e. . In the first formulation we assume that . We can compute the gradient as

 ∇\boldmathw(ΨGD(\boldmathw))=−\textscNt∑k=1\textscNgoal∑j=1% \boldmathξk,j⊙\boldmathξk,j, (25a) where \boldmathξk,j=R−1/2kF0,k[H(\boldmathw)]−1P∗Σ−1/2post(\boldmathw)\boldmathej, (25b)

and is the coordinate vector in . The details are given in Appendix C.1. This formulation is especially suited for large-scale four-dimensional variational (4D-Var) data assimilation applications  [51, 52, 53], including weather forecasting, and ocean simulations. Note that, in this formulation, evaluating the gradient requires the square-root of the posterior covariance matrix .

Evaluating (25), requires solving linear systems. If , the following alternative formulation of the gradient of will be computationally beneficial:

 ∇\boldmathw(ΨGD(\boldmathw))=−\textscNt∑k=1\textscNs∑i=1% \boldmathei(\boldmathηTk,iΣ−1post\boldmathηk,i) (26a) where \boldmathηk,i is now: \boldmathηk,i=P[H(\boldmathw)]−1F∗k,0R−1/2k% \boldmathei, (26b)

and is the coordinate vector in . The derivation details are given in Appendix C.2. Note that the two gradient expressions are equivalent, in exact arithmetic.

In the time-independent setting, where a single vector of sensor measurements is available, the gradient expressions for simplify as follows. The formula (25) reduces to

 ∇\boldmathw(ΨGD(\boldmathw))=−\textscNgoal∑j=1\boldmathξj⊙\boldmathξjwith \boldmathξj=(Γ−1/2noiseF[H(\boldmathw)]−1P∗Σ−1/2post(% \boldmathw)\boldmathej), (27a) where \boldmathej is the jth coordinate vector in R\textscNgoal, and (26) reduces to (27b) where \boldmathei is the ith coordinate vector in R\textscNobs.

### 3.4 Implementation and computational considerations

The main bottleneck of solving A-GOODE and D-GOODE problems lie in the evaluation of the respective objective functions and the associated gradients. Here, we detail the steps of evaluating the objective function and the gradient of both A-GOODE and D-GOODE problems. These steps are explained in Algorithms 1, and 2, respectively. For simplicity, we drop the penalty terms in the computations presented in the two Algorithms 1, and 2.

The A-GOODE problem. Algorithm 1 outlines the main steps of evaluating the A-GOODE objective and the gradient (23). Notice that all loops over the end-goal dimension, in Algorithm 1, i.e., steps 1–3, 8–10, and 16–20, are embarrassingly parallel. Evaluating , , is independent of the design , and can be evaluated offline and saved for later use. Step 2 requires a Hessian solve for each of the vectors , which can be done using preconditioned conjugate gradient. Each application of the Hessian requires a forward and an adjoint PDE solve. Using the prior covariance as a preconditioner, this requires CG iterations, i.e., PDE solves, where is the numerical rank of the prior-preconditioned data misfit Hessian; see [54, 55]. Another forward solutions of the underlying PDE are required in Step in the algorithm, for gradient computation. Therefore, the total number of PDE solves for objective and gradient evaluation is . This computations can be easily parallelized in cores. Moreover, the applications of the inverse Hessian—the Hessian solves—can be accelerated by using low-rank approximations of the prior-preconditioned data misfit Hessian [54]. Note that Algorithm (1) also requires independent applications of and its adjoint, which can be done in parallel.

###### Remark 3.5.

In Algorithm 1, is the goal operator, is the dimension of the end-goal, is the number of observation time instances, is the experimental design, is the weighted Hessian, is the covariance of the measurement noise at time instance , and is the forward model that maps the parameter from time instance to the equivalent observation at observation time instance .

The D-GOODE problem. Algorithm 2 describes the main steps for calculating D-GOODE objective and gradient expressions (25)–(26). Similar to Algorithm 1, all loops over the end-goal dimension, i.e., steps 1–3, 14–22, and 16–20 in Algorithm 2, are inherently parallel. Also, the loop over the observation dimension, , in Algorithm 2 is inherently parallel. With small end-goal space, the cost of Cholesky factorization of , in step is negligible compared to PDE solutions. The first form of the gradient, i.e. steps 12–24, requires Hessian solves, and forward PDE solves that can run completely in parallel. The second form of the gradient, i.e. steps 25–41, on the other hand requires Hessian solves, and forward and adjoint solves of the underlying system of PDEs. As mentioned before, low-rank approximation of the prior-preconditioned data misfit Hessian can be used to accelerate computations. Checkpointing is utilized in the second form of the gradient, i.e., steps 28–31. The checkpointed solutions are recalled in the adjoint solves in step 33.

Algorithm 2 requires applications of and its adjoint for objective function evaluations. As for the gradient, we need applications of with the first form of the gradient, and applications of . As before, the loops over end-goal and observation dimensions are embarrassingly parallel.

### 3.5 Connections to classical ODE criteria and extensions

Here we further discuss the connections of the GOODE criteria to the corresponding classical experimental design criteria and an extension to nonlinear goal operators. We have already mentioned two important connections:

1. Taking trivially recovers the Bayesian A- and D-optimal design criteria.

2. In the special case is a row vector , we get , which is the Bayesian C-optimality criterion.

Furthermore, if the vector is randomly drawn from a distribution with mean zero and identity covariance, then