Log In Sign Up

Statistical Treatment of Inverse Problems Constrained by Differential Equations-Based Models with Stochastic Terms

This paper introduces a statistical treatment of inverse problems constrained by models with stochastic terms. The solution of the forward problem is given by a distribution represented numerically by an ensemble of simulations. The goal is to formulate the inverse problem, in particular the objective function, to find the closest forward distribution (i.e., the output of the stochastic forward problem) that best explains the distribution of the observations in a certain metric. We use proper scoring rules, a concept employed in statistical forecast verification, namely energy, variogram, and hybrid (i.e., combination of the two) scores. We study the performance of the proposed formulation in the context of two applications: a coefficient field inversion for subsurface flow governed by an elliptic partial differential equation (PDE) with a stochastic source and a parameter inversion for power grid governed by differential-algebraic equations (DAEs). In both cases we show that the variogram and the hybrid scores show better parameter inversion results than does the energy score, whereas the energy score leads to better probabilistic predictions.


page 11

page 13

page 14


Probabilistic Numerical Methods for PDE-constrained Bayesian Inverse Problems

This paper develops meshless methods for probabilistically describing di...

A benchmark for the Bayesian inversion of coefficients in partial differential equations

Bayesian methods have been widely used in the last two decades to infer ...

Estimating volcanic ash emissions using retrieved satellite ash columns and inverse ash transport modelling

This paper describes the inversion procedure being used operationally at...

Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments

We present a novel stochastic approach to binary optimization for optima...

Stochastic seismic waveform inversion using generative adversarial networks as a geological prior

We present an application of deep generative models in the context of pa...

Learning Control by Iterative Inversion

We formulate learning for control as an inverse problem – inverting a dy...

1 Introduction

Inverse problems have been traditionally posed as inferring unknown or uncertain parameters (e.g., coefficients, initial conditions, boundary or domain source terms, geometry) that characterize an underlying model from given (possibly noisy) observational or experimental data [53, 25]. Such inverse problems governed by physics-based models, also referred to as data assimilation in the meteorological and climate communities [26], abound in a wider range of application areas such as geophysics, cryosphere studies, medical imaging, biochemistry, and control theory. Typically the models governing these inverse problems are considered deterministic. In reality, however, in addition to the inversion parameter, these models involve other sources of uncertainties and randomness. For instance, the models have multiple uncertain coefficients or unknown or random source terms, parameters that are not—or cannot be—inferred. Motivated by the need to account for these additional uncertainties, researchers in recent years have shown a growing interest in considering inverse problems governed by stochastic (or uncertain) models, mostly in the context of optimal control [55, 9, 41, 31]. In this paper, we consider the inference of parameters for stochastic models (described by differential equations) and quantify the uncertainty associated with this inference.


This study introduces a methodology for the statistical treatment of inverse problems constrained by physics-based models with stochastic terms. The salient idea of our approach is to express the problem as finding the inversion parameter for which the stochastic model generates a distribution that best explains the distribution of the observations. To this end, we formulate a Bayesian inverse problem with a likelihood given by suitable statistical scoring metrics typically employed in forecast verification [49, 19, 18]. Proper scores allow for including in the inversion process a large range of statistical features (e.g., spatial and/or temporal correlations and biases) and, in this respect, are a significant departure from and improvement over the traditional least-squares misfit metrics.

We also delve into the issue of how different statistical scores affect the results of the inference problem. This issue becomes important when we explore fitness functions for multivariate distributions because one invariably needs to rely on statistics that typically favor certain features over others, for example, variance over correlations. Traditionally, the solution of the inverse problem is the parameter field (or a distribution if working with statistical inverse problems) that is the closest to the true parameter field in some metric, e.g., least-squares or statistical scores such as those proposed in this work. However, one can also pose the problem as finding the parameter field that generates the most accurate predictions in some statistical sense. In other words, we are interested in how much more predictable the model is after inference/inversion, not necessarily in the goodness of fit. We also explore this alternative inversion paradigm in this study and show that various proper scores or combination of them can be used successfully in different inversion setups (i.e, inverse problems governed by spatial differential equations and time-dependent differential equations) to improve the model predictability.

Another critical issue we address in this paper is the efficient computation of the numerical solution of the proposed statistical inverse problems. Namely, we propose a solution approach based on numerical optimization and provide the ingredients, in the form of gradient-based scalable optimization algorithms and adjoint sensitivities, that are needed to ensure the scalability of our methodology to large-scale problems. More specifically, to compute the maximum a posteriori (MAP) point, we solve an optimization problem (implicitly) constrained by the stochastic model with a quasi-Newton limited-memory secant algorithm, BFGS updates for the inverse of the Hessian of the cost function, and an Armijo line search. We derive adjoint-based expressions for efficient computation of the gradient of the objective with respect to the inversion parameters. We illustrate our approach with two problems. The first is an inversion for the coefficient field in an elliptic partial differential equation (PDE), interpreted as a subsurface flow problem under a stochastic source field. The second problem is represented by the inversion for a parameter in a differential-algebraic system of equations (DAEs) under stochastic load terms. This model can represent a power grid system with uncertain load behavior, which induces small transients in the system, the goal here being to determine the dynamic parameters within a quasi-stationary regime.

Problem formulation

In what follows, let us consider that we have a mathematical model expressed as , with states and parameters driven by a stochastic forcing

with known probability law. Formally, such a model can consist of a standard stochastic PDE on a domain

() with suitable boundary ; in this case is a function on and is defined by means of a probability space , where is the set of events, is the algebra of events, and is a probability measure. We assume that this probability space is completely defined. We take to be a real-valued deterministic field, although extensions to random fields are also possible. Formally, the mathematical model can be defined as in [22] in the following form:


We also assume that we have a suitable vector space with finite stochastic moments when

is a PDE. More details on the setup can be found in [21]. In this work is defined by a PDE (see Sec. 4

) or an ordinary differential equation(ODE)/DAE (see Sec.

5) system. We assume the availability of sparse observations of the states corresponding to the true parameters, which we denote by . These observations are subject to observational noise , which we assume to be a random vector with known Borel probability measure , in addition to the stochastic forcing .

At the high level, the inverse problem formulation we use in this work is standard: Given model (1) and observations such that a.s., find that generates model predictions that best explain the observations under a certain metric, where the operator is a projection map from the state space to the observations space. A commonly used metric is the distance between the observations and observables . The metrics used in this work are referred to as statistical score functions that compute the fitness or a distance between the distribution of the observables and the set of validation data, namely, observations . We denote these score functions by , where the first two components represent the observation and model predictions. The third component represents the control variable, and will be used to encode prior information or a regularization factor. We introduce and discuss in detail such score functions in the next section.

Scores are positive functions that achieve their global minimum when observations and model predictions are statistically indistinguishable. For that reason scores have been used as loss or utility functions in order to assess the level of confidence one has in the probabilistic model prediction [27, 17]. We define a likelihood function based on this metric as , and eventually construct a posteriori function. Therefore, the inverse problem can be formulated as finding such that equationparentequation

while observing

with i.i.d. and .

The objective in (2a) depends on the observed data via a single or multiple realizations, the numerical model observables output , and potentially explicitly on the parameter . The inclusion of in the objective is meant to account for the prior information on the parameter or to regularize the optimization problem. We remark that the optimization problem also depends on the parameter implicitly through the PDE or ODE/DAE constraint described by .

Related work

Inverse problems with stochastic parameters are typically addressed in a multilevel context. Here the stochasticity may come from reducing the models and introducing a stochastic term that accounts for the model reduction error [3, 29]. In other cases, the additional stochastic/uncertain input is treated as a nuisance parameter and approximative premarginalization over this parameter is carried out [33, 25, 24]. In this study we consider the case when stochasticity is inherent in the problem and we do not have access to a deterministic version of a complete model. In the optimal control community, recent efforts have targeted moment matching between a stochastic controlled PDE and observations, [55, 9, 41]

. In most cases the loss function is based on statistics of univariate or marginal distributions. Extending to multivariate settings is extremely challenging because of the difficulty of accounting for complex dependencies, the curse of dimensionality, and the lack of order or rank statistics. The scoring functions used in this study are precisely addressing the multivariate aspect of the model and observational data distributions. A strategy similar to what we present here has been introduced in the statistical community sometimes under the name of statistical postprocessing or model output statistics. It consists of altering the computational model probabilistic forecasts by postprocessing the ensemble forecasts, and it tends to address the issue of bias and dispersion

[18]. Most of these approaches are variations of Bayesian model averaging discussed in [40] and the nonhomogeneous regression model proposed in [18]. In these strategies the numerical model or its parameters are not controlled; only its output is adjusted [46, 28, 6, 7, 45, 49, 14]. In the strategy proposed in this study, the model itself through its parameters is part of the control space.

The remaining sections of this paper are organized as follows. We begin by introducing the scoring functions and discuss the property needed in order for the ansatz (2) to be well posed in Section 2. In Sections 4 and 5 we introduce a prototype elliptic PDE-based model problem with application in subsurface flow and a time-dependent problem driven by DAEs with application in power grid modeling, respectively. We conclude in Section 6 with a discussion of the method, results, and its limitations.

2 Scoring and metrics

In statistics one way to quantitatively compare or rank probabilistic models is using score functions. A score function is a scalar metric that takes as inputs (i) verification data, in our inverse problem formulation the observations , and (ii) outputs from the model to be evaluated, which in the present work are the observables subject to observational noise, namely, , and returns a scalar used to scoring or ranking verification outputs with respect to the verification data. These scores are commonly used in forecast evaluation where competing forecasts are compared [48].

While evaluating numerical model simulations, one aims to detect bias, trends, outliers, or correlation misspecification. To create an objective function that can distinguish among different modeling strategies, one needs appropriate mathematical scoring metrics to rank them. Complex mathematical properties are required for consistent ranking of the models


Score functions use different statistics to distinguish among different (statistical) models. Moreover, in order to be able to distinguish among and consistently rank different models, score functions are required to have specific mathematical properties. Proper score functions are widely used in statistics to ensure consistent ranking, for example in forecast evaluation, where competing forecasts are compared [48]. The following definition of proper scoring is from [17].

Definition 1

A score is proper iff



is the probability distribution of


In other words, this definition states that a proper scoring rule prevents a score from favoring any probabilistic distribution over the distribution of the verification data. In addition, a proper score has two main desired features: (1) statistical consistency between the verification data and the model outputs, called calibration, and (2) reasonable dispersion in the model outputs, provided they are calibrated, which is referred to as sharpness. In statistics, the trend is to build proper scores that access simultaneous calibration and sharpness.

Various functions can be used to assess the error between the verification data and model outputs; however, scoring is not restricted to pointwise comparison. Methods to evaluate the quality of unidimensional outputs are well understood [16]; however, the evaluation of multidimensional outputs or ensemble of outputs has been addressed in the literature more recently [19, 38, 44] and remains challenging. The energy and variogram-based scores (detailed below) are well suited to multiple multidimensional realizations of a same model to be verified. Therefore, in this paper we will focus on these scores (and combination of them). In what follows, let us denote by the statistical part of the score independent of the parameter .

A widely accepted score is the continuous ranked probability score (CRPS):



is the cumulative distribution function (CDF) of

, , and is the Heavyside function. The CRPS computes a distance between a full probability distribution and a single deterministic observation, where both are represented by their CDF. However, this score is only univariate and cannot be used if the dimension of the observations is larger than one. Moreover, one typically does not have access to a functional probability distribution function (PDF). Therefore we will use below samples from the PDF.

For this reason and since the variogram-based score is defined directly using the samples, in the following, we express the energy and variogram-based scores for discrete arguments as .

2.1 Energy score

The energy score [19] is multivariate and proper. It generalizes CRPS (4) from univariate to multivariate and can be expressed as


This score is sensitive to bias and variance discrepancy, but potentially less sensitive to correlations; it will be denoted as the ES-score.

In the probabilistic forecast context, scores can be used as a loss function to fit probabilistic predictive distributions to observations; for instance, see [43]. Similarly to this idea, we propose to use statistical proper scores as objective functions in the underlying inverse problems. In this context, the score could, for instance, be the energy score (2b); and if only samples from distribution are available, it can be defined as follows:


where is the number of samples and are the observations corresponding to th sample from the right hand side, in other words, , evaluated at parameter . Here is a linear observation operator that extracts measurements from .

2.2 Variogram score

The variogram-based score [44] is multivariate, proper, more sensitive to covariance (structure) but less to bias. It is given by


where we take , is the dimension of observations (e.g., number of observational points in space), and is a function of the distance of the position of observation and observation . In other words we take differences between every observation and then of the corresponding expectation of the scenarios. If we denote , where is the unit vector with the component , the preceding equation becomes


This will be referred to as the VS-score.

2.3 Discussion and other scores

The energy score is known for failing to discriminate misspecified correlation structures of the fields, but it successfully identifies fields with expectation similar to the one of the verifying data. On the other hand, the variogram-based score fails to discriminate fields with misspecified intensity, but it discriminates between correlation structures [39, 44]. Because of these different features and in order to discriminate fields according to their intensity and correlation structure, we propose to use a linear combination of the two scores, namely,


where and are problem specific. We will refer to this hybrid score as the HS-score. It is also a proper score because any linear positive combination of proper scores remains a proper score.

The score functions defined above are referred to as instantaneous scores because they are functions of one verification sample. If more than one verification sample is available—for example, if we have samples of

from the true distribution—then we can estimate the mean score defined as follows:


In most cases, scores are used on verification data that are assumed to be perfect. In practice, however, observations are almost always tainted with errors. Limited recent studies on forecast verification attempt to address this issue [13, 32]

. Incorporating error and uncertainty in the scoring setup is challenging. Analytical results are tractable only in particular cases such as linear or multiplicative noise with Gaussian and Gamma distributions, respectively. One way of tackling the observational error is to assume some probability distributions for the observations and the model outputs and to consider a new score defined as the conditional expectation of the original score given the observations

[32]. Using the notations of Definition 1, we can express the corrected score as , where represents the hidden true state of the system. In practice, to implement this method, one has to assume some distribution for the and and access an estimation of the distribution parameters. If the errors are i.i.d., then their contribution can be factored out when using the score as a loss function. This is the case under consideration in this study.

3 Model problems

To probe the proposed statistical treatment of inverse problems constrained by differential equations-based models with stochastic inputs, we consider two model problems. The first is a coefficient field inversion for subsurface flow governed by an elliptic PDE with a stochastic input, in other words, a PDE-constrained model problem (Section 4); the second is a parameter identification problem for power grid governed by DAEs with stochastic input, in other words, a DAE-constrained model problem (Section 5). For both problems, we generate synthetic observations by using one or more samples , . These samples then enter into the forward models with a parameter considered the truth, . We note that can also be a distribution. We then solve the optimization problem (2a) for the MAP point by evaluating the likelihood or a posteriori function with the precomputed scenarios such that , . To solve the PDE-constrained model problem efficiently, in Section 4.1 we derive the gradient of the objective function with respect to these fixed scenarios using adjoints. We remark that our calculations use classical Monte Carlo to estimate the solution of the problem at hand; however, more sophisticated methods such as higher-order [20] or multilevel [15] Monte Carlo methods can be used to solve the underlying stochastic PDE.

4 Model problem 1: Coefficient field inversion in an elliptic PDE with a random input

In this section, we study the inference of the log coefficient field in an elliptic partial differential equation with a random/stochastic input. This example can model, for instance, the steady-state equivalent for groundwater flows [31]. For simplicity, we state the equations using a deterministic right-hand side. We will then turn our attention to the case where the volume source terms are stochastic. To this end, consider the forward model


where () is an open bounded domain with sufficiently smooth boundary , . Here, is the state variable; , , and are volume, Dirichlet, and Neumann boundary source terms, respectively; and is an uncertain parameter field in , where is a Laplacian-like operator, as defined in  [47, 1] and for completeness repeated in Section 4.3. To state the weak form of (11), we define the spaces,

where is the Sobolev space of functions in with square integrable derivatives. Then, the weak form of (11) is as follows: Find such that

Here and denote the standard inner products in and , respectively.

In what follows we treat as a stochastic term, denoted by for consistency, given by a two-dimensional heterogeneous Gaussian process with known distribution . We use the instantaneous scores defined in Section 2 for this model.

4.1 Adjoint and gradient derivation

We apply an adjoint-based approach to derive gradient information with respect to the parameter field for the optimization problem (2a). The adjoint equations are derived through a Lagrangian formalism [50]. To this end, the Lagrangian functional can be written as


where is the adjoint corresponding to state . The formal Lagrangian formalism yields that, at a minimizer of (2a), variations of the Lagrangian functional with respect to all variables vanish. Thus we have



for all variations and . Note that (13a) and (13b) are the weak forms of the state and of the adjoint equations, respectively. The adjoint right-hand side in strong form for the energy score (6) is


and for the variogram score (7) the component of is


for , and for . Here


where denotes the component of , namely, .

The left-hand side in (13c) gives the gradient for the cost functional (2a), which is the Fréchet derivative of with respect to . In strong form this is


where and are solutions to the state and adjoint equations, respectively [50, 8].

4.2 Computational approach and cost

The inverse problems (2) are solved by using hIPPYlib (an inverse problem Python library [52, 51]). It implements state-of-the-art scalable adjoint-based algorithms for PDE-based deterministic and Bayesian inverse problems. It builds on FEniCS [12, 30] for the discretization of the PDEs and on PETSc [4, 5] for scalable and efficient linear algebra operations and solvers needed for the solution of the PDEs.

The gradient computation technique presented in the preceding section allows using state-of-the-art nonintrusive computational techniques of nonlinear programming to solve the estimation problems (2) efficiently for the energy and variogram scores we propose, as well as any combination of them. More specifically, we use a quasi-Newton limited-memory secant algorithm with BFGS updates for the inverse of the Hessian [34, 8] and an Armijo linesearch  [34] to solve (2) as an unconstrained optimization problem. This quasi-Newton solution approach is appealing since it can have fast local convergence properties similar to Newton-like methods without requiring Hessian evaluations and it also converges from remote starting points as robust as a gradient-based algorithm. In our computations the total number of quasi-Newton iterations was reasonably low, varying between and . The implementation in hIPPYlib uses an efficient compact limited-memory representation [11] of the inverse Hessian approximation that has reduced space and time computational complexities, namely, , where denotes the cardinal of the discretization vector of and is the length of the quasi-Newton secant memory (usually taken as .

The computational cost per iteration is overwhelmingly incurred in the evaluation of the objective function in (2) and its gradient. For both the energy and variogram scores, the evaluation of the objective and its gradient requires forward PDE solves and adjoint PDE solves, respectively, to compute states in (13a) and adjoint variables in (13b). To achieve this, the projected states appearing in (6) and (14) are stored in memory to avoid the expensive re-evaluations of the PDEs and state projections for the computation of the score in (6) and adjoint right-hand side in (14). Similarly, for the variogram score, in the evaluation of the objective function we save the terms () as a matrix for each and reuse them in the computation of the adjoint right-hand sides (15) during the objective gradient evaluation. This approach effectively avoids expensive re-evaluations of the PDEs at the cost of extra storage.

From (6) and (14) one can see that the computation of the energy score and its gradient also includes a complexity term in addition to the forward and adjoint solves. A similar extra complexity term is present in the computation of the variogram score from (7) and its adjoint right-hand size from (15).

Undoubtedly, the objective and gradient computations can be parallelized efficiently for both scores because of the presence of the summation operators. In particular, both scores allow a straightforward scenario-based decomposition that allows the PDE (forward and adjoint) solves to be done in parallel. Coupled with the (lower-level) parallelism achievable in hIPPYlib via DOLFIN and PETSc, this approach would result in an effective multilevel decomposition with potential for massive parallelism and would allow tackling complex PDEs and a large number of scenarios. The quasi-Newton method based on secant updates used in this work can be parallelized efficiently, as one of the authors has shown recently [35].

We remark that a couple of potential nontrivial parallelization bottlenecks exist. For example, both the energy score and variogram score apparently require nontrivial interprocess communication in computing the right-hand side (14) of the adjoint systems as well as in the computation of the double summation in the score itself. In this work we have used only serial calculations and deferred for future investigations efficient parallel computation techniques addressing such concerns.

4.3 Computational experiment

In this section we present the numerical experiment setup for the forward and inverse problems.

Forward problem: For the forward problem (11), we assume an unknown volume forcing, (i.e., ) and no-flow conditions on , in other words, the homogeneous Neumann conditions on . The flow is driven by a pressure difference between the top and the bottom boundary; that is, we use on and on . This Dirichlet part of the boundary is denoted by . In Figure 1, we show the “truth” permeability used in our numerical tests, and the corresponding pressure.


Figure 1: Log permeability field (a) and pressure field obtained by solving the state equation with (b). The dots show the pointwise observations.

The stochastic forcing term: The stochastic volume forcing is given by a two-dimensional heterogeneous Gaussian process with known distribution defined by


where , and . We choose , , , and . In Figure 2 we illustrate the forcing field and solution used to generate the observations as well as two other realizations of the forcing field along with their corresponding solutions.

The prior: Following [47], we choose the prior to be Gaussian; that is, is a prior distribution, where is the mean and is the covariance operator of the prior, modeled as the inverse of an elliptic differential operator. To study the effect of the prior on our results, we use an informative prior and the standard prior, both built in hIPPYlib [52]. The informative prior is constructed by assuming that we can measure the log-permeability coefficient at five points, namely, , in , namely, , , , , , as in [52]. This prior is built by using mollifier functions

The mean for this prior is computed as a regularized least-squares fit of the point observations , by solving


where is a differential operator of the form


equipped with homogeneous natural boundary conditions, , and is a realization of a Gaussian random field with zero average and covariance matrix . Above

is an s.p.d. anisotropic tensor,

, and control the correlation length and the variance of the prior operator; in our computations we used and . The covariance for the informative prior is defined as , where , with a penalization constant taken as 10 in our computations. The standard prior distribution is , with .

We note that the prior in finite dimensions is given by


where is the discretization of the prior covariance operator and is a mass weighted inner product [10, 36]. In the Bayesian formulation, the posterior is obtained as . By taking the negative log of the posterior, the objective in (2a) becomes


While the statistical assumptions ease the computations, this objective can take a different form under different functional likelihood or prior expressions. However, the overall MAP finding procedure will broadly follow the same steps.

Figure 2: Left column represents the pair of the noise realization and solution used to generate the observations. Two realizations of the forcing field () (top) along with their corresponding solutions () (bottom).

4.4 Results

The aim of this computational study is two-pronged. On the one hand, our goal is to invert for the unknown (or uncertain) parameter field with the measure of success being the retrieval of a parameter field close to the “truth”: this is the traditional inverse problem approach. On the other hand, we aim to generate accurate predictions in some statistical sense; for example, we are interested in covering well the multivariate distribution of the observables. We will therefore carry out two analyses: one focused on the inverted parameter field and one on the model output (i.e., the observables). In each of the analyses, to assess the inversion quality of our approach, we will use the standard root mean square error (RMSE), the Structural SIMilarity (SSIM) index [54], and the rank histogram as a qualitative tools. The SSIM is the product of three terms (luminance, contrast, and structure) evaluating respectively the matching of intensity between the two datasets and , the variability, and the covariability of the two signals. In statistical terms, luminance, contrast, and structure can be seen as evaluating the bias, variance, and correlation between the two datasets, respectively. SSIM is expressed as

where , , and

respectively are the mean, standard deviation, and cross-covariance of each dataset, and

, , and are constants derived from the datasets. The SSIM takes values between and . The closer to the values are, the more similar the two signals are in terms of intensity, variability, and covariability. Researchers commonly also investigate the three components (luminance, contrast, variability) separately, as done hereafter.

For a visual assessment of the statistical consistency between two datasets in terms of probability distributions [2, 23] we use the rank histogram, an assessment tool often used in forecast verification. This rank histogram gives us an idea about the statistical consistency for the two datasets. The more uniform the histogram is, the more statistically consistent (i.e., sharp and calibrated) it is.

We solve the optimization problem (2a) with as the energy and variogram scores and the forward model given by (11). To understand the effect of the prior on the inversion results, for our numerical studies, we consider two priors: an informed prior and a standard prior, as discussed in Section 4.3. In what follows, we discuss the inversion results.

Comparison of MAP and true parameters

In Figure 3 we show the difference between the true parameter and the MAP estimate for both the informative (top) and standard priors (bottom), as well as for the energy (left column) and variogram (center column) scores. In the top-right panel we also show the initial guess for the optimization solver. The results reveal that the VS-score objective displays a stronger match between the MAP and than does the ES-score one, whereas the HS-score falls in between. The HS-score coefficients in (9) are chosen to be and , with a better-informed choice possible but not fully explored in this study. Models with the informed prior exhibit smaller discrepancies than do the models with the standard prior. These results are displayed in Table 1.

a) informed prior: energy score (left), variogram score (center) and initial guess (right)
b) standard prior: energy score (left), variogram score (right)
Figure 3: Parameter field discrepancy (left and center columns) and initial guess (top right) when using the informed (top) and standard (bottom) priors with the energy and variogram scores. The Monte Carlo sample size is 64 in all panels.

Table 1 shows the RMSE and SSIM and its 3 components, computed between the parameters and . As expected, being in the informed prior case leads to smaller RMSE and better SSIM for the inverted parameter, in other words, better overall performance. Additionally, the contrast term, which is related to the variance of each signal, is well captured by the three scores and by both types of priors (standard and informed). In particular, we note that the use of standard priors degrades the capture of the intensity of the parameters given by the luminance term. As expected, the VS-score and HS-score perform better than the ES-score at capturing the covariance between the two parameters and (structure term).

Samples Luminance Contrast Structure SSIM RMSE
Model (a) Informed prior
ES 1 0.847 1 0.698 0.591 1.137
4 0.801 0.989 0.797 0.631 1.134
8 0.824 0.995 0.777 0.637 1.108
32 0.803 0.996 0.765 0.612 1.155
64 0.793 0.992 0.757 0.596 1.176
128 0.786 0.995 0.759 0.594 1.187
VS 1 0.825 0.997 0.755 0.621 1.124
4 0.955 0.976 0.868 0.809 0.696
8 0.966 0.967 0.859 0.803 0.66
32 0.98 0.948 0.848 0.788 0.617
64 0.982 0.939 0.846 0.78 0.612
128 0.987 0.947 0.855 0.799 0.574
HS 1 0.837 1 0.722 0.605 1.136
4 0.935 0.981 0.868 0.796 0.765
8 0.947 0.979 0.853 0.737 0.737
32 0.958 0.973 0.833 0.776 0.719
64 0.960 0.966 0.83 0.769 0.716
128 0.962 0.972 0.836 0.782 0.697
Model (b) Standard prior
ES 1 -0.729 0.997 0.442 -0.321 2.988
4 -0.336 1 0.371 -0.125 2.535
8 -0.29 0.999 0.385 -0.112 2.483
32 -0.386 0.989 0.469 -0.179 2.538
64 -0.283 0.985 0.469 -0.131 2.43
128 -0.371 0.996 0.412 -0.152 2.548
VS 1 -0.828 0.991 0.527 -0.432 3.146
4 -0.547 0.999 0.447 -0.245 2.737
8 0.268 0.998 0.363 0.097 2.017
32 0.412 0.995 0.46 0.189 1.8
64 0.839 0.995 0.43 0.359 1.3
128 0.863 0.999 0.402 0.346 1.292
HS 1 -0.85 0.989 0.501 -0.421 3.202
4 -0.351 0.998 0.42 -0.147 2.527
8 -0.192 1 0.406 -0.078 2.385
32 -0.223 0.989 0.485 -0.107 2.37
64 -0.023 0.992 0.465 -0.011 2.194
128 -0.074 0.998 0.421 -0.031 2.265
Table 1: Quality of the reconstruction of the parameter field (i.e., the MAP point ) measured by different metrics with informed (a) and standard (b) priors. The Samples column lists the number of Monte Carlo samples used to approximate the stochastic right-hand side was. The Luminance column shows the consistency in terms of intensity of the two signals; the Contrast column represents the matching of variance of the two signals; and the Structure column shows the covariance matching between the two signals. The SSIM column—the product of the luminance, contrast, and structure—is a global measure of consistency of the two studied quantities. The last column, RMSE, shows the root mean squared errors between the two signals. This table shows that in the proposed setup of informed priors enables better results in terms of SSIM and RMSE and that the VS-model tends to provide a better matching between the true and estimated parameter field. The variance of each signal tends to be well captured by all models.

Comparison between and

To assess the quality and statistical properties of the observables generated by the model, in Figure 4 we show the rank histograms reflecting the statistical consistency between the true observables and the generated ones . The results show that the standard priors (right row) provide a better calibration between and than do the informed priors (central row). Additionally, the results show that the ES-score (top row) generates calibrated . This is not unexpected since the energy score is known for discriminating between the intensity of the signals it compares [39]. The VS-score (center row) does not present good calibration results. This result is not unexpected either since the variogram score is known for not capturing the intensity of the signals it compares [44]. We remark that the observables from the VS-score show some overdispersion (bell-shaped histogram). The hybrid score seems to take advantage of the properties of the energy score in terms of calibration and thus appears to be a good compromise between the ES-score and the VS-score. We note that the rank histograms do not assess the correlation structure of the data. In that context we investigate in the following indexes that measure the spatial data structure.

initial guess ES (informed prior) ES (standard prior)
VS (informed prior) VS (standard prior)
HS (informed prior) HS (standard prior)
Figure 4: Rank histogram between and . From left to right: dguess, obtained by using the informed prior, and obtained by using the standard prior. From top to bottom: ES-score ; VS-score ; HS-score. The horizontal line is the perfect uniform histogram that represents a perfect match between and . Red whiskers show

-confidence intervals associated with the estimated count histogram. The closer to the uniform histogram, the better the consistency between

and . Histograms are obtained for simulations with 64 samples.

In order to investigate the spatial structure of the observables, a metric assessing structural feature, namely, the SSIM, is computed. For each generated sample, in order to assess the overall error between the signals, the SSIM and RMSE are computed between the true and the recovered observable. The values of the metrics are summarized in boxplots in Figure 5. Comparable to Figure 4, Figure 5 shows that the ES-score and HS-score provide better results than does the VS-score in terms of recovering the observables . The metrics tend to have more variability for the VS-score and HS-score when the number of samples increases. This variability likely comes from the overdispersion of the outputs of the VS-score. We note, however, that the overall range of the bulk of the distribution (box) stays reasonably narrow. The hybrid score thus appears to be a good compromise between the ES-score and VS-score.

ES (standard prior) VS (standard prior) HS (standard prior)
ES (informed prior) VS (informed prior) HS (informed prior)
ES (standard prior) VS (standard prior) HS (standard prior)
ES (informed prior) VS (informed prior) HS (informed prior)
Figure 5: RMSE (two upper rows) and SSIM (two lower rows) between and each sample of . Left to right: ES-score; VS-score; HS-score. Models are run with informed and standard priors. The number of samples of the right-hand side varies for each model (N = 4, 8, 32, 64, 128). The RMSE is expected to be as close to 0 as possible, whereas an ideal SSIM would be as close to 1 as possible.

As a conclusion, the overall method shows a wide range of results in this experimental setup. In terms of capturing the parameter field and the intensity and the variability of , the VS- and HS- scores show better agreement with the true one than does the ES-based model. For observables, however, the ES-score shows good results in capturing statistics of the data . As expected, the informed priors help capture the parameter better, as seen on Table 1 whereas the standard prior case gives a better calibration between and and more accurate intensity and structure of (see Figures 4 and 5) arguably by relaxing the parameter constrained through the prior.

5 Model problem 2: Parameter identification in power grid applications governed by DAEs

Next we probe the proposed scores on a power grid inverse model problem governed by an index-1 DAE system. This model incorporates an electromagnetic machine, a slack bus, and a stochastic load as illustrated in Figure 6.

Figure 6: Power grid diagram.

We model the power grid using the generator, current, and network equations [42], namely, equationparentequation


Here is associated mainly with generators, and represents current (part of the generator equations) and the network equations (Kirchhoff). The full set of equations is given in Appendix A. The unknown (or inversion) parameter here is and represents the generator inertia. This parameter can be interpreted as how fast the generator reacts to fluctuations in the network. In our case represents fluctuations in the load, where and represent the real and imaginary resistive components, respectively.

In recent work we have explored estimating the inertia parameters in a standard 9-bus system given a single known disturbance in the load from synthetic bus voltage observations [37]. In this study we pose the problem as having a small signal disturbance in the load, which is a discrete process in time. This problem now describes a realistic behavior of small-scale consumers drawing power from the grid in an unobserved fashion. What we consider to be known is the distribution of the probabilistic load process. Moreover, we assume that we measure the power flow (or voltage) at one of the buses (see Figure 6).

(a) simulated
(b) simulated
(c) time series
(d) time series
Figure 7: Voltage with (a) real and (b) imaginary parts at the measurement location for two samples. Load noise around the stationarity baseline for the (c) resitive and (d) reactive components.

5.1 Computational experiment

We start with the system at dynamic steady state having and . The system is integrated with backward Euler with stochastic forcing. This discretization is then equivalent to first-order weak convergence in the mean square error sense of the stochastic DAE. Higher-order methods have been used as well, without noticing any qualitative differences for our current setup. The time series generated by the stochastic forcing is given by two independent stationary processes with known distribution :


where and, as before, . Some realizations of these time series are shown in Figure 7(c)-7(d).

The simulation time is seconds, with a time step of (10 ms). From the 10-second window we extract 5 seconds (seconds 3 to 8) to avoid initialization or mixture issues. We consider an ensemble of 1,000 samples integrated with this time step, with 800 being considered as numerical simulations and 200 set aside for observations. In this experiment we do not consider observational noise and do not need to use any regularization. The optimization problem becomes a univariate unconstrained program, and therefore we approximate the gradient with finite differences. Nevertheless, one can compute the gradients via adjoints as has been done for model problem 1 in Section 4 as well.

5.2 Analysis of the results

The time-dependent setting allows us to analyze different aspects of the inverse problem solution. For instance, in the steady-state case such as the first model problem, the spatial domain is fixed, and in general so is the number of observations. In the unsteady example, one typically has control over the observation window. To this end, we begin by exploring the effect of adding observations to the inference process and thus using the mean score (10). Specifically, we compute the score values at integer values of the parameter, from 1 to 35, and use 1 to 200 batches of observations. In other words we explore the mean score values with . One batch is the time series obtained under a stochastic forcing realization. Each batch can be seen as a 5-second simulation, and this assumes that the distribution is stationary for the entire inference window. The observations of the voltage and corresponding to the middle bus are taken at every time step.

We illustrate the results in Figure 8 for the ES-score and VS-score computed with respect to two exact values of the parameter, 10 and 20, respectively. We observe that both the energy and variogram scores converge to the exact value, with the variogram score converging much faster than the energy score especially when the exact value of the parameter is 10. We also remark that convergence guarantees are not easy to ascertain a priori; as reflected in the figure, a different number of observations are necessary in order to reach an accurate conclusion. One possible strategy to mitigate this issue is to use two different scores and observe the system until they are in agreement and do not change with additional observations.

Figure 8: Grid search of the minimizer as a function of observation batches. The exact parameter value is 10 for the left panel and 20 for right one. As the number of observations increases, the minimizer converges to the true value.
Figure 9: Reconstructed parameter for (left) and (right). The results obtained with the energy score are shown in red (cross) and for the variogram score in blue (circles). The bounds used in the optimization solver are shown with a black solid line.

The numerical results are carried out in Matlab using the default optimization solver. In Figure 9 we show the reconstructed parameter values as a function of function evaluations for the exact parameter values (black dashed line) 10 (left) and 20 (right). The results for the energy score are shown in red (cross) and for the variogram score in blue (circles). The bounds set to truth used in the optimization solver are shown with a black solid line. This figure shows that with both scores the optimization solver converges to a relatively good estimate of the exact parameter value in a relatively small number of function evaluations.

6 Conclusions

We have presented a statistical treatment of inverse problems governed by physics-based models with stochastic inputs. The goal of this study is to quantify the quality of the inverted parameter field, measured by a comparison with the “truth,” and the quality of the recovered observable: for example, given the inverted parameter field, quantify how well we fit the distribution of the observable. The end goal of our study is to introduce a Bayesian inverse problem formulation that facilitates the integration of data with physics-based models in order to quantify the uncertainties in model predictions. To this end, inspired from statistics, we propose to replace the traditional least-squares minimization problem—which minimizes the norm of the misfit between data and observables—with a set of loss functions that describes quantitatively the distance between the distributions of model generated data and the distribution of observational data. We refer to these metrics as scores, as known in the statistics community.

To compute the maximum a posteriori (MAP) point for the proposed Bayesian inverse problem, we solve an optimization problem constrained by the physics-based models under stochastic inputs with a quasi-Newton limited-memory algorithm. For efficient calculation of the gradient of the objective with respect to the inversion parameters, we derive adjoint-based expressions. Several challenges are associated with solving such optimization problems. First, these inverse problems are large scale, stemming from discretization of the parameter field in the case of PDE-based models or the size of the power grid network. Second, although we employ an efficient method to calculate derivatives, the number of adjoint solves increases with the number of samples and are coupled. As we indicated above, however, the communication during the adjoint calculations follows a fixed pattern and can be optimized for, arguably resulting in overall scalable strategies. Third, structured error in measurements (data) requires special attention, and convergence guarantees are not easy to ascertain a priori. Nevertheless, as we discussed for the second model, one can use a cross-validation strategy to ascertain and mitigate said convergence issues.

We have studied the performance of the proposed formulation in the context of two applications: a coefficient field inversion for subsurface flow governed by an elliptic PDE with a stochastic source and a parameter inversion for power grid governed by DAEs. In both cases the goal was to obtain predictive probabilistic models that explain the data.


We thank Michael Scheuerer for providing helpful comments on scoring functions and Umberto Villa for helpful discussions about hIPPYlib. The work of C. G. Petra was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This material also was based upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.


  • [1] A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas, A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems, SIAM Journal on Scientific Computing, 38 (2016), pp. A243–A272,,
  • [2] J. L. Anderson, A method for producing and evaluating probabilistic forecasts from ensemble model integrations, Journal of Climate, 9 (1996), pp. 1518–1530.
  • [3] G. Bal, I. Langmore, and Y. Marzouk, Bayesian inverse problems with Monte Carlo forward models, Inverse Problems & Imaging, 7 (2013), p. 81,,
  • [4] S. Balay, K. Buschelman, W. D. Gropp, D. Kaushik, M. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, PETSc home page, 2001.
  • [5] S. Balay, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, PETSc Web page, 2009.
  • [6] S. Baran, Probabilistic wind speed forecasting using bayesian model averaging with truncated normal components, Computational Statistics & Data Analysis, 75 (2014), pp. 227–238.
  • [7] S. Baran and S. Lerch, Log-normal distribution based ensemble model output statistics models for probabilistic wind-speed forecasting, Quarterly Journal of the Royal Meteorological Society, 141 (2015), pp. 2289–2299.
  • [8] A. Borzì and V. Schulz, Computational Optimization of Systems Governed by Partial Differential Equations, SIAM, 2012.
  • [9] A. Borzì and G. Von Winckel, Multigrid methods and sparse-grid collocation techniques for parabolic optimal control problems with random coefficients, SIAM Journal on Scientific Computing, 31 (2009), pp. 2172–2192.
  • [10] T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler, A computational framework for infinite-dimensional Bayesian inverse problems Part I: The linearized case, with application to global seismic inversion, SIAM Journal on Scientific Computing, 35 (2013), pp. A2494–A2523,
  • [11] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-newton matrices and their use in limited memory methods, Mathematical Programming, 63 (1994), pp. 129–156,,
  • [12] T. Dupont, J. Hoffman, C. Johnson, R. Kirby, M. Larson, A. Logg, and R. Scott, The FEniCS project, tech. report, 2003.
  • [13] C. A. T. Ferro, Measuring forecast performance in the presence of observation error, Quarterly Journal of the Royal Meteorological Society, 143 (2017), pp. 2665–2676,,
  • [14] Y. Gel, A. E. Raftery, and T. Gneiting, Calibrated probabilistic mesoscale weather field forecasting: The geostatistical output perturbation method, Journal of the American Statistical Association, 99 (2004), pp. 575–583.
  • [15] M. Giles, Multilevel Monte Carlo methods, Acta Numerica, 24 (2015), pp. 259–328.
  • [16] T. Gneiting and M. Katzfuss, Probabilistic forecasting, Annual Review of Statistics and Its Application, 1 (2014), pp. 125–151.
  • [17] T. Gneiting and A. E. Raftery, Strictly proper scoring rules, prediction, and estimation, Journal of the American Statistical Association, 102 (2007), pp. 359–378.
  • [18] T. Gneiting, A. E. Raftery, A. H. Westveld III, and T. Goldman, Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Monthly Weather Review, 133 (2005), pp. 1098–1118.
  • [19] T. Gneiting, L. I. Stanberry, E. P. Grimit, L. Held, and N. A. Johnson, Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds, Test, 17 (2008), pp. 211–235.
  • [20] M. Gunzburger, N. Jiang, and Z. Wang, A second-order time-stepping scheme for simulating ensembles of parameterized flow problems, Computational Methods in Applied Mathematics, (2017).
  • [21] M. Gunzburger, C. Webster, and G. Zhang, Stochastic finite element methods for partial differential equations with random input data, Acta Numerica, 23 (2014), p. 521–650,
  • [22] M. Hairer, Introduction to Stochastic PDEs. Lecture Notes, 2009.
  • [23] T. M. Hamill, Interpretation of rank histograms for verifying ensemble forecasts, Monthly Weather Review, 129 (2001), pp. 550–560.
  • [24] J. Kaipio and V. Kolehmainen, Bayesian Theory and Applications, Oxford University Press, 2013, ch. Approximate Marginalization Over Modeling Errors and Uncertainties in Inverse Problems, pp. 644–672.
  • [25] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, vol. 160 of Applied Mathematical Sciences, Springer-Verlag, New York, 2005.
  • [26] E. Kalnay, Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press, 2003.
  • [27] R. Kass and A. Raftery, Bayes factors, Journal of the American Statistical Association, 90 (1995), pp. 773–795,
  • [28] S. Lerch and T. L. Thorarinsdottir, Comparison of non-homogeneous regression models for probabilistic wind speed forecasting, Tellus A: Dynamic Meteorology and Oceanography, 65 (2013), p. 21206.
  • [29] H. Lie, T. Sullivan, and A. Teckentrup, Random forward models and log-likelihoods in Bayesian inverse problems, ArXiv e-prints, (2017),
  • [30] A. Logg, K.-A. Mardal, and G. N. Wells, eds., Automated Solution of Differential Equations by the Finite Element Method, vol. 84 of Lecture Notes in Computational Science and Engineering, Springer, 2012.
  • [31] S. Mattis, T. Butler, C. Dawson, D. Estep, and V. Vesselinov, Parameter estimation and prediction for groundwater contamination based on measure theory, Water Resources Research, 51 (2015), pp. 7608–7629.
  • [32] P. Naveau and J. Bessac, Forecast evaluation with imperfect observations and imperfect models. 2018.
  • [33] R. Nicholson, N. Petra, and P. J. Kaipio, Estimation of the Robin coefficient field in a Poisson problem with uncertain conductivity field, Inverse Problems, 34 (2018), p. 115005.
  • [34] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, 2nd ed., 2006.
  • [35] C. G. Petra, A memory-distributed quasi-Newton solver for nonlinear programming problems with a small number of general constraints, in review to the Journal of Parallel and Distributed Computed, (2017).
  • [36] N. Petra, J. Martin, G. Stadler, and O. Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet inverse problems, SIAM Journal on Scientific Computing, 36 (2014), pp. A1525–A1555.
  • [37] N. Petra, C. Petra, Z. Zhang, E. Constantinescu, and M. Anitescu, A Bayesian approach for parameter estimation with uncertainty for dynamic power systems, IEEE Transactions on Power Systems, 32 (2017), pp. 2735–2743,
  • [38] P. Pinson and R. Girard, Evaluating the quality of scenarios of short-term wind power generation, Applied Energy, 96 (2012), pp. 12–20.
  • [39] P. Pinson and J. Tastu, Discrimination ability of the energy score, tech. report, Technical University of Denmark, 2013.
  • [40] A. E. Raftery, T. Gneiting, F. Balabdaoui, and M. Polakowski, Using Bayesian model averaging to calibrate forecast ensembles, Monthly Weather Review, 133 (2005), pp. 1155–1174.
  • [41] E. Rosseel and G. Wells, Optimal control with stochastic PDE constraints and uncertain controls, Computer Methods in Applied Mechanics and Engineering, 213 (2012), pp. 152–167.
  • [42] P. W. Sauer and M. Pai, Power system dynamics and stability, Prentice Hall Upper Saddle River, NJ, 1998.
  • [43] M. Scheuerer, Probabilistic quantitative precipitation forecasting using ensemble model output statistics, Quarterly Journal of the Royal Meteorological Society, 140 (2014), pp. 1086–1096.
  • [44] M. Scheuerer and T. M. Hamill, Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities, Monthly Weather Review, 143 (2015), pp. 1321–1334.
  • [45] M. Scheuerer and D. Möller, Probabilistic wind speed forecasting on a grid based on ensemble model output statistics, The Annals of Applied Statistics, 9 (2015), pp. 1328–1349.
  • [46] J. M. L. Sloughter, T. Gneiting, and A. E. Raftery, Probabilistic wind speed forecasting using ensembles and Bayesian model averaging, Journal of the American Statistical Association, 105 (2010), pp. 25–35.
  • [47] A. Stuart, Inverse problems: a Bayesian perspective, Acta Numerica, 19 (2010), pp. 451–559.
  • [48] T. Thorarinsdottir, T. Gneiting, and N. Gissibl, Using proper divergence functions to evaluate climate models, SIAM/ASA Journal on Uncertainty Quantification, 1 (2013), pp. 522–534.
  • [49] T. L. Thorarinsdottir and T. Gneiting,

    Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression

    , Journal of the Royal Statistical Society: Series A (Statistics in Society), 173 (2010), pp. 371–388.
  • [50] F. Tröltzsch, Optimal Control of Partial Differential Equations: Theory, Methods and Applications, vol. 112 of Graduate Studies in Mathematics, American Mathematical Society, 2010.
  • [51] U. Villa, N. Petra, and O. Ghattas, hIPPYlib: an Extensible Software Framework for Large-scale Deterministic and Bayesian Inversion, (2016),,
  • [52] U. Villa, N. Petra, and O. Ghattas, hIPPYlib: An extensible software framework for large-scale inverse problems, Journal of Open Source Software, 3 (2018), p. 115005,
  • [53] C. R. Vogel, Computational Methods for Inverse Problems, Frontiers in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002.
  • [54] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE transactions on image processing, 13 (2004), pp. 600–612.
  • [55] N. Zabaras and B. Ganapathysubramanian, A scalable framework for the solution of stochastic inverse problems using a sparse grid collocation approach, Journal of Computational Physics, 227 (2008), pp. 4697–4735.

Appendix A Power grid equations

Below we present the equations extracted from [42] for the power grid example discussed in Sec. 5. The one generator 3-bus system is described by index-1 DAEs. Here we have 7 differential equations and 8 algebraic equations. The differential variables are the first seven variables, with the rest being algebraic.

The initial condition that gives a steady state is given by , here and .

The stochastic noise is characterized by and the parameter sought is , while observing the voltage at the slack bus, and .