Bayesian model inversion using stochastic spectral embedding

05/15/2020
by P.-R. Wagner, et al.

In this paper we propose a new sampling-free approach to solve Bayesian inverse problems that extends the recently introduced spectral likelihood expansions (SLE) method. The latter solves the inverse problem by expanding the likelihood function onto a global polynomial basis orthogonal w.r.t. the prior distribution. This gives rise to analytical expressions for key statistics of the Bayesian posterior distribution, such as evidence, posterior moments and posterior marginals by simple post-processing of the expansion coefficients. It is well known that in most practically relevant scenarios, likelihood functions have close-to-compact support, which causes the global SLE approach to fail due to the high polynomial degree required for an accurate spectral representation. To solve this problem, we herein replace the global polynomial expansion from SLE with a recently proposed method for local spectral expansion refinement called stochastic spectral embedding (SSE). This surrogate-modeling method was developed for functions with high local complexity. To increase the efficiency of SSE, we enhance it with an adaptive sample enrichment scheme. We show that SSE works well for likelihood approximations and retains the relevant spectral properties of SLE, thus preserving analytical expressions of posterior statistics. To assess the performance of our approach, we include three case studies ranging from low to high dimensional model inversion problems that showcase the superiority of the SSE approach compared to SLE and present the approach as a promising alternative to existing inversion frameworks.


1 Introduction

Computational models are an invaluable tool for decision making, scientific advances and engineering breakthroughs. They establish a connection between a set of input parameters and output quantities with wide-ranging applications. Model inversion uses available experimental observations of the output to determine the set of input parameters that maximize the predictive potential of a model. The importance of efficient and reliable model inversion frameworks can hardly be overstated, considering that they establish a direct connection between models and the real world. Without them, even the most advanced model predictions might lack physical meaning and, consequently, be useless for their intended applications.

Bayesian model inversion is one way to formalize this problem (Jaynes, 2003; Gelman et al., 2014). It is based on Bayesian inference and poses the problem in a probabilistic setting by capitalizing on Bayes' theorem. In this setting, a so-called prior (i.e., before observations) probability distribution of the model parameters is updated to a so-called posterior (i.e., after observations) distribution. The posterior distribution is the probability distribution of the input parameters conditioned on the available observations, and the main outcome of the Bayesian inversion process.

In Bayesian model inversion, the connection between the model output and the observations is established through a probabilistic discrepancy model. This model, which is a function of the input parameters, leads to the so-called likelihood function. The specific form of the likelihood function depends on the problem at hand, but it typically attains its global maximum at the input parameters whose model output is closest to the available observations (w.r.t. some metric), and rapidly decays to zero with increasing distance from those parameters.

Analytical expressions for the posterior distribution can only be found in a few academic examples (e.g., conjugate priors with a linear forward model, Bishop (2006); Gelman et al. (2014)). In general model inversion problems, however, such analytical solutions are not available. Instead, it is common practice to resort to sampling methods to generate a sample distributed according to the posterior distribution. The family of Markov chain Monte Carlo (MCMC) algorithms is particularly suitable for generating such a posterior sample (Beck and Au, 2002; Robert and Casella, 2004).

While MCMC and its extensions are extensively used in model inversion, and new algorithms are continuously being developed (Haario et al., 2001; Ching and Chen, 2007; Goodman and Weare, 2010; Neal, 2011), these methods have a few notable shortcomings that hinder their application in many practical cases. It is well known that there are no robust convergence criteria for MCMC algorithms, and that their performance is particularly sensitive to their tuning parameters. Additionally, samples generated by MCMC algorithms are often highly correlated, thus requiring extensive heuristic post-processing and empirical rules (Gelman and Rubin, 1992; Brooks and Gelman, 1998). MCMC algorithms are also in general not well suited for sampling multimodal posterior distributions.

When considering complex engineering scenarios, the models subject to inversion are often computationally expensive. Because MCMC algorithms usually require a significant number of forward model evaluations, it has been proposed to accelerate the procedure by using surrogate models in lieu of the original models. These surrogate models are either constructed non-adaptively before sampling from the posterior distribution (Marzouk et al., 2007; Marzouk and Xiu, 2009) or adaptively during the sampling procedure (Li and Marzouk, 2014; Birolleau et al., 2014). Adaptive techniques can be of great benefit with posterior distributions that are concentrated in a small subspace of the prior domain, as the surrogate only needs to be accurate near high density areas of the posterior distribution.

Polynomial chaos expansions (PCE) are a widely used class of surrogate models based on expanding the forward model onto a suitable polynomial basis (Ghanem and Spanos, 1991; Xiu and Karniadakis, 2002). In other words, they provide a spectral representation of the computational forward model. Thanks to the introduction of sparse regression (see, e.g., Blatman and Sudret (2011)), their computation has become feasible even in the presence of complex and computationally expensive engineering models. This technique has been successfully used in conjunction with MCMC to reduce the total computational cost associated with sampling from the posterior distribution (Marzouk et al., 2007; Wagner et al., 2020).

Alternative approaches to compute the posterior distribution or its statistics include the Laplace approximation at a posterior mode (Tierney and Kadane, 1986; Tierney et al., 1989b, a), approximate Bayesian computations (ABC) (Marin et al., 2012; Sisson et al., 2018), optimal transport approaches (El Moselhy and Marzouk, 2012; Parno, 2015; Marzouk et al., 2016) and embarrassingly parallel quasi-Monte Carlo sampling (Dick et al., 2017; Gantner and Peters, 2018).

Recently, Nagel and Sudret (2016) proposed spectral likelihood expansions (SLE), a novel approach that is closely related to PCE and aims at finding a spectral expansion of the likelihood function in an orthogonal basis w.r.t. the prior density function. The advantage of this method is that it provides analytical expressions for the posterior marginals and general posterior moments by post-processing the expansion coefficients. However, the typically compact support of likelihood functions generally prevents a sparse functional representation.

Stochastic spectral embedding (SSE), recently developed in Marelli et al. (2020), is a metamodelling technique suitable for approximating functions with complex localized features. In this paper we show that this technique, when extended with adaptive sample enrichment, is effective for the approximation of likelihood functions in Bayesian model inversion problems. Furthermore, we show that, due to the construction of the basis polynomials, SSE preserves the most important post-processing properties of SLE.

The paper is organized as follows: In Section 2 we establish the basics of Bayesian inference and particularly Bayesian model inversion. We then give an introduction to spectral function decomposition with a focus on polynomial chaos expansions and their application to likelihood functions (SLE) in Section 3. In Section 4 we present the main contribution of the paper, namely the derivation of Bayesian posterior quantities of interest when SSE is applied to likelihood functions, and the extension of the SSE algorithm with an adaptive sampling strategy. Finally, in Section 5 we showcase the performance of our approach on three case studies of varying complexity.

2 Model inversion

The problem of model inversion occurs whenever the predictions of a model are to be brought into agreement with available observations or data. This is achieved by properly adjusting a set of input parameters of the model. The goal of inversion can be twofold: on the one hand, the inferred input parameters might be used to predict new realizations of the model output; on the other hand, the inferred input parameters might be the main interest. Model inversion is a common problem in many engineering disciplines that in some cases is still routinely solved manually, i.e. by simply changing the input parameters until some, often qualitative, goodness-of-fit criterion is met. More quantitative inversion approaches aim at automating this process by establishing a metric (e.g., the $L^2$-distance) between the data and the model response, which is then minimized through suitable optimization algorithms.

While such approaches can often be used in practical applications, they tend not to provide measures of the uncertainties associated with the inferred model input or predictions. These uncertainties are useful in judging the accuracy of the inversion, as well as in identifying non-informative measurements. In fact, the lack of uncertainty quantification in the context of model inversion can lead to erroneous results with far-reaching consequences in subsequent applications. One approach to account for uncertainties in inverse problems is the Bayesian framework for model inversion presented hereinafter.

2.1 Bayesian inference

Consider some non-observable parameters $\bm{x} \in \mathcal{D}_{\bm{X}} \subseteq \mathbb{R}^M$ and the observables $\bm{y}$. Furthermore, let $\mathcal{Y} = \{\bm{y}^{(1)}, \dots, \bm{y}^{(N)}\}$ be a set of measurements, i.e., noisy observations of a set of realizations of $\bm{y}$. Statistical inference consists in drawing conclusions about $\bm{x}$ using the information from $\mathcal{Y}$ (Gelman et al., 2014). These measurements can be direct observations of the parameters ($\bm{y} = \bm{x}$) or some quantities indirectly related to $\bm{x}$ through a function or model $\mathcal{M}$. One way to conduct this inference is through Bayes' theorem of conditional probabilities, a process known as Bayesian inference.

Denoting by $\pi(\cdot)$ a probability density function (PDF) and by $\pi(\cdot \mid \cdot)$ a PDF conditioned on the quantity after the bar, Bayes' theorem can be written as

$\pi(\bm{x} \mid \bm{y}) = \dfrac{\pi(\bm{y} \mid \bm{x})\, \pi(\bm{x})}{\pi(\bm{y})},$   (1)

where $\pi(\bm{x})$ is known as the prior distribution of the parameters, i.e., the distribution of $\bm{x}$ before observing the data $\bm{y}$. The conditional distribution $\pi(\bm{y} \mid \bm{x})$, known as the likelihood, establishes a connection between the observations and a realization of the parameters $\bm{x}$. For a given realization $\bm{x}$, it returns the probability density of observing the data $\bm{y}$. Under the common assumption of independence between the individual observations in $\mathcal{Y}$, the likelihood function takes the form:

$\mathcal{L}(\bm{x}) \overset{\mathrm{def}}{=} \pi(\mathcal{Y} \mid \bm{x}) = \prod_{i=1}^{N} \pi(\bm{y}^{(i)} \mid \bm{x}).$   (2)

The likelihood function is a map $\mathcal{L}: \mathcal{D}_{\bm{X}} \to \mathbb{R}_{\geq 0}$, and it attains its maximum for the parameter set with the highest probability of yielding $\mathcal{Y}$. With this, Bayes' theorem from Eq. (1) can be rewritten as:

$\pi(\bm{x} \mid \mathcal{Y}) = \dfrac{\mathcal{L}(\bm{x})\, \pi(\bm{x})}{Z}, \qquad Z \overset{\mathrm{def}}{=} \int_{\mathcal{D}_{\bm{X}}} \mathcal{L}(\bm{x})\, \pi(\bm{x})\, \mathrm{d}\bm{x},$   (3)

where $Z$ is a normalizing constant often called the evidence or marginal likelihood. On the left-hand side, $\pi(\bm{x} \mid \mathcal{Y})$ is the posterior PDF, i.e., the distribution of $\bm{x}$ after observing the data $\mathcal{Y}$. In this sense, Bayes' theorem establishes a general expression for updating the prior distribution using a likelihood function to incorporate information from the data.

2.2 Bayesian model inversion

Bayesian model inversion describes the application of the Bayesian inference framework to the problem of model inversion (Beck and Katafygiotis, 1998; Kennedy and O'Hagan, 2001; Jaynes, 2003; Tarantola, 2005). The two main ingredients needed to infer model parameters within the Bayesian framework are a prior distribution $\pi(\bm{x})$ of the model parameters and a likelihood function $\mathcal{L}(\bm{x})$. In practical applications, prior information about the model parameters is often readily available. Typical sources of such information are physical parameter constraints or expert knowledge. Additionally, prior inversion attempts can serve as guidelines to assign informative prior distributions. In cases where no prior information about the parameters is available, so-called non-informative or invariant prior distributions (Jeffreys, 1946; Harney, 2016) can also be assigned. The likelihood function serves instead as the link between the model parameters and observations of the model output $\mathcal{M}(\bm{x})$. To connect these two quantities, it is necessary to choose a so-called discrepancy model that gives the relative probability that the model response to a realization of $\bm{x}$ describes the observations. One common assumption for this probabilistic model is that the measurements are perturbed by a Gaussian additive discrepancy term $\bm{\varepsilon} \sim \mathcal{N}(\bm{0}, \bm{\Sigma})$ with covariance matrix $\bm{\Sigma}$. For a single measurement it reads:

$\bm{y} = \mathcal{M}(\bm{x}) + \bm{\varepsilon}, \qquad \bm{\varepsilon} \sim \mathcal{N}(\bm{0}, \bm{\Sigma}).$   (4)

This discrepancy between the model output and the observables can result from measurement error or model inadequacies. By using this additive discrepancy model, the distribution of the observables conditioned on the parameters is written as:

$\pi(\bm{y} \mid \bm{x}) = \mathcal{N}(\bm{y}; \mathcal{M}(\bm{x}), \bm{\Sigma}),$   (5)

where $\mathcal{N}(\cdot\,; \bm{\mu}, \bm{\Sigma})$ denotes the multivariate Gaussian PDF with mean value $\bm{\mu}$ and covariance matrix $\bm{\Sigma}$. The likelihood function is then constructed using this probabilistic model and Eq. (2). For a given set of measurements $\mathcal{Y}$ it thus reads:

$\mathcal{L}(\bm{x}) = \prod_{i=1}^{N} \mathcal{N}\!\left(\bm{y}^{(i)}; \mathcal{M}(\bm{x}), \bm{\Sigma}\right).$   (6)

With the Bayesian model inversion problem fully specified, Eq. (3) directly gives the posterior distribution of the model parameters $\bm{x}$. In the setting of model inversion, the posterior distribution therefore represents the state of belief about the true data-generating model parameters, considering all available information: the computational forward model, the discrepancy model and the measurement data (Beck and Katafygiotis, 1998; Jaynes, 2003).

Often, the ultimate goal of model inversion is to provide a set of inferred parameters with associated confidence measures or intervals. This is typically achieved by computing posterior statistics (e.g., moments, mode, etc.). Propagating the posterior through secondary models is also of interest. So-called quantities of interest (QoI) can be expressed by calculating the posterior expectation of suitable functions $h$ of the parameters $\bm{x}$, as in:

$\mathbb{E}\left[h(\bm{X}) \mid \mathcal{Y}\right] = \int_{\mathcal{D}_{\bm{X}}} h(\bm{x})\, \pi(\bm{x} \mid \mathcal{Y})\, \mathrm{d}\bm{x}.$   (7)

Depending on the choice of $h$, this formulation encompasses posterior moments (e.g., $h(\bm{x}) = x_i$ or $h(\bm{x}) = x_i^2$ for the first and second moments, respectively), the posterior covariance, or expectations of secondary models.
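As an illustration of Eqs. (6) and (7), the following minimal sketch (not the authors' implementation; the forward model, prior and data below are placeholders) evaluates a Gaussian additive-discrepancy likelihood and estimates the evidence and a posterior expectation by reweighting a prior sample:

```python
# Minimal sketch of the Gaussian additive-discrepancy likelihood (Eq. (6)) and
# of prior-based Monte Carlo estimates of the evidence Z (Eq. (3)) and of a
# posterior expectation (Eq. (7)). Forward model, prior and data are placeholders.
import numpy as np
from scipy import stats

def forward_model(x):
    # Hypothetical cheap forward model M(x); stands in for the actual simulator.
    return np.array([x[0] + x[1] ** 2, x[0] * x[1]])

def log_likelihood(x, data, sigma):
    # log L(x) = sum_i log N(y_i; M(x), sigma^2 I)
    resid = data - forward_model(x)
    return np.sum(stats.norm.logpdf(resid, scale=sigma))

rng = np.random.default_rng(0)
prior = stats.lognorm(s=0.2, scale=1.0)            # assumed independent marginals
data = np.array([[1.9, 1.1], [2.1, 0.9]])          # two synthetic observations
sigma = 0.1

X = prior.rvs(size=(10_000, 2), random_state=rng)  # sample from the prior
logL = np.array([log_likelihood(x, data, sigma) for x in X])
w = np.exp(logL - logL.max())                      # rescaled likelihood weights
Z = w.mean() * np.exp(logL.max())                  # evidence estimate, Eq. (3)
post_mean = (w[:, None] * X).sum(axis=0) / w.sum() # E[X | Y], Eq. (7) with h(x) = x
print(Z, post_mean)
```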

3 Spectral function decomposition

To pose a Bayesian inversion problem, the specification of a prior distribution and a likelihood function, as described in the previous section, is sufficient. Its solution, however, is not available in closed form in the general case.

Spectral likelihood expansion (SLE) is a recently proposed method that aims at solving the Bayesian inversion problem by finding a polynomial chaos expansion (PCE) of the likelihood function in a basis orthogonal w.r.t. the prior distribution (Nagel and Sudret, 2016). This representation allows one to derive analytical expressions for the evidence $Z$, the posterior distribution, the posterior marginals, and many types of QoIs, including the posterior moments.

We offer here a brief introduction to regression-based, sparse PCE before introducing SLE, but refer the interested reader to more exhaustive resources on PCE (Ghanem and Spanos, 1991; Xiu and Karniadakis, 2002) and sparse PCE (Xiu, 2010; Blatman and Sudret, 2010, 2011).

Let us consider a random variable $\bm{X} \sim \pi(\bm{x})$ with independent components and associated marginal probability density functions $\pi_i(x_i)$, so that $\pi(\bm{x}) = \prod_{i=1}^{M} \pi_i(x_i)$. Assume further that $g$ is a scalar function of $\bm{X}$ which fulfills the finite variance condition ($\mathbb{E}[g^2(\bm{X})] < \infty$). Then it is possible to find a so-called truncated polynomial chaos approximation $\hat{g}$ of $g$ such that

$g(\bm{x}) \approx \hat{g}(\bm{x}) = \sum_{\bm{\alpha} \in \mathcal{A}} a_{\bm{\alpha}} \Psi_{\bm{\alpha}}(\bm{x}),$   (8)

where $\bm{\alpha} = (\alpha_1, \dots, \alpha_M)$ is an $M$-tuple of non-negative integers, $\mathcal{A} \subset \mathbb{N}^M$ is a finite set of such multi-indices, and $\Psi_{\bm{\alpha}}(\bm{x}) = \prod_{i=1}^{M} \psi^{(i)}_{\alpha_i}(x_i)$ is a multivariate polynomial orthonormal w.r.t. $\pi(\bm{x})$. For most parametric distributions, well-known classical orthonormal polynomials satisfy the necessary orthonormality condition w.r.t. $\pi(\bm{x})$ (Xiu and Karniadakis, 2002). For more general distributions, arbitrary orthonormal polynomials can be constructed numerically through the Stieltjes procedure (Gautschi, 2004). If, additionally, $\mathcal{A}$ is a sparse subset of $\mathbb{N}^M$, the truncated expansion in Eq. (8) is called a sparse PCE.

There exist different algorithms to produce a sparse PCE in practice, i.e. to select a sparse basis and compute the corresponding coefficients. A powerful class of methods are regression-based approaches that rely on an initial input sample $\mathcal{X} = \{\bm{x}^{(1)}, \dots, \bm{x}^{(N)}\}$, called the experimental design, and the corresponding model evaluations $\{g(\bm{x}^{(1)}), \dots, g(\bm{x}^{(N)})\}$ (see, e.g., Lüthen et al. (2020) for a recent survey). Additionally, it is possible to design adaptive algorithms that choose the truncated basis size (Blatman and Sudret, 2011; Jakeman et al., 2015).
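As a simple illustration of the regression approach (not the sparse, adaptive solvers cited above), the following sketch fits a total-degree PCE by ordinary least squares, assuming a uniform prior on $[-1, 1]^2$ so that normalized Legendre polynomials form the orthonormal basis:

```python
# Sketch of a regression-based PCE in the spirit of Eq. (8), assuming a uniform
# prior on [-1, 1]^2. The total-degree truncation and the plain least-squares
# solver are simplified stand-ins for the sparse, adaptive schemes in the text.
import itertools
import numpy as np
from numpy.polynomial import legendre

def psi(alpha, X):
    # Product of univariate Legendre polynomials, normalized to unit variance.
    out = np.ones(len(X))
    for d, a in enumerate(alpha):
        c = np.zeros(a + 1); c[a] = 1.0
        out *= legendre.legval(X[:, d], c) * np.sqrt(2 * a + 1)
    return out

def fit_pce(X, y, degree=4):
    alphas = [a for a in itertools.product(range(degree + 1), repeat=X.shape[1])
              if sum(a) <= degree]                       # total-degree truncation
    Psi = np.column_stack([psi(a, X) for a in alphas])   # regressor matrix
    coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return alphas, coeffs

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))                    # experimental design
y = np.exp(-X[:, 0] ** 2) * np.sin(3 * X[:, 1])          # toy model output
alphas, coeffs = fit_pce(X, y)
```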

To assess the accuracy of a PCE, the so-called generalization error $\mathbb{E}\left[(g(\bm{X}) - \hat{g}(\bm{X}))^2\right]$ shall be evaluated. A robust generalization error estimator is given by the leave-one-out (LOO) cross-validation technique. This estimator is obtained by

$\epsilon_{\mathrm{LOO}} = \frac{1}{N} \sum_{i=1}^{N} \left( g(\bm{x}^{(i)}) - \hat{g}^{\sim i}(\bm{x}^{(i)}) \right)^2,$   (9)

where $\hat{g}^{\sim i}$ is constructed by leaving out the $i$-th point from the experimental design $\mathcal{X}$. For methods based on linear regression, it can be shown (Chapelle et al., 2002; Blatman and Sudret, 2010) that the LOO error is available analytically by post-processing the regressor matrix.
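A compact sketch of this analytical shortcut (standard for ordinary least-squares regression; the normalization by the sample variance is an assumption) is:

```python
# Sketch of the analytical leave-one-out estimator of Eq. (9) for a regression-
# based expansion: with hat matrix H = Psi (Psi^T Psi)^{-1} Psi^T, the i-th LOO
# residual is (y_i - yhat_i) / (1 - H_ii), so no refitting is needed.
import numpy as np

def loo_error(Psi, y):
    """Psi: (N, P) regressor matrix of basis evaluations; y: (N,) model outputs."""
    coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    H_diag = np.einsum('ij,ij->i', Psi, np.linalg.solve(Psi.T @ Psi, Psi.T).T)
    loo_resid = (y - Psi @ coeffs) / (1.0 - H_diag)
    return np.mean(loo_resid ** 2) / np.var(y)   # often reported relative to Var[y]
```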

3.1 Spectral likelihood expansions

The idea of SLE is to use sparse PCE to find a spectral representation of the likelihood function occurring in Bayesian model inversion problems (see Eq. (2)). We present here a brief introduction to the method and the main results of Nagel and Sudret (2016).

Likelihood functions can be seen as scalar functions of the input random vector $\bm{X}$. In this work we assume priors of the type $\pi(\bm{x}) = \prod_{i=1}^{M} \pi_i(x_i)$, i.e. with independent marginals. Additionally, likelihood functions fulfill the finite variance condition (Nagel and Sudret, 2016) and therefore admit a spectral decomposition:

$\mathcal{L}(\bm{x}) \approx \hat{\mathcal{L}}(\bm{x}) = \sum_{\bm{\alpha} \in \mathcal{A}} a_{\bm{\alpha}} \Psi_{\bm{\alpha}}(\bm{x}),$   (10)

where the explicit dependence on $\mathcal{Y}$ was dropped for notational simplicity.

Upon computing the basis and coefficients in Eq. (8), the solution to the inverse problem reduces to merely post-processing the coefficients $a_{\bm{\alpha}}$. The following expressions can be derived for the individual quantities:

Evidence

The evidence $Z$ emerges as the coefficient of the constant polynomial:

$Z = a_{\bm{0}}.$   (11)
Posterior

Upon computing the evidence $Z$, the posterior can be evaluated directly through

$\pi(\bm{x} \mid \mathcal{Y}) = \dfrac{\pi(\bm{x})}{Z} \sum_{\bm{\alpha} \in \mathcal{A}} a_{\bm{\alpha}} \Psi_{\bm{\alpha}}(\bm{x}).$   (12)
Posterior marginals

Let $\mathsf{u}$ and $\mathsf{v}$ be two non-empty disjoint index sets such that $\mathsf{u} \cup \mathsf{v} = \{1, \dots, M\}$. We split the random vector $\bm{X}$ into two vectors $\bm{X}_{\mathsf{u}}$ with components $\{X_i\}_{i \in \mathsf{u}}$ and $\bm{X}_{\mathsf{v}}$ with components $\{X_i\}_{i \in \mathsf{v}}$. Denote further by $\pi(\bm{x}_{\mathsf{u}})$ and $\pi(\bm{x}_{\mathsf{v}})$ the prior marginal density functions of $\bm{X}_{\mathsf{u}}$ and $\bm{X}_{\mathsf{v}}$, respectively. The posterior marginals then read:

$\pi(\bm{x}_{\mathsf{u}} \mid \mathcal{Y}) = \dfrac{\pi(\bm{x}_{\mathsf{u}})}{Z} \sum_{\bm{\alpha} \in \mathcal{A}_{\mathsf{u}}} a_{\bm{\alpha}} \Psi_{\bm{\alpha}}(\bm{x}_{\mathsf{u}}),$   (13)

where $\mathcal{A}_{\mathsf{u}} = \{\bm{\alpha} \in \mathcal{A} : \alpha_i = 0 \;\; \forall i \in \mathsf{v}\}$. The series in the above equation constitutes a subexpansion that contains non-constant polynomials only in the directions $\mathsf{u}$.

Quantities of interest

Finally, it is also possible to analytically compute posterior expectations of functions $h$ that admit a polynomial chaos expansion in the same basis, of the form $h(\bm{x}) = \sum_{\bm{\alpha} \in \mathcal{A}} b_{\bm{\alpha}} \Psi_{\bm{\alpha}}(\bm{x})$. Eq. (7) then reduces to the spectral product:

$\mathbb{E}\left[h(\bm{X}) \mid \mathcal{Y}\right] = \frac{1}{Z} \sum_{\bm{\alpha} \in \mathcal{A}} a_{\bm{\alpha}} b_{\bm{\alpha}}.$   (14)

The quality of these results depends only on the approximation error introduced in Eq. (10). The latter, in turn, depends mainly on the chosen PCE truncation strategy (Blatman and Sudret, 2011; Nagel and Sudret, 2016) and the number of points used to compute the coefficients (i.e., the experimental design). It is known that likelihood functions typically have quasi-compact supports (i.e., $\mathcal{L}(\bm{x}) \approx 0$ on a majority of the prior domain $\mathcal{D}_{\bm{X}}$). Such functions require a very high polynomial degree to be approximated accurately, which in turn can lead to the need for prohibitively large experimental designs.
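The post-processing of Eqs. (11)-(12) amounts to a few lines once the SLE coefficients are available; the sketch below assumes the basis evaluator and prior PDF are supplied as callables (e.g., the `psi` function from the PCE sketch above):

```python
# Sketch of SLE post-processing: the evidence is the coefficient of the constant
# polynomial (Eq. (11)) and the posterior is the rescaled expansion (Eq. (12)).
# `alphas`/`coeffs` are the SLE multi-indices and coefficients, `basis_eval(a, x)`
# evaluates the orthonormal polynomial with multi-index a at the points x, and
# `prior_pdf(x)` evaluates the prior density; all are placeholders.
import numpy as np

def sle_evidence(alphas, coeffs):
    idx0 = alphas.index(tuple(0 for _ in alphas[0]))
    return coeffs[idx0]                                           # Z = a_0, Eq. (11)

def sle_posterior(x, alphas, coeffs, prior_pdf, basis_eval):
    L_hat = sum(c * basis_eval(a, x) for a, c in zip(alphas, coeffs))
    return prior_pdf(x) * L_hat / sle_evidence(alphas, coeffs)    # Eq. (12)
```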

4 Stochastic spectral embedding

Stochastic spectral embedding (SSE) is a multi-level approach to surrogate modelling originally proposed in Marelli et al. (2020). It attempts to approximate a given square-integrable function $g$ through

$g(\bm{x}) \approx g^{\mathrm{SSE}}(\bm{x}) = \sum_{(\ell, p) \in \mathcal{K}} \mathbb{1}_{\mathcal{D}^{\ell}_{p}}(\bm{x})\, \widehat{\mathcal{R}}^{\ell}_{p}(\bm{x}),$   (15)

where $\mathcal{K}$ is a set of multi-indices with elements $(\ell, p)$ for which $\ell \in \{0, \dots, L\}$ and $p \in \{1, \dots, P_\ell\}$, and where $L$ is the number of levels and $P_\ell$ is the number of subdomains $\mathcal{D}^{\ell}_{p} \subseteq \mathcal{D}_{\bm{X}}$ at a specific level $\ell$. We call $\widehat{\mathcal{R}}^{\ell}_{p}$ a residual expansion given by

$\widehat{\mathcal{R}}^{\ell}_{p}(\bm{x}) = \sum_{\bm{\alpha} \in \mathcal{A}^{\ell}_{p}} a^{\ell}_{p,\bm{\alpha}} \Psi^{\ell}_{p,\bm{\alpha}}(\bm{x}).$   (16)

In the present paper, the term $\widehat{\mathcal{R}}^{\ell}_{p}$ denotes a polynomial chaos expansion (see Eq. (8)) constructed in the subdomain $\mathcal{D}^{\ell}_{p}$, but in principle it can refer to any spectral expansion (e.g., Fourier series). A schematic representation of the summation in Eq. (15) is given in Figure 2. The detailed notation and the algorithm to sequentially construct an SSE are given in the sequel.

4.1 SSE for Bayesian model inversion

Viewing the likelihood as a function of the random vector $\bm{X}$, we can directly use Eq. (15) to write down its SSE representation

$\mathcal{L}(\bm{x}) \approx \mathcal{L}^{\mathrm{SSE}}(\bm{x}) = \sum_{(\ell, p) \in \mathcal{K}} \mathbb{1}_{\mathcal{D}^{\ell}_{p}}(\bm{x})\, \widehat{\mathcal{R}}^{\ell}_{p}(\bm{x}),$   (17)

where the variable $\bm{X}$ is distributed according to the prior distribution $\pi(\bm{x})$ and, consequently, the local basis used to compute $\widehat{\mathcal{R}}^{\ell}_{p}$ is orthonormal w.r.t. that distribution.

Due to the local spectral properties of the residual expansions, the SSE representation of the likelihood function retains all of the post-processing properties of SLE (Section 3.1):

Evidence

The normalization constant $Z$ emerges as the sum of the constant polynomial coefficients weighted by the prior mass of the corresponding subdomains:

$Z = \sum_{(\ell, p) \in \mathcal{K}} \mathcal{V}^{\ell}_{p}\, a^{\ell}_{p,\bm{0}}, \qquad \mathcal{V}^{\ell}_{p} \overset{\mathrm{def}}{=} \int_{\mathcal{D}^{\ell}_{p}} \pi(\bm{x})\, \mathrm{d}\bm{x}.$   (18)
Posterior

This allows us to write the posterior as

$\pi(\bm{x} \mid \mathcal{Y}) = \dfrac{\pi(\bm{x})}{Z} \sum_{(\ell, p) \in \mathcal{K}} \mathbb{1}_{\mathcal{D}^{\ell}_{p}}(\bm{x})\, \widehat{\mathcal{R}}^{\ell}_{p}(\bm{x}).$   (19)
Posterior marginal

Utilizing again the disjoint index sets $\mathsf{u}$ and $\mathsf{v}$ from Eq. (13), it is also possible to analytically derive the posterior marginal PDFs as

(20)

where

(21)

is a subexpansion of $\widehat{\mathcal{R}}^{\ell}_{p}$ that contains only non-constant polynomials in the directions $\mathsf{u}$. Note that, as we assumed that the prior distribution has independent components, the constants appearing in Eqs. (20) and (21) are obtained as products of univariate integrals which are available analytically from the prior marginal cumulative distribution functions (CDFs).

Quantities of interest

Expected values of a function $h(\bm{X})$ under the posterior can be approximated by:

$\mathbb{E}\left[h(\bm{X}) \mid \mathcal{Y}\right] \approx \frac{1}{Z} \sum_{(\ell, p) \in \mathcal{K}} \mathcal{V}^{\ell}_{p} \sum_{\bm{\alpha} \in \mathcal{A}^{\ell}_{p}} a^{\ell}_{p,\bm{\alpha}}\, b^{\ell}_{p,\bm{\alpha}},$   (22)

where $b^{\ell}_{p,\bm{\alpha}}$ are the coefficients of the PCE of $h$ in the bases $\{\Psi^{\ell}_{p,\bm{\alpha}}\}$. This can also be used for computing posterior moments.

These expressions can be seen as a generalization of the ones for SLE detailed in Section 3.1. For a single-level global expansion (i.e., $L = 0$, $P_0 = 1$ and consequently $\mathcal{D}^{0}_{1} = \mathcal{D}_{\bm{X}}$ with $\mathcal{V}^{0}_{1} = 1$), they are identical.
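To make the bookkeeping concrete, the following sketch evaluates an SSE likelihood surrogate and the post-processing of Eqs. (18)-(19), assuming each subdomain record stores its bounds, prior mass, constant coefficient and local expansion (an illustrative data structure, not the authors' implementation):

```python
# Sketch of evaluating an SSE likelihood surrogate (Eq. (17)) and of the
# evidence/posterior post-processing (Eqs. (18)-(19)). The data structure is
# illustrative; the local expansion evaluator is supplied as a callable.
import numpy as np
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ResidualExpansion:
    lower: np.ndarray          # lower corner of the subdomain
    upper: np.ndarray          # upper corner of the subdomain
    prior_mass: float          # prior probability of the subdomain
    const_coeff: float         # coefficient of the constant basis polynomial
    evaluate: Callable[[np.ndarray], np.ndarray]   # local expansion R_hat(x)

def sse_likelihood(x, terms: List[ResidualExpansion]):
    """Eq. (17): sum of residual expansions restricted to their subdomains."""
    L = np.zeros(len(x))
    for t in terms:
        inside = np.all((x >= t.lower) & (x <= t.upper), axis=1)
        L[inside] += t.evaluate(x[inside])
    return L

def sse_evidence(terms: List[ResidualExpansion]):
    """Eq. (18): constant coefficients weighted by the subdomain prior masses."""
    return sum(t.prior_mass * t.const_coeff for t in terms)

def sse_posterior(x, terms, prior_pdf):
    """Eq. (19): rescale the likelihood surrogate by the prior and evidence."""
    return prior_pdf(x) * sse_likelihood(x, terms) / sse_evidence(terms)
```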

4.2 Modifications to the original algorithm

The original algorithm for computing an SSE was presented in Marelli et al. (2020). It recursively partitions the input domain and constructs truncated expansions of the residual. We reproduce it below for reference, but replace the model $g$ with the likelihood function $\mathcal{L}$. We further simplify the algorithm by choosing a partitioning strategy with two subdomains per split.

  1. Initialization:

    1. Set $\ell = 0$, $\mathcal{D}^{0}_{1} = \mathcal{D}_{\bm{X}}$ and the residual $\mathcal{R}^{0}_{1} = \mathcal{L}$

  2. For each subdomain $\mathcal{D}^{\ell}_{p}$, $p = 1, \dots, P_\ell$:

    1. Calculate the truncated expansion $\widehat{\mathcal{R}}^{\ell}_{p}$ of the residual in the current subdomain

    2. Update the residual in the current subdomain by subtracting $\widehat{\mathcal{R}}^{\ell}_{p}$

    3. Split the current subdomain in two subdomains based on a partitioning strategy

    4. If the termination criterion is not met, set $\ell \leftarrow \ell + 1$ and go back to 2a, otherwise terminate the algorithm

  3. Termination

    1. Return the full sequence of $\mathcal{D}^{\ell}_{p}$ and $\widehat{\mathcal{R}}^{\ell}_{p}$ needed to compute Eq. (15).

In practice, the residual expansions are computed using a fixed experimental design $\mathcal{X}$ and the corresponding likelihood evaluations $\mathcal{L}(\mathcal{X})$. The algorithm then only requires the specification of a partitioning strategy and a termination criterion, as detailed in Marelli et al. (2020).

Likelihood functions are typically characterized by a localized behaviour: close to the data-generating parameters they peak, while quickly decaying to zero in the remainder of the prior domain. This means that in a majority of the domain the likelihood evaluations are non-informative. Directly applying the original algorithm is then expected to waste many likelihood evaluations.

We therefore modify the original algorithm by adding an adaptive sampling scheme (Section 4.2.1) that includes the termination criterion and introducing an improved partitioning strategy (Section 4.2.2) that is especially suitable for finding compact support features. The rationale for these modifications is presented next.

4.2.1 Adaptive sampling scheme

The proposed algorithm has two parameters: the experimental design size $N_{\mathrm{ED}}$ used for the residual expansions, and the final experimental design size $N_{\mathrm{max}}$ corresponding to the available computational budget. At the initialization of the algorithm, $N_{\mathrm{ED}}$ points are sampled as a first experimental design. At every further iteration, additional points are then sampled from the prior distribution. These samples are generated in a space-filling way (e.g. through Latin hypercube sampling) in the newly created subdomains, so that exactly $N_{\mathrm{ED}}$ points are always available for constructing the residual expansions. The algorithm is terminated once the computational budget has been exhausted.

At every step, the proposed algorithm chooses a single refinement domain from the set of unsplit, i.e. terminal, domains, creates two new subdomains by splitting the refinement domain, and constructs residual expansions after enriching the experimental design. The selection of this refinement domain is based on the error estimator $E^{\ell}_{p}$ defined by

$E^{\ell}_{p} = \begin{cases} \mathcal{V}^{\ell}_{p}\, \epsilon^{\ell}_{p,\mathrm{LOO}}, & \text{if a residual expansion was constructed in } \mathcal{D}^{\ell}_{p},\\ E^{\ell-1}_{p'}, & \text{otherwise, where } \mathcal{D}^{\ell-1}_{p'} \supset \mathcal{D}^{\ell}_{p} \text{ is the parent domain.} \end{cases}$   (23)

This estimator incorporates the subdomain size, through the prior mass $\mathcal{V}^{\ell}_{p}$, and the approximation accuracy, through the leave-one-out estimator $\epsilon^{\ell}_{p,\mathrm{LOO}}$. The case distinction is necessary to assign an error estimator also to domains that have too few points to construct a residual expansion, in which case the error estimator of the previous level is reused.

The algorithm sequentially splits and refines subdomains with large approximation errors. Because likelihood functions typically have the highest complexity close to their peak, these regions tend to have larger approximation errors and are therefore predominantly picked for refinement. The proposed way of adaptive sampling then ends up placing more points near the likelihood peak, thereby reducing the number of non-informative likelihood evaluations.

The choice of a constant $N_{\mathrm{ED}}$ is simple and could in principle be replaced by a more elaborate strategy (e.g., based on the approximation error of the current subdomain relative to the total approximation error). A benefit of this enrichment criterion is that all residual expansions are computed with experimental designs of the same size. When the domain with the maximum approximation error is then selected among the terminal domains, the error estimators have a comparable estimation accuracy.
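A small sketch of this selection and enrichment step is given below; the product form of the error indicator (prior mass times LOO error, cf. Eq. (23)) and the attribute names of the domain objects are assumptions:

```python
# Sketch of the adaptive enrichment step of Section 4.2.1: pick the terminal
# domain with the largest error indicator, then top up a newly created child
# domain with space-filling points until it holds N_ED points.
import numpy as np
from scipy.stats import qmc

def select_refinement_domain(terminal_domains):
    # terminal_domains: iterable of objects with .prior_mass and .loo_error
    return max(terminal_domains, key=lambda d: d.prior_mass * d.loo_error)

def enrich(domain, X_existing, n_ed, rng):
    """Add Latin hypercube points in the quantile-space box
    [domain.lower, domain.upper] until n_ed points of X_existing lie inside it."""
    inside = np.all((X_existing >= domain.lower) & (X_existing <= domain.upper), axis=1)
    n_new = max(n_ed - inside.sum(), 0)
    if n_new == 0:
        return X_existing
    lhs = qmc.LatinHypercube(d=len(domain.lower), seed=rng)
    U = qmc.scale(lhs.random(n_new), domain.lower, domain.upper)
    return np.vstack([X_existing, U])
```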

4.2.2 Partitioning strategy

The partitioning strategy determines how a selected refinement domain is split. As described in Marelli et al. (2020), it is easy to define the splitting in the uniformly distributed quantile space $[0, 1]^M$ and map the resulting split domains to the (possibly unbounded) real space through an appropriate isoprobabilistic transform (e.g., the Rosenblatt transform (Rosenblatt, 1952)).

Similar to the original SSE algorithm presented in Marelli et al. (2020), we split the refinement domain in half w.r.t. its prior mass. The original algorithm chooses the splitting direction based on the partial variance in the refinement domain. This approach is well suited for generic function approximation problems. For the approximation of likelihood functions, however, we propose a partitioning strategy that is more apt for dealing with their compact support.

We propose to pick the split direction along which a split yields the maximum difference in the empirical residual variance between the two candidate subdomains created by the split. This can easily be visualized with the two-dimensional example in Figure 1(a). Assume this subdomain was selected as the refinement domain. To decide along which dimension to split, we construct the candidate subdomain pairs $\{\mathcal{D}^{d}_{1}, \mathcal{D}^{d}_{2}\}$ for each direction $d$ and estimate the corresponding difference in residual variance, defined by

$\Delta V_{d} = \left| \mathbb{V}\!\left[ \mathcal{R}(\bm{X}) \mid \bm{X} \in \mathcal{D}^{d}_{1} \right] - \mathbb{V}\!\left[ \mathcal{R}(\bm{X}) \mid \bm{X} \in \mathcal{D}^{d}_{2} \right] \right|.$   (24)

In this expression, $\mathcal{X}^{d}_{1}$ and $\mathcal{X}^{d}_{2}$ denote the subsets of the experimental design inside the subdomains $\mathcal{D}^{d}_{1}$ and $\mathcal{D}^{d}_{2}$, respectively. The occurring variances can easily be estimated with the empirical variance of the residuals in the respective candidate subdomains.

After computing the residual variance differences, the split is carried out along the dimension

$d^{\ast} = \arg\max_{d = 1, \dots, M} \Delta V_{d},$   (25)

i.e., keeping the subdomains $\mathcal{D}^{d^{\ast}}_{1}$ and $\mathcal{D}^{d^{\ast}}_{2}$ that introduce the largest difference in variance. For the example above, the resulting split can be seen in Figure 1(d).
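A minimal sketch of this split-direction selection (empirical variances computed from the experimental design in quantile space, mid-point splits) reads:

```python
# Sketch of the partitioning strategy of Section 4.2.2: for each input direction,
# split the refinement domain in half w.r.t. prior mass (mid-point in quantile
# space) and keep the direction maximizing the difference of the empirical
# residual variances in the two candidate halves (Eqs. (24)-(25)).
import numpy as np

def choose_split_direction(U, residuals, lower, upper):
    """U: (N, M) experimental design in quantile space, restricted to the
    refinement domain [lower, upper]; residuals: (N,) current residual values."""
    best_dir, best_diff = 0, -np.inf
    for d in range(U.shape[1]):
        mid = 0.5 * (lower[d] + upper[d])
        left = residuals[U[:, d] <= mid]
        right = residuals[U[:, d] > mid]
        if len(left) < 2 or len(right) < 2:
            continue                       # not enough points to estimate variances
        diff = abs(np.var(left) - np.var(right))
        if diff > best_diff:
            best_dir, best_diff = d, diff
    return best_dir
```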

Figure 1: Partitioning strategy for a 2D example visualized in the quantile space. Panels: (a) refinement domain, (b) candidate split along the first dimension, (c) candidate split along the second dimension, (d) selected pair. The refinement domain is split into two subdomains.

The choice of this partitioning strategy can be justified heuristically with the goal of approximating compact-support functions. Assuming that the likelihood function has compact support, this criterion will avoid cutting through its support and instead identify a split direction that results in one subdomain with large variance (expected to contain the likelihood support) and one with small variance. In subsequent steps, the algorithm will proceed by cutting away low-variance subdomains until the likelihood support is isolated.

4.2.3 The adaptive SSE algorithm

The algorithm is presented here with its two parameters $N_{\mathrm{ED}}$, the minimum experimental design size needed to expand a residual, and $N_{\mathrm{max}}$, the final experimental design size. The sample $\mathcal{X}^{\ell}_{p}$ refers to $\mathcal{X} \cap \mathcal{D}^{\ell}_{p}$, i.e. the subset of $\mathcal{X}$ inside $\mathcal{D}^{\ell}_{p}$. Further, the multi-index set $\mathcal{T}$ at each step of the algorithm gathers all indices of unsplit subdomains; it thus denotes the terminal domains. For visualization purposes we show the first iterations of the algorithm for a two-dimensional example in Figure 2.

  1. Initialization:

    1. Sample $N_{\mathrm{ED}}$ points from the prior distribution

    2. Calculate the truncated expansion of $\mathcal{L}$ in the full domain $\mathcal{D}^{0}_{1} = \mathcal{D}_{\bm{X}}$, retrieve its approximation error $E^{0}_{1}$ and initialize $\mathcal{T} = \{(0, 1)\}$

  2. For the refinement domain $\mathcal{D}^{\ell}_{p}$ with the largest error estimator among the terminal domains $\mathcal{T}$:

    1. Split the current subdomain in two sub-parts $\mathcal{D}^{\ell+1}_{s_1}, \mathcal{D}^{\ell+1}_{s_2}$ and update $\mathcal{T}$

    2. For each split $\mathcal{D}^{\ell+1}_{s}$:

      1. If the subdomain holds fewer than $N_{\mathrm{ED}}$ points and the total budget $N_{\mathrm{max}}$ is not exhausted

        1. Enrich the sample with new points inside $\mathcal{D}^{\ell+1}_{s}$

      2. If at least $N_{\mathrm{ED}}$ points are available in $\mathcal{D}^{\ell+1}_{s}$

        1. Create the truncated expansion of the residual in the current subdomain using $\mathcal{X}^{\ell+1}_{s}$

        2. Update the residual in the current subdomain

      3. Retrieve the approximation error from Eq. (23)

    3. If no new expansions were created, terminate the algorithm, otherwise go back to 2

  3. Termination

    1. Return the full sequence of $\mathcal{D}^{\ell}_{p}$ and $\widehat{\mathcal{R}}^{\ell}_{p}$ needed to compute Eq. (15)

The updating of the multi-index set $\mathcal{T}$ in Step 2a refers to removing the current index $(\ell, p)$ from the set and adding to it the newly created indices $(\ell+1, s_1)$ and $(\ell+1, s_2)$. The overall loop is illustrated by the simplified sketch below.
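The following runnable one-dimensional toy sketch ties the steps together (uniform prior on [0, 1], mid-point splits, fixed-degree local Legendre fits by least squares); it is a deliberately simplified illustration, not the authors' implementation, which uses sparse adaptive PCE, general priors and the split-direction criterion of Section 4.2.2:

```python
# Toy 1D adaptive SSE loop (cf. Section 4.2.3): uniform prior on [0, 1],
# mid-point splits, plain least-squares Legendre fits of fixed degree.
# All numerical settings are illustrative.
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)
L_fun = lambda x: np.exp(-0.5 * ((x - 0.3) / 0.02) ** 2)   # peaked toy likelihood
N_ED, BUDGET, DEG = 10, 200, 3

def fit_local(x, r, lo, hi):
    """Fit the residual r on [lo, hi]; return the local expansion, a crude
    LOO stand-in and the constant (local mean) coefficient."""
    t = 2 * (x - lo) / (hi - lo) - 1                        # map to [-1, 1]
    c = legendre.legfit(t, r, DEG)
    f = lambda xx: legendre.legval(2 * (xx - lo) / (hi - lo) - 1, c)
    return f, np.mean((r - f(x)) ** 2), c[0]

X = rng.uniform(0, 1, N_ED)
R = L_fun(X)
f0, e0, c0 = fit_local(X, R, 0.0, 1.0)
R -= f0(X)
terms = [(0.0, 1.0, f0, c0)]            # stored residual expansions, Eq. (17)
terminal = [(0.0, 1.0, e0)]             # unsplit domains with error indicator

while len(X) < BUDGET:
    lo, hi, _ = max(terminal, key=lambda d: (d[1] - d[0]) * d[2])   # mass * error
    terminal = [d for d in terminal if (d[0], d[1]) != (lo, hi)]
    mid = 0.5 * (lo + hi)
    for a, b in [(lo, mid), (mid, hi)]:
        mask = (X >= a) & (X < b)
        n_new = N_ED - mask.sum()
        if n_new > 0:                                       # enrich the subdomain
            Xn = rng.uniform(a, b, n_new)
            Rn = L_fun(Xn) - sum(f(Xn) * ((Xn >= l) & (Xn < h))
                                 for l, h, f, _ in terms)   # subtract ancestors
            X, R = np.append(X, Xn), np.append(R, Rn)
            mask = (X >= a) & (X < b)
        f, e, c0 = fit_local(X[mask], R[mask], a, b)
        R[mask] -= f(X[mask])
        terms.append((a, b, f, c0))
        terminal.append((a, b, e))

Z = sum((b - a) * c0 for a, b, _, c0 in terms)              # evidence, Eq. (18)
print(Z)
```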

Figure 2: Graphical representation of the first steps of the adaptive SSE algorithm described in Section 4.2.3 for a two-dimensional problem with independent prior distribution (panels: initialization, first iteration, third iteration). Upper row: partitioning in the quantile space; lower row: partitioning in the unbounded real space with prior contour lines in dashed blue. Red dots show the adaptive experimental design, which has a constant size of $N_{\mathrm{ED}}$ points in each created subdomain. The terminal domains are highlighted in orange. The splitting direction in each subdomain is determined randomly in this example.

5 Case studies

To showcase the effectiveness of the proposed adaptive SSE approach, we present three case studies with increasing dimensionality and different model complexity: (i) a one-dimensional vibration problem for illustrative purposes, (ii) a six-dimensional heat transfer problem that describes the steady-state heat evolution in a solid body with inclusions, and (iii) a high-dimensional diffusion problem modeling concentration-driven diffusion in a one-dimensional domain.

For all case studies, we adopt the adaptive sparse-PCE approach based on LARS developed in Blatman and Sudret (2011) through its numerical implementation in UQLab (Marelli and Sudret, 2014, 2019). Each $\widehat{\mathcal{R}}^{\ell}_{p}$ is therefore a degree- and $q$-norm-adaptive polynomial chaos expansion. We further introduce a rank truncation of $r = 2$, i.e. we limit the maximum number of input interactions (Marelli and Sudret, 2019) to two variables at a time. The truncation set for each spectral expansion (Eq. (8)) thus reads:

$\mathcal{A}^{M, p, q, r} = \left\{ \bm{\alpha} \in \mathbb{N}^{M} : \|\bm{\alpha}\|_{q} \le p, \; \|\bm{\alpha}\|_{0} \le r \right\},$   (26)

where

$\|\bm{\alpha}\|_{q} = \Big( \sum_{i=1}^{M} \alpha_i^{q} \Big)^{1/q} \quad \text{and} \quad \|\bm{\alpha}\|_{0} = \sum_{i=1}^{M} \mathbb{1}_{\{\alpha_i > 0\}}.$   (27)

The $q$-norm and the maximum polynomial degree $p$ are both adaptively increased within prescribed ranges, where a smaller maximum degree is used for case study (iii) than for case studies (i) and (ii) due to its high dimensionality.

In case studies (ii) and (iii), the performance of SLE (Nagel and Sudret, 2016), the original non-adaptive SSE (Marelli et al., 2020) and the proposed adaptive SSE approach presented in Section 4.2 is compared. The comparison is omitted for case study (i), because only adaptive SSE succeeded in solving the problem. For clarity, we henceforth abbreviate the adaptive SSE algorithm to adSSE.

To simplify the comparison, the partitioning strategy employed for adSSE (Section 4.2.2) was also used for the non-adaptive SSE approach. Also, the same experimental designs were used for the non-adaptive SSE and the SLE approaches. Finally, the same parameter $N_{\mathrm{ED}}$ was used to define the enrichment samples in adSSE and the termination criterion in non-adaptive SSE.

To assess the performance of the three algorithms considered, we define an error measure that allows us to quantitatively compare the similarity of the SSE, adSSE and SLE solutions with the reference MCMC solution. This comparison is inherently difficult, as a sampling-based approach (MCMC) needs to be compared to a functional approximation (SSE, adSSE, SLE). We proceed to compare the univariate posterior marginals, available analytically in SSE, adSSE and SLE (see Eq. (13) and Eq. (20)), to the reference posterior marginals estimated with kernel density estimation (KDE, Wand and Jones (1995)) from the MCMC sample. Denoting by $\hat{\pi}(x_i \mid \mathcal{Y})$ the SSE, adSSE or SLE marginal approximations and by $\pi^{\mathrm{ref}}(x_i \mid \mathcal{Y})$ the reference solution, we define the following error measure

$\varepsilon = \frac{1}{M} \sum_{i=1}^{M} D_{\mathrm{JS}}\!\left( \hat{\pi}(x_i \mid \mathcal{Y}) \,\big\|\, \pi^{\mathrm{ref}}(x_i \mid \mathcal{Y}) \right),$   (28)

where $M$ is the dimensionality of the problem and $D_{\mathrm{JS}}$ is the Jensen-Shannon divergence (Lin, 1991), a symmetric and bounded ($0 \le D_{\mathrm{JS}} \le \ln 2$) distance measure for probability distributions based on the Kullback-Leibler divergence.

The purpose of the error measure is to allow for a fair comparison between the different methods investigated. It is not a practical measure for engineering applications, because it relies on the availability of a reference solution and because its magnitude does not have a clear quantitative interpretation. However, it is considerably more comprehensive than a pure moment-based error measure: because it is averaged over all marginals, it encapsulates the approximation accuracy of all univariate posterior marginals in a single scalar value.
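A sketch of how such a measure could be computed (surrogate marginals evaluated on a grid versus a Gaussian KDE of the MCMC sample; the discretization is an assumption) is given below:

```python
# Sketch of the error measure of Eq. (28): average, over the M input directions,
# of the Jensen-Shannon divergence between each surrogate-based posterior
# marginal and a KDE of the reference MCMC marginal, both discretized on a grid.
import numpy as np
from scipy.stats import gaussian_kde

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two densities on a grid."""
    p, q = p / p.sum(), q / q.sum()              # normalize to probability vectors
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def marginal_error(marginal_pdfs, mcmc_sample, grids):
    """marginal_pdfs[i]: callable for the i-th surrogate marginal (Eq. (20));
    mcmc_sample: (n, M) reference posterior sample; grids[i]: evaluation grid."""
    errs = []
    for i, grid in enumerate(grids):
        ref = gaussian_kde(mcmc_sample[:, i])(grid)
        errs.append(js_divergence(marginal_pdfs[i](grid), ref))
    return np.mean(errs)
```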

As all algorithms (SSE, adSSE, SLE) depend on randomly chosen experimental designs, we produce several replications for case studies (ii) and (iii) by running them multiple times.

5.1 1-dimensional vibration problem

This first case study serves as an illustrative example of how the proposed adaptive algorithm constructs an adSSE. In the presented problem, the goal is the inference of a single unknown parameter with a bi-modal posterior distribution. This problem is very difficult to solve with standard MCMC methods due to the probability valley, i.e. the low-probability region, between the posterior peaks. The considered problem is fabricated and uses synthetic data, but is presented in the context of a relevant engineering problem.

Consider the oscillator displayed in Figure 3 subject to harmonic (i.e., sinusoidal) excitation. Assume the prior information about its stiffness $k$ is that it follows a lognormal distribution. Its true value shall be determined using measurements of the oscillation amplitude at the location of the mass. The known properties of the oscillator system are the oscillator mass $m$, the excitation frequency $\omega$ and the viscous damping coefficient $c$. The oscillation amplitude is measured in five independent oscillation events and normalized by the forcing amplitude, yielding the measured amplitude ratios $\mathcal{Y} = \{y_1, \dots, y_5\}$.

Figure 3: 1-dimensional vibration problem: Sketch of the linear oscillator

This problem is well known in mechanics and in the linear case (i.e., assuming small deformations and linear material behavior) can be solved analytically with the amplitude of the frequency response function. This function returns the ratio between the steady state amplitude of a linear oscillator and the amplitude of its excitation. It is given by

$\mathcal{M}(k) = \dfrac{1}{\sqrt{(k - m\omega^2)^2 + (c\omega)^2}}.$   (29)

We assume a discrepancy model with known discrepancy standard deviation $\sigma$. In conjunction with the available measurements $\mathcal{Y}$, this leads to the following likelihood function:

$\mathcal{L}(k) = \prod_{i=1}^{5} \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\dfrac{\left(y_i - \mathcal{M}(k)\right)^2}{2\sigma^2} \right).$   (30)
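A minimal sketch of this forward model and likelihood follows; all numerical values (mass, frequency, damping, discrepancy standard deviation and measurements) are illustrative placeholders, not the case-study values:

```python
# Sketch of the SDOF-oscillator forward model (Eq. (29)) and likelihood (Eq. (30)).
# All numerical values are placeholders.
import numpy as np

def amplitude_ratio(k, m=1.0, omega=8.0, c=0.5):
    """Steady-state amplitude ratio |X|/F of a damped linear oscillator, Eq. (29)."""
    return 1.0 / np.sqrt((k - m * omega ** 2) ** 2 + (c * omega) ** 2)

def likelihood(k, data, sigma=1e-3):
    """Gaussian additive-discrepancy likelihood of the stiffness k, Eq. (30)."""
    resid = data[None, :] - amplitude_ratio(np.atleast_1d(k))[:, None]
    return np.exp(-0.5 * np.sum((resid / sigma) ** 2, axis=1)) / \
           (np.sqrt(2 * np.pi) * sigma) ** data.size

data = np.array([0.0163, 0.0159, 0.0161, 0.0162, 0.0160])   # placeholder measurements
print(likelihood(np.linspace(1.0, 130.0, 5), data))
```

Note that, for a measured amplitude ratio below the resonance peak, two stiffness values reproduce the same ratio, which is the source of the bi-modal posterior discussed above.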

We employ the adSSE algorithm to approximate this likelihood function. A few iterations from the solution process are shown in Figure 4. The top plots show the subdomains constructed at each refinement step, highlighting the terminal domains. The middle plots display the residual between the true likelihood and the approximation at the current iteration, as well as the adaptively chosen experimental design. The bottom plots display the target likelihood function and its current approximation.

The initial global approximation of the first iteration in Figure 4(a) is a constant polynomial based on the initial experimental design. By the third iteration, the algorithm has identified the subdomain of interest and proceeds to refine it in subsequent steps. In later iterations, both likelihood peaks have been identified. Finally, by the last displayed iteration in Figure 4(d), both likelihood peaks are approximated well by the adSSE approach.

The last iteration also shows how the algorithm splits domains and adds new sample points: there is a clear clustering of subdomains and sample points near the two likelihood peaks.

Figure 4: One-dimensional vibration problem: illustration of the adSSE algorithm approximating the likelihood function at four successive iterations (panels (a)-(d)).

The results from Eq. (22) show that, without further computations, it would be possible to directly extract the posterior moments by post-processing the SSE coefficients. In the present bi-modal case, however, the posterior moments are not very meaningful. Instead, the available posterior approximation gives a full picture of the inferred parameter $k$. It is shown together with the true posterior and the original prior distribution in Figure 5.

Figure 5: One-dimensional vibration problem: Comparison of the true multimodal posterior and its adSSE based approximation.

For this case study, non-adaptive experimental design approaches like the standard SSE (Marelli et al., 2020) and the original SLE algorithm (Nagel and Sudret, 2016) will almost surely fail for the considered experimental design size. In numerous trial runs, these approaches did not manage to accurately reconstruct the likelihood function due to a lack of informative samples near the likelihood peaks.

5.2 Moderate-dimensional heat transfer problem

This case study was originally presented in Nagel and Sudret (2016) and solved there using SLE. We again solve the same problem with SSE and compare the performance of SLE (Nagel and Sudret, 2016), the original non-adaptive SSE (Marelli et al., 2020) and the proposed adSSE approach presented in Section 4.2.

Consider the diffusion-driven stationary heat transfer problem sketched in Figure 6(a). It models a 2D plate with a background matrix of constant thermal conductivity $\kappa_0$ and inclusions with conductivities $\kappa_1, \dots, \kappa_6$. The diffusion-driven steady-state heat distribution is described by a heat equation in Euclidean coordinates $\bm{z}$ of the form

$\nabla \cdot \left( \kappa(\bm{z})\, \nabla T(\bm{z}) \right) = 0,$   (31)

where the thermal conductivity field is denoted by $\kappa(\bm{z})$ and the temperature field by $T(\bm{z})$. The boundary conditions of the plate are given by no-heat-flux Neumann boundary conditions on the left and right sides, a Neumann boundary condition on the bottom and a temperature Dirichlet boundary condition on the top.

We employ a finite element (FE) solver to solve the weak form of Eq. (31) by discretizing the domain into triangular elements. A sample solution returned by the FE solver is shown in Figure 6(b).

Figure 6: Moderate-dimensional heat transfer problem: model setup (a) and exemplary steady-state solution (b).

In this example we intend to infer the thermal conductivities of the inclusions. We assume the same problem constants as in Nagel and Sudret (2016). The forward model takes as input the conductivities of the inclusions $\bm{x} = (\kappa_1, \dots, \kappa_6)$, solves the finite element problem and returns the steady-state temperatures at the measurement points, i.e., $\mathcal{M}(\bm{x})$.

To solve the inverse problem, we assume a multivariate lognormal prior distribution with independent marginals on the inclusion conductivities. We further assume an additive Gaussian discrepancy model, which yields the likelihood function

$\mathcal{L}(\bm{x}) = \mathcal{N}\!\left( \bm{y};\, \mathcal{M}(\bm{x}),\, \sigma^2 \bm{I} \right),$   (32)

with a discrepancy standard deviation $\sigma$.

As measurements, we generate one temperature field with a fixed set of true conductivities and collect its values at the measurement points indicated by black dots in Figure 6(a). We then perturb these temperature values with additive Gaussian noise and use them as the inversion data $\mathcal{Y}$.

We look at two instances of this problem that differ only by the discrepancy parameter $\sigma$ from Eq. (32). The prior model response has a standard deviation that depends on the measurement point. We therefore solve the problem first with a large value of $\sigma$ and second with a small value. As the discrepancy standard deviation determines how peaked the likelihood function is, the first problem has a likelihood function with a much wider support and is in turn significantly easier to solve than the second one. It is noted here that, in practice, the peakedness of the likelihood function is increased either by a smaller discrepancy standard deviation or by the inclusion of additional experimental data.

To monitor the dependence of the algorithms on the number of likelihood evaluations, we solve both problems with a set of maximum numbers of likelihood evaluations $N_{\mathrm{max}}$. The number of refinement samples $N_{\mathrm{ED}}$ is kept constant.

As a benchmark, we use reference posterior samples generated by the affine-invariant ensemble sampler MCMC algorithm (Goodman and Weare, 2010) with a large number of steps and parallel chains. Based on numerous heuristic convergence tests and due to the large number of MCMC steps, the resulting samples can be considered to accurately represent the true posterior distributions.
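For illustration, a reference sample of this kind could be produced with the emcee implementation of the affine-invariant ensemble sampler; the walker count, chain length, prior and the stand-in likelihood below are placeholders, not the settings of the case study:

```python
# Sketch of generating a reference posterior sample with the affine-invariant
# ensemble sampler (Goodman and Weare, 2010) via the emcee package. Prior,
# likelihood and sampler settings are illustrative placeholders.
import numpy as np
import emcee
from scipy import stats

ndim, nwalkers, nsteps = 6, 32, 10_000
prior = stats.lognorm(s=0.3, scale=30.0)                 # assumed independent marginals

def log_likelihood(x):
    # Stand-in for Eq. (32); replace with the FE-model-based likelihood.
    return -0.5 * np.sum(((x - 30.0) / 5.0) ** 2)

def log_prob(x):
    if np.any(x <= 0):
        return -np.inf
    return prior.logpdf(x).sum() + log_likelihood(x)

p0 = prior.rvs(size=(nwalkers, ndim), random_state=0)    # initial walker positions
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, nsteps, progress=False)
reference_sample = sampler.get_chain(discard=nsteps // 2, flat=True)
```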

The results of the analyses are summarized in Figure 7, where the error measure $\varepsilon$ is plotted against the number of likelihood evaluations for the large and the small standard deviation case. For the large discrepancy standard deviation case, both SSE approaches clearly outperform standard SLE w.r.t. the error measure $\varepsilon$. This is most significant at mid-range experimental designs, where SLE does not reach the required high degrees and fails to accurately approximate the likelihood function. At larger experimental designs, SLE catches up to non-adaptive SSE but is still outperformed by the proposed adSSE approach. The real strength of the adaptive algorithm shows in the case of a small discrepancy standard deviation, where the limitations of fixed experimental designs become obvious. When the likelihood function is non-zero only in a small subdomain of the prior, the global SLE and non-adaptive SSE approaches will fail in practice because of the insufficient number of samples placed in the informative regions. The adSSE approach, however, works very well in these types of problems. It manages to identify the regions of interest and produces a likelihood approximation that accurately reproduces the posterior marginals.

Figure 7: Moderate-dimensional heat transfer problem: convergence of the error measure (Eq. (28)) as a function of the experimental design size, over the replications, for SLE, SSE with a static experimental design and the proposed adSSE approach, shown for the large (a) and small (b) discrepancy standard deviation cases.

Tables 1 and 2 show the convergence of the adSSE method's moment estimates (mean and variance) to the reference solution for a single run. In brackets next to the moment estimates, the relative error is also shown. Because the SSE likelihood approximation is not strictly positive, one variance estimate computed with Eq. (22) is negative and is therefore omitted from Table 2.

Table 1: Moderate-dimensional heat transfer problem: adSSE results with the large discrepancy standard deviation. Relative errors w.r.t. the MCMC reference solution are shown in brackets.