Mutual Information for Explainable Deep Learning of Multiscale Systems

09/07/2020 ∙ by Søren Taverniers, et al. ∙ University of Dundee ∙ Stanford University

Timely completion of design cycles for multiscale and multiphysics systems ranging from consumer electronics to hypersonic vehicles relies on rapid simulation-based prototyping. The latter typically involves high-dimensional spaces of possibly correlated control variables (CVs) and quantities of interest (QoIs) with non-Gaussian and/or multimodal distributions. We develop a model-agnostic, moment-independent global sensitivity analysis (GSA) that relies on differential mutual information to rank the effects of CVs on QoIs. Large amounts of data, which are necessary to rank CVs with confidence, are cheaply generated by a deep neural network (DNN) surrogate model of the underlying process. The DNN predictions are made explainable by the GSA so that the DNN can be deployed to close design loops. Our information-theoretic framework is compatible with a wide variety of black-box models. Its application to multiscale supercapacitor design demonstrates that the CV rankings facilitated by a domain-aware Graph-Informed Neural Network are better resolved than their counterparts obtained with a physics-based model for a fixed computational budget. Consequently, our information-theoretic GSA provides an "outer loop" for accelerated product design by identifying the most and least sensitive input directions and performing subsequent optimization over appropriately reduced parameter subspaces.




1 Introduction: GSA and Deep Learning for Simulation-Aided Design

Simulations are a key component of product design as they enable rapid prototyping by guiding costly laboratory tests and investigating regions of the parameter space that are difficult to explore experimentally. To optimize design under uncertainty, an “outer loop” can be included to predict the impact of tunable inputs or control variables (CVs) on a system’s quantities of interest (QoIs) PeherstorferWillcoxGunzburger:2018ol. In this approach, CVs are treated as random quantities whose distributions are derived from available experimental data, manufacturing constraints, design criteria, engineering judgment, and/or other domain knowledge. Statistical post-processing of repeated solves of a physics-based model for multiple samples of CVs yields the distributions of QoIs. In the context of optimal and robust design and uncertainty quantification (UQ), this outer loop constitutes a many-query problem that becomes prohibitively expensive when queries rely solely on direct simulation of physics-based models.

Data-driven surrogate modeling seeks to alleviate this computational cost by constructing a statistical model for QoIs. Off-the-shelf software such as TensorFlow and PyTorch facilitates the construction of deep learning surrogates, e.g., deep neural networks (DNNs), from data generated by the underlying physics-based model. This process, which typically involves supervised learning, makes very few assumptions about the nature of the data or data-generating process. This agnosticism makes DNNs suitable for dependent/correlated inputs and non-Gaussian, skewed, multimodal, and/or mutually correlated output QoIs, typically observed in complex real-world systems. While DNNs (e.g., see Balokas:2018nn; TripathyBilionis:2018uq; ZhuZabarasEtAl:2019pc; RaissiPerdikarisKarniadakis:2019pinns; HallTaverniersEtAl:2020aa) can tremendously speed up the design pipeline by accelerating and fully automating the prediction of QoIs, they represent black boxes that do not shed any light on the form of the function they are approximating. They provide no clear link between this function and the network weights. Moreover, they are non-identifiable, since two DNNs with the same topology, but different weights, can yield very similar outputs for a given set of input data GoodfellowEtAl:2016dl.

Global sensitivity analysis (GSA) SaltelliEtAl:2008gs provides an opportunity to “peek” inside a black-box DNN surrogate and to interpret its predictions by identifying constellations of input parameters that are likely to yield a targeted model response. GSA facilitates exploration of the entire parameter space and quantifies both first-order (individual) and higher-order (interaction) effects that characterize the contribution of variations in CVs to changes in QoIs. Variance-based GSA methods rank input parameters by their contributions to the total variance of a QoI (e.g., Sobol’ indices Sobol:1993sa and total effects HommaSaltelli:1996im). Their interpretation is ambiguous when spaces of correlated CVs are large MaraEtAl:2015np; IoossPrieur:2019se and QoIs are highly non-Gaussian Borgonovo:2006uq; Borgonovo:2007sa, a situation representative of complex multiscale/multiphysics systems. In contrast, moment-independent GSA approaches are easy to interpret regardless of the nature of the data or data-generating process ciriello-2019-distribution; CastaingsEtAl:2012sa; VetterTaflanidis:2012sa; however, they require knowledge of the CV and QoI distributions or availability of sufficient data to approximate them. DNNs resolve the latter problem by cheaply generating large amounts of data.

A major goal of this study is to harness this synergy between moment-independent GSA and black-box surrogates and to take advantage of their shared agnosticism to the nature of data and a data-generating process. To this end, we develop an information-theoretic GSA that uses a DNN surrogate model to generate sufficient amounts of data. Information theory has been used to carry out both local MajdaGershgorin:2010qu ; KomorowskiEtAl:2011sa ; MajdaGershgorin:2011sa ; PantazisEtAl:2013rn ; PantazisKatsoulakis:2013re ; HallKatsoulakis:2018hd and global Critchfield:1986sa ; LiuEtAl:2006re ; LudtkeEtAl:2008sa ; LiuHomma:2009uq ; Rahman:2016fs ; UmHallEtAl:2019bn sensitivity analyses. Our GSA approach utilizes the concept of differential mutual information (MI) CoverThomas:2006in ; Soofi:1994if to compute Mutual Information Sensitivity Indices (MISIs). It addresses the twin challenges of correlated/dependent CVs and non-Gaussian, skewed, multimodal, and/or mutually correlated QoIs. These features make MISIs an ideal decision-making tool for simulation-aided design.

Our MI-based GSA is compatible with any black-box model including DNNs such as physics-informed NNs Raissi:2019; RaissiPerdikarisKarniadakis:2019pinns; ZhangLuGuoKarniadakis:2019uq; YangPerdikaris:2019nn; Meng:2020 and “data-free” physics-constrained NNs Sirignano:2018; Berg:2018; ZhuZabarasEtAl:2019pc; SunEtAl:2020sm. Here, we leverage a Graph-Informed Neural Network (GINN) HallTaverniersEtAl:2020aa that is tailored for multiscale physics and systems with correlated CVs. The GINN’s ability to generate “big data” allows us to consider higher-order effects due to interactions between the CVs. In turn, the MI-based rankings aid the interpretation of the GINN’s black-box predictions. We validate these rankings by evaluating response curves along sensitive and insensitive directions and comparing these to their counterparts computed with a physics-based model. This comparison provides a clear interpretation of the GINN’s black-box predictions in terms of the physics-based model. This interpretation allows the GINN to be deployed to close engineering design loops by estimating subsequent effect rankings in parameter subspaces that yield optimal QoI values.

In the context of multiscale design UmHallEtAl:2019bn, the MISI rankings help interpret the GINN’s predictions by identifying parameter regions that elicit targeted responses and then using new empirical response data predicted by the GINN for those parameter subspaces to refine an existing prototype. Thus, we illustrate how MI-based GSA for explainable DNN surrogate predictions enables outer-loop tasks, such as UQ and optimal design, to benefit from scientific machine learning. These rankings play a role similar to Shapley values StrumbeljKononenko:2011bb, partial dependence plots Friedman:2001gb, and individual conditional expectation plots GoldsteinEtAl:2015bb found in the statistical learning and data mining literature. Similar to MISIs, these metrics aid in visualizing the relationship between predicted responses and one or more features in regression models and classifiers based on changes in certain conditional expectations. Unlike these metrics, our MISIs depend on distributional, as opposed to moment, information and provide a framework for estimating and ranking higher-order effects, or interactions, among correlated CVs that prove to be crucial for design of complex systems (cf. Fig. 5).

In Section 2, we develop a GSA framework that includes both first-order and higher-order MISIs. In Section 3, this methodology is combined with a GINN in the context of a testbed problem related to the design of a supercapacitor. Validation of the MI-based rankings and closure of design loops through subsequent rankings in the reduced parameter space with optimal QoI values are performed in Section 4. In Section 5, we summarize the main conclusions drawn from this study and discuss future work.

2 Mutual Information Sensitivity Indices for Model-Agnostic GSA

We consider a model of a complex physical system,


that predicts the response of a collection of QoIs to a collection of tunable CVs . The model propagates distributions on CVs to distributions on QoIs query-by-query, i.e., it generates one sample response , , for each input sample or observation

. We assume that all joint and marginal probability density functions (PDFs) are available for all CVs, but place no restrictions on the dependence structure of the CVs or on the nature of their functional relationship to the response.

2.1 First-order effects described by differential MI

Differential MI is a pseudo-distance used in machine learning Bishop:2006ml ; KollerFriedman:2009gm and model selection BurnhamAnderson:2002ms , among others. Quantifying the amount of shared information between and , the differential MI is defined as CoverThomas:2006in ,


where , , and denote marginal and joint PDFs with support , , and , respectively. The differential MI possesses many of the same properties as the discrete MI, including symmetry and non-negativity (with equality if and only if and are independent). Unlike its discrete counterpart, the differential MI can take on infinite values, e.g., if . The following features make the differential MI appropriate for GSA in multiscale design:

  1. its interpretation does not rely on the dependence structure of the CVs,

  2. its moment independence makes it suitable for a wide range of CV and QoI PDFs, and

  3. its continuous nature is suitable for analysis of continuous systems.

The first two features enable a model-agnostic implementation, while the last one facilitates UQ for downstream computations relying on continuous QoIs.
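For reference, the textbook definition of the differential MI (see CoverThomas:2006in) and a closed-form example can be written as follows. The symbols X (a single CV), Y (a single QoI), the densities f, and the correlation coefficient ρ are notation assumed here for illustration only.

```latex
% Differential MI between a CV X and a QoI Y (assumed notation)
I(X;Y) \;=\; \int\!\!\int f_{X,Y}(x,y)\,
      \log\frac{f_{X,Y}(x,y)}{f_X(x)\,f_Y(y)}\;\mathrm{d}x\,\mathrm{d}y \;\ge\; 0 .

% Example: (X,Y) bivariate Gaussian with correlation coefficient \rho
I(X;Y) \;=\; -\tfrac{1}{2}\,\log\!\bigl(1-\rho^{2}\bigr),
\qquad I(X;Y)=0 \ \text{for}\ \rho=0,
\qquad I(X;Y)\to\infty \ \text{as}\ |\rho|\to 1 .
```

The Gaussian case illustrates both properties mentioned above: the MI vanishes under independence and diverges when one variable becomes a deterministic function of the other.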

To describe the first-order effect of a CV on a target QoI , we define a MISI,


and interpret it as a measure of the strength of the association between and . A large score indicates that is a globally influential CV in the design of relative to the given PDF of . In complex systems, is unlikely to be completely described by a single CV , so the value of in Eq. (3) is likely to remain finite. Since places equal importance on linear and nonlinear relationships due to the self-equitability of the MI KinneyAtwal:2014pn, it recovers the rankings of Sobol’ indices in the setting of independent CVs , i.e., when Sobol’ rankings are justified.

The MISI in Eq. (3) can be estimated from empirical data generated by querying the model . A plug-in Monte Carlo estimator for given by,


can be computed via joint and marginal kernel density estimators (KDEs) at input-output (io) data pairs , KrishnamurthyEtAl:2014rd; KandasamyEtAl:2015vm. A Gaussian kernel KDE for an unknown PDF based on identically distributed observations of is given by Wasserman:2006np


Among many algorithms for the automated computation of the positive bandwidth parameters , we chose a direct plug-in bandwidth selector called the improved Sheather–Jones method BotevGrotowskiKroese:2010kd. To ensure that the joint and marginal KDEs in Eq. (4) are defined consistently, i.e., that

we require the smoothing bandwidths for the joint and marginal PDFs to be equal. That is, the bandwidths related to in and must be the same.
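The plug-in estimator described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the bandwidths come from a Silverman-type rule of thumb instead of the improved Sheather–Jones selector, and the toy response `y` and the names `misi_first_order`, `x1`, `x2` are hypothetical. The joint density uses a product Gaussian kernel with the same bandwidths as the marginals, so that integrating the joint KDE over one variable returns the marginal KDE of the other.

```python
import math
import random
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth; an assumed stand-in for improved Sheather-Jones."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)


def gauss(u, h):
    """One-dimensional Gaussian kernel with bandwidth h, evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def misi_first_order(x, y):
    """Plug-in Monte Carlo estimate (in nats) of the differential MI of x and y."""
    n = len(x)
    hx, hy = silverman_bandwidth(x), silverman_bandwidth(y)
    total = 0.0
    for xn, yn in zip(x, y):
        kx = [gauss(xn - xi, hx) for xi in x]
        ky = [gauss(yn - yi, hy) for yi in y]
        f_x = sum(kx) / n
        f_y = sum(ky) / n
        # Joint KDE: product kernel sharing hx and hy with the marginal KDEs.
        f_xy = sum(a * b for a, b in zip(kx, ky)) / n
        total += math.log(f_xy / (f_x * f_y))
    return total / n


rng = random.Random(0)
n_samples = 400
x1 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
# Hypothetical toy response: driven by x1, unaffected by x2.
y = [a + 0.3 * rng.gauss(0.0, 1.0) for a in x1]

misi_x1 = misi_first_order(x1, y)
misi_x2 = misi_first_order(x2, y)
print(f"MISI(x1) = {misi_x1:.3f}, MISI(x2) = {misi_x2:.3f}")
```

The double loop costs O(N²) kernel evaluations per index, which is one reason a cheap surrogate is needed to supply the samples.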

The KDE-based direct plug-in estimator in Eq. (4) is easy to implement. However, its computation is not sample-efficient and, hence, infeasible in the absence of an efficient surrogate; moreover, KDEs are anticipated to fail in high dimensions KrishnamurthyEtAl:2014rd. In such circumstances, one can deploy alternative strategies for estimating MI, such as a non-parametric k-nearest neighbor algorithm KraskovEtAl:2004mi and a non-parametric neural estimation approach suitable for high-dimensional PDFs BelghaziEtAl:2018mi. Contrary to the discrete MI indices Critchfield:1986sa; LudtkeEtAl:2008sa, non-parametric density estimators, such as Eq. (4), introduce no bias associated with a quantization of the QoIs, whose continuous nature may need to be preserved for the purpose of a downstream UQ analysis.

Remark 1 (Independent io).

The plug-in estimator in Eq. (4) involves the joint PDFs, i.e., it is useful when io sample pairs are available. A change of measure in Eq. (2) yields an equivalent estimator,


that is suitable for independent samples from the input and output distributions (cf. UmHallEtAl:2019bn ).

2.2 Higher-order effects described by conditional differential MI

For large spaces of possibly correlated CVs, it is of interest to also consider the impact of interactions among subsets of CVs on a given QoI. To describe the effects of pairwise interactions between on a target QoI , we define a second-order MISI,


in terms of the conditional differential MI,


The latter represents the MI between and conditioned on that we express in terms of joint and marginal PDFs. (Footnote 3: Here and in the sequel we suppress the labels on densities when the distribution is clear from the context.) The conditional MI in Eq. (8) is related to the MI in Eq. (2) through the chain rule,


for and where zero-indexed sets in the conditioning are empty. To see that Eq. (7) captures only the second-order effects, we note that describes the full effect of the pair on . According to Eq. (9), the full second-order effect is expressed as


which includes first-order effects and , while captures the interaction between and (the latter term vanishes if and are independent). The remaining conditional differential MI in Eq. (10) describes the desired second-order effect.
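One way to read this decomposition, using standard identities for the conditional MI, is worked out below. The symbols X_i and X_j (two CVs) and Y (the target QoI) are notation assumed here for illustration; the identities themselves are textbook results.

```latex
\begin{align}
% chain rule applied to the pair (X_i, X_j)
I(X_i, X_j; Y) &= I(X_i; Y) + I(X_j; Y \mid X_i), \\
% symmetry of the interaction information
I(X_j; Y \mid X_i) &= I(X_j; Y) - I(X_i; X_j) + I(X_i; X_j \mid Y), \\
% full effect of the pair on Y
I(X_i, X_j; Y) &= \underbrace{I(X_i; Y) + I(X_j; Y)}_{\text{first-order effects}}
   \;-\; \underbrace{I(X_i; X_j)}_{\text{dependence between the CVs}}
   \;+\; \underbrace{I(X_i; X_j \mid Y)}_{\text{second-order effect}} .
\end{align}
```

The term I(X_i; X_j) vanishes when the two CVs are independent, which leaves the conditional MI I(X_i; X_j | Y) as the only contribution beyond the first-order effects.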

A plug-in Monte Carlo estimator for the second-order index in Eq. (7) is


based on io triples , . The plug-in estimator is justified in the context of surrogate modeling and is easy to implement using the KDEs in Eq. (5) with suitably equalized bandwidths. It can be built from the same sample data used to evaluate Eq. (11) and comes with the same caveats.
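A second-order plug-in estimator of this kind can be sketched with the same ingredients as the first-order one. The block below is an illustration under assumptions: it estimates the conditional MI between two CVs given the QoI, written I(x_i; x_j | y) in assumed notation, using product Gaussian kernels with rule-of-thumb bandwidths that are shared across all joint and marginal KDEs. The toy responses and all function names are hypothetical.

```python
import math
import random
import statistics


def bandwidth(values):
    """Silverman-type rule of thumb (assumed; not the improved Sheather-Jones)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)


def gauss(u, h):
    """One-dimensional Gaussian kernel with bandwidth h, evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def misi_second_order(xi, xj, y):
    """Plug-in estimate of E[log f(xi,xj,y) f(y) / (f(xi,y) f(xj,y))] in nats."""
    n = len(y)
    hi, hj, hy = bandwidth(xi), bandwidth(xj), bandwidth(y)
    total = 0.0
    for a, b, c in zip(xi, xj, y):
        ki = [gauss(a - u, hi) for u in xi]
        kj = [gauss(b - u, hj) for u in xj]
        ky = [gauss(c - u, hy) for u in y]
        # All densities reuse the same per-variable bandwidths (equalized KDEs).
        f_y = sum(ky) / n
        f_iy = sum(p * r for p, r in zip(ki, ky)) / n
        f_jy = sum(q * r for q, r in zip(kj, ky)) / n
        f_ijy = sum(p * q * r for p, q, r in zip(ki, kj, ky)) / n
        total += math.log(f_ijy * f_y / (f_iy * f_jy))
    return total / n


rng = random.Random(0)
n_samples = 400
xi = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
xj = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
# Hypothetical responses: one driven jointly by both CVs, one unrelated to them.
y_coupled = [a + b + 0.2 * rng.gauss(0.0, 1.0) for a, b in zip(xi, xj)]
y_unrelated = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

misi_coupled = misi_second_order(xi, xj, y_coupled)
misi_unrelated = misi_second_order(xi, xj, y_unrelated)
print(f"coupled: {misi_coupled:.3f}, unrelated: {misi_unrelated:.3f}")
```

Although the two CVs are independent in this toy setup, conditioning on a response that depends on both of them makes them informative about each other, so the coupled case yields the larger index.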

A th-order MISI (with ) is defined as


It quantifies the impact of the interaction among the collection of variables on . The conditional multivariate differential MI is defined inductively,


It is symmetric with respect to permutation of the variables , with . For example, the third-order effect of the interactions among the triple , , and on is given by the third-order MISI,


The conditional multivariate MI in Eq. (13) and, hence, in Eq. (12) can be either positive or negative. They are related to a conditional form of the “interaction information” McGill:1954mi and the “co-information” Bell:2003mi. Instead of interpreting such higher-order ( ) effects, we focus on algorithms for ranking the first- and second-order effects with appropriate confidence intervals (cf. Fig. 2 for first-order and Fig. 3 for second-order effect rankings).

2.3 Algorithms for MISI ranking with confidence

We assume the availability of a surrogate model for generating large amounts of response data. While based on slightly different theoretical approaches, the two algorithms described below enable the construction of first- and higher-order effect rankings for a given QoI with a focus on the generation of associated confidence intervals. The latter make it possible to distinguish closely ranked CVs and pairs of CVs.

2.3.1 Algorithm 1: Compute MISIs with confidence intervals adjusted for ranking

Algorithm 1 (see the pseudocode) constructs the plug-in estimators in Eq. (4) with confidence intervals selected such that pairwise comparisons of the MISIs and their accompanying intervals determine the effect ranks. For fixed , we order the as


Then, the ranked first-order MISI estimators are


where the individual (additive) effects of the CVs are arranged in order of importance, from the greatest () to the least (). The rank of CV is estimated by the plug-in quantity,


such that


Since the MISI in Eq. (3) is a global measure of sensitivity, Eq. (16) represents a global ranking of the first-order effect of each CV relative to the distribution of .
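The ranking step can be illustrated with a short sketch. The convention assumed here is that the rank of a CV equals the number of CVs whose MISI estimate is at least as large, so that rank 1 marks the most influential CV; the MISI values and the labels `cv_a`, `cv_b`, `cv_c` are hypothetical.

```python
def rank_cvs(misi_estimates):
    """Rank CVs by first-order MISI estimate; rank 1 is the most influential."""
    return [
        sum(1 for other in misi_estimates if other >= value)
        for value in misi_estimates
    ]


# Hypothetical MISI estimates for three CVs, for illustration only.
misi = {"cv_a": 0.12, "cv_b": 0.85, "cv_c": 0.40}
ranks = dict(zip(misi, rank_cvs(list(misi.values()))))
ordered = sorted(misi, key=misi.get, reverse=True)  # greatest effect first
print(ranks)
print(ordered)
```

Under this convention, tied estimates share the larger of their rank values.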

input: surrogate model, Eq. (1)
input: independent samples of CVs
input: non-overlap significance level and tolerance
output: rankings with adjusted intervals, Eq. (23)
    Compute target QoI observations:
    for each CV sample do
        evaluate the surrogate model at the sample
    Compute and rank first-order MISI estimators:
    for each CV do
        compute the first-order MISI, Eq. (4)
    for each CV do
        compute the rank of the CV, Eq. (17)
        compute the ranked MISI, Eq. (18)
        compute the MISI standard error, Eq. (19)
    Compute comparison-adjusted confidence intervals with average type I error:
    loop over the ranked effects:
        compute the average non-overlap significance, Eq. (22)
        compute the normal PDF
    while the tolerance is not met do
        perform Newton–Raphson iterations
    compute the difference level, Eq. (23)
    return the rankings with adjusted intervals
Algorithm 1 Compute first-order MISIs with confidence intervals adjusted for ranking

One could approximate the standard confidence interval for with



for the standard normal cumulative distribution function and estimates the standard error . However, this confidence interval does not readily distinguish the rankings. The fact that the confidence intervals for the ranked effects and , with , fail to overlap does not necessarily mean that the difference in the rankings is statistically significant at the level. Following GoldsteinHealy:1995re, Algorithm 1 reports confidence intervals with comparison-adjusted widths, such that the non-overlap significance level meets a given threshold on average. Assuming normality and independence of and , the confidence intervals at level do not overlap if


Inequality (20) holds with probability , where the pairwise non-overlap significance level is given by


We select the level to ensure that the average of the pairwise errors over all is at a predefined level (set to ), i.e.,


The level , for which Eq. (22) holds, is found via Newton–Raphson iteration (cf. Algorithm 1) using sample estimates for the standard errors. For each ranked effect , the approximate confidence interval at level is


the error bars indicate that the non-overlap significance level is on average. The intervals in Eq. (23) provide a visual comparison of pairwise effects with a clear interpretation: overlapping/non-overlapping intervals imply that the associated ranks are indistinguishable/distinguishable.
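The comparison-adjusted interval width can be sketched as follows, in the spirit of GoldsteinHealy:1995re. The sketch assumes independent, normally distributed estimators: two intervals of half-width z times the standard error fail to overlap when the absolute difference of the estimates exceeds z(se_i + se_j), while that difference has standard deviation sqrt(se_i² + se_j²). The multiplier z is found by Newton–Raphson iteration so that the pairwise non-overlap probability, averaged over all pairs, equals a target level `gamma`. All names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def comparison_adjusted_z(std_errors, gamma=0.05, tol=1e-10, max_iter=50):
    """Find z such that intervals est +/- z * se fail to overlap with average
    probability gamma over all pairs, assuming equal true values."""
    ratios = [
        (se_i + se_j) / math.hypot(se_i, se_j)
        for i, se_i in enumerate(std_errors)
        for se_j in std_errors[i + 1:]
    ]
    z = 1.0  # starting guess, below the root for typical levels such as 0.05
    for _ in range(max_iter):
        g = sum(2.0 * (1.0 - STD_NORMAL.cdf(z * c)) for c in ratios) / len(ratios) - gamma
        if abs(g) < tol:
            break
        dg = sum(-2.0 * c * STD_NORMAL.pdf(z * c) for c in ratios) / len(ratios)
        z -= g / dg  # Newton-Raphson update
    return z


def adjusted_intervals(estimates, std_errors, gamma=0.05):
    """Comparison-adjusted intervals sharing one multiplier z for all estimates."""
    z = comparison_adjusted_z(std_errors, gamma)
    return [(s - z * se, s + z * se) for s, se in zip(estimates, std_errors)]


# Hypothetical MISI estimates and standard errors, for illustration only.
misi = [0.85, 0.40, 0.12]
se = [0.02, 0.02, 0.02]
z_equal = comparison_adjusted_z(se)
z_mixed = comparison_adjusted_z([0.01, 0.02, 0.05])
intervals = adjusted_intervals(misi, se)
print(f"z (equal standard errors) = {z_equal:.4f}")
for value, (lo, hi) in zip(misi, intervals):
    print(f"{value:.2f}: [{lo:.3f}, {hi:.3f}]")
```

For equal standard errors the multiplier reduces to the usual two-sided normal quantile divided by the square root of two, so the adjusted intervals are narrower than the standard ones.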

With a slight modification, Algorithm 1 can be used to rank the second-order MISI estimators in Eq. (7),


where is the number of pairs with that can be formed from . The computation of Eq. (24) replaces that of the first-order indices in Algorithm 1; the computation of the comparison-adjusted confidence intervals for , , proceeds analogously to that of the first-order confidence intervals in Eq. (23) with the non-overlap significance averaged over all pairs with .

2.3.2 Algorithm 2: Rank MISIs with percentile confidence intervals

Algorithm 2 (see the pseudocode) constructs non-parametric estimates with confidence intervals for the unknown rankings directly from the sampling distribution. In contrast to Algorithm 1 which uses normality theory, the present method builds a distribution for each rank by repeated observation of the MISIs using the surrogate model. For each , we compute MISI replications,


from io sample pairs , . That is, for each replication, we generate new io data pairs using the surrogate model and compute Eq. (25) for every with these observations. From these replications, we use Eq. (17) to compute rank replications , . The rank estimators are


and the corresponding percentile confidence intervals are


The equal-tail percentiles and are estimated from the replications in the spirit of the bootstrap percentile confidence intervals Efron:1981ci .

The computational burden of Algorithm 2 is greater than that of Algorithm 1, since the estimator is computed for each of the replications. Yet, this method is non-parametric and its results are anticipated to be more easily interpretable for large numbers of CVs and higher-order effect calculations.

input: surrogate model, Eq. (1)
input: number of replications
input: number of observations per replication
input: equal-tail percentile level
output: ranks, Eq. (26)
output: percentile confidence intervals, Eq. (27)
    Compute replications of MISI first-order effect ranks:
    for each replication do
        Generate io samples:
        for each observation do
            draw a CV sample and evaluate the surrogate model
        Calculate one replication of each MISI:
        for each CV do
            compute the MISI for this replication, Eq. (25)
        Calculate one replication of each rank:
        for each CV do
            compute the rank for this replication, Eq. (17)
    Compute rank statistics from replications:
    for each CV do
        compute the rank estimator and the lower and upper quantiles of the rank replications
    return the ranks and percentile confidence intervals (cf. Algorithm 1 output)
Algorithm 2 Rank first-order MISIs with percentile confidence intervals
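The replication-based ranking of Algorithm 2 can be sketched as below. This is an illustration under assumptions rather than the study's implementation: `toy_surrogate` is a hypothetical stand-in for a trained surrogate with three CVs, the MI estimator is a compact KDE plug-in with rule-of-thumb bandwidths, the rank estimator is taken to be the median of the rank replications, and the percentiles use a simple index-based empirical quantile.

```python
import math
import random
import statistics


def kde_mi(x, y):
    """Compact KDE plug-in MI estimate with rule-of-thumb bandwidths (assumed)."""
    n = len(x)
    hx = 1.06 * statistics.stdev(x) * n ** (-0.2)
    hy = 1.06 * statistics.stdev(y) * n ** (-0.2)
    total = 0.0
    for a, b in zip(x, y):
        kx = [math.exp(-0.5 * ((a - u) / hx) ** 2) for u in x]
        ky = [math.exp(-0.5 * ((b - v) / hy) ** 2) for v in y]
        # Kernel normalization constants cancel in the ratio f_xy / (f_x * f_y).
        total += math.log(n * sum(p * q for p, q in zip(kx, ky)) / (sum(kx) * sum(ky)))
    return total / n


def toy_surrogate(x):
    """Hypothetical surrogate: strong effect of x[0], weak x[1], none of x[2]."""
    return 2.0 * x[0] + 0.5 * x[1]


def rank_replications(n_reps, n_obs, n_cvs, rng):
    """One list of CV ranks (1 = most influential) per replication."""
    reps = []
    for _ in range(n_reps):
        xs = [[rng.uniform(-1.0, 1.0) for _ in range(n_cvs)] for _ in range(n_obs)]
        ys = [toy_surrogate(x) for x in xs]
        misi = [kde_mi([x[i] for x in xs], ys) for i in range(n_cvs)]
        reps.append([sum(1 for m in misi if m >= s) for s in misi])
    return reps


def percentile(sorted_values, q):
    """Simple empirical quantile by index, without interpolation."""
    return sorted_values[min(len(sorted_values) - 1, int(q * len(sorted_values)))]


rng = random.Random(1)
alpha = 0.05
reps = rank_replications(n_reps=20, n_obs=120, n_cvs=3, rng=rng)
rank_summary = []
for i in range(3):
    ranks_i = sorted(rep[i] for rep in reps)
    rank_summary.append(
        (statistics.median(ranks_i),
         percentile(ranks_i, alpha / 2),
         percentile(ranks_i, 1 - alpha / 2))
    )
for i, (est, lo, hi) in enumerate(rank_summary):
    print(f"CV {i}: rank {est} with percentile interval [{lo}, {hi}]")
```

Each replication draws fresh samples from the surrogate, so the spread of the rank replications reflects sampling variability of the MISI estimator directly, without a normality assumption.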

3 MI-Based GSA with Black-Box Surrogates

As highlighted in the introduction, our MI-based approach to GSA is applicable to any black-box surrogate model. To illustrate the ability of MI-based GSA to deal with correlated CVs, for which variance-based GSA approaches are of limited value, we combine it with a GINN, a domain-aware DNN surrogate introduced in HallTaverniersEtAl:2020aa to overcome computational bottlenecks in complex multiscale and multiphysics systems. A multiscale formulation of electrodiffusion in nanoporous media serves as a testbed.

3.1 Multiscale supercapacitor dynamics

We consider an electrical double-layer capacitor (EDLC) Soffer:1972sc , whose electrodes are made of a conductive hierarchical nanoporous carbon structure Narayanan:2016sc . Electrolyte (an ionized fluid) fills the nanopores and contributes to the formation of the EDL at the electrolyte-electrode interface (see, e.g., Fig. A8 in HallTaverniersEtAl:2020aa ). Identification of an optimal pore structure of the carbon electrodes holds the promise of manufacturing EDLCs which boast high power and high energy density Nomura:2019sc ; LiEtAl:2020sc . This and other advancements, such as lower self-discharge electrolytes Wang:2019sc for enhanced long-term energy storage, position EDLCs as a viable replacement of Li-ion batteries in electric vehicles or personal electronic devices. Attractive features of EDLCs are their shorter charging times, longer service life, and reduced reliance on hazardous materials Beguin:2013sc .

Two macroscopic QoIs affect the EDLC performance: effective electrolyte conductivity and transference number (fraction of the current carried by the cations), such that . These QoIs are influenced by seven parameters / CVs: the electrode surface (fluid-solid interface) potential , initial ion concentration , temperature , porosity , (half) pore throat size , solid radius , and Debye length , such that . A physics-based model , derived in ZhangTartakovsky:2017np via homogenization, relates the inputs to the outputs . This model involves closure variables (second-order tensors) and EDL potential , whose determination is expensive and constitutes computational bottlenecks. Optimal design of the nanoporous electrodes in EDLCs involves the tuning of the CVs to elicit changes in the QoIs .

Figure 1: Visualization of the BN PDE (lower route) and GINN surrogate (upper route) for a multiscale model of EDL supercapacitor dynamics. The BN encodes conditional relationships between the model variables (both inter- and intrascale) and systematically includes domain knowledge into the physics-based model, ensuring the resulting BN PDE makes physically sound predictions. The GINN takes identical inputs (i.e., structured priors on CVs) to those of the BN PDE, but overcomes the latter’s computational bottlenecks (dashed box) by replacing them with learned features (solid box) in a DNN to predict the QoIs . The nodes in the hidden layers of the GINN make it a black box.

The complex nonlinear and multiscale relationship between and makes this a challenging engineering design problem and allows us to highlight the features (i)–(iii) of the MI-based GSA. The joint PDF of the random CVs systematically quantifies uncertainties and errors arising in the physics-based representation. This key quantity for decision support is captured by a Bayesian Network (BN) HallTaverniersEtAl:2020aa; UmHallEtAl:2019bn, which encodes both physical relationships and available domain knowledge (Fig. 1). The resulting probabilistic physics-based model , referred to as a BN PDE, propagates the joint PDF of , i.e., a structured prior, via to following the conditional relationships in the BN. As in HallTaverniersEtAl:2020aa, we assume the CVs , , , and to be independent and their prior PDFs to be uniform on an interval of (for and ) or (for and ) around their respective baseline values (see Table 1 reproduced from HallTaverniersEtAl:2020aa),


where the hyperparameters represent the left and right endpoints of the support intervals. The remaining CVs, , and , are related conditionally (Fig. 1) to these independent inputs through the physical relations,


all the constants in Eq. (29) and below are defined in HallTaverniersEtAl:2020aa. The PDFs of these dependent CVs are estimated by sampling the uniform distributions in Eq. (28) and computing a corresponding observation via Eq. (29). Hence, the physics of the problem induces the correlations between the CVs represented by the conditional relationships in Fig. 1.

Variable label Mean/Baseline Variation Units
Table 1: Statistics of the uniform PDFs of the independent CVs in Eq. (28) (from HallTaverniersEtAl:2020aa).

3.2 GINNs: DNN surrogate models for multiscale physics

GINNs HallTaverniersEtAl:2020aa are domain-aware surrogates for a broad range of complex physics-based models. In the context of EDLCs, a GINN can be used to accelerate the propagation of uncertainty from structured priors on CVs to distributions of QoIs by replacing the computational bottlenecks in the BN PDE (the dashed boxed nodes of the BN in Fig. 1) with the GINN’s hidden layers. In so doing, it alleviates the cost of computing the QoIs , which includes bypassing the need to compute the effective diffusion coefficients of the cations () and anions () according to


3.2.1 GINN construction

The workflow for building the GINN surrogate is summarized as follows. Details, including the procedures for training and testing the GINN using the BN PDE, can be found in HallTaverniersEtAl:2020aa .

  1. Data generation (BN PDE): Generate io samples by drawing the inputs from the structured priors on and computing the corresponding responses with the BN PDE, and select training samples and test samples from this data set.

  2. Training: Using the io pairs and TensorFlow 2, train a fully connected NN for 100 epochs, comprising:

    1. an input layer consisting of the seven CVs ,

    2. two hidden layers each consisting of 100 neurons,

    3. an output layer consisting of the two QoIs ,

    4. application of the ReLU (Rectified Linear Unit) activation function, and

    5. a given training error tolerance of .

  3. Testing: Test the trained GINN on the io pairs to analyze its generalization capability for unseen data for a given test error tolerance of .

  4. Prediction: Sample inputs from the structured priors on the CVs, and predict the corresponding responses with the trained GINN.

3.2.2 Computational efficiency of the physics- and GINN-based models

For complex numerical simulations, the cost of step 1 outweighs, by orders of magnitude, the combined cost of steps 2–4 HallTaverniersEtAl:2020aa. The GSA results reported below require samples of the BN PDE (physics-based model) to train the GINN that satisfies both the training and test error tolerances. (Footnote 4: While testing requires the generation of additional io samples with the BN PDE, it is not strictly required and hence not taken into account when comparing the rankings generated with the physics-based model and the GINN.) Since the generation of new samples with the GINN carries a negligible expense compared to the generation of training data with the BN PDE, the computational costs of the physics- and GINN-based GSAs are virtually identical when carrying out the former using samples; this allows us to investigate the performance of both approaches for a fixed computational budget.

3.3 GINN-based MISI rankings

Figure 2 exhibits the first-order MISI values for and (left column) and the corresponding ranks of the CVs (right column), estimated respectively with Algorithm 1 and Algorithm 2. All of these quantities are computed, alternatively, with the physics- and GINN-based models. The MISI values are equipped with the adjusted confidence intervals indicating a pairwise non-overlap significance (on average, at level ). The 95% percentile confidence intervals for the CV ranks in Eq. (27), i.e., with , are based on replications; samples for the estimator in each replication are either predicted using the GINN or are bootstrap resampled from a corpus of physics-based simulations. For both Algorithm 1 and Algorithm 2, the physics- and GINN-based estimators are largely consistent, which is to be expected since the GINN surrogate satisfies both a preset training and test error tolerance. The highlighted gaps between clusters of MISI values in Fig. 2a,c indicate the groupings of various CVs by their relative importance. In Fig. 2b,d dashed lines correspond to these highlighted gaps; although the clarity of the rankings in Fig. 2b,d facilitates the automation of decisions in outer-loop tasks, the ranks themselves do not contain information about the relative importance of each parameter as in Fig. 2a,c.

Figure 2: Plug-in Monte Carlo estimators of the first-order MISIs in Eq. (4) indicate the most impactful CVs for tuning the QoIs (top row) and (bottom row). For Algorithm 1 (left column), the width of the confidence intervals in Eq. (23) is chosen to achieve a non-overlap significance level pairwise on average. The highlighted gaps in (a) and (c) indicate clusters of CVs with similar relative importance. For Algorithm 2 (right column), the GINN-based estimators for CV ranking with percentile confidence intervals in Eq. (27) are consistent with the rankings in (a) and (c). These ranks are resolved, which is not feasible through bootstrapping the samples from the physics-based model. In both algorithms, the GINN surrogate enables querying sufficiently large amounts of data to distinguish closely-ranked CVs with a high degree of confidence. In (b) and (d) the dashed lines correspond to the gaps identified in (a) and (c), respectively.

For both QoIs, the MISI estimators obtained with the physics-based model lead to indeterminate rankings. For , the confidence intervals for and overlap, so their MISI values (and hence their ranking) cannot be distinguished at an average significance level . Similarly, for the rankings for are not resolved. In contrast, the differences between the corresponding estimators derived from (for ) or (for ) GINN-based predictions are pairwise significant at the level . Likewise, the ranks generated using replications with observations for or observations for predicted with the GINN are fully resolved (indeed, the 95% percentile confidence intervals are vanishingly small on the plots). Moreover, these ranks are consistent with the resolved rankings of the MISI values for and deduced from the GINN-based estimators obtained with Algorithm 1. These findings demonstrate the benefit of a GINN: resolving the rankings with the physics-based model alone would be considerably more expensive, given the high cost of generating additional response samples with the BN PDE.
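The overlap criterion used in this paragraph can be sketched in a few lines. This is a minimal illustration, not the authors' Algorithm 1: it takes the point estimates and interval half-widths as given (the adjusted intervals themselves are defined earlier in the paper) and groups CVs whose intervals overlap into clusters of indeterminate rank. The function name `rank_clusters` and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def rank_clusters(est, half_width):
    """Group indices whose confidence intervals overlap (hypothetical helper).

    Indices are sorted by decreasing estimate; an index joins the current
    cluster if its upper bound reaches the lowest lower bound seen in that
    cluster, i.e. if it overlaps at least one member of the cluster.
    """
    est = np.asarray(est, dtype=float)
    hw = np.broadcast_to(np.asarray(half_width, dtype=float), est.shape)
    order = np.argsort(-est)                      # most important first
    clusters = [[int(order[0])]]
    cluster_lo = est[order[0]] - hw[order[0]]     # lowest lower bound in cluster
    for i in order[1:]:
        if est[i] + hw[i] >= cluster_lo:          # overlaps a cluster member
            clusters[-1].append(int(i))
            cluster_lo = min(cluster_lo, est[i] - hw[i])
        else:                                     # clean gap: start a new cluster
            clusters.append([int(i)])
            cluster_lo = est[i] - hw[i]
    return clusters

# Illustrative values only (not taken from the paper).
est = [0.90, 0.50, 0.48, 0.10]
wide = rank_clusters(est, 0.02)     # wide intervals: two estimates merge
narrow = rank_clusters(est, 0.005)  # narrower intervals: all ranks separate
resolved_fraction = sum(len(c) == 1 for c in narrow) / len(est)
```

Shrinking the half-widths, which is what additional surrogate-generated samples achieve, splits the merged cluster into singletons; a rank is resolved exactly when its cluster has one member.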

Figure 2 also compares the direct distributional method of Algorithm 2 to bootstrapped estimators with percentile confidence intervals computed using the physics-based model. Using bootstrap (i.e., resampling Wasserman:2006np ) replications of observations ( is constrained by the fixed computational budget) yields clusters of indeterminate ranks: the ranking of and for , and the ranking of and for , cannot be resolved at the level. In both cases, this is over of the CVs. Again, ranking the first-order effects of the CVs with a high degree of confidence within a constrained computational budget critically depends on the availability of a GINN (or, more generally, a surrogate model) to cheaply generate additional response samples.
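The bootstrap procedure described above can be illustrated as follows. This is a sketch under stated assumptions rather than the paper's Algorithm 2: the mutual information is estimated with a simple histogram plug-in estimator (the paper's estimator may differ), and the names `mi_hist` and `bootstrap_rank_ci`, the bin count, and the synthetic data are all hypothetical.

```python
import numpy as np

def mi_hist(x, y, bins=16):
    """Histogram plug-in estimate of I(X;Y) in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)           # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)           # marginal of y, shape (1, bins)
    mask = pxy > 0                                # skip empty cells (0 log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def bootstrap_rank_ci(X, y, n_rep=200, level=0.95, seed=0):
    """Percentile confidence intervals for the MI-based rank of each column of X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ranks = np.empty((n_rep, d), dtype=int)
    for r in range(n_rep):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        mi = np.array([mi_hist(X[idx, j], y[idx]) for j in range(d)])
        ranks[r] = np.argsort(np.argsort(-mi)) + 1  # rank 1 = largest MI
    alpha = 1.0 - level
    lo = np.percentile(ranks, 100 * alpha / 2, axis=0)
    hi = np.percentile(ranks, 100 * (1 - alpha / 2), axis=0)
    return lo, hi

# Synthetic stand-in for a fixed corpus of simulations (not the paper's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=2000)
lo, hi = bootstrap_rank_ci(X, y, n_rep=100)
```

A rank counts as resolved when `lo[j] == hi[j]`. With a fixed corpus, closely ranked inputs tend to keep overlapping rank intervals, which is the limitation that a cheap surrogate removes by supplying fresh samples for every replication.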

Since the CVs are correlated, it is natural to expect higher-order effects due to interactions between the CVs. Figure 3 displays the second-order MISIs and their ranks estimated using Algorithms 1 and 2, respectively, for the QoI . The pairwise comparison of the adjusted confidence intervals with non-overlap significance reveals that approximately 80% of the estimators obtained with physics-based observations are indistinguishable (Fig. 3a). In contrast, the rankings deduced from the estimators obtained with GINN-based observations are fully resolved. The estimated second-order effect ranks based on replications of GINN-based observations are nearly identical, emphasizing the robustness and consistency of Algorithms 1 and 2. A large proportion of the MISI values is clustered, i.e., the corresponding CV pairs are of comparable importance. The GINN-based GSA resolves of the ranks compared to of the ranks distinguishable from bootstrap replications of physics-based observations. Here, the number of replications for the GINN-based ranks is chosen such that the total computational time of implementing Algorithm 2 is similar to that of computing all the physics-based bootstrap replications, which is approximately the case when the product is the same for both models.
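As a rough illustration of a pairwise effect, the sketch below scores each pair of inputs by the joint mutual information I(X_i, X_j; Y), estimated with a three-dimensional histogram. Whether this quantity coincides with the paper's second-order MISI is an assumption; the function name `joint_mi_hist`, the bin count, and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def joint_mi_hist(xi, xj, y, bins=8):
    """Histogram plug-in estimate of I((Xi, Xj); Y) in nats (assumed definition)."""
    counts, _ = np.histogramdd(np.column_stack([xi, xj, y]), bins=bins)
    p = counts / counts.sum()
    p_pair = p.sum(axis=2, keepdims=True)         # joint marginal of (xi, xj)
    p_y = p.sum(axis=(0, 1), keepdims=True)       # marginal of y
    denom = np.broadcast_to(p_pair * p_y, p.shape)
    mask = p > 0                                  # skip empty cells
    return float(np.sum(p[mask] * np.log(p[mask] / denom[mask])))

# Synthetic example with a pure interaction between inputs 0 and 1.
rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=5000)

scores = {
    (i, j): joint_mi_hist(X[:, i], X[:, j], y)
    for i, j in itertools.combinations(range(X.shape[1]), 2)
}
best_pair = max(scores, key=scores.get)
```

Each input taken alone is only weakly informative about the output here, yet the interacting pair stands out, mirroring the observation that a pair can dominate the second-order ranking without either member topping the first-order one. Histogram estimators degrade quickly in higher dimensions, so many samples are needed for such rankings to separate.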

Figure 3: Plug-in Monte Carlo estimators of the second-order MISIs in Eq. 11 indicate the most impactful interactions between any two CVs in for tuning the QoI . The GINN surrogate improves the resolution of the second-order MISI rankings computed with (a) Algorithm 1 or (b) Algorithm 2. Although the first-order effects of and on are not top-ranked (see Fig. 2a,b), we observe above that has the most important second-order effect on .

To summarize the key findings from the numerical experiments presented in Figs. 2 and 3: the large volumes of data generated with the GINN surrogate produce completely resolved first-order, and mostly resolved second-order, MISI rankings. These rankings are largely consistent with the budget-constrained predictions of the physics-based model. Hence, Algorithms 1 and 2 facilitate the deployment of GINNs for the acceleration and future automation of outer-loop decision-support tasks.

4 Design with Explainable Black-Box Surrogates

Like all DNNs, GINNs are black boxes that lack a clear functional relationship between inputs and outputs. MI-based GSA aids in interpreting and explaining their predictions, thereby enabling the use of black-box surrogates in simulation-based decision-making, including the closure of engineering design loops to facilitate rapid prototyping.

We validate the first-order MISI rankings discussed in Section 3.3, and then use these rankings to explore subregions of the original parameter space that deliver high values of the effective electrolyte conductivity, . Subsequent effect rankings within this parameter subspace suggest follow-up simulations or novel laboratory experiments, resulting in further refinements to the design of nanoporous electrodes for EDLCs.

4.1 Validation of MISI rankings

Fig. 4 shows normalized response surfaces, in the form of scatter plots and cubic regression splines based on observations, for the QoIs and along sensitive and insensitive parameter directions identified by the first-order MISI rankings in Fig. 2. The most sensitive parameter directions are for and for , and the least sensitive are for and for . The response surfaces for and (top and bottom rows in Fig. 4, respectively) demonstrate nonlinear relationships with respect to the most sensitive CV directions (Figs. 4a,c). In contrast, the random scatter for the least sensitive directions (Figs. 4b,d) suggests the lack of a clear relationship between these CVs and the QoIs. The quality and strength of these functional relationships validate the assigned rankings.
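The validation step above can be mimicked on synthetic data. The sketch below is not the authors' procedure: it substitutes a single global cubic polynomial fit (`numpy.polyfit`) for the cubic regression splines used in Fig. 4, and uses the coefficient of determination as a crude measure of how much structure the response shows along one input direction. The function name `cubic_r2` and the test function are hypothetical.

```python
import numpy as np

def cubic_r2(x, y):
    """R^2 of a global cubic fit of y against x (stand-in for a regression spline)."""
    coeffs = np.polyfit(x, y, deg=3)
    resid = y - np.polyval(coeffs, x)
    return 1.0 - resid.var() / y.var()

# Synthetic response: nonlinear in input 0, independent of input 1.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(1000, 2))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=1000)
y = (y - y.min()) / (y.max() - y.min())           # normalize the response to [0, 1]

r2_sensitive = cubic_r2(X[:, 0], y)               # clear functional relationship
r2_insensitive = cubic_r2(X[:, 1], y)             # random scatter
```

A fit that explains most of the variance along one direction and almost none along another matches the qualitative contrast between panels (a),(c) and (b),(d), and serves as an independent sanity check on an MI-based ranking.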

Figure 4: Response surfaces of the QoIs (top row) and (bottom row) with respect to the respective most (left column) and least (right column) sensitive parameters. The plots represent observations which are fitted with cubic regression splines. These results indicate nonlinear response surfaces for the most sensitive parameter directions, (a) and (c), in contrast to the random dispersion of observations for the least sensitive parameter directions, (b) and (d). This validates the first-order effect rankings in Fig. 2.

The MISI effect ranking and above validation step lend interpretability to the black-box predictions. This enables the use of GINNs in design iterations by predicting new response samples in reduced parameter spaces that optimize certain QoIs. The next section illustrates this procedure.

4.2 Design of multiscale systems under uncertainty

The first- and second-order MISI rankings suggest that the CVs , , and have the largest individual contributions to changes in (Fig. 2), and the CV pairs , , and (Fig. 3) have the largest pairwise interaction effect. The GINN-generated response surfaces of