1 Introduction and motivation
Across all data-centric disciplines, sensitivity analysis is an important task. The question “which of my parameters is the most important?” can be heard daily across shop floors, office spaces and board rooms. The measure of importance
is typically the conditional variance, leading to the familiar Sobol’ indices[23, 24] and the related total indices . However, these are not the only metrics available; over the years, other metrics for sensitivity analysis have been proposed, including those based on conditional skewness 
and kurtosis. Beyond this, Hooker  and Liu and Owen  introduce the notion of a superset importance, which is the sum of all Sobol’ indices based on a superset of an input set. Techniques based on partial derivative evaluations of a function have also been extensively studied [1, 15], and the relationship between these elementary effects and total Sobol’ indices have also been investigated (see ).
More recently, there has been work centered on higher-order indices (see Owen et al.  and Geraci et al. ). The former text is particularly interesting as the authors try to draw inference from higher-order skewness- and kurtosis-based indices. More specifically, their working analogy is that a positively skewed distribution will attain more extreme positive values than a symmetric distribution with the same variance; distributions with a large kurtosis will also attain greater extremes of both signs 
. Geraci et al., who introduce polynomial approximation-based ideas for computing skewness- and kurtosis-based indices, find applications of these indices in more accurate approximation of the probability density function (PDF) of the output response. In other words, higher values of a variable’s higher-order sensitivity index indicate that a higher-degree polynomial is required to capture the response’s behavior with respect to that variable—requiring more samples.
These ideas are part of a broader effort to identify which variables—and under what distributions—drive the output response to its highest levels. In this paper, we take clues from  and Monte Carlo filtering (see page 39 of ). We study known metrics based on skewness and, in the general case, we identify them to offer information on the convexity along each input direction, but not necessarily any information on extremum sensitivity, assuming monotonicity. For more general functions, we propose a Monte Carlo filtering-based strategy (see section 3).
Central to our investigation is the tractable computation of various sensitivity metrics. It is well-known that computing sensitivity indices—particularly in high-dimensional problems—can be prohibitive owing to the excessively large number of evaluations required. While access to a function’s gradients can abate this curse of dimensionality, our focus will be on the general case where access to a function’s gradients (or adjoint) is not available. Our approach is to use orthogonal polynomial least squares representations of the function. This implies that we are restricting our study to functions that can be well-approximated by globally smooth and continuous polynomials. Using orthogonal polynomials to this end has been relatively well-studied and prior work can be found in [25, 21] and the references therein.
The remainder of this paper is structured as follows. In section 2 we interpret the global polynomial approximation problem as that of computing its coefficients. For polynomials of a few dimensions, this can be done in the absence of assuming anisotropy; for higher-dimensional approximations, on the other hand, anisotropy is necessary to circumvent the curse of dimensionality. More specifically, we construct polynomial approximations over subspaces—a few linear combinations of all the parameters. From the coefficients in the projected space, we design a polynomial surrogate in the full space for determining the coefficients in the full space. These coefficients are then used for quantifying extremum sensitivity information, detailed in section 3. In section 4, we demonstrate the algorithms and ideas proposed in this paper with two numerical examples.
2 Computing the coefficients of polynomial approximations
Prior to detailing techniques for computing coefficients of the aforementioned polynomials, it will be worthwhile to make more precise our notion of anisotropy. Let us consider a polynomial
For the case where and where
(the identity matrix), we have a polynomial approximation in thefull -dimensional space. When and when the columns of are orthogonal, they span a -dimensional subspace in 111Note that the columns do not have to be orthogonal; they simply have to be linearly independent.. In this case, the function is referred to as a generalized ridge function . Depending on what the weights in are, the polynomial will be anisotropic with the larger weights prescribing linear combinations along which the polynomial varies the most. For notational clarity, we add a subscript below to denote the dimension over which the polynomial is constructed: denotes a polynomial in the full space, while denotes a polynomial over a subspace. From hereon, we will assume that .
2.1 Notation and main problem
Now, let us assume the existence of a smooth and continuous function for which we wish to compute sensitivities. Provided one has access to input/output pairs
and one can express as a polynomial (albeit approximately), we can state
where is an orthogonal polynomial design matrix in dimensions. Writing the expansion of in terms of its basis terms,
we can identify to be
In other words, is a Vandermonde-type matrix. The number of rows corresponds to the number of model evaluations, typically scaling exponentially with respect to , while the number of columns corresponds to the number of unknown polynomial coefficients. Consider two scenarios for the size of vs :
: To solve a linear system with more unknowns than equations, one has to impose some form of regularization. Solution strategies based on linear programming and basis pursuit denoising[26, 6] have been used to solve
This, of course, assumes that is sparse, which need not necessarily be the case, depending on the polynomial basis terms selected.
So, why the need to use a global polynomial approximation? As alluded to already, accurate polynomial approximations of permit us to leverage prior work  on computing global variance-based sensitivities using only the coefficient values. Moreover, these coefficients can also be used for computing skewness-based sensitivity metrics, which may offer insight into understanding which variables push to its extrema.
But what if , in addition to being smooth and continuous, was also anisotropic? The idea of developing global sensitivity metrics for ridge approximations has been studied previously by Constantine and Diaz . Using ideas from active subspaces , the authors define activity scores that are based on the eigendecomposition of the average outer product of the gradient with itself in the input space. They draw comparisons between these activity scores and total Sobol’ indices (see Theorem 4.1 in ). We aim to build upon their ideas and seek to approximate higher-order sensitivity information using , leveraging polynomial approximations.
Consider the ridge approximation
where is another orthogonal polynomial design matrix, but now in . Since typically , we will assume that , implying that we have at least as many model evaluations as unknown coefficients in . In other words, computing using regular least squares should not be a problem222We are assuming that the points are selected such that has a relatively low condition number.. The trouble, however, is that are the coefficients of , and we need access to the coefficients associated with
in order to estimate global sensitivities.
The key is to notice that is simply a rearrangement of coefficients from . In other words, we can convert the coefficients of to those of in a lossless manner. Namely, we can solve the square linear system
where we can choose points of evaluation for both Vandermonde matrices arbitrarily. As long as the points of evaluation are distinct, we can invert to solve for using, e.g., LU factorization. Using this method, the accuracy of the estimated polynomial coefficients is now dependent solely on the accuracy of estimating the dimension-reducing subspace spanned by and the inherent error of a ridge approximation due to variations of outside the range of . Assuming the presence of strong anisotropy, the subspace can be computed with significantly fewer data than a full polynomial approximation using techniques such as active subspaces , polynomial variable projection , and minimum average variance estimation (MAVE) .
As an illustrative example, suppose we calculate
as the leading eigenvectors of the gradient covariance matrix
where we have assumed a PDF and input domain for the function with mild assumptions to allow the existence of this matrix. It can be shown that the mean squared error between a function and its ridge approximation is bounded 
with being the Poincaré constant depending on and and Tr the trace. Assuming now that and , we can bound the total sum of squares difference between the coefficients with this quantity
With a strong ridge structure, the trailing eigenvalues—i.e. the diagonal entries in—will be small compared to the leading ones, resulting in better approximation of the coefficients of the full space.
3 On extremum sensitivity information
We now address the second key contribution of this paper: our study of derived higher-order sensitivities to make inference on the extrema of . We start with a brief overview on the computation of sensitivity indices—including higher-order indices—using orthogonal polynomial expansions. Before we proceed with this endeavour, we need to establish a system to identify sensitivity indices with their associated input variables. For this, we introduce multi-indices, which are similar to formulations in the literature such as .
3.1 Multi-index notation and Sobol’ indices
We associate each scalar index with a multi-index , which is a non-negative integer sequence. This association is bijective, but the order is not important. For consistency with the previous sections, we insist that the first polynomial be a constant, implying that . As an example, with , one possible assignment is shown in table 1. Define a function that returns the positions of the non-zero indices in a multi-index. For instance,
The set of all possible multi-indices in the expansion is known as the index set. For instance, an index set of maximum total order is specified by
With this notation, we can write the polynomial expansion creftype 4 by identifying each basis term with its degree in each input variable.
where the univariate basis polynomials are orthonormal to each other with respect to the marginal input PDFs . This implies that the multivariate basis polynomials also form an orthonormal system under , and are suitable as basis functions in a Sobol’ decomposition , of which we omit the details. In essence, we can consider the decomposition of the total output variance via the following expression
Owing to orthonormality, this expansion allows us to compute the Sobol’ indices straightforwardly given the expansion coefficients. For instance, the Sobol’ index for a set of variables is given by
A measure of the importance of an input variable is the total Sobol’ index , which is the sum of all Sobol’ indices that contain the variable concerned. For instance, for variable ,
3.2 Skewness-based indices
We are now ready to define skewness-based sensitivity indices. The definition is similar to creftype 16, where we used the Sobol’ functional ANOVA decomposition and linked it with the polynomial expansion. Applying the same method here yields the following for the global skewness
However, unlike the case for variance-based indices, the resultant indices cannot be expressed in a form as simple as creftype 17. Here, we cite the full result from  for the skewness index of a set of variables . Letting be the corresponding multi-index to and so on, we define the following sets for a given set of variables
It can be shown that the skewness index for is
The total skewness index for is defined analogously,
From creftype 20 we see how the computation of skewness indices is more involved than computing Sobol’ indices. In particular, we must be careful to selectively compute the integral terms to avoid multiple complete loops over all of the indices, which is computationally intractable. In the appendix we provide an efficient algorithm for computing skewness indices that has been implemented in the open-source polynomial-based uncertainty quantification software package Effective Quadratures , available at https://www.effective-quadratures.org/.
So, how useful are skewness indices for inference related to output extrema? A salient feature of skewness indices is the possibility of them being negative. While Sobol’ indices must be non-negative, skewness indices do not have this restriction—they only have to sum up to unity. In Owen et al. , the authors state that the sign of skewness indices may be interpretable as the measure of importance of variables near extrema of the output—with a large positive skewness index indicating importance near a maximum and large negative skewness index indicating importance near a minimum. The question is whether the signed nature of skewness indices helps differentiate between importance near maxima or minima. We find that the utility of the signs is not clear without further assumptions on the function.
At the heart of sensitivity analysis—and indeed the broader area of uncertainty quantification—is the task of characterizing the statistical moments of the inputs relative to the output. With this goal in mind, we state a theorem from van Zwet
that links the skewness of a random variable to the skewness of a convex transformation of the random variable:
Theorem 3.1 (from ).
Suppose that is a measurable univariate monotonic convex function of a random variable , and all moments up to the third order exist. Then, the skewness of is larger than or equal to the skewness of . If the function is concave, the skewness of is smaller than or equal to the skewness of .
Thus, if we consider additive functions of the form where each only depends on one variable , it is straightforward to conclude that skewness indices are intimately related to the convexity of the subfunctions . For instance, if is convex, the input marginal PDF of is symmetric (zero skewness), and the overall function has a positive skewness, then the skewness index of will be positive, and will contribute to skewing the output PDF towards the left.
Theorem 3.1 offers partial justification to the interpretation that skewness indices give local sensitivity measures near output extrema mentioned earlier under some assumptions—for example, the function must be additive, monotonic, and twice differentiable. This is because a monotonic convex function gives a steeper slope near maxima; and a monotonic concave function gives a steeper slope near minima. However, in general, the second derivative is not always indicative of importance, and it is difficult to make any rigorous statements about the sign of skewness indices. Instead, we will empirically examine the effectiveness of this information through a numerical example in section 4.2.
3.3 Monte Carlo filtering with a polynomial surrogate
Monte Carlo filtering (MCF) [19, p. 38] is a heuristic for determining the input variables that are important in driving the output to certain ranges. The input space is partitioned into two regions—one where the corresponding outputs fall in the range of interest , and one where this does not happen . The importance of a variable is determined by the difference between the conditional PDFs of and . If these distributions are significantly different, the variable is deemed important in distinguishing whether the output falls into our range of interest or not. Statistical tests for the difference between two distributions include the Smirnov two-sample test [19, p. 185] for general distributions. Approaches based on comparison of statistics such as the median  and the moments  have also been proposed.
An approach based on MCF can be used for extremum sensitivity analysis by partitioning the input space between regions leading to output extrema and regions which do not. We can determine whether an input is important near an extremum by examining its conditional variance within the region of interest. The motivation behind using the variance is the notion that it indicates the variability of the input within the selected region—a large variance near an extremum indicates indifference to the input variable concerned in the vicinity of this output extremum. To quantify this concept, we introduce the variance reduction for an input variable ,
The quantity is the restricted variance, indicating the importance of the variable near an extremum value. To calculate the restricted variance, we use rejection sampling to generate a batch of samples that give extreme output values, and calculate their sample variance as an estimate. The cost of this sampling procedure is greatly reduced by an appropriate surrogate model, such as the multivariate polynomial approximation proposed in section 2. We compare the MCF-based heuristic with inference from skewness indices in section 4.2.
4 Numerical examples
4.1 Ridge approximation of polynomial coefficients
To demonstrate the method of approximating polynomial coefficients using a ridge approximation, we study the estimation of Sobol’ indices of the following approximate ridge polynomial ,
and the orthogonal columns of span, respectively, the column and left null space of
In addition, is a noise parameter that represents the deviation of from an exact ridge function within the subspace of . We compare three methods of evaluating the sensitivity indices, namely
Through polynomial ridge approximation: using MAVE , we estimate a dimension-reducing subspace for the function of interest. Then, using the same samples, we fit a low-dimensional polynomial ridge over the reduced space , whose coefficients we use to calculate the coefficients in the full space via creftype 9.
Fitting a polynomial on the full input space via least squares. Note that this requires at least 330 function evaluations, since we are using an isotropic polynomial basis of maximum total order 4.
The Sobol’ indices and
are computed using the methods above. To benchmark these results, we also compute the quantities using integration via Gauss quadrature on a quadratic tensor grid of Legendre polynomials with 2187 points. The fact thatis a polynomial allows this integration to be exact. Figure 1 shows the results of this comparison for and
, respectively. It shows that leveraging the (approximate) ridge structure of the quantity of interest, we are able to compute the polynomial coefficients on the full space with higher accuracy than quasi-Monte Carlo. However, when we are allowed to evaluate the function more than 330 times, regression on the full space using a total order grid is superior. Ridge regression is also shown to be quite robust to small deviations outside of, especially when the number of evaluations is small.
4.2 Higher-order inference
We empirically study the utility of skewness indices in extrema sensitivity analysis via an example. The example concerns a model of a piston adapted from , which describes the cycle time of the piston as a function of seven parameters. These parameters are shown in table 2
. The input variables are independently, uniformly distributed over their support.
|[30, 60]||Piston mass (kg)|
|[0.005, 0.020]||Piston surface area (m)|
|[0.002, 0.010]||Initial gas volume (m)|
|[1000, 5000]||Spring coefficient (N/m)|
|[90000, 110000]||Atmospheric pressure (N/m)|
|[290, 296]||Ambient temperature (K)|
|[340, 360]||Filling gas temperature (K)|
The cycle time in seconds is
We perform sensitivity analysis on this model by examining both the Sobol’ indices and skewness indices, using a seven-dimensional Legendre polynomial basis of maximum total order 5. We approximate the total sensitivity indices by summing skewness indices up to the third order (skewness indices involving up to three variables). The results are shown in table 3, in which the indices with large magnitudes are shown in bold.
Note that our first-order Sobol’ indices results agree well with those of Owen et al.  after normalization. The same cannot be said for the first-order skewness indices. This is most likely due to differences in the definitions of the skewness-based indices—Owen et al. generalize the Sobol’ identity to higher powers, whereas we follow the approach of Geraci et al.  to generalize the Sobol’ decomposition to higher powers. However, with regard to the ranking of the magnitudes and the signs, we can draw an agreement.
Now, we study the utility of the signs of the skewness indices to indicate importance near extrema, by comparing the results of the MCF-based approach detailed in section 3.3 for both the output maximum and minimum. To evaluate the restricted variance for each variable, we use rejection sampling, selecting the top 5% and bottom 5% of the output values, and calculating the sample variance of the input points that correspond to these extreme output values. The overall variance can be calculated analytically—, where is the range of the corresponding variable. The variance reduction of each variable is shown in table 4, and the marginal PDFs for the input variables near the extrema are shown in fig. 2.
First, we note that the global skewness is positive for the piston cycle time. For , both the first order and total skewness indices are positive, indicating importance near the maximum. This corroborates with the MCF results in table 4, although the fact that is also rather important near the minimum is not reflected, since it is dominated by the effect of .
For , the first-order index is negative, while the total skewness index is positive. We can interpret this as the fact that terms involving alone strongly contribute to importance near the minimum, while accounting for every term involving results in a net effect of importance near the maximum. Table 4 indicates that while is the most important near the minimum, it is also significant near the maximum. To obtain a full interpretation one need to consider the skewness indices at different orders.
We also remark that while and are also significant near the maximum, their effects are only reflected in the total skewness indices. The piston model provides an example where the signs of the total skewness indices are correlated to the importance of a variable near either maxima or minima, but it is unclear how general this relationship is.
In this paper we explore two related threads in sensitivity analysis. Through a polynomial-based approximation framework, we show the possibility of leveraging the ridge structure of a quantity of interest to mitigate the curse of dimensionality and reduce the number of observations required for sensitivity analysis. This polynomial-based approximation framework can therefore facilitate more efficient sensitivity analysis. In terms of extremum sensitivity analysis, it allows the efficient calculation of sensitivity indices, including skewness-based indices, or inexpensive sampling that accelerates MCF-based approaches. While we have demonstrated the utility of skewness-based indices in a numerical example, further studies are required to establish the general utility of these indices.
This work was supported by the Cambridge Trust; Jesus College, Cambridge; the Alan Turing Institute and Rolls-Royce plc.
Appendix A Computing skewness indices
As described with creftype 20 in section 3, the method for calculating skewness indices via polynomial expansion coefficients is more involved than calculating the Sobol’ indices, because the integrals of third-order cross terms cannot be simplified easily with orthogonality. Hence, the expectation terms within each summand need to be evaluated using numerical quadrature. That is, we can approximate the expectation of a function using Gauss quadrature
where are the quadrature points and weights. In algorithm 1 we describe the algorithm implemented in Effective Quadratures . In general, an loop is required, and since can scale exponentially with the input dimension, this presents a significant computational burden. Our algorithm incorporates several methods to reduce the computational complexity, namely
Using a selection function similar to that proposed in  to avoid computation of terms which we know are zero;
Scanning the index set in an loop before computation to filter out indices that involve more variables than specified. This cuts the cost of computation significantly for indices involving a small number of variables.
Using a formulation amenable to matrix operations to avoid explicit loops.
The code can be found at https://github.com/Effective-Quadratures/Effective-Quadratures.
-  Francesca Campolongo, Jessica Cariboni, and Andrea Saltelli. An effective screening design for sensitivity analysis of large models. Environmental Modelling & Software, 22(10):1509–1518, 2007.
-  Paul G. Constantine. Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies, volume 2. SIAM, 2015.
-  Paul G. Constantine and Paul Diaz. Global sensitivity metrics from active subspaces. Reliability Engineering & System Safety, 162:1–13, 2017.
-  Aronne Dell’Oca, Monica Riva, and Alberto Guadagnini. Moment-based metrics for global sensitivity analysis of hydrological systems. Hydrology and Earth System Sciences, 21:6219–6234, 2017.
-  Gianluca Geraci, Pietro M. Congedo, Remi Abgrall, and Gianluca Iaccarino. High-order statistics in global sensitivity analysis: Decomposition and model reduction. Computer Methods in Applied Mechanics and Engineering, 301:80–115, 2016.
-  Jerrad Hampton and Alireza Doostan. Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies. Journal of Computational Physics, 280:363–386, 2015.
-  Jon Herman and Will Usher. SALib: An open-source Python library for Sensitivity Analysis. The Journal of Open Source Software, January 2017.
-  Jeffrey M. Hokanson and Paul G. Constantine. Data-Driven Polynomial Ridge Approximation Using Variable Projection. SIAM Journal on Scientific Computing, 40(3):A1566–A1589, 2018.
-  Toshimitsu Homma and Andrea Saltelli. Importance measures in global sensitivity analysis of nonlinear models. Reliability Engineering & System Safety, 52(1):1–17, 1996.
-  Giles Hooker. Discovering additive structure in black box functions. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 575–580. ACM, 2004.
-  Ron Kenett, Shelemyahu Zacks, and Daniele Amberti. Modern Industrial Statistics: with applications in R, MINITAB and JMP. John Wiley & Sons, 2013.
-  Sergei Kucherenko and Ilya M. Sobol’. Derivative based global sensitivity measures and their link with global sensitivity indices. Mathematics and Computers in Simulation, 79(10):3009–3017, 2009.
-  Ruixue Liu and Art B. Owen. Estimating Mean Dimensionality of Analysis of Variance Decompositions. Journal of the American Statistical Association, 101(474):712–721, 2006.
-  Marco Marozzi. Nonparametric Simultaneous Tests for Location and Scale Testing: A Comparison of Several Methods. Communications in Statistics - Simulation and Computation, 42(6):1298–1317, 2013.
-  Max D Morris. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics, 33(2):161–174, 1991.
-  Art B. Owen. Personal communication, 2018.
-  Art B. Owen, Josef Dick, and Su Chen. Higher order Sobol’ indices. Information and Inference: A Journal of the IMA, 3(1):59–81, 2014.
-  Allan Pinkus. Ridge Functions, volume 205. Cambridge University Press, 2015.
-  Andrea Saltelli, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, and Stefano Tarantola. Global Sensitivity Analysis: The Primer. John Wiley & Sons, 2008.
-  Pranay Seshadri, Gianluca Iaccarino, and Tiziano Ghisu. Quadrature Strategies for Constructing Polynomial Approximations. In Uncertainty Modeling for Engineering Applications, pages 1–25. Springer, 2019.
-  Pranay Seshadri, Akil Narayan, and Sankaran Mahadevan. Effectively Subsampled Quadratures for Least Squares Polynomial Approximations. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1003–1023, 2017.
-  Pranay Seshadri and Geoffrey Parks. Effective-Quadratures (EQ): Polynomials for Computational Engineering Studies. The Journal of Open Source Software, March 2017.
-  Ilya M. Sobol. Sensitivity estimates for nonlinear mathematical models. Mathematical Modelling and Computational Experiments, 1(4):407–414, 1993.
-  Ilya M. Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3):271–280, 2001.
-  Bruno Sudret. Global sensitivity analysis using polynomial chaos expansions. Reliability Engineering & System Safety, 93(7):964–979, 2008.
-  Gary Tang and Gianluca Iaccarino. Subsampled Gauss Quadrature Nodes for Estimating Polynomial Chaos Expansions. SIAM/ASA Journal on Uncertainty Quantification, 2(1):423–443, 2014.
-  Steve C. Wang. Quantifying passive and driven large-scale evolutionary trends. Evolution, 55(5):849–858, 2001.
-  Frank Wilcoxon. Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1(6):80–83, 1945.
-  Yingcun Xia, Howell Tong, W. K. Li, and Li-Xing Zhu. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):363–410, 2002.
-  Willem R. van Zwet. Convex Transformations of Random Variables. Mathematical Centre Tracts 7. Mathematisch Centrum, Amsterdam, 1964.