1 Introduction
The governing physics in many engineering problems are described by a system of partial differential equations (PDEs). These equations can be solved by suitable discretization methods, such as finite element and finite volume methods, where scalar fields—e.g. pressures, temperatures, strains—are computed at each node over the PDE domain. One can interpret these scalar fields as vector-valued functions, conditioned upon certain boundary conditions and geometry parameters; each value of the vector corresponds to the scalar field quantity at a specific node of the domain. Integrals of these scalar field variables typically constitute output qois in uncertainty quantification studies. For instance, when propagating uncertainties in the Mach number and angle of attack of flow over an airfoil, one is interested in quantifying the moments of the lift and drag coefficients
[31]; both lift and drag coefficients are surface integrals of the static pressure field around the airfoil [1, Ch. 1]. In studying the impact of uncertainties in leakage flows in a compressor [25], one is interested in quantifying the moments of isentropic efficiency, which can be expressed as an integral of the pressure and temperature ratios [9, p. 7]. Qois that are integrals of such scalar fields are prevalent well beyond computational fluid dynamics (CFD). For instance, the displacement of a structure is the integral of the strain field [28, sec. 2.1], which allows us to analyze linear elastic displacement problems in areas including soil mechanics [2, sec. 6.4] and machine component design [19, sec. 5.2] using the strain field.

Uncertainty quantification studies typically require a design of experiment, where the governing PDE model is evaluated under different inputs. The number of times the model is evaluated depends on the dimensionality of the input space and the nonlinearity of the scalar qois. In this paper, we explore a deviation from this paradigm. Rather than storing only scalar qois for each model evaluation, we explore whether one can reduce the number of model evaluations by storing select scalar fields. More specifically, we want to design emulators for the scalar fields themselves and then integrate the emulators to obtain the desired qois. To abate the widely known curse of dimensionality, we will draw ideas from ridge functions [22] for constructing these emulators.
So, why should this reduce the number of model evaluations? Consider the following examples. Over an airfoil in subsonic flow, the static pressure at a point near the leading edge is unlikely to be strongly affected by perturbations far downstream. In a similar vein, the local deflection of a structure is unlikely to be affected by small local changes in elastic properties far away from the point of measurement. As the number of effective degrees of freedom is low, one should be able to design algorithms that exploit the intrinsic sparsity of scalar field quantities—more so than their synthesized (integrated) scalar qoi counterparts. This trait manifests as an approximate containment of the variation of these scalar field quantities within a subspace of the input domain, motivating the idea of a
ridge approximation of a scalar field quantity.

A function $f: \mathbb{R}^{d} \to \mathbb{R}$ whose variation is entirely contained within a subspace described by $\operatorname{ran}(\mathbf{U})$, where $\mathbf{U} \in \mathbb{R}^{d \times m}$ has orthogonal columns, is called a generalized ridge function [22]. That is, it can be expressed as
\[ f(\mathbf{x}) = g(\mathbf{U}^{\top} \mathbf{x}). \tag{1} \]
In this paper, we will refer to these functions simply as ridge functions for brevity. Many physical qois are characterized by anisotropy in the input domain—i.e. they vary strongly only within a subspace—such as the output components described in the previous paragraph. These qois can be well-approximated by ridge functions, and the process of finding these ridge functions is known as ridge approximation. Namely, we find $\mathbf{U} \in \mathbb{R}^{d \times m}$ with orthogonal columns and $g: \mathbb{R}^{m} \to \mathbb{R}$ such that
\[ f(\mathbf{x}) \approx g(\mathbf{U}^{\top} \mathbf{x}), \tag{2} \]
where the approximation can be formulated by minimizing the integral of the mean squared error (MSE) over the input domain [5, sec. 3.1]. Given a ridge approximation, significant computational runtime savings can be achieved by working with $g$ as an emulator for $f$ if $m \ll d$, since we have effectively reduced the dimension of our problem from $d$ to $m$. We refer to the columns of $\mathbf{U}$ as the ridge directions, and $g$ as the ridge profile. As $\mathbf{U}^{\top}$ takes the $d$-dimensional input to an $m$-dimensional projection, we also call the column space of $\mathbf{U}$ the dimension-reducing subspace.
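As a minimal sketch of the dimension reduction at play (the function names and the choice of profile below are illustrative, not taken from the paper), a ridge emulator only ever evaluates its profile on the $m$-dimensional projected coordinates:

```python
import numpy as np

def make_ridge_emulator(U, profile):
    """Return an emulator x -> profile(U^T x), where U (d x m) has
    orthonormal columns and profile maps R^m -> R."""
    def emulator(x):
        y = U.T @ x          # project the d-dimensional input to m dimensions
        return profile(y)    # evaluate the low-dimensional ridge profile
    return emulator

# Example: d = 10 inputs, but the function only varies along one direction.
d = 10
u = np.zeros((d, 1)); u[0, 0] = 0.6; u[3, 0] = 0.8   # unit-norm column
f_hat = make_ridge_emulator(u, lambda y: np.sin(y[0]))

x = np.ones(d)
# u^T x = 0.6 + 0.8 = 1.4, so f_hat(x) = sin(1.4)
print(np.isclose(f_hat(x), np.sin(1.4)))  # True
```

Once the profile is fitted, every downstream evaluation costs a single $m$-dimensional function call, regardless of $d$.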
Numerous methods for ridge approximation have been proposed in the literature. Central to this paper are strategies based on analysis of the average outer product of the gradient with itself, which we will call the gradient covariance matrix of $f$. Assuming that $f$ is Lipschitz continuous with bounded first derivatives, this matrix is defined as
\[ \mathbf{C} = \int_{\mathcal{X}} \nabla f(\mathbf{x}) \, \nabla f(\mathbf{x})^{\top} \rho(\mathbf{x}) \, d\mathbf{x}, \tag{3} \]
where $\mathcal{X} \subseteq \mathbb{R}^{d}$ is the input domain, and $\rho$ is a strictly positive function that integrates to unity within $\mathcal{X}$. We assume that all entries of this matrix are finite. Interpreting $\rho$ as a probability density function (PDF) over the input domain, we can replace the integral by an expectation over $\mathbf{x}$, which we treat as a random variable.¹ Samarov [23] and Constantine et al. [4, 5] analyze the eigendecomposition of this matrix and show that, if the variation of the function outside the span of the eigenvectors corresponding to large eigenvalues is small, then a suitable choice for the ridge directions is the leading eigenvectors of the gradient covariance matrix. The subspace spanned by these leading eigenvectors is termed the active subspace of $f$ by Constantine et al. [5]. In a recent paper by Zahm and coauthors [33], an extension to vector-valued functions is proposed. In Section 2 we will explore the properties of their vector gradient covariance matrix.

¹ In the following, unless specified otherwise, all expectations are computed with respect to $\rho$.

In the absence of gradient information, other methods of ridge approximation have been proposed. Fornasier et al. [12] and Tyagi et al. [30] describe methods to recover low-dimensional ridge structures with finite differencing through compressed sensing and low-rank recovery, respectively. Hokanson et al. [18] describe an algorithm to form a polynomial ridge approximation from fewer data than required for a full response surface, by solving a nonlinear least squares problem with variable projection [15]. Constantine et al. [6] and Eftekhari et al. [11] describe methods to estimate the ridge directions based on finite differences. In addition, Glaws et al. [14] draw an analogy between ridge recovery and sufficient dimension reduction (SDR). In SDR, the goal is to find the minimal subspace, described by the column span of a matrix $\mathbf{W}$, with which we can establish conditional independence between a set of covariates $\mathbf{x}$ and the response $y$ in a regression setting. That is, we seek a subspace described by $\mathbf{W}$ such that $y$ is independent of $\mathbf{x}$ given $\mathbf{W}^{\top}\mathbf{x}$. This central subspace is intimately linked to the ridge directions [14, thm. 2], which implies that regression-based methods from the study of SDR can be applied to ridge approximation. Examples of such methods include both inverse regression [21, 8] and forward regression² [32].

² Given a set of predictor/response pairs $(\mathbf{x}, y)$, forward regression aims to estimate statistics of the distribution of the response given the covariates ($y \mid \mathbf{x}$), while inverse regression aims to characterize $\mathbf{x} \mid y$.
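The gradient-based recipe—a Monte Carlo estimate of the gradient covariance matrix followed by an eigendecomposition—can be sketched as follows, assuming gradients are available; all names and the test function are illustrative:

```python
import numpy as np

def active_subspace(grad_f, sample_x, n_samples, m):
    """Estimate the m leading eigenvectors of C = E[grad f grad f^T]
    by Monte Carlo over the input density (a sketch of the approach
    of Constantine et al.; the names here are ours)."""
    d = sample_x().shape[0]
    C = np.zeros((d, d))
    for _ in range(n_samples):
        g = grad_f(sample_x())
        C += np.outer(g, g) / n_samples   # running Monte Carlo average
    evals, evecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    return evals[::-1], evecs[:, ::-1][:, :m]  # leading eigenpairs

# Example: f(x) = sin(u^T x), whose gradient is always parallel to u.
rng = np.random.default_rng(0)
d = 5
u = np.ones(d) / np.sqrt(d)
grad = lambda x: np.cos(u @ x) * u
evals, W = active_subspace(grad, lambda: rng.uniform(-1, 1, d), 200, 1)
print(abs(u @ W[:, 0]) > 0.999)  # recovered direction aligns with u -> True
```

Because the sampled gradient covariance here is exactly rank one, the leading eigenvector recovers the ridge direction up to sign.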
Given the ridge directions $\mathbf{U}$, the ridge profile that minimizes the mean squared approximation error over the input domain can be shown analytically to be the conditional average [5, 33]
\[ g(\mathbf{y}) = \mathbb{E}\left[ f(\mathbf{x}) \,\middle|\, \mathbf{U}^{\top}\mathbf{x} = \mathbf{y} \right]. \tag{4} \]
Several approaches for approximating this average have been proposed, including the use of Gaussian processes [27] and multivariate orthogonal polynomials [13].
In this paper, we build upon these ideas and introduce an embedded ridge approximation to connect the ridge approximation of desired qois with ridge approximations of their constituent scalar fields. Loosely stated, an embedded ridge approximation is a vector-valued function where each component is a ridge approximation—computed for scalar fields at select nodes within the computational domain. We show that it is possible to reduce the number of model evaluations required for computing the dimension-reducing subspace of a qoi by exploiting these embedded ridge approximations.
The rest of the paper is structured as follows: in Section 2, we describe the concept of an embedded ridge function, and provide an algorithm supported by theoretical analysis that leverages this embedded structure to find the dimension-reducing subspace of a scalar qoi based on an underlying spatial field. In Section 3, we leverage the similarity between neighboring output components to motivate a method for efficiently storing the array of ridge directions associated with each output component, and provide algorithms for this process. In Section 4, we use analytical and numerical examples to illustrate the algorithms proposed in this paper.
Notation. We denote the approximation of a quantity with a hat. For instance, a finite-sample estimate of a matrix $\mathbf{C}$ is denoted $\hat{\mathbf{C}}$, and a response surface for $f$ fitted with finitely many samples is denoted $\hat{f}$. A general perturbation of a quantity is denoted with a tilde. For instance, a perturbation of $\mathbf{U}$ is denoted $\widetilde{\mathbf{U}}$.
2 Embedded ridge approximation
Consider the scalar field $f(s, \mathbf{x})$, where $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^{d}$ parameterizes our model of interest and $s \in \mathcal{S}$ is a scalar that denotes the spatial variation of $f$. We place the following assumptions on this field:

- the input space $\mathcal{X}$ is endowed with a probability density $\rho$;
- the field $f$ is square integrable with respect to the probability density $\rho$ and Lipschitz continuous, with bounded and square-integrable first partial derivatives with respect to $\mathbf{x}$, for all $s$.
An example of $f$ is the pressure within a computational domain—characterized by geometry parameters or boundary conditions $\mathbf{x}$ and a probe location $s$. Our goal is to study dimension-reducing subspaces induced by the function
\[ F(\mathbf{x}) = \int_{\mathcal{S}} \omega(s) \, f(s, \mathbf{x}) \, ds, \tag{5} \]
where $\omega(s)$ is a weight function. In other words, $F(\mathbf{x})$ represents a weighted average of the scalar field $f(s, \mathbf{x})$, and is the relevant qoi. If $f$ is the pressure distribution, then one can think of $F$ as the lift coefficient or drag coefficient, for instance. Assuming that the partial derivatives of the integrand are bounded independent of $s$ and $\mathbf{x}$, we can write
\[ \nabla_{\mathbf{x}} F(\mathbf{x}) = \int_{\mathcal{S}} \omega(s) \, \nabla_{\mathbf{x}} f(s, \mathbf{x}) \, ds. \tag{6} \]
Now, let us assume that $f(s, \mathbf{x})$ can be approximated by a ridge function of the form
\[ f(s, \mathbf{x}) \approx g_s(\mathbf{U}(s)^{\top} \mathbf{x}), \tag{7} \]
where $\mathbf{U}(s) \in \mathbb{R}^{d \times m(s)}$, and $m(s)$ identifies the dimension of the subspace spanned by the columns of $\mathbf{U}(s)$. Note that $g_s$, $\mathbf{U}(s)$ and $m(s)$ will, in general, depend on $s$. The gradient of $f$ is then given by
\[ \nabla_{\mathbf{x}} f(s, \mathbf{x}) \approx \mathbf{U}(s) \, \nabla g_s(\mathbf{U}(s)^{\top} \mathbf{x}), \tag{8} \]
noting that $\nabla g_s$ denotes the gradient of $g_s$ with respect to its $m(s)$-dimensional argument. Thus,
\[ \nabla_{\mathbf{x}} F(\mathbf{x}) \approx \int_{\mathcal{S}} \omega(s) \, \mathbf{U}(s) \, \nabla g_s(\mathbf{U}(s)^{\top} \mathbf{x}) \, ds. \tag{9} \]
In practice, we can approximate this gradient with an $N$-point quadrature rule over $\mathcal{S}$, with quadrature points $\{s_j\}_{j=1}^{N}$ and quadrature weights $\{\omega_j\}_{j=1}^{N}$, from which we arrive at the approximation
\[ \nabla_{\mathbf{x}} F(\mathbf{x}) \approx \sum_{j=1}^{N} \omega_j \, \mathbf{U}_j \, \nabla g_j(\mathbf{U}_j^{\top} \mathbf{x}), \tag{10} \]
where we have expressed $\mathbf{U}(s_j)$ as $\mathbf{U}_j$ and $g_{s_j}$ as $g_j$ for notational convenience. Then, from (3) we can approximate the scalar gradient covariance matrix of $F$ as
\[ \mathbf{C}_F \approx \mathbb{E}\left[ \left( \sum_{j=1}^{N} \omega_j \mathbf{U}_j \nabla g_j \right) \left( \sum_{k=1}^{N} \omega_k \mathbf{U}_k \nabla g_k \right)^{\!\top} \right] = \sum_{j=1}^{N} \sum_{k=1}^{N} \omega_j \omega_k \, \mathbf{U}_j \, \mathbb{E}\left[ \nabla g_j \nabla g_k^{\top} \right] \mathbf{U}_k^{\top}, \tag{11} \]
where we write $\nabla g_j \equiv \nabla g_j(\mathbf{U}_j^{\top} \mathbf{x})$ for brevity.
The fact that (11) is expressed in terms of the ridge parameters $\mathbf{U}_j$ and $g_j$ is noteworthy. Given $\mathbf{C}_F$, we can find the dimension-reducing subspace of $F$ simply via an eigendecomposition, and this equation informs us that it is possible to shift the computational burden from evaluating the gradients of $F$ to the estimation of the ridge parameters $\mathbf{U}_j$ and $g_j$. In the absence of automatic differentiation or adjoint solvers, the former may require finite differences or the use of a surrogate model, whose cost of formulation suffers from the curse of dimensionality. On the other hand, at a fixed location $s_j$ the dependence of $f$ on $\mathbf{x}$ is likely to be highly anisotropic, depending mainly on the parameters that impact nodes adjacent to $s_j$. This implies that the number of ridge directions at each location is likely to be small, and $g_j$ will be a low-dimensional function. For many methods of ridge approximation, this means that the amount of simulation data required for this step is reduced for a given approximation accuracy. This brings us to the central idea of this paper: instead of directly calculating the dimension-reducing subspace of $F$, we leverage the decomposition of $F$ into its composite scalar field, whose dimension-reducing subspaces are likely to be inexpensive to compute. Then, from these component subspaces, we assemble the scalar gradient covariance matrix of $F$.
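The assembly step can be sketched as follows, assuming the per-node ridge directions, profile gradients, and quadrature weights are already in hand (all names and the toy one-dimensional profiles below are illustrative):

```python
import numpy as np

def assemble_gradient_covariance(Us, grad_gs, w, sample_x, n_samples):
    """Monte Carlo estimate of the qoi's gradient covariance, assembled
    from component ridge parameters as in the text:
    grad F(x) ~ sum_j w_j U_j grad g_j(U_j^T x).
    Us[j] is d x m_j; grad_gs[j] maps R^{m_j} -> R^{m_j}. Sketch only."""
    d = Us[0].shape[0]
    C = np.zeros((d, d))
    for _ in range(n_samples):
        x = sample_x()
        gF = sum(w[j] * Us[j] @ grad_gs[j](Us[j].T @ x) for j in range(len(Us)))
        C += np.outer(gF, gF) / n_samples
    return C

# Two nodes, each with a one-dimensional ridge profile g_j(t) = t^2 (gradient 2t).
rng = np.random.default_rng(1)
d = 4
U1 = np.eye(d)[:, [0]]; U2 = np.eye(d)[:, [1]]
grad_g = lambda t: 2.0 * t
C = assemble_gradient_covariance([U1, U2], [grad_g, grad_g],
                                 [0.5, 0.5], lambda: rng.uniform(-1, 1, d), 500)
print(C.shape)  # (4, 4)
```

Note that the only model data consumed here are the low-dimensional ridge parameters of each node; no gradients of the qoi itself are ever computed.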
2.1 Interpretation as vector-valued dimension reduction
We assume that there exists a set of quadrature points (domain nodes) at which to evaluate the scalar field, allowing us to formulate the ridge approximation (7) easily; we argue that reducing the dimensionality of the qoi is facilitated by the formulation (11). We can treat the scalar field evaluated at the $N$ prescribed positions as a vector-valued function $\mathbf{f}: \mathbb{R}^{d} \to \mathbb{R}^{N}$, whose components exhibit ridge approximations.
Definition 1 (Embedded ridge approximation)
Let $\mathbf{f}: \mathbb{R}^{d} \to \mathbb{R}^{N}$ be a vector-valued function with components $f_j$, where each $f_j: \mathbb{R}^{d} \to \mathbb{R}$. Such a function is called an embedded ridge approximation if it satisfies
\[ \mathbf{f}(\mathbf{x}) = \left[ f_1(\mathbf{x}), \; f_2(\mathbf{x}), \; \ldots, \; f_N(\mathbf{x}) \right]^{\top}, \tag{12} \]
where the $j$th element of the vector, $f_j(\mathbf{x})$, is well-approximated by a ridge function of the form $g_j(\mathbf{U}_j^{\top} \mathbf{x})$, for $j = 1, \ldots, N$. Here all subspace matrices $\mathbf{U}_j$ have the same number of rows $d$, but may have different numbers of columns $m_j$. It is assumed that the input variables are independent with respect to the input PDF $\rho$.
There are parallels between an embedded ridge approximation and vector-valued dimension reduction. In [33], the authors introduce a vector gradient covariance matrix, analogous to the scalar form given in (3).
Definition 2 (Vector gradient covariance matrix)
We define the vector gradient covariance matrix as
\[ \mathbf{H} = \int_{\mathcal{X}} \mathbf{J}(\mathbf{x})^{\top} \mathbf{R} \, \mathbf{J}(\mathbf{x}) \, \rho(\mathbf{x}) \, d\mathbf{x}, \tag{13} \]
where $\mathbf{R} \in \mathbb{R}^{N \times N}$ is a symmetric positive semi-definite matrix of weights, and
\[ \mathbf{J}(\mathbf{x}) = \left[ \nabla f_1(\mathbf{x}), \; \nabla f_2(\mathbf{x}), \; \ldots, \; \nabla f_N(\mathbf{x}) \right]^{\top} \tag{14} \]
where $\mathbf{J}(\mathbf{x}) \in \mathbb{R}^{N \times d}$ is the Jacobian matrix.
Observe that for an embedded ridge function, by setting
\[ \mathbf{R} = \boldsymbol{\omega} \boldsymbol{\omega}^{\top}, \qquad \boldsymbol{\omega} = \left[ \omega_1, \; \ldots, \; \omega_N \right]^{\top}, \tag{15} \]
we can recover the final expression of (11). Thus, the embedded ridge approximation can be constructed by calculating the vector gradient covariance matrix of the discretized weighted scalar field. Note that, although this vector gradient covariance matrix coincides with the formulation in [33], our focus is on a weighted average of the underlying field rather than the vector-valued function itself.
2.2 Algorithm for ridge computation
Equipped with a scalar field, we can design a procedure for the embedded ridge approximation of a qoi. Here our ridge function has the form
\[ F(\mathbf{x}) \approx h(\mathbf{W}^{\top} \mathbf{x}). \tag{16} \]
In Algorithm 1, we identify $N$ sets of ridge directions—one for every component of the vector $\mathbf{f}$. We then fit a ridge profile for each component to obtain a ridge approximation, which we use for computing the scalar-valued covariance matrix for $F$.
So, given a limited number of samples of the scalar field $f$, how accurate is this algorithm? In what follows, we study the error in the gradient covariance matrix via the Monte Carlo estimate of (11). We establish a bound on the expected norm difference with the matrix Bernstein inequality [29]. We model the accuracy with a quantity dependent on the number of samples.⁴ For ease of exposition, we also assume that the component ridge dimension is constant in space, so $m_j = m$ for all $j$.

⁴ A better measure of this error can be defined using the more general subspace distance (28), which is basis-agnostic. However, it can be shown, in a manner similar to Lemma 1, that a basis can be found such that a small subspace distance is equivalent to the bound presented in this section.
Theorem 1
Assume that for all ,
(21) 
for all , and
Then,
where .
Proof
See Appendix A.
There are several important remarks to make regarding (21). It should be clear that the number of samples required scales as a function of $m$ instead of $d$. In Section 4 we provide numerical studies that illustrate this. In addition, we see that the expected norm error varies linearly with a factor that depends on the quadrature weights. It is possible to derive a bound on the subspace error of $\hat{\mathbf{W}}$ in Algorithm 1 as a corollary to Theorem 1. This involves steps very similar to Lemma 3.9 and Corollary 3.10 in [4], which are based on Corollary 8.1.11 in [16].
3 Efficient storage of embedded ridge approximations
Scalar field quantities are propagated through a PDE domain node by node. It is therefore very likely that neighboring nodes—depending on the overall resolution of the mesh—will have similar, if not identical, values of these quantities. More specifically, one can think of these quantities as being strongly correlated with their neighbors. When approximating each component as a ridge function, this correlation can be interpreted as similarity in both the ridge directions $\mathbf{U}_j$ and the ridge profile $g_j$. In this section, we assume that the number of ridge directions for each component is constant and equal to $m$, for simplicity.
3.1 A perturbation bound on the mean squared error
Consider a ridge function $g(\mathbf{U}^{\top}\mathbf{x})$. Assuming that we keep the ridge profile $g$ constant and only perturb the ridge directions from $\mathbf{U}$ to $\widetilde{\mathbf{U}}$, we can write down the Taylor expansion of $g(\widetilde{\mathbf{U}}^{\top}\mathbf{x})$ about $\mathbf{U}^{\top}\mathbf{x}$:
\[ g(\widetilde{\mathbf{U}}^{\top} \mathbf{x}) = g(\mathbf{U}^{\top} \mathbf{x}) + \nabla g(\mathbf{U}^{\top} \mathbf{x})^{\top} (\widetilde{\mathbf{U}} - \mathbf{U})^{\top} \mathbf{x} + O\!\left( \left\| (\widetilde{\mathbf{U}} - \mathbf{U})^{\top} \mathbf{x} \right\|_{2}^{2} \right). \tag{22} \]
Then, we can write a first-order approximation to the MSE as:
\[ \mathbb{E}\left[ \left( g(\widetilde{\mathbf{U}}^{\top}\mathbf{x}) - g(\mathbf{U}^{\top}\mathbf{x}) \right)^{2} \right] \approx \mathbb{E}\left[ \left( \nabla g(\mathbf{U}^{\top}\mathbf{x})^{\top} (\widetilde{\mathbf{U}} - \mathbf{U})^{\top} \mathbf{x} \right)^{2} \right]. \tag{23} \]
Assuming that the squared norm of the gradient is bounded as $\left\| \nabla g \right\|_{2}^{2} \le L^{2}$ and that $\mathbb{E}[\mathbf{x}\mathbf{x}^{\top}] = \sigma^{2} \mathbf{I}$ (i.e. inputs are independent and identically distributed), then
\[ \mathbb{E}\left[ \left( \nabla g(\mathbf{U}^{\top}\mathbf{x})^{\top} (\widetilde{\mathbf{U}} - \mathbf{U})^{\top} \mathbf{x} \right)^{2} \right] \le L^{2} \, \mathbb{E}\left[ \left\| (\widetilde{\mathbf{U}} - \mathbf{U})^{\top} \mathbf{x} \right\|_{2}^{2} \right] = L^{2} \sigma^{2} \left\| \widetilde{\mathbf{U}} - \mathbf{U} \right\|_{F}^{2}, \tag{24} \]
where the inequality is due to the Cauchy–Schwarz inequality applied to the product $\nabla g(\mathbf{U}^{\top}\mathbf{x})^{\top} (\widetilde{\mathbf{U}} - \mathbf{U})^{\top} \mathbf{x}$. Let $\mathbf{u}_i$ and $\widetilde{\mathbf{u}}_i$ be the $i$th columns of $\mathbf{U}$ and $\widetilde{\mathbf{U}}$ respectively. Then,
\[ \left\| \widetilde{\mathbf{U}} - \mathbf{U} \right\|_{F}^{2} = \sum_{i=1}^{m} \left\| \widetilde{\mathbf{u}}_i - \mathbf{u}_i \right\|_{2}^{2}. \tag{25} \]
However, we have $\left\| \mathbf{u}_i \right\|_{2} = \left\| \widetilde{\mathbf{u}}_i \right\|_{2} = 1$. So,
\[ \left\| \widetilde{\mathbf{u}}_i - \mathbf{u}_i \right\|_{2}^{2} = 2 - 2\, \mathbf{u}_i^{\top} \widetilde{\mathbf{u}}_i, \tag{26} \]
where we have used the fact that the columns are unit-norm. So, substituting (25) and (26) into (24), we have:
\[ \mathbb{E}\left[ \left( g(\widetilde{\mathbf{U}}^{\top}\mathbf{x}) - g(\mathbf{U}^{\top}\mathbf{x}) \right)^{2} \right] \lesssim 2\, L^{2} \sigma^{2} \sum_{i=1}^{m} \left( 1 - \mathbf{u}_i^{\top} \widetilde{\mathbf{u}}_i \right). \tag{27} \]
We are not concerned with the matrix $\mathbf{U}$ and its columns by themselves, but with the subspace spanned by the columns. The notion of similarity is expressed as a small subspace distance between the sets of ridge directions for neighboring output components. The subspace distance between two vector spaces $\mathcal{U}$ and $\mathcal{V}$ is defined by
\[ d(\mathcal{U}, \mathcal{V}) = \left\| \mathbf{U}\mathbf{U}^{\top} - \mathbf{V}\mathbf{V}^{\top} \right\|_{2}, \tag{28} \]
where $\mathbf{U}$ (resp. $\mathbf{V}$) is a matrix whose orthonormal columns span $\mathcal{U}$ (resp. $\mathcal{V}$). We may also write $d(\mathbf{U}, \mathbf{V})$ to denote the above. However, this is actually equivalent to the formulation in (27), a result we prove in Appendix B. In terms of the subspace distance, we can state the following.
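A small sketch of this projector-based distance, assuming orthonormal bases are supplied (the function name is ours):

```python
import numpy as np

def subspace_distance(U, V):
    """Distance between the column spans of U and V (orthonormal columns):
    the 2-norm of the difference of the orthogonal projectors."""
    PU = U @ U.T
    PV = V @ V.T
    return np.linalg.norm(PU - PV, 2)

# Identical spans give distance 0, regardless of the basis chosen.
U = np.eye(4)[:, :2]
Q = np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])
print(np.isclose(subspace_distance(U, U @ Q), 0.0))  # True: rotated basis, same span

# Orthogonal spans give distance 1.
V = np.eye(4)[:, 2:]
print(np.isclose(subspace_distance(U, V), 1.0))  # True
```

The first check illustrates why the distance is basis-agnostic: right-multiplying by an orthogonal matrix leaves the projector unchanged.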
Theorem 2
Let $\mathcal{U} = \operatorname{ran}(\mathbf{U})$, and let $\widetilde{\mathcal{U}}$ be a perturbation of $\mathcal{U}$. Then, if $d(\mathcal{U}, \widetilde{\mathcal{U}}) \le \epsilon$, we can pick a basis $\widetilde{\mathbf{U}}$ of $\widetilde{\mathcal{U}}$ such that
\[ \mathbb{E}\left[ \left( g(\widetilde{\mathbf{U}}^{\top}\mathbf{x}) - g(\mathbf{U}^{\top}\mathbf{x}) \right)^{2} \right] \lesssim 2\, m\, L^{2} \sigma^{2} \left( 1 - \sqrt{1 - \epsilon^{2}} \right). \tag{29} \]
Theorem 2 implies that the ridge approximation error is bounded by the error in the ridge directions to the first order—i.e. we can approximate one ridge function with another provided the ridge directions and profiles are similar.
3.2 Sparse ridge storage and recovery
Given a priori knowledge of the relationship between neighboring ridge directions, we can avoid storing the ridge directions for all output components. This is useful for recreating the PDE scalar field from the selected nodes. Our proposal is to retain only a subset of suitably subsampled output nodes (components of the vector $\mathbf{f}$) and recover the remaining nodes from these subsamples.
Our algorithm for compression is detailed in Algorithm 2, where each removed component is reconstructed from the average of two of its closest neighbors. Given an embedded ridge approximation and the number of components that need to be removed, we iterate through all the nodes and identify two neighbors for each node. These neighbors are identified based on the smallest subspace distance (see (28)) between successive nodes; see steps 6 and 7. In step 7, we require that the closest neighbor be closer to the removal candidate than the first neighbor is. If this condition is not enforced, the average of the neighbors is a poor approximation of the removed candidate. Following this, in step 10, we sort the candidates for removal by the sum of the distances from each candidate to its two neighbors. Removing a candidate with a smaller total distance is prioritized over removing one with a larger total distance. From step 11 onwards, we attempt to remove the candidates in the order determined in step 10.
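A condensed, hypothetical sketch of the greedy compression idea for one-dimensional ridges follows; the neighbor-ordering check of step 7 and other details of Algorithm 2 are simplified here:

```python
import numpy as np

def greedy_compress(dirs, n_remove):
    """Greedily select ridge directions to drop, pairing each removal
    candidate with its two adjacent neighbors along the node ordering.
    dirs: list of unit-norm d-vectors (m = 1). Returns (kept indices,
    {removed index: (left neighbor, right neighbor)}). Sketch only."""
    def dist(a, b):
        # subspace distance between one-dimensional spans: sin of the angle
        return np.sqrt(max(0.0, 1.0 - (a @ b) ** 2))

    n = len(dirs)
    # candidates are interior nodes; score = total distance to both neighbors
    scores = {j: dist(dirs[j - 1], dirs[j]) + dist(dirs[j], dirs[j + 1])
              for j in range(1, n - 1)}
    removed, neighbors = set(), {}
    for j in sorted(scores, key=scores.get):       # smallest total distance first
        if len(removed) == n_remove:
            break
        # never remove a node whose neighbors have already been removed
        if (j - 1) not in removed and (j + 1) not in removed:
            removed.add(j)
            neighbors[j] = (j - 1, j + 1)
    kept = [j for j in range(n) if j not in removed]
    return kept, neighbors

dirs = [np.array([np.cos(t), np.sin(t)]) for t in np.linspace(0, 0.5, 6)]
kept, nbrs = greedy_compress(dirs, 2)
print(len(kept))  # 4
```

The adjacency check mirrors the constraint that every removed node must retain two stored neighbors to average over at recovery time.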
The recovery algorithm, Algorithm 3, reconstructs the missing components based on the list of nearest neighbors (the output from Algorithm 2). We only consider the case where $m = 1$ here, permitting us to easily estimate the missing node's ridge subspace as a linear combination of those of the neighboring components.
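For the $m = 1$ case, the reconstruction can be sketched as a sign-aligned average of the two stored neighbors (a simplified stand-in for Algorithm 3):

```python
import numpy as np

def recover_direction(u_left, u_right):
    """Reconstruct a missing node's ridge direction (m = 1) as the
    normalized average of its two stored neighbors, after aligning
    signs (a direction and its negation span the same subspace)."""
    if u_left @ u_right < 0:
        u_right = -u_right          # align signs before averaging
    v = 0.5 * (u_left + u_right)
    return v / np.linalg.norm(v)

u0 = np.array([1.0, 0.0])
u2 = np.array([0.0, 1.0])
u1 = recover_direction(u0, u2)
print(np.allclose(u1, np.array([1.0, 1.0]) / np.sqrt(2)))  # True
```

The sign alignment matters because ridge directions are only defined up to sign; averaging without it can cancel the neighbors entirely.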
Both the compression and recovery algorithms presented in this section are greedy algorithms, and may therefore not result in the storage configuration that globally minimizes the distance between the missing components and their neighbors. However, determining the globally optimal solution requires a combinatorial search over every storage configuration, which is computationally prohibitive.
4 Numerical examples
In this section, we illustrate the embedded ridge approximation approach with an analytical example and a CFD example.
4.1 Analytical example
Consider the function
(30) 
where
defined over the domain where the inputs are independent and have uniform marginals. Note that is an exact ridge function with three ridge directions spanned by the columns of . We draw as random vectors with unit Euclidean norm and compare the recovered ridge directions from the drawn vectors using the subspace distance (see (28)). We use polynomial variable projection (VP) [18] to estimate the ridge directions for each component function and , and then calculate the first three leading eigenvectors of the vector gradient covariance matrix of , setting the weights as , to find an estimate of . This represents our embedded ridge approximation.
For direct ridge approximation, we use VP to estimate this three-dimensional subspace directly. We vary the number of observations used for each method and examine the subspace error between the recovered directions and the true directions. We note that the results are binary—we either get a small subspace error from successful recovery, or a large subspace error from a failure to recover. Thus, we plot the probability of successful recovery—where the subspace error is below 0.005—across 40 trials in Fig. 1. To achieve successful recovery of $\mathbf{W}$ with the embedded ridge approximation, we need to successfully recover the ridge directions of each individual component function. Interestingly, despite the need to successfully find three sets of ridge directions concurrently, the probability of recovery is still significantly higher for the embedded ridge approximation method.
4.2 Shape design of the NACA0012
We apply the embedded ridge function approximation algorithm (see Algorithm 1) to the shape design of the NACA0012 airfoil. We parameterize the shape deformation of the baseline NACA0012 profile using Hicks–Henne bump functions around the airfoil, and measure the variation in the surface pressure profile. We fix an entry Mach number of 0.3 (subsonic) and an angle of attack of 1.25°, with freestream temperature and pressure at 273.15 K and 101325 Pa, respectively. We solve for the pressure profile using the compressible Euler flow solver in the open-source CFD suite [10]. The coefficients of lift and drag⁷ are known to be linear functions of the pressure around the airfoil [1, Ch. 1], given by
\[ C_l = \frac{1}{\tfrac{1}{2}\rho_\infty V_\infty^{2}} \oint -p(s, \mathbf{x}) \, \left( \hat{\mathbf{n}} \cdot \hat{\mathbf{j}} \right) ds \approx \frac{1}{\tfrac{1}{2}\rho_\infty V_\infty^{2}} \sum_{i=1}^{N} -p_i(\mathbf{x}) \left( \hat{\mathbf{n}}_i \cdot \hat{\mathbf{j}} \right) \Delta s_i, \]
\[ C_d = \frac{1}{\tfrac{1}{2}\rho_\infty V_\infty^{2}} \oint -p(s, \mathbf{x}) \, \left( \hat{\mathbf{n}} \cdot \hat{\mathbf{i}} \right) ds \approx \frac{1}{\tfrac{1}{2}\rho_\infty V_\infty^{2}} \sum_{i=1}^{N} -p_i(\mathbf{x}) \left( \hat{\mathbf{n}}_i \cdot \hat{\mathbf{i}} \right) \Delta s_i, \]

⁷ Ignoring skin friction and assuming a unit reference area.
where the integrals are evaluated around the airfoil surface, spatially parameterized by $s$. The input variable $\mathbf{x}$ contains the Hicks–Henne bump amplitudes; $\hat{\mathbf{n}}$ is the surface normal, $\hat{\mathbf{j}}$ the direction perpendicular to the flow, and $\hat{\mathbf{i}}$ the direction parallel to the flow (see Fig. 2). In the normalizing factors, $\rho_\infty$ is the freestream density, and $V_\infty$ is the freestream speed. We discretize this problem by considering $N$ measurements of pressure around the airfoil, resulting in the vector-valued function $\mathbf{f}(\mathbf{x})$ representing the surface pressure profile. Note that the approximation in the second line of both expressions comes not only from the discretization but also from the assumption that the surface normal $\hat{\mathbf{n}}$ is independent of $\mathbf{x}$. Under this approximation, the coefficients of lift and drag can then be expressed as linear functions of the components of $\mathbf{f}$.
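The discretized surface-pressure sum for the lift coefficient can be sketched as follows; the sign and normalization conventions below are our assumptions, not taken from the paper:

```python
import numpy as np

def lift_coefficient(p, normals, ds, rho_inf, V_inf, chord=1.0):
    """Discretized surface-pressure integral for a lift coefficient:
    C_l ~ -sum_i p_i (n_i . j) ds_i / (0.5 rho V^2 c). The sign and
    normalization conventions here are illustrative assumptions."""
    j_hat = np.array([0.0, 1.0])                   # direction perpendicular to the flow
    force_y = -np.sum(p * (normals @ j_hat) * ds)  # pressure acts against the outward normal
    return force_y / (0.5 * rho_inf * V_inf**2 * chord)

# Sanity check: uniform pressure on a closed surface produces zero net lift.
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # outward normals of a circle
ds = np.full_like(theta, 2 * np.pi / len(theta))
p = np.full_like(theta, 101325.0)
print(abs(lift_coefficient(p, normals, ds, 1.225, 100.0)) < 1e-6)  # True
```

Because the sum is linear in the pressures, the qoi inherits its gradient directly from the component ridge approximations, as exploited in the next subsection.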
As the flow is entirely subsonic and inviscid, we expect the bumps to have a strongly local influence. Hence, we expect the pressure profile to admit an embedded ridge approximation, which motivates the following approach to estimating $C_l$ and $C_d$:

1. Using a parsimonious computational strategy—i.e. one that requires fewer observations than fitting a global surrogate—estimate the leading ridge direction $\mathbf{u}_j$ for each component $f_j$.
2. Fit a low-dimensional surrogate using this leading mode for each component. That is, we seek
\[ f_j(\mathbf{x}) \approx g_j(\mathbf{u}_j^{\top} \mathbf{x}) \tag{31} \]
for the $j$th component of $\mathbf{f}$. We use univariate orthogonal polynomials for the profiles $g_j$.
3. Compute the elements of the Jacobian via these ridge approximations:
\[ \frac{\partial f_j}{\partial x_i}(\mathbf{x}) \approx g_j'(\mathbf{u}_j^{\top} \mathbf{x}) \, (\mathbf{u}_j)_i, \tag{32} \]
where $(\mathbf{u}_j)_i$ is the $i$th element of $\mathbf{u}_j$.
4. Compute the gradient covariance matrix via the Monte Carlo estimate of (11), from which we can compute the dimension-reducing subspaces and form ridge approximations of the scalar qois (lift and drag).
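The Jacobian assembly in step 3 reduces to an outer-product structure per component; a sketch with hypothetical quadratic profiles:

```python
import numpy as np

def ridge_jacobian(U_cols, dprofiles, x):
    """Row j of the Jacobian of f, per (32): J[j, :] = g_j'(u_j^T x) * u_j^T,
    where U_cols[j] is the leading ridge direction of component j and
    dprofiles[j] is the derivative of its profile. Names are illustrative."""
    N, d = len(U_cols), U_cols[0].shape[0]
    J = np.zeros((N, d))
    for j in range(N):
        t = U_cols[j] @ x
        J[j, :] = dprofiles[j](t) * U_cols[j]
    return J

# Two components with quadratic profiles g(t) = t^2, so g'(t) = 2t.
u0 = np.array([1.0, 0.0, 0.0]); u1 = np.array([0.0, 1.0, 0.0])
J = ridge_jacobian([u0, u1], [lambda t: 2 * t] * 2, np.array([0.5, 2.0, 7.0]))
# component 0: g'(0.5) * u0 = [1, 0, 0]; component 1: g'(2.0) * u1 = [0, 4, 0]
print(np.allclose(J, [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0]]))  # True
```

Each row costs a single one-dimensional derivative evaluation, which is what makes the embedded approach cheap relative to differentiating a global surrogate.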
4.2.1 Results
To benchmark our results, we construct two global quadratic polynomial surrogates in $\mathbf{x}$ for $C_l$ and $C_d$ respectively. We use these surrogates to estimate their active subspaces, using the strategy detailed in [26], where gradients of the polynomials are used to assemble the scalar gradient covariance matrix. Using a total-order basis polynomial (see [2]), this requires approximating 1326 coefficients. We create a design of experiment with uniformly distributed Monte Carlo samples for $\mathbf{x}$, and use the input–output pairs—where the outputs are the scalar qois—to estimate these coefficients using least squares. Then we compute the eigendecomposition of the scalar gradient covariance matrices of $C_l$ and $C_d$, the results of which will be the reference values for subsequent comparison. The first eigenvalue for $C_l$ constitutes 99.9% of the trace, while the sum of the first two eigenvalues for $C_d$ covers 99.5%. Hence, we will focus on computing one leading mode of $C_l$ and two leading modes of $C_d$. The number of samples we use is a factor of about 100 larger than the number of ridge directions we calculate for both qois. Hence, we expect errors from the Monte Carlo sampling to be negligible [3]. This statement is confirmed with additional design of experiment points.
We will fit a one-dimensional ridge function for each component of the pressure profile and use a quadratic ridge profile for each component. To find the ridge directions, we fit a global linear model to each component and take the ridge direction as the normalized coefficients of the linear model [4, Algorithm 1.3]—see Appendix C for further details. This is because the majority of pressure components vary largely linearly with the bump amplitudes. However, this is not the case for the pressure components near the leading edge, where the variation is closer to quadratic, as illustrated in Fig. 2. We study two strategies for refining the estimation of these components: polynomial variable projection (VP) [18] and minimum average variance estimation (MAVE) [32]. For the optimization loop inside the VP algorithm (see [18, Algorithm 4.1]), the convergence criterion is met when the subspace distance between the ridge directions of the previous and current iterations falls below a set tolerance. Our implementation of this algorithm can be found in the Effective Quadratures open-source library [24] (https://www.effectivequadratures.org/). For MAVE, we adapt the R code from Hang and Xia [17]. A brief exposition of the MAVE method is provided in Appendix D. We also consider the case where we use a linear model for all components and do not refine the estimation of the components near the leading edge.
In what follows, we compare the embedded ridge approximation approach to the direct ridge approximation approach—calculating the scalar gradient covariance matrix of the qoi directly—with the three methods mentioned above, namely VP, MAVE⁸ and the linear model. We examine two different metrics of approximation accuracy:

⁸ For embedded ridge approximation, this means linear models refined with VP and MAVE near the leading edge, respectively.

In Fig. 5, we plot the MSE of the surrogate models fitted using the ridge directions resulting from embedded and direct ridge approximation. The MSE of approximating the qoi $F$ with $\hat{F}$ is defined as
\[ \mathrm{MSE} = \frac{1}{T \, \hat{\sigma}^{2}} \sum_{t=1}^{T} \left( F(\mathbf{x}_t) - \hat{F}(\mathbf{x}_t) \right)^{2}, \tag{33} \]
where $\{\mathbf{x}_t\}_{t=1}^{T}$ are verification samples drawn independently of the data used to train the response surfaces. The ridge approximation of $F$ evaluated at $\mathbf{x}_t$ is denoted $\hat{F}(\mathbf{x}_t)$, and $\hat{\sigma}^{2}$ is the sample variance of $F$ across all verification samples.
The results show that embedded ridge approximation reduces the subspace error and MSE compared to direct estimation when the number of samples is limited. The errors for embedded VP and MAVE approximately reach convergence within 300 observations for one mode, and 400 observations for two modes. Although the linear models (in both the direct and embedded cases) suffice for estimating $C_l$ (Fig. 3), for functions with stronger nonlinear dependencies such as $C_d$, the linear model has a larger error compared to VP and MAVE; see the left-hand plot in Fig. 4. We also note that the use of embedded ridge approximation permits us to extend the capability of linear models to estimate more than one mode in the scalar qois.
When determining the number of observations required, we may also be interested in quantifying the accuracy of the response surfaces fitted to each pressure component. We measure the response surface error via the average of a normalized MSE metric,
\[ \overline{\mathrm{MSE}} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{T \, \hat{\sigma}_j^{2}} \sum_{t=1}^{T} \left( f_j(\mathbf{x}_t) - \hat{f}_j(\mathbf{x}_t) \right)^{2}, \tag{34} \]
where the notation is similar to that in (33).
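Both metrics reduce to a variance-normalized mean squared error per output; a small sketch (the function name is ours):

```python
import numpy as np

def normalized_mse(f_true, f_pred):
    """Normalized MSE in the spirit of (33)/(34): the mean squared error
    divided by the sample variance of the verification outputs."""
    f_true, f_pred = np.asarray(f_true), np.asarray(f_pred)
    return np.mean((f_true - f_pred) ** 2) / np.var(f_true)

truth = np.array([1.0, 2.0, 3.0, 4.0])
print(np.isclose(normalized_mse(truth, truth), 0.0))            # perfect fit -> 0
print(np.isclose(normalized_mse(truth, np.full(4, 2.5)), 1.0))  # predicting the mean -> 1
```

The normalization makes errors comparable across components with very different pressure scales: a value of one corresponds to a surrogate no better than the constant mean.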
In Fig. 6 we plot the MSE for embedded ridge approximation using VP, MAVE and the linear model. The reference level refers to the global quadratic model fitted with 2000 observations on each component. These results verify that sufficient convergence is achieved with 400 samples if we use VP or MAVE. Moreover, the linear model retains a larger error than VP and MAVE at convergence, as it models the quadratic components poorly.
We plot the estimated leading-mode coefficients of $C_l$ and $C_d$ using embedded ridge approximation in Fig. 7 and Fig. 8. For the embedded VP and MAVE methods, we use 400 observations, based on the convergence in subspace error shown in Fig. 3 and Fig. 4. The main source of error comes from components near the leading edge, where, although a clear one-dimensional ridge structure exists, there is a moderate spread of points in the orthogonal directions (see Fig. 2).
4.3 Sparse storage of NACA0012 pressure profile
In this subsection, we demonstrate the application of the sparse storage and recovery algorithms from Section 3.2 to the estimation of the pressure profile around the NACA0012 airfoil. From the results of the previous subsection, we know that all pressure components can be well-approximated by one-dimensional ridge functions. Using the ridge approximations from the previous subsection, we examine the efficacy of our storage and recovery algorithms by selecting a subset of the components to store, and attempting to recover the missing components from the subsampled ones.
4.3.1 Results
We remove a range of numbers of components using Algorithm 2, up to the maximum number allowed—when all remaining components are neighbors of certain missing components. Then, we reconstruct the missing ridge directions using Algorithm 3. We compare the reconstructed ridge directions with the datum in terms of subspace angles. In addition, we examine the normalized MSE defined in (34).
In Fig. 9, we plot the subspace errors and the MSE, both averaged across all recovered components. Clearly, the linear interpolation method yields almost exact recovery for up to 98 removed components, which is about half of the total number of components.
5 Conclusions
In this paper we introduce the notion of constructing ridge approximations over localized scalar fields, aimed at improved simulation-centric dimension reduction. Our ideas are born from a simple observation: in many PDE-based models, the scalar field at a certain node is only weakly influenced by far-field perturbations. It is more likely to be governed by locally induced perturbations—caused by variations in local boundary conditions or geometry. By interpreting global scalar qois as integrals of these scalar field quantities, we hypothesize and demonstrate that constructing ridge approximations over individual scalar field nodes and then integrating them can reduce the number of computational evaluations required.
Acknowledgements
CYW acknowledges financial support from the Cambridge Trust, Jesus College, Cambridge, and the Alan Turing Institute. PS was funded through a RollsRoyce postdoctoral fellowship.
Appendix A Proof of Theorem 1
By the matrix Bernstein inequality, it can be shown that (21) implies that [29]
(35) 
We can then write
where
Note that
(36) 
because are identically distributed copies of and whose norms are upper bounded by . Using the triangle inequality and submultiplicativity of the norm, we get
(37) 
from which the theorem follows.
Appendix B Subspace distance and inner products between columns
In Section 3.1 we assert that a bound on the inner product between columns is equivalent to a bound on the subspace distance between the column spaces of two matrices. We provide a justification in this appendix.
Lemma 1
Let $\mathcal{S}$ be an $n$-dimensional subspace of $\mathbb{R}^{d}$, and $\tilde{\mathcal{S}}$ be an equidimensional perturbation of $\mathcal{S}$. Then, $\mathrm{dist}(\mathcal{S}, \tilde{\mathcal{S}})$ is bounded above if and only if there exists $\mathbf{U}$ with orthonormal columns spanning $\mathcal{S}$ such that $\mathbf{u}_i^T \tilde{\mathbf{u}}_i$ is bounded below for all $i$, where $\mathbf{u}_i$ is the $i$th column of $\mathbf{U}$ (and similarly for perturbed quantities).
Proof
($\Rightarrow$) Choose $\mathbf{U}$ and $\tilde{\mathbf{U}}$ to be sets of principal vectors of $\mathcal{S}$ and $\tilde{\mathcal{S}}$ respectively. Then, by construction, we have
(38) $\mathbf{u}_i^T \tilde{\mathbf{u}}_i = \cos\theta_i,$
where $\theta_i$ is the $i$th principal angle, with $\theta_1 \le \theta_2 \le \cdots \le \theta_n$. Thus, $\mathbf{u}_i^T \tilde{\mathbf{u}}_i$ is bounded below by $\cos\theta_n$, the cosine of the largest principal angle. The result follows from the fact that $\mathrm{dist}(\mathcal{S}, \tilde{\mathcal{S}}) = \sin\theta_n$ [16, Section 6.4.3].
($\Leftarrow$) Note that
(39) $\mathrm{dist}(\mathcal{S}, \tilde{\mathcal{S}}) = \left\| \mathbf{U}\mathbf{U}^T - \tilde{\mathbf{U}}\tilde{\mathbf{U}}^T \right\|_2$
for any $\mathbf{U}, \tilde{\mathbf{U}}$ whose columns define orthonormal bases for $\mathcal{S}$ and $\tilde{\mathcal{S}}$ respectively. That is, the distance does not depend on the basis chosen for the orthogonal projector. Then, we can write
(40) $\left\| \mathbf{U}\mathbf{U}^T - \tilde{\mathbf{U}}\tilde{\mathbf{U}}^T \right\|_2 \le \left\| \mathbf{U}(\mathbf{U} - \tilde{\mathbf{U}})^T \right\|_2 + \left\| (\mathbf{U} - \tilde{\mathbf{U}})\tilde{\mathbf{U}}^T \right\|_2 \le 2\left\| \mathbf{U} - \tilde{\mathbf{U}} \right\|_F,$
where $\left\| \mathbf{U} - \tilde{\mathbf{U}} \right\|_F^2 = \sum_{i=1}^{n} \left( 2 - 2\,\mathbf{u}_i^T \tilde{\mathbf{u}}_i \right)$. Hence, if $\mathbf{u}_i^T \tilde{\mathbf{u}}_i \ge 1 - \epsilon$ for all $i$, then
(41) $\mathrm{dist}(\mathcal{S}, \tilde{\mathcal{S}}) \le 2\sqrt{2 n \epsilon}.$
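The basis independence of the projector distance, and its equivalence with the largest principal angle, can be checked numerically. The following is an illustrative sketch using NumPy, not code from the paper; the subspace dimensions and perturbation size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 2
# Orthonormal basis U for a random n-dimensional subspace S of R^d,
# and a basis Ut for a slightly perturbed subspace S~.
U, _ = np.linalg.qr(rng.standard_normal((d, n)))
Ut, _ = np.linalg.qr(U + 1e-2 * rng.standard_normal((d, n)))

# Projector-based subspace distance: || U U^T - Ut Ut^T ||_2.
dist = np.linalg.norm(U @ U.T - Ut @ Ut.T, ord=2)
# Principal angles from the singular values of U^T Ut.
cosines = np.clip(np.linalg.svd(U.T @ Ut, compute_uv=False), -1.0, 1.0)
theta_max = np.arccos(cosines.min())   # largest principal angle
print(dist, np.sin(theta_max))         # the two quantities agree
```

The distance equals the sine of the largest principal angle [16, Section 6.4.3], so a small perturbation of the basis yields a proportionally small subspace distance.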
Appendix C Global linear models
The coefficients of a global linear model of a qoi give a rough estimate of the leading dimension-reducing subspace direction. If the qoi varies largely linearly over the input domain, and it is known that the dimension-reducing subspace is one-dimensional, this heuristic will find the required subspace at low cost. The method is described by Algorithm 1.3 in [4], which we reproduce below.

1. Draw $N$ input/output pairs $(\mathbf{x}_i, f(\mathbf{x}_i))$, where $N$ exceeds the input dimension by a certain oversampling factor.

2. Solve the following least squares problem
(42) $\min_{a_0, \mathbf{a}} \left\| \mathbf{A} \begin{bmatrix} a_0 \\ \mathbf{a} \end{bmatrix} - \mathbf{f} \right\|_2^2,$ where the $i$th row of $\mathbf{A}$ is $[1, \mathbf{x}_i^T]$ and $\mathbf{f} = [f(\mathbf{x}_1), \ldots, f(\mathbf{x}_N)]^T$.

3. Take the leading ridge direction to be $\mathbf{a} / \|\mathbf{a}\|_2$.
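As a concrete illustration, the steps above amount to a single call to a least squares solver. This is a minimal sketch, not the paper's implementation; `linear_ridge_direction` is an illustrative name.

```python
import numpy as np

def linear_ridge_direction(X, f):
    """Leading ridge direction from a global linear fit: solve the
    least squares problem [1, x_i^T] [a0; a] ~ f(x_i) and normalize
    the slope coefficients a."""
    N = X.shape[0]
    A = np.hstack([np.ones((N, 1)), X])   # i-th row is [1, x_i^T]
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    a = coef[1:]                          # drop the intercept a_0
    return a / np.linalg.norm(a)

# Example: f is an exact linear ridge along w, which the fit recovers.
rng = np.random.default_rng(1)
w = np.array([0.6, 0.8])                  # unit-norm ridge direction
X = rng.uniform(-1.0, 1.0, size=(50, 2))
f = 2.0 * (X @ w) + 0.5
u = linear_ridge_direction(X, f)
print(u)  # equals w up to sign
```

For a qoi that is only approximately linear, the recovered direction is correspondingly approximate, which is why the heuristic is best suited to one-dimensional, near-linear ridge structures.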
Appendix D MAVE
The Minimum Average Variance Estimation (MAVE) method for sufficient dimension reduction was proposed by Xia et al. [32]. Given input/output pairs $(\mathbf{x}_i, y_i)$, it aims to find the dimension-reducing subspace spanned by the columns of $\mathbf{U}$, which is the solution to the following optimization over matrices with orthonormal columns:
(43) $\min_{\mathbf{U}} \; \mathbb{E}\left[ \left( y - \mathbb{E}\left[ y \mid \mathbf{U}^T \mathbf{x} \right] \right)^2 \right].$
The first-order Taylor expansion about an input $\mathbf{x}_j$ of the expectation within the parentheses is
(44) $\mathbb{E}\left[ y \mid \mathbf{U}^T \mathbf{x} \right] \approx a_j + \mathbf{b}_j^T \mathbf{U}^T (\mathbf{x} - \mathbf{x}_j).$
Hence, the MAVE procedure minimizes the following approximation to (43) as an alternating weighted least squares problem:
(45) $\min_{\mathbf{U}, \{a_j, \mathbf{b}_j\}} \; \sum_{j=1}^{N} \sum_{i=1}^{N} w_{ij} \left( y_i - a_j - \mathbf{b}_j^T \mathbf{U}^T (\mathbf{x}_i - \mathbf{x}_j) \right)^2,$
where the weights $w_{ij}$ are determined through a normalized kernel function $K_h(\cdot)$ evaluated at $\mathbf{U}^T(\mathbf{x}_i - \mathbf{x}_j)$, namely
(46) $w_{ij} = \frac{K_h\!\left( \mathbf{U}^T(\mathbf{x}_i - \mathbf{x}_j) \right)}{\sum_{l=1}^{N} K_h\!\left( \mathbf{U}^T(\mathbf{x}_l - \mathbf{x}_j) \right)}.$
The procedure iterates between

1. fixing $\mathbf{U}$ and optimizing with respect to $\{a_j, \mathbf{b}_j\}$, and

2. fixing $\{a_j, \mathbf{b}_j\}$ and optimizing with respect to $\mathbf{U}$.

The minimizer for both steps can be expressed analytically, with details in, e.g., [20, Ch. 11].
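For the special case of a one-dimensional subspace with a Gaussian kernel, the alternating scheme can be sketched as follows. Both simplifications, along with the initialization, bandwidth, and iteration count, are illustrative choices, not those of the reference implementation [17]; `mave_1d` is a hypothetical name.

```python
import numpy as np

def mave_1d(X, y, n_iter=10, h=0.5):
    """Simplified MAVE for a one-dimensional subspace: alternate between
    fitting local linear models (a_j, b_j) with the direction u fixed,
    and refining u with the local models fixed."""
    N, p = X.shape
    # Initialize u with the slope coefficients of a global linear fit.
    coef, *_ = np.linalg.lstsq(np.hstack([np.ones((N, 1)), X]), y, rcond=None)
    u = coef[1:] / np.linalg.norm(coef[1:])
    for _ in range(n_iter):
        t = X @ u
        D = t[:, None] - t[None, :]          # D[i, j] = u^T (x_i - x_j)
        W = np.exp(-0.5 * (D / h) ** 2)      # Gaussian kernel, bandwidth h
        W /= W.sum(axis=0, keepdims=True)    # normalized weights w_ij
        # Step 1: u fixed, fit (a_j, b_j) by weighted least squares.
        a, b = np.empty(N), np.empty(N)
        for j in range(N):
            Aj = np.stack([np.ones(N), D[:, j]], axis=1)
            sw = np.sqrt(W[:, j])
            a[j], b[j] = np.linalg.lstsq(sw[:, None] * Aj, sw * y, rcond=None)[0]
        # Step 2: (a_j, b_j) fixed, solve for u by stacked weighted least squares.
        rows = np.vstack([np.sqrt(W[:, j])[:, None] * (b[j] * (X - X[j]))
                          for j in range(N)])
        rhs = np.concatenate([np.sqrt(W[:, j]) * (y - a[j]) for j in range(N)])
        u = np.linalg.lstsq(rows, rhs, rcond=None)[0]
        u /= np.linalg.norm(u)
    return u

# Example: a noiseless linear ridge, whose direction MAVE should recover.
rng = np.random.default_rng(2)
w = np.array([1.0, 2.0, 2.0]) / 3.0
X = rng.uniform(-1.0, 1.0, size=(60, 3))
u = mave_1d(X, X @ w, n_iter=5)
print(abs(u @ w))  # approaches 1 as u aligns with w
```

Each outer iteration solves two linear least squares problems, matching the two alternating steps listed above; the direction is renormalized after each refinement since (43) is posed over matrices with orthonormal columns.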
References
 [1] J. D. Anderson, Fundamentals of Aerodynamics, McGraw-Hill Education, New York, 5th ed., Feb. 2010.
 [2] G. Blatman and B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression, Journal of Computational Physics, 230 (2011), pp. 2345–2367, https://doi.org/10.1016/j.jcp.2010.12.021.
 [3] P. Constantine and D. Gleich, Computing active subspaces with Monte Carlo, arXiv:1408.0545 [math], (2014), http://arxiv.org/abs/1408.0545 (accessed 2018-11-01).
 [4] P. G. Constantine, Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies, Society for Industrial and Applied Mathematics, Mar. 2015, http://dl.acm.org/citation.cfm?id=2815531 (accessed 2018-11-19).
 [5] P. G. Constantine, E. Dow, and Q. Wang, Active Subspace Methods in Theory and Practice: Applications to Kriging Surfaces, SIAM Journal on Scientific Computing, 36 (2014), pp. A1500–A1524, https://doi.org/10.1137/130916138.
 [6] P. G. Constantine, A. Eftekhari, and M. B. Wakin, Computing active subspaces efficiently with gradient sketching, in 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec. 2015, pp. 353–356, https://doi.org/10.1109/CAMSAP.2015.7383809.
 [7] R. D. Cook, Regression Graphics: Ideas for Studying Regressions Through Graphics, Wiley-Interscience, New York, 1st ed., Sept. 1998.
 [8] R. D. Cook and S. Weisberg, Sliced Inverse Regression for Dimension Reduction: Comment, Journal of the American Statistical Association, 86 (1991), pp. 328–332, https://doi.org/10.2307/2290564.
 [9] J. D. Denton, Loss Mechanisms in Turbomachines, in ASME 1993 International Gas Turbine and Aeroengine Congress and Exposition, vol. 2, Cincinnati, OH, May 1993, ASME, p. V002T14A001, https://doi.org/10.1115/93-GT-435.
 [10] T. D. Economon, F. Palacios, S. R. Copeland, T. W. Lukaczyk, and J. J. Alonso, SU2: An Open-Source Suite for Multiphysics Simulation and Design, AIAA Journal, 54 (2016), pp. 828–846, https://doi.org/10.2514/1.J053813.
 [11] A. Eftekhari, M. B. Wakin, P. Li, and P. G. Constantine, Learning the Second-Moment Matrix of a Smooth Function From Random Point Samples, arXiv:1612.06339 [cs, math], (2016), http://arxiv.org/abs/1612.06339 (accessed 2019-01-09).
 [12] M. Fornasier, K. Schnass, and J. Vybiral, Learning Functions of Few Arbitrary Linear Parameters in High Dimensions, Foundations of Computational Mathematics, 12 (2012), pp. 229–262, https://doi.org/10.1007/s10208-012-9115-y.
 [13] A. Glaws and P. G. Constantine, A Lanczos-Stieltjes method for one-dimensional ridge function approximation and integration, arXiv:1808.02095 [math], (2018), http://arxiv.org/abs/1808.02095 (accessed 2018-12-19).
 [14] A. T. Glaws, P. G. Constantine, and R. D. Cook, Inverse regression for ridge recovery: A data-driven approach for parameter reduction in computer experiments, arXiv:1702.02227 [math], (2017), http://arxiv.org/abs/1702.02227 (accessed 2018-12-20).
 [15] G. Golub and V. Pereyra, Separable nonlinear least squares: the variable projection method and its applications, Inverse Problems, 19 (2003), pp. R1–R26, https://doi.org/10.1088/0266-5611/19/2/201.
 [16] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 4th ed., Apr. 2013.
 [17] W. Hang and Y. Xia, MAVE: Methods for Dimension Reduction, 2018, https://CRAN.R-project.org/package=MAVE.
 [18] J. M. Hokanson and P. G. Constantine, Data-driven polynomial ridge approximation using variable projection, arXiv:1702.05859 [math], (2017), http://arxiv.org/abs/1702.05859 (accessed 2018-10-31).
 [19] R. Lam, O. Zahm, Y. Marzouk, and K. Willcox, Multifidelity Dimension Reduction via Active Subspaces, arXiv:1809.05567 [math], (2018), http://arxiv.org/abs/1809.05567 (accessed 2019-01-24).
 [20] B. Li, Sufficient Dimension Reduction: Methods and Applications with R, Chapman and Hall/CRC, Boca Raton, FL, 1st ed., May 2018.
 [21] K.-C. Li, Sliced Inverse Regression for Dimension Reduction, Journal of the American Statistical Association, 86 (1991), pp. 316–327, https://doi.org/10.2307/2290563.
 [22] A. Pinkus, Ridge Functions, Cambridge University Press, Cambridge, 1st ed., Aug. 2015.
 [23] A. M. Samarov, Exploring Regression Structure Using Nonparametric Functional Estimation, Journal of the American Statistical Association, 88 (1993), pp. 836–847, https://doi.org/10.2307/2290772.
 [24] P. Seshadri and G. Parks, Effective-Quadratures (EQ): Polynomials for Computational Engineering Studies, The Journal of Open Source Software, 2 (2017), https://doi.org/10.21105/joss.00166.
 [25] P. Seshadri, G. T. Parks, and S. Shahpar, Leakage Uncertainties in Compressors: The Case of Rotor 37, Journal of Propulsion and Power, 31 (2015), pp. 456–466, https://doi.org/10.2514/1.B35039.
 [26] P. Seshadri, S. Shahpar, P. Constantine, G. Parks, and M. Adams, Turbomachinery active subspace performance maps, Journal of Turbomachinery, 140 (2018), p. 041003, https://doi.org/10.1115/1.4038839.
 [27] P. Seshadri, S. Yuchi, and G. T. Parks, Dimension Reduction via Gaussian Ridge Functions, arXiv:1802.00515 [math, stat], (2018), http://arxiv.org/abs/1802.00515 (accessed 2019-01-17).
 [28] C. T. Sun, Mechanics of Aircraft Structures, John Wiley & Sons, Hoboken, NJ, 2nd ed., June 2006.
 [29] J. A. Tropp, User-Friendly Tail Bounds for Sums of Random Matrices, Foundations of Computational Mathematics, 12 (2012), pp. 389–434, https://doi.org/10.1007/s10208-011-9099-z.
 [30] H. Tyagi and V. Cevher, Learning non-parametric basis independent models from point queries via low-rank methods, Applied and Computational Harmonic Analysis, 37 (2014), pp. 389–412, https://doi.org/10.1016/j.acha.2014.01.002.
 [31] J. A. S. Witteveen, A. Doostan, R. Pečnik, and G. Iaccarino, Uncertainty quantification of the transonic flow around the RAE 2822 airfoil, Stanford University Center for Turbulence Research, Annual Research Briefs, (2009), p. 12.
 [32] Y. Xia, H. Tong, W. K. Li, and L.-X. Zhu, An adaptive estimation of dimension reduction space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64 (2002), pp. 363–410, https://doi.org/10.1111/1467-9868.03411.
 [33] O. Zahm, P. Constantine, C. Prieur, and Y. Marzouk, Gradient-based dimension reduction of multivariate vector-valued functions, arXiv:1801.07922 [math], (2018), http://arxiv.org/abs/1801.07922 (accessed 2018-11-12).