# Embedded Ridge Approximations: Constructing Ridge Approximations Over Localized Scalar Fields For Improved Simulation-Centric Dimension Reduction

Many quantities of interest (qois) arising from differential-equation-centric models can be resolved into functions of scalar fields. Examples of such qois include the lift over an airfoil or the displacement of a loaded structure; examples of corresponding fields are the static pressure field in a computational fluid dynamics solution, and the strain field in a finite element elasticity analysis. These scalar fields are evaluated at each node within a discretized computational domain. In certain scenarios, the field at a certain node is only weakly influenced by far-field perturbations; it is likely to be strongly governed by local perturbations, which in turn can be caused by uncertainties in the geometry. One can interpret this as a strong anisotropy of the field with respect to uncertainties in prescribed inputs. We exploit this notion of localized scalar-field influence for approximating global qois, which often are integrals of certain field quantities. We formalize our ideas by assigning ridge approximations to the field at select nodes. This embedded ridge approximation has favorable theoretical properties for approximating a global qoi in terms of the reduced number of computational evaluations required. Parallels are drawn between our proposed approach, active subspaces and vector-valued dimension reduction. Additionally, we study the ridge directions of adjacent nodes and devise algorithms that can recover field quantities at selected nodes, when storing the ridge profiles at a subset of nodes---paving the way for novel reduced order modeling strategies. Our paper offers analytical and simulation-based examples that expose different facets of embedded ridge approximations.


## 1 Introduction

The governing physics in many engineering problems are described by a system of partial differential equations (PDEs). These equations can be solved by suitable discretization methods such as finite element and finite volume methods, where scalar fields—e.g. pressures, temperatures, strains—are computed at each node over the PDE domain. One can interpret these scalar fields as vector-valued functions, conditioned upon certain boundary conditions and geometry parameters. Here each value of the vector corresponds to the scalar field quantity at a specific node of the domain. Integrals of these scalar field variables typically constitute output qois in uncertainty quantification studies. For instance, when propagating uncertainties in the Mach number and angle of attack of flow over an airfoil, one is interested in quantifying the moments of the lift and drag coefficients [31]; both lift and drag coefficients are surface integrals of the static pressure field around the airfoil [1, Ch. 1]. In studying the impact of uncertainties in leakage flows in a compressor [25], one is interested in quantifying the moments of isentropic efficiency, which can be expressed as an integral of the pressure and temperature ratios [9, p. 7]. Qois that are integrals of such scalar fields are prevalent well beyond computational fluid dynamics (CFD). For instance, the displacement of a structure is the integral of the strain field [28, sec. 2.1], which allows us to analyze linear elastic displacement problems in areas including soil mechanics [2, sec. 6.4] and machine component design [19, sec. 5.2] using the strain field.

Uncertainty quantification studies typically require a design of experiment, where the governing PDE model is evaluated under different inputs. The number of times the model is evaluated depends on the dimensionality associated with the input space and the non-linearity of the scalar qois. In this paper, we explore a deviation from this paradigm. Rather than storing scalar qois for each model evaluation, we explore whether one can reduce the number of model evaluations if one stores select scalar fields. More specifically, we want to design emulators for the scalar fields themselves and then integrate the emulators to obtain the desired qois. To abate the widely known curse of dimensionality, we will draw ideas from ridge functions [22] for constructing these emulators.

So, why should this reduce the number of model evaluations? Consider the following examples. Over an airfoil in subsonic flow, the static pressure at a point near the leading edge is unlikely to be strongly affected by perturbations far downstream. In a similar vein, the local deflection of a structure is unlikely to be affected by small local changes in elastic properties far away from the point of measurement. As the number of effective degrees of freedom is low, one should be able to design algorithms to exploit the intrinsic sparsity of scalar field quantities—more so than their synthesized (integrated) scalar qoi counterparts. This trait manifests as an approximate containment of the variation of these scalar field quantities within a subspace of the input domain, motivating the idea of a ridge approximation of a scalar field quantity.

A function whose variation is entirely contained within a subspace described by ran(W), where the matrix W ∈ ℝ^{d×m} has orthogonal columns, is called a generalized ridge function [22]. That is, it can be expressed as

 f(x) = g(W^T x). (1)

In this paper, we will refer to these functions simply as ridge functions for brevity. Many physical qois are characterized by anisotropy in the input domain—i.e. they vary strongly only within a subspace—such as the output components described in the previous paragraph. These qois can be well-approximated by ridge functions, and the process of finding these ridge functions is known as ridge approximation. Namely, we find W with orthogonal columns and a profile g such that

 f(x) ≈ g(W^T x), (2)

where the approximation can be formulated by minimizing the integral of the mean squared error (MSE) over the input domain [5, sec. 3.1]. Given a ridge approximation, significant computational run-time savings can be achieved by working with g as an emulator for f if m ≪ d, where we have effectively reduced the dimension of our problem from d to m. We refer to the columns of W as the ridge directions, and g as the ridge profile. As W^T takes the d-dimensional input to an m-dimensional projection, we also call the column space of W the dimension-reducing subspace.
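The defining property of (1)—that the function is constant along directions orthogonal to ran(W)—can be checked numerically. The following toy example is an assumed illustration (the function f and the dimensions are hypothetical, not from the paper):

```python
import numpy as np

# A toy ridge function in d = 10 dimensions: f varies only along the
# two columns of W (an assumed example).
rng = np.random.default_rng(0)
d, m = 10, 2
W = np.linalg.qr(rng.standard_normal((d, m)))[0]  # orthogonal columns

def f(x):
    u = W.T @ x                 # m-dimensional projection W^T x
    return u[0] ** 2 + np.sin(u[1])

# Perturbing x within the orthogonal complement of ran(W) leaves f unchanged.
x = rng.standard_normal(d)
v = rng.standard_normal(d)
v -= W @ (W.T @ v)              # project v onto the complement of ran(W)
assert abs(f(x) - f(x + v)) < 1e-12
```

Working with g on the m-dimensional projection, rather than with f on all d coordinates, is the source of the run-time savings described above.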

Numerous methods for ridge approximation have been proposed in the literature. Central to this paper are strategies based on analysis of the average outer product of the gradient with itself, which we will call the gradient covariance matrix of f. Assuming that f is Lipschitz continuous with bounded first derivatives, this matrix is defined as

 C(f) = ∫_D ∇f(x) ∇f(x)^T ρ(x) dx = E[∇f(x) ∇f(x)^T], (3)

where D is the input domain, and ρ is a strictly positive function that integrates to unity within D. We assume that all entries in this matrix are finite. Interpreting ρ as a probability density function (PDF) over the input domain, we can replace the integral by the expectation over x, which we treat as a random variable.¹ Samarov [23] and Constantine et al. [4, 5] analyze the eigendecomposition of this matrix and show that, if the variation of the function outside the span of eigenvectors corresponding to large eigenvalues is small, then a suitable choice for the ridge directions are the leading eigenvectors of the gradient covariance matrix. The subspace spanned by these leading eigenvectors is termed the active subspace of f by Constantine et al. [5]. In a recent paper by Zahm and co-authors [33], an extension to vector-valued functions is proposed. In Section 2 we will explore the properties of their vector gradient covariance matrix.

¹ In the following, unless specified otherwise, all expectations are computed with respect to ρ.

In the absence of gradient information, other methods of ridge approximation have been proposed. Fornasier et al. [12] and Tyagi et al. [30] describe methods to recover low-dimensional ridge structures with finite differencing through compressed sensing and low-rank recovery, respectively. Hokanson et al. [18] describe an algorithm to form a polynomial ridge approximation from fewer data than required for a full response surface by solving a non-linear least squares problem with variable projection [15]. Constantine et al. [6] and Eftekhari et al. [11] describe methods to estimate the ridge directions based on finite differences. In addition, Glaws et al. [14] draw an analogy between ridge recovery and sufficient dimension reduction (SDR). In SDR, the goal is to find the minimal subspace, described by the column span of a matrix W, with which we can establish conditional independence between a set of covariates x and the response y in a regression setting. That is, we seek a subspace described by W such that y is conditionally independent of x given W^T x. This central subspace is intimately linked to the ridge directions [14, thm. 2], which implies that regression-based methods within the study of SDR can be applied to ridge approximation. Examples of such methods include both inverse regression [21, 8] and forward regression² [32].

² Given a set of predictor/response pairs (x, y), forward regression aims to estimate statistics of the distribution of the response given the covariates (y given x), while inverse regression aims to characterize x given y.

Given the ridge directions W, the ridge profile g that minimizes the mean squared approximation error over ρ can be shown analytically to be [5, 33]

 g(W^T x) = E[f(x) | W^T x]. (4)

Several approaches for approximating this average have been proposed, including the use of Gaussian processes [27] and multivariate orthogonal polynomials [13].
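A crude but instructive way to approximate the conditional average (4) is Monte Carlo binning on the projected coordinate: sample x, project onto u = w^T x, and average f within each bin of u. The sketch below assumes a near-ridge toy function with a single direction w (all details hypothetical); it shows the binned conditional average tracking the dominant profile u²:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
w = np.ones(d) / np.sqrt(d)

X = rng.uniform(-1, 1, size=(200000, d))
u = X @ w
# dominant ridge part u^2 plus a weak off-subspace term
fx = u ** 2 + 0.01 * X[:, 0] * X[:, 1]

# bin samples on u and average f within each bin: a crude estimate of
# g(u) = E[f(x) | w^T x = u]
bins = np.linspace(u.min(), u.max(), 30)
idx = np.digitize(u, bins)
centers, g_hat = [], []
for b in range(1, len(bins)):
    mask = idx == b
    if mask.sum() > 1000:           # keep only well-populated bins
        centers.append(u[mask].mean())
        g_hat.append(fx[mask].mean())

# the conditional average tracks u^2, the dominant ridge profile
err = max(abs(g - c ** 2) for g, c in zip(g_hat, centers))
assert err < 0.05
```

In practice the Gaussian-process and orthogonal-polynomial approaches cited above are far more sample-efficient; the binning here only illustrates what (4) computes.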

In this paper, we build upon these ideas and introduce an embedded ridge approximation to connect the ridge approximation of desired qois with ridge approximations of their constituent scalar fields. Loosely stated, an embedded ridge approximation is a vector-valued function, where each component is a ridge approximation—computed for scalar fields at select nodes within the computational domain. We show that it is possible to reduce the number of model evaluations for computing the dimension-reducing subspace of a qoi, by exploiting these embedded ridge approximations.

The rest of the paper is structured as follows: in Section 2, we describe the concept of an embedded ridge function, and provide an algorithm supported by theoretical analysis that leverages this embedded structure to find the dimension-reducing subspace of a scalar qoi based on an underlying spatial field. In Section 3, we leverage the similarity between neighboring output components to motivate a method of efficiently storing the array of ridge directions associated with each output component, and provide some algorithms for this process. In Section 4, we use analytical and numerical examples to illustrate the algorithms proposed in this paper.

Notation. We denote the approximation of a quantity with a hat. For instance, a finite-sample estimate of a matrix C is denoted Ĉ, and a response surface for f fitted with finitely many samples is denoted f̂. A general perturbation of a quantity is denoted with a tilde. For instance, a perturbation of W is denoted W̃.

## 2 Embedded ridge approximation

Consider the scalar field f(x, s), where x ∈ ℝ^d parameterizes our model of interest and s is a scalar that denotes the spatial variation of f. We place the following assumptions on this field:

1. the input space is endowed with a probability density ρ;

2. the field f is square integrable with respect to the probability density ρ and Lipschitz continuous with bounded and square integrable first partial derivatives with respect to x for all s.

An example of f is the pressure within a computational domain—characterized by geometry parameters or boundary conditions x and a probe location s. Our goal is to study dimension-reducing subspaces induced by the function

 h(x) = ∫_D ω(s) f(x, s) ds, (5)

where ω(s) is a weight function. In other words, h represents a weighted average of the scalar field f, and is the relevant qoi. If f is the pressure distribution, then one can think of h as the lift coefficient or drag coefficient, for instance. Assuming that the partial derivative of the integrand is bounded independent of x and s, we can write

 ∇_x h = ∫_D ω(s) ∇_x f(x, s) ds. (6)

Now, let us assume that f can be approximated by a ridge function of the form

 f(x, s) ≈ g_s(W_s^T x), (7)

where W_s ∈ ℝ^{d×m_s}, with m_s identifying the dimension of the subspace spanned by the columns of W_s. Note that W_s, g_s and m_s will, in general, depend on s. The gradient of f is then given by

 ∇_x f(x, s) ≈ ∂g_s(W_s^T x)/∂x = W_s ∇g_s(W_s^T x), (8)

noting that ∇g_s denotes the gradient of g_s with respect to its m_s-dimensional argument. Thus,

 ∇_x h ≈ ∫_D ω(s) W_s ∇g_s(W_s^T x) ds. (9)

In practice, we can approximate this gradient with an N-point quadrature rule over D, with quadrature points s_1, …, s_N and quadrature weights ω_1, …, ω_N, from which we arrive at the approximation

 ∇_x h ≈ ∑_{i=1}^N ω_i W_{s_i} ∇g_{s_i}(W_{s_i}^T x) = ∑_{i=1}^N ω_i W_i ∇g_i(W_i^T x), (10)

where we have expressed W_{s_i} as W_i and g_{s_i} as g_i for notational convenience. Then, from (3) we can approximate the scalar covariance matrix of h as

 C(h) := E[∇_x h ∇_x h^T] ≈ E[(∑_{i=1}^N ω_i W_i ∇g_i(W_i^T x)) (∑_{j=1}^N ω_j W_j ∇g_j(W_j^T x))^T] = ∑_{i=1}^N ∑_{j=1}^N ω_i ω_j W_i E[∇g_i ∇g_j^T] W_j^T. (11)

The fact that (11) is expressed in terms of the ridge parameters W_i and g_i is noteworthy. Given C(h), we can find the dimension-reducing subspace of h simply via an eigendecomposition, and this equation informs us that it is possible to shift the computational burden from evaluating the gradients of h to the estimation of the ridge parameters W_i and g_i. In the absence of automatic differentiation or adjoint solvers, the former may require finite differences or the use of a surrogate model, whose cost of formulation suffers from the curse of dimensionality. On the other hand, at a fixed location s the dependence of f on x is likely to be highly anisotropic, depending mainly on the parameters that impact nodes adjacent to s. This implies that the number of ridge directions at each location is likely to be small, and g_s will be a low-dimensional function. For many methods of ridge approximation, this means that the amount of simulation data required for this step is reduced for a given approximation accuracy. This brings us to the central idea of this paper: instead of directly calculating the dimension-reducing subspace of h, we leverage the decomposition of h into its composite scalar field, whose dimension-reducing subspaces are likely to be inexpensive to compute. Then, from these component subspaces, we assemble the scalar gradient covariance matrix of h.
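The assembly in (11) can be verified numerically. The sketch below uses a toy setup (one-dimensional ridges with profiles g_i(u) = u², random directions and weights—all assumed, not from the paper) and checks that the double sum over component ridge parameters reproduces the directly sampled gradient covariance of h:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 6, 4
w_dirs = [v / np.linalg.norm(v) for v in rng.standard_normal((N, d))]
omega = rng.uniform(0.5, 1.5, N)        # quadrature weights (assumed)

# g_i(u) = u^2, so grad g_i(u) = 2u. Monte Carlo over x ~ U[-1, 1]^d.
X = rng.uniform(-1, 1, size=(50000, d))

# Direct estimate: grad_x h = sum_i omega_i * w_i * 2 (w_i^T x).
G = np.zeros((X.shape[0], d))
for wi, oi in zip(w_dirs, omega):
    G += oi * 2.0 * (X @ wi)[:, None] * wi[None, :]
C_direct = G.T @ G / X.shape[0]

# Assembled estimate via the double sum in (11):
# C(h) ~ sum_ij omega_i omega_j w_i E[grad g_i grad g_j^T] w_j^T.
C_embed = np.zeros((d, d))
for wi, oi in zip(w_dirs, omega):
    for wj, oj in zip(w_dirs, omega):
        Egg = np.mean(4.0 * (X @ wi) * (X @ wj))  # E[grad g_i grad g_j]
        C_embed += oi * oj * Egg * np.outer(wi, wj)

assert np.allclose(C_direct, C_embed, atol=1e-10)
```

With the same Monte Carlo samples on both sides, the two estimates agree to floating-point precision, since (11) is an algebraic rearrangement rather than an additional approximation.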

### 2.1 Interpretation as vector-valued dimension reduction

We assume that there exists a set of quadrature points (domain nodes) s_1, …, s_N at which to evaluate the scalar field, chosen so that the ridge approximation (7) can be formulated easily; we argue that reducing the dimensionality of the qoi is facilitated by the formulation (11). We can treat the scalar field evaluated at the N prescribed positions as a vector-valued function f : ℝ^d → ℝ^N, whose components exhibit ridge approximations.

###### Definition 1 (Embedded ridge approximation)

Let f : ℝ^d → ℝ^N be a vector-valued function with components f_i, where each f_i : ℝ^d → ℝ. Such a function is called an embedded ridge approximation if it satisfies

 f(x) = [f_1(x), …, f_N(x)]^T ≈ [g_1(W_1^T x), …, g_N(W_N^T x)]^T, (12)

where the i-th element of the vector, f_i(x), is well-approximated by a ridge function of the form g_i(W_i^T x), for i = 1, …, N. Here all subspace matrices W_i have the same number of rows d, but may have different numbers of columns m_i. It is assumed that the input variables x are independent with respect to the input PDF ρ.

There are parallels between an embedded ridge approximation and vector-valued dimension reduction. In [33], the authors introduce a vector gradient covariance matrix, analogous to the scalar form given in (3).

###### Definition 2 (Vector gradient covariance matrix)

We define the vector gradient covariance matrix as

 H(f) = E[J(x) R J(x)^T] = ∫_D J(x) R J(x)^T ρ(x) dx, (13)

where R ∈ ℝ^{N×N} is a symmetric positive semi-definite matrix of weights, and

 J(x) = [∂f_1/∂x, …, ∂f_N/∂x], (14)

where J(x) ∈ ℝ^{d×N} is the Jacobian matrix of f.

Observe that for an embedded ridge function, by setting

 R = ω ω^T, where ω = (ω_1, ω_2, …, ω_N)^T, (15)

we can recover the final line of (11). Thus, the embedded ridge approximation can be constructed by calculating the vector-valued covariance matrix of the discretized weighted scalar field. Note that, although this vector gradient covariance matrix coincides with the formulation in [33], our focus is on a weighted average of the underlying field instead of the vector-valued function in itself.

### 2.2 Algorithm for ridge computation

Equipped with a scalar field, we can design a procedure for the embedded ridge approximation of a qoi. Here our ridge function has the form

 h(x) = ∫_D ω(s) f(x, s) ds ≈ ω^T f(x). (16)

In Algorithm 1, we identify N sets of ridge directions—one for every component of the vector f. We then fit a ridge profile to each component to obtain a ridge approximation, which we use for computing the scalar-valued covariance matrix for h.
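Under assumed simplifications—one ridge direction per node, a quadratic profile, and a linear-model direction estimate standing in for variable projection—the workflow of Algorithm 1 can be sketched as follows (the toy field and all parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
d, N, M = 8, 5, 400
true_dirs = [v / np.linalg.norm(v) for v in rng.standard_normal((N, d))]
omega = np.full(N, 1.0 / N)          # quadrature weights (assumed uniform)

def field(x):
    # each node i is an exact 1-D ridge with a linear-plus-quadratic profile
    return np.array([(w @ x) + 0.5 * (w @ x) ** 2 for w in true_dirs])

X = rng.uniform(-1, 1, size=(M, d))
F = np.array([field(x) for x in X])  # M x N samples of the field

w_hats, coeffs = [], []
for i in range(N):
    # (i) ridge direction from a global linear fit (cf. [4, Algorithm 1.3])
    A = np.column_stack([np.ones(M), X])
    beta = np.linalg.lstsq(A, F[:, i], rcond=None)[0][1:]
    w_hats.append(beta / np.linalg.norm(beta))
    # (ii) quadratic ridge profile g_i(u) = c0 + c1 u + c2 u^2
    u = X @ w_hats[-1]
    P = np.column_stack([np.ones(M), u, u ** 2])
    coeffs.append(np.linalg.lstsq(P, F[:, i], rcond=None)[0])

# (iii) assemble C(h) via the double sum in (11), with g_i'(u) = c1 + 2 c2 u
C_h = np.zeros((d, d))
for i in range(N):
    for j in range(N):
        gi = coeffs[i][1] + 2 * coeffs[i][2] * (X @ w_hats[i])
        gj = coeffs[j][1] + 2 * coeffs[j][2] * (X @ w_hats[j])
        C_h += omega[i] * omega[j] * np.mean(gi * gj) * np.outer(w_hats[i], w_hats[j])

# leading eigenvectors approximate the dimension-reducing subspace of h
evecs = np.linalg.eigh(C_h)[1]
Wt = np.linalg.qr(np.array(true_dirs).T)[0]   # basis of the true combined span
We = evecs[:, -N:]
dist = np.linalg.norm(Wt @ Wt.T - We @ We.T, 2)
assert dist < 0.3
```

Only M = 400 field samples are needed here, because each one-dimensional component ridge is cheap to identify; the qoi subspace is then assembled algebraically rather than sampled directly.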

So, given a limited number of samples of the scalar field f, how accurate is this algorithm? In what follows, we study the error on the gradient covariance matrix via the Monte Carlo estimate Ĉ(h) of (11). We establish a bound on the expected norm difference with the matrix Bernstein inequality [29]. We model the accuracy of the estimated ridge directions with a quantity η_M dependent on the number of samples M.⁴ For ease in exposition, we also assume that the component ridge dimension is constant in space, so m_i = r for all i.

⁴ A better measure of this error can be defined using the more general subspace distance (28), which is basis-agnostic. However, it can be shown, in a similar manner to Lemma 1, that a basis can be found such that a small subspace distance is equivalent to the present bound in this section.

###### Theorem 1

Assume that, for all i,

 M ≥ 2L² log(2r) / (ϵ² ‖E[∇g_i ∇g_j^T]‖₂), (21)

for all j, and

 ‖Ŵ_i − W_i‖₂ ≤ η_M.

Then, the expected norm error E[‖Ĉ(h) − C(h)‖₂] admits a bound that scales linearly with ϵ and with η_M, where Ĉ(h) denotes the finite-sample estimate of C(h).

###### Proof

See Appendix A.

There are several important remarks to make regarding (21). It should be clear that the number of samples required scales as a function of the component ridge dimension r instead of the full input dimension d. In Section 4 we provide numerical studies that illustrate this. In addition, we see that the expected norm error varies linearly with η_M, with a constant that depends on the quadrature weights. It is possible to derive a bound related to the subspace error of Ŵ in Algorithm 1 as a corollary to Theorem 1. This involves steps very similar to Lemma 3.9 and Corollary 3.10 in [4], which are based on Corollary 8.1.11 in [16].

## 3 Efficient storage of embedded ridge approximations

Scalar field quantities are propagated through a PDE domain node-by-node. It is therefore very likely that neighboring nodes—depending on the overall resolution of the mesh—will have similar, if not identical, values of these quantities. More specifically, one can think of these quantities as being strongly correlated with their neighbors. When approximating each component f_i as a ridge function, this correlation can be interpreted as similarity in both the ridge directions W_i and the ridge profile g_i. In this section, we assume that the number of ridge directions for each f_i is constant and equal to r, for simplicity.

### 3.1 A perturbation bound on the mean squared error

Consider a ridge function g(W^T x). Assuming that we keep the ridge profile g constant and only perturb the ridge directions from W to W̃, we can write down the Taylor expansion of g(W̃^T x) about W^T x:

 g(W̃^T x) = g(W^T x) + ((W̃ − W)^T x)^T ∇_u g(u)|_{u = W^T x} + o(‖(W̃ − W)^T x‖). (22)

Then, we can write a first-order approximation to the MSE as:

 E[(g(W̃^T x) − g(W^T x))²] ≈ ϵ := E[(x^T (W̃ − W) ∇_u g(W^T x))²]. (23)

Assuming that the square of the gradient is bounded as ‖∇_u g‖² ≤ G², and that E[x x^T] = σ_x² I (i.e. the inputs are independent and identically distributed), then

 ϵ ≤ G² E[((W̃ − W)^T x)^T ((W̃ − W)^T x)] = G² η, (24)

where the first line is due to the Cauchy–Schwarz inequality applied to x^T (W̃ − W) ∇_u g(W^T x). Let y = (W̃ − W)^T x, and let a_i be the i-th column of W̃ − W. Then,

 η = E[y^T y] = ∑_{i=1}^r a_i^T E[x x^T] a_i = σ_x² ∑_{i=1}^r a_i^T a_i. (25)

However, we have a_i = w̃_i − w_i, where w̃_i and w_i denote the i-th columns of W̃ and W respectively. So,

 a_i^T a_i = (w̃_i − w_i)^T (w̃_i − w_i) = 2 − 2 w̃_i^T w_i, (26)

where we have used the fact that the columns of W and W̃ have unit norm. So, substituting (25) and (26) into (24), we have:

 ϵ ≤ G² σ_x² ∑_{i=1}^r (2 − 2 w̃_i^T w_i). (27)

We are not concerned with the matrix W and its columns by themselves, but with the subspace spanned by the columns. The notion of similarity is expressed as a small subspace distance between the sets of ridge directions for neighboring output components. The subspace distance between two vector spaces S_1 and S_2 is defined by

 dist(S_1, S_2) = ‖W_1 W_1^T − W_2 W_2^T‖₂, (28)

where W_k is a matrix whose orthonormal columns span S_k for k = 1, 2. We may also write dist(W_1, W_2) to denote the above. However, this is actually equivalent to the formulation in (27), a result we prove in Appendix B. In terms of the subspace distance, we can state the following.
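The distance (28) depends only on the spans, not on the particular bases. A minimal sketch, with hypothetical dimensions:

```python
import numpy as np

def subspace_distance(W1, W2):
    # distance (28), computed from orthonormal bases of the two spans
    Q1 = np.linalg.qr(W1)[0]
    Q2 = np.linalg.qr(W2)[0]
    return np.linalg.norm(Q1 @ Q1.T - Q2 @ Q2.T, 2)

rng = np.random.default_rng(4)
W = rng.standard_normal((6, 2))
# any basis of the same span gives distance zero
assert subspace_distance(W, W @ np.array([[2.0, 1.0], [0.0, 3.0]])) < 1e-12
# mutually orthogonal subspaces attain the maximum distance of 1
Qfull = np.linalg.qr(rng.standard_normal((6, 6)))[0]
assert abs(subspace_distance(Qfull[:, :2], Qfull[:, 2:4]) - 1.0) < 1e-12
```

The distance equals the sine of the largest principal angle between the subspaces, so it lies between 0 (identical spans) and 1 (orthogonal spans).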

###### Theorem 2

Let W have orthonormal columns, and let W̃ be a perturbation of W. Then, if dist(W, W̃) = sin(θ_r), we can pick a basis for the perturbed subspace such that

 ϵ ≤ G² σ_x² ∑_{i=1}^r (2 − 2 cos(θ_r)). (29)

###### Proof

This follows from Lemma 1 and (27).

Theorem 2 implies that, to first order, the ridge approximation error is bounded by the error in the ridge directions—i.e. we can approximate one ridge function with another provided the ridge directions and profiles are similar.
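The bound (27) can be checked numerically for a one-dimensional ridge. The setup below is an assumed toy case (g = sin, so |g′| ≤ G = 1, with iid uniform inputs for which σ_x² = 1/3):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 7
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
# small perturbation of the ridge direction, renormalized to unit length
wt = w + 1e-2 * rng.standard_normal(d)
wt /= np.linalg.norm(wt)

X = rng.uniform(-1, 1, size=(100000, d))   # iid inputs, E[x x^T] = (1/3) I
mse = np.mean((np.sin(X @ wt) - np.sin(X @ w)) ** 2)

G, sigma2 = 1.0, 1.0 / 3.0                 # |g'| <= G = 1 for g = sin
bound = G ** 2 * sigma2 * (2 - 2 * wt @ w) # right-hand side of (27), r = 1
assert mse <= bound
```

For this 1-Lipschitz profile the bound holds with room to spare; the sampled MSE sits well below the right-hand side of (27).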

### 3.2 Sparse ridge storage and recovery

Given a priori knowledge of the relationship between neighboring ridge directions, we can avoid storing the ridge directions for all output components. This is useful for re-creating the PDE scalar field from the selected nodes. Our proposal is that we retain only a subset of suitably subsampled output nodes (components of the vector f) and recover the remaining nodes from these subsamples.

Our algorithm for compression is detailed in Algorithm 2, where each removed component is reconstructed as the average of two of its closest neighbors. Given an embedded ridge approximation and the number of components that need to be removed, we iterate through all the nodes and identify two neighbors for each node. These neighbors are identified based on the smallest subspace distance (see (28)) between successive nodes; see steps 6 and 7. In step 7, we require that the second neighbor be closer to the removal candidate than it is to the first neighbor; if this condition is not enforced, the average of the two neighbors is a poor approximation of the removed candidate. Following this, in step 10, we sort the candidates for removal by the sum of the distances from each removal candidate to its two neighbors. Removing a candidate with a smaller total distance is prioritized over removing one with a larger total distance. From step 11 onwards, we attempt to remove the candidates in the order determined in step 10.

The recovery algorithm, Algorithm 3, reconstructs the missing components based on the list of nearest neighbors (the output from Algorithm 2). We only consider the case where r = 1 here, permitting us to easily estimate the missing node's ridge subspace as a linear combination of the neighboring components.

Both the compression and recovery algorithms presented in this section are greedy algorithms, which may therefore not result in the storage configuration that globally minimizes the distance between the missing components and their neighbors. However, to determine the globally optimal solution requires a combinatorial search over every storage configuration, which is computationally prohibitive.
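A simplified sketch of the compression-and-recovery idea, assuming one ridge direction per node (r = 1) and a smoothly rotating field of directions (all details hypothetical, and the neighbor bookkeeping is reduced to adjacent nodes on a line rather than the full Algorithm 2 logic):

```python
import numpy as np

def subspace_dist_1d(u, v):
    # distance (28) for one-dimensional spans of unit vectors u and v
    return np.linalg.norm(np.outer(u, u) - np.outer(v, v), 2)

rng = np.random.default_rng(6)
d, N, n_remove = 4, 12, 4
e1, e2 = np.eye(d)[0], np.eye(d)[1]
theta = np.cumsum(0.05 + 0.02 * rng.random(N))   # slowly drifting angles
dirs = [np.cos(t) * e1 + np.sin(t) * e2 for t in theta]

# greedy compression: score interior nodes by the distance between
# their two neighbours; small score => neighbours agree => safe to drop
scores = {i: subspace_dist_1d(dirs[i - 1], dirs[i + 1]) for i in range(1, N - 1)}
removed = []
for i in sorted(scores, key=scores.get):
    # never drop a node whose neighbour was already dropped,
    # so every dropped node keeps two stored neighbours
    if i - 1 not in removed and i + 1 not in removed:
        removed.append(i)
    if len(removed) == n_remove:
        break

# recovery: average the two stored neighbours and renormalize
for i in removed:
    rec = dirs[i - 1] + dirs[i + 1]
    rec /= np.linalg.norm(rec)
    assert subspace_dist_1d(rec, dirs[i]) < 0.1
```

Because the directions vary smoothly, each recovered direction lands close (in the sense of (28)) to the one that was discarded, which is the behavior the perturbation bound of Section 3.1 guarantees to translate into a small ridge-approximation error.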

## 4 Numerical examples

In this section, we illustrate the embedded ridge approximation approach with an analytical example and a CFD example.

### 4.1 Analytical example

Consider the function

 h(x) = [2, 3, 5] [f_1(x), f_2(x), f_3(x)]^T, (30)

where

 f_1(x) = (w_1^T x)² + (w_1^T x)³,  f_2(x) = exp(w_2^T x),  f_3(x) = sin(π w_3^T x),

defined over a domain where the inputs are independent and have uniform marginals. Note that h is an exact ridge function with three ridge directions spanned by the columns of W = [w_1, w_2, w_3]. We draw w_1, w_2 and w_3 as random vectors with unit Euclidean norm and compare the recovered ridge directions against the drawn vectors using the subspace distance (see (28)). We use polynomial variable projection (VP) [18] to estimate the ridge directions for each component function f_1, f_2 and f_3, and then calculate the first three leading eigenvectors of the vector gradient covariance matrix of f, setting the weights as ω = (2, 3, 5)^T, to find an estimate of W. This represents our embedded ridge approximation.

For direct ridge approximation, we use VP to estimate this three-dimensional subspace directly. We vary the number of observations used for each method and examine the subspace error between the recovered directions and the true directions. We note that the results are binary—we either get a small subspace error from successful recovery or a large subspace error from failure in recovery. Thus, we plot the probability of successful recovery—where the subspace error is below 0.005—across 40 trials in Fig. 1. To achieve successful recovery of W from the embedded ridge approximation, we need to be able to successfully recover the ridge directions of each individual component function. Interestingly, despite the need to successfully find three sets of ridge directions concurrently, the probability of recovery is still significantly higher for the embedded ridge approximation method.
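The structure of (30) can be reproduced with exact gradients and Monte Carlo in place of variable projection (which needs more machinery): the top three eigenvectors of the sampled gradient covariance of h recover span(w_1, w_2, w_3). The input dimension below is an assumed value:

```python
import numpy as np

rng = np.random.default_rng(7)
d = 10
Wtrue = np.array([v / np.linalg.norm(v) for v in rng.standard_normal((3, d))]).T

X = rng.uniform(-1, 1, size=(50000, d))
U = X @ Wtrue                                   # projections w_k^T x
dg = np.column_stack([
    2 * (2 * U[:, 0] + 3 * U[:, 0] ** 2),       # weight 2, d/du of u^2 + u^3
    3 * np.exp(U[:, 1]),                        # weight 3, d/du of exp(u)
    5 * np.pi * np.cos(np.pi * U[:, 2]),        # weight 5, d/du of sin(pi u)
])
G = dg @ Wtrue.T                                # grad h at each sample
C = G.T @ G / X.shape[0]

evecs = np.linalg.eigh(C)[1]
Q = np.linalg.qr(Wtrue)[0]
V = evecs[:, -3:]                               # top three eigenvectors
assert np.linalg.norm(Q @ Q.T - V @ V.T, 2) < 1e-8
```

Since every sampled gradient lies exactly in span(w_1, w_2, w_3), the covariance matrix has rank three and its leading eigenspace matches the true subspace to floating-point precision; the interesting question studied in Fig. 1 is how well VP does without access to these gradients.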

### 4.2 Shape design of the NACA0012

We apply the embedded ridge function approximation algorithm (see Algorithm 1) to the shape design of the NACA0012 airfoil. We parameterize the shape deformation of the baseline NACA0012 profile using Hicks-Henne bump functions around the airfoil, and measure the variation in the surface pressure profile. We fix an entry Mach number of 0.3 (subsonic) and an angle of attack of 1.25°, with free-stream temperature and pressure at 273.15 K and 101325 Pa, respectively. We solve for the pressure profile using the compressible Euler flow solver in the open-source CFD suite [10]. The coefficients of lift and drag (ignoring skin friction and assuming a unit reference area) are known to be linear functions of the pressure around the airfoil [1, Ch. 1], given by

 C_l = 1/(½ ρ_0 v_∞²) ∮ p(x) n · k ds ≈ 1/(½ ρ_0 v_∞²) ∑_{i=1}^N p_i(x) n_i · k Δs_i = ω_l^T p(x),
 C_d = 1/(½ ρ_0 v_∞²) ∮ p(x) n · j ds ≈ 1/(½ ρ_0 v_∞²) ∑_{i=1}^N p_i(x) n_i · j Δs_i = ω_d^T p(x),

where the integrals are evaluated around the airfoil surface, spatially parameterized by s. The input variable x contains the Hicks-Henne bump amplitudes; n is the surface normal, k the direction perpendicular to the flow, and j the direction parallel to the flow (see Fig. 2). In the normalizing factors, ρ_0 is the free-stream density, and v_∞ is the free-stream speed. We discretize this problem by considering N measurements of pressure around the airfoil, resulting in the vector-valued function p(x) representing the surface pressure profile. Note that the approximation in the second line of both expressions comes not only from the discretization but also from the assumption that the geometric quantities n_i and Δs_i are independent of x. Under this approximation, the coefficients of lift and drag can then be expressed as linear functions of the components of p.

As the flow is entirely subsonic and inviscid, we expect the bumps to have a strongly local influence. Hence, we expect the pressure profile to admit an embedded ridge approximation, which motivates the following approach to estimate the dimension-reducing subspaces of C_l and C_d:

1. Using a parsimonious computational strategy—i.e. one that requires fewer observations than fitting a global surrogate—estimate the leading ridge direction ŵ_i for each component p_i.

2. Fit a low-dimensional surrogate using this leading mode for each p_i. That is, we seek

 p_i(x) ≈ ĝ_i(ŵ_i^T x), (31)

for the i-th component of p. We use univariate orthogonal polynomials for the profiles ĝ_i.

3. We can compute the elements of the Jacobian via these ridge approximations:

 Ĵ(x)_{ij} = ŵ_{ji} ĝ_j′(ŵ_j^T x), (32)

where ŵ_{ji} is the i-th element of ŵ_j.

4. Compute the gradient covariance matrix with a Monte Carlo estimate of (11), from which we can compute the dimension-reducing subspaces and form ridge approximations of the scalar qois (lift and drag).
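Step 3 is a simple assembly once the one-dimensional profiles are fitted. A minimal sketch with assumed quadratic profiles (the dimensions and profile g_j(u) = u + ½u² are hypothetical), checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(8)
d, N = 6, 3
W = np.array([v / np.linalg.norm(v) for v in rng.standard_normal((N, d))]).T  # d x N

g = lambda u: u + 0.5 * u ** 2     # assumed quadratic ridge profiles
dg = lambda u: 1.0 + u             # their derivatives

def p(x):
    return g(W.T @ x)              # surrogate pressure components p_j = g_j(w_j^T x)

x = rng.uniform(-1, 1, d)
J_hat = W * dg(W.T @ x)[None, :]   # (32): J_hat[i, j] = w_{ji} g_j'(w_j^T x)

# central-difference check of the analytic Jacobian
eps = 1e-6
J_fd = np.zeros((d, N))
for i in range(d):
    e = np.zeros(d)
    e[i] = eps
    J_fd[i, :] = (p(x + e) - p(x - e)) / (2 * eps)

assert np.allclose(J_hat, J_fd, atol=1e-6)
```

Because every column of Ĵ is a scaled copy of the corresponding ŵ_j, the Jacobian is cheap both to form and to store, which is what makes step 4 tractable.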

#### 4.2.1 Results

To benchmark our results, we construct two global quadratic polynomial surrogates in x for C_l and C_d respectively. We use these surrogates to estimate their active subspaces, using the strategy detailed in [26], where gradients of the polynomials are used to assemble the scalar gradient covariance matrix. Using a total-order polynomial basis (see [2]), this requires approximating 1326 coefficients. We create a design of experiment with uniformly distributed Monte Carlo samples for x, and use the resulting input-output pairs—where the outputs are the scalar qois—to estimate these coefficients using least squares. Then we compute the eigendecomposition of the scalar gradient covariance matrices of C_l and C_d, the results of which will be the reference values for subsequent comparison. The first eigenvalue of the lift covariance matrix constitutes 99.9% of the trace, while the sum of the first two eigenvalues of the drag covariance matrix covers 99.5%. Hence, we will focus on computing one leading mode of C_l and two leading modes of C_d.

The number of samples we use is a factor of about 100 larger than the number of ridge directions we calculate for both qois. Hence, we expect errors from the Monte Carlo sampling to be negligible [3]. This statement was confirmed with additional design of experiment points.

We will fit a one-dimensional ridge function for each component of the pressure profile p_i, using a quadratic ridge profile for each component. To find the ridge direction for each component, we fit a global linear model to each component and take the ridge direction as the normalized coefficients of the linear model [4, Algorithm 1.3]—see Appendix C for further details. This is because the majority of pressure components vary largely linearly with the bump amplitudes. However, this is not the case for the pressure components near the leading edge, where the variation is closer to quadratic, as illustrated in Fig. 2. We study two strategies for refining the estimation of these components: polynomial variable projection (VP) [18], and Minimum Average Variance Estimation (MAVE) [32]. For the optimization loop inside the VP algorithm (see [18, Algorithm 4.1]), the convergence criterion is met when the subspace distance between the ridge directions of the previous and current iterations falls below a fixed tolerance. Our implementation of this algorithm can be found in the Effective Quadratures open-source library [24] (https://www.effective-quadratures.org/). For MAVE, we adapt the R code from Hang and Xia [17]. A brief exposition of the MAVE method is provided in Appendix D. We also consider the case where we use a linear model for all components and do not refine the estimation of components near the leading edge.
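The linear-model initialization used for most pressure components (cf. [4, Algorithm 1.3]) amounts to fitting a global linear model and normalizing its coefficients. A sketch on assumed toy data (the response below mimics a mostly-linear pressure component; all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)
d, M = 12, 300
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

X = rng.uniform(-1, 1, size=(M, d))
y = 3.0 * (X @ w) + 0.8 * (X @ w) ** 2      # mostly linear in w^T x

# least-squares linear model; the normalized slope coefficients
# serve as the estimated ridge direction
A = np.column_stack([np.ones(M), X])
coef = np.linalg.lstsq(A, y, rcond=None)[0][1:]
w_hat = coef / np.linalg.norm(coef)

# compare to the true direction up to sign
err = min(np.linalg.norm(w_hat - w), np.linalg.norm(w_hat + w))
assert err < 0.1
```

The quadratic part of the response is uncorrelated with the linear regressors under a symmetric input distribution, which is why the normalized slope recovers the direction well; for components with stronger curvature (e.g. near the leading edge) this estimate is then refined with VP or MAVE as described above.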

In what follows, we compare the embedded ridge approximation approach to the direct ridge approximation approach—calculating the scalar gradient covariance matrix of the qoi directly—with the three methods mentioned above, namely VP, MAVE (for embedded ridge approximation, this means linear models refined with VP and MAVE near the leading edge, respectively) and the linear model. We examine two different metrics of approximation accuracy:

1. In Fig. 3 and Fig. 4, we plot the subspace distance relative to the modes estimated directly with the aforementioned massively oversampled global quadratic models, as we vary the number of observations used for VP, MAVE and the linear models.

2. In Fig. 5, we plot the MSE of the surrogate model fitted using the ridge directions resulting from embedded and direct ridge approximation. The MSE of approximating the qoi is defined as

\[
\epsilon_h = \frac{1}{M'} \sum_{j=1}^{M'} \frac{\left( h(\mathbf{y}^{(j)}) - \hat{h}(\mathbf{y}^{(j)}) \right)^2}{\sigma_h^2}, \tag{33}
\]

where \(\mathbf{y}^{(j)}\), for \(j = 1, \ldots, M'\), are verification samples drawn independently of the data used to train the response surfaces. The ridge approximation of \(h\) evaluated at \(\mathbf{y}^{(j)}\) is denoted \(\hat{h}(\mathbf{y}^{(j)})\), and \(\sigma_h^2\) is the sample variance of \(h\) across all verification samples.
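As a concrete illustration, the normalized MSE in (33) can be computed as follows. This is a minimal sketch; the function name and array-based interface are our own, not part of the paper's implementation.

```python
import numpy as np

def normalized_mse(h_true, h_pred):
    """Normalized mean squared error of a ridge approximation, as in (33).

    h_true: qoi values h(y^(j)) at the M' verification samples
    h_pred: ridge-approximation values at the same samples
    """
    h_true = np.asarray(h_true, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    sigma2 = np.var(h_true)  # sample variance of the qoi over the verification set
    return np.mean((h_true - h_pred) ** 2) / sigma2
```

Note that normalizing by the sample variance makes the metric scale-invariant, so errors for qois with different magnitudes can be compared directly.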

Our results show that embedded ridge approximation reduces both the subspace error and the MSE compared with direct estimation when the number of samples is limited. The errors for embedded VP and MAVE converge after approximately 300 observations for one mode and 400 observations for two modes. Although the linear models (in both the direct and embedded cases) suffice for estimating the first qoi (Fig. 3), for functions with stronger non-linear dependencies the linear model exhibits a larger error than VP and MAVE; see the left-hand plot in Fig. 4. We also note that embedded ridge approximation extends the capability of linear models to estimating more than one mode in the scalar qois.

When determining the number of observations required, we may also be interested in quantifying the accuracy of the response surfaces fitted to each pressure component. We measure the response surface error via the average of a normalized MSE metric,

\[
\epsilon_R = \frac{1}{N M'} \sum_{i=1}^{N} \sum_{j=1}^{M'} \frac{\left( p_i(\mathbf{y}^{(j)}) - \hat{p}_i(\mathbf{y}^{(j)}) \right)^2}{\sigma_{p_i}^2}, \tag{34}
\]

where the notation is similar to that in (33).

In Fig. 6 we plot the MSE for embedded ridge approximation using VP, MAVE and the linear model. The reference level corresponds to the global quadratic model fitted with 2000 observations on each component. These results verify that sufficient convergence is achieved with 400 samples when using VP or MAVE. Moreover, the linear model retains a larger error than VP and MAVE at convergence, since it models the quadratic components poorly.

We plot the estimated leading mode coefficients of both qois obtained via embedded ridge approximation in Fig. 7 and Fig. 8. For the embedded VP and MAVE methods, we use 400 observations, based on the convergence in subspace error shown in Fig. 3 and Fig. 4. The main source of error comes from components near the leading edge, where, although a clear one-dimensional ridge structure exists, there is a moderate spread of points in the orthogonal directions (see Fig. 2).

### 4.3 Sparse storage of NACA0012 pressure profile

In this subsection, we demonstrate the application of sparse storage and recovery algorithms from Section 3.2 on the estimation of the pressure profile around the NACA0012 airfoil. From the results of the previous subsection, we know that all pressure components can be well-approximated with one-dimensional ridge functions. Using the ridge approximations from the previous subsections, we examine the efficacy of our storage and recovery algorithms by selecting a subset of the components to store and attempting to recover the missing components from the subsampled ones.

#### 4.3.1 Results

We remove varying numbers of components using Algorithm 2, up to the maximum number allowed, i.e., when every remaining component is a neighbour of some missing component. Then, we reconstruct the missing ridge directions using Algorithm 3. We compare the reconstructed ridge directions with the datum directions in terms of subspace angles. In addition, we examine the normalized MSE defined in (34).

In Fig. 9, we plot the subspace errors and the MSE, both averaged across all recovered components. Clearly, the linear interpolation method yields almost exact recovery for up to 98 removed components, which is roughly half of the total number of components.
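The interpolation step underlying this recovery can be sketched as follows, assuming one-dimensional ridge directions and two stored neighbouring nodes per missing node. The function name, the sign alignment, and the renormalization step are our assumptions for illustration, not a reproduction of Algorithm 3.

```python
import numpy as np

def recover_ridge_direction(w_left, w_right, t=0.5):
    """Recover a missing unit ridge direction by linearly interpolating
    between the stored directions of its two neighbouring nodes.

    t is the relative position of the missing node between the neighbours
    (t = 0.5 for a node halfway between them)."""
    w_left = np.asarray(w_left, dtype=float)
    w_right = np.asarray(w_right, dtype=float)
    # Align signs first: w and -w span the same one-dimensional subspace.
    if np.dot(w_left, w_right) < 0:
        w_right = -w_right
    w = (1.0 - t) * w_left + t * w_right
    return w / np.linalg.norm(w)  # renormalize to a unit direction
```

The sign alignment matters because ridge directions are only defined up to a sign; interpolating misaligned neighbours would cancel rather than average them.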

## 5 Conclusions

In this paper we introduce the notion of constructing ridge approximations over localized scalar fields, aimed at improved simulation-centric dimension reduction. Our ideas are born from a simple observation: in many PDE-based models, the scalar field at a certain node is only weakly influenced by far-field perturbations. It is more likely to be governed by locally induced perturbations, caused by variations in local boundary conditions or geometry. By interpreting global scalar qois as integrals of these scalar-field quantities, we hypothesize and demonstrate that constructing ridge approximations at individual scalar-field nodes and then integrating them can reduce the number of computational evaluations required.

### Acknowledgements

CYW acknowledges financial support from the Cambridge Trust, Jesus College, Cambridge, and the Alan Turing Institute. PS was funded through a Rolls-Royce postdoctoral fellowship.

## Appendix A Proof of Theorem 1

By the matrix Bernstein inequality [29], (21) implies that

\[
\mathbb{E} \left\| \frac{1}{M} \sum_{m=1}^{M} \nabla g_i^{(m)} \nabla g_j^{(m)T} - \mathbb{E}\left[ \nabla g_i \nabla g_j^T \right] \right\|_2 \le (\epsilon + \epsilon^2) \left\| \mathbb{E}\left[ \nabla g_i \nabla g_j^T \right] \right\|_2. \tag{35}
\]

We can then write

\begin{align*}
\mathbb{E} \left\| \hat{C}(h) - C(h) \right\|_2 &= \mathbb{E} \left\| \frac{1}{M} \sum_{m=1}^{M} \sum_{ij} \omega_i \omega_j \hat{W}_i \nabla g_i^{(m)} \nabla g_j^{(m)T} \hat{W}_j^T - \sum_{ij} \omega_i \omega_j W_i \mathbb{E}\left[ \nabla g_i \nabla g_j^T \right] W_j^T \right\|_2 \\
&= \mathbb{E} \left\| \sum_{ij} \omega_i \omega_j E_{ij} \right\|_2 \\
&\le \sum_{ij} |\omega_i \omega_j| \, \mathbb{E} \left\| E_{ij} \right\|_2,
\end{align*}

where

\begin{align*}
E_{ij} &= \frac{1}{M} \sum_{m=1}^{M} \hat{W}_i \nabla g_i^{(m)} \nabla g_j^{(m)T} \hat{W}_j^T - W_i \mathbb{E}\left[ \nabla g_i \nabla g_j^T \right] W_j^T \\
&= \frac{1}{M} \sum_{m=1}^{M} \left( \hat{W}_i \nabla g_i^{(m)} \nabla g_j^{(m)T} \hat{W}_j^T - W_i \nabla g_i^{(m)} \nabla g_j^{(m)T} \hat{W}_j^T \right) \\
&\quad + \frac{1}{M} \sum_{m=1}^{M} \left( W_i \nabla g_i^{(m)} \nabla g_j^{(m)T} \hat{W}_j^T - W_i \nabla g_i^{(m)} \nabla g_j^{(m)T} W_j^T \right) \\
&\quad + \frac{1}{M} \sum_{m=1}^{M} \left( W_i \nabla g_i^{(m)} \nabla g_j^{(m)T} W_j^T \right) - W_i \mathbb{E}\left[ \nabla g_i \nabla g_j^T \right] W_j^T \\
&= \left( \hat{W}_i - W_i \right) \left( \frac{1}{M} \sum_{m=1}^{M} \nabla g_i^{(m)} \nabla g_j^{(m)T} \right) \hat{W}_j^T \\
&\quad + W_i \left( \frac{1}{M} \sum_{m=1}^{M} \nabla g_i^{(m)} \nabla g_j^{(m)T} \right) \left( \hat{W}_j^T - W_j^T \right) \\
&\quad + W_i \left( \frac{1}{M} \sum_{m=1}^{M} \nabla g_i^{(m)} \nabla g_j^{(m)T} - \mathbb{E}\left[ \nabla g_i \nabla g_j^T \right] \right) W_j^T.
\end{align*}

Note that

\[
\mathbb{E} \left\| \frac{1}{M} \sum_{m=1}^{M} \nabla g_i^{(m)} \nabla g_j^{(m)T} \right\|_2 \le \frac{1}{M} \sum_{m=1}^{M} \mathbb{E} \left\| \nabla g_i^{(m)} \nabla g_j^{(m)T} \right\|_2 \le L^2 \tag{36}
\]

because \(\nabla g_i^{(m)}\) and \(\nabla g_j^{(m)}\) are identically distributed copies of \(\nabla g_i\) and \(\nabla g_j\), whose norms are upper bounded by \(L\). Using the triangle inequality and the sub-multiplicativity of the norm, we get

\[
\sum_{ij} |\omega_i \omega_j| \, \mathbb{E} \left\| E_{ij} \right\|_2 \le C \left( 2 \eta_M L^2 + (\epsilon + \epsilon^2) L^2 \right), \tag{37}
\]

from which the theorem follows.

## Appendix B Subspace distance and inner products between columns

In Section 3.1 we assert that a bound on the inner product between columns is equivalent to a bound on the subspace distance between the column spaces of two matrices. We provide a justification in this appendix.

###### Lemma 1

Let \(S\) be an \(r\)-dimensional subspace of \(\mathbb{R}^d\), and \(\tilde{S}\) be an equidimensional perturbation of \(S\). Then, \(\mathrm{dist}(S, \tilde{S})\) is bounded above if and only if there exist matrices \(W\) and \(\tilde{W}\) with orthonormal columns spanning \(S\) and \(\tilde{S}\) respectively such that \(\mathbf{w}_i^T \tilde{\mathbf{w}}_i\) is bounded below for all \(i = 1, \ldots, r\), where \(\mathbf{w}_i\) is the \(i\)-th column of \(W\) (and similarly for perturbed quantities).

###### Proof

(\(\Rightarrow\)) Choose the columns of \(W\) and \(\tilde{W}\) to be a set of principal vectors of \(S\) and \(\tilde{S}\) respectively. Then, by construction, we have

\[
\mathbf{w}_i^T \tilde{\mathbf{w}}_i = \cos(\theta_i), \tag{38}
\]

where \(\theta_i\) is the \(i\)-th principal angle, with \(\theta_1 \le \theta_2 \le \cdots \le \theta_r\). Thus, \(\mathbf{w}_i^T \tilde{\mathbf{w}}_i\) is bounded below by \(\cos(\theta_r)\), the cosine of the largest principal angle. The result follows from the fact that \(\mathrm{dist}(S, \tilde{S}) = \sin(\theta_r)\) [16, Section 6.4.3].

(\(\Leftarrow\)) Note that

\[
\mathrm{dist}(S, \tilde{S}) = \left\| W W^T - \tilde{W} \tilde{W}^T \right\|_2, \tag{39}
\]

for any \(W\) and \(\tilde{W}\) whose columns define orthonormal bases for \(S\) and \(\tilde{S}\) respectively. That is, the distance does not depend on the basis chosen for the orthogonal projector. Then, we can write

\[
\left\| W W^T - \tilde{W} \tilde{W}^T \right\|_2 = \left\| \sum_{i=1}^{r} \left( \mathbf{w}_i \mathbf{w}_i^T - \tilde{\mathbf{w}}_i \tilde{\mathbf{w}}_i^T \right) \right\|_2 \le \sum_{i=1}^{r} \left\| \mathbf{w}_i \mathbf{w}_i^T - \tilde{\mathbf{w}}_i \tilde{\mathbf{w}}_i^T \right\|_2 = \sum_{i=1}^{r} \sqrt{1 - \left( \mathbf{w}_i^T \tilde{\mathbf{w}}_i \right)^2}. \tag{40}
\]

Hence, if \(\mathbf{w}_i^T \tilde{\mathbf{w}}_i \ge \cos(\theta_r)\) for all \(i\), then

\[
\mathrm{dist}(S, \tilde{S}) \le \sum_{i=1}^{r} \sqrt{1 - \left( \mathbf{w}_i^T \tilde{\mathbf{w}}_i \right)^2} \le \sum_{i=1}^{r} \sqrt{1 - \cos^2(\theta_r)} = r \sin(\theta_r). \tag{41}
\]
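The subspace distance in (39) and its relation to the largest principal angle can be checked numerically. The following is a minimal sketch; the function name is ours.

```python
import numpy as np

def subspace_distance(W, W_tilde):
    """Distance between the column spaces of W and W_tilde, as in (39):
    the 2-norm of the difference of orthogonal projectors, which equals
    the sine of the largest principal angle between the subspaces."""
    # Orthonormalize the bases; the distance is basis-independent.
    Q, _ = np.linalg.qr(W)
    Q_t, _ = np.linalg.qr(W_tilde)
    # Form the orthogonal projectors onto each subspace.
    P = Q @ Q.T
    P_t = Q_t @ Q_t.T
    return np.linalg.norm(P - P_t, 2)
```

For two one-dimensional subspaces of the plane separated by an angle \(\theta\), this returns \(\sin(\theta)\), consistent with the relation used in the proof above.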

## Appendix C Global linear models

The coefficients of a global linear model of a qoi give a rough estimate of the leading dimension-reducing subspace direction. If the qoi varies largely linearly over the input domain, and it is known that the dimension-reducing subspace is one-dimensional, this heuristic finds the required subspace at low cost. The method is described by Algorithm 1.3 in [4], which we reproduce below.

1. Draw \(M\) input/output pairs \((\mathbf{x}^{(i)}, y^{(i)})\), where \(M\) exceeds the input dimension by a certain oversampling factor.

2. Solve the following least squares problem

\[
\underset{c,\, \mathbf{w}}{\text{minimize}} \quad \left\| X \mathbf{w} + c \mathbf{1} - \mathbf{y} \right\|_2, \tag{42}
\]

where the \(i\)-th row of \(X\) is \(\mathbf{x}^{(i)T}\) and \(\mathbf{y} = (y^{(1)}, \ldots, y^{(M)})^T\).

3. Take the leading ridge direction to be \(\mathbf{w} / \|\mathbf{w}\|_2\).
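The three steps above can be sketched in a few lines, assuming the input samples are stored row-wise; the function name is ours.

```python
import numpy as np

def linear_ridge_direction(X, y):
    """Estimate the leading ridge direction by fitting a global linear
    model y ~ X w + c and normalizing the coefficients, following
    Algorithm 1.3 of [4]. X holds one input sample per row."""
    M = X.shape[0]
    # Augment with a column of ones to absorb the intercept c.
    A = np.hstack([X, np.ones((M, 1))])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = coeffs[:-1]  # drop the intercept coefficient
    return w / np.linalg.norm(w)
```

On a qoi that is exactly linear in the inputs, this recovers the true ridge direction; for nonlinear qois it only provides the rough initial estimate that VP or MAVE then refine.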

## Appendix D MAVE

The Minimum Average Variance Estimation (MAVE) method for sufficient dimension reduction was proposed by Xia et al. [32]. Given input/output pairs \((\mathbf{x}^{(i)}, y^{(i)})\), it aims to find the dimension-reducing subspace spanned by the columns of \(W\), which solves the following optimization over matrices with orthonormal columns:

\[
\underset{W}{\text{minimize}} \quad \mathbb{E}\left[ \left( y - \mathbb{E}\left[ y \,|\, W^T \mathbf{x} \right] \right)^2 \right] \quad \text{subject to} \quad W^T W = I. \tag{43}
\]

The first-order Taylor expansion of the inner conditional expectation about an input \(\mathbf{x}^{(0)}\) is

\[
\mathbb{E}\left[ y^{(i)} \,|\, W^T \mathbf{x}^{(i)} \right] \approx a_0 + \mathbf{b}_0^T W^T \left( \mathbf{x}^{(i)} - \mathbf{x}^{(0)} \right). \tag{44}
\]

Hence, the MAVE procedure minimizes the following approximation to (43) as an alternating weighted least squares problem:

\[
\underset{a_j,\, \mathbf{b}_j,\, W}{\text{minimize}} \quad \sum_{i=1}^{M} \sum_{j=1}^{M} \left[ y^{(i)} - a_j - \mathbf{b}_j^T W^T \left( \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right) \right]^2 \omega_{ij} \quad \text{subject to} \quad W^T W = I, \tag{45}
\]

where the weights \(\omega_{ij}\) are determined through a normalized kernel function \(K_h\) evaluated at \(W^T(\mathbf{x}^{(i)} - \mathbf{x}^{(j)})\), namely

\[
\omega_{ij} = \frac{K_h\left( W^T \left( \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right) \right)}{\sum_{k=1}^{M} K_h\left( W^T \left( \mathbf{x}^{(k)} - \mathbf{x}^{(j)} \right) \right)}. \tag{46}
\]

The procedure alternates between

• fixing \(W\) and optimizing with respect to \((a_j, \mathbf{b}_j)\), and

• fixing \((a_j, \mathbf{b}_j)\) and optimizing with respect to \(W\).

The minimizer for both steps can be expressed analytically, with details in, e.g., [20, Ch. 11].
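To make the weight computation in (46) concrete, the full weight matrix can be evaluated as follows. This is a sketch; the choice of a Gaussian kernel for \(K_h\), the bandwidth default, and the function name are our assumptions.

```python
import numpy as np

def mave_weights(X, W, h=1.0):
    """Kernel weights omega_ij of (46), using a Gaussian kernel K_h.

    X: one input sample per row; W: current dimension-reducing basis;
    h: kernel bandwidth."""
    Z = X @ W  # projected samples, one per row
    # Squared distances between projected samples i and j.
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * d2 / h**2)  # Gaussian kernel evaluations
    # Normalize each column j so the weights over i sum to one.
    return K / K.sum(axis=0, keepdims=True)
```

Each column of the returned matrix sums to one, matching the normalization in (46); the weights concentrate the local linear fits in (45) on samples whose projections lie near \(W^T \mathbf{x}^{(j)}\).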