1 Introduction
In addressing reliability and design optimization challenges in modern engineering and manufacturing processes, the need for an increasing accuracy in the predictive capabilities of the analysis codes has been of paramount importance. Taking into account the presence of aleatoric uncertainties resulting from the severe complexity of the physical system under investigation and the limited knowledge available to the experimenter, simulationbased uncertainty quantification (UQ) tasks can result in computational costs that are unacceptable in practice smith_uq ; lemaitre ; koch_size .
Metamodeling techniques wang_review have gained increasing popularity in recent years, as they provide a way to alleviate the computational burden associated with quantifying uncertainties in computer model outputs and inverse modeling morokoff ; tarantola . The key idea is to replace the expensive computer solver with a surrogate that is cheap to evaluate and at the same time it maintains high predictive accuracy. These metamodels can consist of functional representations of the forward solver such as Polynomial Chaos ghanem_spanos ; lemaitre_knio ; xiu_karniadakis , kernelbased methods cristianini ; cortes
, relevance vector machines
bilionis_rvm ; tsilifis_rvmand neural networks
tripathy_nn ; zhu or they can be nonparametric models such as Gaussian Processes bilionis_mogp ; chen_jcp ; perdikaris_scicomp .Gaussian Process (GP) regression techniques allow us to quantify the epistemic uncertainty associated with the limited number or training data points by providing explicit Bayesian posterior updates on the error bars ohagan ; ohagan_quad
. Recent works have demonstrated their wide applicability for UQ analysis and probabilistic machine learning and have been exhaustively used for predicting solutions to differential equations, including local adaptations for detecting discontinuities
bilionis_local , discovering governing equations raissi_gp and constructing latent variable models for solving inverse problems atkinson. Challenges associated with efficiently training an accurate GP model, include acquiring the training dataset necessary to guarantee the desired predictive performance and tuning the model’s hyperparameters
rasmussen . Collecting the data often requires a significant number of model simulations that increases exponentially as a function of the dimensionality of the input parameters and can easily become prohibitive due to computational budget limitations. On the other hand, training GPs using massive datasets, when available, poses numerical challenges that can result in poor performance. These are mainly associated with repeatedly performing covariance matrix inversions that can be extremely inaccurate and computer memorydemanding. Applying sparse techniques herbrich and batch optimization hensman have only partially managed to address these issues.An alternative approach involves leveraging a cheaper computer solver that provides with lowcost, yet less accurate observations to be used for training the GP, leading to formulations where the observable depends on more that one covariates. Such approaches, although already known to the geostatistics community as cokriging cressie , were formalized within the context of multifidelity simulations in the pioneering work of Kennedy & O’Hagan kennedyohagan . The autoregressive scheme presented therein, decomposes the expensive simulation code as a sum of its cheap approximation and a discrepancy term, both modeled as independent Gaussian Processes. Techniques for training the model have been proposed forrester and involve learning the degree of correlation between the codes, while a model calibration variation of the scheme is also available kennedyohagan_calib . Similarly to standard GP regression in singlefidelity settings, training the autoregressive scheme includes inverting large, illconditioned covariance matrices, leading to numerical inaccuracies. Le Gratiet & Garnier le_gratiet_doe offered an elegant solution to the problem that consists of decoupling the information stemming from different levels of fidelity and thus simplifying the covariance matrix structure. These ideas found wide applicability in discovering high dimensional response surfaces arising from dynamical systems paris_rspa , model inversion paris_rsif , multifidelity Bayesian optimization pang and uncertainty propagation in the bigdata regime paris_siam .
From a different perspective, an efficient strategy for constructing a GP metamodel would involve addressing input dimensionality by means of reducing the number of variables in the input design space. Standard ways of doing so include sensitivity analysis saltelli
and unsupervised learning methods that explore correlations of the input variables, such as PCA
pearson , kernel PCA ma or even truncating KarhunenLoève expansions ghanem_spanos . More sophisticated approaches to dimensionality reduction in the context of GPs include applying embeddings on the input design space in order to obtain a low dimensional latent variable space. Several criteria for discovering linear embeddings have been proposed in the literature. Constantine in his seminal work constantine_AS ; constantine_book , used a gradientbased approach to exploit what is termed Active Subscape (AS). The key step in this approach is to map the input space using a matrix derived by an orthogonal decomposition of the covariance of the gradient vector of the observable quantity. Intuitively the approach is sound as it explores the directions along which the output quantity exhibits most of its variability. The major limitation however is that it is impractical in blackbox simulations where gradient information is hardly available. Gradientfree approaches that learn orthogonal tripathy_bilionis or arbitrary garnett linear mappings onthefly, have also been developed. In these works, the matrices were inferred using MLE or the Laplace approximation of the posterior of the linear mapping, respectively. At last, attempts to reduce the dimensionality using nonlinear mappings have been successfully applied for Bayesian optimization moriconi , where the forward mapping to the latent variables is modeled using a feedforward neural net while the original input reconstruction mapping is carried out using multioutput GPS.The ultimate goal in this work is to combine dimensionality reduction principles with multifidelity simulations so as to manifest their benefits in a single context. We seek to simultaneously address numerical issues and computational limitations arising in the presence of high input dimensionality and expensive data acquisition procedures. We thus present a unified framework where the classic autoregressive Gaussian Process (ARGP) scheme can be built on a linearly embedded subspace that can be learned while training the model. To the authors best knowledge, such an approach is presented for the first time. Specifically, we build on the KennedyO’Hagan autoregessive scheme for modeling expensive computer outputs using a multifidelity source of information. Regardless of the quality of their output as an approximation of the high fidelity solver, we assume that the different fidelity codes are highly correlated and thus they exhibit similar dependence on the input variables, thus intuitively it is natural to assume that a common linear embedding can be applied to all GPs at different levels of fidelity. A training procedure for tuning the hyperparameters of the model is presented where the ARGP parameters are estimated using either MLE or Bayesian methods and the orthogonal embedding is inferred using a fully Bayesian approach that relies on the Geodesic Monte Carlo algorithm
byrne that has been particularly tailored for sampling orthogonal matrices tsilifis_rspa . Such an approach to learn linear embeddings is novel in the context of dimensionality reduction for GPs and clearly prevails previously presented techniques tripathy_bilionis ; garnettin that the full posterior of the orthogonal matrix can be exploited as opposed to a Laplace approximations while it relieves us of the computational burden caused by highdimensional gradientbased optimization of the likelihood.
We structure this paper as follows: Section 2.1 presents the basic elements of the classic autoregressive Gaussian Process scheme, Sec. 2.2 presents the prior setting and predictive distributions and Sec. 2.3 discusses the model training approach. Section 3.1 then formulates the ARGP model defined on a linearly embedded subspace using an orthogonal projection, that is trained using the proposed algorithm presented in Sec. 3.2. Our numerical examples include two toy problems with known embeddings that are learned using observations from three levels of fidelity (Sec. 4.1 & 4.2) and a challenging threedimensional airfoil optimization problem with a 85dimensional input space where observations are available from a high and a lowfidelity simulator.
2 Recursive Multifidelity Gaussian Processes
2.1 Autoregressive Gaussian Process model
We consider the following scenario where a hierarchy of computer codes is available, say , indexed by input vector where is the design space consisting of all feasible inputs. The codes are in order of increasing fidelity from the cheapest one , to the most accurate one, . The key assumption as was first stated by Kennedy & O’Hagan kennedyohagan for any two consecutive levels of code and , is that given observation of the low fidelity code at , nothing more can be learnt about by observing at any . This translates to the Markov property
(1) 
for any
. Based on this principle, we write the relation between any two consecutive codes using the autoregressive model
(2) 
where “” denotes statistical independence and therefore the discrepancy between successive levels is characterized by an independent correction term and by the scaling coefficient which captures the correlation between the models, as it satisfies
(3) 
As suggested in kennedyohagan , in what follows we assume for simplicity that is constant, although dependence on has been modeled using regression functions and was shown to be worthwhile le_gratiet_juq ; le_gratiet_doe . At last accounts for measurement noise in the obserbations at level that is assumed to be independent of both and .
2.2 Prior and predictive distributions
Prior to observing any outputs of the multifidelity codes, we assume that and are Gaussian Processes
(4) 
where , are scaling factors and are covariance kernel functions. For the measurement noise we take
(5) 
The kernel functions are modeled using the squared exponential kernel
(6) 
where are the lengthscales along the dimensions. The choice of the covariance kernel depends primarily on the prior belief about the smoothness of the response surface (Ch.4, rasmussen ).
Let now be the experimental design set at level that consists of input points in , and be the observations of the th code . We denote with the parameters introduced at level , so that and , and . For a fixed set of parameter values by stacking all observations to form a vector , we write the predictive distribution of the highest level of code at any new set of test points , for , as
(7) 
where the predictive mean is given by
(8) 
and the predictive variance is
(9) 
In the above expressions we have
(10) 
where the diagonal block matrices are given by
(11) 
where , is the correlation matrix with entries , . The offdiagonal blocks are written
(12) 
with . Analogously, is the correlation matrix with entries , . Further, the vector is defined as , where
(13) 
and and . At last, the variance is defined as
(14) 
for .
2.3 Estimating the model parameters
In order for the above predictive distribution to be of practical use, it is crucial to train the model by means of finding the optimal set of parameters . Pursuing a fully Bayesian approach to model training, although robust, it can be computationally challenging as the number of levels, and consequently the dimension of , increases. It can be therefore preferable in such cases to resort to more efficient strategies, such as maximum likelihood estimation (MLE). For the sake of completeness, we present below both approaches, that we use interchangeably in our numerical examples.
2.3.1 Maximumlikelihood estimation
Here we explore the possibility of training the model using maximum likelihood estimation (MLE). More specifically we seek to minimize the negative loglikelihood, that is to identify such that
(15) 
where
(16) 
writing to emphasize the dependence of the covariance matrix on the parameters. Minimization of can be performed using standard gradient based algorithms. The gradient of with respect to any of its arguments is given by
(17) 
Note that a convenient simplification applies in the special case considered in kennedyohagan ; le_gratiet_juq ; le_gratiet_doe , where the design points corresponding to the observations are nested, that is . By expanding the likelihood using conditional probabilities and making use of the Markov property in eq. (1) we write
(18) 
thus the autoregressive model can be trained by solving distinct optimization problems with respect to the parameters , corresponding to the different levels of fidelity . At an arbitrary level , the loglikelihood is written
(19) 
After differentiating with respect to the components of and setting equal to zero, one can derive the maximumlikelihood estimates
(20) 
and
(21) 
where , , is the set of observations from code corresponding only to the design points in . Further, is a vector of length , filled with ones and is the indicator function that is one in , and zero otherwise. At last, the two estimates given above are dependent on the lengthscales and the noise variance . Those can be estimated by maximizing the concentrated restricted loglikelihood function
(22) 
2.3.2 Markov Chain Monte Carlo sampling
A fully Bayesian updating strategy of the model parameters can be carried out using Markov Chain Monte Carlo (MCMC) sampling in order to generate samples of the posterior distribution of . The latter is written as
(23) 
where the likelihood distribution is with given in eq. (16). The prior distributions of parameters
corresponding to lengthscales at various levels of fidelity are modeled using independent Beta distributions and those corresponding to the variances are modeled using independent inverse gamma distributions. In our implementations we use a MetropolisHastings algorithm
gelman that allows jumps in order to fully explore possible multimodal behavior.3 Multifidelity Gaussian Processes on lowdimensional embeddings
3.1 Dimensionality reduction using projection matrices
As discussed in the introduction, the main goal in this paper is to learn a response surface on a high dimensional input space within a multifidelity context, i.e. by leveraging observations from low accuracy simulators that are cheaper to evaluate. Several shortcomings can make the learning process problematic in the presence of high dimensions and large datasets. For instance, the large number of parameters in the case of anisotropic kernels can result in poor performance of the MCMC algorithm or convergence of the MLE procedure to suboptimal solutions. Below, we develop an dimensionality reduction framework where the ARGP model is trained on a low dimensional input space that is the result of a linear embedding of the original space. We assume throughout this work that the target function can be described or be wellapproximated by a function , defined in a ddimensional space , where , such that
(24) 
Here, is assumed to be a orthonormal matrix that maps the original design space to . The choice of orthonormality is made so that the columns of form a basis on and therefore the matrix itself is a projection from to . That further implies that once is identified, any other set of basis vectors forms a projection that can describe the same approximation of .
Next, it is important to assume that all the lower fidelity codes can be approximated by similar “link” functions defined on the same low dimensional space , that is using the same projection function . We therefore assume that for each
(25) 
for functions , . Thus, we can define the autoregressive model (2) on as
(26) 
where and .
The idea behind this formulation is that, by identifying such that the above approximations are accurate, the multifidelity output quantify of interest is described as a function of a low dimensional input, thus, it becomes simpler to characterize its predictive distribution. Applying the same projection at all levels of fidelity, practically means that all codes exhibit most of their variability within the same “active” subspace, as it was termed by Constantine constantine_AS ; constantine_book . Although this might seem as a strong assumption, in fact, considering that different fidelity codes are typically highly correlated as they simulate the same physical process at different levels of accuracy, the assumption is fairly plausible. Furthermore, it is worth pointing out that the ARGP model (26) where the low dimensional spaces are defined using different , at different levels, does no longer honor the Markov property (1). We do not pursue further such a scenario in this work.
Assigning the same prior distributions as in eq. (4) for the reduced dimensionality ARGP introduced above, results in the same posterior expressions given in eqs. (8)(9), where the covariance matrices describe the correlations of the training points in . By denoting the projections of all design points where the model outputs are observed, as , , we can rewrite the covariance kernels and as functions of the high dimensional inputs in the original space , with entries
(27) 
respectively. By incorporating these expressions in the likelihood and posterior distributions, it becomes clear that can be considered as an additional set of model parameters that needs to be inferred from observations, in a similar manner as in the singlefidelity setting presented in tripathy_bilionis .
3.2 Simultaneous autoregressive GP training and embedding learning
As highlighted above, training the autoregressive GP model and identifying the lowdimensional design space requires learning the model parameters and the rotation matrix simultaneously. To do so, we propose a twostep iterative procedure that iterates between tuning the parameters for a fixed and updating while keeping the parameters fixed. Such algorithms have been used in the past for dimensionality reduction purposes within the context of GP regression tripathy_bilionis ; garnett and Polynomial Chaos adaptations tsilifis_rspa ; tsilifis_cs and have demonstrated great potential. To further justify the choice of updating scheme, we can write the Bayesian posterior of the joint parameters given observations , as
(28) 
Assuming that sampling from the marginal posteriors of and conditional on each other, a Gibbs sampler would consist of generating a Markov chain that eventually converges to the joint posterior above chaspari , therefore generating a chain of and samples in such a fashion will ultimately explore .
In order to carry out such a sampling scheme, we employ the following strategy: When is given, the updating step is performed using the methods presented in Section 2.3. For updating while is kept fixed, we use the Geodesic Monte Carlo method that samples from target distributions defined on embedded manifolds, as it is described in Byrne & Girolami byrne . In our case, is a matrix that takes values on the Stiefel manifold of orthonormal frames on muirhead , and the target density is the marginal posterior . The iterative procedure is summarized in Algorithm 1. The details of Geodesic Monte Carlo sampling are presented in the next section.
3.2.1 Geodesic Monte Carlo
The Geodesic Monte Carlo algorithm developed in byrne is at its core a Hamiltonian Monte Carlo (HMC) sampling technique girolami_hmc defined on a Riemannian manifold embedded in . A Hamiltonian function is defined that describes the dynamics of a spatial variable and is characterized by its target density. Next, the Hamiltonian flows are simulated through numerical integration in order to propose new samples that are to be accepted or rejected, based on a MetropolisHastings step hitchcock . In our case, we are working on the Stiefel manifold that is defined as
(29) 
for , where the special cases and correspond to the set of all orthonormal square matrices and the ()dimensional hypersphere on , respectively. The Hamiltonian corresponding to our target posterior density is defined as
(30) 
where is an auxiliary velocity variable and the dynamics of are described by
(31) 
The key difference of the sampling strategy proposed in byrne from traditional HMC algorithms is that instead of relying on symplectic integrators neal to simulate the Hamiltonian flow, we take advantage of the fact that the dynamics of the kinetic term describe a flow over a geodesic curve that are explicitly known for certain manifolds, therefore numerical integration applies only on the potential term, thus improving accuracy and performance.
In summary, at the th step of the algorithm, a random velocity vector is proposed that is tangent on the manifold at the previously accepted step , and it specifies the direction along which the Hamiltonian is going to move. Then the two terms and are integrated over time by first updating for a time step , followed by updating for using the known geodesic curve formula, and then is integrated again for . The procedure is repeated until a userdefined final time is reached. At last, the resulting “spatial” coordinate will be accepted in the chain with probability
(32) 
The full mathematical details of the integrator scheme for the Hamiltonian flow and the geodesic formulas on are given in A. Algorithm 2 describes one complete acceptreject step of the above procedure. As a prior on , we use a MatrixLangevin (mL) distribution chikuse whose density function is given by
(33) 
where is the normalizing constant that is parametrized by the matrix . Details on the geometric interpretation of the mL density and how to tune the prior parameters are given in B. Note that updating by integrating the potential term requires computing the gradient of the marginal logposterior distribution . The gradient of the marginal logposterior becomes
(34) 
where the likelihood gradient is given by (17) and the gradient of consists of gradients of block matrices , . Those involve differentiating that have th entries
(35) 
At last, the logprior gradient is .
4 Numerical examples
For all numerical examples presented in this section we have used the following settings:

The number of time steps and the stepsize used in the integration of the Hamiltonian in Algorithm 2 are set to and respectively. These values have been carefully selected after numerical experimentation with the algorithm and they provide a moderate acceptance rate while keeping running times to reasonably low levels. Intuitively, a highly accurate integration resulting in high acceptance rate would require a large value for and very small but it would slow down the algorithm significantly. On the other hand, coarse integration using large and small would speed up the algorithm but would reduce the acceptance rate. Another characteristic of the algorithm that manifests in high dimensions for both and is that for an arbitrary initial point the acceptance rate during the first few iterations is significantly low, as a result of the standard Gaussian proposal from which we sample , leading to completely random directions along with the Hamiltonian flow evolves. To improve performance of the algorithm, we change the mL prior at each iteration by setting its parameter . Each iteration stops when one single sample is accepted and we reduce the step size to after successive rejections in order to refine the Hamiltonian integration and reset on the next iteration. We stop sampling as soon as the first sample is accepted.

To train the ARGP using MCMC sampling, we assign beta priors with parameters on the lengthscales and inverse Gamma priors with parameters and on the variance parameters and respectively, for . To proceed with sampling , the hyperparameters are then fixed to the median values of the full MCMC chain after 200 samples are accepted. The MCMC algorithm is run using General Electric Global Research Center’s inhouse Bayesian modeling toolbox GEBHM ghosh2020advances . To train the ARGP using MLE, the loglikelihood or the restricted loglikelihoods are maximized using the BFGS algorithm byrd .
4.1 Academic example 1: Threefidelity model with known 1dimensional embedding
We consider the following three levels of code
(36) 
where so that is a scalar variable and the link functions are given by
(37) 
For this example we take and we generate randomly, by fixing the random seed in order to ensure reproducibility. For our numerical experiments, the projection matrix is fixed to
(38) 
We generate synthetic data that consists of observations obtained on nested design points and from the low, intermediate and high fidelity codes respectively, where , and
. The observations are corrupted by Gaussian noise at all levels of fidelity with their standard deviations being equal to
, and , corresponding to , and respectively. Fig. 1 shows the full training data set along with the true 1dimensional link functions , and . To motivate our study, a singlelevel Gaussian process regression is performed on the highfidelity data points that are available. The predictive mean, depicted with solid blue line, matches the training points but clearly is unable to capture the true model’s fluctuations due to the absence of a sufficient amount of data, let alone the fact that the true and the resulting 1dimensional representation is hypothetically not available to the experimenter. That essentially means that a dimensionality reduction approach performed using only the high fidelity dataset, as proposed in tripathy_bilionis would provide such a predictive mean only in the best case scenario, where would be recovered exactly. In addition. even learning this rotation matrix would be challenging due to the limited data availability and convergence of the two step algorithm would be expected to be extremely slow. Lastly, it is needless to say that gradientbased methods such as constantine_AS would simply fail dramatically due to the poor gradient Monte Carlo estimate using only data points. On the other hand, in a multifidelity GP regression setting in the original 10dimensional design space, the available data might fail to provide meaningful inference results, while training the model becomes again challenging due to the large number of parameters to be inferred, including different lengthscales that are present in the anisotropic kernels, as well as the repeated use of Cholesky decomposition for inverting large covariance matrices.We run Algorithm 1 for this particular setting and after only eleven iterations we obtain the converged rotation. Fig. 2 (left) shows the drawn samples during iterations 511. It can be seen that the values are in full agreement with  which is a valid rotation since the representation of through its link function is invariant under reflections about the origin. Fig. 2 (right) shows the plots of the predictive mean and the true link function from eq. . The excellent agreement between the two is apparent. At last, Table 1 shows the estimated model parameter values . For comparison we display the value of the singlelevel GP regression shown in Fig. 1. As one can conclude from the plot, the singlelevel GP overestimates the lengthscale and interprets the data discrepancies as observation noise (). On the contrary, the obtained ARGP gives high fidelity lengthscale while the noise variances for all fidelities are almost negligible. By capturing accurately the correlations between the different levels and (recall the true values and ), the autoregressive model leverages the low fidelity data effectively and captures the model’s fluctuations even in areas such as the interval where high fidelity training points are fully absent.
GP  5.223  11.601  4.017  

ARGP  0.508  7.022  8.297  1.955  3.296  11.518  17.868  1.242  1.700  13.107  20.123 
4.2 Academic example 2: Threefidelity model with known 2dimensional embedding
A threelevel autoregressive Gaussian Process is presented in this example where the low dimensional embedding is now 2D and the additional challenge of identifying the dimensionality of is also explored. Let be a fixed projection matrix with columns and . The three levels of code are given by the functions
(39) 
We consider again to be the dimensionality of and the projection matrix is fixed to
(40) 
For this example we generate data again from nested design points , and where this time , and . Observations are contaminated with Gaussian noise whose standard deviation is equal to , and at the corresponding levels.
d=1  2.005  0.651  3.082  2.419  0.327  4.901  6.903  2.505  2.510  8.807  21.089 

d=2  (2.67, 4.40)  4.932  15.004  0.123  (2.96, 0.26)  4.756  8.888  0.529  (0.49, 0.31)  4.083  18.466 
We run again Algorithm 1, first for and then for and we report our results below. For the case, the algorithm converged after roughly 12 iterations and, as expected, the trained model fails to capture a suitable embedded space that could honor all training data points and further represent the high fidelity code as a 1dimensional function. Fig. 3 (left), compares the posterior values of accepted while running the Geodesic MC algorithm, with the values of the two columns of the true used to generate the training data, Fig. 3 (right) shows the 1dimensional predictive mean of the posterior ARGP with 3standarddeviationwide confidence bands along with test data points. One can observe that the entries of the inferred tend mostly towards the values of the 2nd column of the true . This can be intuitively explained from the fact that the terms including appear to be more dominant in the overall expression of and particularly the low fidelity model has small impact on the high fidelity code. At the same time, the predictive capabilities of the posterior model clearly fail to span the regions where the additional test data might be observed, indicating that a higher dimensional embedding should be learned. In the case, the situation improves significantly. As can be seen in Fig. 4, top row, the sampled posterior values of and are in agreement with the true values of in both columns. The samples have been obtained during iterations 1020. Furthermore, the bottom row graphs show the 2dimensional predictive mean of the train ARGP model along with test data points and a 45degree line plot comparing observations versus predictions on the same data points. The overall predictive performance of the model is in agreement with the true model output.
5 Airfoil shape optimization problem
The aerodynamic optimization problem analyzed in this section is the full 3D design of a typical large last stage blade (LSB) of an industrial gas turbine (IGT). As the largest rotating component in the turbine, last stage blades are one of the most mechanically challenged components in an IGT and often determine the total power output of the machine. Increasing push for larger and hotter turbines to drive down cost of electricity has resulted in increasingly challenging LSB designs. With deference typically skewed towards durability, this has generally resulted in greater aerodynamic compromises that negatively impact blade efficiency. This also drives longer design cycles as designers incrementally search for acceptable aeromechanical solutions. In this type of design, aerodynamic assessments of power, efficiency, and flow capacity are based on expensive highfidelity computational fluid dynamics (CFD) simulations. Each design iteration requires one or more CFD simulations, depending on the number of inner aerodynamic iterations required to satisfy the cycle’s flow capacity requirements. A full 3D optimization would enable the enhancement of turbine performance. However, the process normally implies a large computational cost because of the high dimensionality of the design space.
In the current work, turbine performance is evaluated using steadystate RANS CFD at two different fidelity levels; a fast running coarse mesh for broader design space exploration, and a slower running fine mesh for accuracy refinement. The 3D airfoil is parametrized using the approach illustrated in Figure 6, which shows the airfoil parametrization at one section on the left, and a view of the full 3D rotor on the right. The airfoil surface is constructed with 7 2D profiles at different spanwise locations from hub to tip. Each airfoil section is characterized by 12 independent parameters. The ranges for these parameters are selected to provide a wide design space while respecting geometrical constraints. The 12 parameters, as sketched in Figure 6, include the stagger angle, leading and trailing edge metal angles, leading edge diameter, suction and pressure side wedge angles, leading edge and trailing edge metal angles, and additional parameters to control the airfoil curvature between the leading and trailing edges.
The 2D sections are aligned relative to each other in circumferential and axial space by aligning the section centers of gravity (CG) along a radial line through the hub section CG (referred to as the stacking line). After the section CGs are aligned, one additional parameter, referred to as the airfoil lean angle, is applied to reorient the stacking line relative to the radial direction. Surfaces are fit through the seven stacked sections to create the full, continuous, 3D airfoil definition. Eighty five total parameters are therefore required to define the complete 3D airfoil shape.
The parameters are expressed as offsets from a baseline, requiring that each section starts from an appropriate reference design. An inhouse software package tailored specifically for turbomachinery design uses the parameters described above to create the airfoil coordinates, and these coordinates are then transformed into a full 3D CAD model of the rotor blade for the CFD grid generation. A 3D structured mesh is built using a commercially available software package. Grid templates are built from the baseline geometry and used consistently for all the cases throughout the optimization. With this approach, all the grids have similar refinement and quality metrics. To simulate the full stage, the upstream stator is included in the CFD calculation for each case.The design of the vane and the vane mesh are not altered through the optimization.
The 3D CFD analysis is performed using GE’s inhouse CFD solver TACOMA, a 2ndorder accurate (in time and space), finitevolume, blockstructured, compressible flow solver seeley ; ren . The steady ReynoldsAveraged NavierStokes (RANS) calculations are solved with a mixing plane between rotating and stationary components. Source terms are included at various locations along the endwalls to simulate the injected cooling, leakage, and purge flows. The two objectives are to maximize the aerodynamic efficiency while matching the baseline degree of reaction. For this problem, with the full stage modeled, the degree of reaction can be calculated using standard turbine definitions.

Aerodynamic efficiency: The efficiency is calculated as the ratio of mechanical power and isentropic power. All inputs required to produce the efficiency value are available from the output of the CFD simulation. Design preference is to maximize efficiency.

Pseudoreaction: In a stage calculation, the degree of reaction indicates the split of flow acceleration between stator and rotor. The degree of reaction can be calculated based on the full stage CFD, so the flow quantities between stator and rotor are extracted from the numerical results for each case. This objective is monitored to ensure that the changes in airfoil shape do not significantly affect the turbine operating condition. The design preference of this objective is to be as close as possible to the baseline value.
Multiple quantities are extracted and processed from the CFD, like ideal Mach number at various spanwise locations. Radial profiles of flow quantities, like pressure and flow angle, are obtained to characterize the quality of flow field propagating from the turbine to the downstream exhaust diffuser, which has not been modeled here. Additional objectives are formulated based on the desirability of the turbine exit flow profile. Diffusion rate, diffusion ratio, and shock intensity are assessed for each case to provide additional selection criteria. The resulting simulated data consists of 160 shape configuration points are generated to be run on the coarse grid and another 120 for the fine grid that have successfully converged to CFD solutions. The two main objectives, efficiency and degree of reaction, are plotted in Figure 7, as variations from each respective baseline. The cases assessed with a low fidelity grid are represented with gray circles, while the cases assessed with high fidelity are represented with blue diamonds. Given the small number of points compared to the number of parameters, a conclusive Pareto front has not yet been identified.
For this problem, we fix the correlation coefficient between the two models to since the different level of accuracy of the two solvers varies as a result of different mesh discretization, unlike the previous examples where higher fidelity codes resulted from scaling the low fidelity ones. We split the available high fidelity observations to 75 training points and 45 test points while all low fidelity observations are used in training the ARGP using MCMC. As expected, Algorithm 1 converges slower than in the previous synthetic examples due to the higher dimensionality. Figs. 8 & 9 show the comparison of the ARGP mean predictions ( 2standard deviations) for the two quantities of interest vs the test observations for reduced dimensionalities . Clearly the fit improves significantly as increases and reported residual mean square error (RMSE) value simply confirms this fact by reducing to as low as & respectively. The quality of fit on the test data for can serve as a stopping criterion when testing for different reduced dimensionalities and no need for further exploration on is required. To further support this claim we report values of the loglikelihodd function and the Bayesian Information criterion (BIC) bishop . The latter is a standard criterion used for model selection and is given by the loglikelihood expression penalized by a term that depends on the number of training data points and model parameters. Specifically,
(41) 
where is the number of model parameters to be inferred and is the maximum likelihood estimate. The largest values of BIC typically indicate the most favorable model. We plot the maximum loglikelihood and the BIC values as a function of in Fig. 10. We observe that the loglikelihood becomes almost constant, indicating that no further improvement in terms of fitting the training set can be achieved. The BIC on the other hand drops at , indicating that the number of parameters (penalty term) becomes significantly large, thus the tradeoff between datafit improvement and number of parameters to be inferred has already a negative trend.
At last, the display the sample values of the 1st column of , denoted as for both QoIs in Fig. 11. The values corresponding to the same input dimension in general do not seem to be similar, indicating that the importance of each input parameter on the different QoIs varies. Most importantly, very few of the entries are near zero, thus only a small number of the parameters in the original input space are negligible with no effect in the output QoIs. This allows us to conclude that the reduction from to is not the result of simply discarding unimportant parameters but that our algorithm was able to reveal the lowdimensional linear embedding that captures the the models observed behavior accurately.
6 Conclusions
We have thus far presented a methodology to exploit low dimensional subspaces for input dimensionality reduction and have demonstrated it’s applicability to uncertainty propagation problems using multifidelity simulations. This was achieved by employing an autoregressive Gaussian Process scheme with a linear embedding that is modeled using an orthogonal matrix that maps the original input variable to its latent counterpart. We treat the matrix as an additional set of hyperparameters and we learn the mapping jointly with training the ARGP by employing a twostep algorithm that train the ARGP using standard MLE and MCMC techniques while it updates the orthogonal matrix using the proposed Geodesic Monte Carlo sampling. We validated our method on synthetic examples with known linear embedding and we further utilised it to study the problem of 3D airfoil optimization by building a low dimensional responce surface to understand the impact of airfoil shape parameters on the particular quantities of interest.
Our work continues on the findings of constantine_AS ; tripathy_bilionis towards the development of a surrogate that enables a fully Bayesian exploration of the active subspace. Quoting Tripathy & Bilionis tripathy_bilionis “the big challenge is the construction of proposals that force to remain on the Stiefel manifold such approaches would open the way for more robust AS dimensionality selection”. We have achieved a Bayesian treatment that relieves us from such headaches by tailoring our HMC sampling method byrne particularly on the Stiefel manifold. In addition, the capability to leverage information from computer codes of a varying degree of accuracy and cost is another novel step that opens new directions in multifidelity simulations for uncertainty quantification, including multioutput Gaussian Processes and multiobjective Bayesian optimization. Such topics are to be explored in future research.
Acknowledgments
The information, data, or work presented herein was funded in part by the Advanced Research Projects AgencyEnergy (ARPAE), U.S. Department of Energy, under Award Number DEAR0001204. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
Appendix A Mathematical background on Geodesic Monte Carlo
a.1 Hamiltonian dynamics on an embedded manifold
Consider the problem of sampling from a probability density defined on a manifold that is isometrically embedded in
and equipped with a metric tensor
, serving as the isometry between the two spaces, thus, preserving the distance. In order to employ a Hamiltonian Monte Carlo algorithm neal , the Hamiltonian function is formed(42) 
and its dynamics are described by
(43) 
Integrating the Hamiltonian on a Euclidean space typically involves a StormerVerlet leapfrog scheme bishop ; hairer which in our case requires inverting the metric tensor . To avoid numerical drawbacks associated with such operations, Byrne & Girolami byrne suggest splitting to two distinct Hamiltonians, namely with dynamics
Comments
There are no comments yet.