1 Introduction and motivation
It is virtually impossible today for a computational simulation to replicate the unsteady, complex, and random nature of fluid moving through an engine at cruise; there are simply far too many unknowns. Blades in a single row are susceptible to manufacturing variations, the casing is never perfectly circular, operating conditions exhibit some variability, and state-of-the-art 3D unsteady solvers model the turbulence and do not resolve all the pertinent scales and their evolution. Our best window into this world is undoubtedly the data that arises from engine tests.
The challenge with engine tests is that they typically have low instrumentation coverage, as there is limited space for instrumentation and limited capacity for acquiring signals from the various measurement devices. Furthermore, access to specific areas of interest in the gas path is extremely limited. In a rig, by contrast, the temperatures, pressures and vibration levels will generally be reduced, and rigs offer greater access for instrumentation. The measurement methods in a rig may also be different from those in an engine, e.g., certain probe designs that are not permissible in an engine may be used in a rig. That said, the rig is never fully representative of the engine; in particular, it does not fully capture the interactions between adjacent components in the engine.
Our focus in this paper will be on on-ground engine temperature measurements, obtained from a series of rakes at a specific axial station. We wish to study these temperature values, as they cannot be reproduced in rigs or CFD. Consider the schematic shown in Figure 1. One can consider the spatial variation of engine temperature to be a superposition of engine modes (not visible in a rig), blade-to-blade modes (likely visible in a rig), and noise. The blade-to-blade modes originating from leakage flows, tip vortices, stator and rotor wakes and upstream potential fields lie at the higher end of the frequency spectrum (see page 9 of [ernst2011analysis]) as these will be functions of the blade numbers. They can be experimentally determined using time-resolved temperature and pressure traverse measurements. Annular asymmetries in these measurements are expected, as a superposition of blade-to-blade modes with the engine modes will result in variations in viscous mixing effects, inviscid wake stretching and the transport of low momentum wake air across the passage [sanders2002multi, mailach2008periodical]. Engine modes, by contrast, are captured by positioning rakes at the same pitch relative to the upstream stator vanes. Asymmetries in engine modes can be introduced by upstream bleed positions, upstream ducts, structural members and downstream potential fields.
So what is typically done with engine measurements? The first notion is to arithmetically average them. In fact, historically, engines were fitted with equal area weighted probes (see [stoll1979effect, francis1989measurement]) permitting relatively straightforward area average calculations. Over the years different averaging techniques have emerged (see Cumpsty and Horlock [cumpsty2006averaging] for a thorough review) including the work-average method of Pianko [pianko1983propulsion], the momentum mixing method of Dzung [dzung1971] and the mass average; all three require details on other primal flow quantities, unlike the area average. Other notions include analyzing radial profiles and using them for engine prognostic and diagnostic efforts.
But how precise is our averaging process given that we do not approximate the spatial flow field itself? Furthermore, how do we account for the uncertainties introduced by both individual measurements and any spatial model? As engine measurements shape our understanding of the aerothermodynamics of the gas path, the answers to these questions are of great importance. Moreover, as these measurements (and subsequent inferences) make their way into preliminary design tools and overall uncertainty budgets, inaccurate estimates of both spatial approximations and their averages can have repercussions on new engine programs, measurement and testing practices, and even subsystem-level design.
In the first part of this two-part paper (see part II in [seshadri2019b]), we develop a framework for studying engine temperature measurements with a more principled, data-centric approach. Specifically, we propose a regression-based model for approximating the spatial flowfield at a given axial plane. To the best of the authors' knowledge, there remains a dearth of publicly available information on this subject; in section 2 we discuss prior work done in this area and highlight our contributions. In section 3 we detail our regression-based model and deploy it on a range of engine datasets. Finally, in section 4, we address the issue of area averaging.
2 Data-centric spatial flowfield approximations
In this section, we outline prior work undertaken from both the modeling and spatial averaging perspectives.
2.1 Spatial modeling and area averaging
As mentioned previously, there has been very little published on spatial approximation, compared to spatial averaging (e.g., the aforementioned methods of Pianko and Dzung). One can broadly classify existing area averaging practices into three categories:

Numeric average: An ensemble average of all the measurements that does not take into account the radial and circumferential locations of the measurements (see page 28 in [skiles1980turbine]).

Area weighted: Each measurement is weighted by the sector area coverage associated with its probe (see page 8 in [skiles1980turbine]).

Fourier 1D: Temperature values along individual rakes are averaged radially (numeric average) and then a Fourier expansion is fitted circumferentially, typically using least squares (for the latter see page 9 of [chilla2019reducing]).
The latter reference is noteworthy as the authors do make use of a least squares strategy to estimate the flowfield at a single radial height.
That said, each of these averaging practices introduces uncertainties that are difficult to accurately quantify. We address the issue of uncertainties in spatial averages and the impact of imprecise measurements in our companion paper [seshadri2019b], and for the remainder of this paper focus solely on obtaining a spatial approximation of the flowfield, which we can then integrate analytically to obtain an area average.
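The difference between the numeric and area weighted practices can be illustrated with a minimal Python sketch. The band-edge convention (edges midway between neighbouring probes, closed off by the hub and casing radii), the function names and all numerical values are assumptions made for illustration; they are not taken from the cited reports.

```python
def numeric_average(temps):
    """Plain ensemble mean over every probe on every rake."""
    flat = [t for rake in temps for t in rake]
    return sum(flat) / len(flat)


def area_weighted_average(temps, radii, r_inner, r_outer):
    """Weight each probe by the annular band it is assumed to cover."""
    # Assumed band edges: hub, midpoints between neighbouring probes, casing.
    edges = [r_inner] + [(a + b) / 2 for a, b in zip(radii, radii[1:])] + [r_outer]
    # Band area is proportional to (outer edge)^2 - (inner edge)^2.
    weights = [hi**2 - lo**2 for lo, hi in zip(edges, edges[1:])]
    total = sum(weights)
    per_rake = [sum(w * t for w, t in zip(weights, rake)) / total for rake in temps]
    # Rakes are assumed to cover equal circumferential sectors.
    return sum(per_rake) / len(per_rake)


# Hypothetical data: two rakes, three probes each, hotter towards the casing.
radii = [0.30, 0.40, 0.50]
temps = [[500.0, 510.0, 520.0], [502.0, 512.0, 522.0]]
num_avg = numeric_average(temps)
area_avg = area_weighted_average(temps, radii, r_inner=0.25, r_outer=0.55)
```

Because the outer bands cover more area, the area weighted value in this example sits above the numeric one. Neither estimate uses any model of the field between the rakes.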
2.2 Supervised machine learning
There are numerous techniques that fall under the moniker of supervised learning. They include linear regression, support vector machines, decision trees, linear discriminant analysis, principal component analysis and neural networks. Foundational texts that delve into techniques for supervised learning include Hastie, Tibshirani and Friedman [friedman2009elements], Rogers and Girolami [rogers2016first] and more recently Goodfellow et al. [goodfellow2016deep]. In a nutshell, these methods are usually trained on a set of data to establish values of the key model parameters. The model itself is typically associated with some basis, e.g. Fourier, polynomial, Gaussian kernels, etc. Establishing the values of these parameters involves optimizing an objective function, which can be defined either with respect to the $\ell_1$ or $\ell_2$ norm or some combination thereof. Once these model parameters have been established, the model is applied to a testing dataset to certify its utility.
The question we are faced with in this paper concerns how we obtain a 2D spatial representation using a few scattered measurements. To this end, conventional nearest neighbor or linear interpolation techniques fail to embed frequency information pertaining to key engine harmonics. Other techniques based on standard fast Fourier decompositions demand far more rakes than can physically be placed in an engine, even for identifying a few key harmonics, owing to the sampling constraints imposed by the Nyquist-Shannon bound. Methods based on $\ell_1$ norm regression, e.g. basis pursuit and the least absolute shrinkage and selection operator (see Chapter 3 in [friedman2009elements]), are contingent on sparsity in the model coefficients, a characteristic that is not necessarily guaranteed. Methods centered on $\ell_2$ norm regression are, in practice, less robust than $\ell_1$ approaches, but are useful in the absence of sparsity in the coefficients. Furthermore, they can be used even with a few measurements, a requirement in our work.
3 Data-driven spatial model
In this section we describe our approach to fit a parametric 2D model to the engine data. A key component of this modeling paradigm lies in our strategy for determining the harmonics.
3.1 The datasets
First, we detail the datasets that will be used throughout this work. We study the temperature measurements obtained from 5 different engines from the same engine family, at the same measurement plane, relative to its upstream and downstream components. The rakes in all the engines are axially placed downstream of a series of outlet guide vanes and are circumferentially positioned to avoid the wakes associated with these vanes. Details of the rake positions in the engines and the number of different extracts are provided in Table 1. Each extract corresponds to a temperature measurement reading at a given engine power setting. The greater the power setting, the higher the mean temperature. Each measurement is obtained by sampling the raw thermocouple voltage at 192 kHz with a rolling average for the last 20 milliseconds. This signal is then filtered to remove noise components at 50 Hz and 400 Hz. Finally, the signal is averaged over a 30 second interval at a sampling rate of 33 Hz. Appropriate calibrations are then applied to convert this voltage into a temperature (in Kelvin) value. In this paper we will assume the uncertainties associated with this temporal averaging are negligible.
Engine  No. of rakes  Rake positions (deg)  Extracts

A  6  54.0, 90.0, 162.0, 234.0, 270.0, 342.0  13
B  6  54.0, 90.0, 162.0, 234.0, 306.0, 342.0  20
C  6  54.0, 90.0, 162.0, 234.0, 306.0, 342.0  26
D  6  54.0, 90.0, 162.0, 234.0, 306.0, 342.0  11
E  8  18.75, 60.625, 140.0, 179.58, 219.375, 258.75, 298.75, 340.0  12
To permit rapid scripting of the measurements and subsequent analysis, the engine data was formatted into a series of XML documents and the Python xml library was used to extract the temperature data for subsequent modeling.
3.2 Multivariate regression model
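We now describe our modeling framework. We are given temperature measurements, obtained from $N$ rakes and $M$ probes per rake at a given axial station in an engine. We define the measurement matrix
$$\mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix} \in \mathbb{R}^{N \times M}, \tag{1}$$
where $\mathbf{t}_i \in \mathbb{R}^{M}$ for $i = 1, \ldots, N$ is a vector of the temperature values for the $i$-th rake. In this section, we will assume that there are no errors in $\mathbf{T}$, and our aim here is to develop a model that best describes the spatial variation in temperature using the finite measurements in $\mathbf{T}$. More specifically, we are trying to determine the whole-engine variation and not the blade-to-blade variation. Given the harmonic nature associated with this sector-to-sector variation, we will use a Fourier expansion in the circumferential direction. We define the Fourier matrix $\mathbf{F} \in \mathbb{R}^{N \times (2k+1)}$ where $k$ represents the number of harmonics. Elements of $\mathbf{F}$ are given by
$$F_{ij} = \begin{cases} 1 & \text{if } j = 1, \\ \sin(\omega_{j/2}\, \theta_i) & \text{if } j \text{ is even}, \\ \cos(\omega_{(j-1)/2}\, \theta_i) & \text{if } j > 1 \text{ is odd}, \end{cases} \tag{2}$$
where $\theta_i$ is the circumferential location of the $i$-th rake in degrees and $\omega_1, \ldots, \omega_k$ are the selected frequencies. The number of harmonics $k$ used in the spatial approximation is chosen such that $2k + 1 \leq N$.
Our objective is to solve the multivariate regression problem
$$\underset{\mathbf{A}}{\text{minimize}} \;\; \left\| \mathbf{F} \mathbf{A} - \mathbf{T} \right\|_F^2, \tag{3}$$
where the columns of the unknown matrix $\mathbf{A} \in \mathbb{R}^{(2k+1) \times M}$ represent the Fourier coefficients at each of the $M$ radial positions associated with the probes. Solutions to (3) can be readily found provided that $\mathbf{F}^T \mathbf{F}$ is not singular. We remark here that (3) does not distinguish between measurements obtained at midspan and those towards the hub or casing. One can, however, premultiply both $\mathbf{F}$ and $\mathbf{T}$ by a weight matrix that allocates greater preference to obtaining a model that is accurate at midspan than at other spanwise locations.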
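The circumferential fit in (3) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function names, the eight equispaced rake positions and the coefficient values are hypothetical, and the sine-then-cosine column ordering is one possible convention.

```python
import numpy as np


def fourier_matrix(theta_deg, omegas):
    """Rows: rakes. Columns: [1, sin(w1 t), cos(w1 t), ..., sin(wk t), cos(wk t)]."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    cols = [np.ones_like(theta)]
    for w in omegas:
        cols += [np.sin(w * theta), np.cos(w * theta)]
    return np.column_stack(cols)


def fit_fourier(theta_deg, T, omegas):
    """Least squares solution of F A = T; one column of A per radial position."""
    F = fourier_matrix(theta_deg, omegas)
    A, *_ = np.linalg.lstsq(F, T, rcond=None)
    return A


# Hypothetical set-up: 8 equispaced rakes, 3 probes per rake, harmonics (1, 2).
theta_deg = np.arange(8) * 45.0
omegas = (1, 2)
A_true = np.array([[500.0, 510.0, 505.0],   # constant term per radial position
                   [4.0, 5.0, 3.0],         # sin(1 * theta)
                   [1.0, 0.5, 0.2],         # cos(1 * theta)
                   [2.0, 2.5, 1.5],         # sin(2 * theta)
                   [0.3, 0.1, 0.4]])        # cos(2 * theta)
T = fourier_matrix(theta_deg, omegas) @ A_true
A_fit = fit_fourier(theta_deg, T, omegas)
```

With noise-free data and a full column rank Fourier matrix, the fitted coefficients coincide with the ones used to generate the measurements.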
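Consider a parameterized analogue of the Fourier matrix $\mathbf{F}$, denoted by $\mathbf{f}(\theta) \in \mathbb{R}^{1 \times (2k+1)}$, where
$$\begin{aligned} \mathbf{f}(\theta) = \big[ \, & 1, \; \sin(\omega_1 \theta), \; \cos(\omega_1 \theta), \; \ldots, && (4) \\ & \sin(\omega_k \theta), \; \cos(\omega_k \theta) \, \big]. && (5) \end{aligned}$$
It is clear that the product $\mathbf{f}(\theta) \mathbf{A}$ corresponds to the temperature values at a specific value of $\theta$ across all $M$ radial locations. Now, to obtain a complete spatial representation of temperature, we need to radially interpolate these values using a polynomial with an appropriately selected degree. Once again, we resort to solving this as a least squares problem. We define the Vandermonde matrix $\mathbf{V} \in \mathbb{R}^{M \times (p+1)}$ where $p$ is the highest degree of the polynomial. Columns of $\mathbf{V}$ have the form
$$\mathbf{V} = \begin{bmatrix} \mathbf{1} & \mathbf{r} & \mathbf{r}^2 & \cdots & \mathbf{r}^p \end{bmatrix}, \tag{6}$$
where $\mathbf{r} \in \mathbb{R}^{M}$ stores the radial locations of the taps (powers are taken elementwise). Then, we solve for the coefficients $\mathbf{c} \in \mathbb{R}^{p+1}$ associated with the least squares problem
$$\underset{\mathbf{c}}{\text{minimize}} \;\; \left\| \mathbf{V} \mathbf{c} - \left( \mathbf{f}(\theta) \mathbf{A} \right)^T \right\|_2^2. \tag{7}$$
Let us now define a parameterized analogue of $\mathbf{V}$, denoted by $\mathbf{v}(r) \in \mathbb{R}^{1 \times (p+1)}$, where
$$\mathbf{v}(r) = \begin{bmatrix} 1 & r & r^2 & \cdots & r^p \end{bmatrix}. \tag{8}$$
Thus the product $\mathbf{v}(r) \mathbf{c}$ gives us the temperature approximation given the tuple $(r, \theta)$. For completeness we give the full expression
$$T(r, \theta) = \mathbf{v}(r) \left( \mathbf{V}^T \mathbf{V} \right)^{-1} \mathbf{V}^T \mathbf{A}^T \mathbf{f}(\theta)^T, \tag{9}$$
where $\mathbf{A} = (\mathbf{F}^T \mathbf{F})^{-1} \mathbf{F}^T \mathbf{T}$. Computing (9) involves matrix vector products for each $(r, \theta)$ tuple. Having an analytical formula also permits us to obtain the overall area average
$$\bar{T} = \frac{1}{\pi (r_o^2 - r_i^2)} \int_{r_i}^{r_o} \int_{0}^{2\pi} T(r, \theta) \, r \, d\theta \, dr, \tag{10}$$
where $r_o$ and $r_i$ are the outer and inner radii respectively, and $\theta$ is expressed in radians for the integration. This formula will be used later when computing area averages given measurements $\mathbf{T}$. Computing the standard root-mean-squared error (with respect to the measurements) is also straightforward and given by
$$\epsilon = \frac{1}{\sqrt{NM}} \left\| \text{vec}(\mathbf{T}) - \left( \mathbf{Q}_V \mathbf{Q}_V^T \otimes \mathbf{Q}_F \mathbf{Q}_F^T \right) \text{vec}(\mathbf{T}) \right\|_2, \tag{11}$$
where one can utilize the thin QR factorization of $\mathbf{F}$, i.e. $\mathbf{F} = \mathbf{Q}_F \mathbf{R}_F$, and $\mathbf{V} = \mathbf{Q}_V \mathbf{R}_V$ for ease in computation. In (11) the symbol $\otimes$ denotes the Kronecker product and vec denotes a vectorized version of a matrix, obtained by stacking columns of the matrix sequentially.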
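The two-stage fit (Fourier in the circumferential direction, polynomial in the radial direction) and a root-mean-squared error computed through thin QR factorizations and a Kronecker product can be sketched as below. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the error routine assumes that both the Fourier and Vandermonde matrices have full column rank, so that each product of a thin Q factor with its transpose is an orthogonal projector.

```python
import numpy as np


def fourier_row(theta_deg, omegas):
    """Parameterized Fourier row [1, sin(w1 t), cos(w1 t), ...] at one angle."""
    t = np.deg2rad(theta_deg)
    row = [1.0]
    for w in omegas:
        row += [np.sin(w * t), np.cos(w * t)]
    return np.array(row)


def fit_spatial_model(theta_deg, radii, T, omegas, degree):
    """Return a callable T(r, theta_deg) plus the matrices F and V."""
    F = np.vstack([fourier_row(t, omegas) for t in theta_deg])
    V = np.vander(radii, degree + 1, increasing=True)   # columns 1, r, ..., r^p
    A = np.linalg.lstsq(F, T, rcond=None)[0]            # (2k+1) x M
    C = np.linalg.lstsq(V, A.T, rcond=None)[0]          # (p+1) x (2k+1)

    def model(r, theta):
        v = np.vander([r], degree + 1, increasing=True)[0]
        return float(v @ C @ fourier_row(theta, omegas))

    return model, F, V


def rms_error(F, V, T):
    """RMS misfit at the measurement points (assumes full column rank F and V)."""
    Qf, _ = np.linalg.qr(F)
    Qv, _ = np.linalg.qr(V)
    t = T.flatten(order="F")                     # vec(T): stack the columns
    fitted = np.kron(Qv @ Qv.T, Qf @ Qf.T) @ t   # vec(Qf Qf^T T Qv Qv^T)
    return np.linalg.norm(t - fitted) / np.sqrt(t.size)
```

The Kronecker form relies on the identity vec(P T S) = (S^T ⊗ P) vec(T) with column-major stacking, which is why the flattening uses `order="F"`.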
3.3 Frequency selection algorithm
This error metric can be used to frame the following optimization problem
$$\underset{\boldsymbol{\omega} \in \mathbb{Z}^{k}, \; \mathbf{A}}{\text{minimize}} \;\; \epsilon(\boldsymbol{\omega}, \mathbf{A}) \quad \text{subject to} \quad \left\| \mathbf{A} \right\|_F \leq \tau. \tag{12}$$
This requires us to iterate over the space of integers for obtaining the frequencies $\boldsymbol{\omega}$, while simultaneously iterating over the space of real matrices $\mathbf{A}$, making (12) a challenging nonconvex optimization problem to solve. Further, solutions have to satisfy the inequality constraint $\| \mathbf{A} \|_F \leq \tau$ where $\tau$ is a constant tailored to avoid solutions with large norms. We expand upon this salient point below.
For some combinations of frequencies $\boldsymbol{\omega}$ and rake positions $\boldsymbol{\theta}$, the Fourier matrix $\mathbf{F}$ can be ill-conditioned, implying that the linear equations are underdetermined [hansen2010discrete]. Standard least squares solves on such matrices result in $\mathbf{A}$ having a large norm, manifesting in our problem as large amplitude overshoots between measurement points. This would result in Fourier series expansions that closely match the temperature values at the prescribed $\boldsymbol{\theta}$, but give wildly varying temperature values at other circumferential locations. As such overshoots are not physical and entirely numerical, we regularize the problem with the addition of the inequality constraint in (12).
While iterative optimization strategies can be constructed to solve (12), we pursue a different approach. We first select a suitable set of frequencies $\boldsymbol{\omega}$, construct the Fourier matrix $\mathbf{F}$ and then find the coefficients $\mathbf{A}$ by solving the regularized least squares problem
$$\underset{\mathbf{A}}{\text{minimize}} \;\; \left\| \mathbf{F} \mathbf{A} - \mathbf{T} \right\|_F^2 + \lambda \left\| \mathbf{A} \right\|_F^2. \tag{13}$$
Here $\lambda$ is a scalar value, chosen such that one obtains a favorable compromise between a sufficiently smooth solution and a small residual. A well-worn criterion for selecting $\lambda$ is based on finding the knee of the L-curve (see Sec. 4.7 in [hansen2010discrete]): a log-log scatter plot of the solution norm $\| \mathbf{A} \|_F$ on the horizontal axis and the residual norm $\| \mathbf{F} \mathbf{A} - \mathbf{T} \|_F$ on the vertical axis, plotted for different values of $\lambda$. A representative example is shown in Fig. 2 for an extract from Engine A. Each marker here denotes the solution norm and the residual norm for a particular choice of $\lambda$. A range of $\lambda$ values was swept in our studies (and in the plot in Fig. 2). Non-graphical methods for finding $\lambda$ can be found in [castellanos2002triangle].
Once an appropriate value for $\lambda$ has been chosen, one solves for $\mathbf{A}$ using
$$\mathbf{A} = \left( \mathbf{F}^T \mathbf{F} + \lambda \mathbf{I} \right)^{-1} \mathbf{F}^T \mathbf{T}. \tag{14}$$
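A minimal sketch of the regularized solve and of the points that make up an L-curve is given below. The function names are hypothetical, the regularization is written with an unsquared parameter `lam` multiplying the identity (an assumed convention), and locating the knee of the curve is left to inspection rather than automated.

```python
import numpy as np


def ridge_fit(F, T, lam):
    """Closed-form Tikhonov solution (F^T F + lam I)^{-1} F^T T."""
    n_cols = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(n_cols), F.T @ T)


def l_curve(F, T, lams):
    """Return (solution norm, residual norm) for each candidate lam."""
    points = []
    for lam in lams:
        A = ridge_fit(F, T, lam)
        points.append((np.linalg.norm(A), np.linalg.norm(F @ A - T)))
    return points
```

Plotting these pairs on log-log axes for a sweep such as `np.logspace(-6, 2, 50)` gives the characteristic L shape: increasing `lam` shrinks the solution norm at the cost of a larger residual.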
We encode this approach into our overall brute force frequency selection strategy, shown in Algorithm 1. The idea here is to iterate over the different frequency combinations and identify those that yield low values of the RMS error. Our choice of the number of harmonics $k$ is dictated largely by the number of rakes in Engines A, B, C and D (6 in all of them), which permits $\mathbf{F}$ to have a maximum of 6 columns. We therefore restrict ourselves to two frequencies, $k = 2$, which results in $\mathbf{F}$ having dimensions of $6 \times 5$. The maximum value of the frequencies in each pair is also capped to focus on lower harmonics.
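A brute force search in the spirit of this strategy might be sketched as follows. It is not a transcription of Algorithm 1: the function names are hypothetical, a single fixed regularization value `lam` stands in for a per-pair L-curve selection, and the error is evaluated on the circumferential fit only.

```python
import itertools

import numpy as np


def fourier_matrix(theta_deg, omegas):
    """Columns: [1, sin(w1 t), cos(w1 t), ..., sin(wk t), cos(wk t)]."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    cols = [np.ones_like(theta)]
    for w in omegas:
        cols += [np.sin(w * theta), np.cos(w * theta)]
    return np.column_stack(cols)


def rank_frequency_pairs(theta_deg, T, max_freq, lam):
    """Return {(w1, w2): RMS error} for every pair with 1 <= w1 < w2 <= max_freq."""
    errors = {}
    for pair in itertools.combinations(range(1, max_freq + 1), 2):
        F = fourier_matrix(theta_deg, pair)
        n_cols = F.shape[1]
        # Regularized solve written as least squares on an augmented system.
        F_aug = np.vstack([F, np.sqrt(lam) * np.eye(n_cols)])
        T_aug = np.vstack([T, np.zeros((n_cols, T.shape[1]))])
        A = np.linalg.lstsq(F_aug, T_aug, rcond=None)[0]
        errors[pair] = float(np.sqrt(np.mean((F @ A - T) ** 2)))
    return errors
```

Sorting the returned dictionary by value, e.g. `sorted(errors, key=errors.get)`, lists the candidate pairs from lowest to highest error. Ties or near-ties between pairs are a sign of aliasing at the chosen rake positions.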
In Algorithm 1 we have made the assumption that regularization is necessary. To further examine this assumption, consider the results shown in Fig. 3 both with and without regularization (the latter obtained by commenting out lines 4 to 7 in Algorithm 1). Shown here are the values for the error $\epsilon$, the solution norm $\| \mathbf{A} \|_F$ and the condition number of $\mathbf{F}$ (all on a base-10 logarithmic scale) for various frequency pairs. For the regularized case, we plot the condition number of the augmented system given by
$$\begin{bmatrix} \mathbf{F} \\ \sqrt{\lambda} \, \mathbf{I} \end{bmatrix}, \tag{15}$$
where $\lambda$ is the regularization parameter. It should be clear from Fig. 3 that, although regularization does increase $\epsilon$ for certain frequency pairs, it offers a more stable system of linear equations, resulting in physically plausible temperature variations. We reiterate the importance of having solutions with small norms, as they are unlikely to exhibit massive oscillations between measurement points, i.e., they will be smoother. In other words, it is better to have solutions that have nonzero residuals and are smooth, than solutions that have a residual of zero and are not smooth.
3.4 Identifying suitable harmonics
We apply Algorithm 1 to the 70 extracts in Engines A, B, C and D with $k = 2$. Fig. 4 plots the average values of the RMS error $\epsilon$ for the different frequency pairs. It is apparent from these results that across all these extracts, which correspond to a range of different operating points on each engine's power curve, there are four frequency pairs that consistently yield low errors.
Full spatial representations of temperature associated with these harmonics are shown in Fig. 5 for the first extract of Engine A. The contours are generated using the Fourier series expansion in the circumferential direction, based on the selected frequency pair, and a quadratic polynomial-based extrapolation in the radial direction, as explained in Sec. 3.2. The colored circular markers shown in the four subfigures of Fig. 5 denote the thermocouple positions on the six rakes and their corresponding temperature values. In what follows, we focus our attention solely on the four harmonic pairs identified above.
Before we move on, it will be worthwhile to discuss aliasing in the context of our frequency selection algorithm. Aliasing (see page 91 in [strang1996wavelets]) is an artifact that causes signals to be indistinguishable when sampled; typically the original signal is sampled at an insufficient number of points to recreate it in its entirety. In our context, based on the placement of the rakes and the frequencies selected for approximation, one can get multiple frequencies that interpolate the sampled data points exactly.
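Aliasing at the rake positions can be checked directly. The short Python sketch below uses the Engine B positions from Table 1, which are all integer multiples of 18 degrees; the helper names and the particular harmonics compared are chosen for illustration only.

```python
import math

# Rake positions of Engine B in Table 1 (degrees); all are multiples of 18.
rakes_deg = [54.0, 90.0, 162.0, 234.0, 306.0, 342.0]


def sample(fn, omega, thetas_deg):
    """Evaluate fn(omega * theta) at each rake position."""
    return [fn(math.radians(omega * t)) for t in thetas_deg]


def max_gap(u, v):
    """Largest absolute difference between two sampled signals."""
    return max(abs(a - b) for a, b in zip(u, v))


# Because 360 / 18 = 20, harmonics omega and omega + 20 coincide at the rakes,
# and the cosines of omega and 20 - omega coincide as well.
gap_sin = max_gap(sample(math.sin, 2, rakes_deg), sample(math.sin, 22, rakes_deg))
gap_cos = max_gap(sample(math.cos, 4, rakes_deg), sample(math.cos, 16, rakes_deg))
# A pair that is not aliased differs visibly at the same positions.
gap_other = max_gap(sample(math.sin, 2, rakes_deg), sample(math.sin, 3, rakes_deg))
```

Both `gap_sin` and `gap_cos` vanish to rounding error, so a fit cannot tell these harmonics apart from the rake data alone, whereas `gap_other` is of order one.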
3.5 Training and testing the model
Thus far we have not distinguished between training and testing datasets, a key practice in data-centric approaches. One typically infers the parameters of a model on the training data, and tests its suitability on the testing data. Bootstrap or $K$-fold cross-validation are widely adopted (see Chapter 7 in [friedman2009elements]). With regard to the latter, caution must be exercised when selecting the number of folds: too many and the cross-validation estimator can have high variance as the training sets are similar; too few and we risk incurring a bias. In what follows, we train and test our multivariate regression model on Engine E. Recall our prior experiments on Engines A, B, C and D have pointed us towards four particular harmonic pairs. Here we aim to further prune down the number of harmonics that best estimate the spatial temperature field across all the engines (assuming that the same engine modes are prevalent over different engines measured at the same measurement plane, relative to upstream and downstream components).
We utilize a leave-$P$-out cross-validation on Engine E; in the case of Engine E, we leave out two rakes for testing while six rakes are used for training. While we note that this engine has more rakes than Engines A, B, C and D, we wish to avoid overfitting the data, and thus restrict ourselves to using six rakes for training. For clarity, we denote the position of the circumferential rakes used for training by $\boldsymbol{\theta}_{\text{train}}$ and those for testing by $\boldsymbol{\theta}_{\text{test}}$. Definitions of the measurements follow suit, with $\mathbf{T}_{\text{train}}$ and $\mathbf{T}_{\text{test}}$. Finally, the training Fourier matrix is specified by
$$\mathbf{F}_{\text{train}} = \mathbf{F}(\boldsymbol{\theta}_{\text{train}}) \in \mathbb{R}^{6 \times (2k+1)} \tag{16}$$
and the testing one by
$$\mathbf{F}_{\text{test}} = \mathbf{F}(\boldsymbol{\theta}_{\text{test}}) \in \mathbb{R}^{2 \times (2k+1)}, \tag{17}$$
where $\boldsymbol{\theta}_{\text{train}}$ holds six of the eight rake positions and $\boldsymbol{\theta}_{\text{test}}$ the remaining two rakes. In cross-validation our objective is to minimize the predictive error, given by
$$\epsilon_{\text{pred}} = \frac{1}{\sqrt{2M}} \left\| \mathbf{F}_{\text{test}} \mathbf{A}_{\text{train}} - \mathbf{T}_{\text{test}} \right\|_F, \tag{18}$$
where $\mathbf{A}_{\text{train}}$ denotes the coefficients obtained from the training data and $M$ is the number of probes per rake.
Consider the results in Fig. 6 shown for a single combination of training and testing rakes, chosen randomly and repeated for the four different frequency pairs. The plots show the predictive error $\epsilon_{\text{pred}}$ as a function of the various extracts.
Given that we have eight potential rake locations and we restrict ourselves to selecting only six for training (and the remaining two for testing), we have $\binom{8}{6} = 28$ possible combinations. For each of these 28 rake arrangements, we compute the errors; the results are shown in Fig. 7. The shaded transparent markers show the predictive errors for the different trials (rake arrangements), while the thick lines denote the mean values of $\epsilon_{\text{pred}}$ over the 28 trials. It is clear from these results that, on average, one of the four frequency pairs yields the lowest value of $\epsilon_{\text{pred}}$, both in expectation and in variance; the latter is shown by the reduced scatter in the results.
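The leave-two-out procedure over all 28 rake arrangements can be sketched as below. The function names are hypothetical, and the sketch uses a plain least squares fit on the training rakes rather than the regularized solve, which is a simplification.

```python
import itertools

import numpy as np


def fourier_matrix(theta_deg, omegas):
    """Columns: [1, sin(w1 t), cos(w1 t), ..., sin(wk t), cos(wk t)]."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    cols = [np.ones_like(theta)]
    for w in omegas:
        cols += [np.sin(w * theta), np.cos(w * theta)]
    return np.column_stack(cols)


def leave_two_out_errors(theta_deg, T, omegas):
    """Predictive RMS error on the two held-out rakes for every split."""
    n_rakes = len(theta_deg)
    theta_deg = np.asarray(theta_deg, dtype=float)
    errors = []
    for test_idx in itertools.combinations(range(n_rakes), 2):
        train_idx = [i for i in range(n_rakes) if i not in test_idx]
        F_train = fourier_matrix(theta_deg[train_idx], omegas)
        F_test = fourier_matrix(theta_deg[list(test_idx)], omegas)
        # Fit on the training rakes only, then predict the held-out rakes.
        A = np.linalg.lstsq(F_train, T[train_idx], rcond=None)[0]
        resid = F_test @ A - T[list(test_idx)]
        errors.append(float(np.sqrt(np.mean(resid ** 2))))
    return errors
```

With eight rakes the loop visits 28 splits; the mean and the scatter of the returned list correspond to the two quantities compared across frequency pairs in the text.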
4 An assumed case study
In section 2 we mentioned that owing to the constraints imposed by the Nyquist-Shannon sampling theorem, we cannot recover the amplitudes and phases of all the harmonics using classical fast Fourier transform techniques. We also commented that, although methods based on $\ell_1$ norm minimization can potentially aid in signal recovery, they require sparsity in the coefficients. In this section, we study two particular aspects of these statements with the objective of recovering the entire spatial temperature profile (not simply the first two harmonics) with sparse measurements.
In our analysis thus far, we have operated under the premise that we do not know what the true spatial variation in temperature is. To examine our methods, and to offer techniques to the wider turbomachinery community, we study an analytically generated temperature profile, shown in Fig. 8. This profile has an average temperature value of 526.85 K and is composed of four harmonics.
4.1 Identifying the harmonics
Our goal here is to apply the regularized least squares approach from Sec. 3.3 on this temperature profile to determine whether the approach can capture the two dominant frequencies with only 6 rakes; in some sense this serves as a validation of Algorithm 1.
We study four different sets of virtual rake positions, defined by the values of $\boldsymbol{\theta}$, shown in Table 2. The first is based on the arrangement in Engine A, while the other three are randomly selected such that there is at least one rake in each of the four quadrants. Fig. 9 plots the RMS errors $\epsilon$ (once again on a base-10 logarithmic scale) using all 6 rakes, i.e., we do not split the data into a testing and training set. It is clear that across all four rake positions, the pair of dominant frequencies does yield the lowest error; however, there are other frequency pairs that also yield low errors, but only for some sets of rake positions.
Case  Rake positions (in deg) 

I  54, 90, 162, 234, 306, 342 
II  15, 45, 123, 190, 250, 316 
III  60, 114, 180, 250, 310, 351 
IV  0, 75, 150, 220, 250, 320 
The temperature profiles associated with this frequency pair are illustrated in Fig. 10 for the four different rake positions. Although the rake positions are different, the spatial profiles are similar. In fact, even if the rakes were circumferentially equidistant from each other, one can still recover similar patterns to Fig. 10.
4.2 Signal approximation
In this subsection, we are interested in ascertaining whether we can estimate the amplitudes and phases of all four harmonics given 6 rakes, assuming a priori information on the harmonics is known. This leads to the solution of an underdetermined system of equations $\mathbf{F} \mathbf{A} = \mathbf{T}$ where $\mathbf{F} \in \mathbb{R}^{6 \times 9}$. While there are infinitely many solutions to such a linear system, we are interested in the solution that has the lowest $\ell_2$ norm. Note that this solution strategy is different from the previously discussed L-curve approach, which is typically used on tall or square matrices; our matrix here is fat with more columns than rows. One approach to compute the minimum norm solution is given below. Let
$$\mathbf{F}^T \mathbf{P} = \mathbf{Q} \mathbf{R} = \begin{bmatrix} \mathbf{Q}_1 & \mathbf{Q}_2 \end{bmatrix} \begin{bmatrix} \mathbf{R}_{11} & \mathbf{R}_{12} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \tag{19}$$
be the pivoted QR factorization of $\mathbf{F}^T$ (the pivoted QR factorization is a well-known heuristic for subset selection; see page 276 in Golub and Van Loan [golub2013matrix]), where $\mathbf{P}$ is a permutation matrix, $\mathbf{Q}$ is an orthogonal matrix and $\mathbf{R}$ has the submatrix $\mathbf{R}_{11}$ that is upper triangular. These matrices have been partitioned based on the numerical rank of $\mathbf{F}$, which depends on the rake positions in $\boldsymbol{\theta}$. In this example we use rake positions corresponding to case I, which yields a rank of 5. Thus, $\mathbf{R}_{11} \in \mathbb{R}^{5 \times 5}$ (and is invertible), and $\mathbf{R}_{12} \in \mathbb{R}^{5 \times 1}$. The least norm solution is then given by
$$\mathbf{A} = \mathbf{Q}_1 \mathbf{R}_{11}^{-T} \begin{bmatrix} \mathbf{I}_5 & \mathbf{0} \end{bmatrix} \mathbf{P}^T \mathbf{T}. \tag{20}$$
Fig. 11 plots the circumferential distribution of temperature at measurement points close to the hub, midspan and casing. The red line in this figure corresponds to the least norm solution (obtained using (20)), the thick gray line corresponds to the standard least squares approximation with only the first two harmonics, and the black line represents the true temperature distribution.
While the least norm solution does not offer perfect signal recovery, it is able to match the measurements exactly at the measurement locations, and offers an acceptable approximation to the true temperature distribution. It should be noted that this heuristic works even if we alter the amplitudes of the four harmonics—even setting them to be equivalent.
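A minimum norm fit for a fat Fourier matrix can be sketched with NumPy as follows. Note the swapped-in technique: instead of a pivoted QR factorization, this sketch uses `np.linalg.lstsq`, whose SVD-based solver returns the minimum norm solution for rank-deficient systems. The function names, the `rcond` cut-off and the example frequencies are assumptions.

```python
import numpy as np


def fourier_matrix(theta_deg, omegas):
    """Columns: [1, sin(w1 t), cos(w1 t), ..., sin(wk t), cos(wk t)]."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    cols = [np.ones_like(theta)]
    for w in omegas:
        cols += [np.sin(w * theta), np.cos(w * theta)]
    return np.column_stack(cols)


def least_norm_fit(theta_deg, T, omegas, rcond=1e-10):
    """Minimum norm least squares solution of the fat system F A = T."""
    F = fourier_matrix(theta_deg, omegas)
    # Singular values below rcond * largest are treated as zero (assumed cut-off).
    A, _, rank, _ = np.linalg.lstsq(F, T, rcond=rcond)
    return A, rank
```

For six rakes and four harmonics the matrix has 6 rows and 9 columns, so `rank` is at most 6; when the measurements are consistent with the chosen harmonics, the returned coefficients reproduce them at the rake positions while carrying the smallest possible norm.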
4.3 A comparison of area averages
It is worth reiterating our motivation for approximating the full spatial temperature field. First, images such as those shown in Fig. 10 are useful to understand key engine modes and to identify spatial asymmetries. They are also very important for computing averages.
Recall the discussion in section 1 on averaging practices. For the engine measurements considered throughout this paper, we do not have mass-flow rate distributions to compute mass-weighted or momentum-weighted averages. It is not uncommon to area average the temperature measurements and then compute an area-to-mass conversion factor, as shown in Fig. 12. This conversion is usually estimated by CFD on the engine component; a process that in itself has uncertainties well beyond the scope of this work. What we are concerned with here is the way the area average in Fig. 12 is computed.
When using the standard area averaging approach, each measurement is weighted by the sector area coverage associated with its probe (see page 8 in [skiles1980turbine]). Thus, no allowance is made to account for scenarios where the rake positions may capture only the peaks or the troughs of the wave forms. In fact, for the assumed profile considered in this section, the area weighted average technique yields a temperature of 526.20 K.
For integrating the approximated spatial field, consider the formula in (10). This can be written as
$$\bar{T} = \frac{2}{r_o^2 - r_i^2} \int_{r_i}^{r_o} \mathbf{v}(r) \left( \mathbf{V}^T \mathbf{V} \right)^{-1} \mathbf{V}^T \mathbf{a}_0 \, r \, dr, \tag{21}$$
where we have used the fact that the harmonic terms in $\mathbf{f}(\theta)$ (all terms except the first one), when integrated from 0 to $2\pi$, become zero. Here $\mathbf{a}_0$ corresponds to the first column of $\mathbf{A}^T$, which comprises the constant terms of each of the Fourier expansions. As both approximations in Fig. 11, corresponding to two and four harmonics, obtain the same value of the constant terms, their area averages computed using (21) are equivalent. We report an area average of 526.85 K, which is greater than the area weighted value and matches the profile average temperature value. Area average temperature comparisons across extracts in Engines A through E, between (21) and the sector weighted area average value, revealed differences between 0.5 and 2 K.
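The claim that only the constant Fourier term contributes to the area average can be checked numerically. The sketch below integrates a hypothetical field over an annulus with a midpoint rule; the field, the radii and the grid resolution are all assumptions made for illustration.

```python
import math


def area_average(field, r_in, r_out, n_r=200, n_theta=360):
    """Midpoint-rule area average of field(r, theta) over an annulus (theta in radians)."""
    dr = (r_out - r_in) / n_r
    dth = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = r_in + (i + 0.5) * dr
        for j in range(n_theta):
            th = (j + 0.5) * dth
            total += field(r, th) * r * dr * dth
    return total / (math.pi * (r_out**2 - r_in**2))


# Hypothetical field: a radially varying constant term plus two harmonics.
def constant_term(r, th):
    return 520.0 + 20.0 * r


def full_field(r, th):
    return constant_term(r, th) + 8.0 * r * math.sin(th) + 3.0 * math.cos(4.0 * th)


avg_full = area_average(full_field, 0.25, 0.55)
avg_const = area_average(constant_term, 0.25, 0.55)
```

The two averages agree to rounding error because the sine and cosine terms integrate to zero over a full revolution, regardless of their amplitudes or radial dependence.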
Conclusions
In this paper, we have developed a data-centric model for engine temperature measurements. Our model takes as an input the temperature values obtained from a few circumferentially placed rakes and outputs a 2D spatial temperature field. A key component of our strategy is an iterative approach for selecting frequency pairs (restricted in our investigation to the first two harmonics due to limits on the number of rakes) that introduces regularization to avoid solutions with large norms, which manifest as large overshoots between successive data points.
There is a compelling case to be made regarding area averaging using our strategy. Rather than computing an average solely based on temperature values and their positions, our framework fits a 2D spatial model to the data to estimate its average. The model need not necessarily capture all the spatial harmonics, but in some cases it can still deliver the true area average. Our investigation in this paper has also revealed the importance of rake positions and the assumed temperature values themselves. Future work will be aimed at studying the impact of uncertainties in these measurements.
Acknowledgements
The authors are grateful to Raúl Vázquez Díaz (Rolls-Royce) and John Longley (Cambridge University). This work was funded by Rolls-Royce plc; the authors are grateful to Rolls-Royce for permission to publish this paper.