1 Introduction
Noise, vibration and harshness (NVH) characteristics of a vehicle are an important validation criterion during the development phase, as they significantly affect the customer's quality perception and the overall image of the vehicle. For automotive OEMs, a significant challenge is to continuously develop new methods for estimating performance indicators while carrying out a large number of testing campaigns, which pushes them towards simulation-driven design processes. During the vehicle development phase, the vibration response and acoustic levels can be determined using physics-based numerical simulations of complex full-vehicle structural-acoustic computational models (refined 3D finite element models), which are usually time consuming. The highly uncertain behavior that arises from manufacturing tolerances, natural variability in material properties and the conditions employed during physical testing procedures [durand2008structural] leads to a challenging level of uncertainty from a modeling perspective. In such cases, methods based on time-consuming physics-based simulations are not relevant, as no precise (unique) design nor detailed information about the vehicle and the powertrain is available. As a result, a comprehensive framework for vehicle NVH assessment that combines domain expertise and experimental databases is needed, along with fast computing models. Hence, the early-stage design aspects raise the following questions. First, how much information should be considered available to the designer in order to derive useful conclusions? Second, how can the available measurement databases be exploited quantitatively to provide the best relevant prior knowledge together with a measure of uncertainty in the outputs?
In light of these challenges, so-called "metamodels" or "surrogate" models can be used to replace the computationally intensive simulation models or the measurement data (retrieved through physical testing), using analytical relations between the design variables and the system responses. Metamodels make it possible to quickly explore design alternatives through parameterized calculations, when the general relative performance of design alternatives is more important than a precise estimate. Conceptualized in 1974 by Blanning and popularized by Kleijnen as a "model of a model" [kleijnen1975comment], metamodels have been used as "cheap" yet robust approximations to gain better insight into functional relationships. Metamodelling techniques have been applied in many engineering disciplines, such as crashworthiness, engine modelling, structural reliability, NVH and so on [kianifar2020performance]. In the context of automotive NVH, metamodels based on design of experiments were employed to achieve minimal piston noise [hoffman_robust_2003], the response surface method was used to minimize the sound pressure level inside the cabin [zheng_design_2015]
, radial basis functions were used to reduce the vehicle mass under vibration constraints [kiani_comparative_2016], and Park et al. [park_efficient_2020] reduced the structure-borne noise using Kriging surrogate models. Kuznar et al. [kuvznar2012improving] proposed a novel approach to learning regression models that are consistent with domain-expert knowledge, in the form of quantitative constraints, for aerodynamic wind noise. In several other studies, researchers compared and benchmarked different techniques, as shown in [chen_multiobjective_2018, ibrahim_surrogatebased_2020]. A detailed review of different metamodelling techniques can be found in [wang2007review]. In order to quantify uncertainties, deterministic metamodels are not sufficient; instead, probabilistic metamodels are used, where the output response is not just a point estimate but a probability distribution. This provides flexibility to evaluate design alternatives with a given level of knowledge and metamodel complexity. Monte Carlo (MC) simulation techniques have primarily been used by researchers for the probabilistic quantification of uncertainties. In the field of computational mechanics, several contributions have been made by researchers, which are detailed in
[soize2017uncertainty]. More precisely, in the automotive industry, Durand et al. built a nonparametric model to capture the variability in booming noise prediction through random matrices [durand2008structural], Barillon et al. proposed a methodology to quantify the variability in booming noise and body-in-white [barillon2012vibro], and recently, Brogna et al. [brogna2018probabilistic] used Bayesian approaches with Gibbs sampling to model the global vibro-acoustic behaviour. In this regard, a Bayesian approach to metamodelling is well suited, as it allows prior knowledge (based on domain expertise) about the parameters of the system under consideration to be included. In this work, a global metamodelling technique is presented, based primarily on the parameters that are selected at early stages of the design process. Section 2 describes the proposed Bayesian metamodelling workflow with the relevant theoretical foundations. The application of the developed methods to assess interior cabin noise in vehicles is described in Section 3. Section 4 describes the model assessment with convergence statistics, and the conclusions are presented in Section 5.
2 Stochastic metamodelling for uncertainty quantification
2.1 Proposed workflow
Here, a general workflow for stochastic metamodelling is proposed, taking into account the predominant physical laws. fig_workflow describes the different steps involved in the workflow. Operating point (OP) conditions are typically the client usage profiles or driving conditions that are collected and represented in the form of a distribution function. These OP conditions usually consist of the vehicle speed and the wheel torque, which marks the starting point of the metamodelling toolchain. Then, the laws governing the physical process are identified using "first principles" (white-box) and are combined with a low-fidelity data-driven regression fit representing the black-box part of the model. This forms an additive functional (grey-box) modelled using Generalized Additive Models (GAMs) [hastie2017generalized] of two (or multiple) predictor variables. The model is evaluated using $k$-fold cross-validation. To quantify the uncertainties, selection parameters are used to refine the data pertaining to a specific vehicle type. This allows us to consider different datasets for the same model structure (for instance, the data used for one vehicle category will be different from the data used for another) on the basis of different selection parameters, and the Bayesian learning method is applied on one such sub-dataset. The Bayesian part of the metamodelling is described in Section 2.4.
2.2 Generalized Additive Models
In this research work, Generalized Additive Models (GAMs) [hastie2017generalized]
are exploited for modelling the deterministic surrogates for aerodynamic noise with respect to multiple parameters. In this paper, a lowercase character denotes a scalar variable, a lowercase bold character represents a vector, and a matrix is expressed as an uppercase bold character.
A Generalized Linear Model (GLM) relaxes the assumptions of a classical linear model by considering a monotone transformation of the mean response to be a linear function of the predictors, and by letting the conditional distribution of the response belong to a one-parameter exponential family of densities rather than just being Gaussian [ruppert2003semiparametric]. A GAM is an extension of the GLM with possibly nonlinear predictors, where smooth functions of the predictors are considered in an additive fashion. Let us assume that the observations $(y_i, x_{i1}, \dots, x_{ip})$, $i = 1, \dots, n$, are given, where $y_i$ is the response variable from the vector of observed responses $\mathbf{y} = (y_1, \dots, y_n)^\top$, $n$ being the total number of observed samples, and $x_{i1}, \dots, x_{ip}$ represent the different predictor variables from their respective vector quantities, with $p$ being the total number of predictor variables. The general form of the one-parameter exponential family of densities is $f(y; \theta, \phi) = \exp\{[y\theta - b(\theta)]/a(\phi) + c(y, \phi)\}$, where $\theta$ and $\phi$ determine the respective distributions. The exponential family includes many distributions useful for practical modelling, such as the Gaussian, Poisson, binomial, and gamma. The additive problem can be formulated for a particular observed point as

$$y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i, \qquad (1)$$

where $\varepsilon_i$ represents the fitting error and the model error, which is supposed to follow $\mathcal{N}(0, \sigma^2)$. Considering $\beta_0$ to be the intercept (or bias) term, the conditional expectation of $y_i$, denoted as $\mu_i = \mathbb{E}[y_i \mid x_{i1}, \dots, x_{ip}]$, depends on the predictor variables and is given by

$$\mu_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}), \qquad (2)$$

where $f_j$ denotes a series of smooth functions of the predictor variables $x_j$. Each of these functions can be approximated with finite basis expansions such as

$$f_j(x_j) = \sum_{k=1}^{K_j} \beta_{jk}\, b_{jk}(x_j), \qquad (3)$$

where $b_{jk}$ are the basis expansion functions, $\beta_{jk}$ the unknown parameter values, and $K_j$ the total number of bases in each function.
Most commonly, these unknown functions $f_j$ and parameters $\beta_{jk}$ are fitted using penalized splines [ruppert2003semiparametric]. For this study, the focus is on considering the white-box knowledge available in the form of analytical equations: the function $f_1$ models these known analytical relations, leading, for two predictor variables, to the grey-box model

$$y = \beta_0 + f_1(x_1) + f_2(x_2) + \varepsilon. \qquad (4)$$

Therefore, two different deterministic formulations for the data-driven part $f_2$ are considered, depending on the choice of basis functions. It is also important to build models where the parameters have a physical sense and are interpretable, which in turn allows more control and flexibility over the output response. Therefore, a parametric surrogate modelling approach is investigated, where we have a fixed number of parameters that does not grow with the size of the input data set. Such parametric models make stronger assumptions on the nature of the data distribution and are generally faster than nonparametric models, which are more flexible but often computationally intractable [murphy_machine_2012].
Deterministic model 1: with polynomial basis functions

The first model proposed uses polynomial bases, due to their simple structure that comes at a low computational cost, where overfitting issues can also be dealt with through the choice of a particular order [myers1989response]. This can be represented as

$$f(x) = \sum_{q=0}^{Q} \beta_q x^q, \qquad (5)$$

where $\beta_q$ is the $q$-th polynomial coefficient and $Q$ represents the order. Although simple, one limitation of polynomials is that they are global functions of the input variable (as they are nonzero over an infinite region), which limits their use when the input space changes.

Deterministic model 2: with Gaussian basis functions

$$f(x) = \sum_{k=1}^{K} \phi_k(x), \qquad (6)$$

$$\phi_k(x) = a_k \exp\!\left(-\frac{(x - \mu_k)^2}{2 s_k^2}\right), \qquad (7)$$

where $a_k$ is the amplitude, $\mu_k$ governs the location of each basis function in the input space, and $s_k$ controls its spatial width. The Gaussian basis function is proposed as it allows better control through its location, width and scale parameters.
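As an illustration of how the two deterministic formulations behave in practice, the sketch below fits both basis sets to synthetic 1-D data by ordinary least squares. This is a simplified stand-in for the fitting machinery of the study; all data and parameter values (basis order, centers, width) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D response standing in for a measured quantity (illustrative only).
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def poly_design(x, order):
    # Deterministic model 1: columns x^0, x^1, ..., x^Q, as in eq. (5).
    return np.vander(x, order + 1, increasing=True)

def gauss_design(x, centers, width):
    # Deterministic model 2: Gaussian bases with fixed locations and width;
    # only the amplitudes a_k are fitted here, as in eqs. (6)-(7).
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width**2))

rms = {}
for name, B in [("polynomial", poly_design(x, 5)),
                ("gaussian", gauss_design(x, np.linspace(0, 1, 8), 0.15))]:
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rms[name] = float(np.sqrt(np.mean((y - B @ coef) ** 2)))
    print(f"{name}: RMS residual = {rms[name]:.3f}")
```

Both bases reach a comparable fit on this smooth target; the Gaussian basis additionally exposes interpretable location and width parameters, which is what motivates its use in the Bayesian models later on.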
2.3 Deterministic model validation
The accuracy of the trained or identified deterministic model when it encounters unseen data can be validated using $k$-fold cross-validation (CV). The idea is to divide the input set into $k$ subsets randomly. The metamodel is then trained $k$ times, each time leaving one of the subsets out as a test set. On this test set, error measures are evaluated, for instance the coefficient of determination, which is given by

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad (8)$$

where $y_i$ is the observed $i$-th datapoint, $\hat{y}_i$ is the predicted value and $\bar{y}$ is the average of the observed response. If the $R^2$ value for the $i$-th fitted subset is given by $R^2_i$, then the expected generalization error can be computed as follows [friedman2017elements]:

$$\overline{R^2} = \frac{1}{k} \sum_{i=1}^{k} R^2_i. \qquad (9)$$

To find the best value of $k$, the 'bias-variance' tradeoff needs to be considered. The value of $k$ depends on the size of the dataset [friedman2017elements] and should be chosen such that the samples in the training set and the test set are representative of the larger dataset. A very common choice is to use $k = 5$ or $k = 10$, as these show lower variance [breiman1992submodel, forrester2008engineering, friedman2017elements]. Another approach is to consider $k = n$, where $n$ is the size of the dataset. With this approach, each observed datapoint is given an opportunity to be in the held-out test dataset. This approach, also referred to as leave-one-out cross-validation, is, as one could imagine, computationally expensive if the analysis is done on a huge dataset where a single model evaluation takes hours. Hence, $k$-fold CV (with $k = 5$ and $k = 10$) is the preferred method of deterministic model assessment in this study.

2.4 Bayesian framework for uncertainty quantification (UQ)
To quantify and better control the nondeterminism in the responses, in the context of finite element analyses, a distinction has been made in the literature between possibilistic (nonprobabilistic) approaches and probabilistic approaches [faes_recent_2020]. Probabilistic metamodels and particularly, Bayesian approaches have garnered a lot of attention due to the flexibility they provide in choosing the plausible design alternatives based on priorknowledge which imposes an inherent regularization. For instance, Bayesian framework was proposed for force reconstruction [zhang2012bayesian], inverse acoustic problems [pereira2015empirical] and for designdriven validation approach [chen2008design].
In the Bayesian formulation, considering the GAM equation (4), let $p(\boldsymbol{\beta})$ denote the "prior" probability density function (pdf) of the parameters $\boldsymbol{\beta}$, and let $p(\boldsymbol{\beta} \mid \mathbf{y})$ denote the "full posterior distribution" of the parameters conditional on the observed (measured) data $\mathbf{y}$. Bayes' theorem provides an analytical relation to compute this posterior distribution [bolstad2009understanding, gelman2013bayesian]:

$$p(\boldsymbol{\beta} \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \boldsymbol{\beta})\, p(\boldsymbol{\beta})}{p(\mathbf{y})}. \qquad (10)$$

In the equation above, $p(\mathbf{y} \mid \boldsymbol{\beta})$ is called the "likelihood" of observing the data, and the marginal distribution $p(\mathbf{y}) = \int p(\mathbf{y} \mid \boldsymbol{\beta})\, p(\boldsymbol{\beta})\, \mathrm{d}\boldsymbol{\beta}$ is the "evidence" of the observed data.
The aim here is to compute $p(\boldsymbol{\beta} \mid \mathbf{y})$, but as the evidence integral scales with the parameter space, its closed form is not always available, which leads to intractability. Nevertheless, only the shape of the posterior distribution is needed for sampling the parameters (up to a proportionality constant) [bolstad2009understanding]. The shape of the unnormalized posterior density is given by:

$$p(\boldsymbol{\beta} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol{\beta})\, p(\boldsymbol{\beta}). \qquad (11)$$
Stochastic simulation methods (Monte Carlo methods) are used to sample from this unscaled distribution [bolstad2009understanding] which approximates the true distribution provided that the sample size is large enough.
However, such algorithms remain inefficient for high-dimensional parameter spaces [bolstad2009understanding]. Therefore, another class of methods, called Markov Chain Monte Carlo (MCMC) methods [hastie2017generalized], which are based on sequentially simulating draws from a Markov chain¹, is used in this study. The idea is to let the Markov chain run for long enough until it converges to a limiting (or stationary) distribution. Samples drawn after this initial run-in (also called "burn-in") time are draws from the approximated target posterior distribution [bolstad2009understanding]. The Metropolis-Hastings sampler and the Gibbs sampler are typical examples of such methods but, according to [neal1993probabilistic], these samplers can become extremely time-consuming as they explore the parameter space via inefficient random walks. In this study, the No-U-Turn Sampler (NUTS) is preferred, which is based on Hamiltonian Monte Carlo (HMC, also called Hybrid MC), due to its ability to adapt the tunable parameters of HMC, i.e. the step size and the number of steps [hoffman2014no].

¹ A Markov chain is defined as a stochastic process where the conditional probability of the process at time $t$, given the states at all previous times, depends only on the single previous state at time $t-1$ [bolstad2009understanding].
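NUTS itself is too involved for a short sketch, but the chain/burn-in mechanics can be illustrated with a minimal random-walk Metropolis sampler on a toy Gaussian-mean posterior. All data and tuning values below are illustrative assumptions; the study itself uses NUTS via PyMC3.

```python
import math
import random

random.seed(0)
data = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]  # toy observations
sigma = 0.2                                       # known noise std (toy assumption)

def log_post(mu):
    # Unnormalized log-posterior, cf. eq. (11): flat prior + Gaussian likelihood.
    return -sum((y - mu) ** 2 for y in data) / (2 * sigma**2)

mu, chain = 0.0, []
for _ in range(20000):
    prop = mu + random.gauss(0.0, 0.5)            # random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(mu):
        mu = prop                                  # accept; otherwise keep mu
    chain.append(mu)

burn_in = 2000                                     # discard the run-in phase
post = chain[burn_in:]
est = sum(post) / len(post)
print(f"posterior mean estimate: {est:.3f}  (sample mean: {sum(data)/len(data):.3f})")
```

After burn-in, the draws concentrate around the sample mean, as expected under a flat prior. NUTS reaches the same target far more efficiently by exploiting gradient information instead of random walks.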
2.5 Parametric Bootstrapping
Bootstrapping is a data-based simulation method for statistical inference, introduced in 1979 [efron1979bootstrap]. In principle, two types of bootstrapping methods are mentioned in the literature: parametric (based on bootstrapping the input-output pairs) and nonparametric, which is based on bootstrapping the residuals [efron_introduction_1993]. In this work, the parametric approach is used, as the model has a limited number of parameters that need to be estimated. Such a method is useful when the estimator is a complex function of the parameters [murphy_machine_2012].
Let $\hat{\theta}$ denote the statistical estimate of the parameter $\theta$. Here, the bootstrap approach is used to assign measures of accuracy to such statistical estimates. The bootstrap algorithm works by randomly sampling $n$ values with replacement from the observed data $\mathbf{y}$, also called bootstrapped samples $\mathbf{y}^*$, followed by the evaluation of the corresponding bootstrap replications $\hat{\theta}^*$. The sample variances computed from these bootstrapped samples are estimators of the variances of the parameters. The point estimates are obtained using the nonlinear least-squares approach, where the expected value is given by the deterministic models shown in Section 3.2. The parametric bootstrap algorithm implemented in this work is as follows:

1. Get a point estimate $\hat{\theta}$ from the observed data $\mathbf{y}$ (here, $\mathbf{y}$ can be the selected data points after considering the categorical variables, defined by the nuisance parameter $\boldsymbol{\omega}$).
2. Repeat for $b = 1, \dots, B$:
   (a) simulate the error variables $\varepsilon^* \sim \mathcal{N}(0, \hat{\sigma}^2)$;
   (b) simulate data $\mathbf{y}^*_b$ from the fitted deterministic model and $\varepsilon^*$;
   (c) estimate $\hat{\theta}^*_b$ from $\mathbf{y}^*_b$.
3. Get an approximation of the distribution of $\hat{\theta}$ from the histogram of $\hat{\theta}^*_1, \dots, \hat{\theta}^*_B$.
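The steps above can be sketched on a toy model, a straight line through the origin, where all numbers are illustrative assumptions rather than the tire-road model of Section 3:

```python
import random
import statistics

random.seed(0)

# Toy "observed" data from y = a*x + noise, with true slope a = 2 (illustrative).
xs = [i / 10 for i in range(1, 21)]
ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]

def fit_slope(xs, ys):
    # Least-squares slope through the origin: the point estimate theta-hat.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

a_hat = fit_slope(xs, ys)                                  # step 1: point estimate
sigma_hat = statistics.pstdev(y - a_hat * x for x, y in zip(xs, ys))

reps = []
for b in range(500):                                       # step 2: b = 1..B
    # (a)+(b): simulate data from the fitted model plus Gaussian errors
    y_star = [a_hat * x + random.gauss(0.0, sigma_hat) for x in xs]
    reps.append(fit_slope(xs, y_star))                     # (c): re-estimate theta*

boot_std = statistics.stdev(reps)                          # step 3: accuracy measure
print(f"slope estimate: {a_hat:.3f}, bootstrap std: {boot_std:.4f}")
```

The spread of the 500 replications approximates the sampling variability of the slope estimate, which is exactly the "measure of accuracy" the algorithm is meant to deliver.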
3 Application context: Interior cabin noise level assessments
3.1 Physical mechanisms
The overall noise experienced inside the passenger cabin is a relative contribution of different sources coming from the engine, wheels, powertrain, and wind. At different vehicle speeds, some of these are more dominant than others. Especially in electric vehicles (EVs), due to the absence of the engine's masking noise, the tonal whine from electric motors is one of the most dominant noise sources. Regardless of the type of propulsion, two background noise sources, which lie in the broadband frequency regime and contribute to the masking effect, are generally present during real driving conditions:

Vehicle aerodynamic noise: At higher speeds, around 100 kmph, aerodynamic noise remains a dominant source of noise and discomfort [oettle_automotive_2017]. Three types of mechanisms associated with aerodynamic noise can be found in the literature [cerrato2009automotive, oettle_automotive_2017]. As shown in fig_aerodynamic_noise_generation_mechanism, it is characterized by different types of sources that depend on the vehicle speed (or the flow speed) and the experimental conditions (tightly sealed vehicle, crosswind, etc.). In this study, the flow speeds considered are 140 kmph and 200 kmph, which are the same as the conditions used during wind-tunnel testing. For such speeds, the Mach number remains low, and the dominant sources of noise are dipole-type sources, for which the sound intensity scales as $I \propto u^6$, where $u$ is the flow speed [oettle_automotive_2017]. This prior knowledge based on first principles will be exploited to develop the deterministic surrogate for aerodynamic noise. This paper is largely based on the analysis of this type of noise.
Figure 2: Sources of vehicle masking noise due to wind (a) and due to tire-road interaction (b)
Interior tire-road noise: The physics behind the interior noise generated by tire-road interaction is relatively more complex, as it involves numerous design parameters. The structure-borne contribution (dominant at low frequencies and low speeds) comes from the induced vibrations through sources such as tread impact and rolling deflections. Airborne noise (dominant at higher frequencies and higher speeds) is directly propagated through the medium due to air-pumping, air-turbulence and Helmholtz resonances [li2018literature]. fig_tire_road_noise_image shows various tire-road noise contributing factors.
3.2 Model formulations
The rest of the paper considers the following convention: $L_p$ is the sound pressure level (SPL) as a complex function of frequency $f$, $y$ is the SPL record available from the database, $\hat{y}$ is the predicted SPL, $\mathbf{x}$ the vector of predictor variables, such as operating conditions, $\boldsymbol{\theta}$ the vector of parameters of a given sub-model, and $\varepsilon$ denotes the fitting error, including both model errors and experimental noise.
Referring to (4) and choosing the speed $u$ and the frequency $f$ to be the predictor variables,

$$y = \beta_0 + f_1(u) + f_2(f) + \varepsilon, \qquad (12)$$

where $f_1$ carries the white-box (first-principles) part and $f_2$ the data-driven part. Therefore, two deterministic nonlinear mappings $\hat{y}^{[1]}$ and $\hat{y}^{[2]}$ are formulated for aerodynamic wind noise, considering the polynomial and the Gaussian basis functions respectively (the superscript in $[\cdot]$ refers to the two different models). As per [mat_ali_wind_2018], for dipole types of sources, the term $f_1(u) = 60\log_{10}(u)$ is considered in the following equations:

$$\hat{y}^{[1]}(u, f) = \beta_0 + 60\log_{10}(u) + \sum_{q=1}^{Q} \beta_q f^q, \qquad (13)$$

$$\hat{y}^{[2]}(u, f) = \beta_0 + 60\log_{10}(u) + \sum_{k=1}^{K} a_k \exp\!\left(-\frac{(f - \mu_k)^2}{2 s_k^2}\right). \qquad (14)$$
Similarly, for tire-road noise, one physically informed model is:
(15) 
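As a quick consistency check on the white-box term: with dipole sources the intensity grows as $u^6$, so the expected level shift between the two wind-tunnel speeds used in this study follows directly. This is a worked example, not a database result.

```python
import math

def dipole_level_shift_db(u1, u2):
    # I ~ u^6  =>  delta L = 10*log10(u2^6 / u1^6) = 60*log10(u2 / u1) dB.
    return 60.0 * math.log10(u2 / u1)

# Between the two flow speeds considered in the study (140 and 200 kmph):
delta = dipole_level_shift_db(140.0, 200.0)
print(f"expected wind-noise increase from 140 to 200 kmph: {delta:.1f} dB")
```

This order of magnitude (roughly a 9 dB rise for a 140 to 200 kmph speed step) is what the $60\log_{10}(u)$ term encodes in the surrogates above.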
3.3 k-fold Cross Validation
For the analysis, two different vehicle body types, namely Sedan and Hatchback, are considered. The values of $k$, the number of parts the dataset is divided into, are chosen to be 5 and 10, with a total of 1000 runs carried out to analyze the model assessment process.
              Sedan                 Hatchback
            k=5       k=10        k=5       k=10
Mean        0.89      0.83        0.91      0.84
Variance    6.39e-04  0.0012      6.08e-04  0.0017
Histograms of the k-fold CV analysis for the two vehicle body types with polynomial basis functions can be seen in fig_Histogram_kCV_vehicle_body_type_Sedan_Hatchback, and a summary of the mean and variance is given in tab_comparison_of_kCV. It is observed that if the training dataset is divided into 5 parts, the CV accuracy is around 0.9 with a very small variance.
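The k-fold procedure of Section 2.3 is short enough to sketch directly; the example below runs it on a synthetic linear dataset rather than the Sedan/Hatchback databases, so all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic linear data standing in for one vehicle's measurement subset.
x = np.linspace(0.0, 1.0, 100)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(x.size)

def r_squared(y_obs, y_pred):
    # Coefficient of determination, eq. (8).
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_cv_r2(x, y, k, order=1):
    # Shuffle, split into k folds, hold each fold out once, average R^2 (eq. (9)).
    folds = np.array_split(rng.permutation(x.size), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], order)
        scores.append(r_squared(y[test], np.polyval(coef, x[test])))
    return float(np.mean(scores))

print(f"5-fold mean R^2:  {kfold_cv_r2(x, y, 5):.3f}")
print(f"10-fold mean R^2: {kfold_cv_r2(x, y, 10):.3f}")
```

On clean synthetic data both choices of $k$ give nearly identical scores; the $k = 5$ vs. $k = 10$ distinction matters more for the smaller, noisier measurement subsets used in the study.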
3.4 Data selection for UQ
An important aspect of stochastic modelling is to be able to capture the inherent variability in the data. For instance, a large dispersion in the measured database can be seen in fig_SPL_aero_data_measurement_onlyForSedan_showingLargeVariance_NoYticks, which corresponds to a large number of tested vehicles, depending on operating conditions. This being stated, real engineering practice is to refine this database and select the data corresponding to the specific category of vehicle under consideration. Therefore, categorical variables are defined, which are used as selection parameters in the database. For instance, for aerodynamic wind noise, the database is filtered by: 1) body design, 2) segment, 3) energy, 4) target market, 5) roof type, 6) state, and 7) measurement position. fig_SPL_aero_data_measurement_withCategoricalVar_NoYticks shows the database refined on the basis of these categorical variables, which is used for the Bayesian learning approach in this paper. This particular choice of categorical variables defines the model considered hereafter.
3.5 Bayesian hierarchical models for UQ
In this section, the two models, given by (13) and (14), are formulated in the Bayesian context. It is assumed that the observed data in the measurement set, given in dB, are distributed according to a Normal distribution, which belongs to the exponential family of distributions mentioned in Section 2.2, with mean $\mu$ and variance $\sigma^2$. Also, all the unknown (non-categorical) parameters are modelled as random variables with a prior probability distribution. The choice of prior distributions here is supposed to be in line with the objective knowledge the analyst has on the parameters before observing the data (through domain expertise, physical laws, etc.). If too little objective information is available, non-informative priors can be chosen. Also, for the sake of simplicity, priors that are conjugate to the likelihood are preferred; refer to [brogna2018probabilistic, gelman2013bayesian, bolstad2009understanding]. fig_Bayesian_Hierarchical_models provides a way to represent the full joint probability of the random variables in the form of a graph, using the assumption of conditional independence. Two events $a$ and $b$ are conditionally independent given $c$ if and only if $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$. These graphical models display the independence relationships and are defined in terms of directed acyclic graphs. The nodes in the graph are modelled as random variables and the edges encode their relationships [jordan_graphical_2004]. Also, the plate notation is exploited for capturing replication in graphical models, which indicates that the data are independently and identically distributed (i.i.d.) [murphy_machine_2012]. The observed or deterministic variables (here, the measurement data represented through the likelihood function) are shown as shaded regions, and the unobserved random variables are represented as unshaded circular regions. An arrow on an edge, for example $a \rightarrow b$, indicates that $a$ is causing or influencing $b$. Moreover, the hyperparameters are not influenced by any other parameter in the hierarchy. Each distribution is characterized by its own hyperparameters, which are deterministic values chosen by the analyst; for example, a Normal distribution can have two hyperparameters, $\mu$ for location and $\sigma$ for scaling. The higher we move up the hierarchical model, the lesser the impact of that parameter on the likelihood of observing the data and on the inference. If not much information is available about the hyperparameters, non-informative hyperpriors can be assigned to them. In fig_Bayesian_Hierarchical_models, each hyperprior is characterized by its own hyperparameters [gelman2013bayesian]. For the sake of visual clarity, the hyperparameters are not shown in these graphs. Considering $N$ to be the total number of measurement data available and $P$ the total number of parameters to learn, the two Bayesian models are formulated as

Bayesian Model 1 (BM1) with polynomial basis functions
(16) 
Bayesian Model 2 (BM2) with Gaussian basis functions with heteroscedasticity
(17)
4 Model convergence and results comparison
Bayesian computational statistics allows us to approximate the posterior distribution and solve complex Bayesian models which would otherwise be mathematically intractable. In order to make sure that the samples drawn are independently and identically distributed, several visual and numerical diagnostic tools have been developed; for the relevant metrics, refer to [martin2022bayesian].
The two Bayesian models formulated in (16) and (17) are specified using the open-source library PyMC3 [salvatier2016probabilistic], through which the NUTS sampler can be used; it automatically adapts the step-size parameter and is efficient in generating independent samples [hoffman2014no]. For each model, 4 chains were simulated with 10,000 samples each. The number of burn-in samples was set to 2,000 to let the chains converge to their stationary distribution.
MCMC convergence results for both Bayesian models, simulated with 4 chains and 10,000 samples each. The left columns of both subfigures are the kernel density estimate (KDE) plots corresponding to each unobserved random variable, and the right columns consist of the rank plots.
The convergence statistics are shown in fig_mcmc_convergence_statistics, which presents the marginalized posterior distributions of the unknown parameters in the models. Moreover, rank plots can be observed, which are histograms of the ranked posterior samples plotted separately for each chain [vehtari2021rank]. If all chains are drawing from the same posterior (i.e., there is good mixing), the ranks in each chain should be uniform. The chains in BM1 show good mixing behavior compared to BM2, where the rank plots for some parameters deviate from uniformity. This is due to the fact that there exists a bimodal marginal posterior distribution for some parameters.
A very commonly used numerical diagnostic is the potential scale reduction factor, or Gelman-Rubin statistic, denoted by $\hat{R}$. It is computed for a particular parameter $\theta$ as the ratio of the standard deviation of all the samples of $\theta$ from all chains to the root mean square of the individual within-chain standard deviations [gelman2013bayesian]. It is expressed as

$$\hat{R} = \sqrt{\frac{\widehat{\operatorname{var}}^{+}(\theta \mid \mathbf{y})}{W}}, \qquad (18)$$

where $W$ denotes the within-chain variance and $\widehat{\operatorname{var}}^{+}(\theta \mid \mathbf{y})$ is the marginal posterior variance of the parameter, given by

$$\widehat{\operatorname{var}}^{+}(\theta \mid \mathbf{y}) = \frac{n-1}{n} W + \frac{1}{n} B, \qquad (19)$$

where $n$ is the total number of draws per chain and $B$ is the between-chain variance (refer to [gelman2013bayesian, vehtari2021rank] for detailed formulations). Ideally, $\hat{R} \rightarrow 1$, as the variance between the chains should be the same as the variance within each chain.
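Equations (18)-(19) are simple enough to compute directly; the sketch below evaluates $\hat{R}$ for well-mixed and deliberately unmixed toy chains, a hand-rolled version of what PyMC3's diagnostics report (all chain data here are synthetic):

```python
import random
import statistics

def gelman_rubin(chains):
    # chains: list of equal-length sample lists for one parameter.
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain W
    B = n * statistics.variance(means)                            # between-chain B
    var_plus = (n - 1) / n * W + B / n                            # eq. (19)
    return (var_plus / W) ** 0.5                                  # eq. (18)

random.seed(0)
# Four chains drawing from the same target: R-hat should be ~1.
mixed = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]
# Four chains stuck around different locations: R-hat should exceed 1.
stuck = [[random.gauss(float(k), 1.0) for _ in range(5000)] for k in range(4)]

r_mixed, r_stuck = gelman_rubin(mixed), gelman_rubin(stuck)
print(f"mixed chains: R-hat = {r_mixed:.3f}")
print(f"stuck chains: R-hat = {r_stuck:.3f}")
```

A value close to 1 for the mixed chains and well above 1 for the stuck chains reproduces exactly the behavior the diagnostic is designed to detect.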
For BM1, $\hat{R} \approx 1$ for all the parameters in the model, which indicates that the chains mixed well and the draws are from the same posterior distribution. For BM2, the number of parameters is significantly higher than for BM1, and $\hat{R}$ again remains close to 1, which denotes that the chains converged well and the samples are not strongly correlated.
Bayesian credible interval
The posterior predictive distribution is the distribution of potential or future data $\tilde{y}$, averaged over the posterior distribution of the parameters $\boldsymbol{\theta}$. This is the model's prediction after seeing the observed data $\mathbf{y}$. It is given by

$$p(\tilde{y} \mid \mathbf{y}, \boldsymbol{\omega}) = \int p(\tilde{y} \mid \boldsymbol{\theta}, \boldsymbol{\omega})\, p(\boldsymbol{\theta} \mid \mathbf{y}, \boldsymbol{\omega})\, \mathrm{d}\boldsymbol{\theta}, \qquad (20)$$

where $\boldsymbol{\omega}$ is the vector of nuisance variables that characterizes a particular model, for instance the model described in Section 3.4.
The posterior predictive distribution plots are shown in fig_posterior_pred_distribution. It can be seen that BM1, which has significantly fewer parameters, does not capture some intricate peaks in the data. On the other hand, with BM2, the heteroscedasticity allows the model to capture the physical phenomena quite well. Indeed, this comes at the cost of a higher number of parameters, and it requires more time to draw samples from the joint distribution of the parameters. Depending on the final use of the metamodel, BM1 or BM2 will be preferred.
The developed models should be neither so primitive that they miss or fail to capture the valuable information in the data, nor so complex that they fit the noise in the data. Therefore, computing the out-of-sample predictive accuracy for the fitted Bayesian models becomes essential. This is also called the generalization error, a measure of how well the model performs when it sees data not used to fit it. Cross-validation and information criteria are the two methods for estimating this out-of-sample predictive accuracy. In this work, Bayesian Leave-One-Out cross-validation (LOO-CV) [vehtari2017practical] is used, where the hold-out dataset consists of a single datapoint. The expected log pointwise predictive density (ELPD) is given by

$$\mathrm{ELPD}_{\mathrm{LOO}} = \sum_{i=1}^{n} \log p(y_i \mid \mathbf{y}_{-i}), \qquad (21)$$

where $y_i$ represents the $i$-th datapoint and $\mathbf{y}_{-i}$ indicates all datapoints except the $i$-th. However, computing (21) is costly, since the posterior conditioned on $\mathbf{y}_{-i}$ is not known a priori. To circumvent this, (21) is approximated from a single fit using Pareto smoothed importance sampling LOO-CV (PSIS-LOO-CV) [vehtari2017practical]. The idea is that the observations are assumed to be conditionally independent, so that (21) is approximated using normalized weights $w_i^s$ as

$$p(y_i \mid \mathbf{y}_{-i}) \approx \sum_{s=1}^{S} w_i^s\, p(y_i \mid \boldsymbol{\theta}^s), \qquad (22)$$

where $\boldsymbol{\theta}^s$, $s = 1, \dots, S$, are draws from the full posterior. The weights are computed using importance sampling, where

$$w_i^s \propto \frac{1}{p(y_i \mid \boldsymbol{\theta}^s)}. \qquad (23)$$

To avoid overshooting the variance of the importance weights, a smoothing procedure is applied using the generalized Pareto distribution. The estimated shape parameter $\hat{k}$ of the Pareto distribution allows highly influential observations to be detected. These are the observations that have a significant effect on the posterior distribution when they are placed in the hold-out set. For well-specified data and models, the value of $\hat{k}$ remains low ($\hat{k} < 0.7$), according to [vehtari2017practical, gabry2019visualization].
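The importance-weight idea behind (22)-(23) can be sketched without the Pareto smoothing step (which is what distinguishes full PSIS): for each left-out point, the raw ratios $1/p(y_i \mid \boldsymbol{\theta}^s)$ are normalized, and a single dominant weight flags an influential observation. All distributions below are toy assumptions, not the models of this study.

```python
import math
import random

random.seed(0)
# Toy posterior draws for a Gaussian mean (mu ~ N(5.0, 0.1^2)); these stand in
# for MCMC draws of the model parameters.
draws = [random.gauss(5.0, 0.1) for _ in range(4000)]
sigma = 0.5  # likelihood std (toy assumption)

def log_lik(y, mu):
    # Gaussian log-likelihood of a single observation.
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def max_loo_weight(y_i):
    # Raw importance ratios w_s = 1 / p(y_i | theta_s), eq. (23), normalized.
    # PSIS would additionally smooth the largest ratios with a Pareto fit.
    w = [math.exp(-log_lik(y_i, mu)) for mu in draws]
    return max(w) / sum(w)  # one dominant weight signals an influential point

w_typical = max_loo_weight(5.1)   # observation consistent with the posterior
w_outlier = max_loo_weight(8.0)   # observation far in the tail: heavy weights
print(f"typical observation: max normalized weight = {w_typical:.5f}")
print(f"outlying observation: max normalized weight = {w_outlier:.5f}")
```

For the typical point the weights are nearly uniform over the 4000 draws, while the outlier concentrates mass on a few draws, which is precisely the heavy-tailed weight behavior that a large $\hat{k}$ detects.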
tab_comparison_of_LOO summarizes the LOO-CV approach for the two Bayesian models. Rank denotes the ranking of the two models on the basis of the LOO value. The penalization term gives an indication of the effective number of parameters in the model, and the standard error is the error of the LOO computation, where a lower value is preferred [vehtari2017practical]. Clearly, from tab_comparison_of_LOO, BM2 with Gaussian basis functions and heteroscedasticity is preferred over BM1, due to its lower error and better LOO value. Moreover, from fig_LOO_CV_khat, we notice that all the datapoints in BM1 have a very low $\hat{k}$ value ($\hat{k} < 0.7$), whereas for BM2, 0.4% of the data points lie above the 0.7 threshold. These are the same points, at 4 kHz, 5 kHz, and 6.3 kHz, that the model is not able to explain due to their outlier behaviour.
Model              Rank   LOO      Penalization (LOO)   Standard error
BM2 (Gaussians)    0      1259.2   40.7                 22.6
BM1 (Polynomials)  1      1367.6   5.2                  24.1
For interior tire-road noise, the parametric bootstrapping algorithm was implemented, since the likelihood function given by (15) is accurate only up to 84% (as per the $R^2$ value). The parametric bootstrap algorithm is explained in Section 2.5 and considers (15) to generate simulated data from the point estimates. fig_parametric_bootstrap shows the prediction on a randomly drawn test data set for multiple inputs (e.g., at speeds 50 kmph, 70 kmph, 90 kmph) and for one particular test speed. The histograms of the estimated parameters can be seen in fig_tire_Bootstrapping_parameter_Histogram. As one can notice, such an algorithm belongs to the non-Bayesian category. The accuracy of the model depends on the number of bootstrapped samples and on the data-generating mechanism. The intricate peaks in the data are not captured, which indicates that the model can be refined further. The bootstrap algorithm is simple and easy to implement; however, Bayesian approaches give more control for the considered industrial application case and provide mathematically more robust formulations and diagnostic tools.
Figure: predicted mean line, with the shaded region being the 95% confidence interval.
5 Conclusions
In this research work, Bayesian metamodels are developed for vehicle broadband noises, such as aerodynamic wind noise and tire-road interaction noise, using measurement databases. For aerodynamic noise, two different Bayesian models are proposed, which not only consider the available data but also rely on physical laws. The model parameters are interpretable, and domain-expert knowledge is exploited to define prior distributions on them. The deterministic models are validated using k-fold cross-validation with good accuracy, and the Bayesian model assessment is done through various MCMC diagnostic tools. It is noted that, despite being computationally intensive, the model with Gaussian basis functions produces better results in terms of the distribution shape and the Leave-One-Out cross-validation statistic. However, for overall estimators of broadband masking noise, detailed modelling is usually not desired, and therefore BM1 can be used for fast computations. For tire-road noise, a non-Bayesian approach was tested, and it was observed that Bayesian approaches provide more flexibility and control over the predicted responses.
6 Acknowledgements
This research work is funded by the European Commission's H2020 Innovative Training Network (ITN) under the project ECODRIVE (Grant No. 858018). We gratefully acknowledge the support of the OpenLab VibroAcoustic-Tribology@Lyon, the Laboratoire Vibrations Acoustique (LVA), INSA Lyon, and the NVH department of Stellantis N.V.