I Introduction and Overview
Construction of real-time or near real-time gravitational wave surrogates, based on a small but carefully selected set of known waveforms, has over the last years become a discipline in itself in gravitational wave (GW) science; for a review see [1]. These surrogates are based on a fiducial or underlying model, which can be obtained from full numerical relativity (NR) solutions to the Einstein equations or from physical approximants such as Effective One Body (EOB), Phenomenological (Phenom) or Post-Newtonian (PN) ones. In the reduced basis (RB) [2, 3] approach a training set TS of fiducial waveforms is used to build a compact basis which approximates it with high accuracy. In some but not all cases the approximation error is smaller than that of the fiducial waveforms themselves; it is in this sense that we refer to the surrogate model as being essentially indistinguishable from the fiducial one. For NR the error of the fiducial model refers to the truncation error incurred when solving the set of elliptic-hyperbolic equations that constitute Einstein's theory, and in the case of approximants to the systematic error due to the physical approximations.
One approach for building surrogates, which as of this writing is the one commonly used when NR is the fiducial model, consists of the following three steps [4]:

A sparse reduced basis with $n$ elements is chosen by a special selection of waveforms from the TS, of size $N$, through a greedy algorithm, until a sufficiently small user-specified tolerance error is met. The number of instances is thus reduced from $N$ waveforms to $n$, with, in general, $n \ll N$.

A sparse subset of time (or frequency) nodes is chosen using the Empirical Interpolation Method (EIM), reducing the representation from $L$ time samples to $n$. The number of EIM nodes is by construction the same as the number of greedy parameters. This achieves a second compression, now in the time (frequency) domain, on top of the compression in parameter space from the RB-greedy first step, leading to an overall compression factor of order $(N \times L)/n^2$.
The previous two steps provide a representation of known waveforms. The next goal is to build a predictive surrogate model, that is, one which can produce new waveforms not present in the training set TS. This is achieved by performing regressions on the waveform values at each EIM node. The newly predicted waveform is then obtained at all times using the EIM. For a summary of all these steps combined see Figure 1.
With steps 1-2 one can achieve very low representation errors. This leaves the third step as the principal source of error in the pipeline outlined above (see Section II.2), which motivates us to search for methods that contribute as little as possible to the error of the surrogate models. Regression is usually performed using polynomials, through interpolation or least squares. Even though polynomial approximations are a seemingly natural choice for which there is a vast amount of theory, they can suffer from several drawbacks in high dimensional problems, such as the curse of dimensionality [5, 6, 7] and Runge's phenomenon [8], the latter becoming even more relevant when data are sparse and a tradeoff between stability and precision is needed [9, 10]. It is therefore worth looking at non-algorithmic (probabilistic) approaches to regression. Here we are interested in approaches from Machine Learning (ML), of which there is a huge diversity.
AutoML [11] is a research and development area of Artificial Intelligence (AI) that targets the end-to-end automation of the machine learning process. It allows the use of ML techniques of any complexity to solve problems without full expert knowledge. AutoML intends to automate the learning workflow in all its stages: data preparation (e.g. feature extraction, transformation of unstructured data into tabular form), feature engineering (missing and null data imputation, feature selection, feature generation), model selection and tuning (hyperparameter optimization), metric generation, and data and result visualization.
In this work we analyze the possibility of using AutoML to drive the selection of sufficiently accurate regression approaches from a large pool of options. We restrict our study to the case of regression at EIM nodes, as needed when building surrogates. The physical scenario used as a testbed is that of the GWs emitted by the collision of two black holes initially without spin and in quasi-circular orbit, as predicted by numerical relativity. We use DataRobot [12], an AI platform which provides an AutoML pipeline automating all the steps described above along with the usual ones such as training, validation and testing. We address feature engineering before automation through dimensional reduction techniques; the idea is to build synthetic features which might further improve the learning process.
We find that, out of the full set of available models, the most accurate surrogates are built based on three approaches: Gaussian Process Regression (GPR), Support Vector Machines (SVM) and Symbolic Regression through Genetic Programming (Eureqa).
This article is organized as follows. In Section II we discuss the physical and computational setups for our study, and in Section III the AutoML flow used. Section IV presents our results. Appendix A lists the blueprints for AutoML used in this work and Appendix B describes the construction of our reduced basis and empirical interpolant, both of which are used to build features for AutoML. We close with some remarks putting this work in perspective in Section V.
II Setup
II.1 Fiducial Models
The physical system used as a testbed in this article consists of waveforms emitted by the collision of two black holes initially without spin and in quasi-circular orbit, in a time range expressed in units of the total mass $M$, with the waveforms aligned so that $t=0$ corresponds to the peak of the amplitude of the GWs (approximately, the time of merger of the two black holes). This setup corresponds to the last orbits before merger. Due to the scale invariance of general relativity, the only relevant parameter is the mass ratio $q = m_1/m_2$, with $m_1$ and $m_2$ the masses of the two black holes, here chosen within a fixed range. For definiteness we restrict our studies to the dominant $\ell = m = 2$ multipole angular mode.
As points of comparison we use two almost equivalent fiducial models for our physical scenario:

The numerical relativity simulations themselves, obtained by solving the full Einstein equations for the above configurations.

An RB-EIM surrogate built in [13] following the prescriptions described in the previous section, from those NR simulations, using polynomial least squares for the regression step. The model agrees, after a simulation-dependent time shift and physical rotation, with the full NR results within numerical truncation errors, with the advantage of being fast to evaluate for any value of $q$. We refer to the associated waveforms as polynomial least-squares or, for short, least-squares waveforms, and denote any of them as $h_{\rm LS}$. We compute them using the publicly available Python package GWSurrogate [15].
In this work we present an automatic construction of a myriad of waveform surrogates, using the three steps mentioned in the previous section but with a large number of ML regression approaches through AutoML. We generically refer to them, for short, as our ML surrogates, and to the associated waveforms as $h_{\rm ML}$. There are several steps in this exploratory study in which the fiducial model needs to be evaluated a posteriori, for any value of $q$ and as many times as needed, in a very fast manner. One of these steps is the testing process of the AutoML flow. Since we use a Cross Validation (CV) approach with a small and sparse training set, being able to do a thorough testing of the models with the highest CV scores allows us to see if there are any correlations between those scores and the test ones (we find that, indeed, there are). Next, for this study we need to evaluate the errors of the ML models with respect to a fiducial reference. For these reasons we use the least-squares surrogate as the fiducial model to build our ML ones. Notwithstanding, we also measure the errors of the ML models compared to numerical relativity waveforms.
II.2 Feature engineering: Reduced Basis and the Empirical Interpolation Method
Instead of training a complete regression pipeline with features coming from discrete waveform values, something that would pose a two-dimensional problem for each waveform component, we build new features through dimensional reduction using the RB and EIM approaches. Here a waveform is thought of as a parametrized function of time, $h(t; q)$, where the parameter $q$ is the mass ratio of the black holes.
In this context, the goal of the RB method is to build, from a training set TS of waveforms, a basis able to span all training elements within a given tolerance (in this study, machine precision). This is done by iteratively selecting from the parameter space a hierarchical set of greedy points, labeling those waveforms that will end up forming the reduced basis. The waveforms chosen for the basis are orthonormalized for conditioning purposes. Therefore, an RB expansion has the form
$h(t; q) \approx \sum_{i=1}^{n} c_i(q) \, e_i(t) \, ,$
where $c_i(q) = \langle e_i, h(\cdot; q) \rangle$, given some inner product, is the projection coefficient of $h$ onto the $i$-th basis element $e_i$. The number of basis elements can be controlled by specifying a tolerance for the representation error of the expansion; a lower tolerance means adding more basis elements to meet that precision.
In a second stage, the EIM allows us to build from the reduced basis an empirical interpolant of the form
$I_n[h](t; q) = \sum_{j=1}^{n} B_j(t) \, h(T_j; q) \, ,$ (1)
which can also approximate any waveform in the training set with high accuracy. The $T_j$ are the EIM nodes, a set of points specially selected from the time domain through a greedy procedure in the EIM. The functions $B_j(t)$ satisfy $B_j(T_i) = \delta_{ij}$ for all $i, j$ and are computed from the EIM nodes and the reduced basis. Later, both greedy and empirical samples (in parameter and time, respectively) play a prominent role in our regressions. A more extended review with applications to GW surrogates can be found in [1].
Additionally, we decompose waveforms into phase and amplitude,
$h(t; q) = A(t; q) \, e^{-i \phi(t; q)} \, ,$
each of which has less structure than the corresponding bare complex waveform $h$, therefore facilitating the learning process. In this manner, we build a set of functions
$\{ A(T_j; q), \, \phi(T_j; q) \}_{j=1}^{n} \, ,$ (2)
composed of the phase and amplitude functions of $q$ evaluated at the EIM nodes, which will serve to select the relevant features for the regressions.
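For instance, the amplitude-phase decomposition can be carried out directly with NumPy. The toy chirp-like signal below is illustrative only; the $e^{-i\phi}$ sign convention is the one adopted here:

```python
import numpy as np

def amplitude_phase(h):
    """Decompose a complex waveform h(t) into amplitude and unwrapped phase,
    with the convention h = A * exp(-i * phi)."""
    A = np.abs(h)
    phi = -np.unwrap(np.angle(h))   # minus sign from the e^{-i phi} convention
    return A, phi

t = np.linspace(0.0, 10.0, 1000)
h = (1.0 + 0.1 * t) * np.exp(-1j * 2.0 * np.pi * t)   # toy chirp-like signal
A, phi = amplitude_phase(h)
```

The unwrapping step is what makes the phase a smooth, slowly varying function of time, with much less structure than the oscillatory complex waveform itself.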
In the regression step we build surrogates for each $A(T_j; \cdot)$ and $\phi(T_j; \cdot)$, where $j$ labels the corresponding EIM node $T_j$. We denote them as
$A_j(q) \approx A(T_j; q) \, , \qquad \phi_j(q) \approx \phi(T_j; q) \, .$ (3)
In the next section we discuss the selection of points to conform the training set for this step.
Inserting the regressions for phases and amplitudes into the interpolant formula (1) leads us to the assembly of the final surrogate:
$h_S(t; q) = \sum_{j=1}^{n} B_j(t) \, A_j(q) \, e^{-i \phi_j(q)} \, .$ (4)
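Schematically, once per-node regressions for the $A_j$ and $\phi_j$ are available, assembling the surrogate of Eq. (4) amounts to a single contraction with the EIM functions. The helper below is an illustrative sketch with toy callables, not code from our actual pipeline:

```python
import numpy as np

def assemble_surrogate(Bj, amp_fits, phase_fits, q):
    """Evaluate Eq. (4): h_S(t; q) = sum_j B_j(t) A_j(q) exp(-i phi_j(q)).

    Bj         : (n, L) array with the EIM functions B_j sampled in time
    amp_fits   : n callables for the amplitude regressions A_j(q)
    phase_fits : n callables for the phase regressions phi_j(q)
    """
    A = np.array([f(q) for f in amp_fits])
    phi = np.array([f(q) for f in phase_fits])
    return (A * np.exp(-1j * phi)) @ Bj

# Toy example with n = 2 basis functions on L = 3 time samples.
Bj = np.array([[1.0, 0.0, 2.0],
               [0.5, 1.0, 0.0]])
amp_fits = [lambda q: q, lambda q: 2.0 * q]
phase_fits = [lambda q: 0.0, lambda q: np.pi]
h_S = assemble_surrogate(Bj, amp_fits, phase_fits, q=1.0)
```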
In the previous section we remarked that the regression step is the most important source of error in building predictive models once the RB-EIM representation is accurate enough. This can be seen, schematically, from the following inequality for discretized time [4]:
$\| h(\cdot; q) - h_S(\cdot; q) \|^2 \lesssim \epsilon^2 + \Lambda_n^2 \sum_{j=1}^{n} \max_q \big| h(T_j; q) - h_S(T_j; q) \big|^2 \, ,$
where $\epsilon$ is the RB-EIM representation error and $\Lambda_n$ a Lebesgue-like constant. The first term on the r.h.s. is the error associated to the RB-EIM approximation, which we ensure is small enough (at the level of machine precision) for all practical purposes. The second term corresponds to the regressions and is the leading source of error.
II.3 Training method for regressions
The selection of a suitable training set TS for the phase and amplitude regressions comes from a compromise with data availability. Indeed, when the surrogate is to be built from NR, the number of available simulations is very limited due to the computational cost of numerically evolving the full Einstein equations.
As an example, the first NR GW surrogate (for non-spinning binary black holes, which is the test case discussed in this article) used an EOB approximation to select the greedy points in parameter space, and afterwards full NR simulations were run for those parameter values. That is, there was no training set of NR waveforms to build the basis but, instead, one based on EOB. Similarly, for the full 7-dimensional case, as of this writing NR waveforms are used as the training set to build a reduced basis with only a modest compression factor. So, as a worst-case scenario one might consider that only the reduced basis waveforms are available. In other scenarios, where it is not so expensive to produce the training waveforms, such as in the EOB, Phenom or PN approximations, one might have a dense training set to start with.
Following the above worst-case rationale, we choose to form the TS for regressions with the phase and amplitude values at the greedy points and EIM nodes obtained from the application of the RB and EIM algorithms. We summarize the results of this step in Appendix B.
The application of these algorithms, using a tolerance equal to machine precision, gives us a total of $n$ greedy parameters and the same number of EIM nodes. In consequence, the training set for the regressions takes the form
$\{ A(T_j; q_i), \, \phi(T_j; q_i) \}_{i,j=1}^{n} \, ,$ (5)
where the $q_i$ denote the greedy parameter points and $j$ labels the EIM nodes. In other words, to obtain a single surrogate model (4), the phase and amplitude regressions (3) as functions of $q$ need to be calculated at each $T_j$.
There are two possible data partitioning approaches for training an AutoML pipeline and comparing regression methods: i) train-validation (TV) and ii) $k$-fold cross validation (CV). A test set is also needed for either approach; it is never seen in the training stages and is used to give a final estimation of the models' accuracy on unseen samples. In our case it has data corresponding to 500 waveforms.
With TV, the entire dataset is split into three parts: training, validation and test. ML models are trained on the first set and validated on the second one. In general, this strategy is well suited for large datasets, for which there is less bias induced by a train-validation split. On the other hand, leaving apart the test set, $k$-fold CV splits the data into $k$ smaller subsets (folds) and models are trained $k$ times, on $k-1$ folds, with the remaining one left for validation. At the end of the process all data points have been used for training, thereby reducing bias.
Since our datasets are small, we choose a special case of $k$-fold CV as our training strategy: leave-one-out cross-validation (LOOCV). In this case the number of partitions equals the number of training samples; since each regression dataset consists of $n$ points, we set $k = n$.
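A minimal sketch of LOOCV scoring is given below, with a simple polynomial fit standing in for the ML regressors (the actual models are trained by the AutoML platform); the data and fitting choice are illustrative assumptions of ours:

```python
import numpy as np

def loocv_rmse(x, y, fit, predict):
    """Leave-one-out CV: train on all points but one, score on the held-out one."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        model = fit(x[mask], y[mask])
        errs.append(predict(model, x[i]) - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))

# Toy phase-like data at a handful of "greedy" parameters, fitted with a cubic.
q = np.linspace(1.0, 8.0, 12)
y = np.sin(q) + 0.01 * q**2
score = loocv_rmse(q, y,
                   fit=lambda a, b: np.polyfit(a, b, 3),
                   predict=lambda m, xi: np.polyval(m, xi))
```

Each training sample is held out exactly once, so with $n$ points the model is fitted $n$ times and the CV score aggregates the $n$ held-out residuals.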
II.4 Surrogate assessment
In Section IV we compare our ML-regression surrogates with fiducial waveforms $h_f$, where $h_f$ is either a numerical relativity or a least-squares waveform $h_{\rm LS}$. For those comparisons we use the relative error
$\mathcal{E}(q) = \dfrac{\| h_f(\cdot; q) - h_{\rm ML}(\cdot; q) \|^2}{\| h_f(\cdot; q) \|^2} \, ,$ (6)
where
$\| h \|^2 = \int |h(t)|^2 \, dt \, .$
For the computation of the integrals we use Riemann quadratures with a uniform time step in units of $M$.
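The error metric can be computed with a straightforward Riemann sum; the toy waveforms below, differing by a small amplitude offset, are illustrative only:

```python
import numpy as np

def relative_error(h_fid, h_sur, dt):
    """Eq. (6)-style relative error with the Riemann-sum norm ||h||^2 = sum |h|^2 dt."""
    num = np.sum(np.abs(h_fid - h_sur) ** 2) * dt
    den = np.sum(np.abs(h_fid) ** 2) * dt
    return num / den

t = np.linspace(0.0, 100.0, 10001)
dt = t[1] - t[0]
h1 = np.exp(-1j * t)
h2 = (1.0 + 1e-3) * np.exp(-1j * t)   # 0.1% amplitude offset
err = relative_error(h1, h2, dt)      # -> 1e-6, since the error is squared
```

Note that the metric is quadratic: a pointwise relative deviation of $10^{-3}$ yields an error of $10^{-6}$, which is the convention to keep in mind when reading the tables below.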
III AutoML flow
DataRobot (DR) is an AutoML platform that applies "best practices" in the ML process, leveraging a high level of automation to build efficient ML models. For modeling, DR uses special pipelines called blueprints. Each blueprint maps inputs to predictions by concatenating an ML model with a preprocessing step. For the datasets used in this study DR draws on ML models which come from open source and proprietary algorithms such as Eureqa [16], H2O [17], Keras [18], LightGBM [19], Scikit-Learn [20, 21], Vowpal Wabbit [22], and XGBoost [23]. These models are combined with several preprocessing methods to finally form the blueprints described in Appendix A. This, and the possibility of creating ensembles of different models, gives us a wide spectrum for AutoML exploration. We point out that many of the underlying ML algorithms used by DR are open source and available, for example, in Scikit-Learn.

Data Partitioning.– The first stage is to split the data into train and test partitions. The test partition is used to compute a final estimation of the models' generalization. The second stage consists in splitting the train partition into $k$ folds for the subsequent application of $k$-fold CV. Recall that in this work $k$ equals the number of training samples (LOOCV).
Below we illustrate how the partitions are arranged in a generic $k$-fold CV approach.
Training and hyperparameter tuning.– In an AutoML flow one can trace at least three optimization levels. The first relies on selecting the best set of hyperparameters for each model, that is, of the parameters that control the behavior of the ML algorithms. In a deep neural network, for example, the hyperparameters include the number of layers (depth), the number of units per layer, the learning rate and penalization constants, to name the most relevant. This process is called hyperparameter tuning. Once the first level is completed, the next step is to train each model with the optimized hyperparameters. The final step is to rank all models according to some metric, in this study the Root Mean Square Error (RMSE).
For this work, hyperparameter tuning is performed through grid searches using $k$-fold CV over the training datasets. The process works as follows: a dataset is divided into $k$ partitions and, for each hyperparameter tuple corresponding to some ML algorithm, $k$ models are trained, each one using a different set of $k-1$ partitions. As usual, predictions are computed for each model on the remaining partition. The folds cycle through the train-validation process ($k$ times, giving $k$ validation scores) and then a final CV score is computed as their average. At the end, each model is prepared for final training with the entire folds' data by selecting the best hyperparameters for it.
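The grid-search-with-$k$-fold-CV logic can be sketched as follows, using polynomial degree as a stand-in hyperparameter (DataRobot performs the equivalent search internally for each blueprint; the data and grid here are our own toy assumptions):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Split n sample indices into k roughly equal, shuffled folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def grid_search_cv(x, y, degrees, k=5):
    """k-fold CV over a grid of polynomial degrees.

    Returns the degree with the lowest mean validation RMSE together with
    the per-degree CV scores (averages of the k fold scores)."""
    folds = kfold_indices(len(x), k)
    scores = {}
    for d in degrees:
        fold_rmse = []
        for f in range(k):
            val = folds[f]
            train = np.concatenate([folds[g] for g in range(k) if g != f])
            coeffs = np.polyfit(x[train], y[train], d)       # train on k-1 folds
            resid = np.polyval(coeffs, x[val]) - y[val]      # validate on the rest
            fold_rmse.append(np.sqrt(np.mean(resid ** 2)))
        scores[d] = float(np.mean(fold_rmse))                # final CV score
    best = min(scores, key=scores.get)
    return best, scores

# Noisy quadratic data: degree 1 should underfit, degree 2 (or higher) fit well.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x**2 - x + 0.5 + 0.01 * rng.standard_normal(40)
best_degree, cv_scores = grid_search_cv(x, y, degrees=[1, 2, 5], k=5)
```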
Model scoring.– All models are ranked according to their CV scores. This gives us a list of the most accurate ones with respect to the train partition. Once the training stage is finalized, the test partition is used to assess the models on unseen data.
IV Results
We now show the results for the regressions described in the previous two sections. We perform fits of the waveform phase and amplitude values at each EIM node, using for training only the sample in parameter space corresponding to the RB greedy values of $q$. We then evaluate and rank the models' accuracy according to their LOOCV and test RMSE scores, to choose the ones that perform best on the training and test datasets, respectively, for each of the phase and amplitude groups.
From the AutoML perspective of this work, each blueprint consists of an ML algorithm and a set of preprocessing steps. A model, in turn, is the output of training a blueprint. To have a short terminology we label the blueprints BP 1, BP 2, …, BP 87, in decreasing order of the accuracy of their corresponding waveform surrogates; we refer to Table 1. For accuracy we use the maximum RMSE over the test values of $q$. In Appendix A we list all the blueprints used.
We use the DR Python API [24] to perform our regressions, and the Python language for surrogate construction and validation.
IV.1 Regressions
We benchmark the performance of the phase and amplitude regressions over the TS by computing the mean LOOCV score for each blueprint, separately for the phase and amplitude groups of regressions. For instance, in the case of phase regressions, we have $n$ phase models (one per EIM node) per blueprint, each of which has its own LOOCV score. This gives us $n$ LOOCV scores, which we average to quantify the overall performance of that blueprint across the phase regressions. This is done for each blueprint inside the phase group of regressions, and the process is replicated for the amplitude regressions.
The results are shown in Figure 2, from which one can see a predominance of Gaussian Process Regression (GPR) models, ranked in the first places. In particular, for both phase and amplitude the same model (BP 2) is ranked in first place; it is a GPR model with a Matérn kernel (see Appendix C).
Figure 2 caption: Ranked models for the phase and amplitude datasets in LOOCV, sorted by mean LOOCV score. Mean values (black dots) are displayed along with standard deviations (red lines) of the LOOCV scores per BP. Note: Eureqa models are not included in this ranking since the AutoML platform used does not provide LOOCV scores for that model, for technical implementation reasons.
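For concreteness, a bare-bones GPR with a Matérn $\nu = 5/2$ kernel (the kernel of the top-ranked blueprint) can be written directly in NumPy. This is a sketch of the posterior mean only, with hand-picked hyperparameters, without the hyperparameter optimization a full implementation (e.g. Scikit-Learn's `GaussianProcessRegressor`) would include:

```python
import numpy as np

def matern52(xa, xb, length=1.0):
    """Matern covariance with nu = 5/2 between 1-D point sets xa and xb."""
    r = np.abs(xa[:, None] - xb[None, :]) / length
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gpr_predict(x_train, y_train, x_new, length=1.0, jitter=1e-8):
    """GP posterior mean K_* (K + jitter*I)^{-1} y at the points x_new."""
    K = matern52(x_train, x_train, length) + jitter * np.eye(len(x_train))
    Ks = matern52(x_new, x_train, length)
    return Ks @ np.linalg.solve(K, y_train)

# Toy regression of a smooth function of the mass-ratio-like parameter q.
q_train = np.linspace(1.0, 8.0, 15)
y_train = np.sin(q_train)
y_pred = gpr_predict(q_train, y_train, np.array([4.5]), length=2.0)
```

The small `jitter` on the diagonal is a standard regularization to keep the kernel matrix well conditioned for noise-free interpolation.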
Figure 3, in turn, shows the equivalent results but now using the test datasets to compute the RMSE scores. An observation is that blueprints 1-6 appear among the highest ranked ones with respect to both CV and test scores (with the exception of BP 4, Eureqa, for the reason explained in the figure). This agreement cannot be attributed to the ML methodology employed, since the CV score is calculated with the training dataset while the test score is independent of the model creation process. Furthermore, as we will see below, the models associated with these blueprints are also the ones with the highest scores with respect to a completely different error metric, given by Equation (6), for the whole surrogate waveforms.
IV.2 Waveform ML surrogates
Max error rank | Max | Median | Mean
BP 1 | 2.84e-07 | 6.71e-08 | 9.44e-08
BP 2 | 2.86e-05 | 4.92e-11 | 3.19e-07
BP 3 | 3.64e-04 | 2.90e-11 | 3.90e-06
BP 4 | 4.68e-04 | 3.15e-05 | 3.35e-05
BP 5 | 1.50e-03 | 2.17e-05 | 7.58e-05
BP 6 | 1.08e-01 | 4.13e-03 | 8.33e-03
BP 7 | 4.39e-01 | 1.73e-01 | 1.79e-01
BP 8 | 7.72e-01 | 7.05e-02 | 1.02e-01
BP 9 | 9.16e-01 | 3.66e-01 | 3.81e-01
BP 10 | 9.89e-01 | 2.10e-01 | 2.28e-01
BP 11 | 1.01e+00 | 3.31e-02 | 5.17e-02
BP 12 | 1.08e+00 | 2.14e-01 | 2.47e-01
BP 13 | 1.10e+00 | 3.03e-04 | 8.76e-02
BP 14 | 1.10e+00 | 3.03e-04 | 8.76e-02
BP 15 | 1.10e+00 | 8.26e-02 | 1.87e-01

Table 1: Relative waveform errors, Eq. (6), over the test set for the most accurate ML surrogates, ranked by maximum error.
We have so far analyzed the ranking of the most accurate blueprints for regression of the phase and amplitude datasets according to LOOCV and test RMSE scores. We now turn to how well different blueprints perform on the resulting complex surrogate waveforms themselves, since those are the ultimate functions of interest.
For this purpose we constructed ML surrogates as described in Sections I and II.2. For each surrogate we then computed the relative waveform errors defined by Eq. (6), using the test values of $q$ and the least-squares surrogate as the fiducial model. The results are summarized in Table 1 for the 15 models with highest accuracy. Out of them, the first six correspond to blueprints of three families, Gaussian Process Regression (GPR), Eureqa (symbolic regression) and Support Vector Machines (SVM), and are among the models with highest accuracy in terms of RMSE for both LOOCV and test scores, for phase and amplitude. In Appendix C we briefly describe each of these three algorithms.

We next discuss in more detail the results for these three families. In Figure 4 we show the models with highest accuracy for GPR, Eureqa and SVM. The solid curves show the errors (6) as a function of $q$, using the test values computed from the least-squares surrogate model $h_{\rm LS}$. The dotted-dashed red curve in turn shows the error of $h_{\rm LS}$ itself, here defined as (6) with $h_f$ the corresponding NR waveform. That is, those red points show the error of the least-squares surrogate waveforms with respect to the numerical relativity ones used to build it. We computed them using the package GWSurrogate and the public data for those NR simulations; they have been shown in Ref. [13] to be comparable to the NR numerical truncation errors themselves. We performed time and phase shifts on the NR waveforms using the Python package GWTools so as to align them with the $h_{\rm LS}$ waveforms. Along this line we notice that, for example, the errors of BP 1 with respect to the $h_{\rm LS}$ model are in turn smaller than those of $h_{\rm LS}$ with respect to NR.
Figure 5 shows a comparison between the model resulting from BP 1 and the NR waveform (plus polarization), for one of the NR values of $q$, close to the center of the interval used for $q$.
Figure 5 caption: Prediction from the surrogate model with GPR and Radial Basis Function kernel compared to the numerical relativity gravitational waveform.

V Discussions
In this work we presented a case study in waveform modeling, showing that, within the Reduced Basis-Empirical Interpolation (RB-EIM) framework, automated machine learning (AutoML) regression can help obtain waveforms which are very fast to evaluate and essentially indistinguishable from those generated by NR simulations.
This study is driven by two main motivations, related to data availability and to harnessing the power of machine learning methods in the most autonomous way. The first one concerns the general issue of efficiently generating NR waveforms, as they are the endpoint of complex computational pipelines involving the numerical solution of the Einstein equations. The second motivation involves the reduction of bias in the learning process. Given some data distribution, the act of selecting any supervised ML algorithm to be trained on that data has an unavoidable inductive bias associated with the selection process itself. Furthermore, the No Free Lunch Theorem [25] shows, in essence, that there is no universal regressor capable of approximating equally well all possible data distributions. The selection of a single ML algorithm therefore introduces severe constraints in learning, since the possibility of any better regression approach is automatically discarded. This fact motivates the use of AutoML to explore the most accurate possibility among the available algorithms, while suppressing or at least diminishing human bias in the learning process. With AutoML we let the available algorithms compete with each other for better performance, thereby avoiding potentially suboptimal decisions driven by human-biased criteria. Out of the regression blueprints analyzed by AutoML, we found that those based on Gaussian Process Regression, Symbolic Regression through Genetic Programming, and Support Vector Machines give the most accurate models (see Figs. 2, 3 and 4).
We point out the high precision of these models despite the small size of the datasets used for training, which were synthesized using the RB-EIM framework. In order to test whether this form of feature selection helps the ML algorithms reach their performance, we repeated all our experiments using random parameters instead of greedy points. We found that the resulting surrogates can perform 10-100 times worse than the greedy-based ones.
Deep Learning-based models are not among the best performing ones in this study. Indeed, the neural network blueprints used in this work rank last in the list of models arranged by performance. This can be related to the fact that deep architectures in general need large datasets to perform well; since our datasets are small, their size could be insufficient for neural networks to reach good performance.
GPR-based models show a promising avenue for modeling sparse waveform data based on RB-EIM. The training cost of these models naively grows as $\mathcal{O}(m^3)$, with $m$ the size of the training set, although this does not pose a problem in this context given the small training sets involved. This, and the linear cost per prediction once GPR models are trained, makes these algorithms appealing for further research. Indeed, for moderate batch sizes, the cost per point evaluation of a GPR regression trained in this work stabilizes at fractions of a millisecond on a desktop computer.
To our knowledge this is the first regression study in waveform modeling using the RB-EIM framework and AutoML techniques. A related work is [26], in which a few regression techniques for GWs are explored in the cases of spin-aligned non-precessing and fully precessing compact binaries, although neither RB-EIM nor AutoML are used in the process, and the results have been claimed to be general knowledge without a comprehensive study [27]. Our approach proposes a way of evaluating paradigms without human biases.
VI Acknowledgments
We thank DataRobot for a free academic license, and William Disch, Scott E. Field and Ganesh Siddamal for helpful discussions.
Appendix A Blueprints used
List of the blueprints used in this study, ranked from smallest to largest maximum error, with the error defined by Equation (6).

BP 1: Missing Values Imputed  Gaussian Process Regressor with Radial Basis Function Kernel

BP 2: Missing Values Imputed  Gaussian Process Regressor with Matérn Kernel (nu=2.5)

BP 3: Missing Values Imputed  Gaussian Process Regressor with Matérn Kernel (nu=1.5)

BP 4: Missing Values Imputed  Eureqa Regressor (Default Search: 3000 Generations)

BP 5: Missing Values Imputed  Gaussian Process Regressor with Matérn Kernel (nu=0.5)

BP 6: Regularized Linear Model Preprocessing v20  Nystroem Kernel SVM Regressor

BP 7: Numeric Data Cleansing  Standardize  ElasticNet Regressor (mixing alpha=0.5 / LeastSquares Loss) with KMeans Distance Features

BP 8: Regularized Linear Model Preprocessing v22 with Unsupervised Features  ElasticNet Regressor (mixing alpha=0.5 / LeastSquares Loss) with Unsupervised Learning Features

BP 9: Missing Values Imputed  Eureqa Regressor (Instant Search: 40 Generations)

BP 10: Treebased Algorithm Preprocessing v1  eXtreme Gradient Boosted Trees Regressor with Early Stopping

BP 11: Missing Values Imputed  Eureqa Regressor (Quick Search: 250 Generations)

BP 12: Treebased Algorithm Preprocessing v20  eXtreme Gradient Boosted Trees Regressor with Early Stopping

BP 13: Missing Values Imputed  Smooth Ridit Transform  Partial Principal Components Analysis  Gaussian Process Regressor with Matérn Kernel (nu=0.5)

BP 14: Missing Values Imputed  Smooth Ridit Transform  Gaussian Process Regressor with Matérn Kernel (nu=0.5)

BP 15: Treebased Algorithm Preprocessing v1  Adaboost Regressor

BP 16: Treebased Algorithm Preprocessing v1  Gradient Boosted Trees Regressor (LeastSquares Loss)

BP 17: Treebased Algorithm Preprocessing v15  Gradient Boosted Greedy Trees Regressor (LeastSquares Loss)

BP 18: Missing Values Imputed  Gradient Boosted Trees Regressor (LeastSquares Loss)

BP 19: Regularized Linear Model Preprocessing v19  Ridge Regressor

BP 20: Regularized Linear Model Preprocessing v14  Ridge Regressor

BP 21: Regularized Linear Model Preprocessing v14  Ridge Regressor

BP 22: Numeric Data Cleansing  Standardize  Vowpal Wabbit Low Rank Quadratic Regressor

BP 23: Numeric Data Cleansing  Standardize  Ridge Regressor  Light Gradient Boosting on ElasticNet Predictions

BP 24: Numeric Data Cleansing  Standardize  Vowpal Wabbit Stagewise Polynomial Regressor

BP 25: Numeric Data Cleansing  Standardize  Vowpal Wabbit Regressor

BP 26: Missing Values Imputed  Standardize  Autotuned KNearest Neighbors Regressor (Euclidean Distance)

BP 27: Treebased Algorithm Preprocessing v1  eXtreme Gradient Boosted Trees Regressor

BP 28: Treebased Algorithm Preprocessing v20  eXtreme Gradient Boosted Trees Regressor

BP 29: Treebased Algorithm Preprocessing v22 with Unsupervised Learning Features  eXtreme Gradient Boosted Trees Regressor with Unsupervised Learning Features

BP 30: Numeric Data Cleansing  Standardize  Autotuned Stochastic Gradient Descent Regression

BP 31: Numeric Data Cleansing  Standardize  Lasso Regressor

BP 32: Numeric Data Cleansing  Standardize  ElasticNet Regressor (mixing alpha=0.5 / LeastSquares Loss)

BP 33: Missing Values Imputed  Standardize  ElasticNet Regressor (mixing alpha=0.5 / LeastSquares Loss)

BP 34: Missing Values Imputed  Standardize  Ridge Regressor

BP 35: Missing Values Imputed  Standardize  Linear Regression

BP 36: Numeric Data Cleansing  Standardize  Ridge Regression

BP 37: Treebased Algorithm Preprocessing v1 with Anomaly Detection  eXtreme Gradient Boosted Trees Regressor

BP 38: Average Blender

BP 39: Treebased Algorithm Preprocessing v1  eXtreme Gradient Boosted Trees Regressor with Early Stopping  Forest (10x)

BP 40: Treebased Algorithm Preprocessing v1  eXtreme Gradient Boosted Trees Regressor (learning rate =0.01)

BP 41: Gradient Boosting Model Preprocessing v4  eXtreme Gradient Boosted Trees Regressor (learning rate =0.01)

BP 42: Treebased Algorithm Preprocessing v15  eXtreme Gradient Boosted Trees Regressor (learning rate =0.01)

BP 43: Regularized Linear Model Preprocessing v5 – Nystroem Kernel SVM Regressor

BP 44: Regularized Linear Model Preprocessing v5 – Support Vector Regressor (Radial Kernel)

BP 45: Missing Values Imputed – Smooth Ridit Transform – Partial Principal Components Analysis – Gaussian Process Regressor with Radial Basis Function Kernel

BP 46: Missing Values Imputed – Smooth Ridit Transform – Gaussian Process Regressor with Radial Basis Function Kernel

BP 47: Regularized Linear Model Preprocessing v5 – Auto-tuned K-Nearest Neighbors Regressor (Euclidean Distance)

BP 48: Missing Values Imputed – Smooth Ridit Transform – Gaussian Process Regressor with Matérn Kernel (nu=1.5)

BP 49: Missing Values Imputed – Smooth Ridit Transform – Partial Principal Components Analysis – Gaussian Process Regressor with Matérn Kernel (nu=1.5)

BP 50: Numeric Data Cleansing – Smooth Ridit Transform – Keras Slim Residual Neural Network Regressor using Adaptive Training Schedule (1 Layer: 64 Units)

BP 51: Missing Values Imputed – Decision Tree Regressor

BP 52: Numeric Data Cleansing – Ridge Regressor with Binned numeric features

BP 53: Tree-based Algorithm Preprocessing v2 – RandomForest Regressor

BP 54: Tree-based Algorithm Preprocessing v2 – RandomForest Regressor (Shallow)

BP 55: Tree-based Algorithm Preprocessing v1 – RandomForest Regressor

BP 56: Missing Values Imputed – RandomForest Regressor

BP 57: Missing Values Imputed – Smooth Ridit Transform – Gaussian Process Regressor with Matérn Kernel (nu=2.5)

BP 58: Missing Values Imputed – Smooth Ridit Transform – Partial Principal Components Analysis – Gaussian Process Regressor with Matérn Kernel (nu=2.5)

BP 59: Regularized Linear Model Preprocessing v2 – Ridge Regressor

BP 60: Regularized Linear Model Preprocessing v12 – Ridge Regressor

BP 61: Regularized Linear Model Preprocessing v12 – Lasso Regressor

BP 62: Constant Splines – Ridge Regressor

BP 63: Numeric Data Cleansing – Smooth Ridit Transform – Keras Slim Residual Neural Network Regressor using Training Schedule (1 Layer: 64 Units)

BP 64: Tree-based Algorithm Preprocessing v1 – Light Gradient Boosted Trees Regressor with Early Stopping

BP 65: Tree-based Algorithm Preprocessing v1 – Dropout Additive Regression Trees Regressor (15 leaves)

BP 66: Tree-based Algorithm Preprocessing v17 – ExtraTrees Regressor

BP 67: Missing Values Imputed – Text fit on Residuals (L2 / Least-Squares Loss) – Generalized Additive Model

BP 68: Tree-based Algorithm Preprocessing v2 – LightGBM Random Forest Regressor

BP 69: Missing Values Imputed – RuleFit Regressor

BP 70: Missing Values Imputed – Text fit on Residuals (L2 / Least-Squares Loss) – Generalized Additive2 Model

BP 71: Mean Response Regressor

BP 72: Missing Values Imputed – Eureqa Generalized Additive Model (40 Generations)

BP 73: Regularized Linear Model Preprocessing v5 – Support Vector Regressor (Linear Kernel)

BP 74: Missing Values Imputed – Eureqa Generalized Additive Model (1000 Generations)

BP 75: Missing Values Imputed – Eureqa Generalized Additive Model (10000 Generations)

BP 76: Numeric Data Cleansing – Smooth Ridit Transform – Binning of numerical variables – Keras Slim Residual Neural Network Regressor using Adaptive Training Schedule (1 Layer: 64 Units)

BP 77: Numeric Data Cleansing – Smooth Ridit Transform – One-Hot Encoding – Keras Deep Residual Neural Network Regressor using Training Schedule (2 Layers: 512, 512 Units)

BP 78: Regularized Linear Model Preprocessing v4 – TensorFlow Neural Network Regressor

BP 79: Neural Network Algorithm Preprocessing v4 – Keras Residual AutoInt Regressor using Training Schedule (3 Attention Layers with 2 Heads, 2 Layers: 100, 100 Units)

BP 80: Numeric Data Cleansing – Smooth Ridit Transform – Keras Deep Self-Normalizing Residual Neural Network Regressor using Training Schedule (3 Layers: 256, 128, 64 Units)

BP 81: Numeric Data Cleansing – Smooth Ridit Transform – Keras Deep Residual Neural Network Regressor using Training Schedule (3 Layers: 512, 64, 64 Units)

BP 82: Numeric Data Cleansing – Smooth Ridit Transform – One-Hot Encoding – Keras Wide Residual Neural Network Regressor using Training Schedule (1 Layer: 1536 Units)

BP 83: Neural Network Algorithm Preprocessing v4 – Keras Residual Cross Network Regressor using Training Schedule (3 Cross Layers, 4 Layers: 100, 100, 100, 100 Units)

BP 84: Neural Network Algorithm Preprocessing v1 – TensorFlow Deep Learning Regressor

BP 85: Neural Network Algorithm Preprocessing v3 – TensorFlow Deep Learning Regressor

BP 86: Neural Network Algorithm Preprocessing v5 – TensorFlow Deep Learning Regressor

BP 87: Neural Network Algorithm Preprocessing v4 – Keras Residual Neural Factorization Machine Regressor using Training Schedule (2 Layers: 100, 100 Units)
Appendix B Building the reduced basis and empirical interpolant
The reduced basis is built using the Reduced Basis algorithm ([2]; see also [3] and citations therein for early applications in waveform modeling), implemented in Python. The training set is composed of fiducial-model waveforms equally spaced in the mass ratio. After building the reduced basis we construct the empirical interpolant using the EIM algorithm [28, 29, 30].
We assess the representation accuracy of the RB and EIM approaches in the $L_2$ norm. The RB error for a waveform $h_\lambda$ is defined as

$\sigma_n(\lambda) := \| h_\lambda - \mathcal{P}_n h_\lambda \|^2$ ,   (7)

where $\mathcal{P}_n$ is the orthogonal projection onto the basis of size $n$. The basis is hierarchically built until reaching a tolerance $\epsilon$ for the representation error of any waveform in the training set $\mathcal{T}$:

$\max_{\lambda \in \mathcal{T}} \sigma_n(\lambda) \leq \epsilon$ .   (8)

Analogously, the interpolation error is defined as

$\sigma_n^{\mathrm{EIM}}(\lambda) := \| h_\lambda - \mathcal{I}_n[h_\lambda] \|^2$ ,   (9)

where $\mathcal{I}_n$ is the EIM interpolant associated to the basis.
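For concreteness, the EIM node selection can be sketched in a few lines of numpy. This is an illustrative sketch of the standard greedy EIM (not the implementation used here); the function name and interface are ours.

```python
import numpy as np

def eim_nodes(basis):
    """Greedy Empirical Interpolation: pick one time node per basis element.

    basis: (n, L) array whose rows are basis functions sampled on L time
    points. Returns the list of n selected node indices.
    """
    n, L = basis.shape
    # First node: where the first basis function is largest in magnitude.
    nodes = [int(np.argmax(np.abs(basis[0])))]
    for j in range(1, n):
        # Interpolate the j-th basis element at the current j nodes
        # using the first j basis functions: solve V c = e_j(nodes).
        V = basis[:j, nodes].T                # (j, j) interpolation matrix
        c = np.linalg.solve(V, basis[j, nodes])
        residual = basis[j] - c @ basis[:j]   # error of that interpolant
        # Next node: where the interpolation error is largest.
        nodes.append(int(np.argmax(np.abs(residual))))
    return nodes
```

By construction the number of nodes equals the number of basis elements, mirroring the pairing between greedy parameters and EIM nodes described in the text.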
As described in sections II.2 and I, both algorithms encode information about the waveforms’ parameter and time variations. This is done by selecting a suitable and minimal set of greedy parameter points and EIM time nodes, which later serve to build the RB basis and the EIM interpolant. In Figure 6 we show (left) the greedy selections of both algorithms along with (right) the representation errors achieved by each one as a function of the dimensionality of the basis.
We make some remarks.

In order to achieve machine precision, a small number of reduced basis waveforms suffices. To that end we chose a correspondingly small greedy tolerance.

The greedy and EIM point distributions concentrate in the low-mass-ratio and merger-ringdown sectors (left panel of Figure 6). These correspond to sectors in which the waveforms change more rapidly than in the rest of the parameter-time domain.

In the right panel of Figure 6, the projection error lies below the interpolation error. This is a general fact: both the orthogonal projection and the interpolant are linear combinations of reduced basis elements, and it is well known that the best linear combination in the least-squares sense is the orthogonal projection. Any other representation in the form of a linear expansion of the basis, including interpolation, performs worse in the $L_2$ norm than the optimal one.

We found that the reduced basis dimensionality converges for large training sets (top panel of Figure 7), meaning that a reduced basis of that converged size captures, at the chosen tolerance, all the structure of the waveforms in the mass-ratio range considered.

Although one expects some quality loss when moving from an RB representation to EIM interpolation, in our case this loss is negligible. In the bottom panel of Figure 7 we assess the representation capacity of both the RB and EIM approaches on a test set composed of waveforms equally spaced in the mass ratio. As shown, all interpolation errors fall below or close to the round-off line.
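The greedy loop underlying these results (select the worst-represented training waveform, orthonormalize it against the current basis, repeat until the tolerance is met) can be sketched as follows. This is a minimal numpy sketch with a plain Euclidean norm, not the code used for this paper; the function name is ours.

```python
import numpy as np

def greedy_rb(training, tol=1e-12):
    """Greedy reduced-basis selection (minimal sketch).

    training: (N, L) array of N training waveforms sampled at L points.
    Returns (basis, selected): orthonormal basis rows and the greedy-chosen
    training indices.
    """
    # Seed with the training element of largest norm.
    norms = np.linalg.norm(training, axis=1)
    selected = [int(np.argmax(norms))]
    basis = (training[selected[0]] / norms[selected[0]])[np.newaxis, :]
    while True:
        # Squared projection errors sigma_n(lambda) of every training element.
        proj = training @ basis.conj().T @ basis
        errors = np.linalg.norm(training - proj, axis=1) ** 2
        worst = int(np.argmax(errors))
        if errors[worst] <= tol:
            break
        # Gram-Schmidt: add the worst-represented waveform to the basis.
        residual = training[worst] - (basis.conj() @ training[worst]) @ basis
        basis = np.vstack([basis, residual / np.linalg.norm(residual)])
        selected.append(worst)
    return basis, selected
```

The `selected` indices play the role of the greedy parameter points, and the loop terminates exactly when the maximum training-set error falls below the greedy tolerance.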
Appendix C Main regression approaches
Gaussian Process Regression.– GPR uses Bayesian statistics to infer models from data [31]. Its ultimate goal is to build suitable probability distributions over the space of possible models by performing Bayesian inference on observed data. Before any data is seen, the prior describes an initial guess distribution over possible models. When observed data becomes available, the prior gets updated into a posterior distribution over models, which makes better estimates for regression along with the corresponding uncertainties.
Strictly speaking, a Gaussian process is a set of random variables for which any finite subset has a joint Gaussian distribution. In this context, a model can be seen as an infinite-dimensional stochastic process from which any finite subsample is jointly Gaussian. Such a process is completely defined by a mean function $m(x)$ and a covariance/kernel function $k(x, x')$. The kernel function plays a central role in Bayesian regression since it determines the kind of functions that will be available for modeling, and related properties such as degree of smoothness, characteristic lengths, etc. The specification of the kernel determines the type of GPR to carry out. A prototypical example is the Radial Basis Function (RBF) or Gaussian kernel, defined by

$k(x, x') = \exp\left( - \dfrac{\| x - x' \|^2}{2 \ell^2} \right)$ .

This kernel is the most common and widely used in the field. Its name comes from the fact that the dependency enters the function only through the normed difference $\| x - x' \|$, a feature also present in SVM approaches. The parameter $\ell$ is the characteristic length-scale and controls the scale of variation of inferred models. More general kernels can be built from the RBF by adding parameters to the function. Once a kernel is defined, all free parameters can be optimized for better performance. In fact, GPR enters the AutoML flow through hyperparameter optimization of these variables.
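In code, the RBF kernel matrix for one-dimensional inputs reduces to a couple of lines (a hypothetical helper of our own, with the length-scale as discussed above):

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Gaussian/RBF kernel matrix: k(x, x') = exp(-|x - x'|^2 / (2 l^2))."""
    d2 = (x1[:, None] - x2[None, :]) ** 2   # pairwise squared differences
    return np.exp(-0.5 * d2 / length_scale ** 2)
```

Note that $k(x, x) = 1$ and that shrinking the length-scale makes distant points decorrelate faster, i.e. allows more rapidly varying models.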
Another important family of kernels is the Matérn class, described by covariance functions of the form

$k_\nu(x, x') = \dfrac{2^{1-\nu}}{\Gamma(\nu)} \left( \dfrac{\sqrt{2\nu}\, \| x - x' \|}{\ell} \right)^{\nu} K_\nu\left( \dfrac{\sqrt{2\nu}\, \| x - x' \|}{\ell} \right)$ ,

where $\nu$ and $\ell$ are positive parameters and $K_\nu$ is a modified Bessel function of the second kind. In some cases Matérn kernels are preferred over RBF ones, since the latter carries strong smoothness assumptions that may turn out to be unrealistic in many physical scenarios [31].
While GPR approaches have the advantage of providing not only predictive models but also an uncertainty estimation for them, the algorithm scales as $\mathcal{O}(n^3)$ with the size $n$ of the dataset, something that may cause computational bottlenecks for large data.
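A minimal sketch of GPR prediction with an RBF kernel follows, assuming a zero-mean prior and unit prior variance; the cubic cost just mentioned lives in the Cholesky factorization of the $n \times n$ training kernel matrix. The function and its defaults are illustrative, not any library's API.

```python
import numpy as np

def gpr_posterior(x_train, y_train, x_test, length_scale=1.0, noise=1e-10):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

    # Train-train kernel matrix, regularized by a small noise term.
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                 # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = k(x_test, x_train)                  # test-train covariances
    mean = K_s @ alpha                        # posterior mean
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - np.sum(v ** 2, axis=0)        # posterior variance, k(x,x)=1
    return mean, var
```

The returned variance is the pointwise uncertainty estimate that distinguishes GPR from purely predictive regressors; it shrinks to (numerically) zero at the training points.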
Eureqa.– Eureqa models implement symbolic regression using evolutionary algorithms through genetic programming [32, 33, 34]. As opposed to more standard approaches such as polynomial regression, the goal of symbolic regression is to find not only the best set of parameters for fitting analytical forms to data but the form itself. This is done by first specifying a dictionary of building blocks, e.g. elementary functions, which serve to build more complex forms. Naturally, this leads to a problem of combinatorial complexity, since the search space is composed of all possible combinations of building blocks with their respective free parameters. To overcome this, an evolutionary search, mimicking natural selection, is performed across many possible combinations, relaxing the search complexity by introducing randomness in the selection process. Through the evolution of populations using, e.g., random mutations or crossover, different models compete with each other following some optimization criterion until the best individuals are left. In Eureqa, the best models are ranked following a compromise between accuracy and complexity, summarized in a Pareto front curve. Early applications of these methods in gravitational wave modeling can be found in [35].

Support Vector Machines.– SVM algorithms [36, 37] approximate functions by fitting expansions of the form

$f(x) = \sum_i \alpha_i\, k(x, x_i) + b$ ,

where the $\alpha_i$ and $b$ are free parameters, the $x_i$ are data points, and $k$ is a kernel function in the same sense as in GPR. The kernel plays the role of a generalized inner product between nonlinear feature maps. These maps take the input vectors and project them to other feature spaces, in which one can perform simple linear regression.
SVM is computationally efficient, since convex optimization can be directly applied as the problem is linear in the free parameters. It has the obvious advantage over linear approaches of being able to model nonlinearities present in the training data, a regime in which direct linear regression can perform very poorly. As in GPR, kernel functions play a prominent role. For example, evaluating a kernel function may be computationally more tractable than working with the feature maps themselves; indeed, feature maps can be infinite-dimensional and therefore impossible to implement directly on a computer.
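As an illustration of the kernel-expansion form, the sketch below fits the coefficients by regularized least squares (kernel ridge regression), which keeps the convex, linear-in-coefficients structure just discussed. A true SVM instead minimizes an epsilon-insensitive loss with a quadratic-programming solver, which we do not reproduce here; the function name and defaults are ours.

```python
import numpy as np

def fit_kernel_expansion(x, y, length_scale=0.2, reg=1e-6):
    """Fit f(x) = sum_i alpha_i k(x, x_i) by regularized least squares.

    Not the SVM optimization itself: a kernel-ridge stand-in sharing the
    same expansion form (bias term omitted for simplicity).
    """
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

    K = k(x, x)
    # Convex problem, linear in the coefficients: one regularized solve.
    alpha = np.linalg.solve(K + reg * np.eye(len(x)), y)
    return lambda xq: k(np.atleast_1d(xq), x) @ alpha
```

The returned closure evaluates the fitted expansion at new query points; the nonlinearity of the model lives entirely in the kernel, while the fit itself stays linear.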
References
 Tiglio and Villanueva [2021a] M. Tiglio and A. Villanueva, Reduced order and surrogate models for gravitational waves (2021a), arXiv:2101.11608 [gr-qc].

 Prud’homme et al. [2002] C. Prud’homme, D. V. Rovas, K. Veroy, L. Machiels, Y. Maday, A. T. Patera, and G. Turinici, Reliable real-time solution of parametrized partial differential equations: Reduced-basis output bound methods, J. Fluids Eng. 124, 70 (2002).
 Field et al. [2011] S. E. Field, C. R. Galley, F. Herrmann, J. S. Hesthaven, E. Ochsner, and M. Tiglio, Reduced basis catalogs for gravitational wave templates, Phys. Rev. Lett. 106, 221102 (2011).
 Field et al. [2014] S. E. Field, C. R. Galley, J. S. Hesthaven, J. Kaye, and M. Tiglio, Fast prediction and evaluation of gravitational waveforms using surrogate models, Phys. Rev. X 4, 031006 (2014).
 Bellman [1957] R. Bellman, Dynamic Programming (Dover Publications, 1957).
 Adcock et al. [2017] B. Adcock, S. Brugiapaglia, and C. G. Webster, Compressed sensing approaches for polynomial approximation of high-dimensional functions, arXiv:1703.06987 [math.NA] (2017).

 Chkifa et al. [2015] A. Chkifa, A. Cohen, and C. Schwab, Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs, Journal de Mathématiques Pures et Appliquées 103, 400 (2015).
 Epperson [1987] J. F. Epperson, On the Runge example, Am. Math. Monthly 94, 329–341 (1987).
 Cohen et al. [2013] A. Cohen, M. A. Davenport, and D. Leviatan, On the stability and accuracy of least squares approximations, Foundations of Computational Mathematics 13, 819 (2013).
 Cohen and Migliorati [2017] A. Cohen and G. Migliorati, Optimal weighted least-squares methods, The SMAI Journal of Computational Mathematics 3, 181 (2017).
 Hutter et al. [2018] F. Hutter, L. Kotthoff, and J. Vanschoren, eds., Automated Machine Learning: Methods, Systems, Challenges (Springer, 2018) in press, available at http://automl.org/book.
 [12] DataRobot. Enterprise AI. https://www.datarobot.com.
 Blackman et al. [2015] J. Blackman, S. E. Field, C. R. Galley, B. Szilágyi, M. A. Scheel, M. Tiglio, and D. A. Hemberger, Fast and accurate prediction of numerical relativity waveforms from binary black hole coalescences using surrogate models, Phys. Rev. Lett. 115, 121102 (2015).
 [14] SXS Gravitational Waveform Database. https://data.blackholes.org/waveforms/catalog.html.
 [15] GWSurrogate (2014–2020), accessed 31 May 2021.
 [16] Eureqa Software. https://www.datarobot.com/nutonian/.
 [17] H2O.ai. https://www.h2o.ai/.
 [18] Keras. https://keras.io/.
 [19] LightGBM. https://lightgbm.readthedocs.io.
 Pedregosa et al. [2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12, 2825 (2011).
 [21] TensorFlow. https://www.tensorflow.org/.
 [22] Vowpal Wabbit. https://vowpalwabbit.org/.
 Chen and Guestrin [2016] T. Chen and C. Guestrin, XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).
 [24] Datarobot python package. https://datarobotpublicapiclient.readthedocshosted.com/en/v2.26.0/.
 Wolpert [1996] D. H. Wolpert, The lack of a priori distinctions between learning algorithms, Neural Computation 8, 1341 (1996).
 Setyawati et al. [2019] Y. Setyawati, M. Pürrer, and F. Ohme, Regression methods in waveform modeling: a comparative study (2019), arXiv:1909.10986 [astro-ph.IM].
 Cuoco et al. [2020] E. Cuoco, J. Powell, M. Cavaglià, K. Ackley, M. Bejger, C. Chatterjee, M. Coughlin, S. Coughlin, P. Easter, R. Essick, H. Gabbard, T. Gebhard, S. Ghosh, L. Haegel, A. Iess, D. Keitel, Z. Márka, S. Márka, F. Morawski, T. Nguyen, R. Ormiston, M. Pürrer, M. Razzano, K. Staats, G. Vajente, and D. Williams, Enhancing gravitational-wave science with machine learning, Machine Learning: Science and Technology 2, 011002 (2020).
 Barrault et al. [2004] M. Barrault, Y. Maday, N. C. Nguyen, and A. T. Patera, An ‘empirical interpolation’ method: application to efficient reduced-basis discretization of partial differential equations, Comptes Rendus Mathematique 339, 667 (2004).
 Maday et al. [2009] Y. Maday, N. C. Nguyen, A. T. Patera, and S. H. Pau, A general multipurpose interpolation procedure: the magic points, Communications on Pure and Applied Analysis 8, 383 (2009).
 Chaturantabut and Sorensen [2010] S. Chaturantabut and D. Sorensen, Nonlinear model reduction via discrete empirical interpolation, SIAM J. Scientific Computing 32, 2737 (2010).
 Rasmussen and Williams [2006] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning series (The MIT Press, 2006).
 Koza [1992] J. R. Koza, Genetic programming: on the programming of computers by means of natural selection (MIT Press, Cambridge, MA, USA, 1992).
 Schmidt and Lipson [2009] M. Schmidt and H. Lipson, Distilling freeform natural laws from experimental data, Science 324, 81 (2009).
 Schmidt and Lipson [2010] M. Schmidt and H. Lipson, Symbolic regression of implicit equations, in Genetic Programming Theory and Practice VII, edited by R. Riolo, U.-M. O’Reilly, and T. McConaghy (Springer US, Boston, MA, 2010) pp. 73–85.
 Tiglio and Villanueva [2021b] M. Tiglio and A. Villanueva, On ab initio-based, free and closed-form expressions for gravitational waves, Sci. Rep. 11, 5832 (2021b).

 Boser et al. [1996] B. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory 5 (1996).
 Cortes and Vapnik [1995] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning 20, 273 (1995).