1 Introduction
Several problems arise in observational and theoretical contexts that demand the resolution of computationally intensive differential equations. From structural analysis in engineering to spacetime simulations in astrophysics, rapid and reliable evaluations of solutions to these equations are crucial due to the need for computing in real or nearreal time quantities that depend on those solutions. Due to pervasive computational costs led by the inherent complexities present in many problems, achieving fast responses becomes an ubiquitous bottleneck.
As a case study, let us take the example of an ongoing problem in the field of gravitational wave research, namely the template bank problem (Field et al. (2012)
). To be able to detect the very faint gravitational wave signals among the noisy background of groundbased interferometers, the LIGOVirgo collaboration use match filtering against a bank of GW signal templates to maximize the signaltonoise ratio (SNR) (
Cutler & Flanagan (1994); Abbott et al. (2016, 2020)).The observed time series is filtered to decide whether a binary coalescence took place, and the parameter estimation allows us to infer the properties of precursors of the remnant, such as masses and spins for binary black hole mergers.
It is convenient to count with large enough template banks of theoretical waveforms to fill the parameter space of GWs. However, this pose an exceedingly challenging task due to the complexity of solving the Einstein Equations of General Relativity. A single numerically generated GW waveform can take from days to weeks to become available for production (Lehner & Pretorius (2014)).
A palliative solution is the construction of approximate waveform models, which are more direct to build and deploy than numerical relativity ones. There are several methods to build these approximations. Analytical examples are the PostNewtonian (Blanchet (2006)), Effective One Body (Damour & Nagar (2011)) and Phenomenological (Sturani et al. (2010); Hannam et al. (2014); Khan et al. (2019)) approximations. However, we focus here on a particular set of methods that have proven to be very fertile in GW research and produced some of the milestones in waveform modeling in recent years: Reduced Order methods.
Reduced Order Modeling (ROM) is an umbrella term that encompasses a variety of techniques for dimensional reduction, developed to address the problem of complexity in numerical simulations. In particular, Surrogate Models obtained through application of ROM to ground truth solutions are low resolution representations intended to be fast to build and evaluate without compromising accuracy. We take a datadriven approach, i.e. driven only by data, as opposed to more standard and intrusive ones in which reduction methods are coupled to differential solvers to build solutions (Quarteroni et al. (2015); Hesthaven et al. (2015)).
In waveform modeling, the combination of two ROM methods originally posed for intrusive problems and recreated later for datadriven ones led to significant success in constructing surrogate models for GWs. Those methods are the Reduced Basis (RB) (Boyaval et al. (2009); Field et al. (2011, 2012)) and the Empirical Interpolation (EI) (Barrault et al. (2004); Chaturantabut & Sorensen (2010)) methods, which we describe in the next section.
The primarily purpose of this work is to disseminate these tools in the astronomy/astrophysics community by introducing a single and userfriendly Python package for datadriven dimensional reduction: Arby.
Arby arises as a response to a lack of well documented, tested and actively maintained code for reduced basis and surrogate modeling in the scientific community while adhering to the datadriven and userfriendliness premise.
2 Theory Overview
This section briefly describes the basics of a reducedorder pipeline for building surrogate models. As we stated above, it merges two main ingredients, the Reduced Basis and the Empirical Interpolation methods, for dimensional reduction of raw data. This pipeline was first introduced in (Field et al. (2014)) followed by the construction of several surrogate waveform models based on this method (Blackman et al. (2015, 2017b, 2017a); Varma et al. (2019a, b); Rifat et al. (2020)). See also (Tiglio & Villanueva (2021)) for a review.
Representation.– We are interested in parametrized scalar (real or complex) functions, solutions or models of the form
(1) 
where represents the parameter/s of the model and is the independent variable. Both are real and possibly multidimensional. In physical models, can represent parametrized time series with being the time variable. For convenience, we denote the spaces for and as the parameter and physical domains, respectively.
In the first stage, we look for a lowresolution representation of the model . The RB method consist in representing a whole set of solutions, usually called the training set
by linear combinations of basis elements of the form
(2) 
where
(3) 
defines the inner product between training functions, is the complex conjugate of –if we deal with complex functions– and is the physical domain. The set is called the reduced basis and is composed by a subset of optimally chosen solutions from the training set. The construction of the reduced basis is iterative: at each step of the algorithm, the most dissimilar (orthogonal) element from the training set joins to the current basis, and the process stops when an userspecified tolerance is met. This tolerance is related to the maximum error of the difference between solutions and approximations. In consequence, the different spaces spanned by each reduced basis built at each step are nested, i.e. The addition of a new element to the basis implies a fine tuning of the previous approximation space.
The RB method supposes a compression in the parameter space of solutions. The next step is to achieve compression in time. To this we turn to interpolation replacing the projectionbased approach described so far by an interpolation scheme. We pose the problem by looking for an efficient linear interpolation operator such that
(4) 
subject to
(5) 
for strategically selected nodes out from the physical domain. The EI method (Maday et al. (2009); Barrault et al. (2004); Chaturantabut & Sorensen (2010)) gives us an algorithm for building such interpolant. The Empirical Interpolation (EI) algorithm, as described in (Field et al. (2014)), selects iteratively the nodes from the physical domain following a local optimization criteria (Tiglio & Villanueva (2020)).
The EI algorithm receives as unique input the reduced basis and selects the interpolation nodes for building the interpolant. Note that there is no need of the whole training set since the RB algorithm already did the introspection of it, and we assume that the relevant information about is hardcoded in the reduced basis.
It is possible to show that, under some conditions, the interpolation error is similar to the projection one, which in most applications has exponential decay in the number of basis elements (see (Tiglio & Villanueva (2021)) and citations therein). This leads to an efficient and (in most cases of interest) accurate representation of the training set by means of an empirical interpolation.
To summarizing: first, a reduced basis is built from a training set using the RB method, which leads to the linear representation in (2). This step completes a compression in the parameter domain. Next, an empirical interpolant is built solely from the reduced basis. This step completes a compression in the physical domain. Finally, we end up with an empirical interpolation (4) which provides efficient and highaccuracy representations of all functions in the training set.
Predictive models.– We want our model to represent solutions that are not present in the training set. That is, we look for the predictability. For this, we perform parametric fits at each empirical node along with training values. Let us break it down.
Recall the approach is datadriven, so we do not fill the training set with more solutions to approximate newer ones. Instead, we predict them by performing fits along data that we already know, that is, along . For D problems (), deciding for problemagnostic fitting methods that are wellsuited for most cases can be a challenging task, not to mention the high dimensional case, which remains an open question (Tiglio & Villanueva (2021)). For a rough classification, we refer to regression and interpolation methods. The former deals with calibrating free parameters of some model by optimizing an error function; the latter, with solving an interpolation problem consisting essentially in solving algebraic systems which are possibly subject to constraints (e.g. Eqs. (4,5)).
Here we address the second approach for parametric fits. The procedure consists in interpolating the values
along parameter samples for each empirical node . As we describe in Section 4, as of now, Arby uses splines for this step, though we expect to generalize this. Once fits are done, we end up with a surrogate model which can represent and predict at any in the parameter domain with high accuracy and low computational cost.
3 Algorithms
The pipeline for building surrogates described in the previous section is valid in any number of parameter and physical dimensions if we consider arbitrary fittings through parameter space. For surrogate modeling, Arby supports in its present version 1D parameter and physical domains (real number intervals of the form for ) and realvalued functions. Again, this restriction is only for building surrogate models. On the other hand, Arby supports multidimensional parameter domains (although still restricted to 1D domains in the physical dimension) and complexvalued functions for building reduced bases and empirical interpolants.
Below we summarize the algorithm for surrogate modeling. We refer the reader to the Appendices for technical details about the RB and EI algorithms used in intermediate stages.
The inputs are the training set , the parameter set , and the greedy tolerance .
Build a continuous function by doing fits along values . 
Let’s make some remarks on Alg. 1.

The training set is built from a discretization of the parameter domain. In the current version of Arby, is a discretized real interval .

The RB algorithm used for building the reduced basis is fully described in Alg. 2, see the Appendices. It selects from points , called the greedy points, labeling those functions in the training set that conform the reduced basis. For conditioning purposes, the basis is orthonormalized at each step, so the algorithm’s final output is a set of orthonormal basis elements along with the set of greedy points. So we use the term reduced basis interchangeably for both, the basis conformed by greedy solutions and its orthonormal version, due to its equivalence (they span the same space).
The number depends on the greedy tolerance . In Arby we must specify a discretization of the physical domain so as to be able to do integrals (see (3)). In this context, is a real interval and Arby must receive as input an equispaced discretization of it.

To achieve predictability Steps 46 perform parametric fits along training values for each empirical node . Let’s illustrate this by looking at the first iteration of the loop in Steps 46. For the first node collect all values
and perform a fit along them. This results in a function that is continuous in the interval . This is repeated times for each empirical node. The resulted functions constitute along the reduced basis the building blocks for the final surrogate assembly. The current Arby version implements splines (J.H. Ahlberg & Walsh (1967)) for parametric fits, i.e., piecewise polynomial interpolation of some degree arbitrarily set by the user.
3.1 Related works
A previous implementation of the RBEI approach is GreedyCpp ^{1}^{1}1ttps://bitbucket.org/sfield83/greedycpp/, an MPI/OpenMP parallel code written in C++ (Antil et al. (2018)). Although it is not designed for building surrogates and training sets have to be loaded at runtime, it allows for building reduced bases, empirical interpolants and reducedorder quadratures. Another example is ROMPy ^{2}^{2}2https://bitbucket.org/chadgalley/rompy/, a previous attempt written in pure Python which supports surrogate modeling.
Other ROM implementations in the Python ecosystem are not fully datadriven. Typically they are weakly or strongly coupled to solvers for differential equations. Mature examples are PyMOR (Milk et al. (2016)) and RBniCS (Ballarin et al. (2015)). The latter built on top of the FEniCS Logg et al. (2011) library for differential equations, and the former allows for coupling with external solvers.
4 Arby
Arby is a Python package for datadriven surrogate modeling satisfying standard software compliance along quality assurance (see Section 4.4). It allows the user for building reduced basis and empirical interpolants at any number of parameter dimensions. At the current release, Arby builds surrogate models for 1D domains.
4.1 Implementation
Integrals and inner products (see 3) must be discretized in implementing Alg. 1. For this, the physical interval is sampled in equispaced points to define a discrete inner product between two functions,
The bar represents complex conjugation in case of being complex. The ’s are positive real values called weights. Weights and sample points constitute a quadrature rule. Arby uses quadrature rules to compute integrals.
4.2 Public API
Classes.– The main class in Arby is PythonReducedOrderModel. It implements Alg. 1 for surrogate modeling. There are three basic inputs for this class:
These inputs represent the minimum and indispensable for building surrogates. Optional parameters are:

Pythonintegration_rule: a string specifying the quadrature rule;

Pythongreedy_tol: the greedy tolerance for the reduced basis, and

Pythonpoly_deg: the polynomial degree of splines.
These parameters can be tuned for controlling the model accuracy. See the Arby documentation ^{3}^{3}3https://arby.readthedocs.io for a thorough tutorial on this.
Once a PythonReducedOrderModel object is created, the construction and evaluation phases of the surrogate are unified in a single call: just invoke the surrogate at some parameter/s to obtain a prediction.
In principle, the first call of the surrogate is more expensive than subsequent ones, but this is not a problem given that it is done only once. The algorithm design allows upcoming calls to be of fast deployment. By doing so, the process was split in offlineonline stages (Field et al. (2014)). Thus, the offline stage corresponds to a (possibly) expensive first building; the online one corresponds to fast model evaluations.
There is a class to compute inner products and integrals: the PythonIntegration class. It receives a 1D array corresponding to the physical points and the integration rule. Available rules are Riemann (default), Trapezoidal and Euclidean. The former two works for continuous data; the latter for discrete data reduces to discrete inner products. Once a rule is specified, the following methods are unlocked: Pythonintegral, Pythondot, Pythonnorm and Pythonnormalize. They compute integrals, inner products, norms and normalizations, respectively.
The PythonBasis class encompasses data utilities for handling arbitrary bases, whether they are reduced bases or userspecified ones. The PythonBasis class receives as input a basis and an PythonIntegration object. Available methods are: Pythoninterpolate, Pythonproject and Pythonprojection_error. The former two take input arrays and interpolate/project them in a finitedimensional space spanned by the basis. The interpolation is empirical, i.e. following the EI directives. The projection is simply an orthogonal projection onto the basis. The Pythonprojection_error method computes squared projection errors due to projecting arrays onto the basis.
Auxiliary classes are PythonRB and PythonEIM. They are containers for RB/EIM information.
Functions.– The main function in Arby is Pythonreduced_basis. It builds a reduced basis giving as input a training set and physical points. This function takes full advantage of the hierarchical feature of the RB algorithm, leading to constant complexity with the number of basis elements (Tiglio & Villanueva (2021)). For conditioning purposes, there are two patterns related to the normalization of the training set which lead to two different implementations of the greedy algorithm.
There is a function for GramSchmidt orthonormalization called Pythongram_schmidt. It implements an Iterative GramSchmidt algorithm (Hoffmann (1989)) to orthonormalize a set of linearly independent arrays. Internally, this algorithm builds the reduced bases.
4.3 Benchmarks
It is interesting within the work to present a performance analysis of the the most important routine of the project: Pythonreduce_basis.
From a theoretical point of view, we expect an overall complexity of , where are the number of training and physical samples, respectively, and is the number of basis elements. Since the training set size is fixed, the addition of one element to the basis at some stage of the greedy loop carries a constant computational cost. If data is amenable for dimensional reduction, we hope . On the other side, in the worst case scenario we expect . This happens if there is almost no redundancy in data to exploit, for example random data. In this case the cost grows as .
Besides theoretical assumptions, we intend to empirically know how the different parameters of Pythonreduce_basis function influence the general performance of the algorithm and if this adjusts to the theoretical estimates. These parameters are

Pythonintegration_rule – The quadrature rule to define an integration scheme, can be: riemann, trapezoidal and euclidean.

Pythonnormalize – True if training data must be normalized before training or False otherwise.

Pythongreedy_tol – The greedy tolerance as a stopping condition for the reduced basis. This allows to control the representation accuracy of the basis. We choose to test two different tolerances, and .

Pythontraining_set – The training data as a 2D array. We tested on square random arrays with sizes and .

Pythonphysical_points – Physical points for quadrature rules. Must match the number of columns of training set.
With these parameters, we simulated 100 training sets for each one of the 24 possible combinations, giving a total of 24000 test cases. The benchmark was then executed on a computer with the following specifications:

CPU – 4 x 2.4 GHz AMD Opteron(tm) Processor 6282 SE.

RAM – 251GB DDR3L.

OS – CentOS 7 Linux 3.10.0514.el7.x86_64

Software – Python 3.9.0.final.0 (64 bit), NumPy 1.21.1 and SciPy 1.7.0
The results are presented in Figure 1. As we can anticipate, the size of the training set is the most important factor impacting the execution times. All other parameters, out of Pythongreedy_tol, are mainly used for initial configurations outside the greedy loop, hence used only once.
To further explore the relationship between Pythontraining_set sizes and execution time, we run a second benchmark leaving fixed Pythonnormalize=False, Pythonintegration_rule=’riemman’ and Pythongreedy_tol=1e12. We generate 100 random Pythontraining_set by varying the size between and , giving a total of test samples.
In this case, the results (Figure 2) show the anticipated behavior: in front of random others the cost has a growth as we increase the size of Pythontraining_set.
For more details the entire benchmark dataset can be found in ^{4}^{4}4https://zenodo.org/record/5139187#.YPIAXVKhhE (Villanueva et al., 2021).
4.4 Quality assurance
To ensure the proper software quality of Arby, we provide standard quantitative and qualitative metrics, in particular i) unit testing and ii) codecoverage, and adhere to the PEP style guide(Van Rossum et al., 2001) throughout the entire project.

Unit testing:
Its purpose is to ensure that the individual software components work as expected (Jazayeri, 2007).

Codecoverage:
It measures the amount of code covered by the unit test suite, expressed as a percentage of executed sentences(Miller & Maloney, 1963). By providing comprehensive codecoverage we ensure code validation, expand the ability and efficiency of error handling, and increase confidence in the code.
Arby currently uses pytest (Okken, 2017) and Coverage.py ^{5}^{5}5https://coverage.readthedocs.io for unit testing and coverage, respectively, completing up to of codecoverage using Python versions , , and .
The PEP  Style Guide for Python Code (Van Rossum et al., 2001) is one of a series of guidelines and practices on how to write Python code to improve code readability and consistency. There are several of PEP’s (Python Enhancement Proposals), including PEP 8. The latter has recommendations for code layout, whitespaces, comments, naming conventions and programming recommendations. In addition, there are tools, called linters, that can be used, in particular, to automate compliance with PEP 8; Arby currently uses flake8 ^{6}^{6}6https://pypi.org/project/flake8/, which checks for any deviation in code style.
Finally, the entire source code is MITlicensed (Initiative et al., 2006) and is publicly available from its GitHub repository^{7}^{7}7https://github.com/aaronuv/arby. All versions committed to this code are automatically tested in a continuousintegration service using Travis CI^{8}^{8}8https://travisci.com/github/aaronuv/arby and GitHub Actions ^{9}^{9}9https://github.com/features/actions. Documentation is automatically built from the repository and made public in the readthedocs service at ^{10}^{10}10https://arby.readthedocs.io/en/latest/.
Arby is built on top of the Python scientific stack: Numpy (Walt et al., 2011) to perform efficient numerical linear algebra operations; Scipy (Virtanen et al., 2020), used in the current release for splines interpolation.
The Arby package is available for installation on the PythonPackageIndex (PyPI) and can be installed using the command Pythonpip install arby.
5 Toy model: a damped pendulum
We illustrate the construction of surrogate models applying Arby to a classical problem in physics: the damped pendulum. This system is a simple pendulum of given longitude subject to gravity and a dissipative force such as friction, allowing for the pendulum oscillations to damp at long times. We encode the generic dynamics of this system in the ordinary differential equation (ODE)
(8) 
where represents the timedependent angle of the pendulum with respect to the equilibrium axis and dots represent time differentiation. The symbols and represent friction and gravity strength per unit length, respectively, at fixed values of the pendulum’s longitude. Time units do not play any role in this example, so for practical purposes, we choose it adimensional. This election makes the two parameters of the model, and , also adimensional. In order to make the model onedimensional and being able to apply Arby for surrogate modeling, fix to a convenient value so as to cover the variation of solutions in the selected time range widely. Fixed
, our parametrized model consist on damped oscillations
, being time the physical variable and the parameter.We must solve numerically the ODE in (8) in order to generate the training set of solutions to feed Arby. To this, we set and choose intervals for physical and parameter ranges as and . The initial conditions are set to , where , meaning the pendulum departs from rest at and falls under the action of gravity. We generate a training set of solutions using an ODE solver from the Scipy Python package (Virtanen et al., 2020). We discretize the parameter and time domains in and equispaced points respectively, and generate solutions using the same initial conditions^{11}^{11}11The code for this example can be found in https://arby.readthedocs.io/en/latest/. See Fig. 3.
To build a surrogate for solutions to the pendulum equation, we invoke the PythonReducedOrderModel class and feed it with the three main parameters: training set, physical points and parameter points which we denote Pythontraining, Pythontime and Pythonparam, respectively.
In addition, we modify default values of optional class parameters to increase the surrogate’s quality:
[breaklines, frame=lines, framesep=2mm]pycon ¿¿¿ from arby import ReducedOrderModel as ROM
# set the greedy tolerance to 1e14 and # splines degree to 5 and create a model ¿¿¿ pendulum = ROM(training, time, param, greedy_tol=1e14, poly_deg=5)
Once the model for the pendulum is created the next step is to build/evaluate the surrogate. For that, we just call it.
[breaklines, frame=lines, framesep=2mm,]pycon # define the parameter ‘par‘ for surrogate evaluation ¿¿¿ par = 2. # evalute ¿¿¿ pendulum.surrogate(par) array([1.57079633, 1.5683046 , 1.56086267, …, 0.00993474, 0.00975876, 0.009536])
In order to test the surrogate’s accuracy, we build a test set composed of solutions ( denser than training set) for the same parameter and physical domains. It means the test set contains the training set plus several solutions not used in the training stage.
We use the norm to compute the relative error between surrogate and ground truth,
(9) 
where
This metric allows us to quantify how well the surrogate globally matches the ground truth model at some parameter value. Furthermore, we compute these errors not only for the surrogate model based on training parameters but for a set of surrogates built upon different discretizations, starting from very sparse ones until reaching the discretization of training parameters. With the integration tools available in Arby, these computations become straightforward.
In Fig. 4 we built a colormap of errors for different discretizations and parameter values. Naturally, the biggest errors correspond to very sparse training sets and fall below for discretizations . For a specific model (an horizontal line in the colormap), brightdark patterns describe the behavior of the model when it alternates between insample and outofsample parameter evaluations. The lowest errors usually correspond to insample evaluations, where splines become exact. The largest ones usually correspond to outofsample evaluations, where the errors due to parametric fits become relevant.
For the surrogate trained with solutions, in Fig. 5 (top panel) we show the function curves for both, surrogate and ground truth models, for a parameter value corresponding to the worst global error. The surrogate evaluation at this parameter is a prediction, i.e. the associated parameter do not correspond to a training one. Since both surrogate and ground truth models are indistinguishable at eyeball resolution, we plot the absolute value for both functions in logarithmic scale to locate the dissimilarity sectors between curves better. We conclude that even the worstcase scenario shows almost no difference between surrogate and ground truth solution. The bottom panel of Fig. 5 shows the absolute value of the pointwise difference between surrogate and test solutions. We removed from the test set those points which correspond to training parameters in order to focus only on generalization errors of the surrogate. We see that pointwise errors jump up at most to whereas the bulk remains close to .
6 Cosmic Microwave Background Anisotropies: a multidimensional case
A valuable application of surrogate models could stem from its application to modeling CMB temperature and polarization anisotropies power spectra as measured by satellites like Planck (Planck Collaboration et al., 2020). Such power spectra have a strong dependence on the underlying cosmological model, defined by a set of cosmological parameters , here taken to be dimensional.
Using the CAMB (Code for Anisotropies in the Microwave Background)) for each cosmological model (Lewis et al., 2000), we generated their corresponding observed temperature anisotropies power spectra. We randomly sampled the cosmological parameter space using points; as we will see, this is sufficiently dense. The independent or physical variable here results to be the angular multipole index , which we sampled using discrete points .
We compute a reduced basis using Arby. For a greedy tolerance of we obtain a set of greedy parameters identifying those elements in the training set that conform the reduced basis, allowing us to describe any power spectra in our sample as linear combinations of them. In particular, any training set composed of power spectra is equivalent to a set of just 84 reduced functions. Thus, for example, in Fig. 6 we show the distribution of the cosmological parameters for a training set of
CMB power spectra as blue points and the corresponding selected set of reduced basis as orange points. This methodology could accelerate the estimation of the cosmological parameters from CMB anisotropies by using a surrogate model built using Machine Learning algorithms on the reduced basis set. The utilization of this approach to estimate cosmological parameters will be the subject of a forthcoming publication.
We report the convergence of the reduced basis number as a function of the training set size in Fig. 7. This plot shows that the reduced basis number stabilizes close to , meaning that, for the specified greedy tolerance, the training set begins to saturate at . This proves that the full points dataset involves highly redundant information, for which only functions are enough to represent the entire set. With the reduced basis we reach a compression factor of just simply by computing projections of the training set.
7 Conclusion
We have introduced Arby, an open source Python package that provides a set of tools to generate and handle fast and highlyaccurate surrogate models in a nonintrusive way. Arby can be used to construct continuous models from sparse data composed by functions generated perhaps by differential equations or to explore redundancies in data by dimensional reduction. The offlineonline architecture of Arby allows for fast deployments of predictive models that can approximate functions which otherwise can be expensive to compute.
We assessed the package with unit testing tools and ensured it satisfies proper software quality assurance, which improves in robustness and readability of code. We also perform benchmarks measuring computation times of reduced bases along different parameter combinations.
To date, for surrogate modeling Arby works on 1D domains for both spaces, the parametric and the physical, though it supports multidimensional parameter domains for reduced basis and empirical interpolation. In future releases we want to extend Arby to several dimensions for surrogate modeling and combine it with stateoftheart regression methods at the fitting stages. The curse of the dimensionality present in high dimensional problems like the CMB power spectra discussed in section 6 surely become a bottleneck, so parallelized scenarios are worth to be explored in the future.
The userfriendly API of Arby expands the usability of the code to virtually anyone looking for a continuous model out from discrete data, even with little or no knowledge of ROM methods.
Acknowledgements.
The authors would like to thank to their families and friends, and also IATE astronomers and Manuel Tiglio for useful comments and suggestions. This work was partially supported by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina). A.V., J.B.C and M.Ch. are supported by a fellowship from CONICET. This research has made use of the http://adsabs.harvard.edu/, Cornell University xxx.arxiv.org repository, adstex (https://github.com/yymao/adstex) and the Python programming language.References
 Abbott et al. (2016) Abbott, B. P., Abbott, R., Abbott, T. D., et al. 2016, Phys. Rev. Lett., 116, 241102
 Abbott et al. (2020) Abbott, B. P., Abbott, R., Abbott, T. D., et al. 2020, Classical and Quantum Gravity, 37, 055002
 Antil et al. (2018) Antil, H., Chen, D., & Field, S. E. 2018, Comput. Sci. Eng., 20, 10
 Ballarin et al. (2015) Ballarin, F., Sartori, A., & Rozza, G. 2015, ScienceOpen Posters
 Barrault et al. (2004) Barrault, M., Maday, Y., Nguyen, N. C., & Patera, A. T. 2004, Comptes Rendus Mathematique, 339, 667
 Blackman et al. (2015) Blackman, J., Field, S. E., Galley, C. R., et al. 2015, Phys. Rev. Lett., 115, 121102
 Blackman et al. (2017a) Blackman, J., Field, S. E., Scheel, M. A., et al. 2017a, Phys. Rev. D, 95, 104023
 Blackman et al. (2017b) Blackman, J., Field, S. E., Scheel, M. A., et al. 2017b, Phys. Rev. D, 96, 024058
 Blanchet (2006) Blanchet, L. 2006, Living Rev. Rel., 9, 4
 Boyaval et al. (2009) Boyaval, S., Bris, C. L., Maday, Y., Nguyen, N. C., & Patera, A. T. 2009, Computer Methods in Applied Mechanics and Engineering, 198, 3187
 Chaturantabut & Sorensen (2010) Chaturantabut, S. & Sorensen, D. 2010, SIAM J. Scientific Computing, 32, 2737
 Cutler & Flanagan (1994) Cutler, C. & Flanagan, E. E. 1994, Phys. Rev. D, 49, 2658
 Damour & Nagar (2011) Damour, T. & Nagar, A. 2011, Fundam. Theor. Phys., 162, 211
 Field et al. (2011) Field, S. E., Galley, C. R., Herrmann, F., et al. 2011, Phys. Rev. Lett., 106, 221102
 Field et al. (2014) Field, S. E., Galley, C. R., Hesthaven, J. S., Kaye, J., & Tiglio, M. 2014, Phys. Rev. X, 4, 031006
 Field et al. (2012) Field, S. E., Galley, C. R., & Ochsner, E. 2012, Phys. Rev., D86, 084046
 Hannam et al. (2014) Hannam, M., Schmidt, P., Bohé, A., et al. 2014, Phys. Rev. Lett., 113, 151101
 Hesthaven et al. (2015) Hesthaven, J. S., Rozza, G., & Stamm, B. 2015, Certified Reduced Basis Methods for Parametrized Partial Differential Equations, 1st edn., Springer Briefs in Mathematics (Switzerland: Springer), 135
 Hoffmann (1989) Hoffmann, W. 1989, Computing, 41, 335
 Initiative et al. (2006) Initiative, O. S. et al. 2006, 2015b.[Online]. Available: https://opensource. org/licenses/MIT.[Accessed 27 March 2017]
 Jazayeri (2007) Jazayeri, M. 2007, in 2007 Future of Software Engineering, FOSE ’07 (USA: IEEE Computer Society), 199–213
 J.H. Ahlberg & Walsh (1967) J.H. Ahlberg, E. N. & Walsh, J. L. 1967, The theory of splines and their applications (Academic Press)
 Khan et al. (2019) Khan, S., Chatziioannou, K., Hannam, M., & Ohme, F. 2019, Phys. Rev. D, 100, 024059
 Lehner & Pretorius (2014) Lehner, L. & Pretorius, F. 2014, Ann. Rev. Astron. Astrophys., 52, 661
 Lewis et al. (2000) Lewis, A., Challinor, A., & Lasenby, A. 2000, ApJ, 538, 473
 Logg et al. (2011) Logg, A., Wells, G., & Mardal, K.A. 2011, Automated solution of differential equations by the finite element method. The FEniCS book, Vol. 84
 Maday et al. (2009) Maday, Y., Nguyen, N. C., Patera, A. T., & Pau, S. H. 2009, Communications on Pure and Applied Analysis, 8, 383
 Milk et al. (2016) Milk, R., Rave, S., & Schindler, F. 2016, SIAM Journal on Scientific Computing, 38, S194
 Miller & Maloney (1963) Miller, J. C. & Maloney, C. J. 1963, Commun. ACM, 6, 58
 Okken (2017) Okken, B. 2017, Python testing with Pytest: simple, rapid, effective, and scalable (Pragmatic Bookshelf)
 Planck Collaboration et al. (2020) Planck Collaboration, Aghanim, N., Akrami, Y., et al. 2020, A&A, 641, A6
 Quarteroni et al. (2015) Quarteroni, A., Manzoni, A., & Negri, F. 2015, Reduced Basis Methods for Partial Differential Equations: An Introduction, UNITEXT (Springer International Publishing)
 Rifat et al. (2020) Rifat, N. E. M., Field, S. E., Khanna, G., & Varma, V. 2020, Phys. Rev. D, 101, 081502
 Sturani et al. (2010) Sturani, R., Fischetti, S., Cadonati, L., et al. 2010, Journal of Physics: Conference Series, 243, 012007
 Tiglio & Villanueva (2020) Tiglio, M. & Villanueva, A. 2020
 Tiglio & Villanueva (2021) Tiglio, M. & Villanueva, A. 2021, Reduced Order and Surrogate Models for Gravitational Waves
 Van Rossum et al. (2001) Van Rossum, G., Warsaw, B., & Coghlan, N. 2001, Python. org, 1565
 Varma et al. (2019a) Varma, V., Field, S. E., Scheel, M. A., et al. 2019a, Phys. Rev. Research., 1, 033015
 Varma et al. (2019b) Varma, V., Field, S. E., Scheel, M. A., et al. 2019b, Phys. Rev. D, 99, 064045
 Villanueva et al. (2021) Villanueva, A., Beroiz, M., Cabral, J., Chalela, M., & Dominguez, M. 2021, Benchmark dataset for arby
 Virtanen et al. (2020) Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature methods, 17, 261
 Walt et al. (2011) Walt, S., Colbert, S. C., & Varoquaux, G. 2011, Computing in Science and Engineering, 13, 22
Appendix A Build reduced bases
By defining

the training set ,

the projection error at parameter
where is the projector operator associated to a sized basis,

: the greedy tolerance,

GS: Orthonormalizes against a basis through a GramSchmidt procedure,

gp: the greedy points, and

rb: the reduced basis,
the Reduced Basisgreedy algorithm proceeds as follows:
Appendix B Build Empirical Interpolants
Appendix C On computing projection coefficients
The most relevant step in terms of computational cost in the RB greedy algorithm is the computation of projection coefficients, that is, step of Alg. 2. Taking full advantage of the reduced basis orthonormality
we can write projection errors as
(10) 
where are the projection coefficients. Note that
We omitted the label for simplicity. This allows for constant computational cost in the addition of a new element to the basis, since one only need to compute the projection coefficients for the basis element while storing those corresponding to the previous basis.
In practice, the orthonormalization of the basis is not perfect and carries some error due to machine precision . Therefore, inner products between basis elements write as
and this error propagates through projection error as
Then, naive implementations of this rule (saving projection coefficients to compute the next projection error) can lead to undesired error amplifications whenever and therefore wrong estimations of projection errors.
This is avoided simply by normalizing the training set. By doing so, we ensure and keep controlled all orthonormalization errors.
Arby allows for normalizing options in the Pythonreduced_basis function. Depending on whether the training set is normalized or not, the computation of the greedy projection errors is done differently in each case, guaranteeing the conditioning of the algorithm at all resolutions within machine precision.
Comments
There are no comments yet.