Estimating the values of parameters of mathematical models of dynamical systems is often formulated as an optimization task with a computationally expensive objective function. Given measurements of the behavior of a dynamical system, the task is to find values of the model parameters that lead to a model simulation that closely fits the measurements. A computationally expensive simulation of the model is needed to assess the discrepancy between the simulated and the measured behavior of the observed system. Therefore, optimization approaches to parameter estimation can highly benefit from surrogate-based optimization, which can in turn substantially improve the efficiency of mathematical modeling.
Surrogate-based optimization solves optimization problems in situations where the resources for evaluating the objective function are limited. In computational domains, the most common limiting resource is computation time, which becomes critical when dealing with computationally expensive objective functions. The fundamental idea of surrogate-based optimization is to replace the computationally expensive objective function with a surrogate, i.e., a computationally efficient model that approximates the value of the true objective function. To this end, surrogate-based methods employ machine learning algorithms for learning the surrogate model from training instances based on the available evaluations of the true objective function.
Surrogate-based optimization can be deployed in two different application contexts. The first assumes a very limited number of evaluations of the true objective function. Here, the aim of the surrogate model is to guide the selection of the most promising candidate solutions for evaluation. The Bayesian optimization approach uses the surrogate model predictions and the corresponding confidences to select the next candidate solution to be evaluated with the true objective function. The computational complexity of this selection process grows with the cube of the number of previously evaluated candidate solutions.
The use of Bayesian optimization is thus prohibitive in the second application context, which assumes a large number of evaluations of the true objective function. The aim of the surrogate model, in this context, is to improve the efficiency of the optimization by replacing a large portion of the evaluations of the true objective function with evaluations of its surrogate. The parameter estimation task, addressed in this paper, fits this application context. The key component of the approaches applicable in this context is the substitution strategy that, for a given candidate solution, decides whether to evaluate it with the surrogate function or the true objective function. However, current approaches focus on maximizing the predictive performance of the surrogate model and use a fixed, hard-coded substitution strategy [4, 5, 6].
In this paper, we design a meta-model framework for surrogate-based optimization with a substitution strategy that dynamically adapts to the space of evaluated candidate solutions. It includes two learning components: a component for learning a surrogate model of the true objective function and a component for learning a model of the substitution strategy. Additionally, the meta-model framework encapsulates the objective function, the surrogate model, the model of the substitution strategy, the history of evaluations and the learning components in a single, yet modular entity. An important consequence of this encapsulation is that the meta model can be seamlessly coupled with an arbitrary optimization algorithm without any modification of either the algorithm or the meta model. The latter replaces the true objective function and autonomously decides which function or model to use for the evaluation of a given candidate solution.
In our previous study, we showed that learning the substitution strategy improves the overall performance of surrogate-based optimization. By learning the substitution strategy, instead of using a predefined one, the meta model is capable of solving complex numerical optimization problems while significantly reducing the number of evaluations of the true objective function. In this paper, we focus on the configuration of the learning components of the meta model. In particular, we conjecture that the selection of appropriate learning algorithms for the surrogate and substitution-strategy models significantly impacts the overall performance of surrogate-based optimization.
To test the validity of this conjecture, we perform an extensive empirical evaluation of different instances of the meta-model framework, each corresponding to a pair of learning algorithms: one for training the surrogate model and one for training the substitution-strategy model. We select among six algorithms for learning predictive models previously used in the literature on surrogate-based optimization—linear regression, decision trees, nearest neighbors, support vector machines, Gaussian processes and random forests—leading to 36 meta-model instances. In the first series of experiments, performed on synthetic benchmarks, we tune the parameters of each meta-model instance. In turn, we select the most successful instances, i.e., those that significantly outperform the others. The selected meta-model variants are evaluated in a second series of experiments on three real-world tasks of estimating the parameters of models of dynamical systems described by systems of coupled ordinary differential equations [1, 9].
We first describe in more detail the task of numerical optimization and parameter estimation. Next, we introduce and formally define the meta model, its components and parameters. We then lay out the setup for the empirical evaluation and report the results of our analyses. Finally, we provide a summary of conclusions and outline directions for further research.
Numerical optimization and parameter estimation
We consider the task of numerical optimization involving a single, nonlinear objective function $f: \mathbb{R}^n \to \mathbb{R}$ in an unconstrained, continuous space. The task is to find a solution that leads to an extremum of the objective function. The objective function can be either minimized or maximized; in the former case, the result of optimization is $x^* = \arg\min_{x \in \mathbb{R}^n} f(x)$.
If an analytic solution for the minimum of $f$ is intractable, numerical methods are applied. These can be clustered into two groups: local and global optimization methods. While local methods are efficient, they suffer from myopia, i.e., the tendency to end up in a local extreme point in the neighborhood of the initial point. Global methods, on the other hand, are concerned with finding the global optimum using different strategies for sampling the solution space. To improve their efficiency, they are often coupled with surrogates.
Parameter estimation aims at finding values of the parameters of a given model of a dynamical system that result in a model simulation that closely fits a given set of measurements of the observed system behavior. Models of dynamical systems are usually formalized as systems of coupled ordinary differential equations $\dot{x} = g(x, \theta)$, where $x$ denotes the vector of the observed state variables of the dynamical system, $\dot{x}$ is the vector of the time derivatives of the state variables, $g$ is the function representing the model structure, and $\theta$ denotes the vector of the real-valued constant parameters of the model. Given an initial condition $x(t_0)$, i.e., the value of $x$ at the initial time point $t_0$, the model simulation leads to a set of trajectories of the dynamical change of the state variables through time. An analytic solution of a system of coupled ordinary differential equations is rarely an option, so computationally expensive numerical approximation methods for integration (simulation) are typically applied.
The task of parameter estimation (Figure 1(A)) can be formalized as follows. Given the measured behavior $y(t_i)$ of the system variables at time points $t_1, \dots, t_k$, the task is to maximize the likelihood of the observed behavior given a particular value of $\theta$, i.e., $\theta^* = \arg\max_{\theta} L(y \mid \theta)$, where $L$ is a likelihood function. In practice, due to the complexity of the models, the likelihood-based function is approximated by a least-squares objective $f(\theta) = \sum_{i=1}^{k} \lVert \hat{y}(t_i) - y(t_i) \rVert^2$, where $\hat{y}(t_i)$ denotes the simulated behavior of the system variables at the time points $t_i$. Recall, however, that $\hat{y}$ is obtained using a computationally intensive method for integrating differential equations, often leading to inefficient optimization and poor optima.
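To make the least-squares formulation concrete, the following sketch evaluates such an objective for a hypothetical one-variable logistic-growth model. The model, its parameter values and the forward-Euler integrator are illustrative stand-ins for the far more expensive simulations considered in this paper.

```python
# Illustrative least-squares parameter-estimation objective; the logistic
# model and all constants are assumptions, not a system from the paper.
def simulate(r, K, x0=0.1, dt=0.01, n_steps=1000, record_every=50):
    """Forward-Euler integration of dx/dt = r * x * (1 - x/K)."""
    xs, x = [], x0
    for step in range(n_steps + 1):
        if step % record_every == 0:
            xs.append(x)          # record the trajectory at measurement times
        x += dt * r * x * (1.0 - x / K)
    return xs

# Synthetic "measurements" produced with known parameter values.
y_obs = simulate(r=0.8, K=5.0)

def objective(theta):
    """Sum of squared errors between simulated and measured trajectories."""
    y_sim = simulate(r=theta[0], K=theta[1])
    return sum((a - b) ** 2 for a, b in zip(y_sim, y_obs))

assert objective((0.8, 5.0)) == 0.0   # true parameters reproduce the data
assert objective((0.4, 5.0)) > 0.0    # wrong parameters incur error
```

Each call to `objective` requires a full numerical integration, which is exactly why surrogate evaluations pay off in this setting.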
This is especially true in the process of discovering knowledge about the complex behavior and function of biological systems, which often involves mathematical modeling of dynamical systems from observation data, with parameter estimation as a key aspect. Regarding the choice of a parameter-estimation method for problems from the domain of systems biology, global stochastic and hybrid methods based on metaheuristics are considered the most promising in the literature [13, 14]. These methods require a large number of objective-function evaluations, which makes them ideal candidates for applying surrogate-based optimization.
Meta-model framework for surrogate-based optimization
We first introduce the mathematical meta-model framework for surrogate-based optimization in its abstract form. This part is accompanied by a graphical overview of the framework depicted in Figure 1(B). We then gradually proceed by specifying the machine learning components of the framework. Finally, we introduce a relevance-based surrogate management strategy that allows the meta model to autonomously decide which function or model to use for evaluating a given candidate solution.
Our meta-model framework consists of four components. The first is the objective function $f$ that is the subject of optimization. The second component is the surrogate model that, for a given candidate solution (or query point) $x$ and based on the history of previous meta-model evaluations, computes $\hat{f}(x)$, the surrogate approximation of $f(x)$. The third component is the decision function that implements the dynamic substitution strategy: for a given $x$ and based on the evaluation history, it decides whether to use the objective function (output 1) or its surrogate (output 0). Finally, the fourth component of the meta-model framework is the history of evaluations $H$, where the data on past meta-model evaluations is kept. The evaluation history is a finite sequence of vectors in $\mathbb{R}^{n+m}$, where $n$ is the dimension of the optimization problem and $m$ the dimension of the additional data kept in the history.
More formally, the meta-model function $\bar{f}$ is defined by three functions and a history of evaluations, corresponding to the components of the meta model:
the objective function $f: \mathbb{R}^n \to \mathbb{R}$,
the surrogate function $\hat{f}: \mathbb{R}^n \times \mathcal{H} \to \mathbb{R}$,
the decision function $d: \mathbb{R}^n \times \mathcal{H} \to \{0, 1\}$,
the evaluation history $H \in \mathcal{H}$, a finite sequence of vectors in $\mathbb{R}^{n+m}$.
Given these components of the meta-model framework, the meta-model function is defined as
$$\bar{f}(x) = \begin{cases} f(x), & \text{if } d(x, H) = 1 \\ \hat{f}(x, H), & \text{otherwise.} \end{cases}$$
(Note that an alternative notation will be used whenever we want to emphasize the modification of the evaluation history.)
While these functions can be arbitrary black boxes, we assume that the evaluation history of the meta model is updated after every evaluation. For our current needs, the evaluation history records the query point $x$, the result of the meta model $\bar{f}(x)$ and the value of the decision function $d(x, H)$. If we denote the history after the $i$-th evaluation of the meta model at the point $x_i$ by $H_i$ and the next one by $H_{i+1}$, then $H_{i+1}$ is the extension of $H_i$ with the vector $(x_i, \bar{f}(x_i), d(x_i, H_i))$. The last two values are used to define the targets in the data sets for learning the predictive models in the surrogate and decision components.
Algorithm 1 presents the meta-model procedure for evaluating a given candidate solution $x$. First, the meta model checks whether there are enough evaluations of the true objective function (variable nEvalsF) in the evaluation history for training (or re-training) the surrogate and relevator models (see the subsection on the relevator below). To this end, the four parameters governing the training-set size and the rebuild interval are used; for further explanation, see the next subsection on the surrogate. Next, if the predictive models are not available yet, the meta model opts for evaluating the true objective function (the variable decision takes the value 1). Otherwise, based on the evaluation history and the desired substitution rate, it updates the decision threshold (see the subsection on the relevator) and uses it to decide whether to evaluate $x$ with the surrogate model or the true objective function. It stores the evaluation in the variable value, which is the result of the meta-model function. Finally, the meta model updates the counter nEvalsF that keeps track of the number of evaluations of the true objective function.
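Our reading of Algorithm 1 can be sketched as follows. The hand-rolled 1-nearest-neighbour surrogate and relevator, the fixed threshold, the training-set cap of 50 points and all parameter names are illustrative placeholders for the learned models and tuned parameters described in the following subsections.

```python
# Minimal sketch of the meta-model evaluation loop (Algorithm 1); not the
# authors' implementation. A 1-NN predictor stands in for the learned models.
class MetaModel:
    def __init__(self, f, n_init=10, rebuild_every=5, threshold=0.5):
        self.f = f                    # true (expensive) objective function
        self.history = []             # records of (x, value, used_true_f)
        self.n_evals_f = 0            # counter of true-objective evaluations
        self.n_init, self.rebuild_every = n_init, rebuild_every
        self.threshold = threshold    # decision threshold (kept fixed here)
        self.train = []               # filtered training set of (x, f(x))

    def _nn_value(self, x):           # 1-NN surrogate prediction
        return min(self.train,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

    def _relevance(self, x):          # fraction of training values worse than prediction
        y = self._nn_value(x)
        return sum(v > y for _, v in self.train) / len(self.train)

    def __call__(self, x):
        retrain = (self.n_evals_f >= self.n_init
                   and self.n_evals_f % self.rebuild_every == 0)
        if retrain or not self.train:
            # filter: keep only true-objective records, most recent first
            self.train = [(xi, v) for xi, v, true in self.history if true][-50:]
        if len(self.train) < self.n_init:
            decision = 1              # not enough data yet: use the true objective
        else:
            decision = 1 if self._relevance(x) >= self.threshold else 0
        value = self.f(x) if decision else self._nn_value(x)
        self.n_evals_f += decision
        self.history.append((x, value, decision))
        return value
```

Coupled with an optimizer, the instance simply replaces the objective: the optimizer calls `MetaModel.__call__` and never learns whether the value came from the surrogate or the true function.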
The surrogate function takes care of learning, updating and evaluating the surrogate predictive model. There are three important aspects to consider when constructing a good surrogate function: the type of the predictive model, the size of the training set used for its initialization, and the frequency of the model updates.
We aim at selecting a surrogate predictive model that closely approximates the objective function and can be evaluated efficiently. Because we do not wish to sample additional data for training the surrogate, we rely only upon the data stored in the evaluation history. This has the further benefit, when working with population-based methods, that the samples are concentrated in the current area of the population, where we want the prediction model to be most accurate. Moreover, the efficiency of the surrogate function depends on the trade-off between the frequency of surrogate learning and the size of the training set. A high update frequency is desirable, since the surrogate then always takes into account the most recent history of evaluations. On the other hand, frequent surrogate updates are unprofitable unless the learning time is fairly low compared to the evaluation time of the true objective function. To this end, we introduce user-defined parameters that determine the number of true objective-function evaluations between consecutive surrogate updates (see Algorithm 1).
A larger training set substantially slows down the algorithm by increasing the time needed to learn the surrogate model, which in turn negatively impacts the time performance of the meta model. We address this problem by filtering the history of evaluations. To minimize the error of the prediction model, only evaluations of the true objective function are used; these are easily extracted from the evaluation history because we additionally keep the values of the decision function (the records with a decision value of 1 are extracted). Furthermore, when using the meta model in conjunction with a population-based method, the population slowly converges towards the minimum of the true objective function. We want the surrogate to be more precise in the current area of the search, which coincides with the most recent evaluation points. By focusing only on the most recent evaluations, we additionally emphasize points with a lower value of the objective, meaning less noise from high-value outliers, which are less relevant for learning the surrogate since they correspond to non-optimal points. Therefore, in our implementation, the training set includes a user-defined number of the most recent points from the history of evaluations (see Algorithm 1). Additional filtration schemes are possible, for instance, considering only the points with the lowest values of the objective function, or a combination of the two. While all such schemes can be subsumed by adding a weight function, that would introduce a large number of additional parameters.
Selecting a suitable decision function is of vital importance. As shown in our previous work, a simple uninformed decision function, which uses the step number as its only argument, performs poorly on harder optimization problems. One possible way of constructing a more successful decision function is to predict how relevant a point will be for the optimization algorithm. This is based on the idea that points of high “relevance” for the optimization algorithm need to be predicted more accurately in order not to slow down progress. Thus, the evaluation of the most relevant points should be performed using the true objective function, while less relevant points can be evaluated using the surrogate model.
Basing the decision on the relevance of a point brings up two issues: first, how to formally define point relevance and, second, how to estimate the relevance of a point before evaluating the objective function. The point relevance can be calculated with a function passed to the meta model as an argument. However, in our current implementation, we define the relevance of a point relative to the lowest value of the objective function seen so far in the evaluation history: the closer the value is to the lowest value, the higher its relevance. The relevance of a point is bound to the interval $[0, 1]$, where a value of 0 corresponds to a point of low relevance and 1 to a point of high relevance, and points close to the current average value are mapped close to 0.5.
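The definition above fixes the properties of the relevance mapping (bounded to $[0, 1]$, highest near the incumbent minimum, roughly 0.5 near the average). One concrete mapping with these properties, used here purely as an illustration and not necessarily the exact formula of the framework, is the fraction of history values that are worse than the point's value:

```python
# Illustrative relevance mapping (an assumption): the fraction of previously
# seen objective values that are worse than the point's value.
def relevance(value, history_values):
    return sum(v > value for v in history_values) / len(history_values)

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
assert relevance(min(ys) - 1, ys) == 1.0  # near the best value: high relevance
assert relevance(max(ys) + 1, ys) == 0.0  # far above the worst: low relevance
assert relevance(3.0, ys) == 0.4          # near the average: close to 0.5
```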
Having defined the point relevance, we have to resolve the remaining issue of estimating it without evaluating the objective function. We could use the surrogate prediction model to approximate the objective value and then calculate the relevance by simply replacing the true value with the prediction in the definition above. Instead, we decided to learn a separate model for predicting the point relevance. This not only gives us a much wider array of possible meta models, but also allows us to dynamically adapt the relevance prediction model to the optimization task at hand. We refer to this model as the relevator.
To reduce the number of framework parameters, when constructing the training set for the relevator, we reuse the same filtering scheme as for the surrogate. Thus, to learn the relevance function, we construct the vector of target relevance values from the evaluation history, but only include values of the objective function present in our filtered training set. Using the whole history of evaluations tends to inflate the average objective value, making it difficult to distinguish relevant points. This problem is reduced by only using the recent values present in our filtered training set.
As with the surrogate function, a call of the relevator not only predicts the relevance of a point, with the target values taken from the filtered history, but also learns and updates the relevance prediction model whenever needed. To reduce the number of parameters, we reuse the surrogate parameters for the update frequency.
In addition to the relevator, the decision function includes a decision threshold that distinguishes the points with high relevance, which should be evaluated using the true objective, from the points with low relevance, which should be evaluated with the surrogate. To allow for a dynamic change of the threshold value, we define it as a function of the evaluation history $H$.
Thus, the decision function of a relevator meta model is the indicator function $d(x, H) = \mathbb{1}[\hat{r}(x, H) \geq \tau(H)]$, where $\hat{r}(x, H)$ is the relevance estimate of point $x$ given the history of evaluations and $\tau(H)$ is a dynamical relevance-threshold function.
We implement the dynamical relevance threshold using an iterative update procedure whose goal is to control and locally adjust the frequency of surrogate evaluations. By considering a user-defined number of the most recent evaluations, we can either raise or lower the threshold after every meta-model evaluation in order to increase or decrease the frequency of surrogate evaluations and (locally) achieve the desired substitution rate.
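One way to realize this update rule is sketched below; the step size, the clipping to $[0, 1]$ and the encoding of recent decisions are assumptions, since the text specifies only the direction of the adjustment.

```python
# Sketch of the iterative threshold update: nudge the threshold so that the
# recent surrogate-use rate tracks the desired substitution rate.
def update_threshold(tau, recent_decisions, target_rate, step=0.01):
    """recent_decisions: 1 = true objective was used, 0 = surrogate was used."""
    surrogate_rate = 1.0 - sum(recent_decisions) / len(recent_decisions)
    if surrogate_rate < target_rate:
        # Raising the threshold makes fewer points qualify as "highly
        # relevant", diverting more evaluations to the surrogate.
        return min(1.0, tau + step)
    # Lowering the threshold sends more points to the true objective.
    return max(0.0, tau - step)

assert abs(update_threshold(0.5, [1, 1, 1, 1], target_rate=0.8) - 0.51) < 1e-12
assert abs(update_threshold(0.5, [0, 0, 0, 0], target_rate=0.8) - 0.49) < 1e-12
```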
In this section, we present the setup and the results of two series of experiments with the proposed meta-model framework. In the first series, we use 45 standard benchmark problems for numerical optimization to tune the parameters and evaluate the performance of the framework with different meta-model instances. Based on the comparison of their performance, we identify the machine learning methods that lead to suitable surrogate and substitution-strategy (relevator) models. The most successful among them are evaluated in a second series of experiments on three tasks of estimating the parameters of real-world models of dynamical systems from the domain of systems biology.
Meta-model tuning and selection
The construction of both the surrogate and the relevator functions for the meta model can be readily framed as a regression problem. We considered combinations of six different learning methods: linear regression (LINEAR), regression trees with variance reduction and reduced-error pruning (TREE), k-nearest neighbors with k = 5 (KNN), Gaussian processes with a squared-exponential covariance function (GP), support vector machines with an RBF kernel (SVM) and Random Forest with 100 trees (RF). We used the default parameters from the Weka implementation of each method.
For each of the 36 surrogate-relevator combinations, we tuned the five parameters of the meta model using COCO, a platform for comparing numerical optimization methods in a black-box setting. The parameters were tuned using a grid search over the values shown in Table 1. Two of the parameters are used to calculate the training-set size and two are used to calculate the interval for rebuilding the surrogate; both quantities are computed relative to the dimension $D$ of the problem. The last parameter is the desired rate of substitution of the objective function with the surrogate model.
Table 1: Grid of parameter values used for tuning.
training-set size parameters: |0, 10, 25, 50| and |100, 200, 500|
surrogate rebuild-interval parameters: |0, 10, 25, 50| and |50, 100, 200|
desired substitution rate: |0.3, 0.4, 0.5, 0.6, 0.7, 0.8|
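The grid above yields 4 · 3 · 4 · 3 · 6 candidate configurations per surrogate-relevator pair, which the grid search enumerates exhaustively. The parameter names in the sketch below are illustrative, not the symbols used in Table 1.

```python
from itertools import product

# Enumerate the Table 1 grid; the key names are illustrative placeholders.
grid = {
    "train_size_param_1": [0, 10, 25, 50],
    "train_size_param_2": [100, 200, 500],
    "rebuild_param_1": [0, 10, 25, 50],
    "rebuild_param_2": [50, 100, 200],
    "substitution_rate": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
assert len(configs) == 864  # 4 * 3 * 4 * 3 * 6 grid points per pair
```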
The widely used COCO platform contains a set of black-box optimization functions that are used as benchmark problems for numerical optimization. From this set, we selected 15 functions from three classes of problems that resemble real-world parameter-estimation tasks: uni-modal functions with high conditioning, multi-modal functions with adequate global structure and multi-modal functions with weak global structure. Within each class, the functions differ in their levels of deceptiveness, ill-conditioning, regularity, separability and symmetry. Each optimization function can be generalized to a different number of dimensions; we consider instances of each function in 5, 10 and 20 dimensions, for a total of 45 benchmark optimization functions. For each surrogate-relevator pair, we select the set of parameters that maximizes the improvement in performance relative to a baseline optimization method without surrogates.
Regarding the choice of a parameter-estimation method, global stochastic and hybrid methods based on metaheuristics are considered the most promising in the literature [13, 14]. Among the many different metaheuristic methods, evolution strategies and differential evolution have been identified as the most successful [16, 17] in the intended domain of application of this work, systems biology. We use Differential Evolution (DE) as the baseline optimization method. It is a staple evolutionary, population-based method that has consistently shown robust and reliable performance on various problems in many domains.
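For readers unfamiliar with DE, a minimal DE/rand/1/bin variant can be sketched as follows. The control parameters (population size, F, CR) are common textbook defaults, not the tuned settings used in our experiments.

```python
import random

# Minimal DE/rand/1/bin sketch; illustrative defaults, not the paper's setup.
def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, n_gen=100, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(n_gen):
        for i in range(pop_size):
            # Mutation: combine three distinct population members (rand/1).
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensure at least one mutated coordinate
            trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                     if (rng.random() < CR or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            # Clip to the search bounds, then apply greedy selection.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            y = f(trial)
            if y <= fit[i]:
                pop[i], fit[i] = trial, y
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Because DE only ever calls `f` on candidate solutions, swapping in the meta model in place of `f` requires no change to the algorithm itself.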
For each of the 45 functions, we establish the baseline performance of DE without the meta model by running it with a fixed budget of evaluations. The performance improvement of a meta model on a single benchmark function is then calculated as the relative reduction $PI = (N_{DE} - N_{MM}) / N_{DE}$, where $N_{DE}$ represents the number of evaluations needed to reach the minimum value without surrogates, while $N_{MM}$ is the number of true-function evaluations needed to reach the same minimum using the meta model. The overall performance improvement of a meta model is then calculated as the product of the average performance improvement over all 45 functions and the proportion of functions with $PI > 0$.
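A sketch of this computation follows; the aggregation as a product of the average improvement and the success proportion is our reading of the definition above and should be treated as an assumption.

```python
# Per-function performance improvement: relative reduction in the number of
# true-objective evaluations needed to reach the baseline minimum.
def performance_improvement(n_base, n_meta):
    return (n_base - n_meta) / n_base

# Overall score (assumed reading): average improvement scaled by the
# fraction of functions on which the meta model actually improved.
def overall_improvement(eval_counts):
    pis = [performance_improvement(b, m) for b, m in eval_counts]
    success = sum(pi > 0 for pi in pis) / len(pis)
    return sum(pis) / len(pis) * success

assert performance_improvement(1000, 230) == 0.77
assert overall_improvement([(100, 50), (100, 100)]) == 0.125
```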
For each surrogate-relevator pair, we selected the parameters that resulted in the best performance improvement. While we cannot draw any firm conclusions about the best size of the training set and the rebuild interval, the results show the potential of the meta model to achieve a very high rate of replacement of evaluations of the true objective function with evaluations of the surrogate. For the two training-set parameters, all possible values were selected equally often in the best configurations, while the lowest values were most frequently selected for the two rebuild-interval parameters (17/36 and 14/36). Most importantly, the highest possible value of 0.8 for the substitution-rate parameter was selected in 24 out of the 36 best configurations.
To analyze the impact of the selection of the algorithms for learning the surrogate and the relevator models, we performed Friedman's rank-sum test with a Nemenyi post-hoc analysis. To perform the tests, we grouped the methods in two ways: first, we grouped the meta-model instances into 6 groups according to the algorithm used for learning the surrogate, and second, according to the algorithm used for learning the substitution-strategy model. For both groupings, the Friedman test evaluates the null hypothesis that all the groups of meta-model instances perform equally well. In both cases, the null hypothesis is rejected with a p-value well below the 0.05 significance level, leading to the conclusion that the selection of the algorithm for learning the surrogate or the relevator has a significant impact on the performance of the meta model. Furthermore, the post-hoc analysis uses the Nemenyi test to investigate the significance of the differences by calculating the critical distance between the average ranks of the groups: the performance of two groups differs significantly if the difference of their average ranks is larger than or equal to the critical distance. In both groupings, the critical distance at the significance level of 0.05 equals 0.4594.
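The mechanics of the Friedman statistic can be illustrated with a hand-rolled sketch; this is a didactic simplification (ties are ignored), not the test implementation used in our analysis.

```python
# Hand-rolled Friedman rank-sum statistic for k methods over N problems.
def friedman_statistic(scores):
    """scores[i][j]: score of method j on problem i (lower is better)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank          # ties ignored for simplicity
    avg_ranks = [s / n for s in rank_sums]
    chi2 = (12.0 * n / (k * (k + 1))) * sum(r * r for r in avg_ranks) \
           - 3.0 * n * (k + 1)
    return chi2, avg_ranks

# Synthetic example: method 2 is always worst over 45 "benchmark functions",
# so it gets the highest average rank and the statistic is clearly significant.
scores = [[0.1 * i, 0.1 * i + 0.01, 0.1 * i + 1.0] for i in range(45)]
chi2, avg_ranks = friedman_statistic(scores)
assert avg_ranks == [1.0, 2.0, 3.0]
assert chi2 > 5.99   # exceeds the chi-square (2 df) critical value at 0.05
```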
Figure 2 employs average-rank diagrams to summarize the results of the comparative analysis. The horizontal axis of an average-rank diagram corresponds to the rank of a group of meta-model instances; the top-ranked group is at the left-most position on the axis. The line above the axis, labeled CD, depicts the value of the critical distance. Groups of meta-model instances whose utilities are not significantly different are connected with thick lines below the axis.
The average-rank diagram in Figure 2(A) groups the meta-model instances according to the algorithm used to learn the surrogate. Considering the significance of the pair-wise differences between the groups, we can only exclude linear regression as an algorithm leading to meta-model instances with significantly inferior performance. The average-rank diagram in Figure 2(B) groups the meta-model instances with respect to the algorithm for learning the substitution-strategy model. Meta-model instances using random forests, nearest neighbors and support vector machines for learning the relevator significantly outperform the instances using the other three algorithms. The comparison of the two diagrams reveals that the meta model is more sensitive to the selection of the algorithm for learning the relevator than to the selection of the algorithm for learning the surrogate. This is an important insight: the choice of the model for the dynamic substitution strategy, adapted to the problem at hand, has a significant impact on the utility of surrogate-based optimization.
Estimating the parameters of dynamical biological systems
We are interested in the performance of the meta-model in the case of the real-world problem of estimating the parameters of models of dynamical biological systems. We selected three dynamical biological systems with varying degrees of complexity (shown in Figure 3). The three systems have been well studied in terms of their dynamical properties and identifiability [21, 22].
The first system is a synthetic oscillatory network of three protein-coding genes interacting in an inhibitory loop, known as the Repressilator. The system is modeled by six variables and four constant parameters. The time-series data for this problem are obtained by numerical integration of the system of ordinary differential equations, using the reported parameter values, at 30 integer time points. The objective function is the sum of squared errors between the simulated trajectories of the model and the available data.
The second system is a metabolic pathway representing a biological NAND gate. The model is represented by a set of five ODEs with 15 constant parameters. The observation data for the metabolic-pathway model consist of 12 sets of observations obtained by simulating the model using 12 different pairs of input step functions, sampled uniformly at 7 time points. The objective function is the negative log-likelihood of the simulated trajectories of the model given the observations, summed across all data sets.
The third system is a genetic network represented as a five-variable S-system model with 23 constant parameters. The observation data for the S-system model were obtained from the same source; they consist of 10 sets of observations obtained by simulating the model using 10 different sets of initial conditions for all variables, sampled uniformly at 11 time points. The objective function is a log-transformation of the negative log-likelihood function that preserves the ordering of candidate solutions. As for the previous system, the objective function is summed across all data sets.
To establish a baseline, we ran Differential Evolution (DE) without a meta model on each problem, with a budget of evaluations that scales with the dimensionality of the problem, i.e., the number of constant parameters in the system. For the meta model, we considered the 5 methods for learning a surrogate and the 3 methods for learning a relevator function that were shown to have statistically significant performance improvements over the other methods. In all cases, the parameter estimation was repeated 10 times with different random seeds.
Figure 4 shows the transposed convergence curves for the optimization runs (minimum of 10 restarts) without a meta model (DE) and with a meta model using different surrogate-relevator pairs. In general, for a given budget of evaluations of the true objective function, the optimization with a meta model achieves lower minima than the optimization without one on all problems.
More importantly, the meta model achieves a significant improvement in the speed of convergence. To compare the convergence behavior of the meta model to the baseline (DE) across all problems, we performed Page's trend test for ordered alternatives on 20 uniformly distributed cut points along the log-transformed convergence curves. We test the null hypothesis that the difference between two curves (minimum across 10 restarts) does not increase with the number of evaluations, i.e., that there is no difference in the speed of convergence. Table 2 shows the p-values obtained for the different surrogate-relevator pairs.
Page’s trend test indicates that, in terms of improvement in the speed of convergence, Random Forest is the superior choice for the relevator of a meta model, in combination with any choice of method for learning the surrogate function. The use of the k-nearest neighbors method for the relevator also results in more efficient convergence, but only with a more limited choice of methods for learning the surrogate function.
Both Random Forest and k-nearest neighbors are conservative estimators, in the sense that they are limited in their ability to extrapolate predictions for candidate solutions with feature values outside the space covered by the solutions in the training set. We conjecture that the optimization method exploits this property of the surrogate and the relevator to efficiently explore non-optimal parts of the objective space, which improves convergence. The same property also reduces the risk of error when evaluating the surrogate function in parts of the solution space with high potential for optimality.
We further analyze the performance improvement achieved by the surrogate-relevator pairs. Table 3 shows the performance improvements achieved by the meta-model with different surrogate-relevator pairs. The meta-model achieves a remarkable improvement, with an average reduction of up to 77% in the number of true function evaluations (RF-RF). On individual problems, the meta-model achieves up to 94% performance improvement (SVM-RF).
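One plausible way to quantify such a reduction (our own sketch; the paper's exact performance measure may differ in detail) is to compare the number of true-function evaluations each method needs to reach a common target value:

```python
import numpy as np

def evals_to_reach(trace, target):
    """Number of evaluations until the best-so-far value first reaches target."""
    best = np.minimum.accumulate(np.asarray(trace, dtype=float))
    hits = np.flatnonzero(best <= target)
    return int(hits[0]) + 1 if hits.size else None

def reduction(baseline, metamodel, target):
    """Relative reduction in true-function evaluations achieved by the meta-model."""
    return 1.0 - evals_to_reach(metamodel, target) / evals_to_reach(baseline, target)

baseline = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical DE trace
metamod = [9, 5, 3, 2, 1, 1, 1, 1, 1, 1]   # hypothetical meta-model trace
print(round(reduction(baseline, metamod, target=2), 2))  # 0.56 (= 1 - 4/9)
```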
As with the improvement in speed of convergence, the best-performing relevator function is Random Forest, closely followed by k-nearest neighbors. Notably, the best-performing surrogate function on average is a simple regression tree, closely followed by SVM.
Regarding the performance on individual problems, it is worth noting that the SVM relevator is unable to improve the optimization performance on the simplest problem, Repressilator. The improvement achieved with the SVM relevator on the other problems is on par with the other relevators. The Gaussian processes surrogate exhibits the same issue.
The best-performing surrogate-relevator pairs are heterogeneous. Learning the surrogate function and learning the relevator function are clearly independent tasks that call for different learning methods. Overall, given the results of the empirical evaluation and the computational time required by the different learning methods, we recommend a strong and robust learner, such as Random Forest, for the relevator function. The computational cost can then be reduced by choosing a simpler learner for the surrogate function without compromising performance.
In the literature on surrogate-based optimization, the substitution strategy is referred to as a surrogate management strategy. Figure 5 depicts the clustering of state-of-the-art surrogate-based optimization methods into two classes: wrapper (B) and embedded (C) methods. To get acquainted with the figure, consider first the simple case of a numerical optimization algorithm that does not use surrogates (A). In this setting, the optimization method interacts only with the true objective function, requesting numerous evaluations of candidate solutions. At the end, the method reports the optimal solution that minimizes (or maximizes) the value of the objective function.
Wrapper methods place the surrogate management strategy outside the optimization method. Following this approach, the wrapper first initializes the surrogate using a sample of candidate solutions and their respective objective evaluations. In consecutive iterations, the wrapper first runs the optimization method on the surrogate model, obtaining a candidate solution. Next, it evaluates that solution using the true objective function. Finally, the solution and its evaluation are added to the training set, and the surrogate model is updated (re-learned) before the next iteration. Note that wrapper methods use a fixed surrogate management strategy that is encoded in the wrapper. Recent developments of the wrapper approach include the methods for constrained numerical optimization COBRA and SACOBRA.
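The wrapper loop described above can be sketched as follows (a minimal illustration with a regression-tree surrogate and random search standing in for the wrapped optimizer; none of the names come from the cited implementations):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def wrapper_optimize(f, dim, bounds, optimizer, n_init=10, n_iter=20, seed=0):
    """Generic wrapper surrogate management loop (sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Initialize the surrogate from a random sample evaluated with the true f.
    X = rng.uniform(lo, hi, size=(n_init, dim))
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        surrogate = DecisionTreeRegressor().fit(X, y)  # (re-)learn the surrogate
        x_new = optimizer(lambda x: surrogate.predict([x])[0], bounds)
        y_new = f(x_new)                               # one true evaluation
        X, y = np.vstack([X, x_new]), np.append(y, y_new)
    best = int(np.argmin(y))
    return X[best], y[best]

def random_search(g, bounds, n=200, seed=1):
    """Stand-in for the wrapped optimization method."""
    rng = np.random.default_rng(seed)
    cand = rng.uniform(bounds[0], bounds[1], size=(n, 2))
    return cand[int(np.argmin([g(c) for c in cand]))]

x_best, f_best = wrapper_optimize(lambda x: float(np.sum(x ** 2)), 2, (-5, 5), random_search)
print(f_best)  # best value of the sphere function found with 30 true evaluations
```

Note that the wrapped optimizer only ever sees the surrogate; the true objective is called exactly once per iteration, from outside the optimizer, which is the inflexibility discussed below.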
Embedded methods rely on encoding the substitution strategy within the optimization method. Following this approach, the decision whether to use the surrogate or the true objective function is based on various artifacts of the algorithm. In particular, population-based evolutionary optimization methods use the surrogate model to evaluate the offspring candidates for the next generation of individuals, while the selection of the top candidates to actually be included in the next generation is based on evaluations of the true objective function. A simpler, generation-based management strategy evaluates the surrogate function in some generations and the true objective function in others. Following the embedded approach, numerous new variants of classic optimization methods in general and of Differential Evolution [28, 6] in particular have been developed.
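A generation-based strategy embedded in Differential Evolution can be sketched as follows (a toy DE/rand/1/bin variant of our own; the parameter values and the regression-tree surrogate are illustrative assumptions, not taken from the cited methods):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def de_generation_based(f, bounds, dim=2, pop_size=20, n_gen=30, true_every=5, seed=0):
    """DE with a generation-based surrogate strategy: the true objective f
    is used every `true_every`-th generation, the surrogate in between."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([f(x) for x in pop])
    X, y = pop.copy(), fit.copy()                    # archive of true evaluations
    for gen in range(1, n_gen + 1):
        use_true = gen % true_every == 0
        surrogate = DecisionTreeRegressor().fit(X, y)
        for i in range(pop_size):
            # DE/rand/1/bin mutation and crossover (base vectors may
            # coincide with i in this simplified sketch)
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            trial = np.clip(a + 0.8 * (b - c), lo, hi)
            trial = np.where(rng.random(dim) < 0.9, trial, pop[i])
            if use_true:
                ft = f(trial)
                X, y = np.vstack([X, trial]), np.append(y, ft)
            else:
                ft = surrogate.predict([trial])[0]   # cheap surrogate evaluation
            if ft < fit[i]:
                pop[i], fit[i] = trial, ft
        if use_true:                                 # re-sync fitness with f
            fit = np.array([f(x) for x in pop])
    best = int(np.argmin(fit))
    return pop[best], fit[best]

x_best, f_best = de_generation_based(lambda x: float(np.sum(x ** 2)), (-5, 5))
print(f_best)  # best sphere value found; only every 5th generation calls f
```

The decision `use_true` depends on the generation counter, an internal artifact of the algorithm, which is why this strategy cannot be bolted on from outside and requires modifying the optimizer itself.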
In sum, the comparison of the wrapper and embedded classes of methods, depicted in Figure 5, shows the following. Wrapper approaches are inflexible with respect to the substitution strategy: the wrapper forces the evaluation of the surrogate function within the wrapped optimization method, while the true objective function can only be evaluated from outside the method. Embedded approaches, on the other hand, are more flexible, but the decision function relies directly on the current state of the core optimization algorithm. Moreover, embedding requires re-implementation or modification of an existing implementation of the base optimization method.
The proposed meta-model framework for surrogate-based optimization combines the simplicity of the wrapper approaches with the flexibility of the embedded approaches. On one hand, like the wrapper methods, the meta-model can be coupled with any core optimization method, since the latter is used as a black box (see Figure 5(D)). In contrast to the other wrapper methods, the surrogate model and the substitution strategy are coupled together with the true objective function, independently of the optimization algorithm. On the other hand, similarly to the embedded methods, the substitution strategy of the meta-model is more flexible. While embedded approaches base the substitution decision on artifacts of the optimization algorithm, the meta-model substitution strategy dynamically adapts to the solution space of the optimization problem at hand.
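The idea of the meta-model as a black-box replacement for the true objective can be sketched as follows. The relevance rule below (evaluate with the true f only when the surrogate predicts a value close to the best one seen so far) is a deliberately simplified stand-in for the relevator; in the actual framework the decision function is learned:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class MetaModel:
    """Drop-in replacement for an expensive objective f (sketch)."""

    def __init__(self, f, dim=2, bounds=(-5, 5), n_init=20, threshold=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.f = f
        self.X = rng.uniform(bounds[0], bounds[1], (n_init, dim))
        self.y = np.array([f(x) for x in self.X])
        self.threshold = threshold
        self.true_calls = n_init
        self._refit()

    def _refit(self):
        self.surrogate = RandomForestRegressor(n_estimators=30, random_state=0)
        self.surrogate.fit(self.X, self.y)

    def __call__(self, x):
        pred = self.surrogate.predict([x])[0]
        # Simplified relevance decision: only promising candidates get a
        # true evaluation (assumes a minimization problem with f >= 0).
        if pred <= self.threshold * self.y.min():
            val = self.f(x)
            self.X = np.vstack([self.X, x])
            self.y = np.append(self.y, val)
            self.true_calls += 1
            self._refit()
            return val
        return pred                       # cheap surrogate answer otherwise

# Usage: the optimizer sees mm exactly as it would see f.
f = lambda x: float(np.sum(np.asarray(x) ** 2))
mm = MetaModel(f)
queries = [np.array([a, a]) for a in np.linspace(-4.0, 4.0, 50)]
vals = [mm(q) for q in queries]
print(mm.true_calls)  # true-function evaluations actually spent (at most 70)
```

Because the substitution happens inside the callable, any optimizer can be pointed at `mm` without modification, which is the simplicity-plus-flexibility combination described above.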
The main contribution of this paper is the novel meta-model framework for surrogate-based optimization. In contrast with the prevailing focus of existing surrogate-based optimization methods on learning accurate surrogate models, the proposed framework involves two learning components: one learns the surrogate model, while the other learns the decision function that decides when to substitute the true objective function with the surrogate model.
The results of the empirical evaluation of the meta-model framework confirm our initial hypothesis that the selection of appropriate surrogate and decision-function models can significantly influence the overall performance of surrogate-based optimization. Moreover, the results show that the meta-model performance is more sensitive to the selection of the decision-function model: while almost all learning algorithms (except linear regression) lead to useful surrogate models, only random forests, nearest neighbors, and support vector machines are appropriate in the role of decision-function models.
Another contribution of the paper is a novel surrogate-based approach to estimating the parameters of ordinary differential equations. On three parameter estimation tasks of different complexity, we showed that the use of a meta-model improves the efficiency of optimization. In particular, using the relevator meta-model for surrogate-based optimization significantly improves both the convergence rate and the final result of the optimization under a limited budget of evaluations of the true objective function.
The presented meta-model framework significantly contributes to the current machine learning literature by establishing a new paradigm for coupling optimization and machine learning methods. While most studies in the machine learning literature are currently based on the sequential model-based optimization (SMBO) paradigm, the proposed framework opens a whole new avenue of research, rich with opportunities for coupling surrogate models with other state-of-the-art optimization methods. The proposed framework is ready to be applied in the context of currently very active machine learning research on hyper-parameter optimization, algorithm configuration, and other meta-learning tasks.
The current conceptualization of the meta-model framework is limited to single-objective, unconstrained optimization. Its generalization towards multiple objective functions and/or constraints represents two possible directions for further research. Although we have applied the framework to numerical optimization only, it is general enough to also address combinatorial or mixed optimization problems. The evaluation presented in this paper is also limited to coupling the framework with the Differential Evolution method. While this is a typical representative of the more general class of population-based optimization methods, further experimental evaluation is necessary to establish the framework's generality with respect to the choice of the base optimization algorithm. Finally, further evaluation should include a comparative analysis of the performance of the meta-model framework relative to wrapper and embedded surrogate approaches.
The authors acknowledge the financial support of the Slovenian Research Agency (research core funding No. P2-0103, No. P5-0093 and project No. N2-0056 Machine Learning for Systems Sciences) and the Slovenian Ministry of Education, Science and Sport (agreement No. C3330-17-529021).
J.T. and L.T. initiated the work. Ž.L. implemented the meta-modeling framework. J.T. and L.T. conceived the experiments, J.T. and Ž.L. performed them. Ž.L., J.T. and L.T. analyzed the results. J.T. and L.T. drafted the manuscript, S.D. provided critical advice in different phases of manuscript revision. All authors reviewed and approved the final manuscript.
The source of the implementation of the meta-model framework for surrogate-based optimization, the models and the data used in the experiments are freely available at http://source.ijs.si/zluksic/metamodel.
Competing interests: The authors declare no competing interests.
-  Jaqaman, K. & Danuser, G. Linking data to models: data regression. Nature Reviews Molecular Cell Biology 7, 813–819 (2006).
-  Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 455–492 (1998).
-  Jin, Y. Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation 1, 61–70 (2011).
-  Bagheri, S., Konen, W., Emmerich, M. & Bäck, T. Solving the g-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control (2015). URL http://arxiv.org/abs/1512.09251.
-  Das, S., Mullick, S. S. & Suganthan, P. N. Recent advances in Differential Evolution - an updated survey. Swarm and Evolutionary Computation 27, 1–30 (2016).
-  Rammohan, M. & Minho, L. An evolving surrogate model-based differential evolution algorithm. Applied Soft Computing 34, 770–787 (2015).
-  Lukšič, Ž., Tanevski, J., Džeroski, S. & Todorovski, L. General meta-model framework for surrogate-based numerical optimization. In Yamamoto, A., Kida, T., Uno, T. & Kuboyama, T. (eds.) Discovery Science, 51–66 (Springer International Publishing, Cham, 2017).
-  Hansen, N., Auger, A., Mersmann, O., Tušar, T. & Brockhoff, D. COCO: A platform for comparing continuous optimizers in a black-box setting. arXiv 1603.08785 (2016).
-  Kirk, P., Silk, D. & Stumpf, M. P. H. Uncertainty in Biology: A Computational Modeling Approach, chap. Reverse Engineering Under Uncertainty, 15–32 (Springer, 2016).
-  Nocedal, J. & Wright, S. J. Numerical Optimization (Springer, 2006), 2nd edn.
-  Pintér, J. D. Global Optimization in Action: Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications (Springer, 1995).
-  Murray, J. D. Mathematical Biology (Springer, 1993), 2nd edn.
-  Ashyraliyev, M., Fomekong-Nanfack, Y., Kaandorp, J. A. & Blom, J. G. Systems biology: parameter estimation for biochemical models. FEBS Journal 276, 886–902 (2009).
-  Chou, I. & Voit, E. O. Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences 219, 57–83 (2009).
-  Witten, I. H., Frank, E., Hall, M. A. & Pal, C. J. Appendix B - The WEKA workbench. In Data Mining, 553–571 (Morgan Kaufmann, 2016), 4th edn.
-  Sun, J., Garibaldi, J. M. & Hodgman, C. Parameter estimation using metaheuristics in systems biology: A comprehensive review. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9, 185–202 (2012).
-  Tashkova, K., Korošec, P., Šilc, J., Todorovski, L. & Džeroski, S. Parameter estimation with bio-inspired meta-heuristic optimization: Modeling the dynamics of endocytosis. BMC Systems Biology 5, 1–26 (2011).
-  Storn, R. & Price, K. Differential Evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 341–359 (1997).
-  Chakraborty, U. K. Advances in Differential Evolution (Springer Publishing Company, Berlin, Heidelberg, 2008), 1st edn.
-  Demšar, J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006).
-  Buse, O., Pérez, R. & Kuznetsov, A. Dynamical properties of the repressilator model. Physical Review E 81, 066206 (2010).
-  Gennemark, P. & Wedelin, D. Efficient algorithms for ordinary differential equation model identification of biological systems. IET Systems Biology 1, 120–129 (2007).
-  Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338 (2000).
-  Arkin, A. & Ross, J. Statistical Construction of Chemical Reaction Mechanisms from Measured Time-Series. The Journal of Physical Chemistry 99, 970–979 (1995).
-  Kikuchi, S., Tominaga, D., Arita, M., Takahashi, K. & Tomita, M. Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics 19, 643–650 (2003).
-  Derrac, J., García, S., Hui, S., Suganthan, P. N. & Herrera, F. Analyzing convergence performance of evolutionary algorithms: A statistical approach. Information Sciences 289, 41–58 (2014).
-  Regis, R. G. Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points. Engineering Optimization 46, 218–243 (2013).
-  Su, G. Gaussian process assisted differential evolution algorithm for computationally expensive optimization problems. In Proceedings of the IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, 272–276 (2008).
-  Wistuba, M., Schilling, N. & Schmidt-Thieme, L. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning 107, 43–78 (2018).
-  Eggensperger, K., Lindauer, M., Hoos, H., Hutter, F. & Leyton-Brown, K. Efficient benchmarking of algorithm configurators via model-based surrogates. Machine Learning 107, 15–41 (2018).
-  Brazdil, P. & Giraud-Carrier, C. Metalearning and algorithm selection: progress, state of the art and introduction to the 2018 special issue. Machine Learning 107, 1–14 (2018).