Errata
The original article appeared in Proceedings of the 17th Conference on Information Technologies—Applications and Theory (ITAT 2017), Martinské hole, Slovakia, September 22–26, 2017, published in print by CreateSpace Independent Publishing Platform and online by CEUR Workshop Proceedings.
1 Introduction
The problem of optimizing real-valued functions without a known mathematical expression, arising in many engineering tasks, is referred to as continuous black-box optimization. Evolution strategies, a class of randomized population-based algorithms inspired by natural evolution, are a popular choice for continuous black-box optimization. In particular, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [6] has been considered the state-of-the-art continuous black-box optimizer for the past several decades. Since values of a black-box function can only be obtained empirically, and in practice at considerable cost, the number of function evaluations needed to reach a desired function value is a key criterion for evaluating black-box optimizers.
The technique of surrogate modelling aims at saving function evaluations by building a surrogate model of the fitness function and using it for a portion of the function evaluations conducted in the course of the evolutionary search. Several surrogate model-assisted versions of the CMA-ES have been developed (see [13] for a recent comparison of some of the most notable algorithms). The Surrogate CMA-ES (S-CMA-ES) [2] utilizes surrogate models based on random forests or Gaussian processes, which possess an inherent capability to quantify the uncertainty of their predictions.
In order to control the surrogate model's error, the S-CMA-ES uses the surrogate model for a given number of generations before a new instance of the model is trained on a population evaluated with the real fitness, a strategy called generation-based evolution control [9]. In [2], two values of this number of model-evaluated generations were benchmarked on the COCO/BBOB framework. In many cases, the higher value outperformed the lower one in earlier phases of the optimization, but the reverse order was observed towards later phases of the optimization.
The ACM-ES algorithm [11] introduced an adaptive evolution control adjusting surrogate hyperparameters and the lifelength, i.e., the number of model-evaluated generations, as a function of the previous model's error.

In this paper, we use the lifelength-adjusting procedure from ACM-ES in connection with three different surrogate model error measures. The three resulting S-CMA-ES versions are compared on the COCO/BBOB framework. We restrict our attention to the S-CMA-ES with Gaussian processes, since they outperformed random forest-based surrogates [2].
2 Surrogate CMA-ES
The CMA-ES operates on a population of $\lambda$ candidate solutions sampled from a multivariate normal distribution:

$$\mathbf{x}_k \sim \mathcal{N}(\mathbf{m}, \sigma^2 \mathbf{C}), \quad k = 1, \dots, \lambda, \tag{1}$$

where $\mathcal{N}(\mathbf{m}, \sigma^2 \mathbf{C})$ denotes the normal distribution; $\mathbf{m}$ and $\mathbf{C}$ are the mean and the covariance matrix of the estimated search distribution, respectively; and $\sigma$ is the overall search step size. The candidate solutions are ranked according to their fitness values:

$$f(\mathbf{x}_{1:\lambda}) \le f(\mathbf{x}_{2:\lambda}) \le \dots \le f(\mathbf{x}_{\lambda:\lambda}). \tag{2}$$
Upon a (weighted) selection of the highest-ranked points, the mean and the covariance matrix of the multivariate normal distribution are adapted according to a procedure that takes as input, among other variables, a cumulation of the past search steps [5]. The S-CMA-ES modifies the CMA-ES by replacing its sampling (1) and fitness-evaluation (2) steps with the procedure depicted in Algorithm 1.
Depending on the generation number, the procedure evaluates all candidate solutions either with the real fitness or with the model. In both cases, the sampling from the estimated multivariate normal distribution is unchanged (step 1). If the population is original-fitness-evaluated (step 3), the new evaluations are saved into an archive of known solutions (step 4). Afterwards, a new model is trained on the set of archived points lying within a given Mahalanobis distance from the current CMA-ES distribution (step 5).
In modelevaluated generations, the fitness values of the whole population of candidate solutions are estimated by the model (step 9).
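The evaluation switch described above can be sketched in a few lines of Python. This is a hedged illustration only: the generation-counting rule, the model's `fit`/`predict` interface, and all names are assumptions for the sketch, not the paper's Algorithm 1.

```python
def evaluate_population(pop, g, g_last_orig, n_model_gens, fitness, archive, model):
    """One evaluation step of generation-based evolution control: use the real
    fitness when the model's lifelength is exhausted, the surrogate otherwise."""
    if g - g_last_orig > n_model_gens:
        # original-fitness-evaluated generation (steps 3-5 of Algorithm 1):
        # evaluate, archive the results and retrain the surrogate model
        y = [fitness(x) for x in pop]
        archive.extend(zip(pop, y))
        X = [a[0] for a in archive]   # in the paper, training points are further
        t = [a[1] for a in archive]   # restricted to a Mahalanobis neighbourhood
        model.fit(X, t)
        return y, g                   # g becomes the last original generation
    # model-evaluated generation (step 9): predict fitness for the population
    return model.predict(pop), g_last_orig
```

The same archive persists across generations, so each retraining can use all original-fitness evaluations gathered so far.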
2.1 Gaussian Processes
A Gaussian process (GP) is a collection of random variables $(f(\mathbf{x}))_{\mathbf{x} \in \mathbb{R}^D}$ such that any finite subcollection $(f(\mathbf{x}_1), \dots, f(\mathbf{x}_N))$ has an $N$-dimensional normal distribution. A Gaussian process is defined by a mean function $m(\mathbf{x})$ (often assumed to be zero) and a covariance function $k(\mathbf{x}, \mathbf{x}'; \boldsymbol{\theta})$, where $\boldsymbol{\theta}$ is a vector of parameters of $k$, hence hyperparameters of the Gaussian process. Given a set of training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, the covariance matrix of a GP prior is $\mathbf{K} + \sigma_n^2 \mathbf{I}_N$, where $\mathbf{K}$ is the matrix given by $\mathbf{K}_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ for all $i, j \in \{1, \dots, N\}$; $\sigma_n^2$ is the variance of an additive i.i.d. noise and $\mathbf{I}_N$ is the $N \times N$ identity matrix. Given a new point $\mathbf{x}_*$, the Gaussian process regression prediction is derived by conditioning the joint normal distribution of $(f(\mathbf{x}_1), \dots, f(\mathbf{x}_N), f(\mathbf{x}_*))$ on the training data, which yields a univariate Gaussian (see [14] for more details). The hyperparameters of a GP regression model are estimated using the maximum likelihood estimation method.

3 Adaptive Evolution Control for Surrogate CMA-ES
The generation-based evolution control optimizes the fitness function and its surrogate model in a certain proportion. On problem areas that can be approximated well, surrogate-assisted optimization may benefit from frequent utilization of the model, while on areas that are hard for the surrogate to approximate, frequent utilization of the model may degrade performance due to the model's inaccuracy.
Adaptation of the number of model-evaluated generations (in addition to other surrogate model parameters that we do not investigate here) depending on the previous model's error has been proposed in ACM-ES [11].
Let $g$ be a generation marked as original-fitness-evaluated, in which a new surrogate model is trained. If this is the first surrogate trained so far, the number of model-evaluated generations is initialized to zero. Otherwise, the error of the previous surrogate model is estimated on the newly evaluated population (Algorithm 2). The error is then mapped into the number of consecutive generations for which the new surrogate will be used (Algorithm 3).
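The error-to-lifelength mapping can be sketched as follows. This is a hedged sketch only: the smoothing convention, the generic sigmoid transfer, and all parameter values are illustrative assumptions, not the paper's equations (3)–(4).

```python
import math

def generations_from_error(err, err_smoothed, rate=0.9, err_thresh=0.5,
                           g_max=5, steepness=10.0):
    """Map a surrogate-error estimate in [0, 1] to a number of upcoming
    model-evaluated generations (hedged sketch of Algorithm 3)."""
    # step 1: exponentially smooth the error history
    e = (1.0 - rate) * err_smoothed + rate * err
    # step 3: truncate at the threshold; errors above it count as maximal
    e_t = min(e / err_thresh, 1.0)
    # transfer: a generic sigmoid rescaled so that t(0) = 1 and t(1) = 0
    # (the paper's transfer functions (3)-(4) are not reproduced here)
    t_raw = 1.0 / (1.0 + math.exp(steepness * (e_t - 0.5)))
    t0 = 1.0 / (1.0 + math.exp(-steepness * 0.5))
    t = (t_raw - (1.0 - t0)) / (2.0 * t0 - 1.0)
    # low error -> many model-evaluated generations, high error -> none
    return round(t * g_max), e
```

A zero error thus yields the maximum lifelength, while any error at or above the threshold disables the surrogate for the next generations.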
We investigate three approaches to expressing the surrogate model error. As the CMA-ES depends primarily on the ranking of candidate solutions, the first two, the Kendall rank correlation coefficient and the ranking difference error, are based on ranking. The third, previously proposed in [12], uses the Kullback-Leibler divergence, a.k.a. information gain, to measure the difference between a multivariate normal distribution estimated from the fitness values and a multivariate normal distribution estimated from the predicted values.
Kendall rank correlation coefficient
The Kendall rank correlation coefficient $\tau$ measures the similarity between two different orderings of the same set. Let $\mathbf{y} = (y_1, \dots, y_\lambda)$ and $\hat{\mathbf{y}} = (\hat{y}_1, \dots, \hat{y}_\lambda)$ be the sequences of the fitness values and the predicted values of a population of $\lambda$ points, respectively. A pair of indices $(i, j)$, $i \ne j$, is said to be concordant if both $y_i < y_j$ and $\hat{y}_i < \hat{y}_j$ or if both $y_i > y_j$ and $\hat{y}_i > \hat{y}_j$. A discordant pair is one fulfilling that both $y_i < y_j$ and $\hat{y}_i > \hat{y}_j$ or both $y_i > y_j$ and $\hat{y}_i < \hat{y}_j$. Let $n_c$ and $n_d$ denote the numbers of concordant and discordant pairs of indices, respectively. The Kendall correlation coefficient between $\mathbf{y}$ and $\hat{\mathbf{y}}$ is defined as:

$$\tau = \frac{n_c - n_d}{\lambda(\lambda - 1)/2}.$$
In the corresponding branch of Algorithm 2, the value $\tau$ is decreasingly scaled into the interval $[0, 1]$.
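A direct implementation of the coefficient from the pair counts, together with a decreasing rescaling of $\tau$ into $[0, 1]$ (the particular scaling $(1 - \tau)/2$ is an assumption consistent with the text, not a formula quoted from the paper):

```python
from itertools import combinations

def kendall_tau(y, y_hat):
    """Kendall's tau from the counts of concordant and discordant pairs."""
    n_c = n_d = 0
    for i, j in combinations(range(len(y)), 2):
        s = (y[i] - y[j]) * (y_hat[i] - y_hat[j])
        if s > 0:
            n_c += 1      # the two orderings agree on this pair
        elif s < 0:
            n_d += 1      # the two orderings disagree on this pair
    return (n_c - n_d) / (len(y) * (len(y) - 1) / 2)

def kendall_error(y, y_hat):
    """Decreasing rescaling of tau from [-1, 1] to an error in [0, 1] (assumed)."""
    return (1.0 - kendall_tau(y, y_hat)) / 2.0
```

A perfect model ranking gives $\tau = 1$ and error $0$; a fully reversed ranking gives $\tau = -1$ and error $1$.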
Ranking difference error
The ranking difference error is a normalized version of a measure used in [10]. Given the rank $\rho(\hat{\mathbf{y}})_i$ of the $i$-th element of $\hat{\mathbf{y}}$ and the rank $\rho(\mathbf{y})_i$ of the $i$-th element of $\mathbf{y}$, the ranking difference error is the sum of element-wise differences between $\rho(\mathbf{y})$ and $\rho(\hat{\mathbf{y}})$, taking into account only the $\mu$ best-ranked points of $\hat{\mathbf{y}}$, normalized by the maximum value attainable by such a sum:

$$\mathrm{err}_{\mathrm{RDE}}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{\sum_{i:\, \rho(\hat{\mathbf{y}})_i \le \mu} \left| \rho(\mathbf{y})_i - \rho(\hat{\mathbf{y}})_i \right|}{\max_{\pi \in S_\lambda} \sum_{i \le \mu} \left| \pi_i - i \right|},$$

where $S_\lambda$ is the group of all permutations of the set $\{1, \dots, \lambda\}$.
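A sketch of this error measure follows; the brute-force computation of the normalization constant over all permutations is an illustrative choice that is only feasible for small populations, and the function and variable names are not the paper's.

```python
from itertools import permutations

def ranks(v):
    """Rank of each element of v: 0 for the smallest (best) value."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def rde(y, y_hat, mu):
    """Ranking difference error restricted to the mu points best-ranked by the
    model, normalized by the maximum achievable sum of rank differences."""
    r_y, r_yh = ranks(y), ranks(y_hat)
    err = sum(abs(r_y[i] - r_yh[i]) for i in range(len(y)) if r_yh[i] < mu)
    # brute-force normalization constant; feasible only for small populations
    lam = len(y)
    max_err = max(sum(abs(p[i] - i) for i in range(mu))
                  for p in permutations(range(lam)))
    return err / max_err
```

An exact model ranking yields error 0; a fully reversed ranking of a small population yields the maximal error 1.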
Kullback-Leibler divergence

The Kullback-Leibler divergence from a continuous random variable $Q$ with probability density function $q$ to a continuous random variable $P$ with probability density function $p$ is defined as:

$$D_{\mathrm{KL}}(P \parallel Q) = \int p(\mathbf{x}) \ln \frac{p(\mathbf{x})}{q(\mathbf{x})} \, \mathrm{d}\mathbf{x}.$$

For two multivariate normal distributions $\mathcal{N}_0(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ and $\mathcal{N}_1(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ of the same dimension $k$, the Kullback-Leibler divergence from $\mathcal{N}_1$ to $\mathcal{N}_0$ is:

$$D_{\mathrm{KL}}(\mathcal{N}_0 \parallel \mathcal{N}_1) = \frac{1}{2} \left( \operatorname{tr}\left( \boldsymbol{\Sigma}_1^{-1} \boldsymbol{\Sigma}_0 \right) + (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^{\top} \boldsymbol{\Sigma}_1^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) - k + \ln \frac{\det \boldsymbol{\Sigma}_1}{\det \boldsymbol{\Sigma}_0} \right).$$
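The closed form for two multivariate normals translates directly into code (a minimal NumPy sketch; `kl_mvn` is an illustrative name, not the paper's notation):

```python
import numpy as np

def kl_mvn(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0) || N(mu1, sigma1) ) for k-dimensional normals,
    using the closed form: 1/2 (tr(S1^-1 S0) + d^T S1^-1 d - k + ln det S1/det S0)."""
    k = len(mu0)
    s1_inv = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    # slogdet is preferred over det for numerical stability of the log-ratio
    _, logdet0 = np.linalg.slogdet(sigma0)
    _, logdet1 = np.linalg.slogdet(sigma1)
    return 0.5 * (np.trace(s1_inv @ sigma0)
                  + diff @ s1_inv @ diff
                  - k + logdet1 - logdet0)
```

For identical distributions the divergence is 0; for two unit-variance univariate normals whose means differ by 1 it equals 1/2.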
The model error estimation algorithm (Algorithm 2) in generation $g$ computes the Kullback-Leibler divergence from a CMA-estimated multivariate normal distribution w.r.t. the fitness values to a CMA-estimated multivariate normal distribution w.r.t. the predicted values. The procedure cma_update in steps 7 and 8 refers to one iteration of the CMA-ES from the point when a new population has been sampled. The result is normalized by the historical maximum (step 13).
Adjusting the number of model generations
The mapping of the model error to the number of consecutive model-evaluated generations (Algorithm 3) is almost identical to the approach used in [11]. The history of surrogate model errors is exponentially smoothed with a given update rate (step 1). The smoothed error is truncated at a threshold, so that all larger values are treated as the maximal error (step 3). In contrast to [11], we consider two different transfer functions (plotted in Figure 1) that scale the truncated error into the admissible interval of generation numbers:
(3)  
(4) 
Both functions are defined on the truncated error's range and attain the admissible extremes at its endpoints. The second of them is a simple sigmoid function defined to be slightly less sensitive near the edges than in the middle; more control can thus be achieved in the regions of low and high error values. A steepness parameter determines the shape of the sigmoid curve.

4 Experimental Setup
The proposed adaptive generation-based evolution control for the S-CMA-ES with three different surrogate model error measures is evaluated on the noiseless testbed of the COCO/BBOB (Comparing Continuous Optimizers / Black-Box Optimization Benchmarking) framework [7, 8] and compared with the S-CMA-ES and the CMA-ES.
Each function is defined everywhere on $\mathbb{R}^D$ and has its optimum in $[-5, 5]^D$ for all tested dimensionalities $D$. For every function and every dimensionality, trials of the optimizer are run on independent instances, which differ by linear transformations or shifts of the input space. In our experiments, the instances recommended for the BBOB 2015 workshop were used. Each trial is terminated when the optimal function value is reached within a small tolerance or when a given budget of function evaluations is used up. Experiments were run for five dimensionalities of the input space. The algorithms' settings are summarized in the following subsections.

4.1 CMA-ES
The CMA-ES results in BBOB format were downloaded from the BBOB 2010 workshop archive (http://coco.gforge.inria.fr/dataarchive/bbob/2010/). The CMA-ES used in those experiments was version 3.40.beta and utilized a restart strategy (known as IPOP-CMA-ES), in which the population size is increased by the factor IncPopSize after each restart [1]. The default parameter values of the CMA-ES were employed.
4.2 S-CMA-ES
The S-CMA-ES was tested with two numbers of model-evaluated generations: 1 (further denoted "GP-1") and 5 ("GP-5"). All other S-CMA-ES settings were left as described in [2]; in particular, the Mahalanobis distance threshold, the starting values of the Matérn covariance function's parameters, and the starting value of the GP noise parameter were kept. If not mentioned otherwise, the corresponding settings of the adaptive versions of the S-CMA-ES are as just stated.
In order to find the most promising settings for each considered surrogate error measure, a full factorial experiment was conducted on one half of the noiseless testbed. The discretization of continuous parameters is reported in Table 1. All possible combinations of the parameters were ranked on the selected functions according to the lowest achieved $\Delta_f^{\mathrm{med}}$ (see Section 6) for different numbers of function evaluations. The best settings were chosen according to the highest count of first ranks; ties were resolved according to the lowest sum of ranks. All of the best settings shared the same maximum number of model-evaluated generations. The remaining winning values are summarized in the following paragraphs.
Kendall correlation coefficient (ADA-Kendall)
Transfer function, error threshold, and update rate.
Ranking difference error (ADA-RD)
The same, except for the transfer function.
Kullback-Leibler divergence (ADA-KL)
Transfer function, error threshold, and update rate.
5 CPU Timing
Algorithm  

ADA-KL
ADA-Kendall
ADA-RD
In order to assess computational costs other than the number of function evaluations, we calculated the CPU time per function evaluation for each algorithm and each dimensionality. Each experiment was divided into jobs by dimensionality, function, and instance. All jobs were run in a single thread on the Czech national grid MetaCentrum. The average time per function evaluation for each algorithm and each tested dimensionality is summarized in Table 2.
6 Results
Dim
          1/3   1   1/3   1   1/3   1   1/3   1   1/3   1
CMA-ES
GP-1
GP-5
ADA-KL
ADA-Ken
ADA-RD
We test the differences in the algorithms' convergence for significance on the whole noiseless testbed with the non-parametric Friedman test [3]. The algorithms are ranked on each BBOB function with respect to the median of the log-scaled minimal distance from the function optimum, denoted $\Delta_f^{\mathrm{med}}$, at a fixed budget of function evaluations.
To account for different optimization scenarios, the test is conducted separately for all considered dimensionalities of the input space and two function evaluation budgets, a higher and a lower one. Let $\#\mathrm{FE}_T$ be the smallest number of function evaluations at which at least one algorithm reached the target, or the overall budget if the target has not been reached by any algorithm. We set the higher budget for the tests to $\#\mathrm{FE}_T$ and the lower budget to $\#\mathrm{FE}_T/3$.
Mean ranks from the Friedman test are given in Table 3, together with the critical value of the test.
The mean ranks differ significantly in all tested scenarios except a few low-dimensional cases. In the higher dimensionalities, the lowest mean rank is achieved either by ADA-Kendall or ADA-RD at both tested budgets.
In order to show pairwise differences, in the cases when the null hypothesis of equal algorithms' performance was rejected, we perform a pairwise comparison of the algorithms' mean ranks by the post-hoc Friedman test with the Bergmann-Hommel correction of the family-wise error [4]. To better illustrate the differences between the algorithms, we also count the number of benchmark functions on which one algorithm achieved a better rank than the other. The pairwise scores and the statistical significance of the pairwise mean rank differences are reported in Table 4. In the post-hoc test, ADA-Kendall significantly outperforms both the CMA-ES and GP-5 in several dimensionalities; it also significantly outperforms GP-1 at the higher tested budget in one case.

For illustration, the average control frequency, given by the ratio of the number of original-fitness-evaluated generations to the number of model-evaluated generations within one trial, is shown in Figure 2 for data from the trials on the Rosenbrock function.
The algorithm ADA-KL led to generally lower control frequencies than its competitors, which might explain its slightly inferior performance. Similar results were observed for the remaining functions and dimensionalities.
The cases in which ADA-Kendall and ADA-RD are able to switch between more exploitation-oriented and more data-gathering-oriented behaviour can be studied in the results of COCO's postprocessing. GP-5 outperforms both GP-1 and the CMA-ES on the lower and middle parts of the empirical cumulative distribution functions (ECDFs) for basically all dimensionalities (Figure 3). On the other hand, GP-1 outperforms GP-5 especially in later phases of the search (Figure 3).
The ability of ADA-Kendall and ADA-RD to switch to a less exploitation-oriented mode when appropriate is evident in the ECDF plots, especially on the moderate and the all-function groups (top right and bottom right in Figure 3), with the exception of the well-structured multimodal group (middle right), where they fail in the middle part of the search, and the weakly structured multimodal group (bottom left), where they fail towards the end of the search.
          CMA-ES    GP-1      GP-5      ADA-KL    ADA-Ken   ADA-RD
          1/3   1   1/3   1   1/3   1   1/3   1   1/3   1   1/3   1
CMA-ES     —    —    8    8   11    8   10    7   11   10    7    9
GP-1      16   16    —    —   12    9   13    8   13    9    9    6
GP-5      13   16   12   15    —    —   11   10   14   13    9   14
ADA-KL    14   17   11   15   13   13    —    —   16   14   12   15
ADA-Ken   13   14   11   15   10   10    8    9    —    —    7   11
ADA-RD    17   15   15   17   15    9   12    9   17   12    —    —
          CMA-ES    GP-1      GP-5      ADA-KL    ADA-Ken   ADA-RD
          1/3   1   1/3   1   1/3   1   1/3   1   1/3   1   1/3   1
CMA-ES     —    —   11    9    7   13    8   10    9    9    7    8
GP-1      13   15    —    —    7   14    7   11    6    8   10    9
GP-5      17   11   17   10    —    —   15    9   13    6   14    9
ADA-KL    16   14   17   13    9   15    —    —   13   11   10    9
ADA-Ken   15   15   18   16   11   17   11   13    —    —   11   12
ADA-RD    17   16   14   15   10   14   14   15   13   10    —    —
          CMA-ES    GP-1      GP-5      ADA-KL    ADA-Ken   ADA-RD
          1/3   1   1/3   1   1/3   1   1/3   1   1/3   1   1/3   1
CMA-ES     —    —    8   12   11   14   11   14    7   10    2    8
GP-1      16   12    —    —   11   12   11   12    9    7    3    4
GP-5      13   10   13   12    —    —   10    6    9    5    8    7
ADA-KL    13   10   13   11   14   18    —    —    7   10    8    5
ADA-Ken   17   14   15   17   15   19   17   14    —    —   10    9
ADA-RD    22   16   21   20   16   17   16   19   14   14    —    —
          CMA-ES    GP-1      GP-5      ADA-KL    ADA-Ken   ADA-RD
          1/3   1   1/3   1   1/3   1   1/3   1   1/3   1   1/3   1
CMA-ES     —    —    7   12   13   14   10   14    1    8    1    4
GP-1      17   12    —    —   15   15   14   14    4    5    5    4
GP-5      11   10    9    9    —    —    8   11    6    4    8    5
ADA-KL    14   10   10   10   16   13    —    —    8    5    9    8
ADA-Ken   23   16   20   19   18   20   16   19    —    —   13   12
ADA-RD    23   20   19   20   16   19   15   15   11   12    —    —
          CMA-ES    GP-1      GP-5      ADA-KL    ADA-Ken   ADA-RD
          1/3   1   1/3   1   1/3   1   1/3   1   1/3   1   1/3   1
CMA-ES     —    —    7    5   11   12    9   13    4    3    3    5
GP-1      17   19    —    —   14   17   12   16    7    6    9    8
GP-5      13   12   10    7    —    —    4    5    6    5   10    6
ADA-KL    15   11   12    8   20   19    —    —    9    8   13   10
ADA-Ken   20   21   17   18   18   19   15   16    —    —   17   17
ADA-RD    21   19   15   16   14   18   11   14    7    7    —    —
[Figure 3: ECDF plots by function group — separable functions, moderate functions, ill-conditioned functions, multimodal functions, weakly structured multimodal functions, and all functions.]
7 Conclusion
In this paper, we implemented several modifications of the Surrogate CMA-ES (S-CMA-ES), an algorithm using generation-based evolution control in connection with GPs. We considered three measures of the surrogate model error according to which an adequate number of upcoming model-evaluated generations can be estimated online. The three resulting algorithms were compared on the COCO/BBOB framework with the S-CMA-ES parametrized by two different numbers of consecutive model-evaluated generations. Since the work on the adaptive extension is still in progress, the presented results summarize the performance of all compared algorithms on the whole BBOB framework or its function groups. We found two error measures, the Kendall rank correlation and the ranking difference error, that significantly outperformed the S-CMA-ES used with the higher number of model-evaluated generations, especially in higher dimensionalities of the input space. However, both of these algorithms provided only a minor improvement over the S-CMA-ES used with the lower number of model-evaluated generations, and in some tested scenarios they fell behind both tested settings of the S-CMA-ES. An area for further research is the adjustment of other surrogate model parameters besides the control frequency, such as the number of training points or the radius of the area from which they are selected.
8 Acknowledgments
The research reported in this paper has been supported by the Czech Science Foundation (GAČR) grant 17-01251.
Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CESNET LM2015042), is greatly appreciated.
References

[1] A. Auger and N. Hansen. A restart CMA evolution strategy with increasing population size. In 2005 IEEE Congress on Evolutionary Computation. IEEE, 2005.

[2] L. Bajer, Z. Pitra, and M. Holeňa. Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed. In Proceedings of the Companion Publication of the 2015 Genetic and Evolutionary Computation Conference (GECCO Companion '15). ACM, 2015.

[3] J. Demšar. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30, December 2006.

[4] S. García and F. Herrera. An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J. Mach. Learn. Res., 9:2677–2694, 2008.

[5] N. Hansen. The CMA evolution strategy: A tutorial. CoRR, abs/1604.00772, 2016.

[6] N. Hansen, A. Auger, R. Ros, S. Finck, and P. Pošík. Comparing results of 31 algorithms from the Black-Box Optimization Benchmarking BBOB-2009. In Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO '10), pages 1689–1696, New York, NY, USA, 2010. ACM.

[7] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter Black-Box Optimization Benchmarking 2009: Noiseless functions definitions. Technical report, INRIA, 2009, updated 2010.

[8] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter Black-Box Optimization Benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.

[9] Y. Jin and B. Sendhoff. Fitness approximation in evolutionary computation - a survey. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '02), pages 1105–1112, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.

[10] S. Kern, N. Hansen, and P. Koumoutsakos. Local meta-models for optimization using evolution strategies, pages 939–948. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[11] I. Loshchilov, M. Schoenauer, and M. Sebag. Self-adaptive surrogate-assisted Covariance Matrix Adaptation Evolution Strategy. In Proceedings of the 14th International Conference on Genetic and Evolutionary Computation (GECCO '12). ACM, 2012.

[12] I. Loshchilov, M. Schoenauer, and M. Sebag. KL-based control of the learning schedule for surrogate black-box optimization. CoRR, abs/1308.2655, 2013.

[13] Z. Pitra, L. Bajer, J. Repický, and M. Holeňa. Overview of surrogate-model versions of Covariance Matrix Adaptation Evolution Strategy. In Proceedings of the Genetic and Evolutionary Computation Conference 2017 (GECCO '17), Berlin, Germany, July 15–19, 2017. ACM, 2017.

[14] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, 2006.