1 CMA Evolution Strategy
is the stateoftheart evolutionary optimization method, at least in the area of continuous blackbox optimization. Basically, it consists in generating new search points by sampling from a multidimensional normal distribtion, the mean and variance of which are updated from generation to generation. In particular, the population
of the st generation,, follows the normal distribution with mean
and variance resulting from the update in the th generation,(1) 
Here, is the step size in the th generation. and are updated separately,
being in the most simple case obtained as the unbiased empirical estimate based on the
th generation:(2) 
The positive semidefinite matrix
can be diagonalised in the basis formed by its eigenvectors
,(3) 
where and is a diagonal matrix such that its
th diagonal element is the eigenvalue corresponding to the eigenvector
. Consequently, the coordinates of points generated according to (1) in that basis, , are uncorrelated:(4) 
2 Optimization Based on Gaussian Processes
A Gaussian process (GP) on a dimensional Euclidean space
is a collection of random variables, GP
, such that the joint distribution of any finite number of them is a multidimensional normal distribtion. Following
[4], we differentiate two ways of using GPs in blackbox optimization:
As a surrogate model to be optimized instead of the black box objective function. On the found optimum or optima, the original blackbox objective function is then evaluated. This use of GPs has been introduced in the popular Efficient Global Optimization (EGO) algorithm [14]. As the optimization method, also evolutionary optimization can be used, in particular CMAES [4], but this is not the only possibility: For example, traditional lowdegree polynomial models, aka response surface models [20], are much more efficiently optimized using traditional smooth optimization methods.
Irrespective of the way in which a GP is used, its construction is always based on a sequence of training pairs and is subsequently employed to compute the random variable for . Since the distribution of is multidimensional normal, also the conditional distribution of conditioned on is normal,
(5) 
where . There are two commonly encountered possibilities how define the functions , describing the conditional GP mean, and , describing the conditional GP variance:

A GP is the superposition of a deterministic function and a GP with zero mean. For the latter, it can be shown [21] that the function describing its mean fulfils
(6) whereas the function describing its variance fulfills
(7) Here, an i.i.d. Gaussian noise is assumed, is the
dimensional identity matrix,
is a symmetric function, and(8) The resulting superposition then fulfils
(9) whereas .

A GP is the superposition of a Bayesian mean assuming the multiplicative form and a GP with zero mean, where is a random variable with and has the same meaning as above. In that case, it can be shown [21] that for ,
(10) (11) where
(12) (13)
Simple, but frequently used examples of the function occurring in (6)–(7) and (10)–(11) include:

Squared exponential,
(14) Here, is a parameter called characteristic lengthscale. It is a parameter of the function , not of the Gaussian process, the process is nonparametric. Therefore is referred to as a hyperparameter.

Exponential,
(15) which has 2 hyperparameters, and .

Dot product,
(16) which has 2 hyperparameters, and , or in its generalized version,
(17) which has, in addition, the matrix of hyperparameters , defining a dot product in general coordinates.
2.1 GPbased criteria to choose the points for evaluation
Whereas traditional response surface and surrogate models employ basically only one criterion for the choice of points in which the black box objective function should be evaluated, namely the global or at least local minimum of the model (if the optimization objective is minimization), GPs offer several additional criteria:

Probability of improvement (PoI),
(20) where denotes the distribution function of , is the minimum value found so far. More generally, probability of improvement with respect to a given ,
(21)
3 Possible Synergy
Directly connecting both considered Gaussian approaches is not possible because the normal distribution in CMAES is a distribution on the input space of the objective function (fitness), whereas the normal distribution in GPs is on the space of its function values. Nevertheless, it is still possible to achieve some synergy through using information from CMAES for the GP, and/or using information from the GP for CMAES. According to whoat was recalled at the beginning of Section 2, the latter possibility corresponds to using GP for the evolution control of CMAES.
3.1 Using Information from CMAES for the GP
We see 2 straightforward possibilities where some information from CMAES can be used in the GP.

The function occurring in (9), (10) and (11) can be constructed using the fitness values of individuals from some particular generation or several generations of CMAES. Its construction can be as simple as setting to a constant aggregating the considered fitness values, e.g., their mean or weighted mean, but it can also consist in training, with those values, a response surface model, principally of any kind.

Kruisselbrink et al. [16], who combine CMAES with a GP using the exponential function , propose to employ in (15
) the Mahalanobis distance of vectors
and given by the covariance matrix obtained in the th generation of CMAES, , instead of their Euclidean distance. This is actually a specific consequence of another possibility of using information from CMAES for the GP – replacing the original space of dimensional vectors, , by the space of their principal components with respect to . In this context, it is worth recalling that in [2, 3, 15], Mahalanobis instead of Euclidean distance was used when combining CMAES with quadratic response surface models in [2, 3, 15]. In the approach presented there, the space of the principal components with respect to is used, together with an estimate of density, to locally weight the model predictions with respect to the considered input. In this way, a connection to the other kind of synergy is established, i.e., to the evolution control of CMAES, giving us the possibility to use both kinds in combination.
3.2 GPBased Evolution Control of CMAES
We intend to test the following approaches to using GP for the evolution control of CMAES.

Basic approach. In the th generation, points are sampled from the distribution , where is several to many times larger then . For all of them, a selected criterion from among those introduced in 2.1 is computed. Based on the value of that criterion, the points for the evaluation by the original blackbox fitness are chosen. This can be done according to various strategies, we intend to use the one described in Algorithm 1
, with which we have a good experience from using radial basis function networks for the evolution control in the evolutionary optimization of catalytic materials
[10]. The linear ordering stands in the cases PoI and EI for , in the case for . The use of a linear ordering relates the proposed approach to rankingbased evolution control of CMAES [2, 3, 15, 23], as well as to surrogate modelling of CMAES by ordinal regression [17, 18, 19, 22]. Most similar is the approach by Ulmer et al. [23], the difference being that they don’t use clustering. 
GP on lowdimensional projections attempts to improve the basic approach in view of the experience reported in the literature [15] and obtained also in our earlier experiments [1] that GPs are actually advantageous only in low dimensional spaces. Instead of the sampled points , the GP is trained only with their first principal components with respect to , , i.e., with the projections of to the dimensional space , provided the orthonormal eigenvectors are enumerated according to decreasing eigenvalues.

GP on lowdimensional projections within restricted distance attempts to decreaase the deterioration of the GP on lowdimensional projections with respect to the GP on the original sampled points by using only points within a prescribed small distance from their respective projections . To this end, points from the distribution are resampled until points are obtained fulfilling
(25) 
Twostage sampling. Trainig the GP for the th generation can easily suffer from the lack of training data in a part of the search space where a substantial proportion of the points will be sampled. To alleviate it, the points to be evaluated by the original fitness can be sampled in two stages:

First, points , where , are sampled from the distribution , evaluated by the original fitness, and included into training the GP.

Then, points are sampled from the same distribution, and for them, a selected criterion from among those introduced in 2.1 is computed, based on which the points for the evaluation by the original blackbox fitness are chosen. To this end, again Algorithm 1 can be used, with the following changes:

In the input, the numbers and have now to fulfill
(26) 
In Step 3, is chosen only for .

The output contains only the points .

Needless to say, this approach has to be combined with some of the approaches (i)–(iii) and can be combined with any of them.


Generationbased evolution control. Whereas the previous four approaches represent, in terms of [11, 12], individualbased evolution control, we want to test also one approach that is generationbased, in the sense that the desired number of points is evaluated by the original blackbox fitness function only in selected generations. Similarly to the evolution strategy in [18, 19], our approach selects those generations adaptively, according to the agreement between the ranking of considered points by the surrogate model and by the blackbox fitness. Differently to that strategy, however, we want to base the estimation if that agreement not on the generations in which points have been evaluated by the blackbox fitness, but on evaluating a small and evolvable number of additional points in each generation. In this context, it is important that in situations when the fitness is evaluated empirically, using some measurement or testing, the evaluation hardware causes the evaluation costs to increase stepwise, and to remain subsequently constant for some evaluated points (e.g. in the optimization of catalyst preformance described in [10], is the number of channels in the chemical reactor in which the catalysts are tested). In such a situation, the costs of evaluation are the lowest if is a multiple of , and if the evaluation of the additional points in each generation is cumulated for generations such that
(27) Setting covers the case when such a situation does not occur and the evaluation costs increase linearly with the number of points. If is the last generation in which points have been evaluated by the blackbox fitness, then for , points are sampled from the distribution , from which the points are selected by Algorithm 1, in which the input is replaced with . Subsequently, all the points are evaluated by the blackbox fitness, and for each generation , the agreement between the ranking of by the current Gaussian process, GP, and the blackbox fitness is estimated. If the agreement is sufficient in all considered generations, then the procedure is repeated using instead of and unchanged . Otherwise, additional points are selected from for the first generation in which the agreement was not sufficient. Then a new Gaussian process, GP, is trained using the training set
(28) optionally including also some or all points from the set used for training GP. At that occasion, aslo the value of can be changed.
Provided the conditions and are fulfilled, the value of can be arbitrary. Needless to say, the smaller , the larger will be the proportion of points from the generation in which the GP was trained in its training set, and the more similar the obtained GP will normally be to a GP trained only with data from that generation, which is the usual way of using surrogate models in traditional generationbased strategies [11, 12, 18, 19]. It is also worth pointing out that our generationbased evolution control was explained here as a counterpart to the basic individualbased approach (i), but counterparts to the approaches (ii) and (iii) are possible as well.
References
 [1] L. Bajer, V. Charypar, and M. Holeňa. Model guided sampling optimization with Gaussian processes for expensive blackbox optimization. In GECCO 2013, pages 1715–1716, 2013.
 [2] Z. Bouzarkouna, A. Auger, and D.Y. Ding. Investigating the localmetamodel cmaes for large population sizes. In EvoApplications 2010, pages 402–411, 2010.
 [3] Z. Bouzarkouna, A. Auger, and D.Y. Ding. Localmetamodel cmaes for partially separable functions. In GECCO 2011, pages 869–876, 2011.

[4]
D. Büche, N.N. Schraudolph, and P. Koumoutsakos.
Accelerating evolutionary algorithms with Gaussian process fitness function models.
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 35:183–194, 2005. 
[5]
M.A. ElBeltagy and A.J. Keane.
Evolutionary optimization for computationally expensive problems
using Gaussian processes.
In
International Conference on Artificial Intelligence
, pages 708–714, 2001.  [6] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, and K. Giannakoglou. Metamodelassisted evolution strategies. In PPSN VII, pages 361–370, 2002.

[7]
N. Hansen.
The CMA evolution strategy: A comparing review.
In
Towards a New Evolutionary Computation
, pages 75–102. Springer Verlag, Berlin, 2006.  [8] N. Hansen and A. Ostermaier. Completely derandomized selfadaptation in evolution strategies. Evolutionary Computation, 9:159–195, 2001.
 [9] E. Hoffman and S. Holemann. Controlled model assisted evolution strategy with adaptive preselection. In International IEEE Symposium on Evolving Fuzzy Systems, pages 182–187, 2006.
 [10] M. Holeňa, D. Linke, and L. Bajer. Surrogate modeling in the evolutionary optimization of catalytic materials. In GECCO 2012, pages 1095–1102, 2012.
 [11] Y. Jin. A comprehensive survery of fitness approximation in evolutionary computation. Soft Computing, 9:3–12, 2005.
 [12] Y. Jin, M. Olhofer, and B. Sendhoff. Managing approximate models in evolutionary aerodynamic design optimization. In CEC 2001, pages 592–599, 2001.
 [13] D.R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21:345–383, 2001.
 [14] D.R. Jones, M. Schonlau, and W.J. Welch. Efficient global optimization of expensive blackbox functions. Journal of Global Optimization, 13:455–492, 1998.
 [15] S. Kern, N. Hansen, and P. Koumoutsakos. Local metamodels for optimization using evolution strategies. In PPSN IX, pages 939–948, 2006.
 [16] J.W. Kruisselbrink, M.T.M. Emmerich, A.H. Deutz, and T. Bäck. A robust optimization approach using kriging metamodels for robustness approximation in the CMAES. In CEC 2010, pages 1–8, 2010.
 [17] I. Loshchilov, M. Schoenauer, and M. Seba. Comparisonbased optimizers need comparisonbased surrogates. In PPSN XI, pages 364–373, 2010.
 [18] I. Loshchilov, M. Schoenauer, and M. Seba. Selfadaptive surrogateassisted covariance matrix adaptation evolution strategy. In GECCO 2012, pages 321–328, 2012.
 [19] I. Loshchilov, M. Schoenauer, and M. Seba. Intensive surrogate model exploitation in selfadaptive surrogateassisted CMAES (saACMES). In GECCO 2013, pages 439–446, 2013.
 [20] R.H. Myers, D.C. Montgomery, and C.M. AndersonCook. Response Surface Methodology: Proces and Product Optimization Using Designed Experiments. John Wiley and Sons, Hoboken, 2009.

[21]
E. Rasmussen and C. Williams.
Gaussian Process for Machine Learning
. MIT Press, Cambridge, 2006.  [22] T.P. Runarsson. Ordinal regression in evolutionary computation. In PPSN IX, pages 1048–1057, 2006.
 [23] H. Ulmer, F. Streichert, and A. Zell. Evolution strategies assisted by Gaussian processes with improved preselection criterion. In IEEE Congress on Evolutionary Computation, pages 692–699, 2003.