Log In Sign Up

Improving NeuroEvolution Efficiency by Surrogate Model-based Optimization with Phenotypic Distance Kernels

by   Jörg Stork, et al.

In NeuroEvolution, the topologies of artificial neural networks are optimized with evolutionary algorithms to solve tasks in data regression, data classification, or reinforcement learning. One downside of NeuroEvolution is the large amount of necessary fitness evaluations, which might render it inefficient for tasks with expensive evaluations, such as real-time learning. For these expensive optimization tasks, surrogate model-based optimization is frequently applied as it features a good evaluation efficiency. While a combination of both procedures appears as a valuable solution, the definition of adequate distance measures for the surrogate modeling process is difficult. In this study, we will extend cartesian genetic programming of artificial neural networks by the use of surrogate model-based optimization. We propose different distance measures and test our algorithm on a replicable benchmark task. The results indicate that we can significantly increase the evaluation efficiency and that a phenotypic distance, which is based on the behavior of the associated neural networks, is most promising.


Surrogate Models for Enhancing the Efficiency of Neuroevolution in Reinforcement Learning

In the last years, reinforcement learning received a lot of attention. O...

Distance-based Kernels for Surrogate Model-based Neuroevolution

The topology optimization of artificial neural networks can be particula...

Linear Combination of Distance Measures for Surrogate Models in Genetic Programming

Surrogate models are a well established approach to reduce the number of...

A First Analysis of Kernels for Kriging-based Optimization in Hierarchical Search Spaces

Many real-world optimization problems require significant resources for ...

Neural Model-based Optimization with Right-Censored Observations

In many fields of study, we only observe lower bounds on the true respon...

Data-efficient Neuroevolution with Kernel-Based Surrogate Models

Surrogate-assistance approaches have long been used in computationally e...

On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization

Sequential model-based optimization sequentially selects a candidate poi...

1 Introduction

Artificial Neural Networks (ANN) are utilized in many different fields, such as data regression, data classification, or reinforcement learning. In each of these tasks, the network topology is significant for the ANNs performance. Often, only parameters such as the edge weights, number of hidden layers, and number of elements per layer are considered during the optimization of an ANN. In NeuroEvolution (NE), ANNs are generated by Evolutionary Algorithms (EAs). NeuroEvolution allows severe modifications of networks, such as individual connections between neurons and different neuron transfer functions, leading to a large search space of potential topologies. Two example algorithms of this category are NeuroEvolution of Augmenting Topologies (NEAT) 

[1] and Cartesian Genetic Programming of Artificial Neural Networks (CGPANN) [2, 3]. Both NEAT and CGPANN require numerous fitness evaluations to find adequate topologies in the large search space. This makes them inefficient for applications with expensive function evaluations such as simulations or real-world experiments, e.g., when an ANN is the controller of a robot that operates in a complex real-time environment. Surrogate Model-based Optimization (SMBO) is frequently used for data efficient optimization of real-world processes, as it has a high evaluation efficiency [4]. Surrogate models are commonly employed in continuous optimization, where Euclidean spaces and distance metrics exist. The more complex discrete search spaces are less often investigated [5, 6]. In this work, we want to extend CGPANN to employ Surrogate Model-based NeuroEvolution (SMB-NE) to reduce the load of fitness evaluations. For this task, we defined several distances based on the CGPANN genotypes and a distance that measures, instead of the structure, the difference in behavior of the ANNs. Our approach allows the definition of a phenotypic distance that is indifferent to the size or topology of an ANN. It only requires a representative input set to model the input to output correlation. Our main research questions are:

  1. Is SMB-NE able to outperform CGP NeuroEvolution in terms of evaluation efficiency without loss of accuracy?

  2. How can we create a representative input set for the phenotypic distance in SMB-NE?

For all experiments, we utilize data-mining classification tasks as a controllable and cheap to evaluate benchmark for the learning efficiency of SMB-NE using few fitness evaluations. This article is structured as follows: Section 2 discusses related work. Section 3 illustrates the utilized methods and the SMB-NE algorithm. Section 4 introduces the different distance measures for SMB-NE. In Section 5 the experimental setup and the results are shown and further discussed. Finally, in Section 6 we conclude the paper and give an outlook to future work.

2 Related Work

This work is an extension of an unpublished study presented at an informal workshop.111For sake of the double blind review process, we did not present references/or the name of the workshop. A recent study by Zaefferer et al. [7] investigated the use of SMBO in Genetic Programming with genotypic and phenotypic distance measures. In constrast to this study, they performed tests comparing an EA with SMBO for a bi-level symbolic regression optimization problem. They concluded, that the SMBO approach is able to outperform the EA in terms of evaluation efficiency. Moreover, the phenotypic distance performed best and was very fast to compute. A first surrogate model for ANN optimization was applied to NEAT by Gaier et al. [8]. They use a surrogate-assisted optimization approach by combining an EA with a surrogate distance-based model, employing a genotypic compatibility distance that is part of NEAT. With this approach, they are also able to increase the evaluation efficiency. To our best knowledge, nobody else used a surrogate model-based approach for the task of NeuroEvolution. Phenotypic distances are also used in other applications, for example, the optimization of job dispatching rules. For this tasks, Hildebrandt and Branke [9] utilize a surrogate-assisted EA, which is able to increase the evolution towards good solutions. The use of surrogates including genotypic distances is more often discussed, e.g., for optimization of fixed neural network topologies in reinforcement learning [10].

3 Data Efficient Optimization for Neuroevolution

3.1 Neuroevolution by Cartesian Genetic Programming

Figure 1: A CGPANN genotype with two inputs, eight nodes, an arity of three and different transfer functions. Each node has a transfer function, a boolean activity gene and several inputs with adjacent weights. Green nodes are active and part of the encoded ANN.

For NeuroEvolution with CGP, we use the C library CGP by A. Turner222 - accessed: 2018-01-12, which also allows the application of NeuroEvolution [11]. The CGP library was modified with function interfaces to the statistical programming language R, distance measures described below, and additional fitness functions. The genotype of a CGPANN individual consists of a fixed number of nodes. Each node has a number of connection genes based on the pre-defined arity with adjacent weight genes and a single categorical function gene. Nodes are only connected to preceding nodes. Duplicate connections to nodes are possible and if present, adjacent weights will be added to form a single connection in the resulting ANN. Moreover, each node has a Boolean activity gene, which signals if it is used in the active ANN topology. Fig. 1 displays an example for a CGPANN genotype and the encoded ANN with two inputs, eight nodes and an arity of three. In CGP NeuroEvolution, the network topology and weights are optimized using an evolutionary approach, utilizing a (1+4)-Evolution Strategy (ES), i.e., one parent and four new individuals in each generation. In contrast to the standard selection in ES, the rank-based selection process favors the offspring, and thus the new solution, over a parent with the same fitness. Different mutation operators are available, the default is probabilistic random mutation. Inactive genes will not influence the fitness of an ANN.

3.2 Kriging

Our SMBO algorithm is based on a Kriging regression model, which assumes that the observations are samples from a Gaussian process [12]. Kriging is a kernel-based model, i.e., the model uses a kernel, or correlation function, to measure the similarity of two samples. One typical kernel for real-valued samples is the exponential kernel, i.e., . Here, the kernel parameter expresses how quickly the correlation decays to zero, with increasing Euclidean distance

between the samples. The parameter is determined by Maximum Likelihood Estimation (MLE). It is straightforward to extend kernel-based models to combinatorial search spaces 

[13, 6]. Essentially, the Euclidean distance is replaced by a corresponding distance that applies to the respective search space, with the exponential kernel . The design of appropriate distances for neural networks is in the focus of this paper.

Kriging is frequently employed in SMBO algorithms, because in addition to relatively accurate predictions, it also provides an estimate of the uncertainty of each prediction. The predicted value and the uncertainty estimate are used to compute infill criteria. These infill criteria are supposed to express how desirable it is to evaluate a new candidate solution . To that end, the uncertainty estimate is integrated to push away from known, well-explored areas, instead preferring solutions that have large uncertainties, yet promising predicted values. One of the most frequently used criteria is Expected Improvement (EI) [14, 15]. We employ the Kriging implementation of the R package CEGO [16, 6]. It uses distance-based kernels to model data from structured, combinatorial search spaces.

3.3 Surrogate Model-based NeuroEvolution (SMB-NE)

1 begin
       // phase 1: initialization
3       initialize neural networks () at random
4       evaluate their fitness on the objective function
5       build Kriging surrogate model utilizing and distance measure ;
       // phase 2: optimization
6       while not termination-condition do
7             if  then
8                  rebuild surrogate model with a set of observations
9             end if
10            optimize EI with a (1+4)-ES to discover improved
11             evaluate network fitness on the objective function
12             add evaluated networks to archive
14       end while
16 end
Algorithm 1 Surrogate Model-based Neuroevolution

We extend CGP NeuroEvolution with an SMBO approach, leading to the Surrogate Model-based NeuroEvolution (SMB-NE) strategy outlined below. The strategy is intended to perform a data efficient search by predicting the fitness of candidate solutions with the help of a Kriging surrogate model. The algorithm is outlined in 1.

The SMB-NE process starts by creating a random initial set of individuals, in our case ANNs, and evaluates them with the objective function. The resulting data is used to learn a Kriging model. For learning the model, we define different genotypic and phenotypic distance measures for ANNs in Sec. 4. With the model, we are able to estimate the EI of an individual. In each following iteration, the model is utilized to suggest new promising ANNs by optimizing the EI criterion with the (1+4)-ES algorithm of CGP. The (1+4)-ES is used to introduce the ability to directly compare CGPANN to SMB-NE, without an additional optimization algorithm with a different operator set, or parametrization influencing the results. In each iteration, the single most promising individual is evaluated and added to the archive . As the archive grows during the optimization process, the computational effort of creating the Kriging model rises with , where is the surrogate model sample size. To keep the computational effort on a feasible level, a subset of size is used for the modeling process.

This modeling set is formed by selecting of the best and randomly drawn individuals out of the archive in each iteration. We chose the fractions in the strategy to ensure a balance between exploration and exploitation. If the size of the archive is smaller than or equal to the size of , all individuals are selected. The influence of the size of the modeling set is investigated in Sec. 5.2.

4 Proposed Kernels and Distances

In the following, we always use an exponential kernel . Here, and represent ANNs. They consist of the weights , input labels , activity labels , and transfer function labels , i.e., . All distances are scaled to . In addition, we also considered employing the graph edit distance, but decided against it due to its complexity. Computing the graph edit distance is NP-hard [17].

We illustrate the different distances by providing an example that compares two specific ANNs, with a similar structure as outlined in Fig. 1. For the sake of simplicity and comparability, all weights are set to 1. The two ANNs only differ with respect to two nodes. The transfer function and connections of one active (node 3) and one inactive node (node 8) are changed.

4.1 Genotypic Distance (GD)

Figure 2: Example calculation the GD distance for two distances. By introducing only small changes to the genotype, the normalized GD stays rather small.

The genotypic distance is based solely on the genotype of a CGPANN individual. As the CGPANN genotypes have a fixed structure, the distance of two individuals can be calculated by a row-wise comparison of their nodes. By combining the distances of weights, inputs, activity of nodes, and transfer functions we obtain a distance , where denotes the Hamming distance, i.e., is 0 if the transfer function is identical, else is 1. The GD is further normalized by the total number of possible comparisons for the given genotypes. Fig. 2 illustrates an example for calculating the GD distance. Although taking the non active nodes into account, the normalized GD is rather small. We further extended the GD by ordering the inputs of each ANN before the calculation to match the weight distances to the correct input if (and only if) two nodes have similar connections, but a different ordering of inputs in the genotype.

4.2 Genotypic ID Distance (GIDD)

Figure 3: Example calculation for GIDD. The IDs are based on prior nodes or inputs and the number (a=1, b=2, c=3) of non-duplicate connections. The calculated distance is high because of the different node IDs, which are based on their relative position in the ANN.

The GIDD is intended to solve an important issue of the genotypic distance: different nodes and adjacent functions and weights in one row of a CGPANN genotype are not easily comparable, if their influence on the resulting ANN and also phenotype is considered. The idea behind the GIDD is to only compare those nodes, which are placed in the same position in the ANN and to solve the problem posed by competing conventions, i.e., that a certain ANN topology can be expressed by numerous genotypes. The distance is based on the active topology and creates IDs which are designed to be unique for an equal placement of a node in the ANN. Inactive nodes do not influence the GIDD. Thus, each active node in the ANN is given an ID based on the connections to prior nodes or inputs and the number of non-duplicate connections of this node. Then, the distance of nodes can be calculated by a pairwise comparison of all node IDs. If a certain node ID matches for both ANNs, the subgraph prior to this node is analyzed recursively for validation of the ANN up to the position of the matching nodes. If all IDs in the subgraph are identical, we assume that the corresponding nodes have an equal position in the ANN topology. For all nodes that are matched in this way, the Euclidean distance of the weights () and Hamming distance of the transfer functions () is computed. A node pair can only be used once for this comparison, as node IDs may be present several times in each individual. If all node IDs of both individuals and are equal, the GIDD is simply the distance between all weights and transfer functions. If nodes do not match, for each node not present in the compared ANN, a fixed distance is assigned (in our example 2). Again, the GIDD is normalized by the maximum distance of two individuals. Fig. 3 illustrates the calculation in an example. Contrary to the GD, the GIDD reacts strongly to the introduced changes, as they have a large influence on the node relations, which results in different node IDs and puts them to maximum distance.

4.3 Phenotypic Distance (PhD)

Figure 4: PhD example with continuous inputs and trigonometric transfer functions. Two input samples, from a sine and linear function of length 30 are fed to two ANNs, which differ in their transfer functions and a single connection. The phenotypic distance is the normalized absolute difference of their output signals.

The phenotypic distance does not utilize any genotypic information of the ANNs to compute the distance. Instead, it utilizes solely their behavior. In our definition, the phenotype of a neural network is how it reacts to a certain input set, i.e., which outputs are produced. This output is then compared to resolve in a distance measure, which is indifferent to changes to the underlying genotype, including transfer functions, weights or connections, which result in the same behavior. More importantly, it is insensitive to the size of the genotype. The PhD distance utilizes the L1 norm distance to account for large dimensions of the input and output vector and is further normalized by the input set length. While it is indifferent to the network topology in terms of encoding, size, number of connections etc., it is very sensitive to the choice of the input set. Thus, these input sets have to be carefully selected to be representative for the underlying task and/or environment. In Section

5.2 we examine the influence of the input set for the task of classification. An example calculation for the distance of small ANNs utilizing PhD is given in Fig. 4. In the example, two continuous input samples, from a sine and linear function of length 30 are fed to two ANNs, where they differ in their transfer functions and a single connection. As this example is intended for understanding the change in the output signals, simple trigonometric functions have been used, which are commonly utilized in Genetic Programming. The small example shows that minor changes in the network topology can result in a large difference in the phenotype.

4.4 Mixed Distance (MD)

Similar to linear combination distance in [7], we utilize a mixed distance of GD, GIDD, and PhD, where each distance receives a weight determined by MLE. As the performance of each distance is unknown a-priori, the idea behind the MD is that allows an automatic selection of the most adequate distance measure in each optimization step.

5 Experiments

To assess the ability of SMB-NE to improve the efficiency of optimizing the topologies of ANNs, we decided to perform experiments with classification tasks. Classification is well understood, easily replicable, and does not introduce complex problems with the selection of environments or tasks as in general learning (e.g. reinforcement learning). We limited the experiments to a small budget of function evaluations, which provides a realistic scenario for problems with expensive fitness evaluations, such as real-time learning. The experiments are twofold:

  • First, we estimate the ability of SMB-NE to learn an elementary data set comparing the introduced distance measures GD, GIDD, PhD and MD.

  • Second, we further research how SMB-NE using the PhD reacts to different inputs sets and surrogate model sizes.

5.1 Comparison of Distance Measures for SMB-NE

5.1.1 Experimental Setup

For the first set of experiments, the well-known IRIS data set is used as an elementary benchmark problem. IRIS has samples, 4 variables, and 3 evenly distributed (

for each) classes of different flower types (Iris setosa, Iris virginica and Iris versicolor). The focus of this benchmark is the capability of SMB-NE to learn the best network topology to classify the data set with only 250 fitness evaluations. The fitness function is the adjusted classification accuracy:

, where if the predicted class is true, otherwise,

is the predicted probability for the true class. ANN optimized with Random Search and the inbuilt (1+4)-ES of CGPANN with different mutation rates are being used as baselines. For SMB-NE, all above described distance measures are compared, while for PhD four different input sample sets are tested. As a baseline, we used the complete IRIS data set and additional factorial designs(FD) with small (15) and a large (60) sizes as well as a Latin Hypercube Sample (LHS) with 150 samples. Both the FD and LHS are based on the IRIS variable boundaries. Further, the output of the PhD is adapted by computing the

softmax, which yields the class probabilities. In this experiment, for the (1+4)-ES of CGPANN and SMB-NE pure probabilistic random mutation is used with a strength of 5%. In SMB-NE, the (1+4)-ES is used in each consecutive iteration alternating between exploitation (L) and exploration (G). Parameters are listed in Table 1. The ANNs in this benchmark were kept rather small with a maximum of 40 nodes and 200 connections. This was intended to keep the search space small, but sufficient for the IRIS data set. Algorithm parameters were not tuned and all experiments were replicated 30 times.

arity nodes weight range function set
5 40 [-1,1] tanh, softsign, step, sigmoid, gauss
method mutation rate evaluations surrogate evaluations
Random 250
CGPANN 5%/15%
SMBO L:5% G:15% 50+200 L:10+400 G:1000+400
Table 1: Parameter setup, where evaluations denote the initial candidates plus the budget for consecutive evaluations. 40 nodes were used to keep the search space small, but sufficient for the IRIS problem given this small budget.

5.1.2 Results

Figure 5:

Results after 250 fitness evaluations with 30 replications, comparing random search, original CGPANN ((1+4)-ES) and different SMB-NE variants. The results were ranked (top down) by median values. Red circles depict outliers.

Fig. 5 visualizes the results. Firstly, the results show that the standard (1+4)-ES of CGPANN performs better than random search, even with the small number of evaluations. A low mutation rate, which depicts a rather local search, seems to be beneficial in this case. SMB-NE utilizing the GD and GIDD distance measures performs only slightly better than the basic CGPANN. With the use of PhD, a significant performance increase is observed, while the PhD with the complete input set performs the best and the LHS the worst. The mixed distance, which also utilizes PhD with complete input sets, cannot benefit from the linear combination, but is able to deliver a close performance to the sole use of PhD. Most runs seem to end up at a local optimum around an accuracy of 66%, which can be explained by the fact, that at this point two out of three classes of the IRIS data set are predicted correctly.

5.1.3 Discussion

An important insight of this experiment is that in comparison to the PhD, the genotypic distances (GD, GIDD) show a poor performance, even for the small genotype size. This fact might be explained by the small correlation between changes to the genotype and the resulting phenotype: small changes to the genes can have a massive impact on the behavior. Moreover, the GD has the problem that the calculated distances do not consider that aligned row-paired nodes in the genotype can have different, not similar placements in the ANN topology. This results in a misleading distance calculation of weights, inputs and transfer function for these nodes. However, even the more complex GIDD, which may be able to avoid this issue, does not seem to provide significantly better results. Further given the fact that the GIDD is computationally very expensive to calculate and has also shown numerical problems for large genotypes (as it utilizes several recursions over the complete ANN), it is further considered as not suitable for the task of SMB-NE. In contrast, the PhD distances show very promising results by directly exploiting the ANN behavior. Importantly, the small factorial input sample, which is very fast to compute, shows a good performance. Given the poor performance of GD and GIDD, the MD is able to automatically select the PhD distance measure and deliver equal performance. For the second set of experiments, we thus focus on exploiting features of SMB-NE utilizing the PhD distance measure.

5.2 Influence of the Input Set and Surrogate Model Size using SMB-NE with PhD

5.2.1 Experimental Setup

arity nodes weight range function set
25 100 [-1,1] tanh, softsign, step, sigmoid, gauss
method mutation evaluations surrogate evaluations
CGPANN single active (x10, x100)
SMB-NE single active 50+500 L:10+400 G:1000+400
Table 2: Algorithm parameter setup for the second set of experiments. The size of the genotypes and the number of evaluations was significantly increased.

To assess the performance of PhD, we discarded the GD, GIDD, and MD distance and introduced two more complex data sets, the glass and the cancer data set333

Available in the UCI machine learning repository: Both data sets were preprocessed by normalization and subsampling. Glass has 9 variables (material composition), 6 unevenly distributed classes of different glass types and 214 samples, while cancer has 9 variables, 2 classes (cancer yes/no) and 699 samples. Two benchmarks were conducted. The first benchmark investigated the influence of the input sample set used for generating the PhD distance measure. For the benchmark, different input samples were created by design of experiments (DOE) methods. All DOE sets are based on the known variable ranges and are not subsamples of the original data sets. The lengths of the samples are identical for both data sets, as they have the same number of variables. We compared the following:

  1. Small and large factorial designs, including main, interaction and quadratic effects, with 55/157 samples each.

  2. Small and large Latin Hypercube samples, with 55/110 samples each.

  3. The complete datasets as baseline input set, with 214/699 samples each.

  4. As algorithm baseline, CGPANN with an increasing number of evaluations (, , and ).

The motivation of this benchmark is as follows: in real-world optimization tasks, often a priori information of the the task and/or environment is sparse, as the underlying problem is a black-box problem. Thus, initially no data set is available to serve as an input set and the user has to rely on design of experiment methods to create input data for the PhD. Further, several changes were made to the algorithm setup to account for the more complex classification problems:

  • The genotype size was significantly enlarged to 100 nodes with 25 arity, resulting in a maximum of 2500 weights/connections.

  • For all compared algorithms, the mutation operator is changed to Goldmans single mutation, which mutates exactly one active node in the genotype (and an arbitrary number of inactive nodes).

  • The number of total function evaluations was raised to 550, while fixing the surrogate model set size to a value of 100. Each sample hereby consists of 80% random and 20% of the most fit individuals from the solution archive .

Table 2 shows the adjusted algorithm parameter setup. In the second, connected benchmark, the influence of the surrogate model size, which is used in each iteration of SMB-NE, to the overall optimization performance is analyzed. The glass data set with the complete input set and cancer with the factorial input set are compared for different model set sizes

5.2.2 Results

Figure 6: Violin plot of the benchmarks with SMB-NE using PhD for different input sets created by experimental design techniques. The results are compared to the complete dataset and CGPANN using a different number of total function evaluations
Figure 7: Results with different surrogate model sizes during sequential optimization run for SMB.NE using PhD. Compared datasets are glass with complete input set for PhD, cancer with factorial input set for PhD.
Figure 6: Violin plot of the benchmarks with SMB-NE using PhD for different input sets created by experimental design techniques. The results are compared to the complete dataset and CGPANN using a different number of total function evaluations

Fig. 7 shows the results of the benchmarks with SMB-NE using PhD for different input sets created by experimental design techniques. For glass, the SMB-NE with the complete input set performed best and on one level with CGPANN with 100 times more real function evaluations. This indicates that SMB-NE is clearly able to improve the evaluation efficiency, if the (best possible) input data set is selected. All DOE input sets performed on an equal level similar to CGPANN with 10 times the evaluations. They are thus also able to significantly increase the evaluation efficiency, while utilizing an input set, which requires (nearly) no a priori information. The results for cancer firstly indicate that the underlying problem seems to include strong local optima, as we can identify certain clusters of different levels of the adjusted accuracy (around 0.8, 0.86, and 0.95). If we only consider only the best cluster, again we can identify that again the complete set performs best, together with CGPANN x100, while the DOE sets are again on one level with CGPANN x10.

Fig. 7 shows the results of the benchmark with different surrogate model sizes during sequential optimization runs. In contrast to our expectations, the benchmarks show that the model sample size does not seem to have a significant impact on the performance. Even the small sample sizes perform on the same level.

5.2.3 Discussion

The second set of experiments significantly shows two features of SMB-NE using the PhD distance measure. As expected, the choice of input set has a strong influence on the algorithm performance. The different designs and input sample sizes show a similar performance, which is an unexpected finding, as we would have anticipated that more samples and a larger design would lead to a more precise representational distance for the complete dataset and thus an improved performance. As the complete dataset shows the best performance, a representatively measured dataset of the original task seems to be the best choice, if initially available or producible by experiments. For the surrogate model size, we anticipated that the large model, which has more information, would be beneficial to the search process, but the smaller, more local models also worked well. Thus, it can be held on a feasible level which is fast to compute.

6 Conclusion and Outlook

In this work, we proposed a new surrogate-based approach for the task of NeuroEvolution. Further, we investigated the influence of different distance measures for constructing the surrogate during the optimization process. We have shown that SMBO is a valuable extension for CGPANN, which is able to improve the NeuroEvolution of ANNs in case of small evaluation budgets. Regarding research question 1, we can conclude that SMB-NE with phenotypic distance kernels shows significantly better results than basic CGPANN. Utilizing the PhD distance measure with a perfect input set, we were able to reach an evaluation efficiency which is on one level with CGPANN with 100 times more function evaluations. Further, the comparison indicates that SMB-NE utilizing genotypic distances does not provide significant performance increases. This fact can be explained by the low correlation of changes in the genotype to the resulting ANN behavior. Regarding research question 2, the experiments have shown that the choice of input sets for computing the PhD distance measure has a significant influence on the algorithm performance. Input sets created by DOE methods show a reduced performance in comparison to a perfect input set. Finally, we can conclude that the PhD distance measure, which is insensitive to the ANN size, is very promising. Currently, we are working on applying SMB-NE with the PhD distance measure to reinforcement learning tasks, which pose additional challenges with regards to the choice of the input set.


This work is supported by the German Federal Ministry of Education and Research in the funding program Forschung an Fachhochschulen under the grant number 13FH007IB6.


  • [1] Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary computation 10(2), 99–127 (2002)
  • [2] Miller, J.F., Thomson, P.: Cartesian genetic programming. In: European Conference on Genetic Programming. pp. 121–132. Springer (2000)
  • [3] Turner, A.J., Miller, J.F.: Cartesian genetic programming encoded artificial neural networks: a comparison using three benchmarks. In: Proc. GECCO’13. pp. 1005–1012. ACM (2013)
  • [4] Koziel, S., Leifsson, L.: Surrogate-based modeling and optimization. Applications in Engineering (2013)
  • [5] Bartz-Beielstein, T., Zaefferer, M.: Model-based methods for continuous and discrete global optimization. Applied Soft Computing 55, 154 – 167 (2017)
  • [6] Zaefferer, M., Stork, J., Friese, M., Fischbach, A., Naujoks, B., Bartz-Beielstein, T.: Efficient global optimization for combinatorial problems. In: Proc. GECCO’14. pp. 871–878. ACM (2014)
  • [7] Zaefferer, M., Stork, J., Flasch, O., Bartz-Beielstein, T.: Linear combination of distance measures for surrogate models in genetic programming. In: Auger, A., Fonseca, C.M., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds.) Parallel Problem Solving from Nature – PPSN XV: 15th International Conference. Lecture Notes in Computer Science, vol. 11102, pp. 220–231. Springer, Coimbra, Portugal (Sep 2018)
  • [8] Gaier, A., Asteroth, A., Mouret, J.B.: Data-efficient neuroevolution with kernel-based surrogate models. In: Genetic and Evolutionary Computation Conference (GECCO) (2018)
  • [9] Hildebrandt, T., Branke, J.: On using surrogates with genetic programming. Evolutionary Computation 23(3), 343–367 (Jun 2015)
  • [10] Stork, J., Bartz-Beielstein, T., Fischbach, A., Zaefferer, M.: Surrogate assisted learning of neural networks. In: GMA CI-Workshop 2017 (2017)
  • [11] Turner, A.J., Miller, J.F.: Introducing a cross platform open source cartesian genetic programming library. Genetic Programming and Evolvable Machines 16(1), 83–91 (2015)
  • [12] Forrester, A., Sobester, A., Keane, A.: Engineering Design via Surrogate Modelling. Wiley (2008)
  • [13]

    Moraglio, A., Kattan, A.: Geometric generalisation of surrogate model based optimisation to combinatorial spaces. In: Proceedings of the 11th European Conference on Evolutionary Computation in Combinatorial Optimization. pp. 142–154. EvoCOP’11, Springer, Berlin, Heidelberg, Germany (2011)

  • [14] Mockus, J., Tiesis, V., Zilinskas, A.: Towards Global Optimization 2, chap. The application of Bayesian methods for seeking the extremum, pp. 117–129. North-Holland (1978)
  • [15] Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13(4), 455–492 (1998)
  • [16] Zaefferer, M.: Combinatorial efficient global optimization in R - CEGO v2.2.0. online: (2017),, accessed: 2018-01-10
  • [17] Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: On approximating graph edit distance. Proc. VLDB Endow. 2(1), 25–36 (2009)