In many optimisation applications there is a need to deal with noisy evaluation functions, and also to make the best possible use of a limited evaluation budget. Traditional, population-based evolutionary algorithms most commonly cope with noise by re-evaluating each individual several times and/or by increasing the size of the population. For a recent survey refer to Rakshit et al. [1].
The motivation behind our work is to develop powerful optimisation algorithms that are well-suited to applications in Game AI. These applications include rolling-horizon planning algorithms used to control non-player characters or bots, and automatic game design or automatic game tuning. Both of these applications involve many forms of uncertainty which introduce significant noise into the fitness evaluation function. Furthermore, they each operate under a limited time budget, so there is a strong need for algorithms that make the best possible use of a limited number of fitness (objective function) evaluations.
The algorithms developed in this paper are already showing promise in initial tests on exactly these types of problems, including General Video Game AI [2, 3], but in this paper we focus on describing the algorithms and providing results on some simple benchmark problems.
The rest of this paper is structured as follows. The next section gives a very brief overview of the most relevant background work. Section III describes two variations of the compact Genetic Algorithm (cGA): the Multi-Sample version and the Sliding Window version. Both improve significantly on the standard cGA, with the sliding window version (a novel algorithm) providing the best results in our experiments. Section IV describes the test problems used and Section V presents the results. Section VI concludes and also discusses ongoing and future work.
Unlike a traditional GA, the cGA maintains no explicit population, and no cross-over or mutation is involved. Instead, the cGA uses a virtual population represented as a probability distribution over the set of binary strings. At each optimisation iteration, exactly two individuals are sampled from the distribution and evaluated for fitness to pick a winner and a loser. The probability distribution is then adjusted to increase the probability of generating the winning vector. The iterations continue until either the evaluation budget has been exhausted or the distribution has converged (such that the probability of producing a ‘1’ in each position is either zero or one).
The cGA performs well in noisy conditions, as clearly shown by Friedrich et al. [6]. The main contribution of this paper is to develop two new versions of the cGA that make even more efficient use of the available evaluation budget. Both algorithms are efficient and easy to implement.
The standard cGA is a type of Univariate Estimation of Distribution Algorithm, since each dimension is considered independently of all others. Harik emphasises that the algorithm can be sensitive to the probability model used to model the virtual population and that “the choice of a good distribution is equivalent to linkage learning” [7]. To address this, the extended cGA (ECGA) was proposed [7], using a set of probability models known as Marginal Product Models (MPMs). An MPM can represent a probability distribution over one bit or a tuple of bits, taking into account the dependencies among the bits and hence modelling higher-order effects. A similar idea has been used in a so-called N-Tuple system by Kunanusont et al. [8].
The multiple sample variations of the cGA introduced in this paper should also work with higher-order probability models, but this has not yet been implemented or tested.
II-A Compact Genetic Algorithm (cGA)
In this section we describe the standard cGA in Alg. 1, as introduced in [4]. It models a population using an n-dimensional vector of probabilities, with an element for each bit of the solution. The algorithm has one parameter, K, which refers to the virtual population size and determines the learning rate, 1/K.
Each element of the vector is initialised to 0.5 and represents the probability that the corresponding value in the solution string should be a 1. At each iteration two random binary vectors are produced by sampling from the probability vector, and the fitness of each one is evaluated. The fitness values are compared to determine a winner and a loser; no update occurs if the two vectors have the same fitness. The algorithm then iterates over each dimension, comparing the candidates bit by bit. Updates to the probability vector only occur where the corresponding bits in the winner and loser differ: if the winning bit is a 1, the probability of producing a 1 in that position is increased by 1/K, otherwise (i.e. if the winning bit is a 0) the probability is decreased by 1/K.
The algorithm terminates when the probability vector has converged, i.e. when every element is either zero or one. We also stop the algorithm when the total evaluation budget has been consumed. The solution found by the algorithm is the argmax of the probability vector, i.e. a 0 wherever the corresponding probability is less than 0.5 and a 1 otherwise.
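As a concrete illustration, the loop described above can be sketched in Python. This is a minimal sketch, not the listing of Alg. 1; the function and parameter names are our own:

```python
import random

def cga(n, K, fitness, max_evals):
    """Minimal compact GA sketch: n-bit problem, virtual population size K."""
    p = [0.5] * n                      # probability of a 1 in each position
    evals = 0
    while evals < max_evals and any(0.0 < pi < 1.0 for pi in p):
        # Sample two candidates from the probability vector and score them.
        a = [1 if random.random() < pi else 0 for pi in p]
        b = [1 if random.random() < pi else 0 for pi in p]
        fa, fb = fitness(a), fitness(b)
        evals += 2
        if fa == fb:
            continue                   # no update when fitnesses tie
        winner, loser = (a, b) if fa > fb else (b, a)
        for i in range(n):
            if winner[i] != loser[i]:  # update only where the bits differ
                if winner[i] == 1:
                    p[i] = min(1.0, p[i] + 1.0 / K)
                else:
                    p[i] = max(0.0, p[i] - 1.0 / K)
    # Recommend the argmax of the probability vector.
    return [1 if pi >= 0.5 else 0 for pi in p]
```

For example, `cga(8, 30, sum, 20000)` runs the sketch on a noise-free 8-bit OneMax problem and typically recommends the all-ones string.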
Note that this is a rank-based algorithm, in that the magnitude of the difference in fitness between winner and loser makes no difference.
The single parameter, the virtual population size K, has an important effect on the algorithm’s behaviour. Setting K too low (and hence the learning rate too high) causes premature convergence of the probability vector and results in very poor final solutions. Setting K too high causes slower-than-necessary convergence but does not harm solution quality so much. Friedrich et al. [6] show how to set K for the noisy OneMax problem; Section IV compares their recommended setting to the values used in our experiments.
III Multiple-Sample cGA Variants
This section describes the novel contribution of the paper.
The motivation for the new algorithms is to make better use of the fitness evaluations in order to find optimal or close-to-optimal solutions more quickly. Observe that in the standard cGA, at each iteration we draw two samples from the distribution and make one comparison and one update of the probability vector. This gives an updates-per-sample ratio of 0.5. Keeping in mind that for most practical applications the main cost of an evolutionary algorithm lies in performing fitness evaluations, this raises the question of whether we may make better use of each evaluation by making more comparisons and updates for it. This leads us to the following two algorithms. The first is a natural extension of the cGA that increases the number of individuals sampled at each iteration; note that this algorithm was described in [4]. The second version samples and evaluates only a single candidate solution at each iteration, but then makes comparisons and updates against a number of previously evaluated vectors stored in a sliding history window.
III-A Multiple-Sample per Iteration cGA
In the multi-sample version, presented in Alg. 2, we make λ samples per iteration. Apart from this detail, the algorithm is very similar to the standard cGA. Since we now make λ samples and evaluations, we have λ(λ−1)/2 comparisons and updates to make. For instance, for λ = 10 the ratio of updates per sample is now 4.5, nine times higher than in the standard case.
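A single iteration of this multi-sample scheme can be sketched as follows; this is an illustrative fragment rather than the listing of Alg. 2, with pairwise comparisons enumerated via itertools.combinations and names of our own choosing:

```python
import random
from itertools import combinations

def ms_cga_iteration(p, K, fitness, n_samples):
    """One multi-sample cGA iteration: draw n_samples candidates, evaluate
    each exactly once, then update p for every pairwise comparison."""
    pop = []
    for _ in range(n_samples):
        x = [1 if random.random() < pi else 0 for pi in p]
        pop.append((x, fitness(x)))        # one evaluation per sample
    # n_samples*(n_samples-1)/2 comparisons reuse those evaluations.
    for (a, fa), (b, fb) in combinations(pop, 2):
        if fa == fb:
            continue                       # ties cause no update
        winner, loser = (a, b) if fa > fb else (b, a)
        for i in range(len(p)):
            if winner[i] != loser[i]:
                delta = 1.0 / K if winner[i] == 1 else -1.0 / K
                p[i] = min(1.0, max(0.0, p[i] + delta))
```

Separating the evaluation step from the comparison step, as above, is what ensures each fitness evaluation is reused across all pairwise comparisons.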
Note that an algorithm similar to this was described in [4], though the way it was listed there did not separate the fitness evaluation from the comparison (which is necessary in order to make best use of the fitness evaluation budget); this may have been considered a low-level implementation detail by the authors. More importantly, the results presented in [4] for the multi-sample case were not particularly good, perhaps due to a poorly chosen K value. When making more updates per fitness evaluation, K needs to be set higher to avoid premature convergence.
This could be the reason why recent work on the cGA [6] has not mentioned the multiple-sample variant. We will show that when K is chosen well, the Multiple-Sample cGA (MScGA) greatly outperforms the standard cGA.
III-B Sliding Window cGA
While the MScGA aims to provide more efficient use of the available fitness evaluations, it suffers from the fact that the probability vector is only updated after all the samples for an iteration have been drawn.
However, it may be beneficial to update the probability vector more frequently, ideally after every new sample has been drawn and evaluated. This is exactly what the Sliding Window cGA (SWcGA) achieves. In addition to the parameter K, this algorithm adds a parameter W for the size of the window.
Again, the algorithm is similar to the standard cGA, except that now every time a sample is drawn from the probability vector, its fitness is evaluated and the scored vector is compared with every other one in the window, with the probability vector updated for each comparison. Note that each sample has its fitness evaluated only once and is stored together with its score in the sliding window (which can be implemented as a circular buffer or a FIFO queue). See Alg. 3 for the listing. For each new candidate sampled, assuming the steady state in which the buffer is already full, we make W comparisons and updates against the previously evaluated samples. Hence, the ratio of comparisons and updates to fitness evaluations is W. After the comparisons and updates have been made, the new scored sample is added to the sliding window buffer, replacing the oldest one if the buffer is already full (i.e. already holds W scored samples).
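The sliding-window scheme can be sketched as follows; again this is a minimal illustration rather than the listing of Alg. 3, with the FIFO buffer implemented as a collections.deque and names of our own choosing:

```python
import random
from collections import deque

def sw_cga(n, K, W, fitness, max_evals):
    """Sliding-window cGA sketch: one new sample per iteration, compared
    against every scored sample held in a window of size W."""
    p = [0.5] * n
    window = deque(maxlen=W)           # FIFO buffer of (candidate, fitness)
    for _ in range(max_evals):
        x = [1 if random.random() < pi else 0 for pi in p]
        fx = fitness(x)                # each sample evaluated exactly once
        for y, fy in window:           # up to W comparisons per evaluation
            if fx == fy:
                continue
            winner, loser = (x, y) if fx > fy else (y, x)
            for i in range(n):
                if winner[i] != loser[i]:
                    delta = 1.0 / K if winner[i] == 1 else -1.0 / K
                    p[i] = min(1.0, max(0.0, p[i] + delta))
        window.append((x, fx))         # deque(maxlen=W) drops the oldest
    return [1 if pi >= 0.5 else 0 for pi in p]
```

Using deque(maxlen=W) gives the replace-oldest behaviour for free: appending to a full deque silently evicts the item at the other end.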
IV Test Problems
We considered two binary optimisation problems: the OneMax problem corrupted by additive Gaussian noise, namely noisy OneMax, and the noisy PMax problem.
IV-A Noisy OneMax
The OneMax problem aims at maximising the number of 1s in a binary string. Let N(0, σ²) denote Gaussian noise with mean 0 and variance σ², and let x be an n-bit binary string. The n-bit OneMax problem with additive Gaussian noise is formalised as maximising f(x) = Σᵢ xᵢ + N(0, σ²). Friedrich et al. [6] have proven that, with high probability, the standard cGA with K = ω(σ²√n log n) converges to the optimal distribution after O(Kσ²√n log(Kn)) iterations when optimising a noisy OneMax with variance σ². Thus, for the problem instance considered in this paper, K should be set correspondingly large to guarantee convergence. This setting is compared to a range of smaller values as baselines in our experiments.
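This fitness function takes a couple of lines in Python; the default noise standard deviation sigma below is our own placeholder, not the value used in the experiments:

```python
import random

def noisy_onemax(x, sigma=1.0):
    """OneMax corrupted by additive Gaussian noise N(0, sigma^2):
    the number of 1s in x plus a zero-mean Gaussian sample."""
    return sum(x) + random.gauss(0.0, sigma)
```

Each call returns a different noisy value for the same string, which is exactly the setting the resampling and multi-comparison strategies in this paper are designed for.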
IV-B Noisy PMax
The noisy PMax problem was proposed by Lucas et al. [10] to represent an artificial game-outcome optimisation problem. In this artificial model, x is treated as an n-bit binary number, and the true winning rate of x is defined as a function of val(x), the numeric value of x, which lies between 0 and 2ⁿ − 1. Thus, the outcome of a game is either a win (1), with probability equal to the winning rate, or a loss (0) otherwise.
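The exact winning-rate function is defined in [10]; as an illustrative stand-in we assume the simplest choice, a win probability equal to the normalised numeric value of x. The function name and this choice of winning rate are our assumptions:

```python
import random

def noisy_pmax(x):
    """Noisy PMax sketch (assumed form): interpret the bit string x as a
    binary number and win with probability equal to its normalised value."""
    n = len(x)
    val = int("".join(map(str, x)), 2)   # numeric value in [0, 2^n - 1]
    p_win = val / (2 ** n - 1)           # assumed winning-rate function
    return 1 if random.random() < p_win else 0
```

The evaluation is a single Bernoulli trial, so the noise is extreme: near the optimum, distinguishing two candidates reliably requires many outcome samples, which is what makes efficient reuse of evaluations so valuable here.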
This problem formulation is also relevant to learning playout control parameters for Monte Carlo Tree Search (MCTS) [11], where the parameters control the biases of a stochastic playout policy. The efficient cGA variants described in this paper should be able to improve on the simple evolutionary algorithms used in [12, 13], but this has not yet been tried. The relevance is due to three factors: the extreme noise involved in evaluating stochastic playout policies, the requirement for rapid adaptation (a feature of the MScGA algorithms), and the expectation that different parameters have very different levels of importance in controlling the playouts.
V Experimental Results and Discussion
V-A Experimental Setting
We consider two baseline algorithms, the standard cGA and the Random Mutation Hill Climber (RMHC), on each of the tested problems. For each experimental run, each algorithm was given a fixed maximum budget of fitness evaluations. Thus, the cGA and its variants stopped when the stopping condition defined in Alg. 1 was met or the budget of (noisy) fitness evaluations had been exhausted. Note that we did not use first hitting time as a measure, since this has been shown to give misleading results for noisy optimisation problems [10]. Since each algorithm under test is able to return its best guess (derived from the probability vector p for the cGA and its variants) or the best solution found so far (RMHC) at any iteration, we plotted the true (noise-free) fitness of the current solution of each algorithm at each iteration. Hence, in addition to the final fitness found, we may also observe how fitness evolves over time.
First, we optimise the problems separately using the cGA with different virtual population sizes K and using the RMHC with different resampling numbers, then choose the K and the resampling number with the best performance, respectively. Further study of optimal resampling for the RMHC on OneMax with additive Gaussian noise can be found in [14].
V-B Noisy OneMax
In all figures, the standard error is given as a faded area around the average.
The performance of the standard cGA with different virtual population sizes K and the RMHC with different resampling numbers on the 100-bit noisy OneMax problem is illustrated in Fig. 1. Fig. 1(a) shows that the cGA with a well-chosen virtual population size performs the best once the number of fitness evaluations is sufficiently large. It is notable that with a small virtual population size, the cGA converges quickly to a good solution at the early stage of optimisation but then never finds the optimum. As the RMHC with various resampling numbers (see Fig. 1(b)) does not outperform the standard cGA with the best K (blue curve in Fig. 1(a)), our algorithms are compared directly to the standard cGA with that setting of K.
Fig. 2 compares the performance of MScGA instances using different virtual population sizes K and sample numbers, and Fig. 3 compares the performance of SWcGA instances using different virtual population sizes K and sliding window widths W, on the identical noisy OneMax problem. When K is small, the more samples there are, the worse the solution recommended by the MScGA at each iteration is, and the wider the sliding window is, the worse the solution recommended by the SWcGA at each iteration is. When K is large, a larger sample number leads to better performance of the MScGA, which then significantly outperforms the best standard cGA, though the differences between sample numbers are minor; likewise, a wider window leads to better performance of the SWcGA, whose overall performance is better than that of the MScGA. However, a very large K weakens the performance of both the MScGA and the SWcGA. With their optimal parameter settings the MScGA and SWcGA have similar performance, but the MScGA is less sensitive to its parameter, the sample number.
The best parameter settings for each of the algorithms are listed and compared in Fig. 4, along with the final probability vector averaged over 100 trials. Though the MScGA converges faster than the SWcGA, it did not stop with better solutions than the SWcGA. The average noise-free fitnesses of the final recommendations are 98.68 (± 0.12) for the cGA, 100.00 (± 0.00) for the MScGA and 100.00 (± 0.00) for the SWcGA.
V-C Noisy PMax
The performance of the standard cGA with different virtual population sizes K and the RMHC with different resampling numbers on the PMax problem is illustrated in Fig. 5. As the RMHC with various resampling numbers does not outperform the standard cGA with the best setting (pink curve in Fig. 5(a)), our algorithms are compared directly to the standard cGA with that setting of K.
The performance of the MScGA and SWcGA with different parameter settings is compared to the cGA with the best K (black curves) in Figs. 6 and 7, respectively. The best parameter settings for each of the algorithms are listed and compared in Fig. 8, along with the final probability vector averaged over 100 trials. The SWcGA slightly outperforms the MScGA.
VI Conclusion and Future Work
This paper introduced a simple but important principle to improve the performance of the compact Genetic Algorithm: to make best possible use of each fitness evaluation by reusing the result in multiple comparisons, and hence in multiple updates of the probability distribution.
This principle was used to develop two variations of the algorithm: the first made multiple samples, comparisons and updates at each iteration, while the second one made just one sample at each iteration, but then performed multiple comparisons and updates by accessing a sliding window of previously evaluated candidates (samples).
Both algorithms significantly outperformed the standard cGA, with the sliding window version performing best. The sliding window version is therefore the one we are focusing on in ongoing work. In addition to offering the best performance at the end of each run, it also consistently offered better recommendations at nearly every stage of each run, making it a better choice as an anytime algorithm for use in real-time game AI: because it adds only a single candidate solution per iteration, its recommendation is updated more frequently.
Another interesting observation is the ability of the cGA, MScGA and SWcGA to correctly recommend the optimal solution without ever having sampled it. The Appendix summarises the number of times the optimal solution was generated among the 100 trials in each experiment. For instance, among the 100 optimisation trials on noisy OneMax by the MScGA with its best parameter setting, the optimal solution was visited only 5 times, yet the algorithm never failed to recommend the true optimal solution by the end of the evaluation budget.
We are currently extending the work in two ways. The first is to allow multi-valued strings, since binary is an unnatural way to represent many problems. The second is to explore alternative ways to model the probability distribution. Both of these are already yielding positive results and will be the focus of future research. Also relevant is our recent work on bandit-based optimisation [8, 15], which explicitly balances exploration versus exploitation, but has not yet been combined with the sliding window approach developed here. There is reason to believe that such a combination will be beneficial.
-  P. Rakshit, A. Konar, and S. Das, “Noisy Evolutionary Optimization Algorithms: A Comprehensive Survey,” Swarm and Evolutionary Computation, 2016.
-  D. Perez-Liebana, S. Samothrakis, J. Togelius, T. Schaul, S. M. Lucas, A. Couëtoux, J. Lee, C.-U. Lim, and T. Thompson, “The 2014 general video game playing competition,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 8, no. 3, pp. 229–243, 2016.
-  J. Liu, J. Togelius, D. Pérez-Liébana, and S. M. Lucas, “Evolving game skill-depth using general video game ai agents,” arXiv preprint arXiv:1703.06275, 2017.
-  G. R. Harik, F. G. Lobo, and D. E. Goldberg, “The Compact Genetic Algorithm,” IEEE transactions on evolutionary computation, vol. 3, no. 4, pp. 287–297, 1999.
-  M. Mitchell, An Introduction to Genetic Algorithms. MIT press, 1998.
-  T. Friedrich, T. Kötzing, M. S. Krejca, and A. M. Sutton, “The Compact Genetic Algorithm is Efficient Under Extreme Gaussian Noise,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 3, pp. 477–490, June 2017.
-  G. Harik, “Linkage Learning via Probabilistic Modeling in the ECGA,” Urbana, vol. 51, no. 61, p. 801, 1999.
-  K. Kunanusont, R. D. Gaina, J. Liu, D. Perez-Liebana, and S. M. Lucas, “The N-Tuple Bandit Evolutionary Algorithm for Automatic Game Improvement,” in 2017 IEEE Congress on Evolutionary Computation (CEC), 2017.
-  G. R. Harik, F. G. Lobo, and D. E. Goldberg, “The Compact Genetic Algorithm,” Urbana, vol. 51, p. 61801, 1997.
-  S. M. Lucas, J. Liu, and D. Pérez-Liébana, “Evaluating Noisy Optimisation Algorithms: First Hitting Time is Problematic,” https://arxiv.org/abs/1706.05086, 2017.
-  T. Cazenave, “Playout policy adaptation with move features,” Theoretical Computer Science, vol. 644, pp. 43–52, 2016.
-  S. M. Lucas, S. Samothrakis, and D. Perez-Liebana, “Fast evolutionary adaptation for Monte Carlo Tree Search,” in European Conference on the Applications of Evolutionary Computation, 2014, pp. 349 – 360.
-  D. Perez-Liebana, S. Samothrakis, and S. M. Lucas, “Knowledge-based fast evolutionary MCTS for general video game playing,” in IEEE Conference on Computational Intelligence and Games, 2014.
-  J. Liu, M. Fairbank, D. Pérez-Liébana, and S. M. Lucas, “Optimal Resampling for the Noisy OneMax Problem,” arXiv preprint arXiv:1607.06641, 2016.
-  J. Liu, D. Pérez-Liébana, and S. M. Lucas, “Bandit-based random mutation hill-climbing,” in Evolutionary Computation (CEC), 2017 IEEE Congress on. IEEE, 2017, pp. 2145–2151.