Genetic Programming (GP) is widely known to be highly computationally intensive. This is because candidate GP programs are typically evaluated using an interpreter, an inefficient method of running a program owing to the conditional statement required at each step to determine which instruction to execute. Moreover, GP is a population-based technique that generally uses large populations of candidate programs. Consequently, many improvements to the execution speed of GP have been proposed: compiled or direct machine-code approaches, exploitation of parallel computational hardware, and efficiencies within the technique itself such as caching common subtrees. Recent advances in computational hardware have led to multi-core architectures on which very fast parallel implementations of GP have been built, but a drawback is that although these implementations are fast, they are not efficient. Techniques such as subtree caching are difficult to combine with these parallel implementations, since searching for common subtrees can incur a significant time cost that slows execution speeds.
This paper introduces an efficiency saving that exploits the characteristics of tournament selection and can easily be implemented within a CPU-based parallel GP approach, yielding a considerable gain in the execution speed of GP. The paper is laid out as follows: Section 2 describes GP and prior methods of improving the speed and efficiency of the technique. Section 3 introduces a tournament selection strategy embedded within a highly parallel GP model and demonstrates efficiency savings with regard to classification tasks. Finally, Section 4 demonstrates significant further savings by considering evaluated solutions that survive intact between generations.
Genetic Programming (GP) solves problems by constructing programs using the principles of evolution. A population of candidate GP programs is maintained, each evaluated for its effectiveness against a target objective. New populations are generated using the genetic operators of selection, crossover and mutation. Selection is usually conducted with tournament selection, whereby a subset of GP programs compete to be selected as parents based on their fitness. The evaluation of a GP program is typically achieved by interpreting it against a set of fitness cases. GP is computationally intensive as a result of using an interpreter, maintaining typically large populations of programs and, often, using a large volume of fitness cases, as in classification tasks.
Recently, with the advent of multi-core CPUs and many-core GPUs, the focus has been on creating highly parallel implementations of GP, achieving speedups of several hundred fold [2, 3, 1, 4, 5]. However, prior to the move to parallel architectures, the primary method of improving the speed of GP was through efficiency savings. A simple methodology is to reduce the number of fitness cases by dynamically sampling the more difficult instances. A further selection strategy, known as Limited Error Fitness (LEF), was investigated by Gathercole and Ross for classification problems, whereby an upper bound on the number of permissible misclassifications is used to terminate evaluations. Maxwell implemented a time-rationing approach in which each GP program is evaluated for a fixed time; tournament selection is then performed and the fittest candidate GP program is declared the winner, so smaller programs can evaluate more fitness cases. Teller used a similar technique known as the anytime approach.
With regard to tournament selection, Teller and Andre introduced a technique known as the Rational Allocation of Trials (RAT), whereby population members are first evaluated on a small subset of the fitness cases; a prediction model is then used to test whether further evaluations are needed to establish the winners of tournaments. A 36x speedup was reported, although only small populations and regression-type problems were considered. Park et al. implemented a methodology whereby the fitness of the best candidate GP program within the current population is tracked; when evaluating a GP program, if its accumulated fitness becomes worse than this best found, the evaluation is terminated and the fitness approximated. Speedups of four fold were reported with little degradation in overall fitness. Finally, Poli and Langdon ran GP backwards: tournaments for the whole GP run are generated in advance and, working backwards, offspring that are not sampled by tournament selection are detected and not evaluated, nor are their parents if none of their offspring are sampled. However, most of these methods do not accurately reflect tournament winners, thereby changing the evolutionary path.
3 Improved Efficiency Through Tournament Selection
The typical methodology of GP is to completely evaluate a population of GP programs to ascertain their fitness. A new population is then constructed using the GP operators of selection, crossover and mutation. The selection process typically chooses two parents from the current population to generate two offspring programs using crossover and mutation. The most widely used selection operator within GP is tournament selection whereby GP programs are randomly selected from the current population to form a tournament. The program within this tournament with the best fitness is then selected as a parent.
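As a concrete illustration, the tournament selection operator just described can be sketched in a few lines. This is a minimal sketch; the function name and the list-based fitness representation are our own, not the paper's:

```python
import random

def tournament_select(population, fitnesses, tournament_size, rng=random):
    """Randomly sample `tournament_size` programs from the population and
    return the one with the best (highest) fitness as a parent."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

Two such selections, followed by crossover and mutation, yield the two offspring programs described above.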
However, tournament selection can be exploited by considering an alternative evaluation methodology. Rather than evaluating each candidate GP program over every single fitness case before moving on to the next candidate, consider the opposite approach, whereby all programs are evaluated on a single fitness case before the next fitness case is considered. Using this approach, the fitness levels of candidate GP programs build up gradually, facilitating comparisons between programs during the evaluation process. Consequently, if a set of tournaments is generated prior to the evaluation stage, it is possible to ascertain which candidate GP programs within a given tournament reach a stage at which they cannot possibly win. Moreover, if a candidate GP program is deemed unable to win any of the tournaments it is involved in before all the fitness cases have been evaluated, there is clearly no reason to continue to evaluate it. Consequently, an efficiency saving can be realised using this approach. This technique can be described as smart sampling, whereby only the minimum number of fitness cases necessary to establish the losers of tournaments is evaluated.
Consider a classification problem whereby the fitness metric is the sum of correct classifications, with a tournament size of two used to select a potential parent. With ten fitness cases, suppose the first GP program of the tournament correctly classifies the first six cases and the second incorrectly classifies them. Effectively, the second GP program in the tournament cannot possibly win, as the maximum number of correct classifications it can now achieve is four from the remaining fitness cases. If this second GP program is not involved in any other tournament then there is no value in continuing to evaluate it on the remaining fitness cases, as it will definitely not be selected as a parent. Consequently, a 40% efficiency saving can be achieved on the evaluation of the second GP program within the tournament.
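The arithmetic behind this example can be captured in a small helper. This is a sketch under our own naming, not code from the paper:

```python
def cannot_win(candidate_correct, cases_seen, total_cases, rival_correct):
    """True when the candidate cannot match a rival's already-banked score
    even if it classifies every remaining fitness case correctly."""
    best_possible = candidate_correct + (total_cases - cases_seen)
    return best_possible < rival_correct
```

For the example above, `cannot_win(0, 6, 10, 6)` holds after six of the ten cases: at most four correct classifications remain reachable, so the four unevaluated cases (40% of the work) can be skipped.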
To realise this potential efficiency saving, a new GP evaluation model is proposed whereby, at each generation, a set of tournaments composed of randomly selected candidate GP programs from the population is generated prior to evaluation. Using a system whereby two selected programs generate two offspring, this set consists of population size tournaments, each containing tournament size randomly selected programs. Each fitness case is then evaluated by every member of the population before the next fitness case is considered. Before a program is evaluated on the given fitness case, it is checked to establish whether it has effectively lost all of the tournaments it is involved in and, if so, it is labelled a loser and not evaluated further. The efficient tournament selection approach suggested here ensures that the same candidate GP programs win tournaments as with standard GP. Algorithm 1 provides a high-level overview of the efficient tournament selection model.
To determine whether a candidate GP program has lost all of the tournaments it is involved in, each tournament needs to be checked once the current fitness case has been evaluated by all candidate GP programs in the population. Thus, each candidate GP program maintains the subset of tournaments it is involved in. For each of these tournaments, the currently best-performing candidate GP program in the tournament is identified and its fitness compared with the fitness of the candidate GP program under consideration. If, under the given fitness metric, it can be ascertained that the GP program under consideration cannot beat this best, then it is designated as having lost this tournament. If a program is deemed to have lost all of the tournaments it is involved in, it is designated as requiring no further evaluation. A high-level description of tournament checking is shown in Algorithm 2. Using this approach, all the candidate GP programs that would win tournaments under a standard implementation of GP still win their respective tournaments. Consequently, there is no disturbance to the evolutionary path taken by GP, yet an efficiency saving can be achieved.
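Algorithms 1 and 2 can be sketched together as follows. This is a simplified sketch under our own naming; ties are treated as still winnable, so the winners match those of standard GP:

```python
def evaluate_generation(programs, fitness_cases, tournaments, run_case):
    """Evaluate the population one fitness case at a time, pruning any
    program that can no longer win any tournament it appears in.
    `tournaments` maps a tournament id to the indices of its members and
    `run_case(program, case)` returns 1 for a correct classification."""
    n_cases = len(fitness_cases)
    correct = [0] * len(programs)   # correct classifications so far
    seen = [0] * len(programs)      # fitness cases evaluated so far
    member_of = {i: [] for i in range(len(programs))}
    for t_id, members in tournaments.items():
        for i in members:
            member_of[i].append(t_id)
    # non-sampled programs appear in no tournament and are never evaluated
    active = {i for i, ts in member_of.items() if ts}
    for case in fitness_cases:
        for i in active:
            correct[i] += run_case(programs[i], case)
            seen[i] += 1
        for i in list(active):
            # prune once every one of this program's tournaments already
            # holds a member whose banked score is out of reach
            if all(correct[i] + (n_cases - seen[i]) <
                   max(correct[j] for j in tournaments[t])
                   for t in member_of[i]):
                active.discard(i)
    return correct, seen
```

Each tournament's current best only ever improves, so a program pruned here could never have won under a complete evaluation.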
3.1 Efficient Tournament Selection and a Fast GP Approach
Given that the goal of efficient tournament selection is to reduce the computational cost of GP and hence improve its speed, it is only natural that the technique should be able to operate within a fast parallel GP model. Integrating the efficient tournament selection model with a GPU implementation would prove difficult without compromising the speed of the approach, as communication across GPU cores evaluating differing GP programs is difficult. The alternative platform is a CPU-based parallel GP, which introduced a two-dimensional stack approach to parallel GP demonstrating significantly improved execution times. A multi-core CPU with limited parallelism was used with the two-dimensional stack model to exploit the cache memory and reduce interpreter overhead. In fact, this model operates similarly to efficient tournament selection: GP programs are evaluated in parallel over blocks of fitness cases, and once all the candidate GP programs have been evaluated on a block of fitness cases, the next block is considered. Using blocks of fitness cases provided the best utilisation of cache memory and hence the best speed.
Subsequently, the efficient tournament selection model is implemented within this two-dimensional stack GP model on a CPU: instead of evaluating candidate GP programs on a single fitness case at a time, they are evaluated on a larger block of fitness cases. Once the block of fitness cases has been evaluated, candidate GP programs that have lost all of their respective tournaments can be identified. A block of 2,400 fitness cases is used, identified by Chitty as extracting the best performance from the cache memory and the greatest efficiency from reduced reinterpretation of candidate GP programs.
3.2 Initial Results
In order to evaluate whether efficient tournament selection can provide a computational saving, it is tested on three classification-type problems. The first two are the Shuttle and KDDcup classification problems, available from the UCI Machine Learning Repository, consisting of 58,000 and 494,021 fitness cases respectively. The GP function set for these problems consists of *, /, +, -, <, >, ==, AND, OR, IF, and the terminal set comprises the input features or a constant value between -20,000.0 and 20,000.0. The third problem is the Boolean 20-multiplexer problem, the goal of which is to establish a rule that takes address bits and data bits and correctly outputs the value of the data bit the address bits specify. The function set consists of AND, OR, NAND, NOR and the terminal set consists of A0-A3, D0-D15. There are 1,048,576 fitness cases, which can be reduced using bit-level parallelism such that each bit of a 32-bit variable represents a differing fitness case, reducing the number of fitness cases to 32,768.
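The bit-level parallelism mentioned above can be illustrated on the smaller 6-multiplexer (two address lines, four data lines); the 20-multiplexer works identically with A0-A3 and D0-D15. This is our own sketch, not the paper's implementation:

```python
MASK = 0xFFFFFFFF  # one 32-bit word carries 32 fitness cases

def mux6_reference(a1, a0, d):
    """Scalar 6-multiplexer: output the data bit the address bits select."""
    return d[(a1 << 1) | a0]

def mux6_packed(A1, A0, D):
    """Packed 6-multiplexer: bit b of each input word holds that input's
    value in fitness case b, so one pass of bitwise AND/OR/NOT over the
    words evaluates all 32 cases at once."""
    n1, n0 = ~A1 & MASK, ~A0 & MASK
    return ((n1 & n0 & D[0]) | (n1 & A0 & D[1]) |
            (A1 & n0 & D[2]) | (A1 & A0 & D[3])) & MASK
```

A single interpreted node thus performs the work of 32 scalar evaluations, which is the source of the 32x reduction in fitness cases reported above.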
The results for these experiments were generated using an Intel i7 2600 processor running at 3.4GHz, with four processor cores each able to run two threads of execution independently. The algorithms were compiled using Microsoft Visual C++. Table 1 provides the GP parameters used throughout the work presented in this paper. Each experiment was averaged over 25 runs for a range of differing tournament sizes to demonstrate how efficiency can change dependent on the selection pressure.
Population Size: 4000            Maximum Generations: 50
Maximum Tree Depth: 50           Maximum Tree Size: 1000
Probability of Crossover: 0.50   Probability of Mutation: 0.50
In order to use efficient tournament selection, a fitness metric is required which establishes whether a given candidate GP program cannot win a tournament. This metric compares the classification rate of the best-performing GP program in a tournament with that of the candidate GP program under consideration. If the score of the best exceeds that of the program under consideration even when assuming that all of the remaining fitness cases are correctly classified, then the program under consideration cannot possibly win; that is, it is mathematically unable to win the given tournament.
The total efficiency saving is measured as the number of fitness cases not evaluated by each GP program, multiplied by its size, divided by the sum over all GP programs of program size multiplied by the number of fitness cases. It should be noted that this work is concerned with efficiency in the training phase of GP and not with classification accuracy; classification rates are provided merely to demonstrate that the two techniques produce the same results.
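The saving metric just described can be written down directly (the function and parameter names are our own):

```python
def efficiency_saving(tree_sizes, cases_evaluated, total_cases):
    """Fraction of interpreter work avoided: skipped work (unevaluated
    fitness cases weighted by tree size) over the work a complete
    evaluation of every program would have cost."""
    full_cost = sum(size * total_cases for size in tree_sizes)
    actual_cost = sum(size * done
                      for size, done in zip(tree_sizes, cases_evaluated))
    return (full_cost - actual_cost) / full_cost
```

For the two-program example of Section 3, a pruned rival of equal size that stops after six of ten cases yields a saving of 0.2 across the pair.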
Table 2 demonstrates the performance of a standard GP approach using the 2D stack model and the efficient tournament selection model for a range of tournament sizes. Note that for two of the problem instances, as the tournament size increases, the average GP tree size also increases, which naturally increases the execution time of GP. Also note that, as expected, there is no deviation in classification accuracy between the two approaches, since there is no deviation from the evolutionary path. However, in all cases an efficiency saving is observed. The greatest efficiency saving occurs at the lowest tournament sizes as a result of the non-sampled issue, whereby some members of the population are not involved in any tournaments. If a candidate GP program is not involved in any tournaments then there is no value in evaluating it. Additionally, under low selection pressure a GP program is likely to be involved in few tournaments, so a poor solution can quickly lose all of them. Note also that as the tournament size increases to ten or greater, the efficiency savings begin to improve once more. This effect is due to an increased probability of a highly fit GP program being in any given tournament, making it easier to identify weak solutions at an earlier stage.
|Problem|Tourn. Size|Class Accuracy (%)|Av. Tree Size|Standard GP Execution Time (s)|Efficient Tournament Selection GP: Efficiency Saving (%)|Execution Time (s)|Speedup|
The KDDcup classification problem demonstrates the greatest efficiency savings, with up to a 13.8% saving. The multiplexer problem demonstrates the lowest efficiency saving, a result of the lower accuracy achieved on this more difficult problem: as fitness improves within the population during the evolutionary process, identifying weak candidate GP programs becomes possible at an earlier stage, thereby increasing efficiency savings. The speedups observed are less than the efficiency savings because of the computational cost of repeatedly establishing whether candidate GP programs have lost all of the tournaments they are involved in. Clearly, the efficient tournament selection technique has boosted the performance of GP, with a minor speedup in all cases and a maximum of 15%.
4 Consideration of Previously Evaluated Individuals
The previous section demonstrated that speedups in GP can be achieved using the efficient tournament selection technique whilst not affecting the outcome of the GP process, but the gains are rather limited, since at least 50% of the fitness cases must be evaluated before losers of tournaments can be identified: the fitness metric is boolean in nature, in that a fitness case is either correctly classified or not, so a GP program cannot have mathematically lost a tournament whilst 50% of the fitness cases remain. However, the crossover and mutation operators were both applied with a probability of 0.5. Consequently, approximately 25% of candidate GP programs survive intact into the next generation, and not reevaluating these will in itself provide an efficiency saving of approximately 25%. More importantly, these GP programs are previous tournament winners whose complete fitness is known. This makes it possible to identify losers of tournaments at an earlier stage in the evaluation process. Additionally, as winners of tournaments, they are likely to be highly fit, leaving little margin for error for other tournament contenders.
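The 25% figure follows directly from the Table 1 parameters, assuming the two operators are applied independently:

```python
p_crossover, p_mutation = 0.50, 0.50  # Table 1 operator probabilities
# a program reaches the next generation unchanged only if it escapes both
p_intact = (1 - p_crossover) * (1 - p_mutation)  # 0.25
```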
Example: consider a tournament of size two where the first GP program has survived intact into the next generation, having previously been evaluated and correctly classified eight of the ten fitness cases. If the second candidate GP program incorrectly classifies more than two of the fitness cases then it cannot possibly win the tournament. So if the first three fitness cases are incorrectly classified, the solution has lost the tournament and an efficiency saving of 70% can be achieved.
To test this theory, the standard approach to GP is rerun as a benchmark, but this time without reevaluating candidate GP programs that survive intact into the next generation. Additionally, candidate GP programs that are not involved in any tournament (the non-sampled issue) are also not evaluated. The results are shown in Table 3, with the expected 25% reduction in execution time for standard GP from this efficiency. The comparative results for the efficient tournament selection method are also shown in Table 3. Efficiency savings are now considerably higher than those achieved in Table 2, even taking into account the savings achieved by not reevaluating intact candidate GP programs; indeed, the additional efficiency is now as much as 30%. It can therefore be considered that having previously evaluated candidate GP programs within tournaments makes it easier to establish the losers of tournaments, hence increasing the efficiency savings. Furthermore, it can also be observed that under higher selection pressure the efficiency savings increase further; indeed, for the KDDcup classification problem, savings of 60% are achieved. The reason is that GP programs that survive intact into the next generation have won highly competitive tournaments and are thus highly fit, making it easier to establish the losers of tournaments. Note that there is still no change in the evolutionary path when using efficient tournament selection.
|Problem|Tourn. Size|Class Accuracy (%)|Av. Tree Size|Standard GP Execution Time (s)|Efficient Tournament Selection GP: Efficiency Saving (%)|Execution Time (s)|Speedup|
In terms of the effective speed of GP resulting from these tournament selection efficiency savings, the greatest increase has been achieved for the KDDcup problem, with a 1.68x performance gain when using larger tournament sizes. Indeed, for all problem instances, the best performance gains are observed under higher selection pressure: the higher the selection pressure, the greater the probability that highly fit candidate GP programs survive intact into the next generation.
Given that candidate GP programs surviving intact into subsequent generations have been shown to make it easier to correctly identify the losers of tournaments, it could be expected that the elitism operator would have a similar effect. Elitism involves the best subset of candidate GP programs in a given population being copied intact into the next generation, meaning the best solutions are never lost from the population. As previously, these elitist candidate GP programs do not need to be reevaluated, providing a basic efficiency saving. Indeed, Table 4 shows the results for standard GP using an elitism operator of 10% of the population, again without evaluating non-modified or non-sampled candidate GP programs. From these results it can be seen that an expected saving of approximately 32% is now achieved under high selection pressure, where non-sampling is not an issue. It should be noted that the use of the elitism operator affects the classification accuracy achieved, leading to slight differences compared with the results in Table 2.
|Problem||Tourn. Size||Classification Accuracy (%)||Av. Tree Size||Execution Time (s)||GPop/s (bn)||Efficiency Saving (%)|
|Problem||Tourn. Size||Classification Accuracy (%)||Av. Tree Size||Execution Time (s)||GPop/s (bn)||Efficiency Saving (%)||Speedup|
It should be expected from the earlier results that the elitism operator will further improve the savings of the efficient tournament selection model, and as such the experiments are rerun using 10% elitism, with the results shown in Table 5. Note first that the classification accuracy and average tree sizes differ slightly from those of the standard GP approach shown in Table 4: a divergence in the evolutionary process has occurred. The reason is that a highly fit candidate GP program capable of being selected by the elitism operator may lose all the tournaments it is involved in, in which case its evaluation is terminated, resulting in a lower recorded fitness and thereby no longer qualifying for selection by elitism; normally it would be selected even though it would not win any tournaments. However, the effect on classifier accuracy is minimal and in some cases the accuracy actually improves. Efficiency savings of 40-65% are achieved for all problem instances, with a peak occurring for the KDDcup classification problem under a high level of selection pressure, where highly fit solutions are more influential. Indeed, for all problem instances, the greatest efficiency savings are once again achieved under high selection pressure. Note, though, that with larger tournament sizes the average execution time tends to be greater as a result of larger GP trees being considered; thus, a trade-off must be considered between efficiency savings and the increase in average GP tree size.
From the results observed in Table 4, using the elitism operator should provide an additional efficiency of approximately 7%. Comparing the efficiency savings in Table 5 with the previous results in Table 3, savings have improved by between 8% and 12%. It can thereby be considered that the elitism operator further benefits the identification, earlier in the evaluation phase, of candidate GP programs that mathematically cannot win tournaments. In terms of performance compared with a standard implementation of GP that uses elitism and does not reevaluate non-modified candidate GP programs, classifier training on the KDDcup problem is up to 1.74x faster. Finally, in terms of Genetic Programming operations per second (GPop/s), a maximum effective speed of 96 billion GPop/s is achieved for the KDDcup classification problem and 1,116 billion GPop/s for the multiplexer problem, the latter benefiting from the extra 32x bitwise parallelism.
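GPop/s can be read as interpreted primitive operations (tree nodes) executed per second, with packed fitness cases crediting each interpreted node with 32 scalar operations. A sketch of the metric under our reading, with our own function name and illustrative figures only:

```python
def effective_gpops(avg_tree_size, evaluations, cases_per_evaluation,
                    seconds, bit_parallelism=1):
    """Effective GP operations per second: nodes interpreted per unit of
    wall-clock time, scaled by any bitwise parallelism (32 for the
    packed multiplexer representation)."""
    nodes = avg_tree_size * evaluations * cases_per_evaluation
    return nodes * bit_parallelism / seconds
```

This makes explicit why the multiplexer problem, at a comparable interpreter throughput, reports an effective rate roughly 32 times higher than the classification problems.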
In this paper a methodology for evaluating candidate GP programs has been presented that provides significant computational efficiency savings even when embedded within a high-performance parallel model. The methodology exploits tournament selection in that it is possible to identify the losers of tournaments before evaluation on all the fitness cases is complete. Essentially, by evaluating all GP programs on subsets of fitness cases before moving to the next subset, comparisons can be made between solutions, losers of tournaments identified, and their evaluation terminated early. This approach alone was shown to provide minor efficiency savings and hence runtime speedups. However, the true advantage of the technique arises when solutions survive intact into the next generation: these solutions have won tournaments, so they are highly fit with a known fitness, enabling much earlier detection of tournament losers, especially under high selection pressure. Efficiency savings of up to 65% and subsequent execution speedups of up to 1.74x were demonstrated, with a peak rate of 96 billion GPop/s achieved by GP running on a multi-core CPU. Further work should consider alternative fitness metrics, such as the mean squared error (MSE) for regression problems, combining the technique with sampling approaches, and alternative methods for correctly predicting tournament losers earlier.
This is a pre-print of a contribution published in Lotfi A., Bouchachia H., Gegov A., Langensiepen C., McGinnity M. (eds) Advances in Computational Intelligence Systems. UKCI 2018. Advances in Intelligent Systems and Computing, vol 840 published by Springer. The definitive authenticated version is available online via https://doi.org/10.1007/978-3-319-97982-3_4.
-  Augusto, D.A., Barbosa, H.J.: Accelerated parallel genetic programming tree evaluation with OpenCL. Journal of Parallel and Distributed Computing 73(1), 86–100 (2013)
-  Cano, A., Zafra, A., Ventura, S.: Speeding up the evaluation phase of GP classification algorithms on GPUs. Soft Computing 16(2), 187–202 (2012)
-  Chitty, D.M.: Fast parallel genetic programming: multi-core CPU versus many-core GPU. Soft Computing 16(10), 1795–1814 (2012)
-  Chitty, D.M.: Improving the performance of GPU-based genetic programming through exploitation of on-chip memory. Soft Computing 20(2), 661–680 (2016)
-  Chitty, D.M.: Faster GPU-based genetic programming using a two-dimensional stack. Soft Computing 21(14), 3859–3878 (2017)
-  Frank, A., Asuncion, A.: UCI machine learning repository (2010)
-  Gathercole, C., Ross, P.: Dynamic training subset selection for supervised learning in genetic programming. In: International Conference on Parallel Problem Solving from Nature. pp. 312–321. Springer (1994)
-  Gathercole, C., Ross, P.: Tackling the boolean even N parity problem with genetic programming and limited-error fitness. Genetic programming 97, 119–127 (1997)
-  Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press (1992)
-  Maxwell, S.R.: Experiments with a coroutine execution model for genetic programming. In: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. pp. 413–417. IEEE (1994)
-  Park, N., Kim, K., McKay, R.I.: Cutting evaluation costs: An investigation into early termination in genetic programming. In: Evolutionary Computation (CEC), 2013 IEEE Congress on. pp. 3291–3298. IEEE (2013)
-  Poli, R., Langdon, W.: Running genetic programming backwards. In: Yu, T., Riolo, R., Worzel, B. (eds.) Genetic Programming Theory and Practice III, Genetic Programming, vol. 9, pp. 125–140. Springer US (2006)
-  Teller, A.: Genetic programming, indexed memory, the halting problem, and other curiosities. In: Proceedings of the 7th annual Florida Artificial Intelligence Research Symposium. pp. 270–274 (1994)
-  Teller, A., Andre, D.: Automatically choosing the number of fitness cases: The rational allocation of trials. Genetic Programming 97, 321–328 (1997)