Evolutionary techniques (Simon, 2013) have long served as important tools in the Artificial Intelligence (AI) and Computational Intelligence (CI) tool-set. In this article, we present a new method of selecting individuals for crossover and grouping them into pairs that undergo the crossover operation. This research has been inspired by an upcoming project aimed at evolving semantic-logical programs (Świechowski and Ślezak, 2020). During initial research towards it, the closest problem we found was evolutionary decision tree induction. First, a decision tree can be structurally similar to a specific class of logical programs. Second, constructing a decision tree algorithmically is a well-understood problem, so we can focus the analysis on the crossover phase. Evolving decision trees has been popular and dates back to before 2000 (Siegel, 1994; Papagelis and Kalles, 2000). The authors of (Barros et al., 2013) show that their method is capable of outperforming C4.5 (Quinlan, 2014), which is a dedicated decision tree induction algorithm.
The authors of (Ursem, 2002) state that diversity is one of the key factors in the performance of evolutionary algorithms. Diversity-guided algorithms are also the subject of (Alam et al., 2012) for Evolutionary Programming, (Angeline and Kinnear, 1996) for Genetic Programming, and (Algethami and Landa-Silva, ) for a Workforce Scheduling and Routing Problem. The means of measuring population diversity in genetic programming are summarized in (Burke et al., 2002). In the broader field of evolutionary computation, dedicated ways of measuring diversity have been proposed, e.g., (Yuhui Shi and Eberhart, 2008) for Particle Swarm Optimization (PSO) and (Nakamichi and Arita, 2004) for Ant Colony Optimization (ACO).
2. The Proposed Crossover Methods
As a common part of the methods described below, except the Standard one, we compute a measure for each pair of individuals $i$ and $j$ in the population, which we refer to as the complementary fitness. It is a prediction of how fit the two individuals might be when combined. Afterwards, the pairs are sorted in descending order of complementary fitness and perform the recombination in that order until at least $N \cdot CR$ unique individuals have been recombined, where $N$ is the population size and $CR$ is the crossover rate.
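The pair-ordering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_pairs`, the callable `complementary_fitness`, and the stopping threshold (crossover rate times population size) are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def select_pairs(
    population: Sequence[object],
    complementary_fitness: Callable[[object, object], float],
    crossover_rate: float,
) -> List[Tuple[int, int]]:
    """Return index pairs for crossover, highest complementary fitness first."""
    n = len(population)
    # Assumed threshold: number of unique individuals that must take part.
    target = crossover_rate * n
    # Score every unordered pair (i, j) with i < j.
    scored = [
        (complementary_fitness(population[i], population[j]), i, j)
        for i, j in combinations(range(n), 2)
    ]
    # Sort by score, descending; Python's sort is stable, so ties keep index order.
    scored.sort(key=lambda item: item[0], reverse=True)

    pairs: List[Tuple[int, int]] = []
    used = set()
    for _, i, j in scored:
        if len(used) >= target:
            break  # enough unique individuals have been recombined
        pairs.append((i, j))
        used.update((i, j))
    return pairs
```

In this sketch an individual may appear in more than one pair, since the stopping rule counts unique individuals rather than pairs.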
Novel-2 Method: the underpinning idea is to split the decision tree represented as in genetic programming into two parts and calculate the accuracy for each part, respectively.
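Novel-N Method: let $d_i^k$ denote the decision returned by the decision tree represented by the individual $i$ for the $k$-th sample in the training set. Let $d_j^k$ be the decision for the $k$-th sample in the case of the individual $j$, and let $y^k$ be the correct decision for that sample. Their complementary fitness is calculated as the number of samples for which at least one of the two trees correctly predicted the decision:

$$CF(i, j) = \sum_{k} \left[\, d_i^k = y^k \;\lor\; d_j^k = y^k \,\right]$$

where $[\cdot]$ equals 1 if the condition holds and 0 otherwise.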
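The count used by Novel-N can be sketched in a few lines of Python. This is only an illustration under stated assumptions: decisions and correct labels are given as equal-length sequences, and the name `complementary_fitness_n` is hypothetical.

```python
from typing import Sequence


def complementary_fitness_n(
    decisions_a: Sequence[int],
    decisions_b: Sequence[int],
    labels: Sequence[int],
) -> int:
    """Count training samples that at least one of the two trees gets right."""
    if not (len(decisions_a) == len(decisions_b) == len(labels)):
        raise ValueError("all sequences must cover the same training samples")
    return sum(
        1
        for a, b, y in zip(decisions_a, decisions_b, labels)
        if a == y or b == y
    )


# Example: tree A is right on sample 0, tree B on samples 1 and 3,
# and neither is right on sample 2.
labels = [1, 0, 1, 1]
tree_a = [1, 1, 0, 0]
tree_b = [0, 0, 0, 1]
score = complementary_fitness_n(tree_a, tree_b, labels)  # 3
```

The example shows why the measure can favour a pair over its members: tree A alone is correct on one sample and tree B on two, yet together they cover three of the four samples.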
Standard Method: we use this name for the baseline. Here, the fittest individuals, in the number set by the crossover rate, perform the crossover. Within this set, they are matched into pairs uniformly at random. We also experimented with roulette-wheel sampling, but it gave worse results for the considered problem.
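A minimal sketch of such a baseline is given below. It assumes that the number of parents is the crossover rate times the population size, rounded down to an even number; the function name `standard_pairs` and this rounding rule are illustrative assumptions, not details taken from the experiments.

```python
import random
from typing import List, Optional, Sequence, Tuple


def standard_pairs(
    fitness: Sequence[float],
    crossover_rate: float,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Pair the fittest individuals uniformly at random (returns index pairs)."""
    rng = rng or random.Random()
    n = len(fitness)
    k = int(crossover_rate * n)  # assumed number of parents
    k -= k % 2  # keep an even count so every parent gets a partner
    # Rank indices by fitness, best first, and keep the top k.
    ranked = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    parents = ranked[:k]
    # A uniform shuffle followed by pairing neighbours gives a random matching.
    rng.shuffle(parents)
    return [(parents[i], parents[i + 1]) for i in range(0, k, 2)]
```

Passing a seeded `random.Random` instance makes the pairing reproducible across runs.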
Hybrid-2 and Hybrid-N Methods: the population is sorted by fitness value. Then, the first half of the parents for crossover is determined as in the Standard method and the other half by Novel-2 or Novel-N, for Hybrid-2 and Hybrid-N, respectively.
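The hybrid scheme can be sketched as follows. Several details are assumptions, since the text does not fix them: the total number of parents is taken as the crossover rate times the population size, the complementary part draws its pairs from the whole population, and `pair_score(i, j)` is a hypothetical callable standing in for the Novel-2 or Novel-N complementary fitness of individuals `i` and `j`.

```python
import random
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple


def hybrid_pairs(
    fitness: Sequence[float],
    pair_score: Callable[[int, int], float],
    crossover_rate: float,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Half of the parents by rank with random pairing, half by pair score."""
    rng = rng or random.Random()
    n = len(fitness)
    n_parents = int(crossover_rate * n)  # assumed total number of parents
    half = n_parents // 2
    half -= half % 2  # even count for the rank-based half

    # Rank-based half: the fittest individuals, paired uniformly at random.
    ranked = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    top = ranked[:half]
    rng.shuffle(top)
    pairs = [(top[i], top[i + 1]) for i in range(0, half, 2)]

    # Complementary half: best-scoring pairs until enough unique parents.
    scored = sorted(
        combinations(range(n), 2),
        key=lambda p: pair_score(p[0], p[1]),
        reverse=True,
    )
    used = set()
    for i, j in scored:
        if len(used) >= n_parents - half:
            break
        pairs.append((i, j))
        used.update((i, j))
    return pairs
```

Whether the second half should exclude individuals already chosen by rank is a design choice left open here; this sketch allows overlap.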
We tested the five variants introduced in the previous section and compared them with each other. Each tested EA was initialized with a random population. For evaluation, we used the accuracy (which is also the fitness function) of the best solution found so far by the respective method. This value was averaged over independent repeats of each experiment. In addition, we calculated confidence intervals for it. For the implementation of both the EA and the decision trees, we used the Grail AI library (Świechowski and Ślezak, 2018).
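As an illustration of the averaging step, the sketch below computes the mean of the best accuracies over repeated runs together with a normal-approximation confidence interval. The 95% level (z = 1.96) and the normal approximation are assumptions for this example; the confidence level and interval type used in the experiments are not stated here.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal-approximation interval."""
    if len(values) < 2:
        raise ValueError("at least two repeats are needed for an interval")
    mean = statistics.fmean(values)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical best accuracies from three independent repeats.
mean, low, high = mean_with_ci([0.8, 0.9, 1.0])
```

Two methods can then be compared by checking whether their intervals overlap, which is one common, conservative way to label a difference as significant or inconclusive.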
The hybrid methods, i.e., Hybrid-2 and Hybrid-N, as well as the baseline Standard method, outperformed the non-hybrid Novel-2 and Novel-N methods. Therefore, in Table 1, we summarize the comparison among those three methods only. A possible explanation is that the pure novel methods do not maintain enough population diversity; a detailed investigation of the cause is part of our future plans.
Overall, Hybrid-2 is the clear winner of the experiments. It achieved the highest accuracy in all 8 experiments, and in 6 out of 8 the advantage was statistically significant. The two experiments in which it was not were (1) the one with a lower crossover rate and (2) the one with a smaller number of variables (V = 6) in the decision tree. The first case suggests that the method gains an advantage when it has more individuals to work with. The second case suggests that the problem needs to be complex enough for the method to show its advantage. For this problematic case, we present a plot of the best fitness value achieved by each method with respect to the iteration in Figure 1.
| (V, N, TS, CR) | Hybrid-2 (compared to Standard) | Hybrid-N (compared to Standard) |
|---|---|---|
| (6, 200, 200, 0.5) | better | worse, significant |
| (7, 200, 200, 0.5) | better, significant | similar, inconclusive |
| (8, 200, 200, 0.5) | better, significant | similar, inconclusive |
| (8, 200, 200, 0.25) | better | similar, inconclusive |
| (8, 200, 200, 0.75) | better, significant | worse, significant |
| (8, 100, 200, 0.5) | better, significant | similar, inconclusive |
| (8, 400, 200, 0.5) | better, significant | similar, inconclusive |
| (8, 200, 400, 0.5) | better, significant | worse, significant |
In this paper, we have proposed a new method of choosing individuals for crossover. We evaluated the method on an evolutionary decision tree induction problem. The proposed method revolves around matching individuals into pairs that have a high chance of producing fitter offspring. The method is suitable for scenarios in which the fitness is calculated as a sum (or, in general, an aggregation) of many parts and a partial fitness value can be derived for each part. We have shown that the best way to apply the proposed crossover procedure is to mix it with rank-based crossover selection. Such a merger is stronger than either of the methods alone, which has been confirmed in 8 empirical experiments.
- Diversity Guided Evolutionary Programming: A novel approach for continuous optimization. Applied Soft Computing 12 (6), pp. 1693–1707.
- diversity-based adaptive genetic algorithm for a workforce scheduling and routing problem. In 2017 IEEE Congress on Evolutionary Computation.
- Efficiently representing populations in genetic programming. In Advances in Genetic Programming, pp. 259–278.
- Evolutionary design of decision-tree algorithms tailored to microarray gene expression data sets. IEEE Transactions on Evolutionary Computation 18 (6), pp. 873–892.
- Advanced Population Diversity Measures in Genetic Programming. In Parallel Problem Solving from Nature — PPSN VII, J. J. M. Guervós, P. Adamidis, H. Beyer, H. Schwefel, and J. Fernández-Villacañas (Eds.), Berlin, Heidelberg, pp. 341–350.
- Diversity control in ant colony optimization. Artificial Life and Robotics 7 (4), pp. 198–204.
- GA Tree: Genetically Evolved Decision Trees. In Proceedings 12th IEEE Internationals Conference on Tools with Artificial Intelligence. ICTAI 2000, pp. 203–206.
- C4.5: Programs for Machine Learning. Elsevier.
- Competitively Evolving Decision Trees Against Fixed Training Cases for Natural Language Processing. Advances in Genetic Programming 19, pp. 409–423.
- Evolutionary Optimization Algorithms. John Wiley & Sons.
- Grail: A Framework for Adaptive and Believable AI in Video Games. In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 762–765.
- Introducing LogDL - Log Description Language for Insights from Complex Data. In 2020 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 145–154.
- Diversity-Guided Evolutionary Algorithms. In Parallel Problem Solving from Nature — PPSN VII, J. J. M. Guervós, P. Adamidis, H. Beyer, H. Schwefel, and J. Fernández-Villacañas (Eds.), Berlin, Heidelberg, pp. 462–471.
- Population diversity of particle swarms. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 1063–1067.