 # Convergence of a Recombination-Based Elitist Evolutionary Algorithm on the Royal Roads Test Function

We present an analysis of the performance of a population-based elitist evolutionary algorithm that uses a recombination operator known as 1-Bit-Swap on the Royal Roads test function. We derive complete, approximate and asymptotic convergence rates for the algorithm. The complete model shows the benefit of the sizes of the population and the recombination pool.


## 1 Introduction

Evolutionary Algorithms (EAs) are a set of heuristic optimization tools that are well-suited to problems with poorly-understood landscapes (sometimes known as black-box optimization). Despite a rich history of applications, theoretical analysis has lagged behind; while there have been some advances in recent years, the analysis has mainly been restricted to single-parent algorithms, which is an extremely limiting assumption.

The algorithm that we analyse is described as a (μ+λ) evolutionary algorithm, where μ and λ are, respectively, the size of the population from which solutions are picked and the size of the mating pool that is created from them. The '+' signifies that the algorithm is elitist, i.e., that the best solution at each iteration is always reproduced in the next, so that the best fitness in the population can never decrease. In this paper we consider the effect that the population has on the expected time at which the algorithm finds the optimal solution to the problem. We model the distribution of elite species in the population using a static model with a uniform distribution. Most work to date has considered only (1+1) EAs. The operator that we use to recombine solutions is the 1-Bit-Swap (1BS) operator that was described in [TSMH10].

We derive an exact expression for the expected runtime of the (μ+λ) EA with 1BS. Since this expression does not appear to have a closed form, we then develop an approximation to it and compute the asymptotic limit. Rather surprisingly, in the asymptotic expression the sizes of the population and recombination pool cancel out, and so they do not appear.

The Royal Roads (RR) is a test function introduced in [MFH92] and analyzed in [Mit96], where a population-based Evolutionary Algorithm was found to underperform the simpler heuristic Randomized Local Search (RLS), which contradicted the theoretical findings in the same article. The Royal Road was initially developed to demonstrate the schemata theory of EAs. A string of length n is split into K consecutive bins or blocks (which we index as κ = 1, …, K). All bins have the same size M, so that n = KM. The fitness of the string is the sum of the fitnesses of the bins, and the fitness of each bin is M if all the bits in the bin have value 1, and 0 otherwise. Thus, the possible fitness values of strings under the Royal Road function are 0, M, 2M, …, KM = n, so the function has plateaus of fitness, where a large number of strings share the same fitness value. This means that single-bit modifications to the solution string will generally not improve the fitness function, and that within a plateau the solutions created by the EA are expected to show random walk behaviour.
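The block structure just described can be made concrete with a short sketch (ours, not from the paper); `royal_roads` and its parameter names follow the definitions above:

```python
# Royal Roads fitness: a string of length n = K*M is split into K bins of
# size M; a bin contributes M to the fitness only if all of its bits are 1.

def royal_roads(bits, M):
    """Royal Roads fitness of a 0/1 list whose length is a multiple of M."""
    assert len(bits) % M == 0
    fitness = 0
    for start in range(0, len(bits), M):
        if all(b == 1 for b in bits[start:start + M]):
            fitness += M  # only a completely filled bin contributes
    return fitness
```

Flipping a single bit inside a partially filled bin leaves this value unchanged, which is exactly the plateau behaviour discussed above.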

### 1.2 Past Work

RR has not received much attention in the recent EA literature, where the focus has been on the rather simpler OneMax (also known as Counting Ones) fitness function. However, in [SW03] an upper bound on the expected running time was found for a version of RR. In [Mit96] bounds on the convergence time for RR were derived in terms of n, the length of the string, and M, the length of the bin, up to a linear term tighter than the bound for RLS, although numerically RLS outperformed the EA. This result does not involve the size of the population or recombination pool in any way.

For the OneMax test function there has been rather more research on EAs, although (1+1) set-ups are still the most widespread. In [HY02] it was shown that the effect of population is problem-specific, i.e., an increase in population size may not improve performance at all. Very recently, in [CTCY11], it was shown that populations of some sizes boost performance, while others impair the progress of the algorithm (with the analysis based on the TrapZeros multimodal function) and reduce the probability of global convergence.

## 2 Analyzed Algorithm: (μ+λ) EA1BS

The genetic operator that we consider in this paper is not the usual mutation operator. The k-Bit-Swap genetic operator (KBS) was introduced in [TSMH10]. It contains features of both mutation and uniform crossover and recombines information between two parents in a random manner. In this article we use 1-Bit-Swap (1BS), which picks exactly 1 bit from each parent uniformly at random and swaps them.
See Table 1 for the pseudocode of the algorithm.
Tournament selection consists of picking two species at random and putting the fitter of the pair into the mating pool.
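The two operators can be sketched as follows (a minimal illustration; the helper names `tournament_pick` and `one_bit_swap` are ours):

```python
import random

def tournament_pick(population, fitness):
    """Tournament selection: pick two species at random and return the
    fitter of the pair, which then enters the mating pool."""
    a, b = random.choice(population), random.choice(population)
    return a if fitness(a) >= fitness(b) else b

def one_bit_swap(parent1, parent2):
    """1BS: pick exactly one bit position in each parent uniformly at
    random and swap those two bits between (copies of) the parents."""
    c1, c2 = parent1[:], parent2[:]
    i, j = random.randrange(len(c1)), random.randrange(len(c2))
    c1[i], c2[j] = c2[j], c1[i]
    return c1, c2
```

Note that 1BS, unlike mutation, conserves the total number of 1s across the two offspring.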

## 3 Model Setup and Assumptions

We define the convergence (first hitting) time of the algorithm as:

$$\tau_{RR} = \min\{t \ge 0 : P_t \in A\} = \min\{t \ge 0 : \exists\, x \in P_t,\ f(x) = n\}$$

where $P_t$ is the population at iteration $t$ and $A$ is the set of all possible populations that include a global solution. We want to find $E[\tau_{RR}]$, the expectation of this time parameter, for the (μ+λ) EA with 1BS as the only genetic operator.

### 3.1 Improvement process

We start with the pessimistic assumption that each bin starts with an equal number of 0s and 1s, which implies that the starting fitness of all elements of the population is 0. As the Royal Road fitness function makes incremental improvements impossible to see, in order to measure the progress of the algorithm we introduce, in addition to the fitness function, an auxiliary function, in this case OneMax (where the fitness of a string is simply a count of the number of 1s in it; for further reference see e.g. [CHS09]). We denote the value of the auxiliary function of a bin as its number of 1s, which can theoretically take values between 0 and M, although in practice they all start at M/2 because of our assumption above. Since only 1 bit is changed in a string at each iteration, only one bin can evolve at a time. In a slight abuse of notation, we refer to that bin as the 'active bin' and index it as κ. Within a bin, the number of improvements that have already been made (i.e., the number of bits that were 0 and have already been changed into 1s) is denoted by the variable l, which starts at 0 and increases to M/2.
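The auxiliary function on a per-bin basis can be sketched in one line (our illustration; `aux_onemax_per_bin` is a hypothetical helper name):

```python
def aux_onemax_per_bin(bits, M):
    """OneMax value of each bin: the number of 1s in each consecutive
    block of M bits; progress is visible even inside a fitness plateau."""
    return [sum(bits[start:start + M]) for start in range(0, len(bits), M)]
```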

We restrict our attention to elite pairs in the recombination pool, i.e. pairs in which both parents are currently elite species. This is a limitation of our analysis that means that we underestimate the chance of success. The probability of selecting an elite pair in the recombination pool is:

$$P_{sel}(\alpha) = \frac{\alpha^2(\alpha+2(\mu-\alpha))^2}{\mu^4} = \frac{\left(\alpha(2\mu-\alpha)\right)^2}{\mu^4}$$

where α is the number of elite species in the population.

Having selected the pair, the probability that, as a result of swapping bits between them, a better species evolves is:

$$P_{swap} = \frac{2\left(\frac{M}{2}-l\right)\left(\frac{n}{2}+\frac{\kappa M}{2}+l\right)}{n^2} = \frac{(M-2l)(n+\kappa M+2l)}{2n^2}$$

This probability comes from the fact that we want to select a 0 in bin κ in one of the parents and a 1 anywhere in the other parent. Obviously, as the number of 1s in both parents grows, this probability grows too. In Section 4.1 we also use the probability of failure:

$$P_F = 1 - P_{swap}$$
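As a sanity check (ours, not part of the derivation), the two algebraic forms given above for $P_{sel}$ and $P_{swap}$ can be verified to agree numerically:

```python
def p_sel(alpha, mu):
    """Probability of selecting an elite pair; both printed forms agree."""
    lhs = alpha**2 * (alpha + 2*(mu - alpha))**2 / mu**4
    rhs = (alpha * (2*mu - alpha))**2 / mu**4
    assert abs(lhs - rhs) < 1e-12
    return rhs

def p_swap(l, kappa, M, n):
    """Probability that a swap improves the active bin; both forms agree."""
    lhs = 2 * (M/2 - l) * (n/2 + kappa*M/2 + l) / n**2
    rhs = (M - 2*l) * (n + kappa*M + 2*l) / (2 * n**2)
    assert abs(lhs - rhs) < 1e-12
    return rhs
```

When the whole population is elite (α = μ), $P_{sel}$ is 1; when the active bin is already full (l = M/2), $P_{swap}$ is 0, as expected.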

### 3.2 Population and elitism assumptions

We assume that, in each generation, the number α of currently elite species in the population is distributed uniformly:

$$\alpha \sim \mathrm{Uniform}(1, \mu)$$

This is a static model, i.e., the distribution does not change throughout the run of the algorithm; we will consider a dynamic model for this distribution in future work. We also assume that the rate of elitism (the number of species saved for the next generation) is high enough to keep all elite species.

## 4 Derivation of the expectation of convergence time

We present three main results: exact, approximate and asymptotic. The latter two are necessary, since the exact one does not have a closed form.

### 4.1 Exact expression

The probability of failing to improve the auxiliary function in a generation is obtained by conditioning first on the number of elite pairs in the recombination pool and then on the number of elite species in the population:

$$P(G_0) = \sum_{j=0}^{\lambda/2} P(G_0 \mid H_j) \sum_{\alpha=1}^{\mu} P(H_j \mid \alpha) P(\alpha) \tag{1}$$

where $H_j$ is the event that there are $j$ elite pairs in the recombination pool of size $\lambda$, and $\alpha$ is the number of elite species in the population of size $\mu$.

The probability of failing to improve a bit in a bin, given $l$ improvements so far, is:

$$\begin{aligned}
P(G_{0,l}) &= \frac{1}{\mu}\sum_{j=0}^{\lambda/2}\left(\frac{2n^2-(M-2l)(n+\kappa M+2l)}{2n^2}\right)^j \binom{\lambda/2}{j} \sum_{\alpha=1}^{\mu}\left(\frac{(\alpha(2\mu-\alpha))^2}{\mu^4}\right)^j \left(1-\frac{(\alpha(2\mu-\alpha))^2}{\mu^4}\right)^{\lambda/2-j} \\
&= \frac{1}{\mu}\sum_{j=0}^{\lambda/2} P_F^j \binom{\lambda/2}{j} \sum_{\alpha=1}^{\mu} \left(P_{sel}(\alpha)\right)^j \left(1-P_{sel}(\alpha)\right)^{\lambda/2-j} \\
&= \frac{1}{\mu}\sum_{\alpha=1}^{\mu}\sum_{j=0}^{\lambda/2}\binom{\lambda/2}{j}\left(P_F P_{sel}(\alpha)\right)^j \left(1-P_{sel}(\alpha)\right)^{\lambda/2-j} \\
&= \frac{1}{\mu}\sum_{\alpha=1}^{\mu}\left(1-P_{sel}(\alpha)P_{swap}\right)^{\lambda/2}
\end{aligned} \tag{2}$$

The last step is due to the binomial expansion $\sum_{j=0}^{m}\binom{m}{j}x^j y^{m-j} = (x+y)^m$ with $x = P_F P_{sel}(\alpha)$ and $y = 1-P_{sel}(\alpha)$, so that $x+y = 1-P_{sel}(\alpha)P_{swap}$. Therefore, the probability of an increase in the auxiliary function is:

$$P(G_l) = 1 - P(G_{0,l}) = 1 - \frac{1}{\mu}\sum_{\alpha=1}^{\mu}\left(1-P_{sel}(\alpha)P_{swap}\right)^{\lambda/2}$$
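The binomial-collapse step above can be checked numerically (our check, with arbitrary probability values):

```python
from math import comb

def collapse_check(p_sel, p_swap, half_lambda):
    """Verify that sum_j C(m,j) (P_F*P_sel)^j (1-P_sel)^(m-j) equals
    (1 - P_sel*P_swap)^m for m = lambda/2, since P_F = 1 - P_swap."""
    p_f = 1.0 - p_swap
    lhs = sum(comb(half_lambda, j)
              * (p_f * p_sel)**j * (1 - p_sel)**(half_lambda - j)
              for j in range(half_lambda + 1))
    rhs = (1 - p_sel * p_swap)**half_lambda
    return abs(lhs - rhs) < 1e-12
```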

The expected time until the next improvement of the auxiliary function of a bin is:

$$E T_\kappa = \sum_{l=0}^{M/2-1} \frac{1}{P(G_l)} \tag{3}$$

and, finally, summing over all κ from 1 to K we obtain (since $P(G_l)$ depends on both $l$ and $\kappa$):

$$\begin{aligned}
E\tau_{(\mu+\lambda)EA_{1BS}} &= \sum_{\kappa=1}^{K}\sum_{l=0}^{M/2-1}\frac{1}{P(G_{l,\kappa})} \\
&= \sum_{\kappa=1}^{K}\sum_{l=0}^{M/2-1}\frac{1}{1-\frac{1}{\mu}\sum_{\alpha=1}^{\mu}\left(1-\frac{(\alpha(2\mu-\alpha))^2}{\mu^4}P_{swap}\right)^{\lambda/2}} \\
&= \sum_{\kappa=1}^{K}\sum_{l=0}^{M/2-1}\frac{1}{1-\frac{1}{\mu}\sum_{\alpha=1}^{\mu}\left(1-\frac{(\alpha(2\mu-\alpha))^2(M-2l)(n+\kappa M+2l)}{2\mu^4 n^2}\right)^{\lambda/2}}
\end{aligned} \tag{4}$$

The benefit of larger population sizes is clear from the terms in μ in front of the inner sum in the denominator. Also, increases in the size of λ lead to reductions in the probability of failure, through the exponent λ/2.

We test this expression numerically for different values of μ and λ (see Appendix C). Unfortunately, the expression does not seem to have a closed form, so we instead go ahead with finding an approximation to it in the next subsection.
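Equation 4 can be evaluated directly; the sketch below (ours) mirrors the triple sum term by term, and the parameter values used in any test are illustrative only:

```python
def expected_runtime(mu, lam, M, K):
    """Evaluate the exact expression (Equation 4) for E[tau]."""
    n = K * M
    total = 0.0
    for kappa in range(1, K + 1):          # sum over bins
        for l in range(M // 2):            # sum over improvements in a bin
            p_fail = sum(
                (1 - (a * (2*mu - a))**2 * (M - 2*l) * (n + kappa*M + 2*l)
                 / (2 * mu**4 * n**2)) ** (lam / 2)
                for a in range(1, mu + 1)) / mu
            total += 1.0 / (1.0 - p_fail)  # expectation of a Geometric RV
    return total
```

Increasing λ strictly decreases every failure term, so the predicted runtime decreases, consistent with the discussion above.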

### 4.2 Approximate and asymptotic expressions

$$\begin{aligned}
P(G_{0,l}) &= \frac{1}{\mu}\sum_{\alpha=1}^{\mu}\left(1-\frac{\left(\alpha(\alpha+2(\mu-\alpha))\right)^2(M-2l)(n+\kappa M+2l)}{2\mu^4 n^2}\right)^{\lambda/2} \\
&= \frac{1}{\mu}\sum_{\alpha=1}^{\mu}\left(1-\frac{(\alpha(2\mu-\alpha))^2(M-2l)(n+\kappa M+2l)}{2\mu^4 n^2}\right)^{\lambda/2} \\
&\approx \frac{1}{\mu}\sum_{\alpha=1}^{\mu} e^{-\frac{\lambda(\alpha(2\mu-\alpha))^2(M-2l)(n+\kappa M+2l)}{4\mu^4 n^2}} \approx \frac{1}{\mu}\int_{1}^{\mu} e^{-\left(\frac{\alpha(2\mu-\alpha)}{\sqrt{\gamma}}\right)^2} d\alpha
\end{aligned} \tag{5}$$

The last step in the summand is due to the approximation $(1-x)^m \approx e^{-mx}$ for small $x$, with $\gamma = \frac{4\mu^4 n^2}{\lambda(M-2l)(n+\kappa M+2l)}$. Although for monotonically decreasing functions, such as this one, the integral test implies that the sum is larger than the corresponding integral, for large μ the sum is closely approximated by the integral.

We denote the integrand by $f(\alpha) = e^{-\frac{(\alpha(2\mu-\alpha))^2}{\gamma}}$. Expanding the function inside the integral as a Taylor series around $\alpha = 1$ up to the second term, we get (since $\frac{d}{d\alpha}\,\alpha(2\mu-\alpha)\big|_{\alpha=1} = 2(\mu-1)$):

$$f(\alpha) \approx e^{-\frac{(2\mu-1)^2}{\gamma}} - \frac{4(2\mu-1)(\mu-1)(\alpha-1)}{\gamma}\, e^{-\frac{(2\mu-1)^2}{\gamma}} \tag{6}$$

Therefore, the integral turns into:

$$I_1 = \int_{1}^{\mu} f(\alpha)\,d\alpha \approx \int_{1}^{\mu}\left(e^{-\frac{(2\mu-1)^2}{\gamma}} - \frac{4(2\mu-1)(\mu-1)(\alpha-1)}{\gamma}\, e^{-\frac{(2\mu-1)^2}{\gamma}}\right) d\alpha = e^{-\frac{(2\mu-1)^2}{\gamma}}(\mu-1)\left[1-\frac{2(2\mu-1)(\mu-1)^2}{\gamma}\right] \tag{7}$$

The probability of failure is approximately (with the assumptions specified above):

$$P(G_{0,l}) \approx \frac{e^{-\frac{(2\mu-1)^2}{\gamma}}(\mu-1)\left[1-\frac{2(2\mu-1)(\mu-1)^2}{\gamma}\right]}{\mu}$$

Accordingly, the probability of a successful swap is:

$$P(G_l) \approx 1 - \frac{e^{-\frac{(2\mu-1)^2}{\gamma}}(\mu-1)\left[1-\frac{2(2\mu-1)(\mu-1)^2}{\gamma}\right]}{\mu}$$

Using the sum of expectations of Geometric random variables with different parameters, the expected time until filling a bin, i.e. improvement of the fitness function, is:

$$E T_\kappa = \sum_{l=0}^{M/2-1} \frac{\gamma}{\gamma - \gamma e^{-\frac{(2\mu-1)^2}{\gamma}} + 2(2\mu-1)(\mu-1)^2 e^{-\frac{(2\mu-1)^2}{\gamma}}} \tag{8}$$

We make two approximations here. First, we use a Riemann sum approximation to replace the sum in Equation 8 by an integral, and then expand the integrand in a Taylor series with 2 terms around the midpoint to obtain a good approximation of the integral. The Riemann sum approximation is defined by:

$$\lim_{n\to\infty}\sum_{j=1}^{n} f(x_j) = n\int_{0}^{1} f(nx)\,dx + o(n)$$

and γ is transformed accordingly (substituting $l \to \left(\frac{M}{2}-1\right)l$):

$$\gamma = \frac{4\mu^4 n^2}{\lambda\left(M-2\left(\frac{M}{2}-1\right)l\right)\left(n+\kappa M+2\left(\frac{M}{2}-1\right)l\right)}$$

Then:

$$I_2 = \int_{0}^{1} \frac{\gamma\,dl}{\gamma - \gamma e^{-\frac{(2\mu-1)^2}{\gamma}} + 2(2\mu-1)(\mu-1)^2 e^{-\frac{(2\mu-1)^2}{\gamma}}} \tag{9}$$

Therefore, the expected first hitting time until the evolution of the bin is (the rather long Taylor series expansion of the integrand is given in Appendix A):

$$E T_\kappa \approx \left(\frac{M}{2}-1\right)\int_{0}^{1} \frac{\gamma\,dl}{\gamma - \gamma e^{-\frac{(2\mu-1)^2}{\gamma}} + 2(2\mu-1)(\mu-1)^2 e^{-\frac{(2\mu-1)^2}{\gamma}}} = \frac{\frac{4\mu^4 n^2}{\lambda}\left(\frac{M}{2}-1\right)}{\left(\frac{M}{2}+1\right)\left(\frac{M}{2}+n+\kappa M-1\right)\left[\frac{2(2\mu-1)(\mu-1)^2}{\sigma_1}+\frac{4\mu^4 n^2}{\lambda\left(\frac{M}{2}+1\right)\sigma_2}-\frac{4\mu^4 n^2}{\lambda\left(\frac{M}{2}+1\right)\sigma_2\sigma_1}\right]} \tag{10}$$

where:

$$\sigma_1 = e^{\frac{\lambda(2\mu-1)^2\left(\frac{M}{2}+1\right)\sigma_2}{4\mu^4 n^2}}, \qquad \sigma_2 = \frac{M}{2}+n+\kappa M-1$$

Note that $\sigma_1$ has the interesting property (given λ of the same order as μ) that:

$$\lim_{n\to\infty} e^{\frac{\lambda(2\mu-1)^2\left(\frac{M}{2}+1\right)\left(\frac{M}{2}+n+\kappa M-1\right)}{4\mu^4 n^2}} = \lim_{n\to\infty} e^{\frac{M(M+n+KM)}{\mu n^2}} = \lim_{n\to\infty} e^{\frac{M}{\mu n}+O\left(\frac{M^2}{\mu n^2}\right)} = 1$$

which means that for sufficiently large values of n and μ the second and the third terms in the square brackets cancel each other out, and the first term is just $2(2\mu-1)(\mu-1)^2$.

Finally, summing over all K bins in the string, we get the approximation of the convergence time of the algorithm on the RR test function:

$$E\tau^{RR}_{(\mu+\lambda)EA_{1BS}} \approx \frac{2\mu^4 n^2(M-2)}{\lambda(M+2)(2\mu-1)(\mu-1)^2}\sum_{\kappa=0}^{K-1}\frac{1}{\frac{M}{2}+n-1+\kappa M} \approx \frac{2\mu^4 n^2(M-2)}{\lambda M(M+2)(2\mu-1)(\mu-1)^2}\log\left(1+\frac{2KM}{M+2n}\right) \tag{11}$$

where the sum over κ can be expressed through the Digamma function (see e.g. [AS65, GKP95]). In the derivation of the asymptotic expression for this bound, all population-related terms cancel out (for λ of the same order as μ, both the numerator and the denominator have μ⁴ as their highest-order term), and the order of convergence is

$$E\tau^{RR}_{(\mu+\lambda)EA_{1BS}} = O\left(\frac{n^2\log\left(1+\frac{KM}{M+n}\right)}{M}\right) \tag{12}$$

which is comparable to the results available in the literature covering fitness functions with plateaus (e.g. [HY02, CHS09, SW03]).
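The final step of Equation 11 replaces the sum over κ by a logarithm; a quick numeric comparison (ours, with illustrative parameter values) shows the two are close for moderate problem sizes:

```python
from math import log

def sum_vs_log(M, K):
    """Compare sum_{kappa=0}^{K-1} 1/(M/2 + n - 1 + kappa*M) with its
    logarithmic approximation (1/M) * log(1 + 2KM/(M + 2n)), n = K*M."""
    n = K * M
    exact = sum(1.0 / (M/2 + n - 1 + kappa*M) for kappa in range(K))
    approx = log(1 + 2*K*M / (M + 2*n)) / M
    return exact, approx
```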

## 5 Conclusions and Future Work

We have derived three expressions for the convergence time of an elitist EA on the Royal Roads test function: exact, approximate and asymptotic. Although the exact expression clearly shows the benefit of an increase in the population (at least when the population is relatively small), in the approximate result the population terms are of equal order, and in the asymptotic one they cancel out entirely.

An important assumption for the approximation concerned the relation between the sizes of the population, the recombination pool and the problem, which, unlike [CTCY11], we never fully specified. This is something to look at in the future. Since the effect of the population is known to be problem-specific, we expect to gain good insights into it for unimodal functions with plateaus, such as Royal Roads.

We have performed our analysis assuming a uniform distribution of elite species in the population, something no one seems to have done in the EA literature before. This is a static approach to convergence (i.e., the distribution assumption does not change throughout the run of the algorithm). We would like to look at the dynamics of the elite species and their effect on the probability of success and the expected convergence time.

## References

• [AS65] Milton Abramowitz and Irene Stegun. Handbook of Mathematical Functions. United States Government Printing Office, 1965.
• [CHS09] Tianshi Chen, Jun He, Guangzhong Sun, Guoliang Chen, and Xin Yao. A New Approach for Analyzing Average Time Complexity of Population-Based Evolutionary Algorithms on Unimodal Problems. IEEE Transactions on Systems, Man, and Cybernetics B, 39(5):1092–1106, 2009.
• [CTCY11] Tianshi Chen, Ke Tang, Guoliang Chen, and Xin Yao. A large population size can be unhelpful in evolutionary algorithms. In press, 2011.
• [GKP95] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley Publishing Company, 1995.
• [HY02] Jun He and Xin Yao. From an Individual to a Population: An Analysis of the First Hitting Time of Population-Based Evolutionary Algorithms. IEEE Transactions on Evolutionary Computation, 6(5):495–511, October 2002.
• [MFH92] Melanie Mitchell, Stephanie Forrest, and John H. Holland. The Royal Road for Genetic Algorithms: Fitness Landscapes and GA Performance. European Conference on Artificial Life, 1:245–254, 1992.
• [Mit96] Melanie Mitchell. An Introduction to Genetic Algorithms. MIT Press, 1996.
• [SW03] Tobias Storch and Ingo Wegener. Real Royal Road Functions for Constant Population Size. In Genetic and Evolutionary Computation Conference (GECCO) 2003, pages 1406–1417, 2003.
• [TSMH10] A. Ter-Sarkisov, S. Marsland, and B. Holland. The k-Bit-Swap: A New Genetic Algorithm Operator. In Genetic and Evolutionary Computing Conference (GECCO) 2010, pages 815–816, 2010.

## Appendix A Taylor series approximation of the integrand

We give the expression for the integrand of Equation 10 here due to its length. It is the Taylor series expansion of the integrand around the midpoint of the interval, $l = \frac{1}{2}$:

$$\phi(l) \approx \frac{4\mu^2 n^2}{\lambda\left(\frac{M}{2}+1\right)\sigma_1\sigma_3} + s_3\left(s_3\left\{\frac{s_1\sigma_2}{\sigma_4} - \frac{s_2}{\mu^4 n^2\sigma_2\left(\frac{M}{2}+1\right)\sigma_3} - \frac{16(M-2)}{\sigma_2\sigma_3^2\left(\frac{M}{2}+1\right)}\right\} - \frac{s_3(16M-32)\sigma_5}{\sigma_4\sigma_3^2} + \frac{\varphi_1}{\left(\frac{M}{2}+1\right)\sigma_1^2\sigma_3} + \frac{4(M-2)\sigma_5}{\sigma_1\sigma_3^2\sigma_4}\right)\cdot\left(l-\frac{1}{2}\right)$$

where

$$\sigma_1 = \frac{2(2\mu-1)(\mu-1)^2}{\sigma_2} + \frac{4\mu^2 n^2}{\lambda\left(\frac{M}{2}+1\right)\sigma_3} - \frac{4\mu^4 n^2}{\lambda\sigma_2\sigma_3\left(\frac{M}{2}+1\right)}$$
$$\sigma_2 = e^{\frac{\lambda(2\mu-1)^2\left(\frac{M}{2}+1\right)\sigma_3}{4\mu^4 n^2}}, \quad \sigma_3 = \frac{M}{2}+n+\kappa M-1, \quad \sigma_4 = \left(\frac{M}{2}+1\right)^2, \quad \sigma_5 = n+\kappa M-2$$
$$s_1 = 16(M-2), \quad s_2 = 4\lambda(2\mu-1)^2\left((M-2)\left(\frac{M}{2}+1\right)-\sigma_3(M-2)\right), \quad s_3 = \frac{\mu^4 n^2}{\lambda}$$
$$\varphi_1 = \frac{2(2\mu-1)^3(\mu-1)^2\left(2M+2n+2\kappa M-nM-\kappa M^2-4\right)}{s_3\sigma_2}$$

## Appendix B Additional Derivation Details

To derive the expressions in Section 4.1, we extensively used properties of independent Geometric random variables that are not identically distributed, as in the Coupon collector's problem (see e.g. [GKP95]): if $X \sim \mathrm{Geometric}(p)$, its expectation is $1/p$. Therefore, if $X_i \sim \mathrm{Geometric}(p_i)$ are independent, $E\left[\sum_i X_i\right] = \sum_i 1/p_i$.
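A quick Monte Carlo illustration of this fact (ours; the helper name and parameter values are arbitrary):

```python
import random

def mc_geometric_sum(ps, trials=20000, seed=1):
    """Estimate E[sum_i X_i] for independent X_i ~ Geometric(p_i) by
    simulation; the theoretical value is sum_i 1/p_i."""
    random.seed(seed)
    total = 0
    for _ in range(trials):
        for p in ps:
            draws = 1
            while random.random() >= p:  # count draws until first success
                draws += 1
            total += draws
    return total / trials
```

For `ps = [0.5, 0.25]` the estimate should be close to 1/0.5 + 1/0.25 = 6.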

For Equation 1 we use the Law of Total Probability twice: first conditioning on $H_j$, then on $\alpha$:

$$P(A) = \sum_{i=1}^{m} P(A\mid B_i)P(B_i) = \sum_{i=1}^{m} P(A\mid B_i)\sum_{j=1}^{n} P(B_i\mid C_j)P(C_j)$$

## Appendix C Numerical results to verify Equation 4

The experimental column was obtained by running the algorithm with different parameters 20 times; each run lasted 2000 generations. The earliest hit of the global optimum in each run was saved and then averaged over.