Influence of the Binomial Crossover on Performance of Evolutionary Algorithms

In Differential Evolution (DE) algorithms, a crossover operation filtering the variables to be mutated is employed to search the feasible region flexibly, which leads to its successful applications in a variety of complicated optimization problems. To investigate whether the crossover operator of DE is helpful to the performance improvement of evolutionary algorithms (EAs), this paper presents a theoretical analysis of the (1+1)EA_C and the (1+1)EA_CM, two variants of the (1+1)EA that incorporate the binomial crossover operator. Generally, the binomial crossover results in the enhancement of exploration and the dominance of transition matrices under some conditions. As a result, both the (1+1)EA_C and the (1+1)EA_CM outperform the (1+1)EA on the unimodal OneMax problem, but do not always dominate it on the Deceptive problem. Finally, we perform an exploration analysis by investigating the probabilities to transfer from non-optimal statuses to the optimal status of the Deceptive problem, and propose adaptive parameter settings to strengthen the promising effect of the binomial crossover. This suggests that incorporation of the binomial crossover could be a feasible strategy to improve the performances of EAs.


I Introduction

Evolutionary algorithms (EAs) demonstrate competitive performance on a large variety of complicated optimization problems; however, their efficiency deteriorates significantly as both the dimension and the size of the feasible region increase. To further improve the performance of EAs on large-scale optimization problems, the co-evolution strategy is incorporated to develop efficient cooperative coevolutionary algorithms (COEAs) [1, 2].

Careful design of cooperative co-evolution strategies does improve the performance of EAs on complicated optimization problems, even though cooperative co-evolution cannot generally guarantee global convergence of COEAs [3, 4]. Both exploitation and exploration could be enhanced by stepwise evolution of varied portions of the decision variables, which can be regarded as a subspace-restricted searching strategy imposed on the feasible region of an optimization problem.

Another instance of the subspace-restricted searching strategy is the crossover-assisted mutation employed in differential evolution (DE) algorithms. Crossover operations applied to donor vectors change only a portion of the decision variables, which contributes to DE's efficient and low-complexity search in the feasible regions of optimization problems [5, 6, 7]. Despite the excellent performance of DE on various optimization problems, convergence analyses have also demonstrated that its global convergence cannot be guaranteed in general [8, 9, 10, 11, 12].

An interesting question then arises: could the crossover operation employed in DEs be helpful to the performance improvement of EAs? Both numerical results and theoretical studies indicated that subspace-restricted searching strategies play an important role during the iteration processes of metaheuristics, but the underlying working mechanism is still an open issue to be addressed. Motivated by the theoretical research on cooperative coevolution [3], we introduce the binomial crossover operation to the individual-based (1+1)EA so as to reveal how it works during the process of iteration. The purpose of this research is twofold: on the one hand, analysis of the binomial crossover can be performed excluding the influence of population; on the other hand, we will confirm whether introduction of the binomial crossover could improve performances of EAs. The rest of this paper is organized as follows. Section II reviews theoretical studies of DEs, and some preliminary contents for theoretical analysis are presented in Section III. Then, the influence of the binomial crossover on transition probabilities is investigated in Section IV, and Section V conducts the analysis on the asymptotic performance of EAs. To reveal how the binomial crossover works on the performance of EAs for consecutive iterations, the OneMax problem and the Deceptive problem are investigated in Sections VI and VII, respectively. Finally, Section VIII presents the conclusions and discussions.

II Related Work

Although numerical investigations of DEs have been widely conducted, only a few theoretical studies paid attention to the components of DEs [13]. By estimating the probability density function of generated individuals, Zhou et al. [14] demonstrated that the selection mechanism of DE, which chooses mutually different parents for the generation of donor vectors, sometimes does not work positively on the performance of DE. Focusing on the mutation and crossover operators, Zaharie [15, 16, 17] investigated the influence of the crossover rate on both the distribution of the number of mutated components and the probability for a component to be taken from the mutant vector, as well as the influence of mutation and crossover on the diversity of the intermediate population. Wang and Huang [18] reduced DE to a one-dimensional stochastic model, and investigated how the probability distribution of the population is connected to the mutation, selection and crossover operations of DE.

Theoretical analysis was also conducted for the binary differential evolution (BDE) proposed by Gong and Tuson [19]. By investigating the expected runtime of BDE, Doerr and Zheng [20] showed that BDE optimizes the important decision variables, but has difficulty finding the optimal values of decision variables with small influence on the objective function. Since BDE generates trial vectors by implementing a binary variant of the binomial crossover accompanied by the mutation operation, it has characteristics significantly different from classic EAs or estimation-of-distribution algorithms.

The runtime of metaheuristics quantifies the computational budget needed to achieve a given approximation precision, whereas the average convergence rate (ACR) and the expected approximation error (EAE) evaluate the performances of EAs for consecutive iterations [21, 22, 23]. He and Lin [21] revealed the relation between the ACR and the spectral radius of the transition matrix, by which the asymptotic performance of an EA can be connected to the spectral radius of its transition matrix. Wang et al. [23] proposed a general framework to estimate the EAE of elitist EAs, by which the EAE can be obtained for a given iteration budget.

III Preliminaries

Consider a maximization problem

 \max f(x),  x = (x_1, …, x_n) ∈ {0,1}^n,   (1)

and denote its optimal solution and the corresponding objective value by x* and f* = f(x*), respectively. Then, the quality of a solution x can be evaluated by its approximation error e(x) = f* − f(x). Due to the finiteness of the feasible region of (1), the values of e(x) are located in a finite set:

 e(x) ∈ {e_0, e_1, …, e_L},  0 = e_0 ≤ e_1 ≤ ⋯ ≤ e_L,

where L is a positive integer confirmed by the landscape of (1). A solution x is said to be at status i if e(x) = e_i, i = 0, …, L, and we denote the collection of solutions at status i by X_i.

III-A Algorithms

The (1+1)EA presented by Algorithm 1 is taken in this study as the baseline algorithm, where candidate solutions are generated by the bitwise mutation with probability p_m. To investigate the influence of the binomial crossover, we introduce it to the (1+1)EA, getting the (1+1)EA_C and the (1+1)EA_CM illustrated in Algorithms 2 and 3, respectively. In the (1+1)EA_C, candidate solutions are generated by the binomial crossover with crossover rate CR. The (1+1)EA_CM first performs the binomial crossover with rate FR, and then employs the bitwise mutation with probability q_m to generate candidate solutions. Although the (1+1)EA_CM performs mutation after the binomial crossover, the strategy of candidate generation is indeed consistent with that of the BDE [19] under the premise that all random numbers follow the uniform distribution.
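Since Algorithms 1-3 are not reproduced in this excerpt, the following sketch reconstructs the three candidate-generation strategies from the description above. It is a minimal illustration rather than the authors' code: in particular, the donor vector supplied to the binomial crossover is assumed to be the bitwise complement of the incumbent solution, and all function names are ours.

```python
import random

def onemax(x):
    return sum(x)

def mutate(x, pm, rng):
    # bitwise mutation: flip each bit independently with probability pm
    return [b ^ (rng.random() < pm) for b in x]

def binomial_crossover(x, donor, cr, rng):
    # binomial crossover: take each component from the donor with probability cr;
    # one uniformly chosen index j_rand is always taken from the donor
    j_rand = rng.randrange(len(x))
    return [donor[i] if (i == j_rand or rng.random() < cr) else x[i]
            for i in range(len(x))]

def one_plus_one_ea(f, n, pm, budget, rng, CR=None, qm=None):
    # (1+1)EA    : candidate by bitwise mutation (CR is None)
    # (1+1)EA_C  : candidate by binomial crossover only (CR set, qm None)
    # (1+1)EA_CM : binomial crossover with rate CR = FR, then mutation with qm
    x = [rng.randrange(2) for _ in range(n)]
    for _ in range(budget):
        if CR is None:
            y = mutate(x, pm, rng)
        else:
            donor = [1 - b for b in x]   # assumption: donor = bitwise complement
            y = binomial_crossover(x, donor, CR, rng)
            if qm is not None:
                y = mutate(y, qm, rng)
        if f(y) >= f(x):                 # elitist selection
            x = y
    return x
```

With a mutation probability around 1/n and a few thousand iterations, all three variants reliably reach the optimum of OneMax for small n.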

The EAs investigated in this research can be modeled as Markov chains characterized by the error vector

 \tilde e = (e_0, e_1, …, e_L)',   (2)

the initial distribution

 \tilde q^{[0]} = (q^{[0]}_0, q^{[0]}_1, …, q^{[0]}_L)',   (3)

and the transition matrix

 \tilde R = (r_{i,j})_{(L+1)×(L+1)},   (4)

where

 r_{i,j} = \Pr\{x_{t+1} ∈ X_i \mid x_t ∈ X_j\},  i, j = 0, …, L.

Recalling that the solutions are updated by the elitist selection, we know \tilde R is upper triangular, and one can partition it as

 \tilde R = \begin{pmatrix} 1 & \mathbf r_0' \\ \mathbf 0 & R \end{pmatrix},

where R is the transition submatrix depicting the transitions between non-optimal statuses.

III-B Problems

Performance comparisons are conducted via the uni-modal OneMax problem and the multi-modal Deceptive problem.

Problem 1.

(OneMax)

 \max f(x) = \sum_{i=1}^{n} x_i,

where x = (x_1, …, x_n) ∈ {0,1}^n.

Problem 2.

(Deceptive)

 \max f(x) = \begin{cases} \sum_{i=1}^{n} x_i, & \text{if } \sum_{i=1}^{n} x_i > n−1, \\ n − 1 − \sum_{i=1}^{n} x_i, & \text{otherwise}, \end{cases}

where x = (x_1, …, x_n) ∈ {0,1}^n.

Both the OneMax problem and the Deceptive problem can be represented as

 \max f(x) = g(|x|),   (5)

where |x| = \sum_{i=1}^{n} x_i, and g is confirmed by the problem definitions. For the OneMax problem, both exploration and exploitation are helpful to the convergence of EAs to the optimal solution, because exploration accelerates the convergence process and exploitation refines the precision of approximation solutions. However, local exploitation leads to convergence to the local optimal solution of the Deceptive problem, which in turn increases the difficulty of jumping to the isolated global optimal solution. That is, exploitation hinders convergence to the global optimal solution of the Deceptive problem, and the performances of EAs are dominantly influenced by their exploration abilities.
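Both benchmark functions can be transcribed directly from Problems 1 and 2; the sketch below also evaluates the approximation error e(x) = f* − f(x). The function names are ours.

```python
def onemax(x):
    # Problem 1: f(x) = number of '1'-bits
    return sum(x)

def deceptive(x):
    # Problem 2: f(x) = |x| if |x| > n - 1 (i.e., x is all ones), else n - 1 - |x|
    n, s = len(x), sum(x)
    return s if s > n - 1 else n - 1 - s

def error(f, x, f_opt):
    # approximation error e(x) = f* - f(x)
    return f_opt - f(x)
```

For n = 5, the all-zeros string has error 1 while a string with four '1'-bits has error 5, so hill-climbing on the Deceptive landscape is pulled away from the isolated optimum.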

III-C The Transition Models of EAs

By elitist selection, a candidate y replaces the incumbent solution x if and only if f(y) ≥ f(x), which is achieved if "l preferred bits" of x are changed. If there are multiple solutions that are better than x, there could be multiple choices for both the number of mutated bits and the locations of the "preferred bits".

Example 1.

For the OneMax problem, e(x) equals the number of '0'-bits in x. Denoting i = e(y) and j = e(x), we know y replaces x if and only if i ≤ j. Then, to generate a candidate y replacing x, the "l preferred bits" can be confirmed as follows.

• If i = j, the "l preferred bits" consist of k '1'-bits and k '0'-bits, where l = 2k is an even number that is not greater than 2·min{j, n−j}.

• While i < j, the "l preferred bits" could be combinations of k + (j−i) '0'-bits and k '1'-bits (l = 2k + j − i), where k ≥ 0. Here, k + (j−i) is not greater than j, because the number of flipped '0'-bits could not be greater than j, the number of '0'-bits in x. Meanwhile, k does not exceed n − j, the number of '1'-bits in x.

If an EA flips each bit with an identical probability, the probability to flip l bits is related to l and independent of their locations. Denoting the probability to flip "l preferred bits" by P(l), we can confirm the connection between the transition probability r_{i,j} and P(l).

III-C1 Transition Probability for the OneMax Problem

As presented in Example 1, transition from status j to status i (i < j) results from flips of k + (j−i) '0'-bits and k '1'-bits. Then,

 r_{i,j} = \sum_{k=0}^{M} \binom{n−j}{k} \binom{j}{k+(j−i)} P(2k+j−i),   (6)

where M = min{i, n−j}, 0 ≤ i < j ≤ n.
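The counting behind (6) (with M = min{i, n−j}, as used below) can be cross-checked by exhaustive enumeration of all 2^n mutation masks for a small instance; the values of n and p are arbitrary choices for the check.

```python
from itertools import product
from math import comb

n, p = 6, 0.15

def P1(l):
    # probability that a specified set of l bits flips under bitwise mutation
    return p**l * (1 - p)**(n - l)

def r_formula(i, j):
    # transition probability from status j to status i (i < j), eq. (6)
    return sum(comb(n - j, k) * comb(j, k + (j - i)) * P1(2 * k + j - i)
               for k in range(min(i, n - j) + 1))

def r_enum(i, j):
    # exact enumeration: x has j '0'-bits; sum the probabilities of all
    # mutation masks whose result has exactly i '0'-bits
    x = [0] * j + [1] * (n - j)
    total = 0.0
    for mask in product([0, 1], repeat=n):
        y = [b ^ m for b, m in zip(x, mask)]
        if y.count(0) == i:
            total += P1(sum(mask))
    return total

for j in range(1, n + 1):
    for i in range(j):
        assert abs(r_formula(i, j) - r_enum(i, j)) < 1e-12
```

The two computations agree to floating-point precision for every pair i < j.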

III-C2 Transition Probability for the Deceptive Problem

According to the definition of the Deceptive problem, we get the following map from |x| to e(x).

 |x| :  0  1  ⋯  n−1  n
 e(x):  1  2  ⋯  n    0   (7)

Transition from status j to status i (i < j) is attributed to one of the following cases.

• If i > 0, the number of '1'-bits decreases from j − 1 to i − 1. This transition results from the change of k + (j−i) '1'-bits and k '0'-bits, where k ≥ 0;

• if i = 0, all of the n − j + 1 '0'-bits are flipped, and all of the j − 1 '1'-bits keep unchanged.

Accordingly, we know

 r_{i,j} = \begin{cases} \sum_{k=0}^{M} \binom{n−j+1}{k} \binom{j−1}{k+(j−i)} P(2k+j−i), & i > 0, \\ P(n−j+1), & i = 0, \end{cases}   (8)

where M = min{i−1, n−j+1}.

III-D Performance Metrics

The abilities of exploration and exploitation are directly reflected by the transition matrix, and we propose the definition of transition dominance for the case that both exploration and exploitation are enhanced.

Definition 1.

Let A and B be two EAs with an identical initialization mechanism. \tilde A = (a_{i,j}) and \tilde B = (b_{i,j}) are the transition matrices of A and B, respectively. It is said that \tilde A dominates \tilde B, denoted by \tilde A ⪰ \tilde B, if it holds that

1. a_{i,j} ≥ b_{i,j}, ∀ 0 ≤ i < j ≤ L;

2. ∀ 1 ≤ j ≤ L, ∃ 0 ≤ i < j such that a_{i,j} > b_{i,j}.

However, the transition probability does not provide a quantitative evaluation of performance over consecutive iterations. Thus, we also compare the expected approximation error (EAE) and the tail probability (TP) of EAs for consecutive iterations [24, 23].

Definition 2.

Let {x_t} be the individual sequence of algorithm A. The expected approximation error (EAE) after t consecutive iterations is

 e^{[t]}_A = E[e(x_t)] = \sum_{i=0}^{L} e_i \Pr\{e(x_t) = e_i\}.   (9)

The tail probability (TP) that e(x_t) is greater than or equal to e_i is defined as

 p^{[t]}_A(e_i) = \Pr\{e(x_t) ≥ e_i\}.   (10)

For problem (1), if both the EAE and the TP of Algorithm A are smaller than those of Algorithm B for any iteration budget, we say Algorithm A outperforms Algorithm B on problem (1).
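Both metrics follow from powers of the transition matrix applied to the initial distribution. The sketch below uses a hypothetical three-status chain (column j holds the probabilities of leaving status j; the elitist diagonal absorbs the rest); the numbers are illustrative, not taken from the paper.

```python
import numpy as np

e = np.array([0.0, 1.0, 2.0])          # error vector (e0 = 0 is the optimum)
R = np.array([[1.0, 0.3, 0.1],         # column-stochastic, upper triangular
              [0.0, 0.7, 0.2],
              [0.0, 0.0, 0.7]])
q0 = np.array([0.0, 0.5, 0.5])         # initial distribution over statuses

def eae(t):
    # expected approximation error after t iterations, eq. (9)
    return e @ (np.linalg.matrix_power(R, t) @ q0)

def tail(t, i):
    # tail probability Pr{e(x_t) >= e_i}, eq. (10)
    return (np.linalg.matrix_power(R, t) @ q0)[i:].sum()
```

Because selection is elitist (upper-triangular R with non-decreasing errors), eae(t) is non-increasing in t.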

Definition 3.

Let A and B be two EAs applied to problem (1). Algorithm A outperforms B on problem (1), denoted by A ⪰ B, if it holds that

• e^{[t]}_A ≤ e^{[t]}_B, ∀ t > 0;

• p^{[t]}_A(e_i) ≤ p^{[t]}_B(e_i), ∀ t > 0, 1 ≤ i ≤ L.

IV Influence of the Binomial Crossover on Exploration and Exploitation

In this section, we investigate the influence of the binomial crossover on exploration and exploitation by comparing the transition probabilities of the (1+1)EA, the (1+1)EA_C and the (1+1)EA_CM. According to the connections between r_{i,j} and P(l), the comparison of transition probabilities can be conducted by considering the probabilities to flip "l preferred bits".

IV-A Probabilities to Flip "l Preferred Bits"

Denote the probabilities of the (1+1)EA, the (1+1)EA_C and the (1+1)EA_CM to flip "l preferred bits" by P_1(l, p_m), P_2(l, CR) and P_3(l, FR, q_m), respectively. We know

 P_1(l, p_m) = (p_m)^l (1 − p_m)^{n−l},   (11)

 P_2(l, CR) = \frac{l}{n} (CR)^{l−1} (1 − CR)^{n−l},   (12)

and

 P_3(l, FR, q_m) = \sum_{j=0}^{n−l} \binom{n−l}{j} P_2(j+l, FR) (q_m)^l (1 − q_m)^j
          = \frac{1}{n} [l + (n−l)FR − n q_m FR] (FR)^{l−1} (q_m)^l (1 − q_m FR)^{n−l−1}.   (13)

Note that the (1+1)EA_CM degrades to the (1+1)EA when FR = 1, since P_3(l, 1, q_m) = P_1(l, q_m), and the boundary values of the parameters are excluded. Thus, we assume that p_m, CR, FR and q_m are located in (0, 1), and the fair comparison of transition probabilities is made with the identical parameter setting

 p_m = CR = FR · q_m = p,  0 < p < 1.   (14)
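The closed form in (13) and the orderings of P_1, P_2 and P_3 derived in the remainder of this subsection can be checked numerically. The values of n, p and FR below are arbitrary choices satisfying p ≤ 1/n and FR·q_m = p.

```python
from math import comb

n, p, FR = 10, 0.08, 0.5
qm = p / FR                      # so that FR * qm = p

def P1(l, pm):
    return pm**l * (1 - pm)**(n - l)

def P2(l, cr):
    return (l / n) * cr**(l - 1) * (1 - cr)**(n - l)

def P3_sum(l, fr, q):
    # summation form of eq. (13)
    return sum(comb(n - l, j) * P2(j + l, fr) * q**l * (1 - q)**j
               for j in range(n - l + 1))

def P3(l, fr, q):
    # closed form of eq. (13)
    return (1 / n) * (l + (n - l) * fr - n * q * fr) \
        * fr**(l - 1) * q**l * (1 - q * fr)**(n - l - 1)

for l in range(1, n + 1):
    assert abs(P3(l, FR, qm) - P3_sum(l, FR, qm)) < 1e-12
    assert P1(l, p) <= P3(l, FR, qm) <= P2(l, p)   # ordering for p <= 1/n
```

The two forms of P_3 agree, and the ordering P_1 ≤ P_3 ≤ P_2 holds for every l in this setting.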

IV-A1 Comparison between P_1(l, p_m) and P_2(l, CR)

Subtracting (12) from (11) by setting p_m = CR = p, we know

 P_1(l, p) − P_2(l, p) = (p − \frac{l}{n}) p^{l−1} (1 − p)^{n−l}.   (15)

Given parameter p, P_1(l, p) < P_2(l, p) if and only if l > np. Note that l is the Hamming distance between the parent x and the candidate y. By replacing the bitwise mutation with the binomial crossover, the ability of exploration of the (1+1)EA_C is enhanced at the expense of degradation of the exploitation ability. For the case that p ≤ 1/n, we get the following theorem on the increase of probabilities to flip "l preferred bits".

Theorem 1.

While p ≤ 1/n, it holds for all 1 ≤ l ≤ n that P_1(l, p) ≤ P_2(l, p).

Proof:

By (15), we know when p ≤ 1/n,

 P_1(l, p) − P_2(l, p) ≤ 0,  ∀ 1 ≤ l ≤ n,

and thus, P_1(l, p) ≤ P_2(l, p). ∎

IV-A2 Comparison between P_1(l, p_m) and P_3(l, FR, q_m)

By setting FR · q_m = p, we know q_m = p/FR. Then, equation (13) implies

 P_3(l, FR, p/FR) = \frac{1}{n} [(n−l) + \frac{l − np}{FR}] p^l (1 − p)^{n−l−1}.   (16)

Subtracting (11) from (16), we have

 P_3(l, FR, p/FR) − P_1(l, p) = \{\frac{1}{n}[(n−l) + \frac{l − np}{FR}] − (1 − p)\} p^l (1 − p)^{n−l−1}
          = (\frac{1}{FR} − 1)(\frac{l}{n} − p) p^l (1 − p)^{n−l−1}.   (17)

From the fact that 0 < FR < 1, we conclude that P_3(l, FR, p/FR) is greater than P_1(l, p) if and only if l > np. That is, the introduction of the binomial crossover in the (1+1)EA_CM leads to the enhancement of its exploration ability. Similarly, we get the following theorem for the case that p ≤ 1/n.

Theorem 2.

While p ≤ 1/n, it holds for all 1 ≤ l ≤ n that P_1(l, p) ≤ P_3(l, FR, p/FR).

Proof:

The result can be obtained directly from equation (17) by noting that l/n ≥ 1/n ≥ p. ∎

IV-A3 Comparison between P_2(l, CR) and P_3(l, FR, q_m)

By setting CR = FR · q_m = p, both P_2(l, p) and P_3(l, FR, p/FR) are greater than P_1(l, p) on condition that l > np. Then, the following lemma holds.

Lemma 1.

Given 0 < p < FR < 1, it holds

1. P_3(l, FR, p/FR) > P_2(l, p) when l < np;

2. P_3(l, FR, p/FR) < P_2(l, p) when l > np;

3. P_3(l, FR, p/FR) = P_2(l, p) when l = np.

Proof:

With q_m = p/FR, equation (13) implies

 P_3(l, FR, p/FR) = \frac{1}{n} [l \frac{p}{FR} + (n−l)p − n \frac{p^2}{FR}] p^{l−1} (1 − p)^{n−l−1}.   (18)

Subtracting (12) from (18) by letting CR = p, we know

 P_3(l, FR, p/FR) − P_2(l, p) = \frac{1}{n} [l \frac{p}{FR} + (n−l)p − n \frac{p^2}{FR}] p^{l−1} (1 − p)^{n−l−1} − \frac{l}{n} p^{l−1} (1 − p)^{n−l}
          = \frac{1}{n} (l − np)(\frac{p}{FR} − 1) p^{l−1} (1 − p)^{n−l−1}.   (19)

Then,

1. while l < np, P_3(l, FR, p/FR) − P_2(l, p) > 0, from the fact that FR is greater than p;

2. while l > np, P_3(l, FR, p/FR) − P_2(l, p) < 0;

3. while l = np, we have P_3(l, FR, p/FR) = P_2(l, p).

Lemma 1 indicates that P_3(l, FR, p/FR) > P_2(l, p) if and only if l < np, which demonstrates that the (1+1)EA_CM can exploit the local region better by incorporating the binomial crossover and the bitwise mutation together. For the case that p < 1/n, we get the following theorem by applying Lemma 1.

Theorem 3.

If p < 1/n, it holds for any 1 ≤ l ≤ n that P_3(l, FR, p/FR) < P_2(l, p).

Proof:

The result can be obtained by considering the second result of Lemma 1 for any l ≥ 1 > np. ∎

IV-B Comparison of Transition Probabilities

Denote the transition probabilities of the (1+1)EA, the (1+1)EA_C and the (1+1)EA_CM by p_{i,j}, q_{i,j} and s_{i,j}, respectively. For the OneMax problem and the Deceptive problem, we get the relation of transition dominance on the premise that p ≤ 1/n.

Theorem 4.

For the (1+1)EA, the (1+1)EA_C and the (1+1)EA_CM, denote their transition matrices by \tilde P, \tilde Q and \tilde S, respectively. On condition that p ≤ 1/n, it holds for problem (5) that

 \tilde Q ⪰ \tilde S ⪰ \tilde P.   (20)
Proof:

Denote the collection of all solutions at status i by S(i), i = 0, …, L. We prove the result by considering the transition probability

 r_{i,j} = \Pr\{y ∈ S(i) \mid x ∈ S(j)\},  0 ≤ i < j ≤ L.

Since the function values of solutions are only related to the number of '1'-bits, the probability to generate a solution y by performing mutation on x is only dependent on the Hamming distance

 l = H(x, y).

Given x ∈ S(j), S(i) can be partitioned as

 S(i) = \bigcup_{l=1}^{L} S_l(i),

where S_l(i) = \{y ∈ S(i) : H(x, y) = l\}, and L is a positive integer that is smaller than or equal to n.

Accordingly, the probability to transfer from status j to status i is confirmed as

 r_{i,j} = \sum_{l=1}^{L} \Pr\{y ∈ S_l(i) \mid x\} = \sum_{l=1}^{L} |S_l(i)| P(l),

where |S_l(i)| is the size of S_l(i), and P(l) is the probability to flip "l preferred bits". Then,

 p_{i,j} = \sum_{k=1}^{L} |S_k(i)| P_1(k, p),   (21)
 q_{i,j} = \sum_{k=1}^{L} |S_k(i)| P_2(k, p),   (22)
 s_{i,j} = \sum_{k=1}^{L} |S_k(i)| P_3(k, FR, p/FR).   (23)

While p ≤ 1/n, Theorems 1-3 imply that

 P_1(l, p) ≤ P_3(l, FR, p/FR) ≤ P_2(l, p),  ∀ 1 ≤ l ≤ n.

Combining it with (21), (22) and (23), we know

 p_{i,j} ≤ s_{i,j} ≤ q_{i,j},  ∀ 0 ≤ i < j ≤ L.   (24)

Then, we get the result by Definition 1. ∎

Example 2.

[Comparison of transition probabilities for the OneMax problem] Let 0 ≤ i < j ≤ n. By (6), we have

 p_{i,j} = \sum_{k=0}^{M} \binom{n−j}{k} \binom{j}{k+(j−i)} P_1(2k+j−i, p),   (25)
 q_{i,j} = \sum_{k=0}^{M} \binom{n−j}{k} \binom{j}{k+(j−i)} P_2(2k+j−i, p),   (26)
 s_{i,j} = \sum_{k=0}^{M} \binom{n−j}{k} \binom{j}{k+(j−i)} P_3(2k+j−i, FR, p/FR),   (27)

where M = min{i, n−j}. While p ≤ 1/n, Theorems 1-3 imply that

 P_1(2k+j−i, p) ≤ P_3(2k+j−i, FR, p/FR) ≤ P_2(2k+j−i, p),

and by (25), (26) and (27) we have

 p_{i,j} ≤ s_{i,j} ≤ q_{i,j},  ∀ 0 ≤ i < j ≤ n.
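Equations (25)-(27) specify all off-diagonal entries of the transition matrices, and elitism fixes the diagonals so that each column sums to one. The sketch below assembles the three matrices for a small OneMax instance and compares the resulting EAE curves under a uniform initial distribution; n, p and FR are illustrative values with p < 1/n.

```python
import numpy as np
from math import comb

n, p, FR = 8, 0.1, 0.5
qm = p / FR

def P1(l): return p**l * (1 - p)**(n - l)
def P2(l): return (l / n) * p**(l - 1) * (1 - p)**(n - l)
def P3(l):
    return (1 / n) * (l + (n - l) * FR - n * qm * FR) \
        * FR**(l - 1) * qm**l * (1 - qm * FR)**(n - l - 1)

def transition(P):
    # off-diagonal entries from eq. (25)-(27); the diagonal absorbs the
    # rejection probability so each column is stochastic
    R = np.zeros((n + 1, n + 1))
    R[0, 0] = 1.0
    for j in range(1, n + 1):
        for i in range(j):
            R[i, j] = sum(comb(n - j, k) * comb(j, k + (j - i)) * P(2 * k + j - i)
                          for k in range(min(i, n - j) + 1))
        R[j, j] = 1.0 - R[:j, j].sum()
    return R

Pm, Qm, Sm = transition(P1), transition(P2), transition(P3)
e = np.arange(n + 1, dtype=float)       # OneMax: error = number of '0'-bits
q0 = np.full(n + 1, 1.0 / (n + 1))      # uniform initial distribution

def eae(R, t):
    return e @ (np.linalg.matrix_power(R, t) @ q0)
```

In this setting one observes eae(Qm, t) ≤ eae(Sm, t) ≤ eae(Pm, t) for every budget t tried, consistent with the ordering of the transition probabilities.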
Example 3.

[Comparison of transition probabilities for the Deceptive problem] Let 0 ≤ i < j ≤ n. Equation (8) implies that

 p_{i,j} = \begin{cases} \sum_{k=0}^{M} \binom{n−j+1}{k} \binom{j−1}{k+(j−i)} P_1(2k+j−i, p), & i > 0, \\ P_1(n−j+1, p), & i = 0, \end{cases}   (28)

 q_{i,j} = \begin{cases} \sum_{k=0}^{M} \binom{n−j+1}{k} \binom{j−1}{k+(j−i)} P_2(2k+j−i, p), & i > 0, \\ P_2(n−j+1, p), & i = 0, \end{cases}   (29)

 s_{i,j} = \begin{cases} \sum_{k=0}^{M} \binom{n−j+1}{k} \binom{j−1}{k+(j−i)} P_3(2k+j−i, FR, p/FR), & i > 0, \\ P_3(n−j+1, FR, p/FR), & i = 0, \end{cases}   (30)

where M = min{i−1, n−j+1}. Similar to the analysis of Example 2, we know when p ≤ 1/n,

 p_{i,j} ≤ s_{i,j} ≤ q_{i,j}.

When p > 1/n, we cannot get the general results of Theorems 1-3. Since the differences among P_1, P_2 and P_3 then depend on the characteristics of problem (1), the general result of Theorem 4 does not hold any more.

V Analysis of the Asymptotic Performance

It is well known that the excellent performance of EAs is due to a good balance between exploration and exploitation. Does the increase of transition probabilities necessarily lead to improvement of the performances of EAs? To answer this question, we first investigate the asymptotic performances of EAs for a sufficiently large iteration budget t.

Definition 4.

The average convergence rate (ACR) of a randomized search heuristic (RSH) for generation t is defined as

 R_{RSH}(t) = 1 − (e^{[t]} / e^{[0]})^{1/t}.   (31)

The following lemma presents the asymptotic characteristics of the ACR, by which we get the result on the asymptotic performance of EAs.

Lemma 2.

[21, Theorem 1] Let R be the transition submatrix associated with a convergent EA. Under random initialization (all statuses are generated with positive probabilities), it holds

 \lim_{t→+∞} R_{RSH}(t) = 1 − ρ(R),   (32)

where ρ(R) is the spectral radius of R.
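Lemma 2 is easy to visualize numerically: for an upper-triangular submatrix the spectral radius is its largest diagonal entry, and the ACR of (31) approaches 1 − ρ(R). The 2×2 matrix, error values and initial distribution below are a hypothetical example, not data from the paper.

```python
import numpy as np

R = np.array([[0.7, 0.2],        # transition submatrix between the
              [0.0, 0.6]])       # non-optimal statuses (upper triangular)
e = np.array([1.0, 3.0])         # errors of the non-optimal statuses
q0 = np.array([0.5, 0.5])        # positive initial probabilities

def acr(t):
    # average convergence rate, eq. (31)
    e0 = e @ q0
    et = e @ (np.linalg.matrix_power(R, t) @ q0)
    return 1.0 - (et / e0) ** (1.0 / t)

rho = R.diagonal().max()         # spectral radius of an upper-triangular matrix
```

Here acr(t) tends to 1 − ρ(R) = 0.3 as t grows.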

Proposition 1.

If \tilde A ⪰ \tilde B, there exists T > 0 such that

1. e^{[t]}_A < e^{[t]}_B, ∀ t > T;

2. p^{[t]}_A(e_i) ≤ p^{[t]}_B(e_i), ∀ t > T, 1 ≤ i ≤ L.

Proof:

By Lemma 2, we know that for any ϵ > 0, there exists T > 0 such that

 e^{[0]} (ρ(R) − ϵ)^t < e^{[t]} < e^{[0]} (ρ(R) + ϵ)^t,  ∀ t > T.   (33)

From the fact that the transition submatrix of an RSH is upper triangular, we conclude

 ρ(R)=max{r1,1,…,rL,L}. (34)

Denote

 \tilde A = (a_{i,j}) = \begin{pmatrix} 1 & \mathbf a_0' \\ \mathbf 0 & A \end{pmatrix},  \tilde B = (b_{i,j}) = \begin{pmatrix} 1 & \mathbf b_0' \\ \mathbf 0 & B \end{pmatrix}.

While \tilde A ⪰ \tilde B, it holds

 a_{j,j} = 1 − \sum_{i=0}^{j−1} a_{i,j} < 1 − \sum_{i=0}^{j−1} b_{i,j} = b_{j,j},  1 ≤ j ≤ L.

Then, equation (34) implies that

 ρ(A)<ρ(B).

Applying it to (33) with a sufficiently small ϵ, we have

 e^{[t]}_A < e^{[t]}_B,  ∀ t > T.   (35)

Noting that the tail probability can be taken as the expected approximation error of an optimization problem with the error vector

 \tilde e = (\underbrace{0, …, 0}_{i}, 1, …, 1)',

by (35) we have

 p^{[t]}_A(e_i) ≤ p^{[t]}_B(e_i),  ∀ t > T, 1 ≤ i ≤ L. ∎

By Proposition 1 we get the following theorem for comparison of the asymptotic performances of the (1+1)EA, the (1+1)EA_C and the (1+1)EA_CM.

Theorem 5.

While p ≤ 1/n, there exists T > 0 such that

1. e^{[t]}_{(1+1)EA_C} ≤ e^{[t]}_{(1+1)EA_{CM}} ≤ e^{[t]}_{(1+1)EA}, ∀ t > T;

2. p^{[t]}_{(1+1)EA_C}(e_i) ≤ p^{[t]}_{(1+1)EA_{CM}}(e_i) ≤ p^{[t]}_{(1+1)EA}(e_i), ∀ t > T, 1 ≤ i ≤ L.

Proof:

The proof can be completed by applying (20) to Proposition 1. ∎

On condition that p ≤ 1/n, Theorem 5 indicates that after sufficiently many iterations, the (1+1)EA_C performs best for problem (1), and the (1+1)EA_CM gets better performance than the (1+1)EA. Then, what if the iteration budget is not sufficiently large?

VI Influence of the Binomial Crossover on Performances of EAs Applied to the OneMax Problem

In this section, we show that the outperformance introduced by the binomial crossover can be confirmed for the unimodal OneMax problem, which is based on the following lemma [23].

Lemma 3.

[23, Theorem 3] Let

 \tilde e = (e_0, e_1, …, e_L)',  \tilde v = (v_0, v_1, …, v_L)',

where v_0 = 0 and v_i = 1, 1 ≤ i ≤ L. If the transition matrices \tilde R = (r_{i,j}) and \tilde S = (s_{i,j}) satisfy

 s_{j,j} ≥ r_{j,j},  ∀ j,   (36)
 \sum_{l=0}^{i−1} (r_{l,j} − s_{l,j}) ≥ 0,  ∀ i < j,   (37)

it holds

 e^{[t]}_R ≤ e^{[t]}_S,  ∀ t ≥ 0.   (38)

For the EAs investigated in this study, satisfaction of conditions (36) and (37) is due to the monotonicity of transition probabilities.

Lemma 4.

When 0 < p ≤ 1/n (n ≥ 3), P_1(l, p), P_2(l, p) and P_3(l, FR, p/FR) are monotonically decreasing in l.

Proof:

When p ≤ 1/n, equations (11), (12) and (13) imply that

 \frac{P_1(l+1, p)}{P_1(l, p)} = \frac{p}{1−p} ≤ \frac{1}{n−1},   (39)

 \frac{P_2(l+1, p)}{P_2(l, p)} = \frac{l+1}{l} \cdot \frac{p}{1−p} ≤ \frac{l+1}{l} \cdot \frac{1}{n−1},   (40)