 # Estimating Approximation Errors of Elitist Evolutionary Algorithms

When EAs are unlikely to locate precise global optimal solutions with satisfactory performances, it is important to substitute the hitting time/running time analysis with another available theoretical routine. In order to bring theories and applications closer, this paper is dedicated to perform an analysis on approximation error of EAs. First, we proposed a general result on upper bound and lower bound of approximation errors. Then, several case studies are performed to present the routine of error analysis, and consequently, validate its applicability on cases generating transition matrices of various shapes. Meanwhile, the theoretical results also show the close connections between approximation errors and eigenvalues of transition matrices. The analysis validates applicability of error analysis, demonstrates significance of estimation results, and then, exhibits its potential to be applied for theoretical analysis.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

For theoretical analysis, convergence performance of evolutionary algorithms (EAs) is widely evaluated by the expected first hitting time (FHT) and the expected running time (RT) , which quantify the respective numbers of iteration and function evaluations (FEs) to hit the global optimal solutions. General methods for estimation of FHT/RT have been proposed by theories of Markov chains [8, 4], drift analysis[12, 5], switch analysis  and application of them via partition of fitness levels , etc.

Although popularly employed in theoretical analysis, simple application of FHT/RT is not practical when the optimal solutions are difficult to hit. One of these “difficult” cases is optimization of continuous problems. Optimal sets of continuous optimization problems are usually zero-measure set, which could not be hit by generally designed EAs in finite time, and so, FHT/RT could be infinity for most cases. A remedy to this difficulty is to take a positive-measure set as the destination of population iteration. So, it is natural to take an approximation set for a given precision as the hitting set of FHT/RT estimation [3, 14, 25, 2]

. Another “difficult” case is the optimization of NP-complete (NPC) problems that cannot be solved by EAs in polynomial running time. For this case, the exponential FHT/RT is not interesting to algorithm developers, and it is much more important to investigate the expected FHT/RT to obtain approximate solutions. By investigating various simplified EAs and NPC combinatorial optimization problems, researchers estimated the approximation ratio these EAs can achieved in polynomial expected FHT/RT

[24, 17, 26, 27, 22, 20].

However,the aforementioned methods could become impractical once we have little information about global optima of the investigated problems, and then, it is difficult to “guess” what threshold can result in polynomial FHT/RT. Since the approximation error after a given iteration number is usually employed to numerically compared performance of EAs, some researchers tried to analyze EAs by theoretically estimating the expected approximation error. Rudolph  proved that under the condition , the sequence converges in mean geometrically to , that is, . He and Lin  studied the geometric average convergence rate of the error sequence , defined by Starting from , it is straightforward to claim that .

A close work to analysis of approximation error is fixed budget analysis proposed by Jansen and Zarges [15, 16], who aimed to bound the fitness value within a fixed time budget . However, Jansen and Zarges did not present general results for any time budget . In fixed budget analysis, a bound of approximation error holds for some small but might be invalid for a large one. He  made a first attempt to obtain an analytic expression of the approximation error for a class of elitist EAs. He proved if the transition matrix associated with an EA is an upper triangular matrix with unique diagonal entries, then for any , the relative error is expressed by where are eigenvalues of the transition matrix (except the largest eigenvalue ) and are coefficients. He et al.  also demonstrated the possibility of approximation estimation by estimating one-step convergence rate , however, it was not sufficient to validate its applicability to other problems because only two studied cases with trivial convergence rates are investigated.

This paper is dedicated to present estimation of approximation error depending on any iteration number . We make the first attempt to perform error analysis by a general method, and demonstrate its feasibility by case studies. Rest of this paper is presented as follows. Section 2 presents some preliminaries. In Section 3, a general result on the upper and lower bounds of approximation error is proposed, and some case studies are performed in Section 4. Finally, Section 5 concludes this paper.

## 2 Preliminaries

In this paper, we consider a combinatorial optimization problem

 minf(x), (1)

where has only finite available values. Denote its optimal solution as , and the corresponding objective value . Quality of a feasible solution is quantified by its approximation error . Since there are only finite solutions for (1), there exist finite feasible values of , denoted as . Obviously, the minimum value is the approximation error of the optimal solution , and so, takes the value 0. If for a feasible solution , we call that is located at the status . Then, there are totally statuses for all feasible solutions. Status consists of all optimal solutions, called the optimal status; other statuses are the non-optimal statuses.

An elitist EA described in Algorithm 1 is employed to solve problem (1). When the one-bit mutation is employed, it is called a random local search (RLS); if the bitwise mutation is used, it is named as a (1+1) evolutionary algorithm ((1+1)EA). Then, the error sequence is a Markov Chain

. Assisted by the initial probability distribution of individual status

, the evolution process of (1+1) elitist EA can be depicted by the transition probability matrix

 P=⎛⎜ ⎜⎝p0,0p0,1⋯p0,n⋮⋮⋮⋮pn,0pn,1⋯pn,n⎞⎟ ⎟⎠, (2)

where is the probability to transfer from status to status .

Partition the transition probability matrix as

 P=(p0,0p00R), (3)

where , ,

 R=⎛⎜⎝p1,1…p1,n⋯⋯⋯pn,1…pn,n⎞⎟⎠. (4)

Thus, the expected approximation error at the iteration  is

 e[t]=eRtq, (5)

where , , is the sub-matrix of transition probability between non-optimal statuses. Then, in the following we only consider the transition submatrix for estimation of approximation error.

Since an elitist strategy is employed, the transition matrix is upper triangular. Then, is upper triangular, either. According to the shape of , we can further divide searching process of the elitist EA into two different categories.

1. Step-by-step Search: If the transition probability satisfies

 {pi,j=0, if i≠j−1,j,pj−1,j+pj,j=1,j=1,…,n. (6)

it is called a step-by-step search. For this case, the elitist EA cannot transfer between non-optimal statues that are not adjacent to each other, and the transition submatrix is

 R=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝p1,1p1,2⋱⋱pn−1,n−1pn−1,npn,n⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (7)
2. Multi-step Search: If there exists some such that , we called it a multi-step search. A multi-step search can transfer between inconsecutive statuses, which endows it with better global exploration ability, and sometimes, better convergence rate.

Note that this classification is problem-dependent because the statuses are defined via the problem to be optimized. So, the RLS could be either a step-by-step search or a multi-step search. However, the (1+1)EA is necessarily a multi-step search, because the bitwise mutation can jump between any two statuses. When in (3) is non-zero, column sums of submatrix is less than 1 which means it could probably jump from at least one non-optimal status directly to the optimal status. So, a step-by-step search represented by (7) must satisfies .

## 3 Estimation of General Approximation Bounds

### 3.1 General Bounds of the Step-by-step Search

Let be the submatrix of a step-by-step search. Its eigenvalues are , , which represents the probability of remaining at the present status after one-step iteration. Then, it is rational to declare that greater the eigenvalues are, slower the step-by-step search converges. Inspired by this idea, we can estimate general bounds of a step-by-step search by enlarging and reducing the eigenvalues. Achievement of the general bounds is based on the following lemma.

###### Lemma 1

Denote

 ft(e,λ1,…,λn)=(ft,1(e,λ1,…,λn),…,ft,n(e,λ1,…,λn))=eRt. (8)

Then, is monotonously increasing with , .

###### Proof

This lemma could be proved by mathematical induction.

1. When , we have

 f1,i(e,λ1,…,λn)={e1λ1,i=1,ei−1(1−λi)+eiλi,i=2,…,n.

Note that is not greater than 1 because it is an element of the probability matrix. Then, from the truth that , we conclude that is monotonously increasing with , . Meanwhile,

 0≤f1,1(e,λ1,…,λn)≤e1≤⋯≤f1,n(e,λ1,…,λn)≤en. (9)
2. Suppose that the result hold for , that is,

 0≤fk,i(e,λ1,…,λn)≤fk,i+1(e,λ1,…,λn),∀i∈{1,…,n−1}, (10)

and is monotonously increasing with for all . First, the monotonicity implies thatp

 ∂∂λjfk,i(e,λ1,…,λn)≥0,∀i,j∈{1,…,n}. (11)

Meanwhile, definition (8) implies , that is,

 fk+1,i(e,λ1,…,λn) = {fk,1(e,λ1,…,λn)λ1,i=1,fk,i−1(e,λ1,…,λn)(1−λi)+fk,i(e,λ1,…,λn)λi,i=2,…,n.

So, ,

 ∂∂λjfk+1,i(e,λ1,…,λn) = ⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩∂∂λjfk,1(e,λ1,…,λn)λ1+fk,1(e,λ1,…,λn)∂λ1∂λj,i=1,∂∂λjfk,i−1(e,λ1,…,λn)(1−λi)+∂∂λjfk,i(e,λ1,…,λn)λi+(fk,i(e,λ1,…,λn)−fk,i−1(e,λ1,…,λn))∂λi∂λj,i=2,…,n. (12)

Combining (10), (11) and 2, we know that

 ∂∂λjfk+1,i(e,λ1,…,λn)≥0,∀i,j∈{1,…,n},

which means is monotonously increasing with for all .

In conclusion, is monotonously increasing with , .

If we enlarge or shrink all eigenvalues to the maximum value and the minimum value, respectively, we can get two transition submatrices and , where

 R(λ)=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝λ1−λ⋱⋱λ1−λλ⎞⎟ ⎟ ⎟ ⎟ ⎟⎠, (13)

, . Then, depicts a searching process converging slower than the one represents, and is the transition matrix of a process converging faster than what represents.

###### Theorem 3.1

The expected approximation error is bounded by

 eRt(λmin)q≤e[t]≤eRt(λmax)q. (14)
###### Proof

Since , Lemma 1 implies that is also monotonously increasing with , . So, we get the conclusion that

 eRt(λmin)q≤e[t]≤eRt(λmax)q.

Theorem 3.1 provides a general result about the upper and the lower bounds of approximation error. From the above arguments we can figure out that the lower bounds and the upper bounds can be achieved once the transition submatrix degenerates to and , respectively. That is to say, they are indeed the “best” results about the general bounds. Recall that . Starting from the status, is the probability that the (1+1) elitist EA stays at the status after one iteration. Then, greater is, harder the step-by-step search transfers to the sub-level status . So, performance of a step-by-step search depicted by , for the worst case, would not be worse than that of ; meanwhile, it would not be better than that of , which is a bottleneck for improving performance of the step-by-step search.

### 3.2 General Upper Bounds of the Multi-step Search

Denoting the transition submatrix of a multi-step search as

 RM=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝p1,1p1,2…p1,n−1p1,n⋱⋱⋮pn−1,n−1pn−1,npn,n⎞⎟ ⎟ ⎟ ⎟ ⎟⎠, (15)

we can bound its approximation error by defining two transition matrices

 RSu=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝p1,1∑1k=0pk,1⋱⋱pn−1,n−1∑n−1k=0pk,npn,n⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠ (16)

and

 RSl=diag(p1,1,…,pn,n). (17)
###### Lemma 2

Let , and be the transition matrix defined by (15), (16) and (17

), respectively. Given any sorted vector

satisfying and the corresponding initial distribution , it holds that

 eRSltq≤eRMtq≤eRSutq,∀t>0.
###### Proof

It is trivial to prove that . Because has part of non-zero elements of , is a partial sum of . Since all items of is nonnegative, it holds that . The second inequality can be proved by mathematical induction.

1. When , denote

 a =(a1,…,an)=eRM, (18) b =(b1,…,bn)=eRSu, (19)

where . Combining with the fact that , we conclude that . Then, it holds that

 eRMq=n∑i=1aipi≤n∑i=1bipi=eRSuq.
2. Suppose that the result holds when , that is, . Because , it holds that

 eRMk+1q=eRMRMkq=aRMkq≤bRMkq. (20)

Meanwhile, because , we know . Then, the assumption implies that

 bRMkq≤bRSukq.

Combining it with (20), we can conclude that

 eRMk+1q≤bRMkq≤bRSkq=eRSuk+1q.

So, the result also holds for the case .

In conclusion, it holds that .

###### Theorem 3.2

The approximation error of the multi-step search defined by (15) is bounded by

 eRSltq≤e[t]≤eRt(λmax)q, (21)

where .

###### Proof

From Lemma 2 we know that

 eRSltq≤eRMtq≤eRStq,∀t>0. (22)

Moreover, By Theorem 3.1 we know that

 e[t]=eRStq≤eR(λmax)q. (23)

Combing (22) and (23) we get the theorem proved.

### 3.3 Analytic Expressions for Computation of General Bounds

Theorems 3.1 and 3.2 show that computation of general bounds for approximation errors is based on the computability of and , where and are defined by (13) and (17), respectively.

1. Analytic Expression of : The submatrix can be split as , where

 λ=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝λ⋱λλ⎞⎟ ⎟ ⎟ ⎟ ⎟⎠,B=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝01−λ⋱⋱01−λ0⎞⎟ ⎟ ⎟ ⎟ ⎟⎠.

Because multiplication of and is commutative, the binomial theorem  holds and we have

 Rt(λ)=(Λ+B)t=t∑i=0CitΛt−iBi, (24)

where

 Λt−i=diag{λt−i,…,λt−i}, (25)

Note that is a nilpotent matrix of index  111In linear algebra, a nilpotent matrix is a square matrix such that for some positive integer . The smallest such is called the index of  ., and

 Bi=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝0…(1−λ)i⋱⋱⋱⋱⋱(1−λ)i⋱⋮0⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,i

Then, (25), (26) and (24) imply that

1. if ,

 eRt(λ)q= t∑j=1j∑i=1eiCj−itλt−(j−i)(1−λ)j−iqj + n∑j=t+1j∑i=j−teiCj−itλt−(j−i)(1−λ)j−iqj. (27)
2. if ,

 eRt(λ)q=n∑j=1j∑i=1eiCj−itλt−(j−i)(1−λ)j−iqj. (28)
2. Analytic Expression of : For the diagonal matrix , it holds that

 eRSltq=n∑i=1eipti,iqi=n∑i=1eiλtiqi. (29)

## 4 Case-by-case Estimation of Approximation Error

In section 3 general bounds of approximation error are obtained by ignoring most of elements in the sub-matrix . Thus, these bounds could be very general but not tight. In this section, we would like to perform several case-by-case studies to demonstrate a possible unverse method for error analysis, where the RLS and (1+1)EA are employed solving the popular OneMax problem and the Needle-in-Haystack prlbem.

###### Problem 1

(OneMax)

 maxf(x)=d∑i=1xi,x=(x1,…,xn)∈{0,1}n.
###### Problem 2

(Needle-in-Haystack)

 maxf(x)=⎧⎪ ⎪⎨⎪ ⎪⎩1,if d∑i=1xi=0,0,otherwise.x=(x1,…,xn)∈{0,1}n.

### 4.1 Error Estimation for the OneMax Problem

Application of RLS on the unimodal OneMax problem generates a step-by-step search, the transition submatrix of which is

 RS=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝1−1/n2/n⋱⋱1/n10⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (30)

Eigenvalues and corresponding eigenvectors of

are

 λ1=1−1/n,η1=(C11,0,…,0)T,λ2=1−2/n,η2=(−C12,C22,0,…,0)T,…,λn=0,ηn=((−1)n+1C1n,(−1)n+2C2n,…,(−1)2n)Cnn)T. (31)

Note that has distinct eigenvalues, and so, can be diagonalized . Then, we would like to estimate the approximation error by diagonalizing the transition matrix .

###### Theorem 4.1

The expected approximation error of RLS for the OneMax problem is

 e[t]=n2(1−1n)t. (32)
###### Proof

Denote . Then we know that

 Q−1=(q′i,j)n×n=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝C11C12C13⋯C1nC22C32⋯Cn2⋱⋮Cnn⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (33)

has distinct eigenvalues, and so, can be diagonalized as  . Then, we have

 e[t]=eRtq=eQΛtQ−1q=aΛtb, (34)

where , ,

 ak=k∑i=1eiqi,k=k∑i=1i(−1)i+kCik={1,k=1,0,k=2,…,n, (35) bk=n∑j=kqi,kpj=n∑j=kCkjCjn12n=Ckn2k,k=1,…,n.

Substituting (35) into (34) we get the result

 e[t]=a1λt1b1=n2(1−1n)t.
###### Theorem 4.2

The expected approximation error of (1+1)EA for the OneMax problem is bounded from above by

 e[t]≤n2[1−1ne]t. (36)
###### Proof

According to the definition of population status, we know that the status index is the number of 0-bits in . Once one of 0-bits is flip to 1-bit and all 1-bits keep unchanged, the generated solution will be accepted, and the status transfers from to . Recalling that probability this case happen is , we know that

 pi−1,i≥in(1−1n)n−i≥ine,i=1,…,n.

Denote

 RS=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝1−1ne2ne⋱⋱1−n−1ne10⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,

and we know that

 e[t]≤eRStq. (37)

With distinct eigenvalues, can be diagonalized:

 P−1RSP=Λ, (38)

where , . and are the eigenvalues and the corresponding eigenvectors:

 λ1=1−1/(ne),η1=(C11,0,…,0)T,λ2=1−2/(ne),η2=(−C12,C22,0,…,0)T,…,λn=0,ηn=((−1)n+1C1n,(−1)n+2C2n,…,(−1)2n)Cnn)T. (39)

It is obvious that is invertible, and its inverse is

 P−1=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝C11C12C13⋯C1nC22C32⋯Cn2⋱⋮Cnn⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (40)

Similar to the result illustrated in (35), we know that

 eP=(1,0,…,0)T,P−1q=(C2n2,C2n22,…,Cn−1n2n−1,12n)T. (41)

Combing (37), (38), (39), (40) and (41) we know that

 e[t]≤ePΛtP−1q=n2[1−1ne]t.

### 4.2 Error Estimation for the Needle-in-Haystack Problem

Landscape of the Needle-in-Haystack problem has a platform where all solutions have the same function value , and only the global optimum has a non-zero function value . For this problem, the status is defined as total number of 1-bits in a solutions .

###### Theorem 4.3

The expected approximation error of RLS for the Needle-in-Haystack problem is bounded by

 (1−1n)t+1−n+12n≤e[t]≤(1−1en)t+1−n+12n. (42)
###### Proof

When the RLS is employed to solve the Needle-in-Haystack problem, the transition submatrix is

 RS=diag(1−1n(1−1n)n−1,1,…,1). (43)

Then,

 e[t]=eRSq=n∑i=1eipti,ipi=[1−1n(1−1n)n−1]t+n∑i=2Cin2n. (44)

Since

 (1−1n)t≤[1−1n(1−1n)n−1]t≤(1−1en)t,∑ni=2Cin2n=1−C0n2n−C1n2n=1−n+12n,

we can conclude that

 (1−1n)t+1−n+12n≤e[t]≤(1−1en)t+1−n+12n.

Both the upper bound and the lower bound converge to the positive . Because the RLS only searches adjacent statuses, it cannot converge to the optimal status once the initial solution is not located at statuses .

###### Theorem 4.4

The expected approximation error of (1+1)EA for the Needle-in-Haystack problem is bounded by

 n2(1−1n)t≤e[t]≤n2(1−1nn)t. (45)
###### Proof

When the (1+1)EA is employed to solve the Needle-in-Haystack problem, the transition probability matrix is

 RS=diag(1−1n(1−1n)n−1,…,1−(1n)n). (46)

Then,

 e[t]=eRSq=n∑i=1eipti,ipi=n∑i=1i[1−(1n)i(1−1n)n−i]tCin2n. (47)

Since

 ∑ni=1i[1−(1n)i(1−1n)n−i]tCin2n≥(1−1n)t∑ni=1iCi