 # A Theoretical Framework of Approximation Error Analysis of Evolutionary Algorithms

In the empirical study of evolutionary algorithms, the solution quality is evaluated by either the fitness value or approximation error. The latter measures the fitness difference between an approximation solution and the optimal solution. Since the approximation error analysis is more convenient than the direct estimation of the fitness value, this paper focuses on approximation error analysis. However, it is straightforward to extend all related results from the approximation error to the fitness value. Although the evaluation of solution quality plays an essential role in practice, few rigorous analyses have been conducted on this topic. This paper aims at establishing a novel theoretical framework of approximation error analysis of evolutionary algorithms for discrete optimization. This framework is divided into two parts. The first part is about exact expressions of the approximation error. Two methods, Jordan form and Schur's triangularization, are presented to obtain an exact expression. The second part is about upper bounds on approximation error. Two methods, convergence rate and auxiliary matrix iteration, are proposed to estimate the upper bound. The applicability of this framework is demonstrated through several examples.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

In the empirical study of evolutionary algorithms (EAs), the quality of a solution is evaluated by either the fitness value or approximation error. The latter measures the fitness difference between an approximation solution and the optimal solution. The absolute error of a solution is defined by where is the fitness of the optimal solution and the fitness of  [1, 2]. The approximation error has been widely used in the empirical study of EAs in either a standard form or its logarithmic scale  [3, 4, 5, 6, 7, 8, 9, 10]. Starting from the absolute error , it is trivial to derive the fitness value where for a minimization problem and for a maximization problem. Therefore, this paper focuses on analyzing the approximation error of EAs. It is straightforward to extend related results from the approximation error to the fitness value of EAs.

Although the fitness value or approximation error has been widely adopted to evaluate the performance of EAs in computational experiments, they are seldom studied in a rigorous way. This is in shark contrast to the computational time of EAs. The latter is today’s mainstream in the theory of EAs  but in the practice, computational time is seldom applied to evaluating the performance of EAs. In order to bridge this gap between practice and theory, it is necessary to make a rigorous error analysis of EAs.

Because EAs are random iterative algorithms, the expected value, of the th generation solution , is a function of . The main research questions are two questions: (1) what is an exact expression of ? (2) if an exact expression is unavailable, what is a bound on ? He  made one of the first attempts to answer these questions. He gave an analytic expression of the approximation error for a class of (1+1) strictly elitist EAs.

This paper aims at establishing a theoretical framework of studying approximation error of EAs for discrete optimization. In the framework, EAs are modelled by homogeneous Markov chains. The analysis is divided into two parts. The first part is about exact expressions of the approximation error. Two methods, Jordan form and Schur’s triangularization, are given to study the exact expression of

. The second part is about upper bounds on the approximation error. Two methods, convergence rate and auxiliary matrix iteration, are introduced to the estimatation of the upper bound on .

The paper is arranged as follows: Section II reviews links to related work. Section III

presents preliminary definitions, notation and Markov modelling of EAs. Section

IV demonstrates the exact expression of approximation error. Section V estimates the upper bound on approximation error. Section VI summarizes the paper.

## Ii Related Work

In practice, approximation error has been widely used to evaluate the quality of solutions found by EAs [3, 4, 5, 6, 7, 8, 9, 10]. When evaluating the performance of EAs, we list solution error in a table or display error trend in a figure. Then we claim that the algorithm with the smallest value is the best one at the th generation. Approximation error is called in different names, such as, objective function error , difference from a computed solution to a known global optimum , distance from the optimum [9, 10], fitness error  or solution error [6, 7].

So far, the theoretical study of approximation error is rare in evolutionary computation. Rudolph

 proved that under the condition , the sequence converges in mean geometrically fast to , that is, .

Recently He  made one of the first attempts to obtain an analytic expression of the approximation error for a class of elitist EAs. He proved if the transition matrix associated with an EA is an upper triangular matrix with unique diagonal entries, then for any , the relative error is expressed by where

are eigenvalues of the transition matrix (except the largest eigenvalue

) and are coefficients.

He and Lin  studied the geometric average convergence rate of the error sequence , defined by

 R[t]=1−(e[t]e)1/t. (1)

Starting from , it is straightforward to draw an exact expression of the approximation error: . They estimated the lower bound on and proved if the initial population is sampled at random, converges to an eigenvalue of the transition matrix associated with an EA.

A close work is fixed budget analysis proposed by Jansen and Zarges [14, 15]. They aim to bound the fitness value within a fixed time budget. The obtained bounds usually hold within some fixed . For example, the lower and upper bounds given in [15, Theorem 9] are expressed in the form for some fixed . However, when , these lower and upper bounds go towards ; thus they become invalid bounds on for large . This observation reveals an essential difference between fixed budget analysis and approximation error analysis. In fixed budget analysis, a bound is an approximation of for some small but might be invalid for large . The expression of bounds could be a linear or exponential function of . But approximation error analysis proves that always can be upper-bounded by exponential functions of . The bound is valid for all . In this sense, approximation error analysis may be called any budget analysis.

## Iii Preliminary

### Iii-a Definitions and Notation

We consider a maximization problem:

 maxf(x), subject to x∈S, (2)

where is a fitness function such that and its definition domain is a finite state set. Let denote the maximal value of and the optimal solution set.

In evolutionary computation, an individual is a solution . A population is a collection of individuals. Let denote the population set. The fitness of a population is

A general EA for solving the above optimization problem is described in Algorithm 1. The EA is stopped once an optimal solution is found. This stopping criterion is taken for the sake of theoretical analysis. An EA is called elitist (or strictly elitist) if for any . Any non-elitist EA can be modified into an equivalent elitist EA through adding an archive individual which preserves the best found solution but does not get involved in evolution.

###### Definition 1

Given an initial state , the fitness of is denoted by (or in short thereafter) and its expected value is The absolute error of is and its expected value is An EA is called to converge in mean if for any .

### Iii-B Transition Matrix

The approximation error analysis of EAs is built upon the Markov chain modelling of EAs which can be found in existing references such as [16, 17, 18]. A similar Markov chain framework has been used to analyze the computational time of EAs in . This paper focuses on a different topic, the approximation error of EAs.

For the sake of notation, population states are indexed by . The index represents the set of optimal populations. Other indexes represent non-optimal populations. Populations are sorted according to their fitness value from high to low:

 fmax=f0>f1≥⋯≥fL=fmin,

where stands for in short. The decomposition of states is not required to satisfy . Examples 1 and 2 in next section will show this point.

The sequence is a Markov chain because is determined by

in a probability way. Furthermore we assume that the transition probability from any

to doesn’t change over . So the chain is homogeneous. The transition probability from to is denoted by

 pi,j=Pr(X[t]=i∣X[t−1]=j), i,j=0,⋯,L. (3)

Let

stand for a column vector and

for the row column with the transpose operation . The transition matrix is a matrix:

 P=(1rT0R) (4)

where is due to the stopping criterion. The vector denotes transition probabilities from non-optimal states to optimal ones. The zero-valued vector means that transition probabilities from optimal states to non-optimal ones are 0. The matrix represents transition probabilities within non-optimal states, given by

 R= ⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝p1,1p1,2p1,3⋯p1,L−1p1,Lp2,1p2,2p2,3⋯p2,L−1p2,Lp3,1p3,2p3,3⋯p3,L−1p3,L⋮⋮⋮⋮⋮pL,1pL,2pL,3⋯pL,L−1pL,L⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠. (5)

The error sequence can be written by a matrix iteration. Then we get the first exact expression of .

###### Theorem 1

Let (or in short) denote the approximation error of : and

denotes the probability distribution of

over non-optimal states . Then

 e[t]=eTRtp. (6)
###### Proof:

Let (or in short) denote the probability . Because , we have

 e[t]=∑Li=0p[t]iei=∑Li=1p[t]iei. (7)

According to the Markov chain property, for any ,

 p[t]i=∑Lj=0pi,jp[t−1]j=∑Lj=1pi,jp[t−1]j. (8)

Let . (8) is rewritten as

 p[t]=Rp[t−1]=Rtp. (9)

Then we get .

The above theorem shows that is determined by , matrix power and . Only changes over , thus it plays the most important role in expressing . (6) also reveals it is sufficient to use partial transition matrix , rather than the whole transition matrix for expressing .

### Iii-C Matrix Analysis

Matrix analysis is the main mathematical tool used in the error analysis of EAs. Several essential definitions and lemmas are listed here. Their details can be found in the textbook .

###### Definition 2

For an matrix , scalars and vectors satisfying

are called eigenvalues and eigenvectors of

respectively. A complete set of eigenvectors for is any set of linearly independent eigenvectors for . Let , which is called the spectral radius of matrix .

###### Definition 3

A matrix is called diagonalizable if there exists a matrix such that where is diagonal matrix with diagonal entries and is an eigenvalue of .

###### Lemma 1

A square matrix is diagonalizable if and only if possesses a complete set of eigenvectors.

###### Definition 4

A unitary matrix is defined to be a

complex matrix whose columns (or rows) constitute an orthonormal basis for .

###### Lemma 2

is real symmetric if and only if is orthogonally similar to a real-diagonal matrix , that is, for some orthogonal .

###### Lemma 3 (Schur’s Triangularization)

Every square matrix is unitarily similar to an upper-triangular matrix. That is, for each , there exists a unitary matrix (not unique) and an upper-triangular matrix (not unique) such that , and the diagonal entries of are the eigenvalues of .

###### Lemma 4 (Jordan Form)

For every matrix with distinct eigenvalues , there is a non-singular matrix such that

 A=Q−1JQ= ⎛⎜ ⎜ ⎜ ⎜ ⎜⎝J1J2⋱Jk⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (10)

Each Jordan block is a square matrix of the form

 Ji= ⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝λi1λi1⋱⋱λi1λi⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠ (11)

where is an eigenvalue of . Each Jordan block is a square matrix and .

## Iv Exact Expressions of Approximation Errors

In the error analysis of EAs, the perfect goal is to seek an exact expression of . This section discusses this topic.

### Iv-a Jordan Form Method

Let’s start from a simple case that transition matrix is diagonalizable. We can obtain an exact expression of as follows.

###### Theorem 2

If matrix is diagonalizable such that where matrix is diagonal matrix, then

 e[t]=∑Li=1ciλti, (12)

where denote its th diagonal entry of , , vectors and .

###### Proof:

From Theorem 1, we know Since , we get Since is a diagonal matrix whose diagonal entries are , we come to the conclusion.

This theorem claims that is a linear combination of exponential functions provided that matrix is diagonalizable. Thus, the error analysis of EAs is how to calculate or estimated eigenvalues and coefficients .

###### Example 1 (EA-BWSE on Needle-in-Haystack)

Consider the problem of maximizing the Needle-in-Haystack function,

 maxf(x)={1,if |x|=0,0,otherwise,

where and .

EA-BWSE, a (1+1) EA with bitwise mutation and strictly elitist selection (Algorithm 2), is used for solving the above maximization problem.

Let index denote the state of such that where . Then transition probabilities satisfy

 p0,0=1,p0,i=(1n)i(1−1n)n−i,pi,i=1−(1n)i(1−1n)n−i. (13)

Transition matrix is diagonal. Let denote the initial distribution of . According to Theorem 2, the approximation error

 e[t]=∑ni=1[1−(1n)i(1−1n)n−i]tpi. (14)
###### Example 2 (EA-BWNE on Needle-in-Haystack)

Consider the problem of maximizing the Needle-in-Haystack function using EA-BWNE, the (1+1) EA with bitwise mutation and non-strictly elitist selection (Algorithm 3).

Let index denote the state of such that the conversion of from binary to decimal is where . Then transition probabilities satisfy

 p0,0=1, pi,j=pj,i, ∀i,j≠0 (15)

Since transition matrix is symmetric, it is diagonalizable. According to Theorem 2, Theorem 2 reveals that is a linear combination of exponential functions . However, it is still difficult to calculate eigenvalues and coefficients due to the difficulty in obtaining and .

No matter whether matrix is diagonalizable or not, it can be represented by a Jordan form. Previously the method of Jordan form was used to bound the probability distribution of solutions co verging towards a stationary distribution [16, 17], that is, where and is the limit of . Suzuki  derived a lower bound on

for simple genetic algorithms through analysing eigenvalues of the transition matrix. Schmitt and Rothlauf

 found that the convergence rate of is determined by the spectral radius of matrix .

In the current paper, we aim to derive an exact expression of using the Jordan form method.

###### Lemma 5

Let be the Jordan form of . Then

 e[t]=eTQ−1JtQp. (16)
###### Proof:

From Jordan form: , we get . Inserting this expression into (6), we get the desired conclusion.

From (16), we see that in order to obtain an exact expression of , we need to represent . This is given in the following theorem.

###### Theorem 3

For any matrix , the approximation error

 e[t]=∑ki=1∑Lim=1cim(tLi−m+1)λt−m+1i, (17)

where the coefficient

 cim= ∑Li−m+1j=1aijbij−m+1, m=1,⋯,Li, (18)

and and are given by (21). Let the binomial coefficient if .

###### Proof:

We assume that matrix consists of Jordan blocks

 ⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝Jt1Jt2⋱Jtk⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠. (19)

Let and write it into . Let and write it into . Then (16) can be rewritten as

 e[t]=∑ki=1aTiJtibi (20)

Denote vectors

 aTi=(ai1,⋯,aiLi), bi=(bi1,⋯,biLi)T. (21)

Consider the component in (20). Each Jordan block power equals to [19, pp. 618]

 Jti= ⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝λti(t1)λt−1i(t2)λt−2i⋯(tLi)λt−Liiλti(t1)λt−1i⋯(tLi−1)λt−Li+1i⋱⋱⋮⋱(t1)λt−1iλti⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠.

Inserting it into , we get that equals to

 Li∑m=1Li−m+1∑j=1aijbij−m+1(tLi−m+1)λt−m+1i. (22)

Then we have

 aTiJtibi=Li∑m=1cim(tLi−m+1)λt−m+1i, (23)

where coefficients is given by (18).

The approximation error is the summation of all from to , which equals to

 e[t]=k∑i=1Li∑m=1cim(tLi−m+1)λt−m+1i. (24)

The above is the desired result.

Theorem 3 reveals the exact expression of consisting of three parts:

1. Exponential terms . Each term is an exponential function of where each is an eigenvalue of .

2. Constant Coefficients . They are independent of . (18) shows that they are determined by vectors , and the size of Jordan block .

3. Binomial coefficients . Since , each coefficient is a polynomial function of and its order is up to . Binomial coefficients are only related to the size of Jordan block .

Because of the difficulty of obtaining Jordan form of transition matrices, it is hard to generate an exact expression of in practice.

As a direct consequence of Theorem 3, we get the sufficient and necessary condition of convergence of EAs.

if and only if .

### Iv-B Shur’s Decomposition Method

Alternately matrix power can be represented using Schur’s triangularisation. Then we obtain another exact expression of . Let’s start from a simple case that matrix is upper triangular with distinct eigenvalues . The analysis is based on power factors of a matrix .

###### Definition 5

For an upper triangular matrix , its power factors, (where ), are defined as follows:

 (25)

Using power factors of , we can obtain an explicit expression of the approximation error as shown in the theorem below.

###### Theorem 4

If matrix is upper triangular with distinct eigenvalues , then

 e[t]=∑Lk=1∑Li=1∑Lj=ieipi,j,kpjckλt−1k. (26)

The proof of this theorem is almost the same as that of  [1, Theorem 1] just with minor notation change.

Theorem 4 is a special case of Theorem 2 because distinct eigenvalues means matrix is diagonalizable.

###### Example 3 (EA-OBSE on OneMax)

Consider the problem of maximizing the OneMax function,

 maxf(x)=|x|, x∈{0,1}n.

EA-OBSE, a (1+1) EA with onebit mutation and strictly elitist selection (Algorithm 4), is used for solving the above maximization problem.

Let index denote the state of such that where . The error . Then transition probabilities satisfy

 p0,0=1,pi,i+1=i+1n,pi,i=1−in. (27)

Transition matrix is upper-triangular. Its power factors, (where ), are calculated as follows:

 pi,j,k=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩1−jn,if i=j=k,0,if kj,jj−kpi,j−1,k,if i≤k

Given an initial distribution , according to Theorem 4, the approximation error

 e[t]=L∑k=1L∑i=1L∑j=i(n−i)pi,j,kpj(1−kn)t−1. (29)

(29) is a closed-form expression of , which contains constants, variables, elementary arithmetic operation () and finite sums. It can be simplified to . The expression is also given by a much simpler method in Example 5.

###### Example 4 (EA-OBSE on Mono)

Consider EA-OBSE for maximizing a monotonically increasing function,

 maxf(x), x∈{0,1}n, (30)

where satisfies if ; if .

Let index denote the state of such that where . The error . The transition matrix is the same as the above example. Similarly, the approximation error

 e[t]=n∑k=1n∑i=1n∑j=i[f(n)−f(n−i)]pi,j,kpj(1−kn)t−1.

Table I shows the exact expression of and on , and when and . Note that coefficients vary on these functions. Some coefficients are positive and some are negative.

If matrix is not upper triangular, Schur’s triangularisation states that is unitarily similar to an upper triangular matrix.

###### Lemma 6

Let be Schur’s triangularisation of matrix , where is a unitary matrix and an upper triangular matrix. Then

 e[t]=eTU∗TtUp. (31)
###### Proof:

From and where is a unit matrix, we get . Inserting it into (6), we get the desired conclusion.

We need to express the matrix power in (31). For any upper triangular matrix , its power can be expressed by the entries of  [21, 20]. The following theorem is based on .

###### Theorem 5

Let be Schur’s triangularisation of matrix , then

 e[t]=∑Li=1ait[t]ijbj. (32)

where vectors is and is . is the th entry of matrix given by

 t[t]i,j=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩λti,if % i=j,j−i∑m=1∑αi∈A[l]⎛⎜ ⎜⎝m∏l=1tαlαl+1∑βk∈B[t−m][m]m+1∏k=1λβkαk⎞⎟ ⎟⎠,if ij. (33)

where is an eigenvalue of matrix . The index set where indexes are positive integers and satisfy . The index set where indexes are non-negative integers and their sum satisfies .

###### Proof:

Since matrix is upper triangular, applying [22, Theorem 2.4] to , we get the expression (33) of . Inserting (33) to (31), we have the desired result.

The above theorem gives another exact but complicated expression of . Because of Schur’s triangularisation, it is hard to generate an exact expression of in practice.

Summarizing this section, we have demonstrated the exact expression of through two methods, albeit the difficulty in obtaining Jordan form and Schur’s triangularisation.

## V Upper Bounds on Approximation Error

For many EAs, it is complex to obtain an exact expression of . Therefore, a more reasonable goal is to seek an upper bound on . A lower bound on is less interesting because a trivial lower bound always exists: .

### V-a Convergence Rate Method

Unlike an exact expression of , it is rather simple to obtain an upper bound on . A trivial upper bound is

 e[t]≤max{e(i);i=0,⋯,L}. (34)

Of course, this upper bound is loose and unsatisfied. A better upper bound can be derived from the convergence rate of EAs.

###### Definition 6

Given an error sequence , its normalized convergence rate is if

The above rate takes value from and it can be regarded as the convergence speed. The larger value, the faster convergence. Based on this rate, we get an upper bound on . The theorem below is similar to [12, Theorem 2] but its calculation is more accurate.

###### Theorem 6

Given an error sequence , define drift where . If for any , then

 e[t]≤e[1−mini=1,⋯,LΔe(i)e(i)]t. (35)
###### Proof:

We assume that where . From

 e[t]=e[t−1]−Δe(i)≤e[t−1]−Δe(i),

we get

 e[t]e[t−1]≤1−Δe(i)e(i)≤1−mini=1,⋯,LΔe(i)e(i). (36)

We have

 e[t]e[t−1]≤1−mini=1,⋯,LΔe(i)e(i),

then get the required result.

Let

 λ′:=1−mini=1,⋯,LΔe(i)e(i).

We show that where is the spectral radius of matrix . For the sake of analysis, we assume 111The analysis of is similar. The proof needs an extended Collatz-Wielandt formula [19, p. 670]. We omit it in this paper.. From

 1−Δe(i)e(i)=[eTR]i[e]i,

according to the Collatz-Wielandt formula [19, p. 669], we get

 λmax=mine:e>0maxi:1≤i≤L[eTR]iei,

Then . Thus can be written as for some non-negative . The above theorem implies

 e[t]≤c(λmax+ϵ)t. (37)

It is worth mentioning that multiplicative drift analysis  also applies the convergence rate to estimating the hitting time, . However, multiplicative drift analysis and approximation error analysis discuss two different topics. The former aims at an upper-bound on hitting time while the latter at an upper bound on approximation error.

The convergence rate provides a simple method of estimating and . Its applicability is shown through several examples.

###### Example 5 (EA-OBSE on OneMax)

Consider EA-OBSE on the OneMax function. Let denote the state of such that . Then . We assume that where , that is, includes zero-valued bits. The probability of reducing zero-valued bit is .

 Δe(i)=in. (38) Δe(i)e(i)=1n. (39)

Then we get

 e[t]=(1−1n)te. (40) f[t]=n−e(1−1n)t. (41)

(