In the empirical study of evolutionary algorithms (EAs), the quality of a solution is evaluated by either its fitness value or its approximation error. The latter measures the fitness difference between an approximate solution and the optimal solution. The absolute error of a solution $x$ is defined by $e(x) = |f(x^*) - f(x)|$, where $f(x^*)$ is the fitness of the optimal solution $x^*$ and $f(x)$ the fitness of $x$ [1, 2]. The approximation error has been widely used in the empirical study of EAs, either in its standard form or on a logarithmic scale [3, 4, 5, 6, 7, 8, 9, 10]. Starting from the absolute error $e(x)$, it is trivial to derive the fitness value: $f(x) = f(x^*) + e(x)$ for a minimization problem and $f(x) = f(x^*) - e(x)$ for a maximization problem. Therefore, this paper focuses on analyzing the approximation error of EAs. It is straightforward to extend related results from the approximation error to the fitness value of EAs.
Although the fitness value and approximation error have been widely adopted to evaluate the performance of EAs in computational experiments, they are seldom studied in a rigorous way. This is in stark contrast to the computational time of EAs. The latter is today's mainstream topic in the theory of EAs, but in practice computational time is seldom applied to evaluating the performance of EAs. In order to bridge this gap between practice and theory, it is necessary to make a rigorous error analysis of EAs.
Because EAs are random iterative algorithms, the expected approximation error $e^{(t)}$ of the $t$-th generation solution $X^{(t)}$ is a function of $t$. Two main research questions arise: (1) what is an exact expression of $e^{(t)}$? (2) if an exact expression is unavailable, what is an upper bound on $e^{(t)}$? He made one of the first attempts to answer these questions, giving an analytic expression of the approximation error for a class of (1+1) strictly elitist EAs.
This paper aims at establishing a theoretical framework for studying the approximation error of EAs for discrete optimization. In this framework, EAs are modelled by homogeneous Markov chains. The analysis is divided into two parts. The first part is about exact expressions of the approximation error: two methods, the Jordan form and Schur's triangularization, are given to derive exact expressions of $e^{(t)}$. The second part is about upper bounds on the approximation error: two methods, convergence rate and auxiliary matrix iteration, are introduced to estimate upper bounds on $e^{(t)}$.
The rest of this paper is organized as follows. Section II reviews related work. Section III presents preliminary definitions, notation and the Markov chain modelling of EAs. Section IV demonstrates exact expressions of the approximation error. Section V estimates upper bounds on the approximation error. Section VI summarizes the paper.
II Related Work
In practice, the approximation error has been widely used to evaluate the quality of solutions found by EAs [3, 4, 5, 6, 7, 8, 9, 10]. When evaluating the performance of EAs, we list solution errors in a table or display error trends in a figure, and then claim that the algorithm with the smallest error at the $t$-th generation is the best one. The approximation error goes by different names, such as objective function error, difference from a computed solution to a known global optimum, distance from the optimum [9, 10], fitness error, or solution error [6, 7].
So far, the theoretical study of the approximation error is rare in evolutionary computation. Rudolph proved that under the condition $E[e^{(t+1)} \mid e^{(t)}] \le c\, e^{(t)}$ for some $c \in (0,1)$, the sequence $\{e^{(t)}\}$ converges in mean geometrically fast to $0$, that is, $e^{(t)} = O(c^t)$.
Recently He made one of the first attempts to obtain an analytic expression of the approximation error for a class of elitist EAs. He proved that if the transition matrix associated with an EA is an upper triangular matrix with unique diagonal entries, then for any $t$, the relative error can be expressed as
$$\frac{e^{(t)}}{e^{(0)}} = \sum_{k} c_k \lambda_k^t,$$
where the $\lambda_k$ are eigenvalues of the transition matrix (except the largest eigenvalue) and the $c_k$ are coefficients.
He and Lin studied the geometric average convergence rate of the error sequence $\{e^{(t)}\}$, defined by
$$R^{(t)} = 1 - \left(\frac{e^{(t)}}{e^{(0)}}\right)^{1/t}.$$
Starting from $R^{(t)}$, it is straightforward to draw an exact expression of the approximation error: $e^{(t)} = \left(1 - R^{(t)}\right)^t e^{(0)}$. They estimated lower bounds on $R^{(t)}$ and proved that, if the initial population is sampled at random, $1 - R^{(t)}$ converges to an eigenvalue of the transition matrix associated with the EA.
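As a small numerical illustration of this rate (the error sequence below is synthetic and invented for the sketch; `geometric_average_rate` is our own helper, not notation from the literature):

```python
# Geometric average convergence rate R^(t) = 1 - (e^(t)/e^(0))^(1/t),
# sketched on a synthetic, exactly geometric error sequence e^(t) = e0 * 0.9^t.
e0 = 8.0
errors = [e0 * 0.9 ** t for t in range(1, 21)]

def geometric_average_rate(e_t, e_0, t):
    """Geometric average convergence rate after t generations."""
    return 1.0 - (e_t / e_0) ** (1.0 / t)

rates = [geometric_average_rate(e, e0, t) for t, e in enumerate(errors, start=1)]

# The identity e^(t) = (1 - R^(t))^t * e^(0) recovers the original errors.
recovered = [(1.0 - r) ** t * e0 for t, r in enumerate(rates, start=1)]
```

For an exactly geometric sequence the rate is constant ($1 - 0.9 = 0.1$); for a real EA it generally varies with $t$.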
A closely related work is the fixed budget analysis proposed by Jansen and Zarges [14, 15]. They aim to bound the fitness value within a fixed time budget. The obtained bounds usually hold only within some fixed budget. For example, the lower and upper bounds given in [15, Theorem 9] hold for a fixed budget $t$. However, as $t \to +\infty$, these lower and upper bounds diverge; thus they become invalid bounds on the fitness for large $t$. This observation reveals an essential difference between fixed budget analysis and approximation error analysis. In fixed budget analysis, a bound approximates the fitness for some small $t$ but might be invalid for large $t$; the bounds may be linear or exponential functions of $t$. Approximation error analysis, in contrast, proves that $e^{(t)}$ can always be upper-bounded by exponential functions of $t$, and the bound is valid for all $t \ge 0$. In this sense, approximation error analysis may be called any budget analysis.
III-A Definitions and Notation
We consider a maximization problem:
$$\max_{x \in S} f(x),$$
where $f$ is a fitness function such that $f(x) > 0$ and its definition domain $S$ is a finite set. Let $f_{\max}$ denote the maximal value of $f$ and $S_{\mathrm{opt}}$ the optimal solution set.
In evolutionary computation, an individual is a solution $x \in S$. A population $X$ is a collection of individuals. Let $\mathcal{P}$ denote the set of all populations. The fitness of a population $X$ is
$$f(X) = \max\{f(x) \mid x \in X\}.$$
A general EA for solving the above optimization problem is described in Algorithm 1. The EA stops once an optimal solution is found; this stopping criterion is adopted for the sake of theoretical analysis. An EA is called elitist if $f(X^{(t+1)}) \ge f(X^{(t)})$ for any $t$ (and strictly elitist if, in addition, the parent is replaced only by strictly better offspring). Any non-elitist EA can be modified into an equivalent elitist EA by adding an archive individual which preserves the best solution found so far but does not take part in evolution.
Given an initial state $X^{(0)}$, the fitness of $X^{(t)}$ is denoted by $f(X^{(t)})$ (or $f^{(t)}$ in short hereafter) and its expected value is $E[f^{(t)}]$. The absolute error of $X^{(t)}$ is $e(X^{(t)}) = f_{\max} - f(X^{(t)})$ and its expected value is $e^{(t)} = E[e(X^{(t)})]$. An EA is said to converge in mean if $\lim_{t \to \infty} e^{(t)} = 0$ for any $X^{(0)}$.
III-B Transition Matrix
The approximation error analysis of EAs is built upon Markov chain modelling of EAs, which can be found in existing references such as [16, 17, 18]. A similar Markov chain framework has been used to analyze the computational time of EAs. This paper focuses on a different topic, the approximation error of EAs.
For the sake of notation, population states are indexed by $\{0, 1, \dots, L\}$. The index $0$ represents the set of optimal populations; the other indexes represent non-optimal populations. Populations are sorted according to their fitness values from high to low:
$$f(X_0) > f(X_1) \ge f(X_2) \ge \cdots \ge f(X_L).$$
The sequence $\{X^{(t)}, t = 0, 1, \dots\}$ is a Markov chain because $X^{(t+1)}$ is determined by $X^{(t)}$ in a probabilistic way. Furthermore, we assume that the transition probability from any $X_j$ to $X_i$ does not change over $t$, so the chain is homogeneous. The transition probability from $X_j$ to $X_i$ is denoted by
$$p_{ij} = \Pr\left(X^{(t+1)} = X_i \mid X^{(t)} = X_j\right).$$
Let $\mathbf{v}$ stand for a column vector and $\mathbf{v}^T$ for the corresponding row vector, with $T$ the transpose operation. The transition matrix is an $(L+1) \times (L+1)$ matrix:
$$P = \begin{pmatrix} 1 & \mathbf{q}^T \\ \mathbf{0} & R \end{pmatrix},$$
where the entry $1$ is due to the stopping criterion. The vector $\mathbf{q}$ denotes the transition probabilities from non-optimal states to optimal ones. The zero-valued vector $\mathbf{0}$ means that the transition probabilities from optimal states to non-optimal ones are $0$. The matrix $R$ represents the transition probabilities within non-optimal states, given by
$$R = (p_{ij})_{i, j = 1, \dots, L}.$$
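For concreteness, a toy sketch of this block structure (the probabilities below are invented for illustration and do not come from any concrete EA):

```python
import numpy as np

# Toy block transition matrix P = [[1, q^T], [0, R]] with two
# non-optimal states; column j holds the probabilities out of state j.
R = np.array([[0.7, 0.2],
              [0.1, 0.6]])          # transitions within non-optimal states
q = np.array([0.2, 0.2])            # transitions to the optimal state
P = np.block([[np.ones((1, 1)), q[None, :]],
              [np.zeros((2, 1)), R]])

# Entry (0, 0) is 1 (the EA stops at an optimum), and the zero block
# below it means optimal states are never left; columns sum to 1.
col_sums = P.sum(axis=0)
```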
The error sequence $\{e^{(t)}\}$ can be written as a matrix iteration, which yields the first exact expression of $e^{(t)}$.

Theorem 1

$e^{(t)} = \mathbf{e}^T R^t \mathbf{q}^{(0)}$.

Let $e(X_i)$ (or $e_i$ in short) denote the approximation error of $X_i$: $e_i = f(X_0) - f(X_i)$, and let $q^{(t)}_i$ denote the probability $\Pr(X^{(t)} = X_i)$. The vector $\mathbf{q}^{(t)} = (q^{(t)}_1, \dots, q^{(t)}_L)^T$ denotes the probability distribution of $X^{(t)}$ over the non-optimal states $X_1, \dots, X_L$. Then
$$e^{(t)} = \sum_{i=0}^{L} e_i\, q^{(t)}_i.$$
Because $e_0 = 0$, we have
$$e^{(t)} = \sum_{i=1}^{L} e_i\, q^{(t)}_i = \mathbf{e}^T \mathbf{q}^{(t)},$$
where $\mathbf{e} = (e_1, \dots, e_L)^T$. According to the Markov chain property, for any $t \ge 1$,
$$\mathbf{q}^{(t)} = R\, \mathbf{q}^{(t-1)} = R^t\, \mathbf{q}^{(0)}.$$
Then we get $e^{(t)} = \mathbf{e}^T R^t \mathbf{q}^{(0)}$.
The above theorem shows that $e^{(t)}$ is determined by $\mathbf{e}$, the matrix power $R^t$ and $\mathbf{q}^{(0)}$. Only $R^t$ changes over $t$, so it plays the most important role in expressing $e^{(t)}$. The expression also reveals that it is sufficient to use the partial transition matrix $R$, rather than the whole transition matrix $P$, for expressing $e^{(t)}$.
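Theorem 1 can be checked numerically; the sketch below uses an invented $2 \times 2$ matrix $R$, error vector and initial distribution, and cross-checks the closed form against iterating the chain step by step:

```python
import numpy as np

# Theorem 1: e^(t) = e^T R^t q^(0), cross-checked against iterating
# the distribution q^(t) = R q^(t-1) directly (toy numbers).
R = np.array([[0.7, 0.2],
              [0.1, 0.6]])
e = np.array([1.0, 2.0])            # errors of the non-optimal states
q0 = np.array([0.5, 0.5])           # initial distribution over them

def expected_error(t):
    return e @ np.linalg.matrix_power(R, t) @ q0

q = q0.copy()
for _ in range(10):                 # iterate the chain for 10 generations
    q = R @ q
direct = e @ q                      # e^(10) computed step by step
```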
III-C Matrix Analysis
Matrix analysis is the main mathematical tool used in the error analysis of EAs. Several essential definitions and lemmas are listed here; their details can be found in the textbook [19].
For an $n \times n$ matrix $A$, scalars $\lambda$ and vectors $\mathbf{x} \ne \mathbf{0}$ satisfying $A\mathbf{x} = \lambda \mathbf{x}$ are called eigenvalues and eigenvectors of $A$, respectively. A complete set of eigenvectors for $A$ is any set of $n$ linearly independent eigenvectors for $A$. Let $\rho(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}$, which is called the spectral radius of the matrix $A$.
A matrix $A$ is called diagonalizable if there exists a non-singular matrix $Q$ such that $A = Q D Q^{-1}$, where $D$ is a diagonal matrix and each diagonal entry of $D$ is an eigenvalue of $A$.
Lemma 1

A square matrix $A$ is diagonalizable if and only if $A$ possesses a complete set of eigenvectors.
A unitary matrix $U$ is defined to be a complex matrix whose columns (or rows) constitute an orthonormal basis for $\mathbb{C}^n$.
Lemma 2

A matrix $A$ is real symmetric if and only if $A$ is orthogonally similar to a real diagonal matrix $D$, that is, $A = Q D Q^T$ for some orthogonal matrix $Q$.
Lemma 3 (Schur’s Triangularization)
Every square matrix is unitarily similar to an upper-triangular matrix. That is, for each $n \times n$ matrix $A$, there exists a unitary matrix $U$ (not unique) and an upper-triangular matrix $T$ (not unique) such that $A = U T U^*$, and the diagonal entries of $T$ are the eigenvalues of $A$.
Lemma 4 (Jordan Form)
For every $n \times n$ matrix $A$ with distinct eigenvalues $\lambda_1, \dots, \lambda_s$, there is a non-singular matrix $Q$ such that
$$Q^{-1} A Q = J = \mathrm{diag}(J_1, J_2, \dots, J_m).$$
Each Jordan block $J_k$ is a square matrix of the form
$$J_k = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix},$$
where $\lambda$ is an eigenvalue of $A$. Each Jordan block $J_k$ is an $n_k \times n_k$ square matrix, and $n_1 + \cdots + n_m = n$.
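The powers of a single Jordan block have a simple closed form, $(J^t)_{i,\,i+j} = \binom{t}{j}\lambda^{t-j}$, which is the source of the binomial coefficients that later appear in the exact error expression; a numerical sketch:

```python
import numpy as np
from math import comb

# A 3x3 Jordan block J = lambda*I + N (ones on the superdiagonal).
# Entry (i, i+j) of J^t equals C(t, j) * lambda^(t-j) for t >= j.
lam, n, t = 0.5, 3, 7
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)
Jt = np.linalg.matrix_power(J, t)

closed_form = np.array([[comb(t, j - i) * lam ** (t - (j - i)) if j >= i else 0.0
                         for j in range(n)] for i in range(n)])
```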
IV Exact Expressions of Approximation Errors
In the error analysis of EAs, the ideal goal is to seek an exact expression of $e^{(t)}$. This section discusses this topic.
IV-A Jordan Form Method
Let us start from the simple case that the transition matrix $R$ is diagonalizable. We can obtain an exact expression of $e^{(t)}$ as follows.
Theorem 2

If the matrix $R$ is diagonalizable such that $R = Q D Q^{-1}$, where $D$ is a diagonal matrix, then
$$e^{(t)} = \sum_{k=1}^{L} c_k \lambda_k^t,$$
where $\lambda_k$ denotes the $k$-th diagonal entry of $D$, $c_k = u_k v_k$, and the vectors $\mathbf{u}^T = \mathbf{e}^T Q$ and $\mathbf{v} = Q^{-1} \mathbf{q}^{(0)}$.
From Theorem 1, we know $e^{(t)} = \mathbf{e}^T R^t \mathbf{q}^{(0)}$. Since $R = Q D Q^{-1}$, we get $R^t = Q D^t Q^{-1}$ and thus $e^{(t)} = \mathbf{u}^T D^t \mathbf{v}$. Since $D^t$ is a diagonal matrix whose diagonal entries are $\lambda_k^t$, we come to the conclusion.
This theorem claims that $e^{(t)}$ is a linear combination of exponential functions $\lambda_k^t$ provided that the matrix $R$ is diagonalizable. Thus, the error analysis of EAs reduces to calculating or estimating the eigenvalues $\lambda_k$ and coefficients $c_k$.
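When $R$ is diagonalizable, the eigenvalues and coefficients can be computed numerically from an eigendecomposition; a sketch with an invented matrix, checked against the matrix-power form:

```python
import numpy as np

# For diagonalizable R = Q D Q^{-1}:
# e^(t) = sum_k c_k * lambda_k^t with c_k = (e^T Q)_k * (Q^{-1} q0)_k.
R = np.array([[0.6, 0.3],
              [0.1, 0.5]])           # toy matrix with real, distinct eigenvalues
e = np.array([1.0, 2.0])
q0 = np.array([0.4, 0.6])

lams, Q = np.linalg.eig(R)           # columns of Q are eigenvectors
u = e @ Q                            # row vector e^T Q
v = np.linalg.solve(Q, q0)           # Q^{-1} q0
c = u * v                            # coefficients c_k

t = 9
spectral_form = np.sum(c * lams ** t)
matrix_form = e @ np.linalg.matrix_power(R, t) @ q0
```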
Example 1 (EA-BWSE on Needle-in-Haystack)
Consider the problem of maximizing the Needle-in-Haystack function,
$$f(x) = \begin{cases} 1, & \text{if } x = (1, \dots, 1), \\ 0, & \text{otherwise}, \end{cases}$$
where $x = (x_1, \dots, x_n) \in \{0, 1\}^n$.
EA-BWSE, a (1+1) EA with bitwise mutation and strictly elitist selection (Algorithm 2), is used for solving the above maximization problem.
Let the index $i$ denote the state of $x$ such that $i$ is the number of zero-valued bits in $x$, where $i \in \{0, 1, \dots, n\}$. Then the transition probabilities satisfy, for $i \ge 1$,
$$p_{0i} = \frac{1}{n^i}\left(1 - \frac{1}{n}\right)^{n-i}, \qquad p_{ii} = 1 - p_{0i},$$
since under strictly elitist selection a non-optimal parent is only replaced by the optimal solution. The transition matrix $R$ is diagonal. Let $\mathbf{q}^{(0)}$ denote the initial distribution of $X^{(0)}$. According to Theorem 2, the approximation error is
$$e^{(t)} = \sum_{i=1}^{n} q^{(0)}_i \left(1 - \frac{1}{n^i}\left(1 - \frac{1}{n}\right)^{n-i}\right)^t.$$
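A numerical sketch of this example, under the assumption that $R$ is diagonal with entries $1 - (1/n)^i(1-1/n)^{n-i}$ and that every non-optimal state has error $1$ (as on Needle-in-Haystack), with a uniformly random initial solution:

```python
import numpy as np
from math import comb

# EA-BWSE on Needle-in-Haystack (assumed reconstruction): R is diagonal
# with "stay" probabilities 1 - (1/n)^i (1-1/n)^(n-i) for i wrong bits.
n, t = 5, 100
p = 1.0 / n
stay = np.array([1.0 - p ** i * (1 - p) ** (n - i) for i in range(1, n + 1)])

# Initial solution drawn uniformly at random: q_i^(0) = C(n, i) / 2^n.
q0 = np.array([comb(n, i) / 2 ** n for i in range(1, n + 1)])

closed_form = np.sum(q0 * stay ** t)      # sum_i q_i^(0) (1 - ...)^t
matrix_form = np.ones(n) @ np.linalg.matrix_power(np.diag(stay), t) @ q0
```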
Example 2 (EA-BWNE on Needle-in-Haystack)
Consider the problem of maximizing the Needle-in-Haystack function using EA-BWNE, the (1+1) EA with bitwise mutation and non-strictly elitist selection (Algorithm 3).
Let the index $i$ denote the state of $x$ such that the conversion of $x$ from binary to decimal is $i$, where $i \in \{0, 1, \dots, 2^n - 1\}$. Then the transition probabilities satisfy
$$p_{ij} = \frac{1}{n^{d_{ij}}}\left(1 - \frac{1}{n}\right)^{n - d_{ij}},$$
where $d_{ij}$ is the Hamming distance between the states $i$ and $j$ (on Needle-in-Haystack, non-strictly elitist selection accepts every offspring of a non-optimal parent).
Since the transition matrix $R$ is symmetric, it is diagonalizable. According to Theorem 2, $e^{(t)}$ is a linear combination of exponential functions $\lambda_k^t$. However, it is still difficult to calculate the eigenvalues $\lambda_k$ and coefficients $c_k$ due to the difficulty of obtaining $Q$ and $Q^{-1}$.
No matter whether the matrix $R$ is diagonalizable or not, it can be represented by a Jordan form. Previously, the Jordan form method was used to bound the rate at which the probability distribution of solutions converges towards a stationary distribution [16, 17], that is, the rate at which $\mathbf{q}^{(t)}$ approaches its limit $\mathbf{q}^{(\infty)}$. Suzuki derived a lower bound on this convergence rate for simple genetic algorithms through analysing eigenvalues of the transition matrix. Schmitt and Rothlauf found that the convergence rate is determined by the spectral radius of a matrix derived from the transition matrix.
In the current paper, we aim to derive an exact expression of $e^{(t)}$ using the Jordan form method.
Let $R = Q J Q^{-1}$ be the Jordan form of $R$. Then
$$e^{(t)} = \mathbf{e}^T Q J^t Q^{-1} \mathbf{q}^{(0)}.$$
From the Jordan form $R = Q J Q^{-1}$, we get $R^t = Q J^t Q^{-1}$. Inserting this expression into the formula of Theorem 1, we get the desired conclusion.
From the above expression, we see that in order to obtain an exact expression of $e^{(t)}$, we need to represent $J^t$. This is given in the following theorem.
Theorem 3

For any matrix $R$ with Jordan form $R = Q J Q^{-1}$, the approximation error is
$$e^{(t)} = \sum_{k=1}^{m} \sum_{j=0}^{n_k - 1} c_{k,j} \binom{t}{j} \lambda_k^{t-j},$$
where the coefficient
$$c_{k,j} = \sum_{a} u_{k,a}\, v_{k,a+j},$$
and $\mathbf{u}^T = \mathbf{e}^T Q$ and $\mathbf{v} = Q^{-1} \mathbf{q}^{(0)}$ are partitioned according to the Jordan blocks $J_1, \dots, J_m$ of sizes $n_1, \dots, n_m$. Let the binomial coefficient $\binom{t}{j} = 0$ if $t < j$.
We assume that the matrix $J$ consists of $m$ Jordan blocks $J_1, \dots, J_m$, where the block $J_k$ has size $n_k$. Let $\mathbf{u}^T = \mathbf{e}^T Q$ and partition it into $\mathbf{u}^T = (\mathbf{u}_1^T, \dots, \mathbf{u}_m^T)$ according to the blocks. Let $\mathbf{v} = Q^{-1} \mathbf{q}^{(0)}$ and partition it into $\mathbf{v} = (\mathbf{v}_1, \dots, \mathbf{v}_m)$. Then the expression of $e^{(t)}$ can be rewritten as
$$e^{(t)} = \sum_{k=1}^{m} \mathbf{u}_k^T J_k^t \mathbf{v}_k.$$
The power of a Jordan block has the entries $(J_k^t)_{a,\,a+j} = \binom{t}{j} \lambda_k^{t-j}$. Inserting it into the above sum, we get that $\mathbf{u}_k^T J_k^t \mathbf{v}_k$ equals
$$\sum_{j=0}^{n_k - 1} \binom{t}{j} \lambda_k^{t-j} \sum_{a} u_{k,a}\, v_{k,a+j} = \sum_{j=0}^{n_k - 1} c_{k,j} \binom{t}{j} \lambda_k^{t-j},$$
where the coefficients $c_{k,j}$ are given above. The approximation error is the summation over all blocks $k$ from $1$ to $m$, which equals
$$e^{(t)} = \sum_{k=1}^{m} \sum_{j=0}^{n_k - 1} c_{k,j} \binom{t}{j} \lambda_k^{t-j}.$$
This is the desired result.
Theorem 3 reveals that the exact expression of $e^{(t)}$ consists of three parts:
Exponential terms $\lambda_k^{t-j}$: each term is an exponential function of $t$, where each $\lambda_k$ is an eigenvalue of $R$.
Constant coefficients $c_{k,j}$: they are independent of $t$ and are determined by the vectors $\mathbf{u}$, $\mathbf{v}$ and the size of the Jordan block $J_k$.
Binomial coefficients $\binom{t}{j}$: since $\binom{t}{j} = O(t^j)$, each is a polynomial function of $t$ whose order is up to $n_k - 1$. Binomial coefficients are only related to the size of the Jordan block $J_k$.
Because of the difficulty of obtaining the Jordan form of transition matrices, it is hard to generate an exact expression of $e^{(t)}$ in practice.
As a direct consequence of Theorem 3, we get a sufficient and necessary condition for the convergence of EAs.
$\lim_{t \to \infty} e^{(t)} = 0$ if and only if $\rho(R) < 1$.
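This condition can be illustrated numerically with two toy matrices (all numbers invented): one with spectral radius below $1$, and one with an absorbing non-optimal state that keeps the spectral radius at $1$:

```python
import numpy as np

# Convergence condition sketch: e^(t) -> 0 iff rho(R) < 1.
def rho(M):
    return max(abs(np.linalg.eigvals(M)))

def expected_error(R, t, e, q0):
    return e @ np.linalg.matrix_power(R, t) @ q0

R_conv = np.array([[0.6, 0.3],
                   [0.1, 0.5]])     # rho < 1: the error vanishes
R_stay = np.array([[1.0, 0.0],
                   [0.0, 0.5]])     # rho = 1: an absorbing non-optimal state

e = np.array([1.0, 1.0])
q0 = np.array([0.5, 0.5])
vanishes = expected_error(R_conv, 200, e, q0)   # essentially 0
persists = expected_error(R_stay, 200, e, q0)   # stays near 0.5
```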
IV-B Schur's Decomposition Method
Alternatively, the matrix power $R^t$ can be represented using Schur's triangularisation. Then we obtain another exact expression of $e^{(t)}$. Let us start from the simple case that the matrix $R$ is upper triangular with distinct eigenvalues $\lambda_1, \dots, \lambda_L$. The analysis is based on the power factors of the matrix $R$.
For an upper triangular matrix $R$ with distinct diagonal entries $\lambda_1, \dots, \lambda_L$, its power factors are defined as the coefficients that express each entry of the matrix power $R^t$ as a linear combination of $\lambda_1^t, \dots, \lambda_L^t$.
Using the power factors of $R$, we can obtain an explicit expression of the approximation error, as shown in the theorem below.
Theorem 4

If the matrix $R$ is upper triangular with distinct eigenvalues $\lambda_1, \dots, \lambda_L$, then
$$e^{(t)} = \sum_{k=1}^{L} c_k \lambda_k^t,$$
where the coefficients $c_k$ are determined by the power factors of $R$, the error vector $\mathbf{e}$ and the initial distribution $\mathbf{q}^{(0)}$.
The proof of this theorem is almost the same as that of [1, Theorem 1], with only minor notation changes.
Example 3 (EA-OBSE on OneMax)
Consider the problem of maximizing the OneMax function,
$$f(x) = |x| = \sum_{j=1}^{n} x_j.$$
EA-OBSE, a (1+1) EA with one-bit mutation and strictly elitist selection (Algorithm 4), is used for solving the above maximization problem.
Let the index $i$ denote the state of $x$ such that $i$ is the number of zero-valued bits, where $i \in \{0, 1, \dots, n\}$. The error of the state $i$ is $e_i = i$. Then the transition probabilities satisfy, for $i \ge 1$,
$$p_{i-1,\,i} = \frac{i}{n}, \qquad p_{ii} = 1 - \frac{i}{n}.$$
The transition matrix $R$ is upper triangular, and its power factors can be calculated explicitly.
Given an initial distribution $\mathbf{q}^{(0)}$, according to Theorem 4, the approximation error $e^{(t)}$ is given by a closed-form expression, which contains constants, variables, elementary arithmetic operations ($+$, $-$, $\times$, $\div$) and finite sums. It can be simplified to
$$e^{(t)} = e^{(0)} \left(1 - \frac{1}{n}\right)^t.$$
This expression is also obtained by a much simpler method in Example 5.
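This simplification can be checked numerically, assuming the state $i$ counts the zero-valued bits and one-bit mutation repairs one of them with probability $i/n$ (our reconstruction of the example's transition matrix):

```python
import numpy as np

# EA-OBSE on OneMax with n bits (assumed reconstruction): state i has
# error i; within the non-optimal states, column i gets R[i-1, i] = i/n
# (repair one wrong bit) and R[i, i] = 1 - i/n (stay).
n = 6
R = np.zeros((n, n))
for i in range(1, n + 1):
    R[i - 1, i - 1] = 1.0 - i / n        # 0-based: column i-1 is state i
    if i >= 2:
        R[i - 2, i - 1] = i / n          # improve: state i -> state i-1

e = np.arange(1, n + 1, dtype=float)     # error of state i is i
q0 = np.full(n, 1.0 / n)                 # an illustrative initial distribution

t = 25
exact = e @ np.linalg.matrix_power(R, t) @ q0
simplified = (e @ q0) * (1.0 - 1.0 / n) ** t   # e^(0) * (1 - 1/n)^t
```

The check works because $\mathbf{e}^T R = (1 - 1/n)\,\mathbf{e}^T$ for this matrix, so the exact expression collapses to the simplified form.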
Example 4 (EA-OBSE on Mono)
Consider EA-OBSE for maximizing a monotonically increasing function,
where $f$ satisfies $f(x) < f(y)$ if $|x| < |y|$, and $f(x) = f(y)$ if $|x| = |y|$.
Let the index $i$ denote the state of $x$ such that $i$ is the number of zero-valued bits, where $i \in \{0, 1, \dots, n\}$. The error of the state $i$ is $e_i = f_{\max} - f_i$. The transition matrix is the same as in the above example. Similarly, the approximation error $e^{(t)}$ is a linear combination of the exponential terms $\left(1 - \frac{k}{n}\right)^t$, $k = 1, \dots, n$.
Table I shows exact expressions of $e^{(t)}$ on several such functions for small values of $n$. Note that the coefficients vary across these functions: some coefficients are positive and some are negative.
If the matrix $R$ is not upper triangular, Schur's triangularisation states that $R$ is unitarily similar to an upper triangular matrix.
Let $R = U T U^*$ be Schur's triangularisation of the matrix $R$, where $U$ is a unitary matrix and $T$ an upper triangular matrix. Then
$$e^{(t)} = \mathbf{e}^T U T^t U^* \mathbf{q}^{(0)}.$$
From $R = U T U^*$ and $U^* U = I$, where $I$ is the unit matrix, we get $R^t = U T^t U^*$. Inserting it into the formula of Theorem 1, we get the desired conclusion.
Let $R = U T U^*$ be Schur's triangularisation of the matrix $R$. Then, with the vectors $\mathbf{u}^T = \mathbf{e}^T U$ and $\mathbf{v} = U^* \mathbf{q}^{(0)}$, the approximation error is
$$e^{(t)} = \sum_{i \le j} u_i\, v_j\, T^{(t)}_{ij},$$
where $T^{(t)}_{ij}$ is the $(i, j)$-th entry of the matrix $T^t$. Each entry $T^{(t)}_{ij}$ is a finite sum of products of entries of $T$ and powers of the eigenvalues $\lambda_k$ of $R$, taken over chains of indexes $i \le k_1 < \cdots < k_s \le j$ together with non-negative exponents whose sum matches $t$.
The above theorem gives another exact but complicated expression of $e^{(t)}$. Because of the difficulty of obtaining Schur's triangularisation, it is hard to generate an exact expression of $e^{(t)}$ in practice.
Summarizing this section, we have demonstrated exact expressions of $e^{(t)}$ through two methods, albeit with the difficulty of obtaining the Jordan form and Schur's triangularisation.
V Upper Bounds on Approximation Error
For many EAs, it is too complex to obtain an exact expression of $e^{(t)}$. Therefore, a more reasonable goal is to seek an upper bound on $e^{(t)}$. A lower bound on $e^{(t)}$ is less interesting because a trivial lower bound always exists: $e^{(t)} \ge 0$.
V-A Convergence Rate Method
Unlike an exact expression of $e^{(t)}$, it is rather simple to obtain an upper bound on $e^{(t)}$. A trivial upper bound is
$$e^{(t)} \le \max_{1 \le i \le L} e_i.$$
Of course, this upper bound is loose and unsatisfactory. A better upper bound can be derived from the convergence rate of EAs.
Given an error sequence $\{e^{(t)}\}$, its normalized convergence rate at generation $t$ is
$$c^{(t)} = \frac{e^{(t)} - e^{(t+1)}}{e^{(t)}} \quad \text{if } e^{(t)} > 0.$$
For elitist EAs this rate takes values in $[0, 1]$, and it can be regarded as the convergence speed: the larger the value, the faster the convergence. Based on this rate, we get an upper bound on $e^{(t)}$. The theorem below is similar to [12, Theorem 2], but its calculation is more accurate.
Given an error sequence $\{e^{(t)}\}$, define the drift $\Delta^{(t)} = e^{(t)} - e^{(t+1)}$, where $t = 0, 1, \dots$. If $\Delta^{(t)} \ge c\, e^{(t)}$ for any $t$ and some $c \in (0, 1]$, then
$$e^{(t)} \le (1 - c)^t\, e^{(0)}.$$
We assume that $\Delta^{(t)} \ge c\, e^{(t)}$, where $c \in (0, 1]$. From
$$e^{(t+1)} = e^{(t)} - \Delta^{(t)} \le (1 - c)\, e^{(t)},$$
we then get the required result by induction on $t$.
We now show that the factor $1 - c$ in the above theorem cannot be smaller than $\rho$, where $\rho$ is the spectral radius of the matrix $R$. For the sake of analysis, we assume that $R$ is irreducible (the analysis of a reducible $R$ is similar; the proof needs an extended Collatz-Wielandt formula [19, p. 670], which we omit in this paper). From the drift condition $e^{(t)} - e^{(t+1)} \ge c\, e^{(t)}$, according to the Collatz-Wielandt formula [19, p. 669], we get
$$1 - c \ge \rho.$$
Thus $1 - c$ can be written as $\rho + \epsilon$ for some non-negative $\epsilon$, and the above theorem implies
$$e^{(t)} \le (\rho + \epsilon)^t\, e^{(0)}.$$
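As a toy illustration of the resulting exponential bound (the matrix, error vector and initial distribution below are invented; for this particular example the bound holds with the spectral radius itself, i.e. $\epsilon = 0$):

```python
import numpy as np

# Convergence-rate bound sketch: compare the exact error
# e^(t) = e^T R^t q^(0) with the exponential bound rho^t * e^(0).
R = np.array([[0.6, 0.3],
              [0.1, 0.5]])
e = np.array([1.0, 2.0])
q0 = np.array([0.4, 0.6])
rho = max(abs(np.linalg.eigvals(R)))

errors = [e @ np.linalg.matrix_power(R, t) @ q0 for t in range(30)]
bounds = [errors[0] * rho ** t for t in range(30)]   # e^(0) * rho^t
```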
It is worth mentioning that multiplicative drift analysis also applies the convergence rate to estimating the hitting time. However, multiplicative drift analysis and approximation error analysis discuss two different topics: the former aims at an upper bound on the hitting time, while the latter aims at an upper bound on the approximation error.
The convergence rate provides a simple method of estimating $c$ and thereby bounding $e^{(t)}$. Its applicability is shown through several examples.