I Introduction
In the empirical study of evolutionary algorithms (EAs), the quality of a solution is evaluated by either its fitness value or its approximation error. The latter measures the fitness difference between an approximate solution and the optimal solution. The absolute error of a solution $x$ is defined by $e(x) = |f(x) - f_{\mathrm{opt}}|$, where $f_{\mathrm{opt}}$ is the fitness of the optimal solution and $f(x)$ the fitness of $x$ [1, 2]. The approximation error has been widely used in the empirical study of EAs, either in its standard form or on a logarithmic scale [3, 4, 5, 6, 7, 8, 9, 10]. Starting from the absolute error $e(x)$, it is trivial to derive the fitness value $f(x)$, where $f(x) = f_{\mathrm{opt}} + e(x)$ for a minimization problem and $f(x) = f_{\mathrm{opt}} - e(x)$ for a maximization problem. Therefore, this paper focuses on analyzing the approximation error of EAs. It is straightforward to extend related results from the approximation error to the fitness value of EAs.
Although the fitness value and approximation error have been widely adopted to evaluate the performance of EAs in computational experiments, they are seldom studied in a rigorous way. This is in stark contrast to the computational time of EAs, which is today's mainstream topic in the theory of EAs [11] but is seldom applied to evaluating the performance of EAs in practice. In order to bridge this gap between practice and theory, it is necessary to make a rigorous error analysis of EAs.
Because EAs are random iterative algorithms, the expected approximation error $e_t$ of the $t$-th generation solution $\Phi_t$ is a function of $t$. There are two main research questions: (1) what is an exact expression of $e_t$? (2) if an exact expression is unavailable, what is a bound on $e_t$? He [1] made one of the first attempts to answer these questions. He gave an analytic expression of the approximation error for a class of (1+1) strictly elitist EAs.
This paper aims at establishing a theoretical framework for studying the approximation error of EAs for discrete optimization. In the framework, EAs are modelled by homogeneous Markov chains. The analysis is divided into two parts. The first part is about exact expressions of the approximation error. Two methods, Jordan form and Schur's triangularization, are given to study the exact expression of $e_t$. The second part is about upper bounds on the approximation error. Two methods, convergence rate and auxiliary matrix iteration, are introduced to estimate the upper bound on $e_t$.

The paper is arranged as follows: Section II reviews related work. Section III presents preliminary definitions, notation and the Markov chain modelling of EAs. Section IV demonstrates exact expressions of the approximation error. Section V estimates upper bounds on the approximation error. Section VI summarizes the paper.

II Related Work
In practice, the approximation error has been widely used to evaluate the quality of solutions found by EAs [3, 4, 5, 6, 7, 8, 9, 10]. When evaluating the performance of EAs, we list solution errors in a table or display error trends in a figure, and then claim that the algorithm with the smallest error at the $t$-th generation is the best one. The approximation error appears under different names, such as objective function error [3], difference from a computed solution to a known global optimum [4], distance from the optimum [9, 10], fitness error [8] or solution error [6, 7].
So far, the theoretical study of the approximation error is rare in evolutionary computation. Rudolph [12] proved that under the condition $E[e_{t+1} \mid e_t] \le c\, e_t$ for some $c < 1$, the error sequence $\{e_t\}$ converges in mean geometrically fast to $0$, that is, $e_t \le c^t e_0$. Recently He [1] made one of the first attempts to obtain an analytic expression of the approximation error for a class of elitist EAs. He proved that if the transition matrix associated with an EA is an upper triangular matrix with unique diagonal entries, then for any $t$, the error can be expressed as $e_t = \sum_{k} c_k \lambda_k^t$, where $\lambda_k$ are eigenvalues of the transition matrix (except the largest eigenvalue $1$) and $c_k$ are coefficients. He and Lin [13] studied the geometric average convergence rate of the error sequence $\{e_t\}$, defined by
$$R_t = 1 - \left(\frac{e_t}{e_0}\right)^{1/t}. \qquad (1)$$
Starting from $R_t$, it is straightforward to draw an exact expression of the approximation error: $e_t = (1 - R_t)^t\, e_0$. They estimated a lower bound on $R_t$ and proved that, if the initial population is sampled at random, $1 - R_t$ converges to an eigenvalue of the transition matrix associated with the EA.
A closely related line of work is the fixed budget analysis proposed by Jansen and Zarges [14, 15]. They aim to bound the fitness value within a fixed time budget, and the obtained bounds usually hold only within some fixed budget. For example, the lower and upper bounds given in [15, Theorem 9] hold for some fixed budget of generations. However, when $t \to \infty$, these lower and upper bounds diverge; thus they become invalid bounds on the fitness for large $t$. This observation reveals an essential difference between fixed budget analysis and approximation error analysis. In fixed budget analysis, a bound is an approximation of the fitness for some small $t$ but might be invalid for large $t$; the expression of the bound could be a linear or exponential function of $t$. Approximation error analysis, in contrast, proves that $e_t$ can always be upper-bounded by an exponential function of $t$, and the bound is valid for all $t$. In this sense, approximation error analysis may be called any budget analysis.
III Preliminary
III-A Definitions and Notation
We consider a maximization problem:
$$\max_{x \in S} f(x), \qquad (2)$$
where $f$ is a fitness function whose definition domain $S$ is a finite state set. Let $f_{\max}$ denote the maximal value of $f$ and $S_{\mathrm{opt}} = \{x \in S : f(x) = f_{\max}\}$ the optimal solution set.
In evolutionary computation, an individual is a solution $x \in S$. A population is a collection of individuals. Let $\mathcal{X}$ denote the population set. The fitness of a population $X$ is $f(X) = \max\{f(x) : x \in X\}$.
A general EA for solving the above optimization problem is described in Algorithm 1. The EA is stopped once an optimal solution is found. This stopping criterion is adopted for the sake of theoretical analysis. An EA is called elitist (or strictly elitist) if $f(\Phi_{t+1}) \ge f(\Phi_t)$ for any $t$. Any non-elitist EA can be modified into an equivalent elitist EA by adding an archive individual which preserves the best found solution but does not take part in evolution.
Definition 1
Given an initial state $\Phi_0$, the fitness of $\Phi_t$ is denoted by $f(\Phi_t)$ (or $f_t$ in short thereafter) and its expected value is $E[f_t]$. The absolute error of $\Phi_t$ is $e(\Phi_t) = f_{\max} - f(\Phi_t)$ and its expected value is $e_t = E[e(\Phi_t)]$. An EA is said to converge in mean if $\lim_{t \to \infty} e_t = 0$ for any $\Phi_0$.
III-B Transition Matrix
The approximation error analysis of EAs is built upon the Markov chain modelling of EAs which can be found in existing references such as [16, 17, 18]. A similar Markov chain framework has been used to analyze the computational time of EAs in [18]. This paper focuses on a different topic, the approximation error of EAs.
For the sake of notation, population states are indexed by $\{0, 1, \dots, L\}$. The index $0$ represents the set of optimal populations; the other indexes represent nonoptimal populations. Populations are sorted according to their fitness value from high to low:
$$f(0) > f(1) \ge f(2) \ge \cdots \ge f(L),$$
where $f(i)$ stands for the fitness of state $i$ in short. The decomposition of states is not required to satisfy the strict ordering $f(1) > f(2) > \cdots > f(L)$. Examples 1 and 2 in the next section will show this point.

The sequence $\{\Phi_t,\ t = 0, 1, \dots\}$ is a Markov chain because $\Phi_{t+1}$ is determined by $\Phi_t$ in a probabilistic way. Furthermore, we assume that the transition probability from any state $j$ to any state $i$ does not change over $t$, so the chain is homogeneous. The transition probability from $j$ to $i$ is denoted by
$$p_{ij} = \Pr(\Phi_{t+1} = i \mid \Phi_t = j). \qquad (3)$$
Let $q = (q_1, \dots, q_L)^T$ stand for a column vector and $q^T$ for the corresponding row vector, with $T$ the transpose operation. The transition matrix $P$ is an $(L+1) \times (L+1)$ matrix:
$$P = \begin{pmatrix} 1 & r^T \\ 0 & R \end{pmatrix}, \qquad (4)$$
where the entry $p_{00} = 1$ is due to the stopping criterion. The vector $r^T$ denotes transition probabilities from nonoptimal states to optimal ones. The zero-valued vector $0$ means that transition probabilities from optimal states to nonoptimal ones are $0$. The matrix $R$ represents transition probabilities within nonoptimal states, given by
$$R = \big(p_{ij}\big)_{i,j = 1, \dots, L}. \qquad (5)$$
The error sequence $\{e_t\}$ can be written as a matrix iteration. Then we get the first exact expression of $e_t$.
Theorem 1
Let $e(\Phi_t)$ (or $e_t$ in short) denote the expected approximation error of $\Phi_t$: $e_t = E[e(\Phi_t)]$. Let $e = (e_1, \dots, e_L)^T$ collect the errors of the nonoptimal states and let $q_t = (q_t(1), \dots, q_t(L))^T$ denote the probability distribution of $\Phi_t$ over the nonoptimal states $1, \dots, L$. Then
$$e_t = e^T R^t q_0. \qquad (6)$$

Proof:
Let $q_t(i)$ (or $q_i$ in short) denote the probability $\Pr(\Phi_t = i)$. Because transitions from optimal to nonoptimal states have probability $0$, we have $q_t = R^t q_0$ and
$$e_t = \sum_{i=1}^{L} e_i\, q_t(i) = e^T R^t q_0. \qquad (7)$$
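As an illustration of Theorem 1, the following sketch builds a small chain with three nonoptimal states (all numbers are invented for the example) and checks that the closed form $e^T R^t q_0$ agrees with iterating the distribution $q_{t+1} = R\, q_t$ step by step.

```python
import numpy as np

# Hypothetical chain with L = 3 nonoptimal states (numbers invented).
# Column j holds transition probabilities out of nonoptimal state j:
# R[i, j] = P(next state = i | current state = j).
R = np.array([[0.5, 0.2, 0.1],
              [0.1, 0.4, 0.2],
              [0.0, 0.1, 0.5]])
# The remaining probability mass in each column goes to the optimal state.
r = 1.0 - R.sum(axis=0)          # transitions to the optimal state
e = np.array([1.0, 2.0, 3.0])    # error e_i of each nonoptimal state
q0 = np.array([0.2, 0.3, 0.5])   # initial distribution over nonoptimal states

def error_closed_form(t):
    """e_t = e^T R^t q_0 (Theorem 1, eq. (6))."""
    return e @ np.linalg.matrix_power(R, t) @ q0

def error_by_iteration(t):
    """Iterate q_{t+1} = R q_t and read off e^T q_t."""
    q = q0.copy()
    for _ in range(t):
        q = R @ q
    return e @ q

for t in range(6):
    assert abs(error_closed_form(t) - error_by_iteration(t)) < 1e-12
```

Note that only the submatrix $R$ enters the computation; the whole matrix $P$ is never needed, which matches the remark after the theorem.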
The above theorem shows that $e_t$ is determined by $e$, the matrix power $R^t$ and $q_0$. Only $R^t$ changes over $t$, so it plays the most important role in expressing $e_t$. Equation (6) also reveals that it is sufficient to use the partial transition matrix $R$, rather than the whole transition matrix $P$, for expressing $e_t$.
III-C Matrix Analysis
Matrix analysis is the main mathematical tool used in the error analysis of EAs. Several essential definitions and lemmas are listed here. Their details can be found in the textbook [19].
Definition 2
For an $n \times n$ matrix $A$, scalars $\lambda$ and nonzero vectors $v$ satisfying $A v = \lambda v$ are called eigenvalues and eigenvectors of $A$, respectively. A complete set of eigenvectors for $A$ is any set of $n$ linearly independent eigenvectors of $A$. Let $\rho(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}$, which is called the spectral radius of matrix $A$.

Definition 3
A matrix $A$ is called diagonalizable if there exists a nonsingular matrix $S$ such that $A = S D S^{-1}$, where $D$ is a diagonal matrix whose each diagonal entry $\lambda_i$ is an eigenvalue of $A$.
Lemma 1
A square matrix $A$ is diagonalizable if and only if $A$ possesses a complete set of eigenvectors.
Definition 4
A unitary matrix $U$ is defined to be a complex matrix whose columns (or rows) constitute an orthonormal basis for $\mathbb{C}^n$, that is, $U^* U = I$.

Lemma 2
A matrix $A$ is real symmetric if and only if $A$ is orthogonally similar to a real diagonal matrix $D$, that is, $A = Q D Q^T$ for some orthogonal matrix $Q$.
Lemma 3 (Schur’s Triangularization)
Every square matrix is unitarily similar to an upper-triangular matrix. That is, for each $A$, there exists a unitary matrix $U$ (not unique) and an upper-triangular matrix $T$ (not unique) such that $A = U T U^*$, and the diagonal entries of $T$ are the eigenvalues of $A$.
Lemma 4 (Jordan Form)
For every $n \times n$ matrix $A$ with distinct eigenvalues $\lambda_1, \dots, \lambda_s$, there is a nonsingular matrix $S$ such that
$$S^{-1} A S = J = \mathrm{diag}(J_1, \dots, J_m). \qquad (10)$$
Each Jordan block $J_k$ is a square matrix of the form
$$J_k = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}, \qquad (11)$$
where $\lambda$ is an eigenvalue of $A$. Each Jordan block $J_k$ is an $n_k \times n_k$ matrix and $n_1 + \cdots + n_m = n$.
IV Exact Expressions of Approximation Errors
In the error analysis of EAs, the ideal goal is to seek an exact expression of $e_t$. This section discusses this topic.
IV-A Jordan Form Method
Let us start from the simple case in which the matrix $R$ is diagonalizable. We can obtain an exact expression of $e_t$ as follows.
Theorem 2
If matrix $R$ is diagonalizable such that $R = S D S^{-1}$, where matrix $D$ is a diagonal matrix, then
$$e_t = \sum_{k=1}^{L} c_k \lambda_k^t, \qquad (12)$$
where $\lambda_k$ denotes the $k$-th diagonal entry of $D$, $c_k = u_k v_k$, and the vectors $u^T = e^T S$ and $v = S^{-1} q_0$.
Proof:
From Theorem 1, we know $e_t = e^T R^t q_0$. Since $R^t = S D^t S^{-1}$, we get $e_t = (e^T S)\, D^t\, (S^{-1} q_0) = u^T D^t v$. Since $D^t$ is a diagonal matrix whose diagonal entries are $\lambda_k^t$, we come to the conclusion.

This theorem claims that $e_t$ is a linear combination of the exponential functions $\lambda_k^t$ provided that matrix $R$ is diagonalizable. Thus, the error analysis of EAs reduces to calculating or estimating the eigenvalues $\lambda_k$ and coefficients $c_k$.
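Theorem 2 can be checked numerically. The sketch below (with an invented symmetric, hence diagonalizable, matrix $R$) computes the eigendecomposition, forms the coefficients $c_k = u_k v_k$, and verifies that $\sum_k c_k \lambda_k^t$ equals $e^T R^t q_0$.

```python
import numpy as np

# Hypothetical diagonalizable substochastic matrix R (numbers invented).
R = np.array([[0.6, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.1, 0.4]])
e = np.array([1.0, 2.0, 4.0])    # state errors (invented)
q0 = np.array([0.5, 0.3, 0.2])   # initial distribution (invented)

lam, S = np.linalg.eig(R)        # R = S diag(lam) S^{-1}
u = e @ S                        # u^T = e^T S
v = np.linalg.solve(S, q0)       # v = S^{-1} q0
c = u * v                        # coefficients c_k = u_k v_k

def error_spectral(t):
    """Theorem 2, eq. (12): e_t = sum_k c_k lam_k^t."""
    return float(np.real(np.sum(c * lam**t)))

def error_direct(t):
    """Theorem 1, eq. (6): e_t = e^T R^t q_0."""
    return float(e @ np.linalg.matrix_power(R, t) @ q0)

for t in range(8):
    assert abs(error_spectral(t) - error_direct(t)) < 1e-10
```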
Example 1 (EABWSE on Needle-in-Haystack)
Consider the problem of maximizing the Needle-in-Haystack function,
$$f(x) = \begin{cases} 1, & \text{if } x = (1, \dots, 1), \\ 0, & \text{otherwise}, \end{cases}$$
where $x = (x_1, \dots, x_n) \in \{0, 1\}^n$.
EABWSE, a (1+1) EA with bitwise mutation and strictly elitist selection (Algorithm 2), is used for solving the above maximization problem.
Let index $i$ denote the state of $x$ such that $i = |x|_0$, the number of zero-valued bits in $x$, where $i \in \{0, 1, \dots, n\}$. Under strictly elitist selection, a child is accepted only if it is the optimum, so the transition probabilities satisfy
$$p_{0i} = \left(\frac{1}{n}\right)^i \left(1 - \frac{1}{n}\right)^{n-i}, \qquad p_{ii} = 1 - p_{0i}, \qquad i = 1, \dots, n. \qquad (13)$$
The transition matrix $R$ is diagonal. Let $q_0$ denote the initial distribution of $\Phi_0$. According to Theorem 2, the approximation error
$$e_t = \sum_{i=1}^{n} q_0(i) \left(1 - \left(\frac{1}{n}\right)^i \left(1 - \frac{1}{n}\right)^{n-i}\right)^t. \qquad (14)$$
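A numeric sketch of Example 1, with $n = 4$ and an invented uniform initial distribution: since $R$ is diagonal, the summation formula can be checked directly against the matrix iteration of Theorem 1.

```python
import numpy as np

n = 4
# State i = number of zero-valued bits, i = 1..n (i = 0 is optimal).
# Under strictly elitist selection on Needle-in-Haystack, a child is
# accepted only if it is the optimum, so R is diagonal.
jump = np.array([(1/n)**i * (1 - 1/n)**(n - i) for i in range(1, n + 1)])
R = np.diag(1.0 - jump)
e = np.ones(n)                   # every nonoptimal state has error 1
q0 = np.full(n, 1.0 / n)         # invented uniform initial distribution

def error_formula(t):
    """Eq. (14): e_t = sum_i q0(i) * (1 - jump_i)^t."""
    return float(np.sum(q0 * (1.0 - jump)**t))

def error_matrix(t):
    """Theorem 1: e_t = e^T R^t q_0."""
    return float(e @ np.linalg.matrix_power(R, t) @ q0)

for t in [0, 1, 5, 50]:
    assert abs(error_formula(t) - error_matrix(t)) < 1e-12
```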
Example 2 (EABWNE on Needle-in-Haystack)
Consider the problem of maximizing the Needle-in-Haystack function using EABWNE, the (1+1) EA with bitwise mutation and non-strictly elitist selection (Algorithm 3).

Let index $i$ denote the state of $x$ such that $i$ is the conversion of $x$ from binary to decimal, where $i \in \{0, 1, \dots, 2^n - 1\}$. Under non-strictly elitist selection, a child of equal fitness is accepted, so for nonoptimal states $i$ and $j$ the transition probabilities satisfy
$$p_{ij} = \left(\frac{1}{n}\right)^{H(i,j)} \left(1 - \frac{1}{n}\right)^{n - H(i,j)}, \qquad (15)$$
where $H(i,j)$ is the Hamming distance between the binary representations of $i$ and $j$. Since the transition submatrix $R$ is symmetric, it is diagonalizable. According to Theorem 2, $e_t$ is a linear combination of the exponential functions $\lambda_k^t$. However, it is still difficult to calculate the eigenvalues $\lambda_k$ and coefficients $c_k$ due to the difficulty in obtaining $S$ and $S^{-1}$.
No matter whether matrix $R$ is diagonalizable or not, it can be represented by a Jordan form. Previously the method of Jordan form was used to bound the deviation of the probability distribution of solutions from its stationary distribution [16, 17], that is, $\| q_t - q_\infty \|$, where $q_\infty$ is the limit of $q_t$. Suzuki [16] derived a lower bound on the convergence rate for simple genetic algorithms through analysing eigenvalues of the transition matrix. Schmitt and Rothlauf [17] found that the convergence rate is determined by the spectral radius of the transition matrix. In the current paper, we aim to derive an exact expression of $e_t$ using the Jordan form method.
Lemma 5
Let $R = S J S^{-1}$, where $J$ is the Jordan form of $R$. Then
$$e_t = e^T S J^t S^{-1} q_0. \qquad (16)$$

Proof:
From the Jordan form $R = S J S^{-1}$, we get $R^t = S J^t S^{-1}$. Inserting this expression into (6), we get the desired conclusion.

From (16), we see that in order to obtain an exact expression of $e_t$, we need to represent $J^t$. This is given in the following theorem.
Theorem 3
For any matrix $R$, the approximation error
$$e_t = \sum_{k=1}^{m} \sum_{l=0}^{n_k - 1} c_{k,l} \binom{t}{l} \lambda_k^{t-l}, \qquad (17)$$
where $\lambda_k$ is the eigenvalue of the Jordan block $J_k$ of size $n_k$, the coefficient
$$c_{k,l} = \sum_{i=1}^{n_k - l} u_k(i)\, v_k(i + l), \qquad (18)$$
and the vectors $u_k$ and $v_k$ are given by (21). Let the binomial coefficient $\binom{t}{l} = 0$ if $l > t$.
Proof:
We assume that matrix $J$ consists of the Jordan blocks
$$J = \mathrm{diag}(J_1, \dots, J_m). \qquad (19)$$
Let $u^T = e^T S$ and write it as $u^T = (u_1^T, \dots, u_m^T)$. Let $v = S^{-1} q_0$ and write it as $v = (v_1^T, \dots, v_m^T)^T$. Then (16) can be rewritten as
$$e_t = \sum_{k=1}^{m} u_k^T J_k^t v_k. \qquad (20)$$
Denote the vectors
$$u_k = (u_k(1), \dots, u_k(n_k))^T, \qquad v_k = (v_k(1), \dots, v_k(n_k))^T. \qquad (21)$$
Consider the component $u_k^T J_k^t v_k$ in (20). Each Jordan block power equals [19, pp. 618]
$$\left(J_k^t\right)_{ij} = \binom{t}{j-i} \lambda_k^{t-(j-i)}, \qquad 0 \le j - i,$$
and $0$ if $j < i$. Inserting it into $u_k^T J_k^t v_k$, we get that $u_k^T J_k^t v_k$ equals
$$\sum_{i=1}^{n_k} \sum_{j=i}^{n_k} u_k(i)\, v_k(j) \binom{t}{j-i} \lambda_k^{t-(j-i)}. \qquad (22)$$
Setting $l = j - i$, we have
$$u_k^T J_k^t v_k = \sum_{l=0}^{n_k - 1} c_{k,l} \binom{t}{l} \lambda_k^{t-l}, \qquad (23)$$
where the coefficient $c_{k,l}$ is given by (18).
The approximation error is the summation of all $u_k^T J_k^t v_k$ from $k = 1$ to $m$, which equals
$$e_t = \sum_{k=1}^{m} \sum_{l=0}^{n_k - 1} c_{k,l} \binom{t}{l} \lambda_k^{t-l}. \qquad (24)$$
The above is the desired result.
Theorem 3 reveals that the exact expression of $e_t$ consists of three parts:

Exponential terms $\lambda_k^{t-l}$. Each term is an exponential function of $t$, where each $\lambda_k$ is an eigenvalue of $R$.

Constant coefficients $c_{k,l}$. They are independent of $t$. Equation (18) shows that they are determined by the vectors $u_k$, $v_k$ and the size $n_k$ of the Jordan block $J_k$.

Binomial coefficients $\binom{t}{l}$. Since $l \le n_k - 1$, each coefficient is a polynomial function of $t$ whose order is up to $n_k - 1$. Binomial coefficients are only related to the size of the Jordan block $J_k$.
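The Jordan-block power formula used in the proof of Theorem 3, $(J_k^t)_{ij} = \binom{t}{j-i} \lambda^{t-(j-i)}$, can be verified directly on a single block; the eigenvalue and block size below are arbitrary choices for the sketch.

```python
import numpy as np
from math import comb

lam = 0.8              # arbitrary eigenvalue for the sketch
k = 3                  # block size
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)   # Jordan block

def jordan_power_entry(t, i, j):
    """(J^t)_{ij} = C(t, j-i) * lam^(t-(j-i)) for 0 <= j-i <= t, else 0."""
    l = j - i
    if l < 0 or l > t:
        return 0.0
    return comb(t, l) * lam**(t - l)

for t in range(6):
    Jt = np.linalg.matrix_power(J, t)
    for i in range(k):
        for j in range(k):
            assert abs(Jt[i, j] - jordan_power_entry(t, i, j)) < 1e-12
```

The convention $\binom{t}{l} = 0$ for $l > t$ in Theorem 3 appears here as the explicit zero branch.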
Because of the difficulty of obtaining the Jordan form of transition matrices, it is hard to derive an exact expression of $e_t$ in practice.
As a direct consequence of Theorem 3, we get a sufficient and necessary condition for the convergence of EAs.
Corollary 1
$\lim_{t \to \infty} e_t = 0$ for any $q_0$ if and only if the spectral radius $\rho(R) < 1$.
IV-B Schur's Decomposition Method
Alternatively, the matrix power $R^t$ can be represented using Schur's triangularisation. Then we obtain another exact expression of $e_t$. Let us start from the simple case in which matrix $R$ is upper triangular with distinct eigenvalues $\lambda_1, \dots, \lambda_L$ (its diagonal entries). The analysis is based on the power factors of a matrix [20].

Definition 5
For an upper triangular matrix $R = (r_{ij})$ with distinct diagonal entries $\lambda_1, \dots, \lambda_L$, its power factors $v_{ik}$ (where $i \le k$) and $w_{kj}$ (where $k \le j$) are defined recursively as follows:
$$v_{kk} = w_{kk} = 1; \qquad v_{ik} = \frac{\sum_{l=i+1}^{k} r_{il}\, v_{lk}}{\lambda_k - \lambda_i},\ i < k; \qquad w_{kj} = \frac{\sum_{l=k}^{j-1} w_{kl}\, r_{lj}}{\lambda_k - \lambda_j},\ k < j. \qquad (25)$$
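This recursion (as reconstructed here) can be checked numerically: for an upper triangular matrix with distinct diagonal entries, the power factors assemble a right eigenvector matrix $V$ and a left eigenvector matrix $W = V^{-1}$, so that $R^t = V\, \mathrm{diag}(\lambda_k^t)\, W$. All numbers below are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4
# Random upper triangular matrix with distinct diagonal entries (invented).
Rm = np.triu(rng.uniform(0.1, 0.5, (L, L)))
np.fill_diagonal(Rm, [0.9, 0.7, 0.5, 0.3])   # distinct eigenvalues
lam = np.diag(Rm)

V = np.eye(L)   # V[i, k] = power factor v_{ik}, columns = right eigenvectors
W = np.eye(L)   # W[k, j] = power factor w_{kj}, rows = left eigenvectors
for k in range(L):
    for i in range(k - 1, -1, -1):   # v_{ik} for i < k
        V[i, k] = sum(Rm[i, l] * V[l, k]
                      for l in range(i + 1, k + 1)) / (lam[k] - lam[i])
    for j in range(k + 1, L):        # w_{kj} for k < j
        W[k, j] = sum(W[k, l] * Rm[l, j]
                      for l in range(k, j)) / (lam[k] - lam[j])

# With these normalizations W = V^{-1}, hence R^t = V diag(lam)^t W.
t = 7
Rt = V @ np.diag(lam**t) @ W
assert np.allclose(Rt, np.linalg.matrix_power(Rm, t))
```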
Using the power factors of $R$, we can obtain an explicit expression of the approximation error, as shown in the theorem below.
Theorem 4
If matrix $R$ is upper triangular with distinct eigenvalues $\lambda_1, \dots, \lambda_L$, then
$$e_t = \sum_{k=1}^{L} \left(\sum_{i=1}^{k} e_i\, v_{ik}\right) \left(\sum_{j=k}^{L} w_{kj}\, q_0(j)\right) \lambda_k^t. \qquad (26)$$
The proof of this theorem is almost the same as that of [1, Theorem 1], with only a minor change of notation.
Theorem 4 is a special case of Theorem 2, because distinct eigenvalues imply that matrix $R$ is diagonalizable.
Example 3 (EAOBSE on OneMax)
Consider the problem of maximizing the OneMax function,
$$f(x) = |x|_1 = \sum_{i=1}^{n} x_i, \qquad x \in \{0, 1\}^n.$$
EAOBSE, a (1+1) EA with one-bit mutation and strictly elitist selection (Algorithm 4), is used for solving the above maximization problem.
Let index $i$ denote the state of $x$ such that $i = |x|_0$, the number of zero-valued bits, where $i \in \{0, 1, \dots, n\}$. The error of state $i$ is $e_i = i$. Then the transition probabilities satisfy
$$p_{i-1,i} = \frac{i}{n}, \qquad p_{ii} = 1 - \frac{i}{n}, \qquad i = 1, \dots, n. \qquad (27)$$
The transition matrix $R$ is upper-triangular with distinct diagonal entries $\lambda_i = 1 - i/n$. Its power factors $v_{ik}$ and $w_{kj}$ (where $i \le k \le j$) are calculated as follows:
$$v_{ik} = (-1)^{k-i} \binom{k}{i}, \qquad w_{kj} = \binom{j}{k}. \qquad (28)$$
Given an initial distribution $q_0$, according to Theorem 4, the approximation error
$$e_t = \sum_{k=1}^{n} \left(\sum_{i=1}^{k} i\, (-1)^{k-i} \binom{k}{i}\right) \left(\sum_{j=k}^{n} \binom{j}{k}\, q_0(j)\right) \left(1 - \frac{k}{n}\right)^t. \qquad (29)$$
Equation (29) is a closed-form expression of $e_t$, which contains constants, variables, elementary arithmetic operations and finite sums. Since $\sum_{i=1}^{k} i (-1)^{k-i} \binom{k}{i}$ equals $1$ for $k = 1$ and $0$ for $k \ge 2$, it can be simplified to $e_t = e_0 (1 - \frac{1}{n})^t$. The expression is also given by a much simpler method in Example 5.
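The simplification $e_t = e_0 (1 - 1/n)^t$ can be confirmed by iterating the bidiagonal transition matrix directly; here $n = 8$ and the uniform initial distribution are invented choices for the sketch.

```python
import numpy as np

n = 8
# State i = number of zero bits, i = 1..n (i = 0 optimal); error e_i = i.
# One-bit mutation + strict elitism: from state i the EA moves to i-1
# with probability i/n (a zero bit is picked and flipped), else stays.
R = np.zeros((n, n))
for i in range(1, n + 1):
    R[i - 1, i - 1] = 1.0 - i / n        # stay at state i
    if i >= 2:
        R[i - 2, i - 1] = i / n          # improve to state i-1
e = np.arange(1, n + 1, dtype=float)     # e_i = i
q0 = np.full(n, 1.0 / n)                 # invented initial distribution
e0 = e @ q0

for t in range(1, 30):
    et = e @ np.linalg.matrix_power(R, t) @ q0
    assert abs(et - e0 * (1.0 - 1.0 / n)**t) < 1e-12
```

The collapse of all eigenvalue terms except $(1 - 1/n)^t$ is exactly the cancellation of the coefficients for $k \ge 2$ noted above.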
Example 4 (EAOBSE on Mono)
Consider EAOBSE for maximizing a monotonically increasing function of $|x|_1$,
$$f(x) = g(|x|_1), \qquad x \in \{0, 1\}^n, \qquad (30)$$
where $g$ satisfies $g(i) < g(j)$ if $i < j$.

Let index $i$ denote the state of $x$ such that $i = |x|_0$, where $i \in \{0, 1, \dots, n\}$. The error of state $i$ is $e_i = g(n) - g(n - i)$. The transition matrix is the same as in the above example. Similarly, the approximation error
$$e_t = \sum_{k=1}^{n} \left(\sum_{i=1}^{k} e_i\, (-1)^{k-i} \binom{k}{i}\right) \left(\sum_{j=k}^{n} \binom{j}{k}\, q_0(j)\right) \left(1 - \frac{k}{n}\right)^t.$$
Table I shows exact expressions of $e_t$ on several such functions for small $n$. Note that the coefficients vary with these functions: some coefficients are positive and some are negative.
If matrix $R$ is not upper triangular, Schur's triangularisation states that $R$ is unitarily similar to an upper triangular matrix.
Lemma 6
Let $R = U T U^*$ be Schur's triangularisation of matrix $R$, where $U$ is a unitary matrix and $T$ an upper triangular matrix. Then
$$e_t = e^T U T^t U^* q_0. \qquad (31)$$

Proof:
From $R = U T U^*$ and $U U^* = I$, where $I$ is the unit matrix, we get $R^t = U T^t U^*$. Inserting it into (6), we get the desired conclusion.
We need to express the matrix power $T^t$ in (31). For any upper triangular matrix $T$, its power $T^t$ can be expressed by the entries of $T$ [21, 20]. The following theorem is based on [22].
Theorem 5
Let $R = U T U^*$ be Schur's triangularisation of matrix $R$, then
$$e_t = \sum_{i=1}^{L} \sum_{j=i}^{L} a_i \left(T^t\right)_{ij}\, b_j, \qquad (32)$$
where the vectors $a^T = e^T U$ and $b = U^* q_0$. Here $\left(T^t\right)_{ij}$ is the $(i,j)$-th entry of the matrix $T^t$, given by
$$\left(T^t\right)_{ij} = \sum_{\mathcal{K}} \left(\prod_{l=1}^{s} t_{k_{l-1} k_l}\right) \sum_{\mathcal{D}} \lambda_{k_0}^{d_0} \lambda_{k_1}^{d_1} \cdots \lambda_{k_s}^{d_s}, \qquad (33)$$
where each $\lambda_k$ is an eigenvalue of matrix $R$ (a diagonal entry of $T$). The index set $\mathcal{K} = \{(k_0, \dots, k_s) : i = k_0 < k_1 < \cdots < k_s = j\}$, where the indexes $k_l$ are positive integers. The index set $\mathcal{D} = \{(d_0, \dots, d_s)\}$, where the indexes $d_l$ are nonnegative integers and their sum satisfies $d_0 + \cdots + d_s = t - s$.

Proof:
The expression (33) of $\left(T^t\right)_{ij}$ follows from [22]. Inserting it into (31) yields (32).

The above theorem gives another exact but complicated expression of $e_t$. Because of the difficulty of obtaining Schur's triangularisation, it is hard to derive an exact expression of $e_t$ in practice.
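Formula (33) can be verified on a small invented upper triangular matrix: the inner sum over $\mathcal{D}$ is the complete homogeneous symmetric polynomial of degree $t - s$ in the eigenvalues along a chain, and summing over all increasing chains reproduces $(T^t)_{ij}$.

```python
import numpy as np
from itertools import combinations

# Small upper triangular T with distinct diagonal (numbers invented).
T = np.array([[0.9, 0.3, 0.2],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 0.2]])
lam = np.diag(T)
L = T.shape[0]

def h(deg, idx):
    """Complete homogeneous symmetric polynomial of degree deg in lam[idx]."""
    if deg == 0:
        return 1.0
    s = len(idx)
    total = 0.0
    # stars-and-bars enumeration of exponents d_0 + ... + d_{s-1} = deg
    for bars in combinations(range(deg + s - 1), s - 1):
        exps, prev = [], -1
        for b in list(bars) + [deg + s - 1]:
            exps.append(b - prev - 1)
            prev = b
        total += np.prod([lam[i]**d for i, d in zip(idx, exps)])
    return total

def power_entry(t, i, j):
    """(T^t)_{ij} via increasing index chains i = k_0 < ... < k_s = j."""
    if i > j:
        return 0.0
    if i == j:
        return lam[i]**t
    total = 0.0
    for s in range(1, min(j - i, t) + 1):
        for mid in combinations(range(i + 1, j), s - 1):
            chain = (i,) + mid + (j,)
            prod = np.prod([T[chain[l], chain[l + 1]] for l in range(s)])
            total += prod * h(t - s, chain)
    return total

t = 5
Tt = np.linalg.matrix_power(T, t)
for i in range(L):
    for j in range(L):
        assert abs(Tt[i, j] - power_entry(t, i, j)) < 1e-10
```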
Summarizing this section, we have demonstrated exact expressions of $e_t$ through two methods, albeit with the difficulty of obtaining the Jordan form and Schur's triangularisation.
V Upper Bounds on Approximation Error
For many EAs, it is complex to obtain an exact expression of $e_t$. Therefore, a more reasonable goal is to seek an upper bound on $e_t$. A lower bound on $e_t$ is less interesting because a trivial lower bound always exists: $e_t \ge 0$.
V-A Convergence Rate Method
Unlike an exact expression of $e_t$, it is rather simple to obtain an upper bound on $e_t$. A trivial upper bound is
$$e_t \le \max_{1 \le i \le L} e_i. \qquad (34)$$
Of course, this upper bound is loose and unsatisfactory. A better upper bound can be derived from the convergence rate of EAs.
Definition 6
Given an error sequence $\{e_t\}$, its normalized convergence rate at generation $t$ is
$$c_t = 1 - \frac{e_{t+1}}{e_t}, \qquad \text{if } e_t > 0.$$

For an elitist EA the above rate takes values in $[0, 1]$ and it can be regarded as the convergence speed: the larger the value, the faster the convergence. Based on this rate, we get an upper bound on $e_t$. The theorem below is similar to [12, Theorem 2] but its calculation is more accurate.
Theorem 6
Given an error sequence $\{e_t\}$, define the drift $\Delta_t(i) = e_i - E[e(\Phi_{t+1}) \mid \Phi_t = i]$, where $i$ is a nonoptimal state. If $\Delta_t(i) \ge c\, e_i$ for some $c \in (0, 1]$ and any $i$ and $t$, then
$$e_t \le (1 - c)^t\, e_0. \qquad (35)$$

Proof:
We assume that $\Delta_t(i) \ge c\, e_i$, where $0 < c \le 1$. From
$$E[e(\Phi_{t+1}) \mid \Phi_t = i] \le (1 - c)\, e_i,$$
we get
$$e_{t+1} \le (1 - c)\, e_t. \qquad (36)$$
We have
$$e_t \le (1 - c)\, e_{t-1} \le \cdots \le (1 - c)^t\, e_0,$$
then get the required result.
Let
$$c^* = \min_{1 \le i \le L} \frac{\Delta(i)}{e_i}.$$
We show that $1 - c^* \ge \rho(R)$, where $\rho(R)$ is the spectral radius of matrix $R$. For the sake of analysis, we assume that $R$ is irreducible.^{1} From
$$e_i - \Delta(i) = E[e(\Phi_{t+1}) \mid \Phi_t = i] = (R^T e)_i,$$
according to the Collatz–Wielandt formula [19, p. 669], we get
$$\rho(R) = \rho(R^T) \le \max_{1 \le i \le L} \frac{(R^T e)_i}{e_i} = 1 - c^*.$$
Then $1 - c^* \ge \rho(R)$. Thus $1 - c$ can be written as $\rho(R) + \epsilon$ for some nonnegative $\epsilon$. The above theorem implies
$$e_t \le \left(\rho(R) + \epsilon\right)^t\, e_0. \qquad (37)$$

^{1} The analysis of a reducible $R$ is similar. The proof needs an extended Collatz–Wielandt formula [19, p. 670]. We omit it in this paper.
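The relation between the drift rate and the spectral radius can be illustrated numerically: for an invented substochastic matrix $R$, the best uniform rate $c$ from Theorem 6 satisfies $1 - c \ge \rho(R)$, and the bound (35) holds along the iteration.

```python
import numpy as np

# Toy substochastic R over nonoptimal states (numbers invented).
R = np.array([[0.5, 0.2, 0.1],
              [0.1, 0.4, 0.2],
              [0.0, 0.1, 0.5]])
e = np.array([1.0, 2.0, 3.0])    # state errors (invented)
q0 = np.array([0.2, 0.3, 0.5])   # initial distribution (invented)

# Drift of state i: Delta(i) = e_i - E[e(next) | current = i] = e_i - (R^T e)_i
drift = e - e @ R
c = np.min(drift / e)            # best uniform rate in Theorem 6
rho = np.max(np.abs(np.linalg.eigvals(R)))

# The rate-based bound cannot beat the spectral radius: 1 - c >= rho(R).
assert 1.0 - c >= rho - 1e-12

e0 = e @ q0
for t in range(1, 20):
    et = e @ np.linalg.matrix_power(R, t) @ q0
    assert et <= (1.0 - c)**t * e0 + 1e-12   # bound (35)
```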
It is worth mentioning that multiplicative drift analysis [23] also applies the convergence rate to estimating the hitting time of the optimal set. However, multiplicative drift analysis and approximation error analysis discuss two different topics: the former aims at an upper bound on the hitting time while the latter aims at an upper bound on the approximation error.
The convergence rate provides a simple method of estimating $c$ and thereby bounding $e_t$. Its applicability is shown through several examples.