Introduction
The genetic algorithm (GA) proposed by J. Holland [16]
is a randomized heuristic search method, based on analogy with the genetic mechanisms observed in nature and employing a population of tentative solutions. Different modifications of GA are widely used in the areas of operations research, pattern recognition, artificial intelligence etc. (see e.g.
[23, 27]). Despite numerous experimental investigations of these algorithms, their theoretical analysis is still at an early stage [7]. The efficiency of a GA applied to a combinatorial optimization problem may be estimated in terms of the expected computation time until an optimal solution or an acceptable approximate solution is visited for the first time. It is very unlikely, however, that there exists a randomized algorithm finding a globally optimal solution for an NP-hard optimization problem on average in polynomially bounded time. This would contradict the well-known hypothesis
which has been in use for several decades [18]. The main results of this paper are obtained through a comparison of genetic algorithms with local search, which is motivated by the fact that GAs are often considered to be good at finding local optima (see e.g. [1, 19, 22]).
Here and below we assume that the randomness is generated only by the randomized operators of selection, crossover, mutation and random initialization of the population within the GA (the input data is deterministic). A function of the input data is called polynomially bounded if there exists a polynomial in the length of the problem input which bounds the function from above. The terms efficient algorithm and polynomial-time algorithm are used for an algorithm with polynomially bounded running time.
1 Combinatorial Optimization Problems and Genetic Algorithms
NP Optimization Problems
In this paper, combinatorial optimization problems are viewed under the technical assumptions of the class of NP optimization problems (see e.g. [2]). Let {0,1}^* denote the set of all binary strings of arbitrary length. In what follows, given a string I ∈ {0,1}^*, the symbol |I| denotes the length of I. To denote the set of polynomially bounded functions we define Poly as the class of functions bounded above by a polynomial in |I|.
Definition 1
An NP optimization problem is a triple (Inst, Sol, f), where Inst ⊆ {0,1}^* is the set of instances and:
1. The relation I ∈ Inst is computable in polynomial time.
2. Given an instance I, Sol(I) ⊆ {0,1}^{n(I)} is the set of feasible solutions of I, where n(I) stands for the dimension of the search space {0,1}^{n(I)}. Given I and x ∈ {0,1}^{n(I)}, the decision whether x ∈ Sol(I) may be made in polynomial time, and n(I) ∈ Poly.
3. Given an instance I, f_I is the objective function (computable in polynomial time) to be maximized if the problem is an NP maximization problem or to be minimized if it is an NP minimization problem.
Without loss of generality we will consider only maximization problems; the results hold for minimization problems as well. The symbol of the problem instance will often be skipped in the notation when it is clear what instance is meant.
Definition 2
A combinatorial optimization problem is polynomially bounded if there exists a polynomial in |I| which bounds the objective values f_I(x), x ∈ Sol(I), from above.
Neighborhoods and local optima
Let a neighborhood N(x) ⊆ Sol be defined for every x ∈ Sol. The mapping N : Sol → 2^Sol is called the neighborhood mapping. Following [3], we assume this mapping to be efficiently computable, i.e. the set N(x) may be enumerated in polynomial time.
Definition 3
If the inequality f(y) ≤ f(x) holds for all neighbors y ∈ N(x) of a solution x ∈ Sol, then x is called a local optimum w.r.t. the neighborhood mapping N.
Suppose d(·, ·) is a metric on {0,1}^n. The neighborhood mapping
N_r(x) = {y ∈ Sol : d(x, y) ≤ r} is called a neighborhood mapping of radius r defined by metric d.
A local search method starts from some feasible solution. Each iteration of the algorithm consists in moving from the current solution to a new solution in its neighborhood, such that the value of the objective function is increased. The way an improving neighbor is chosen, if there are several of them, will not matter in this paper. The algorithm continues until a local optimum is reached.
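As an illustrative sketch (not part of the paper), the local search scheme above can be written in Python. The names `local_search` and `flip_neighbors`, and the use of OneMax with the radius-1 Hamming neighborhood, are our own assumptions made for the example.

```python
def local_search(x, fitness, neighbors):
    """First-improvement local search: repeatedly move to any
    improving neighbor until the current solution is a local optimum."""
    improved = True
    while improved:
        improved = False
        for y in neighbors(x):
            if fitness(y) > fitness(x):
                x = y
                improved = True
                break
    return x

def flip_neighbors(x):
    """Hamming neighborhood of radius 1 over bitstrings (tuples of 0/1)."""
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - y[i]
        yield tuple(y)

# OneMax: fitness is the number of ones; every radius-1 local optimum
# of OneMax is the all-ones string.
opt = local_search((0, 1, 0, 0), sum, flip_neighbors)
```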
Genetic Algorithms
The Simple GA proposed in [16] has been intensively studied and exploited for over four decades. Numerous variants of the GA have been developed since the publication of the Simple GA, sharing the basic ideas but using different population management strategies, selection, crossover and mutation operators [22].
The GA operates with populations X^t, t = 0, 1, ..., each consisting of λ genotypes. In terms of the present paper, the genotypes are elements of the search space {0,1}^n.
In a selection operator Sel, each parent is independently drawn from the previous population X^{t-1}, where each individual in
X^{t-1} is assigned a selection probability depending on its
fitness φ(x). Usually a higher fitness value of an individual implies a higher (or equal) selection probability. Below we assume the following natural form of the fitness function:
if x ∈ Sol, then φ(x) = f(x);
if x ∉ Sol, then its fitness is defined by some penalty function, such that φ(x) is less than the fitness of any feasible solution.
In this paper, we consider tournament selection, (μ, λ) selection and exponential ranking selection (see the details in Section 3 below).
One or two offspring genotypes are created from two parents using the randomized operators of crossover Cross (two-offspring version) or Cross' (single-offspring version) and mutation Mut. In general, we assume that the crossover and mutation operators are efficiently computable randomized routines.
When a population of λ offspring is constructed, the GA proceeds to the next iteration t + 1. An initial population X^0 is generated randomly. One of the ways of initialization consists, e.g., in independent choice of all bits in the genotypes.
To simplify the notation below, GA will always denote the non-elitist genetic algorithm with single-offspring crossover based on the following outline.
Algorithm GA
Generate the initial population X^0, assign t := 1.
While the termination condition is not met do:
Iteration t.
For k from 1 to λ do:
Selection: x := Sel(X^{t-1}), y := Sel(X^{t-1}).
Crossover: z := Cross'(x, y).
Mutation: x^t_k := Mut(z).
End for.
t := t + 1.
End while.
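The outline above can be sketched in Python. This is a minimal illustration, not the paper's exact algorithm: the choice of 2-tournament selection, uniform single-offspring crossover, bitwise mutation with rate 1/n and the OneMax-style usage are assumptions made for the example, and the function names are ours. The best-so-far genotype is tracked only for reporting; the GA itself is non-elitist.

```python
import random

def run_ga(fitness, n, lam, generations, k=2, pc=0.9, seed=0):
    """Minimal non-elitist GA matching the outline above: each of the
    lam offspring is produced by selection, crossover and mutation."""
    rng = random.Random(seed)

    def tournament(pop):
        # k-tournament: return a fittest of k uniform samples
        return max(rng.choices(pop, k=k), key=fitness)

    def crossover(x, y):
        # single-offspring uniform crossover with probability pc,
        # otherwise a uniformly chosen parent is copied unchanged
        if rng.random() < pc:
            return tuple(rng.choice(pair) for pair in zip(x, y))
        return rng.choice((x, y))

    def mutate(x, p):
        # flip each bit independently with probability p
        return tuple(b ^ (rng.random() < p) for b in x)

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(lam)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop = [mutate(crossover(tournament(pop), tournament(pop)), 1.0 / n)
               for _ in range(lam)]
        best = max(best, max(pop, key=fitness), key=fitness)
    return best

best = run_ga(sum, 10, 20, 200)  # OneMax on 10 bits
```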
In the theoretical analysis of the GA we will assume that the termination condition is never met. In practice, however, a termination condition may be used to stop a genetic algorithm when a solution of sufficient quality is obtained, or the computing time is limited, or the population is "trapped" in some unpromising area and it is preferable to restart the search (see e.g. [4, 26]).
In what follows, the operators of selection, mutation and single-offspring crossover are associated with the corresponding transition matrices:

S represents a selection operator, where s_k(X) is the probability of selecting the k-th individual from population X;

M, where m_{x,y} is the probability of mutating x into y;

C', where c'_{x,y,z} is the probability of obtaining z as a result of crossover between x and y.
The single-offspring crossover Cross' may be obtained from the two-offspring crossover Cross by first computing the pair of offspring (z', z'') = Cross(x, y), and then choosing z uniformly at random from {z', z''}.
Crossover and Mutation Operators
Let us consider the well-known operators of bitwise mutation and single-point crossover from the Simple GA [15] as examples.
The single-point crossover operator computes offspring z', z'', given parents x, y, so that with a given probability P_c,
z' = (x_1, ..., x_χ, y_{χ+1}, ..., y_n) and z'' = (y_1, ..., y_χ, x_{χ+1}, ..., x_n), where the random number χ is chosen uniformly from 1 to n − 1. With probability 1 − P_c both parent individuals are copied without any changes, i.e. z' = x, z'' = y.
The bitwise mutation operator computes a genotype y = Mut(x), where, independently of other bits, each bit y_i, i ∈ [n], is assigned the value 1 − x_i with probability p and with probability 1 − p it keeps the value x_i. Here and below we use the notation [k] = {1, ..., k} for any positive integer k. The tunable parameter p is also called the mutation rate.
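A Python sketch of these two operators (the two-offspring form of single-point crossover is shown; the function names and parameter names `pc`, `p` are ours):

```python
import random

rng = random.Random(1)

def single_point_crossover(x, y, pc):
    """With probability pc, cut both parents at a uniform point chi in
    1..n-1 and exchange the tails; otherwise copy the parents unchanged."""
    if rng.random() < pc:
        chi = rng.randint(1, len(x) - 1)
        return x[:chi] + y[chi:], y[:chi] + x[chi:]
    return x, y

def bitwise_mutation(x, p):
    """Flip every bit independently with probability p (the mutation rate)."""
    return tuple(b ^ (rng.random() < p) for b in x)
```

Note that single-point crossover only exchanges positions between the parents, so the combined multiset of bits over both offspring always equals that of the parents.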
The following condition holds for many well-known crossover operators: there exists a positive constant c_0 which does not depend on n, such that given a pair of bitstrings x, y, the crossover result z = Cross'(x, y) satisfies
Pr{φ(z) ≥ min{φ(x), φ(y)}} ≥ c_0. (1)
This condition is fulfilled for the single-point crossover with c_0 = 1 − P_c, if P_c < 1 is a constant. Sometimes stronger statements can be deduced; e.g. for the well-known OneMax and LeadingOnes fitness functions the offspring satisfies a stronger fitness guarantee with at least constant probability (see [8]).
Another condition analogous to (1) requires that the fitness of the resulting genotype is not less than the fitness of both parents with probability at least c_1, for some constant c_1 > 0, i.e.
Pr{φ(Cross'(x, y)) ≥ max{φ(x), φ(y)}} ≥ c_1 (2)
for any x, y. This condition is also fulfilled for the single-point crossover, if P_c < 1 is a constant. Besides that, Condition (2) is satisfied with c_1 = 1 for the optimized crossover operators, where the offspring is computed as a solution to the optimal recombination problem. Polynomial-time optimized crossover routines are known for Maximum Clique [4], Set Packing, Set Partition and some other NPO problems [9, 10].
Bitwise Mutation and Bounded Neighborhood Mappings
Let H(x, y) denote the Hamming distance between x and y.
Definition 4
[3] Suppose an NP optimization problem is given. A neighborhood mapping N is called bounded if for any x ∈ Sol and y ∈ N(x) it holds that H(x, y) ≤ r, where r is a constant.
The bitwise mutation operator outputs a string y, given a string x, with probability p^{H(x,y)} (1 − p)^{n − H(x,y)}. Note that this probability, as a function of H(x, y), attains its minimum over a bounded neighborhood at H(x, y) = r, assuming that p ≤ 1/2. The following proposition gives a lower bound for this probability, which is valid for any bounded neighborhood mapping, assuming that p ≤ 1/2.
Proposition 5
Suppose the neighborhood mapping N is bounded with constant r, and p ≤ 1/2. Then for any x ∈ Sol and any y ∈ N(x) it holds that Pr{Mut(x) = y} ≥ p^r (1 − p)^{n − r}.
The proof may be found in the appendix.
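As a sketch (assuming the standard bitwise-mutation transition probability p^{H(x,y)} (1 − p)^{n − H(x,y)}), the following Python fragment computes this probability; the function names are ours. For p ≤ 1/2 the value is non-increasing in the Hamming distance, which is the monotonicity used above.

```python
def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))

def mutation_prob(x, y, p):
    """Probability that bitwise mutation with rate p turns x into y:
    p^H(x,y) * (1-p)^(n - H(x,y))."""
    d = hamming(x, y)
    return p ** d * (1.0 - p) ** (len(x) - d)
```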
2 Expected First Hitting Time of Target Subset
This section is based on the drift analysis of GAs from [8]. Suppose that for some m there is an ordered partition of {0,1}^n into subsets A_1, ..., A_m called levels. Level A_m will be the target level in the subsequent analysis. The target level may be chosen as the set of solutions with maximal fitness, or the set of local optima, or the set of approximate solutions for some approximation factor. A well-known example of a partition is the canonical partition, where each level regroups solutions having the same fitness value (see e.g. [20]). For j ∈ [m] we denote by A_{≥j} the union of all levels starting from level j.
Given a level partition, there always exists a total order "⪯" on {0,1}^n which is aligned with the partition in the sense that for any x ∈ A_i, y ∈ A_j with i < j, it holds that x ⪯ y. W.l.o.g. in what follows the elements of a population vector
X = (x_1, ..., x_λ)
will be assumed to form a non-increasing sequence in terms of the "⪯" order: x_1 ⪰ x_2 ⪰ ... ⪰ x_λ. For any constant γ ∈ (0, 1], the individual x_{⌈γλ⌉} will be referred to as the γ-ranked individual of the population. The selective pressure of a selection mechanism is defined as follows. For any γ ∈ (0, 1] and population X of size λ, let β(γ, X) be the probability of selecting an individual from X that belongs to the same or a higher level as the individual with rank ⌈γλ⌉, i.e. an individual from A_{≥i},
where i is such that x_{⌈γλ⌉} ∈ A_i.
Theorem 6
Given a partition A_1, ..., A_m of {0,1}^n, let T be the runtime of the GA, i.e. the number of the first iteration at which an element of A_m enters the population. If there exist parameters and a constant such that for all j ∈ [m − 1]:

(C1) …,

(C2) …,

(C3) …,

(C4) … for any …,

(C5) with λ ≥ …,

then E[T] ≤ ….
Informally, Condition (C1) requires that for each element of level A_j there is a lower bound on the probability of mutating it into level j + 1 or higher. Condition (C2) requires a lower bound on the probability that the mutation does not "downgrade" an individual to a lower level. Condition (C3) follows from lower bound (2) with a constant c_1, or from lower bound (1) with a constant c_0 in the case of the canonical partition. Condition (C4) requires that the selective pressure induced by the selection mechanism is sufficiently strong. Condition (C5) requires that the population size is sufficiently large.
Unfortunately, Conditions (C3) and (C4) are unlikely to be satisfied when the target subset A_m contains some solutions that are less fit than the solutions of lower levels, e.g. when A_m is the set of all local optima. In order to adapt Theorem 6 to the analysis of such situations, we first prove the following corollary with relaxed versions of Conditions (C3) and (C4) and a slightly strengthened version of (C2).
Corollary 7
Given a partition A_1, ..., A_m of {0,1}^n, let T be the runtime of the GA. If there exist parameters and a constant such that for all j:

(C1) …,

(C2') …,

(C3') …,

(C4') … for any …,

(C5) with λ ≥ …,

then E[T] ≤ ….
Proof. Given a genetic algorithm GA with a certain initialization procedure for X^0, selection operator Sel, crossover Cross', mutation Mut and population size λ, consider a genetic algorithm GA' defined as the following modification of GA.

Let the initialization procedure for population X^0 in GA' coincide with that of GA.

The operator of selection Sel' performs identically to operator Sel, except for the cases when the input population contains an element from A_m. In the latter cases Sel' returns the index of the first representative of A_m in the population.

The operator of crossover in GA' performs identically to Cross', except for the cases when the input contains an element from A_m. In the latter cases an element of A_m is just copied to the output of the operator.

The operator of mutation in GA' is the same as Mut.

The population size in GA' is λ.
Note that GA' meets Conditions (C1)–(C5) of Theorem 6. Indeed, Condition (C2) follows from (C2'). Condition (C3) is satisfied for populations without elements of A_m by (C3'), and otherwise it holds by definition of the crossover operator of GA'. Condition (C4) is satisfied by (C4'), and in the cases when the population contains at least one element from A_m, it holds by definition of operator Sel'. Thus, by Theorem 6,
where T' is defined for the sequence of populations of GA'.
The executions of GA and GA' before iteration T are identical. On iteration T both algorithms place elements of A_m
into the population for the first time. Thus, the realizations of the random variables
T and T' coincide, and E[T] = E[T'].
3 Lower Bounds on Cumulative Selection Probability
Let us see how to parameterise three standard selection mechanisms in order to ensure that the selective pressure is sufficiently high. We consider three selection operators with the following mechanisms.
By definition, in tournament selection, k individuals are sampled uniformly at random with replacement from the population, and the fittest of these individuals is returned. In (μ, λ) selection, parents are sampled uniformly at random among the μ fittest individuals in the population. Ties in terms of the fitness function are resolved arbitrarily.
A function α : (0, 1] → R is a ranking function [14] if α(γ) ≥ 0 for all γ and its integral over (0, 1] equals 1. In ranking selection with ranking function α, the probability of selecting one of the individuals ranked γ or better is the integral of α over (0, γ]. We define exponential ranking selection as ranking selection parameterised by a constant η > 0.
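A Python sketch of the first two mechanisms, working on ranks (0 = best) so that tie-breaking need not be modelled. The helper names and the parameter values of the experiment are our own; the closed form 1 − (1 − γ)^k for the cumulative selection probability of k-tournament is a standard fact, used here only as a sanity check.

```python
import random

def tournament_index(lam, k, rng):
    """Rank (0 = best) selected by k-tournament from a population of
    lam individuals sorted by decreasing fitness: the best of k
    uniformly sampled ranks wins."""
    return min(rng.randrange(lam) for _ in range(k))

def mu_lambda_index(lam, mu, rng):
    """(mu, lambda) selection: a uniform rank among the mu fittest."""
    return rng.randrange(mu)

def beta_tournament(gamma, k):
    """Cumulative selection probability of the top gamma-fraction
    under k-tournament: 1 - (1 - gamma)^k."""
    return 1.0 - (1.0 - gamma) ** k

rng = random.Random(0)
lam, k, gamma = 100, 4, 0.25
trials = 20000
hits = sum(tournament_index(lam, k, rng) < gamma * lam for _ in range(trials))
# the empirical frequency hits/trials should be close to
# beta_tournament(0.25, 4) = 1 - 0.75**4
```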
The following lemma is analogous to Lemma 1 from [8].
Lemma 8
For any constants, there exist two constants such that

tournament selection with a sufficiently large constant tournament size k,

(μ, λ) selection with a sufficiently large constant ratio λ/μ, and

exponential ranking selection with a sufficiently large constant η

satisfy (C4'), i.e. the required lower bound on β(γ, X) holds for any γ ∈ (0, 1] and any population X.
Note that the assumption of monotonicity of mutation w.r.t. all fitness levels (see [8]) is substituted here by Inequality (3).
Proof. Denote .
1. Consider tournament selection. In order to select an individual from the same level as the γ-ranked individual or higher, by Inequality (3) it is sufficient that the randomly sampled tournament contain at least one individual with rank ⌈γλ⌉ or higher. Hence, one obtains β(γ, X) ≥ 1 − (1 − γ)^k.
Note that
So for , we have
If , then for all it holds that and
2. In (μ, λ) selection, for all γ we have one expression if ⌈γλ⌉ ≤ μ, and another otherwise (by Inequality (3)). It suffices to pick μ so that the required bound holds for all γ. Then
3. In exponential ranking selection, we have
The rest of the proof is similar to that for tournament selection, with η in place of k.
4 Expected First Hitting Time of the Set of Local Optima
Suppose an NP maximization problem is given and a neighborhood mapping N is defined. Let p_min be a lower bound on the probability that the mutation operator transforms a given feasible solution x into a specific neighbor y, i.e.
Pr{Mut(x) = y} ≥ p_min for all x ∈ Sol, y ∈ N(x). (4)
The greater the value p_min, the more consistent is the mutation with the neighborhood mapping N.
In Subsections 4.1 and 4.2, the instance symbol is suppressed in the notation for brevity. The population size, the selection parameters, the number of levels and the fitness function are supposed to depend on the input data implicitly.
The set of all local optima will be denoted below as well (note that global optima also belong to it).
4.1 No Infeasible Solutions
In many well-known NP optimization problems, such as the Maximum Satisfiability Problem [13], the Maximum Cut Problem [13] and the Ising Spin Glass Model [5], the set of feasible solutions is the whole search space, i.e. Sol = {0,1}^n. Let us consider the GAs applied to problems with this property.
We choose m to be the number of distinct fitness values attained by the solutions from Sol. Then, starting from any point, the local search method finds a local optimum within at most m − 1 steps. Let us use a modification of the canonical partition where all local optima are merged together:
(5) 
(6) 
Application of Corollary 7 and Lemma 8 w.r.t. this partition yields the following theorem.
Theorem 9
Suppose that

Pr{Mut(x) = y} ≥ p_min for any x ∈ Sol and y ∈ N(x),

Conditions (C2') and (C3') are satisfied for some constants,

…,

the GA is using either tournament selection, (μ, λ) selection or exponential ranking selection with a constant parameter chosen as in Lemma 8.
Then there exist two constants such that, for an appropriate population size, the expected number of fitness evaluations until a local optimum is reached for the first time is bounded.
A similar result for the GA with tournament selection and two-offspring crossover was obtained in [12, 11] without drift analysis. In particular, Lemma 1 and Proposition 1 in [11] imply that with appropriate settings of parameters, a non-elitist genetic algorithm reaches a local optimum for the first time within a bounded expected number of fitness evaluations. The upper bound from Theorem 9 in the present paper has an advantage over the bound from [11] if the neighborhood size is at least linear in n. (Note that the size of many well-known neighborhoods grows as some polynomial of n.)
4.2 Illustrative Examples
Royal Road Functions
Let us consider a family of Royal Road functions defined on the basis of the principles proposed by M. Mitchell, S. Forrest, and J. Holland in [21]. The function is defined on the search space {0,1}^n, where n is a multiple of the block size r, and the set of indices [n] is partitioned into n/r consecutive blocks of r elements each. By definition, the fitness of x is the number of blocks where all bits of x are equal to 1.
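A minimal Python sketch of this fitness function (the name `royal_road` and the symbol `r` for the block size are ours):

```python
def royal_road(x, r):
    """Royal Road fitness: the number of length-r blocks of x in which
    every bit equals 1 (len(x) must be a multiple of r)."""
    assert len(x) % r == 0
    return sum(all(x[i:i + r]) for i in range(0, len(x), r))
```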
We consider a crossover operator which returns one of the parents unchanged with a constant probability. In particular, such an operator may be built up from any standard crossover operator, so that with probability P_c the standard operator is applied and its offspring is returned; otherwise, with probability 1 − P_c, one of the two parents is returned with equal probabilities.
The following corollary for Royal Road functions results from Theorem 9 with the neighborhood defined by the Hamming distance with radius r.
Corollary 10
Suppose that the GA uses a crossover operator with P_c being any constant in (0, 1), the bitwise mutation with mutation rate c/n for a constant c, and tournament selection, (μ, λ) selection or exponential ranking selection with constant parameters chosen as in Lemma 8. Then there exists a constant such that the GA with a sufficiently large population size has the expected runtime bound of Theorem 9 on the Royal Road function.
Proof.
Note that the fitness of any solution with some non-optimal bits (i.e. bits equal to zero) can be increased by an improving move within the Hamming neighborhood of radius r. So there is just one local optimum and it is the global optimum, the all-ones string. We now apply Theorem 9 with the canonical partition, where
The probability of not flipping any bit position by mutation is (1 − p)^n.
In the rest of the proof we assume that is sufficiently large to ensure that for the constant . Let
A lower bound on the upgrade probability may be found if we consider the worst-case scenario where only one block contains some incorrect bits and the number of such bits is r.
We can put c_1 = (1 − P_c)/2, because the crossover operator returns one of the parents unchanged with probability 1 − P_c, and with probability at least 1/2 this parent is not less fit than the other one. Then the conditions of Theorem 9 are satisfied.
It therefore follows by Theorem 9 that there exists a constant such that, if the population size is chosen accordingly, the expected runtime of the GA on the Royal Road function is upper bounded as stated.
The corollary implies that the GA with a proper population size has a polynomial runtime on the Royal Road functions if the block size r is a constant.
Vertex Cover Problems with Regular Structure
In general, given a graph G = (V, E), the Vertex Cover Problem (VCP) asks for a subset C ⊆ V (called a vertex cover), such that every edge e ∈ E has at least one endpoint in C. The size of C should be minimized. Let us consider a representation of the problem solutions where Sol = {0,1}^{|E|}, and each coordinate x_j of x corresponds to an edge e_j ∈ E, assigning one of its endpoints to be included into the cover (one endpoint of e_j is assigned if x_j = 0 and the other one if x_j = 1). Thus, the cover C(x) contains all vertices assigned by at least one of the coordinates of x, and the feasibility of C(x) is guaranteed. This representation is a special case of the so-called non-binary representation for the more general set covering problem (see e.g. [6]). The fitness function is by definition determined by the cover size |C(x)|.
The family of vertex covering instances consists of VCP problems where the graph is a union of disjoint cliques of size 3 (triangles). An optimal solution contains two vertices from each clique, and the number of optimal solutions grows exponentially with the number of triangles.
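A Python sketch of the edge-based representation on the triangle family (all names are ours; the vertex and edge numbering is an assumption of the example). Feasibility holds by construction: every edge contributes one of its own endpoints to the cover.

```python
def triangles(k):
    """Union of k disjoint triangles: triangle j uses
    vertices 3j, 3j+1, 3j+2."""
    edges = []
    for j in range(k):
        a, b, c = 3 * j, 3 * j + 1, 3 * j + 2
        edges += [(a, b), (b, c), (a, c)]
    return edges

def decode_cover(x, edges):
    """Edge-based representation: bit x_j selects one endpoint of edge j;
    the cover is the set of all selected endpoints (always feasible)."""
    return {e[b] for e, b in zip(edges, x)}

edges = triangles(2)  # 2 triangles, 6 edges
cover = decode_cover((0, 0, 0, 1, 1, 1), edges)
# this genotype yields an optimal cover: two vertices per triangle
```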
Consider the neighborhood system defined by the Hamming distance with radius 1. All local optima are globally optimal in this case. For the bitwise mutation operator with mutation rate p ≤ 1/2, by Proposition 5 we have Pr{Mut(x) = y} ≥ p(1 − p)^{n−1} for any y ∈ N(x). Analogously to Corollary 10 we obtain
Corollary 11
Suppose that the GA uses a crossover operator with P_c being any constant in (0, 1), the bitwise mutation with mutation rate c/n for a constant c, and tournament selection, (μ, λ) selection or exponential ranking selection with constant parameters chosen as in Lemma 8. Then there exists a constant such that the GA with a sufficiently large population size has the expected runtime bound of Theorem 9 on the VCP family.
It is interesting that the VCP instances of this family
in the integer linear programming formulation are hard for the Land and Doig branch-and-bound algorithm (for a description of the Land and Doig algorithm see e.g.
[25], Chapt. 24). This exact algorithm makes an exponential number of branchings on the problems from this family (see [24]).
4.3 The General Case of NP Optimization Problems
Consider the general case where Sol may be a proper subset of {0,1}^n. Let us add another modification to the level partition. Besides merging all local optima, we assume that all infeasible solutions constitute the lowest level A_1. The rest of the solutions are stratified by their objective function values. Let m be defined through the number of fitness values attained by the feasible solutions from Sol.
(7) 
(8) 
(9) 
Application of Corollary 7 with this partition yields the following lemma.
Lemma 12
Suppose that Condition (C2') holds and

(L1) …,

(L2) …,

(L3) Inequality (2) holds for some positive constant,

and the GA is using either tournament selection, (μ, λ) selection or exponential ranking selection with a constant parameter chosen as in Lemma 8.
Then there exist two constants such that, for an appropriate population size, the expected number of fitness evaluations until a local optimum is reached for the first time is bounded.
Proof. Assumption (L1) is equivalent to Inequality (4). Condition (L2) imposes a lower bound on the probability of producing a feasible solution by mutation of an infeasible bitstring. Thus, together (L1) and (L2) give the lower bound required in (C1). Condition (L3) implies (C3').
The operators Mut and Cross are supposed to be efficiently computable, and the selection procedure requires only polynomial time. Therefore the time complexity of computing a pair of offspring in the GA is polynomially bounded, and the following theorem holds.
Theorem 13
If the problem is polynomially bounded and Conditions (C2'), (L1), (L2) and (L3) are satisfied for an appropriate lower bound and positive constants, then the GA using tournament selection, (μ, λ) selection or exponential ranking selection with a suitable choice of parameters first visits a local optimum on average in polynomially bounded time.
Note that Condition (L2) in the formulation of Theorem 13 cannot be dismissed. Indeed, suppose that the problem is polynomially bounded, and consider a GA where the mutation operator Mut has the following properties. On the one hand, Mut never outputs a feasible offspring, given an infeasible input. On the other hand, given a feasible genotype x, Mut(x) is infeasible with some positive probability. Finally, assume that the initialization procedure produces no local optima in population X^0. Now all of the remaining conditions of Theorem 13 can be satisfied, but with a positive probability the whole population consists of infeasible solutions, and subject to this event all subsequent populations are infeasible. Therefore, the expected hitting time of a local optimum is unbounded.
In order to estimate the applicability of Theorem 13, it is sufficient to recall that the set N(x) for x ∈ Sol may be enumerated efficiently by definition, so there exists a mutation operator
that generates a uniform distribution over N(x),
and every point in N(x) is selected with probability at least 1/|N(x)|. To deal with the cases where the genotype is infeasible, we can recall that there are large classes of NP optimization problems where at least one feasible solution is computable in polynomial time (see e.g. the classes PLS in [17] and GLO in [3]). For such problems, in the case of an infeasible input, a mutation operator may output the known feasible solution with probability 1. Alternatively, we can consider a repair heuristic (see e.g. [6]) which follows some standard mutation operator: if the output of mutation is infeasible, then the heuristic substitutes this output by a feasible solution.
5 Analysis of Guaranteed Local Optima Problems
In this section, Theorem 13 is used to estimate the GA's capacity for finding solutions with an approximation guarantee.
An algorithm for an NP maximization problem has a guaranteed approximation ratio ρ ≥ 1 if for any instance it delivers a feasible solution x such that the optimal objective value exceeds f(x) by a factor of at most ρ.
In the case of an NP minimization problem, the guaranteed approximation ratio is defined similarly, except that the latter inequality changes into