Level-Based Analysis of Genetic Algorithms for Combinatorial Optimization

The paper is devoted to upper bounds on the runtime of non-elitist genetic algorithms until some target subset of solutions is visited for the first time. In particular, we consider the sets of optimal solutions and the sets of local optima as the target subsets. Previously known upper bounds are improved by means of drift analysis. Finally, we propose conditions ensuring that a non-elitist genetic algorithm efficiently finds approximate solutions with a constant approximation ratio on the class of combinatorial optimization problems with guaranteed local optima (GLO).


Introduction

The genetic algorithm (GA) proposed by J. Holland [16] is a randomized heuristic search method, based on an analogy with the genetic mechanisms observed in nature and employing a population of tentative solutions. Different modifications of the GA are widely used in the areas of operations research, pattern recognition, artificial intelligence etc. (see e.g. [23, 27]). Despite numerous experimental investigations of these algorithms, their theoretical analysis is still at an early stage [7].

Efficiency of a GA in application to a combinatorial optimization problem may be estimated in terms of the expected computation time until an optimal solution or an acceptable approximate solution is visited for the first time. It is very unlikely, however, that there exists a randomized algorithm finding a globally optimal solution for an NP-hard optimization problem on average in polynomially bounded time. This would contradict the well-known hypothesis which has been in use for several decades [18].

The main results of this paper are obtained through a comparison of genetic algorithms to local search, which is motivated by the fact that GAs are often considered to be good at finding local optima (see e.g. [1, 19, 22]).

Here and below we assume that the randomness is generated only by the randomized operators of selection, crossover, mutation and random initialization of the population within the GA (the input data is deterministic). A function of input data is called polynomially bounded if there exists a polynomial in the length of the problem input which bounds the function from above. The terms efficient algorithm and polynomial-time algorithm are used for an algorithm with polynomially bounded running time.

1 Combinatorial Optimization Problems and Genetic Algorithms

NP Optimization Problems

In this paper, the combinatorial optimization problems are viewed under the technical assumptions of the class of NP optimization problems (see e.g. [2]). Let {0, 1}* denote the set of all binary strings of arbitrary length. For a string S ∈ {0, 1}*, the symbol |S| will denote its length. In what follows, ℕ denotes the set of positive integers. To denote the set of polynomially bounded functions we define Poly as the class of functions from {0, 1}* to ℕ bounded above by a polynomial in |S|, where S ∈ {0, 1}*.

Definition 1

An NP optimization problem is a triple Π = (Inst, Sol, F), where Inst ⊆ {0, 1}* is the set of instances of Π and:

1. The relation I ∈ Inst is computable in polynomial time.

2. Given an instance I ∈ Inst, Sol(I) ⊆ {0, 1}^{n(I)} is the set of feasible solutions of I, where n(I) stands for the dimension of the search space {0, 1}^{n(I)}. Given I and x ∈ {0, 1}^{n(I)}, the decision whether x ∈ Sol(I) may be made in polynomial time, and n(I) ∈ Poly.

3. Given an instance I ∈ Inst, F_I : Sol(I) → ℕ is the objective function (computable in polynomial time) to be maximized if Π is an NP maximization problem or to be minimized if Π is an NP minimization problem.

Without loss of generality we will consider only the maximization problems; the results hold for the minimization problems as well. The symbol of the problem instance I may often be omitted from the notation when it is clear which instance is meant.

Definition 2

A combinatorial optimization problem is polynomially bounded if there exists a polynomial in n(I) which bounds the objective values F_I(x), x ∈ Sol(I), from above.

Neighborhoods and local optima

Let a neighborhood N_I(x) ⊆ Sol(I) be defined for every x ∈ Sol(I). The mapping N_I is called the neighborhood mapping. Following [3], we assume this mapping to be efficiently computable, i.e. the set N_I(x) may be enumerated in polynomial time.

Definition 3

If the inequality F_I(y) ≤ F_I(x) holds for all neighbors y ∈ N_I(x) of a solution x ∈ Sol(I), then x is called a local optimum w.r.t. the neighborhood mapping N_I.

Suppose R(x, y) is a metric on the search space. The neighborhood mapping

 N_I(x) = {y ∈ Sol(I) ∣ R(x, y) ≤ r},  x ∈ Sol(I),

is called a neighborhood mapping of radius r defined by metric R.

A local search method starts from some feasible solution. Each iteration of the algorithm consists in moving from the current solution to a new solution in its neighborhood, such that the value of the objective function is increased. The way to choose an improving neighbor, if there are several of them, will not matter in this paper. The algorithm continues until a local optimum is reached.
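The local search procedure just described can be sketched as follows. This is a minimal Python illustration, not code from the paper: the first-improvement pivoting rule and the Hamming-radius-1 neighborhood are our own placeholder choices (as the text notes, the pivoting rule does not matter for the analysis).

```python
def local_search(x, f, neighbors):
    """First-improvement local search: move to a strictly better neighbor
    until none exists, i.e. until a local optimum is reached."""
    improved = True
    while improved:
        improved = False
        for y in neighbors(x):
            if f(y) > f(x):
                x = y
                improved = True
                break
    return x

def flip_neighbors(x):
    """Hamming-radius-1 neighborhood of a bitstring given as a 0/1 tuple."""
    for i in range(len(x)):
        yield x[:i] + (1 - x[i],) + x[i + 1:]
```

For example, on the OneMax objective (`f = sum`) this procedure climbs from any starting point to the unique local (and global) optimum, the all-ones string.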

Genetic Algorithms

The Simple GA proposed in [16] has been intensively studied and exploited for over four decades. Plenty of GA variants have been developed since the publication of the Simple GA, sharing the basic ideas, but using different population management strategies, selection, crossover and mutation operators [22].

The GA operates with populations P^t, t = 1, 2, …, each of which consists of λ genotypes. In terms of the present paper, the genotypes are elements of the search space {0, 1}^n.

In a selection operator Sel, each parent is independently drawn from the previous population P^{t−1}, where each individual in P^{t−1} is assigned a selection probability depending on its fitness f. Usually a higher fitness value of an individual implies a higher (or equal) selection probability. Below we assume the following natural form of the fitness function:

• if x ∈ Sol(I), then

 f(x) = F(x);

• if x ∉ Sol(I), then its fitness is defined by some penalty function, such that

 f(x) < f(y) for any y ∈ Sol(I).

In this paper, we consider the tournament selection, (μ, λ)-selection and exponential ranking selection (see the details in Section 3 below).

One or two offspring genotypes are created from two parents using the randomized operators of crossover Cross₂ (two-offspring version) or Cross (single-offspring version) and mutation Mut. In general, we assume that Cross, Cross₂ and Mut are efficiently computable randomized routines.

When a population P^t of λ offspring is constructed, the GA proceeds to the next iteration t + 1. An initial population P^0 is generated randomly. One of the ways of initialization consists, e.g., in an independent choice of all bits in all genotypes.

To simplify the notation below, GA will always denote the non-elitist genetic algorithm with single-offspring crossover based on the following outline.

Algorithm GA

Generate the initial population P^0, assign t := 1.
While termination condition is not met do:

 Iteration t.

 For i from 1 to λ do:
  Selection: x := Sel(P^{t−1}), y := Sel(P^{t−1}).
  Crossover: x′ := Cross(x, y).
  Mutation: x^t_i := Mut(x′).
 End for.
 t := t + 1.

End while.

In the theoretical analysis of the GA we will assume that the termination condition is never met. In practice, however, a termination condition may be required to stop a genetic algorithm when a solution of sufficient quality is obtained, or the computing time is limited, or the population is “trapped” in some unpromising area and it is preferable to restart the search (see e.g. [4, 26]).
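The outline above can be turned into a short executable sketch. This is our own illustrative Python code, not from the paper; the concrete operators (2-tournament selection, uniform single-offspring crossover, bitwise mutation with per-bit rate `p_mut`) are placeholder assumptions standing in for the abstract Sel, Cross and Mut.

```python
import random

def non_elitist_ga(fitness, n, lam, generations, p_mut):
    """Follows the non-elitist GA outline: every iteration, each of the
    lam offspring is produced by selection, single-offspring crossover and
    mutation, and the whole population is replaced (no elitism)."""

    def select(pop):
        # Placeholder: 2-tournament selection.
        a, b = random.choice(pop), random.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    def crossover(x, y):
        # Placeholder: uniform single-offspring crossover.
        return tuple(random.choice(pair) for pair in zip(x, y))

    def mutate(x):
        # Placeholder: bitwise mutation with per-bit rate p_mut.
        return tuple(b ^ 1 if random.random() < p_mut else b for b in x)

    # Random initialization: every bit chosen independently and uniformly.
    pop = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(lam)]
    for _ in range(generations):
        pop = [mutate(crossover(select(pop), select(pop))) for _ in range(lam)]
    return max(pop, key=fitness)
```

Note the defining non-elitist feature: the offspring list overwrites the population wholesale, so even the best individual may be lost between iterations.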

In what follows, the operators of selection, mutation and single-offspring crossover are associated with the corresponding transition matrices:

• Sel represents a selection operator, where p_sel(i ∣ P) is the probability of selecting the i-th individual from population P.

• Mut, where p_mut(y ∣ x) is the probability of mutating x into y.

• Cross, where p_cross(x′ ∣ x, y) is the probability of obtaining x′ as a result of crossover between x and y.

The single-offspring crossover Cross may be obtained from the two-offspring crossover Cross₂ by first computing (x″, y″) = Cross₂(x, y), and then defining Cross(x, y) as x″ or y″, each chosen with probability 1/2.

Crossover and Mutation Operators

Let us consider the well-known operators of bitwise mutation Mut and the single-point crossover Cross₂ from the Simple GA [15] as examples.

The single-point crossover operator computes Cross₂(x, y) = (x′, y′), given x = (x_1, …, x_n) and y = (y_1, …, y_n), so that with a given probability P_c,

 x′ = (x_1, …, x_Z, y_{Z+1}, …, y_n),  y′ = (y_1, …, y_Z, x_{Z+1}, …, x_n),

where the random number Z is chosen uniformly from 1 to n − 1. With probability 1 − P_c both parent individuals are copied without any changes, i.e. (x′, y′) = (x, y).
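A minimal Python sketch of the single-point crossover, written for 0/1 tuples (our own illustration, with the cut point drawn uniformly from 1 to n − 1 as described above):

```python
import random

def single_point_crossover(x, y, p_c):
    """Two-offspring single-point crossover: with probability p_c, choose a
    cut point Z uniformly from 1..n-1 and exchange the tails of the parents;
    otherwise copy both parents unchanged."""
    if random.random() < p_c:
        z = random.randint(1, len(x) - 1)
        return x[:z] + y[z:], y[:z] + x[z:]
    return x, y
```

Whatever the cut point, each bit position of the offspring pair still carries exactly the two parental bits for that position.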

The bitwise mutation operator Mut computes a genotype y = Mut(x), where independently of the other bits, each bit y_i, i ∈ [n], is assigned the value 1 − x_i with probability χ/n, and with probability 1 − χ/n it keeps the value x_i. Here and below we use the notation [n] := {1, …, n} for any positive integer n. The tunable parameter χ is also called the mutation rate.
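The bitwise mutation operator admits an equally short sketch (again our own illustrative Python, assuming genotypes are 0/1 tuples):

```python
import random

def bitwise_mutation(x, chi):
    """Bitwise mutation with mutation rate chi/n: each bit of x is flipped
    independently with probability chi/n, otherwise kept unchanged."""
    n = len(x)
    return tuple(b ^ 1 if random.random() < chi / n else b for b in x)
```

The two extreme settings are easy to check: χ = 0 copies the genotype, while χ = n flips every bit.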

The following condition holds for many well-known crossover operators: there exists a positive constant ε₀ which does not depend on n, such that given a pair of bitstrings (x, y), the crossover result x′ = Cross(x, y) satisfies

 ε₀ ≤ Pr(f(x′) ≥ min{f(x), f(y)}).  (1)

This condition is fulfilled for the single-point crossover with ε₀ = 1 − P_c, if P_c < 1 is a constant. Sometimes stronger statements can be deduced; e.g., for the well-known OneMax and LeadingOnes fitness functions the offspring has a fitness of at least min{f(x), f(y)} with at least constant probability (see [8]).

Another condition, analogous to (1), requires that the fitness of the best resulting genotype is not less than the fitness of the best parent with probability at least ε₀, for some constant ε₀ > 0, i.e.

 ε₀ ≤ Pr(max{f(x′), f(y′)} ≥ max{f(x), f(y)})  (2)

for any pair of parents. This condition is also fulfilled for the single-point crossover with ε₀ = 1 − P_c, if P_c < 1 is a constant. Besides that, Condition (2) is satisfied with ε₀ = 1 for the optimized crossover operators, where the offspring is computed as a solution to the optimal recombination problem. Polynomial-time optimized crossover routines are known for Maximum Clique [4], Set Packing, Set Partition and some other NPO problems [9, 10].

Bitwise Mutation and K-Bounded Neighborhood Mappings

Let d(x, y) denote the Hamming distance between x and y.

Definition 4

[3] Suppose Π is an NP optimization problem. A neighborhood mapping N_I is called K-bounded, if for any x ∈ Sol(I) and y ∈ N_I(x) holds d(x, y) ≤ K, where K is a constant.

The bitwise mutation operator Mut outputs a string y, given a string x, with probability (χ/n)^{d(x,y)} (1 − χ/n)^{n − d(x,y)}. Note that this probability, as a function of d(x, y) ∈ {0, …, K}, attains its minimum at d(x, y) = K. The following proposition gives a lower bound for the probability Pr(Mut(x) = y), which is valid for any y ∈ N_I(x), assuming that χ = K.

Proposition 5

Suppose the neighborhood mapping N_I is K-bounded, and the mutation rate is χ = K. Then for any x ∈ Sol(I) and any y ∈ N_I(x) holds

 Pr(Mut(x) = y) ≥ K^K / (en)^K.

The proof may be found in the appendix.
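The bound of Proposition 5 is easy to sanity-check numerically. The sketch below (our own, not from the paper) compares the exact probability (χ/n)^d (1 − χ/n)^{n−d} of producing one specific string at Hamming distance d, taken with χ = K as in the proposition, against the lower bound K^K/(en)^K:

```python
import math

def mutation_prob(n, chi, d):
    """Exact probability that bitwise mutation with rate chi/n produces one
    specific string at Hamming distance d: (chi/n)^d * (1 - chi/n)^(n - d)."""
    return (chi / n) ** d * (1 - chi / n) ** (n - d)

def proposition5_bound(n, K):
    """The lower bound K^K / (en)^K of Proposition 5 (mutation rate K/n)."""
    return K ** K / (math.e * n) ** K
```

For instance, with n = 20 and K = 2 the exact probability exceeds the bound for every distance d ≤ K, as the proposition asserts.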

2 Expected First Hitting Time of Target Subset

This section is based on the drift analysis of GAs from [8]. Suppose that for some m there is an ordered partition of {0, 1}^n into subsets A_1, …, A_{m+1} called levels. Level A_{m+1} will be the target level in the subsequent analysis. The target level may be chosen as the set of solutions with maximal fitness, or the set of local optima, or the set of approximation solutions for some approximation factor. A well-known example of a partition is the canonical partition, where each level regroups solutions having the same fitness value (see e.g. [20]). For j ∈ [m + 1] we denote by H_j := A_j ∪ … ∪ A_{m+1} the union of all levels starting from level j.

Given a levels partition, there always exists a total order “≽” on {0, 1}^n which is aligned with the partition in the sense that u ≽ v for any u ∈ A_i, v ∈ A_j with i ≥ j. W.l.o.g. in what follows the elements of a population vector P = (x_1, …, x_λ) will be assumed to form a non-increasing sequence in terms of the “≽” order: x_1 ≽ x_2 ≽ … ≽ x_λ. For any γ ∈ (0, 1], the individual x_{⌈γλ⌉} will be referred to as the γ-ranked individual of the population.

The selective pressure of a selection mechanism is defined as follows. For any γ ∈ (0, 1] and population P of size λ, let β(γ, P) be the probability of selecting an individual from P that belongs to the same or a higher level as the individual with rank ⌈γλ⌉, i.e.

 β(γ, P) := Σ_{i : x_i ∈ H_{j(γ)}} p_sel(i ∣ P),

where j(γ) is such that x_{⌈γλ⌉} ∈ A_{j(γ)}.

Theorem 6

Given a partition A_1, …, A_{m+1} of {0, 1}^n, let T be the runtime of the GA. If there exist parameters s_1, …, s_m, p_0, ε_0, γ_0, δ and a constant c such that for all j ∈ [m] the following conditions hold:


(C1)

,

(C2)

,

(C3)

(C4)

for any

(C5)

with , and

then .

Informally, Condition (C1) requires that for each element of subset A_j, there is a lower limit s_j on the probability of mutating it into level A_{j+1} or higher. Condition (C2) requires that there exists a lower limit p_0 on the probability that the mutation will not “downgrade” an individual to a lower level. Condition (C3) follows from lower bound (2), or from lower bound (1) in the case of the canonical partition. Condition (C4) requires that the selective pressure induced by the selection mechanism is sufficiently strong. Condition (C5) requires that the population size λ is sufficiently large.

Unfortunately, Conditions (C3) and (C4) are unlikely to be satisfied when the target subset A_{m+1} contains some solutions that are less fit than the solutions from lower levels, e.g. when A_{m+1} is the set of all local optima. In order to adapt Theorem 6 to the analysis of such situations, we first prove the following corollary with relaxed versions of Conditions (C3) and (C4) and a slightly strengthened version of (C2).

Corollary 7

Given a partition A_1, …, A_{m+1} of {0, 1}^n, let T be the runtime of the GA. If there exist parameters s_1, …, s_m, p_0, ε_0, γ_0, δ and a constant c such that the following conditions hold:


(C1)

, ,

(C2’)

, ,

(C3’)

,

(C4’)

for any ,

(C5)

with , and

then .

Proof. Given a genetic algorithm GA with a certain initialization procedure for P^0, selection operator Sel, crossover Cross, mutation Mut and population size λ, consider a genetic algorithm GA′ defined as the following modification of GA.

• Let the initialization procedure for population P^0 in GA′ coincide with that of GA.

• The operator of selection Sel′ performs identically to the operator Sel, except for the cases when the input population P contains an element from A_{m+1}. In the latter cases Sel′ returns the index of the first representative of A_{m+1} in P.

• The operator of crossover Cross′ performs identically to Cross, except for the cases when the input contains an element from A_{m+1}. In the latter cases an element of A_{m+1} is just copied to the output of the operator.

• The operator of mutation Mut′ is the same as Mut.

• The population size in GA′ is λ.

Note that GA′ meets Conditions (C1)-(C5) of Theorem 6. Indeed, Condition (C2) follows from (C2’). Condition (C3) is satisfied for the levels below A_{m+1} by (C3’), and for A_{m+1} it holds with ε₀ = 1 by definition of the operator Cross′. Condition (C4) is satisfied by (C4’) whenever the population P contains no elements from A_{m+1}, and in the cases when P contains at least one element from A_{m+1}, it holds by definition of the operator Sel′. Thus, by Theorem 6,

 E[T′] ≤ (2c/ψ) ( mλ(1 + ln(1 + cλ)) + ((1 + δ)/(p_0 γ_0)) Σ_{j=1}^{m} 1/s_j ),

where T′ is the runtime defined for the sequence of populations of GA′.

The executions of GA and GA′ are identical before the iteration on which an element of A_{m+1} first enters the population, and on that iteration both algorithms place elements of A_{m+1} into the population for the first time. Thus, the realizations of the random variables T and T′ coincide and E[T] = E[T′].

3 Lower Bounds on Cumulative Selection Probability

Let us see how to parameterise three standard selection mechanisms in order to ensure that the selective pressure is sufficiently high. We consider three selection operators with the following mechanisms.

By definition, in k-tournament selection, k individuals are sampled uniformly at random with replacement from the population, and a fittest of these individuals is returned. In (μ, λ)-selection, parents are sampled uniformly at random among the μ fittest individuals in the population. The ties in terms of the fitness function are resolved arbitrarily.
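The two mechanisms just described can be sketched in a few lines of Python (our own illustration; individuals and the fitness function are arbitrary):

```python
import random

def tournament_select(pop, fitness, k):
    """k-tournament selection: sample k individuals uniformly at random with
    replacement and return a fittest of them."""
    contestants = [random.choice(pop) for _ in range(k)]
    return max(contestants, key=fitness)

def mu_lambda_select(pop, fitness, mu):
    """(mu, lambda)-selection: pick a parent uniformly at random among the
    mu fittest individuals of the population (ties broken arbitrarily)."""
    elite = sorted(pop, key=fitness, reverse=True)[:mu]
    return random.choice(elite)
```

Note how both mechanisms depend on fitness only through comparisons, which is why rank-based arguments such as Lemma 8 apply to them.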

A function α is a ranking function [14] if α(γ) ≥ 0 for all γ ∈ [0, 1], and ∫₀¹ α(γ) dγ = 1. In ranking selection with ranking function α, the probability of selecting individuals ranked γ or better is ∫₀^γ α(x) dx. We define exponential ranking selection parameterised by η > 0 by α(γ) := ηe^{η(1−γ)}/(e^η − 1).

The following lemma is analogous to Lemma 1 from [8].

Lemma 8

[8] Let the levels A_1, …, A_{m+1} satisfy

 f(x) < f(y)  (3)

for all x ∈ A_j and y ∈ A_{j+1}, j ∈ [m].

Then for any constants δ′ > 0 and ε′ ∈ (0, 1], there exists a constant γ₀ > 0 such that

1. k-tournament selection with k ≥ 4(1 + δ′)/ε′,

2. (μ, λ)-selection with λ/μ ≥ (1 + δ′)/ε′, and

3. exponential ranking selection with η ≥ 4(1 + δ′)/ε′

satisfy (C4’), i.e. β(γ) ≥ √((1 + δ′)/(ε′ γ₀)) · γ for any γ ∈ (0, γ₀].

Note that the assumption of monotonicity of mutation w.r.t. all fitness levels (see [8]) is substituted here by Inequality (3).

Proof. Denote β(γ) := β(γ, P) for the population P under consideration.

1. Consider k-tournament selection. In order to select an individual from the same level as the γ-ranked individual or higher, by Inequality (3) it is sufficient that the randomly sampled tournament contains at least one individual with rank ⌈γλ⌉ or higher. Hence, one obtains

 β(γ) ≥ 1 − (1 − γ)^k.

Note that

 (1 − γ)^k ≤ 1/(1 + γk).

So for k ≥ 4(1 + δ′)/ε′, we have

 β(γ) ≥ 1 − 1/(1 + γk) ≥ 1 − 1/(1 + 4γ(1 + δ′)/ε′) = (4γ(1 + δ′)/ε′)/(1 + 4γ(1 + δ′)/ε′).

If γ₀ := ε′/(4(1 + δ′)), then for all γ′ ≤ γ₀ it holds that 4γ′(1 + δ′)/ε′ ≤ 1, and

 β(γ′) ≥ (4(1 + δ′)/ε′) γ′ / 2 = 2(1 + δ′)γ′/ε′ = √((1 + δ′)/(ε′ γ₀)) · γ′.

2. In (μ, λ)-selection, for all γ we have β(γ) ≥ γλ/μ if ⌈γλ⌉ ≤ μ, and β(γ) = 1 otherwise (by Inequality (3)). It suffices to pick γ₀ := μ/λ, so that ⌈γ′λ⌉ ≤ μ for all γ′ ≤ γ₀. Then

 β(γ′) ≥ λγ′/μ = √(λ²/μ²) · γ′ = √(λ/(μγ₀)) · γ′ ≥ √((1 + δ′)/(ε′ γ₀)) · γ′,

where the last inequality holds for λ/μ ≥ (1 + δ′)/ε′.

3. In exponential ranking selection, we have

 β(γ) ≥ ∫₀^γ ηe^{η(1−x)} dx / (e^η − 1) = (e^η/(e^η − 1)) (1 − e^{−ηγ}) ≥ 1 − 1/(1 + ηγ).

The rest of the proof is similar to the tournament selection case with η in place of k; based on the condition on η, it suffices to pick γ₀ := ε′/(4(1 + δ′)).

4 Expected First Hitting Time of the Set of Local Optima

Suppose an NP maximization problem Π is given and a neighborhood mapping N_I is defined. Given an instance I, let s(I) be a lower bound on the probability that the mutation operator transforms a given feasible solution x into a specific neighbor y ∈ N_I(x), i.e.

 s(I) ≤ min_{x ∈ Sol(I), y ∈ N_I(x)} Pr(Mut(x) = y).  (4)

The greater the value s(I), the more consistent is the mutation with the neighborhood mapping N_I.

In Subsections 4.1 and 4.2, the symbol I is suppressed in the notation for brevity. The population size λ, the selection parameters, the number of levels m and the fitness function f are supposed to depend on the input data I implicitly.

The set of all local optima is denoted by LO (note that the global optima also belong to LO).

4.1 No Infeasible Solutions

In many well-known NP optimization problems, such as the Maximum Satisfiability Problem [13], the Maximum Cut Problem [13] and the Ising Spin Glass Model [5], the set of feasible solutions is the whole search space, i.e. Sol(I) = {0, 1}^n. Let us consider the GAs applied to problems with such a property.

We choose m to be the number of fitness values attained by the solutions from X := {0, 1}^n. Then starting from any point, the local search method finds a local optimum within at most m steps. Let us use a modification of the canonical f-based partition where all local optima are merged together:

 A_j := {x ∈ X | f(x) = f_j} ∖ LO,  j ∈ [m],  (5)
 A_{m+1} := LO,  (6)

where f_1 < f_2 < … < f_m are the fitness values attained on X.

Application of Corollary 7 and Lemma 8 w.r.t. this partition yields the following theorem.

Theorem 9

Suppose that

• for any

• Conditions (C2’) and (C3’) are satisfied for some constants  and ,

• ,

• the GA uses either k-tournament selection, or (μ, λ)-selection, or exponential ranking selection, with the parameters chosen as in Lemma 8 for some constant δ′ > 0.

Then there exist two constants  and , such that for population size , a local optimum is reached for the first time after at most fitness evaluations on average.

A similar result for the GA with tournament selection and two-offspring crossover was obtained in [12, 11] without a drift analysis. In particular, Lemma 1 and Proposition 1 in [11] imply that with appropriate settings of parameters, a non-elitist genetic algorithm reaches a local optimum for the first time within a polynomially bounded number of fitness evaluations on average. The upper bound from Theorem 9 in the present paper has an advantage over the bound from [11] if the neighborhood size is at least linear in n. (Note that the size of many well-known neighborhoods grows as some polynomial of n.)

4.2 Illustrative Examples

Let us consider a family of Royal Road functions R defined on the basis of the principles proposed by M. Mitchell, S. Forrest, and J. Holland in [21]. The function R is defined on the search space {0, 1}^n, where n is a multiple of K, and the set of indices [n] is partitioned into consecutive blocks of K elements each. By definition, R(x) is the number of blocks where all bits of x are equal to 1.
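The Royal Road fitness just defined is a one-liner in Python (our own illustration for 0/1 tuples):

```python
def royal_road(x, K):
    """Royal Road fitness: split x (a 0/1 tuple whose length is a multiple
    of K) into consecutive blocks of K bits and count the blocks in which
    all bits equal 1."""
    assert len(x) % K == 0
    return sum(1 for i in range(0, len(x), K) if all(x[i:i + K]))
```

For instance, with K = 3 the string 111000 has fitness 1, since only its first block is completely filled with ones.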

We consider a crossover operator which returns one of the parents unchanged with probability 1 − P_c. In particular, such an operator may be built up from any standard crossover operator so that with probability P_c the standard operator is applied and the offspring is returned; otherwise, with probability 1 − P_c, one of the two parents is returned with equal probabilities.

The following corollary for the Royal Road functions R results from Theorem 9 with the neighborhood defined by the Hamming distance with radius K.

Corollary 10

Suppose that the uses a crossover operator with being any constant in , the bitwise mutation with mutation rate for a constant , -tournament selection with , or -selection with , or exponential ranking selection with , where is a constant. Then there exists a constant such that the GA with population size , has expected runtime on .

Proof.

Note that the fitness R(x) of any solution x with some non-optimal bits (i.e. bits equal to zero) can be increased by an improving move within the Hamming neighborhood of radius K. So there is just one local optimum and it is the global optimum (1, …, 1). We now apply Theorem 9 with the partition (5)-(6) induced by the canonical partition for R.

The probability of not flipping any bit position by mutation is

 (1 − χ/n)^n = (1 − χ/n)^{(n/χ − 1)χ} (1 − χ/n)^χ ≥ e^{−χ} (1 − χ/n)^χ.

In the rest of the proof we assume that n is sufficiently large to ensure that (1 − χ/n)^χ is bounded below by a positive constant.

A lower bound on the upgrade probability may be found if we consider the worst-case scenario where only one block contains some incorrect bits and the number of such bits is K.

We can put ε₀ := (1 − P_c)/2, because the crossover operator returns one of the parents unchanged with probability 1 − P_c, and with probability at least 1/2 this parent is not less fit than the other one. Then the conditions of Theorem 9 regarding ε₀ and p₀ are satisfied.

It therefore follows by Theorem 9 that there exists a constant such that if the population size is , the expected runtime of on  is upper bounded by for some constant .

The corollary implies that the GA with a properly chosen population size has a polynomial runtime on the Royal Road functions if K is a constant.

Vertex Cover Problems with Regular Structure

In general, given a graph G = (V, E), the Vertex Cover Problem (VCP) asks for a subset C ⊆ V (called a vertex cover), such that every edge e ∈ E has at least one endpoint in C. The size of C should be minimized. Let us consider a representation of the problem solutions, where n = |E|, and each coordinate x_j of x ∈ {0, 1}^n corresponds to an edge e_j ∈ E, assigning one of its endpoints to be included into the cover (one endpoint of e_j is assigned if x_j = 0 and the other one is assigned if x_j = 1). Thus, the cover C(x) contains all vertices assigned by at least one of the coordinates of x, and the feasibility of C(x) is guaranteed. This representation is a special case of the so-called non-binary representation for the more general set covering problem (see e.g. [6]). The fitness function f(x) is by definition a decreasing function of the cover size |C(x)|.
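The edge-based representation can be decoded as follows (our own Python sketch; vertices are arbitrary hashable labels and each edge is an ordered pair of its two endpoints):

```python
def decode_cover(x, edges):
    """Non-binary VCP representation: bit x[j] selects which endpoint of
    edge edges[j] enters the cover, so the decoded set is always a feasible
    vertex cover."""
    return {edges[j][x[j]] for j in range(len(edges))}
```

For example, on a single triangle with edges (0,1), (1,2), (0,2), the genotype (0, 0, 0) decodes to the cover {0, 1}, which touches every edge; feasibility holds by construction for every genotype.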

The family of vertex covering instances under consideration consists of VCP problems where the graph G is a union of disjoint cliques of size 3 (triangles). An optimal solution contains two vertices from each clique, and since any pair of the three vertices may be chosen in each triangle, there are 3^{|V|/3} optimal solutions.

Consider the neighborhood system defined by the Hamming distance with radius 1. All local optima for this family are globally optimal in this case. For the bitwise mutation operator with mutation rate 1/n, by Proposition 5 we have Pr(Mut(x) = y) ≥ 1/(en) for any y ∈ N(x). Analogously to Corollary 10, we obtain

Corollary 11

Suppose that the uses a crossover operator with being any constant in , the bitwise mutation with mutation rate , -tournament selection with , or -selection with , or exponential ranking selection with , where is a constant. Then there exists a constant , such that the GA with population size , has expected runtime  on the VCP family .

It is interesting that these VCP instances in integer linear programming formulation are hard for the Land and Doig branch-and-bound algorithm (for a description of the Land and Doig algorithm see e.g. [25], Chapt. 24). This exact algorithm makes an exponential number of branchings on the problems from this family (see [24]).

4.3 The General Case of NP Optimization Problems

Consider the general case where Sol(I) may be a proper subset of {0, 1}^n. Let us add another modification to the levels partition. Besides merging all local optima, we assume that all infeasible solutions constitute level A_1. The rest of the solutions are stratified by their objective function values. Let m − 1 be the number of fitness values attained by the feasible solutions from Sol(I) ∖ LO_I:

 A_1 := X ∖ Sol(I),  (7)
 A_j := {x ∈ Sol(I) | f(x) = f_j} ∖ LO_I,  j = 2, …, m,  (8)
 A_{m+1} := LO_I.  (9)

Application of Corollary 7 with this partition yields the following lemma.

Lemma 12

Suppose that Condition (C2’) holds and


(L1)

.

(L2)

,  .

(L3)

Inequality (2) holds for some positive constant

and the GA is using either k-tournament selection, or (μ, λ)-selection, or exponential ranking selection, with the parameters chosen as in Lemma 8 for some constant δ′ > 0.

Then there exist two constants , and such that for population size , a local optimum is reached for the first time after at most fitness evaluations on average.

Proof. Assumption (L1) is equivalent to Inequality (4). Condition (L2) imposes a lower bound on the probability of producing a feasible solution by mutation of an infeasible bitstring. Thus, together (L1) and (L2) give the lower bound required in (C1). Condition (L3) implies (C3’).

The operators Mut and Cross are supposed to be efficiently computable, and the selection procedure requires only polynomial time. Therefore the time complexity of computing a pair of offspring in the GA is polynomially bounded, and the following theorem holds.

Theorem 13

If problem Π is polynomially bounded, and Conditions (C2’), (L1), (L2) and (L3) are satisfied for a lower bound s(I) with 1/s(I) ∈ Poly and positive constants ε₀ and p₀, then the GA using tournament selection or (μ, λ)-selection or exponential ranking selection with a suitable choice of parameters first visits a local optimum on average in polynomially bounded time.

Note that Condition (L2) in the formulation of Theorem 13 cannot be dismissed. Indeed, suppose that problem Π is polynomially bounded, and consider a GA where the mutation operator has the following properties. On the one hand, Mut never outputs a feasible offspring, given an infeasible input. On the other hand, given a feasible genotype x, Mut(x) is infeasible with some positive probability. Finally, assume that the initialization procedure produces no local optima in population P^0. Now all the remaining conditions of Theorem 13 can be satisfied, but with a positive probability the whole population P^1 consists of infeasible solutions, and subject to this event all subsequent populations are infeasible as well. Therefore, the expected hitting time of a local optimum is unbounded.

In order to estimate the applicability of Theorem 13, it is sufficient to recall that the set N_I(x) for x ∈ Sol(I) may be enumerated efficiently by definition, so there exists a mutation operator that generates the uniform distribution over N_I(x) if x ∈ Sol(I), and every point in N_I(x) is selected with probability at least 1/|N_I(x)|. To deal with the cases where x ∉ Sol(I), or x ∈ Sol(I) but N_I(x) = ∅, we can recall that there are large classes of NP optimization problems where at least one feasible solution x₀ is computable in polynomial time (see e.g. the classes PLS in [17] and GLO in [3]). For such problems, in case of x ∉ Sol(I) or N_I(x) = ∅, a mutation operator may output the feasible solution x₀ with probability 1.

Alternatively, we can consider a repair heuristic (see e.g. [6]) which follows some standard mutation operator and, if the output of the mutation is infeasible, substitutes this output with some precomputed feasible solution.

5 Analysis of Guaranteed Local Optima Problems

In this section, Theorem 13 is used to estimate the GA capacity of finding the solutions with approximation guarantee.

An algorithm for an NP maximization problem has a guaranteed approximation ratio ρ ≥ 1, if for any instance I it delivers a feasible solution x′, such that

 F_I(x′) ≥ max{F_I(x) | x ∈ Sol(I)} / ρ.

In the case of an NP minimization problem, the guaranteed approximation ratio is defined similarly, except that the latter inequality changes into