On Proportions of Fit Individuals in Population of Evolutionary Algorithm with Tournament Selection

07/29/2015 · by Anton Eremeev, et al.

In this paper, we consider a fitness-level model of a non-elitist mutation-only evolutionary algorithm (EA) with tournament selection. The model provides upper and lower bounds for the expected proportion of the individuals with fitness above given thresholds. In the case of so-called monotone mutation, the obtained bounds imply that increasing the tournament size improves the EA performance. As corollaries, we obtain an exponentially vanishing tail bound for the Randomized Local Search on unimodal functions and polynomial upper bounds on the runtime of EAs on the 2-SAT problem and on a family of Set Cover problems proposed by E. Balas.


1 Introduction

Evolutionary algorithms are randomized heuristic algorithms employing a population of tentative solutions (individuals) and simulating an evolutionary type of search for optimal or near-optimal solutions by means of selection, crossover and mutation operators. Evolutionary algorithms with a crossover operator are usually called genetic algorithms (GAs). Evolutionary algorithms in general have a more flexible outline and include genetic programming, evolution strategies, estimation of distribution algorithms and other evolution-inspired paradigms. Evolutionary algorithms are now frequently used in operations research, engineering and artificial intelligence.

Two major outlines of an evolutionary algorithm are the elitist evolutionary algorithm, that keeps a certain number of most promising individuals from the previous iteration, and the non-elitist evolutionary algorithm, that computes all individuals of a new population independently using the same randomized procedure. In this paper, we focus on the non-elitist case.

One of the first theoretical results in the analysis of non-elitist GAs is the Schema Theorem (Goldberg, 1989), which gives a lower bound on the expected number of individuals from certain subsets of the search space (schemata) in the next generation, given the current population. Significant progress in understanding the dynamics of GAs with a non-elitist outline was made in (Vose, 1995) by means of dynamical systems. However, most of the findings in (Vose, 1995) apply to the infinite-population case, and it is not clear how these results can be used to estimate the applicability of GAs to practical optimization problems. A theoretical possibility of constructing GAs that provably optimize an objective function with high probability in polynomial time was shown in (Vitányi, 2000) using rapidly mixing Markov chains. However, (Vitányi, 2000) provides only a very simple artificial example where this approach is applicable, and further developments in this direction are not known to us.

One of the standard approaches to studying evolutionary algorithms in general is based on fitness levels (Wegener, 2002). In this approach, the solution space is partitioned into disjoint subsets, called fitness levels, according to the values of the fitness function. In (Lehre, 2011), the fitness-level approach was first applied to upper-bound the runtime of non-elitist mutation-only evolutionary algorithms. Here and below, by the runtime we mean the expected number of fitness evaluations made until an optimum is found for the first time. Upper bounds on the runtime of non-elitist GAs involving crossover operators were obtained later in (Corus et al., 2014; Eremeev, 2016). The runtime bounds presented in (Corus et al., 2014; Lehre, 2011) are based on drift analysis. In (Moraglio and Sudholt, 2015), a runtime result is proposed for a class of convex search algorithms, including some non-elitist crossover-based GAs without mutation, on the so-called concave fitness landscapes.

In this paper, we consider a non-elitist evolutionary algorithm which uses tournament selection and a mutation operator but no crossover. The s-tournament selection randomly chooses s individuals from the existing population and selects the best one of them (see e.g. (Thierens and Goldberg, 1994)). The mutation operator is viewed as a randomized procedure which computes one offspring with a probability distribution depending on the given parent individual. In this paper, evolutionary algorithms with such an outline are denoted simply as EA. We study the probability distribution of the EA population w.r.t. a set of fitness levels. The estimates of the EA behavior are based on a priori known parameters of the mutation operator. Using the proposed model, we obtain upper and lower bounds on the expected proportion of individuals with fitness above certain thresholds. The lower bounds are formulated in terms of linear algebra and resemble the bound in the Schema Theorem (Goldberg, 1989). Instead of schemata, here we consider sets of genotypes with fitness bounded from below. Besides that, the bounds obtained in this paper may be applied recursively up to any given iteration.

Particular attention in this paper is paid to the special case when mutation is monotone. Informally speaking, a mutation operator is monotone if throughout the search space the following condition holds: the greater the fitness of a parent, the “better” offspring distribution the mutation generates. One of the most well-known examples of monotone mutation is the bitwise mutation in the case of the OneMax fitness function. As shown in (Borisovsky and Eremeev, 2008), in the case of monotone mutation, one of the simplest evolutionary algorithms, known as the (1+1) EA, has the best-possible performance in terms of runtime and probability of finding the optimum.

In the case of monotone mutation, the lower bounds on the expected proportions of the individuals turn into equalities for the trivial evolutionary algorithm, the (1,1) EA. This implies that the tournament selection at least has no negative effect on the EA performance in such a case. This observation is complemented by an asymptotic analysis of the EA with monotone mutation indicating that, given a sufficiently large population size and some technical conditions, increasing the tournament size always improves the EA performance.

As corollaries of the general lower bounds on the expected proportions of sufficiently fit individuals, we obtain polynomial upper bounds on the Randomized Local Search runtime on unimodal functions and upper bounds on the runtime of EAs on the 2-SAT problem and on a family of Set Cover problems proposed by Balas (1984). Unlike the upper bounds on the runtime of evolutionary algorithms with tournament selection from (Corus et al., 2014; Eremeev, 2016; Lehre, 2011), which require a sufficiently large tournament size, the upper bounds obtained here hold for any tournament size.

The rest of the paper is organized as follows. In Section 2, we give a formal description of the considered EA, introduce an approximating model of the EA population and define some required parameters of the probability distribution of a mutation operator in terms of fitness levels. In Section 3, using the model from Section 2, we obtain lower and upper bounds on the expected proportions of genotypes with fitness above some given thresholds. Section 4 is devoted to the analysis of an important special case of a monotone mutation operator, where the bounds obtained in the previous section become tight or asymptotically tight. In Section 5, we consider some illustrative examples of monotone mutation operators and demonstrate some applications of the general results from Section 3. In particular, in this section we obtain new lower bounds on the probability of generating optimal genotypes at any given iteration t for a class of unimodal functions, for the 2-SAT problem and for a family of set cover problems proposed by E. Balas (in the latter two cases we also obtain upper bounds on the runtime of the EA). Besides that, in Section 5 we give an upper bound on the expected proportion of optimal genotypes for the OneMax fitness function. Section 6 contains concluding remarks.

This work extends the conference paper (Eremeev, 2000). The extension consists in a comparison of the EA behavior to that of the (1,λ) EA, the (1,1) EA and the (1+1) EA in Section 3 and in the new runtime bounds and tail bounds demonstrated in Section 5. The main results from the conference paper are refined and provided with more detailed proofs.

2 Description of Algorithms and Approximating Model

2.1 Notation and Algorithms

Let the optimization problem consist in maximization of an objective function f on the set of feasible solutions Sol ⊆ {0,1}^n, where {0,1}^n is the search space of all binary strings of length n.

The Evolutionary Algorithm EA.

The EA searches for the optimal or sub-optimal solutions using a population of individuals, where each individual (genotype) x is a bitstring (x_1, …, x_n), and its components x_i are called genes.

In each iteration the EA constructs a new population on the basis of the previous one. The search process is guided by the values of a fitness function Φ(x) = f(x) − p(x), where p(x) is a penalty function, equal to zero on feasible solutions.

The individuals of the population may be ordered according to the sequence in which they are generated; thus the population may be considered as a vector of genotypes X^t = (x^{1,t}, …, x^{λ,t}), where λ is the size of the population, which is constant during the run of the EA, and t is the number of the current iteration. In this paper, we consider a non-elitist algorithmic outline, where all individuals of a new population are generated independently of each other with an identical probability distribution depending on the existing population only.

Each individual is generated through selection of a parent genotype by means of a selection operator, and modification of this genotype by a mutation operator. During mutation, a subset of genes in the genotype string is randomly altered. In general, the mutation operator may be viewed as a random variable Mut(x) with a probability distribution depending on the parent genotype x.

The genotypes of the initial population X^0 are generated with some a priori chosen probability distribution. The stopping criterion may be, e.g., an upper bound t_max on the number of iterations. The result is the best solution generated during the run. The EA has the following scheme.

1. Generate the initial population X^0.
2. For t := 1 to t_max do
       2.1. For j := 1 to λ do
                Choose a parent genotype x from X^{t−1} by s-tournament selection.
                Add Mut(x) to the population X^t.
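For concreteness, the scheme above can be sketched in Python. This is an illustrative sketch, not the authors' implementation; the OneMax fitness, the bitwise mutation and all parameter values are placeholder assumptions.

```python
import random

def tournament_select(pop, fitness, s):
    """Pick s individuals uniformly at random (with replacement) and return the best."""
    return max(random.choices(pop, k=s), key=fitness)

def ea(fitness, mutate, n, lam, s, t_max, rng=random):
    """Non-elitist EA: every individual of X^t is produced independently
    by s-tournament selection from X^{t-1} followed by mutation."""
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(lam)]
    best = max(pop, key=fitness)
    for _ in range(t_max):
        pop = [mutate(tournament_select(pop, fitness, s)) for _ in range(lam)]
        best = max(best, max(pop, key=fitness), key=fitness)
    return best

# Placeholder example: OneMax with bitwise mutation (flip each gene w.p. 1/n).
def onemax(x):
    return sum(x)

def bitwise_mutation(x, rng=random):
    n = len(x)
    return tuple(1 - g if rng.random() < 1.0 / n else g for g in x)

random.seed(1)
best = ea(onemax, bitwise_mutation, n=20, lam=30, s=4, t_max=100)
print(onemax(best))
```

Note that the sketch only tracks the best solution seen so far; the population itself is non-elitist and may lose its fittest member between iterations.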

In theoretical studies, evolutionary algorithms are usually treated without a stopping criterion (see e.g. (Neumann and Witt, 2010)). Unless otherwise stated, in the EA we will also assume that t_max = ∞.

Note that in the special case of the EA with λ = 1 we can assume that s = 1, since the tournament selection has no effect in this case.

(1,λ) EA and (1+1) EA.

In the following sections we will also need a description of two simple evolutionary algorithms, known as the (1,λ) EA and the (1+1) EA.

Both algorithms maintain a single current individual on each iteration t. The initial genotypes are generated with some a priori chosen probability distribution. The only difference between the (1,λ) EA and the (1+1) EA consists in the method of constructing the individual for iteration t + 1 using the current individual of iteration t as a parent. In both algorithms the new individual is built with the help of a mutation operator, which we will denote by Mut′. In the case of the (1,λ) EA, the mutation operator is independently applied λ times to the parent genotype, and out of the λ offspring a single genotype with the highest fitness value is chosen as the new current individual. (If there are several offspring with the highest fitness, the new individual is chosen arbitrarily among them.) In the (1+1) EA, the mutation operator is applied to the parent once; if the offspring is at least as fit as the parent, it becomes the new current individual, otherwise the parent individual is kept.
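The two update rules can be sketched as follows (an illustrative sketch; the OneMax fitness and the single-bit-flip mutation are placeholder assumptions, not the paper's operators):

```python
import random

def step_one_comma_lambda(x, fitness, mutate, lam):
    """(1,lambda) EA step: mutate the parent lam times independently and
    keep the fittest offspring; the parent itself is always discarded."""
    offspring = [mutate(x) for _ in range(lam)]
    return max(offspring, key=fitness)

def step_one_plus_one(x, fitness, mutate):
    """(1+1) EA step: keep the offspring only if it is at least as fit."""
    y = mutate(x)
    return y if fitness(y) >= fitness(x) else x

# Toy example: OneMax with a single random bit flip as mutation.
def onemax(x):
    return sum(x)

def flip_one(x, rng=random):
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

random.seed(0)
x = (0,) * 10
for _ in range(200):
    x = step_one_plus_one(x, onemax, flip_one)
print(onemax(x))  # the (1+1) EA never loses fitness along the way
```

The (1+1) step is elitist for a single individual, while the (1,λ) step may return an offspring worse than the parent if all λ mutations are harmful.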

2.2 The Proposed Model

The EA may be considered as a Markov chain in a number of ways. For example, the states of the chain may correspond to the different vectors of genotypes that constitute the population (see (Rudolph, 1994)); in this case the number of states in the Markov chain is 2^{nλ}. Another model, representing the GA as a Markov chain, was proposed in (Nix and Vose, 1992), where all populations which differ only in the ordering of individuals are considered to be equivalent. Each state of this Markov chain may be represented by a vector of 2^n components, where the proportion of each genotype in the population is indicated by the corresponding coordinate, and the total number of states is C(λ + 2^n − 1, 2^n − 1). In the framework of this model, M. Vose and collaborators have obtained a number of general results concerning the emergent behavior of GAs by linking these algorithms to the infinite-population GAs (Vose, 1995).

The major difficulties in the application of the above-mentioned models to the analysis of GAs for combinatorial optimization problems are connected with the necessity to use fine-grained information about the fitness value of each genotype. In the present paper, we consider one of the ways to avoid these difficulties, by means of grouping the genotypes into larger classes on the basis of their fitness.

Assume that m + 1 level lines of the fitness function are fixed such that Φ_0 < Φ_1 < … < Φ_m, where Φ_0 is the minimal fitness value. The number of levels and the fitness values corresponding to them may be chosen arbitrarily, but they should be relevant to the given problem and the mutation operator in order to yield a meaningful model. Let us introduce the sequence of Lebesgue subsets of the search space H_i = {x : Φ(x) ≥ Φ_i}, i = 0, …, m. Obviously, H_0 ⊇ H_1 ⊇ … ⊇ H_m. For the sake of convenience, we define H_{m+1} = ∅. Also, we denote by D_i = H_i \ H_{i+1}, i = 0, …, m, the level sets which give a partition of the search space. The partitioning subsets D_i are more frequently used in the literature on level-based analysis, compared to the Lebesgue subsets H_i. In this paper we will frequently state that a genotype has a sufficiently high fitness, therefore the use of the subsets H_i will be more convenient in such cases. One of the partitions used in the literature, called the canonical partition, defines {Φ_0, …, Φ_m} as the set of all fitness values on the search space.

Now suppose that for all i and j the a priori lower bounds α_{ij} and upper bounds β_{ij} on the mutation transition probabilities from the subset D_j to H_i are known, i.e.

α_{ij} ≤ P(Mut(x) ∈ H_i) ≤ β_{ij} for all x ∈ D_j.

Fig. 1 illustrates the transitions considered in this expression.

Figure 1: Transitions from D_j to H_i under mutation.

Let A denote the matrix with the elements α_{ij}, where i = 1, …, m and j = 0, …, m. The similar matrix of upper bounds is denoted by B. Let the population on iteration t be represented by the population vector x(t) = (x_1(t), …, x_m(t)), where x_i(t) is the proportion of genotypes from H_i in population X^t. The population vector is a random vector, where x_i(t) ≥ x_{i+1}(t) for i = 1, …, m − 1, since H_i ⊇ H_{i+1}.

Let p_i(t) denote the probability that an individual, which is added after selection and mutation into X^t, has a genotype from H_i, for i = 1, …, m. According to the scheme of the EA, this probability is identical for all individuals of X^t.

Proposition 1

E[x_i(t)] = p_i(t) for all i = 1, …, m and t ≥ 0.

Proof. Consider the sequence of identically distributed random variables ζ_1, …, ζ_λ, where ζ_k = 1 if the k-th individual in the population X^t belongs to H_i, and ζ_k = 0 otherwise. By the definition, x_i(t) = (ζ_1 + … + ζ_λ)/λ; consequently, E[x_i(t)] = p_i(t).

Level-Based Mutation.

If for some mutation operator there exist two equal matrices of lower and upper bounds A and B, i.e. α_{ij} = β_{ij} for all i, j, then the mutation operator will be called level-based. By this definition, in the case of level-based mutation, the probability P(Mut(x) ∈ H_i) does not depend on the choice of a genotype x within a level set D_j, and the probabilities γ_{ij} = P(Mut(x) ∈ H_i), x ∈ D_j, are well-defined. In what follows, we call γ_{ij} a cumulative transition probability. The symbol Γ will denote the matrix of cumulative transition probabilities of a level-based mutation operator.

If the EA uses a level-based mutation operator, then the probability distribution of population X^t is completely determined by the population vector of X^{t−1}. In this case the EA may be viewed as a Markov chain with states corresponding to the elements of the set Z of all possible vectors of a population of size λ. Here and below, the symbol z is used to denote a vector from the set Z of all possible population vectors.

The cardinality of the set Z may be evaluated analogously to the number of states in the model of Nix and Vose (1992). Now m + 1 levels replace the individual elements of the search space, which gives a total of C(λ + m, m) possible population vectors.
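This count is easy to verify by direct enumeration. The sketch below uses illustrative values of λ and m and identifies each population with a multiset of λ individuals distributed over m + 1 level sets:

```python
from itertools import combinations_with_replacement
from math import comb

lam, m = 5, 3  # population size and number of level lines (illustrative values)

# Each population is a multiset of lam individuals over the m + 1 levels 0..m;
# combinations_with_replacement enumerates exactly these multisets.
states = set(combinations_with_replacement(range(m + 1), lam))

print(len(states), comb(lam + m, m))  # the two counts agree
```

The enumeration gives 56 states for λ = 5 and m = 3, matching C(5 + 3, 3) = 56.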

3 Bounds on Expected Proportions of Fit Individuals

In this section, our aim is to obtain lower and upper bounds on the expected proportions of genotypes in the Lebesgue subsets for arbitrary t and i, if the distribution of the initial population is known.

Let P_sel(i, z) denote the probability that the genotype chosen by the tournament selection from a population with vector z belongs to the subset H_i. Note that if the current population is represented by the vector z, then a genotype obtained by selection and mutation belongs to H_i with a conditional probability

(1)

3.1 Lower Bounds

Expression (1) and the definitions of the lower bounds yield, for all i = 1, …, m:

(2)

which turns into an equality in the case of level-based mutation with A equal to the matrix of cumulative transition probabilities.

Given a tournament size s, the probability that the tournament winner belongs to H_i, when selection is applied to a population with vector z, is P_sel(i, z) = 1 − (1 − z_i)^s, and, consequently, P_sel(i, z) ≥ z_i. This leads to the inequality:

By the total probability formula,

(3)
(4)

where the last expression is obtained by regrouping the summation terms. Proposition 1 implies that . Consequently, since and , expression (4) gives a lower bound

(5)

Note that (5) turns into an equality in the case of level-based mutation and s = 1. We would like to use (5) recursively in order to estimate the expected population vector for any t, given the initial vector. It will be shown in the sequel that such a recursion is possible under the monotonicity assumptions defined below.

Monotone Matrices and Mutation Operators.

In what follows, any matrix M with elements m_{ij}, i = 1, …, m, j = 0, …, m, will be called monotone iff m_{ij} ≤ m_{i,j+1} for all i from 1 to m and all j from 0 to m − 1. Monotonicity of a matrix of bounds on transition probabilities means that the greater the fitness level j of a parent solution, the greater is its bound on the transition probability to any subset H_i. Note that for any mutation operator, monotone upper and lower bounds exist. Formally, for any mutation operator a valid monotone matrix of lower bounds is the zero matrix, and a valid monotone matrix of upper bounds is the matrix with all elements equal to 1. These are extreme and impractical examples. In reality a problem may rather be connected with the absence of bounds which are sharp enough to evaluate the mutation operator properly.

If, given some set of levels, there exist two matrices of lower and upper bounds A and B such that A = B and these matrices are monotone, then the operator Mut is called monotone w.r.t. the set of levels. In this paper, we will also call such operators monotone for short. Note that, by the definition, any monotone mutation operator is level-based, since α_{ij} = β_{ij} for all i, j. The following proposition shows how the monotonicity property may be equivalently defined in terms of cumulative transition probabilities.

Proposition 2

A mutation operator Mut is monotone w.r.t. the set of levels iff for any i and any j ≥ k, for any genotypes x ∈ D_j and y ∈ D_k holds P(Mut(x) ∈ H_i) ≥ P(Mut(y) ∈ H_i).

Proof. Indeed, suppose that A = B and these matrices are monotone. Then for any genotypes x ∈ D_j and y ∈ D_k with j ≥ k holds P(Mut(x) ∈ H_i) = γ_{ij} ≥ γ_{ik} = P(Mut(y) ∈ H_i).

Conversely, if for any level i and any genotypes x ∈ D_j, y ∈ D_k with j ≥ k holds P(Mut(x) ∈ H_i) ≥ P(Mut(y) ∈ H_i), then taking j = k we note that P(Mut(x) ∈ H_i) is equal for all x ∈ D_j, and one can assign α_{ij} = β_{ij} = P(Mut(x) ∈ H_i), x ∈ D_j. The resulting matrices A and B are obviously monotone.

Proposition 2 implies that in the case of the canonical partition, i.e. when {Φ_0, …, Φ_m} is the set of all values of Φ, the operator Mut is monotone w.r.t. the levels iff for any genotypes x and y such that Φ(x) ≥ Φ(y), for any i holds P(Mut(x) ∈ H_i) ≥ P(Mut(y) ∈ H_i).

The monotonicity of a mutation operator w.r.t. a canonical partition is equivalent to the definition of the monotone reproduction operator from (Borisovsky and Eremeev, 2001) in the case of single-parent, single-offspring reproduction. According to the terminology of Daley (1968), such random operators are also called stochastically monotone.

As a simple example of a monotone mutation operator we can consider a point mutation operator: with probability 1 − p keep the given genotype unchanged; otherwise (with probability p) choose a position uniformly at random from {1, …, n} and change the gene in this position. As a fitness function we take Φ(x) = x_1 + … + x_n, i.e. the OneMax function. Let us assume m = n and define the thresholds Φ_i = i, i = 0, …, n. All genotypes with the same fitness value have equal probability to produce an offspring with any required fitness value, therefore this is a case of level-based mutation. In such a case identical matrices of lower and upper bounds A and B exist and they both equal the matrix of cumulative transition probabilities Γ. The latter consists of the following elements: γ_{ij} = 1 for all i ≤ j − 1, since point mutation can not reduce the fitness by more than one level; γ_{ij} = p(n − j)/n for i = j + 1, because with probability p(n − j)/n a genotype of level j is upgraded; γ_{jj} = p(n − j)/n + (1 − p), because a genotype in H_j can be obtained as an offspring of a genotype from D_j in two ways: either the parent genotype has been upgraded (which happens with probability p(n − j)/n) or it stays at level j, which happens with probability 1 − p; finally, γ_{ij} = 0 for i > j + 1, because point mutation can not increase the level number by more than 1. The elements of matrix Γ obviously satisfy the monotonicity condition γ_{ij} ≤ γ_{i,j+1} when i ≠ j + 1. For the case of i = j + 1 we have γ_{j+1,j+1} − γ_{j+1,j} = 1 − p(j + 1)/n − p(n − j)/n = 1 − p(n + 1)/n, which is nonnegative if p ≤ n/(n + 1). Therefore with any p ≤ n/(n + 1) the matrix Γ is monotone in this example and the mutation operator is monotone as well.
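The cumulative transition matrix of this point-mutation example can be tabulated and its monotonicity checked numerically. Below is a sketch under the conventions assumed above (gamma[i][j] is the probability that an offspring of a level-j parent reaches fitness level i or higher; the values of n and p are illustrative):

```python
def point_mutation_gamma(n, p):
    """Cumulative transition probabilities for point mutation on OneMax:
    level j = number of ones; mutation moves the level to j-1 w.p. p*j/n,
    to j+1 w.p. p*(n-j)/n, and leaves it at j otherwise."""
    gamma = [[0.0] * (n + 1) for _ in range(n + 1)]  # rows i = 0..n, cols j = 0..n
    for j in range(n + 1):
        for i in range(n + 1):
            if i <= j - 1:
                gamma[i][j] = 1.0              # fitness drops by at most one level
            elif i == j:
                gamma[i][j] = 1.0 - p * j / n  # = (1 - p) + p*(n - j)/n
            elif i == j + 1:
                gamma[i][j] = p * (n - j) / n  # probability of an upgrade
            # i > j + 1: stays 0, the level rises by at most one
    return gamma

def is_monotone(gamma):
    """Check gamma[i][j] <= gamma[i][j+1] in every row i."""
    m = len(gamma)
    return all(gamma[i][j] <= gamma[i][j + 1]
               for i in range(m) for j in range(m - 1))

n = 6
assert is_monotone(point_mutation_gamma(n, p=0.8))      # p below n/(n+1): monotone
assert not is_monotone(point_mutation_gamma(n, p=0.99))  # p above n/(n+1): fails
```

The two assertions illustrate the threshold p ≤ n/(n + 1) derived above: for n = 6 monotonicity holds at p = 0.8 but is violated at p = 0.99.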

Proposition 3

If A is monotone, then for any tournament size s and any t ≥ 0 holds

(6)

besides that, (6) is an equality if s = 1, the operator Mut is monotone and A is its matrix of cumulative transition probabilities.

Proof. Monotonicity of matrix A implies that for all so the simple estimate may be applied to all terms of the sum in (5) and we get

Regrouping the terms in the last bound we obtain the required inequality (6).

Finally, note that lower bound (5) holds as an equality if the mutation operator is monotone and s = 1; therefore the last lower bound is an equality in the case of a monotone mutation operator and s = 1.

Lower Bounds from Linear Algebra.

Let A* be a matrix with elements defined via the lower bounds α_{ij}, and let E be the identity matrix of the same size. With these notations, inequality (6) takes a short form. Here and below, the inequality sign “≥” for vectors means the component-wise comparison, i.e. u ≥ v iff u_i ≥ v_i for all i. The following theorem gives a component-wise lower bound on the expected population vector for any t.

Theorem 1

Suppose that ∥·∥ is some matrix norm. If matrix A is monotone and ∥A*∥ < 1, then for all t ≥ 0 holds

(7)

and inequality (7) turns into an equation if the tournament size s = 1, the mutation operator used in the EA is monotone and A is its matrix of cumulative transition probabilities.

The proof of this theorem is similar to the well-known inductive proof of the formula for the sum of a geometric series. Note that the recursion behind (7) is similar to the recursive formula for the partial sums of such a series. However, in our case matrices and vectors replace numbers, we have to deal with inequalities rather than equalities, and the initial element may be non-zero.

Proof of Theorem 1. Let us consider a sequence of m-dimensional vectors defined by the recursion from the right-hand side of (6), starting from the expected initial population vector. We will show by induction on t that this sequence gives a component-wise lower bound on the expected population vector. Indeed, for t = 0 the inequality holds by the definition of the sequence. Now note that the right-hand side of (6) will not increase if the components of the expected population vector are substituted with their lower bounds. Therefore, assuming the bound already holds for some t and substituting the bounding vector into the right-hand side of (6), we make the inductive step from t to t + 1.

By the properties of linear operators (see e.g. (Kolmogorov and Fomin, 1999), Chapter III, § 29), due to the assumption that ∥A*∥ < 1, we conclude that the matrix (E − A*)^{−1} exists.

Now, using induction on t, for any t ≥ 0 we will obtain the identity

which leads to inequality (7). Indeed, for the base case of t = 0 we have the required equality by definition. For the inductive step, we use the following relationship

In the conditions of Theorem 1, the right-hand side of (7) approaches its limit as t tends to infinity; thus the limit of this bound does not depend on the distribution of the initial population.

In many evolutionary algorithms, an arbitrary given genotype may be produced with a non-zero probability as a result of mutation of any given genotype. Suppose that the probability of such a mutation is lower bounded by some positive constant for all pairs of genotypes. Then one can obviously choose a monotone matrix A of lower bounds with all elements no less than this constant, so that all elements of A are positive. In this case one can consider a suitable matrix norm; due to the monotonicity of A, the norm condition of Theorem 1 is satisfied. A trivial example of a matrix that satisfies the above description would be a matrix A where all elements are equal to the same positive constant.
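The existence of the inverse under the norm condition is the standard Neumann-series argument: if ∥A*∥ < 1 then (E − A*)^{−1} = Σ_{t≥0} (A*)^t. A quick numerical sanity check of this identity with an arbitrary small matrix (illustrative values, not derived from any particular EA):

```python
import numpy as np

# An arbitrary small non-negative matrix with spectral norm below 1 (illustrative).
M = np.array([[0.5, 0.2],
              [0.1, 0.4]])
assert np.linalg.norm(M, ord=2) < 1

# Partial sums of the Neumann series I + M + M^2 + ... approach (I - M)^{-1}.
partial = np.zeros_like(M)
power = np.eye(2)
for _ in range(200):
    partial = partial + power
    power = power @ M

inverse = np.linalg.inv(np.eye(2) - M)
assert np.allclose(partial, inverse)
print(inverse)
```

Since ∥M∥ < 1, the powers of M vanish geometrically, so 200 terms are far more than enough for convergence to machine precision.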

Application of Theorem 1 may be complicated due to difficulties in finding the initial vector and in estimating the effect of multiplication by the matrix A*. Some known results from linear algebra can help to solve these tasks, as the example in Subsection 5.2 shows. However, sometimes it is possible to obtain a lower bound for the expected population vector via analysis of the (1,1) EA, choosing an appropriate mutation operator for it. This approach is discussed below.

Lower Bounds from Associated Markov Chain.

Suppose that the partition defined by the level sets contains no empty subsets, and let T denote the matrix with components defined by the lower bounds α_{ij}. Note that T is a stochastic matrix, so it may be viewed as the transition matrix of a Markov chain associated to the set of lower bounds A. This chain is a model of the (1,1) EA, which is a special case of the (1,λ) EA with λ = 1 (see Subsection 2.1). Suppose that the (1,1) EA uses an artificial monotone mutation operator Mut′ whose cumulative transition probabilities are defined by the bounds α_{ij} corresponding to the EA mutation operator Mut: given a parent genotype from level j, for any i the offspring belongs to H_i with probability α_{ij}. The operator Mut′ may be simulated, e.g., by the following two-stage procedure. At the first stage, a random index of the offspring level is chosen with the probability distribution defined by the bounds corresponding to the level j of the parent. At the second stage, the offspring genotype is drawn uniformly at random from the chosen level set. (Simulation of the second stage may be computationally expensive for some fitness functions, but the complexity issues are not considered now.) The initial search point of the (1,1) EA is generated at random with the probability distribution defined by the level probabilities of an individual of the initial EA population. By the properties of Markov chains, the level distribution of the (1,1) EA on iteration t is obtained from the initial distribution by t-fold multiplication with T. The following theorem is based on a comparison of the expected population vector of the EA to the distribution of this Markov chain.

Theorem 2

Suppose all level subsets are non-empty and the matrix A is monotone. Then for any t ≥ 0 holds

(8)

where the matrix in (8) is triangular, with components equal to 1 when the row index does not exceed the column index and 0 otherwise. Besides that, inequality (8) turns into an equation if s = 1, the EA mutation operator is monotone and A is its matrix of cumulative transition probabilities.

Proof. The (1,1) EA described above is identical to an EA′ with λ = 1, s = 1 and mutation operator Mut′. Let us denote the population vector of the EA′ on iteration t by x′(t). Obviously,

(9)

Proposition 3 implies that in the original EA with population size λ and tournament size s, the expectation E[x_i(t)] is lower bounded by the corresponding expectation for the EA′, since (6) holds as an equality for the whole sequence of population vectors of the EA′ and the right-hand side of (6) is non-decreasing in the components of the expected population vector. This equality together with (9) implies the required bound (8).

Note that inequalities (7) and (8) in Theorems 1 and 2 turn into equalities if these theorems are applied to the EA with s = 1 and the monotone mutation operator Mut′ defined above. Therefore both theorems guarantee equal lower bounds on the expected population vector, given equal matrices A.

Subsections 5.3 and 5.4 provide two examples illustrating how Theorem 2 may be used to import known results on the behavior of Markov chains. The example from Subsection 5.4 employs Theorem 2 for finding a vector to which Theorem 1 may then be applied, bounding the expected proportions from below.

3.2 Upper Bounds

In this subsection, we obtain upper bounds on the expected proportions using reasoning similar to the proof of Proposition 3. Expression (1), for all i = 1, …, m, yields:

(10)

which turns into an equality in the case of level-based mutation. By the total probability formula we have:

(11)

so

(12)

Under the expectation in the right-hand side we have a convex function of the population vector. Therefore, in the case of a monotone matrix B, using Jensen's inequality (see e.g. (Rudin, 1987), Chapter 3) we obtain the following proposition.

Proposition 4

If B is monotone then

(13)

By means of iterative application of inequality (13), the components of the expected population vectors may be bounded up to an arbitrary iteration t, starting from the initial vector. The nonlinearity in the right-hand side of (13), however, creates an obstacle to obtaining an analytical result similar to the bounds of Theorems 1 and 2.

Note that all of the estimates obtained up to this point are independent of the population size and valid for an arbitrary λ. In Section 4 we will see that the right-hand side of (13) reflects the asymptotic behavior of the population under a monotone mutation operator as λ → ∞.
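Such a recursion is easy to iterate numerically. The sketch below is illustrative, not the paper's exact formula: it assumes that s-tournament selection from an infinite population with cumulative level proportions z selects a level-j parent with probability (1 − z_{j+1})^s − (1 − z_j)^s, and then applies a level-based mutation with a cumulative matrix gamma; the point-mutation gamma and all parameter values are placeholders.

```python
def iterate_infinite_population(gamma, z0, s, t_max):
    """Iterate the deterministic infinite-population map: the new z_i is the
    probability that one offspring lands in H_i, where the parent's level is
    drawn by s-tournament selection from the current proportions z."""
    m = len(z0) - 1  # levels 0..m; z[i] = proportion with fitness level >= i
    z = list(z0)
    for _ in range(t_max):
        zx = z + [0.0]  # z_{m+1} = 0 by convention
        # Probability that the tournament winner is exactly on level j.
        win = [(1 - zx[j + 1]) ** s - (1 - zx[j]) ** s for j in range(m + 1)]
        z = [sum(gamma[i][j] * win[j] for j in range(m + 1)) for i in range(m + 1)]
    return z

# Point mutation on OneMax as an illustrative level-based operator.
def point_mutation_gamma(n, p):
    g = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        for i in range(n + 1):
            if i <= j - 1:
                g[i][j] = 1.0
            elif i == j:
                g[i][j] = 1.0 - p * j / n
            elif i == j + 1:
                g[i][j] = p * (n - j) / n
    return g

n, s, p = 8, 2, 0.5
z0 = [1.0] + [0.0] * n  # initial population consists of all-zeros genotypes
z = iterate_infinite_population(point_mutation_gamma(n, p), z0, s, t_max=300)
print(z[n])  # limiting proportion of optimal genotypes under this map
```

By construction the iterated vector stays non-increasing in the level index, with z[0] = 1, mirroring the properties of population vectors discussed above.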

3.3 Comparison of the EA to the (1,λ) EA and the (1+1) EA

This subsection shows how the probability of generating the optimal genotypes at a given iteration of the EA relates to the analogous probabilities of the (1,λ) EA and the (1+1) EA. The analysis here is based on upper bound (13) and on some previously known results provided in the appendix.

Suppose the matrix B gives the upper bounds for the cumulative transition probabilities of the mutation operator Mut used in the EA. Consider the (1,λ) EA and the (1+1) EA based on a monotone mutation operator Mut′ for which B is the matrix of cumulative transition probabilities, and suppose that the initial solutions of the (1,λ) EA and of the (1+1) EA have the same distribution over the fitness levels as the best incumbent solution in the initial EA population X^0. In what follows, for any i, we consider the probability that the current individual on iteration t of the (1,λ) EA belongs to H_i, and the analogous probability for the (1+1) EA.

The following proposition is based on upper bound (13) and the results from (Borisovsky, 2001; Borisovsky and Eremeev, 2001) that allow us to compare the performance of the EA, the (1,λ) EA and the (1+1) EA.

Proposition 5

Suppose that the matrix B is monotone. Then for any t ≥ 0 holds

Proof. Let us compare the EA to the (1,λ) EA and to the (1+1) EA using the mutation and initialization procedures described above. Theorem 6 (see the appendix) together with Proposition 1 implies the first comparison for all t. Furthermore, Theorem 5 from (Borisovsky and Eremeev, 2001) (see the appendix) implies the second one for all t. Using Proposition 4 and the monotonicity of B, we conclude that both claimed inequalities hold.

4 EA with Monotone Mutation Operator

First of all, note that in the case of a monotone mutation operator, two equal monotone matrices of lower and upper bounds exist, so the bounds (5) and (12) give equal results, and assuming A = B = Γ we get

(14)

This equality will be used several times in what follows.

In general, the population vectors are random values whose distributions depend on the population size λ. To express this in the notation, we will indicate the population size explicitly when denoting the proportion of genotypes from H_i in population X^t.

The following Lemma 1 and Theorem 3, based on this lemma, indicate that in the case of monotone mutation, recursive application of the formula from the right-hand side of upper bound (13) allows one to compute the expected population vector of the infinite-population EA at any iteration t.

Lemma 1

Let the EA use a monotone mutation operator with the matrix of cumulative transition probabilities Γ, and let the genotypes of the initial population be identically distributed. Then

(i) for all i and t ≥ 0 holds

(15)

(ii) if the sequence of -dimensional vectors is defined as