In the realm of evolutionary computation, the notion of building blocks of evolution was developed in Holland’s original works [holland:75, holland:00] to describe the effect of crossover. In that respect, building blocks are composed of genes with more or less linkage between them. This notion corresponds one-to-one to that of schemata and eventually led to the schema theories, which describe the evolution of these building blocks.
In the biology literature, though, the notion of building blocks has quite a different connotation. As a paradigm I choose the empirical findings of halder-callaerts-gehring:95: The experimenters forced the mutation of a single gene, called the “eyeless gene”, in early ontogenesis of a Drosophila melanogaster fly. This rather subtle genotypic variation results in a severe phenotypic variation: an additional, functionally complete eye module grows at a place where it was not supposed to. Here, the notion of a building block refers to the eye as a functional module that can be grown phenotypically by triggering a single gene. In other words, a single (and thus non-correlated) mutation of a gene leads to a phenotypic variation that is highly complex and, in terms of physiological cell variables, highly correlated. Such properties of the genotype-phenotype mapping are considered the basis of complex adaptation [wagner-altenberg:96]. A theory on the evolution of complex phenotypic variability exists [toussaint:03], and in this paper we show that the induced notion of building blocks is completely different from the one induced by crossover.
Besides the discussion of crossover in GAs and that of functional modularity in natural evolution, there is a third field of research that relates to the discussion of building blocks: Estimation-of-Distribution Algorithms (EDAs, pelikan-goldberg-lobo:99). These algorithms are a direct implementation of the idea of correlated exploration in the framework of heuristic search algorithms. They explicitly encode the search distribution (i.e., offspring probability distribution) by means of a product of marginals (PBIL, baluja:94), factorized distributions (FDA, muehlenbein-mahnig-rodriguez:99), dependency trees (baluja-davies:97), or, most generally, a Bayesian network (BOA, pelikan-goldberg-cantupaz:00). In my view, the key to these algorithms is that they are capable of inducing the same notion of building blocks as the one we introduced in the context of natural evolution. For instance, consider a dependency tree where the leaves encode the phenotypic variables. Offspring are generated by sampling this probabilistic model, i.e., by first sampling the root variable of the tree, then, according to the dependencies encoded on the links, sampling the root’s successor nodes, etc. Now, if we assume that the dependencies are very strong, say, deterministic, it follows that a single variation at the root leads to a completely correlated variation of all leaves. Hence, we may define a set of leaves that, due to their dependencies, always vary in high correlation as a functional phenotypic module in the same sense as for the eyeless paradigm.
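The dependency-tree argument can be illustrated with a minimal sketch. The tree layout, the node names, the binary variables, and the `flip_prob` parameter are all assumptions made for illustration; they are not taken from any of the cited algorithms. Each child copies its parent's value and flips it with probability `flip_prob`, so `flip_prob = 0.0` corresponds to the deterministic dependencies discussed above.

```python
import random

# Hypothetical dependency tree: node -> list of children.
TREE = {"root": ["a", "b"], "a": ["leaf1", "leaf2"], "b": ["leaf3"]}

def sample_tree(tree, root, flip_prob, rng):
    """Sample binary values top-down. Each child copies its parent's value
    and flips it with probability flip_prob (0.0 = deterministic link)."""
    values = {root: rng.randint(0, 1)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            flip = rng.random() < flip_prob
            values[child] = values[node] ^ int(flip)
            stack.append(child)
    return values

def leaves(tree):
    """Nodes that appear as children but have no children themselves."""
    all_children = {c for cs in tree.values() for c in cs}
    return sorted(n for n in all_children if n not in tree)

rng = random.Random(0)
samples = [sample_tree(TREE, "root", 0.0, rng) for _ in range(200)]
leaf_names = leaves(TREE)

# With deterministic links, every leaf equals the root in every sample,
# so a single variation at the root moves all leaves together.
fully_correlated = all(s[leaf] == s["root"] for s in samples for leaf in leaf_names)
```

Under these assumptions the three leaves always change together, which is the sense in which they would form one functional phenotypic module. Raising `flip_prob` weakens the dependencies and with them the correlation between leaves.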
Several discussions in the EC community though contradict this point of view: Some argue that the essence of EDAs is that they can model the evolution of crossover building blocks (schemata) by explicitly encoding the linkage correlations that are implicit in the offspring distribution of crossover GAs [shapiro:03, Introduction]. In that sense, EDAs are “only” faster versions of crossover GAs; faster because EDAs actively analyze correlations in the selection distribution whereas crossover masks would have to self-adapt (see section LABEL:crossconclu). In this paper we want to point out that, certainly, crossover induces a correlation in the search distribution that can be modeled by graphical models, but the concept of graphical models is far more general than that of linkage correlations. Hence, EDAs and non-trivial gene interaction models (non-trivial genotype-phenotype mappings, toussaint:03) can introduce correlational structures in the search distribution that go qualitatively beyond simple crossover GAs.
Most important of all: EDAs and gene interaction models can account for correlated innovation. Here, innovation means that some phenotypic variable changes its value and some other phenotypic variables change their values in strong dependence on this change, such that the constellation of this set of variables is genuinely new, i.e., has not been present in the parent population. In contrast, crossover can only preserve certain linkage correlations (determined by the crossover mask) that have been present in the parent population; it never explores new correlated constellations in the sense of correlated innovation.
The main goal of this paper is to formalize and prove the claims made above. After we define crossover in the next section, sections 3 and LABEL:crossinfo will present some theorems on the ‘structure’ of the search distribution after mutation and crossover. By structure we mean the correlational structure, which we measure by means of mutual information. Many arguments are based on the increase and decrease of mutual information in relation to the increase or decrease of entropy in the search distribution. Section LABEL:correxplo finally defines the notion of correlated exploration and thereby pinpoints the difference between linkage correlations and correlations in EDAs or gene interaction models. Figure LABEL:corr already explains the key idea.
2 The Simple GA
We represent a population as a distribution over genotype space . In this paper we assume that a genotype is composed of a fixed number of genes, , where the space of alleles of the th gene is arbitrary. We represent also finite populations as a distribution , namely, if the population is given as a multiset we (isomorphically) represent it as the finite distribution given by where is the delta distribution at , i.e., . Crossover and mutation are represented as operators that map a parental (finite or infinite) population to an offspring distribution. Given some operator we will use the notation to denote the difference of a quantity under transition, e.g., the quantity may be the entropy of a distribution.
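As a minimal sketch of the finite-population representation, the following maps a multiset of genotypes to the distribution that assigns each distinct genotype its relative frequency. The function name `finite_distribution` and the example genotypes are hypothetical, and exact fractions are used only to make the normalization visible.

```python
from collections import Counter
from fractions import Fraction

def finite_distribution(population):
    """Map a multiset of genotypes (hashable tuples) to the finite
    distribution with mass count/size on each distinct genotype."""
    size = len(population)
    counts = Counter(population)
    return {x: Fraction(c, size) for x, c in counts.items()}

# Hypothetical population of four genotypes with three genes each.
population = [(0, 1, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
p = finite_distribution(population)
```

For a known population size the multiset can be recovered from the distribution by multiplying each probability by the size, which is one way to read the isomorphism mentioned above.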
In that framework we may write the evolution equation of a crossover GA as
with crossover , mutation , offspring sampling , fitness , and parent sampling . A sampling operator draws independent samples from a distribution and maps this multiset of samples to the respective finite distribution; note that . Fitness rescales a distribution proportional to some functional . We define mutation and crossover more precisely as follows:
Definition 2.1 (Mutation).
We define mutation as an operator defined by the conditional probability of mutating from to :
A typical mutation operator fulfills the constraints of symmetry and component-wise independence:
In the following we will refer to the simple mutation operator for which all component-wise mutation operators are such that the probability of mutating from to is constant for :
where and denotes the mutation rate parameter.
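A sampling-based sketch of such a component-wise mutation operator is given below. The parametrization is an assumption: each gene mutates independently with probability `rate`, and a mutating gene moves to one of the other alleles of its alphabet with equal probability, so the probability of reaching any particular different allele is constant. The names `mutate`, `alphabets`, and `rate` are hypothetical, and every alphabet is assumed to contain at least two alleles.

```python
import random

def mutate(genotype, alphabets, rate, rng):
    """Mutate each gene independently: with probability `rate`, replace the
    allele by a different one chosen uniformly from that gene's alphabet."""
    child = []
    for allele, alphabet in zip(genotype, alphabets):
        if rng.random() < rate:
            others = [a for a in alphabet if a != allele]
            child.append(rng.choice(others))
        else:
            child.append(allele)
    return tuple(child)

rng = random.Random(0)
alphabets = [(0, 1), ("a", "b", "c"), (0, 1)]  # hypothetical allele spaces
parent = (0, "a", 1)
unchanged = mutate(parent, alphabets, 0.0, rng)    # rate 0: no gene mutates
all_changed = mutate(parent, alphabets, 1.0, rng)  # rate 1: every gene mutates
```

Because the genes are treated one at a time with independent random draws, the sketch satisfies component-wise independence by construction. Symmetry holds here because the probability of moving between two distinct alleles of a gene is the same in both directions.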
Definition 2.2 (Crossover).
We define crossover as an operator parameterized by a mask distribution , where is the number of loci (or genes) of a genome in :
where the th allele of the -crossover-product is the th allele of the parent , i.e., . We only consider symmetric crossover, where .
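The mask-based definition can be sketched as follows. The convention that a mask entry of 0 selects the first parent and 1 selects the second, and the choice of a uniform mask distribution, are assumptions made for illustration. The uniform mask is one example of a symmetric mask distribution, since a mask and its complement are equally likely.

```python
import random

def crossover_product(x, y, mask):
    """Allele i comes from x where mask[i] == 0 and from y where mask[i] == 1."""
    return tuple(y[i] if m else x[i] for i, m in enumerate(mask))

def uniform_mask(n, rng):
    """Each locus picks its parent with probability 1/2, so a mask and its
    complement are equally likely."""
    return tuple(rng.randint(0, 1) for _ in range(n))

def crossover(population, n_offspring, rng):
    """Draw two parents and one mask per offspring; return the products."""
    n = len(population[0])
    return [
        crossover_product(rng.choice(population), rng.choice(population),
                          uniform_mask(n, rng))
        for _ in range(n_offspring)
    ]

rng = random.Random(0)
parents = [(0, 0, 1, 1), (1, 1, 1, 0), (0, 1, 1, 0)]  # hypothetical population
offspring = crossover(parents, 100, rng)

# Alleles observed at each locus among parents and among offspring.
parent_alleles = [{x[i] for x in parents} for i in range(4)]
offspring_alleles = [{x[i] for x in offspring} for i in range(4)]
```

In this sketch every offspring allele at a locus was already present at that locus in some parent. At the third locus all parents carry the allele 1, and so do all offspring: crossover recombines existing alleles but does not introduce new ones.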
In the case of bit strings, , it holds , where denotes the xor and the and . It follows that [vose:99, Theorem 4.4]
Concerning EDAs, we write their dynamics as
where, instead of a parent population, some other parameters (e.g. a Bayesian graph or dependency tree) determine the offspring distribution , which is sampled, evaluated, and, instead of a simple parent sampling, mapped back on new parameters by some update operator . The operator is called heuristic rule and, in the case of Estimation-of-Distribution Algorithms, is such that the new search distribution approximates the experienced fitness distribution . The generic implementation of this idea is
where is the space of feasible parameters and denotes the Kullback-Leibler distance (see toussaint:03b for a discussion of generic heuristic search and evolution). In fact, the BOA algorithm [pelikan-goldberg-cantupaz:00], which uses Bayesian networks to parameterize the search distribution, realizes exactly this scheme. Other algorithms [baluja-davies:97, muehlenbein-mahnig-rodriguez:99, baluja:94]
differ in some details, e.g., they use distance measures other than the Kullback-Leibler divergence or realize a gradual adaptation of continuous parameters of the style “”. See [toussaint:03] for a survey on the relation between EDAs and the evolution of genetic representations (-evolution) in the context of non-trivial genotype-phenotype mappings.
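The EDA cycle of sampling, evaluation, and parameter update can be sketched for the simplest model class, a product of marginals over bit strings. This is an illustrative sketch and not an implementation of PBIL, FDA, or BOA. The toy fitness `onemax`, the learning rate `rate`, and all function names are assumptions. The update uses the known fact that, among product distributions q, the Kullback-Leibler divergence D(s‖q) from a given distribution s is minimized by the product of the marginals of s; whether this direction of the divergence matches the scheme above is also an assumption. The parameters are moved only gradually toward that minimizer.

```python
import random

def sample(probs, rng):
    """Draw one bit string from a product of Bernoulli marginals."""
    return tuple(int(rng.random() < q) for q in probs)

def onemax(x):
    """Hypothetical toy fitness: number of ones."""
    return sum(x)

def eda_step(probs, n_samples, rate, rng):
    """Sample offspring, rescale by fitness, and move the marginals a
    fraction `rate` toward those of the fitness-rescaled sample."""
    offspring = [sample(probs, rng) for _ in range(n_samples)]
    weights = [onemax(x) for x in offspring]
    total = sum(weights)
    if total == 0:  # all-zero sample: nothing to learn from
        return probs
    # Marginals of the fitness-proportionally rescaled sample distribution.
    target = [
        sum(w * x[i] for w, x in zip(weights, offspring)) / total
        for i in range(len(probs))
    ]
    return [(1 - rate) * q + rate * t for q, t in zip(probs, target)]

rng = random.Random(1)
probs = [0.5] * 8
for _ in range(40):
    probs = eda_step(probs, 100, 0.3, rng)
mean_prob = sum(probs) / len(probs)
```

A product of marginals cannot represent any dependency between genes, so this sketch shows only the update cycle. Richer model classes such as dependency trees or Bayesian networks are what allow correlations to enter the search distribution.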
3 The structure of the mutation distribution
This section derives a theorem that simply states that mutation increases entropy and decreases mutual information. It is surprising how non-trivial it is to prove this intuitively trivial statement.
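The statement can be checked numerically on a small example. The sketch below assumes bit strings, an independent bit-flip mutation with rate `rate` per gene, and mutual information defined as the sum of the marginal entropies minus the joint entropy; these choices and all function names are assumptions for illustration and are not the formal setting of the theorem.

```python
import itertools
import math

def entropy(p):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginal(p, i):
    """Marginal distribution of gene i."""
    m = {}
    for x, v in p.items():
        m[x[i]] = m.get(x[i], 0.0) + v
    return m

def mutual_information(p, n):
    """Sum of marginal entropies minus joint entropy (assumed definition)."""
    return sum(entropy(marginal(p, i)) for i in range(n)) - entropy(p)

def mutate_distribution(p, n, rate):
    """Apply independent bit-flip mutation with probability `rate` per gene."""
    q = {}
    for x_new in itertools.product((0, 1), repeat=n):
        total = 0.0
        for x_old, v in p.items():
            flips = sum(a != b for a, b in zip(x_new, x_old))
            total += v * rate ** flips * (1 - rate) ** (n - flips)
        q[x_new] = total
    return q

# Two perfectly correlated bits: only 00 and 11 occur.
p = {(0, 0): 0.5, (1, 1): 0.5}
q = mutate_distribution(p, 2, 0.1)

entropy_before, entropy_after = entropy(p), entropy(q)
mi_before, mi_after = mutual_information(p, 2), mutual_information(q, 2)
```

In this example the mutated distribution spreads mass onto 01 and 10, so the joint entropy rises while the marginals stay uniform, and the mutual information drops accordingly. A single example is of course no substitute for the proof.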
Lemma 3.1 (Component-wise mutation).