We introduce a probability model of evolution that has three purposes: as an abstract model of biological evolution; as a class of efficiently implementable genetic algorithms that are easy to analyse theoretically; and as a link between genetic algorithms and Bayesian non-parametric MCMC methods. The stationary distribution of the model can be expressed in closed form for arbitrary fitness functions: this enables us to investigate the behaviour of the model for different population sizes, mutation rates, and fitness scalings. We find two phase transitions that occur for all fitness functions. Our approach is most applicable to evolution by sexual rather than asexual reproduction.
In any model of evolution under constant conditions, there is a temporal sequence of possibly overlapping populations. In the transition from each population to the next, one or more new individuals are ‘born’, and the same number of individuals are removed from the population, or ‘die’. Each new generation is a population that depends only upon the previous parent population, so that the sequence of populations is a Markov chain. In a model with mutation, the Markov chain is irreducible and has a unique stationary distribution that is also known as the mutation-selection equilibrium. Typically we wish to know what the mutation-selection equilibrium is. Unfortunately the mutation-selection equilibrium is notoriously hard to characterize, even for apparently simple and elegant models of breeding, mutation, and selection. The reason is that in previous models of evolution with sexual reproduction, the Markov chain of populations is irreversible, so that there is no obvious method of finding the stationary distribution other than attempting to compute eigenvectors of the transition matrix directly, as done in, but these calculations are neither easy nor revealing for arbitrary fitness functions.
For a reversible Markov chain, the stationary distribution may be found by verifying detailed balance conditions. In our model, introduced in , we start by writing the mutation-selection equilibrium in a convenient closed form, and then exhibit MCMC kernels that implement reversible Markov chains with this stationary distribution, and for which the proposal and acceptance algorithms are plausible abstract models of breeding, mutation, and selection.
Each generation starts with a population of individuals. One new individual is ‘bred’ – that is, sampled conditionally on the existing population – to produce an expanded population of individuals; from this expanded population, one individual is selected to be discarded, leaving a new population of size to start the next generation. This might appear similar to the Moran Process  but in fact it is very different, as explained in section 3.6
In previous evolutionary models such as , and in genetic algorithms such as , the reproductive fitness of a genome is modelled as the genome’s rate of breeding, and not as its probability of death. In the breeding phase of each generation, fit genomes are chosen to breed more frequently than unfit genomes: in these models, discarding individuals – or ‘death’ – is modelled as random deletion from the population.
Here we model breeding as conditional sampling in which all existing members of the population are treated equally, regardless of their fitness. We discard individuals according to their fitness: less-fit individuals are more likely to be discarded, so that they remain in the population for a shorter time, and they are therefore sampled less as a result of their shorter lifetime, and so contribute less to the new individuals that are ‘bred’. Thus differences in reproductive fitness are modelled as differences in longevity. This is a significant design choice in our models because breeding and selection are modelled separately, and this turns out to greatly simplify the analysis.
As explained in detail in sections 2 and 3, careful choices in modelling breeding as conditional sampling, and in modelling fitness by stochastic rejection of the less fit, enable us to construct a reversible Markov chain of populations that is a form of Metropolis-Hastings process.
Throughout the paper, we suppose there is a finite set of possible genomes, and denotes an exchangeable -valued stochastic process. For any population of genomes , we define
By definition of exchangeability, we have that for any permutation ,
Given a population of genomes , we breed an ’th genome by sampling conditionally:
The ‘fitness’ of a genome is denoted by , where is an arbitrary strictly positive function over . In the evolutionary models we propose, in each generation one individual is ‘bred’ by conditional sampling from the existing population, and then an individual is discarded in a fitness-biased way, so that less-fit individuals are more likely to be discarded. The stationary distribution of populations factorises into the form:
The process is similar to, but not the same as non-parametric Bayesian MCMC, and we make this connection explicit in section 3.5.
A genetic algorithm has three basic parameters: population size, mutation rate, and fitness scaling. The most important results of this paper characterise the interacting effect of these parameters as population size .
We particularly consider exchangeable processes that are (collections of) Polya urn processes, also known as Dirichlet-categorical processes because these admit a notion of ‘mutation’. Conditional sampling from these processes is directly interpretable as a simplified model of sexual breeding with mutation, as explained in section 2. We denote the parameters defining the Dirichlet prior(s) as . Since fitness , we often write in terms of a function such that . We study the interaction of , , and on the stationary distribution by letting population size while the fitness of each genome scales with as , and scales with as ; in sections 5 and 6 we use the de Finetti representation of to derive limiting forms of the marginal distributions over of
We establish that with this rescaling, for any fitness function over , and for any Dirichlet-categorical process , there are three different non-trivial population distributions in the limit as :
- Constant mutation-rate limit
: , so that scales as and is unscaled;
- Low mutation, weak selection limit
: , so that scales as and as ;
- Critical low mutation limit
: , so that is unscaled, and scales as .
The phase transitions at and are sharp. We characterise the limiting distributions in the case that there is a unique maximal element of the posterior distribution. As far as we are aware, these are the first results that give exact explicit expressions for stationary distributions of genetic algorithms with arbitrary fitness functions.
The paper is organised as follows. In section 2 we give examples of exchangeable conditional sampling procedures that can be regarded as abstract models of breeding, and section 3 gives examples of selection procedures that, together with any of the breeding procedures, will produce a reversible Markov chain of populations with the stationary distribution of equation 1.2. Section 3.4 establishes that the stationary distributions are invariant to multiplicative fitness noise; section 3.53.6 explains that the Moran process is quite different. Next, section 4 establishes basic conditions on the convergence of the stationary distribution to a limit distribution as population size tends to infinity. Sections 5 and 6 characterise the limiting forms of the stationary distribution for large populations. These are the main results of this paper. We present computational experiments demonstrating our results in section 7. Finally we discuss implications of our results for genetic algorithms and evolutionary modelling in 8.
2 Breeding and mutation
We model breeding as Gibbs sampling  from an exchangeable distribution; exchangeable Gibbs sampling is a standard technique in statistical nonparametric MCMC methods, for example [14, 10], but in that context it is not of course regarded as a model of breeding. The property of exchangeability of will be used in two ways: first, in section 3 we will use it to establish detailed balance for several selection procedures, which establishes that the stationary distribution of populations is indeed as given in equation 1.2. Second, in section 4 we use the de Finetti integral representation of to establish limit properties of the stationary distribution as .
We now give examples of conditional sampling that can be regarded as plausible models of breeding.
In the simplest case, each ‘genome’ consists of only one ‘gene’ which can be one of possible alleles, that we denote by . The exchangeable process is the well known Polya urn model for a Dirichlet process with discrete base distribution. We recall the definition of this process. Let , be the prior parameters of the base distribution; we write . Let be a random process over . Let . Given , we denote the number of -s in the sequence by :
It follows that
and is infinitely exchangeable by inspection. By de Finetti’s theorem
is the set of all probability vectors (simplex) and the prior measureis Dirichlet distribution, i.e. . This process is in a sense the central object of our study.
Concentration parameter and mutation rate.
The concentration parameter may be viewed as determining a mutation rate that depends on . With balls in the urn, when a new ball is sampled, there is a probability that the ball will be sampled from the collection of ‘prior’ balls, rather than the actual balls in the urn. This probability is independent of the colors of actual balls present: since a new colour may be introduced in this way, we regard this as analogous to a mutation. The mutation rate is
In our evolutionary processes, one new genome is sampled, and one genome is then discarded at each generation, so that remains constant. To define processes with the same mutation rate and different values of , must be adjusted to depend on .
Note that in this model, mutations only occur at birth, and – importantly – the distribution of mutations does not depend on the frequencies of different colours that are currently in the population; mutations are distributed according to a fixed prior distribution determined by the frequencies of ‘magic’ balls in the urn. This is strictly less general than an alternative model in which breeding occurs as follows: a ball is sampled from the urn, and then a mutated ball is conditionally sampled according to some mutation distribution , and then and the new mutated ball are both returned to the urn. But this alternative model would not necessarily be reversible, and we do not consider it further.
In our model, we count as a mutation any ball which results from a draw of a ‘magic’ ball: this definition is consistent with an extension of our model to Dirichlet processes with continuous base distributions, which we intend to consider in future work.
2.1 Complex genomes: direct product of Dirichlet processes
More complex evolutionary models such as genetic algorithms require more complex genomes. Suppose that each genome is a vector of genes, . Let be a direct product of independent exchangeable processes . Then
which clearly is exchangeable as well. Exchangeable sampling from , where each is a vector of discrete values, can be viewed as a model of sexual reproduction with the assumption of linkage equilibrium. A new vector is sampled by, for , sampling the -th component independently from the rest of the components. In words, each new element of the vector is either a copy of the corresponding element of a randomly chosen member of the existing population, or else a mutation. Instead of a new ‘child’ genome being constructed by random recombination of two parent genomes, it is instead a random recombination of all existing genomes in the population, with mutations.
This method of constructing new genomes by ‘-way recombination’ is a widely used approach in genetic algorithms, as used by [1, 2, 3] and others, and sexual reproduction with full linkage equilibrium is a standard simplified model of sexual reproduction in population genetics theory [4, 6].
The extension to Cartesian products of Dirichlet processes might appear rather simple because each component of a new genome is sampled independently of the others; however, this extension can lead to models of great complexity because the fitness function , or equivalently , can be an arbitrary function on , so that the stationary distribution need not be a product distribution. In genetic language, the fitness function can have arbitrary epistasis.
Note that there are other discrete exchangeable distributions based on Dirichlet distributions that could also be used as . A notable example is the discrete fragmentation-coagulation sequence process introduced in 
; this was intended as statistical model for imputing phasing in genetic analysis, but it could also be used as a breeding distribution for our purposes.
Several MCMC sampling methods give the factorized stationary distribution given by equation (1.2
) and at the same time are models of sexual reproduction that are as plausible as those used in evolutionary computation or simplified models in population genetics.
We suppose that each element has a strictly positive weight . In context, we will denote the weights as .
3.1 Single tournament selection
We suppose that when a new genome is ‘born’ and added to the population, it competes to survive by having a tournament with another randomly selected member of the population, say. The probability that wins the tournament and ejects from the population is . This probability is always well defined since is strictly positive. An equally valid tournament winning probability is that wins with probability . These two tournament winning probabilities are simply different formulations of the Metropolis-Hastings acceptance rule. The proof below establishes detailed balance for the first winning rule. The algorithm for performing one generation of breeding, mutation, and selection is:
Sample randomly from
With probability replace with and discard , otherwise discard .
Let , be populations defined as
Recall the measure defined in (1.2). We now show that this measure satisfies detailed balance. By exchangeability of , we have:
Note that has tournament against with probability and wins with probability , so:
3.2 Inverse fitness selection: limit of many tournaments
Suppose that many tournaments are fought, and each time the loser of the previous tournament fights another randomly chosen genome from the population. After many tournaments, and at a stopping time, the current loser is ejected. The current loser evolves according to a irreducible aperiodic Markov chain, and the limiting distribution of ejection is the stationary distribution of that chain:
The algorithm for performing one generation of breeding and selection is then:
Sample from a discrete p.d. over with probabilities proportional to
This process too satisfies detailed balance. With and defined as above, note that:
Note in passing that
That is, the expected weight of the rejected genome is the harmonic mean of the weights of the genomes in the current population, including the newly added genome.
The case reduces to Metropolis-Hastings:
with a population of size 1, we have proposal distribution and acceptance probability of given of , which gives a stationary distribution
3.3 Exchangeable breeding of many offspring
Another MCMC process which is also interpretable as an evolutionary algorithm breeds some arbitrary number of offspring to give a population of size , and then from these selects genomes to form the next generation. The algorithm for a single generation is as follows:
Pick a random number of offspring to breed, and a random number of tournaments to conduct. Both and should be independent of the current population
Breed by sequential exchangeable sampling; that is, let for
Assign ‘survival tickets’ to each of ; the newly bred genomes have as yet no survival tickets.
Uniformly sample from the population a genome which currently has a survival ticket, and a genome which currently does not have a survival ticket.
Hold a tournament between and ; wins with probability .
The winner of the tournament gets the survival ticket; after the tournament, the winner has the ticket and the loser does not.
After tournaments have taken place, the genomes currently holding survival tickets are selected to be the new population ; the genomes that are not holding tickets are discarded.
3.4 Invariance to fitness noise
In real life, survival depends not just on fitness but also on luck. Suppose that when each new ‘genome’ is ‘born’, it combines its own intrinsic fitness with an independent random amount of luck. It keeps this amount of luck, unchanged, throughout its life. Does individual luck at birth alter the stationary distribution?
To formalise the question, let
be discrete random variables that represent ‘luck’. We consider theas discrete random variables because the formal derivations become far simpler. The random variable can be interpreted as the individual luck that multiplies the fitness of -th individual. The log fitness function of -th individual is .
The joint probability
We see that the sum factorizes, and so , where
Thus the joint measure factorizes
is a probability measure. Thus, after summing out, we end with :
It follows that the stationary distribution is unchanged by multiplicative noise that is independent of the genomes, for all population sizes. Also, considering as the prior, we see that posterior measure is that is independent of .
3.5 Fitness as likelihood: a connection with non-parametric Bayesian MCMC
In Dirichlet Process mixture models, as described for example in [14, 16], items of data are given, together with a likelihood function , where is a discrete latent variable. The prior distribution over latent variables is given by the exchangeable process , each item of data is associated with its corresponding latent variable, and the aim of MCMC fitting is to sample latent variables from the distribution
where is an appropriate normalizing constant. If we write
then this distribution (3.4) is produced by the MCMC algorithm of section 3.1 with the slight modification that the tournament between the new ‘genome’ and the randomly selected is a victory for with probability .
One might construct an evolutionary “Just-So” story as follows. On a rock in the ocean, there are niches, in each of which one member of the species can live; the fitness of living in niche is . Evolution occurs when a new individual, bred by ‘n-way recombination’ of all parents, then challenges the occupant of a randomly chosen niche by fighting a tournament, after which the victor survives and takes over the niche. This gives a (fanciful) evolutionary interpretation to Bayesian latent variable models with exchangeable priors.
3.6 Differences from previous models
The well known Moran Process was introduced by , but since then the term has broadened to include many other related genetic models. Our model has significant differences from the Moran process and its subsequent variants, which we now describe.
First, our model implicitly includes mutation; the Moran model did not and the mutation had to be incorporated independently. To show the difference more clearly, consider the Polya urn scheme with an “prior” of initial balls of color . Unlike in the Moran model, in our model the urn is not the same as the population. A ball is randomly chosen and returned to the urn with another ball of the same color. This procedure is repeated times, and after that there are balls in the urn, but elements in the population. Now, according to our breeding rule, a random ball is chosen again and returned with another ball of the same color. Observe that the prior balls are involved in the breeding process as much as the rest of the balls. Whenever a prior ball is chosen during the breeding process, we call it a mutation. For example, if all population balls are currently white, and a black prior ball is chosen, then a black ball enters the population. Or it might be that a color (type) that has never observed before (inside the balls taken out) suddenly appears. Or a mutation may produce a colour that is already common in the population. The main difference from the Moran model is that the prior balls do not take part in the selection step. Thus, before tournament(s) start(s), balls ( balls from -th color) are removed from the urn, and so they will never be discarded. There are now balls in the urn and in the population, one of them will be ejected according to our selection rule. Now the breeding starts, but before that the previously removed prior balls are put back to the urn. In this way, the Markov chain does not have absorbing states, and no fixation occurs.
Second, in the Moran model and in variants we know of, differences in fitness are differences in the probability of being selected to breed, whereas in our model fitness is related to the probability of being selected for discard. This decision avoids complicating the model of breeding by including arbitrary fitness, which simplifies the analysis.
Third, our model applies to any exchangeable distributions over finite sets of possible genomes. The most important instantiation we give here is the product of Dirichlet processes with finite support, but other complex finitely supported exchangeable distributions are possible. This means that our model can be applied to non-trivial genetic algorithms. The Moran model, in contrast, concerns the fixation probabilities of individual alleles.
Finally, the Moran model required one individual to be born and one to die in each generation, whereas our model also to applies to exchangeable breeding of arbitrary numbers of offspring, as described in section 3.3 above.
Our model is a population generalisation of Metropolis Hastings, and is more closely related to nonparametric Bayes inference with distributions based on Dirichlet processes, than it is to the Moran model.
, and others, but in these models the genetic operators are not biologically relevant and although they are described as ‘evolutionary’, they have no relevance to modelling biological evolution; rather, they are proposal heuristics for Metropolis Hastings. These methods are normally applied to continuous domains, but if they were applied to a discrete vector space, the stationary distribution in our notation would be:
which differs from our stationary distribution in equation 1.2 in that the breeding term is absent or equivalently that
is the uniform distribution over. In our examples, and in genetic models, is very far from the uniform distribution.
3.7 Designing a reversible evolutionary model
Our larger aim is to develop a model of evolution that is sufficiently realistic to capture some of the computational power of natural evolution, but which is also simple and tractable for analysis. To ensure that our model satisfies detailed balance and has a stationary distribution that factorises into a breeding and a selection term, we have made the following simplifying assumptions in addition to the usual simplifications of population genetics or genetic algorithms:
- Overlapping populations
: We believe that overlapping populations are necessary for reversibility. If full replacement of the population is enforced at each generation, there can be do guarantee that the population at time could be easily bred from the population at time . In MCMC, state changes typically occur through proposing changes that may or may not be accepted; in an overlapping generations model, if a proposed change is not accepted, we continue with the same population as before.
- -way recombination
: in the product of Dirichlet processes breeding system, each new genome is bred from parents rather than from two parents selected from the population. We conjecture that this is necessary for exact reversibility because, with long genomes and many mutations, if a child is bred from two parents, then the child will be more similar to each of its parents than to other individuals in the population, so that ‘triples’ of two parents and one child will be identifiable even in the stationary distribution. This breaks reversibility since the direction of time can be determined observing evolution in the stationary distribution.
- Mutation as sampling
: We consider mutation as sampling from a base distribution of possible alleles. This model of mutation is not as general as those found in biology, where mutation probabilities are not symmetric or reversible.
- Fitness as lifetime
: All members of the population ‘breed’ at the same rate, and differences in fitness affect only the expected lifetime of an individual. This clearly differs from many types of natural selection, but it is also well known that many organisms continue to produce offspring throughout their lives, so that their total reproductive success depends on their lifetime, as well as on other factors.
It is beyond the scope of this article to argue further whether our model successfully abstracts some essential computational aspects of evolution with sexual reproduction: we present it merely as an abstraction of sexual evolution which is significantly more tractable to analyse than other apparently simple models.
4 The measure
The measure is our main object of interest. In this section we show that the marginal distributions over the set of genotypes converge as the population size ; in the next section we characterize these limits.
Since is exchangeable, by de Finetti’s theorem there exists a prior measure on the set of all probability measures on (simplex) such that for every
so that we can write as follows
where is the normalizing constant. In order to analyze the measure, it is convenient to rewrite it as follows. First, let us introduce some notation
Note that since for all , is correctly defined for every . Thus is the expected weight (under -measure) and is a probability measure on . Now
Since is a probability measure, it holds that and so the normalization constant for is
Finally note that (4.3) can be rewritten more neatly by defining the measure
With , we have
From(4.5), it is easy to find all marginal distributions, namely for any
In particular, when , then
and so on. It is important to observe that depends on .
The limit process.
We have defined for every the measure (4.5) that describes the genotype distribution of a -element population. Now the natural question is: do these measures converge (in some sense) if the population size grows? First we have to define the sense of convergence. Since every measure is defined on different domain (), we cannot speak about standard (weak) convergence of measures. Instead, we ask about the existence of a limiting stochastic process. To explain the sense of convergence, consider that we have defined a triangular array of random variables:
We also know that the joint distribution of the firstvariables in every row depends on . Therefore we ask: is there a stochastic process so that for every the following convergence holds
According to Kolmogorov’s existence theorem, the existence of a stochastic process is equivalent to the existence of (finite dimensional) measures on set , that satisfy the following consistency conditions: for every and for every , it holds that
If we also want (4.7) to be true, then for every and for every the following convergences must hold:
We now present a general lemma that guarantees the convergence (4.8). To achieve the full generality, we let also depend on . Thus, we have weights , and we define the measures as follows
We start with the following observation, proven in appendix.
If , and and are defined with respect of and , respectively, then the following uniform convergence holds.
Let for every . If there exists an probability measure such that , then for every there exists a probability measure on so that (4.8) holds. Moreover, for every
and the measures , satisfy consistency conditions.
Proof. For every from (4.9), it follows that
Since the functions
are bounded (by 1) continuous functions, from uniform convergence, it holds
Clearly are probability measures. The consistency condition trivially holds, because
4.1 Frequencies: the measure
We now consider how to express the limit measure in terms of a limiting measure on the simplex . Recall defined in (2.1) and let
Since is exchangeable, the probability depends on the counts only, so
We may now write:
In what follows, let us denote
Therefore, the measure can be defined on the set as follows:
Considering the frequencies instead of counts, we can define the corresponding measure on the simplex . Let us denote that measure as , so that with
Thus is a discrete measure