Overcoming Problems in the Measurement of Biological Complexity

11/03/2010 ∙ by Manuel Cebrian, et al.

In a genetic algorithm, fluctuations of the entropy of a genome over time are interpreted as fluctuations of the information that the genome's organism stores about its environment, which is reflected in more complex organisms. The computation of this entropy presents technical problems due to the small population sizes used in practice. In this work we propose and test an alternative way of measuring the entropy variation in a population by means of algorithmic information theory, where the entropy variation between two generational steps is the Kolmogorov complexity of the first step conditioned on the second one. As an example application of this technique, we report experimental differences in entropy evolution between systems in which sexual reproduction is present or absent.




I Introduction

The evolution over time of the entropy of a genome within a population is currently an interesting problem which is conjectured to be connected to the evolution of the complexity of organisms in a genetic algorithm (Adami et al., 2000; Adami and Cerf, 2000). The complexity of the genome of an organism is considered to be the amount of information about its environment it stores. That is, evolution would cause the appearance of more complex sequences, which correspond to more complex phenotypes. This hypothesis states that natural selection acts as a Maxwell demon, accepting only those changes which adapt better to the environment and give rise to more complex individuals with genomes of lower entropy. This idea was tested by the simulation of a very simple system of asexual individuals in fixed environmental conditions.

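However, it is well known that the computation of the entropy as

$$H(G) = -\sum_{g} p(g)\,\log p(g) \qquad (1)$$

where $p(g)$ is the probability of genome $g$ in the population, has technical complications, due to the large size of the sample needed to estimate it with accuracy (Adami and Cerf, 2000; Herzel et al., 1994; Basharin, 1959). In practice, it is usually estimated as

$$H(G) \approx \sum_{i=1}^{L} H(i) = -\sum_{i=1}^{L} \sum_{a} p_i(a)\,\log p_i(a) \qquad (2)$$

i.e., as the sum of the entropy contributions of each locus in the genome, where $L$ is the number of loci and $p_i(a)$ is the frequency of allele $a$ at locus $i$. This estimation misses the entropy contributions due to epistatic effects. Some sophisticated statistical methods can be used to remedy this (see Appendix in Adami et al. (2000)), although we will not deal with them in this work.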
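The gap between the entropy of whole genomes and its per-locus estimate can be illustrated with a minimal Python sketch. The function names and the toy population below are hypothetical, chosen only for illustration; genomes are assumed to be equal-length strings of allele symbols, and probabilities are estimated by plain frequency counts.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def genome_entropy(population):
    """Plug-in entropy of whole genomes (needs many samples per genotype)."""
    return shannon_entropy(Counter(population))


def per_locus_entropy(population):
    """Sum of single-locus entropies; ignores correlations between loci."""
    length = len(population[0])
    return sum(
        shannon_entropy(Counter(genome[i] for genome in population))
        for i in range(length)
    )


# Hypothetical toy population: two loci that always carry the same allele.
population = ["00", "11"] * 50
print(genome_entropy(population))     # 1.0
print(per_locus_entropy(population))  # 2.0
```

In this toy case the two loci are perfectly correlated, so the per-locus sum reports 2 bits where the population only carries 1 bit. This is the kind of inter-locus contribution that the per-locus estimate cannot see.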

A still unexplored way to overcome this problem is to estimate the entropy of a genome as its average Kolmogorov complexity

$$H(G) \approx \langle K(g) \rangle \qquad (3)$$

where $K(g)$ is the Kolmogorov complexity of genome $g$ and the average is taken over the distribution of genomes (see Cover and Thomas (1991); Kolmogorov (1968); Li and Vitányi (1997)). However, this result only holds for infinitely long sequences, and therefore it cannot be applied to finite (sometimes short) genomes.

If we are only interested in the entropy evolution of the genome, and not in the particular value estimation, we can resort to the following trick: the genetic algorithm can be modelled as a thermodynamic system which evolves over time, where every population is just a measurement by an observer, i.e. the system is modeled as a statistical ensemble of genomes and each measurement is just a sample from that ensemble.

Now we can measure the entropy evolution of the system from two different viewpoints. The first is the system itself, where the entropy is calculated à la Shannon (equation 2) by estimating the probabilities of the alleles at each locus from their frequencies in the ensemble sample.

The second way of measuring the entropy is from the viewpoint of the observer, where a measurement is made of the population at each time step, and the information about the system is updated, i.e., the observer measures the system at time $t$ and stores this information, $m_t$. At time $t+1$ the observer makes another measurement and substitutes $m_t$ by $m_{t+1}$. The entropy variation due to this substitution can be calculated for both equilibrium and non-equilibrium thermodynamic systems (Zurek, 1989a, b; Bennett, 1982). Since evolution cannot be modeled as a system in equilibrium, the second case applies: the mutation and generational replacement operators may increase or decrease the entropy of the system (Vose, 1999; Wright, 2005).

Thus the entropy variation from the observer viewpoint is $K(m_t \mid m_{t+1})$ bits. As $K$ is an incomputable measure, we estimate it by using the Lempel-Ziv algorithm (Ziv and Lempel, 1978) as $K(x) \approx |LZ(x)|$, where $x$ is a string, ideally of unbounded length, and $|LZ(x)|$ is the size of the same string compressed (Cover and Thomas, 1991). Now our measurement of the system at time $t$, $m_t$, is much larger than with equation 3 (just the genome), so the estimation becomes possible.
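A compression-based estimate of this kind can be sketched in Python as follows. Several choices here are assumptions, not the procedure used in this work: zlib (DEFLATE, an LZ77 variant) stands in for the Ziv and Lempel (1978) algorithm, the Kolmogorov complexity of one measurement conditioned on another is approximated by the extra compressed size needed once the conditioning measurement has already been compressed, and populations are serialized by simple concatenation. All function and variable names are hypothetical.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Size in bits of the zlib-compressed data (a computable proxy for K)."""
    return 8 * len(zlib.compress(data, 9))


def conditional_complexity_bits(first: bytes, second: bytes) -> int:
    """Approximate K(first | second) as C(second + first) - C(second)."""
    return max(0, compressed_bits(second + first) - compressed_bits(second))


def serialize(population) -> bytes:
    """Concatenate genomes (strings of allele symbols) into one byte string."""
    return "".join(population).encode("ascii")


rng = random.Random(0)


def random_population(n_genomes=50, n_loci=32):
    """Hypothetical population of random binary genomes."""
    return ["".join(rng.choice("01") for _ in range(n_loci))
            for _ in range(n_genomes)]


m_t = serialize(random_population())           # measurement at one step
m_next_same = m_t                              # population did not change
m_next_other = serialize(random_population())  # unrelated new population

unchanged = conditional_complexity_bits(m_t, m_next_same)
replaced = conditional_complexity_bits(m_t, m_next_other)
print(unchanged < replaced)  # True
```

When the next measurement already contains everything in the previous one, the estimate is small; when the two are unrelated, it approaches the compressed size of the previous measurement. Note that DEFLATE only looks back 32 KiB, so for serialized populations larger than that a compressor with a longer window would be needed for the conditioning to have any effect.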

II Experiments

We want to test whether the evolution of the observer's entropy variation, estimated through Kolmogorov complexity, can help in the study of the evolution of the Shannon entropy of the system. Both measurements have their limitations, but their agreement would provide evidence that the entropy evolution is being studied correctly.

We have evaluated this experimentally, using the genetic algorithm proposed by Hayashi et al. (2007), which is able to reproduce sexual behaviour in a very detailed way, because it includes several features absent in the Adami et al. (2000) experiments, such as sexual reproduction, different inter-locus and intra-locus interactions across the genotypic or phenotypic distance, and the evolutionary mechanisms of mutation and natural selection.

Figure 1: Histogram of the 22 experimental correlation coefficients.

We have implemented and run the same simulation proposed by Hayashi et al. The model’s sexual dynamics can be summarized as follows: there are two different sexes (male and female). The likelihood that a female with trait will mate with a male with trait is defined by , where is the genetic or phenotypic distance measuring compatibility between the sexes, and is a parameter that represents the compatibility between traits. The value of used in the simulations is . The overall number of offspring produced by a female in each coupling is given by , where is the proportion of males with which the female has been able to mate. The parameter (which can take the values ) defines the fertility of the female. The parameter (ranging from to ) stands for sexual conflict selection in females. is the maximum possible number of offspring (). The sex and the father are randomly chosen for each offspring, and the number of males each female encounters is .

Twenty-two different experiments have been performed. The Shannon entropy and the Kolmogorov complexity estimate have been computed for each generation step with the methods described above. The system evolved for 10,000 generations with a population size of 1,000 individual genomes.

In all our results, both measures were highly correlated (see fig. 1), giving evidence that the dual measurement of the evolution of entropy by means of Kolmogorov complexity confirms the use of the Shannon entropy.
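As a minimal sketch of how two per-generation series can be compared, the following Python snippet computes a Pearson correlation coefficient. The text does not state which correlation coefficient was used, so Pearson is an assumption here, and the two short series are invented placeholder values, not experimental data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder series, one value per generation.
shannon_series = [10.0, 9.5, 9.8, 9.1, 8.7]   # per-locus entropy, bits
compression_series = [820, 790, 805, 760, 741]  # compressed size, bits

r = pearson(shannon_series, compression_series)
print(r > 0.9)  # True
```

A coefficient close to 1 indicates that both series rise and fall together across generations, which is the sense in which the two measurements are said to agree.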

Figure 2: Example of continuous evolutionary chase without genetic differentiation obtained with parameters , , , , , 2 loci, phenotypic model: additive. Experimental correlation coefficient: 0.9956.

III Discussion

Figure 3: Example of differentiation without speciation, obtained with parameters , , , , , 8 loci, phenotypic model: co-dominance. Experimental correlation coefficient: 0.9741.

When sexual dynamics are introduced in the system, the increase of complexity observed by Adami et al. (2000) is no longer present.

The typical result observed in our experiments is a chaotic behavior of the entropy (fig. 2). Only large genome lengths or high female mating rates () escape from this (the other parameters seem to have little importance). The effect of this is to increase the autocorrelation of the entropy time series and provoke the rise of a few entropy bumps during a small number of generations (figs. 3 and 4).

Our hypothesis for this behavior of the entropy is the absence of natural selection in the Hayashi et al. (2007) model, which could explain the similarities between male and female evolution (fig. 5). Without natural selection, the environment for females is reduced to random boundary conditions (mutations), so females have no environment to adapt to. Males, on the other hand, are selected by females as mating partners. In this way, females can be considered to become the environment for males, since they determine the way in which the entropy of the males evolves. Perhaps if the pressure of natural selection were applied to both males and females (not necessarily in the same way), more complex patterns in the behavior of their entropies would appear. The fact that natural selection is not taken into account may be the cause of the differences in entropy evolution between the models by Adami et al. (2000) and Hayashi et al. (2007), and the reason why global decreases in entropy are not observed in the latter.

Figure 4: Example of genetic differentiation without co-evolutionary chase or sympatric speciation, obtained with parameters , , , , , 32 loci, phenotypic model: dominance. Experimental correlation coefficient: 0.8571.

IV Conclusions and future work

Studying a genetic algorithm from the observer's point of view allows us to obtain large-scale estimates of the entropy evolution via Kolmogorov complexity. This overcomes many limitations that arise from epistatic effects between loci and provides an easy way to study the dynamics without resorting to complex mathematical machinery. We also use this methodology to study the effect of sexual reproduction on the evolution of complexity, understood as a decrease of entropy. We show that when sexual reproduction is present the population enters a chaotic regime of complexity, driven by the complexity drifts of the female organisms. We suggest that this may be caused by female organisms evolving chaotically without natural selection, and male organisms evolving with females as their boundary conditions, which gives an overall chaotic evolution of complexity. The next immediate step is to introduce natural selection in the experiments and study whether this changes the evolution of complexity. We plan to do so by implementing the typical selection operators from genetic algorithms, such as tournament selection, steady-state selection and so forth. We conjecture that introducing natural selection will remove the chaotic complexity dynamics and may bring the results closer to what Adami et al. (2000) reported: an increase in complexity by removing genetic mutations that do not improve the fitness function.

Figure 5: Same parameters as in fig. 2 with entropy calculation decomposed by sex.


We would like to thank Carlos Castañeda for his help in the implementation and simulations.


  • Adami and Cerf (2000) C. Adami and N.J. Cerf. Physical complexity of symbolic sequences. Physica D, 137(1-2):62–69, 2000.
  • Adami et al. (2000) C. Adami, C. Ofria, and T.C. Collier. Special feature: Evolution of biological complexity. Proceedings of the National Academy of Sciences, 97(9):4463–4468, 2000.
  • Basharin (1959) G.P. Basharin. On a Statistical Estimate for the Entropy of a Sequence of Independent Random Variables. Theory of Probability and its Applications, 4:333, 1959.
  • Bennett (1982) C.H. Bennett. The thermodynamics of computation. International Journal of Theoretical Physics, 21(12):905–940, 1982.
  • Cover and Thomas (1991) T.M. Cover and J.A. Thomas. Elements of information theory. Wiley New York, 1991.
  • Hayashi et al. (2007) T.I. Hayashi, M.D. Vose, and S. Gavrilets. Genetic differentiation by sexual conflict. Evolution, 61:516–529(14), March 2007.
  • Herzel et al. (1994) H. Herzel, W. Ebeling, and A.O. Schmitt. Entropies of biosequences: The role of repeats. Physical Review E, 50(6):5061–5071, 1994.
  • Kolmogorov (1968) A.N. Kolmogorov. Three approaches to the quantitative definition of information. International Journal of Computer Mathematics, 2(1):157–168, 1968.
  • Li and Vitányi (1997) M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 1997.
  • Vose (1999) M.D. Vose. The Simple Genetic Algorithm: Foundations and Theory. MIT Press, 1999.
  • Wright (2005) A.H. Wright. Foundations of genetic algorithms. Springer, 2005.
  • Ziv and Lempel (1978) J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, Sept. 1978.
  • Zurek (1989a) W. H. Zurek. Thermodynamic cost of computation, algorithmic complexity and the information metric. Nature, 341:119–124, Sept. 1989a.
  • Zurek (1989b) W.H. Zurek. Algorithmic randomness and physical entropy. Physical Review A, 40(8):4731–4751, 1989b.