1 Introduction
Central to modern synthesis and general evolutionary theory is the understanding that evolution is gradual and is explained by small genetic changes in populations over time [1]. Genetic variation in populations can arise by chance through mutation, with these small changes leading to major evolutionary changes over time. Of interest in connection to the possible links between the theory of biological evolution and the theory of information is the place and role of randomness in the process that provides the variety necessary to allow organisms to change and adapt over time.
On the one hand, while there are known sources of nonuniform random mutations, for example, as a function of environment, gender and age in plants and animals, when all conditions are the same, mutations are traditionally considered to be uniformly distributed across coding and noncoding regions. Noncoding DNA regions are subject to different mutation rates throughout the genome because they are subject to less selective pressure than coding regions. The same holds for so-called microsatellites, repetitive DNA segments which are mostly noncoding, where the mutation rate increases as a function of the number of repetitions. However, beyond physical properties, in which the probability of a given nucleotide mutating also depends on its weaker or stronger chemical and thermodynamic bonds, other departures from uniformity are less well understood, and seem to be the result of a process rather than being related to or driven by direct physical or chemical interactions.
On the other hand, random mutation implies no evidence for a directing force and in artificial genetic algorithms, mutation has traditionally been uniform even if other strategies are subject to continuous investigation and have been introduced as a function of, for example, time or data size.
More recently, it has been suggested [2, 3, 4, 5] that the deeply informational and computational nature of biological organisms makes them amenable to being studied or considered as computer programs following (algorithmic) random walks in software space, that is, the space of all possible (and valid) computer programs. Here, we numerically test this hypothesis and explore the consequences vis-à-vis our understanding of the biological aspects of life and natural evolution by natural selection, as well as for applications to optimization problems in areas such as evolutionary programming.
We found that the simple assumption of introducing computation in the model of random mutation had some interesting ramifications that echo some genetic and evolutionary phenomenology.
1.1 Chaitin’s Evolutionary Model
In the context of his Metabiology programme, Gregory Chaitin, a founder of the theory of algorithmic information, introduced a theoretical computational model that evolves ‘organisms’ relative to their environment considerably faster than classical random mutation [6, 2, 7]. While theoretically sound, the ideas had not been tested, and further advancements were needed for their actual implementation. Here we follow an experimental approach heavily based on the theory that Chaitin himself helped found. We apply his ideas on evolution operating in software space to synthetic and biological examples and, although further investigation is needed, this work represents a first step towards testing and advancing a sound algorithmic framework for biological evolution.
Starting with an empty binary string, Chaitin’s example approximates his $\Omega$ number, defined as $\Omega = \sum_{p \in HP} 2^{-|p|}$, where $HP$ is the set of all halting programs [8], in an expected time of $O(t^2 (\log t)^{1+O(1)})$, which is significantly faster than the exponential time that the process would take if random mutations from a uniform distribution were applied. This speedup is obtained by drawing mutations according to the Universal Distribution [9, 10], a distribution that results from the operation of computer programs, which we explain in detail in the following section.
In a previous result [11], we showed that Chaitin’s model exhibits open-ended evolution (OEE [12]) according to the formal definition of OEE given in [11], which accords with the general intuition about OEE, and that no decidable system with computable dynamics can achieve OEE under such a computational definition. Here we will introduce a system that, by following the Universal Distribution, optimally approaches OEE.
2 Methodology
2.1 Algorithmic Probability and the Universal Distribution
At the core of our approach is the concept of Algorithmic Probability introduced by Solomonoff [13], Levin [14] and Chaitin [15]. Denoted by $P(s)$, the algorithmic probability of a binary string $s$ is formally defined as:

$$P(s) = \sum_{p \,:\, U(p) = s} \frac{1}{2^{|p|}} \qquad (1)$$

where $p$ is a random computer program in binary (whose bits were chosen at random) running on a so-called prefix-free universal Turing machine $U$ (prefix-free in order to constrain the number of valid programs, as would happen in physical systems) that outputs $s$ and halts.

Algorithmic probability connects the algorithmic likelihood of $s$ to the intrinsic algorithmic complexity of $s$. The less algorithmically complex $s$ is (like $\pi$), the more frequently it will be produced on $U$ by running a random computer program $p$. If $K(s)$ is the descriptive algorithmic complexity of $s$ (also known as Kolmogorov-Chaitin complexity [16, 17]), we have it that $P(s) = 1/2^{K(s) + O(1)}$.
The distribution induced by $P(s)$ over all strings $s$ is called the Universal Distribution or Levin’s semimeasure [10, 9, 18], because the measure is semicomputable, can only be approximated from below, and its sum does not add up to 1, so it is not a full measure.
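As a rough illustration of how complexity estimates translate into sampling weights, the following minimal sketch assigns each candidate a weight proportional to $2^{-K}$ and normalises over a finite candidate set. The function name `coding_theorem_weights` and the complexity values are hypothetical, and the normalisation is a convenience for sampling from a finite set; the Universal Distribution itself is a semimeasure and is not normalised.

```python
def coding_theorem_weights(complexities):
    """Map complexity estimates (in bits) to weights proportional to 2^-K.

    `complexities` is a dict {candidate: estimated K}. The estimates are
    assumed to come from some approximation; they are not computed here.
    """
    # Subtract the minimum before exponentiating to avoid underflow for large K.
    k_min = min(complexities.values())
    raw = {s: 2.0 ** -(k - k_min) for s, k in complexities.items()}
    total = sum(raw.values())
    # Normalise over the finite candidate set so the weights sum to 1.
    return {s: w / total for s, w in raw.items()}


# Hypothetical complexity estimates in bits (illustrative values only).
estimates = {"0000000000": 3.0, "0101010101": 5.0, "0110100110": 10.0}
weights = coding_theorem_weights(estimates)
```

Under these assumed values, a candidate whose estimate is two bits lower receives four times the weight, which is the exponential preference for simple objects that the text describes.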
The mainstream practice in the consideration and application of mutation is that mutations happen according to a uniform distribution based on, e.g., the length of a genomic sequence, and independently of the fitness function. All other things being equal, and without considering other genetic operations (e.g. sexual vs asexual reproduction, which is beyond the scope of this paper), our results indicate that random mutation based on algorithmic probability and the Universal Distribution makes ‘organisms’ converge faster and has interesting phenomenological implications such as modularity. Evidently this claim is not completely independent of the fitness function. If a fitness function assigns, for example, a higher fitness to organisms whose description maximizes algorithmic randomness, then the application of mutations based on algorithmic probability and the Universal Distribution will fail, and will do so optimally, as it would be pushing in exactly the opposite direction. But we will show that as long as the fitness function maximizes some structure that is not algorithmically random, as would be expected from organisms living in a structured world [4], mutations based on the Universal Distribution will converge faster.
2.2 Classical v. Algorithmic Probability
To illustrate the difference between the two, the classical probability of producing the first $n$ digits of a mathematical constant such as $\pi$ in binary by chance, e.g. by randomly typing on a typewriter, decreases exponentially as a function of the number of digits to be produced. However, because $\pi$ is not random, in the sense that it has a short description that can generate an arbitrary number of its digits with the same (short) formula, the algorithmic likelihood of $\pi$ being generated by a random program is much higher than its classical probability. This is because producing a short computer program that encodes a short mathematical formula is more likely than typing the digits of $\pi$ themselves one by one. This probability, based on generating computer programs rather than generating the objects that such computer programs may generate, is called algorithmic probability. A generating formula can thus be written as a computer program of bounded length, which gives it a probability of occurring by chance that diverges from the one given by classical probability.
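The gap between the two probabilities can be made concrete with a small calculation. This is only a sketch: the program length of 200 bits and the target length of 1000 bits are hypothetical values chosen for illustration, not figures from the experiments.

```python
from fractions import Fraction


def chance_of_typing(n_bits):
    """Probability of producing one specific n-bit sequence with fair coin flips."""
    return Fraction(1, 2 ** n_bits)


# Assumption: a hypothetical 200-bit program prints the first 1000 bits of a constant.
n_digits, program_len = 1000, 200

classical = chance_of_typing(n_digits)       # typing the digits directly
algorithmic = chance_of_typing(program_len)  # typing the generating program instead
ratio = algorithmic / classical
```

Under these assumed lengths the ratio equals $2^{800}$: producing the program by chance is astronomically more likely than producing its output directly, and the advantage grows with every additional digit requested.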
2.3 Motivation and Theoretical Considerations of Algorithmic Evolution
In Chaitin’s evolutionary model [6, 2, 7], a successful mutation is defined as a computable function $\mu$, chosen according to the probabilities stated by the Universal Distribution [10, 9], that changes the current state of the system (as an input of the function) to a better approximation of the constant $\Omega$ [17]. In order to be able to simulate this system we would need to compute the Universal Distribution and the fitness function. However, both the Universal Distribution and the fitness function of the system require the solution of the Halting Problem [8], which is uncomputable. Nevertheless, as with $\Omega$ itself, this solution can be approximated [19, 20]. Here we are proposing a model that, to the best of our knowledge, is the first computable approximation to Chaitin’s proposal.
For this first approximation we have made four important initial concessions: one with respect to the real computing time of the system, and three with respect to Chaitin’s model:

We assume that building the probability distributions for each instance of the evolution takes no computational time, while in the real computation this is the single most resource-intensive step.
The goal of our system is to approximate objects of bounded information content: binary matrices of a set size.

We use BDM and Shannon’s entropy as approximations for the algorithmic information complexity $K$.

We are not approximating the algorithmic probability of the mutation functions, but that of their outputs.
We justify the first concession in a similar fashion to Chaitin: if we assume that the interactions and mechanics of the natural world are computable, then the probability of a decidable event occurring (an event is decidable if it can be decided by a Turing machine) is given by the Universal Distribution. The third one is a necessity, as the algorithmic probability of an object is uncomputable (it too requires a solution to the Halting Problem). In an upcoming section we will show that Shannon’s entropy is not as good as BDM for our purposes. Finally, note that given the Universal Distribution and a fixed input, the probability of a mutation is in inverse proportion to the descriptive complexity of its output, up to a constant error. In other words, it is highly probable that a mutation may reduce the information content of the input but improbable that it may increase the information content. Therefore, the last concession yields an adequate approximation, since a low information mutation can reduce the descriptive complexity of the input but not increase it in a meaningful way.
2.4 Our Expectations
It is important to note that, compared to Chaitin’s metabiology model [2], we changed the goal of our system; therefore we must also change the expectations we have for its behaviour.
Chaitin’s evolution model [2] is faster than regular random models despite targeting a highly random object, thanks to the fact that positive mutations have low algorithmic information complexity and hence a (relatively) high probability of being stochastically chosen under the Universal Distribution. The universally low algorithmic complexity of these positive mutations relies on the fact that, when assuming an oracle for HP, we are also implying a constant algorithmic complexity for its evaluation function and target, since we can write a program that verifies if a change on a given approximation of $\Omega$ is a positive one without needing a codification of $\Omega$ itself.
In contrast, we expected our model to be sensitive with respect to the algorithmic complexity of the target matrix, obtaining high speedup for structured target matrices that decreases as the algorithmic complexity of the target grows. However, this change of behaviour remains congruent with the main argument of metabiology [2] and our assertion that, contrary to regular random mutations, algorithmic probability driven evolution tends to produce structured novelty at a faster rate, which we hope to prove in the upcoming set of experiments.
In summary, we expect that when using an approximation to the Universal Distribution:

Convergence will be reached in fewer total mutations than when using the uniform distribution for structured target matrices.

The stated difference will decrease in relation to the algorithmic complexity of the target matrix.
We also aimed to explore the effect of the number of allowed shifts (mutations) on the expected behaviour.
2.4.1 The Unsuitability of Shannon’s Entropy
As shown in [21, 20], when compared to BDM we can think of Shannon’s entropy alone as a less accurate approximation to the algorithmic complexity of an object (if its underlying probability distribution is not updated by a method equivalent to BDM, as it would not be by the typical uninformed observer). Therefore we expect the entropy-induced speedup to be consistently outperformed by BDM when the target matrix moves away from algorithmic randomness and thus has some structure. Furthermore, as random matrices are expected to have a balanced number of 0’s and 1’s, we anticipated the performance of single-bit entropy to be nearly identical to the uniform distribution on unstructured (random) matrices. For block entropy [22, 23], that is, the entropy computed over submatrices rather than single bits, the probability of having repeated blocks is in inverse proportion to their size, while blocks of smaller sizes approximate single-bit entropy, again yielding similar results to the uniform distribution. The results support our assumptions and claims.
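The two entropy measures discussed above can be sketched as follows. This is an illustrative implementation under stated assumptions: matrices are square lists of 0/1 rows, blocks do not overlap, and the matrix side is a multiple of the block size. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def single_bit_entropy(matrix):
    """Entropy of the distribution of individual bits in the matrix."""
    return shannon_entropy(bit for row in matrix for bit in row)


def block_entropy(matrix, size):
    """Entropy of the distribution of non-overlapping size x size blocks.

    Assumes a square matrix whose side is a multiple of `size`.
    """
    n = len(matrix)
    blocks = []
    for i in range(0, n, size):
        for j in range(0, n, size):
            block = tuple(tuple(matrix[r][j:j + size]) for r in range(i, i + size))
            blocks.append(block)
    return shannon_entropy(blocks)


# A 4x4 checkerboard: perfectly balanced bits, yet highly regular.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
```

For the checkerboard, single-bit entropy is maximal (1 bit) because zeros and ones are balanced, whereas block entropy with 2x2 blocks is zero because every block is identical. This illustrates why single-bit entropy cannot distinguish a regular matrix from a random one with the same bit balance.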
2.5 Evolutionary Model
Broadly speaking, our evolutionary model is a tuple $\langle \mathcal{S}, M_0, f, t, \alpha, \gamma \rangle$, where:

$\mathcal{S}$ is the state space (see section 2.7),

$M_0$, with $M_0 \in \mathcal{S}$, is the initial state of the system,

$f : \mathcal{S} \to \mathbb{R}^{+}$ is a function, called the fitness or aptitude function, which goes from the state space to the positive real numbers,

$t$ is a positive integer called the extinction threshold,

$\alpha$ is a real number called the convergence parameter, and

$\gamma$ is a nondeterministic evolution dynamic such that if $\gamma(M, f, t) = (M', t')$ then $f(M') < f(M)$ and $t' \le t$, where $t'$ is the number of steps or mutations it took $\gamma$ to produce $M'$; $\gamma(M, f, t) = (\bot, t')$ if it was unable to find $M'$ with a better fitness in the given time, and $\gamma(M, f, t) = (\top, t')$ if it finds $M'$ such that $f(M') \le \alpha$.

Specifically, the function $\gamma$ receives an individual $M$ and returns an evolved individual $M'$, in the time specified by $t$, that improves upon the value of the fitness function $f$, together with the time it took to do so; it returns $\bot$ if it was unable to do so and $\top$ if it reached the convergence value.

A successful evolution is the sequence $M_0, (M_1, t_1) = \gamma(M_0, f, t), \ldots, (\top, t_n)$, and $\sum_i t_i$ is the total evolution time. We say that the evolution failed, or that we got an extinction, if instead we finish the process by $(\bot, t_n)$, with $\sum_i t_i$ being the extinction time. The evolution is undetermined otherwise. Finally, we will call each element $(M_i, t_i)$ an instance of the evolution.
2.6 Experimental Setup: A Max One Problem Instance
For this experiment, our state space is the set of all binary matrices of size $n \times n$, our fitness function is defined as the Hamming distance $f(M) = H(M_t, M)$, where $M_t$ is the target matrix, and our convergence parameter is $\alpha = 0$. In other words, the evolution converges when we produce the target matrix, guided only by the Hamming distance to it, which is defined as the number of different bits between the input matrix and the target matrix.
The stated setup was chosen since it allows us to easily define and control the descriptive complexity of the fitness function by controlling the target matrix and, therefore, also to control the complexity of the evolutionary system itself. It is important to note that our setup can be seen as a generalization of the Max One problem [24], where the initial state is a binary “initial gene” and the target matrix is the “target gene”; when we obtain a Hamming distance of 0 we have obtained gene equality.
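The fitness function of this setup is simple enough to write down directly. The sketch below assumes matrices are represented as equally sized lists of 0/1 rows; the names `hamming_distance` and `has_converged` are illustrative.

```python
def hamming_distance(a, b):
    """Number of positions at which two equally sized binary matrices differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def has_converged(matrix, target, alpha=0):
    """Convergence test: fitness at or below the convergence parameter."""
    return hamming_distance(matrix, target) <= alpha


current = [[0, 1], [1, 0]]
target = [[1, 1], [1, 1]]
distance = hamming_distance(current, target)  # two bits differ
```

With a convergence parameter of 0, the evolution stops exactly when the current matrix equals the target.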
2.7 Evolution Dynamics
The main goal of this project is to contrast the speed of the evolution when choosing between two approaches to determining the probability of mutations:

When the probability of a given set of mutations has a uniform distribution. That is, all possible mutations have the same probability of occurrence, even if under certain constraints.

When the probability of a given mutation occurring is given by an approximation to the Universal Distribution (UD) [14, 10]. As the UD is noncomputable, we will approximate it by approximating the algorithmic complexity $K$ [16, 17] by means of the Block Decomposition Method (BDM, with no overlapping) [20] based on the Coding Theorem Method (CTM) [25, 26, 27] (see methods).

We will also investigate the results by running the same experiments using Shannon Entropy instead of BDM to approximate $K$.
Each evolution instance was computed by iterating over the same dynamic. We start by defining the set of possible mutations as those that are within a fixed number of bits from the input matrix. In other words, for a given input matrix $M$, the set of possible mutations in a single instance is defined as the set

$$\mathcal{M}(M) = \{ M' : H(M', M) \le n \},$$

where $H$ is the Hamming distance and $n$ is the number of allowed shifts.

Then, for each matrix $M'$ in $\mathcal{M}(M)$, we compute the probability $P(M')$ defined as:

in the case of the Uniform Distribution.

for the BDM Distribution and

or for Shannon entropy (for an uninformed observer with no access to the possible deterministic or stochastic nature of the source),
where , and are normalization factors such that the sum of the respective probabilities are 1.
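The set of possible mutations described above, namely all matrices within a fixed number of bit flips of the input, can be enumerated as in the following sketch. It is an illustration under assumptions: matrices are lists of 0/1 rows, and the input matrix itself (zero flips) is excluded from the set.

```python
from itertools import combinations


def possible_mutations(matrix, max_shifts):
    """Yield every matrix obtained by flipping between 1 and max_shifts bits.

    Assumption: the unmodified input matrix is not considered a mutation.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    for k in range(1, max_shifts + 1):
        for flips in combinations(cells, k):
            mutant = [list(row) for row in matrix]  # fresh copy per candidate
            for r, c in flips:
                mutant[r][c] ^= 1  # flip the selected bit
            yield mutant


zero = [[0, 0], [0, 0]]
candidates = list(possible_mutations(zero, 2))
```

For a 2x2 matrix and up to two flips this produces 4 + 6 = 10 candidates. The size of the set grows combinatorially with the number of allowed shifts, which is what makes building the probability distribution the most expensive step.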
For implementation purposes, we used a minor variation of the entropy probability distribution to be compared with BDM. The probability distributions for the set of possible mutations using entropy were built using two heuristics: Let $M'$ be a possible mutation of $M$, then the probability of obtaining $M'$ as a mutation is defined as either, or . The first definition assigns a linearly higher probability to mutations with lower entropy. The second definition is consistent with our use of BDM in the rest of the experiments. The constant is an arbitrary small value that was included to avoid undefined (infinite) probabilities. For the presented experiments was set at .

Once the probability distribution is computed, we set the number of steps to 0 and then, using a (pseudo)random number generator (RNG), we proceed to stochastically draw a matrix from the stated probability distribution and evaluate its fitness with the function $f$, adding 1 to the number of steps. If the resultant matrix does not show an improvement in fitness, we draw another matrix and add another 1 to the number of steps, not stopping the process until we obtain a matrix with better fitness or reach the extinction threshold. We can either replace the drawn matrix or leave it out of the pool for the next iterations. A visualisation of the stated workflow for a matrix is shown in Figure 1.
To produce a complete evolution sequence, we iterate the stated process until either convergence or extinction is reached. As stated before, we can choose to not replace an evaluated matrix from the set of possible mutations in each instance, but we chose to not keep track of evaluated matrices after an instance was complete. This was done in order to keep open the possibility of dynamic fitness functions in future experiments.
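The dynamic described in this section can be sketched as a short simulation. This is a minimal illustration rather than the implementation used for the experiments: the `weight` argument is a hypothetical hook where a complexity-based weight (for example one derived from BDM) could be plugged in, and the example run uses a constant weight, which corresponds to the uniform strategy. Drawn candidates are not replaced within a stage, and the pool is rebuilt for each new stage.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two equally sized binary matrices."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def neighbours(matrix, max_shifts):
    """All matrices obtained by flipping between 1 and max_shifts bits."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    result = []
    for k in range(1, max_shifts + 1):
        for flips in combinations(cells, k):
            rows = [list(row) for row in matrix]
            for r, c in flips:
                rows[r][c] ^= 1
            result.append(tuple(tuple(row) for row in rows))
    return result


def evolve(initial, target, weight, max_shifts=1, threshold=2500, rng=None):
    """Run one evolution; return (total_steps, converged)."""
    rng = random.Random(0) if rng is None else rng
    current, total_steps = initial, 0
    while hamming(current, target) > 0:
        pool = neighbours(current, max_shifts)
        weights = [weight(m) for m in pool]
        steps, improved = 0, False
        while pool and steps < threshold:
            # Draw one candidate with probability proportional to its weight.
            i = rng.choices(range(len(pool)), weights=weights)[0]
            candidate = pool.pop(i)  # no replacement within a stage
            weights.pop(i)
            steps += 1
            if hamming(candidate, target) < hamming(current, target):
                current, improved = candidate, True
                break
        total_steps += steps
        if not improved:
            return total_steps, False  # extinction
    return total_steps, True


rng = random.Random(1)
target = tuple(tuple(1 for _ in range(4)) for _ in range(4))
initial = tuple(tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(4))
steps, converged = evolve(initial, target, weight=lambda m: 1.0, rng=rng)
```

With 1-bit shifts and no replacement, each stage of this toy run must find an improving mutation within 16 draws, so it always converges. Extinctions only become possible when the pool is larger than the threshold, or when the weights concentrate the draws on non-improving candidates.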
In this case, the evolution time is defined as the sum of the number of steps (or draws) it took the initial matrix to reach equality with the target matrix. When computing the evolution dynamics by one of the different probability distribution schemes we will denote it by uniform strategy, BDM strategy or entropy strategy, respectively. That is, the uniform distribution, the distribution for the algorithmic probability estimation by BDM, and the distribution by Shannon entropy.
2.8 The Speed-Up Quotient
We will measure how fast (or slow) a strategy is compared to the uniform strategy by the speedup quotient, which we define as follows.

Definition 1. The speedup quotient, or simply speedup, between the uniform strategy and a given strategy $g$ is defined as

$$\delta = \frac{S_u}{S_g},$$

where $S_u$ is the average number of steps it takes a sample (a set of initial state matrices) to reach convergence under the uniform strategy and $S_g$ is the average number of steps it takes under the $g$ strategy.
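A direct computation of the quotient is sketched below. The direction of the ratio (uniform average divided by the strategy average, so that values above 1 mean the strategy is faster) is an assumption made for this illustration, and the step counts are hypothetical.

```python
from statistics import mean


def speedup_quotient(uniform_steps, strategy_steps):
    """Average steps under the uniform strategy divided by the average under another.

    Assumption: values above 1 mean the other strategy converges in fewer steps.
    """
    return mean(uniform_steps) / mean(strategy_steps)


# Hypothetical step counts for three evolutions under each strategy.
uniform_steps = [200, 220, 240]
strategy_steps = [100, 110, 120]
quotient = speedup_quotient(uniform_steps, strategy_steps)
```

With these illustrative numbers the quotient is 2.0, meaning the strategy needed half as many draws on average as the uniform baseline.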
3 Results
3.1 Cases of Negative Speedup
In order to better explain the choices we have made to our experimental setup, first we will present a series of cases where we obtained no speedup or slowdown. Although these cases were expected, they shed important light on the behaviour of the system.
3.1.1 Entropy vs. Uniform on Random Matrices
For the following experiments, we generated 200 random $8 \times 8$ matrices separated into two sets: initial matrices and target matrices. After pairing them based on their generation order we evolved them using 10 strategies: the uniform distribution, block Shannon’s entropy for blocks of size , denoted below by , entropy for single bits denoted by , and their variants where we divide by and respectively. The strategies were repeated for 1-bit and 2-bit shifts (mutations).
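| Strategy | Shifts | Average | SE |
| --- | --- | --- | --- |
| Uniform | 1 | 214.74 | 3.55 |
| | 1 | 214.74 | 3.55 |
| | 1 | 215.53 | 3.43 |
| | 1 | 214.74 | 3.55 |
| | 1 | 213.28 | 3.33 |
| Uniform | 2 | 1867.10 | 78.94 |
| | 2 | 1904.52 | 79.88 |
| | 2 | 2036.13 | 83.38 |
| | 2 | 1882.46 | 78.63 |
| | 2 | 1776.25 | 81.93 |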
The results obtained are summarized in Table 1, which lays out the strategy used for each experiment, the number of shifts/mutations allowed, the average number of steps it took to reach convergence, as well as the standard error of the sample mean. As we can see, the differences in the number of steps required to reach convergence are not statistically significant, validating our assertion that, for random matrices, entropy evolution is not much different from uniform evolution.
The algorithmic complexity of a network makes sense, in general and in most cases, only in its unlabelled version. In [27, 28, 20] we showed, both theoretically and numerically, that approximations of the algorithmic complexity of adjacency matrices of labelled graphs are a good approximation (up to a logarithmic term or the numerical precision of the algorithm) of the algorithmic complexity of the unlabelled graphs. This means that we can consider any adjacency matrix of a network a good representation of the network, disregarding graph isomorphisms.
3.1.2 Entropy vs. Uniform on a Highly Structured Matrix
For this set of experiments, we took the same set of 100 initial matrices and evolved them into a highly structured matrix, which is the adjacency matrix of the star with 8 nodes. For this matrix, we expected entropy to be unable to capture its structure, and the results obtained accorded with our expectations. The results are shown in table 2.
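| Strategy | Shifts | Average | SE |
| --- | --- | --- | --- |
| Uniform | 1 | 216.24 | 3.48 |
| | 1 | 216.71 | 3.54 |
| | 1 | 212.74 | 3.41 |
| | 1 | 216.71 | 3.54 |
| | 1 | 211.74 | 3.69 |
| Uniform | 2 | 1811.84 | 85.41 |
| | 2 | 1766.69 | 88.18 |
| | 2 | 1859.11 | 75.73 |
| | 2 | 1764.03 | 84.52 |
| | 2 | 1853.04 | 74.48 |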
As we can see from the results, entropy was unable to show a statistically significant speedup compared to the uniform distribution. Over the next sections we show that we have obtained a statistically significant speedup by using the BDM approximation to algorithmic probability distributions, which is expected because BDM manages to better capture the algorithmic structures of a matrix rather than just the distribution of the bits, which is what entropy measures. Based on the previous experiments, we conclude that entropy is thus not a good approximation for $K$, and we will omit its use in the rest of the article.
3.1.3 Randomly Generated Graphs
For this set of experiments, we generated 200 random $8 \times 8$ matrices and 600 $16 \times 16$ matrices, both sets separated into initial and target matrices. We then proceeded to evolve the initial matrix into the corresponding target by the following strategies: uniform and BDM within 2-bit and 3-bit shifts (mutations) for the $8 \times 8$ matrices and only 2-bit shifts for the $16 \times 16$ matrices due to computing time. The results obtained are shown in Figure 2. In all cases, we do not replace drawn matrices and the extinction threshold was set at 2500.
From the results we can see two important behaviours for the $8 \times 8$ matrices. First, the matrices generated are of high BDM complexity, and evolving the system using the uniform strategy tends to be faster than using BDM for these highly random matrices. Secondly, although increasing the number of possible shifts by 1 seems, at first glance, a small change in our setup, it has a big impact on our results: the number of extinctions has gone from 0 for both methods to 92 for the uniform strategy and 100 for BDM. This means that most evolutions will rise above our threshold of 2500 draws for a single successful evolutionary step, leading to an extinction. As for the $16 \times 16$ matrices, we can see the formation of two easily separable clusters that coincide perfectly with the Uniform and BDM distributions respectively.
3.2 The Causes of Extinction
For the uniform distribution, the reason is simple: the number of 3-bit shifts on $8 \times 8$ matrices gives a space of possible mutations of $\binom{64}{3} = 41\,664$ matrices, which is much larger than the number of possible mutations present within 2 shifts and 1 shift (mutation), which are $\binom{64}{2} = 2016$ and $\binom{64}{1} = 64$ respectively. Therefore, as we get close to convergence, the probability of getting the right evolution, if the needed number of shifts is two or one, is about 0.04%, and removing repeated matrices does not help in a significant way to avoid extinction, since 41 664 is much larger than 2500.
Given the values discussed, we have chosen to set the extinction threshold at 2500 and the number of shifts at 2 for $8 \times 8$ matrices, as allowing just 64 possible mutations for each stage is a number too small for showing a significant difference in the evolutionary time between the uniform and BDM strategies, while requiring evolutionary steps of 41 664 for an evolutionary stage is too computationally costly. The threshold of 2500 is close to the number of possible mutations and has been shown to consume a high amount of computational resources. For $16 \times 16$ matrices, we performed 1-bit shifts only, and occasionally 2-bit shifts when computationally possible.
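The counts quoted above follow from choosing which of the 64 bits of an 8x8 matrix to flip, and can be checked with a few lines. The helper name `shift_counts` is illustrative.

```python
from math import comb


def shift_counts(n_bits, max_shifts):
    """Number of distinct mutations that flip exactly k bits, for k = 1..max_shifts."""
    return {k: comb(n_bits, k) for k in range(1, max_shifts + 1)}


counts = shift_counts(8 * 8, 3)
# Exactly 1, 2 and 3 flipped bits out of 64.
print(counts)
```

The jump from 2016 candidates to 41 664 when a third shift is allowed is what pushes the search space well beyond the extinction threshold of 2500.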
3.2.1 The BDM Strategy, Extinctions and Persistent Structures
The interesting case is the BDM strategy. As we can see clearly in Figure 3 for the 3bit case, the overall number of steps needed to reach each extinction is often significantly higher than 2500 under the BDM strategy. This behaviour cannot be explained by the analysis done for the uniform distribution, which predicts the sharp drop observed in the blue curve.
After analyzing the set of matrices drawn during failed mutations (all the matrices drawn during a single failed evolutionary stage), we found that most of these matrices have in common highly regular structures. We will call these structures persistent structures. Formally, regular structures can be defined as follows:
Definition 2.
Let be the description used for an organism or population and a substructure of in a computable position such that , where is a small number and is the codification of without the contents of . We will call a persistent or regular structure of degree if the probability of choosing a mutation with the subsequence is .
Now, note that grows in inverse proportion to and the difference in algorithmic complexity of the mutation candidates and : Let contain in a computable position. Then the probability of choosing as an evolution of is
Furthermore, if the possible mutations of can only mutate a bounded number of bits and there exists such that, for every other subsequence of that can replace we have it that , then:
The previous inequality is a consequence of the fact that the possible mutations are finite and only a small number of them, if any, can have a smaller algorithmic complexity than the mutations that contain ; otherwise we contradict the existence of . In other words, as has relatively low complexity, the structures that contain tend to also have low algorithmic complexity, and hence a higher probability of being chosen.
Finally, as shown in the section 3.2, we can expect the number of mutations with persistent structures to increase in factorial order with the number of possible mutations and in polynomial order with respect to the size of the matrices that compose the state space.
Proposition 3.
As a direct consequence of the last statement, we have it that, for systems evolving as described in the section 2.7 under the Universal Distribution:

Once a structure with low descriptive complexity is developed, it is exponentially hard to get rid of it.

The probability of finding a mutation without the structure decreases in factorial order with respect to the set of possible mutations.

Evolving towards random matrices is hard (improbable).

Evolving from and to unrelated regular structures is also hard.
Given the fourth point, we will always choose random initial matrices from now on, as the probability of drawing a mutation other than an empty matrix (of zeroes), when one is present in the set of possible mutations, is extremely low (below for matrices with 2 shifts).
3.3 Positive Speed-Up Instances
In the previous section, we established that the BDM strategy yields a negative speedup when targeting randomly generated matrices, which are expected to be of high algorithmic information content or unstructured. However, as stated in section 2.4, that behaviour is within our expectations. In the next section we will show instances of positive speedup, including cases where previously entropy failed to show statistically significant speedup or was outperformed by BDM.
3.3.1 Synthetic Matrices
For the following set of experiments we manually built three matrices that encode the adjacency matrices of three undirected nonrandom graphs with 8 nodes that are intuitively structured: the complete graph, the star graph and a grid. The matrices used are shown in Figure 4.
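Adjacency matrices of this kind can be built programmatically. The sketch below is an illustration under assumptions that the text does not confirm: the diagonal is set to zero (no self-loops), the hub of the star is node 0, and the grid is laid out as 2 rows by 4 columns to reach 8 nodes. The matrices shown in Figure 4 may follow different conventions.

```python
def complete_graph(n):
    """Adjacency matrix of the complete graph on n nodes (zero diagonal assumed)."""
    return [[int(i != j) for j in range(n)] for i in range(n)]


def star_graph(n):
    """Adjacency matrix of a star with node 0 as hub (hub choice is an assumption)."""
    return [[int((i == 0) != (j == 0)) for j in range(n)] for i in range(n)]


def grid_graph(rows, cols):
    """Adjacency matrix of a rows x cols grid with 4-neighbour connectivity."""
    n = rows * cols
    adj = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:  # edge to the right-hand neighbour
                adj[u][u + 1] = adj[u + 1][u] = 1
            if r + 1 < rows:  # edge to the neighbour below
                adj[u][u + cols] = adj[u + cols][u] = 1
    return adj


def edge_count(adj):
    """Number of undirected edges in a symmetric 0/1 adjacency matrix."""
    return sum(map(sum, adj)) // 2


targets = {
    "complete": complete_graph(8),
    "star": star_graph(8),
    "grid": grid_graph(2, 4),  # assumed layout for an 8-node grid
}
```

Under these conventions the three graphs have 28, 7 and 10 edges respectively, and all three matrices are symmetric.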
After evolving the same set of 100 randomly generated matrices for the three stated matrices, we can report that we found varying degrees of positive speedup that correspond to their respective descriptive complexities as approximated by their BDM values. The complete graph, along with the empty graph, is the graph that has the lowest approximated descriptive complexity, with a BDM value of just 24.01. As expected, we get the best speedup quotient in this case. After the complete graph, the star intuitively seems to be one of the least complex graphs we can draw. However, its BDM value (105.434) is higher than that of the grid (83.503). Accordingly, the speedup obtained is lower. The results are shown in Figure 5.
As we can see from Figure 5, a positive speedup quotient was consistently found within 2-bit shifts without replacements. We have one instance of negative speedup with one shift with replacements for the grid, and negative speedup for all but the complete graph with two shifts.
However, it is important to say that almost all the instances of negative speedup are not statistically significant, as we have a very high extinction rate of over 90%, and the difference between the averages is lower than two standard errors of the mean. The one exception is the grid at 1bit shift, which had 45 extinctions for the BDM strategy. The complete tables are presented in the appendix.
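The criterion of comparing a difference of averages against two standard errors can be sketched as follows. The exact test is not spelled out in the text, so this illustration makes an assumption: the two standard errors are combined in quadrature to obtain the standard error of the difference. The example values are the 2-shift Uniform row and the last 2-shift row of Table 1.

```python
import math


def within_two_standard_errors(mean_a, se_a, mean_b, se_b):
    """True if the two means differ by less than twice the combined standard error.

    Assumption: the standard error of the difference is sqrt(se_a^2 + se_b^2).
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) < 2 * combined_se


# Values taken from Table 1 (2-bit shifts): Uniform vs the last entropy variant.
not_significant = within_two_standard_errors(1867.10, 78.94, 1776.25, 81.93)
```

For these two rows the difference of about 91 steps is well inside two combined standard errors, which is consistent with the text's statement that the entropy strategies show no significant difference from uniform on random matrices.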
3.3.2 Mutation Memory
The cause of the extinctions found in the grid is what we will call maladaptive persistent structures (definition 2), as they occur at a significantly higher rate under the BDM distribution. Also, as the results suggest, a strategy to avoid this problem is adding memory to the evolution. In our case, we will not replace matrices already drawn from the set of possible mutations.
We do not believe this change to be contradictory to the stated goals, since another way to see this behaviour is that the Universal Distribution dooms (with very high probability) populations with certain mutations to extinction, and evolution must find strategies to eliminate these mutations fast from the population. This argument also implies that extinction is faster under the Universal Distribution than regular random evolution when a persistent maladaptive mutation is present, which can be seen as a form of mutation memory. This requirement has the potential to explain evolutionary phenomena such as the Cambrian explosion, as well as mass extinctions: once a positively structured mutation is developed, further algorithmic mutations will keep it (with a high probability), and the same applies to negatively structured mutations. This can also explain the recurring structures found in the natural world. Degradation of a structure is still possible, but will be relatively slow. In other words, evolution will remember positive and negative mutations (up to a point) when they are structured.
From now on, we will assume that our system has memory and that mutations are not replaced when drawn from the distribution.
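The memory mechanism described above amounts to weighted sampling without replacement. The following is a minimal Python sketch of that idea, not the code used in the experiments: the function name `draw_without_replacement`, the candidate labels and the probabilities are hypothetical, and the weights stand in for whatever distribution (uniform or BDM-derived) drives the mutation.

```python
import random

def draw_without_replacement(candidates, weights, rng):
    """Yield each candidate at most once, in weighted random order.

    Every draw is proportional to the weights of the candidates that are
    still in the pool, so a mutation that was already tried is never
    proposed again. Weights are assumed to be strictly positive.
    """
    pool = list(candidates)
    remaining = list(weights)
    while pool:
        # One weighted draw over the candidates that are left.
        idx = rng.choices(range(len(pool)), weights=remaining, k=1)[0]
        yield pool.pop(idx)
        remaining.pop(idx)

rng = random.Random(0)
mutations = ["m1", "m2", "m3", "m4"]   # hypothetical candidate mutations
probabilities = [0.7, 0.1, 0.1, 0.1]   # hypothetical mutation probabilities
order = list(draw_without_replacement(mutations, probabilities, rng))
```

Each candidate appears exactly once in `order`, so a maladaptive candidate with high probability can delay the evolution by at most one draw instead of being proposed repeatedly.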
3.3.3 The Speed-Up Distribution
Having explored various cases, and found several conditions where negative and positive speedup are present, the aim of the following experiment was to offer a broader view of the distribution of speedup instances as functions of their algorithmic complexity.
For the $8 \times 8$ case, we generated 28 matrices by starting with the undirected complete graph with 8 nodes, represented by its adjacency matrix, and then removing one edge at a time until the empty graph (the diagonal matrix) was left, obtaining our first target matrix set. It is important to note that the resultant matrices are always symmetrical. The process was repeated for $16 \times 16$ matrices, obtaining a total of 120 target matrices.
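The construction of the target sets can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the text does not say in which order the edges were removed, so lexicographic order is assumed here, and the list is assumed to contain the matrices obtained after each removal (ending with the empty graph). The function name `target_matrices` is hypothetical.

```python
from itertools import combinations

def target_matrices(n):
    """Adjacency matrices obtained by deleting one edge at a time
    from the complete undirected graph on n nodes."""
    matrix = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
    targets = []
    # Assumption: edges are removed in lexicographic order of their endpoints.
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = 0   # remove both entries to stay symmetric
        targets.append([row[:] for row in matrix])
    return targets

small = target_matrices(8)    # 28 matrices, one per edge of the 8-node complete graph
large = target_matrices(16)   # 120 matrices, one per edge of the 16-node complete graph
```

The counts follow from the number of edges in a complete graph, n(n-1)/2, which gives 28 for 8 nodes and 120 for 16 nodes.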
For each target matrix in the first target matrix set, we generated 50 random initial matrices and evolved the population until convergence was reached using the two stated strategies: uniform and BDM, both without replacements. We saved the number of steps it took for each of the 2800 evolutions to reach convergence and computed the average speedup quotient for each target matrix. The stated process was repeated for the second target matrix set, but generating 20 random matrices for each of the 120 target matrices to conserve computational resources. The experiment was repeated for shifts of 1 and 2 bits, and the extinction thresholds used were 2500 for $8 \times 8$ and 10 000 for $16 \times 16$ matrices.
As the results in Figure 6 show, the average number of steps required to reach convergence is lower when using the BDM distribution for matrices with low algorithmic complexity, and this difference drops along with the complexity of the matrices but never crosses the extinction threshold. This suggests that symmetry over the diagonal is enough to guarantee a degree of structure that can be captured by BDM. It is important to report that we found no extinction case for the $8 \times 8$ matrices, 13 in the $16 \times 16$ matrices with 1-bit shifts, all for the BDM distribution, and 1794 with 2-bit shifts, mostly for the uniform distribution.
This last experiment was computationally very expensive. Computing the data required for the $16 \times 16$, 2-bit shifts sequence took 12 days, 6 hours and 22 minutes on a single core of an i5-4570 PC with 8 GB of RAM. Repeating this experiment for 3-bit shifts is unfeasible with our current setup, as it would take roughly two months shy of 3 years.
Now, by combining the data obtained for the previous sequence and the random matrices used in section 3.1.3, we can approximate the positive speedup distribution. Given the nature of the data, this approximation (Figure 7) is given as two curves, each representing the expected evolution time from a random initial matrix as a function of the algorithmic information complexity of the target matrix, for the uniform and BDM strategies respectively. The positive speedup instances are those where the BDM curve is below the uniform curve.
The first result we get from Figure 7 is a confirmation of an expected one: unlike the uniform strategy, the BDM strategy is highly sensitive to the algorithmic information content of the target matrix. In other words, it makes no difference for a uniform probability mutation space whether the solution is structured or not, while an algorithmic probability driven mutation will naturally converge faster to structured solutions.
The results obtained expand upon the theoretical development presented in section 3.2.1. As the set of possible mutations grows, so do the instances of persistent structures and the slowdown itself. This behaviour is evident given that, when we increase the dimension of the matrices, we obtain a wider gap between the intersection point of the two curves and the expected BDM value, which corresponds to the expected algorithmic complexity of randomly generated matrices. However, we also increase the number of structured matrices, ultimately producing a richer and more interesting evolution space.
3.4 Chasing Biological and Synthetic Dynamic Networks
3.4.1 A Biological Case
We now set as target the adjacency matrix of a biological network corresponding to the topology of an ERBB signalling network [29]. The network is involved in responses ranging from cell division and death to motility and adhesion and, when dysregulated, it has been found to be strongly related to cancer [30, 31].
As one of our main hypotheses is that algorithmic probability is a better model for explaining biological diversity, it is important to explore whether naturally occurring structures are more likely to be produced under the BDM strategy than the uniform strategy, which is equivalent to showing them evolving faster.
The binary target matrix is shown in Figure 8 and has a BDM of 349.91 bits. For the first experiment, we generated 50 random matrices that were evolved using 1-bit shift mutations for the uniform and BDM distributions, without repetitions. The BDM of the matrix lies to the right of the intersection point inferred by the cubic models shown in Figure 7; we therefore predict a slowdown. The results obtained are shown in Table 3.
Strategy  Shifts  Average  SE     Extinctions
--------  ------  -------  -----  -----------
Uniform   1       1222.62  23.22  0
BDM       1       1721.86  56.88  0
As the results show, we obtained a speedup quotient of 0.71 (that is, a slowdown), without extinctions. As mentioned above, the BDM of the target matrix is relatively high, so this result is consistent with our previous experiments. However, the strategy can be improved.
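As a worked check of the figure quoted above, the sketch below computes the quotient from the averages reported in Table 3. It assumes that the speedup quotient is the uniform average divided by the BDM average, which is what the reported value of 0.71 implies. The significance helper is a hypothetical reading of the "two standard errors of the mean" criterion mentioned earlier, comparing the gap between averages against twice the larger standard error.

```python
def speedup_quotient(uniform_avg, strategy_avg):
    """Ratio of average steps to convergence: above 1 is a speedup, below 1 a slowdown."""
    return uniform_avg / strategy_avg

def exceeds_two_se(avg_a, se_a, avg_b, se_b):
    """Assumed criterion: the gap between averages exceeds twice the larger SE."""
    return abs(avg_a - avg_b) > 2 * max(se_a, se_b)

# Values taken from Table 3 (1-bit shifts, no extinctions).
quotient = speedup_quotient(1222.62, 1721.86)
significant = exceeds_two_se(1222.62, 23.22, 1721.86, 56.88)
print(round(quotient, 2), significant)   # prints: 0.71 True
```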
3.4.2 Evolutionary Networks
An evolutionary network is a tensor of dimension 4 whose nodes are themselves networks. A directed edge is drawn from one network to another if the former evolves into the latter, with a weight corresponding to the number of times that this evolution has occurred. Fig. 9 shows a subnetwork of the full network for each evolutionary strategy from 50 (pseudo)randomly generated networks with the biological ERBB signalling network as target. Mutations and overexpression of ERBB receptors (ERBB2 and ERBB3 in this network) have been strongly associated with more than 10 types of tissue-specific cancers, and they can be seen at the highest level regulating most of the acyclic network.
We call forward mutations those mutations that led to the target network, and backward mutations those that move away from the target network through the same evolutionary paths induced by forward mutations. The forward mutations in the neighbourhood of the target (the evolved ERBB network) for each strategy are as follows. For the uniform distribution, each of the following network forward mutations (regulating links) had equal probability (1/5, assuming independence even if unlikely): ESR → ERBB2, ERBB2 → CDK4, CDK2 → AKT, CCND → EGFR, CCND → CDKN1A, as shown in Fig. 9.
For the BDM strategy, the top 5 most likely forward mutations in the immediate neighbourhood of the target, sorted by their probability of occurrence, are: CCND → CDK4, 0.176471; ESR → CCND, 0.137255; CDK6 → RB, 0.137255; CDKN1B → CDK2, 0.0784314; IGFR.
One of the mutations by BDM involves the breaking of the only network cycle of size 6, EGFR → ERBB3, ERBB3 → IGFR, IGFR → ESR, ESR → MYC, MYC → EGFR, by deletion of the interaction MYC → EGFR, with probability 0.05 among the possible mutations in the BDM immediate neighbourhood of the target. ERBB3, which is involved in this cycle, has been found to be related to many types of cancer when overexpressed [30].
For the local BDM strategy, the following were the top 5 forward mutations: EGFR → ERBB2, 0.32; EGFR → ERBB3, 0.107; IGFR → CCNE, 0.0714; ERBB3 → ERBB2, 0.0714; EGFR → ESR, 0.0714; with ERBB2 and ERBB3 heavily involved in 3 of the top 5 possible mutations with added probability 0.49 and thus more likely than any other pair and interaction of proteins in the network.
Under the hypothesis that mutations can be reversals to states in past evolutionary pathways, mutations to such interactions may be the most likely backward mutations to occur.
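One way to picture the evolutionary network defined in this section is as a weighted directed graph built from recorded evolution paths. The Python sketch below is only an illustration: the function names, the toy paths and the reading of forward-mutation probabilities as relative frequencies of the edges entering the target are all assumptions, not the procedure used to produce Fig. 9.

```python
from collections import Counter

def evolutionary_network(paths):
    """Count how often one network evolves directly into another.

    `paths` is a list of evolution paths, each a list of hashable network
    identifiers ordered from the initial network to the last one reached.
    """
    weights = Counter()
    for path in paths:
        for source, destination in zip(path, path[1:]):
            weights[(source, destination)] += 1
    return weights

def forward_probabilities(weights, target):
    """Relative frequency of each edge that leads directly into `target`."""
    incoming = {src: w for (src, dst), w in weights.items() if dst == target}
    total = sum(incoming.values())
    if total == 0:
        return {}
    return {src: w / total for src, w in incoming.items()}

# Hypothetical toy paths; "T" stands for the target network.
paths = [["a", "b", "T"], ["c", "b", "T"], ["d", "e", "T"],
         ["a", "e", "T"], ["f", "g", "T"]]
net = evolutionary_network(paths)
probs = forward_probabilities(net, "T")
```

In this toy example the networks labelled `b` and `e` each account for two of the five arrivals at the target, and `g` for one.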
3.4.3 The Case for Localized Mutations and Modularity
As mentioned in Proposition 3, the main causes of slowdown under the BDM distribution are maladaptive persistent structures. These structures will negatively impact the evolution speed in factorial order relative to the size of the state space. One direct way to reduce the size of the set of possible mutations is to reduce the size of the matrices we are evolving. However, doing so will also reduce the number of interesting objects we can evolve towards. Another way to accomplish the objective while using the same heuristic is to rely on localized (or modular) mutations. That is, we force the mutation to take place on a submatrix of the input matrix.
We implement this change by adding a single step to our evolution dynamics: at each iteration, we randomly draw, with uniform probability, one submatrix of fixed size out of the set of adjacent, non-overlapping submatrices that compose the input matrix, and force the mutation to occur there by computing the probability distribution over all the matrices that contain the bit shift only at the chosen place. We call this method the local BDM method.
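The extra step can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the block size of 4 is hypothetical (the size is not recoverable from the text), `toy_complexity` is a crude compression-based stand-in for BDM, and the weighting rule (probability proportional to 2 raised to minus the complexity of the candidate) is an assumption made for the example.

```python
import random
import zlib

def toy_complexity(matrix):
    """Stand-in for BDM (assumption): compressed length of the flattened matrix, in bits."""
    flat = "".join(str(bit) for row in matrix for bit in row)
    return 8 * len(zlib.compress(flat.encode("ascii")))

def local_candidates(matrix, block, rng):
    """Pick one non-overlapping block uniformly and list every 1-bit shift inside it."""
    n = len(matrix)
    corners = [(r, c) for r in range(0, n, block) for c in range(0, n, block)]
    r0, c0 = rng.choice(corners)
    candidates = []
    for r in range(r0, min(r0 + block, n)):
        for c in range(c0, min(c0 + block, n)):
            mutated = [row[:] for row in matrix]
            mutated[r][c] ^= 1            # flip exactly one bit in the chosen block
            candidates.append(mutated)
    return (r0, c0), candidates

def mutation_weights(candidates, complexity):
    """Assumed weighting: simpler candidates are exponentially more likely."""
    values = [complexity(m) for m in candidates]
    lowest = min(values)
    # Subtracting the minimum avoids floating-point underflow and cancels on normalization.
    raw = [2.0 ** -(v - lowest) for v in values]
    total = sum(raw)
    return [w / total for w in raw]

rng = random.Random(1)
matrix = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
corner, candidates = local_candidates(matrix, 4, rng)
weights = mutation_weights(candidates, toy_complexity)
```

Because the block is chosen uniformly and every position belongs to exactly one block, the set of reachable 1-bit shifts is the same as in the global strategies; only the distribution over them changes.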
It is important to note that, for 1-bit shifts (point mutations), the space of total possible mutations remains the same when compared to the uniform and BDM strategies. Furthermore, the behaviour of the uniform strategy would remain unchanged if the extra step were applied using the uniform distribution.
We repeated the experiment shown in Table 3 with the addition of the local BDM strategy and the same 50 random initial matrices. The results are shown in Table 4. As the results show, local BDM obtains a statistically significant speedup of 1.25 when compared to the uniform strategy.
Strategy   Shifts  Average  SE     Extinctions
---------  ------  -------  -----  -----------
Uniform    1       1222.62  23.22  0
BDM        1       1721.86  56.88  0
Local BDM  1       979      25.94  0
One potential explanation of why we failed to obtain a speedup for the network with the BDM strategy is that, as an approximation to algorithmic complexity, the model depends on finding global algorithmic structures, while the sample is based on a substructure which might not have enough information about the underlying structures that we hypothesize govern the natural world and allow scientific models and predictions.
However, biology evolves modular systems [32], such as genes and cells, that in turn build building blocks such as proteins and tissues. Therefore, local algorithmic mutation is a better model. This is a good place to recall that local BDM was devised as a natural solution to the problem presented by maladaptive persistent structures in global algorithmic mutation, which also means that this type of modularity can be evolved by itself given that it provides an evolutionary advantage, as our results demonstrate. This is compatible with the biological phenomenon of non-point mutations, in contrast to point mutations, which affect only a single nucleotide. For example, in microsatellites, mutations may lead to the gain or loss of the entire repeated unit, and sometimes several repeats simultaneously.
We will further explore the relationship between BDM and local BDM within the context of global structures in the next section. Our current setup is not optimal for further experimentation in biological and local structured matrices, as the computational resources required to build the probability distribution for each instance grow in quadratic order relative to matrix size, though these computational resources are not needed in the real world (cf. Conclusions).
3.4.4 Chasing Synthetic Evolving Networks
The aim of the next set of experiments was to follow, or chase, the evolution of a moving target using our evolutionary strategies. In this case, we chased 4 different dynamical networks: the ZK graphs [21], k-ary trees, an evolving star graph and a star-to-path graph dynamic transition artificially created for this project (see Appendix for code). These dynamical networks are families of directed labelled graphs that evolve over time using a deterministic algorithm, some of which display interesting graph-theoretic and entropy-fooling properties [21]. As the evolution dynamics of these graphs are fully deterministic, we expected BDM to be (statistically) significantly faster than the other two evolutionary strategies, uniform probability and local BDM.
We chased these dynamics in the following way. Consider the successive stages of the system we are chasing. The initial state was represented by a random matrix and, for each subsequent evolution, the input was defined as the adjacency matrix corresponding to the current stage, while the target was set as the adjacency matrix of the next stage. In order to normalize the matrix size, we defined the networks as always containing the same number of nodes (16 for $16 \times 16$ matrices). We followed each dynamic until the corresponding stage could not be defined in 16 nodes. The results obtained, starting from 100 random graphs and 100 different evolution paths at each stage, are shown in Figure 10. It is important to note that, since the graphs were directed, the matrices used were nonsymmetrical.
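The chasing procedure can be sketched as a loop over successive targets. The Python below is a toy illustration under stated assumptions: `evolve_uniform` is a hypothetical stand-in for a single evolution (uniform 1-bit shifts that are kept only when they match the target), not the fitness dynamics used in the experiments, and the 4-node growing star is an invented moving target chosen to keep the example small.

```python
import random

def evolve_uniform(start, target, rng, max_steps=10_000):
    """Toy evolution: propose uniform 1-bit shifts and keep those that agree
    with the target. Returns the number of proposals used, or None if the
    extinction threshold `max_steps` is reached first."""
    current = [row[:] for row in start]
    n = len(current)
    for step in range(max_steps + 1):
        if current == target:
            return step
        r, c = rng.randrange(n), rng.randrange(n)
        if current[r][c] != target[r][c]:
            current[r][c] ^= 1
    return None

def chase(stages, evolve, rng):
    """Evolve from a random matrix to the first stage, then from each stage to the next."""
    n = len(stages[0])
    current = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    steps = []
    for target in stages:
        steps.append(evolve(current, target, rng))
        current = target                  # the reached stage becomes the next input
    return steps

rng = random.Random(0)
# Hypothetical moving target: a directed star on 4 nodes gaining one edge per stage.
stages = []
adjacency = [[0] * 4 for _ in range(4)]
for leaf in range(1, 4):
    adjacency[0][leaf] = 1
    stages.append([row[:] for row in adjacency])
steps = chase(stages, evolve_uniform, rng)
```

Swapping `evolve_uniform` for a function that draws mutations from a complexity-weighted distribution would give the corresponding sketch for the BDM and local BDM strategies.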
From the results we can see that local BDM consistently outperformed the uniform probability evolution, but the BDM strategy was the fastest by a significant margin. The results are as expected and confirm our hypothesis: uniform evolution cannot detect any underlying algorithmic cause of evolution, while BDM can, inducing a faster overall evolution. Local BDM can only detect local regularities, which is good enough to outrun uniform evolution in these cases. However, as the algorithmic regularities are global, local BDM is slower than (global) BDM.
4 Discussion and Conclusions
The results of our numerical experiments are statistically significant and, as shown in Figures 6 and 7, the speedup quotient increases in relation to the ratio between the algorithmic complexity of the target matrix and that of the expected random matrix, confirming our theoretical expectations. The obtained speedup can be considered low when the stated quotient is sufficiently close to 1, but on a rich evolution space we expect this difference to be significant: as a rough estimate, the human genome can potentially store 700 megabytes of data, while the biggest matrix used in our experiments represents a space limited to objects of $16 \times 16 = 256$ bits. We therefore expect the effects of speedup (and slowdown) to be significantly higher in natural evolution than in these experiments.
On the one hand, classical mechanics establishes that random events are only apparent and not fundamental. This means that mutations are not truly random but the result of interacting deterministic systems that may distribute differently than random. A distribution representing causal determinism is that suggested by algorithmic probability and the Universal Distribution because of its theoretical stability under changes of formalism and description language [9, 10]. Its relevance, even for non-Turing-universal models of computation, has also been shown [18], where it is able to explain more than 50% of a bias towards simplicity.
On the other hand, the mathematical mechanisms of biological information, from Mendelian inheritance to Darwin’s evolution and the discovery of the digital nature of the genetic code, together with the mechanistic nature of translation, transcription and other intercellular processes, suggest a strong algorithmic basis underlying fundamental biological processes. Taken to their logical consequence, these ideas indicate that evolution by natural selection may not be (very) different from, and can thus be regarded and studied as, evolving programs in software space, as suggested by Chaitin [6, 7, 3].
Our findings demonstrate that computation can thus be a powerful driver of evolution that can better explain key aspects of life. Effectively, algorithmic probability reduces the space of possible mutations. By abandoning the uniform distribution assumption, questions ranging from the appearance of sudden major stages of evolution and the emergence of ‘subroutines’ in the form of modular persistent structures to the need for an evolving memory carrying information organized in such modules that drive evolution by selection may be explained.
The algorithmic distribution emerges naturally from the interaction of deterministic systems [18, 10]. In other words, we are simulating the conditions of an algorithmic/procedural world and there is no reason to believe that it requires greater realworld (thus highly parallel) computation than is required by the assumption of the uniform distribution given the highly parallel computing nature of physical laws. The Universal Distribution can thus be considered as natural, or in some way, even more natural, than the uniform distribution.
The interplay between the evolvability of organisms and the persistence of such structures also explains two opposed phenomena: recurrent explosions of diversity and mass extinctions, phenomena which have occurred during the history of life on Earth and which have not been satisfactorily explained under the uniform mutation assumption. The results suggest that extinction may be an intrinsic mechanism of biological evolution.
In summary, taking the informational and computational aspects of life based on modern synthesis to the ultimate and natural consequences, the present approach based on weak assumptions of deterministic dynamic systems offers a novel framework of algorithmic evolution within which to study both biological and artificial evolution.
Methods
Approximations to algorithmic complexity
The algorithmic complexity $K(s)$ of a string $s$ (also known as Kolmogorov-Chaitin complexity [16, 17]) is defined as the length of the smallest program that produces $s$ as an output and halts. This measure of complexity is invariant, up to a constant value, with respect to the choice of reference universal Turing machine. Finding the exact value of $K(s)$ for any $s$ is an upper semi-computable problem. This means that there is no general effective method to find $K(s)$ for any given string, but upper bounds can be estimated.
Among the computable methods used to set an upper bound are the Coding Theorem Method (CTM) [25, 26, 33] and the Block Decomposition Method (BDM) [27, 20]. The CTM relies upon approximating the algorithmic probability of an object by running every possible machine in a large set of small Turing machines, generating an empirical probability distribution for the produced strings by counting the number of small Turing machines that produce each string and halt. The algorithm can only be decided for a small number of Turing machines, and for those that can be decided it runs in exponential time; therefore only approximations of algorithmic complexity for small strings are feasible. However, this computation only needs to be done once to populate a lookup table that allows its application in linear (constant in exchange for memory) time.
BDM is an extension of CTM defined as
\[ BDM(s) = \sum_{i} \left( CTM(s_i) + \log_2 n_i \right), \]
where each $s_i$ corresponds to a substring of $s$ for which its CTM value is known and $n_i$ is the number of times the string $s_i$ appears in $s$. A thorough discussion of BDM is found in [20].
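To make the definition concrete, the following Python sketch evaluates BDM in its usual form, the sum over distinct blocks of the CTM value of the block plus the base-2 logarithm of its multiplicity. The CTM values in `toy_ctm` are hypothetical placeholders (real values come from the precomputed lookup table of [25, 26]), and the partition into non-overlapping blocks that drops a shorter trailing remainder is one simple choice among those discussed in [20].

```python
from collections import Counter
from math import log2

def bdm(string, ctm, block):
    """Sum, over distinct blocks s_i, of CTM(s_i) + log2(n_i)."""
    # Assumption: non-overlapping blocks; a shorter trailing remainder is dropped.
    pieces = [string[i:i + block]
              for i in range(0, len(string) - block + 1, block)]
    counts = Counter(pieces)
    return sum(ctm[piece] + log2(n) for piece, n in counts.items())

# Hypothetical CTM values for 2-bit blocks, for illustration only.
toy_ctm = {"00": 2.0, "11": 2.0, "01": 3.0, "10": 3.0}
value = bdm("0000000001", toy_ctm, 2)
print(value)   # prints: 7.0
```

Here the string splits into four copies of `00` and one copy of `01`, giving (2 + log2 4) + (3 + log2 1) = 7 bits: repeated blocks add only a logarithmic cost, which is how BDM rewards structure.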
Recursively generated graphs
To test the speed of algorithmic evolution on recursive dynamic networks, we generated 3 other graphs of low algorithmic complexity in addition to the ZK graph defined in [21], which is also of low algorithmic complexity. We needed graphs that evolved over time in a low algorithmic complexity fashion from and to low algorithmic complexity graphs. The 3 graphs were canonically labelled using the positive natural numbers by maximizing the number of nodes with consecutive numbers; then a rule was applied from the lowest to the highest number until the transformation was complete.
The Wolfram Language code used to generate these recursively (hence of low algorithmic complexity/randomness) evolving graphs is the following. The ZK graph [21] was recursively generated by:
AddEdges[graph_] := EdgeAdd[graph,
  Rule @@@ Distribute[{Max[VertexDegree[graph]] + 1,
    Table[i, {i, (Max[VertexDegree[graph]] + 2),
      (Max[VertexDegree[graph]] + 1) + (Max[VertexDegree[graph]] + 1) -
        VertexDegree[graph, Max[VertexDegree[graph]] + 1]}]}, List]]
EdgeList /@ NestList[AddEdges, Graph[{1 -> 2}], n]
where $n$ is the number of iterations. For the growing star graph with overlapping nodes:
Graph[Rule@@@ Flatten[Table[(List@@@EdgeList[ StarGraph[n]]) + 2 n, {n, 3, i, 1}], 1]]
The star-to-path graph was encoded by:
EdgeList /@ FoldList[EdgeDelete[EdgeAdd[#1, #2[[1]]], #2[[2]]] &,
  Graph[EdgeList@StarGraph[n], VertexLabels -> "Name"],
  Thread[{Most@Flatten[{EdgeList@CycleGraph[16], 1 <-> 16}],
    Flatten[{EdgeList@StarGraph[n], 1 <-> 16}]}]]
where $n$ is the size of the star graph in the 2 previous evolving graphs.
The k-ary tree evolving graph was generated with the Wolfram Language built-in function KaryTree[n], where $n$ is the size of the k-ary tree.
Appendix
Details of the theoretical and numerical application of BDM to matrices and graphs are provided in [34, 27]. Tables 5, 6 and 7 contain full statistical information for the speedup obtained for the simple graphs ‘complete’, ‘star’ and ‘grid’.
Strategy  Shifts  Average  SE      Extinctions  Replacements
--------  ------  -------  ------  -----------  ------------
Uniform   1       216.14   3.70    0            No
BDM       1       75.82    1.67    0            No
Uniform   2       1828.29  73.76   0            No
BDM       2       68.40    2.38    0            No
Uniform   3       1996.15  236.98  87           No
BDM       3       47.39    2.02    0            No
Uniform   1       292.94   7.47    0            Yes
BDM       1       78.66    2.23    14           Yes
Uniform   2       1808.77  99.79   22           Yes
BDM       2       65.41    2.36    20           Yes
Uniform   3       2070.83  354.82  94           Yes
BDM       3       49.63    1.91    25           Yes
Strategy  Shifts  Average  SE      Extinctions  Replacements
--------  ------  -------  ------  -----------  ------------
Uniform   1       215.67   3.48    0            No
BDM       1       162.66   2.86    0            No
Uniform   2       1798.93  80.39   0            No
BDM       2       819.89   29.79   0            No
Uniform   3       1996.15  236.98  93           No
BDM       3       2763.40  583.03  95           No
Uniform   1       304.24   8.48    0            Yes
BDM       1       639.33   61.17   45           Yes
Uniform   2       2055.99  102.86  21           Yes
BDM       2       1372.00  201.63  84           Yes
Uniform   3       2469.38  207.54  92           Yes
BDM       3       NaN      NaN     100          Yes
Strategy  Shifts  Average  SE      Extinctions  Replacements
--------  ------  -------  ------  -----------  ------------
Uniform   1       217.30   2.22    0            No
BDM       1       172.63   2.23    0            No
Uniform   2       1811.84  85.41   0            No
BDM       2       1026.76  45.78   0            No
Uniform   3       1942.89  262.68  91           No
BDM       3       2577.27  392.37  89           No
Uniform   1       294.27   7.43    0            Yes
BDM       1       268.87   12.68   7            Yes
Uniform   2       1952.54  111.16  32           Yes
BDM       2       1099.40  74.36   27           Yes
Uniform   3       1953.33  440.85  94           Yes
BDM       3       2563.00  753.07  98           Yes
Acknowledgment
SHO wants to thank Francisco Hernández-Quiroz for his continuous support.
Data accessibility
The results can be reproduced using the Online Algorithmic Complexity Calculator at http://www.complexitycalculator.com/.
Authors’ contributions
HZ and SHO conceived the project. HZ and NAK provided guidance, data and proposed experiments. SHO, HZ and NAK analyzed the data. SHO and HZ wrote code. SHO and HZ wrote the paper. All authors gave final approval for publication. HZ is the corresponding author.
Competing interests
We have no competing interests.
Funding
SHO acknowledges the financial support of the Mexican Science and Technology Council (CONACYT), the Posgrado en Ciencia e Ingeniería de la Computación, UNAM, and the research grant 221341SEPCONACYT. HZ acknowledges the support of the Swedish Research Council (Vetenskapsrådet) grant No. 201505299 “Reglering och Entropisk Styrning av Biologiska Nätverk för Applicering på Immunologi och Cancer”.
References
 [1] Hartl DL, Clark AG. Principles of population genetics. vol. 116. Sinauer associates Sunderland; 1997.
 [2] Chaitin GJ. Proving Darwin: Making Biology Mathematical. Vintage; 2013.
 [3] Wolfram S. A New Kind of Science. Wolfram Media Inc.; 2002.
 [4] Zenil H, Gershenson C, Marshall JAR, Rosenblueth DA. Life as Thermodynamic Evidence of Algorithmic Structure in Natural Environments. Entropy. 2012;14(11).
 [5] Zenil H, Marshall JAR. Some Aspects of Computation Essential to Evolution and Life. Ubiquity. 2013;April 2013:1–16.
 [6] Chaitin GJ. Evolution of Mutating Software. Bulletin of the EATCS. 2009;97:157–164.
 [7] Chaitin GJ. Life as Evolving Software. In: Zenil H, editor. A Computable Universe: Understanding and Exploring Nature as Computation. World Scientific Publishing Company; 2012. p. 277–302.
 [8] Turing AM. On Computable Numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society. 1936;42:230–265.
 [9] Solomonoff RJ. The Universal Distribution and Machine Learning. The Computer Journal. 2003;46.
 [10] Kirchherr W, Li M, Vitányi P. The miraculous universal distribution. The Mathematical Intelligencer. 1997;19(4):7–15.
 [11] Hernández-Orozco S, Hernández-Quiroz F, Zenil H. The Limits of Decidable States on Open-Ended Evolution and Emergence. Artificial Life. 2018;24(1):56–70.
 [12] Bedau MA. Four Puzzles About Life. Artificial Life. 1998;4.
 [13] Solomonoff RJ. A formal theory of inductive inference. Part I. Information and control. 1964;7(1):1–22.
 [14] Levin LA. Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii. 1974;10(3):30–35.
 [15] Chaitin GJ. On the length of programs for computing finite binary sequences. Journal of the ACM (JACM). 1966;13(4):547–569.
 [16] Kolmogorov A. Three Approaches to the Quantitative Definition of Information. Problems Inform Transmission. 1965;1:1–7.
 [17] Chaitin GJ. Information-Theoretic Limitations of Formal Systems. Journal of the ACM. 1974 Jul;21(3):403–424.
 [18] Zenil H, Badillo L, Hernández-Orozco S, Hernández-Quiroz F. Coding-theorem like behaviour and emergence of the universal distribution from resource-bounded algorithmic probability. International Journal of Parallel, Emergent and Distributed Systems. 2018;0(0):1–20. Available from: https://doi.org/10.1080/17445760.2018.1448932.
 [19] Calude CS, Dinneen MJ, Shu CK, et al. Computing a Glimpse of Randomness. Experimental Mathematics. 2002;11(3):361–370.
 [20] Zenil H, Soler-Toscano F, Kiani NA, Hernández-Orozco S, Rueda-Toicen A. A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity. arXiv preprint arXiv:1609.00110. 2016.
 [21] Zenil H, Kiani NA, Tegnér J. Low-algorithmic-complexity entropy-deceiving graphs. Physical Review E. 2017;96(1):012308.
 [22] Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948 Jul / Oct;27:379–423, 623–656.
 [23] Schmitt AO, Herzel H. Estimating the Entropy of DNA Sequences. Journal of Theoretical Biology. 1997;188(3):369–377. Available from: http://www.sciencedirect.com/science/article/pii/S0022519397904938.
 [24] Schaffer JD, Eshelman LJ. On Crossover as an Evolutionary Viable Strategy. In: Belew RK, Booker LB, editors. Proceedings of the 4th International Conference on Genetic Algorithms. Morgan Kaufmann; 1991. p. 61–68.
 [25] Soler-Toscano F, Zenil H, Delahaye JP, Gauvrit N. Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE. 2014;9(5):e96223.
 [26] Delahaye JP, Zenil H. Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness. Applied Mathematics and Computation. 2012;219:63–77.
 [27] Zenil H, Soler-Toscano F, Dingle K, Louis AA. Correlation of automorphism group size and topological properties with program-size complexity evaluations of graphs and complex networks. Physica A: Statistical Mechanics and its Applications. 2014 Jun;404:341–358.
 [28] Zenil H, Kiani NA, Tegnér J. Methods of Information Theory and Algorithmic Complexity for Network Biology. Seminars in Cell and Developmental Biology. 2016;51:32–43.
 [29] Kiani NA, Kaderali L. Dynamic probabilistic threshold networks to infer signaling pathways from time-course perturbation data. BMC Bioinformatics. 2014;15(1):250.
 [30] Yarden Y, Sliwkowski M. Untangling the ErbB signalling network. Nat Rev Mol Cell Biol. 2001;2(2):127–37.
 [31] Olayioye MA, Neve RM, Lane HA, Hynes NE. The ErbB signaling network: receptor heterodimerization in development and cancer. EMBO J. 2000;19(3):3159–67.
 [32] Mitra K, Carvunis AR, Ramesh SK, Ideker T. Integrative approaches for finding modular structure in biological networks. Nature reviews Genetics. 2013;14(10):719.
 [33] Soler-Toscano F, Zenil H. A Computable Measure of Algorithmic Probability by Finite Approximations with an Application to Integer Sequences. Complexity (Accepted). 2017.
 [34] Zenil H, Soler-Toscano F, Delahaye JP, Gauvrit N. Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility. PeerJ Computer Science. 2015;1:e23.