1. Introduction
Since the birth of artificial intelligence, the ultimate aim of the field has been to replicate intelligent behavior akin to that found in humans and animals. To this end, a number of tools and techniques have been developed. Among these, reinforcement learning (Sutton and Barto, 1998), deep learning (LeCun et al., 2015), and evolutionary computation (Koza, 1994; Holland, 1992) are the most powerful and popular to date. Although these methods are extremely useful, they require some form of explicit task specification. As a result, they have a limited degree of autonomy and fail to generalize well (Narodytska and Kasiviswanathan, 2016; Taylor and Stone, 2009). In addition, these approaches fail to capture the mechanisms through which intelligent beings came into existence in natural systems. Even approaches inspired by biological evolution require the specification of a fitness function, and their aim is solely to solve a specific optimization problem. Even when the aim is to simulate artificial life (Sims, 1994; Nolfi et al., 2016), these approaches do not capture the trends towards increasing diversity and complexity seen in biological evolution. We hypothesize that in order to design truly intelligent agents, the fundamental, natural processes that led to their creation need to be recreated to a certain extent. In this work, we propose an approach based on the zero-force evolutionary law (McShea and Brandon, 2010), which states that an increase in diversity and complexity is a natural outcome of evolutionary systems that possess the properties of heredity and variation. These properties constitute an imperfect self-replication process, which is the key idea behind our approach.

2. Approach and Preliminary Results
Our approach starts with a population of fundamental elements, analogous to a primordial soup (Haldane, 1929)
which contains the fundamental components needed to build more complex entities/agents. In each generation, an agent is allowed to survive and self-replicate only if it satisfies a certain replication rule. This rule is similar to the fitness function used in traditional evolutionary algorithms, in the sense that it determines which agents continue to the next generation. Unlike traditional approaches, however, selection is not fitness-proportional: agents that follow the replication rule are allowed to replicate, and those that do not are removed from the population. Each agent is assigned a limited number of generations, referred to as its generation lifetime, within which it may self-replicate; once the generation lifetime decays to zero, the agent is removed from the population. Self-replication is imperfect: with a fixed, predefined probability a mutation occurs, and otherwise the offspring is identical to the parent. A mutation is either additive or subtractive, with equal probability. Additive mutations append a new, randomly picked element to the offspring, while subtractive mutations remove a randomly picked element from it. This allows the number of elements comprising an agent (its complexity) to grow or shrink across generations. The proposed algorithm is summarized in Algorithm 1.

In order to demonstrate the described approach, we consider the problem of discovering the sequence of all prime numbers up to a given number. The set of fundamental elements is thus the set of integers up to that number. The hyperparameters in Algorithm 1 are set to fixed values. Each agent is initialized to a random integer from the set of fundamental elements, and agents are allowed to self-replicate as described in Algorithm 1. The rule for self-replication in this case is simply that an agent must be a contiguous sequence of prime numbers starting from 2, without repetition. With this replication rule, agents of complexity one (here, the length of the sequence is synonymous with complexity) are discovered first, and they replicate, leading to an exponential growth in population. Subsequently, owing to mutations, more complex agents are discovered and allowed to replicate. This process continues until the sequence of all prime numbers is discovered. In the final population, agents with lower complexities emphatically outnumber those with higher complexities. This is also true in biological ecosystems, perhaps due to the similar manner in which complex species evolve from simpler ones. In practice, since the growth of the population is so rapid, and since the algorithm loops through all agents in the population, the discovery of more complex agents eventually becomes prohibitively slow and computationally intensive. To overcome this limitation, one may periodically eliminate agents with lower levels of complexity and focus the computational effort on more complex agents. With this periodic selective extinction approach, using an ordinary desktop computer, the complete sequence of prime numbers was obtained in the order of a minute. We also applied this approach to the OneMax problem, in which the objective is to maximize the number of 1's in a fixed-length string of numbers.
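Before turning to OneMax, the prime-number experiment can be sketched in Python. This is a minimal sketch of the self-replication loop, the prime-sequence replication rule, and periodic selective extinction described above; the function and parameter names, the list-of-pairs population representation, and the extinction margin are our illustrative assumptions, not taken verbatim from Algorithm 1.

```python
import random

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (the target sequence)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i in range(n + 1) if sieve[i]]

def make_prime_rule(n):
    """Replication rule: the agent must be an initial segment of 2, 3, 5, ..."""
    target = primes_up_to(n)
    return lambda seq: 0 < len(seq) <= len(target) and seq == target[:len(seq)]

def step(agents, rule, elements, p_mut, lifetime):
    """One generation of imperfect self-replication.

    `agents` is a list of (sequence, remaining generation lifetime) pairs.
    """
    next_gen = []
    for seq, life in agents:
        if not rule(seq):
            continue                      # rule violated: agent is removed
        child = list(seq)
        if random.random() < p_mut:       # imperfect copy: mutate
            if child and random.random() < 0.5:
                child.pop(random.randrange(len(child)))   # subtractive
            else:
                child.append(random.choice(elements))     # additive
        next_gen.append((child, lifetime))                # offspring
        if life > 1:
            next_gen.append((seq, life - 1))  # parent survives until lifetime decays
    return next_gen

def extinction(agents, margin=1):
    """Periodic selective extinction: drop agents far below the current
    maximum complexity to keep the population tractable."""
    best = max(len(seq) for seq, _ in agents)
    return [(s, l) for s, l in agents if len(s) >= best - margin]
```

With the mutation probability set to zero, each surviving agent produces one identical offspring per generation while the parent persists until its lifetime decays, which reproduces the exponential population growth noted above.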
To this end, the replication rule of the prime number problem was simply modified to the following: if all of an agent's elements are 1's, allow it to replicate; if not, remove it from the population.

Figures 1(a) and 1(b) show that the described approach leads to increased complexity and diversity with the number of generations. The exponential increase in population (Figure 1(c)) makes the computation intractable in the absence of extinction events; as a result, only a limited complexity could be achieved, as shown in Figures 1(a) and 1(b). However, periodic forced extinction events allow more complex solutions to be discovered at a steady rate, as shown in Figure 1(d). This shows that the proposed approach, apart from being able to evolve agents of increasing complexity, can also be used as a stochastic optimization tool.
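The modified OneMax replication rule is a one-line predicate; representing the agent as a Python list of integers is our assumption for illustration.

```python
def onemax_rule(seq):
    """OneMax replication rule: allow replication only when every
    element of the agent equals 1; otherwise the agent is removed."""
    return len(seq) > 0 and all(x == 1 for x in seq)
```

Under this rule, additive mutations that happen to append a 1 yield longer all-ones agents, so complexity (string length) grows over the generations exactly as in the prime-number experiment.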
In nature, replicative success is determined by specific conditions imposed by the environment itself. Hence, in general, designing appropriate replication rules may not be trivial, just as with fitness functions. However, the less restrictive nature of the replication rule may allow for greater flexibility compared to traditional evolutionary approaches. Although our approach, as described, does not include a learning component, one can be incorporated into the innermost 'for' loop in Algorithm 1. This opens up the possibility of incorporating established learning approaches and leveraging the Baldwin effect (Baldwin, 1896) to guide the evolutionary process. Doing so could constitute a more realistic approach to designing truly autonomous and intelligent agents.
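One way to picture such a learning component, purely as our sketch, is a hypothetical `learn` callback applied to each agent inside the per-agent loop before the replication rule is checked; none of these names come from Algorithm 1.

```python
def step_with_learning(agents, rule, replicate, learn):
    """Sketch of a Baldwin-effect-style inner loop: survival is judged on
    the learned (within-lifetime) phenotype, but only the unlearned
    genotype is inherited, so selection favors agents able to learn
    the replication rule. `replicate` produces the (possibly imperfect)
    offspring; `agents` is a list of (sequence, lifetime) pairs."""
    next_gen = []
    for seq, life in agents:
        adapted = learn(seq)                          # within-lifetime adaptation
        if rule(adapted):                             # judged on learned phenotype
            next_gen.append((replicate(seq), life))   # genotype is inherited
    return next_gen
```

Because learned changes affect survival without being copied into the offspring, repeated generations can gradually assimilate the learned behavior into the genotype, which is the essence of the Baldwin effect mentioned above.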
3. Conclusions
In this work, we introduced an approach that captures the typical trends of increasing complexity and diversity observed in biological evolution. We tested the approach on two simple problems and, in this context, described its important characteristics. We also proposed a simple way to utilize this approach as a tool for solving stochastic optimization problems. We posit that such an approach, especially when combined with existing learning techniques, could enable the design of truly intelligent artificial agents.
References
Baldwin (1896) J. Mark Baldwin. 1896. A new factor in evolution. The American Naturalist 30, 354 (1896), 441–451.
Haldane (1929) J. B. S. Haldane. 1929. The origin of life. Rationalist Annual 148 (1929).
Holland (1992) John Henry Holland. 1992. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press.
Koza (1994) John R. Koza. 1994. Genetic programming as a means for programming computers by natural selection. Statistics and Computing 4, 2 (1994), 87–112.
LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
McShea and Brandon (2010) Daniel W. McShea and Robert N. Brandon. 2010. Biology's First Law: The Tendency for Diversity and Complexity to Increase in Evolutionary Systems. University of Chicago Press.
Narodytska and Kasiviswanathan (2016) Nina Narodytska and Shiva Prasad Kasiviswanathan. 2016. Simple black-box adversarial perturbations for deep networks. arXiv:1612.06299 (2016).
Nolfi et al. (2016) Stefano Nolfi, Josh C. Bongard, Phil Husbands, and Dario Floreano. 2016. Evolutionary Robotics. (2016).
Sims (1994) Karl Sims. 1994. Evolving 3D morphology and behavior by competition. Artificial Life 1, 4 (1994), 353–372.
Sutton and Barto (1998) Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
Taylor and Stone (2009) Matthew E. Taylor and Peter Stone. 2009. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research 10, Jul (2009).