Towards a framework for the evolution of artificial general intelligence

03/25/2019 ∙ by Sidney Pontes-Filho, et al. ∙ OsloMet

In this work, a novel framework for the emergence of general intelligence is proposed, where agents evolve through environmental rewards and learn throughout their lifetime without supervision, i.e., self-supervised learning through embodiment. The chosen control mechanism for agents is a biologically plausible neuron model based on spiking neural networks. Network topologies become more complex through evolution, i.e., the topology is not fixed, while the synaptic weights of the networks cannot be inherited, i.e., newborn brains are not trained and have no innate knowledge of the environment. What is subject to the evolutionary process is the network topology, the type of neurons, and the type of learning. This process ensures that controllers that are passed through the generations have the intrinsic ability to learn and adapt during their lifetime in mutable environments. We envision that the described approach may lead to the emergence of the simplest form of artificial general intelligence.


1 Introduction

The brain is a truly remarkable computing machine that continuously adapts through sensory inputs. Rewards and penalties are encoded and learned throughout the evolution of organisms living in an environment (our world) that continuously provides unlabeled and mutable data. The supervision in the brain is a product of such an evolutionary process. A real-world environment provides neither labeled data (i.e., supervised learning) nor predefined fitness functions (i.e., reinforcement learning) to organisms and their brains. However, organisms know which sensory inputs or input sequences may positively or negatively affect their survival and reproduction. One of the key components which ensures that a species will reproduce is the lifetime of its organisms. Pleasure, joy, and desire (or other positive inputs) may increase the lifetime of an organism and act as rewards. Pain, fear, and disgust may decrease the lifetime and act as penalties. All those feelings and emotions are results of the evolutionary pressure for increasing life expectancy and succeeding in generating offspring (Zador, 2019). An example is the desire and disgust emotions that arise for some smells. The desire emotion may come from the smell of nutritious food, which increases life expectancy, while the disgust emotion may come from spoiled food, which may cause food poisoning and therefore reduce lifetime. Evolution by natural selection made it possible for living beings to be "interpreters" of sensory inputs by being attracted to rewards and repulsed by penalties.

Artificial General Intelligence (AGI), or strong Artificial Intelligence (AI), has been pursued for many years by researchers in many fields, with the goal of reproducing human-level intelligence in machines, e.g., the ability to generalize and self-adapt. So far, the AI scientific community has achieved outstanding results on specific tasks, i.e., weak or narrow AI. In this work, we propose to tackle the quest for general intelligence in its simplest form through evolution. The development of a mutable environment that mimics what the first living beings with the simplest nervous systems faced is therefore essential.

In this work, we propose the Neuroevolution of Artificial General Intelligence (NAGI) framework. NAGI is a bio-inspired framework which uses plausible models of biological neurons, i.e., spiking neurons (Izhikevich, 2003), in an evolved network structure that controls a sensory-motor system in a mutable environment. Evolution affects the connection structure of the neurons, their neurotransmitters (excitatory and inhibitory), and their local bio-inspired learning algorithms. The inclusion of such learning algorithms under evolutionary control is an important factor for generating long-term associative memory neural networks, which may have cells with different plasticity rules (Grewe et al., 2017). Moreover, the genotype of the agents does not contain the synaptic strengths (weights) of the connections, to avoid any innate knowledge about the mutable environment. However, controllers that are selected for reproduction are those rewarded for their ability to self-learn and adapt to new environments, i.e., a newborn brain of an agent is an "empty box" with the ability to learn different environments during its lifetime.

The remainder of this paper is organized as follows. Section 2 provides background knowledge for the proposed NAGI framework, and Section 3 describes related work. Section 4 contains a detailed explanation of the framework, and Section 5 concludes our work by discussing the relevance of our approach for current AGI research and elaborating on possible future works which may include such a novel learning method.

2 Background

The proposed NAGI framework brings together several key approaches in artificial intelligence and evolutionary robotics (Doncieux et al., 2015), briefly reviewed in this section. A Spiking Neural Network (SNN) (Izhikevich, 2003) is a type of artificial neural network which consists of biologically plausible neuron models. Such neurons communicate with spikes, i.e., binary values in time series. SNNs incorporate the concept of time by intrinsically modeling the membrane potential of each neuron. A neuron spikes when its membrane potential reaches a certain threshold; the emitted signal propagates to the neighboring neurons and affects their membrane potentials. While SNNs are able to learn through unsupervised methods, i.e., Hebbian learning (Hebb, 1949) and Spike-Timing-Dependent Plasticity (STDP) (Li et al., 2014), spike trains are not differentiable and cannot be trained efficiently through gradient descent. NeuroEvolution of Augmenting Topologies (NEAT) (Stanley and Miikkulainen, 2002) is a method that uses a Genetic Algorithm (GA) (Holland, 1992) to grow the topology of a simple neural network and adjust the weights of the connections to optimize a target fitness function, while keeping diversity (speciation) in the population and maintaining compatible gene crossover through historical markings. The neuroplasticity used to adapt the weights in the proposed NAGI framework will include the Hebbian learning rule in the form of STDP. In particular, the STDP weight adaptation happens when a neuron produces a spike, or action potential, through its axon (i.e., output connection). Such an event allows the modification of the synaptic strength of the dendrites (i.e., input connections) that caused or did not cause that spike.
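As an illustration of how such a local rule adjusts a single synapse, the sketch below implements a basic pair-based STDP update; the learning rates, time constants, and function names are illustrative assumptions rather than parameters prescribed by the framework.

import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP: potentiate when the presynaptic spike precedes the
    postsynaptic spike (causal), depress otherwise. Spike times in ms."""
    dt = t_post - t_pre
    if dt >= 0:
        dw = a_plus * np.exp(-dt / tau_plus)    # pre before post -> potentiation
    else:
        dw = -a_minus * np.exp(dt / tau_minus)  # post before pre -> depression
    return float(np.clip(w + dw, w_min, w_max))

# A presynaptic spike arriving 5 ms before the postsynaptic spike strengthens the synapse.
print(stdp_update(w=0.5, t_pre=10.0, t_post=15.0))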

Funes and Pollack (1998) describe the body/brain interaction (sensors and actuators vs. controller) as a "chicken and egg" problem; the course of natural evolution shows a history of body, nervous system, and environment all evolving simultaneously in cooperation with, and in response to, each other (Mautner and Belew, 2000). Embodied evolution (Watson et al., 1999) is an evolutionary learning method for training agents through embodiment, i.e., embodied agents learning in an environment. Thus, in nature, general intelligence is a result of evolved self-supervised learning through embodiment.

3 Related work

The idea of neuroevolution with adaptive synapses is not new. Stanley et al. (2003) present NEAT with adaptive synapses using Hebbian local learning rules, with the goal of training neural networks for controlling agents in an environment. The authors verify the difference in performance with and without adaptation on the dangerous food foraging domain. Their environment differs from the one proposed in this work in that it is static throughout the agent's lifetime. Their results show that networks both with and without adaptive synapses reach the maximum fitness on that domain, and therefore both present "adaptation". An extended version of the previous method is adaptive HyperNEAT (Hypercube-based NEAT) (Risi and Stanley, 2010), which includes indirect encoding of the network topology as large geometric patterns.

A recent review of neuroevolution (Stanley et al., 2019) shows how competitive NEAT and its extensions are in comparison to deep neural networks trained with gradient-based methods on reinforcement learning tasks. Neuroevolution provides several extensions, including indirect encoding for scalability, novelty search to promote diversity, meta-learning for training a network to learn how to learn, and combinations with deep learning for searching deep neural network architectures. Furthermore, its authors envisage that neuroevolution will be a key factor in reaching AGI through meta-learning and open-ended evolution. However, in NEAT the neural weights are inherited, so there is no explicit target for general intelligence and adaptation. A framework for the neuroevolution of SNNs with topology growth using genetic algorithms is proposed by Schaffer (2015), with the goal of pattern generation and sequence detection. Eskandari et al. (2016) propose a similar framework for artificial creature control, where the evolutionary process modifies and inherits the network topology and the SNN weights to perform a given task.

A method which tries to produce general intelligence incrementally is PathNet (Fernando et al., 2017), where paths through a deep neural network are selected through evolution to perform the forward propagation and weight adjustment. Such an evolving selection allows the network to learn new tasks faster by re-using frozen (previously learned) paths without catastrophic forgetting. Another framework which tries to produce low-level general intelligence is described by Voss (2007). It is a functional proof-of-concept prototype, owned by the company Adaptive A.I. Inc., which can interact with the virtual and real worlds through sensors and actuators. Its controller, with conceptual general intelligence capabilities, consists of a memory that saves all data and stores the proprietary cognitive algorithms.

Multi-agent environments have also been considered a valuable stepping stone towards AGI because the behavior of agents must adapt to cooperate and compete with others. One recent example of such a multi-agent environment is presented by Lowe et al. (2017). However, in that environment, the adaptation occurs in the actor-critic methods of their reinforcement learning framework. Their method outperforms traditional reinforcement learning approaches on competitive and cooperative multi-agent environments. Another reinforcement learning method which exploits multi-agent environments is introduced by Jaderberg et al. (2018). In their work, they use the environment of Quake III Arena Capture the Flag, a 3D first-person multiplayer game. Their method exceeds human-level performance in this game, and the artificial agents are able to cooperate and compete among themselves and even with human players. One work that provides an open-source competitive multi-agent environment for research purposes is Neural MMO (Suarez et al., 2019). Here, the agents are players which need to survive and prosper in an environment similar to the ones used in Massively Multiplayer Online Role-Playing Games (MMORPGs).

One important aspect of natural evolution is the ability to endlessly produce diverse solutions of increasing complexity, i.e., open-ended evolution (OOE). In contrast, OOE is difficult to achieve in artificial systems. A conceptual framework for the implementation of OOE in evolutionary systems is presented by Taylor (2019). Embodiment plays a key role in OOE in the context of the agent and its morphology, as discussed by Bongard et al. (2018). For an articulated summary and discussion of OOE, see Ref. (Taylor et al., 2016). In Ref. (Stanley et al., 2017), the authors argue that open-ended evolution is a grand challenge for artificial general intelligence and that artificial life methods are promising starting points for the inclusion of OOE in AI systems.

4 Framework concept

The main concept of the proposed NAGI framework is to mimic as closely as possible the evolution of general intelligence in biological organisms, starting from its simplest form. To do that, we propose a minimalistic model with the following components. An agent is equipped with a randomly-initialized minimal spiking neural network. The agent is placed in a mutable environment in order to be able to generalize (learn to learn), instead of merely learning to solve one specific environment. Agents are more likely to survive if they perform correct actions. Agents have access to the environment through sensory inputs, and the environment also provides intrinsic rewards and penalties. New agents inherit the topologies of the controllers from the previous generation (untrained), with the possibility of complexification (e.g., new neurons and synapses can appear through genetic operators). Training happens throughout a generation. The goal of the untrained inherited controllers is to possess a topology that supports the ability to learn new environments. Neural learning happens through self-supervision (environmental information: sensory input and environmental rewards/penalties) via neuroplasticity (e.g., Hebbian learning through spike-timing-dependent plasticity). The result is an unsupervised evolving system that learns without explicit training, in a self-supervised manner through embodiment. In this section, the components of the framework are described in detail.

4.1 Data representation

The data that flows to and from the spiking neural networks that control the agents is encoded as a firing rate, i.e., the number of spikes per second. The firing rate has minimum and maximum values and is represented, for simplification, as a real number between 0 and 1 (i.e., the range [0, 1]). The stimulus to the neural networks can be Poisson-distributed spikes, which have irregular interspike intervals, as observed in the human cortex (Heeger, 2000). This representation can be used for encoding input from binary environments (e.g., the binary values 0 and 1, or the Boolean values True and False) or multi-value environments (e.g., values represented as grayscale from black to white), and allows for the representation of minimum and maximum activation values of sensors and actuators.

4.2 Self-Supervised Learning through Embodiment

A new agent learns through the reactions of an environment via embodiment (i.e., by having a "body" that affects an environment while sensing it). As such, the input of the neural network controller includes reward and penalty information for the learning process. This feedback information is the key factor for achieving self-supervised learning. The concept of self-adaptation is closely connected with embodied cognition, a core property of living beings (Smith and Gasser, 2005). In contrast, supervised learning and reinforcement learning use the error of the neural network output to globally adjust the network model through methods of iterative error reduction, such as gradient descent and evolutionary algorithms. In embodied learning, the input itself is used to adjust the agent's controller. Such sensory input contains the reactions of the environment to the actions of the agent.

In the proposed framework, the spiking neural network controller's local learning rules are responsible for correcting the global behavior of the network according to the agent's experiences. This learning approach is, therefore, a result of self-supervision (Voss, 2007) through embodiment. The framework overview is depicted in Fig. 1. Note that self-supervision through embodiment only works with agents in reactive environments (environments that affect the agents and are affected by them), such as any sensory-motor system deployed in the real world. Non-reactive environments, on the other hand, do not react to the actions of the agent, as in image classification or object detection tasks where the environment only provides data; thus, there is no mutual interaction between the agent and a non-reactive environment. Therefore, we propose to create a virtual reactive environment for such cases. Virtual Embodied Learning (VEL) is the proposed method for cases where no reward and penalty feedback is available through the sensory input. VEL adds reward and penalty inputs to a given sensory-motor system, as illustrated in Fig. 1. In addition, VEL can substitute supervised and reinforcement learning by using the loss of the model as penalty input and the opposite of the loss as reward input.
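The following sketch illustrates the VEL idea for a non-reactive task: a scalar loss is mapped onto penalty and reward channels that are appended to the ordinary sensory input. The linear mapping and the function names are assumptions made for illustration only.

def vel_inputs(sensory_input, loss, max_loss=1.0):
    """Virtual Embodied Learning: map a scalar loss onto penalty and reward
    firing rates and append them to the ordinary sensory input vector."""
    penalty = min(loss / max_loss, 1.0)  # normalized penalty in [0, 1]
    reward = 1.0 - penalty               # the "opposite of the loss" acts as reward
    return list(sensory_input) + [reward, penalty]

# A low loss yields a strong reward channel and a weak penalty channel.
print(vel_inputs([0.2, 0.9], loss=0.1))  # [0.2, 0.9, 0.9, 0.1]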

Figure 1: Illustration of virtual embodied learning, or self-supervised learning through embodiment, in a non-reactive environment. In the case of a reactive environment, rewards and penalties are embedded within the environmental data.

4.3 Mutable environment

To truly exploit and assess the self-learning capabilities and the generalization of the evolving spiking neural networks, a mutable environment is proposed. The evolutionary goal of agents is to survive the changes in their environment. In the real world, living organisms inherit modifications to their body and/or behavior through the generations. For example, a species may evolve a camouflage, such as stick insects (Lev-Yadun et al., 2004), and another may evolve the appearance of a poisonous or venomous animal, such as false coral snakes (Davidson and Eisner, 1996). The proposed mutable environment is a simple metaphor of such examples.

Fig. 2(a) shows the mutable environments which every agent in the population faces during its lifetime. Each agent has one sensor which provides one bit of information (i.e., black or white) and can perform two actions (i.e., eat or avoid). In each generation, the agents are presented with environmental data from several environments. Each sample is presented for a given period of time to allow the agents' controllers to learn. In the first environment, the action eat is associated with one color (e.g., white) while avoid is associated with the other (black). Once the environmental data has been consumed by the agent, there is an abrupt change in the interpretation of the environment (the associations between colors and correct actions are flipped) and the agents are presented with the environmental data again. Agents that perform well in many environments within each generation are more likely to pass to the next generation.

Fig. 2(b) presents more complex mutable environments where agents have two sensors and non-binary environmental values can be received. As shown in the figure, different environments are procedurally generated and presented in each generation, where abrupt changes in the labeling of correct and wrong actions occur. The set of actions may also be expanded to more than two, with different effects on the agents' lifetime and their fitness scores. The sketch after this paragraph shows one way such a flipping environment could be generated.
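The sketch below generates a 1D binary mutable environment with the eat/avoid actions of Fig. 2; the sample counts, color encoding, and function names are illustrative assumptions rather than the framework's actual procedure.

import random

EAT, AVOID = 0, 1

def color_to_action(flip):
    """One temporary environment: a mapping from sensed color to correct action.
    Flipping the mapping produces the next, mutated environment."""
    return {"black": AVOID, "white": EAT} if not flip else {"black": EAT, "white": AVOID}

def environment_samples(n_samples=10, flip=False):
    """Yield (color, correct_action) pairs for one temporary environment."""
    mapping = color_to_action(flip)
    for _ in range(n_samples):
        color = random.choice(["black", "white"])
        yield color, mapping[color]

# One generation: the agents first face one mapping, then the flipped one.
for flip in (False, True):
    for color, correct in environment_samples(n_samples=3, flip=flip):
        print(flip, color, correct)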

Figure 2: Samples of mutable environments that can be presented to the agents through their evolution. Agents can execute two actions (eat or avoid). Within each generation, after the agents have seen all samples of an environment, a new one is presented. (a) 1D binary data: the agent has one sensor. (b) 2D floating-point data: the agent has two sensors (i.e., the axes).

4.4 Neuroplasticity

Each neuron in the evolved spiking neural network may have a different plasticity rule, and the different types of learning rules are subject to evolutionary control. Examples of learning rules include asymmetric Hebbian, symmetric Hebbian, asymmetric anti-Hebbian, and symmetric anti-Hebbian (Li et al., 2014). Together with the Hebbian learning rules, the genome encodes the effectiveness of the potentiation and depression of the synaptic strengths, i.e., how strongly the learning rules increase or decrease the weights of the synapses. Moreover, other types of learning rules discovered in neuroscience may be added together with or in parallel to those, such as non-Hebbian learning, neuromodulation, and synapse fatigue (Kato et al., 2009; Johansen et al., 2014; Abrahamsson et al., 2005). The neuroplasticity is also regulated by a maximum total synaptic strength that a neuron can have over its dendrites. If this value is reached, an increase in the weight of one synapse causes a decrease in the others of the same neuron. This type of weight normalization has been reported for biological neurons (Royer and Paré, 2003; El-Boustani et al., 2018).
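The sketch below illustrates, under assumed parameter values and function names, how the four Hebbian rule variants and the dendritic weight cap described above could be expressed; it is a minimal illustration rather than the framework's implementation.

import numpy as np

def hebbian_dw(dt, rule, a=0.01, tau=20.0, sigma=10.0):
    """Weight change as a function of the spike-time difference dt = t_post - t_pre (ms).
    asym_hebbian: classic STDP (potentiate causal pairs, depress anti-causal ones)
    asym_anti:    sign-flipped STDP
    sym_hebbian:  potentiate near-coincident spikes regardless of order
    sym_anti:     depress near-coincident spikes regardless of order"""
    if rule == "asym_hebbian":
        return a * np.exp(-dt / tau) if dt >= 0 else -a * np.exp(dt / tau)
    if rule == "asym_anti":
        return -hebbian_dw(dt, "asym_hebbian", a, tau)
    if rule == "sym_hebbian":
        return a * np.exp(-(dt ** 2) / (2 * sigma ** 2))
    if rule == "sym_anti":
        return -hebbian_dw(dt, "sym_hebbian", a, tau, sigma)
    raise ValueError(rule)

def normalize_dendrites(weights, total=1.0):
    """Cap the summed synaptic strength of a neuron's dendrites: if the cap is
    exceeded, all incoming weights are scaled down, so strengthening one synapse
    implicitly weakens the others."""
    s = weights.sum()
    return weights if s <= total else weights * (total / s)

print(hebbian_dw(5.0, "asym_hebbian"))                 # positive (potentiation)
print(normalize_dendrites(np.array([0.6, 0.5, 0.4])))  # scaled so the total does not exceed 1.0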

4.5 Neuroevolution

The population of genomes (spiking neural network controllers) for the agents is evolved through a modification of NEAT (Stanley and Miikkulainen, 2002). The genotype in NEAT describes both the topology and the synaptic weights of the network; our proposed method does not evolve the weights but instead includes the type of neuroplasticity in the genotype (Li et al., 2014). The weights of the spiking neural networks are randomly initialized in every generation because the agents should not have innate knowledge of the environment (Zador, 2019). Therefore, the proposed framework focuses on the self-learning capabilities of the agents. Their lifetime is longer when agents perform correct actions and shorter when they perform wrong actions. The lifetime of an agent is used as the fitness score to identify the best-performing neural networks.
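A minimal sketch of what such a weight-free genotype could look like is shown below; the class layout and field names are assumptions, the essential point being that connection genes carry no inheritable weight while neuron genes carry the neurotransmitter type and the plasticity rule.

import random
from dataclasses import dataclass

@dataclass
class NeuronGene:
    node_id: int
    neurotransmitter: str   # "excitatory" or "inhibitory"
    plasticity_rule: str    # e.g., "asym_hebbian", "sym_anti", ...

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    innovation: int         # historical marking, as in NEAT
    enabled: bool = True
    # Note: no weight field; weights are not part of the heritable genotype.

@dataclass
class Genome:
    neurons: list
    connections: list

    def new_agent_weights(self, rng=random):
        """Weights are drawn anew for every newborn agent (no innate knowledge)."""
        return {c.innovation: rng.uniform(0.0, 1.0) for c in self.connections if c.enabled}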

Algorithm 1 explains how an agent's genome is evaluated while the agent is in a mutable environment during its lifetime. The fitness score of the agent is equal to the time the agent stays alive until its death. Each agent has a maximum life expectancy. This life expectancy is reduced faster when the agent receives a penalty and more slowly when it receives a reward. Both penalties and rewards reduce the lifetime of agents, so that an agent does not live indefinitely even if it always performs the correct action.

The neuroevolution process allows the growth of neural network topologies, and therefore the population is initialized with minimal networks that complexify over time. Nevertheless, there may be a penalty on lifetime to avoid the generation of large networks, which may otherwise form neuron groups that specialize for each different environment. The penalty therefore pushes the network to learn how to forget the previous environment and then adapt to the new one (Benjamin, 2011). Another reason to apply this penalty on network size is that more neurons require more energy to maintain. This reduction of lifetime caused by the number of neurons is regulated by a parameter, so choosing it is of high importance to the fitness and lifetime of the agent.

1: procedure evaluate(genome)
2:     agent ← new Agent(genome) ▷ Agent is initialized with untrained neural network
3:     environmentSamples ← ∅
4:     while agent is alive do
5:         if environmentSamples is empty then
6:             environmentSamples ← nextEnvironment() ▷ Next temporary environment of the current generation
7:         sample ← pop(environmentSamples)
8:         reward ← initReward(sample) ▷ Initialization of reward
9:         penalty ← initPenalty(sample) ▷ Initialization of penalty
10:        while agent is learning sample and agent is alive do ▷ Agent learns the presented sample with neuroplasticity for a period of time
11:            action ← agent.act(sample, reward, penalty)
12:            reward ← getReward(sample, action)
13:            penalty ← getPenalty(sample, action)
14:            agent.health ← agent.health − damage(reward, penalty)
15:            ▷ Penalty reduces the agent's health faster than reward, thus accelerating its death
16:     return time agent was alive ▷ Fitness score
Algorithm 1: Agent's genome evaluation using a mutable environment and virtual embodied learning
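One possible way to fold the network-size penalty discussed above into the lifetime-based fitness returned by Algorithm 1 is sketched below; the linear form and the coefficient are illustrative assumptions, since the framework only states that the penalty is regulated by a parameter.

def fitness_with_size_penalty(time_alive, num_neurons, alpha=0.05):
    """Reduce the lifetime-based fitness in proportion to network size,
    discouraging large networks that memorize each environment separately."""
    return max(0.0, time_alive - alpha * num_neurons)

print(fitness_with_size_penalty(time_alive=120.0, num_neurons=40))  # 118.0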

5 Discussion and Conclusion

While current AI methods such as deep learning and reinforcement learning (and their combinations) have proven to be successful in solving a multitude of challenging tasks, e.g., defeating humans in the real-time strategy game StarCraft II (Vinyals et al., 2019), there is a lot of debate around the limitations of current methods for breakthroughs in Artificial General Intelligence. One key difference between AI and AGI is the learning ability. Most AI methods (supervised, unsupervised, and reinforcement learning) are explicitly trained, while AGI needs some intrinsic ability to self-learn.

One of the open questions for AGI research is: how can artificial agents be able to acquire the general skill of learning, in order to continuously adapt throughout their lifetime?

In biological systems, self-learning is a result of rewards and penalties which are embedded in the sensory data living beings receive from the environment (unlabeled and mutable data). Their ability to learn through this form of self-supervised learning through embodiment is a result of evolution.

One of the goals of the proposed NAGI framework is an AGI system that acquires adaptation and general learning skills through the three main levels of self-organization in living systems (Sipper et al., 1997):

  • Phylogeny, which includes evolution of genetic representations;

  • Ontogeny, which takes care of the morphogenetic process (growth) from a single cell to a multicellular machine, by following the genotype instructions;

  • Epigenesis, which allows the emergence of a learning system through an indirect encoding between genotype and phenotype, where the phenotype is subject to modifications (learning) throughout its lifetime while interacting with the environment.

We therefore envision that the proposed spiking neural network model will include developmental and morphogenetic processes (Doursat et al., 2012) in future extensions of the framework.

Another envisioned stepping stone to AGI is the extension of the framework to artificial life multi-agent systems. Multi-agent environments will allow the emergence of more advanced strategies of adaptation and learning based on collaboration and competition. In addition, the framework may benefit from extending the environment itself into an evolving agent, which can also allow for increased complexity and open-ended evolution.

Finally, we expect that future implementations of the NAGI framework and its extensions will be deployed/embodied into real robot agents equipped with physical sensors.

In conclusion, this work proposes a novel general framework for the neuroevolution of artificial general intelligence (NAGI) in its simplest form, which can be extended to more complex tasks and environments. In NAGI, the general intelligence, i.e., learning to learn to adapt to different environments, is a result of self-supervised learning through embodiment. Therefore, the learning process is not a result of explicit training with supervision or reinforcement learning, as there is no loss function used to adjust the neural network weights. The proposed neural network model is a bio-inspired model based on spiking neural networks. Learning is based on spike-timing-dependent plasticity which uses only input data for learning. As such, penalties and rewards are embedded within the environmental data sensed by the agents.

We expect more researchers and prominent laboratories around the globe to get involved in Artificial General Intelligence research. We envision that the proposed NAGI framework will motivate more AGI research, and in particular, methods inspired by artificial life, complex systems, and neuroscience.

Acknowledgments

We thank Kristine Heiney, Gustavo Moreno e Mello, and Anis Yazidi for thoughtful comments and discussions. This work was supported by Norwegian Research Council SOCRATES project (grant number 270961).

References

  • Zador (2019) A. Zador, A critique of pure learning: What artificial neural networks can learn from animal brains, bioRxiv preprint bioRxiv:582643 (2019).
  • Izhikevich (2003) E. M. Izhikevich, Simple model of spiking neurons, IEEE Transactions on neural networks 14 (2003) 1569–1572.
  • Grewe et al. (2017) B. F. Grewe, J. Gründemann, L. J. Kitch, J. A. Lecoq, J. G. Parker, J. D. Marshall, M. C. Larkin, P. E. Jercog, F. Grenier, J. Z. Li, et al., Neural ensemble dynamics underlying a long-term associative memory, Nature 543 (2017) 670.
  • Doncieux et al. (2015) S. Doncieux, N. Bredeche, J.-B. Mouret, A. E. G. Eiben, Evolutionary robotics: what, why, and where to, Frontiers in Robotics and AI 2 (2015) 4.
  • Hebb (1949) D. O. Hebb, The organization of behavior: A neuropsychological theory, Wiley, New York, 1949.
  • Li et al. (2014) Y. Li, Y. Zhong, J. Zhang, L. Xu, Q. Wang, H. Sun, H. Tong, X. Cheng, X. Miao, Activity-dependent synaptic plasticity of a chalcogenide electronic synapse for neuromorphic systems, Scientific reports 4 (2014) 4906.
  • Stanley and Miikkulainen (2002) K. O. Stanley, R. Miikkulainen, Evolving neural networks through augmenting topologies, Evolutionary computation 10 (2002) 99–127.
  • Holland (1992) J. H. Holland, Genetic algorithms, Scientific american 267 (1992) 66–73.
  • Funes and Pollack (1998) P. Funes, J. Pollack, Evolutionary body building: Adaptive physical designs for robots, Artificial Life 4 (1998) 337–357.
  • Mautner and Belew (2000) C. Mautner, R. K. Belew, Evolving robot morphology and control, Artificial Life and Robotics 4 (2000) 130–136.
  • Watson et al. (1999) R. A. Watson, S. G. Ficici, J. B. Pollack, Embodied evolution: embodying an evolutionary algorithm in a population of robots, 1999.
  • Stanley et al. (2003) K. O. Stanley, B. D. Bryant, R. Miikkulainen, Evolving adaptive neural networks with and without adaptive synapses, in: The 2003 Congress on Evolutionary Computation, 2003. CEC’03., volume 4, IEEE, 2003, pp. 2557–2564.
  • Risi and Stanley (2010) S. Risi, K. O. Stanley, Indirectly encoding neural plasticity as a pattern of local rules, in: International Conference on Simulation of Adaptive Behavior, Springer, 2010, pp. 533–543.
  • Stanley et al. (2019) K. O. Stanley, J. Clune, J. Lehman, R. Miikkulainen, Designing neural networks through neuroevolution, Nature Machine Intelligence 1 (2019) 24–35. URL: https://doi.org/10.1038/s42256-018-0006-z. doi:10.1038/s42256-018-0006-z.
  • Schaffer (2015) J. D. Schaffer, Evolving spiking neural networks: A novel growth algorithm corrects the teacher, in: 2015 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2015, pp. 1–8. doi:10.1109/CISDA.2015.7208630.
  • Eskandari et al. (2016) E. Eskandari, A. Ahmadi, S. Gomar, M. Ahmadi, M. Saif, Evolving spiking neural networks of artificial creatures using genetic algorithm, in: Neural Networks (IJCNN), 2016 International Joint Conference on, IEEE, 2016, pp. 411–418.
  • Fernando et al. (2017) C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, D. Wierstra, Pathnet: Evolution channels gradient descent in super neural networks, arXiv preprint arXiv:1701.08734 (2017).
  • Voss (2007) P. Voss, Essentials of general intelligence: The direct path to artificial general intelligence, in: Artificial general intelligence, Springer, 2007, pp. 131–157.
  • Lowe et al. (2017) R. Lowe, Y. Wu, A. Tamar, J. Harb, O. P. Abbeel, I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments, in: Advances in Neural Information Processing Systems, 2017, pp. 6379–6390.
  • Jaderberg et al. (2018) M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, et al., Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, arXiv preprint arXiv:1807.01281 (2018).
  • Suarez et al. (2019) J. Suarez, Y. Du, P. Isola, I. Mordatch, Neural mmo: A massively multiagent game environment for training and evaluating intelligent agents, arXiv preprint arXiv:1903.00784 (2019).
  • Taylor (2019) T. Taylor, Evolutionary innovations and where to find them: Routes to open-ended evolution in natural and artificial systems, Artificial Life 25 (2019).
  • Bongard et al. (2018) J. Bongard, N. Cheney, Z. Mahoor, J. Powers, The role of embodiment in open-ended evolution, OOE3: The Third Workshop on Open-Ended Evolution (2018).
  • Taylor et al. (2016) T. Taylor, M. Bedau, A. Channon, D. Ackley, W. Banzhaf, G. Beslon, E. Dolson, T. Froese, S. Hickinbotham, T. Ikegami, et al., Open-ended evolution: perspectives from the oee workshop in york, Artificial life 22 (2016) 408–423.
  • Stanley et al. (2017) K. O. Stanley, J. Lehman, L. Soros, Open-endedness: The last grand challenge you’ve never heard of, O’Reilly (2017).
  • Heeger (2000) D. Heeger, Poisson model of spike generation, Handout, Stanford University 5 (2000) 1–13.
  • Smith and Gasser (2005) L. Smith, M. Gasser, The development of embodied cognition: Six lessons from babies, Artificial life 11 (2005) 13–29.
  • Lev-Yadun et al. (2004) S. Lev-Yadun, A. Dafni, M. A. Flaishman, M. Inbar, I. Izhaki, G. Katzir, G. Ne’eman, Plant coloration undermines herbivorous insect camouflage, BioEssays 26 (2004) 1126–1130.
  • Davidson and Eisner (1996) T. M. Davidson, J. Eisner, United states coral snakes, Wilderness & Environmental Medicine 7 (1996) 38–45.
  • Kato et al. (2009) H. K. Kato, A. M. Watabe, T. Manabe, Non-hebbian synaptic plasticity induced by repetitive postsynaptic action potentials, Journal of Neuroscience 29 (2009) 11153–11160.
  • Johansen et al. (2014) J. P. Johansen, L. Diaz-Mataix, H. Hamanaka, T. Ozawa, E. Ycu, J. Koivumaa, A. Kumar, M. Hou, K. Deisseroth, E. S. Boyden, et al., Hebbian and neuromodulatory mechanisms interact to trigger associative memory formation, Proceedings of the National Academy of Sciences 111 (2014) E5584–E5592.
  • Abrahamsson et al. (2005) T. Abrahamsson, B. Gustafsson, E. Hanse, Synaptic fatigue at the naive perforant path–dentate granule cell synapse in the rat, The Journal of physiology 569 (2005) 737–750.
  • Royer and Paré (2003) S. Royer, D. Paré, Conservation of total synaptic weight through balanced synaptic depression and potentiation, Nature 422 (2003) 518.
  • El-Boustani et al. (2018) S. El-Boustani, J. P. K. Ip, V. Breton-Provencher, H. Okuno, H. Bito, M. Sur, Locally coordinated synaptic plasticity shapes cell-wide plasticity of visual cortex neurons in vivo, bioRxiv preprint bioRxiv:249706 (2018).
  • Benjamin (2011) A. S. Benjamin, Successful remembering and successful forgetting: A festschrift in honor of Robert A. Bjork, Psychology Press, 2011.
  • Vinyals et al. (2019) O. Vinyals, I. Babuschkin, J. Chung, M. Mathieu, M. Jaderberg, W. M. Czarnecki, A. Dudzik, A. Huang, P. Georgiev, R. Powell, T. Ewalds, D. Horgan, M. Kroiss, I. Danihelka, J. Agapiou, J. Oh, V. Dalibard, D. Choi, L. Sifre, Y. Sulsky, S. Vezhnevets, J. Molloy, T. Cai, D. Budden, T. Paine, C. Gulcehre, Z. Wang, T. Pfaff, T. Pohlen, Y. Wu, D. Yogatama, J. Cohen, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, C. Apps, K. Kavukcuoglu, D. Hassabis, D. Silver, AlphaStar: Mastering the Real-Time Strategy Game StarCraft II, https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/, 2019.
  • Sipper et al. (1997) M. Sipper, E. Sanchez, D. Mange, M. Tomassini, A. Pérez-Uribe, A. Stauffer, A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems, IEEE Transactions on Evolutionary Computation 1 (1997) 83–97.
  • Doursat et al. (2012) R. Doursat, H. Sayama, O. Michel, Morphogenetic engineering: toward programmable complex systems, Springer, 2012.