In the field of Evolutionary Robotics, the majority of studies consider the evolution of brains with a fixed body. This is not surprising, considering that the joint evolution of morphologies and controllers implies two search spaces, and the search space for the brain changes with every new robot body produced. Evolving morphologies and controllers of robots simultaneously leads to a problem that was noted long ago, the body-brain mismatch problem [Eiben2013]: even though the parents have well-matching bodies and brains, recombination and mutation can shuffle the parental genotypes such that the resulting body and brain combination does not fit well, causing sub-optimal behaviour in the offspring. The proposed solution is the addition of learning, as phrased in “If it evolves it needs to learn” [Eiben2020].
The main goal of this research is to investigate the effects of learning in morphologically evolving robots. Our hypotheses are: 1) with learning, less time is needed to reach the same fitness level than without learning; 2) learning leads not only to different fitness levels, but also to different robot morphologies.
To this end, we set up a system where (simulated) modular robots can reproduce and create offspring that inherit the parents’ morphologies by crossover and mutation. Regarding the controllers, we implement two methods. The first is evolution only, which means that the brain of a robot child is inherited from its parents. The second is evolution plus learning, which means that the brain of a child is also inherited but is additionally optimized by a learning algorithm. The comparison is based on three measures: efficiency, efficacy, and morphological intelligence.
II Related Work
II-A Evolvable morphology
The body-brain mismatch issue was noted long ago, and several approaches have been proposed to mitigate its effect on the population.
Cheney et al. [Cheney2014] implemented a form of novelty protection in which ‘younger’ robot designs were protected from individuals that survived for more generations. Protecting a novel individual will increase its chance to adapt the controller properly for its body. Novelty protection corresponds with implementing a single lifetime learning iteration every time a morphology is protected. Similarly, De Carlo et al. [de2020influences] implemented protection in the form of speciation within their NEAT algorithm. The preservation of diversity in the population allowed new morphologies to survive, thus reducing the effects of body-brain mismatch.
Nygaard, Samuelsen, and Glette [nygaard2017overcoming] demonstrated improvements in their ER system by introducing two phases during evolution. The first phase consists of both controller and morphology evolution, while during the second phase only the controller evolves in a fixed body. The results showed that, without the second phase, morphology and controller evolution led to sub-optimal controllers which required additional fine-tuning.
In this paper, we use the Triangle of Life framework (Figure 1) to integrate evolution and lifetime learning [Eiben2013]. The essence is to have newborn robots perform a learning process that optimizes their inherited brain quickly after birth. An important additional feature is that newborn robots are considered infertile (i.e., not eligible for reproduction) until they successfully finish the learning period. This prevents inferior genetic information from being propagated and thus saves resources.
II-B Controller learning algorithms
Similar research was done recently [Diggelen], in which three learning algorithms were compared for improving the weights of CPG-based controllers, namely Evolutionary Strategies, Bayesian Optimization and Reversible Differential Evolution (RevDE). The study shows that the shape of the fitness landscape in Evolutionary Strategies hints at a possible bias towards morphologies with many joints. This could be an unwanted property for lifetime learning, because we want an algorithm that works consistently on different kinds of morphologies. Bayesian Optimization is sample-efficient, but it required much more time than the other two methods because of its higher time complexity. RevDE therefore came out best among these three algorithms.
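Differential Evolution (DE) is a population-based evolutionary algorithm (EA) that samples new candidates by perturbing the current population [Storn1997]. The three main components of this method are as follows:
Differential mutation operator: a new candidate is generated by randomly picking a triplet from the population, $(x_i, x_j, x_k)$; then $x_i$ is perturbed by adding a scaled difference between $x_j$ and $x_k$, that is:
$$y = x_i + F\,(x_j - x_k) \qquad (1)$$
where $F$ is the scaling factor.
Uniform crossover operator: the authors of [Storn1997] proposed to sample a binary mask $m \in \{0, 1\}^D$ from a Bernoulli distribution with probability $p = P(m_d = 1)$ shared across all $D$ dimensions, and to compute the final candidate as:
$$v = m \odot y + (1 - m) \odot x_i \qquad (2)$$
where $\odot$ denotes element-wise multiplication.
The last component is a selection mechanism: the authors use the “survival of the fittest” approach, i.e., they combine the old population with the new one and select the N candidates with the highest fitness values, which is the deterministic $(\mu + \lambda)$ selection.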
However, the mutation operator in DE perturbs candidates using other individuals in the population to generate a single new candidate. As a result, too small a population could limit exploration of the search space and lose diversity. To overcome this issue, a modification of DE, Reversible DE (RevDE), was proposed that uses all three individuals to generate three new points in the following manner [Tomczak2020]:
$$\begin{aligned}
y_1 &= x_i + F\,(x_j - x_k) \\
y_2 &= x_j + F\,(x_k - y_1) \\
y_3 &= x_k + F\,(y_1 - y_2)
\end{aligned} \qquad (3)$$
The new candidates $y_1$ and $y_2$ could be further used to calculate perturbations using points outside the population. This approach does not follow the typical construction of an EA, where only evaluated candidates are mutated. Further, we can express (3) as a linear transformation using matrix notation by introducing matrices as follows:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = R \begin{bmatrix} x_i \\ x_j \\ x_k \end{bmatrix}, \qquad
R = \begin{bmatrix}
1 & F & -F \\
-F & 1 - F^2 & F + F^2 \\
F + F^2 & -F + F^2 + F^3 & 1 - 2F^2 - F^3
\end{bmatrix}$$
To obtain the matrix $R$, we plug $y_1$ into the second and third equations in (3), and then $y_2$ into the last equation in (3). As a result, we obtain M = 3N new candidate solutions, and the linear transformation $R$ is reversible.
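As a sanity check on the three-point update, the following minimal sketch compares the sequential RevDE equations with a single matrix product and confirms that the transformation can be inverted. It assumes NumPy; the function names `revde_sequential` and `revde_matrix` are illustrative and not taken from the authors' code.

```python
import numpy as np


def revde_sequential(x_i, x_j, x_k, F):
    """Apply the three RevDE perturbations one after another."""
    y1 = x_i + F * (x_j - x_k)
    y2 = x_j + F * (x_k - y1)
    y3 = x_k + F * (y1 - y2)
    return np.stack([y1, y2, y3])


def revde_matrix(F):
    """Matrix R obtained by substituting y1 and y2 into the later equations."""
    return np.array([
        [1.0, F, -F],
        [-F, 1.0 - F**2, F + F**2],
        [F + F**2, -F + F**2 + F**3, 1.0 - 2.0 * F**2 - F**3],
    ])


rng = np.random.default_rng(0)
F = 0.5
X = rng.normal(size=(3, 4))            # rows are x_i, x_j, x_k (4-dimensional)

Y_seq = revde_sequential(X[0], X[1], X[2], F)
R = revde_matrix(F)
Y_mat = R @ X                          # same three candidates in one product
X_back = np.linalg.inv(R) @ Y_mat      # reversibility: recover the parents
det_R = np.linalg.det(R)
```

Each of the three steps only adds multiples of the other rows to one row, so the determinant of R stays at 1 and the inverse always exists.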
However, generating and evaluating all 3N new candidates comes with an extra computational cost when running the simulator. In this paper, we use an advanced version of RevDE to alleviate this issue. This algorithm is introduced in the Algorithm section.
III Experiment Set-up
The experiments have been carried out in Revolve (https://github.com/ci-group/revolve), a Gazebo based simulator which enables us to test the parts of the system as well as to set an entire environment for the complete evolutionary process. All experiments were performed using an infinite plane environment to avoid any extra complexity. We ran two experiments: experiment 1 works by running evolution alone. In this system, controllers are inheritable and the controller of the offspring is produced by applying crossover and mutation to the controllers of the parents. We refer to this experiment as Evolution Only throughout the paper. In experiment 2, controllers are not only evolvable, but also learnable. In this experiment, the controller of the offspring is produced by a learning algorithm that starts with the inherited brain. We refer to this experiment as Evolution + Learning throughout the paper.
III-A Robot genotype (Body & Brain)
In this paper, we use a Lindenmayer-System (L-system) as the genetic representation [Miras2020]. The grammar of an L-system is defined as a tuple G = (V, w, P), where
– V, the alphabet, is a set of symbols containing replaceable and non-replaceable symbols.
– w, the axiom, is a symbol from which the system starts.
– P is a set of production-rules for the replaceable symbols.
The following didactic example illustrates the process of iterative rewriting of an L-system. For a given number of iterations, each replaceable symbol is simultaneously replaced by the symbols of its production rule. Given V = {X, Y, Z}, w = X and P = {X : {X, Y}, Y : {Z}, Z : {X, Z}}, the rewriting goes as follows:
Iteration 0: X
Iteration 1: XY
Iteration 2: XYZ
Iteration 3: XYZXZ
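The rewriting process in this example can be reproduced with a few lines of Python. This is a minimal sketch of simultaneous symbol replacement only; the function name `rewrite` is illustrative, and the sketch does not model the actual genotype encoding used in Revolve.

```python
from typing import Dict, List


def rewrite(axiom: str, rules: Dict[str, str], iterations: int) -> List[str]:
    """Return the string after each iteration, starting with the axiom."""
    history = [axiom]
    current = axiom
    for _ in range(iterations):
        # Every symbol is replaced at the same time; symbols without a
        # production rule (non-replaceable symbols) are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
        history.append(current)
    return history


# Production rules of the didactic example: X -> XY, Y -> Z, Z -> XZ.
rules = {"X": "XY", "Y": "Z", "Z": "XZ"}
history = rewrite("X", rules, iterations=3)
for i, s in enumerate(history):
    print(f"Iteration {i}: {s}")
```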
The construction of a phenotype (robot) from a genotype (grammar) is done by the following steps: 1. the axiom of the grammar is rewritten into a more complex string of symbols according to the production-rules of the grammar. 2. this string is decoded into a phenotype, one for the morphology (pointing to the current module) and one for the controller (pointing to the current sensor and the current oscillator).
III-B Robot phenotype (Body)
The robots in Revolve are based on the RoboGen framework [Auerbach2014]. We use a subset of 3D-printable components: one core component, one or more brick components, and one or more active hinges (see Figure 2). Each robot’s genotype describes its layout and consists of a tree structure, with the root node representing the core module from which further components branch out. Each component type has specific features, which are described by its genotypic encoding. These models are used in the simulation, but could also be used for 3D printing and construction of the real robots.
III-C Robot phenotype (Brain)
We use Central Pattern Generator (CPG)-based controllers to drive the modular robots. CPGs are biological neural circuits that produce rhythmic outputs in the absence of rhythmic input. They are pairs of neurons ($x$, $y$) that drive rhythmic and stereotyped locomotion behaviors such as walking, swimming and flying in vertebrate species, and they have been proven to perform well in modular robots [Ijspeert2007].
In this study, the controllers are optimized for gait learning. Each robot joint is associated with a CPG that is defined by three neurons, an $x$-neuron, a $y$-neuron, and an $out$-neuron, that are recursively connected as shown in Figure 3. The change of a neuron’s state is calculated by multiplying the activation value of the opposite neuron with a weight ($w$). The $x$-neuron and the $y$-neuron thus feed their activation values, multiplied by their respective weights, to each other, and the output value is bounded because of the limited rotation angle of the joints.
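To make the coupling concrete, the sketch below simulates a single two-neuron oscillator in which each neuron's state changes in proportion to the other neuron's activation. Several details are assumptions made purely for illustration and are not taken from the paper: the two weights are set to opposite signs (`w` and `-w`), the dynamics are integrated with a simple Euler step, and the output is bounded with `tanh`.

```python
import math
from typing import List


def simulate_cpg(w: float, steps: int = 2000, dt: float = 0.005) -> List[float]:
    """Simulate one CPG and return its bounded output at every step."""
    x, y = 1.0, 0.0                 # assumed initial neuron states
    outputs = []
    for _ in range(steps):
        dx = w * y                  # x changes with the activation of y
        dy = -w * x                 # y changes with the activation of x
        x, y = x + dt * dx, y + dt * dy
        outputs.append(math.tanh(x))  # assumed squashing to keep |out| < 1
    return outputs


outputs = simulate_cpg(w=2.0)
print(min(outputs), max(outputs))
```

With opposite-sign weights the pair behaves like a harmonic oscillator, so the output swings rhythmically between negative and positive values without any rhythmic input.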
It has been demonstrated that RevDE performs well when evolving controllers in modular robots for a given task [Diggelen]. However, it increases the computational cost of running the simulator by tripling the population. Here we introduce a surrogate model to overcome this issue. It uses a K-Nearest-Neighbor (K-NN) regressor to approximate the fitness values of the new candidates and then selects the N most promising points [Weglarz-Tomczak2021]. We refer to this approach as RevDEknn.
The K-NN regression model is a non-parametric model that stores all previously seen individuals with their evaluations; the prediction for a new candidate solution is the average over the K closest previously seen individuals (Table I). In this paper, we set K = 3.
The algorithm works as follows:
(1) initialize a population with X samples;
(2) evaluate the fitness of all X samples;
(3) perform a selection over the top N samples to obtain a vector $x_i$ in the search space;
(4) randomly shuffle $x_i$ to create two additional vectors ($x_j$, $x_k$), and create the new samples $y_1$, $y_2$ and $y_3$. Apply uniform crossover with probability p between each $x$ and $y$;
(5) apply K-NN to predict the fitness values of the new samples based on the 3 closest previously seen samples, and repeat from (2);
(6) terminate when the maximum number of generations is reached.
Following general recommendations in the literature [Pedersen2010] to obtain stable exploration/exploitation behaviour, the crossover probability p is fixed to 0.9 and the scaling factor F is fixed to 0.5.
Table I: RevDEknn parameters
|Value||Description|
|25||Initial population size|
|25||Top samples size|
|3||Number of Nearest-Neighbors|
|10||Number of Generations|
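The learning loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy fitness function `sphere`, the initial sampling range of [-1, 1], the pairing of each parent with its own perturbed copy during crossover, and the merge-and-keep-best survivor step are all choices made for the example. The values n = 25, k = 3, 10 generations, F = 0.5 and p = 0.9 follow the text.

```python
import numpy as np


def knn_predict(candidates, seen_x, seen_f, k=3):
    """Predict fitness as the mean over the k closest evaluated samples."""
    dist = np.linalg.norm(candidates[:, None, :] - seen_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return seen_f[nearest].mean(axis=1)


def revde_candidates(pop, F, p, rng):
    """Create 3n candidates with the RevDE update and uniform crossover."""
    n = pop.shape[0]
    x_i, x_j, x_k = pop, pop[rng.permutation(n)], pop[rng.permutation(n)]
    y1 = x_i + F * (x_j - x_k)
    y2 = x_j + F * (x_k - y1)
    y3 = x_k + F * (y1 - y2)
    parents = np.concatenate([x_i, x_j, x_k])
    mutants = np.concatenate([y1, y2, y3])
    mask = rng.random(mutants.shape) < p      # take the mutant gene w.p. p
    return np.where(mask, mutants, parents)


def revde_knn(fitness, dim, n=25, generations=10, F=0.5, p=0.9, k=3, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(n, dim))   # assumed initial range
    fit = np.array([fitness(x) for x in pop])
    seen_x, seen_f = pop.copy(), fit.copy()       # archive for the surrogate
    evaluations = n
    for _ in range(generations):
        cand = revde_candidates(pop, F, p, rng)
        predicted = knn_predict(cand, seen_x, seen_f, k)
        chosen = cand[np.argsort(predicted)[-n:]]  # n most promising of 3n
        chosen_fit = np.array([fitness(x) for x in chosen])
        evaluations += n
        seen_x = np.vstack([seen_x, chosen])
        seen_f = np.concatenate([seen_f, chosen_fit])
        merged_x = np.vstack([pop, chosen])
        merged_f = np.concatenate([fit, chosen_fit])
        keep = np.argsort(merged_f)[-n:]           # keep the best n overall
        pop, fit = merged_x[keep], merged_f[keep]
    best = int(np.argmax(fit))
    return pop[best], float(fit[best]), evaluations


def sphere(x):
    """Toy fitness to maximise (stand-in for a simulated gait speed)."""
    return -float(np.sum(x ** 2))


best_x, best_f, n_evals = revde_knn(sphere, dim=6)
print(best_f, n_evals)
```

Only the n candidates chosen by the surrogate are passed to the real fitness function, so each generation costs n evaluations instead of 3n. The sketch counts the evaluation of the initial population separately from the per-generation evaluations.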
In this paper, we apply RevDEknn to optimize the weights of the CPGs of modular robots and thereby improve their controllers for the task of gait learning; for N CPGs, there are 3*N weights. The outer loop is an evolutionary algorithm (EA). The whole process is illustrated in Figure 4.
The code for carrying out the experiments is available online: https://github.com/ci-group/revolve/tree/experiments/jlo_learning. The pseudocode of combining EA and RevDEknn is shown below:
III-E Experiment parameters
An initial population of 50 robots is randomly generated in the first generation. In each generation, 25 offspring are produced by selecting 25 pairs of parents through binary tournaments (with replacement) and creating one child per pair by crossover and mutation. From the top 25 parents plus 25 offspring, 50 individuals are selected for the next generation. The evolutionary process is terminated after 30 generations. In this research, the fitness value and the performance value are the same. Therefore, running the Evolution Only experiment takes $50 + 25 \times 30 = 800$ fitness evaluations.
In the Evolution + Learning experiment, for each evolutionary process, we tested RevDEknn on gait learning during 750 learning trials (25 initial population × 3 by RevDE × 10 generations) with 250 assessments (750 divided by k = 3 predictions) to simulate the robot’s limited field of view in the real world. This resulted in $250 \times 800 = 200{,}000$ fitness evaluations.
We set the evaluation time to 30 seconds to balance computing time against the accuracy of evaluating a task such as gait learning. The fitness was the speed (cm/s) of the robot’s displacement in any direction, notated as $v = \frac{|x_t - x_0|}{t}$, where $x_0$ is the x coordinate of the robot’s center of mass at the beginning of the simulation, $x_t$ is the x coordinate of the robot’s center of mass at the end of the simulation, and $t$ is the duration of the simulation.
To sum up, running these 2 experiments takes 200,800 evaluations, which amounts to about 1,673 hours of (simulated) time. In practice, it takes about 0.7 day to run these experiments on five computers with an Intel i7 CPU. All the experiments are repeated 10 times independently to get a robust assessment of the performance per data set. The experimental parameters we used are described in Table II.
Table II: Experimental parameters
|Parameter||Value||Description|
|Population size||50||Number of individuals per generation|
|Offspring size||25||Number of offspring produced per generation|
|Mutation||0.8||Probability of mutation for individuals|
|Crossover||0.8||Probability of crossover for individuals|
|Generations||30||Termination condition for each run|
|Learning trial||750||Number of the RevDEknn learning trials|
|Evaluation time||30||Duration of the test period per fitness evaluation in seconds|
|Tournament size||2||Number of individuals used in tournament selection|
|Repetitions||10||Number of repetitions per experiment (each robot + gait-learning)|
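The evaluation budget implied by these parameters can be checked with a short calculation. The sketch assumes one fitness evaluation per robot in Evolution Only and 250 assessments per robot in Evolution + Learning, which reproduces the total of 200,800 evaluations reported in the text; the variable names are illustrative.

```python
POPULATION_SIZE = 50        # robots in the initial generation
OFFSPRING_SIZE = 25         # new robots per generation
GENERATIONS = 30
EVALUATION_SECONDS = 30     # simulated time per fitness evaluation

LEARNING_TRIALS = 25 * 3 * 10           # population * RevDE factor * generations
ASSESSMENTS = LEARNING_TRIALS // 3      # only one in three trials is simulated

robots = POPULATION_SIZE + OFFSPRING_SIZE * GENERATIONS
evolution_only_evals = robots                    # one evaluation per robot
evolution_learning_evals = robots * ASSESSMENTS  # learning budget per robot
total_evals = evolution_only_evals + evolution_learning_evals
simulated_hours = total_evals * EVALUATION_SECONDS / 3600

print(robots, evolution_learning_evals, total_evals, round(simulated_hours, 1))
```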
III-F Performance measures
To compare the two methods, we consider three performance indicators: efficiency, efficacy, and morphological intelligence.
III-F1 Efficiency & Efficacy
We measure efficacy by the quality achieved at the end of the evolutionary process. Since we consider gait learning here, the quality is defined by the speed of the robot. As this measure can be sensitive to ‘luck’, we get more useful statistics by taking the average over 10 different runs. Thus here, the efficacy of a method is defined by the mean best fitness averaged over the 10 independent repetitions. Efficiency indicates how quickly the robot finds its best solution.
III-F2 Morphological Descriptors
For quantitatively assessing morphological traits of the robots, we utilized the following set of descriptors:
Absolute Size: Total number of modules of a robot body. It is the sum of all the structural bricks, the hinges and the one core component with the controller board.
absolute_size = brick_count + hinge_count + 1
Width: The width of a body, excluding sensors.
Proportion: The length-width ratio of the rectangular envelope around the morphology. It is defined as follows:
$$P = \frac{p_s}{p_l}$$
where $p_s$ is the shortest side of the morphology, and $p_l$ is the longest side.
Number of Bricks: The number of structural bricks in the morphology.
Relative Number of Limbs: The number of extremities of a morphology relative to a practical limit. It is defined as follows:
$$L = \frac{l}{l_{max}}$$
where $m$ is the total number of modules in the morphology, $l$ is the number of modules which have only one face attached to another module (except for the core component), and $l_{max}$ is the maximum number of modules with one face attached that a morphology with $m$ modules could have if the same number of modules were arranged in a different way.
Number of Active Hinges: Number of active hinges in the morphology. An active hinge is a joint that has a motor, whereas a non-active hinge is a passive joint (just the ‘bendable’ plastic).
We use the above six morphological descriptors to capture relevant robot morphological traits, and quantify the correlations between controller & morphology search spaces.
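Three of these descriptors reduce to simple ratios and sums, as the sketch below illustrates. The class and field names are hypothetical, the module counts are made-up example values, and `max_limbs` (the practical limit for a body with the same number of modules) is passed in by the caller rather than computed, since its formula is not given here.

```python
from dataclasses import dataclass


@dataclass
class MorphologyCounts:
    brick_count: int
    hinge_count: int
    shortest_side: int   # shortest side of the rectangular envelope
    longest_side: int    # longest side of the rectangular envelope
    limbs: int           # modules with only one attached face (core excluded)
    max_limbs: int       # assumed to be supplied by the caller


def absolute_size(m: MorphologyCounts) -> int:
    """Bricks plus hinges plus the single core component."""
    return m.brick_count + m.hinge_count + 1


def proportion(m: MorphologyCounts) -> float:
    """Shortest side divided by longest side; 1.0 means a square envelope."""
    return m.shortest_side / m.longest_side


def relative_limbs(m: MorphologyCounts) -> float:
    """Limbs relative to the practical limit; 0.0 when no limb is possible."""
    return m.limbs / m.max_limbs if m.max_limbs > 0 else 0.0


# Illustrative snake-like body: a single long limb, mostly hinges.
snake = MorphologyCounts(brick_count=2, hinge_count=5,
                         shortest_side=1, longest_side=8,
                         limbs=1, max_limbs=4)
print(absolute_size(snake), proportion(snake), relative_limbs(snake))
```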
IV Experiment Results
IV-A Efficiency & Efficacy
Adding a lifetime learning capacity to the system increased the speed of the robots, as depicted in Figures 5 and 6. This was expected for two reasons: a) the number of evaluations performed by Evolution + Learning is around 100 times higher than by Evolution Only; b) in Evolution + Learning, robots have time to fine-tune their controllers to the morphologies they were born with. We are therefore more interested in which method shows a faster growth of the average speed. In Figure 5, the black line shows that around generation 20, Evolution + Learning had already reached an average speed that took Evolution Only the whole evolutionary period, i.e., 30 generations, to achieve. In other words, Evolution + Learning had created only 550 robots and spent 55,000 evaluations at generation 20, while Evolution Only had created 800 robots and spent 800 evaluations at generation 30. If we consider real physical robots, and assume that the production cost of each robot (around 4 hours) is substantially higher than the evaluation cost (around 30 seconds), we can clearly see the advantage of introducing learning.
IV-B Morphological Intelligence
In this paper, we consider a new concept, Morphological Intelligence. Morphology influences how the brain learns: some bodies are more suitable for the brain to learn with than others, and a better body can empower how well the brain learns. Therefore, we define the intelligence of a body as a measure of how well it facilitates the brain to learn and achieve tasks, in this case gait learning. We quantify morphological intelligence by the learning delta, that is, the speed after the parameters were learned minus the speed before the parameters were learned. Figure 8 shows that the average learning delta of the Evolution + Learning method grows across the generations.
This growth is very steady. The observation indicates that lifetime learning led the evolutionary search to exploit the high-performing morphological properties more quickly. In other words, the population turned faster into morphologies that are big and disproportional, with few, long limbs, which fit the brains better.
IV-C Morphological Descriptors
In [Miras2020a], a study using this same robot framework observed a strong selection pressure for robots with few limbs, most often one single, long limb, i.e., a snake-like morphology. Furthermore, that study demonstrated that explicitly adding a penalty for this morphological property did make the population develop multiple limbs; nevertheless, these robots were much slower than the single-limb ones. We selected 6 morphological traits which display a clear trend over the generations. Figure 9 shows the progression of the mean of the different morphological descriptors, averaged over 10 runs, for the entire population. We likewise observed a strong selection pressure for robots with few limbs, most often one single long limb, i.e., a snake-like morphology. Evolution + Learning robots tend to become bigger during evolution and are wider than robots evolved with Evolution Only. Robots from both methods tend to have fewer bricks, fewer limbs and more active hinges over the generations. We also observe that the populations evolve from having multiple limbs to having a single limb, leading to higher speed (Figures 10 and 11). A video showing examples of robots from both types of experiments can be found at https://www.youtube.com/watch?v=bH4A6sm4Umw.
V Conclusions and Future work
Firstly, if we measure time by the number of generations, learning boosts evolvability in terms of efficiency as well as efficacy, i.e., solution quality at termination, since its growth curve was steeper and ended higher than that of the Evolution Only method. Of course, this is not a surprise, since the learning version performs many more search steps. However, a learning trial (testing another controller) is much cheaper than an evolutionary trial (making another robot), so we can firmly conclude that adding lifetime learning to an evolutionary robot system is advantageous.
Secondly, we have witnessed a change in the evolved morphologies when lifetime learning was applied. In this paper, we introduced the concept of Morphological Intelligence and quantified it as the learning delta. The results show how the brain can shape the body and thus affect task performance, which in turn changes the fitness values that define selection probabilities during evolution.
For future work, we will work on Lamarckian evolution, which uses the inherited brain as a starting point and passes the learned traits on to the next generation. This means that the genotype of the brain of the population will change during the evolutionary process. To make this happen, we have to think carefully about a suitable genetic representation that allows us to change the genetic traits easily. Moreover, we will implement more complex tasks than just gait learning.