The Effects of Learning in Morphologically Evolving Robot Systems

by Jie Luo, et al.
Vrije Universiteit Amsterdam

When the controllers (brains) and morphologies (bodies) of robots evolve simultaneously, this can lead to a problem, namely the brain-body mismatch problem. In this research, we propose lifetime learning as a solution. We set up a system where modular robots can create offspring that inherit the bodies of their parents by recombination and mutation. For the brains of the offspring, we use two methods. The first entails solely evolution, which means the brain of a robot child is inherited from its parents. The second is evolution plus learning, which means the brain of a child is inherited as well, but is additionally developed by a learning algorithm, RevDEknn. We compare these two methods by running experiments in a simulator called Revolve, using efficiency, efficacy, and the morphological intelligence of the robots for the comparison. The experiments show that the evolution plus learning method leads not only to a higher fitness level, but also to more morphologically evolving robots. This constitutes a quantitative demonstration that changes in the brain can induce changes in the body, leading to the concept of morphological intelligence, which is quantified by the learning delta, meaning the ability of a morphology to facilitate learning.




I Introduction

In the field of Evolutionary Robotics, the majority of studies consider the evolution of brains in a fixed body. This is not surprising: the joint evolution of morphologies and controllers implies two search spaces, and the search space for the brain changes with every new robot body produced. Evolving the morphologies and controllers of robots simultaneously leads to a problem that was noted long ago, the body-brain mismatch problem [Eiben2013]: even if the parents have well-matching bodies and brains, recombination and mutation can shuffle the parental genotypes such that the resulting body and brain do not fit well, causing sub-optimal behaviour in the offspring. The proposed solution is the addition of learning, as phrased in “If it evolves it needs to learn” [Eiben2020].

The main goal of this research is to investigate the effects of learning in morphologically evolving robots. Our hypotheses are: 1) with learning, the same fitness level is achieved in less time than without learning; 2) learning leads not only to different fitness levels, but also to different robot morphologies.

To this end, we set up a system where (simulated) modular robots can reproduce and create offspring that inherit the parents’ morphologies by crossover and mutation. Regarding the controllers, we implement two methods. The first uses evolution only, which means the brain of a robot child is inherited from its parents. The second is evolution plus learning, which means the brain of a child is also inherited, but is additionally optimized by a learning algorithm. The comparison is based on three measures: efficiency, efficacy, and morphological intelligence.

II Related Work

II-A Evolvable morphology

The body-brain mismatch issue was noted long ago, and several approaches have been proposed to mitigate its effect on the population.

Cheney et al. [Cheney2014] implemented a form of novelty protection in which ‘younger’ robot designs were protected from individuals that had survived for more generations. Protecting a novel individual increases its chance to adapt the controller properly to its body. Novelty protection corresponds to implementing a single lifetime learning iteration every time a morphology is protected. Similarly, De Carlo et al. [de2020influences] implemented protection in the form of speciation within their NEAT algorithm. The preservation of diversity in the population allowed new morphologies to survive, thus reducing the effects of body-brain mismatch.

Nygaard, Samuelsen, and Glette [nygaard2017overcoming] demonstrated improvements in their evolutionary robotics (ER) system by introducing two phases during evolution. The first phase consists of both controller and morphology evolution, while during the second phase only the controller evolves in a fixed body. The results showed that, without the second phase, morphology and controller evolution led to sub-optimal controllers that required additional fine-tuning.

In this paper, we use the Triangle of Life framework (Figure 1) to integrate evolution and lifetime learning [Eiben2013]. The essence is to have newborn robots perform a learning process that optimizes their inherited brain quickly after birth. An important additional feature is that newborn robots are considered infertile (i.e., not eligible for reproduction) until they successfully finish the learning period. This prevents inferior genetic information from being propagated and thus saves resources.

Fig. 1: The Triangle of Life framework

II-B Controller learning algorithms

Similar research has been done recently [Diggelen], in which three learning algorithms were compared when applied to optimize the weights of CPG-based controllers: Evolutionary Strategies, Bayesian Optimization, and Reversible Differential Evolution (RevDE). The study shows that the shape of the fitness landscape for Evolutionary Strategies hints at a possible bias towards morphologies with many joints. This could be an unwanted property for lifetime learning, because we want an algorithm that works consistently on different kinds of morphologies. Bayesian Optimization is sample-efficient; however, it required much more time than the other two methods due to its higher time complexity. Therefore, RevDE performed best among these three algorithms.

Differential Evolution (DE) is a population-based evolutionary algorithm (EA) that samples new candidates by perturbing the current population [Storn1997]. Its three main components are as follows:

Differential mutation operator: a new candidate is generated by randomly picking a triplet from the population, $(\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_k)$; then $\mathbf{x}_i$ is perturbed by adding a scaled difference between $\mathbf{x}_j$ and $\mathbf{x}_k$, that is:

$$\mathbf{y} = \mathbf{x}_i + F\,(\mathbf{x}_j - \mathbf{x}_k), \tag{1}$$

where $F \in \mathbb{R}_{+}$ is the scaling factor.

Uniform crossover operator: the authors of [Storn1997] proposed to sample a binary mask $\mathbf{m} \in \{0, 1\}^D$ according to the Bernoulli distribution with probability $p = P(m_d = 1)$ shared across all $D$ dimensions, and to calculate the final candidate according to the following formula:

$$\mathbf{v} = \mathbf{m} \odot \mathbf{y} + (\mathbf{1} - \mathbf{m}) \odot \mathbf{x}_i, \tag{2}$$

where $\odot$ denotes element-wise multiplication.

Selection mechanism: the authors use the “survival of the fittest” approach, i.e., they combine the old population with the new one and select the $N$ candidates with the highest fitness values, i.e., the deterministic $(\mu + \lambda)$ selection.
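The three DE components can be sketched in a few lines of Python. This is a minimal illustration of classic DE as described above, assuming fitness maximization and plain lists as vectors; `de_generation` is a hypothetical helper name, not the implementation used in this work.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def de_generation(population: List[Vector], fitness: Callable[[Vector], float],
                  F: float = 0.5, p: float = 0.9,
                  rng: Optional[random.Random] = None) -> List[Vector]:
    """One generation of classic DE with deterministic (mu + lambda) selection.

    Fitness is maximized; the population must contain at least 3 vectors.
    """
    rng = rng or random.Random()
    n = len(population)
    offspring = []
    for _ in range(n):
        # Differential mutation, Eq. (1): y = x_i + F (x_j - x_k)
        x_i, x_j, x_k = rng.sample(population, 3)
        y = [a + F * (b - c) for a, b, c in zip(x_i, x_j, x_k)]
        # Uniform crossover, Eq. (2): m_d ~ Bernoulli(p), v = m*y + (1 - m)*x_i
        v = [yd if rng.random() < p else xd for yd, xd in zip(y, x_i)]
        offspring.append(v)
    # Selection: merge old and new candidates and keep the N fittest
    merged = population + offspring
    merged.sort(key=fitness, reverse=True)
    return merged[:n]
```

Because the old population takes part in selection, the best fitness can never decrease from one generation to the next.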

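However, the mutation operator in DE perturbs candidates using other individuals in the population to generate a single new candidate. As a result, too small a population could limit the exploration of the search space and lose diversity. To overcome this issue, a modification of DE, Reversible DE (RevDE), was proposed that uses all three individuals to generate three new points in the following manner [Tomczak2020]:

$$\begin{aligned}
\mathbf{y}_1 &= \mathbf{x}_i + F\,(\mathbf{x}_j - \mathbf{x}_k) \\
\mathbf{y}_2 &= \mathbf{x}_j + F\,(\mathbf{x}_k - \mathbf{y}_1) \\
\mathbf{y}_3 &= \mathbf{x}_k + F\,(\mathbf{y}_1 - \mathbf{y}_2)
\end{aligned} \tag{3}$$

The new candidates $\mathbf{y}_1$ and $\mathbf{y}_2$ are further used to calculate perturbations, so points outside the population take part in the mutation. This approach does not follow the typical construction of an EA, where only evaluated candidates are mutated. Further, we can express (3) as a linear transformation in matrix notation by introducing the following matrices:

$$\mathbf{Y} = \mathbf{R}\,\mathbf{X}, \quad
\mathbf{Y} = \begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} \mathbf{x}_i \\ \mathbf{x}_j \\ \mathbf{x}_k \end{bmatrix}, \quad
\mathbf{R} = \begin{bmatrix}
1 & F & -F \\
-F & 1 - F^2 & F + F^2 \\
F + F^2 & -F + F^2 + F^3 & 1 - 2F^2 - F^3
\end{bmatrix}$$

To obtain the matrix $\mathbf{R}$, we plug $\mathbf{y}_1$ into the second and third equations in (3), and then $\mathbf{y}_2$ into the last equation in (3). As a result, we obtain $M = 3N$ new candidate solutions, and the linear transformation $\mathbf{R}$ is reversible: each line of (3) can be solved for the point it perturbs, and $\det \mathbf{R} = 1$.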
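The closed form of R can be checked numerically. The sketch below applies (3) sequentially, compares the result with the matrix R, and inverts the three steps in reverse order, which is one way to see why the transformation is reversible. All function names are illustrative only.

```python
import random
from typing import List, Tuple

Vector = List[float]


def revde(x_i: Vector, x_j: Vector, x_k: Vector, F: float = 0.5) -> Tuple[Vector, Vector, Vector]:
    """Eq. (3): three new points from one triplet, computed sequentially."""
    y1 = [a + F * (b - c) for a, b, c in zip(x_i, x_j, x_k)]
    y2 = [b + F * (c - d) for b, c, d in zip(x_j, x_k, y1)]
    y3 = [c + F * (d - e) for c, d, e in zip(x_k, y1, y2)]
    return y1, y2, y3


def revde_inverse(y1: Vector, y2: Vector, y3: Vector, F: float = 0.5) -> Tuple[Vector, Vector, Vector]:
    """Undo Eq. (3) line by line, starting from the last equation."""
    x_k = [c - F * (a - b) for a, b, c in zip(y1, y2, y3)]
    x_j = [b - F * (k - a) for a, b, k in zip(y1, y2, x_k)]
    x_i = [a - F * (j - k) for a, j, k in zip(y1, x_j, x_k)]
    return x_i, x_j, x_k


def revde_matrix(F: float) -> List[List[float]]:
    """Closed-form R with [y1; y2; y3] = R [x_i; x_j; x_k]."""
    return [
        [1.0, F, -F],
        [-F, 1.0 - F**2, F + F**2],
        [F + F**2, -F + F**2 + F**3, 1.0 - 2 * F**2 - F**3],
    ]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

For F = 0.5 the determinant evaluates to 1, consistent with every line of (3) being a volume-preserving shear.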

However, generating 3N new candidates and evaluating all of them comes with an extra computational cost when running the simulator. In this paper, we use an advanced version of RevDE to alleviate this issue; it is introduced in the Algorithm subsection (III-D).

III Experiment Set-up

The experiments have been carried out in Revolve, a Gazebo-based simulator that enables us to test parts of the system as well as to set up an entire environment for the complete evolutionary process. All experiments were performed in an infinite plane environment to avoid any extra complexity. We ran two experiments. Experiment 1 runs evolution alone: controllers are inheritable, and the controller of the offspring is produced by applying crossover and mutation to the controllers of the parents. We refer to this experiment as Evolution Only throughout the paper. In experiment 2, controllers are not only evolvable but also learnable: the controller of the offspring is produced by a learning algorithm that starts with the inherited brain. We refer to this experiment as Evolution + Learning throughout the paper.

III-A Robot genotype (Body & Brain)

In this paper, we use a Lindenmayer system (L-system) as the genetic representation [Miras2020]. The grammar of an L-system is defined as a tuple G = (V, w, P), where
– V, the alphabet, is a set of symbols containing replaceable and non-replaceable symbols.
– w, the axiom, is a symbol from which the system starts.
– P is a set of production rules for the replaceable symbols.
The following didactic example illustrates the process of iterative rewriting of an L-system. For a given number of iterations, each replaceable symbol is simultaneously replaced by the symbols of its production rule. Given V = {X, Y, Z}, w = X and P = {X : XY, Y : Z, Z : XZ}, the rewriting goes as follows:

Iteration 0: X

Iteration 1: XY

Iteration 2: XYZ

Iteration 3: XYZXZ
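The rewriting in the didactic example can be reproduced with a short string-rewriting function. This is a toy sketch: the actual genotype of [Miras2020] uses symbols for modules and construction commands, whereas here every symbol is a single character and `rewrite` is a hypothetical helper.

```python
from typing import Dict


def rewrite(axiom: str, rules: Dict[str, str], iterations: int) -> str:
    """Iteratively rewrite an L-system string.

    In every iteration each replaceable symbol (a key of `rules`) is replaced
    simultaneously by its production; other symbols are copied unchanged.
    """
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s


if __name__ == "__main__":
    P = {"X": "XY", "Y": "Z", "Z": "XZ"}  # the didactic grammar
    for i in range(4):
        print(f"Iteration {i}: {rewrite('X', P, i)}")
```

Building the new string from the old one in a single pass is what makes the replacement simultaneous: a symbol produced in iteration i is only rewritten in iteration i + 1.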


The construction of a phenotype (robot) from a genotype (grammar) is done in the following steps: 1. the axiom of the grammar is rewritten into a more complex string of symbols according to the production rules of the grammar; 2. this string is decoded into a phenotype using two references, one for the morphology (pointing to the current module) and one for the controller (pointing to the current sensor and the current oscillator).

III-B Robot phenotype (Body)

The robots in Revolve are based on the RoboGen framework [Auerbach2014]. We use a subset of its 3D-printable components: one core component, one or more brick components, and one or more active hinges (see Figure 2). Each robot’s genotype describes its layout and consists of a tree structure, with the root node representing a core module from which further components branch out. Each component carries type-specific features, described by the genotypical encoding of that component type. These models are used in the simulation, but could also be used for 3D printing and constructing the real robots.

Fig. 2: Modules of robots: The active hinge component (A) is a joint moved by a servomotor with attachment slots on both ends; the brick component (B) is a smaller cube with attachment slots on its lateral sides; the core component (C), which holds a controller board with a battery, is a large brick with four attachment slots on its lateral faces.

III-C Robot phenotype (Brain)

We use Central Pattern Generator (CPG)-based controllers to drive the modular robots. CPGs are biological neural circuits that produce rhythmic outputs in the absence of rhythmic input. They are pairs of neurons ($x_i$, $y_i$) that drive rhythmic and stereotyped locomotion behaviors such as walking, swimming, and flying in vertebrate species, and they have been shown to perform well in modular robots [Ijspeert2007].

In this study, the controllers are optimized for gait learning. Each robot joint $i$ is associated with a CPG defined by three neurons, an $x_i$-neuron, a $y_i$-neuron, and an $o_i$-neuron, that are recursively connected as shown in Figure 3. The change of a neuron’s state is calculated by multiplying the activation value of the opposite neuron by a weight: the $x_i$-neuron and the $y_i$-neuron feed their activation values, multiplied by the weights $w_{x_i y_i}$ and $w_{y_i x_i}$, to the $y_i$-neuron and the $x_i$-neuron, respectively:

$$\dot{x}_i = w_{y_i x_i}\, y_i, \qquad \dot{y}_i = w_{x_i y_i}\, x_i.$$

We use a variant of the sigmoid function, the hyperbolic tangent (tanh), as the activation function of the $o_i$-neurons to bound the output value in $[-1, 1]$, because the rotation angle of the joints is limited.

Fig. 3: A single CPG. $i$ denotes the specific joint that is associated with this CPG. $w_{x_i o_i}$, $w_{x_i y_i}$, and $w_{y_i x_i}$ denote the weights of the connections between the neurons, and $out$ is the activation value of the $o_i$-neuron that controls the servo in joint $i$.
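A single CPG of this kind can be simulated numerically. The sketch below is an assumption-laden illustration: it assumes that the output neuron applies tanh to $x_i$ scaled by $w_{x_i o_i}$, that $w_{x_i y_i} = -w_{y_i x_i}$ (which makes the neuron pair oscillate), and it integrates with a semi-implicit Euler step. The initial state, time step, and weight values are illustrative, not those used in the experiments.

```python
import math
from typing import List


def simulate_cpg(w_xy: float, w_yx: float, w_xo: float, x0: float = 1.0,
                 y0: float = 0.0, dt: float = 0.01, steps: int = 1000) -> List[float]:
    """Integrate one CPG and return the output-neuron activation at every step.

    dx/dt = w_yx * y and dy/dt = w_xy * x; the joint command is tanh(w_xo * x).
    """
    x, y = x0, y0
    outputs = []
    for _ in range(steps):
        # semi-implicit (symplectic) Euler: update x from y, then y from the new x
        x += dt * w_yx * y
        y += dt * w_xy * x
        # tanh keeps the joint command strictly inside (-1, 1)
        outputs.append(math.tanh(w_xo * x))
    return outputs
```

With `w_xy=1.0` and `w_yx=-1.0` the pair behaves like x = cos t, y = sin t, so over 10 simulated seconds the output swings between roughly -0.76 and 0.76. In a robot with n joints there would be n such CPGs, i.e. 3n weights to learn.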

III-D Algorithm

It has been demonstrated that RevDE performs well in evolving controllers for modular robots for a given task [Diggelen]. However, it increases the computational cost of running the simulator by tripling the population. Here we introduce a surrogate model to overcome this issue: it uses a K-Nearest-Neighbor (K-NN) regressor to approximate the fitness values of the new candidates, and then selects the N most promising points [Weglarz-Tomczak2021]. We refer to this approach as RevDEknn.

The K-NN regression model is a non-parametric model that stores all previously seen individuals with their evaluations; the prediction for a new candidate solution is the average over the K closest previously seen individuals (Table I). In this paper, we set K = 3.

The algorithm works as follows:
(1) initialize a population with X samples;
(2) evaluate the fitness of the samples;
(3) perform a selection over the top N samples to obtain a population $P_1$ in the search space;
(4) randomly shuffle $P_1$ to create two additional populations ($P_2$, $P_3$), and create the new samples $V_1 = P_1 + F(P_2 - P_3)$, $V_2 = P_2 + F(P_3 - V_1)$ and $V_3 = P_3 + F(V_1 - V_2)$; apply uniform crossover with probability p between each $V_i$ and $P_i$;
(5) apply K-NN to predict the fitness values of the new samples based on the K = 3 closest previously seen samples, select the N most promising ones, and repeat from (2);
(6) terminate when the maximum number of generations is reached.

Following general recommendations in the literature [Pedersen2010] to obtain stable exploration/exploitation behaviour, the crossover probability p is fixed to 0.9 and the scaling factor F is fixed to 0.5.

RevDEknn  Value  Description
X  25  Initial population size
N  25  Top samples size
F  0.5  Scaling factor
p  0.9  Crossover probability
K  3  Number of nearest neighbors
–  10  Number of generations
TABLE I: Hyperparameters
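The RevDEknn steps and the Table I settings can be put together in a compact sketch. It is not the authors’ implementation: it assumes maximization, uniform random initialization in hypothetical bounds, top-N selection over all evaluated samples, and a plain Euclidean K-NN. For a robot with n CPGs, `dim` would be the 3n CPG weights and `fitness` a simulated gait trial; here any Python function can stand in. This sketch simulates 25 + 10 × 25 = 275 candidates, so its bookkeeping may differ from the 250 assessments reported in Section III-E.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def knn_predict(archive: List[Tuple[Vector, float]], x: Vector, k: int = 3) -> float:
    """Surrogate fitness: mean fitness of the k archived points nearest to x."""
    nearest = sorted(archive, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[:k]
    return sum(f for _, f in nearest) / len(nearest)


def revde_knn(fitness: Callable[[Vector], float], dim: int, pop_size: int = 25,
              top_n: int = 25, F: float = 0.5, p: float = 0.9, k: int = 3,
              generations: int = 10, bounds: Tuple[float, float] = (-1.0, 1.0),
              seed: int = 0) -> Tuple[Vector, float, int]:
    """Maximize `fitness`; returns (best vector, best fitness, number of evaluations)."""
    rng = random.Random(seed)
    # Steps (1)-(2): random initial population, evaluated and stored in the archive
    archive = [(x, fitness(x)) for x in
               ([rng.uniform(*bounds) for _ in range(dim)] for _ in range(pop_size))]
    for _ in range(generations):
        # Step (3): the top-N evaluated samples form P1
        archive.sort(key=lambda item: item[1], reverse=True)
        p1 = [x for x, _ in archive[:top_n]]
        # Step (4): two shuffled copies of P1, then the RevDE transformation of Eq. (3)
        p2, p3 = rng.sample(p1, len(p1)), rng.sample(p1, len(p1))
        candidates = []
        for a, b, c in zip(p1, p2, p3):
            v1 = [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]
            v2 = [bi + F * (ci - v1i) for bi, ci, v1i in zip(b, c, v1)]
            v3 = [ci + F * (v1i - v2i) for ci, v1i, v2i in zip(c, v1, v2)]
            for v, parent in ((v1, a), (v2, b), (v3, c)):
                # uniform crossover with probability p between V_i and P_i
                candidates.append([vd if rng.random() < p else pd for vd, pd in zip(v, parent)])
        # Step (5): the K-NN surrogate ranks all 3N candidates; only the N best are simulated
        candidates.sort(key=lambda x: knn_predict(archive, x, k), reverse=True)
        archive += [(x, fitness(x)) for x in candidates[:top_n]]
    best_x, best_f = max(archive, key=lambda item: item[1])
    return best_x, best_f, len(archive)
```

The surrogate only decides which candidates are worth simulating; every fitness value stored in the archive comes from a real evaluation.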

In this paper, we apply RevDEknn to change the weights of the CPGs of modular robots: for a robot with n CPGs, there are 3n weights to optimize for the task of gait learning. The outer loop is an evolutionary algorithm (EA). The whole process is illustrated in Figure 4.

Fig. 4: Evolution + Learning Framework: This is a general framework to make embodied robots via two interacting adaptive processes. An outer loop of evolution optimizes robot morphology via variation (mutation & crossover) operations and an inner RevDEknn learning loop optimizes the parameters of a CPG controller (the yellow box). In the Evaluation box, we show examples of a robot morphology, controller, environment, behavior and fitness value.

The code for carrying out the experiments is available online. The pseudocode combining the EA and RevDEknn is shown below:

INITIALIZE robot population (genotypes + phenotypes with body and brain)
EVALUATE each robot (evaluation delivers a fitness value)
while not STOP-EVOLUTION do
    SELECT parents (based on fitness)
    RECOMBINE+MUTATE parents’ bodies (genotype)
    RECOMBINE+MUTATE parents’ brains (genotype)
    CREATE offspring robot body (phenotype)
    CREATE offspring robot brain (phenotype)
    INITIALIZE brain(s) for the learning process
    while not STOP-LEARNING do
        ASSESS offspring (performance value)
        GENERATE new brain for offspring
    end while
    EVALUATE offspring with learned brain (fitness value)
    SELECT survivors / UPDATE population
end while
Algorithm 1: EA+RevDEknn

III-E Experiment parameters

An initial population of 50 robots is randomly generated in the first generation. In each generation, 25 offspring are produced by selecting 25 pairs of parents through binary tournaments (with replacement) and creating one child per pair by crossover and mutation. From the top 25 parents plus 25 offspring, 50 individuals are selected for the next generation. The evolutionary process is terminated after 30 generations. In this research, the fitness value and the performance value are the same. Therefore, for running the Evolution Only experiment, we perform 50 + 25 × 30 = 800 fitness evaluations.
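One generation of the outer evolutionary loop with these parameters can be sketched as follows. The genome type, `make_child` and `evaluate` are hypothetical placeholders; in Evolution + Learning, `evaluate` would first run RevDEknn on the offspring’s inherited brain and return the fitness of the learned brain. Survivor selection is written as a (μ + λ) truncation over parents and offspring, which is our reading of the (μ + λ) selection mentioned in Fig. 10 rather than a confirmed detail.

```python
import random
from typing import Any, Callable, Dict, List, Optional

Individual = Dict[str, Any]  # {"genome": ..., "fitness": float}


def binary_tournament(population: List[Individual], rng: random.Random) -> Individual:
    """Draw two individuals uniformly with replacement and return the fitter one."""
    a, b = rng.choice(population), rng.choice(population)
    return a if a["fitness"] >= b["fitness"] else b


def next_generation(population: List[Individual],
                    make_child: Callable[[Any, Any, random.Random], Any],
                    evaluate: Callable[[Any], float],
                    n_offspring: int = 25,
                    rng: Optional[random.Random] = None) -> List[Individual]:
    """Produce n_offspring children (one per tournament-selected pair) and select survivors."""
    rng = rng or random.Random()
    offspring = []
    for _ in range(n_offspring):
        mother = binary_tournament(population, rng)
        father = binary_tournament(population, rng)
        genome = make_child(mother["genome"], father["genome"], rng)  # crossover + mutation
        offspring.append({"genome": genome, "fitness": evaluate(genome)})
    # assumed (mu + lambda) truncation: keep the len(population) fittest of parents + offspring
    pool = population + offspring
    pool.sort(key=lambda ind: ind["fitness"], reverse=True)
    return pool[:len(population)]
```

Starting from 50 evaluated robots and running 30 generations calls `evaluate` 50 + 30 × 25 = 800 times, matching the Evolution Only budget above.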

In the Evolution + Learning experiment, for each evolutionary process, we tested RevDEknn on gait learning with 750 learning trials (25 initial population × 3 by RevDE × 10 generations) and 250 assessments (750 divided by the K = 3 predictions) per robot, to simulate the robot’s limited field of view in the real world. This resulted in 800 × 250 = 200,000 fitness evaluations.

We set the evaluation time to 30 seconds to balance computing time against accurately evaluating a task such as gait learning, in which the fitness is the speed (cm/s) of the robot’s displacement in any direction, notated as $v = \lvert x_t - x_0 \rvert / t$, where $x_0$ is the x coordinate of the robot’s center of mass at the beginning of the simulation, $x_t$ is the x coordinate of the robot’s center of mass at the end of the simulation, and $t$ is the duration of the simulation.

To sum up, for running these two experiments, we perform 200,800 evaluations, which amounts to 200,800 × 30 s ≈ 1,673 hours of (simulated) time. In practice, it takes about 0.7 days to run these experiments on five computers with an Intel i7 CPU. All experiments are repeated 10 times independently to get a robust assessment of the performance per data set. The experimental parameters are listed in Table II.

Parameters Value Description
Population size  50 Number of individuals per generation
Offspring size  25 Number of offspring produced per generation
Mutation  0.8 Probability of mutation for individuals
Crossover  0.8 Probability of crossover for individuals
Generations  30 Termination condition for each run
Learning trial  750 Number of the RevDEknn learning trials
Evaluation time  30 Duration of the test period per fitness evaluation in seconds
Tournament size  2 Number of individuals used in tournament selection
Repetitions  10 Number of repetitions per experiment (each robot + gait-learning)
TABLE II: Main experiment parameters
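The evaluation counts in this section follow from the parameters in Tables I and II. The small calculation below reproduces them, assuming that every one of the 800 robots in Evolution + Learning receives the 250 simulated assessments; the parameter names are hypothetical.

```python
from typing import Dict


def evaluation_budget(pop_size: int = 50, offspring: int = 25, generations: int = 30,
                      revde_pop: int = 25, revde_factor: int = 3, revde_generations: int = 10,
                      knn_k: int = 3, eval_seconds: int = 30) -> Dict[str, float]:
    robots = pop_size + offspring * generations                      # robots created per run
    learning_trials = revde_pop * revde_factor * revde_generations   # candidate brains proposed
    assessments = learning_trials // knn_k                           # simulated per robot
    evo_only = robots                                                # one evaluation per robot
    evo_learning = robots * assessments                              # assumed: every robot learns
    total = evo_only + evo_learning
    return {
        "robots": robots,
        "learning_trials": learning_trials,
        "assessments": assessments,
        "evo_only": evo_only,
        "evo_learning": evo_learning,
        "total": total,
        "simulated_hours": total * eval_seconds / 3600,
    }
```

With the defaults this gives 800 robots, 750 learning trials, 250 assessments, 200,000 learning evaluations, a total of 200,800 evaluations, and about 1,673 simulated hours.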

III-F Performance measures

To compare the two methods, we consider three performance indicators: efficiency, efficacy, and morphological intelligence.

III-F1 Efficiency & Efficacy

We measure efficacy by the quality achieved at the end of the evolutionary process. Since we consider gait learning here, quality is defined by the speed of the robot. As this measure can be sensitive to ‘luck’, we obtain more useful statistics by averaging over 10 different runs: the efficacy of a method is defined as the mean best fitness averaged over the 10 independent repetitions. Efficiency indicates how quickly a method finds its best solution; in the results, we assess it by the number of generations needed to reach a given fitness level.
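A minimal sketch of how efficacy and an efficiency proxy could be computed from per-run fitness logs. The data layout (one list of per-generation values per run) and the threshold criterion in `generations_to_reach` are assumptions for illustration, mirroring how the results compare methods by the generation at which a fitness level is reached.

```python
from statistics import mean
from typing import List, Optional, Sequence


def efficacy(best_fitness_per_run: Sequence[float]) -> float:
    """Mean best fitness at termination, averaged over independent repetitions."""
    return mean(best_fitness_per_run)


def mean_curve(runs: Sequence[Sequence[float]]) -> List[float]:
    """Per-generation mean over runs; each run is a per-generation fitness series."""
    return [mean(values) for values in zip(*runs)]


def generations_to_reach(curve: Sequence[float], target: float) -> Optional[int]:
    """Index of the first generation whose value reaches `target`, or None if never."""
    for generation, value in enumerate(curve):
        if value >= target:
            return generation
    return None
```

For example, with a target equal to the final value of the Evolution Only curve, `generations_to_reach` on the Evolution + Learning curve returns the generation at which learning caught up.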

III-F2 Morphological Descriptors

For quantitatively assessing morphological traits of the robots, we utilized the following set of descriptors:

Absolute Size: the total number of modules of a robot body, i.e., the sum of all structural bricks, hinges, and the one core component with the controller board.

absolute_size = brick_count + hinge_count + 1

Width: The width of a body, excluding sensors.

Proportion: the length-width ratio of the rectangular envelope around the morphology, defined as:

$$P = \frac{p_s}{p_l},$$

where $p_s$ is the shortest side of the morphology and $p_l$ is the longest side.

Number of Bricks: The number of structural bricks in the morphology.

Relative Number of Limbs: the number of extremities of a morphology relative to a practical limit, defined as:

$$L = \frac{l}{l_{\max}},$$

where $m$ is the total number of modules in the morphology, $l$ is the number of modules that have only one face attached to another module (except for the core component), and $l_{\max}$ is the maximum number of modules with one face attached that a morphology with $m$ modules could have if the same number of modules were arranged differently.

Number of Active Hinges: the number of active hinges in the morphology. An active hinge is a joint that has a motor; a non-active hinge is a passive joint (just the ‘bendable’ plastic).

We use the above six morphological descriptors to capture relevant robot morphological traits, and quantify the correlations between controller & morphology search spaces.
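Several of these descriptors can be computed directly from a module tree. The sketch assumes a hypothetical 2D grid layout, treats every hinge as active, and takes $l_{\max}$ as an input because its closed form is not given here; width is omitted for the same reason. It is illustrative, not the descriptor code used in the experiments.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Module:
    kind: str               # "core", "brick" or "hinge" (hinges assumed active)
    x: int                  # hypothetical 2D grid position of the module
    y: int
    parent: Optional[int]   # index of the module it is attached to (None for the core)


def descriptors(modules: List[Module], l_max: int) -> Dict[str, float]:
    n_bricks = sum(m.kind == "brick" for m in modules)
    n_hinges = sum(m.kind == "hinge" for m in modules)
    # count attached faces: each parent link occupies one face on both modules
    attached = [0] * len(modules)
    for i, m in enumerate(modules):
        if m.parent is not None:
            attached[i] += 1
            attached[m.parent] += 1
    # limbs: non-core modules with exactly one attached face
    limbs = sum(1 for m, a in zip(modules, attached) if m.kind != "core" and a == 1)
    # sides of the rectangular envelope, in module units
    side_x = max(m.x for m in modules) - min(m.x for m in modules) + 1
    side_y = max(m.y for m in modules) - min(m.y for m in modules) + 1
    return {
        "absolute_size": n_bricks + n_hinges + 1,  # +1 for the single core component
        "proportion": min(side_x, side_y) / max(side_x, side_y),
        "bricks": n_bricks,
        "active_hinges": n_hinges,
        "relative_limbs": limbs / l_max if l_max > 0 else 0.0,
    }
```

For a four-module snake (core, hinge, brick, hinge in a row) with $l_{\max} = 3$ (a core with three attached modules would have three limbs), this gives an absolute size of 4, a proportion of 0.25, and a relative number of limbs of 1/3.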

IV Experiment Results

IV-A Efficiency & Efficacy

Adding a lifetime learning capacity to the system increased the speed of the robots, as depicted in Figures 5 and 6. This was expected for two reasons: a) the number of evaluations performed by Evolution + Learning is around 100 times higher than by Evolution Only; b) in Evolution + Learning, robots have time to fine-tune their controllers to the morphologies they were born with. Rather than in the final levels, we are therefore interested in which method shows a faster growth of the average speed. In Figure 5, the black dotted line shows that around generation 20, Evolution + Learning had already reached the average speed that took Evolution Only the whole evolutionary period, i.e., 30 generations, to achieve. In other words, by generation 20, Evolution + Learning had created only 550 robots and spent 55,000 evaluations, while Evolution Only needed 800 robots and 800 evaluations by generation 30. For real physical robots, and assuming that the production cost of each robot (around 4 hours) is substantially higher than the evaluation cost (around 30 seconds), this shows the advantage of introducing learning.

Fig. 5: The blue and purple lines show the progression of the mean fitness over 30 generations (averaged over 10 runs). The blue and purple dotted lines show the maximum fitness per generation. The black dotted lines mark the generations at which the Evolution + Learning method (after learning) reached the fitness levels that the Evolution Only method reached only at the end of the evolutionary period.
Fig. 6: Comparison of fitness in the final generations. The blue triangles show the mean fitness. The differences in the boxplot are non-significant according to Wilcoxon tests.
Fig. 7: Comparison of maximum fitness in the final generations. The blue triangles show the mean fitness. The differences in the boxplot are non-significant according to Wilcoxon tests.

IV-B Morphological Intelligence

In this paper, we consider a new concept: Morphological Intelligence. Morphology influences how the brain learns: some bodies are more suitable for learning than others, and a better body can empower how well the brain learns. Therefore, we define the intelligence of a body as a measure of how well it facilitates the brain to learn and achieve tasks, in this case gait learning. We quantify morphological intelligence by the learning delta, i.e., the speed after the parameters were learned minus the speed before the parameters were learned. Figure 8 shows that the average learning delta of the Evolution + Learning method grows across the generations.

This growth is very steady. The observation indicates that lifetime learning led the evolutionary search to exploit high-performing morphological properties more quickly. In other words, the population converged faster to morphologies that are big and disproportionate, with few, long limbs, which fit the brains better.

Fig. 8: Learning Δ: average speed after the parameters were learned minus average speed before the parameters were learned. Progression of the mean over 10 runs of the population.
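The learning delta behind Figure 8 reduces to a per-robot difference averaged per generation. In this sketch each generation is a list of (speed before, speed after) pairs, a hypothetical data layout; the speeds themselves would come from simulated gait trials with the inherited and the learned brain.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def learning_delta(speed_before: float, speed_after: float) -> float:
    """Speed with the learned CPG weights minus speed with the inherited weights."""
    return speed_after - speed_before


def mean_delta_per_generation(
        generations: Sequence[Sequence[Tuple[float, float]]]) -> List[float]:
    """Average learning delta of each generation; items are (before, after) speeds."""
    return [mean(learning_delta(before, after) for before, after in gen)
            for gen in generations]
```

A steadily increasing output of `mean_delta_per_generation` corresponds to the growing curve in Figure 8.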

IV-C Morphological Descriptors

In [Miras2020a], a study using the same robot framework observed a strong selection pressure for robots with few limbs, most often one single, long limb, i.e., a snake-like morphology. Furthermore, they demonstrated that when a penalty for this morphological property was explicitly added, the population did develop multiple limbs; nevertheless, these robots were much slower than the single-limb ones. We selected six morphological traits that display a clear trend over generations. Figure 9 shows the progression of the mean of the different morphological descriptors, averaged over 10 runs, for the entire population. We also observed a strong selection pressure for robots with few limbs, most often one single, long limb. Evolution + Learning robots tend to become bigger over the course of evolution and wider than robots evolved with Evolution Only. Robots from both methods tend to have fewer bricks and limbs and more active hinges over the generations. We also observe that the population evolves from having multiple limbs to having a single limb, which leads to higher speed (Figures 10 and 11). A video showing examples of robots from both types of experiments can be found in

Fig. 9: We selected six morphological traits that show a clear trend over generations. Progression of the mean of different morphological descriptors averaged over 10 runs for the entire population. The shaded region denotes the 95% bootstrapped confidence interval. Evolution + Learning robots tend to be bigger (a) along the evolution and wider (b) than robots evolved with Evolution Only. Proportion (c) is the ratio of width and length of the robot morphology. Robots from both methods tend to have fewer bricks (d) and limbs (e) and more active hinges (f) over generations.

Fig. 10: (a) An example of the 50 randomly generated initial morphologies in a run; using (μ + λ) selection, 25 new offspring were created per generation. (b) The morphologies of the best robots of each run for both methods, with their fitness values.
Fig. 11: Learning/evolving robots on a plain terrain. Morphological changes over generations.

V Conclusions and Future work

Firstly, if we measure time by the number of generations, learning boosts evolvability in terms of efficiency as well as efficacy (i.e., solution quality at termination), since its growth curve was steeper and ended higher than that of the Evolution Only method. Of course, this is not a surprise, since the learning version performs many more search steps. However, a learning trial (testing another controller) is much cheaper than an evolutionary trial (making another robot), so we can conclude that adding lifetime learning to an evolutionary robot system is advantageous.

Secondly, we have witnessed a change in the evolved morphologies when lifetime learning was applied. In this paper, we introduced a concept, Morphological Intelligence, and quantified it as the learning delta. The results show how the brain can shape the body and thus affect task performance, which in turn changes the fitness values that define selection probabilities during evolution.

For future work, we will investigate Lamarckian evolution, which uses the inherited brain as the starting point and passes the learned traits on to the next generation. This means that the genotype of the brains in the population will be changed during the evolutionary process. To make this happen, we have to carefully consider a suitable genetic representation that allows the genetic traits to be changed easily. Moreover, we will implement more complex tasks than gait learning.