A Staged Approach to Evolving Real-world UAV Controllers

05/26/2019 ∙ by Gerard David Howard, et al. ∙ CSIRO

A testbed has recently been introduced that evolves controllers for arbitrary hover-capable UAVs, with evaluations occurring directly on the robot. To prepare the testbed for real-world deployment, we investigate the effects of state-space limitations brought about by physical tethering (which prevents damage to the UAV during stochastic tuning), on the generality of the evolved controllers. We identify generalisation issues in some controllers, and propose an improved method that comprises two stages: in the first stage, controllers are evolved as normal using standard tethers, but experiments are terminated when the population displays basic flight competency. Optimisation then continues on a much less restrictive tether, effectively free-flying, and is allowed to explore a larger state-space envelope. We compare the two methods on a hover task using a real UAV, and show that more general solutions are generated in fewer generations using the two-stage approach. A secondary experiment undertakes a sensitivity analysis of the evolved controllers.




1 Introduction

Rapidly-advancing Additive Manufacturing technologies are leading a shift from mass production of identical simulacrums to provision of one-off, bespoke systems. Robotics is particularly suited to benefit from this shift, which permits rapid prototype-and-test of bespoke robotic systems with specialised morphologies to achieve heightened environmental/task-specific performance. In such an approach, robots may appear in a wide variety of morphological compositions, with varied payloads. Behaviours must be generated to fully harness this diversity of morphologies. Evolutionary Algorithms (EAs) are ideal for this task as they are problem- and platform-agnostic optimisers that can account for morphology, payload, environmental and task requirements (Eiben and Smith, 2003).

Although EAs are a promising suite of optimisers for this task, their dependence on stochastic search processes leads to difficulties when optimising on real hardware, mainly around time requirements, repeatability, and accurate fitness assessment. As a consequence, the push towards bespoke robotics requires a concurrent push towards automated learning facilities (herein referred to as testbeds (Heijnen et al., 2017; Faina et al., 2017)), where robots will have their behaviours and morphologies optimised. Key characteristics of such testbeds include: closed-loop operation (no human intervention required), robot-agnostic optimisation, and the ability to produce robots that work in the real world.

UAVs are a ubiquitous example of general-purpose robots being required to perform a rapidly-expanding variety of increasingly specific tasks (e.g., tracking, mapping, inspection, repair, and delivery). As these tasks become more challenging (or more niche), the ability of a general-purpose UAV to perform well significantly decreases. As such, UAVs are highly likely to benefit from the development of systems that permit their specialisation for increased task performance, an increased range of deployment scenarios, and more favourable mission outcomes.

To address this need, we recently introduced a testbed (Fig. 1) that optimises controllers for arbitrary UAVs (Howard and Merz, 2015; Howard, 2017a) [Footnote 1: Herein we use ’UAV’ to refer to any hover-capable multirotor, with an airframe size .]. To date, we have already demonstrated (i) repeatable evolutionary experimentation to test hypotheses in the real world, and (ii) optimising UAVs with unconventional payloads and physical setups that would not be otherwise flyable.

Figure 1: (a) A hexacopter in flight. (b) The testbed, showing (1) the fan, (2) camera, (3) UAV, (4) physical tether, (5) data/power tether, and (6) light. The camera height is 200cm and padded floor area is 271cm


This article covers the development of our testbed from a prototyping to a deployment stage, ensuring that our testbed reliably generates controllers that are more generalised to real-world conditions. The main scientific contributions of this work are:

  1. The development of a two-staged technique (a form of incremental learning (Rossi and Eiben, 2014)) that works in hardware on real UAVs and reliably generates controllers suitable for real UAV deployments;

  2. Statistical comparisons between the new and old methods, and;

  3. A comprehensive analysis of the best controllers to ascertain how close they are to optimal values.

We pose the following research questions:

  1. Given that evolutionary performance is critically dependent on the amount of state space available (Sutton and Barto, 1998), can we improve the optimisation process by increasing the amount of state space experienced during training enough that the evolved controllers can generalise to unseen waypoints, as found in real-world missions? Can we balance this desire with the requirement for a safe testing platform?

  2. Can we show that controllers evolved through this improved optimisation are optimal for the mission/payload/UAV considered?

Results will guide the development of further research testbeds, and, as all current learning techniques are sensitive to the state-space the algorithm experiences, are broadly applicable to robot learning in general.

1.1 Background

To place our work in the broader context of relevant literature, we now briefly summarise research in evolving UAV control, and robot testbed development.

1.1.1 Evolving UAV controllers

EAs typically require a substantial number of evaluations to discover promising solutions due to the underlying mechanisms of iterative, population-based, stochastic search. In addition, early generations are largely comprised of low-performance controllers. Simulation, e.g., (De Nardi et al., 2006; Howard and Elfes, 2014; Koppejan and Whiteson, 2009), is therefore the preferred methodology when evolving UAV controllers, as (i) it cannot physically damage the UAV, (ii) the environment is fully controllable, and (iii) evaluations are parallelisable and generally run faster than real-time.

All simulations, no matter how complex, necessarily abstract reality to some degree. As such, a continuing research focus is to cross the ’reality gap’ (Jakobi et al., 1995), and replicate simulated performance on a real robot. Despite continuing efforts, including coevolutionary model learning (Holland and Nardi, 2008), specifically selecting for transferability (Koos et al., 2010), and manual post-evolution rule-tweaking (Scheper et al., 2016), the only way to guarantee performance in reality is to evolve in reality.

Real-world attempts to evolve UAV controllers are few in number due to inherent difficulties of stochastic optimisation in hardware. Control of a blimp is successfully evolved (Floreano et al., 2003) in a large, open space, but the slow dynamics of the blimp simplifies the control problem, as well as trivialising recovery from dangerous states. Height and yaw control of a miniature helicopter (Gongora et al., 2009) are evolved, although other degrees of freedom are neglected to prevent suboptimal controllers from potentially damaging the UAV. Although there have been some attempts to perform exhaustive analysis of evolved solutions on real systems (Gongora et al., 2009), it is not a well-studied area of research.

1.1.2 Robotic testbeds

Real-world evolution implies the need for a robotic testbed, as defined in Section 1. Testbeds provide a controllable, measurable evaluation environment. They are chiefly used for field testing; ensuring that the system can handle real-world (i) sensor and actuator noise (How et al., 2008), (ii) software issues (Nishiwaki et al., 2000), (iii) environmental conditions (e.g., underground (Acar et al., 2001), space (Samuele et al., 2010)), and (iv) mission requirements (e.g., multi-robot coordination (Acar et al., 2001)). They focus on state estimation to evaluate performance, and the robots are typically manually controlled for missions with a duration from 5 minutes to a couple of hours.

Performing iterative optimisation in addition to evaluation imposes additional requirements on the testbed. Such testbeds should consider (i) repeatable evaluations to ascertain fair, reliable optimisation scores, (ii) automated experimental management, to reduce the human requirement through multiple optimisation generations and remove the requirement to manually operate the robot under evaluation, and (iii) provision of power, again to reduce human intervention whilst making the optimisation tractable in time.

One of the earliest examples (1994) approximated a differential-drive ground robot with a gantry-mounted camera, thus removing issues with motor wear and power supply (Harvey et al., 1994). Technology advancements later allowed real robots to evolve on a testbed — examples include gait optimisation for legged robots (Yosinski et al., 2011; Degrave et al., 2015). More recent work includes multi-objective optimisation of legged robot behaviours as a fully closed-loop system (Heijnen et al., 2017), and a testbed that can alter its layout via a robot arm and vision/marker system (Faina et al., 2017), to evolve controllers for difficult environments by incrementally bootstrapping them in simpler ones.

Aside from (Heijnen et al., 2017), human intervention is required to, e.g., change batteries, reset the robot if it moves out of bounds, etc. One notable example successfully evolves hardware UAV controllers (Ghiglino et al., 2015), however state estimation requires an expensive infra-red tracking system, and frequent human intervention is required to change batteries.

Our testbed is fully described in (Howard, 2017b, a), and is unique in that it can run back-to-back optimisations on a real UAV in a fair, repeatable manner, indefinitely, and without human intervention. High-performance controllers are generated for real UAVs that are specific to the mission, payload, and hardware state of the UAV being optimised. The most recent iteration of the testbed (Howard, 2017b) optimises a controller for an arbitrary UAV (tested on two different quadrotors and a hexacopter) from scratch in 3-4 hours. The testbed has also optimised control for a UAV carrying a heavy off-centre payload; something commercially-available self-tuning autopilots cannot do (Howard, 2017a).

As evaluations using the testbed occur in real-time, subsequent improvements focused on reducing the number of evaluations required. Self-adaptive mutation rates, e.g., (Rechenberg, 1973), are used as a form of experimental process optimisation, and adapt to the experimental specifics (UAV, payload, and environment), which significantly reduces the number of evaluations needed compared to algorithmically-determined static rate settings (Howard, 2017a). Rate restart strategies (Howard, 2017b) make experiment times more predictable by reducing variance in the total number of evaluations required.

This research focuses on ensuring the feasibility of the controllers in real-world settings, including (i) generalisation analysis of evolved controllers, and (ii) methods that improve the ability of the controllers to function in the real world. This work represents the final, critical, step before deployment for our UAV operations.

2 Material and Methods

2.1 Testbed

The testbed, which is exhaustively described in (Howard, 2017a), comprises a solid floor which is covered with foam matting. Before an experiment starts, the target UAV is set up in the testbed (Fig. 1). Flipping (tilt angles ), and excessive rotation () of the UAV are prohibited by two nylon wires, which are attached to the UAV and the centre of the testbed’s floor. A camera, mounted centrally on a mesh-covered metal frame, provides position estimates. Wind disturbances of 5m/s are provided by a fan, with a total traversal angle of and oscillation period of 10 seconds. A 24V power tether permits continuous operation, and a serial cable connects to the host PC, which manages and monitors experiments using the real-time Extended State Machine (ESM) framework (Merz et al., 2006). Our testbed provides a comprehensive set of functions to enable closed-loop, automated UAV controller generation:

  • Accurate state estimation for control and evaluation, using a conventional camera (position) and AHRS (attitude), with no expensive marker-based tracking required.

  • Performance evaluation of controllers in reality, in a confined space.

  • Robot-agnostic controller evolution for any robot capable of hover, with no models or simulator required, automatically taking into account payload and hardware variability.

  • Continuous experimentation, including 24/7 evolution, experimental monitoring and statistics recording for repeatable, fair tests. The testbed can carry out back-to-back experiments for over a week without human intervention.

  • Health monitoring of dangerous states, with UAV recovery. Error detection algorithms identify and re-run erroneous trials. [Footnote 2: Typical causes include, e.g., the tracking LED being obscured, or data link errors.]

  • Environmental interactions, e.g., wind, can be included in the evaluations.

2.2 Controllers & State Estimation

The goal is to optimise the PID controller of the UAV, which has a nested structure shown in Fig. 2. PID control aims to minimise the error of the UAV during flight, which we take as the deviation between the UAV's actual pose and desired pose. Pose consists of 6 elements, or Degrees of Freedom (DoF): roll, pitch, yaw, height (h), and lateral position in North (N) and East (E). Three gains (P, I, and D) control the error response per DoF. Each controller is therefore parameterised by a gain set of 18 real-valued gains (6 DoFs, 3 gains per DoF).

Figure 2: PID control structure, showing attitude and position loops. Parameters denote error limits for height yaw and attitude respectively. and are minimum and maximum motor commands, and , , , and are command inputs to a mixer which outputs speed controller commands to (assuming a hexacopter).

At the start of an experiment, we bootstrap our population by repeatedly randomly initialising gain sets within allowed ranges (1). Here is a generalised maximum possible command (Pulse Width Modulation, or PWM) for each parameter : =300 for , , , and , and 20cm for /. The gain set is accepted into the initial population if it allows the UAV to stay in the air for s when attempting to hover at a height of 10cm with all other DoFs neutral. When the population reaches its full size (20 in the experiments reported here), the first generation begins.


We evaluate each gain set by loading it into the PID structure. To ensure a fair, repeatable test, the UAV is reset to a designated start position (in the centre of the floor area with 60) between evaluations. Once reset, the UAV attempts to fly a predefined waypoint set, where each waypoint defines a desired setting of position and yaw: (N, E, h, yaw). To successfully minimise error in the position waypoints, roll and pitch error must also be minimised; hence all 6 DoFs are optimised by these waypoints. Note that the fan is not reset between trials; this prevents overspecialisation but exposes different controllers to different wind conditions. To ensure a fair test we evaluate any successful controller 3 times in total to prove generality to wind conditions, and use an average fitness score (see Section 2.3).

As our controller responds to error, we must continually estimate the UAV’s state to compare to our desired waypoint. An onboard Inertial Measurement Unit (IMU, a Microstrain GX4-25) estimates roll, pitch, and yaw (250Hz), and height (20Hz) when combined with an optical range finder. An external camera estimates position in North and East at 60Hz. Angular velocities (250Hz) are derived from two consecutive Euler angles, and position velocities (60Hz) are calculated through a linear regression of five consecutive position estimates. Height is processed through a complementary filter; the Kalman filters integrated into the IMU were bypassed and manually re-implemented to give us full control and observability of the system.
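The regression-based velocity estimate can be illustrated with a minimal sketch. It assumes an ordinary least-squares line fit over the five most recent samples and takes the slope as the velocity; the function name and the synthetic track are hypothetical, and the testbed's actual implementation is not given in the text.

```python
def regression_velocity(times, positions):
    """Least-squares slope of position against time (position units per second)."""
    n = len(times)
    if n != len(positions) or n < 2:
        raise ValueError("need at least two matching samples")
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


# Synthetic example (not testbed data): five camera samples at 60Hz,
# taken from a track that moves North at a constant 0.5 m/s.
dt = 1.0 / 60.0
times = [i * dt for i in range(5)]
north = [0.5 * t for t in times]
v_north = regression_velocity(times, north)
```

Fitting a line over several samples, rather than differencing two, trades a small amount of latency for reduced sensitivity to noise in any single camera frame.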

The PID uses the state estimate to calculate errors in all DoFs. Errors are taken as the difference between the waypoint value and the UAV’s current value per DoF. Each error is limited (10cm for , 15 for attitude, 15cm for ) before being input to the PID.

The PID takes these error signals, and in response produces four outputs (, , , and ), which represent commanded changes in roll, pitch, yaw, and thrust, scaled to fall in the range of possible motor PWMs (=1090 and =1950). Error response follows (2), where is the PID output, is the instantaneous time, and is the integration time step from 0 to . These outputs are passed to a linear mixer, which provides one control command per motor , to at 250Hz.
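A single control loop of this kind can be sketched as follows. This is a textbook PID step, assuming a symmetric error limit and a simple clamp to the PWM range; the class, the gain values in the usage line, and the clamp are illustrative assumptions, and the nested structure and mixer of Fig. 2 are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    error_limit: float          # e.g. 0.10 for a 10cm limit
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target, measured, dt):
        """Return the PID output for one time step of length dt seconds."""
        error = target - measured
        # Limit the error before it enters the controller, as described in the text.
        error = max(-self.error_limit, min(self.error_limit, error))
        self.integral += error * dt
        # Note: the first call differentiates against prev_error = 0.
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def to_pwm(command, pwm_min=1090, pwm_max=1950):
    """Assumption: a plain clamp to the motor PWM range quoted in the text."""
    return max(pwm_min, min(pwm_max, command))


# Hypothetical gains; a 20cm height error is limited to 10cm before use.
height_pid = PID(kp=1.0, ki=0.0, kd=0.0, error_limit=0.10)
output = height_pid.step(target=0.4, measured=0.2, dt=1.0 / 250.0)
```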


2.3 Evolutionary Algorithm

During evaluation, a controller accumulates fitness (initially 0) at 250Hz by adding a per-Hz fitness measure (max. 10) to a running total (max. 150,000). is a compound fitness function (3), with components measuring the error in pitch/roll (), horizontal and vertical velocity (, ), height (), yaw (), horizontal position (), pitch/roll rates (), and staying within controller limits (). Full calculations are shown in Appendix A.
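The stated maximum follows directly from the update rate, the 60s evaluation length used in the experiments, and the per-update cap. A short sketch of that arithmetic, with hypothetical function names, also shows why a controller that is terminated early keeps only a fraction of the total.

```python
RATE_HZ = 250          # fitness updates per second
EVAL_SECONDS = 60      # evaluation length used in the hover experiments
MAX_PER_UPDATE = 10    # cap on the per-update fitness measure


def max_total_fitness(rate_hz=RATE_HZ, seconds=EVAL_SECONDS, per_update=MAX_PER_UPDATE):
    """Upper bound on accumulated fitness for one evaluation."""
    return rate_hz * seconds * per_update


def accumulate(per_update_scores):
    """Running total of the per-update scores, starting from zero."""
    total = 0.0
    for score in per_update_scores:
        total += score
    return total


full_run = accumulate([MAX_PER_UPDATE] * (RATE_HZ * EVAL_SECONDS))
# A hypothetical perfect controller that is terminated after 20s keeps
# only the fitness it has accumulated up to that point.
terminated_at_20s = accumulate([MAX_PER_UPDATE] * (RATE_HZ * 20))
```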


Once each gain set has a fitness, we use self-adaptive Differential Evolution (DE) (Storn and Price, 1997) to generate new gain sets. DE is a state-of-the-art optimiser of real-valued vectors, such as our PID gain sets (Chiha et al., 2012; Biswas et al., 2009), and has seen previous success in a robotic context (Moravec and Pošík, 2014); further justification for using DE can be found in (Howard, 2017a).

A donor vector v is created for each gain set in the population p following (4). F is a differential weight, and r1, r2, and r3 are the gain sets of three unique randomly-selected individuals.


A child controller c is created for each parent by probabilistically replacing elements of p with those of v. For each vector index i, c_i = v_i if i = R or rand_i < CR, otherwise c_i = p_i. rand_i is a uniform-random number in range [0,1], and CR is the crossover rate. R is a vector index, selected randomly per c, ensuring c contains at least one element of v. Each child is evaluated and assigned a fitness, and replaces its parent if it is fitter. This concludes a generation. Each subsequent generation involves creation of one child per parent, evaluating and assigning fitness, and creating the next generation by selecting parents and children based on fitness, as above.
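The donor, crossover, and replacement steps can be sketched as below. The sketch assumes the classic DE/rand/1 donor v = r1 + F(r2 - r3) with binomial crossover, uses one shared F and CR for brevity (the testbed adapts them per individual), and substitutes a toy fitness function for a real flight; all function names are hypothetical.

```python
import random


def donor(r1, r2, r3, F):
    """Donor vector v = r1 + F * (r2 - r3), element by element."""
    return [a + F * (b - c) for a, b, c in zip(r1, r2, r3)]


def crossover(parent, v, CR, rng):
    """Binomial crossover; one randomly chosen index always comes from v."""
    forced = rng.randrange(len(parent))
    return [v[i] if (i == forced or rng.random() < CR) else parent[i]
            for i in range(len(parent))]


def de_generation(population, scores, fitness_fn, F, CR, rng):
    """One generation: one child per parent; a child replaces its parent only if fitter."""
    next_pop, next_scores = [], []
    for idx, parent in enumerate(population):
        # Assumption: the three donors differ from each other and from the parent.
        others = [p for j, p in enumerate(population) if j != idx]
        r1, r2, r3 = rng.sample(others, 3)
        child = crossover(parent, donor(r1, r2, r3, F), CR, rng)
        child_score = fitness_fn(child)
        if child_score > scores[idx]:
            next_pop.append(child)
            next_scores.append(child_score)
        else:
            next_pop.append(parent)
            next_scores.append(scores[idx])
    return next_pop, next_scores


def toy_fitness(gains):
    """Stand-in for a flight evaluation: higher is better, best at all zeros."""
    return -sum(g * g for g in gains)


rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(18)] for _ in range(20)]
scores = [toy_fitness(p) for p in population]
new_population, new_scores = de_generation(population, scores, toy_fitness, 0.5, 0.5, rng)
```

Because a child only ever replaces its own parent, the fitness held at each population slot cannot decrease from one generation to the next.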

Self-adaptation based on an Evolution Strategy (Rechenberg, 1973) allows the testbed to tailor the learning process to the UAV, mission, and payload under consideration (Howard, 2017a). New population members random-uniformly initialise their CR and F, respecting bounds for CR=[0,1], and F=[0,2]. Each child copies CR and F from its parent, and modifies them following (5), before using them to alter its controller.


To prevent the learning process being stuck due to suboptimal rate settings, the rates of a given parent are uniform-randomly reinitialised if it does not create a fitter child within 5 generations. [Footnote 3: Selected to balance search stability and convergence times following a parameter sweep.] This combination of self-adaptation with restarts has been shown to promote high-fitness controllers, while significantly reducing the number of generations required, when compared to optimally-set static rates (Howard, 2017b).
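The interplay of inherited rates and restarts can be sketched as follows. The update rule (5) is not recoverable from the text, so the sketch assumes a log-normal perturbation of the kind common in Evolution Strategies; the step size `tau`, the dictionary layout, and all function names are hypothetical. Only the bounds and the 5-generation restart threshold come from the text.

```python
import math
import random

CR_BOUNDS = (0.0, 1.0)
F_BOUNDS = (0.0, 2.0)
RESTART_AFTER = 5      # generations without a fitter child


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def random_rates(rng):
    """Uniform-random initialisation within the stated bounds."""
    return {"CR": rng.uniform(*CR_BOUNDS), "F": rng.uniform(*F_BOUNDS), "stalled": 0}


def mutate_rates(parent_rates, rng, tau=0.1):
    """Child copies its parent's rates, then perturbs them (assumed log-normal form)."""
    cr = clamp(parent_rates["CR"] * math.exp(tau * rng.gauss(0.0, 1.0)), *CR_BOUNDS)
    f = clamp(parent_rates["F"] * math.exp(tau * rng.gauss(0.0, 1.0)), *F_BOUNDS)
    return {"CR": cr, "F": f, "stalled": 0}


def update_parent(parent_rates, child_rates, child_is_fitter, rng):
    """Rates held at a population slot after one generation."""
    if child_is_fitter:
        return child_rates                      # child takes the slot, with its own rates
    stalled = parent_rates["stalled"] + 1
    if stalled >= RESTART_AFTER:
        return random_rates(rng)                # restart: reinitialise the rates
    return {**parent_rates, "stalled": stalled}
```

A multiplicative perturbation cannot move a rate away from exactly zero, which is one reason a restart mechanism of this kind is a useful safeguard.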

As DE is a stochastic optimiser, we must be mindful of testing gain sets that are harmful to the UAV. ESM monitors the UAV's behaviour throughout an evaluation, and safely terminates as required to preserve the UAV. Terminated controllers are assigned their currently-accumulated fitness. Termination occurs when:

  • angles , ,

  • any angular rate,

  • horizontal velocities exceed 50cm/s,

  • vertical velocity exceeds 25cm/s,

  • total current draw ,

  • if the UAV does not take off within 5s of evaluation start,

  • or if the UAV pulls on the tether (18cm). [Footnote 4: This latter criterion prevents the UAV from cheating by using the tether to ’balance’ itself.]

Any controller that successfully completes the entire waypoint set once is re-evaluated twice more, and assigned the average fitness of all three runs. If the controller completes all three repeats, it is called a success. The experiment ends when we have an entire population of successful controllers.
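The re-evaluation rule can be sketched as a small wrapper around a single flight. Here `fly_once` is a hypothetical callable standing in for one evaluation on the testbed, returning the accumulated fitness and whether the whole waypoint set was completed.

```python
def evaluate_with_repeats(fly_once, repeats=3):
    """Return (fitness, success) for one gain set.

    fly_once() -> (fitness, completed) is a hypothetical stand-in for a
    single testbed evaluation.
    """
    first_fitness, completed = fly_once()
    if not completed:
        # Terminated controllers keep their accumulated fitness and are not repeated.
        return first_fitness, False
    results = [(first_fitness, True)] + [fly_once() for _ in range(repeats - 1)]
    mean_fitness = sum(f for f, _ in results) / len(results)
    success = all(done for _, done in results)
    return mean_fitness, success


def experiment_finished(success_flags):
    """The experiment ends once every controller in the population is a success."""
    return all(success_flags)
```

Averaging over repeats matters here because the fan is not reset between trials, so each repeat sees different wind conditions.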

2.4 Experimental

Our testbed is designed to optimise UAV controllers for real-world flight. Design decisions to date support this goal: optimising real UAVs, with real payloads, in strong, realistic wind disturbances. However, stochastic optimisation of hardware UAV controllers for real-world conditions is challenging, as the amount of state space exploration must be balanced with the requirement for non-destructive controller evaluation. Post-hoc analysis of some previously-generated controllers showed degraded performance when the UAV was removed from its tether and allowed to free-fly inside the testbed. There are two main reasons for this. Firstly, the state space available to the UAV may be too limiting, resulting in a lack of generalisation. Secondly, controllers optimised in the ’ground effect’ may perform sub-optimally when outside of it. [Footnote 5: When flying close to the floor, the ground deflects a propeller's airflow, causing increased thrust nearer the ground for the same power input (Johnson, 2012).] A widely used ground-effect equation is shown in (6), where R is the radius of the rotor, z is the vertical distance to the ground, T_IGE is the thrust in ground effect, and T_OGE is the thrust produced at the same power outside of the ground effect.


It follows that ground effect is negligible (thrust ratio close to 1) when the UAV's height exceeds roughly two rotor radii [Footnote 6: 4 in certain circumstances (Powers et al., 2013).], and that the influence of the ground effect diminishes rapidly as the UAV gets further from the ground (6). The simplest way to mitigate both of these factors is to increase the amount of flying area available to the UAV, so the two-stage approach described next (TSE) appears to be a simple and viable route towards real-world flyability.
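How quickly the effect fades with height can be checked numerically. The sketch below assumes that the equation referred to is the classical Cheeseman and Bennett form, T_IGE / T_OGE = 1 / (1 - (R / 4z)^2); the equation itself is missing from this copy of the text, so this form and the function name are assumptions.

```python
def ground_effect_ratio(z_over_r):
    """Thrust in ground effect divided by thrust out of ground effect.

    Assumed form: 1 / (1 - (R / (4 z))**2), valid only for z/R > 0.25.
    """
    if z_over_r <= 0.25:
        raise ValueError("model is singular at or below z/R = 0.25")
    return 1.0 / (1.0 - (1.0 / (4.0 * z_over_r)) ** 2)


# Thrust gain at a few heights, expressed in rotor radii.
ratios = {h: ground_effect_ratio(h) for h in (0.5, 1.0, 2.0, 4.0)}
```

Under this model the thrust gain is about 33% at half a rotor radius, under 2% at two rotor radii, and under 0.5% at four.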

In our first experiment, we compare the original approach of using a single physical tether throughout the experiment (One-Stage Evolution, OSE) with a new incremental method that partially evolves controllers with the original tether, before switching to a more permissive tether to complete the evolution in a larger state space (Two-Stage Evolution, TSE), as shown in Fig. 3.

Figure 3: Visualising the difference in flight envelope attainable in OSE (22cm, orange) and TSE (62cm, yellow).

Controllers are optimised on a wind-affected hover scenario, with a total evaluation length of 60s and waypoint transitions every 10s. Transitions are rate-limited: , N/E=0.1m/s, h=0.2m/s. The waypoints are selected to excite all of the UAV’s six DoFs by requiring controlled movements in each, and are shown in Table 1.

t (s)   N (m)          E (m)           h (m)       yaw
0       0, 0           0, 0            0.2, 0.2    40, 40
10      0.06, 0.15     -0.06, -0.15    0.2, 0.25   -5, -5
20      -0.06, -0.15   0.06, 0.15      0.2, 0.4    40, 40
30      0.06, 0.15     0.06, 0.15      0.2, 0.25   85, 85
40      0.06, 0.15     0.06, 0.15      0.2, 0.4    40, 40
50      0, 0           0, 0            0.2, 0.25   40, 40
60      stop           stop            stop        stop
Table 1: Waypoints for OSE (left in cell) and TSE (right in cell). Position error is calculated from the centre of the UAV.

The OSE tether is set to 22cm, the maximum length that prevents the UAV from flipping. The TSE tether is set to 62cm, the maximum length that prevents the UAV from contacting the fan, to more closely represent free flight. The experiments proceed as follows:

  • OSE controllers fly the OSE waypoint set until all controllers are successful.

  • TSE controllers initially fly the OSE waypoint set until the population contains its first successful controller. A new population (of size 20) is created from copies of this controller, with each gain subject to uniform noise of 25% of the total range of that gain when copied. The UAV is transferred to a longer tether (Fig. 3), and experimentation continues on TSE waypoints until all controllers are successful.
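The stage-2 reseeding step can be sketched as below. The sketch assumes the noise is drawn uniformly from plus or minus 25% of each gain's allowed range and that noisy gains are clamped back into that range; both interpretations, the function name, and the example ranges are assumptions rather than details given in the text.

```python
import random


def reseed_population(best, ranges, size=20, noise_fraction=0.25, rng=None):
    """Build a stage-2 population from noisy copies of the first successful gain set.

    best   : list of gains from the first successful stage-1 controller
    ranges : list of (low, high) allowed bounds, one pair per gain
    """
    rng = rng or random.Random()
    population = []
    for _ in range(size):
        clone = []
        for gain, (low, high) in zip(best, ranges):
            span = high - low
            noisy = gain + rng.uniform(-noise_fraction * span, noise_fraction * span)
            clone.append(min(high, max(low, noisy)))   # assumption: clamp to range
        population.append(clone)
    return population


# Hypothetical 18-gain controller with every gain bounded to [-1, 0].
example_ranges = [(-1.0, 0.0)] * 18
example_best = [-0.5] * 18
stage2 = reseed_population(example_best, example_ranges, rng=random.Random(3))
```

Seeding from one controller keeps the population near a known flyable region, while the noise restores some of the diversity that DE needs in order to form useful donor vectors.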

Ten experimental repeats of OSE and TSE are run, with statistical significance assessed with a Mann-Whitney U-test (which does not require normally-distributed samples). OSE and TSE repeats both have a population size of 20 controllers, and are run until the entire population is filled with successful controllers.
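For readers unfamiliar with the test, a minimal pure-Python sketch of the U statistic is given below. It uses the large-sample normal approximation for a two-sided p-value, without tie or continuity corrections, so with only ten repeats per method an exact routine such as `scipy.stats.mannwhitneyu` would be the better tool in practice. The samples in the usage lines are placeholders, not the experimental data.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Placeholder samples: two completely separated groups of ten values.
u_stat, p_value = mann_whitney_u(list(range(1, 11)), list(range(11, 21)))
```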

3 Results and Discussion

As we operate in hardware and are limited to real-time evaluations, reducing the number of evaluations required to optimise our controllers is particularly interesting to us (see, e.g., (Howard, 2017b)). Here, we see that TSE lowers the mean convergence generation from 45.6 (OSE) to 32.5 (Fig. 4), which is statistically significant (p < 0.05) and indicates that TSE is a viable technique for reducing the number of generations required to perform an experiment. Although generally more closely distributed around the mean, standard deviation for TSE convergence is higher than OSE (12.32 vs. 10.13) due to a single outlier at generation 64. Visualising individual experimental fitness progressions reinforces this. Although TSE typically converges faster than OSE (Fig. 5(a) vs Fig. 5(b)), the outlier in Fig. 5(b) (a grey line) lags far behind the other repeats. Despite this, TSE is clearly preferable in terms of convergence.

Figure 4: A box plot comparing convergence performance for OSE and TSE.
Figure 5: Showing individual fitness progressions for (a) OSE and (b) TSE in experiment 1. Aside from a single outlier (grey line) in (b), OSE experiments typically take much longer to converge than their TSE counterparts.

Both OSE and TSE produce high fitness controllers (Fig. 5(a) and (b)). Final fitness values are quantitatively similar, but cannot be directly compared as consecutive TSE waypoints are typically further from each other than OSE waypoints. Mean fitness for OSE and TSE controllers was 99234 vs. 97184, 93090 vs. 90880, and 82786 vs. 82735, for best, average, and worst fitness respectively.

As a more representative test of the generalisation ability of the controllers, we run the highest fitness controller per repeat on an unseen waypoint set. Each controller in this set would be selected for any real-world deployment, based on their performance in the first experiment.

This waypoint set has the following transitions, with one transition per 10s as before. The four array elements respectively represent position in N(m) and E(m), height h(m), and yaw: ([0.0, 0.0, 0.4, -10], [-0.25, 0.25, 0.2, 45], [0.25, -0.25, 0.4, 85], [0.25, 0.25, 0.4, 85], [0.0, 0.0, 0.3, -10], [-0.25, -0.25, 0.4, 45]). The UAVs are placed on the longer stage 2 TSE tether. Each controller is evaluated 20 times to generate reliable results. The UAV flies away from significant ground effect.

Figure 6: Showing the mean fitness for 20 repeats of the best (a) OSE and (b) TSE controllers per experimental run, when evaluated on the unseen waypoint set.

Results are shown in Fig. 6. Notably, half of the OSE controllers (those from repeats 2, 3, 5, 8, and 10, Fig. 6(a)) are seen to struggle when flying the TSE waypoints on a longer tether. This suggests that the OSE setup does not properly expose the required state space to the controller, and as such some OSE controllers struggle when flying the new waypoint set. All TSE controllers (Fig. 6(b)) can handle the TSE waypoint set, displaying a consistently higher fitness than OSE controllers.

It was observed that many OSE controllers from the aforementioned repeats oscillate, which causes premature flight termination and the resulting low fitness values. This is likely due to the restricted state space available during OSE experiments, and indicates that OSE is not the best way to evolve controllers for real-world scenarios. TSE is shown to be more successful in producing such controllers by being able to provide a more expansive state space during evolution. However, it should be noted that both OSE and TSE are capable of generating real-world flying controllers (OSE repeats 1, 4, 6, 7, and 9 in Fig. 6(a)).

The best fitness score overall for OSE was 102363.1, and for TSE 100188.1. Averaged across the entire experimentation on the new waypoint set, OSE fitness was 69810.6, and TSE fitness was 91839.7. When each controller’s repeats are averaged individually, this difference is statistically significant.

Next, we assess the effect of the TSE experimental procedure on our self-adaptive mutation strategy. The mean crossover rate varies between 0.482 and 0.553 for the first stage of TSE, rising to a maximum of 0.589 for the second stage (Fig. 7(a)). Context-sensitive adaptation of the learning rates is seen between stage 1 and 2 in TSE; in particular, the rise in rates at the start of stage 2 is a response to the reseeding of controller parameters, which subsequently requires more parent-child variance. Self-adaptive rates also vary between OSE and TSE, showing again that rates can adapt to experimental setup as required.

F has more effect on the evolutionary process than CR; as such, the change in F between stage 1 and 2 in TSE is more pronounced than the corresponding change observed between stages for CR, which agrees with the literature on rate setting in DE (Storn and Price, 1997), and supports previous results for our testbed (Howard, 2017a). F self-adapts from 0.57 to 0.672 within 2 generations of stage 2 commencing, which induces more variance in the search process, either (i) in response to the increased search space, or (ii) to combat the increased parameter convergence in the stage 2 population. This feature is key to TSE achieving more expedient convergence than OSE. Stage 2 rates are statistically different from both stage 1 and OSE rates.

Figure 7: Self-adaptive rates CR and F in experiment 1 for OSE and TSE.

3.1 Controller Optimality and Sensitivity

Next, we incrementally step through a range of feasible gain settings to show the optimality of our discovered controllers. We focus on height control as it is most critical to the UAV's ability to accumulate fitness. Mapping the effects of changing gains on controller fitness allows us to (i) confirm that our testbed produces the highest fitness controllers possible, and (ii) estimate how brittle those controllers are. Note that without the testbed, this mapping would be extremely taxing to achieve.

The highest fitness controller from TSE is used; we take the three height gains for that controller (P, I, and D), and for all perturbations, hold one gain static whilst systematically varying the other two gains within the following ranges:

  • P gains step from -0.1 to -1, in increments of 0.1.

  • I gains step from -0.15 to -1.5, in increments of 0.15.

  • D gains step from -0.05 to -0.5, in increments of 0.05.

This gives us ten possible settings per gain. Ranges were chosen to fall centrally around the corresponding value from the best evolved controller, whilst covering a wide range of possible gain settings. Each gain combination is run 20 times on the final waypoint set using the TSE tether, and fitness values are recorded and averaged. The fitness landscape produced by this exhaustive gain search (Fig. 8) shows performance degradation when gains diverge from their optimal values.
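The size of this sweep can be made concrete with a short enumeration. The sketch below builds the three pairwise grids from the stated step ranges, holding the third gain at its evolved value; the `best` values in the usage line are hypothetical placeholders, as the evolved gains are not listed in the text.

```python
from itertools import combinations, product

# Ten settings per height gain, from the step ranges given in the text.
SWEEP = {
    "P": [round(-0.10 * k, 2) for k in range(1, 11)],   # -0.1 ... -1.0
    "I": [round(-0.15 * k, 2) for k in range(1, 11)],   # -0.15 ... -1.5
    "D": [round(-0.05 * k, 2) for k in range(1, 11)],   # -0.05 ... -0.5
}
RUNS_PER_COMBINATION = 20


def sweep_plan(best):
    """List every gain set to fly: vary two gains on a grid, hold the third static."""
    plan = []
    for g1, g2 in combinations(SWEEP, 2):               # (P, I), (P, D), (I, D)
        held = next(g for g in SWEEP if g not in (g1, g2))
        for v1, v2 in product(SWEEP[g1], SWEEP[g2]):
            plan.append({g1: v1, g2: v2, held: best[held]})
    return plan


# Hypothetical evolved height gains, used only to fill the static slot.
plan = sweep_plan({"P": -0.5, "I": -0.75, "D": -0.25})
total_flights = len(plan) * RUNS_PER_COMBINATION
```

Three 10-by-10 grids give 300 gain combinations, or 6,000 flights at 20 runs each, which illustrates why a sweep of this kind is only practical on an automated testbed.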

Figure 8: (a)-(c) fitness heat maps obtained from varying gain combinations I&D, P&I, and P&D respectively, for height control.

Notably, the maximum fitness values discovered through this search reach within 3 of the best evolved controller’s fitness; in other words, evolution consistently finds the best controller settings. Owing to the repeatability offered by the testbed, we can confirm that our best evolved height controller lies within the discretised global maximum discovered through exhaustive search (although in significantly less time, with fewer evaluations required).

Mapping the search space in this way also makes infeasible solution subspaces easy to see, e.g. low I gains (-0.15), where the UAV is unable to take off before being ‘timed out’ after 5s of inactivity, or low D gains (-0.05), which cause trial-ending overshoots. This is important feedback for future experimental design, as it allows certain solution subspaces to be ignored a priori. The complexity of the problem (through e.g., sensory state estimation, noisy fitness evaluations, and other real-world artefacts) is evidenced by the appearance of multiple disconnected local maxima, shown in Fig. 8(b) and (c).
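One simple way to locate such local maxima in a fitness heat map is to compare each cell with its eight neighbours. The sketch below is illustrative only: `local_maxima` is a hypothetical helper, not part of the testbed, and it assumes the heat map is available as a rectangular list of rows of averaged fitness values.

```python
def local_maxima(grid):
    """Return (row, col) cells strictly greater than all 8-connected neighbours."""
    rows, cols = len(grid), len(grid[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbours that exist (edge cells have fewer).
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(grid[r][c] > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy 3x3 map with two disconnected peaks (values are made up).
example = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
]
peaks = local_maxima(example)
```

Because the fitness evaluations are noisy, a strict comparison like this may flag spurious peaks; in practice one might require a peak to exceed its neighbours by some margin before treating it as a genuine local maximum.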

3.2 Conclusions

In this paper we investigated a method to increase the real-world flyability of our UAV controllers by enlarging the state space explored during evolution and reducing the ground effect. We compared our standard method of restrictive tethering (OSE) to a new method of partially evolving the controllers and subsequently transferring them to a longer tether (TSE) for extra exploration.

Statistical comparisons between OSE and TSE show that TSE produces more general controllers than OSE. As such, TSE will be the new de facto technique as we continue our experimentation on the testbed. We note that this technique can potentially be extended to a series of tether lengths, where the incremental learning we demonstrate here allows for a safe transition to progressively larger state spaces.

Finally, we used the testbed to perform an exhaustive gain analysis, and showed that our testbed successfully finds the global optimum gain settings for the scenario we investigated.

These findings motivate further development towards our final goal of quickly optimising high-performance controllers for arbitrarily-configured UAVs, which are optimised to mission, payload, and morphology, and are guaranteed to work in the real world. In particular, we see an opportunity for our testbed in optimising the control and behaviour of UAVs with arbitrary morphologies. Such UAVs could be designed through evolution, and come in a range of unconventional physical configurations that may not be easily modellable. Optimisation in our testbed guarantees real-world performance and does not require a model, and as such will be a prominent technique for allowing such UAVs to reach their potential.

4 Funding Sources

This research received funding from the CSIRO Office of the Chief Executive for the Postdoctoral position in Evolutionary Aerial Robotics.

5 Conflicts of Interest

Gerard David Howard declares no conflict of interest. Alberto Elfes declares no conflict of interest.

Appendix A: Fitness Function

Symbol Definitions - Fitness Function

