AlphaPilot: Autonomous Drone Racing

by   Philipp Foehn, et al.

This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system has successfully been deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. Contrary to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible and further enable to plan a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses reaching speeds up to 8m/s and ranked second at the 2019 AlphaPilot Challenge.



There are no comments yet.


page 1

page 4


Autonomous drone race: A computationally efficient vision-based navigation and control strategy

Drone racing is becoming a popular sport where human pilots have to cont...

Autonomous Drone Racing with Deep Reinforcement Learning

In many robotic tasks, such as drone racing, the goal is to travel throu...

Visual Model-predictive Localization for Computationally Efficient Autonomous Racing of a 72-gram Drone

Drone racing is becoming a popular e-sport all over the world, and beati...

Event-based Navigation for Autonomous Drone Racing with Sparse Gated Recurrent Network

Event-based vision has already revolutionized the perception task for ro...

The Artificial Intelligence behind the winning entry to the 2019 AI Robotic Racing Competition

Robotics is the next frontier in the progress of Artificial Intelligence...

Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing

Autonomous micro aerial vehicles still struggle with fast and agile mane...

Image Generation for Efficient Neural Network Training in Autonomous Drone Racing

Drone racing is a recreational sport in which the goal is to pass throug...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

I-a Motivation

Autonomous drones have seen a massive gain in robustness in recent years and perform an increasingly large set of tasks across various commercial industries; however, they are still far from fully exploiting their physical capabilities. Indeed, most autonomous drones only fly at low speeds near hover conditions in order to be able to robustly sense their environment and to have sufficient time to avoid obstacles. Faster and more agile flight could not only increase the flight range of autonomous drones, but also improve their ability to avoid fast dynamic obstacles and enhance their maneuverability in confined spaces. Human pilots have shown that drones are capable of flying through complex environments, such as race courses, at breathtaking speeds. However, autonomous drones are still far from human performance in terms of speed, versatility, and robustness, so that a lot of research and innovation is needed in order to fill this gap.

In order to push the capabilities and performance of autonomous drones, in 2019, Lockheed Martin and the Drone Racing League have launched the AlphaPilot Challenge111, an open innovation challenge with a grand prize of $ million. The goal of the challenge is to develop a fully autonomous drone that navigates through a race course using machine vision, and which could one day beat the best human pilot. While other autonomous drone races [Moon16ram, Moon19springer] focus on complex navigation, the AlphaPilot Challenge pushes the limits in terms of speed and course size to advance the state of the art and enter the domain of human performance. Due to the high speeds at which drones must fly in order to beat the best human pilots, the challenging visual environments (e.g., low light, motion blur), and the limited computational power of drones, autonomous drone racing raises fundamental challenges in real-time state estimation, perception, planning, and control.

Fig. 1: Our AlphaPilot drone waiting on the start podium to autonomously race through the gates ahead.

I-B Related Work

Autonomous navigation in indoor or GPS-denied environments typically relies on simultaneous localization and mapping (SLAM), often in the form of visual-inertial odometry (VIO) [Cadena16tro]. There exists a variety of VIO algorithms, e.g., [Mourikis07icra, Blosch15iros, Qin18tro, Forster17troSVO], that are based on feature detection and tracking that achieve very good results in general navigation tasks [Delmerico18icra]. However, the performance of these algorithms significantly degrades during agile and high-speed flight as encountered in drone racing. The drone’s large translational and rotational velocities cause large optic flow, making robust feature detection and tracking over sequential images difficult and thus often causing substantial drift in the VIO state estimate [Delmerico19icra].

To overcome this difficulty, several approaches exploiting the structure of drone racing with gates as landmarks have been developed, e.g., [Li19arxiv] and [Jun18ral], where the drone locates itself relative to gates. In [Li19arxiv], a handcrafted process is used to extract gate information from images that is then fused with attitude estimates from an inertial measurement unit (IMU) to compute an attitude reference that guides the drone towards the visible gate. While the approach is computationally very light-weight, it struggles with scenarios where multiple gates are visible and does not allow to employ more sophisticated planning and control algorithms which, e.g., plan several gates ahead. In [Jun18ral]

, a convolutional neural network (CNN) is used to retrieve a bounding box of the gate and a line-of-sight-based control law aided by optic flow is then used to steer the drone towards the detected gate. The approach presented in 

[Kaufmann18icra] also relies on relative gate data but has the advantage that it works even when no gate is visible. In particular, it uses a CNN to directly infer relative gate poses from images and fuse the results with a VIO state estimate. However, the CNN does not perform well when multiple gates are visible as it is frequently the case for drone racing.

I-C Contribution

The approach contributed herein builds upon the work of [Kaufmann18icra]

and fuses VIO with a robust CNN-based gate corner detection using an extended Kalman filter (EKF), achieving high accuracy at little computational cost. The gate corner detections are used as static features to compensate for the VIO drift and to align the drones’ flight path precisely with the gates. Contrary to all previous works

[Li19arxiv, Jun18ral, Kaufmann18icra], which only detect the next gate, our approach makes use of any gate detection and even profits from multiple simultaneous detections to compensate for VIO drift and build a global gate map. The global map allows the drone to navigate through the race course even when the gates are not immediately visible and further enables the usage of sophisticated path planning and control algorithms. In particular, a computationally efficient, sampling-based path planner (see e.g., [lavalle2006planning], and references therein) is employed that plans near time-optimal paths through multiple gates ahead and is capable of adjusting the path in real time if the global map is updated.

Ii AlphaPilot Race Format and Drone

Ii-a Race Format

From more than 400 teams that participated in a series of qualification tests including a simulated drone race [guerra2019flightgoggles], the top nine teams were selected to compete in the 2019 AlphaPilot Challenge. The challenge consists of three qualification races and a final championship race at which the six best teams from the qualification races compete for the grand prize of $ million. Each race is implemented as a time trial competition in which each team is given three attempts to fly through a race course as fast a possible without competing drones on the course. Taking off from a start podium, the drones have to autonomously navigate through a sequence of gates with distinct appearances in the correct order and terminate at a designated finish gate. The race course layout, gate sequence, and position are provided ahead of each race up to approximately horizontal uncertainty, enforcing teams to come up with solutions that adapt to the real gate positions. Initially, the race courses were planned to have a lap length of approximately and required the completion up to three laps. However, due to technical difficulties, no race required to complete multiple laps and the track length at the final championship race was limited to about .

Ii-B Drone Specifications

Fig. 2: Illustration of the race drone with its body-fixed coordinate frame in blue and a camera coordinate frame in red.

All teams were provided with an identical race drone (Fig. 1) that was approximately in diameter, weighed , and had a thrust-to-weight ratio of . The drone was equipped with a NVIDIA Jetson Xavier embedded computer for interfacing all sensors and actuators and handling all computation for autonomous navigation onboard. The sensor suite included two forward-facing stereo camera pairs (Fig. 2), an IMU, and a downward-facing laser rangefinder (LRF). All sensor data were globally time stamped by software upon reception at the onboard computer. Detailed specifications of the available sensors are given in Table I. The drone was equipped with a flight controller that controlled the total thrust along the drone’s -axis (see Fig. 2) and the angular velocity in the body-fixed coordinate frame .

Sensor Model Rate Details
Camera Leopard Imaging global shutter, color,
IMX 264 resolution:
IMU Bosch BMI088 range: ,
resolution: ,
LRF Garmin LIDAR-Lite v3 range: -
TABLE I: Sensor specifications.
Sensor Interface
State Estimation
Planning & Control
Drone Interface
Camera 4
Camera 3
Camera 2
Camera 1
gate map
vehicle state
Fig. 3: Overview of the system architecture and its main components. All components within a dotted area run in a single thread.

Ii-C Drone Model

Bold lower case and upper case letters will be used to denote vectors and matrices, respectively. The subscripts in

are used to express a vector from point to point expressed in frame . Without loss of generality, is used to represent the origin of frame , and represents the origin of coordinate frame . For the sake of readability, the leading subscript may be omitted if the frame in which the vector is expressed is clear from context.

The drone is modelled as a rigid body of mass with rotor drag proportional to its velocity acting on it [kai2017drag]

. The translational degrees-of-freedom are described by the position of its center of mass

with respect to an inertial frame as illustrated in Fig. 2. The rotational degrees-of-freedom are parametrized using a unit quaternion , where denotes the rotation matrix mapping a vector from the body-fixed coordinate frame to the inertial frame  [shuster1993attitude]. A unit quaternion consists of a scalar and a vector and is defined as  [shuster1993attitude]. The drone’s equations of motion are


where and are the force and bodyrate inputs, is the drone’s -axis expressed in its body-fixed frame , is a constant diagonal matrix containing the rotor drag coefficients, denotes the drone’s velocity, is gravity and denotes the quaternion multiplication operator [shuster1993attitude]. The drag coefficients were identified experimentally to be and .

Fig. 4: The gate detection module returns sets of corner points for each gate in the input image (fourth column) using a two-stage process. In the first stage, a neural network transforms an input image (first column) into a set of confidence maps for corners (second column) and Part Affinity Fields (PAFs) [cao2017realtime] (third column). In the second stage, the PAFs are used to associate sets of corner points that belong to the same gate. For visualization, both corner maps (second column) and PAFs (third column) are displayed in a single image each. While color encodes the corner class for , it encodes the direction of the 2D vector fields for . The yellow lines in the bottom of the second column show the six edge candidates of the edge class (the corner of the middle gate is below the detection threshold), see Section IV-B. Best viewed in color.

Iii System Overview

The system is composed of five functional groups: Sensor interface, perception, state estimation, planning and control, and drone interface (see Fig. 3). In the following, a brief introduction to the functionality of our proposed perception, state estimation, and planning and control system is given.

Iii-a Perception

Of the two stereo camera pairs available on the drone, only the two central forward-facing cameras are used for gate detection (see Section IV) and, in combination with IMU measurements, to run VIO. The advantage is that the amount of image data to be processed is reduced while maintaining a very large field of view. Due to its robustness, multi-camera capability and computational efficiency, ROVIO [Blosch15iros] has been chosen as VIO pipeline. At low speeds, ROVIO is able to provide an accurate estimate of the quadrotor vehicle’s pose and velocity relative to its starting position, however, at larger speeds the state estimate suffers from drift.

Iii-B State Estimation

In order to compensate for a drifting VIO estimate, the output of the gate detection and VIO are fused together with the measurements from the downward-facing laser rangefinder (LRF) using an EKF (see Section V). The EKF estimates a global map of the gates and, since the gates are stationary, uses the gate detections to align the VIO estimate with the global gate map, i.e., compensates for the VIO drift.

Computing the state estimate, in particular interfacing the cameras and running VIO, introduces latency in the order of to the system. In order to be able to achieve a high bandwidth of the control system despite large latencies, the vehicle’s state estimate is predicted forward to the vehicle’s current time using the IMU measurements.

Iii-C Planning and Control

The global gate map and the latency-compensated state estimate of the vehicle are used to plan a near time-optimal path through the next gates starting from the vehicle’s current state (see Section VI). The path is re-planned every time (i) the vehicle passes through a gate, (ii) the estimate of the gate map or (iii) the VIO drift are updated significantly, i.e., large changes in the gate positions or VIO drift. The path is tracked using a cascaded control scheme (see Section VII) with an outer proportional-derivative (PD) position control loop and an inner proportional (P) attitude control loop. Finally, the outputs of the control loops, i.e., a total thrust and angular velocity command, are sent to the drone.

Iii-D Software Architecture

The NVIDIA Jetson Xavier provides eight CPU cores, however, four cores are used to run the sensor and drone interface. The other four cores are used to run the gate detection, VIO, EKF state estimation, and planning and control, each in a separate thread on a separate core. All threads are implemented asynchronously to run at their own speed, i.e., whenever new data are available, in order to maximize data throughput and to reduce processing latencies. The gate detection thread is able to process all camera images in real time at , whereas the VIO thread only achieves approximately . In order to deal with the asynchronous nature of the gate detection and VIO thread and their output, all data are globally time stamped and integrated in the EKF accordingly. The EKF thread runs every time a new gate detection or LRF measurement is available. The planning and control thread is executed at a fixed rate of . To achieve this, the planning and control thread includes the state prediction which compensates for latencies introduced by the VIO.

Iv Gate Detection

To correct for drift accumulated by the VIO pipeline, the gates are used as distinct landmarks for relative localization. In contrast to previous CNN-based approaches to gate detection, we do not infer the relative pose to a gate directly, but instead segment the four corners of the observed gate in the input image. This allows the detection of an arbitrary amount of gates, and allows for a more principled inclusion of gate measurements in the EKF through the use of reprojection error. Furthermore, it exhibits more predictable behavior for partial gate observations and overlapping gates. Since the exact shape of the gates is known, detecting a set of characteristic points per gate allows to constrain the relative pose. For the quadratic gates of the AlphaPilot Challenge, these characteristic points are chosen to be the inner corner of the gate border (see Fig. 4, 4th column). However, just detecting the four corners of all gates is not enough. If just four corners of several gates are extracted, the association of corners to gates is undefined (see Fig. 4, 3rd row, 2nd column). To solve this problem, we additionally train our network to extract so-called Part Affinity Fields (PAFs), as proposed by [cao2017realtime]. These are vector fields, which, in our case, are defined along the edges of the gates, and point from one corner to the next corner of the same gate, see column three in Figure 4. In Section IV-B, we describe how the PAFs are then used to solve the aforementioned gate association problem.

Iv-a Stage 1: Predicting Corner Maps and Part Affinity Fields

In the first detection stage, each input image is mapped by a neural network into a set of corner maps and PAFs. The network is trained in a supervised manner by minimizing the Mean-Squared-Error loss between the network prediction and ground-truth maps. In the following, ground-truth maps for both map types are explained in detail.

Iv-A1 Corner Maps

For each corner class , , a ground-truth corner map is represented by a single-channel map of the same size as the input image and indicates the existence of a corner of class at pixel location in the image. The value at location in is defined by a Gaussian as


where denotes the ground truth image position of the nearest corner with class . The choice of the parameter controls the width of the Gaussian. We use pixel in our implementation. Gaussians are used to account for small errors in the ground truth corner positions that are provided by hand.

Iv-A2 Part Affinity Fields

We define a PAF for each of the four possible classes of edges, defined by its two connecting corners as . For each edge class the ground-truth PAF is represented by a two-channel map of the same size as the input image and points from corner to corner of the same gate, provided that the given image point lies within distance of such an edge. We use pixel in our implementation. Let be the set of gates and be the set of image points that are within distance of the line connecting the corner points and belonging to gate . Furthermore, let be the unit vector pointing from to of the same gate. Then, the part affinity field is defined as:


Note that a special case might exist in which the same point lies in of several gates. In that case, the of all corresponding gates are averaged.

Iv-B Stage 2: Corner Association

Discrete corner locations for each class are extracted from the corner map using non-maximum suppression and thresholding. This allows the formation of an exhaustive set of edge candidates , see the yellow lines in Fig. 4. Given the corresponding PAF , each edge candidate is assigned a score which expresses the agreement of that candidate with the PAF. This score is given by the line integral



lineraly interpolates between the two corner candidate locations

and . In practice, the line integral is approximated by uniformly sampling the integrand.

As described in [cao2017realtime], extracting the “best” set of edges for class according to this score is an instance of the maximum weight matching problem in a bipartite graph, as each corner can only be assigned one edge. Once this problem is solved for each of the four edge classes, the pairwise associations can be extended to sets of associated points for each gate. We refer the reader to [cao2017realtime] for the detailed solutions of these problems.

Iv-C Network Architecture, Training Data and Deployment

The network architecture deployed consists of a 5-level U-Net [ronneberger2015u] with convolutional filters of size

per level. At each layer, the input feature map is zero-padded to preserve a constant height and width throughout the network. As activation function, LeakyReLU with

is used. The network is trained on a dataset consisting of 28k images recorded in 5 different environments. Each sample is annotated using the open source image annotation software labelme

333, which is extended with KLT-Tracking for semi-automatic labelling. For deployment on the Jetson Xavier, the network is ported to TensorRT To optimize memory footprint and inference time, inference is performed in half-precision mode (FP16) and batches of two images are fed to the network.

V State Estimation

The nonlinear measurement models of the VIO, gate detection, and laser rangefinder are fused using an EKF [kalman1960new]. In order to obtain the best possible pose accuracy relative to the gates, the EKF estimates the translational and rotational misalignment of the VIO origin frame with respect to the inertial frame , represented by and , jointly with the gate positions and gate heading . It can thus correct for an imprecise initial position estimate, VIO drift, and uncertainty in gate positions. The EKF’s state space at time is with covariance , described by


The drone’s corrected pose can then be computed from the VIO estimate by transforming it from frame into the inertial frame using .

All estimated parameters are expected to be time-invariant but subject to noise and drift. This is modelled by a Gaussian random walk, simplifying the EKF process update to:


where is the random walk process noise. For each measurement with noise the predicted a priori estimate is corrected with measurement function and Kalman gain resulting in the a posteriori estimate , as


where is the measurement function with jacobian .

To apply the EKFs linear update step on the over-parametrized quaternion, it is lifted to its tangent space description, similar to [Forster17troOnmanifold]. The quaternion is described by a reference quaternion , which is adjusted after each update step, and an error quaternion , of which only its vector part is in the EKF’s state space.

V-a Measurement Modalities

All measurements at time are passed to the EKF together with the VIO estimate and with respect to the VIO frame .

V-A1 Gate Measurements

Gate measurements consist of the image pixel coordinates of a specific gate corner. Corners are denoted with top left and right, and bottom left and right, as in and the gates are enumerated . All gates are of equal width and height , so that the corner positions in the gate frame can be written as . The measurement equation can be written as the pinhole camera projection [Szeliski10book] of the gate corner into the camera frame. A pinhole camera maps the gate corner point expressed in the camera frame into pixel coordinates as


where indicates the scalar -component of a vector, and are the camera’s focal lengths and is the camera’s optical center. The gate corner point is given by


with and being the transformation between the inertial frame and camera frame ,


where and describe a constant transformation between the drone’s body frame and camera frame (see Fig. 2

). The Jacobian with respect to the EKF’s state space is derived using the chain rule,


where the first term representing the derivative of the projection remains the same for all components of the state space.

V-A2 Laser Rangefinder Measurement

The drone’s laser rangefinder measures the distance along the drones negative -axis to the ground, which is assumed to be flat and at a height of . The measurement equation can be described by


The Jacobian with respect to the state space is again derived by and .

Vi Path Planning

For the purpose of path planning, the drone is assumed to be a point mass with bounded accelerations as inputs. This simplification allows for the computation of time-optimal motion primitives in closed-form and enables the planning of time-optimal paths through the race course in real time. Although the dynamics of the quadrotor vehicle’s acceleration cannot be neglected in practice, it is assumed that this simplifcation still captures the most relevant dynamics for path planning and that the resulting paths approximate the true time-optimal paths well. In the following, time-optimal motion primitives based on the simplified dynamics are first introduced and then a path planning strategy based on these motion primitives is presented.


Fig. 5: Example time-optimal motion primitive starting from rest at the origin to a random final position with non-zero final velocity. The velocities are constrained to and the inputs to . The dotted lines denote the per-axis time-optimal maneuvers.

Vi-a Time-Optimal Motion Primitive

The minimum times , and required for the vehicle to fly from an initial state, consisting of position and velocity, to a final state while satisfying the simplified dynamics with the input acceleration being constrained to are computed for each axis individually. Without loss of generality, only the -axis is considered in the following. Using Pontryagin’s maximum principle [bertsekas1995dynamicprogramming], it can be shown that the optimal control input is bang-bang in acceleration, i.e., has the form


or vice versa with the control input first being followed by . In order to control the maximum velocity of the vehicle, e.g., to constrain the solutions to ranges where the simplified dynamics approximate the true dynamics well or to limit the motion blur of the camera images, a velocity constraint of the form can be added, in which case the optimal control input has bang-singular-bang solution [maurer1977optimalcontrol]


or vice versa. It is straightforward to verify that there exist closed-form solutions for the minimum time as well as the switching times in both cases (15) or (16).

Once the minimum time along each axis is computed, the maximum minimum time is computed and motion primitives of the same form as in (15) or (16) are computed among the two faster axes but with the final time constrained to such that trajectories along each axis end at the same time. In order for such a motion primitive to exist, a new parameter is introduced that scales the acceleration bounds, i.e., the applied control inputs are scaled to and , respectively. Fig. 5 depicts the position and velocity of an example time-optimal motion primitive.

Vi-B Sampling-Based Receding Horizon Path Planning

The objective of the path planner is to find the time-optimal path from the drone’s current state to the final gate, passing through all the gates in the correct order. Since the previously introduced motion primitive allows the generation of time-optimal motions between any initial and any final state, the time-optimal path can be planned by concatenating a time-optimal motion primitive starting from the drone’s current (simplified) state to the first gate with time-optimal motion primitives that connect the gates in the correct order until the final gate. This reduces the path planning problem to finding the drone’s optimal state at each gate such that the total time is minimized. To find the optimal path, a sampling-based strategy is employed where states at each gate are randomly sampled and the total time is evaluated subsequently. In particular, the position of each sampled state at a specific gate is fixed to the center of the gate and the velocity is sampled uniformly at random such the velocity lies within the constraints of the motion primitives and the angle between the velocity and the gate normal does not exceed a maximum angle It is trivial to show that as the number of sampled states approaches infinity, the computed path converges to the time-optimal path.

In order to solve the problem efficiently, the path planning problem is interpreted as a shortest path problem. At each gate, different velocities are sampled and the arc length from each sampled state at the previous gate is set to be equal to the duration of the time-optimal motion primitive that guides the drone from one state to the other. Due to the existence of a closed-form expression for the minimum time , setting up and solving the shortest path problem can be done very efficiently using, e.g., Dijkstra’s algorithm [bertsekas1995dynamicprogramming]. In order to further reduce the computational cost, the path is planned in a receding horizon fashion, i.e., the path is only planned through the next gates.

Vii Control

This section presents a control strategy to track the near time-optimal path from Section VI. The control strategy is based on a cascaded control scheme with an outer position control loop and an inner attitude control loop, where the position control loop is designed under the assumption that the attitude control loop can track set point changes perfectly, i.e., without any dynamics or delay.

Vii-a Position Control

The position control loop along the inertial -axis is designed such that it responds to position errors in the fashion of a second-order system with time constant and damping ratio ,


Similarly, two control loops along the inertial - and -axis are shaped to make the horizontal position errors behave like second-order systems with time constants and damping ratio . Inserting (17) into the translational dynamics (1), the total thrust is computed to be


Vii-B Attitude Control

The required acceleration from the position controller determines the orientation of the drone’s -axis and is used, in combination with a reference yaw angle , to compute the drone’s reference attitude. The reference yaw angle is chosen such that the drone’s -axis points towards the reference position ahead of the current position, i.e., that the drone looks in the direction it flies. A nonlinear attitude controller similar to [brescianini2018attitude] is applied that prioritizes the alignment of the drone’s -axis, which is crucial for its translational dynamics, over the correction of the yaw orientation:


where , , and are the components of the attitude error and where is a diagonal matrix containing the per-axis first-order system time constants for small attitude errors.

Viii Results


Fig. 6: Top view of the planned (left) and executed (center) path at the championship race, and an executed multi-lap path at a testing facility (right). Left: Fastest planned path in color, sub-optimal sampled paths in gray. Center: VIO trajectory as and corrected estimate as .

The proposed system was used to race in the 2019 AlphaPilot championship race. The course at the championship race consisted of five gates and had a total length of . A top view of the race course as well as the results of the path planning and the fastest actual flight are depicted in Fig. 6 (left and center). With the motion primitive’s maximum velocity set to , the drone successfully completed the race course in a total time of , with only two other teams also completing the full race course. The drone flew at an average velocity of and reached the peak velocity of multiple times. Note that due to missing ground truth, Fig. 6 only shows the estimated and corrected drone position.

The system was further evaluated at a testing facility where there was sufficient space for the drone to fly multiple laps (see Fig. 6, right), albeit the course consisted of only two gates. The drone was commanded to pass four times through gate 1 before finishing in the final gate. Although the gates were not visible to the drone for most of the time, the drone successfully managed to fly multiple laps. Thanks to the global gate map and the VIO state estimate, the system was able to plan and execute paths to gates that are not directly visible. By repeatedly seeing either one of the two gates, the drone was able to compensate for the drift of the VIO state estimate, allowing the drone to pass the gates every time exactly through their center. Note that although seeing gate 1 in Fig. 6 (right) at least once was important in order to update the position of the gate in the global map, the VIO drift was also estimated by seeing the final gate.

The results of the system’s main components are discussed in detail in the following subsections, and a video of the results is attached to the paper.

Viii-a Gate Detection

Even in instances of strong illumination changes, the gate detector was able to accurately identify the gates in a range of . Fig. 4 illustrates the quality of detections during the championship race (1st row) as well as for cases with multiple gates, represented in the test set (2nd/3rd row).

Gate detection is evaluated quantitatively on a separate test set of 4k images with respect to intersection over union (IoU) and false positive/negative corner predictions. While the IoU score only takes full gate detections into account, the false positives/negatives are computed for each corner detection. On the test set, the network achieves an IoU score with the human-annotated ground truth of %, an average false negative rate of corners per image and an average false positive rate of corners per image.

With the network architecture explained in Section IV, one simultaneous inference for the left- and right-facing camera requires computing ( per pixel). By implementing the network in TensorRT and performing inference in half-precision mode (FP16), this computation takes on the Jetson Xavier and can therefore be performed at the camera update rate.

Viii-B State Estimation

Compared to a pure VIO-based solution, the EKF has proven to significantly improve the accuracy of the state estimation relative to the gates. As opposed to the works by [Li19arxiv, Jun18ral, Kaufmann18icra], the proposed EKF is not constrained to only use the next gate, but can work with any gate detection and even profits from multiple detections in one image. Fig. 6 (center) depicts the flown trajectory estimated by the VIO system as  and the EKF-corrected trajectory as  (the estimated corrections are depicted in gray). Accumulated drift clearly leads to a large discrepancy between VIO estimate  and the corrected estimate . Towards the end of the track at the two last gates this discrepancy would be large enough to cause the drone to crash into the gate. However, the filter corrects this discrepancy accurately and provides a precise pose estimate relative to the gates. Additionally, the imperfect initial pose, in particular the yaw orientation, is corrected by the EKF while flying towards the first gate as visible in the zoomed section in Fig. 6 (center).

Viii-C Planning and Control

Fig. 6 (left) shows the nominally planned path for the AlphaPilot championship race, where the coloured line depicts the fastest path along all the sampled paths depicted in gray. In particular, a total of different states are sampled at each gate, with the velocity limited to and the angle between the velocity and the gate normal limited to . During flight, the path is re-planned in a receding horizon fashion through the next gates (see Fig. 6, center). It was experimentally found that choosing greatly reduces the computational cost w.r.t. planning over the full track, while having only minimal impact on the flight time. Re-planning the paths takes less than on the Jetson Xavier and can be done in every control update step.

Fig. 6 (right) shows resulting path and velocity of the drone in a multi-lap scenario, where the drone’s velocity was limited to . It can be seen that drone’s velocity is decreased when it has to fly a tight turn due to its limited thrust.

Ix Discussion and Conclusion

The proposed system managed to complete the course at a velocity of with a success rate of and at with a success rate of . At higher speeds, the combination of VIO tracking failures and no visible gates caused the drone to crash after passing the first few gates. This failure could be caught by integrating the gate measurements directly in a VIO pipeline, tightly coupling all sensor data. Another solution could be a perception-aware path planner trading off time-optimality against motion blur and maximum gate visibility.

The advantages of the proposed system are (i) a drift-free state estimate at high speeds, (ii) a global and consistent gate map, and (iii) a real-time capable near time-optimal path planner. However, these advantages could only partially be exploited as the races neither included multiple laps, nor had complex segments where the next gates were not directly visible. Nevertheless, the system has proven that it can handle these situations and is able to navigate through complex race courses reaching speeds up to and completing the championship race track of in .

While the 2019 AlphaPilot Challenge pushed the field of autonomous drone racing, in particularly in terms of speed, autonomous drones are still far away from beating human pilots. Moreover, the challenge also left open a number of problems, most importantly that the race environment was partially known and static without competing drones or moving gates. In order for autonomous drones to fly at high speeds outside of controlled or known environments and succeed in many more real-world applications, they must be able to handle unknown environments, perceive obstacles and react accordingly. These features are areas of active research and are intended to be included in future versions of the proposed drone racing system.