Autonomous Precision Drone Landing with Fiducial Markers and a Gimbal-Mounted Camera for Active Tracking

by Joshua Springer, et al.

Precision landing is a remaining challenge in autonomous drone flight, with no widespread solution. Fiducial markers provide a computationally cheap way for a drone to locate a landing pad and autonomously execute precision landings. However, most work in this field has depended on fixed, downward-facing cameras which restrict the ability of the drone to detect the marker. We present a method of autonomous landing that uses a gimbal-mounted camera to quickly search for the landing pad by simply spinning in place while tilting the camera up and down, and to continually aim the camera at the landing pad during approach and landing. This method demonstrates successful search, tracking, and landing with 4 of 5 tested fiducial systems on a physical drone with no human intervention. Per fiducial system, we present the number of successful and unsuccessful landings, and the distributions of the distances from the drone to the center of the landing pad after each successful landing, with a statistical comparison among the systems. We also show representative examples of flight trajectories, marker tracking performance, and control outputs for each channel during the landing. Finally, we discuss qualitative strengths and weaknesses underlying the performance of each system.



I Introduction

Landing is a remaining task of autonomous drone flight that does not yet have a widespread solution. Finding some reliable method for autonomous landing is a necessary step in enabling fully autonomous mission cycles, and therefore several projects offer potential solutions (see Section III). Although GPS is the main navigational tool of drones in general, it does not provide enough positioning accuracy for landing in many situations (e.g. bad weather, urban canyons, even certain areas of the world). One of the main methods for increasing positioning accuracy in drone landing is to mark a known landing site with fiducial markers (see Section II) that provide a computationally cheap way for the drone to estimate its position relative to the landing pad using just a monocular camera – arguably the most common drone peripheral sensor. However, to this end, most projects to date have used a fixed, downward-facing camera, which makes it easy for the drone to lose sight of the landing pad in the event of antagonistic events such as wind gusts. It also makes it more difficult for the drone to find a landing pad because it must move a large distance in order to see new areas.

In this paper, we evaluate a method of landing with fiducial markers and a gimbal-mounted camera that allows the drone to track the landing pad during approach and descent, and also allows the drone to search for the landing pad by merely spinning in place and tilting the camera up and down. However, the gimbal-mounted camera also increases the complexity of the system in two ways. First, the tracking system requires some extra, relatively simple components in order to aim the camera at the marker correctly – typically the pixel position of the marker in the camera frame as input, and some controller to generate an output signal for the gimbal. Second, whereas systems with downward-facing cameras can use only the position of the detected marker to direct the drone during a landing, systems with gimbal-mounted cameras require both the position of the detected marker, and a coordinate system transform that takes into account the orientation of either the camera/gimbal or the marker itself. The problem here is that many commercially-available gimbals do not provide their orientation data as an output, and the orientations of fiducial markers are subject to ambiguity in many cases (see Section II). It is also possible to add more sensors, e.g. an IMU, to the camera to extract its orientation, but we prefer to minimize requirements of physical components in order to make the system more generalizable. Instead, we assume that the landing pad is relatively level – a warranted assumption in any reasonable case – and carry out the required pose transforms using the somewhat unreliable orientation of the detected fiducial marker. As a precursor to this study, we first conducted tests of 5 fiducial systems to determine their prevalence of orientation ambiguity [fiducial_precursor_evaluation]. We have also successfully tested the landing method itself in simulation [joshua_master_thesis]. We use these 2 precursor tests to inform our real-world drone tests in this paper.

II Background

Fiducial markers – such as April Tag [apriltag3_paper], WhyCode [whycode_paper], ARTag [ar_tag], ArUco [aruco_orig], etc. – are 2D patterns whose pose (position + orientation) can be determined computationally cheaply using monocular images. While most fiducial systems provide accurate estimates of position (3D translation from the camera to the marker), their orientation estimates are often ambiguous as a result of the limitations of embedding 2D patterns into 3D space. The ambiguity manifests as sign flips in the components of the orientation of the recognized pose, and propagates through subsequent calculations, e.g. coordinate system transforms. This can cause erratic behavior if a drone control signal derives from a transformed pose.
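To illustrate how a sign flip in the estimated orientation propagates through a coordinate system transform, the following minimal sketch rotates a marker offset from the camera frame into a level frame using a single tilt angle. It is a simplified 2D model in the vertical plane, not the transform used in the experiments; the function name `camera_to_level` and the example values are assumptions chosen for illustration.

```python
import math


def camera_to_level(forward_cam, down_cam, tilt_rad):
    """Rotate a 2D camera-frame offset into a frame aligned with the ground.

    forward_cam: offset along the camera's optical axis (meters)
    down_cam:    offset along the camera's downward image axis (meters)
    tilt_rad:    downward tilt of the camera relative to the horizon,
                 here assumed to come from the marker's estimated orientation
    Returns (horizontal, down) offsets in the level frame.
    """
    c, s = math.cos(tilt_rad), math.sin(tilt_rad)
    horizontal = c * forward_cam - s * down_cam
    down = s * forward_cam + c * down_cam
    return horizontal, down


# Hypothetical case: marker 2 m away along the optical axis, camera tilted 30 degrees down.
correct = camera_to_level(2.0, 0.0, math.radians(30))
# Same detection, but the tilt estimate has the wrong sign.
flipped = camera_to_level(2.0, 0.0, math.radians(-30))
```

In this toy case the horizontal offset is unchanged, but the vertical offset changes sign, so the marker would appear to be above the drone instead of below it. A control signal derived from the flipped pose would therefore push the drone in the wrong direction for as long as the flip persists.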

Fig. 1: The 4 landing pads in this paper: (a) WhyCode “Bundle”, (b) April Tag 24h10, (c) WhyCode, (d) April Tag 48h12.

In a precursor study [fiducial_precursor_evaluation], we evaluated 2 existing fiducial systems – April Tag 48h12 (Figure 1(d)) and WhyCode (Figure 1(c), hereafter WhyCode Orig) – and 3 custom fiducial systems that we made through modifications of the existing systems – April Tag 24h10 (Figure 1(b)), WhyCode Ellipse (Figure 1(c)), and WhyCode Multi (Figure 1(a)). We evaluated them in terms of orientation ambiguity and runtime framerate when executing on a Raspberry Pi 4. Our modifications are as follows: WhyCode Ellipse changes the sampling locations for the decision problem that determines the orientation of WhyCode markers, WhyCode Multi uses the positions of multiple coplanar markers to determine the orientation of the plane connecting them (and then assumes they all have this orientation), and April Tag 24h10 has a layout that is comparable to but smaller than April Tag 48h12, in the hopes of a higher detection rate. We concluded that the systems can be ranked in 3 statistically different groups in order of increasing orientation ambiguity: {WhyCode Ellipse, April Tag 48h12}, {WhyCode Orig, WhyCode Multi}, and finally {April Tag 24h10}; they can also be ranked in 3 groups in order of decreasing runtime detection rate: {WhyCode Orig, WhyCode Ellipse}, {WhyCode Multi}, and {April Tag 48h12, April Tag 24h10}.

III Related Work

Some projects have accomplished precision drone landing with fiducial markers and a fixed camera. In some cases [wynn, accurate_landing_UAV_ground_pattern], they use multiple ArUco markers of different sizes to allow a drone to maintain detection of the landing pad as it approaches very close. In the case of [vision_based_x_platform], the drone detects a single, custom X-shaped marker with 2 fixed cameras – one pointing down and the other pointing forward and down. In [fiducial_vessel_landing_ar_tag_two_fixed_cameras], the drone has a front-facing camera and a downward-facing camera, and detects a single AR Tag for landing on a moving boat. The same is also done in [fiducial_landing_two_fixed_cameras_apriltag] but with an April Tag marker. It is also possible to use GPS for an initial approach, with a final approach guided by a fiducial marker – in one case, April Tag [high_velocity_landing]. One method [fiducial_landing_downward_facing_90_deg_gimbaled_camera] uses a DJI Phantom 4, which has a gimbal-mounted camera, but keeps the gimbal pointed 90° down (at the ground) during the landing, when detecting fiducial markers. The drone in [fiducial_landing_many_markers_voting_fixed_camera] uses a fixed, downward-facing camera, but does address the issue of orientation ambiguity using many bundled April Tag markers, each of which estimates the location of the drone relative to the landing pad, and uses a voting scheme to choose which estimate it believes. In [fiducial_landing_ship_6dof_single_fixed_downfront_camera_apriltag], the drone has a single camera facing forwards and down, and detects April Tag markers on its landing pad. Finally, one method [lentimark_landing] uses a closed-source, commercialized marker system called Lentimark [lentimark], which mitigates the planar pose ambiguity problem using Moiré patterns on the outside of otherwise conventional AR Tags. They embed a single Lentimark marker inside of an AR Tag, mount them on a post, and land a drone autonomously while tracking the marker during landing with a gimbal-mounted camera. The drone lands on the ground in front of the post.

Some of these projects report visual loss of the marker during landing, as a result of the lack of camera tracking. We contribute a method that goes beyond these projects to actively track the marked landing pad with a gimbal-mounted camera independently of the drone’s movement, enabling both autonomous precision landing in unfriendly environments (e.g. wind) and the ability to easily and safely search for the marker by simply spinning in place and tilting the camera up and down. The drone lands on top of the marked landing pad. We also test new fiducial systems that have not yet appeared in such autonomous landing scenarios, and we focus only on widespread, open source marker systems that have a large community using, supporting, and developing them.

IV Methods

IV-A System Overview

Our testing platform is a DJI Spark, which provides stable performance, the ability to test indoors (reducing logistical considerations), little risk in the case of crashes or malfunctions, and access to the DJI Mobile SDK for autonomous control. We have created a custom app [our_android_app] based on the DJI Mobile SDK code samples, which decodes video frames from the Spark via its controller and offloads them to an external companion board for analysis. The companion board – a Raspberry Pi 4 – runs software to detect fiducial markers in the images, then generates control commands and sends them to the drone via the app and controller. While the control system stays the same, the landing pad and fiducial software change, to test each of the 5 fiducial systems in the precursor study (see Section II). Each of the fiducial systems provides exactly the same attributes to the control system: the relative position from the drone to the landing pad (a position target), the pixel position of the landing pad in the camera frame (normalized for ease of use, so that (0,0) is the center), and the yaw of the landing pad (for alignment in the later stages of landing). The control system constrains these attributes to specific intervals depending on the landing phase, and then passes them to the DJI Mobile SDK as VirtualStick inputs, such that they appear to the system as input from the controller. The autopilot software uses these as velocity setpoints. The position targets control the drone’s translational velocity, and the normalized pixel positions control the tilt of the gimbal and the yaw of the drone during approach.
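The mapping from normalized pixel positions to tracking commands can be sketched as a clamped proportional controller. This is a minimal illustration under stated assumptions, not the controller used in the experiments: the function names, the gain, the output limit, and the image size are all hypothetical, and the sign conventions follow the description of the control outputs in Fig. 5 (positive yaw is clockwise, positive gimbal tilt is up).

```python
def normalize_pixel(u, v, width, height):
    """Map pixel coordinates to [-1, 1] on each axis, with (0, 0) at the image center."""
    return (u - width / 2) / (width / 2), (v - height / 2) / (height / 2)


def clamp(value, limit):
    """Constrain value to the symmetric interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def tracking_command(u, v, width, height, gain=0.5, limit=0.3):
    """Return (yaw_rate, tilt_rate) that steer the marker toward the image center.

    gain and limit are hypothetical tuning values, not taken from the paper.
    """
    x, y = normalize_pixel(u, v, width, height)
    yaw_rate = clamp(gain * x, limit)    # marker right of center -> yaw clockwise
    tilt_rate = clamp(-gain * y, limit)  # marker below center (image y grows down) -> tilt down
    return yaw_rate, tilt_rate


# Hypothetical 1280x720 frame.
centered = tracking_command(640, 360, 1280, 720)
lower_right = tracking_command(960, 540, 1280, 720)
top_right_corner = tracking_command(1280, 0, 1280, 720)
```

A marker at the image center produces no command, a marker in the lower right produces a clockwise yaw and a downward tilt, and a marker in a corner saturates both outputs at the limit.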

The control policy has several phases. During takeoff, the drone ascends in place to an altitude of 1.2 meters (fully automated within DJI Mobile SDK). It then transitions to search, where it spins in place while tilting the camera up and down. After it finds the marker, it can transition to approach, where it moves toward the landing pad quickly, without changing altitude. This phase also has a “deadzone” parameter that disables planar movement if the drone is within a small planar distance to the marker. Then it enters the yaw align phase, where it spins in order to align with the yaw of the landing pad. Once it has aligned, it enters descent and decreases altitude until a specific minimum altitude (different for each marker, see Section V), while correcting its horizontal position above the landing pad. It then enters a landing commit phase, where the DJI Mobile SDK controls its descent and touchdown detection, and disables the motors. Finally, the drone has landed.
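The sequence of phases above can be sketched as a small state machine. The phase names follow the text, but the condition names (for example `near_pad` for the end of approach) and the structure of the code are assumptions for illustration; in particular, the sketch does not model what happens if the marker is lost mid-landing.

```python
from enum import Enum, auto


class Phase(Enum):
    TAKEOFF = auto()
    SEARCH = auto()
    APPROACH = auto()
    YAW_ALIGN = auto()
    DESCENT = auto()
    LANDING_COMMIT = auto()
    LANDED = auto()


# For each phase: the (hypothetical) condition that ends it, and the phase that follows.
TRANSITIONS = {
    Phase.TAKEOFF: ("at_takeoff_altitude", Phase.SEARCH),
    Phase.SEARCH: ("marker_visible", Phase.APPROACH),
    Phase.APPROACH: ("near_pad", Phase.YAW_ALIGN),
    Phase.YAW_ALIGN: ("yaw_aligned", Phase.DESCENT),
    Phase.DESCENT: ("at_commit_altitude", Phase.LANDING_COMMIT),
    Phase.LANDING_COMMIT: ("touched_down", Phase.LANDED),
}


def next_phase(phase, **conditions):
    """Advance to the following phase if its exit condition holds, else stay put."""
    if phase not in TRANSITIONS:
        return phase  # LANDED is terminal
    condition, target = TRANSITIONS[phase]
    return target if conditions.get(condition, False) else phase


# Example: a flight in which every exit condition is met as soon as it is checked.
all_met = {condition: True for condition, _ in TRANSITIONS.values()}
trace = [Phase.TAKEOFF]
while trace[-1] is not Phase.LANDED:
    trace.append(next_phase(trace[-1], **all_met))
```

Keeping the transitions in a table makes it easy to see that the policy only moves forward, one phase at a time, and that each phase has exactly one exit condition.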

At the start of each landing attempt, the drone and landing pad are always placed a constant distance from each other, with the drone facing directly away from the landing pad so that it must search for the landing pad after takeoff. At the first landing attempt, the landing pad faces away from the drone, and it is rotated clockwise after each of the landing attempts, in order to simulate approaches from all directions. We consider a landing attempt successful if it requires no human intervention except starting the landing process, and if the drone touches down fully on the landing pad (not partially touching the ground).

IV-B Basis of Comparison

We compare the landing performance of the system using each of the 5 fiducial systems, keeping all other factors the same. The markers on each landing pad have the same width at the widest point, although they have different areas as a result of being different shapes. We compare the systems on the number of successful landings they produce, and on the accuracy of the landing: the distance from the camera to the center of the marker after touchdown, which should be as small as possible. A Kruskal-Wallis test determines any statistically significant differences in the accuracies of the landings produced by each marker system, where a p-value below the significance threshold implies that we should reject the null hypothesis that there is no statistical difference among the groups. Pairwise tests rank the systems against each other: a significant two-tailed Wilcoxon test implies that the two systems show some statistical difference, and a significant one-tailed Wilcoxon test implies that the first system results in statistically more accurate landings than the second. If the two-tailed result is not significant, we do not calculate the one-tailed result, because we cannot determine that the two systems are statistically different.
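As a rough illustration of the Kruskal-Wallis test, the sketch below computes the H statistic from pooled ranks and converts it to a p-value with the chi-square approximation for 3 degrees of freedom, which is the case for 4 groups. It is a simplified standard-library version that omits the tie correction factor, not the analysis code used for this paper; the function names and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (without the tie correction factor)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_sf_df3(h):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(h / 2)) + math.sqrt(2 * h / math.pi) * math.exp(-h / 2)


# Toy data (hypothetical): three fully separated groups.
toy_h = kruskal_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# p-value for the H statistic reported in Table II.
table_ii_p = chi2_sf_df3(12.4235)
```

With the H statistic reported in Table II and 3 degrees of freedom, `chi2_sf_df3` gives a p-value of approximately 0.0061, consistent with the value in that table.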

V Results

Table I shows an overview of the performance of the landing system while using each fiducial system. All systems except WhyCode Multi achieved 20 successful landings, with varying numbers of failed landing attempts. WhyCode Multi achieved no successful landings. Figure 2 shows the distribution of the distances from the camera to the center of the landing pad, which is ideally 0. Lower values imply more accurate landings. Table II shows the result of a Kruskal-Wallis test on the distances from the camera to the center of the landing pad, concluding a statistical difference among the groups. Table III shows the results of pairwise Wilcoxon tests on the same data, concluding that April Tag 48h12 outperforms all other systems, and that the other systems are not statistically different from each other.

Figure 3 shows the perceived positions of the drone during its approach during one landing that is representative of the whole. Figure 4 shows the position of the landing pad in the camera frame during the same landing. Figure 5 shows the control output of the drone during that landing.

System          | Successes | Failures
April Tag 48h12 | 20        | 3
April Tag 24h10 | 20        | 5
WhyCode Ellipse | 20        | 2
WhyCode Orig    | 20        | 4
WhyCode Multi   | 0         | 10
TABLE I: Summary of the successful landings and failures of each system. Experiments continued until each system (except WhyCode Multi) had achieved 20 successful landings.
Fig. 2: The distances from the camera to the center of the landing pad after the drone has landed. Ideally, this distance should be 0.
p-value  | H statistic
0.006065 | 12.4235
TABLE II: A Kruskal-Wallis test to determine if there are statistical differences in the systems’ abilities to accurately land the drone. It finds a statistical difference.
System 1        | System 2        | Two-tailed p | One-tailed p
April Tag 48h12 | April Tag 24h10 | 0.014        | 0.007
April Tag 48h12 | WhyCode Ellipse | 0.001        | 0.000
April Tag 48h12 | WhyCode Orig    | 0.005        | 0.003
April Tag 24h10 | April Tag 48h12 | 0.014        | 0.993
April Tag 24h10 | WhyCode Ellipse | 0.701        | -
April Tag 24h10 | WhyCode Orig    | 1.000        | -
WhyCode Ellipse | April Tag 48h12 | 0.001        | 1.000
WhyCode Ellipse | April Tag 24h10 | 0.701        | -
WhyCode Ellipse | WhyCode Orig    | 0.643        | -
WhyCode Orig    | April Tag 48h12 | 0.005        | 0.997
WhyCode Orig    | April Tag 24h10 | 0.985        | -
WhyCode Orig    | WhyCode Ellipse | 0.643        | -
TABLE III: A pairwise Wilcoxon test to determine a possible ranking of the landing systems by lowest distance from the ideal landing location. A dash marks pairs for which the one-tailed test was not calculated. April Tag 48h12 outperforms the other systems, and the other systems do not have significant differences.
Fig. 3: An example landing trajectory, representative of the set of landing attempts. The East (left/right) position target reduces quickly as the drone turns to face the landing pad. The North (forward/backward) position target decreases slowly as the drone approaches the landing pad. Once both the North and East position targets are adequately small, the drone rotates to align with the landing pad, reducing its yaw displacement. Finally, the Up position target gets closer to 0 at the end of the landing, during the descent phase. The position targets are no longer generated once the drone reaches the landing commit phase, as the DJI Mobile SDK then forces the camera to point directly forward such that it can no longer see the landing pad.
Fig. 4: The normalized pixel positions of the landing pad during an example landing. The main takeaway is that the drone was able to keep the marker relatively centered (i.e. near (0,0)) in the camera frame even as it was moving towards the landing pad and changing its orientation. The minor variations in the landing pad’s pixel positions do not negatively affect the landing process.
Fig. 5: The VirtualStick control outputs sent to the drone during an example landing, annotated with flight modes. Each control output is constrained to a narrower interval than VirtualStick commands allow, in order to ensure slow, stable movement in the presence of high latency. The throttle is further restricted to non-positive values, so it can only command the drone to go down, not up. Each control output corresponds to a particular rate, i.e. yaw corresponds to clockwise yaw velocity, gimbal tilt corresponds to the angular up velocity of the gimbal, pitch corresponds to the forward velocity of the drone, roll corresponds to the right velocity of the drone, and throttle corresponds to the up velocity of the drone. During takeoff, control outputs are neutral. During search, the drone rotates counterclockwise, and the gimbal tilts up and down. During approach, the drone tracks the landing pad with yaw and gimbal tilt, while approaching with pitch and roll. During yaw align, the drone tracks the landing pad with gimbal tilt, maintains its position with pitch and roll, and aligns to the landing pad’s yaw. Finally, during descent, the drone maintains its position and orientation while applying negative throttle to go downwards.

VI Discussion

In the given landing scenario, 4 out of 5 systems demonstrated successful landings, with April Tag 48h12 providing the closest-to-ideal landings. April Tag 24h10, WhyCode Ellipse, and WhyCode Orig have statistically similar levels of accuracy, as a result of their limitations. April Tag 24h10’s high orientation ambiguity means it cannot guarantee proper positioning over the landing pad during approach; however, it does allow the drone to track the landing pad above a height of 0.4 meters. WhyCode Ellipse and WhyCode Orig, while exhibiting less orientation ambiguity, require higher landing commit heights, below which the rest of their landing is blind, which could be problematic in windy conditions.

All systems experienced occasional visual loss of the landing pad, contributing to the small number of landing failures. In the case of the April Tag systems, this visual loss typically occurred upon initial acquisition of the landing pad, probably because of the intricate structure of April Tag markers, as well as the fact that initial acquisition happens from a relatively long distance and under motion. Once the April Tag markers are identified, however, they provide reliable detection even at very close distances, so that the drone can commit to the landing later, thereby relinquishing control at a safer altitude. On the other hand, the WhyCode markers exhibit reliable long-distance detection even under motion, and present problems at close range, as they begin to eclipse the camera frame completely. They therefore require a higher landing commit altitude, which means they execute more of the landing blindly than April Tag markers allow. These results exactly echo findings from our original precursor study [joshua_master_thesis], where we tested this algorithm in simulation, finding that April Tag markers are very sensitive to obstructions, while WhyCode markers are robust to some obstructions. Finally, some visual loss of the landing pad occurs simply as a result of intense sun glares or shadows. This is an inherent problem in identifying markers in the visible spectrum and must be considered when implementing such a landing system.

Perhaps the most interesting finding of this study is that the orientation ambiguity problem does not prohibit autonomous precision landing using a gimbal-mounted camera. Orientation ambiguity caused discontinuities in the control outputs, as predicted in our precursor fiducial system evaluation [fiducial_precursor_evaluation]. However, these discontinuities resulted only in minor disturbances in the behavior of the drone, rather than fully destructive interference. Importantly, most of the discontinuities occur when the drone is almost directly above the landing pad, since the orientation is most ambiguous when the camera is normal to the fiducial marker. In these cases, the drone’s planar radius to the landing pad is small, and so it is likely to be inside the deadzone, where the control outputs are reduced to 0. Ultimately, however, it would be better to use some filtering or outlier detection to attempt to remove these erroneous control outputs entirely.
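The mitigating effect of the deadzone can be sketched as follows. This is a minimal illustration, assuming a proportional mapping from planar position targets to pitch and roll commands; the function name, deadzone radius, gain, and limit are hypothetical values, not those used in the experiments.

```python
import math


def planar_command(north, east, deadzone=0.1, gain=0.5, limit=0.3):
    """Return (pitch, roll) velocity commands from planar position targets (meters).

    Inside the deadzone radius, planar movement is disabled entirely.
    """
    if math.hypot(north, east) < deadzone:
        return 0.0, 0.0
    pitch = max(-limit, min(limit, gain * north))  # forward velocity
    roll = max(-limit, min(limit, gain * east))    # rightward velocity
    return pitch, roll


# Near the pad, a sign flip in the North target leaves the command unchanged (zero).
near = planar_command(0.05, 0.03)
near_flipped = planar_command(-0.05, 0.03)

# Far from the pad, the same sign flip would reverse the pitch command.
far = planar_command(1.0, -0.4)
far_flipped = planar_command(-1.0, -0.4)
```

Because a sign flip does not change the planar radius, a flipped target that lies inside the deadzone still produces a zero command, whereas the same flip at a larger distance would reverse the direction of travel.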

A main obstacle in this setup is the set of restrictions on the system architecture imposed by the DJI Mobile SDK, which requires an app for e.g. authentication with DJI’s servers. While video frames can reach the app relatively quickly, the tablet is not representative of the hardware that will eventually be embedded onboard a drone, and it is difficult to run the necessary fiducial software within the Android environment. Therefore, the video frames were offloaded via a WiFi connection to a Raspberry Pi 4. The task of compressing and transmitting the images reduced the maximum processing framerate to about 6-7 Hz, with an inconsistent delay of between 0.5 and 2 seconds from acquisition of an image to a corresponding command from the companion board. The low framerate and inconsistent processing times caused the bulk of the problems in these experiments. This latency could be reduced by using a different drone platform with an onboard processing unit and using only wired connections, thereby avoiding image compression and transfer. In such a context, the control policy could be less conservative, with higher limits on the control outputs, more tightly tuned gimbal tracking, etc.

WhyCode Multi unfortunately did not demonstrate any successful landings, although it did provide good performance in the search phase and the initial part of the approach phase. However, it overshot the landing pad every time. While it does appear to output correct position targets in lab experiments, it seems that its position targets are highly dependent on the angle from which the marker is viewed. The camera on the Spark can only tilt between straight forward and downward. It is possible that the drone could recover from an overshoot if the gimbal were able to point backwards at the marker, but we leave this test to a future study.

VII Conclusion

We have shown that autonomous precision landing is possible with fiducial markers and a gimbal-mounted camera for active tracking. Of our tested fiducial systems, 4 out of 5 have demonstrated success in the real world, even with high processing latency - with the only non-successful system being WhyCode Multi. Our platform is a DJI Spark which does not have extra onboard processing hardware, but all essential image processing and control signal generation is possible onboard a Raspberry Pi, meaning that all hardware and processing can be embedded into a single drone system in the future. We have also demonstrated that the gimbal-mounted camera setup allows the drone to search for the landing pad by simply spinning in place and sweeping its gimbal up and down, thereby quickly scanning a large search area with little effort.

VIII Future Work

The natural next step is to embed the companion board into a drone and partially re-conduct this proof of concept with a larger drone outdoors. This will significantly reduce the processing latency and likely improve performance. It would be helpful to re-test the WhyCode Multi setup using a gimbal that is able to point backwards, in order to potentially carry out some successful autonomous landings. Finally, methods for filtering any erroneous control signals that result from orientation ambiguity should be explored.