ROS packages of the Autonomous landing system of a UAV in Gazebo
Autonomous landing is a capability that is essential to achieve the full potential of multi-rotor drones in many social and industrial applications. The implementation and testing of this capability on physical platforms is risky and resource-intensive; hence, in order to ensure both a sound design process and a safe deployment, simulations are required before implementing a physical prototype. This paper presents the development of a monocular visual system, using a software-in-the-loop methodology, that autonomously and efficiently lands a quadcopter drone on a predefined landing pad, thus reducing the risks of the physical testing stage. In addition to ensuring that the autonomous landing system as a whole fulfils the design requirements using a Gazebo-based simulation, our approach provides a tool for safe parameter tuning and design testing prior to physical implementation. Finally, the proposed monocular vision-only approach to landing pad tracking made it possible to effectively implement the system in an F450 quadcopter drone with the standard computational capabilities of an Odroid XU4 embedded processor.
Unmanned Aerial Vehicles (UAVs) have recently become popular due to their potential to perform complex tasks such as infrastructure inspection, target detection [2, 3], or search and rescue. The use of these vehicles has led to both substantial improvements in the efficiency of these processes and a reduction in human casualties during hazardous work. The deployment of UAVs in such applications requires a complete suite of sensors such as GPS, laser rangefinders, radar and cameras, which can be used to endow the vehicle with environmental awareness and the capability to perceive events of interest. However, the use of many peripherals in a UAV requires an extensive amount of on-board computational resources and power that are not always available owing to the vehicle’s dimensions and the high implementation costs.
Cameras have been proposed as a feasible alternative to overcome these issues, as they have a relatively low price and enable the estimation of content-rich representations of the environment. For instance, cameras have been widely employed in various tasks such as mapping and object tracking. Further research efforts have been conducted to make use of cameras in the development of visual-based autonomous landing systems for UAVs. Autonomous landing maneuvers remain a crucial task for rotorcraft, and allow the development of complete, end-to-end autonomous flying vehicles that are capable of performing complex assignments such as those mentioned above.
Most state-of-the-art visual-based landing systems have shown unprecedented results that are comparable to the performance of UAVs with a full suite of sensors. The employment of natural landmarks to land a rotorcraft in unstructured environments is a strategy commonly used for emergency landing situations [11, 6]. These methods rely on the use of vision-based SLAM algorithms such as ORB_SLAM2 for localization of the vehicle and mapping of appropriate landing spots in the environment. Nevertheless, these techniques tend to deliver low spatial resolution and to be computationally expensive, hindering the performance of autonomous landing applications.
On the other hand, the utilization of artificial landmarks is one of the most traditional techniques used in landings on both static and moving platforms. The extraction of information from a landmark, such as the relative pose or template coordinates, is broadly used in the application of image-based visual servoing (IBVS), a technique that performs the majority of the control calculations in 2D image space and reduces the computational load of a small rotorcraft while landing.
Visual servoing is commonly used with classical computer vision methods to track features over several image frames and to create stable control references to land the aircraft on a desired target. Despite the potential of IBVS in terms of autonomous landing, additional assumptions are required, for example that the features in the image are static features of the object, or that the object does not leave the field of view. Furthermore, the implementation of visual-based autonomous landing systems requires rigorous assessments in simulated environments to identify possible perils before deploying the whole system in the real world.
In this work, we address these problems by proposing a complete monocular visual-based perception and control strategy for the autonomous landing of a UAV in a Gazebo-based simulated environment. This system aims to mitigate the limitations of classic computer vision-based methods that are caused by changes in appearance in the image, by using a Kalman filter to estimate the position of the template throughout the landing process. Additionally, the use of only IBVS techniques for the control of the aircraft reduces the computational cost of the system and eliminates the need for expensive 3D position reconstruction calculations, thus allowing for real-time control of small UAVs with low-cost computers.
Fig. 1 illustrates the general workflow of the proposed method. Initially, the system computes the homography matrix between the current image frame and the predefined template, using a feature-based detector. Next, the homography matrix is used to compute the corners and the centroid of the object in the current image frame. These points are then passed to a Kalman filter estimation module. Finally, the Kalman filter estimations are used to track the template in the image frame, and as a process variable for a set of three PID-based controllers that perform the safe landing of the vehicle.
The full system was developed and assessed in a Gazebo-based simulated environment in order to bridge the gap between theory and real-world deployment, and to reduce the risks incurred while the vehicle is tested. All the parameters for the vision and control systems employed in the Gazebo-based simulation were directly transferred to the real-world quad-rotor in a zero-shot sim2real (simulation to reality) fashion, in order to validate that these simple approaches can be effectively transferred to the vehicle without additional tuning. Here, zero-shot refers to parameters that are learned or set in a source domain (simulation) and tested without fine-tuning in a target domain (real world). Overall, the principal contributions of this work can be summarized as follows:
A complete, flexible, Gazebo-based simulation of a visual-based landing system for low-cost UAVs;
The implementation of a Kalman-filter-based methodology for landing platform tracking using monocular vision in both a simulated and a physical drone;
A control strategy for quadcopter landing that is seamlessly implemented using the popular PX4 software-in-the-loop (SITL) Gazebo interface, which facilitates its transfer to a physical drone.
The remainder of this paper is organized as follows: Section II presents related work. In Section III, the feature-based detector and Kalman filter are explained. Section IV describes the proposed control strategy for the landing maneuver, while Sections V and VI contain the simulated and experimental results, respectively. Finally, Section VII presents the conclusion.
Autonomous landing for multi-rotor aircraft is a problem that has been extensively studied. Various approaches have relied on the use of vision-based techniques to identify the salient features in an image and to land the vehicle on both static [18, 19, 20] and moving platforms [10, 14, 15]. Classic computer vision methods, such as feature-based extraction and description or homography-based approaches [21, 22], are commonly used to estimate the relative pose of the vehicle with respect to a landing platform at a relatively low computational cost.
Spatial information can be extracted from natural and artificial landmarks. Natural landmarks have been proposed for the detection and reconstruction of landing sites based on the texture of the ground. Visual-based SLAM techniques are also exploited to build representations of the world and find feasible landing spots for the rotor-craft in unstructured environments. Similarly, the use of artificial landmarks can simplify the autonomous landing task by providing references with known dimensions for detection and tracking over several image frames.
The use of markers has been exploited to provide a traceable reference for landing control systems and to enhance the position estimation of aerial vehicles through visual inertial odometry (VIO). In one such approach, the relative pose of the aircraft with respect to a spherical target was estimated, and an extended Kalman filter was used to fuse these measurements with IMU data to accurately locate the vehicle in space.
Kalman filters are employed not only to fuse information from multiple sensor sources, but also to estimate the states of a system from a single noisy source [24, 25]. These estimations are used in IBVS, with linear control strategies such as nested PIDs [6, 18] and nonlinear ones like sliding mode controllers, to accurately land an aerial vehicle. The utilization of Gaussian estimators for IBVS provides numerically stable and continuous references for controllers, even when the object of interest is outside the field of view of the camera.
Learning-based methods have also been proposed, for example to automate the complete landing task with deep reinforcement learning (DRL) agents. However, the use of artificial neural networks requires substantial computational resources for real-time inference and thousands of human-labeled images, depending on the task at hand. Likewise, visual-based 3D reconstruction techniques tend to be computationally expensive for the on-board computers of small UAVs, and need to satisfy various assumptions to achieve accurate pose estimations.
We aim to reduce the computational load when performing IBVS with the use of a vision-based tracking system, and to produce a stable reference for a set of nested PID-based controllers similar to those used in prior work. The idea behind the detection and tracking system is to produce a 2D image-based reference for the controller, thus avoiding expensive 3D pose reconstructions. In this work, the use of the Kalman filter is restricted to filtering 2D estimations of the landing pad from noisy observations, unlike the application of VIO in most other related work. Contrary to commonly used simulation tools like RotorS, which provide Gazebo-based simulation environments for multi-rotor drones with no interface with a real flight controller, our implementation utilises the SITL provided by PX4, which runs the Pixhawk flight stack, and therefore provides direct support to the physical robot deployment process.
We first explain how the feature-based object detector detects the landing platform when comparing the platform’s template with the input image. Next, we describe how the system translates the corners and the centroid of the detected platform from the homography matrix to a vector that contains the system observations. Finally, we explain how a tailored Kalman filter is used to estimate the pose of the landing platform, even when no detection has been obtained.
Object detection is a crucial task in robotic perception. Feature-based detectors and descriptors are widely used, due to their speed in computing the salient features of images. For increased robustness in object detection, these features should be invariant to rotation, scale and affine transformations over several frames. To find correspondences between two images, we consider a set of features $\{f^t_i\}_{i=1}^{n_t}$ in the template image and a set $\{f^s_j\}_{j=1}^{n_s}$ in the current frame, where $n_t$ and $n_s$ represent the number of features in each image. Each feature in the template and scene frames is associated with a descriptor $\mathbf{d} \in \mathbb{R}^{D}$, where $D$ is the dimension of the descriptor for each feature.
With a set of descriptors, it is possible to compute matches between image pairs by performing distance calculations, such as the Euclidean distance between the descriptors of the template and the scene, as shown in (1). Two features are matched when the closest descriptors between two images in the descriptor space have been found. As a result, similar features in the template image are matched with the similar pair in the current image frame, as illustrated in Fig. 2.
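As an illustration of the matching step, the following is a minimal sketch of nearest-neighbour matching with the Euclidean distance. It assumes that descriptors are stored as rows of NumPy arrays; the function name `match_descriptors` and the toy two-dimensional descriptors are hypothetical and are not taken from the system described here.

```python
import numpy as np

def match_descriptors(template_desc, scene_desc):
    """Match each template descriptor to its nearest scene descriptor.

    template_desc: (n_t, D) array, scene_desc: (n_s, D) array.
    Returns a list of (template_index, scene_index, distance) tuples.
    """
    # Pairwise Euclidean distances, shape (n_t, n_s), via broadcasting.
    diff = template_desc[:, None, :] - scene_desc[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # For every template descriptor, keep the closest scene descriptor.
    nearest = np.argmin(dist, axis=1)
    return [(i, int(j), float(dist[i, j])) for i, j in enumerate(nearest)]

# Toy descriptors (hypothetical values, D = 2 for readability).
template = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
scene = np.array([[5.1, 4.9], [0.1, -0.1], [0.9, 1.2]])
matches = match_descriptors(template, scene)
```

In practice, a ratio test or cross-check is often added to reject ambiguous matches before the homography is estimated; that step is omitted here for brevity.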
Finding correspondences between image pairs allows us to compute the homography matrix $\mathbf{H}$. This matrix is a transform that maps points from one image frame (template) to the corresponding points in the other image frame (scene). To compute the homography, at least four matches are needed. Then, knowing the homography between two images and the dimensions of the template, it is possible to apply a perspective transform that maps the template position from the template image to the scene image using (2).
In (2), $(x, y)$ are the coordinates of points (e.g. corners) in the template image and $(x', y')$ are the same points mapped in the scene image. Object detection with feature-based methods and homography calculations tends to speed up the process and can provide a reliable estimation of the location of the object of interest in the current image frame.
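The perspective transform can be sketched as follows, assuming the usual homogeneous-coordinate form in which a template point $(x, y, 1)$ is multiplied by the 3x3 homography and then divided by the resulting scale. The helper names and the example homography (a pure scale and translation) are hypothetical and chosen only so that the result is easy to check by hand.

```python
import numpy as np

def project_points(H, points):
    """Apply a 3x3 homography H to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the projective scale

def template_corners_and_centroid(H, width, height):
    """Map the four template corners and the template centre into the scene."""
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]],
                       dtype=float)
    centre = np.array([[width / 2.0, height / 2.0]])
    return project_points(H, corners), project_points(H, centre)[0]

# Hypothetical homography: scale by 0.5 and translate by (100, 50) pixels.
H = np.array([[0.5, 0.0, 100.0],
              [0.0, 0.5, 50.0],
              [0.0, 0.0, 1.0]])
corners, centroid = template_corners_and_centroid(H, 200, 200)
```

With this homography, a 200 x 200 template maps to a 100 x 100 square whose top-left corner is at (100, 50) in the scene image.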
Using the homography matrix, the corners and centroid of the template detected in the current image frame can be computed. These points are collected in an array of coordinates $\mathbf{p}_k$, where each row corresponds to a point at time index $k$.
These points are used to determine the observations that will be fed into the Kalman filter. The vector of observations at time $k$ is defined as $\mathbf{z}_k = [c_x, c_y, w, h, \theta]^T$, where $(c_x, c_y)$ are the centroid coordinates of the landing pad; $w$ and $h$ are the width and height of the template, respectively; and $\theta$ represents the angle of the template with respect to the x axis of the image, as shown in Fig. 3.
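A possible way of building the observation vector from the projected points is sketched below. The corner ordering (top-left, top-right, bottom-right, bottom-left), the averaging of opposite sides and the use of the top edge to measure the angle are assumptions made for this illustration, as is the function name `observation_from_corners`.

```python
import numpy as np

def observation_from_corners(corners, centroid):
    """Build [cx, cy, width, height, theta] from four ordered corners.

    corners: (4, 2) array ordered top-left, top-right, bottom-right, bottom-left.
    centroid: length-2 array with the projected template centre.
    """
    tl, tr, br, bl = corners
    # Average opposite sides so that mild perspective distortion is smoothed.
    width = (np.linalg.norm(tr - tl) + np.linalg.norm(br - bl)) / 2.0
    height = (np.linalg.norm(bl - tl) + np.linalg.norm(br - tr)) / 2.0
    # Angle of the top edge with respect to the image x axis, in radians.
    top_edge = tr - tl
    theta = np.arctan2(top_edge[1], top_edge[0])
    return np.array([centroid[0], centroid[1], width, height, theta])

corners = np.array([[100.0, 50.0], [200.0, 50.0], [200.0, 150.0], [100.0, 150.0]])
z = observation_from_corners(corners, np.array([150.0, 100.0]))
```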
The Kalman filter is an estimator that infers hidden states from indirect, inaccurate and uncertain observations. The filter can therefore be used to handle noisy observations from the detection module and to produce a continuous estimate of the template position at each time step $k$.
We assume that we have a Linear Dynamic System (LDS) for a landing platform such as in (3), where $x_k$ is the coordinate of a pixel at time index $k$ and $\Delta t$ is the time between two consecutive image frames. Similarly, in (4), $\dot{x}_k$ corresponds to the velocity component of a pixel in the image.
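Assuming a constant-velocity model, which is consistent with the description of (3) and (4), the per-pixel dynamics would take the form below. The symbols $x_k$, $\dot{x}_k$ and $\Delta t$ are notation chosen here for illustration: the pixel coordinate, its velocity and the time between frames, respectively.

```latex
\begin{align}
x_k &= x_{k-1} + \dot{x}_{k-1}\,\Delta t \\
\dot{x}_k &= \dot{x}_{k-1}
\end{align}
```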
The set of states $\mathbf{x}_k$ is given by (5), with $(c_x, c_y)$ as the position of the centroid of the template in the image frame. The filter states are the same as the vector of observations plus their first-order derivatives.
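In the motion model, $\mathbf{w}_k$ is a white noise random vector such that $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$, where $\mathbf{Q}$ is defined as the covariance matrix of the process noise. For the sake of notation, $\mathbf{I}$ represents the identity matrix.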
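Likewise, the measurement model of the filter is given in (8). $\mathbf{H}$ is defined as the observation matrix and $\mathbf{v}_k$ is a white noise random vector such that $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$. As for $\mathbf{Q}$, here $\mathbf{R}$ is the covariance matrix of the observation noise.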
With the motion and measurement models defined, it is possible to formulate the pose estimation process of the platform by giving the Kalman filter equations (9)-(15). In this set of equations, $\mathbf{P}$ is defined as the covariance matrix of the posterior estimate, $\mathbf{y}_k$ is the innovation vector, $\mathbf{K}_k$ is the Kalman gain and $\mathbf{S}_k$ is the covariance matrix of the innovation. Additionally, $\hat{\mathbf{x}}_{k|k-1}$ represents the predicted states and $\hat{\mathbf{x}}_{k|k}$ the corrected states after a measurement update.
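For reference, the standard linear Kalman filter recursion is written out below, split into a two-equation prediction step and a five-equation correction step to match the numbering (9)-(10) and (11)-(15). The notation is an assumption made here: $\mathbf{A}$ denotes the state transition matrix, $\mathbf{H}$ the observation matrix, $\mathbf{Q}$ and $\mathbf{R}$ the process and observation noise covariances, and $\mathbf{z}_k$ the observation vector.

```latex
\begin{align}
\hat{\mathbf{x}}_{k|k-1} &= \mathbf{A}\,\hat{\mathbf{x}}_{k-1|k-1} \\
\mathbf{P}_{k|k-1} &= \mathbf{A}\,\mathbf{P}_{k-1|k-1}\,\mathbf{A}^{T} + \mathbf{Q} \\
\mathbf{y}_k &= \mathbf{z}_k - \mathbf{H}\,\hat{\mathbf{x}}_{k|k-1} \\
\mathbf{S}_k &= \mathbf{H}\,\mathbf{P}_{k|k-1}\,\mathbf{H}^{T} + \mathbf{R} \\
\mathbf{K}_k &= \mathbf{P}_{k|k-1}\,\mathbf{H}^{T}\,\mathbf{S}_k^{-1} \\
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k\,\mathbf{y}_k \\
\mathbf{P}_{k|k} &= (\mathbf{I} - \mathbf{K}_k\,\mathbf{H})\,\mathbf{P}_{k|k-1}
\end{align}
```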
When an observation is obtained by the detection module, the correction phase (11)-(15) is computed immediately after the prediction step. This step aims to correct the error in the estimations using an observation of the template in the current image frame at time $k$.
The set of states produced by the Kalman filter can be applied in an IBVS module to control the landing procedure and to obtain the position of the landing platform in the current image frame, as shown in Fig. 4. These states are computed in the 2D image frame to reduce the computation carried out by the on-board computer of the UAV. Furthermore, since we are not estimating the relative pose between the vehicle and the landing platform, we are not feeding IMU data to the tracking module; this allows for the use of a linear Kalman filter and avoids the calculation of Jacobians at each time-step. Pseudo-code for the vision-based detection and tracking system is given in Algorithm 1.
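The predict-then-correct cycle described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the system: the class name `LinearKalmanTracker`, the state ordering (five observed quantities followed by their five rates), the diagonal noise covariances and the default values of `q` and `r` are all hypothetical.

```python
import numpy as np

class LinearKalmanTracker:
    """Constant-velocity Kalman filter over z = [cx, cy, w, h, theta]."""

    def __init__(self, dt, q=1e-2, r=1.0, n_obs=5):
        n = 2 * n_obs
        self.n_obs = n_obs
        self.A = np.eye(n)
        self.A[:n_obs, n_obs:] = dt * np.eye(n_obs)  # value += rate * dt
        self.H = np.hstack([np.eye(n_obs), np.zeros((n_obs, n_obs))])
        self.Q = q * np.eye(n)       # assumed process noise covariance
        self.R = r * np.eye(n_obs)   # assumed observation noise covariance
        self.x = np.zeros(n)
        self.P = np.eye(n)

    def predict(self):
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q

    def correct(self, z):
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P

    def step(self, z=None):
        """Predict, then correct only if the detector returned an observation."""
        self.predict()
        if z is not None:
            self.correct(z)
        return self.x[:self.n_obs].copy()

# Static pad observed at a fixed pixel location; every fifth detection is lost.
tracker = LinearKalmanTracker(dt=0.1)
truth = np.array([160.0, 120.0, 80.0, 80.0, 0.0])
estimates = [tracker.step(None if k % 5 == 4 else truth) for k in range(50)]
```

Because the prediction step always runs, the filter keeps producing an estimate on frames where the detector fails, which is what allows the controller to receive a continuous reference.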
This section describes the PID-based controller used to autonomously land our rotorcraft. The IBVS controller uses the 2D output of the estimation module as a reference to compute position-velocity control signals to land the vehicle. These signals are sent to the native position-velocity loops implemented in the PX4 flight stack, which transform the positions into speeds and then convert them into thrust commands for the vehicle’s motors, to guarantee correct control of the aircraft.
In order to ensure that the aircraft moves towards the landing platform and lands on it, a control strategy is required. Autonomous landing of the vehicle is accomplished by feeding the position estimates of the template from the Kalman filter to a set of three PID-based controllers.
The IBVS PIDs perform all the calculations in the current image frame. Setting a 2D image-based reference for the controller, and thus avoiding the need for expensive 3D reconstructions, increases the computational speed in on-board computers, allowing for real-time control of the vehicle’s approach to the landing pad. High-rate controllers tend to be robust against sudden image changes, and with the Kalman filter output as the reference for control, the system is capable of tracking the landing platform even if it is abruptly moved out of the camera’s field of view.
Our approach uses a set of three PID-based controllers attached in a cascade in an outer loop, with the two native controllers already implemented in the Pixhawk flight stack. The controllers of the flight stack have a standard cascaded position-velocity loop, in which the outer position loop transforms the position inputs to velocity outputs and the velocity outputs are converted in the inner loop into thrust commands for the vehicle’s propellers. The idea is to transform pixel coordinate errors into velocity commands and to let the inner controllers of the Flight Controller Unit (FCU) handle the thrust.
We use a reference vector for the controllers $\mathbf{r} = [x_c, y_c, 0]^T$, where $(x_c, y_c)$ represent the center of the current image frame and zero corresponds to the desired angle between the aircraft and the template, measured with respect to the x axis of the image. The first two controllers command the linear velocities of the rotorcraft, denoted as $v_x$ and $v_y$, to center the vehicle with respect to the template detected in the current image frame. The third controller modifies the yaw rate of the aircraft in order to align it with the landing pad along the x axis. The error vector of the controllers at time $k$ is given by (16).
Fig. 5 is a simplified representation of the three PID-based controllers and an altitude controller with an ON/OFF strategy to control the descent of the UAV. The error vector is used to feed the first three controllers and to produce a control effort $\mathbf{u}_k$, which is delivered to the cascade controllers of the FCU. The output of the three PID controllers is provided by the vector in (17). Each PID controller was discretized using trapezoidal integration and derivation.
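A minimal sketch of one such discrete controller, and of how three of them could map the image-space error to velocity and yaw-rate commands, is given below. Several details are assumptions made for this illustration: the error is taken as reference minus estimate, the derivative uses a simple backward difference (a simplification, not the trapezoidal form mentioned above), and the gains, sample time and names `DiscretePID` and `ibvs_command` are hypothetical.

```python
class DiscretePID:
    """PID controller with a trapezoidal integral term."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        # Trapezoidal rule: average the current and previous error over dt.
        self.integral += 0.5 * (error + self.prev_error) * self.dt
        # Backward-difference derivative (simplifying assumption).
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def ibvs_command(reference, estimate, controllers):
    """Map [x, y, angle] errors to [v_x, v_y, yaw_rate] commands."""
    errors = [r - s for r, s in zip(reference, estimate)]
    return [pid.update(e) for pid, e in zip(controllers, errors)]

# Hypothetical proportional-only gains; the reference is the image centre
# of a 320 x 240 frame and a zero heading angle.
controllers = [DiscretePID(0.01, 0.0, 0.0, 0.05) for _ in range(3)]
command = ibvs_command([160.0, 120.0, 0.0], [200.0, 100.0, 0.2], controllers)

# Integral-only controller, to show the trapezoidal accumulation.
integrator = DiscretePID(0.0, 1.0, 0.0, 0.1)
first_output = integrator.update(2.0)
second_output = integrator.update(2.0)
```

In a full pipeline, the resulting commands would be forwarded to the flight controller as velocity setpoints, leaving thrust generation to its inner loops.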
The ON/OFF altitude controller in Fig. 5 starts to land the aircraft whenever the difference between the height and width of the template estimates tends to zero (18). In this expression, $z_{sp}$ represents the output of the altitude controller, $z$ is the current height in meters of the vehicle and $c$ is a descent constant. This descent condition guarantees that the aircraft will land only if the template dimensions form a square, which is the actual shape of the landing platform.
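One way to express this descent rule is sketched below. The pixel tolerance used to decide that the width and height estimates are close enough, the default descent step and the function name `altitude_setpoint` are assumptions introduced for this illustration.

```python
def altitude_setpoint(current_height, width_est, height_est,
                      descent_step=0.1, tolerance_px=5.0):
    """Lower the altitude setpoint only while the template looks square.

    current_height is in meters; width_est and height_est are in pixels.
    descent_step (meters) and tolerance_px are hypothetical tuning values.
    """
    if abs(width_est - height_est) < tolerance_px:
        # ON: command a setpoint slightly below the current height.
        return max(current_height - descent_step, 0.0)
    # OFF: hold the current height until the estimates agree again.
    return current_height

descending = altitude_setpoint(2.0, 80.0, 82.0)
holding = altitude_setpoint(2.0, 80.0, 100.0)
```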
Acquiring feedback from the vision-based module closes the visual servoing control loop and allows for the implementation of an on-board end-to-end control strategy for a UAV. Algorithm 2 shows pseudo-code for the controller pipeline for the rotorcraft.
This section describes the experiments carried out to assess the different modules of the autonomous landing system using a Gazebo-based simulation. We provide an open-source implementation of our system on GitHub (https://github.com/MikeS96/autonomous_landing_uav).
The system was simulated using the SITL provided by PX4, which runs the Pixhawk flight stack in a Gazebo-based environment. Our implementation relies on the PX4 SITL simulation environment, in which the PX4 on SITL is connected via UDP with an offboard API (ROS), a ground station and the Gazebo simulator. To obtain accurate results, a custom model of a DJI F450 quad-rotor was implemented to mimic the dynamics and physics involved in a real-world model, as shown in Fig. 6 (a). All the perception and control pipelines of the system, shown in Fig. 1, were implemented in the Robot Operating System (ROS). In addition, a custom Gazebo world with a landing platform was used to rigorously assess the performance of both the vision and control modules.
The assessment of the detection and tracking system was carried out using three different detector-descriptor algorithms, which are efficient to compute and invariant to orientation and scale: ORB, SIFT and SURF. After extracting landing pad detections from the captured aerial images as explained in Section III-A, the Kalman filter was used to estimate the state of the landing pad. Our evaluation procedure demonstrates the improvements obtained by our tracking module compared with plain detection. All the detector-descriptors were tested with the aircraft hovering at a height of meters above the landing pad.
The RANSAC algorithm was used to compute the homography matrix. Both SIFT and SURF used the Manhattan distance to compute the matches between descriptors, whereas ORB employed the Hamming distance. Figure 7 illustrates the results of the three algorithms. The violin plots show the error between the observations and the ground truth of the platform. These plots show the distribution of the error for the five observed states, with the median error represented as a white dot, the interquartile range as a broad black bar in the center of the violin, and the lower/upper adjacent values as a thin line.
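The two distance measures mentioned above differ in the type of descriptor they suit: the Manhattan distance operates on real-valued descriptors, whereas the Hamming distance counts differing bits in binary descriptors. A small sketch follows; the function names and the toy descriptor values are hypothetical, and binary descriptors are assumed to be packed into `uint8` arrays.

```python
import numpy as np

def manhattan_distance(a, b):
    """Sum of absolute differences between two real-valued descriptors."""
    return float(np.sum(np.abs(a - b)))

def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

float_a = np.array([1.0, 2.0, 3.0])
float_b = np.array([2.0, 0.0, 3.0])
l1 = manhattan_distance(float_a, float_b)

bits_a = np.array([0b10101010, 0b11111111], dtype=np.uint8)
bits_b = np.array([0b10101000, 0b00001111], dtype=np.uint8)
n_different_bits = hamming_distance(bits_a, bits_b)
```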
It can be seen from Fig. 7(a) and (b) that the centroid coordinates of the landing platform show similar behavior for the ORB and SIFT detectors, with a median value close to zero. In contrast, SURF has more dispersion in its error distribution and a median of above pixels. The best detector for the centroid coordinates is SIFT, as it gives a more uniform distribution compared with ORB and SURF, and most of the error values are clustered close to zero.
Figure 7(c) presents the error in the angle, and it can be observed that the SIFT detector gives better performance than the other two detectors. The errors in the width and height can be seen in Fig. 7(d) and (e), respectively. From this figure, it can be seen that the three detectors have very similar behavior for both variables, although SIFT outperforms ORB and SURF with an error distribution close to zero and few outliers.
The SIFT detector-descriptor is better than the other detectors for all observations. Although ORB shows similar behavior to SIFT for the first three states, it has a large set of outliers for the last two states, while SURF gives the worst performance throughout the observation space.
|State||SIFT only: mean error||SIFT only: std||SIFT with Kalman filter: mean error||SIFT with Kalman filter: std|
|Centroid X [px]||41.93||105.12||2.74||4.71|
|Centroid Y [px]||30.16||78.55||1.12||2.91|
Finally, Table I shows the average and standard deviation of the errors, in pixels, for the SIFT detector alone and for the Kalman filter attached to the SIFT detector. The results demonstrate that all of the observations are substantially improved with the Kalman filter, reducing the average error to almost zero and decreasing the standard deviation of each state. This analysis leads to the conclusion that the detection and tracking pipeline can accurately track the landing platform with a SIFT detector and a linear Kalman filter to facilitate the computations in the on-board computers of a small UAV.
To assess the performance of the IBVS control system, three PID-based strategies were implemented. Various tests were carried out with P, PD and PID controllers to determine which was optimal for the landing procedure. For each control strategy, the gains were tuned in a Gazebo-based simulation until the most stable parameters were found for each controller. Using the best gains, five landing trials were conducted in a custom Gazebo environment, and the results were averaged. The image size was set to pixels, and the SIFT detector-descriptor was employed.
Fig. 8 presents the output of the three controllers for each state. The first two figures (Fig. 8(a) and (b)) correspond to the centroid of the landing platform. It can be seen that the three controllers were capable of tracking the 2D reference provided by the vision module and of centering the vehicle on the pad. However, the P strategy (blue) operated more slowly than the PD (orange) and PID (green) strategies, which tended to land the aircraft faster.
In a similar fashion, all of the controllers were shown to be capable of aligning the heading of the vehicle with the landing platform, as shown in Fig. 8(e). The estimated width and height of the landing pad, as illustrated in Fig. 8(c) and (d), have a tendency to increase as the altitude controller starts the vehicle’s descent. This effect is due to the landing platform becoming bigger in the current image frame as the height of the aircraft decreases.
Although all of the controllers were capable of landing the aircraft, in order to perform a thorough assessment we present the error of each controller for the states in Table II. From this table, it can be seen that the RMSE and standard deviation (in pixels) for each controller are strikingly similar for the three states under consideration. The P and PID controllers gave better numerical results than the PD controller. However, this behavior was due to the landing speed of the PD controller; since it is capable of landing more quickly, there are fewer samples to compute the RMSE. The PD controller landed in approximately seconds, around seconds faster than the PID controller.
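For clarity, the RMSE of a logged error signal can be computed as in the sketch below. The function name and the sample values are hypothetical and do not correspond to the data in Table II.

```python
import numpy as np

def rmse(errors):
    """Root-mean-square of a sequence of error samples (e.g. in pixels)."""
    errors = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical pixel errors logged during a landing trial.
errors_px = [3.0, -4.0]
value = rmse(errors_px)
```

Since a landing starts with a large transient error, a shorter trial contains proportionally fewer near-zero samples, which is one reason why a faster controller can report a higher RMSE.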
Fig. 9 complements the information in Table II by presenting the dynamic behavior of the error in the first three states for each controller while the aircraft is landing. As mentioned above, the P controller is slower than the other two controllers. PD tends to be a faster strategy and has fewer overshoots in its dynamic behavior. The performance of PID seems to be between those of the other two controllers.
The odometry of the vehicle is presented in Fig. 10 for four different variables for each controller. The first plot, in Fig. 10(a), shows how the altitude of the vehicle is reduced to zero for each controller. Both of the linear velocities of the aircraft undergo substantial variation at the beginning of the tests, as shown in Fig. 10(b) and (c), but when the vehicle is centered with respect to the landing platform, the linear speeds tend to zero. Likewise, the yaw rate shown in Fig. 10(e) behaves as expected for the three controllers: its magnitude is reduced to zero, which means that the vehicle is correctly aligned with the landing platform.
The position and heading errors between the landing platform and the aircraft were computed for the different trials, and are shown in Table III. It can be seen from the table that the average error in the coordinates is less than centimeters for all the control strategies. Similarly, the error in the angle between the vehicle and the platform is less than degrees. This demonstrates that all of the controllers are capable of achieving a precision landing of the aircraft with small errors over different trials, confirming the efficiency of the vision-based system with various control strategies.
Although all of the controllers were capable of accurately landing the vehicle on the landing platform, the best performance was shown by the PD and PID controllers, as these had more stable responses and lower variations in the different attempts. Although the PD strategy is less accurate than the PID controller, it is the preferred option due to its speed in landing the aircraft.
To assess the robustness of the PD controller under low-light conditions and different wind disturbances, Fig. 11 presents the errors obtained over different trials while the aircraft is landing. The error illustrated in Fig. 11(a) shows that the aircraft is capable of driving it towards zero under different wind conditions. Similarly, the error presented in Fig. 11(b) shows behavior similar to that in Fig. 11(a), where the error is minimized; nevertheless, with larger wind disturbances the aircraft is prone to an overdamped response rather than the underdamped response shown in Fig. 9. Finally, the angle is considerably affected by the wind disturbances in Fig. 11(c), as the vehicle is not capable of aligning itself with the landing platform. However, the vehicle was able to land in all tests, validating the effectiveness of our method under non-ideal conditions.
This section presents the results obtained in real-world tests using a DJI F450 in an autonomous landing sequence.
To thoroughly assess the performance of the autonomous landing system, a custom DJI F450 with an Odroid XU4 on-board computer and Pixhawk 1, as shown in Fig. 6 (b), was used to test the developed framework. Due to the limited computational resources of the Odroid, the PD controller was employed, as this was the fastest method of landing the vehicle, and the results of three landing trials were averaged to evaluate the system. The size of the image was also reduced to 320 x 240 pixels to obtain a frame rate of FPS and to ensure system convergence. The SIFT feature detector-descriptor was used (based on the simulation results) to carry out these tests.
To bridge the algorithms developed during the simulation phase with the real world, it was necessary to unplug the SITL component. This was achieved by connecting the Pixhawk FCU to the on-board computer and launching all the nodes developed in ROS. This process guaranteed that the system was connected with the physical FCU, bypassing the need for the SITL component. The detection and control pipeline therefore operates directly in the custom rotor-craft, enabling it to carry out autonomous landing maneuvers. All the parameters used during the simulation were transferred to the aircraft without fine-tuning, to demonstrate that the use of simple vision and control models allows for zero-shot domain transfer.
Fig. 12 presents the results obtained with the PD controller for each state. As expected, the system is capable of landing the rotor-craft on the landing platform within approximately seconds. The real-world system displays more spiky behavior than the simulated vehicle (Fig. 8 (orange)); however, as the test advances, the response of the controller stabilizes, guaranteeing the appropriate landing of the UAV.
Likewise, the RMSE of the controller during the landing procedure was assessed and is presented in Table IV. This error was computed over the three landing trials conducted with the real-world rotor-craft while the vehicle was landing. Overall, the vehicle maintains RMSE values that are strikingly similar to those presented for the simulation in Table II. In fact, the RMSE is slightly lower in the real-world landing trials. The plots of these errors are not shown, as their dynamic behavior is similar to that presented in Fig. 9 (orange).
Finally, to complete the assessment process, Table V presents the position error between the vehicle and the landing platform. This error was computed as the distance from the center of the pad to the center of the rotor-craft once it had landed. It can be seen that the average value is less than centimeters. Compared to the results in Table III, the error in the real-world implementation of the PD controller is around five times that of the simulation. Although this increase is undesirable, the rotor-craft is still capable of landing precisely on the desired platform and accomplishing the autonomous landing task, as expected from the simulation results.
This paper presents a SITL approach to developing a monocular image-based autonomous landing system for quadcopter drones. The proposed method and system, which integrates ROS, Gazebo and PX4’s SITL tools, enables users not only to endow quadcopters with low-cost vision-based autonomous landing capabilities, but also to fine-tune all the parameters of a potentially dangerous device in a safe simulated environment. With minimal modifications, both the vision and control modules developed in our simulated environment were successfully validated in a physical DJI F450 with an Odroid XU4 on-board computer and a Pixhawk 1 flight controller.
This work was funded by Universidad Autónoma de Occidente (UAO), project 17INTER-297. The authors would like to thank the Robotics and Autonomous Systems (RAS) research incubator and UAO’s Vicerrectoría de Investigaciones, Innovación y Emprendimiento for their support.