A Model-free Deep Reinforcement Learning Approach To Maneuver A Quadrotor Despite Single Rotor Failure

09/22/2021 ∙ by Paras Sharma, et al.

The ability to recover from faults and continue the mission is desirable for many quadrotor applications. A rotor may fail while the quadrotor is performing a mission, and it is essential to develop recovery strategies so that the vehicle is not damaged. In this paper, we develop a model-free deep reinforcement learning approach for a quadrotor to recover from a single rotor failure. The approach is based on Soft Actor-Critic and enables the vehicle to hover, land, and perform complex maneuvers. Simulation results, obtained using a custom simulator, validate the proposed approach, showing that it achieves hover, landing, and path following in 2D and 3D. We also show that the approach is robust to wind disturbances.







I Introduction

Quadrotors are used in applications ranging from military operations [9] and automated deliveries [16] to surveillance [29] and search and rescue [25]. Quadrotors use the differential thrust created by the rotors to fly. The rotors are the most essential part of a quadrotor, and the failure of even a single rotor can lead to a catastrophic crash. Hence, there is a pressing need for strategies that recover a quadrotor after a rotor failure and, ideally, retain its maneuvering capability so that the mission can still be continued, especially in persistent monitoring applications.

There have been several works that focus on developing recovery strategies under single rotor failure (SRF). There are two possible solutions under SRF: (i) modify the configuration of the quadrotor, and (ii) develop efficient fault-tolerant controllers. The first approach is used in [17, 23, 3, 22] through tilting-rotor configurations. Under SRF, the remaining three rotors change their orientation by rotating along the axis passing through the arm of the UAV. In [2], a morphing quadrotor is designed that changes its configuration by changing the arm configuration. [13] presents a detailed aerodynamic model of the quadrotor and the propellers and provides a fault-tolerant strategy using tilting rotors and retractable arms. However, adding additional servos to the quadrotor increases the complexity of the model and also the possibility of additional failures.

Several nonlinear controllers have also been developed to recover from SRF. In [27], a cascade controller is designed to recover the flight from any arbitrary position with 3 rotors. In [18], a controller is designed to recover from rotor failures. In [21], a backstepping approach handles a single rotor failure by stopping the rotor opposite to the failed one, essentially forming a birotor configuration and performing an emergency landing. [19] uses a configurable centre-of-mass system for stabilization to achieve performance similar to that of a 4-rotor system. A vision-based approach is presented in [28], where features are obtained using an event-driven camera, and state estimation along with a fault-tolerant controller is designed for recovery. In [14], a nonsingular sliding mode controller is designed to handle SRF in quadrotors. In [5], an iterative optimal control algorithm is used to design policies that stabilize the vehicle under single and double rotor failures.

An alternative approach to recovering from SRF is to use reinforcement learning (RL) algorithms. In [15], the authors show that reinforcement learning can control a quadrotor even under very high external disturbances. In [7], RL is used to design a fault-tolerant controller that recovers the vehicle in hover mode and outperforms traditional fault-tolerant methods. In [1], RL and convolutional neural networks are used to achieve a recoverable hover condition after rotor failures.

In this paper, we advance the approaches developed in [7, 1] to achieve not only hover but also maneuverability. We use the Soft Actor-Critic (SAC) deep reinforcement learning algorithm to achieve fault tolerance. Further, we use a model-free methodology, in contrast to the model-based approaches of [7, 1], which allows the system to learn uncertainties and unmodelled errors that can be an issue in model-based learning. The main contributions of this paper are:


  • Formulate the problem as a deep reinforcement learning problem through the use of SAC

  • Develop a simulator for SRF

  • Show that SAC-based RL framework prevents a quadrotor from crashing and provides maneuverability along all the 3 axes, with the remaining three active rotors.

  • Show that the quadrotor with a SRF can perform three types of maneuvers: (a) hover, (b) land, and (c) path following in 3D.

II Problem Formulation

Fig. 1: The body frame of a quadrotor. We consider a NED (North East Down) frame of reference. The rotors 1, 2, 3 and 4, at the ends of each arm of length $l$, generate upward forces $F_1$, $F_2$, $F_3$ and $F_4$ while rotating at angular velocities $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$. Rotation about the north axis is roll, rotation about the east axis is pitch, and rotation about the down axis is yaw.

Quadrotors use differential thrust generated by fixed rotors for stabilization and control. Four control inputs are required: roll, pitch and yaw for attitude control, and thrust for altitude control. Due to the presence of rotors, quadrotors are susceptible to rotor failure. In Fig. 1, we see that the rotors generate thrusts $F_i$ opposing the weight $mg$ of the quadrotor. The thrust from each rotor generates a torque about the centre of mass of the quadrotor. When the thrusts of all rotors are equal, the resultant torque is 0, and the vehicle moves only along the vertical axis. The thrust $F_i$ on each rotor can be controlled individually to generate thrust differentials, which in turn control the roll and pitch angles. The net angular momentum of the quadrotor is mitigated by having a configuration of two clockwise-rotating rotors and two anti-clockwise-rotating rotors. Diagonally opposite rotors spin in the same direction, and yaw control is achieved by increasing or decreasing the angular velocities of the diagonally opposite rotor pairs [6].
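The thrust/torque balance described above can be sketched numerically. The quadratic thrust and drag model, the coefficient values, and the function names below are illustrative assumptions for a "+"-style geometry, not the paper's exact model.

```python
import numpy as np

# Hedged sketch: thrust F_i = kf * w_i^2, yaw drag torque km * w_i^2,
# rotors 1 and 3 spinning clockwise, rotors 2 and 4 anti-clockwise.
def rotor_wrench(w, kf=1e-5, km=1e-7, l=0.16):
    """Net thrust and body torques for rotor angular velocities w[0..3]."""
    F = kf * np.square(w)                 # per-rotor thrusts
    thrust = F.sum()                      # net upward force along body z
    roll = l * (F[1] - F[3])              # thrust differential, rotors 2 vs 4
    pitch = l * (F[2] - F[0])             # thrust differential, rotors 3 vs 1
    yaw = km * (w[0]**2 - w[1]**2 + w[2]**2 - w[3]**2)  # CW minus CCW drag
    return thrust, roll, pitch, yaw

# Equal rotor speeds: all torques cancel, only net thrust remains.
thrust, roll, pitch, yaw = rotor_wrench(np.array([500.0, 500.0, 500.0, 500.0]))
```

With equal speeds the roll, pitch and yaw torques are all zero, which is the hover condition the text describes; unequal pairs produce the attitude torques.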

In this paper, we study the case of a single rotor failure (SRF), where we assume that one of the rotors has completely stopped spinning and is no longer capable of producing thrust. Such a situation causes an imbalance in the net angular momentum and torque acting on the quadrotor. Although roll, pitch and thrust control can be retained through accurate modeling of the system, yaw control cannot be regained. Failure of a rotor also changes the dynamics of the quadrotor, and this change itself depends on the physical model of the drone (affected by parameters such as the length of the arm, moment of inertia, weight of the drone, etc.). Designing an adaptive control system becomes challenging in these cases. The controllers designed in [27]-[14] can be used for this purpose, but they are not robust to parameter variations in the model. Therefore, in this paper, we design the controller using a model-free approach through deep reinforcement learning with relative ease. This controller recovers the quadrotor from a SRF and retains its maneuvering capabilities, even though yaw control remains unavailable.

We assume that the quadrotor has a fault-detection mechanism. The RL controller executes in parallel with the vehicle's low-level controller and takes over from the low-level controller once the fault is detected. We assume that SRF is detected using any of the approaches given in [24].

III Methodology

We use the Soft Actor-Critic (SAC) algorithm [11] to determine policies for the quadrotor under SRF. SAC is an off-policy algorithm with entropy regularization, which helps in the exploration-exploitation trade-off. SAC [11] performs better than other popular DRL algorithms such as DDPG [20], PPO [26], SQL [10] and TD3 [8].

III-A Training Objective

The objective is to make the quadrotor reach the goal position after the fault has occurred. The DRL controller needs to learn to go to a defined goal position. If the goal position is stationary, the vehicle hovers. If the goal position is in motion, the quadrotor should track it to minimize the error between the quadrotor position and the goal position. We now define the observation state and the reward function.

III-B Observation State

The observation state at any time $t$ has 22 state variables:


  • Position error, $e_p = p_g - p$, where $p$ is the quadrotor position and $p_g$ is the goal position.

  • Full rotation matrix, $R(\psi, \theta, \phi) \in \mathbb{R}^{3\times 3}$, where $\psi$, $\theta$ and $\phi$ are the yaw, pitch and roll angles respectively.

  • Linear velocity, $\{v_x, v_y, v_z\}$

  • Angular velocity, $\{\omega_x, \omega_y, \omega_z\}$

  • RPM values of the rotors at time $t$, $\{\Omega_1, \Omega_2, \Omega_3, \Omega_4\}$. Our model generates the change in PWM values rather than the absolute PWMs, which motivates the need for the current RPM values in the observation space.

In order to reduce the complexity of the state space for faster on-board computation, we do not consider higher-order derivatives of the motion.
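The 22-dimensional observation above (3 position-error components, 9 rotation-matrix entries, 3 linear and 3 angular velocity components, 4 rotor RPMs) can be assembled as follows; the function and argument names are illustrative, not the authors' identifiers.

```python
import numpy as np

def build_observation(pos_err, R, lin_vel, ang_vel, rpm):
    """Concatenate position error (3), flattened rotation matrix (9),
    linear velocity (3), angular velocity (3) and rotor RPMs (4)."""
    obs = np.concatenate([
        np.asarray(pos_err, dtype=float),    # 3
        np.asarray(R, dtype=float).ravel(),  # 9
        np.asarray(lin_vel, dtype=float),    # 3
        np.asarray(ang_vel, dtype=float),    # 3
        np.asarray(rpm, dtype=float),        # 4
    ])
    assert obs.shape == (22,)
    return obs

# Example: level attitude, small x error, rotor 3 failed (0 RPM).
obs = build_observation([0.1, 0.0, 0.0], np.eye(3), [0, 0, 0], [0, 0, 0],
                        [600, 600, 0, 600])
```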

III-C Reward Function

We have devised a simple reward function: as the quadrotor moves away from the goal, the vehicle receives a negative reward. The reward at time $t$ is given as

$$ r_t = -c_1 \tanh\left(c_2 \lVert e_p \rVert\right) - c_3 \lVert \Delta\Omega \rVert $$

where $c_1$, $c_2$ and $c_3$ are constants. The first term is responsible for maintaining the position of the quadrotor at the desired goal position. We use the $\tanh$ function because it grows rapidly near the origin, which provides a greater incentive for the model to minimize the positional error. It also saturates asymptotically as the error increases, hence penalising all high positional errors equally. The second term moderates rapid changes in the angular velocities of the rotors. During initial experiments, it was found that without this term, the model tends to change the angular velocities too rapidly. This provides stability over the length of a training episode, but when testing over longer time periods, these rapid changes eventually make the quadrotor unstable. The values of $c_1$, $c_2$ and $c_3$ are fixed empirically.
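The shape of this reward (a saturating penalty on position error plus a penalty on rotor-speed changes) can be sketched as below. The constants are placeholders chosen for illustration, not the paper's tuned values, and the saturating function is assumed to be tanh as described in the text.

```python
import numpy as np

C1, C2, C3 = 1.0, 1.0, 0.01  # placeholder constants, not the paper's values

def reward(pos_err, d_omega):
    """Negative reward: saturating position-error term plus a term that
    moderates rapid changes in rotor angular velocities."""
    pos_term = C1 * np.tanh(C2 * np.linalg.norm(pos_err))   # steep near 0, flat far away
    rate_term = C3 * np.linalg.norm(d_omega)                # penalise rapid rotor changes
    return -(pos_term + rate_term)

# Zero error and no rotor-speed change gives the maximum reward of 0.
r0 = reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

Note how the penalty saturates: errors of 100 m and 1000 m are penalised almost identically, while small errors near the goal are sharply distinguished.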

Fig. 2: (a) Critic network: three fully connected layers with ReLU activation and one neuron in the output layer. (b) Actor network: two hidden layers with ReLU activation, followed by mean and sigma estimation layers at the same level; these estimates are used to create distributions from which actions are sampled. (c) Hyper-parameters for the actor and critic networks.

III-D Critic Network

The critic network (Q-function) is used to estimate the value of a state, that is, the expected return from that state. The critic network consists of three hidden layers and an output layer with one neuron. Fig. 2(a) shows the architecture of the network. Hyper-parameters for the critic network are shown in Fig. 2(c).
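A minimal numpy sketch of such a critic is shown below: three ReLU hidden layers and a single linear output neuron estimating Q(s, a). The layer widths and initialization are illustrative assumptions; the paper's actual widths are listed in Fig. 2(c).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights and zero biases for each fully connected layer."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def critic_forward(params, state, action):
    """Forward pass: ReLU hidden layers, linear scalar output Q(s, a)."""
    x = np.concatenate([state, action])
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)    # ReLU hidden layers
    W, b = params[-1]
    return (x @ W + b)[0]                 # single output neuron

# 22-dim observation plus 4-dim action in; three hidden layers; scalar out.
params = init_mlp([22 + 4, 64, 64, 64, 1])
q = critic_forward(params, np.zeros(22), np.zeros(4))
```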

III-E Actor Network

The actor network, also known as the policy network, is used to estimate the action from a given observation state. The outputs are the changes in PWM values with respect to the current PWM values. As this is a stochastic policy, we generate a normal distribution and sample the action from that distribution. We use fully connected layers to estimate the mean and standard deviation for each rotor. These values are used to form the normal distribution corresponding to the rotor. The sample is then passed through the hyperbolic tangent function to map the value into the range $[-1, 1]$. We rescale this value to a change in the PWM signal. As the change in hardware control signals cannot be very high, we limit the maximum change in PWM to 0.15 at any step, using a linear mapping from $[-1, 1]$ to $[-0.15, 0.15]$. A depiction of the network is presented in Fig. 2(b). Hyper-parameters for the actor network are shown in Fig. 2(c).
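The sampling and squashing step just described can be sketched as follows; the function name and the way the per-rotor mean and log standard deviation are supplied are illustrative assumptions (in the paper these come from the actor's output layers).

```python
import numpy as np

MAX_DPWM = 0.15  # maximum change in PWM per step, as stated in the text

def sample_dpwm(mean, log_std, rng=np.random.default_rng(0)):
    """Sample a per-rotor PWM change: Gaussian sample -> tanh squash to
    [-1, 1] -> linear rescale to [-MAX_DPWM, MAX_DPWM]."""
    std = np.exp(log_std)
    raw = rng.normal(mean, std)   # one Gaussian sample per rotor
    squashed = np.tanh(raw)       # map into [-1, 1]
    return MAX_DPWM * squashed    # linear map to [-0.15, 0.15]

# Four rotors, zero mean, modest spread: bounded PWM changes.
d = sample_dpwm(np.zeros(4), np.full(4, -1.0))
```

The tanh squash guarantees the bound regardless of how large the raw Gaussian sample is, which is why the hard limit never needs clipping.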

III-F Training

The algorithm uses two critic networks or Q-functions, $Q_{\phi_1}$ and $Q_{\phi_2}$, and one actor network or policy function $\pi_\theta$. The parameters of each Q-function are copied to the corresponding target network. The replay buffer $\mathcal{D}$ is initialized empty. For each step, we start by sampling an action $a$ for the current state $s$. We pass $a$ to the simulator and observe the next state $s'$, the reward $r$, and whether the episode has ended, i.e., whether the done flag $d$ is true. We store the transition $(s, a, r, s', d)$ in $\mathcal{D}$. As this algorithm is off-policy, to update the networks we sample a batch of transitions $B$ from $\mathcal{D}$. The loss function of the Q-network with parameters $\phi_i$ in SAC is given as

$$ L(\phi_i, B) = \mathbb{E}_{(s,a,r,s',d) \sim B}\left[\left(Q_{\phi_i}(s,a) - y(r, s', d)\right)^2\right] $$

where the target $y$ is given by

$$ y(r, s', d) = r + \gamma (1 - d)\left(\min_{j=1,2} Q_{\phi_{\mathrm{targ},j}}(s', a') - \alpha \log \pi_\theta(a' \mid s')\right) $$

where $a' \sim \pi_\theta(\cdot \mid s')$. We update both Q-functions by one step of gradient descent:

$$ \phi_i \leftarrow \phi_i - \lambda_Q \nabla_{\phi_i} L(\phi_i, B), \quad i = 1, 2 $$

Similarly, to learn the policy network, we need to maximize the expected reward and entropy. The policy objective can be written as

$$ J(\theta) = \mathbb{E}_{s \sim B,\, a \sim \pi_\theta}\left[\min_{j=1,2} Q_{\phi_j}(s, a) - \alpha \log \pi_\theta(a \mid s)\right] $$

As we want to maximize this objective, the gradient ascent update is

$$ \theta \leftarrow \theta + \lambda_\pi \nabla_\theta J(\theta) $$

where $a$ is sampled from the new policy. Equivalently, we can perform a standard gradient descent update on the negative of this objective. As the actions stored in $\mathcal{D}$ are old, we sample a new action for each state from the current policy; these new actions are then used to update the parameters $\theta$.

The temperature coefficient $\alpha$ is one of the important hyper-parameters of SAC; $\alpha$ is updated using the method given in [12]. The final step of each iteration is to update the target networks. The target Q-networks are obtained by polyak averaging of the Q-network parameters in an iterative manner, with step size $\rho$ used for the polyak averaging. The complete flow of SAC is shown in Algorithm 1.
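The polyak (exponential moving average) target update at the end of each iteration can be sketched as below; the function name and the choice of 0.995 for the averaging coefficient are illustrative, not the paper's stated value.

```python
import numpy as np

def polyak_update(target_params, params, rho=0.995):
    """Move each target parameter a small step toward the online network:
    target <- rho * target + (1 - rho) * online."""
    return [rho * t + (1.0 - rho) * p for t, p in zip(target_params, params)]

# One update nudges the target slightly toward the online parameters.
targets = [np.ones(3)]
online = [np.zeros(3)]
targets = polyak_update(targets, online)
```

Because rho is close to 1, the target networks change slowly, which stabilizes the bootstrapped Q-targets used in the loss above.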

Algorithm 1: Soft Actor-Critic algorithm.
Fig. 3: (a) Rewards vs. steps: the policy network achieved a high reward value within 5 million steps, or 5000 episodes; we let the policy train further to converge. (b) $\alpha$ vs. steps: $\alpha$ starts with a high value, which means high exploration, and converges as the rewards converge, leading to low exploration and high exploitation of the learned policy. (c) Loss vs. steps: the Q-loss and policy-loss convergence is in sync with the convergence of the rewards and $\alpha$. Each plot is compiled over 15 million steps.

IV Simulation Results

IV-A Simulator Setup

The proposed algorithm is evaluated using a Python-based quadrotor simulator [4]. The simulator design allows us to pass PWM values to the rotors, which are then mapped to rotor RPMs by the simulator and simulated for time $\Delta t$. All values required for the observation state are taken from the simulator and passed to the SAC algorithm. An action is generated by the actor network, which is then simulated to obtain the next state of the quadrotor. We used a control frequency of 100 Hz, i.e., $\Delta t = 0.01$ s, for running the simulator. All the parameters of the quadrotor used in the simulator are shown in Table I.
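The timing arithmetic implied by this setup is worth making explicit: a 100 Hz control frequency means the simulator advances 0.01 s per action, so a 10-second training episode spans 1000 control steps. The constant names below are illustrative.

```python
# Timing sketch for the simulation setup described above.
CONTROL_HZ = 100
DT = 1.0 / CONTROL_HZ                       # 0.01 s simulated per action
EPISODE_SECONDS = 10                        # training episode length
STEPS_PER_EPISODE = EPISODE_SECONDS * CONTROL_HZ  # 1000 control steps
```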

Mass of quadrotor: 1.2 kg
Max RPM: 900
Min RPM: 0
Max thrust per rotor: 9.1 N
Arm length: 0.16 m
Motor height: 0.05 m
Rotor moment of inertia: 2.7e-5 kg·m²
Inertial tensor of the quadrotor:
Thrust coefficient:
Torque coefficient:
TABLE I: Parameters used in the simulator during training

IV-B Training and Testing

(a) Hovering: Coordinates
(b) Hovering: Motor PWMs
(c) Hovering: 3D trajectory
(d) Landing: Coordinates
(e) Landing: Motor PWMs
(f) Landing: 3D trajectory
(g) XY circle: Coordinates
(h) XY circle: Motor PWMs
(i) XY circle: 3D trajectory
(j) YZ circle: Coordinates
(k) YZ circle: Motor PWMs
(l) YZ circle: 3D trajectory
(m) Saddle: Coordinates
(n) Saddle: Motor PWMs
(o) Saddle: 3D trajectory
Fig. 4: Test run results. First row: hovering maneuver; second row: landing maneuver; third row: circular trajectory in the XY plane; fourth row: circular trajectory in the YZ plane; fifth row: saddle-shaped trajectory. 'Coordinates' plots the actual XYZ coordinates of the quadrotor as well as the desired XYZ coordinates for the maneuver. 'Motor PWMs' plots the PWM input to every rotor; one of the rotors never spins, and its angular velocity can be seen to be 0 in all the plots. '3D trajectory' shows the actual trajectory flown by the quadrotor in 3D space.

We used 10 seconds of simulation for each training episode with a control frequency of 100 Hz, resulting in 1000 steps per episode. Every episode of training and testing starts with the quadrotor hovering at the origin with all rotors working. We then disable one of the rotors and let the SAC DRL algorithm generate the PWM values for the remaining 3 rotors. The goal position is set to be the origin itself.

The hyper-parameters of the algorithm are a replay buffer size of 1e6 and a batch size of 256. We trained the algorithm for about 15 million steps, with the actor and critic being updated at each step. The plots obtained during training are shown in Fig. 3. In Fig. 3(a), we can see sudden spikes in the rewards, which are a consequence of the quadrotor exploring new states. In the algorithm, given a state, we take a random action with probability 0.001 as part of the exploration strategy. The quadrotor starts at a fixed position, so the space explored is limited. As the policy converges, the probability of exploring new states also reduces, but exploring new states can still lead to a few spikes in the rewards.

Test simulations were run for 40 seconds with a control frequency of 100 Hz. We tested 3 types of maneuvers on a quadrotor with a single rotor failure: (a) hovering, (b) landing and (c) path following in 3D space.

IV-C Hovering

In this test, the vehicle is initialized at the origin and made to hover there by setting the goal as the origin itself. In the real world, the origin would be shifted to the altitude at which the quadrotor should hover. In this and the subsequent tests, rotor 1 of the vehicle has been disabled in order to mimic a SRF. Fig. 4(c) shows the trajectory of the quadrotor in 3D space while hovering at the origin. Slight swinging behaviour can be seen in the position of the quadrotor as it tries to compensate for the failed rotor. Fig. 4(a) shows the XYZ coordinates of the quadrotor vs. the time elapsed. Periodic oscillations, along with slight deviation from the origin, can be seen in the X and Y coordinates, caused by the loss of yaw control, which leads the quadrotor to rotate about its vertical axis. The PWM inputs shown in Fig. 4(b) indicate that the model learnt to slow down rotor 3, diagonally opposite to rotor 1, to keep the net torque on the UAV close to 0. Ideally, the diagonally opposite rotor should also be set to 0 while the other 2 rotors rotate with the same RPM, but that is only feasible when the vehicle is already in a perfectly horizontal attitude at the time of the fault and needs no roll and pitch corrections (and doing this would also mean that the quadrotor loses its maneuvering capability). Our algorithm, on the other hand, learns to use all three remaining rotors, both to correct orientation errors and to retain maneuverability. It uses rotor 3 intermittently to provide the torques necessary for roll and pitch control. It also oscillates the speeds of rotors 2 and 4 to maintain orientation and roll and pitch control. Rotors 2 and 4 also increase their speed to support the weight of the quadrotor.

IV-D Landing

In this test, the quadrotor is initialized at the origin with a failure of rotor 1. The goal is set at the origin itself for the initial 5 seconds, in order to let the quadrotor recover from the failure and stabilize its hover at its current position. Then the altitude of the goal is slowly reduced at a constant rate, leading the quadrotor to start its descent. Since the simulator we use lacks a physical ground, we assume that the ground is 1.5 meters below the initialization position (the origin). Upon descending 1.5 meters, the inputs to all the rotors are cut off, resembling the standard procedure of landing a quadrotor, where the thrust is cut off just before touching down on the ground. Fig. 4(d) shows that the quadrotor performs a controlled descent, maintaining the descent rate until all the rotors are shut down. In this case, rotor 3 is used less frequently than in any other maneuver (Fig. 4(e)). The descent trajectory followed by the quadrotor is shown in Fig. 4(f).

IV-E Path Following

This test shows that our algorithm can provide maneuverability to a quadrotor with a single rotor failure. We performed three tests to show the robustness of our algorithm across trajectories: (a) moving in a circular trajectory in the XY plane, (b) moving in a circular trajectory in the YZ plane, and (c) moving in a saddle-shaped trajectory spanning 3D space. The results can be seen in Fig. 4(g)-(i), Fig. 4(j)-(l) and Fig. 4(m)-(o). For all 3 trajectories, the quadrotor is initialised at the origin, and for the first 5 seconds the goal position is set to the origin itself. This gives the quadrotor time to recover from the fault and stabilize itself. Then the goal position is slowly moved along the desired trajectory, making the quadrotor follow it. From the 'Coordinates' graphs of Fig. 4, we can see that the quadrotor successfully follows the desired trajectory, with only a slight deviation from the desired path.

IV-F Wind Tolerance

To show tolerance against wind, we added a wind model with random directions. We found that the quadrotor was able to tolerate winds up to light-breeze speeds, showing that the model is robust to light breezes even though wind disturbances were not included during training. In the presence of wind, the quadrotor needs more time to first stabilize from the SRF than without wind. This can be seen in Fig. 5, where we present results for the quadrotor following the saddle-shaped trajectory under these wind conditions; the drone needs the initial 10 seconds to stabilize its hover. The quadrotor has no observation variable corresponding to the wind. The results show that the quadrotor is able to follow the trajectory robustly along every axis.

(a) Saddle: Coordinates
(b) Saddle: Simulation
Fig. 5: Results with added wind disturbance

V Conclusions

In this paper, we developed and evaluated a model-free deep reinforcement learning algorithm that uses Soft Actor-Critic to handle a single rotor failure in quadrotors. The algorithm can stabilize a quadrotor after SRF and also provides maneuvering capability with just 3 active rotors. The SAC-based controller was able to hover, land and follow 3D trajectories. The SAC-based algorithm can run at 100 Hz, which is on par with a microcontroller-based controller. The controller still needs to be tested on a physical quadrotor for an actual performance assessment. Future work can extend this study to multiple rotor failures and integrate a high-level planner to maneuver to a safe location for landing.


  • [1] R. Arasanipalai, A. Agrawal, and D. Ghose (2020) Mid-flight propeller failure detection and control of propeller-deficient quadcopter using reinforcement learning. arXiv preprint arXiv:2002.11564. Cited by: §I, §I.
  • [2] T. Avant, U. Lee, B. Katona, and K. Morgansen (2018) Dynamics, hover configurations, and rotor failure restabilization of a morphing quadrotor. In 2018 Annual American Control Conference (ACC), pp. 4855–4862. Cited by: §I.
  • [3] A. Bin Junaid, A. Diaz De Cerio Sanchez, J. Betancor Bosch, N. Vitzilaios, and Y. Zweiri (2018) Design and implementation of a dual-axis tilting quadcopter. Robotics 7 (4). External Links: ISSN 2218-6581, Document Cited by: §I.
  • [4] Bobzwik/Quadcopter_SimCon: quadcopter simulation and control, dynamics generated with PyDy. Note: https://github.com/bobzwik/Quadcopter_SimCon Cited by: §IV-A.
  • [5] C. De Crousaz, F. Farshidian, M. Neunert, and J. Buchli (2015) Unified motion control for dynamic quadrotor maneuvers demonstrated on slung load and rotor failure tasks. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2223–2229. Cited by: §I.
  • [6] A. Elruby, M. Elkhatib, N. El-Amary, and A. Hashad (2014-01) Dynamics modeling and control of quadrotor vehicle. Vol. 2014, pp. 12. External Links: Document Cited by: §II.
  • [7] F. Fei, Z. Tu, D. Xu, and X. Deng (2020) Learn-to-recover: retrofitting uavs with reinforcement learning-assisted flight control under cyber-physical attacks. In IEEE International Conference on Robotics and Automation (ICRA), pp. 7358–7364. Cited by: §I, §I.
  • [8] S. Fujimoto, H. van Hoof, and D. Meger (2018) Addressing function approximation error in actor-critic methods. CoRR abs/1802.09477. External Links: 1802.09477 Cited by: §III.
  • [9] D. Glade (2000-07) Unmanned aerial vehicles: implications for military operations. pp. 39. Cited by: §I.
  • [10] T. Haarnoja, H. Tang, P. Abbeel, and S. Levine (2017) Reinforcement learning with deep energy-based policies. External Links: 1702.08165 Cited by: §III.
  • [11] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1861–1870. Cited by: §III.
  • [12] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, et al. (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905. Cited by: §III-F.
  • [13] M. Hedayatpour, M. Mehrandezh, and F. Janabi-Sharifi (2019) Precision modeling and optimally-safe design of quadcopters for controlled crash landing in case of rotor failure. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. , pp. 5206–5211. External Links: Document Cited by: §I.
  • [14] Z. Hou, P. Lu, and Z. Tu (2020) Nonsingular terminal sliding mode control for a quadrotor uav with a total rotor failure. Aerospace Science and Technology 98, pp. 105716. Cited by: §I, §II.
  • [15] J. Hwangbo, I. Sa, R. Siegwart, and M. Hutter (2017) Control of a quadrotor with reinforcement learning. IEEE Robotics and Automation Letters 2 (4), pp. 2096 – 2103. Cited by: §I.
  • [16] R. Kellermann, T. Biehle, and L. Fischer (2020) Drones for parcel and passenger transportation: a literature review. Transportation Research Interdisciplinary Perspectives 4, pp. 100088. External Links: ISSN 2590-1982, Document Cited by: §I.
  • [17] R. Kumar, S. Sridhar, F. Cazaurang, K. Cohen, and M. Kumar (2018-09) Reconfigurable fault-tolerant tilt-rotor quadcopter system. In Dynamic Systems and Control Conference, Cited by: §I.
  • [18] A. Lanzon, A. Freddi, and S. Longhi (2014) Flight control of a quadrotor vehicle subsequent to a rotor failure. Journal of Guidance, Control, and Dynamics 37 (2), pp. 580–591. Cited by: §I.
  • [19] S. J. Lee, I. Jang, and H. J. Kim (2020) Fail-safe flight of a fully-actuated quadrotor in a single motor failure. IEEE Robotics and Automation Letters 5 (4), pp. 6403–6410. Cited by: §I.
  • [20] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. Cited by: §III.
  • [21] V. Lippiello, F. Ruggiero, and D. Serra (2014) Emergency landing for a quadrotor in case of a propeller failure: a backstepping approach. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4782–4788. Cited by: §I.
  • [22] A. Nemati, R. Kumar, and M. Kumar (2016-10) STABILITY and control of tilting-rotor quadcopter in case of a propeller failure. ASME Dynamic Systems and Control Division DSCC 2016, pp. . Cited by: §I.
  • [23] A. Nemati, R. Kumar, and M. Kumar (2016-10) Stabilizing and control of tilting-rotor quadcopter in case of a propeller failure. In Dynamic Systems and Control Conference, Vol. 1, V001T05A005. External Links: Document Cited by: §I.
  • [24] N. P. Nguyen and S. K. Hong (2019) Fault diagnosis and fault-tolerant control scheme for quadcopter uavs with a total loss of actuator. Energies 12 (6). External Links: ISSN 1996-1073, Document Cited by: §II.
  • [25] M. Półka, S. Ptak, and Ł. Kuziora (2017) The use of uav’s for search and rescue operations. Procedia Engineering 192, pp. 748–752. Note: 12th international scientific conference of young scientists on sustainable, modern and safe transport External Links: ISSN 1877-7058, Document Cited by: §I.
  • [26] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. CoRR abs/1707.06347. External Links: 1707.06347 Cited by: §III.
  • [27] S. Sun, M. Baert, B. S. van Schijndel, and C. de Visser (2020) Upset recovery control for quadrotors subjected to a complete rotor failure from large initial disturbances. In IEEE International Conference on Robotics and Automation (ICRA), Vol. , pp. 4273–4279. External Links: Document Cited by: §I, §II.
  • [28] S. Sun, G. Cioffi, C. De Visser, and D. Scaramuzza (2021) Autonomous quadrotor flight despite rotor failure with onboard vision sensors: frames vs. events. IEEE Robotics and Automation Letters 6 (2), pp. 580–587. Cited by: §I.
  • [29] Z. Zaheer, A. Usmani, E. Khan, and M. A. Qadeer (2016) Aerial surveillance system using uav. In 2016 Thirteenth International Conference on Wireless and Optical Communications Networks (WOCN), Vol. , pp. 1–7. External Links: Document Cited by: §I.