I Introduction
Research on increasing the autonomy of unmanned aerial vehicles (UAVs) has taken centre stage in recent years, because UAVs offer cost-effective solutions to dangerous, dirty and dull tasks, such as aerial grasping [1], emergency evacuation [2] and building inspection [3]. In these applications, it is crucial for UAVs to be able to fly autonomously in uncertain environments with variations in operating conditions [4]. In such conditions, adaptability is a necessity rather than a choice.
Given the ability of artificial neural networks (ANNs) to generalise knowledge from training samples, an ANN-based controller can be used to control nonlinear dynamic systems [5]. Furthermore, deep neural networks (DNNs) can approximate nonlinear functions with an exponentially lower number of training parameters and higher sample complexity when compared to ANNs [6]. Therefore, DNNs offer a promising way to enhance control strategies [7].
In the literature, ANNs have successfully been integrated with control system design to improve tracking performance in uncertain environments [8]. In [9], the unknown part of the dynamical model of a quadcopter is modelled by a DNN. In [10], a DNN is used for direct inverse control of a quadrotor in simulation. In [11] and [12], DNNs are used to learn the dynamics of a helicopter and a multicopter, respectively. In [13], a pre-cascaded DNN module is used to improve the performance of a UAV in tracking arbitrary hand-drawn trajectories. However, in all these works, DNNs are trained offline and then used online without further learning. In other words, while the dynamics are learnt in the training phase, the controller is not updated in the testing phase (the DNN simply mimics the conventional controller) and the operational uncertainties are no longer learnt.
Unlike the traditional use of DNNs in the literature, in this work we propose an online DNN-based approach for improving the trajectory tracking performance of UAVs. After an offline pre-training phase with past flight data, a DNN-based controller is used in real time to control the UAV. Without any prior knowledge of the system besides the training data, the proposed approach shows its capability to reduce the trajectory tracking error online by compensating for internal uncertainties and external disturbances. Moreover, it is shown that the DNN module is computationally suitable for real-time operation and adequate for arbitrary trajectories, making it applicable to real-world tasks. Furthermore, the proposed approach employs expert knowledge for the online training. The overall control architecture and its training process are depicted in Fig. 1.
This work is organised as follows. The problem is formulated in Section II. Section III introduces the proposed approach. Then, Section IV presents the experimental setup. Section V provides real-time experiments with a quadcopter UAV to validate the proposed method. Finally, Section VI summarises this work with conclusions and future work.
II Problem Formulation
In this work, we consider the problem of designing a learning feedback control algorithm for a dynamical system, such as a UAV. Our objective is to learn a control strategy for the system that achieves high-accuracy tracking. To describe the problem, we first introduce the dynamical model of the UAV.
II-A Dynamical Model of Unmanned Aerial Vehicle
The world-fixed reference frame is and the body frame is . The absolute position of the UAV is given by three Cartesian coordinates at its center of gravity in , and its attitude is given by three Euler angles. The rotation matrix from to is given by the combination of three single rotation matrices around , and . The time derivative of the position gives the linear velocity of the UAV expressed in . Similarly, the time derivative of the attitude gives the angular velocity in and is the angular velocity in .
The vector of control inputs is chosen as:
(1)
where is the total thrust along , whereas , and are moments around , and , respectively. Finally, the dynamical model of the UAV is given as in [14]:
(2)
where is the mass of the UAV, is the gravitational acceleration constant, is the inertia matrix, , and denote , and , respectively.
Remark 1
The dynamical system in (2) is nonlinear, coupled and underactuated. Therefore, an advanced controller is required.
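For orientation, a rigid-body quadrotor model of the kind surveyed in [14] is often written as below. The notation is an assumption made here for illustration and may differ from the notation of (1) and (2): $\mathbf{p}$ and $\mathbf{v}$ are the position and linear velocity in the world frame, $\mathbf{R}$ is the rotation matrix from the body frame to the world frame, $\boldsymbol{\omega}$ is the angular velocity in the body frame, $T$ is the total thrust, $\boldsymbol{\tau}$ collects the three moments, and $\mathbf{e}_3$ is the unit vector of the vertical axis, taken to point upwards.

```latex
\begin{aligned}
\dot{\mathbf{p}} &= \mathbf{v}, \\
m\,\dot{\mathbf{v}} &= \mathbf{R}\,T\,\mathbf{e}_3 - m g\,\mathbf{e}_3, \\
\dot{\mathbf{R}} &= \mathbf{R}\,[\boldsymbol{\omega}]_{\times}, \\
\mathbf{I}\,\dot{\boldsymbol{\omega}} &= -\boldsymbol{\omega}\times\mathbf{I}\,\boldsymbol{\omega} + \boldsymbol{\tau}.
\end{aligned}
```

In this form, four inputs drive six degrees of freedom, which is the sense in which the system is underactuated, and the thrust enters the translational dynamics through $\mathbf{R}$, which couples attitude and position.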
The system in (2) can be written in a general form as:
(3) 
where , is the disturbance term, and is defined in (1).
II-B Problem Description
If a precise model of the system exists, then the inversion of the system can be computed. Let denote the composition of functions and , and let denote the th composition of function , i.e., and [15]. Let denote the dimension of the system’s input, i.e., , and let denote the vector of relative degrees of the system, s.t. . Then, the input and the output of the system are related by
(4) 
If is affine in , then (4) becomes
(5) 
where and are the decoupling matrices. Finally, the control law at time to track the desired output of the system can be written as in [16]:
(6) 
However, in a real system, the parameters might be unknown and difficult to estimate, e.g., the moments of inertia. Furthermore, these parameters might change during operation, e.g., the mass. Moreover, it is not always possible to predict the external disturbance term. Therefore, an adaptive controller which can learn online is required. Our objective is to learn the control of the system by looking only at its performance, i.e., in our case, the tracking error:
(7) 
and its time derivative:
(8) 
Thus, and are the only required information about the system.
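As a minimal sketch of how these two quantities might be obtained from sampled data, the snippet below computes a tracking error and a backward-difference estimate of its time derivative. The function names, the sign convention (desired minus actual) and the sample values are assumptions made for illustration, not the definitions in (7) and (8).

```python
import numpy as np


def tracking_error(desired, actual):
    """Tracking error; the sign convention (desired minus actual) is an assumption."""
    return np.asarray(desired, dtype=float) - np.asarray(actual, dtype=float)


def error_rate(error_now, error_prev, dt):
    """Backward-difference estimate of the time derivative of the error."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    return (error_now - error_prev) / dt


# Two consecutive position samples taken 0.01 s apart (hypothetical values).
e_prev = tracking_error([1.0, 0.0, 1.0], [0.90, 0.0, 1.0])
e_now = tracking_error([1.0, 0.0, 1.0], [0.95, 0.0, 1.0])
de = error_rate(e_now, e_prev, dt=0.01)
```

In practice a finite difference amplifies measurement noise, so a filtered derivative may be preferable; that choice is outside what the text specifies.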
III Methodology
By their nature, DNNs are distinguished from the more common single-hidden-layer ANNs by their depth. The neurons are organised in an input layer, multiple hidden layers and an output layer. In a DNN, as in classical ANNs, the weights are modified using a learning process governed by the training rules.
III-A Offline Pre-Training
During the offline pre-training phase, a supervised learning approach is used, in which a feedforward DNN learns to control the system from a conventional controller, a proportional-integral-derivative (PID) controller in our case. In this control scheme, shown in Fig. 2(a), the PID controller controls the system alone. Hence, it is utilized as an ordinary feedback controller to ensure the global asymptotic stability of the system and to provide labelled training samples for the DNN. The training of the DNN requires a large number of labelled training samples. Each labelled training sample consists of an input and expected output pair. The training of the DNN involves backpropagation to minimize the loss over all training examples. After the training, the DNN can approximate the mapping from the training inputs to the outputs. The pseudocode of offline pre-training is provided in Algorithm 1.
III-B Online Training
During the online training phase, the DNN controls the system and, at the same time, learns how to improve the control performance. Since DNN training requires supervised learning, another process has to provide feedback on its performance. In our case, a fuzzy logic system (FLS) is used to provide this information. By definition, an FLS incorporates expert knowledge in the form of rules and uses this knowledge to provide useful information [17]. The control structure for online training is illustrated in Fig. 2(b).
In our approach, the FLS observes the behaviour of the system controlled by the DNN and, depending on the situation, corrects the action of the DNN. The possible evolutions of the error are depicted in Fig. 3. If the error is positive and its time derivative is also positive, then the system diverges (top red curve in Fig. 3). In this case, the FLS will force the DNN to decrease the control signal significantly to stabilize the system. In another possible case, if the error is negative and its time derivative is zero, then the error is steady (bottom blue line in Fig. 3). In this case, the DNN is stuck in a local minimum and the FLS will give a small positive perturbation. Finally, if the error is zero and its time derivative is also zero, then this is the optimal case (green line in Fig. 3) and no action has to be taken.
These empirical rules can be formally described by a Mamdani FLS with triangular membership functions to represent the fuzzy sets. The rules for each possible case are summarized by the rulebase in Table I. The inputs to the FLS are selected to be the tracking error and its time derivative, while the output is the correction signal. Each input is represented by three fuzzy sets: negative, zero and positive; while the output can belong to five fuzzy sets: big decrease, small decrease, no changes, small increase and big increase.
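To make the three input fuzzy sets concrete, the following sketch evaluates triangular membership functions for the labels negative, zero and positive. The normalised input range [-1, 1] and the corner positions of the triangles are assumptions chosen for illustration; the text does not state them.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set with the given corners."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x):
    """Membership degrees of a normalised input in the three input fuzzy sets."""
    x = max(-1.0, min(1.0, x))  # saturate outside the assumed range [-1, 1]
    return {
        "negative": triangular(x, -2.0, -1.0, 0.0),
        "zero": triangular(x, -1.0, 0.0, 1.0),
        "positive": triangular(x, 0.0, 1.0, 2.0),
    }


memberships = fuzzify(0.25)
```

With these corners the three degrees always sum to one inside the assumed range, so an input of 0.25 is mostly "zero" and partly "positive".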
However, an FLS requires operations among fuzzy sets, which are time-consuming. Therefore, by using an approach similar to the one described in [18], a fuzzy mapping which represents the FLS in Table I can be generated for a general multidimensional case:
(9)
where denotes the Hadamard product and is the adaptation rate. The fuzzy mapping significantly reduces the computation time, which makes this approach suitable for real-time systems [19]. The pseudocode of online training is provided in Algorithm 2.
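The online phase can be pictured as a loop in which the network output plus the correction from the expert knowledge becomes the supervised target for the next weight update. The sketch below shows one such update for a small tanh network. It is an illustration under stated assumptions: the network sizes are hypothetical, the correction `delta_u` is passed in as a given number rather than computed from (9), and a plain gradient-descent step stands in for the quasi-Newton method used in this work.

```python
import numpy as np


def forward(weights, x):
    """Return the activations of every layer of a tanh network with a linear output."""
    activations = [np.asarray(x, dtype=float)]
    for i, (W, b) in enumerate(weights):
        z = W @ activations[-1] + b
        is_last = i == len(weights) - 1
        activations.append(z if is_last else np.tanh(z))
    return activations


def online_step(weights, x, delta_u, learning_rate=0.01):
    """One gradient step towards the corrected control signal u + delta_u."""
    activations = forward(weights, x)
    u = activations[-1]
    target = u + delta_u              # corrected signal acts as the label
    grad = u - target                 # gradient of 0.5 * ||u - target||^2 w.r.t. u
    updated = []
    for i in reversed(range(len(weights))):
        W, b = weights[i]
        grad_W = np.outer(grad, activations[i])
        grad_b = grad
        if i > 0:
            # Back-propagate through the tanh of the previous layer.
            grad = (W.T @ grad) * (1.0 - activations[i] ** 2)
        updated.append((W - learning_rate * grad_W, b - learning_rate * grad_b))
    return u, updated[::-1]


# Hypothetical network: 2 inputs (error and its derivative), two hidden layers, 1 output.
rng = np.random.default_rng(0)
sizes = [2, 8, 8, 1]
weights = [(rng.normal(scale=0.5, size=(n_out, n_in)), np.zeros(n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = np.array([0.3, -0.1])
u_before, weights = online_step(weights, x, delta_u=np.array([0.2]))
u_after = forward(weights, x)[-1]
```

After the step, the output for the same input has moved from `u_before` towards `u_before + delta_u`, which is the behaviour the online phase relies on.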
Table I: Rulebase of the FLS.
         | Negative     | Zero           | Positive
Negative | Big decrease | Small decrease | No changes
Zero     | Big decrease | No changes     | Big increase
Positive | No changes   | Small increase | Big increase
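The entries of Table I can be stored as a lookup keyed by the two input labels, as in the sketch below. Two points are assumptions: which input indexes the rows and which the columns cannot be read from the table as given, so the code only speaks of a row input and a column input; and the dead-band labelling is a crisp simplification for illustration, not a Mamdani inference.

```python
# Keys are (row label, column label); values are copied from Table I.
RULEBASE = {
    ("negative", "negative"): "big decrease",
    ("negative", "zero"): "small decrease",
    ("negative", "positive"): "no changes",
    ("zero", "negative"): "big decrease",
    ("zero", "zero"): "no changes",
    ("zero", "positive"): "big increase",
    ("positive", "negative"): "no changes",
    ("positive", "zero"): "small increase",
    ("positive", "positive"): "big increase",
}


def label(value, dead_band=0.05):
    """Crisp label of a signal; the dead-band width is a hypothetical parameter."""
    if value > dead_band:
        return "positive"
    if value < -dead_band:
        return "negative"
    return "zero"


def lookup(row_input, column_input, dead_band=0.05):
    """Return the rule consequent for a pair of crisp inputs."""
    return RULEBASE[(label(row_input, dead_band), label(column_input, dead_band))]


action = lookup(0.0, 0.0)
```

A full FLS would blend several rules according to their membership degrees instead of selecting a single one.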
IV Experimental Setup
The experimental platform used in this work is the Parrot Bebop 2 quadcopter UAV. This UAV is controlled via a Wi-Fi connection, and the robot operating system (ROS) is used to communicate with the UAV. The motion capture system provides the UAV’s real-time position at . This position is fed into the ground station computer (CPU: , , quad-core; GPU: ; RAM: DDR4) where the algorithms are executed. Once the control signal is computed, it is sent to the UAV at rate.
For the attitude/velocity tracking, the onboard nonlinear geometric controller on is used [20]. The attitude controller is responsible for mapping the high-level control inputs, i.e., , to the low-level control commands, i.e., in (1).
IV-A Deep Neural Network Structure
Three feedforward DNNs with hyperbolic tangent (tanh) activation functions are used to learn the control mapping for each controlled axis: x, y and z. The inputs to the DNN for the x axis are the errors and their time derivatives on the x axis, and the output is the desired pitch angle. Similarly, the inputs to the DNN for the y axis are the errors and their time derivatives on the y axis, and the output is the desired roll angle. Finally, the inputs to the DNN for the z axis are the errors and their time derivatives on the z axis, and the output is the desired vertical velocity.
Remark 2
Both DNN controllers, with and without online learning, consist of three parallel sub-networks for the x, y and z axes.
In our case, after some heuristic analysis and experimental trials, the architecture of each network is chosen to consist of input neurons (), scaling neurons, fully connected hidden layers () with neurons in each layer (), unscaling neuron and output neuron (). From the asymptotic analysis, the runtime complexity of the forward propagation is . The runtime complexity of the backpropagation is , where is the number of iterations in the quasi-Newton method. Moreover, the runtime complexity of the fuzzy mapping in (9) is constant w.r.t. the architecture of the network, i.e., . For the DNN without online learning, the dominant operation is the forward propagation; therefore, its runtime complexity is polynomial. The DNN with online learning involves both forward propagation and backpropagation; therefore, its runtime complexity is also polynomial but asymptotic to . The proposed architecture was thus chosen as a compromise between the learning capability of the neural network and the update time through the backpropagation.
The error type is an important term in the loss index and, in our case, it is chosen as the normalized squared error. The initialization algorithm is used to bring the neural network to a stable region of the loss function and, in our case, it is selected as the random search. The training algorithm is the core part of the training and, in our case, the quasi-Newton method is the most suitable choice for both offline and online training.
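The structure described in this subsection (scaling, fully connected tanh hidden layers, unscaling) and the growth of the forward-propagation cost with the layer width can be sketched as follows. The layer counts, widths, scale factors and random weights are hypothetical placeholders and are not the values used in the experiments.

```python
import numpy as np


def build_network(n_inputs, n_hidden_layers, n_neurons, n_outputs, seed=0):
    """Randomly initialised fully connected network stored as (W, b) pairs."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_neurons] * n_hidden_layers + [n_outputs]
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(layers, x, in_scale, out_scale):
    """Scaling layer, tanh hidden layers, linear output, unscaling layer."""
    a = np.asarray(x, dtype=float) / in_scale
    for W, b in layers[:-1]:
        a = np.tanh(W @ a + b)
    W, b = layers[-1]
    return (W @ a + b) * out_scale


def multiply_adds(layers):
    """Number of multiply-accumulate operations in one forward pass."""
    return sum(W.size for W, _ in layers)


# Hypothetical sub-network for one axis: inputs are the error and its derivative.
net = build_network(n_inputs=2, n_hidden_layers=3, n_neurons=16, n_outputs=1)
command = forward(net, [0.2, -0.1], in_scale=np.array([1.0, 2.0]), out_scale=0.3)
cost_16 = multiply_adds(net)
cost_32 = multiply_adds(build_network(2, 3, 32, 1))
```

Doubling the width of the hidden layers roughly quadruples the multiply-accumulate count, which illustrates the polynomial dependence of the runtime on the architecture.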
IV-B Data Collection
To prepare the training samples from flight data, the system was controlled by a conventional controller alone, while the position errors and their time derivatives were collected as training inputs and the control signal was saved as the labelled output. Using the PID controller, instances have been collected in the training dataset for each axis. This dataset is large enough for our application; however, the proposed method does not impose any limitation on the dataset size. The training data include slow circular and eight-shaped trajectories on ,  and planes with the reference speed of .
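The data collection described above can be sketched as a logging loop around a conventional controller. In the snippet below, a unit-mass point model stands in for one axis of the UAV, and the gains, reference signal, duration and sign convention of the error are all hypothetical; only the logging pattern (errors and their derivatives as inputs, control signal as label) follows the text.

```python
import numpy as np


class PID:
    """Textbook PID controller acting on an error and its time derivative."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0

    def control(self, e, de, dt):
        self.integral += e * dt
        return self.kp * e + self.ki * self.integral + self.kd * de


def collect_samples(controller, duration=20.0, dt=0.01):
    """Track a slow sinusoidal reference on one axis and log (e, de) -> u pairs."""
    position, velocity = 0.0, 0.0
    inputs, labels = [], []
    for k in range(int(round(duration / dt))):
        t = k * dt
        reference = np.sin(0.5 * t)
        reference_rate = 0.5 * np.cos(0.5 * t)
        e, de = reference - position, reference_rate - velocity
        u = controller.control(e, de, dt)
        inputs.append((e, de))     # training input
        labels.append(u)           # labelled output
        # Unit-mass point model standing in for one axis of the UAV (an assumption).
        velocity += u * dt
        position += velocity * dt
    return np.array(inputs), np.array(labels)


X, y = collect_samples(PID(kp=4.0, ki=0.5, kd=3.0))
```

The arrays `X` and `y` then play the role of the labelled training set for the offline pre-training of one axis.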
V Experimental Results
In order to validate the capabilities of the controller proposed in Section III, the trajectory following problem of a quadcopter UAV is considered. The proposed control architecture and its training process are depicted in Fig. 1. Three different types of trajectories have been tested: slow circular, fast circular and square-shaped. In order to show the efficiency and efficacy of the DNN-based controller with online training, it is compared with a well-tuned PID controller (used during the offline pre-training) and with the DNN controller without online training.
The first study case is the tracking of the slow circular trajectory with radius at , which has been used during the pre-training phase. Fig. 4(a) shows the results of the 3D trajectory tracking for the first case. The projections on the x, y and z axes of this portion of the trajectory are shown in Fig. 4(b). The evolution of the Euclidean error for the tested controllers is illustrated in Fig. 4(c).
The second study case is the tracking of the fast circular trajectory with radius at , which has not been used during the pre-training phase. Fig. 4(d) shows the results of the 3D trajectory tracking for the second case. The projections on the x, y and z axes of this portion of the trajectory are shown in Fig. 4(e). The evolution of the Euclidean error for the tested controllers is illustrated in Fig. 4(f).
The third study case is the tracking of the square-shaped trajectory with side length at , which also has not been used during the pre-training phase. Fig. 4(g) shows the results of the 3D trajectory tracking for the third case. The projections on the x, y and z axes of this portion of the trajectory are shown in Fig. 4(h). The evolution of the Euclidean error for the tested controllers is illustrated in Fig. 4(i).
V-A Discussion
A sample of experimental results for the three controllers (PID, DNN without online training and DNN with online training) on the three trajectories (slow circular, fast circular and square-shaped) is illustrated in Fig. 4. It can be observed that the DNN controller with online training is able to learn the system dynamics and decrease the tracking error over time on all tested trajectories. As can be seen from Figs. 4(b), 4(e) and 4(h), the DNN has faster responses, since it is able to estimate the desired control signal in (6) and predict the evolution of the system dynamics. It has to be emphasised that the online DNN evolves from the pre-trained DNN during the learning process. Moreover, as expected, the DNN without online learning performs poorly on the trajectories which have not been used for its training.
For a statistical analysis of the control performance, the experiments are repeated five times for each trajectory-controller combination under the same conditions. Fig. 5 presents a boxplot comparing the tracking performance of the three controllers on the three tested trajectories. It can be observed that, on average, the DNN controller with online learning outperforms the other controllers on the tested trajectories. In addition, the maximum absolute error is also lower for the online DNN-based controller, even for previously unseen trajectories. Finally, the variance of the error is similar for the PID controller and the DNN controller with online learning.
As can be seen from Table II, the DNN-based controller with online learning outperforms both the PID controller and the DNN without online learning for all tested trajectories in terms of mean absolute error (MAE). Results averaged over the repeated experiments show that an overall improvement of , and in MAE is achieved as compared to a well-tuned PID controller for the slow circular, fast circular and square-shaped trajectories, respectively. This ratio goes up to , and when compared with the pre-trained DNN for the same trajectories.
Trajectory  PID  DNN  DNN 

Slow circle  
Fast circle  
Squareshaped 
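The comparison metric and the relative improvement quoted in the discussion can be computed as in the sketch below. It assumes that the MAE is the mean of the Euclidean position error over a run, which the text suggests but does not state explicitly; the function names and sample values are hypothetical.

```python
import numpy as np


def mean_absolute_error(desired, actual):
    """Mean Euclidean distance between desired and actual 3D positions."""
    difference = np.asarray(desired, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(np.linalg.norm(difference, axis=1)))


def improvement_percent(baseline_mae, new_mae):
    """Relative reduction of the error with respect to a baseline controller."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical two-sample trajectory.
desired = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
actual = [[3.0, 4.0, 1.0], [1.0, 0.0, 1.0]]
mae = mean_absolute_error(desired, actual)
gain = improvement_percent(baseline_mae=4.0, new_mae=2.0)
```

Here the first sample is 5 units away from its reference and the second coincides with it, so the mean error is 2.5, and halving a baseline error corresponds to a 50% improvement.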
Although the online DNN-based controller can promptly learn how to control the system, the computing time is still the main drawback of this controller with online backpropagation. The computing time grows polynomially with the number of hidden layers and the number of neurons in each layer. Therefore, the deeper the network, the more complex the functions it can learn, but the more computational power it requires. The average experimental computation time for the DNN with online backpropagation is around , while for the PID controller and the DNN without online learning this time is only and , respectively. However, this is still an acceptable time for real-time applications, which allows the controller to run at almost .
VI Conclusions
In this work, we have presented a novel approach for the high-level control of UAVs that improves the trajectory tracking performance online by using deep learning and expert knowledge. The learning is subdivided into two phases: offline pre-training and online training. During the offline learning phase, a conventional controller performs a set of trajectories and a batch of training samples is collected. Then, the DNN-based controller is pre-trained on the collected data samples. However, the pre-trained DNN cannot adapt to new flying conditions unseen during the pre-training; therefore, online training is required. During the online learning phase, the DNN controls the system and adapts the control input to improve the tracking performance. The expert knowledge encoded into the rulebase provides, thanks to the fuzzy mapping, the adaptation information to the DNN, allowing real-time learning. Once the DNNs are trained during the flight of the UAV, the experimental results show that the proposed approach improves the performance by around 50%. We believe that the results of this study will open the doors to a wider use of DNN-based controllers with online training in real-world control applications, as the proposed structure is suitable for deployment in real-time control systems.
In the future, we will test the DNN-based controller for aerial transportation, where the system dynamics change drastically. In addition, we will extensively analyse the parameters and architecture of the DNN and their effect on performance. Moreover, an analytical stability proof of the proposed approach will be provided.
Acknowledgment
This research was partially supported by the Singapore Ministry of Education (RG185/17).
References
 [1] G. Loianno, V. Spurny, J. Thomas, T. Baca, D. Thakur, D. Hert, R. Penicka, T. Krajnik, A. Zhou, A. Cho, M. Saska, and V. Kumar, “Localization, grasping, and transportation of magnetic objects by a team of MAVs in challenging desert-like environments,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1576–1583, July 2018.
 [2] A. Sarabakha and E. Kayacan, “Y6 Tricopter Autonomous Evacuation in an Indoor Environment Using Q-Learning Algorithm,” in 2016 IEEE 55th Conference on Decision and Control (CDC), Dec 2016, pp. 5992–5997.
 [3] L. Teixeira and M. Chli, “Real-time local 3D reconstruction for aerial inspection using superpixel expansion,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017, pp. 4560–4567.
 [4] N. J. Sanket, C. D. Singh, K. Ganguly, C. Fermüller, and Y. Aloimonos, “GapFlyt: Active Vision Based Minimalist Structure-Less Gap Detection For Quadrotor Flight,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 2799–2806, Oct 2018.
 [5] E. Kayacan, E. Kayacan, H. Ramon, and W. Saeys, “Adaptive Neuro-Fuzzy Control of a Spherical Rolling Robot Using Sliding-Mode-Control-Theory-Based Online Learning Algorithm,” IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 170–179, Feb 2013.
 [6] Y. LeCun, Y. Bengio, and G. Hinton, “Deep Learning,” Nature, vol. 521, pp. 436–444, May 2015.
 [7] S. Zhou, M. K. Helwa, and A. P. Schoellig, “An Inversion-Based Learning Approach for Improving Impromptu Trajectory Tracking of Robots With Non-Minimum Phase Dynamics,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1663–1670, July 2018.
 [8] B. J. Emran and H. Najjaran, “Adaptive neural network control of quadrotor system under the presence of actuator constraints,” in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2017, pp. 2619–2624.
 [9] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, “Learning quadrotor dynamics using neural network for flight control,” in 2016 IEEE 55th Conference on Decision and Control (CDC), Dec 2016, pp. 4653–4660.
 [10] S. A. Nivison and P. P. Khargonekar, “Development of a robust deep recurrent neural network controller for flight applications,” in 2017 American Control Conference (ACC), May 2017, pp. 5336–5342.
 [11] A. Punjani and P. Abbeel, “Deep learning helicopter dynamics models,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 3223–3230.
 [12] N. Mohajerin and S. L. Waslander, “Modular Deep Recurrent Neural Network: Application to Quadrotors,” in 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2014, pp. 1374–1379.
 [13] Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig, “Deep neural networks for improved, impromptu trajectory tracking of quadrotors,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017, pp. 5183–5189.
 [14] R. Mahony, V. Kumar, and P. Corke, “Multirotor Aerial Vehicles: Modeling, Estimation, and Control of Quadrotor,” IEEE Robotics & Automation Magazine, vol. 19, no. 3, pp. 20–32, 2012.
 [15] M. Sun and D. Wang, “Analysis of Nonlinear Discrete-Time Systems with Higher-Order Iterative Learning Control,” Dynamics and Control, vol. 11, no. 1, pp. 81–96, Jan 2001.
 [16] S. Zhou, M. K. Helwa, and A. P. Schoellig, “Design of deep neural networks as add-on blocks for improving impromptu trajectory tracking,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Dec 2017, pp. 5201–5207.
 [17] J. M. Mendel, Type-1 Fuzzy Systems. Springer International Publishing, 2017, pp. 101–159.
 [18] A. Sarabakha, C. Fu, E. Kayacan, and T. Kumbasar, “Type-2 Fuzzy Logic Controllers Made Even Simpler: From Design to Deployment for UAVs,” IEEE Transactions on Industrial Electronics, vol. 65, no. 6, pp. 5069–5077, June 2018.
 [19] A. Sarabakha, C. Fu, and E. Kayacan, “Double-Input Interval Type-2 Fuzzy Logic Controllers: Analysis and Design,” in IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), July 2017, pp. 1–6.
 [20] T. Lee, M. Leok, and N. H. McClamroch, “Nonlinear Robust Tracking Control of a Quadrotor UAV on SE(3),” Asian Journal of Control, vol. 15, no. 2, pp. 391–408, 2013.