I Introduction
High-accuracy tracking is an essential requirement in advanced manufacturing, self-driving cars, medical robots, and autonomous flying vehicles, among others. To achieve high-accuracy tracking for these complex, typically high-dimensional, nonlinear robotic systems, a standard approach is to use inverse dynamics control [1] or feedforward linearization techniques [2] to convert the complex nonlinear dynamics into a set of decoupled double integrators. Then, a standard, linear, outer-loop controller, e.g., a proportional-derivative (PD) controller, can be used to make the decoupled linear system track the desired trajectory [1]. However, these linearization techniques depend on having accurate system models, which are difficult to obtain in practice.
To address this problem, robust control techniques have been used for many decades to design outer-loop controllers that account for the uncertainties in the model [3]. However, the selection of the uncertainty bounds in the robust controller design is challenging. On the one hand, selecting high bounds typically results in conservative behavior, and hence, a large tracking error. On the other hand, relatively small uncertainty bounds may not represent the true upper bounds of the uncertainties, and consequently, stability of the overall system is not ensured. Alternatively, several approaches have been proposed for learning the inverse system dynamics from collected data where the system models are not available or not sufficiently accurate; see [4, 5, 6, 7]. Combining a priori model knowledge with learning data has also been studied in [4, 8]. However, these learning approaches typically neglect the learning regression errors in the analysis, and they do not provide a proof of stability of the overall, learning-based control system, which is crucial for safety-critical applications such as medical robots. The limitations of the robust control and the learning-based techniques show the urgent need for novel, robust, learning-based control approaches that ensure both stability of the overall control system and high-accuracy tracking. This sets the stage for the research carried out in this paper.
In this paper, we provide a novel, robust, learning-based control technique that achieves both closed-loop stability and high-accuracy tracking. In particular, we use Gaussian processes (GPs) to approximate the error between the commanded acceleration to the linearized system and the actual acceleration of the robotic system, and then use the predicted mean and variance of the GP to calculate an upper bound on the uncertainty of the linearization. This uncertainty bound is then used in a robust, outer-loop controller to ensure stability of the overall system (see Figure 1). Moreover, we show that, using our proposed strategy, the tracking error converges to a ball with a radius that can be made arbitrarily small through appropriate control design, and hence, our proposed approach also achieves high-accuracy tracking. Furthermore, we verify the effectiveness of the proposed approach via simulations on a 2 DOF planar manipulator using MATLAB Simulink and experimentally on a UR10 6 DOF industrial manipulator.
This paper is organized as follows. Section II provides a summary of some recent related work. Section III describes the considered problem, and Section IV provides the proposed approach. Section V derives theoretical guarantees for the proposed approach. Sections VI and VII present the simulation and experimental results, and Section VIII concludes the paper.
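Notation and Basic Definitions: For a set $\mathcal{S}$, $\bar{\mathcal{S}}$ denotes its closure and $\mathcal{S}^\circ$ its interior. The notation $\mathcal{B}_r(x)$ denotes a ball of radius $r$ centered at a point $x$. A matrix $P \in \mathbb{R}^{n\times n}$ is positive definite if it is symmetric and all its eigenvalues are positive. For a vector $x \in \mathbb{R}^n$, $\|x\|$ denotes its Euclidean norm. A function $f: \mathbb{R}^n \to \mathbb{R}^m$ is smooth if its partial derivatives of all orders exist and are continuous. The solutions of $\dot x = f(t, x)$ are uniformly ultimately bounded with ultimate bound $b$ if there exist positive constants $b$ and $c$, and for every $a \in (0, c)$, there exists $T = T(a, b) \ge 0$ such that $\|x(t_0)\| \le a$ implies $\|x(t)\| \le b$ for all $t \ge t_0 + T$, where $t_0$ is the initial time instant. A kernel is a symmetric function $k: \mathcal{A} \times \mathcal{A} \to \mathbb{R}$. A reproducing kernel Hilbert space (RKHS) corresponding to a kernel $k$ includes functions of the form $f(a) = \sum_{i=1}^{m} \alpha_i k(a, a_i)$ with $m \in \mathbb{N}$, $\alpha_i \in \mathbb{R}$, and representing points $a_i \in \mathcal{A}$.
II Related Work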
The study of safe learning dates back to the beginning of this century [9]. In [10] and [11], Lyapunov-based reinforcement learning is used to allow a learning agent to safely switch between precomputed baseline controllers. Then, in [12], risk-sensitive reinforcement learning is proposed, in which the expected return is heuristically weighted with the probability of reaching an error state. In several other papers, including [13], [14], and [15], safe exploration methods are utilized to allow the learning modules to achieve a desired balance between ensuring safe operation and exploring new states for improved performance. In [9], a general framework is proposed for ensuring safety of learning-based control strategies for uncertain robotic systems. In this framework, robust reachability guarantees from control theory are combined with Bayesian analysis based on empirical observations. The result is a safety-preserving, supervisory controller of the learning module that allows the system to freely execute its learning policy almost everywhere, but imposes control actions to ensure safety at critical states. Despite its effectiveness for ensuring safety, the supervisory controller in this approach plays no role in reducing tracking errors.
Focusing our attention on safe, learning-based inverse dynamics control, we refer to [16, 17, 18]. In [16], a model reference adaptive control (MRAC) architecture based on Gaussian processes (GPs) is proposed, and stability of the overall control system is proved. While the approach in [16] is based on adaptive control theory, our approach is based on robust control theory. In particular, in [16], the mean of the GP is used to exactly cancel the uncertainty vector, while in our approach, we use both the mean and variance of the GP to learn an upper bound on the uncertainty vector to be used in a robust, outer-loop controller. Hence, unlike [16], in our approach, the uncertainty of the learning module is incorporated not only in the stability analysis but also in the outer-loop controller design. Intuitively, the less certain our GPs are, the more robust the outer-loop controller should be for ensuring safety. When more data is collected and the GPs are more certain, the outer-loop controller can be less conservative for improved performance. While the results of [16] are tested in simulations on a two-dimensional system, we test our results experimentally on a 6 DOF manipulator.
In [17, 18], GPs are utilized to learn the errors in the output torques of the inverse dynamics model online. In [17], the GP learning is combined with a state-of-the-art gradient descent method for learning feedback terms online. The main idea behind this approach is that the gradient descent method corrects for fast perturbations, while the GP is responsible for correcting slow perturbations. This allows for exponential smoothing of the GP hyperparameters, which increases the robustness of the GP at the cost of slower reaction to changes. Nevertheless, [17] does not provide a proof of robust stability of the closed-loop system. In [18], the variance of the GP prediction is utilized to adapt the parameters of an outer-loop PD controller online, and uniform ultimate boundedness of the tracking error is proved under some assumptions on the structure of the PD controller (e.g., the gain matrix is assumed to be diagonal, which imposes a decentralized gain control scheme). The results of [18] are verified via simulations on a 2 DOF manipulator.
Our approach differs from [18] in several aspects. First, we do not use an adaptive PD controller in the outer loop, but add a robustness term to the output of the outer-loop controller. Second, while [18] uses the GP to learn the error in the estimated torque from the nominal inverse dynamics, in our approach, we learn the error between the commanded and actual accelerations. This is beneficial in two respects: (i) it makes our approach applicable to industrial manipulators that have onboard controllers for calculating the torques and only allow the user to send commanded accelerations/velocities; (ii) it makes our approach applicable beyond inverse dynamics control of manipulators. Indeed, our proposed approach can be applied to any Lagrangian system for which feedforward/feedback linearization can be used to convert the nonlinear dynamics of the system into a set of decoupled double integrators, such as a quadrotor under a feedforward linearization (see Section 5.3 of [19]). Third, while [18] shows uniform ultimate boundedness of the tracking error, it does not discuss the size of the ultimate ball. In this work, we show that, using our proposed approach, the size of the ball can be made arbitrarily small through the control design. Fourth, our approach does not impose any assumption on the structure of the outer-loop PD controller, and decentralized outer-loop control is not needed for our proof. Finally, we verify our approach experimentally on a 6 DOF manipulator.
III Problem Statement
In this paper, we consider Lagrangian systems, which represent a wide class of mechanical systems [20]. In what follows, we focus our attention on a class of Lagrangian systems represented by:
$$M(q)\ddot{q} + C(q, \dot{q})\dot{q} + g(q) = u, \tag{1}$$
where $q \in \mathbb{R}^n$ is the vector of generalized coordinates (displacements or angles), $\dot q \in \mathbb{R}^n$ is the vector of generalized velocities, $u \in \mathbb{R}^n$ is the vector of generalized forces (forces or torques), $n$ is the system's number of degrees of freedom, $M(q)$, $C(q, \dot q)$, and $g(q)$ are matrices of proper dimensions composed of smooth functions, and $M(q)$ is a positive definite matrix. Fully actuated robotic manipulators are an example of Lagrangian systems that can be expressed by (1). Although we focus our discussion on systems represented by (1), we emphasize that our results can be easily generalized to a wider class of nonlinear Lagrangian systems for which feedback/feedforward linearization can be utilized to convert the dynamics of the system into a set of decoupled double integrators plus an uncertainty vector.
For the nonlinear system (1) with uncertain matrices $M(q)$, $C(q, \dot q)$, and $g(q)$, we aim to make the system positions and velocities $(q, \dot q)$ track a desired smooth trajectory $(q^d(t), \dot q^d(t))$. For simplicity of notation, in our discussion, we drop the dependency on time $t$ from $q^d$, $\dot q^d$, and their derivatives. Our goal is to design a novel, learning-based control strategy that is easy to interpret and implement, and that satisfies the following desired objectives:

(O1) Robustness: The overall closed-loop control system is robustly stable in the sense that the tracking error remains bounded under the system uncertainties.

(O2) High-Accuracy Tracking: For feasible desired trajectories, the tracking error converges to a ball around the origin whose radius can be made arbitrarily small through the control design. In the ideal case, where the assumed system parameters are correct, the tracking error should converge exponentially to the origin.

(O3) Adaptability: The proposed strategy should incorporate online learning to continuously adapt to online changes of the system parameters and to disturbances.

(O4) Generalizability of the Approach: The proposed approach should be general enough to also be applicable to industrial robots that have onboard controllers for calculating the forces/torques and only allow the user to send commanded accelerations/velocities.
IV Methodology
We first present our proposed methodology; in the next sections, we show that it satisfies objectives (O1)-(O4).
A standard approach for solving the tracking control problem for (1) is inverse dynamics control. Since $M(q)$ is positive definite by assumption, it is invertible. Hence, if the matrices $M(q)$, $C(q, \dot q)$, and $g(q)$ are all known, then the inverse dynamics control law
$$u = M(q)a_q + C(q, \dot q)\dot q + g(q) \tag{2}$$
converts the complex nonlinear dynamic system (1) into
$$\ddot q = a_q, \tag{3}$$
where $a_q$ is the commanded acceleration, a new input to the linearized system (3) to be calculated by an outer-loop control law, e.g., a PD controller (see Figure 1). However, the standard inverse dynamics control (2) heavily depends on accurate knowledge of the system parameters. In practice, the matrices $M(q)$, $C(q, \dot q)$, and $g(q)$ are not perfectly known, and consequently, one has to use estimated values of these matrices, $\hat M(q)$, $\hat C(q, \dot q)$, and $\hat g(q)$, respectively, where $\hat M$, $\hat C$, and $\hat g$ are composed of smooth functions. Hence, in practice, the control law (2) should be replaced with
$$u = \hat M(q)a_q + \hat C(q, \dot q)\dot q + \hat g(q). \tag{4}$$
Now by plugging (4) into the system model (1), we get
$$\ddot q = a_q + \eta(q, \dot q, a_q), \tag{5}$$
where $\eta = M^{-1}(\tilde M a_q + \tilde C \dot q + \tilde g)$, with $\tilde M = \hat M - M$, $\tilde C = \hat C - C$, and $\tilde g = \hat g - g$. It can be shown that even if the left-hand side (LHS) of (1) has a smooth, unstructured, additive uncertainty, e.g., unmodeled friction, (5) is still valid with a modified $\eta$. Because of $\eta$, the dynamics (5) resulting from the inverse dynamics control are still nonlinear and coupled. To control the uncertain system (5), robust control methods are typically very conservative, while learning methods do not provide stability guarantees.
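As a sanity check on the derivation of (5), the sketch below plugs the inner-loop law $u = \hat M a_q + \hat C \dot q + \hat g$ into the model $M\ddot q + C\dot q + g = u$ at a single state and confirms that the resulting acceleration equals $a_q + \eta$ with $\eta = M^{-1}(\tilde M a_q + \tilde C\dot q + \tilde g)$. The matrices are random stand-ins (hypothetical values) evaluated at one fixed state, not a model of any particular robot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2  # hypothetical number of degrees of freedom


def random_spd(n, rng):
    """Random symmetric positive definite matrix (stand-in for an inertia matrix)."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)


# True (unknown) system matrices at one state (q, qdot): hypothetical numbers.
M = random_spd(n, rng)
C = rng.standard_normal((n, n))
g = rng.standard_normal(n)

# A-priori estimates used by the inner loop: true values plus model error.
M_hat = M + 0.3 * random_spd(n, rng)
C_hat = C + 0.1 * rng.standard_normal((n, n))
g_hat = g + 0.1 * rng.standard_normal(n)

qdot = rng.standard_normal(n)
a_q = rng.standard_normal(n)  # commanded acceleration from the outer loop

# Inner-loop control law with estimated matrices.
u = M_hat @ a_q + C_hat @ qdot + g_hat

# Actual acceleration from the true model: M qddot + C qdot + g = u.
qddot = np.linalg.solve(M, u - C @ qdot - g)

# Uncertainty vector: eta = M^{-1}(M_tilde a_q + C_tilde qdot + g_tilde).
M_t, C_t, g_t = M_hat - M, C_hat - C, g_hat - g
eta = np.linalg.solve(M, M_t @ a_q + C_t @ qdot + g_t)

print(np.allclose(qddot, a_q + eta))  # True
```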
Hence, in this paper, we combine ideas from robust control theory with ideas from machine learning, particularly Gaussian processes (GPs) for regression, to provide a robust, learning-based control strategy that satisfies objectives (O1)-(O4). The main idea behind our proposed approach is to use GPs to learn the uncertainty vector $\eta$ in (5) online. Following [18], we use a set of independent GPs, one for learning each element of $\eta$, to reduce the complexity of the regression. Conditioned on knowing $q$, $\dot q$, and $a_q$, one can learn each element of $\eta$ independently from the rest of the elements of $\eta$. A main advantage of GP regression is that it provides not only an estimated mean $\mu$, but also a variance $\sigma^2$, which represents the accuracy of the regression model based on the distance to the training data. The key point here is that one can use both the mean and variance of the GP to calculate an upper bound on $\|\eta\|$ that is guaranteed to be correct with high probability, as we show later in this section. One can then use this upper bound to design a robust, outer-loop controller that ensures robust stability of the overall system. Hence, our proposed strategy consists of three parts:
(i) Inner-Loop Controller: We use the inverse dynamics control law (4), where $\hat M$, $\hat C$, and $\hat g$ are estimated values of the system matrices from an a priori model.
(ii) GPs for Learning the Uncertainty: We use a set of $n$ GPs to learn the uncertainty vector $\eta$ in (5). We start by reviewing GP regression [21, 15]. A GP is a nonparametric regression model that is used to approximate a nonlinear function $f(x)$, where $x$ is the input vector. The ability of the GP to approximate the function is based on the assumptions that function values $f(x)$ associated with different values of $x$ are random variables, and that any finite number of these variables have a joint Gaussian distribution. The GP predicts the value of the function, $f(x^*)$, at an arbitrary input $x^*$ from a set of $N$ observations $\mathcal{D}_N = \{(x_i, \hat y_i)\}_{i=1}^{N}$, where the $\hat y_i$, $i \in \{1, \dots, N\}$, are assumed to be noisy measurements of the function's true values. That is, $\hat y_i = f(x_i) + w$, where $w$ is zero-mean Gaussian noise with variance $\sigma_n^2$. Assuming, without loss of generality (w.l.o.g.), a zero prior mean of the GP and conditioning on the previous observations, the mean and variance of the GP prediction are given by:
$$\mu(x^*) = k_*^\top (K + \sigma_n^2 I)^{-1} \hat y, \tag{6}$$
$$\sigma^2(x^*) = k(x^*, x^*) - k_*^\top (K + \sigma_n^2 I)^{-1} k_*, \tag{7}$$
respectively, where $\hat y = [\hat y_1, \dots, \hat y_N]^\top$ is the vector of observed, noisy function values. The matrix $K \in \mathbb{R}^{N\times N}$ is the covariance matrix with entries $K_{ij} = k(x_i, x_j)$, $i, j \in \{1, \dots, N\}$, where $k(\cdot, \cdot)$ is the covariance function defining the covariance between two function values (also called the kernel). The vector $k_* = [k(x^*, x_1), \dots, k(x^*, x_N)]^\top$ contains the covariances between the new input $x^*$ and the observed data points, and $I \in \mathbb{R}^{N\times N}$ is the identity matrix. The tuning of the GP is typically done through the selection of the kernel function and the tuning of its hyperparameters. For information about different standard kernel functions, please refer to [21].
We next discuss our implementation of the GPs. The GPs run in discrete time with sampling interval $T_s$. At a sampling instant $k$, the inputs to each GP regression model are the same, $x(k) = (q(k), \dot q(k), a_q(k))$, and the output is an estimated value of an element of the vector $\eta$ at $k$. For the training data of each GP, observations of $(q, \dot q, a_q)$ are used as the labeled input, together with observations of an element of the vector $\ddot q - a_q + w$ as the labeled output, where $w$ is Gaussian noise with zero mean and variance $\sigma_n^2$; see (5). For selecting the observations, we use the oldest point (OP) scheme for simplicity; this scheme removes the oldest observation to accommodate a new one [16]. We use the squared exponential kernel
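$$k(x_i, x_j) = \sigma_\eta^2 \exp\left(-\tfrac{1}{2}(x_i - x_j)^\top \Lambda^{-2} (x_i - x_j)\right), \tag{8}$$
which is parameterized by the hyperparameters $\sigma_\eta^2$, the prior variance, and the positive length scales $\ell_1, \dots, \ell_{3n}$, which are the diagonal elements of the diagonal matrix $\Lambda$. Hence, the expected mean and variance of each GP can be obtained through equations (6)-(8). Guidelines for tuning the GP hyperparameters can be found in [15].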
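A minimal NumPy sketch of one scalar GP as described above: a squared exponential kernel with one length scale per input dimension, the zero-prior-mean posterior mean and variance, and an oldest-point buffer (here a `deque` with `maxlen`, which discards the oldest entry when full). The class name `OldestPointGP`, the Cholesky-based solve, and the toy sine data are illustrative assumptions, not the authors' implementation (the experiments use the GPy library).

```python
from collections import deque

import numpy as np


def se_kernel(X1, X2, sigma_eta, lengthscales):
    """Squared exponential kernel: sigma_eta^2 exp(-0.5 (x-x')^T Lambda^{-2} (x-x'))."""
    Z1, Z2 = X1 / lengthscales, X2 / lengthscales  # scale each input dimension
    sq = (np.sum(Z1**2, axis=1)[:, None] + np.sum(Z2**2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    return sigma_eta**2 * np.exp(-0.5 * np.maximum(sq, 0.0))


class OldestPointGP:
    """One scalar GP with a fixed-size buffer; the oldest observation is dropped first."""

    def __init__(self, n_max, sigma_eta, lengthscales, sigma_n):
        self.X = deque(maxlen=n_max)  # deque(maxlen=...) evicts the oldest item
        self.y = deque(maxlen=n_max)
        self.sigma_eta = sigma_eta
        self.lengthscales = np.asarray(lengthscales, dtype=float)
        self.sigma_n = sigma_n

    def add(self, x, y):
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(y))

    def predict(self, x_star):
        """Posterior mean and variance at one input x_star (zero prior mean)."""
        x_star = np.atleast_2d(np.asarray(x_star, dtype=float))
        prior_var = self.sigma_eta**2  # k(x*, x*) for this kernel
        if not self.X:
            return 0.0, prior_var
        X, y = np.vstack(self.X), np.array(self.y)
        K = se_kernel(X, X, self.sigma_eta, self.lengthscales)
        k_star = se_kernel(X, x_star, self.sigma_eta, self.lengthscales)[:, 0]
        # Factor K + sigma_n^2 I = L L^T instead of forming an explicit inverse.
        L = np.linalg.cholesky(K + self.sigma_n**2 * np.eye(len(y)))
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        v = np.linalg.solve(L, k_star)
        mean = float(k_star @ alpha)
        var = max(prior_var - float(v @ v), 0.0)
        return mean, var


# Usage with made-up 1-D data: the fourth add evicts the oldest point (x = 0.0).
gp = OldestPointGP(n_max=3, sigma_eta=1.0, lengthscales=[1.0], sigma_n=0.01)
for x in [0.0, 0.5, 1.0, 1.5]:
    gp.add([x], np.sin(x))
mean_near, var_near = gp.predict([1.0])   # at a stored input: small variance
mean_far, var_far = gp.predict([10.0])    # far from the data: close to prior variance
```

The variance returned by `predict` shrinks near stored inputs and returns to the prior variance far away, which is exactly the property exploited below to size the confidence interval.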
As stated before, a main advantage of GP regression is that the GP provides a variance, which represents the accuracy of the regression model based on the distance between the new input and the training data. One can then use the predicted mean and variance of the GP to provide a confidence interval around the mean that is guaranteed to be correct with high probability. There are several comprehensive studies in the machine learning literature on calculating these confidence intervals. For completeness, we review one of these results, namely Theorem 6 of [21]. Let $x = (q, \dot q, a_q)$, and let $\eta_j(x)$ denote the $j$th element of the unknown vector $\eta$.
Assumption IV.1
The function $\eta_j(x)$, $j \in \{1, \dots, n\}$, has a bounded RKHS norm $\|\eta_j\|_k$ with respect to the covariance function $k$ of the GP, and the noise added to the output observations, $w$, is uniformly bounded by $\bar\sigma$.
The RKHS norm is a measure of the function's smoothness, and its boundedness implies that the function is well-behaved in the sense that it is regular with respect to the kernel [21]. Intuitively, Assumption IV.1 does not hold if the uncertainty is discontinuous, e.g., due to discontinuous friction.
Lemma IV.1 (Theorem 6 of [21])
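Suppose that Assumption IV.1 holds. Let $\delta \in (0, 1)$. Then, $\Pr\{\forall x \in \mathcal{D}, \forall k \ge 1: |\mu_{k-1}(x) - \eta_j(x)| \le \beta_k^{1/2}\sigma_{k-1}(x)\} \ge 1 - \delta$, where $\Pr$ stands for the probability, $\mathcal{D} \subset \mathbb{R}^{3n}$ is compact, $\mu_{k-1}(x)$ and $\sigma_{k-1}^2(x)$ are the GP mean and variance evaluated at $x$ conditioned on the past observations, and $\beta_k = 2\|\eta_j\|_k^2 + 300\gamma_k \log^3(k/\delta)$. The variable $\gamma_k$ is the maximum information gain and is given by $\gamma_k = \max_{x_1, \dots, x_k \in \mathcal{D}} \tfrac{1}{2}\log|I + \sigma_n^{-2}K_k|$, where $|\cdot|$ is the matrix determinant, $I \in \mathbb{R}^{k\times k}$ is the identity matrix, and $K_k \in \mathbb{R}^{k\times k}$ is the covariance matrix given by $[K_k]_{il} = k(x_i, x_l)$, $i, l \in \{1, \dots, k\}$.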
Finding the information gain maximizer can be approximated by an efficient greedy algorithm [21]. Moreover, $\gamma_k$ has a sublinear dependence on $k$ for many commonly used kernels, and can be numerically approximated by a constant [18].
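The greedy approximation mentioned above can be sketched as follows, assuming a finite candidate grid in place of the compact set $\mathcal{D}$: each step adds the candidate with the largest current posterior variance, and the accumulated gain $\tfrac{1}{2}\sum_t \log(1 + \sigma_n^{-2}\sigma_{t-1}^2(x_t))$ equals $\tfrac{1}{2}\log|I + \sigma_n^{-2}K|$ for the chosen points. By submodularity, this greedy value is within a factor $(1 - 1/e)$ of the true maximum. The helper `beta` evaluates $\beta_k = 2B^2 + 300\gamma_k\log^3(k/\delta)$ from Theorem 6 of [21], where $B$ is an assumed bound on the RKHS norm. All names, the 1-D grid, and the hyperparameter values are hypothetical.

```python
import numpy as np


def se_kernel(X1, X2, sigma_eta, lengthscale):
    """Squared exponential kernel with a single (isotropic) length scale."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1) / lengthscale**2
    return sigma_eta**2 * np.exp(-0.5 * d2)


def greedy_information_gain(candidates, k, sigma_n, sigma_eta=1.0, lengthscale=1.0):
    """Greedy approximation of gamma_k over a finite candidate set (rows of `candidates`)."""
    prior_var = sigma_eta**2
    chosen, gains = [], []
    for _ in range(k):
        if chosen:
            A = candidates[chosen]
            K = se_kernel(A, A, sigma_eta, lengthscale) + sigma_n**2 * np.eye(len(chosen))
            Ks = se_kernel(A, candidates, sigma_eta, lengthscale)  # shape (|A|, m)
            var = prior_var - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
        else:
            var = np.full(len(candidates), prior_var)
        idx = int(np.argmax(var))  # most uncertain candidate given the points so far
        chosen.append(idx)
        gains.append(0.5 * np.log1p(var[idx] / sigma_n**2))
    A = candidates[chosen]
    _, logdet = np.linalg.slogdet(np.eye(k) + se_kernel(A, A, sigma_eta, lengthscale) / sigma_n**2)
    return 0.5 * logdet, sum(gains), chosen


def beta(k, gamma_k, rkhs_bound, delta):
    """beta_k = 2 B^2 + 300 gamma_k log^3(k / delta), with B an assumed RKHS-norm bound."""
    return 2.0 * rkhs_bound**2 + 300.0 * gamma_k * np.log(k / delta) ** 3


cands = np.linspace(0.0, 5.0, 51)[:, None]
gamma_exact, gamma_greedy_sum, picks = greedy_information_gain(cands, k=5, sigma_n=0.1)
print(np.isclose(gamma_exact, gamma_greedy_sum), picks)
```

The log-determinant and the sum of per-step gains agree by the chain rule for the Gaussian mutual information, which is why the greedy loop never needs to form the full determinant until the end.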
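The key point here is that we know from Lemma IV.1 that one can define, for each GP, a confidence interval around the mean that is guaranteed to be correct for all points $x \in \mathcal{D}$, a compact set, with probability higher than $1 - \delta$, where $\delta$ is typically picked very small. Let $\mu_j(k)$ and $\sigma_j^2(k)$ represent the expected mean and variance of the $j$th GP at the sampling instant $k$, respectively, and let $\beta_j$ denote the parameter $\beta_k$ in Lemma IV.1 of the $j$th GP, where $j \in \{1, \dots, n\}$. We select the upper bound on the absolute value of $\eta_j$ at $k$ to be
$$\bar\eta_j(k) = \max\left\{\left|\mu_j(k) - \beta_j^{1/2}\sigma_j(k)\right|,\ \left|\mu_j(k) + \beta_j^{1/2}\sigma_j(k)\right|\right\}. \tag{9}$$
Then, a good estimate of the upper bound on $\|\eta\|$ at $k$ is
$$\rho_{GP}(k) = \left\|\left[\bar\eta_1(k), \dots, \bar\eta_n(k)\right]^\top\right\|. \tag{10}$$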
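The per-element bound and its norm amount to a few lines of arithmetic per sampling instant. The sketch below computes $\bar\eta_j = \max\{|\mu_j - \beta_j^{1/2}\sigma_j|, |\mu_j + \beta_j^{1/2}\sigma_j|\}$ and the Euclidean norm of the resulting vector, assuming the $n$ GP means, standard deviations, and $\beta_j$ values are already available. The numbers in the usage line are made up for illustration. Note that the maximum simplifies to $|\mu_j| + \beta_j^{1/2}\sigma_j$.

```python
import numpy as np


def gp_uncertainty_bound(means, stds, betas):
    """Elementwise bounds on |eta_j| and the resulting bound on ||eta|| at one sampling instant."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    betas = np.asarray(betas, dtype=float)
    half_width = np.sqrt(betas) * stds  # beta_j^{1/2} * sigma_j(k)
    eta_bar = np.maximum(np.abs(means - half_width), np.abs(means + half_width))
    rho_gp = float(np.linalg.norm(eta_bar))  # Euclidean norm of the elementwise bounds
    return eta_bar, rho_gp


# Hypothetical GP outputs for n = 2 joints, with beta_j = 4 (so beta_j^{1/2} = 2).
eta_bar, rho_gp = gp_uncertainty_bound(means=[0.2, -0.1], stds=[0.05, 0.1], betas=[4.0, 4.0])
print(eta_bar, round(rho_gp, 4))  # [0.3 0.3] 0.4243
```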
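(iii) Robust, Outer-Loop Controller: We use the estimated upper bound $\rho$ to design a robust, outer-loop controller. In particular, for a smooth, bounded desired trajectory $q^d(t)$, we use the outer-loop control law
$$a_q = \ddot q^d + K_D(\dot q^d - \dot q) + K_P(q^d - q) + r, \tag{11}$$
where $K_P \in \mathbb{R}^{n\times n}$ and $K_D \in \mathbb{R}^{n\times n}$ are the proportional and derivative gain matrices of the PD control law, respectively, and $r \in \mathbb{R}^n$ is an added vector to the PD control law that will be designed to achieve robustness. Let $e = [(q - q^d)^\top, (\dot q - \dot q^d)^\top]^\top$ denote the tracking error vector. From (11) and (5), it can be shown that the tracking error dynamics are
$$\dot e = A e + B(\eta + r), \tag{12}$$
where
$$A = \begin{bmatrix} 0 & I \\ -K_P & -K_D \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ I \end{bmatrix}, \tag{13}$$
and $I \in \mathbb{R}^{n\times n}$ is the identity matrix. From (12) and (13), it is clear that the controller matrices $K_P$ and $K_D$ should be designed to make $A$ a Hurwitz matrix.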
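To check the Hurwitz requirement numerically, the sketch below builds the error-dynamics matrices for $e = [(q - q^d)^\top, (\dot q - \dot q^d)^\top]^\top$, namely $A = [[0, I], [-K_P, -K_D]]$ and $B = [0; I]$, and tests the real parts of the eigenvalues of $A$. The gain values are hypothetical, and nothing in the functions requires $K_P$ and $K_D$ to be diagonal.

```python
import numpy as np


def error_dynamics_matrices(K_P, K_D):
    """A = [[0, I], [-K_P, -K_D]] and B = [[0], [I]] for e = [q - q_des; qdot - qdot_des]."""
    n = K_P.shape[0]
    Z, I = np.zeros((n, n)), np.eye(n)
    A = np.block([[Z, I], [-K_P, -K_D]])
    B = np.vstack([Z, I])
    return A, B


def is_hurwitz(A):
    """True if every eigenvalue of A has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0.0))


# Hypothetical gains; K_P is deliberately non-diagonal (coupled joints).
K_P = np.array([[25.0, 5.0], [5.0, 25.0]])
K_D = 10.0 * np.eye(2)
A, B = error_dynamics_matrices(K_P, K_D)
print(A.shape, B.shape, is_hurwitz(A))  # (4, 4) (4, 2) True
```

With a negative derivative gain, e.g. $K_D = -I$, the same check returns `False`, since each mode then has characteristic polynomial $s^2 - s + 25$ with roots in the right half-plane.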
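We now discuss how to design the robustness vector $r$. To that end, let $P \in \mathbb{R}^{2n\times 2n}$ be the unique positive definite matrix satisfying $A^\top P + P A = -Q$, where $Q \in \mathbb{R}^{2n\times 2n}$ is a positive definite matrix. We define $r$ as follows:
$$r = \begin{cases} -\rho\,\dfrac{B^\top P e}{\|B^\top P e\|}, & \text{if } \|B^\top P e\| > \varepsilon, \\[2mm] -\rho\,\dfrac{B^\top P e}{\varepsilon}, & \text{if } \|B^\top P e\| \le \varepsilon, \end{cases} \tag{14}$$
where $\rho$ is the last received upper bound on $\|\eta\|$ from the GPs, i.e., we use
$$\rho(t) = \rho_{GP}(k), \quad t \in [kT_s, (k+1)T_s), \tag{15}$$
and $\varepsilon$ is a small positive number. Note that $\varepsilon$ is a design parameter that can be selected to ensure high-accuracy tracking, as we discuss in the next section.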
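A sketch of the outer-loop law and the robustness term, assuming the forms $a_q = \ddot q^d + K_D(\dot q^d - \dot q) + K_P(q^d - q) + r$ and $r = -\rho w/\max(\|w\|, \varepsilon)$ with $w = B^\top P e$ (the `max` reproduces both branches of the piecewise definition). The Lyapunov equation $A^\top P + PA = -Q$ is solved through its vectorized Kronecker-product form, which is adequate for small $2n \times 2n$ matrices. If SciPy is available, `scipy.linalg.solve_continuous_lyapunov` is an alternative, but it uses the convention $AX + XA^H = Q$, so it must be called with $A^\top$ and $-Q$. All function names, gains, and states are illustrative.

```python
import numpy as np


def solve_lyapunov(A, Q):
    """Solve A^T P + P A = -Q via (I kron A^T + A^T kron I) vec(P) = -vec(Q)."""
    m = A.shape[0]
    I = np.eye(m)
    L = np.kron(I, A.T) + np.kron(A.T, I)  # column-major (Fortran-order) vec convention
    P = np.linalg.solve(L, -Q.flatten(order="F")).reshape((m, m), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def robustness_term(e, P, B, rho, eps):
    """-rho*w/||w|| if ||w|| > eps, else -rho*w/eps, where w = B^T P e."""
    w = B.T @ P @ e
    return -rho * w / max(np.linalg.norm(w), eps)


def outer_loop(q, qdot, q_des, qdot_des, qddot_des, K_P, K_D, r):
    """PD outer loop plus robustness vector: the commanded acceleration a_q."""
    return qddot_des + K_D @ (qdot_des - qdot) + K_P @ (q_des - q) + r


n = 2
K_P, K_D = 25.0 * np.eye(n), 10.0 * np.eye(n)  # hypothetical gains
A = np.block([[np.zeros((n, n)), np.eye(n)], [-K_P, -K_D]])
B = np.vstack([np.zeros((n, n)), np.eye(n)])
P = solve_lyapunov(A, np.eye(2 * n))  # Q = I (hypothetical choice)

q, qdot = np.array([0.1, -0.2]), np.zeros(n)
q_des, qdot_des, qddot_des = np.zeros(n), np.zeros(n), np.zeros(n)
e = np.concatenate([q - q_des, qdot - qdot_des])
r = robustness_term(e, P, B, rho=0.8, eps=0.01)
a_q = outer_loop(q, qdot, q_des, qdot_des, qddot_des, K_P, K_D, r)
print(r, a_q)
```

Inside the boundary layer ($\|w\| \le \varepsilon$) the term is a linear high-gain feedback $-(\rho/\varepsilon)w$; outside it, $r$ has constant norm $\rho$. This continuity is what avoids chattering compared with a pure unit-vector term.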
V Theoretical Guarantees
After discussing the proposed strategy, we now show that it achieves both robust stability and high-accuracy tracking. To that end, we require the following reasonable assumption:
Assumption V.1
The GPs run at a sufficiently fast sampling rate such that the calculated upper bound on $\|\eta\|$ is accurate between two consecutive sampling instants.
We impose another assumption to ensure that the added robustness vector $r$ will not cause the uncertainty vector norm $\|\eta\|$ to blow up. It is easy to show that the uncertainty function $\eta(q, \dot q, a_q)$ is smooth, and so $\|\eta\|$ attains a maximum value on any compact set in its input space $(q, \dot q, a_q)$. However, since, from (11) and (14), $a_q$ is a function of $\rho$, an upper bound on $\|\eta\|$, one still needs to ensure the boundedness of $\|\eta\|$ for bounded $(q, \dot q)$, or, equivalently for a bounded desired trajectory, bounded tracking error $e$. Hence, we present the following assumption.
Assumption V.2
For a given, smooth, bounded desired trajectory $q^d(t)$, there exists $\bar\rho > 0$ such that $\|\eta\| \le \bar\rho$ for each $e \in \mathcal{E}$, where $\mathcal{E}$ is a compact set containing $e(t_0)$, and $e(t_0)$ is the initial tracking error.
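We now justify that Assumption V.2 is reasonable. In particular, we show that the assumption is satisfied for sufficiently small uncertainties in the inertia matrix $M(q)$ [1]. In this discussion, we suppose that $\rho$ in (14) satisfies $\rho \le \|\eta\| + \zeta$, where $\zeta$ is a positive scalar, and study whether imposing $r$ into (11) can make $\|\eta\|$ blow up. Recall that $\eta = M^{-1}(\tilde M a_q + \tilde C \dot q + \tilde g)$. From (11), we have $a_q = \ddot q^d + K_D(\dot q^d - \dot q) + K_P(q^d - q) + r$. Since $M^{-1}\tilde M = M^{-1}\hat M - I$, it follows that $\|\eta\| \le \alpha\|r\| + \phi$, where $\alpha = \|M^{-1}\hat M - I\|$ and $\phi = \alpha\|\ddot q^d + K_D(\dot q^d - \dot q) + K_P(q^d - q)\| + \|M^{-1}(\tilde C\dot q + \tilde g)\|$. From (14), it is easy to verify that $\|r\| \le \rho$, and so $\|r\| \le \|\eta\| + \zeta$. Hence, $\|\eta\| \le \alpha\|\eta\| + \alpha\zeta + \phi$. Now if the uncertainty in the matrix $M$, $\tilde M$, is sufficiently small such that $\alpha < 1$ is satisfied, then $\|\eta\| \le (\alpha\zeta + \phi)/(1 - \alpha)$.
Since $q^d$, $\dot q^d$, and $\ddot q^d$ are all bounded by assumption, if $e \in \mathcal{E}$, a compact set, then $q$ and $\dot q$ are also bounded. Because $\phi$ is continuous in $(q, \dot q)$, it is then easy to show that there exists a fixed upper bound on $\|\eta\|$ that is valid for each $e \in \mathcal{E}$, and Assumption V.2 is satisfied.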
Remark V.1
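We have shown that if the uncertainty in the matrix $M$, $\tilde M$, is sufficiently small such that $\alpha = \|M^{-1}\hat M - I\| < 1$ is satisfied, then Assumption V.2 holds. This argument is true even if we have large uncertainties in the other system matrices, $C$ and $g$. As indicated in Chapter 8 of [1], if bounds on $M^{-1}$ are known (i.e., the eigenvalues of $M^{-1}(q)$ lie in $[\underline m, \bar m]$), then one can always select $\hat M$ such that $\alpha < 1$ is satisfied. In particular, by selecting $\hat M = \frac{2}{\bar m + \underline m} I$, where $I$ is the identity matrix, it can be shown that $\alpha \le \frac{\bar m - \underline m}{\bar m + \underline m} < 1$. Consequently, it is not difficult to satisfy the condition $\alpha < 1$ in practice, and Assumption V.2 is not restrictive.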
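The bound in Remark V.1 is easy to check numerically. Assuming the eigenvalues of the symmetric matrix $M^{-1}(q)$ lie in $[\underline m, \bar m]$, the sketch picks $\hat M = \frac{2}{\underline m + \bar m}I$ and compares $\alpha = \|M^{-1}\hat M - I\|_2$ with $(\bar m - \underline m)/(\bar m + \underline m)$. The random orthogonal basis and the eigenvalues are hypothetical test data.

```python
import numpy as np


def alpha_for_scaled_identity(Minv, m_low, m_high):
    """alpha = ||M^{-1} M_hat - I||_2 for M_hat = 2 / (m_low + m_high) * I."""
    n = Minv.shape[0]
    M_hat = 2.0 / (m_low + m_high) * np.eye(n)
    return np.linalg.norm(Minv @ M_hat - np.eye(n), 2)  # largest singular value


rng = np.random.default_rng(1)
m_low, m_high = 0.5, 2.0  # hypothetical eigenvalue bounds of M^{-1}(q)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal basis
Minv = U @ np.diag([0.5, 1.2, 2.0]) @ U.T  # symmetric, eigenvalues within the bounds
alpha = alpha_for_scaled_identity(Minv, m_low, m_high)
bound = (m_high - m_low) / (m_high + m_low)
print(round(alpha, 6), bound)  # 0.6 0.6
```

Here the bound is attained because the extreme eigenvalues of $M^{-1}$ sit exactly at $\underline m$ and $\bar m$; for eigenvalues strictly inside the interval, $\alpha$ is smaller.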
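From Assumption V.2, we know that $\|\eta\| \le \bar\rho$ if $e \in \mathcal{E}$, and consequently, it is reasonable to saturate any estimate of $\|\eta\|$ beyond $\bar\rho$. Hence, we suppose that the estimate $\rho$ is slightly modified to be
$$\rho(t) = \min\{\rho_{GP}(k), \bar\rho\}, \quad t \in [kT_s, (k+1)T_s), \tag{16}$$
where $\rho_{GP}(k)$ is the upper bound on the uncertainty norm, $\|\eta\|$, calculated from the GPs in (9), (10), and (15). It is straightforward to show that, with the choice of $\rho$ in (16) and for bounded smooth trajectories, the condition $e(t) \in \mathcal{E}$ for all $t \ge t_0$ implies that $a_q$ in (11) is always bounded, and so $(q, \dot q, a_q)$ always lies in a compact set. To be able to provide theoretical guarantees, we also assume w.l.o.g. that the small positive number $\varepsilon$ in (14) is selected sufficiently small such that
$$\sqrt{\frac{\varepsilon\bar\rho}{2\lambda_{\min}(Q)}} < R, \tag{17}$$
where $R > 0$ is such that $\mathcal{B}_R(0) \subset \mathcal{E}$, and $\lambda_{\min}(Q)$ is the smallest eigenvalue of the positive definite matrix $Q$.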
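The saturation of the learned bound and the role of $\varepsilon$ can be illustrated with a short sketch. It uses the radius $d = \sqrt{\varepsilon\bar\rho/(2\lambda_{\min}(Q))}$ outside which the Lyapunov function $V = e^\top P e$ decreases (this follows from $\dot V \le -e^\top Q e + \varepsilon\bar\rho/2$ in the proof of Theorem V.1), and inverts it to obtain the value of $\varepsilon$ that places this radius at a target $R$. The function names and numbers are hypothetical.

```python
import numpy as np


def saturated_rho(rho_gp, rho_bar):
    """Never use a GP-based bound larger than the a-priori bound rho_bar."""
    return min(rho_gp, rho_bar)


def decrease_radius(eps, rho_bar, Q):
    """Radius d outside which V strictly decreases: sqrt(eps * rho_bar / (2 * lambda_min(Q)))."""
    lam_min = np.linalg.eigvalsh(Q).min()
    return float(np.sqrt(eps * rho_bar / (2.0 * lam_min)))


def eps_for_radius(d_target, rho_bar, Q):
    """eps at which d equals d_target; any strictly smaller eps gives d < d_target."""
    lam_min = np.linalg.eigvalsh(Q).min()
    return float(2.0 * lam_min * d_target**2 / rho_bar)


Q = np.eye(4)  # hypothetical choice for a 2-DOF system (2n = 4)
print(saturated_rho(5.0, 2.0))        # 2.0
print(decrease_radius(0.02, 4.0, Q))  # approximately 0.2
print(eps_for_radius(0.2, 4.0, Q))    # approximately 0.02
```

Because $d$ scales with $\sqrt{\varepsilon}$, halving the radius requires a four-times smaller $\varepsilon$, i.e., a four-times higher boundary-layer gain $\rho/\varepsilon$, which is where the actuation limits mentioned in Remark V.3 come in.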
Theorem V.1
Consider the Lagrangian system (1) and a smooth, bounded desired trajectory $q^d(t)$. Suppose that Assumptions IV.1, V.1, and V.2 hold. Then, the proposed robust, learning-based control strategy in (4), (11), and (14), with the uncertainty upper bound $\rho$ calculated by (16) and the design parameter $\varepsilon$ satisfying (17), ensures, with probability at least $(1 - \delta)^n$, that the tracking error $e$ is uniformly ultimately bounded with an ultimate bound that can be made arbitrarily small through the selection of the design parameter $\varepsilon$.
Proof: From Assumption V.2, we know that $\|\eta\| \le \bar\rho$ when $e \in \mathcal{E}$, where $\mathcal{E}$ is a compact set containing $e(t_0)$. In the first part of the proof, we assume that the upper bound $\rho_{GP}$ calculated by (9), (10), and (15) is a correct upper bound on $\|\eta\|$ when $e \in \mathcal{E}$. Thus, in the first part of the proof, we know that $\rho$ calculated by (16) is a correct upper bound on $\|\eta\|$ when $e \in \mathcal{E}$, and we use Lyapunov stability analysis to prove that $e$ is uniformly ultimately bounded. Then, in the second part of the proof, we use Lemma IV.1 to evaluate the probability of satisfying the assumption that $\rho_{GP}$ is a correct upper bound on $\|\eta\|$ when $e \in \mathcal{E}$, and hence, the probability that the provided guarantees hold.
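The first part of the proof closely follows the proof of the effectiveness of the robust controller in Theorem 3 of Chapter 8 of [1], and we include the main steps here for convenience. Consider the candidate Lyapunov function $V = e^\top P e$. From (12), it can be shown that $\dot V = -e^\top Q e + 2w^\top(\eta + r)$, where $w = B^\top P e$. Then, from (14), we need to study two cases.
For the case where $\|w\| > \varepsilon$, we have
$$\dot V = -e^\top Q e + 2w^\top\left(\eta - \rho\frac{w}{\|w\|}\right) \le -e^\top Q e + 2\|w\|\left(\|\eta\| - \rho\right)$$
from the Cauchy-Schwarz inequality. From Assumption V.2, we know that $\|\eta\| \le \bar\rho$ while $e \in \mathcal{E}$. Also, by our assumption in this part of the proof, $\|\eta\| \le \rho_{GP}$. Then, from (16), $\|\eta\| \le \rho$, and $2\|w\|(\|\eta\| - \rho) \le 0$. Thus, for this case, $\dot V \le -e^\top Q e$, which ensures exponential decrease of the Lyapunov function.
Next, consider the case where $\|w\| \le \varepsilon$. If $w = 0$, then $r = 0$ and $\dot V = -e^\top Q e$. Then, for $w \ne 0$ and $\|w\| \le \varepsilon$, it is easy to show, using $\|\eta\| \le \rho$, that
$$\dot V \le -e^\top Q e + 2w^\top\left(r + \rho\frac{w}{\|w\|}\right).$$
From (14), we have
$$\dot V \le -e^\top Q e - \frac{2\rho}{\varepsilon}\|w\|^2 + 2\rho\|w\|.$$
It can be shown that the term $2\rho\|w\| - \frac{2\rho}{\varepsilon}\|w\|^2$ has a maximum value of $\varepsilon\rho/2$ when $\|w\| = \varepsilon/2$. Thus, $\dot V \le -e^\top Q e + \varepsilon\rho/2$. From (16), $\rho \le \bar\rho$, and consequently $\dot V \le -e^\top Q e + \varepsilon\bar\rho/2$. If the condition $e^\top Q e > \varepsilon\bar\rho/2$ is satisfied, then $\dot V < 0$. Since $Q$ is positive definite by definition, $e^\top Q e \ge \lambda_{\min}(Q)\|e\|^2$, where $\lambda_{\min}(Q)$ is the smallest eigenvalue of $Q$. Hence, if $\|e\| > d := \sqrt{\varepsilon\bar\rho/(2\lambda_{\min}(Q))}$, then $e^\top Q e > \varepsilon\bar\rho/2$. Thus, in both cases, the Lyapunov function is strictly decreasing if $\|e\| > d$. Let $\mathcal{B}_d(0)$ be the ball around the origin of radius $d$, $\Omega_c = \{e : V(e) \le c\}$ be a sufficiently small sublevel set of the Lyapunov function $V$ satisfying $\mathcal{B}_d(0) \subset \Omega_c$, and $\mathcal{B}_{\bar d}(0)$ be the smallest ball around the origin satisfying $\Omega_c \subset \mathcal{B}_{\bar d}(0)$. Since the Lyapunov function is strictly decreasing outside $\mathcal{B}_d(0)$, the tracking error eventually reaches and remains in $\Omega_c$, and so the tracking error is uniformly ultimately bounded, and its ultimate bound is the radius $\bar d$ of $\mathcal{B}_{\bar d}(0)$. Note that from (17), $\mathcal{B}_d(0) \subset \mathcal{B}_R(0) \subset \mathcal{E}$, so $\bar\rho$ is a correct upper bound on $\|\eta\|$ there. One can see that $\Omega_c$, and hence $\bar d$, depend on the choice of the design parameter $\varepsilon$. Indeed, $\varepsilon$ can be selected sufficiently small to make $d$ and $\bar d$ arbitrarily small.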
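The maximum used in the boundary-layer case follows from completing the square in $s = \|B^\top P e\|$ for $s \in [0, \varepsilon]$:

```latex
2\rho s - \frac{2\rho}{\varepsilon}s^{2}
  = \frac{\varepsilon\rho}{2} - \frac{2\rho}{\varepsilon}\left(s - \frac{\varepsilon}{2}\right)^{2}
  \le \frac{\varepsilon\rho}{2},
\qquad \text{with equality if and only if } s = \frac{\varepsilon}{2}.
```

Combined with $e^\top Q e \ge \lambda_{\min}(Q)\|e\|^2$ and $\rho \le \bar\rho$, this gives $\dot V < 0$ whenever $\|e\| > \sqrt{\varepsilon\bar\rho/(2\lambda_{\min}(Q))}$.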
In the second part of the proof, we calculate the probability of our assumption in the first part that $\rho_{GP}$ is a correct upper bound on $\|\eta\|$ when $e \in \mathcal{E}$. Recall that $e \in \mathcal{E}$ implies that $(q, \dot q, a_q)$ is in a compact set, as discussed immediately after (16). From Assumption V.1, our problem reduces to calculating the probability that $\rho_{GP}(k)$ is a correct upper bound on $\|\eta(k)\|$ for all the sampling instants $k$. Using the confidence region in Lemma IV.1 for calculating the upper bound on the absolute value of each element of $\eta$, and under Assumption IV.1, the probability that this upper bound is correct for all samples is higher than $1 - \delta$ from Lemma IV.1. Since the GPs are independent and the noise added to the output observations is uncorrelated, the probability that the upper bounds on the absolute values of all the elements of $\eta$, and hence the upper bound on $\|\eta\|$, are correct is higher than $(1 - \delta)^n$.
Remark V.2
Although in practice it is difficult to estimate the upper bound $\bar\rho$ used in (16), one can be conservative in this choice. Unlike robust control techniques that keep this conservative bound unchanged, (16) relaxes the upper bound when the GPs learn a lower upper bound from collected data. Having a less conservative upper bound results in a lower tracking error. It can be shown that if $\rho_{GP}(k) < \bar\rho$ for all $k$, then the tracking error converges to an ultimate ball smaller than the one obtained with the fixed bound $\rho = \bar\rho$.
Remark V.3
In theory, $\varepsilon$ can be selected sufficiently small to ensure arbitrarily accurate tracking, as shown in the proof of Theorem V.1. Achieving this for cases with large uncertainties may be limited by the actuation limits of the robot. Incorporating the actuation limits in the theoretical analysis is an interesting direction for future research.
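To illustrate Theorem V.1 and the role of $\varepsilon$ on a toy problem, the sketch below simulates a hypothetical 1-DOF system $\ddot q = a_q + \eta$ with $\eta = 0.5\sin(q) + 0.3$ (so $|\eta| \le 0.8$). It uses the true bound $\rho = 0.8$ in place of a GP-learned one, a sinusoidal reference, and explicit Euler integration. It compares the nominal PD loop with the robust loop for two values of $\varepsilon$, and reports the maximum error norm over the last 5 s next to the ultimate-bound radius $\bar d = d\sqrt{\lambda_{\max}(P)/\lambda_{\min}(P)}$. Every number is an assumption for illustration: no GP is involved and actuation limits are ignored.

```python
import numpy as np


def solve_lyapunov(A, Q):
    """Solve A^T P + P A = -Q through the vectorized (Kronecker) form."""
    m = A.shape[0]
    I = np.eye(m)
    P = np.linalg.solve(np.kron(I, A.T) + np.kron(A.T, I), -Q.flatten(order="F"))
    P = P.reshape((m, m), order="F")
    return 0.5 * (P + P.T)


def simulate(eps=None, rho=0.8, K_P=4.0, K_D=4.0, T=20.0, dt=1e-3):
    """Toy loop q'' = a_q + eta with eta = 0.5 sin(q) + 0.3; eps=None means plain PD."""
    A = np.array([[0.0, 1.0], [-K_P, -K_D]])
    b = np.array([0.0, 1.0])  # B for n = 1
    P = solve_lyapunov(A, np.eye(2))  # Q = I
    q, qdot = 0.0, 0.0
    err_norm = []
    for k in range(int(round(T / dt))):
        t = k * dt
        q_des, qdot_des, qddot_des = np.sin(t), np.cos(t), -np.sin(t)
        e = np.array([q - q_des, qdot - qdot_des])
        r = 0.0
        if eps is not None:
            w = b @ P @ e  # scalar B^T P e
            r = -rho * w / max(abs(w), eps)  # robustness term with boundary layer eps
        a_q = qddot_des + K_D * (qdot_des - qdot) + K_P * (q_des - q) + r
        eta = 0.5 * np.sin(q) + 0.3  # unknown to the controller, |eta| <= rho
        q, qdot = q + dt * qdot, qdot + dt * (a_q + eta)  # explicit Euler step
        err_norm.append(np.hypot(*e))
    return np.array(err_norm), P


tail = slice(-5000, None)  # last 5 s of the 20 s run
err_pd, P = simulate(eps=None)
err_robust_big, _ = simulate(eps=0.5)
err_robust_small, _ = simulate(eps=0.01)

lam = np.linalg.eigvalsh(P)
d = np.sqrt(0.01 * 0.8 / 2.0)  # eps = 0.01, rho_bar = 0.8, lambda_min(Q) = 1
d_bar = d * np.sqrt(lam.max() / lam.min())  # radius of a ball containing the sublevel set
for name, err in [("PD", err_pd), ("eps=0.5", err_robust_big), ("eps=0.01", err_robust_small)]:
    print(name, round(float(err[tail].max()), 4))
print("ultimate-bound radius for eps=0.01:", round(float(d_bar), 4))
```

The radius $\bar d$ is a conservative guarantee; in this toy run the observed error typically sits well inside it, and it shrinks further as $\varepsilon$ decreases, at the price of a higher effective gain $\rho/\varepsilon$.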
VI Simulation Results
The proposed approach is first verified via simulations on a 2 DOF planar manipulator using MATLAB Simulink.
We use the robot dynamics (1) for the system, where $M(q)$, $C(q, \dot q)$, and $g(q)$ are as defined in Chapter 7 of [1]. For the system parameters, a value of is used for each link mass and for each link inertia. The length of the first link is and that of the second link is . The joints are assumed to have no mass and are not affected by friction. Then, it is assumed that these parameters are not perfectly known. Thus, in the inverse dynamics controller (4), we use parameters with different levels of uncertainty. The desired trajectories are sinusoidal trajectories with different amplitudes and frequencies. All the simulation runs are initialized at zero initial conditions.
We use GPs to learn the uncertainty vector $\eta$ in (5). Each GP uses the squared exponential kernel parameterized with , , and , for all and . The GPs run at and use the past observations for prediction. To generate confidence intervals, we use , which is simple to implement and found to be effective in practice [15]. For the robust controller, we use . We set the upper bound $\bar\rho$ in (16) to a very high positive number to evaluate the effectiveness of the upper bound estimated by the GPs.
A sequence of 12 trajectories was run for 3 different cases of model uncertainty. Each of the three cases makes the matrix $\hat M$ differ from the matrix $M$ by using values for the estimated link masses that differ from the true link mass values. In particular, in the three uncertainty cases, the estimated mass differs from the actual mass by , , and for each link, respectively.
The tracking performance was compared between four controllers: a nominal controller with no robust control, a robust controller with a fixed upper bound on the uncertainty norm $\|\eta\|$, a learning-based inverse dynamics controller in which GPs are used to learn the error of the nominal inverse model at the torque level and a non-robust outer-loop controller is used, and our proposed robust learning controller. The root-mean-square (RMS) error of the joint angles was averaged over the 12 trajectories, and is presented for each controller and uncertainty case in Table I.
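Table I: Joint angle RMS error averaged over the 12 trajectories, per controller and uncertainty case.
Uncertainty case | Nominal | Fixed Robust | Learning | Robust Learning
1 | 0.1554 | 0.0476 | 0.0190 | 0.0082
2 | 0.2793 | 0.0498 | 0.0319 | 0.0103
3 | 0.3768 | 0.0519 | 0.0539 | 0.0141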
It is clear that while the robust controller with a high, fixed value for the upper bound on the uncertainty improves the tracking performance compared to the nominal controller, it is conservative, and thus still causes considerable tracking errors. The tracking errors are significantly reduced by our proposed robust learning controller, which is able to learn a less conservative upper bound on the uncertainty. On average, our proposed controller reduces the tracking errors by compared to the nominal controller, by compared to the fixed, robust controller, and by compared to the non-robust learning controller that learns the torque-level error.
VII Experimental Results
The proposed approach is further tested on a UR10 6 DOF industrial manipulator (see Figure 2) using the Robot Operating System (ROS).
VII-A Experimental Setup
The interface to the UR10 does not permit direct torque control. Instead, only position and velocity control of the joints are available. Thus, for our proposed approach, we need to implement only the GP regression models and the robust, outer-loop controller. The commanded acceleration $a_q$ calculated by the outer-loop controller in (11) is integrated to obtain a velocity command that can be sent to the UR10. To test our approach under various uncertainties, we introduce artificial model uncertainty by adding a function to our calculated acceleration command $a_q$.
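The acceleration-to-velocity conversion can be sketched as a forward-Euler integrator running at the control rate. The text does not state the integration scheme, so Euler is an assumption, as are the class name `AccelToVelocityCommand`, the 0.008 s step, and the choice to initialize from a measured joint velocity. The optional `artificial_eta` callback mimics adding an artificial uncertainty to $a_q$ before integration. Sending the command to the robot (e.g., through a ROS driver) is outside this sketch.

```python
import numpy as np


class AccelToVelocityCommand:
    """Hypothetical helper: integrate commanded accelerations a_q into velocity commands."""

    def __init__(self, v_init, dt, artificial_eta=None):
        self.v_cmd = np.asarray(v_init, dtype=float).copy()  # e.g., measured joint velocities
        self.dt = float(dt)
        self.artificial_eta = artificial_eta  # optional callable eta(a_q) for robustness tests

    def step(self, a_q):
        a = np.asarray(a_q, dtype=float)
        if self.artificial_eta is not None:
            a = a + self.artificial_eta(a)  # inject artificial model uncertainty
        self.v_cmd = self.v_cmd + self.dt * a  # forward Euler: v_{k+1} = v_k + dt * a_k
        return self.v_cmd.copy()


# Constant 1 rad/s^2 on all six joints for 1 s (125 steps of 0.008 s).
integ = AccelToVelocityCommand(v_init=np.zeros(6), dt=0.008)
for _ in range(125):
    v = integ.step(np.ones(6))
print(v)  # approximately [1. 1. 1. 1. 1. 1.]

# Same input with a made-up multiplicative uncertainty eta(a) = 0.5 * a.
noisy = AccelToVelocityCommand(v_init=np.zeros(6), dt=0.008, artificial_eta=lambda a: 0.5 * a)
for _ in range(125):
    v_noisy = noisy.step(np.ones(6))
```

Initializing from the measured velocity, rather than from zero, avoids a jump in the commanded velocity when the controller is switched on mid-motion.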
The PD gains of the outer-loop controller are tuned to achieve good tracking performance on a baseline desired trajectory in a nominal scenario with no added uncertainty. A desired trajectory of for each joint is used for this purpose, with gains selected to produce a total joint angle RMS error less than . This resulted in and , where is the identity matrix.
We use 6 GPs to learn the uncertainty vector , each of which uses the squared exponential kernel. The prior variance and length scale hyperparameters are optimized by maximizing the marginal likelihood function, while each noise variance is set to . Hyperparameter optimization is performed offline using approximately 1000 data points collected while tracking sinusoidal trajectories under uncertainty . Implementation and tuning of the GPs are done with the Python library GPy. Each GP runs at , and uses the past observations for prediction. For the confidence intervals, we use for simplicity [15]. For the robust controller, we use .
VII-B Results
The performance of the proposed robust learning controller is initially compared to that of the nominal outer-loop PD controller using a single trajectory and various cases of model uncertainty. Ten different cases of uncertainty of the form are tested over the desired trajectory for each joint, with the results displayed in Figure 3. The average RMS error of the nominal controller is and that of the proposed controller is , yielding an average improvement of .
Further experiments were performed to verify that the proposed approach generalizes to different desired trajectories covering different regions of the state space. A single case of uncertainty, where is the entrywise product, is selected, and the performance of the proposed and nominal controllers under this uncertainty is compared on five additional trajectories. The results are presented in Table II, with an average improvement of 39.9% over the nominal controller. The six trajectories are shown in a demo video at http://tiny.cc/mantraj.
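Table II: RMS tracking error of the nominal and robust learning controllers, and the relative improvement, for the six trajectories.
| Trajectory | Nominal RMS error | Robust learning RMS error | Improvement |
|---|---|---|---|
| 1 | 0.070 | 0.037 | 47.1% |
| 2 | 0.058 | 0.037 | 36.2% |
| 3 | 0.092 | 0.058 | 37.0% |
| 4 | 0.085 | 0.043 | 49.4% |
| 5 | 0.050 | 0.029 | 42.0% |
| 6 | 0.029 | 0.021 | 27.6% |
| Average | 0.064 | 0.038 | 39.9% |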
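The Improvement column of Table II is the relative reduction in RMS error, 100 (nominal - robust learning) / nominal. The snippet below recomputes it from the table entries as a consistency check; it is arithmetic on the published values only and introduces no new data.

```python
nominal = [0.070, 0.058, 0.092, 0.085, 0.050, 0.029]  # Table II, nominal column
robust = [0.037, 0.037, 0.058, 0.043, 0.029, 0.021]   # Table II, robust learning column


def improvement(nom: float, rob: float) -> float:
    """Relative RMS-error reduction in percent."""
    return 100.0 * (nom - rob) / nom


per_trajectory = [round(improvement(n, r), 1) for n, r in zip(nominal, robust)]
mean_of_improvements = sum(improvement(n, r) for n, r in zip(nominal, robust)) / len(nominal)
improvement_of_means = improvement(sum(nominal) / len(nominal), sum(robust) / len(robust))

print(per_trajectory)                  # [47.1, 36.2, 37.0, 49.4, 42.0, 27.6]
print(round(mean_of_improvements, 1))  # 39.9
print(round(improvement_of_means, 1))  # 41.4
```

The Average row's 39.9% is the mean of the six per-trajectory percentages. Applying the formula to the unrounded column means (0.064 and 0.0375) instead gives about 41.4%, so the two ways of averaging should not be confused.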
To verify the reliability of the proposed method, experiments for six combinations of uncertainty and trajectory are repeated five times each with both the nominal and proposed robust learning controllers. The results are summarized in Figure 4. The figure shows that the performance under our proposed controller is highly repeatable and that it outperforms the nominal controller in all 30 cases.
VIII Conclusions
We have provided a novel, learning-based control strategy based on Gaussian processes (GPs) that ensures stability of the closed-loop system and high-accuracy tracking of smooth trajectories for an important class of Lagrangian systems. The main idea is to use GPs to estimate an upper bound on the uncertainty of the linearized model, and then use the uncertainty bound in a robust, outer-loop controller. Unlike most of the existing, learning-based inverse dynamics control techniques, we have provided a proof of the closed-loop stability of the system that takes into consideration the regression errors of the learning module. Moreover, we have proved that the tracking error converges to a ball with a radius that can be made arbitrarily small. Furthermore, we have verified the effectiveness of our approach via simulations on a planar manipulator and experimentally on a 6-DOF industrial manipulator.
References
[1] M. W. Spong, S. Hutchinson, M. Vidyasagar. Robot Modeling and Control. John Wiley & Sons, Inc., 2006.
[2] V. Hagenmeyer, E. Delaleau. Exact feedforward linearization based on differential flatness. International Journal of Control, 76(6), pp. 537-556, 2003.
[3] C. Abdallah, D. M. Dawson, P. Dorato, M. Jamshidi. Survey of robust control for rigid robots. IEEE Control Systems Magazine, 11(2), pp. 24-30, 1991.
[4] J. Sun de la Cruz. Learning Inverse Dynamics for Robot Manipulator Control. M.A.Sc. Thesis, University of Waterloo, 2011.
[5] R. Calandra, S. Ivaldi, M. P. Deisenroth, E. Rueckert, J. Peters. Learning inverse dynamics models with contacts. IEEE International Conf. on Robotics and Automation, Seattle, 2015, pp. 3186-3191.
[6] A. S. Polydoros, L. Nalpantidis, V. Kruger. Real-time deep learning of robotic manipulator inverse dynamics. IEEE/RSJ International Conf. on Intelligent Robots and Systems, Hamburg, 2015, pp. 3442-3448.
[7] S. Zhou, M. K. Helwa, A. P. Schoellig. Design of deep neural networks as add-on blocks for improving impromptu trajectory tracking. IEEE Conf. on Decision and Control, Melbourne, 2017, pp. 5201-5207.
[8] D. Nguyen-Tuong, J. Peters. Using model knowledge for learning inverse dynamics. IEEE International Conf. on Robotics and Automation, Anchorage, 2010, pp. 2677-2682.
[9] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, C. J. Tomlin. A general safety framework for learning-based control in uncertain robotic systems. arXiv preprint arXiv:1705.01292, 2017.
[10] T. J. Perkins, A. G. Barto. Lyapunov design for safe reinforcement learning. The Journal of Machine Learning Research, 3, pp. 803-832, 2003.
[11] J. W. Roberts, I. R. Manchester, R. Tedrake. Feedback controller parameterizations for reinforcement learning. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris, 2011, pp. 310-317.
[12] P. Geibel, F. Wysotzki. Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research, 24, pp. 81-108, 2005.
[13] M. Turchetta, F. Berkenkamp, A. Krause. Safe exploration in finite Markov decision processes with Gaussian processes. Advances in Neural Information Processing Systems, Barcelona, 2016, pp. 4312-4320.
[14] F. Berkenkamp, R. Moriconi, A. P. Schoellig, A. Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes. IEEE Conf. on Decision and Control, Las Vegas, 2016, pp. 4661-4666.
[15] F. Berkenkamp, A. P. Schoellig, A. Krause. Safe controller optimization for quadrotors with Gaussian processes. IEEE International Conf. on Robotics and Automation, Stockholm, 2016, pp. 491-496.
[16] G. Chowdhary, H. A. Kingravi, J. P. How, P. A. Vela. Bayesian nonparametric adaptive control using Gaussian processes. IEEE Trans. on Neural Networks and Learning Systems, 26(3), pp. 537-550, 2015.
[17] F. Meier, D. Kappler, N. Ratliff, S. Schaal. Towards robust online inverse dynamics learning. IEEE/RSJ International Conf. on Intelligent Robots and Systems, Daejeon, 2016, pp. 4034-4039.
[18] T. Beckers, J. Umlauft, D. Kulić, S. Hirche. Stable Gaussian process based tracking control of Lagrangian systems. IEEE Conf. on Decision and Control, Melbourne, 2017, pp. 5580-5585.
[19] M. K. Helwa, A. P. Schoellig. On the construction of safe controllable regions for affine systems with applications to robotics. arXiv preprint arXiv:1610.01243, 2016.
[20] R. M. Murray. Nonlinear control of mechanical systems: a Lagrangian perspective. Annual Reviews in Control, 21, pp. 31-42, 1997.
[21] N. Srinivas, A. Krause, S. M. Kakade, M. W. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. on Information Theory, 58(5), pp. 3250-3265, 2012.