## I Introduction

Dynamics models are fundamental in robotics. For instance, inverse dynamics models, which relate joint trajectories to joint torques, are used in high-precision trajectory tracking applications [FL_control, FL_control_elastic, siciliano], and also in problems where robots interact with the environment, such as force control [force_control, siciliano], impedance control [impedance_control, impedance_control_non_diag_stiff], and collision detection [col_det, col_det_GP].

In the aforementioned applications, the accuracy of the inverse dynamics model is crucial. However, deriving an accurate model of the robot inverse dynamics is a challenging task, in particular when system specifications are limited or uncertain, or when complex behaviors such as friction and elasticity are relevant. Indeed, in these contexts, the identification of parametric models derived from first principles of physics

[parametric_ID, handbook] are often not effective, due to model bias and unmodeled behaviors. For these reasons, in the last decades, several black-box and grey-box strategies for inverse dynamics identification have been proposed. A relevant class of solutions is based on Gaussian Process Regression (GPR) [rasmussen_GP_for_ML], see for instance [GIP, SP_peters, romeres2016online, Sparse_GP_RT_control, Rezaei_cascaded_GP]. Here, instead of identifying the physical parameters of the model, the inverse dynamics are treated as an unknown function, which relates position, velocity, and acceleration of the joints to torques. This unknown function is modeled a priori as a Gaussian Process (GP), with covariance parametrized through a kernel function [rasmussen_GP_for_ML, Scholkopf_LWK]. The posterior distribution of the joint torques, given the observed data, can be computed in closed form, and can be used to predict joint torques.Compared to physical models, which are strictly related to the dynamics equations, GP models are less interpretable, and, consequently, their use in control applications might be less straightforward. However, several works show that such models can be used in applications, see, for instance, [Sparse_GP_RT_control, Peters_computed_torque_control, stulp_computed_torque_control, hirche_stable_GP_control_4_lagrangian, Rezaei_cascaded_GP] concerning trajectory tracking, and [col_det_GP] concerning proprioceptive collision detection. Typically, in trajectory tracking, GP models are exploited by implementing a feedforward control scheme [craig_intro_robotics], see the diagram in Figure 1. Instead of using parametric models, in the GP implementation, the feedforward term is the output of the GP model evaluated for the position, velocity, and acceleration of the reference trajectory. The control loop is closed with a decentralized PD controller to cancel errors. When the GP model is accurate and the PD gains are set properly, the feedback loop is effective in canceling the residual tracking error. However, there are some issues that could limit the performance of a feedforward controller, as follows. (i) In the feedback loop, coupling between different degrees of freedom (DoF) are not considered. (ii) The robot inertia is configuration dependent, and, in some cases, it might be difficult to obtain a set of PD gains that can assure the same performance for all configurations. (iii) Convergence is not guaranteed, even if the inverse dynamics are known exactly; see [craig_intro_robotics], chapter 10.5 for details about (ii) and (iii).

An alternative control scheme is feedback linearization control [FL_control, FL_control_elastic, siciliano], described in the diagram in Figure 1. In contrast to feedforward control, where the model is used to compute a proper control input in advance, in feedback linearization, the inverse dynamics model is used to obtain a tracking error with linear dynamics. The control input is the sum of two terms. The first aims at compensating all the torques independent of accelerations. The second is given by a feedforward term proportional to the reference acceleration, and a PD feedback term. To account for couplings and variations of the inertia matrix, the second term is computed using an estimate of the inertia matrix. In contrast to feedforward control, feedback linearization assures asymptotic convergence, if the dynamics are known exactly. Moreover, the error dynamics are described by a second-order linear differential equation, fully characterized by the PD gains, providing a principled way to set the PD gains [siciliano].

In this work, we analyze two implementations of feedback linearization control based on GP models. The first implementation is simpler, and estimates directly the feedback linearization control input using the GP model. In contrast, the second implementation is composed of two steps. First, the inertia matrix and the compensation of the torques independent of accelerations are estimated separately by means of the GP model. Then, the feedback linearization control input is computed by applying its standard form. To the best of our knowledge, the first implementation has been attempted before only in [Peters_computed_torque_control]. However, that paper was focused on issues related to modeling, and not to control. In contrast, the second implementation has never been proposed before, and it requires the estimation of several different components of the dynamics equations from the GP model, which is introduced in this paper. We tested the two implementations with a simulated 7-DoF manipulator, varying also the choice of the GP prior, i.e., its kernel. The obtained results show that the second implementation is more robust w.r.t. the kernel choice and initial errors.

The remainder of the paper is organized as follows. In Section II, we provide background formulations of robot dynamics and control, as well as GPR. Section III describes the strategy proposed to estimate several different dynamics components from black-box GP models, and in Section IV we describe the two feedback linearization algorithms implemented. Experiments are described in Section V, and conclusions are drawn in Section VI.

## Ii Background

In the first part of this section, we provide background formulation of robot dynamics, as well as introduce the trajectory tracking problem and describe the feedforward and the feedback linearization controllers. In the second part, we describe GPR for inverse dynamics identification, detailing the black-box priors adopted in this work.

### Ii-a Robot dynamics and control

Consider a mechanical systems with degrees of freedom, and denote with its generalized coordinates at time ; and are the velocity and the acceleration of the joints, respectively. The generalized torques, i.e., the control input of the system, are denoted with . For compactness, in the following, we will denote explicitly the dependencies on only when strictly necessary. Under rigid body assumptions, the dynamics equations of a mechanical system are described by the following matrix equation

(1) |

where is the inertia matrix, while , , and account, respectively, for the contributions of fictitious forces, gravity, and friction, see [siciliano] for a more detailed description. For compactness, we introduce also . In the following, we will denote with and the estimates of and .

The trajectory tracking problem consists in designing a controller able to follow a reference trajectory , starting from initial conditions .

In feedback linearization control, the control input is

(2a) | |||

(2b) |

Assuming that the model is known exactly, i.e., and , combining (1) and (2) and recalling that is invertible, it can be proven that the tracking error goes asymptotically to zero if and [siciliano]. Indeed, under these assumptions, the dynamics of the tracking error is described by the following second order linear differential equation

(3) |

which is stable if and . This fact represent a considerable advantage w.r.t. feedforward control, since it provides a principled way to chose and . Indeed, selecting and , with

being the identity matrix, we obtain

decoupled second-order input/output relations with natural frequency and damping ratio .### Ii-B GPR for inverse dynamics identification

GPR provides a solid probabilistic framework to identify the inverse dynamics from data. Typically, in GPR, each joint torque is modeled by a distinct and independent GP. Consider an input/output dataset , where

is a vector collecting

measurements of , the*i*-th joint torque, while ; is the vector collecting the position, velocity and acceleration of the joints at time , hereafter denoted GP input. The probabilistic model of is

where

is i.i.d. Gaussian noise with standard deviation

, while is an unknown function modeled a priori as a GP, namely, . The covariance matrix is defined through a kernel function . Specifically, the covariance between and , i.e., the element of at row*j*and column

*l*, is equal to

. Exploiting the properties of Gaussian distributions, it can be proven that the posterior distribution of

given in a general input location is Gaussian [rasmussen_GP_for_ML]. Then, the maximum a posteriori estimator corresponds to the mean, which is given by the following expression

(4) |

where

Different solutions proposed in the literature can be grouped roughly based on the definition of the GP prior. In this paper, we will consider two black-box approaches, where the prior is defined without exploiting prior information about the physical model.

Squared Exponential kernel The Squared Exponential (SE) kernel [rasmussen_GP_for_ML, Scholkopf_LWK], defines the covariance between samples based on the distance between GP inputs, and it is defined by the following expression

(5) |

and

are the kernel hyperparameters. The first is a scaling factor, and the second is a positive definite matrix, which defines the norm used to compute the distance between inputs. A common choice consists in considering

to be diagonal, with the positive diagonal elements named lengthscales.Geometrically Inspired Polynomial kernel The Geometrically Inspired Polynomial (GIP) kernel has been recently introduced in [GIP]. This kernel is based on the property that the dynamics equations in (1) are a polynomial function in a proper transformation of the GP input, fully characterized only by the type of each joint. Specifically, is mapped in , the vector composed by the concatenation of the components associated with a prismatic joint and the sines and cosines of the revolute coordinates. As proved in [GIP], the inverse dynamics in (1) is a polynomial function in , and , where the elements of have maximum relative degree of one, whereas the ones of and have maximum relative degree two. To exploit this property, the GIP kernel is defined through the sum and the product of different polynomial kernels [MPK], hereafter denoted as , where is the degree of the polynomial kernel. In particular, we have

(6) | |||

where, in its turn, is given by the product of polynomial kernels with degree two, see [GIP] for all the details. In this way, the GIP kernel allows defining a regression problem in a finite-dimensional function space where (1) is contained, leading to better data efficiency in comparison with the SE kernel.

## Iii Estimate of the Dynamics Components From Gaussian Process Models of the Inverse Dynamics

In this section, we describe how it is possible to obtain estimates of the different contributions in the left-hand side of (1) when adopting GPR to identify the inverse dynamics; in particular, we discuss the computation of gravitational contributions, inertial contributions, and . We assume that a distinct GP is used for each of the degree of freedom, and we denote by , , the estimator of the *i*-th joint torque obtained applying (4). For convenience, from here on, we will point out explicitly the different components of the GP input, namely, the input of the will be instead of , which comprises the concatenation of . It is worth mentioning that the proposed approach is inspired by the strategy adopted in Newton-Euler algorithms, see [deluca_NE_dyn_est].

### Iii-a Gravitational contribution

As shown in (1), the torques due to the gravitational contributions account for all the terms that depend only on . Consequently, to obtain , i.e., the estimate of the *i*-th gravitational contribution in the configuration , we evaluate by setting , . Then, the estimate of is

(7) |

### Iii-B Inertial contributions

The inertial contributions, i.e., , accounts for all the contributions that depend simultaneously on and . Consequently, to estimates these contributions, we evaluate the GP models in , and subtract the gravitational contribution defined and computed previously. In particular, to obtain , i.e., the estimate of the element in position , we set all the accelerations to zero, except for the *j*-th component. Denoting with the vector with all elements equal to zero except for the -th element, which, instead, is equal to one, we have

(8) |

### Iii-C Estimation of

The vector collects all the contributions that do not depend on . Then, , i.e., the estimate of the *i*-th component of , is computed by evaluating the *i*-th GP model setting . Then, we have

(9) |

## Iv Feedback Linearization Control Based on Gaussian Process Model

In this section, we describe the two GP-based feedback linearization controllers implemented. The first implementation aims at estimating directly an approximation of (2) using the GP models, whereas the second computes the approximation of (2) by estimating and using the expressions derived in Section III.

### Iv-a Gp-Fl

In this approach, hereafter denoted as GP Feedback Linearization control (GP-FL), the control input is selected to be directly the estimate of (2b). The estimate of (2b) at time is obtained by evaluating the GP models with GP-input given by the concatenation of , and . Then, referring to the notation previously introduced, we have

(10) |

### Iv-B Gp-Fl-Dce

The second approach, named GP Feedback Linearization control with Dynamics Components Estimation (GP-FL-DCE), computes the control input based on (2) and the estimation of and obtained with the GP input. First, the elements of the inertia matrix and the estimates of are computed by applying, respectively, (8) and (9). Then, the input is

(11) |

where, as before, .

## V Experiments

Experiments have been carried out in PyBullet [pybullet], simulating a KUKA LBR iiwa, which is a 7-DoF collaborative manipulator^{1}^{1}1A video of the experiments is available at https://youtu.be/ehy8iDRGIDo. The system was controlled at 1000 HZ. Position, velocity, and torques of the joints are directly provided by the simulator. The accelerations needed for model identification were computed offline by means of acausal numerical differentiation of the velocities. Specifically, we applied the central difference approximation, namely, the acceleration of the joints at time is approximated with , where is the sampling time.

The remainder of this section is organized as follows. First, we compare the accuracy of GP models obtained with the SE and GIP kernel. Second, the control performance of GP-FL and GP-FL-DCE is compared on a trajectory tracking problem, with initial tracking error equal to zero and varying the kernel choice. Finally, the two strategies are tested on the same trajectory tracking problem, in the presence of initial tracking errors for all joints.

### V-a Model learning performance

To train and test the GP models obtained with the SE and GIP kernel, we collected two data sets, hereafter denoted by and . The first data set, , was used to derive the GP estimators (4), after optimizing the kernel hyperparameters by marginal likelihood maximization [rasmussen_GP_for_ML]. The second data set, , was used to compare the performance of the two GP estimators. Both data sets were collected by employing a hand-tuned PD controller to track a random reference trajectory. For each joint, the reference trajectory was Gaussian noise filtered with a second-order low-pass filter (with cutoff frequency 1 Hz). The length of the trajectory was 50 seconds, resulting in 50,000 samples. To limit the computational complexity of (4), the collected samples were down-sampled with a constant rate of 10, obtaining 5,000 samples for each dataset.

In Figure 2, we visualize the distribution of the absolute value of the errors obtained in with the two GP estimators. Moreover, in the table below Figure 2

, we reported the normalized Mean Squared Error (nMSE), namely, the ratio between the mean squared error and the variance of the correspondent joint torques, expressed as a percentage. As already showed in

[GIP], for all joints, the estimator based on the GIP kernel outperforms the one based on the SE kernel, showing better data efficiency and generalization.kernel | |||||||
---|---|---|---|---|---|---|---|

SE | 3.99 | 0.48 | 4.22 | 0.88 | 7.95 | 10.86 | 4.91 |

GIP | 0.42 | 0.12 | 0.55 | 0.18 | 1.21 | 1.80 | 1.05 |

### V-B Trajectory tracking without initial tracking error

In the first control experiment, the GP-FL and GP-FL-DCE controllers based on the two models were tested on the same trajectory tracking problem. For each dof, , the reference joint position was given by , where the frequencies were randomly sampled from . The controller gains are selected following the considerations reported in Section II, and , with and . The control horizon was 5s, and the initial tracking error was zero. In Figure 3 and 4, the evolution of the joint angles and control torques obtained by GP-FL with SE and GIP kernel, and GP-FL-DCE with SE kernel are reported, respectively.

First, we discuss the performance obtained using the model based on the SE kernel. It can be noticed that the GP-FL controller with the SE kernel works properly when the amplitudes of the reference oscillations are low, but it starts to fail suddenly towards the end of the control horizon, when zero torques are commanded to all joints. This observation suggests that the GP-FL scheme evaluates the GP model in unexplored regions, where predictions are equal to the prior mean [rasmussen_GP_for_ML], which is zero. This is due to the large magnitude of , which grows with the tracking error, and becomes significantly different from the accelerations seen during training. Instead, GP-FL-DCE with the SE kernel is able to track the reference trajectory, demonstrating better robustness compared to the GP-FL. This robustness is likely achieved thanks to the estimation of the individual components of the dynamics. Indeed, even though the robot is far from the configurations seen during training, the GP model based on SE provides sufficiently accurate estimates of and , which results in keeping the robot close to the reference.

Thanks to the better generalization of the GIP kernel, the GP-FL controller based on the GIP kernel is more robust and is always able to track the desired reference trajectory, also in unexplored areas of the state space. The performance of GP-FL-DCE with the GIP kernel has not been reported, since the trajectory obtained with this scheme and GP-FL-DCE with the GIP kernel is the same of GP-FL with GIP kernel. This is due to the definition of the GIP kernel, which is closer to the physics of the system, and already encodes the linear dependencies of the torques on the acceleration of the joints.

### V-C Trajectory tracking with initial tracking error

In this experiment, we tested the controllers on the same reference trajectory as in the previous experiment, considering also the presence of initial tracking errors. For all joints, we considered an initial error of . The obtained behavior confirmed the observations from the previous experiment. The GP-FL scheme with the SE kernel is not effective. In fact, the initial error makes the magnitude of the term large, leading to considerable distances from accelerations observed during training, and zero torques from the beginning. In Figure 5, we plotted the tracking errors obtained by GP-FL-DCE with the SE kernel and GP-FL with the GIP kernel, as well as the one obtained by a feedback linearization control based on the true model. For all three estimators, the main dynamics of the tracking error follows the exponential behavior described in (3). Significant differences between the tracking error evolution can be appreciated only at steady state, where the controllers based on GP models are subject to limited oscillations around zero, with absolute value lower than , and growing with the amplitude of the reference trajectories. These errors are due to model inaccuracies, which becomes more relevant when the reference trajectories cross regions that are far from the distribution of the training samples. The errors are higher for the controller based on the SE kernel. This is in accordance with the considerations presented in Section V-A, where we highlight that the model based on GIP is more accurate. In particular, the tracking errors at steady state are higher in joint 3, 4, 6, and 7, which are the ones where the GP estimator is less accurate, as confirmed by the nMSE obtained in the experiment of Section V-A.

## Vi Conclusions

In this paper, we analyze the implementation of feedback linearization control based on GP models. We considered two strategies. The first computes the control input directly with the GP model, whereas the second computes the input after estimating the individual components of the dynamics, in particular, the inertia matrix and the torques independent of accelerations. The two strategies were compared on a trajectory tracking problem with a simulated 7-DoF manipulator, varying also the kernel choice; we considered the SE and GIP kernels. Results show that the second implementation is more robust w.r.t. the kernel choice and model inaccuracies. Moreover, as regards the choice of kernel, the obtained performance shows that the use of a structure kernel, such as the GIP kernel, is advantageous, resulting in good performance for both implementations.