Uncertainty-Aware Imitation Learning using Kernelized Movement Primitives

03/05/2019 ∙ by João Silvério, et al. ∙ Bosch

During the past few years, probabilistic approaches to imitation learning have earned a relevant place in the literature. One of their most prominent features, in addition to extracting a mean trajectory from task demonstrations, is that they provide a variance estimation. The intuitive meaning of this variance, however, changes across different techniques, indicating either variability or uncertainty. In this paper we leverage kernelized movement primitives (KMP) to provide a new perspective on imitation learning by predicting variability, correlations and uncertainty about robot actions. This rich set of information is used in combination with optimal controller fusion to learn actions from data, with two main advantages: i) robots become safe when uncertain about their actions and ii) they are able to leverage partial demonstrations, given as elementary sub-tasks, to optimally perform a higher level, more complex task. We showcase our approach in a painting task, where a human user and a KUKA robot collaborate to paint a wooden board. The task is divided into two sub-tasks and we show that using our approach the robot becomes compliant (hence safe) outside the training regions and executes the two sub-tasks with optimal gains.


I Introduction

Probabilistic approaches to imitation learning [1] have witnessed a rise in popularity during the past few years. They are often seen as complementing deterministic techniques, such as dynamic movement primitives [2], with more complete descriptions of demonstration data, in particular in the form of covariance matrices that encode both the variability and correlations in the data. Widely used approaches at this level include Gaussian mixture models (GMM), popularized by the works of Calinon [3], and, more recently, probabilistic movement primitives [4] and kernelized movement primitives (KMP) [5, 6].

In recent work [7, 8], we discussed a fundamental difference between the type of variance encapsulated by the predictions of classical probabilistic techniques, particularly Gaussian mixture regression (GMR) and Gaussian process regression (GPR) [9]. We showed that the variance predicted by these two techniques has distinct, complementary interpretations: GMR predictions measure the variability in the training data, while those of GPR quantify the degree of uncertainty, increasing as one queries a model farther away from the region where it was trained. These properties are illustrated in Fig. 1. This finding led us to inquire: is there a probabilistic technique that can simultaneously predict both variability and uncertainty? Are these two notions compatible and unifiable into a single imitation learning framework where they both provide clear advantages from a learning perspective? In this paper we try to answer these questions.

The two types of variance have been individually leveraged by different lines of work. For instance, variability and data correlations (encapsulated in full covariance matrices) have been used to modulate control gains in several works [3, 10, 11]. Uncertainty, in the sense of absence of data/information, is also a concept with tradition in robotics. Problems in robot localization [12], control [13] and, more recently, Bayesian optimization [14], leverage uncertainty information to direct the robot to optimal performance. In [8] we took advantage of uncertainty to regulate robot stiffness, in order to make it compliant (and safer) when uncertain about its actions. However, to the best of our knowledge, variability and uncertainty have never been exploited simultaneously in imitation learning.

Fig. 1: Gaussian mixture regression (GMR) and Gaussian process regression (GPR) provide complementary notions of variance as variability and absence of datapoints. With a unified technique, robots can learn controllers that are modulated by both types of information.

In this paper we introduce an approach that predicts variability, correlations and uncertainty using a single technique and uses this information to design optimal controllers from demonstrations. These drive the robot with high precision when the variability in the data is low (while respecting the observed correlations across degrees of freedom) and render the robot compliant (and safer to interact with) when the uncertainty is high. The uncertainty is further leveraged by the robot to know when different controllers, responsible for the execution of separate, elementary sub-tasks, should be activated. In particular we:

  1. demonstrate that KMP predicts full covariance matrices and uncertainty (Sections III and IV-A)

  2. exploit a linear quadratic regulator (LQR) formulation that yields control gains which are a function of covariance and uncertainty (Section IV-B)

  3. dovetail 1), 2) with the concept of fusion of controllers [7] which allows for demonstrating one complex task as separate sub-tasks, whose activation depends on individual uncertainty levels (Section IV-C)

Experimentally, we expand on a previously published robot-assisted painting scenario and validate the approach using a KUKA LWR where different types of controllers are used for individual sub-tasks (Section V). We provide a discussion of the approach and the obtained results in Section VI and concluding remarks and possible extensions in Section VII.

II Related Work

Most probabilistic regression techniques provide variance predictions in some form. GMR, relying on a previously trained GMM, computes full covariance matrices encoding the correlation between output variables. However, it does not measure uncertainty, defaulting to the covariance of the closest Gaussian component when a query point is far from the model. GPR, despite estimating uncertainty, assumes constant noise and therefore does not take the variability of the outputs into account. Heteroscedastic Gaussian Processes (HGP) [15, 16] introduce an input-dependent noise model into the regression problem. Nonetheless, tasks with multiple outputs require the training of separate HGP models, thus output correlations are not straightforward to learn in the standard formulation. In addition, the noise is treated as a latent function, hence each HGP depends upon the definition of two Gaussian processes (GP) per output, scaling poorly with the number of outputs. In [17], Choi et al. propose to use mixture density networks (MDN) in an imitation learning context to predict both variability and uncertainty. The main drawback of the approach, similarly to HGP, is that outputs are assumed to be uncorrelated. Moreover, in [17] only the uncertainty is used in the proposed imitation learning framework, without considering variability. As opposed to the aforementioned works, we here show that KMP predicts both full covariance matrices and a diagonal uncertainty matrix, parameterized by its hyperparameters, giving access to all the desired information. Table I details the differences between variance predictions of different algorithms, highlighting the fact that KMP estimates all desired features in our approach.

In terms of estimating optimal controllers from demonstrations, previous works have either exploited full covariance matrices encoding variability and correlations [3, 10, 11, 18] or diagonal uncertainty matrices [8]. While the former are aimed at control efficiency, by having the robot apply higher control efforts where required depending on variability, the latter target safety, with the robot becoming more compliant when uncertain about its actions. The LQR we propose in Section IV-B is identical to the one in [3, 8, 11]; however, by benefiting from the KMP predictions, it unifies the best of the two approaches.

Finally, inspired by previous work on learning kinematic constraints [19], we proposed a fusion of controllers in [7] to allow robots to smoothly switch between sub-tasks based on the uncertainty of each sub-task’s controller, when performing a more complex high-level task. Here we go one step further and consider optimal controllers learned from demonstrations into the fusion, instead of manually defining the control gains.

The approach described in the next sections therefore aims at a seamless unification of concepts exploited in previous work, taking imitation learning one step ahead into the learning of optimal controllers for potentially complex tasks.

                  Types of prediction
Technique         Variability   Uncertainty   Correlations
GMM/GMR [20]      yes           no            yes
GPR [9]           no            yes           no
HGP [15, 16]      yes           yes           no
MDN [17]          yes           yes           no
Our approach      yes           yes           yes
TABLE I: (Co)variance predictions of different techniques.

III Kernelized Movement Primitives

We consider datasets $\{\{\boldsymbol{\xi}^{\mathcal{I}}_{t,h}, \boldsymbol{\xi}^{\mathcal{O}}_{t,h}\}_{t=1}^{T}\}_{h=1}^{H}$ comprised of $H$ demonstrations with length $T$, where $\boldsymbol{\xi}^{\mathcal{I}} \in \mathbb{R}^{D_{\mathcal{I}}}$ and $\boldsymbol{\xi}^{\mathcal{O}} \in \mathbb{R}^{D_{\mathcal{O}}}$ denote inputs and outputs ($\mathcal{I}$, $\mathcal{O}$ are initials for ‘input’ and ‘output’), respectively, and $D_{\mathcal{I}}$, $D_{\mathcal{O}}$ are their dimensions. $\boldsymbol{\xi}^{\mathcal{I}}$ can represent any variable of interest to drive the movement synthesis (e.g., time, object/human poses) and $\boldsymbol{\xi}^{\mathcal{O}}$ encodes the desired state of the robot (e.g., an end-effector position, a joint space configuration). KMP assumes access to an $N$-dimensional probabilistic trajectory distribution $\{\boldsymbol{\xi}^{\mathcal{I}}_{n}, \hat{\boldsymbol{\mu}}_{n}, \hat{\boldsymbol{\Sigma}}_{n}\}_{n=1}^{N}$ mapping a sequence of inputs to their corresponding means and covariances, which encompass the important features in the demonstration data. This probabilistic reference trajectory can be obtained in various ways, for example by computing means and covariances empirically at different points in a dataset or by using unsupervised clustering techniques. Here we follow the latter direction, in particular by using a GMM to cluster the data and GMR to obtain the trajectory distribution that initializes KMP (done once after data collection).
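The first of the two options mentioned above, computing means and covariances empirically, can be sketched in a few lines. This is only an illustration and not the GMM/GMR initialization used in the paper: it assumes time-aligned demonstrations of equal length stored in an array of shape (H, T, D), and the function name `empirical_trajectory_distribution` and the regularization constant `reg` are hypothetical choices.

```python
import numpy as np

def empirical_trajectory_distribution(demos, reg=1e-8):
    """Per-time-step mean and covariance over time-aligned demonstrations.

    demos: array of shape (H, T, D) -> means (T, D), covariances (T, D, D).
    """
    demos = np.asarray(demos, dtype=float)
    H, T, D = demos.shape
    means = demos.mean(axis=0)
    centered = demos - means
    # Unbiased sample covariance at every time step.
    covs = np.einsum("htd,hte->tde", centered, centered) / max(H - 1, 1)
    # Small diagonal term (hypothetical choice) keeps every covariance invertible.
    covs = covs + reg * np.eye(D)
    return means, covs

# Synthetic data: six noisy copies of a 2-D reference trajectory.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
clean = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)  # (50, 2)
demos = clean[None] + 0.05 * rng.standard_normal((6, 50, 2))
means, covs = empirical_trajectory_distribution(demos)
```

The resulting pairs of means and covariances, together with the time stamps as inputs, have the same form as the reference trajectory distribution that KMP expects.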

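By concatenating the trajectory distribution into $\hat{\boldsymbol{\mu}} = [\hat{\boldsymbol{\mu}}_{1}^{\top} \ldots \hat{\boldsymbol{\mu}}_{N}^{\top}]^{\top}$ and $\hat{\boldsymbol{\Sigma}} = \mathrm{blockdiag}(\hat{\boldsymbol{\Sigma}}_{1}, \ldots, \hat{\boldsymbol{\Sigma}}_{N})$, KMP predicts a new Gaussian distribution at new test points $\boldsymbol{\xi}^{\mathcal{I}}_{*}$ according to [5, 6]

$$\boldsymbol{\mu}^{\mathcal{O}}_{*} = \boldsymbol{k}_{*} \left(\boldsymbol{K} + \lambda_1 \hat{\boldsymbol{\Sigma}}\right)^{-1} \hat{\boldsymbol{\mu}}, \tag{1}$$

$$\boldsymbol{\Sigma}^{\mathcal{O}}_{*} = \frac{N}{\lambda_2} \left(\boldsymbol{k}_{**} - \boldsymbol{k}_{*} \left(\boldsymbol{K} + \lambda_2 \hat{\boldsymbol{\Sigma}}\right)^{-1} \boldsymbol{k}_{*}^{\top}\right), \tag{2}$$

where

$$\boldsymbol{K} = \begin{bmatrix} \boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{1}, \boldsymbol{\xi}^{\mathcal{I}}_{1}) & \cdots & \boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{1}, \boldsymbol{\xi}^{\mathcal{I}}_{N}) \\ \vdots & \ddots & \vdots \\ \boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{N}, \boldsymbol{\xi}^{\mathcal{I}}_{1}) & \cdots & \boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{N}, \boldsymbol{\xi}^{\mathcal{I}}_{N}) \end{bmatrix} \tag{3}$$

is a matrix evaluating a chosen kernel function $k(\cdot,\cdot)$ at the training inputs, $\boldsymbol{k}_{*} = [\boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{1}) \ldots \boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{N})]$ and $\boldsymbol{k}_{**} = \boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{*})$. Moreover, $\boldsymbol{k}(\boldsymbol{\xi}^{\mathcal{I}}_{i}, \boldsymbol{\xi}^{\mathcal{I}}_{j}) = k(\boldsymbol{\xi}^{\mathcal{I}}_{i}, \boldsymbol{\xi}^{\mathcal{I}}_{j}) \boldsymbol{I}_{D_{\mathcal{O}}}$. Hyperparameters $\lambda_1$, $\lambda_2$ are regularization terms chosen so as to constrain the magnitude of the predicted mean and covariance, respectively. The kernel treatment implicit in (1)-(2) assumes the previous choice of a kernel function that depends on the characteristics of the training data. We here consider the squared-exponential kernel

$$k(\boldsymbol{\xi}^{\mathcal{I}}_{i}, \boldsymbol{\xi}^{\mathcal{I}}_{j}) = \sigma_f^2 \exp\left(-\frac{1}{l} \left\|\boldsymbol{\xi}^{\mathcal{I}}_{i} - \boldsymbol{\xi}^{\mathcal{I}}_{j}\right\|^2\right), \tag{4}$$

a common choice in the literature. We hence have that KMP with kernel (4) requires the definition of four hyperparameters $\{\lambda_1, \lambda_2, l, \sigma_f\}$. Note the similarity between predictions (1)-(2) and other kernel-based techniques (e.g. GPR, HGP). The main difference is that in KMP the noise model is learned through $\hat{\boldsymbol{\Sigma}}$, which describes both the variability and correlations present in the data throughout the trajectory. This makes KMP a richer representation when compared to GPR or HGP, which assume either constant noise (GPR) or input-dependent uncorrelated noise (HGP).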
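A minimal NumPy sketch of the predictions (1)-(2) may help make the block structure concrete. It assumes the squared-exponential kernel takes the form $\sigma_f^2 \exp(-\|\cdot\|^2 / l)$ and that every kernel block is a scalar kernel value times an identity matrix; the names `se_kernel` and `kmp_predict` and all numerical values are hypothetical, chosen only for illustration.

```python
import numpy as np

def se_kernel(a, b, length, sigma_f):
    """sigma_f^2 * exp(-||a_i - b_j||^2 / length) for rows of a (n, DI) and b (m, DI)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / length)

def kmp_predict(x_star, X, mu_hat, Sigma_hat, lam1, lam2, length, sigma_f):
    """Mean and covariance at one test input.

    x_star: (DI,), X: (N, DI), mu_hat: (N, DO), Sigma_hat: (N, DO, DO).
    """
    N, DO = mu_hat.shape
    I = np.eye(DO)
    # Each kernel block is a scalar kernel value times the identity.
    K = np.kron(se_kernel(X, X, length, sigma_f), I)                      # (N*DO, N*DO)
    k_star = np.kron(se_kernel(x_star[None, :], X, length, sigma_f), I)   # (DO, N*DO)
    k_ss = sigma_f**2 * I
    # Block-diagonal matrix of reference covariances.
    S = np.zeros((N * DO, N * DO))
    for n in range(N):
        S[n * DO:(n + 1) * DO, n * DO:(n + 1) * DO] = Sigma_hat[n]
    mean = k_star @ np.linalg.solve(K + lam1 * S, mu_hat.reshape(-1))
    cov = (N / lam2) * (k_ss - k_star @ np.linalg.solve(K + lam2 * S, k_star.T))
    return mean, cov

# Toy reference distribution: a sine wave with constant variance (illustrative values).
X = np.linspace(0.0, 1.0, 20)[:, None]
mu_hat = np.sin(2 * np.pi * X)
Sigma_hat = np.full((20, 1, 1), 0.01)
mean, cov = kmp_predict(np.array([0.25]), X, mu_hat, Sigma_hat,
                        lam1=1.0, lam2=1.0, length=0.05, sigma_f=1.0)
```

The dense Kronecker construction is chosen for readability; for long trajectories one would exploit the block structure instead of forming the full matrices.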

IV Uncertainty-aware imitation learning with KMP

We now demonstrate that KMP provides an estimation of uncertainty through (2), by defaulting to a diagonal matrix completely specified by its hyperparameters in the absence of datapoints (Section IV-A). In addition we propose a control framework to convert the predictions into optimal robot actions (Section IV-B) and the fusion of optimal controllers (Section IV-C).

IV-A Uncertainty predictions with KMP

In the light of the kernel treatment (2) and the exponential kernel (4), both covariance and uncertainty predictions emerge naturally in the KMP formulation. While the former occur within the training region, the latter arise when querying the model away from the original data.

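Lemma 1

The squared exponential kernel (4) goes to zero as $\|\boldsymbol{\xi}^{\mathcal{I}}_{*} - \boldsymbol{\xi}^{\mathcal{I}}_{i}\| \to \infty$.

Let us consider $d = \|\boldsymbol{\xi}^{\mathcal{I}}_{*} - \boldsymbol{\xi}^{\mathcal{I}}_{i}\|$, where $i$ is the index of the training point with the minimum distance to the test point $\boldsymbol{\xi}^{\mathcal{I}}_{*}$. Then

$$\lim_{d \to \infty} k(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{i}) = \lim_{d \to \infty} \sigma_f^2 \exp\left(-\frac{d^2}{l}\right) = 0. \tag{5}$$

Since every other training point is at least as far from the test point, the kernel evaluated at those points vanishes as well.

Lemma 1 extends to other popular exponential kernels, including the Matérn kernel [9].

Theorem 1

Covariance predictions (2) converge to a diagonal matrix completely specified by the KMP hyperparameters as test inputs move away from the training dataset, i.e. $d \to \infty$. Particularly,

$$\lim_{d \to \infty} \boldsymbol{\Sigma}^{\mathcal{O}}_{*} = \frac{N}{\lambda_2} \sigma_f^2 \boldsymbol{I}. \tag{6}$$

Following from Lemma 1 and knowing that $\boldsymbol{k}_{*} = [k(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{1}) \boldsymbol{I} \ldots k(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{N}) \boldsymbol{I}]$ we have $\lim_{d \to \infty} \boldsymbol{k}_{*} = \boldsymbol{0}$. Hence

$$\lim_{d \to \infty} \boldsymbol{\Sigma}^{\mathcal{O}}_{*} = \frac{N}{\lambda_2} \boldsymbol{k}_{**}. \tag{7}$$

Moreover we have

$$\boldsymbol{k}_{**} = k(\boldsymbol{\xi}^{\mathcal{I}}_{*}, \boldsymbol{\xi}^{\mathcal{I}}_{*}) \boldsymbol{I} = \sigma_f^2 \boldsymbol{I},$$

which replaced in (7) yields (6).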

Equation (6) plays a crucial role in our approach. It provides a principled way to know when the model is being queried in regions where data was not present during training. We leverage this information to 1) make the robot compliant when unsure about its actions and 2) let the robot know when to execute control actions pertaining to different KMPs. Moreover, through the dependence on $N$, $\lambda_2$ and $\sigma_f$, one can adjust the expression of uncertainty provided by the model, through the tuning of any of those hyperparameters. For instance, increasing the length $N$ of the initialized trajectory distribution has the effect of scaling the uncertainty. GPR offers a similar property, where the variance prediction converges to the scalar $\sigma_f^2$. However this is rather limiting, as tuning this hyperparameter can have undesired effects on the mean prediction. In KMP, $N$ and $\lambda_2$ do not affect the mean prediction as they do not parameterize the kernel function. Moreover, (2) is typically robust to their choice, providing freedom for tuning while yielding proper predictions (see [6] for details).
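The limit in Theorem 1 can be checked numerically on a scalar output. The sketch below assumes a squared-exponential kernel of the form $\sigma_f^2 \exp(-d^2 / l)$ and a covariance prediction whose far-field limit is $(N/\lambda_2)\sigma_f^2$; the function name `kmp_variance` and every numerical value are hypothetical.

```python
import numpy as np

def kmp_variance(x_star, X, var_hat, lam2, length, sigma_f):
    """Scalar-output version of the KMP covariance prediction."""
    N = len(X)
    K = sigma_f**2 * np.exp(-(X[:, None] - X[None, :]) ** 2 / length)
    k_star = sigma_f**2 * np.exp(-(x_star - X) ** 2 / length)
    inner = sigma_f**2 - k_star @ np.linalg.solve(K + lam2 * np.diag(var_hat), k_star)
    return (N / lam2) * inner

X = np.linspace(0.0, 1.0, 30)          # training inputs cover [0, 1]
var_hat = np.full(30, 0.01)            # reference variances (illustrative)
lam2, length, sigma_f = 2.0, 0.1, 1.5
limit = (len(X) / lam2) * sigma_f**2   # assumed far-field value of the prediction

# Query at increasing distance beyond the last training input.
distances = [0.0, 0.25, 0.5, 1.0, 5.0]
variances = [kmp_variance(1.0 + d, X, var_hat, lam2, length, sigma_f) for d in distances]
for d, v in zip(distances, variances):
    print(f"distance {d:4.2f} -> variance {v:8.4f} (limit {limit:.4f})")
```

Changing `lam2` or the number of reference points rescales `limit` without touching the kernel, which is the tuning freedom discussed above.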

IV-B Computing optimal controllers from KMP

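We now propose to use $\boldsymbol{\Sigma}^{\mathcal{O}}_{*}$ to obtain variable control gains that result in a compliant robot both when the variability and uncertainty are high. (In the context of movement synthesis, new inputs occur at every new time step, thus we will replace the subscript $*$ by $t$ from now on in the notation.) We follow the concept introduced in [10] and formulate the problem as an LQR. Let us consider linear systems $\dot{\boldsymbol{\xi}}_{t} = \boldsymbol{A} \boldsymbol{\xi}_{t} + \boldsymbol{B} \boldsymbol{u}_{t}$, where $\boldsymbol{\xi}_{t}, \dot{\boldsymbol{\xi}}_{t} \in \mathbb{R}^{N_S}$ denote the system state at time $t$ and its first-order derivative ($N_S$ is the dimension of the state) and $\boldsymbol{u}_{t} \in \mathbb{R}^{N_C}$ is a control command, where $N_C$ denotes the number of controlled degrees of freedom. Moreover, $\boldsymbol{A}$ and $\boldsymbol{B}$ represent the state and input matrices. We will stick to task space control and hence make a simplifying assumption, in line with [3, 11], that the end-effector can be modeled as a unitary mass, yielding a double integrator system

$$\boldsymbol{A} = \begin{bmatrix} \boldsymbol{0} & \boldsymbol{I} \\ \boldsymbol{0} & \boldsymbol{0} \end{bmatrix}, \quad \boldsymbol{B} = \begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{I} \end{bmatrix}, \tag{8}$$

where $\boldsymbol{0}$ and $\boldsymbol{I}$ are zero and identity matrices of appropriate dimension. We define the end-effector state at $t$ as its Cartesian position and velocity $\boldsymbol{x}_{t}, \dot{\boldsymbol{x}}_{t}$, i.e. $\boldsymbol{\xi}_{t} = [\boldsymbol{x}_{t}^{\top} \ \dot{\boldsymbol{x}}_{t}^{\top}]^{\top}$, and therefore $\boldsymbol{u}_{t}$ corresponds to acceleration commands.

At every time step $t$ of a task, a KMP is queried with an input test point $\boldsymbol{\xi}^{\mathcal{I}}_{t}$, predicting a mean $\boldsymbol{\mu}^{\mathcal{O}}_{t}$ and a covariance matrix $\boldsymbol{\Sigma}^{\mathcal{O}}_{t}$. We define $\hat{\boldsymbol{\xi}}_{t} = \boldsymbol{\mu}^{\mathcal{O}}_{t}$, i.e. the desired state for the end-effector is given by the mean prediction of KMP. For time-driven tasks, where $\boldsymbol{\xi}^{\mathcal{I}}_{t} = t$, a sequence of reference states $\{\hat{\boldsymbol{\xi}}_{t}\}_{t=1}^{T}$ can be easily computed and an optimal control command $\boldsymbol{u}_{t}$ can be found, minimizing

$$c = \sum_{t=1}^{T} (\hat{\boldsymbol{\xi}}_{t} - \boldsymbol{\xi}_{t})^{\top} \boldsymbol{Q}_{t} (\hat{\boldsymbol{\xi}}_{t} - \boldsymbol{\xi}_{t}) + \boldsymbol{u}_{t}^{\top} \boldsymbol{R}_{t} \boldsymbol{u}_{t}, \tag{9}$$

where $\boldsymbol{Q}_{t}$ is a positive semi-definite matrix that determines how much the optimization penalizes deviations from $\hat{\boldsymbol{\xi}}_{t}$ and $\boldsymbol{R}_{t}$ is an $N_C \times N_C$ positive-definite matrix that penalizes the magnitude of the control commands. Equation (9) is the cost function of the finite horizon LQR and its solution is obtained through backward integration of the Riccati equations (see [3]). In non-time-driven tasks, e.g. when $\boldsymbol{\xi}^{\mathcal{I}}_{t}$ is the state of a human that collaborates with the robot, it is not straightforward to predict a sequence of desired states. In these cases, we resort to the infinite horizon formulation

$$c = \sum_{n=t}^{\infty} (\hat{\boldsymbol{\xi}}_{t} - \boldsymbol{\xi}_{n})^{\top} \boldsymbol{Q}_{t} (\hat{\boldsymbol{\xi}}_{t} - \boldsymbol{\xi}_{n}) + \boldsymbol{u}_{n}^{\top} \boldsymbol{R}_{t} \boldsymbol{u}_{n}, \tag{10}$$

which is solved iteratively using the algebraic Riccati equation. In both cases (9), (10), the solution is given by a linear state feedback control law

$$\boldsymbol{u}_{t} = \boldsymbol{K}^{\mathcal{P}}_{t} (\hat{\boldsymbol{x}}_{t} - \boldsymbol{x}_{t}) + \boldsymbol{K}^{\mathcal{V}}_{t} (\hat{\dot{\boldsymbol{x}}}_{t} - \dot{\boldsymbol{x}}_{t}), \tag{11}$$

where $\hat{\boldsymbol{x}}_{t}$, $\hat{\dot{\boldsymbol{x}}}_{t}$ are the desired position and velocity contained in $\hat{\boldsymbol{\xi}}_{t}$ and $\boldsymbol{K}^{\mathcal{P}}_{t}$, $\boldsymbol{K}^{\mathcal{V}}_{t}$ are stiffness and damping gain matrices that drive the system to the desired state.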

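Algorithm 1 Uncertainty-aware imitation learning
1: Initialization
2: Identify number of sub-tasks $P$
3: Collect demonstrations for each sub-task
4: Generate trajectory distributions $\{\boldsymbol{\xi}^{\mathcal{I}}_{n}, \hat{\boldsymbol{\mu}}_{n}, \hat{\boldsymbol{\Sigma}}_{n}\}_{n=1}^{N}$ for each sub-task
5: Select hyperparameters $\lambda_1$, $\lambda_2$, $l$ and $\sigma_f$ for each KMP
1: Movement synthesis
2: Input: Test point $\boldsymbol{\xi}^{\mathcal{I}}_{t}$
3: for $p = 1, \ldots, P$ do
4:     Compute $\boldsymbol{\mu}^{\mathcal{O}}_{p}$, $\boldsymbol{\Sigma}^{\mathcal{O}}_{p}$ per (1), (2)
5:     Set $\hat{\boldsymbol{\xi}}_{p} = \boldsymbol{\mu}^{\mathcal{O}}_{p}$ and $\boldsymbol{Q}_{p} = (\boldsymbol{\Sigma}^{\mathcal{O}}_{p})^{-1}$
6:     Find optimal gains $\boldsymbol{K}^{\mathcal{P}}_{p}$, $\boldsymbol{K}^{\mathcal{V}}_{p}$ and compute $\boldsymbol{u}_{p}$ per (13)
7:     Set $\boldsymbol{\Gamma}_{p} = (\boldsymbol{\Sigma}^{\mathcal{O}}_{p})^{-1}$
8: end for
9: Compute $\hat{\boldsymbol{u}}$ from (12)
10: Output: Control command $\hat{\boldsymbol{u}}$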

Finally, we set $\boldsymbol{Q}_{t} = (\boldsymbol{\Sigma}^{\mathcal{O}}_{t})^{-1}$. Unlike in previous works where a similar choice is made [3, 8, 10, 11], in our approach the unique properties of KMP endow the robot with the ability to modulate its behavior in the face of two different conditions as a consequence of this setting. First, when the KMP is queried within the training region, full covariance matrices encoding variability and correlations in the demonstrations are estimated, resulting in control gains that reflect the structure in the data. The robot is hence more precise where variability is low (higher gains) and responds to perturbations according to the observed correlations. Second, as the test input deviates from the training data, the robot becomes increasingly more compliant, as a consequence of (6). Intuitively, this makes sense as the robot should be safe when the uncertainty about its actions increases, which is achieved in our formulation by an automatic decrease of the control gains. Our approach is hence the first to permit the robot to be optimal in the region where demonstrations were provided, and safe where data is absent.
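The link between predicted covariance and control gains can be illustrated with a small infinite-horizon LQR sketch. It is not the implementation used in the experiments: it assumes a discrete-time double integrator with step `dt`, a covariance over position only (velocity errors are left unpenalized), a state cost equal to the inverse of that covariance, and a plain fixed-point iteration of the discrete Riccati equation. The function name `lqr_gains` and all parameter values are hypothetical.

```python
import numpy as np

def lqr_gains(Sigma_pos, r=1e-2, dt=1e-2, iters=5000):
    """Stiffness and damping gains for a unit-mass double integrator.

    Sigma_pos: (D, D) position covariance; the state cost is its inverse.
    """
    D = Sigma_pos.shape[0]
    I, Z = np.eye(D), np.zeros((D, D))
    A = np.block([[I, dt * I], [Z, I]])          # discretized double integrator
    B = np.vstack([0.5 * dt**2 * I, dt * I])
    Q = np.block([[np.linalg.inv(Sigma_pos), Z], [Z, Z]])
    R = r * np.eye(D)
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ G)       # Riccati fixed-point update
        if np.allclose(P_next, P, rtol=1e-10, atol=1e-12):
            P = P_next
            break
        P = P_next
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return G[:, :D], G[:, D:]                    # stiffness, damping

# Low covariance (consistent demonstrations) versus high covariance (uncertain region).
KP_precise, KV_precise = lqr_gains(0.01 * np.eye(2))
KP_uncertain, KV_uncertain = lqr_gains(100.0 * np.eye(2))
print("stiffness, low covariance :", KP_precise[0, 0])
print("stiffness, high covariance:", KP_uncertain[0, 0])
```

Passing a covariance with off-diagonal terms yields coupled gain matrices, which is how the observed correlations would shape the response to perturbations in this sketch.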

Fig. 2: Comparison between KMP, GMR and GPR. Datapoints are plotted in black, solid lines represent the mean and shaded areas correspond to two standard deviations (computed from respective variance estimations).

IV-C Fusing optimal controllers

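It is often convenient to split the demonstration of complex tasks into smaller, less complex sub-tasks (e.g. grasping a tool and avoiding obstacles). Here we adapt the previously introduced notion of fusion of controllers [7] to account for optimal controllers, such as those described in Section IV-B. Let us consider $P$ candidate controllers generating commands $\boldsymbol{u}_{p,t}$, $p = 1, \ldots, P$, that may act on the robot at every time step (we omit the subscript $t$ in the remainder of this section). In a fusion of controllers, an optimal command $\hat{\boldsymbol{u}}$ is computed as

$$\hat{\boldsymbol{u}} = \arg\min_{\boldsymbol{u}} \sum_{p=1}^{P} (\boldsymbol{u} - \boldsymbol{u}_{p})^{\top} \boldsymbol{\Gamma}_{p} (\boldsymbol{u} - \boldsymbol{u}_{p}), \tag{12}$$

where $\boldsymbol{\Gamma}_{p}$ are weight matrices that regulate the contribution of each individual controller. Examples of $\boldsymbol{\Gamma}_{p}$ found in the literature include scalar terms that maximize external rewards [21] and precision matrices, either computed from covariance [7, 19] or uncertainty [8]. Equation (12) has an analytical solution given by $\hat{\boldsymbol{u}} = \hat{\boldsymbol{\Sigma}}_{u} \sum_{p=1}^{P} \boldsymbol{\Gamma}_{p} \boldsymbol{u}_{p}$, where $\hat{\boldsymbol{\Sigma}}_{u} = \left(\sum_{p=1}^{P} \boldsymbol{\Gamma}_{p}\right)^{-1}$. When $\boldsymbol{u}_{p}$ and $\boldsymbol{\Gamma}_{p}$ are viewed as the mean and precision matrix of a Gaussian distribution, this solution corresponds to a Gaussian product.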
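The analytical solution of the weighted least-squares fusion is short enough to write out. The sketch below assumes the weights are symmetric positive-definite precision matrices; the function name `fuse_commands` and the example values are hypothetical.

```python
import numpy as np

def fuse_commands(commands, weights):
    """Minimise sum_p (u - u_p)^T Gamma_p (u - u_p) in closed form."""
    gamma_sum = sum(weights)
    weighted = sum(G @ u for G, u in zip(weights, commands))
    # Equivalent to inv(sum of weights) @ (weighted sum), i.e. a Gaussian product mean.
    return np.linalg.solve(gamma_sum, weighted)

u1 = np.array([1.0, 0.0])
u2 = np.array([0.0, 2.0])

# A confident controller (small covariance) against an uncertain one (large covariance).
confident = np.linalg.inv(0.01 * np.eye(2))
uncertain = np.linalg.inv(1e4 * np.eye(2))
u_hat = fuse_commands([u1, u2], [confident, uncertain])

# With equal weights the fusion reduces to a plain average.
u_avg = fuse_commands([u1, u2], [np.eye(2), np.eye(2)])
```

In the first call the fused command is dominated by the confident controller, which is the mechanism that lets a weight derived from uncertainty switch a sub-task on or off.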

(a) Examples of handover starting positions.
(b) Handover end position.
Fig. 3: Handover demonstrations. The robot, starting from different initial positions, is moved kinesthetically towards a handover location, where the human hands it a paint roller.

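We here propose to use

$$\boldsymbol{u}_{p} = \boldsymbol{K}^{\mathcal{P}}_{p} (\hat{\boldsymbol{x}}_{p} - \boldsymbol{x}) + \boldsymbol{K}^{\mathcal{V}}_{p} (\hat{\dot{\boldsymbol{x}}}_{p} - \dot{\boldsymbol{x}}), \tag{13}$$

where $\boldsymbol{K}^{\mathcal{P}}_{p}$, $\boldsymbol{K}^{\mathcal{V}}_{p}$ are optimal gains estimated using LQR, given the KMP of controller $p$, and

$$\boldsymbol{\Gamma}_{p} = (\boldsymbol{\Sigma}^{\mathcal{O}}_{p})^{-1}. \tag{14}$$

As a consequence of (14), controllers with high uncertainty will have negligible influence on the resulting command computed from (12). This permits splitting the demonstration of a task into sub-tasks, whose activation during reproduction depends on their individual uncertainties. Algorithm 1 summarizes the complete approach.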

V Experimental results

In this section we validate our approach from Section IV using a toy example with synthetic data (Section V-A) and a robot-assisted painting task (Sections V-B and V-C). While we have exploited the latter scenario in previous work [7], here we expand it by considering optimal controllers. The complete task is divided into two sub-tasks: a handover of a paint roller, driven by the position of the human hand, and the application of painting strokes by the robot on a wooden board. A supplementary video showing the obtained results is available at http://joaosilverio.weebly.com/uncert.html

V-A 1-D regression example with synthetic data

We first consider the regression of a scalar function. Using an artificially generated dataset, we trained a KMP after choosing the number of Gaussian components used in the initialization GMM and the hyperparameters $\lambda_1$, $\lambda_2$, $l$ and $\sigma_f$. We sampled a trajectory distribution of $N$ datapoints to initialize the KMP. Figure 2 shows the original dataset and the approximated function using KMP, GMR (computed from the GMM that initialized the KMP) and GPR (with its own kernel hyperparameters). While the three techniques accurately predict the mean trend in the function, the variance prediction given by KMP unifies the predictions from GMR and GPR, approximating the variability of the former and the uncertainty of the latter in the appropriate regions of the input space.

V-B Robot Handover

Fig. 4: Training data, KMP initialization model and test data. Top left: Demonstrated human hand (blue) and robot end-effector (gray) positions. Red ellipsoids show the 3-component GMM used to initialize KMP. Top right and bottom: Test human hand (orange), KMP generated (green) and robot measured trajectories (black). Markers indicate the start and end of each trajectory.

We now show that the proposed approach makes the robot track its reference trajectory using optimal gains near the demonstrations while rendering it compliant when the human is far from the training data. The handover of the paint roller is achieved by demonstrating to the robot the location of its end-effector as a function of the human hand position. Note that object handovers are an extensively studied problem in human-robot collaboration and here we simplify the problem to better focus on showcasing our approach. We wish to learn the mapping from the human hand position to the robot end-effector position, hence the former is the KMP input $\boldsymbol{\xi}^{\mathcal{I}}$ and the latter the KMP output $\boldsymbol{\xi}^{\mathcal{O}}$. We use a KMP initialized from a 3-component GMM (Fig. 4), with hyperparameters $\lambda_1$, $\lambda_2$, $l$ and $\sigma_f$. Moreover, the KMP is initialized with a trajectory distribution of $N$ points, obtained using GMR at inputs sampled from the GMM. The control cost matrix $\boldsymbol{R}$ of the LQR problem is kept constant and we follow the infinite horizon formulation minimizing (10), since the input is the human hand position (i.e. not a time-driven motion).

Figure 4 shows the training dataset obtained from 7 demonstrations and the resulting GMM used to initialize the KMP (top-left). Moreover it shows the robot end-effector motion computed for a new human hand trajectory used as a test set. As demonstrated, the robot starts at a given position in its task space and moves smoothly towards the handover position, with the learned optimal gains. Figure 5 shows the stiffness and damping gains during one execution, plotted as a function of time. The control gains gradually increase as the human hand approaches the robot, ensuring an accurate tracking of the handover position. This goes in opposite direction to the data covariance that starts large and gradually decreases (Fig. 4 top-left).

Fig. 5: Stiffness (solid lines) and damping (dashed) gains during one handover. The gains increase towards the end of the task since end-effector variability decreases as the robot approaches the handover location.
Fig. 6: Stiffness gains (blue) and variance (light brown), plotted for the first task space dimension, as a function of the distance to demonstrations and for different values of the kernel length scale $l$. Control gains decrease as the distance increases, making the robot gradually more compliant, hence safer, when it does not know what to do.

Figure 6 shows the estimated gains (left axis) as one moves away from the region where demonstrations were provided. We manually selected one point in the test set and queried the model at several points at increasing distance from it along a single direction. In order to facilitate the visualization, we plot one single output dimension and omit the damping gains. Notice the increase in the predicted variance (right axis) as one moves away from the demonstrations, which leads to decreasing control gains. This experimentally supports our proposition in Section IV-A. Moreover, notice the influence of the kernel length scale $l$ on how quickly the control gains decay. Increasing $l$ has the effect of decreasing the kernel-space distance between points, hence higher values result in a slower increase of uncertainty as one moves away from the data. The squared-exponential nature of the kernel therefore permits regulating the rate at which the robot becomes compliant through the tuning of $l$. The enclosed video further elucidates the compliance aspect of our approach.

V-C Fusion of task space controllers

In addition to the handover of the paint roller, we also teach the robot how to paint. The goal of this experiment is to show that accessing uncertainty, in addition to covariance, permits the fusion of control commands in a way that different sub-tasks are activated, depending on the state of the human (here defined by its right hand position). In this case, the complete painting task was demonstrated partially into two sub-tasks, whose activation will be inferred from the corresponding models.

Fig. 7: Human operators teach the robot how to apply painting strokes.
Fig. 8: Painting demonstration dataset and reproduction. Training human data (blue), robot data (gray), test human data (orange) and robot desired and observed trajectories (green and black, respectively).
Fig. 9: Rows 1-3: Forces generated by each KMP (blue and green) and force used by the robot (black) at three different time intervals of the complete painting task. Bottom row: First entry of the covariance matrix (2) predicted by each KMP.
(a) Paint roller handover.
(b) Compliant robot.
(c) Painting strokes.
Fig. 10: Fusion of optimal controllers: snapshots of the complete painting task. The end-effector stiffness is depicted as an ellipsoid at different moments of the task (larger ellipsoids correspond to higher stiffnesses). The position of the human hand (pink) is used to query two KMPs, whose predictions generated both a reference end-effector position and a covariance matrix, from which optimal stiffnesses were computed.

We provided 5 demonstrations of painting strokes to the robot as shown in Fig. 7. During these demonstrations, the robot learns to map the wooden board motion (as defined by the human hand) to the movements it should perform with the end-effector. We used the same KMP and LQR parameters as in Section V-B. Figure 8 shows the data used to train the model, together with one reproduction for a human hand trajectory in the neighborhood of the demonstrations. Note that in this case, the differences in covariance in different parts of the end-effector trajectory are not as accentuated as in the handover task. Nonetheless, optimal gains are computed at every moment according to the observed variability and correlations.

Using two KMPs, each one responsible for a sub-task, we reproduced the complete task where the control commands generated by the two candidate controllers were fused as described in Section IV-C. The complete task took about 2 minutes, but here we report only part of it (the reader is referred to the supplementary video for the full experiment). The first three rows in Fig. 9 show the forces generated by each candidate controller and the force used by the robot. Due to the unit mass assumption in Section IV-B, the acceleration commands are equivalent to desired task space forces $\boldsymbol{F}$, which are converted into joint space torques through $\boldsymbol{\tau} = \boldsymbol{J}^{\top} \boldsymbol{F}$ [22] ($\boldsymbol{\tau}$ is a vector of torques and $\boldsymbol{J}$ is the end-effector Jacobian). The bottom row shows the first element of the covariance matrix estimated from (2) (the remaining diagonal elements exhibit similar temporal profiles, hence were omitted). The first column corresponds to the beginning of the task, where the paint roller is handed over. The force used by the robot closely matches the one generated by the handover KMP, as its predicted variance is significantly lower throughout this interval (bottom plot). Notice the increased variance early in this interval, which reflects the high variability in the handover demonstrations at the beginning of the task. This value is consistently below the one generated by the painting KMP. If this were not the case, the uncertainty level in (6) could be increased through the hyperparameters. In the second column of Fig. 9 we can see that both KMPs generate high variance. This corresponds to a region in between the two sub-tasks, hence neither of the two should be activated. The fact that the blue line does not reach the limit in (6) is due to the human moving slightly closer to the region where the handover was demonstrated. Note however that the observed values are consistently higher than those when tasks are activated. Moreover, notice the low forces during this interval: high covariances yielded minimal control gains, resulting in low forces and a compliant robot. Finally, in the third column we see the task space forces generated during one painting stroke. Notice how, this time, the result from the controller fusion matches the force given by the painting KMP, since the variance for this sub-task is consistently very low (see bottom plot).
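The conversion from task space forces to joint torques is a single transpose-Jacobian product. The sketch below uses a planar two-link arm as a stand-in, since the Jacobian of the KUKA LWR is not given in the text; the link lengths, joint angles and function names are hypothetical, and gravity or other dynamics compensation is left out.

```python
import numpy as np

def planar_two_link_jacobian(q, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm (illustrative stand-in)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])

def task_force_to_joint_torques(J, force):
    """Map a task space force to joint torques with the Jacobian transpose."""
    return J.T @ force

q = np.array([0.0, np.pi / 2])        # elbow bent at a right angle
J = planar_two_link_jacobian(q)
force = np.array([0.0, 1.0])          # unit force along the y axis
tau = task_force_to_joint_torques(J, force)
```

For a 7-degree-of-freedom arm the same product applies with a Jacobian that has seven columns, which gives one torque per joint.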

Finally, Fig. 10 shows snapshots of different moments of the reproduction. We draw ellipsoids representing full stiffness matrices at the end-effector in the different task moments. These matrices were estimated from the covariance predictions (2) using LQR (Section IV-B). Figure 10(a) shows the two distinct moments of the handover: the beginning, where the end-effector stiffness is low and the user can move the robot around easily, and the end, where the stiffness is high, allowing for the insertion of the paint roller. For an easier visualization we only plotted the stiffness generated by the handover KMP (hence the blue color) since the one from the painting KMP was negligible in this part of the task (as we saw in Fig. 9). Figure 10(b) shows a part of the task where neither of the two sub-tasks is active. This results in an extremely low stiffness matrix and a fully compliant robot that is safe for the human to interact with and move around in the workspace. In Fig. 10(c) the robot performs a painting stroke on the board, driven by the human hand position, with high stiffness since the demonstrations were consistent in this region. The drawn stiffness ellipsoid resulted from the painting KMP, since the one from the handover KMP was negligible in the vicinity of this sub-task.

VI Discussion

In the previous section we showed how KMP can be used to estimate full covariance matrices and uncertainty in order to learn optimal and safe controllers, as well as tasks which are comprised of more than one sub-task. One relevant point of discussion is the fact that, unlike [17], our approach does not explicitly separate covariance and uncertainty predictions: they are both the result of (2). However, we know a priori the form of the uncertainty predictions, as it is defined by the KMP hyperparameters. If desired, one could potentially assign confidence to a prediction as to whether it corresponds to a covariance matrix or an uncertainty matrix. One way to achieve this could be by resorting to heuristics (e.g. the determinant of the prediction, the Frobenius norm of the distance between matrices) to disambiguate between the two possibilities. Alternatively, one could also exploit the freedom given by the hyperparameters in (6) to accentuate the difference between the two types of prediction. For instance, by setting very low values for $\lambda_2$ (unconstraining covariances), one can increase the uncertainty by several orders of magnitude. The same effect would be achieved by sampling more points into the KMP reference trajectory, thus increasing $N$. This is helped by the fact that, in practice, there are physical limits to how big variability in the data can be (e.g. joint limits, robot workspace size), hence the uncertainty matrix can often be designed so that it is significantly greater than these.
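One of the heuristics mentioned above, the Frobenius norm of the distance between matrices, could look as follows. This is a sketch rather than a method evaluated in the paper: it assumes the uncertainty matrix has the form $(N/\lambda_2)\sigma_f^2 \boldsymbol{I}$, and the function names and the relative tolerance `rel_tol` are hypothetical.

```python
import numpy as np

def uncertainty_limit(N, lam2, sigma_f, dim):
    """Diagonal matrix the covariance prediction is assumed to approach far from the data."""
    return (N / lam2) * sigma_f**2 * np.eye(dim)

def looks_like_uncertainty(Sigma, N, lam2, sigma_f, rel_tol=0.1):
    """True when Sigma is within rel_tol (Frobenius norm) of the uncertainty limit."""
    limit = uncertainty_limit(N, lam2, sigma_f, Sigma.shape[0])
    distance = np.linalg.norm(Sigma - limit, "fro")
    return bool(distance <= rel_tol * np.linalg.norm(limit, "fro"))

N, lam2, sigma_f = 100, 1.0, 1.0
far_from_data = 99.5 * np.eye(2)                         # close to the limit
near_data = np.array([[0.02, 0.01], [0.01, 0.03]])       # typical data covariance
print(looks_like_uncertainty(far_from_data, N, lam2, sigma_f))
print(looks_like_uncertainty(near_data, N, lam2, sigma_f))
```

The larger the gap between the uncertainty limit and the largest plausible data covariance, the less sensitive such a test is to the choice of tolerance.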

Finally, in the considered setup the two tasks were performed in different parts of the workspace. This was a design choice, as the 3D human hand position was being used to drive the KMP of each sub-task. However, in practice, our approach can extend to more complicated scenarios. For example, if one were to augment the input vector to include other features (e.g. human upper-body configuration, eye gaze), several tasks could potentially overlap in the robot workspace, since there would be more features accounted for by the inputs. This would also lead to a decreased possibility of simultaneously activating undesired sub-tasks.

VII Conclusions and Future Work

We proposed an imitation learning approach that takes into account the robot's uncertainty about its actions, in addition to the variability and correlations in the data, to design optimal controllers from demonstrations. The approach was shown to allow for increased safety, as the robot is compliant when uncertain, and efficiency, as optimal controllers are used in the regions of the workspace where demonstration data was present. In addition to safety and optimality, we also validated the approach for splitting the demonstrations of a task into different sub-tasks that are activated based on the uncertainty levels.

In future work we will study how different kernels can be exploited in our framework. For instance, periodic kernels may allow for learning tasks that require rhythmic motions, while composite kernels might permit the robot to employ different kernels in different parts of a task. Moreover, we plan to exploit the compliance of the robot when uncertain to provide new demonstrations (of the same or new sub-tasks) when required during execution, in an online learning setting. Finally, it should be noted that the ability of KMP to predict variability, correlations and uncertainty is independent of the optimal control framework that we introduced here, which aimed to address a set of specific imitation learning problems. We are convinced that KMP alone can be exploited in other robotics problems, hence in future work we will study other possible applications.

References

  • [1] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, May 2009.
  • [2] A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal, “Dynamical movement primitives: Learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.
  • [3] S. Calinon, D. Bruno, and D. G. Caldwell, “A task-parameterized probabilistic model with minimal intervention control,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Hong Kong, China, May-June 2014, pp. 3339–3344.
  • [4] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2616–2624.
  • [5] Y. Huang, L. Rozo, J. Silvério, and D. G. Caldwell, “Non-parametric imitation learning of robot motor skills,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Montreal, Canada, May 2019, (to appear).
  • [6] Y. Huang, L. Rozo, J. Silvério, and D. G. Caldwell, “Kernelized movement primitives,” arXiv:1708.08638v2 [cs.RO], July 2017.
  • [7] J. Silvério, Y. Huang, L. Rozo, S. Calinon, and D. G. Caldwell, “Probabilistic learning of torque controllers from kinematic and force constraints,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Madrid, Spain, October 2018, pp. 6552–6559.
  • [8] J. Silvério, Y. Huang, L. Rozo, and D. G. Caldwell, “An uncertainty-aware minimal intervention control strategy learned from demonstrations,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Madrid, Spain, October 2018, pp. 6065–6071.
  • [9] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Cambridge, MA, USA: MIT Press, 2006.
  • [10] J. R. Medina, D. Lee, and S. Hirche, “Risk-sensitive optimal feedback control for haptic assistance,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), May 2012, pp. 1025–1031.
  • [11] L. Rozo, D. Bruno, S. Calinon, and D. G. Caldwell, “Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept.-Oct. 2015, pp. 1024–1030.
  • [12] D. Fox, W. Burgard, F. Dellaert, and S. Thrun, “Monte Carlo localization: Efficient position estimation for mobile robots,” in Proc. AAAI Conference on Artificial Intelligence, 1999, pp. 343–349.
  • [13] J. R. Medina and S. Hirche, “Uncertainty-dependent optimal control for robot control considering high-order cost statistics,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, September–October 2015, pp. 3995–4002.
  • [14] R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, “Bayesian optimization for learning gaits under uncertainty,” Annals of Mathematics and Artificial Intelligence, vol. 76, no. 1, pp. 5–23, February 2016.
  • [15] P. W. Goldberg, C. K. Williams, and C. M. Bishop, “Regression with input-dependent noise: A Gaussian process treatment,” in Advances in Neural Information Processing Systems (NIPS), 1998, pp. 493–499.
  • [16] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard, “Most likely heteroscedastic Gaussian process regression,” in Proceedings of the 24th International Conference on Machine Learning (ICML), Oregon, USA, 2007, pp. 393–400.
  • [17] S. Choi, K. Lee, S. Lim, and S. Oh, “Uncertainty-aware learning from demonstration using mixture density networks with sampling-free variance modeling,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Brisbane, Australia, May 2018, pp. 6915–6922.
  • [18] A. X. Lee, H. Lu, A. Gupta, S. Levine, and P. Abbeel, “Learning force-based manipulation of deformable objects from multiple demonstrations,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Seattle, Washington, USA, May 2015, pp. 177–184.
  • [19] S. Calinon and A. G. Billard, “Statistical learning by imitation of competing constraints in joint space and task space,” Advanced Robotics, vol. 23, no. 15, pp. 2059–2076, 2009.
  • [20] S. Calinon, “A tutorial on task-parameterized movement learning and retrieval,” Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, January 2016.
  • [21] N. Dehio, R. F. Reinhart, and J. J. Steil, “Multiple task optimization with a mixture of controllers for motion generation,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, pp. 6416–6421.
  • [22] O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.