The next generation of robots, that can operate in and adapt to unstructured and dynamic environments, must possess a diverse set of skills. However, it is implausible to pre-program robots with a library of all required skills. Learning from Demonstration (LfD) [1, 2] is a paradigm that aims to equip robots with the ability to learn efficiently from demonstrations provided by humans. Existing work in trajectory-based LfD has contributed a wide range of mathematical representations that encode skills from human demonstrations and then reproduce the learned skills at runtime. Proposed representations include Spring-damper systems with forcing functions 
, Gaussian Mixture Models (GMMs)[4, 5, 6]
, Neural Networks (NNs)[7, 8], Gaussian Processes (GPs) [9, 10, 11], and geometric objects , among others. Each of these representations is used to encode the demonstrations in a predefined space or coordinate system (e.g., Cartesian coordinates). In other words, a single best coordinate system for any given skill is assumed to both exist and be known. However, as we show in this work, the assumption that a single best coordinate system exists for each task does not hold. Further, encoding in only a single coordinate system prohibits the model from capturing some of the geometric features that underlie a demonstrated skill.
In this work, we contribute a learning framework that encodes demonstrations simultaneously in multiple coordinates, and balances the relative influences of the learned models in generating reproductions. The proposed framework, named Multi-Coordinate Cost Balancing (MCCB), encodes demonstrations in three differential coordinates: Cartesian, tangent, and Laplacian (Section III-A). Simultaneously learning in these three coordinates allows our method to capture all of the underlying geometric properties that are central to a given skill. MCCB encodes the joint density of the time index and the demonstrations in each differential coordinate frame using a separate statistical model. Thus, given any time instant, we are able to readily obtain the conditional mean and covariance in each coordinate system (Section III-B
). MCCB generates reproductions by solving an optimization problem with a blended cost function that consists of one term per coordinate. Each term penalizes deviations from the norm, weighted by the inverse of the expected variance in the corresponding coordinate system (SectionIII-C). Further, we subject the optimization problem to linear constraints on the reproductions, such as initial, target, and via point constraints. Our constrained optimization problem is convex with respect to the reproduction and hence can be solved efficiently.
A major hurdle in learning a wide variety of skills, without significant parameter tweaking, is that the relative importance of each differential coordinate (or the geometric feature) in encoding a given task is unknown ahead of time. For instance, consider the problem of encoding the demonstrations illustrated in Fig. 1. Using any one coordinate system in isolation, even when the most suitable one is known, does not yield good reproductions (the red, brown, and green dashed lines). To alleviate this problem, MCCB preferentially weights the costs defined in each coordinate (Fig. 2). Importantly, MCCB learns the optimal weights directly from the demonstrations without making task-dependent assumptions. To this end, MCCB solves a meta optimization problem that aims to minimize reproduction errors (Section III-D). As shown by the solid blue lines in Fig. 1, a cost function that optimally balances the costs in each coordinate yields better reproductions than any single-coordinate method.
In summary, we contribute a unified task-independent learning framework that (1) encodes demonstrations simultaneously in multiple differential coordinates, (2) defines a blended cost function that incentivizes conformance to the norm in each coordinate system while considering expected variance, and (3) learns optimal weights directly from the demonstrations to balance the relative influence of each differential coordinate in generating reproductions. Further, MCCB is compatible with and complementary to several existing LfD methods that utilize different statistical representations and coordinate systems [13, 14, 12, 11, 10, 15, 16].
Ii Related Work
Learning from demonstration has attracted a lot of attention from researchers in the past few decades. While several categories of LfD methods exist , our work falls under the category of trajectory-based LfD. In this category, demonstrations take the form of trajectories and the methods aim to synthesize trajectories that accurately reproduce the demonstrations.
Dynamical systems-based trajectory learning methods, such as [5, 6, 7], encode demonstrations using statistical dynamical systems and generate reproductions by forward propagating the dynamics. While such deterministic methods exhibit several advantages, such as convergence guarantees and robustness to perturbations, they are restricted to learning in a single coordinate system and ignore inherent uncertainties in the demonstrations. They incentivize conformance to the norm even when demonstrations exhibit high variance.
Trajectory optimization methods, such as  and , focus on geometric features by minimizing costs specified using predefined norms. An optimization framework proposed in  attempts to adapt multiple demonstrations to new initial and target locations by minimizing the distance between the demonstrations and the reproduction according to a learned Hilbert space norm. Indeed, learning an appropriate Hilbert space norm is related to finding an appropriate coordinate system based on the demonstrations. However, similar to the dynamical systems-based methods, the methods in [17, 18, 19] are restricted to a single predefined or learned coordinate system and do not explicitly model and utilize the inherent time-dependent variations in the demonstrations.
Probabilistic trajectory-learning methods, such as [10, 11] and , on the other hand, capture and utilize the variation observed in the demonstrations. However, these methods are also restricted to encoding demonstrations in a single predefined coordinate system that is assumed to be known.
Our design of the costs in each differential coordinate is inspired by the minimal intervention principle  that takes variance into account. While the approach in  does encode demonstrations in different frames of references, all the frames are restricted to Cartesian coordinates or orientation space. Furthermore, all the relevant frames for a given task are also expected to be provided by the user.
The motion planning framework in 
, complementary to our approach, utilizes a blended cost function, the construction of which is guided by probability distributions learned from the demonstrations. This framework incentivizes factors such as smoothness, manipulability, and obstacle avoidance, but is restricted to the Cartesian coordinate system. MCCB, on the other hand, encodes demonstrations in multiple differential coordinates and learns to optimally balance their relative influences, but does not consider factors such as manipulability and obstacle avoidance.
Differential coordinates have been extensively used in the computer graphics community [20, 21]. Prior work in trajectory learning that incorporates differential coordinates includes the Laplacian trajectory editing (LTE) algorithm . Using Laplacian coordinates, the LTE algorithm adapts a single demonstration to new initial, target, and via points while preserving the shape. However, the LTE algorithm does not reason about the relative importances of multiple coordinates.
The section describes the technical details of MCCB and its work flow as illustrated in Fig. 2.
Iii-a Differential Coordinate Transformations
In this section, we define the differential coordinates and their corresponding transformations used in MCCB.
Cartesian: Let a discrete finite-length trajectory in -dimensional Cartesian coordinates be denoted by and let denote a discrete sample at time index . This trajectory can be represented using a graph where is the set of vertices representing the samples in the trajectory and is the set of edges that represent the connections between the samples in the trajectory. The neighborhood of each vertex is defined by the set of adjacent vertices . In the case of discrete-time trajectories, the edges between any given vertex and its two neighbors are assumed to carry unit weights, while all other edges carry zero weights.
Laplacian: It is known that the discrete Laplace-Beltrami operator for the trajectory provides the Laplacian coordinate as . Note that the above relationship can be written as a linear differential operator in matrix form
where is the trajectory in the Laplacian coordinates, and , called the graph Laplacian, is given by
As pointed out in 
, the Laplacian coordinates have meaningful geometric interpretations. Specifically, the Laplacian coordinates can be seen as the discrete approximations of the derivative of the unit tangent vectors of an arc-length parametrized continuous trajectory. In other words, the Laplacian coordinates measure the deviation of each sample from the centroid of its neighbors.
Tangent: While the Laplacian coordinates are discrete approximations of second order differential transformations, a discrete approximation of the first differential transformation is possible. Consider such a first order transformation using first order finite differences defined as , where is called the tangent coordinate. The matrix form of the above relationship results in a linear differential operator given by
where is the trajectory in the tangent coordinates and , called the graph incidence matrix, is given by
Similar to the Laplacian coordinates, the tangent coordinates have geometric interpretations. Specifically, the tangent coordinates can be seen as discrete approximations of the un-normalized tangent vectors of an arc-length parametrized continuous trajectory, i.e., the tangent coordinates measure the local direction of motion at each sample of the trajectory.
In our work, we assume that a set of demonstrations in the Cartesian coordinates are available. Let the th demonstration be denoted by . Note that if the raw demonstrations are of varying duration in time, we perform time alignment using dynamic time warping. MCCB transforms each obtained demonstration into a trajectory in the tangent coordinates (denoted by ) and a trajectory in Laplacian coordinates (denoted by ) using (1) and (3), respectively.
Iii-B Encoding in Multiple Differential Coordinates
This section defines the costs associated with each coordinate. With the demonstrations available in all three differential coordinates, we employ three independent Gaussian mixture models (GMMs)111 MCCB does not rely on the use of GMMs and any statistical representation that can provide the conditional estimates will suffice.
MCCB does not rely on the use of GMMs and any statistical representation that can provide the conditional estimates will suffice.to approximate the joint probability densities of time and the samples in each coordinate system.
The GMM associated with the Cartesian coordinates attempts to approximate the joint density of and using a finite number of Gaussian basis functions as follows , where is the number of Gaussian basis functions, is the prior associated with the th basis function, is the set of parameters of the GMM, and is the conditional probability density given by , where is the mean and is the covariance matrix of the th Gaussian basis function.
We learn the parameters
of the model using the Expectation-Maximization algorithm based on the demonstrations. Given the learned model and a time instant, the expected value of the conditional density is given by Gaussian mixture regression (GMR)  as follows
where , , , and the conditional covariance is given by
Similar to the GMM learned in the Cartesian coordinates, we learn a second GMM in the tangent coordinates based on the demonstrations , and a third GMM in the Laplacian coordinates based on the demonstrations . The expected values of the conditional densities and are given by
and the corresponding conditional expectations are given by
Iii-C Imitation via Optimization
In this section, we explain the design of our multi-coordinate cost function. MCCB generates reproductions by solving a constrained optimization problem given by
where is the reproduction, are positive weights; are cost functions in the Cartesian, tangent, and Laplacian coordinates, respectively; and define linear constraints on . In practice, and we use the linear constraints to enforce constraints on initial, target, and via points.
We define the cost function in each coordinate system as follows
where denote the block diagonal matrices formed with the conditional covariances and , respectively, for all values of . Further, the notation denotes vectorization - for instance, denote the vectorized trajectories formed by vertically stacking and for all values of , respectively. Note that we construct the trajectories and in (14) and (15) from via the linear operators defined in (3) and (1), respectively. MCCB penalizes deviations from the conditional mean in each coordinate system. However, deviations are penalized less (more) severely if high (low) variance is observed in the demonstrations at any given time.
Iii-D Automated Cost Balancing
In order to obtain reproductions that successfully imitate demonstrations of a wide variety of skills, the weights and have to be chosen with care. Indeed, they preferentially weight the costs defined in each differential coordinate and thereby manipulate the relative incentive for successful imitation in each coordinate system.
We learn these weights directly from the available demonstrations. Note that, for known weights, the constrained optimization problem in (11) is convex in . We estimate the weights in the following form
where , such that , are positive scaling factors used to correct for inherent differences in the magnitudes of the costs, and , such that , are positive weights used to preferentially weight the cost defined in each coordinate system. MCCB estimates the scaling factors ’s as follows
With the scaling factors compensating the inherent scale difference in the costs, we compute the preferential weighting factors ’s that minimize reproduction error. To this end, we formulate the following meta optimization problem
where denotes the sum of squared errors computed over time, and is the solution to the following optimization problem
where denotes specific linear constraints pertaining to the demonstration , such as initial, target, and via points. Solving the above meta-optimization problem results in the preferential weights ’s that minimize reproduction errors of the solutions generated by the original constrained optimization problem in (11)-(12).
Iv Experimental Evaluation
This section describes the design and discusses the results of four experiments conducted to evaluate MCCB. In each experiment, we compared the performances of the following approaches:
Cartesian-coordinates: , ,
Tangent-coordinates: , ,
Laplacian-coordinates: , ,
Uniform weighting: , ,
MCCB: , ,
We measured the performance of each approach by the following geometric and kinematic metrics: Swept Error Area (SEA) , Sum of Squared Errors (SSE), Dynamic Time Warping Distance (DTWD), and Frechet Distance (FD) . These metrics allow us to evaluate different aspects of each method’s performance. The SEA and SSE metrics penalize both spatial and temporal misalignment, and thus evaluate kinematic performance. On the other hand, the DTWD and FD metrics penalize spatial misalignment while disregarding time misalignment, and thus evaluate geometric performance. Further, the SEA, SSE, and DTWD metrics evaluate aggregate performance by summing over or averaging across all the samples of each reproduction. The FD metric, on the other hand, computes the shortest possible cord length required to connect the demonstration and the reproduction in space while allowing time re-parametrization of either trajectory, and thus measures maximal deviation in space. Note that the SEA metric is restricted to 2-dimensional data, so we only report it for one of our experiments.
In all the experiments, we used the position constraints in (12) to enforce both initial and end point constraints uniformly across all the methods being compared. Further, we uniformly set the number of Gaussian basis functions to five across all the coordinates and all the experiments.
Iv-a Handwriting Skill
This experiment evaluates MCCB on the publicly available LASA human handwriting library , that consists of handwriting motions collected from pen input using a Tablet PC. The library contains a total of 25 handwriting motions, each with 7 demonstrations.
Fig. 3 shows that MCCB yields reproductions that are qualitatively similar to the demonstrations while satisfying the end-point constraints across all motions. As shown in Fig. 4, quantitative analysis indicates that MCCB ()222Weighting factors averaged over all 25 skills in the LASA dataset and three of the four baselines performed comparably with respect to the SEA, FD, SSE, and DTWD metrics, while the Cartesian baseline performed poorly in comparison. This is consistent with the fact that the demonstrations within the LASA dataset emphasize strong similarities in shape.
Iv-B Picking Skill
The second experiment evaluates the performance of MCCB in a picking task (Fig. 5). The data consists of six kinesthetic demonstrations, each a 3-dimensional robot end-effector position trajectory recorded as a human guided the robot in picking up two magnets atop two blocks. We enforced two via-point constraints (one at each picking point) in addition to the end-point constraints.
As shown in Fig. 6(a), MCCB generated reproductions that are qualitatively similar to the demonstrations while satisfying all the position constraints. Quantitative evaluations reveal that learning in tangent coordinates yielded better reproductions than learning in Cartesian and Laplacian coordinates (Fig. 7). This was expected since the demonstrations of this task, much like the LASA dataset, emphasize shape similarity. Further, MCCB () yielded the best performance, with respect to all three metrics. In fact, uniform weighting yielded poorer results, with respect to all three metrics, than when considering only the tangent coordinates. The results of this experiment show that while multi-coordinate methods can yield strong performance, it is critical that we balance the weights appropriately.
Iv-C Pressing Skill
In this experiment, we evaluated MCCB’s ability to learn pressing skills (Fig. 5). The data consists of six kinesthetic demonstrations, each a 3-dimensional robot end-effector position trajectory recorded as a human guided the robot in pressing two cylindrical pegs into their respective holes.
As shown in Fig. 6(b), MCCB successfully reproduced the demonstrations. Note that MCCB is capable of automatically capturing and reproducing the consistencies across the demonstrations in certain regions without any position constraints. Fig. 8 illustrates the performance of MCCB and the baselines with respect to three different metrics. Learning in Cartesian coordinates resulted in the better performance compared to learning in tangent and Laplacian coordinates. Quantitative evaluations further demonstrate that MCCB () consistently yielded the best performance with respect to all three metrics. The results of this experiment, in light of the results in Section IV-B, suggest that the relative importance of each of the differential coordinates vary across different skills.
Iv-D Pushing Skill
The final experiment evaluates the performance of MCCB in a pushing task (Fig. 5). The data consists of six kinesthetic demonstrations, each a 3-dimensional robot end-effector position trajectory recorded as a human guided the robot in sliding closed the lid of a wooden box.
As shown in Fig. 6(c), MCCB successfully generated reproductions that are similar to the demonstrations. As evidenced by quantitative evaluations in Fig. 9, encoding demonstrations in the Laplacian coordinates yielded better performance, with respect to all three metrics, when compared to learning only in either of the other two coordinates, while, MCCB () consistently outperformed all the other approaches. Note that learning in the Laplacian coordinates alone resulted in better performance than uniformly weighting of all the coordinates. These results are consistent with the results from the previous sections and indicate that MCCB yields consistently good performance. The results are summarized in Table I.
|Handwriting||✓ ✓||✓ ✓||✓||✓|
V Discussion and Conclusion
We introduced MCCB, a learning framework for encoding demonstrations in multiple differential coordinates, and automated balancing of costs defined in those coordinates. As shown in Table I, we demonstrated that the relative effectiveness of each coordinate system is not consistent across a variety of tasks since any given skill might be better suited for learning in one (or more) coordinate system(s). Furthermore, uniform weighting of costs in different coordinates does not consistently yield the best results across different skills. Indeed, uniform weighting, in some cases, yielded poorer performances compared to when only one coordinate system was used. On the other hand, MCCB learned to balance the costs and consistently yielded the best performance. Since the weights are learned directly from the demonstrations, MCCB makes no task-specific assumptions and does not require tedious parameter tuning. Note that although we used GMMs as the base representation in this work, MCCB is agnostic to the statistical model used to encode the demonstrations in each coordinate system, and thus can be combined with other techniques, such as [13, 14, 12, 11, 10, 15, 16]. Furthermore, MCCB can be extended to include more coordinate systems that capture additional trajectory features.
This research is supported in part by NSF NRI 1637758.
-  B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and autonomous systems, vol. 57, no. 5, pp. 469–483, 2009.
-  A. Billard, S. Calinon, R. Dillmann, and S. Schaal, “Robot programming by demonstration,” in Springer handbook of robotics. Springer, 2008, pp. 1371–1394.
-  P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in IEEE International Conference on Robotics and Automation (ICRA), 2009, pp. 763–768.
-  S. Calinon, F. Guenter, and A. Billard, “On learning, representing, and generalizing a task in a humanoid robot,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 2, pp. 286–298, 2007.
-  S. M. Khansari-Zadeh and A. Billard, “Learning stable nonlinear dynamical systems with gaussian mixture models,” IEEE Transactions on Robotics, vol. 27, no. 5, pp. 943–957, 2011.
-  H. C. Ravichandar and A. Dani, “Learning position and orientation dynamics from demonstrations via contraction analysis,” Autonomous Robots, pp. 1–16, 2018.
-  K. Neumann, A. Lemme, and J. J. Steil, “Neural learning of stable dynamical systems based on data-driven lyapunov candidates,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 1216–1222.
S. Levine and V. Koltun, “Learning complex neural network policies with
trajectory optimization,” in
International Conference on Machine Learning, 2014, pp. 829–837.
-  M. Schneider and W. Ertel, “Robot learning by demonstration with local gaussian process regression,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010, pp. 255–260.
-  M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, and B. Boots, “Towards robust skill generalization: Unifying learning from demonstration and motion planning,” in Proceedings of the 2017 Conference on Robot Learning (CoRL), 2017.
-  J. Umlauft and S. Hirche, “Learning stable stochastic nonlinear dynamical systems,” in International Conference on Machine Learning, 2017, pp. 3502–3510.
-  S. R. Ahmadzadeh, M. A. Rana, and S. Chernova, “Generalized cylinders for learning, reproduction, generalization, and refinement of robot skills.” in Robotics: Science and Systems, 2017.
-  S. Calinon, D. Bruno, and D. G. Caldwell, “A task-parameterized probabilistic model with minimal intervention control,” in IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 3339–3344.
-  A. Paraschos, C. Daniel, J. R. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in neural information processing systems, 2013, pp. 2616–2624.
-  T. Osa, A. M. G. Esfahani, R. Stolkin, R. Lioutikov, J. Peters, and G. Neumann, “Guiding trajectory optimization by demonstrated distributions,” IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 819–826, 2017.
-  T. Nierhoff, S. Hirche, and Y. Nakamura, “Spatial adaption of robot trajectories based on laplacian trajectory editing,” Autonomous Robots, vol. 40, no. 1, pp. 159–173, 2016.
-  N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa, “Chomp: Gradient optimization techniques for efficient motion planning,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2009, pp. 489–494.
-  J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequential convex optimization and convex collision checking,” The International Journal of Robotics Research, vol. 33, no. 9, pp. 1251–1270, 2014.
-  A. D. Dragan, K. Muelling, J. A. Bagnell, and S. S. Srinivasa, “Movement primitives via optimization,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 2339–2346.
-  Y. Lipman, O. Sorkine, D. Cohen-Or, D. Levin, C. Rossi, and H.-P. Seidel, “Differential coordinates for interactive mesh editing,” in Shape Modeling Applications. IEEE, 2004, pp. 181–190.
B. Lévy, “Laplace-beltrami eigenfunctions towards an algorithm that” understands” geometry,” inIEEE International Conference on Shape Modeling and Applications, 2006, pp. 13–13.
D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning with statistical models,”
Journal of artificial intelligence research, vol. 4, pp. 129–145, 1996.
-  S. M. Khansari-Zadeh and A. Billard, “Learning control lyapunov function to ensure stability of dynamical system-based robot reaching motions,” Robotics and Autonomous Systems, vol. 62, no. 6, pp. 752–765, 2014.
-  M. M. Fréchet, “Sur quelques points du calcul fonctionnel,” Rendiconti del Circolo Matematico di Palermo (1884-1940), vol. 22, no. 1, pp. 1–72, 1906.