Learning Context-Adaptive Task Constraints for Robotic Manipulation

08/06/2020 ∙ by Dennis Mronga, et al. ∙ 0

Constraint-based control approaches offer a flexible way to specify robotic manipulation tasks and execute them on robots with many degrees of freedom. However, the specification of task constraints and their associated priorities usually requires a human-expert and often leads to tailor-made solutions for specific situations. This paper presents our recent efforts to automatically derive task constraints for a constraint-based robot controller from data and adapt them with respect to previously unseen situations (contexts). We use a programming-by-demonstration approach to generate training data in multiple variations (context changes) of a given task. From this data we learn a probabilistic model that maps context variables to task constraints and their respective soft task priorities. We evaluate our approach with 3 different dual-arm manipulation tasks on an industrial robot and show that it performs better in terms of reproduction accuracy than constraint-based controllers with manually specified constraints.







1 Introduction

Many robotic manipulation tasks, like bi-manual handling of an object, polishing a table or opening a door, can be described as a combination of simpler tasks. For example, the problem of polishing a table can be decomposed into “maintain surface contact” and “follow trajectory”. Apart from that, robotic manipulation tasks are usually subject to constraints, which may be related to the environment (e.g., properties of the contacted surface), to restrictions of the given task (e.g., a container with liquid that must not be tilted) or to physical limitations of the robot (e.g., joint limits).

Constraint-based control, also referred to as task-oriented or Whole-Body Control, offers a flexible way to deal with such (constrained) multi-task problems. It formulates simultaneously running tasks as constraints to an instantaneous optimization problem, where the computed optimum represents the robot joint command that best accomplishes all the tasks. This way, multiple robot tasks can be integrated, complex control problems can be composed from simpler (sub-)tasks and the degrees of freedom (dof) of the entire robot body can be exploited. In recent years, a large number of frameworks have been proposed that allow multi-task control on velocity Smits et al. (2008), acceleration Flacco et al. (2012) or torque Dietrich et al. (2012) level. Most of these frameworks use some kind of prioritization strategy in order to facilitate the parallel execution of possibly conflicting tasks. Depending on the type of prioritization, the selected task priorities are referred to as either strict Sentis and Khatib (2006) or soft Dehio et al. (2015), while some frameworks also allow a mixture of both types Liu et al. (2016).

Even though constraint-based control is a proven tool to specify complex control problems, it requires a human expert to model the overall problem in terms of task constraints and associated priorities. This process is mostly performed manually, which is time-consuming and error-prone, and it leads to solutions that are often tailored to a specific situation. If the specification of the given task or the environment changes, these handcrafted solutions will likely fail.

In order to overcome these issues we develop an approach to (a) automatically derive task constraints for robotic manipulation and their associated soft priorities from data and (b) generalize over task variations and adapt to previously unseen situations. The data is obtained by means of a programming-by-demonstration approach and the tasks are varied between the demonstrations.

Throughout this paper, we refer to these task variations as context changes. Generally speaking, context in robotics can be defined as “a configuration of features which are (…) useful to influence the decision process of a robotic system” Bloisi et al. (2016). Approaches that are able to automatically adapt the robot controls with respect to such changes are referred to as context-adaptive. As an adaptation model, a Dirichlet Process Gaussian Mixture Model (DP-GMM, Neal (1992)) is used, which models the joint distribution of context variables and task constraints. Based on this probabilistic model, we use Gaussian Mixture Regression (GMR) Calinon et al. (2007) for reproduction of the task constraints and their associated priorities.

Compared to previous approaches with similar scope, we focus on

  • Simultaneous learning of task constraints and soft task priorities from user demonstrations. Most existing approaches attempt to learn either one or the other.

  • Severe context changes of the demonstrated task that induce multiple modes in the data. The context changes are mostly described by means of categorical variables. Most existing approaches use continuous context representations that are subject to minor adaptations, like varying start or end positions.

This paper is organized as follows: Section 2 presents a summary of the related work on automatic derivation and generalization of constraints in task-oriented control frameworks. Section 3 gives a quick overview of the constraint-based control framework. In Section 4 we illustrate our methods for learning adaptive task constraints from demonstration. In Section 5 we show experimental results, and in Section 6 we discuss possible extensions and future work.

Throughout the document we use the notations and symbols shown in Table 1. Vectors are represented by lowercase bold characters, matrices by uppercase bold characters.

2 Related Work

Constraint-based control is a powerful tool to program robots with many degrees of freedom and it has been applied to increasingly complex robotic tasks throughout the years. However, nearly all the available approaches leave the task specification to the skilled programmer, who has to model motion and physical constraints of the robot, select task priorities and tune task parameters in a cumbersome, mostly manual trial-and-error procedure. Even worse, the resulting task specification usually performs well only in a limited context. If the task or environment changes, the task parameters have to be adapted again. In our work we want to provide a way for the non-expert to program complex robotic systems using constraint-based control. To achieve this, we use programming-by-demonstration methods to record data from robotic manipulation tasks and derive task constraints, as well as their associated task priorities, from this data using probabilistic regression models. By demonstrating the tasks in varying contexts, the models are able to adapt the reproduced task constraints with respect to a variety of context changes that the task is subject to.

Different works exist that also attempt to ease the burden of the human programmer and automate the process of selecting task constraints and/or priorities for constraint-based frameworks. A number of approaches apply constrained stochastic optimization or reinforcement learning to find task priorities that improve the overall robot behavior, e.g., in terms of robustness Charbonneau et al. (2018), safety Modugno et al. (2016a), constraint satisfaction Modugno et al. (2016b); Lober et al. (2016), smoothness of motion Mronga et al. (2020) or generalization capabilities Dehio et al. (2015). Compared to our work, these approaches focus on the automatic derivation of (soft) task priorities in terms of mixing weights that balance the contribution of different (predefined) task constraints. Here, we want to derive both task constraints and their respective priorities from data. Furthermore, the aforementioned methods provide only limited generalization capabilities, since they optimize task priorities with respect to one particular situation. Our approach, on the other hand, attempts to generalize task constraints over a variety of situations.

In robot behavior learning, a widespread approach is to learn initial trajectories by imitation and refine them using reinforcement learning, where the behaviors are often represented by a movement model, for example dynamic movement primitives (DMP) Kober and Peters (2010). DMPs themselves have been designed to generalize over some meta-parameters like initial position or movement duration. The capability to adapt to more complex context changes can be achieved by means of hierarchical approaches, where an upper-level policy is learned that generalizes over the meta-parameters of the lower-level policy Fabisch and Metzen (2014); Kupcsik et al. (2017); Wilbers et al. (2017). However, these methods typically focus on a single task that is executed on a robot with six or seven dof, while we focus on multi-task scenarios on more complex systems.

Learning task constraints from user demonstrations has also been dealt with before. For example, Armesto et al. (2017) learn wiping a smooth surface with a 7-dof arm by separately estimating a task policy and a nullspace constraint that generalizes over previously unseen contexts (in this case the orientation of the surface). They use fixed task priorities and a fixed hierarchy. In contrast, we additionally want to estimate the task priorities from the demonstrated motions. The work of Perico et al. (2019) combines a constraint-based control framework with imitation learning. They represent a demonstrated trajectory using a probabilistic model and integrate it as a constraint in their control framework. The variance of the demonstrated trajectories is thereby used to modulate the stiffness of the robot and guide the human operator towards the estimated target. The other task parameters (e.g., the task priorities) are still selected manually. In contrast, we want to use the variability of the user demonstrations to obtain an estimate of all the task priorities. Also, their generalization capabilities are limited to variations of the target position and orientation of the end effector, whereas we want to generalize over more complex task parameters. In Silvério et al. (2017) the authors extend the probabilistic movement model developed in Calinon (2016) to additionally learn task priorities from demonstration. For that they use a soft weighting scheme for a manually selected set of candidate hierarchies. In contrast, our approach relies on soft task priorities and does not require the selection of candidate hierarchies. In Fang et al. (2016) Random Forest Regression is combined with constraint-based robot control in order to learn a pouring task. The required training data is generated by naive user demonstrations in an interactive simulated environment. However, this method is somewhat specific to the problem of pouring liquid into a container, while we attempt to provide a more general approach.

Yet another promising research direction is to parametrize constraint-based controllers by means of high-level reasoning mechanisms Leidner et al. (2016); Tenorth et al. (2014). However, here the task parameters to reason about still have to be selected manually, or at least some range of allowed values has to be provided by the human expert. In a sense, these approaches do not automate the task specification process, but shift the problem of parameter selection to a higher, more user-friendly level.

Operators: Time derivative; Estimate; Inverse of a matrix; Transpose of a matrix; Skew-symmetric matrix of a vector (see e.g., Lynch and Park (2017)); Trace of a square matrix
Dimensions: Number of robot joints; Number of task constraints; Number of user demonstrations per context; Number of context variables; Number of samples per demonstration; Number of feature variables; Number of mixture components; Number of task frames
Robot Control: Pose in Cartesian space; Position in Cartesian space; Rotation matrix; Twist in Cartesian space; Diagonal gain matrix; Task Jacobian; Rotation angle; Unit rotation axis; Robot joint positions; Diagonal task weight matrix; Task weight vector
Mixture Models: Probability distribution; Mean of a Gaussian; Covariance matrix of a Gaussian; Variance; Mixing weight in a GMM
Data Sets and Modeling: Context vector; Multi-dimensional data set; Data set with poses; Data set with twists; Data set with contexts

Table 1: Overview of notations and variable names

3 Constraint-Based Control Framework

The control framework that we use is an adaptation of the approach in Smits et al. (2008). For controlling the pose of a robot link in Cartesian space, we use a proportional controller with feed forward term. (In most equations we omit the dependence on time or robot joint state, for the sake of better readability.)


where the control output is a twist composed of linear and angular velocity, the feed forward term is the desired twist, and the two gain matrices are diagonal and contain the 6 feedback and feed forward gain constants, respectively. The two position vectors denote the reference and actual position of the controlled robot frame. The orientation term is the matrix logarithm of the relative rotation between the actual and the reference orientation. The logarithm of an SO(3)-element is a matrix representation of a constant angular velocity, which, if integrated for one second, rotates one frame onto the other Lynch and Park (2017). Finally, this term is multiplied by a rotation matrix in order to transform it to the base frame of the robot.
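A controller matching this description can be written as follows. This is a sketch under stated assumptions: the symbols are chosen here for illustration ($\mathbf{v}_d$ for the control output, $\mathbf{v}_r$ for the feed forward twist, $\mathbf{K}_p$ and $\mathbf{K}_d$ for the feedback and feed forward gain matrices, $\mathbf{x}_r, \mathbf{x}_a$ for the reference and actual position, $\mathbf{R}_r, \mathbf{R}_a$ for the reference and actual orientation) and may differ from the original notation. The operator $(\cdot)^{\vee}$ extracts the 3-vector from a skew-symmetric matrix.

```latex
\mathbf{v}_d = \mathbf{K}_d \, \mathbf{v}_r + \mathbf{K}_p
\begin{pmatrix}
  \mathbf{x}_r - \mathbf{x}_a \\[2pt]
  \mathbf{R}_a \left[ \log\!\left( \mathbf{R}_a^{\top} \mathbf{R}_r \right) \right]^{\vee}
\end{pmatrix}
```

In this form the orientation error $\log(\mathbf{R}_a^{\top}\mathbf{R}_r)$ is expressed in the actual frame, which is why it is rotated by $\mathbf{R}_a$ before being combined with the position error in the base frame.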

For each robot task, we define a controller according to Equation (1) and represent its control output as a constraint in the following online optimization problem. (Here, only Cartesian position and orientation constraints are considered. However, the framework is also able to deal with other types of constraints like joint limits, collision avoidance or contact forces.)


where the optimization variable is the robot’s reference joint velocity and each task contributes its weighted task Jacobian. The size of the problem is determined by the number of robot joints and the number of task constraints. The weighting is done with a diagonal matrix containing the task weights. The solution of Equation (2) is computed using the damped pseudo-inverse method as described in Maciejewski and Klein (1988). The task weights thereby balance the importance of the constraint variables. For example, when controlling only the position of the robot in Cartesian space, the orientation might be irrelevant, so the corresponding task weights can be set to zero. This means the tasks are not hierarchically organized as in Sentis and Khatib (2006), but the solution is computed as a weighted combination of the control outputs. In the over-constrained case, an approximate solution is obtained, governed by the values of the weights.
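One common way to write such a weighted multi-task problem and its damped pseudo-inverse solution is sketched below. The notation is an assumption made for illustration: $\dot{\mathbf{q}}_r$ is the reference joint velocity, $\mathbf{J}_i$ and $\mathbf{W}_i$ are the Jacobian and diagonal weight matrix of the $i$-th task, $\mathbf{v}_{d,i}$ is its control output, $N_t$ is the number of tasks and $\lambda$ is a damping factor. Whether the weights also scale the right-hand side, as in a standard weighted least-squares problem, is likewise an assumption.

```latex
\begin{aligned}
\dot{\mathbf{q}}_r &= \arg\min_{\dot{\mathbf{q}}}
  \left\| \mathbf{J}_w \dot{\mathbf{q}} - \mathbf{v}_w \right\|^2,
\qquad
\mathbf{J}_w =
\begin{pmatrix} \mathbf{W}_1 \mathbf{J}_1 \\ \vdots \\ \mathbf{W}_{N_t} \mathbf{J}_{N_t} \end{pmatrix},
\quad
\mathbf{v}_w =
\begin{pmatrix} \mathbf{W}_1 \mathbf{v}_{d,1} \\ \vdots \\ \mathbf{W}_{N_t} \mathbf{v}_{d,N_t} \end{pmatrix} \\
\dot{\mathbf{q}}_r &= \mathbf{J}_w^{\top}
  \left( \mathbf{J}_w \mathbf{J}_w^{\top} + \lambda^2 \mathbf{I} \right)^{-1} \mathbf{v}_w
\end{aligned}
```

A weight of zero removes the corresponding row from the problem, which frees that direction of motion for the remaining tasks.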

We prefer task weights, also referred to as soft task priorities, over strict hierarchies here, since they facilitate the application of machine learning methods as described in the next section.

4 Learning Adaptive Task Constraints From Demonstration

The design of controls and selection of task weights as described in the previous section is usually done by an expert in a manual fashion. This process is time-consuming and the resulting motions are often tailored to a specific situation. An automated procedure that derives the reference input for the controller in Equation (1) and the corresponding task weights as in Equation (2) could not only ease the burden for the programmer, but also lead to better results, especially if the solution can be adapted automatically to context changes. Such context changes could refer to the task itself (e.g., goal positions, orientation constraints, …), the environment (e.g., size or shape of objects, position and moving direction of obstacles, …) or the morphology of the robot itself (e.g., single arm or dual arm, with/without mobile base).

Figure 1: Approach overview: Learning context-adaptive task constraints from user demonstrations

Here, we propose an approach that automatically derives task constraints from data recorded in user demonstrations. By recording the data in varying contexts we are able to generalize task constraints to novel situations. Figure 1 shows the general idea of the approach.

  1. We assume that the kinematic model of the robot is known, as well as a number of task-relevant coordinate systems that have to be selected by the user in advance (e.g., the robot base, end effector or the coordinate frame of a certain object). We refer to these coordinate systems as task frames, according to Finkemeyer et al. (2004).

  2. We perform user demonstrations in the form of kinesthetic teaching for each context. For each demonstration we record the relative pose and twist for each pair of task frames. Each of these pose/twist trajectories creates a time-dependent 6D-candidate task constraint with associated soft task priority according to Equations (1) and (2).

  3. We model the joint probability distribution of task constraints and context variables as a Dirichlet Process Gaussian Mixture Model (DP-GMM).

  4. We reproduce task constraints and their respective priorities in a novel, previously unseen context using Gaussian Mixture Regression (GMR).

In the following sections, we provide more detailed explanations of our approach.

4.1 Representation of Context

In our approach, we describe the context in the form of a context vector whose dimension is given by the number of context variables. The context variables can be real-valued (e.g., the size of an object) or categorical (e.g., whether an object is allowed to be tilted or not). In the latter case we use one-hot encoding to model the different categories. Currently, the user has to specify the context variables manually for each demonstration. In general, the context variables may vary with respect to time. However, here we assume that the context remains constant throughout a single demonstration.
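As an illustration of this representation, the following minimal sketch builds a context vector from real-valued and one-hot encoded categorical variables. The function name, the dictionary-based schema and the variable names are hypothetical and not taken from the source.

```python
def encode_context(real_valued, categorical, categories):
    """Build a flat context vector (hypothetical helper).

    real_valued: dict mapping variable name -> float value
    categorical: dict mapping variable name -> chosen category label
    categories:  dict mapping variable name -> ordered list of all labels
    """
    # Real-valued variables are copied as they are, in a fixed (sorted) order.
    vector = [float(real_valued[name]) for name in sorted(real_valued)]
    # Each categorical variable contributes one entry per possible label.
    for name in sorted(categorical):
        labels = categories[name]
        if categorical[name] not in labels:
            raise ValueError(f"unknown label for {name}: {categorical[name]}")
        vector.extend(1.0 if label == categorical[name] else 0.0 for label in labels)
    return vector


context = encode_context(
    real_valued={"object_size": 0.4},
    categorical={"direction": "clockwise"},
    categories={"direction": ["clockwise", "anticlockwise"]},
)
print(context)  # [0.4, 1.0, 0.0]
```

Keeping the label order fixed across all demonstrations matters, since the regression model only sees positions in the vector.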

4.2 Data Preprocessing

After recording, we first re-sample and temporally align all data streams. The time variable is normalized in order to make the trajectories invariant with respect to time and linear scaling.
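The re-sampling step can be sketched as follows for a single scalar data stream. This is a minimal illustration that assumes linear interpolation and a normalized time running from 0 to 1; neither choice is stated in the source, and the function name is hypothetical.

```python
def resample(times, values, num_samples):
    """Linearly interpolate a scalar trajectory onto num_samples points
    that are evenly spaced over normalized time (assumed to be [0, 1])."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples with matching lengths")
    if num_samples < 2 or times[-1] <= times[0]:
        raise ValueError("invalid sample count or time range")
    t0, t1 = times[0], times[-1]
    norm = [(t - t0) / (t1 - t0) for t in times]  # normalized time stamps
    out = []
    j = 0  # index of the segment [norm[j], norm[j + 1]] containing s
    for i in range(num_samples):
        s = i / (num_samples - 1)
        while j < len(norm) - 2 and norm[j + 1] < s:
            j += 1
        span = norm[j + 1] - norm[j]
        alpha = 0.0 if span == 0 else (s - norm[j]) / span
        out.append(values[j] + alpha * (values[j + 1] - values[j]))
    return out


# Unevenly sampled recording of a motion with constant velocity.
resampled = resample([2.0, 3.0, 6.0], [0.0, 1.0, 4.0], 5)
```

After this step every demonstration has the same number of samples, so demonstrations of different duration can be stacked into one data set.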

Since we use regression methods for reproduction of the demonstrated tasks, we have to convert the rotational part of the pose trajectories to a suitable representation first. Euler angles are not unique, suffer from gimbal lock and have a discontinuous representation space, i.e., their values wrap around. Thus, they are not well suited for regression. Orthogonal rotation matrices have a continuous representation space, but are unfortunately over-parameterized. Also, after regression, the orthogonality constraint has to be enforced, e.g., by means of a Gram-Schmidt orthonormalization process. Quaternions are not unique, are discontinuous, and require the unit-length constraint to be enforced during training. Thus, we decide to represent rotations as elements of the Lie algebra so(3), which is the tangent space of SO(3), the space of orthogonal rotation matrices. An arbitrary element of SO(3) can be mapped to this 3-dimensional representation using the logarithmic map Lynch and Park (2017):




where is the skew-symmetric matrix form of the unit rotation axis and is the rotation angle for a given . The rotation vector gives us a 3D-representation of rotations. When restricting the rotation angle to , this representation will be unique (see e.g., Hartley et al. (2013)). However, when or , the rotation axis inverts its sign. Thus, we have to handle these cases explicitly: First we ensure that the orientation trajectory starts in the upper half of (). Then we walk through each data point in the recorded trajectory and apply and for the remaining elements whenever inverts its sign. As a result, we get a continuous 3D-representation of our orientation data.
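The conversion can be sketched as below. The logarithmic map follows the standard axis-angle formulas (angle from the trace, axis from the antisymmetric part of the rotation matrix). The continuity handling is a swapped-in technique and not necessarily the exact procedure described above: among all equivalent axis-angle pairs, it picks the rotation vector closest to the previous sample. All function names are hypothetical.

```python
import math


def so3_log(R):
    """Return (unit axis, angle) of a 3x3 rotation matrix, angle in [0, pi]."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    if theta < 1e-9:
        return [1.0, 0.0, 0.0], 0.0  # axis is arbitrary for a zero rotation
    if math.pi - theta < 1e-6:
        # Near pi the antisymmetric part vanishes; use R ~ 2*w*w^T - I instead.
        i = max(range(3), key=lambda k: R[k][k])
        w_i = math.sqrt(max(0.0, (R[i][i] + 1.0) / 2.0))
        axis = [(R[i][k] + R[k][i]) / (4.0 * w_i) for k in range(3)]
        axis[i] = w_i
        return axis, theta
    s = 2.0 * math.sin(theta)
    axis = [(R[2][1] - R[1][2]) / s, (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s]
    return axis, theta


def continuous_rotation_vectors(rotations):
    """Map a sequence of rotation matrices to rotation vectors without jumps."""
    result = []
    for R in rotations:
        axis, theta = so3_log(R)
        if result:
            prev = result[-1]
            if theta < 1e-9:
                norm = math.sqrt(sum(p * p for p in prev))
                if norm > 0.0:
                    axis = [p / norm for p in prev]  # reuse the previous axis
            # (axis, theta + 2*pi*k) describes the same rotation for any integer k.
            # Choose the k whose rotation vector is closest to the previous one.
            projection = sum(a * p for a, p in zip(axis, prev))
            theta += 2.0 * math.pi * round((projection - theta) / (2.0 * math.pi))
        result.append([theta * a for a in axis])
    return result


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


angles = [0.5, 1.5, 2.5, 3.5, 4.5]  # rotation about z that passes through pi
vectors = continuous_rotation_vectors([rot_z(a) for a in angles])
```

In the example, the plain logarithmic map would flip the axis once the rotation exceeds π, whereas the unwrapped rotation vectors keep the axis fixed and let the angle grow beyond π.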

The advantage of using so(3) elements to represent rotations is that averaging of these elements is a linear operation, just as it is for scalars and 3-dimensional position vectors, provided that the previously mentioned boundary cases are handled properly. Moreover, since we want to estimate the soft task priorities from the variability in the user demonstrations and the task weights in Equation (2) are six-dimensional (three entries each correspond to the linear and the angular velocity), we require a 3-dimensional representation of the orientation.

In summary, we can now represent continuous pose trajectories using a 3D position and a 3D rotation vector. As a final preprocessing step, we normalize the complete data set to have zero mean and unit variance.

After preprocessing we have, for each context, a normalized dataset with context data and motion data. The motion data is indexed by the normalized time variable, and the size of the dataset is determined by the number of performed user demonstrations per context, the number of samples per experiment, the number of context variables and the number of feature variables. The number of feature variables depends on the number of selected task frames (e.g., for 3 task frames, we have 18 pose and 18 twist variables). Since the number of feature variables grows strongly with the number of task frames, the problem quickly becomes intractable for many task frames, so the task frames should be selected with care.
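The growth of the feature dimension can be illustrated with a short sketch. It assumes one 6D constraint per unordered pair of task frames, which is consistent with the example of 18 pose and 18 twist variables; the function name is hypothetical.

```python
from itertools import combinations


def feature_layout(task_frames):
    """Return the frame pairs and the number of pose and twist variables."""
    pairs = list(combinations(task_frames, 2))
    pose_variables = 6 * len(pairs)   # 3 position + 3 rotation-vector entries per pair
    twist_variables = 6 * len(pairs)  # 3 linear + 3 angular velocity entries per pair
    return pairs, pose_variables, twist_variables


pairs, n_pose, n_twist = feature_layout(["Base", "Left EE", "Right EE"])
print(len(pairs), n_pose, n_twist)  # 3 18 18

# Adding a fourth frame already doubles the number of pairs.
_, n_pose_4, n_twist_4 = feature_layout(["Base", "Left EE", "Right EE", "Object"])
print(n_pose_4, n_twist_4)  # 36 36
```

Under this assumption the number of pairs grows quadratically with the number of task frames, which is why the frames should be chosen sparingly.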

4.3 Estimation of Task Constraints

We want to estimate the task constraints (the relative pose and twist of the task frames) and the respective “soft” priorities or task weights that are required to reproduce the demonstrated task in a given context.

To achieve this, we model the joint distribution of context and motion variables as a Dirichlet Process Gaussian Mixture Model (DP-GMM). The model parameters, i.e., the mixing weights, the means and the covariance matrices of the Gaussian mixture components, are trained using variational inference. In a DP-GMM the mixing weights are modeled as a Dirichlet Process, so that the effective number of mixture components can be inferred from data. In practice only an upper bound for the number of mixtures must be selected and the algorithm will set some of the mixture weights to near zero.

The reproduction of the task constraints is then achieved as follows: Starting from an initial pose , we estimate a series of twist commands from the conditional distribution using Gaussian Mixture Regression (GMR) Calinon et al. (2007). The respective pose commands are computed by integrating . This process is repeated until converging to the target pose . Since convergence of the algorithm is not guaranteed in each case, we stop the process if and , where and is a manually selected minimum distance threshold.
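The reproduction loop can be sketched for the simplest possible case: a scalar pose, a scalar twist and an already trained mixture over the pair (pose, twist). Context variables and time are omitted for brevity. The component tuple layout, the function names, the toy mixture parameters and the stopping rule (distance threshold or maximum step count) are assumptions made for illustration.

```python
import math


def gmr(x, components):
    """Conditional mean E[y | x] of a 2-D Gaussian mixture over (x, y).

    components: list of (weight, mean_x, mean_y, var_x, cov_xy, var_y)
    """
    responsibilities, conditional_means = [], []
    for weight, mx, my, vxx, vxy, _vyy in components:
        density = math.exp(-0.5 * (x - mx) ** 2 / vxx) / math.sqrt(2.0 * math.pi * vxx)
        responsibilities.append(weight * density)
        conditional_means.append(my + vxy / vxx * (x - mx))
    total = sum(responsibilities)
    if total == 0.0:
        raise ValueError("query point is too far from all mixture components")
    return sum(r * m for r, m in zip(responsibilities, conditional_means)) / total


def reproduce(x0, components, dt, target, tolerance, max_steps):
    """Integrate the predicted twist until the target is reached."""
    poses = [x0]
    for _ in range(max_steps):
        if abs(poses[-1] - target) < tolerance:
            break
        twist = gmr(poses[-1], components)     # regression step
        poses.append(poses[-1] + twist * dt)   # integration step
    return poses


# Toy mixture whose conditional mean is twist = 1 - pose (attractor at pose = 1).
components = [
    (0.5, 0.25, 0.75, 0.05, -0.05, 0.06),
    (0.5, 0.75, 0.25, 0.05, -0.05, 0.06),
]
poses = reproduce(0.0, components, dt=0.1, target=1.0, tolerance=1e-3, max_steps=200)
```

Because the model predicts velocities rather than absolute poses, the same mixture can be started from a different initial pose and still converges towards the target.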

Thus, the input to the model are the context vector and the initial relative poses for each pair of task frames . The output of the model is a trajectory of relative poses and twists for each pair of task frames that is used as input to (1). By using twist commands as variables, the acquired trajectories can be adapted with respect to varying start and end poses.

Since it is likely that two different twists refer to a similar pose in the demonstrations (e.g. when changing the direction of motion), the distribution may be multi-modal. In order to avoid averaging between two equiprobable modes we modify the model as follows: We shift the twist trajectory one time step backwards () and add this data to the joint distribution. Then we estimate the twist commands from the conditional distribution . This way, we can resolve ambiguities between twist and pose data, like changes in motion direction.
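The effect of adding the previous twist can be seen in a small sketch with a scalar out-and-back motion. The helper name is hypothetical, and dropping the first sample (which has no predecessor) is an assumption.

```python
def add_previous_twist(poses, twists):
    """Pair each sample with the twist of the preceding time step.

    Returns tuples (pose_t, twist_{t-1}, twist_t).
    """
    return [(poses[t], twists[t - 1], twists[t]) for t in range(1, len(poses))]


# Out-and-back motion: the pose 1.0 is visited twice with opposite twists.
poses = [0.0, 1.0, 2.0, 1.0, 0.0]
twists = [1.0, 1.0, -1.0, -1.0, 0.0]
samples = add_previous_twist(poses, twists)
print(samples[0])  # (1.0, 1.0, 1.0)
print(samples[2])  # (1.0, -1.0, -1.0)
```

Both printed samples share the same pose, but the previous twist tells them apart, so conditioning on it avoids averaging the forward and backward motion.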

The advantage of GMR over other regression techniques here is that it is able to generate smooth and continuous motions and that it provides information about the variation of the input data, which we require to estimate the soft task priorities. Furthermore, the time for regression is independent of the size of the data set, as GMR models the joint probability of the data and then derives the regression function from the joint density model Stulp and Sigaud (2015). Since we have quite large data sets, we prefer GMM-GMR over approaches that model the regression function directly, such as Gaussian Process Regression (GPR) Rasmussen (2004).

4.4 Estimation of Task Weights

Since the resulting trajectories are reproduced from different user demonstrations, each point in the trajectory can be assigned a variance , which describes the variability of the demonstrations in context . In order to retrieve this variance, we compute the conditional distribution for each time step in the trajectory, given the estimated twist trajectory. Then, for each time step, we collapse the mixture distribution to a single Gaussian with the following mean and covariance:


From the computed covariance matrix we use the diagonal entries to estimate the task weights for a given context as follows:


Here, the variance over all demonstrations recorded in the given context is normalized by the maximum variance. The idea is that a high variability in the user demonstrations corresponds to a low priority of the task constraints and vice versa. Figuratively, this means that a demonstrated motion with low variability throughout all demonstrations is “constrained” and thus very important for the performed task, while a high variability reflects less important parts of the task. When, for example, performing a task like polishing a table, the motion perpendicular to the table surface is constrained and a low variability will be perceived in that direction. Thus the corresponding task constraint is assigned a high priority. The motion parallel to the surface, on the other hand, is quite arbitrary and can be assigned a lower priority, i.e., the motion need not be tracked very accurately.
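The two steps of this section can be sketched for scalar quantities. Collapsing the mixture uses standard moment matching. The mapping from variance to weight (one minus the variance divided by the maximum variance) is only one mapping consistent with the description and is an assumption, as are the function names.

```python
def collapse_mixture(responsibilities, means, variances):
    """Moment-match a 1-D Gaussian mixture with a single Gaussian."""
    total = sum(responsibilities)
    h = [r / total for r in responsibilities]
    mean = sum(hk * mk for hk, mk in zip(h, means))
    second_moment = sum(hk * (vk + mk * mk) for hk, mk, vk in zip(h, means, variances))
    return mean, second_moment - mean * mean


def task_weights(variances):
    """Map variances to weights in [0, 1]: low variance gives a high weight."""
    max_variance = max(variances)
    if max_variance <= 0.0:
        return [1.0 for _ in variances]
    return [1.0 - v / max_variance for v in variances]


mean, variance = collapse_mixture([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
print(mean, variance)  # 1.0 2.0

weights = task_weights([0.5, 2.0, 1.0])
print(weights)  # [0.75, 0.0, 0.5]
```

Under this particular mapping the direction with the largest variance receives weight zero, so it is left entirely to the other tasks.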

(a) Mixture components
(b) Resulting confidence interval
Figure 2: Example: Estimated task constraints (only x-position) and confidence interval, which is used for predicting the task weights.

Figure 2 illustrates the reproduction of a motion (only x-position) in a fixed context. Figure 2(a) shows the mean and spread of the mixture components fitted to different user demonstrations, the predicted trajectory using GMR and the mean trajectory from the user demonstrations. Figure 2(b) shows the resulting confidence interval, which is used to estimate the task weights according to Equation (7).

4.5 Generalization to Unknown Contexts

In order to achieve generalization capabilities with respect to previously unseen situations, we perform the demonstrations under multiple variations of the given task. We refer to these variations as context changes here. As described before, the context is described by a real-valued vector, where categorical variables are modeled using one-hot encoding. Previously introduced approaches like the one described in Calinon (2016) focus on generalization over different start or target positions for a given task. Here, we additionally want to deal with more severe context changes, e.g., the size of the handled objects, whether to use a single arm or two arms for the given task or whether or not an object may be tilted during task execution. Such changes can be represented in our control approach by modifying the task weights of particular constraints in an appropriate way. For example, if an object may be tilted during execution, the task weights corresponding to the rotational motion can be low, so that the remaining freedom of motion can be used by the robot to perform additional tasks, like collision avoidance.

To achieve these generalization capabilities, we estimate the meta-parameters of the GMM using leave-one-out cross validation, where we use the data from each context as a hold out set once in each split and train on the remaining contexts. This way we optimize the model to generalize to new contexts.
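The splitting scheme can be sketched as follows. The data layout (a dictionary from context identifier to a list of samples) and the function name are hypothetical.

```python
def leave_one_context_out(data_by_context):
    """Yield (held_out_context, training_samples, validation_samples)."""
    for held_out in data_by_context:
        training = [
            sample
            for context, samples in data_by_context.items()
            if context != held_out
            for sample in samples
        ]
        yield held_out, training, list(data_by_context[held_out])


# Placeholder data: three contexts with two samples each.
data = {"c1": ["a1", "a2"], "c2": ["b1", "b2"], "c3": ["c1_s", "c2_s"]}
splits = list(leave_one_context_out(data))
for held_out, training, validation in splits:
    print(held_out, len(training), len(validation))
```

Each candidate setting of the meta-parameters would then be scored by its average error on the held-out contexts, which rewards models that transfer to contexts not seen in training.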

(a) Rotate object: Rotating an object by 90 degrees
(b) Collaboration: Collaborative transport of a bulky object
(c) Assembly: Connecting a tube and a connector piece
Figure 3: Kinesthetic teaching of dual-arm manipulation tasks
# Name OS LA RA C AC
Rot. clockw. 0.3 1 1 1 0
Obj. clockw. 0.35 1 1 1 0
Obj. clockw. 0.4 1 1 1 0
Obj. clockw. 0.45 1 1 1 0
Obj. clockw. 0.5 1 1 1 0
Obj. anticlockw. 0.3 1 1 0 1
Obj. anticlockw. 0.35 1 1 0 1
Obj. anticlockw. 0.4 1 1 0 1
Obj. anticlockw. 0.45 1 1 0 1
Obj. anticlockw. 0.5 1 1 0 1
Obj. left arm anticlockw. 0.5 1 0 0 1
Obj. left arm clockw. 0.5 1 0 1 0
Obj. right arm clockw. 0.5 0 1 1 0
Obj. right arm anticlockw. 0.5 0 1 0 1
(a) Rotate Object
# Name AT LA RA
Collab. no tilt 0 1 1
Collab. with tilt 1 1 1
Collab. no tilt left arm 0 1 0
Collab. with tilt left arm 1 1 0
Collab. no tilt right arm 0 1 0
Collab. with tilt right arm 1 1 0
(b) Collaboration
# Name LA RA
Assembly 1 1
Assembly left arm 1 0
Assembly right arm 0 1
(c) Assembly
Table 2: Contexts and context variables used for experimental evaluation, OS - Object Size, C/AC - Clockwise/Anticlockwise, LA/RA - Left Arm/Right Arm, AT - Allow Tilt

5 Experimental Results

We evaluate our approach by the means of 3 different manipulation tasks:
Rotate Object: The robot rotates a rigid object by 90 degrees (Figure 3(a)). We vary the start pose, the width of the object (between 0.3 and 0.5, see Table 2), the rotation direction (clockwise/anticlockwise) and whether both robot arms or a single arm (left arm/right arm) is used for execution. In total we get 14 different contexts, parameterized by 5 context variables. The user demonstrations of this task are illustrated in the accompanying video 01_pbd_rotate_panel.mp4.
Collaboration: The robot carries a bulky object in collaboration with a human (Figure 3(b)). We vary the start pose, whether or not the object may be tilted during transport and whether both robot arms or a single arm (left arm/right arm) is used for the experiment. We obtain data in 6 different contexts, parameterized by 3 context variables. The user demonstrations of this task are illustrated in the accompanying video 02_pbd_collaboration.mp4.
Assembly: The robot assembles a tube and a connector piece. We vary the start pose and whether both robot arms or a single arm (left arm/right arm) is used for the experiment. Thus, we perform the task in 3 different contexts, parameterized by 2 context variables. The user demonstrations of this task are illustrated in the accompanying video 03_pbd_assembly.mp4.

A summary of all recorded contexts and the context variables can be found in Table 2.

The experiments are conducted on a stationary dual-arm robot consisting of two KUKA iiwa lightweight arms (https://www.kuka.com/en-us/products/robotics-systems/industrial-robots/lbr-iiwa), each equipped with a Robotiq 3-finger gripper (https://robotiq.com/products/3-finger-adaptive-robot-gripper). We select the base frame of the robot (denoted as Base), as well as the end-effector frames of the two arms (denoted as Left EE and Right EE) as task frames. The resulting task constraints will be denoted as Base-Left EE, Base-Right EE and Left EE-Right EE in the following. Since we have three 6-dimensional Cartesian constraints, we get 18 pose and 18 twist variables, respectively. For each context, we perform several experiments with varying start pose. The recorded trajectories are re-sampled to contain the same number of samples each.

(d) Rotate Object: Reproduction with varying object size
(e) Rotate Object: Reproduction with varying rotation direction
(f) Collaboration: Reproduction with varying start position
(g) Assembly: Reproduction with varying start position
Figure 4: Results when reproducing task constraints in previously unseen context: Gray: Mean of demonstrations, Blue Dashed: Left Arm (constraint Base-Left EE), Green Dashed: Right Arm (constraint Base-Right EE), Red Dashed: Reproduction in previously unseen context
Figure 5: Reproduction of the Rotate Object task in context
Figure 6: Reproduction of the Collaboration task in context
Figure 7: Reproduction of the Assembly task in context

5.1 Reproduction of Task Constraints

As described in Section 4.3, a joint distribution is learned using the recorded context and motion data. We use a Dirichlet Process Gaussian Mixture Model to model the distribution. For all tasks, we set a fixed upper bound on the number of mixture components and let the Dirichlet Process decide automatically on the effective number of mixtures. Reproduction of the task constraints is achieved by iteratively retrieving the twist from the conditional distribution using Gaussian Mixture Regression and computing the pose trajectory through integration, starting from an initial pose.

We evaluate the ability of the approach to generalize with respect to previously unseen situations, e.g., a new start pose or category of the task. The latter is thereby described by the context vector. The results are displayed in Figure 4 and explained in the following.

5.1.1 Rotate Object

The model is trained for clockwise rotation direction using both arms with data from the contexts . Figure 4(d) shows the reproduction in the test contexts , which represent previously unseen object sizes. As can be seen, the trained model is able to generalize over the size of the manipulated object. Figure 5 shows video snapshots of the reproduction of the Rotate Object task in context .

Next, we train the model using clockwise rotation with both arms and anticlockwise rotation using only a single arm. We evaluate the learned model using anticlockwise rotation with both arms, a previously unseen context. Thus, we use the contexts for training and contexts for evaluation. The results are displayed in Figure 4(e). As can be seen, the approach is able to generalize with respect to a change of the rotation direction (using both hands). Although the model was trained only with single-arm motions for the anticlockwise rotation direction, it is able to generate dual-arm motions in which both arms rotate in that direction.

The reproduction of this task is also illustrated in the accompanying video 04_reproduction_rotate_panel.mp4.

5.1.2 Collaboration

Here, we train the model using of the demonstrations for the fixed context , which have varying start poses. We use the remaining demonstrations with unknown start poses for evaluation. The results in Figure 4(f) show the capability of the approach to generalize to different start poses. For the sake of clarity, only the -position is illustrated. Figure 6 shows video snapshots of the Collaboration task in context . The results are also illustrated in the accompanying video 05_reproduction_collaboration_a.mp4.

5.1.3 Assembly

Finally, we train the model using demonstrations from the assembly task with varying start poses (fixed context ) and use the remaining demonstrations with previously unknown start poses for evaluation. Figure 4(g) shows the result. For the sake of clarity, again only the -position is shown. The results underline the ability of the model to generalize with respect to previously unknown start poses. Figure 7 shows video snapshots of the Assembly task in context . The results are also illustrated in the accompanying video 06_reproduction_assembly.mp4.

We evaluate the quality of the model by measuring the mean absolute error (MAE) between the reproduced trajectory and the mean of the demonstrated trajectories for each individual context. Figure 8 summarizes the results. The mean reproduction error is on the order of . Since we are not dealing with high-precision tasks here and the KUKA arms have integrated joint-level compliance controllers that may compensate for small inaccuracies, the resulting reproduction errors are acceptable. Further improvements can be achieved by obtaining more training data.
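The error measure can be written down compactly. The following sketch assumes scalar trajectories that have already been re-sampled to a common length, as described for the recorded data; the function name `mean_absolute_error` and the numbers in the example are illustrative and not taken from the experiments.

```python
def mean_absolute_error(reproduced, demonstrations):
    """MAE between a reproduced trajectory and the mean of the demonstrations.

    All trajectories are lists of scalars with the same number of samples.
    """
    n_demos = len(demonstrations)
    n_samples = len(reproduced)
    # Point-wise mean over all demonstrations.
    mean_demo = [
        sum(demo[t] for demo in demonstrations) / n_demos for t in range(n_samples)
    ]
    # Average absolute deviation over the whole trajectory.
    return sum(abs(r - m) for r, m in zip(reproduced, mean_demo)) / n_samples


# Illustrative numbers only.
demos = [[0.0, 1.0, 2.0], [0.0, 3.0, 4.0]]
error = mean_absolute_error([0.5, 2.0, 2.0], demos)
```

For multi-dimensional poses, the same computation would be applied per coordinate, with a suitable distance measure for the rotational part.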

Figure 8: Reproduction Error (MAE) for Individual Contexts: Estimated task weights vs. manually selected task weights
(a) Rotate Object task (only x-position, fixed context ). Left: Reproduction of task constraints and variance, Right: Estimation of task weights according to (7).
(b) Reproduction in context : Without tilting
(c) Reproduction in context : With tilting
Figure 9: Estimation of task weights: Temporal, inter-constraint and context adaptation.

5.2 Estimation of Task Weights

Next we evaluate the capability of the approach to estimate suitable task weights and adapt them (a) over time (during task execution), (b) with respect to different constraint variables and (c) with respect to different contexts. As described in section 4, we estimate the task weights from the inter-demonstration variance of the normalized data according to Equation (7). A large variance corresponds to small task weights and vice versa.
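The relation between variance and task weight can be illustrated with a small sketch. Note that the mapping used here, w = 1 / (1 + variance), is only a placeholder with the stated monotonic behavior; it is an assumption made for this example and is not Equation (7). The function names and the toy demonstrations are likewise hypothetical.

```python
def inter_demo_variance(demonstrations):
    """Variance across demonstrations at every time step (population variance)."""
    n_demos = len(demonstrations)
    variances = []
    for t in range(len(demonstrations[0])):
        values = [demo[t] for demo in demonstrations]
        mean = sum(values) / n_demos
        variances.append(sum((v - mean) ** 2 for v in values) / n_demos)
    return variances


def illustrative_task_weights(variances):
    """ASSUMPTION: placeholder mapping, not Equation (7) of the paper.

    Large variance gives a small weight, zero variance gives weight 1.
    """
    return [1.0 / (1.0 + var) for var in variances]


# Two normalized toy demonstrations that start apart and end at the same value.
demos = [[0.0, 0.5, 1.0], [1.0, 0.75, 1.0]]
weights = illustrative_task_weights(inter_demo_variance(demos))
```

In this toy case the weight grows over time because the demonstrations converge to the same final value, which mirrors the temporal adaptation examined in the following subsection.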

5.2.1 Temporal and Inter-Constraint Adaptation

Figure 9(a) shows the reproduction of the Rotate Object task (only x-axis), along with the demonstrated motions, the mean of the demonstrations and the estimated confidence interval . Since we chose different start poses, the variance of the motion is initially large for the constraints Base-Left EE and Base-Right EE and decreases during task execution, because we try to bring the object to the same final pose in each demonstration. Accordingly, the respective task weights are low in the beginning and increase during the course of the task. In contrast, the constraint Left EE-Right EE has a low variance and, accordingly, a large task weight during the whole task. This is because the relative motion of the grippers is constrained by the object that they are holding. The results show that the estimated (soft) task priorities reflect the importance of the different constraints. In this case, the relative pose of the end effectors is more important than the pose of each individual end effector. Furthermore, the results show that the prioritization of constraints can be adapted during the course of a given task.

5.2.2 Context Adaptation

Figure 9(b) shows the effect of estimating the task weights in different contexts for the Collaboration task (only -rotation). We estimate from (Figure 9(b)) and from (Figure 9(c)) and compare the resulting task weights. Since we allow tilting the load during the motion in context , the variance is large and the corresponding task weight drops during task execution. In contrast, the task weight in context remains high during the whole motion. The results are illustrated in the accompanying video 05_reproduction_collaboration_b.mp4.

5.2.3 Comparison with Manually Selected Task Weights

Finally, we compare the quality of task execution using estimated and manually selected task weights. For the latter case, we select a fixed default value for all task weights. Figure 8 shows the reproduction error for both cases. It can be seen that the reproduction error when using estimated task weights is lower on average. Apart from that, the use of variable task weights provides greater flexibility for executing additional tasks, e.g., collision avoidance.

5.3 Discussion

In the previous sections, we experimentally evaluated the approach for automatic derivation and contextual adaptation of task constraints. We found that the use of GMM-GMR offers an intuitive way to program robot tasks using constraint-based control approaches and derive suitable task priorities automatically from user demonstrations. Furthermore, the learned models can generalize to a certain degree with respect to context changes that reflect variations of the environment or the given task. As a result, we achieve a better performance compared to manually tuned task constraints and the user does not need to reprogram every novel situation, but can rely on the generalization capabilities of the model. In the long term, we strive towards a decision process in which task constraints can be described on a semantic level and their numerical counterparts are automatically selected depending on the current situation. Such a framework has the potential to greatly increase the usability and autonomy of robotic systems.

The task weight estimation according to Equation (7) relies on good-quality user demonstrations that reflect the given task constraints. If the user demonstrations do not cover the constraint space well, the resulting task weights might unnecessarily over- or under-constrain the task execution. Thus, manually tuned task priorities might still perform better in a specific context, but they do not generalize well over different contexts.

Furthermore, the approach does not scale well to many task frames, since the number of resulting constraints is equal to the number of possible combinations drawn from the set of task frames. The selection of task frames is a design choice by the user and requires expert knowledge on whether a frame is relevant or not. Eventually, the information on whether a task frame introduces redundant information about the task could be derived from the data acquired in user demonstrations. Redundant or irrelevant task frames could then be ignored when training the model.
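The growth in the number of constraints can be made concrete with a short sketch. It assumes that one constraint is formed for every unordered pair of task frames, which is consistent with the three constraints obtained from the three frames used in the experiments; the helper name `pairwise_constraints` and the ten-frame example are hypothetical.

```python
from itertools import combinations


def pairwise_constraints(task_frames):
    """One constraint per unordered pair of task frames."""
    return [f"{a}-{b}" for a, b in combinations(task_frames, 2)]


# The three task frames used in the experiments.
frames = ["Base", "Left EE", "Right EE"]
constraints = pairwise_constraints(frames)

# Hypothetical larger setup: n frames yield n * (n - 1) / 2 constraints.
many_frames = [f"Frame {i}" for i in range(10)]
n_constraints = len(pairwise_constraints(many_frames))
```

Under this assumption, the number of constraints grows quadratically with the number of frames: three frames give three constraints, whereas ten frames already give 45, each of them 6-dimensional.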

In this work, we decided to use categorical variables for representing the context to ease the labeling of the demonstrations for the user. Although Gaussians are usually not well suited to represent categorical variables, GMMs are able to fit the data quite well if the model parameters are suitably regularized. In the future, a different representation of the categorical variables, such as binomial distributions, could be chosen.

6 Conclusion

The combination of constraint-based control and imitation learning has great potential. While imitation learning offers an intuitive user interface to define new robot tasks, constraint-based task specification and control provides a powerful and flexible tool to compose complex robot behaviors. The seamless integration of both promises improvements in terms of usability, general applicability and autonomy of complex robotic systems with many dof. For example, it is straightforward to integrate expert knowledge by manually programming some constraints, while learning others that cannot be easily specified.

One shortcoming of our approach is that for each demonstration, the current context has to be labeled by the human expert. Thus, a logical next step would be to classify the current context from the recorded data and determine whether a demonstration belongs to a known or to an unknown context. Another issue is that the task frames have to be selected by the user in advance and, for a large number of task frames, the approach does not scale. Thus, it would also be useful to select optimal task frames from the user demonstrations, e.g., use frames that maximize the information gain. We plan to investigate both problems in the future. Moreover, we would like to apply our approach to more complex scenarios including different types of constraints (contact forces, obstacles, …) and more complex robots (e.g., humanoids). Finally, the estimated task weights from the model might not be optimal, since they strongly depend on the quality of user demonstrations. For example, the computed task weights might unnecessarily over-constrain the system, leaving fewer dof for additional tasks. Thus, we would like to add an optimization step that improves the task weights with respect to a suitable criterion, e.g., manipulability.


This work has been supported by a grant of the German Federal Ministry for Economic Affairs and Energy (BMWi, grant number 50RA1701).

References



  • L. Armesto, J. Bosga, V. Ivan, and S. Vijayakumar (2017) Efficient Learning of Constraints and Generic Null Space Policies. pp. 1520–1526. External Links: ISBN 9781509046331 Cited by: §2.
  • D. D. Bloisi, D. Nardi, F. Riccio, and F. Trapani (2016) Context in robotics and information fusion. In Context-Enhanced Information Fusion: Boosting Real-World Performance with Domain Knowledge, L. Snidaro, J. García, J. Llinas, and E. Blasch (Eds.), pp. 675–699. External Links: ISBN 978-3-319-28971-7, Document, Link Cited by: §1.
  • S. Calinon, F. Guenter, and A. Billard (2007) On learning, representing, and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37 (2), pp. 286–298. External Links: Document, ISSN 10834419 Cited by: §1, §4.3.
  • S. Calinon (2016) A tutorial on task-parameterized movement learning and retrieval. Intelligent Service Robotics 9 (1), pp. 1–29. External Links: Document, ISSN 18612784 Cited by: §2, §4.5.
  • M. Charbonneau, V. Modugno, F. Nori, G. Oriolo, D. Pucci, and S. Ivaldi (2018) Learning robust task priorities of QP-based whole-body torque-controllers. HAL Id: hal-01895146. Cited by: §2.
  • N. Dehio, R. F. Reinhart, and J. J. Steil (2015) Multiple task optimization with a mixture of controllers for motion generation. IEEE International Conference on Intelligent Robots and Systems 2015-Decem, pp. 6416–6421. External Links: Document, ISBN 9781479999941, ISSN 21530866 Cited by: §1, §2.
  • A. Dietrich, T. Wimboeck, A. Albu-Schaeffer, and G. Hirzinger (2012) Reactive Whole-Body Control: Dynamic Mobile Manipulation Using a Large Number of Actuated Degrees of Freedom. Robotics & Automation Magazine, IEEE 19 (June), pp. 20–33. External Links: Document, ISSN 1070-9932 Cited by: §1.
  • A. Fabisch and J. H. Metzen (2014) Active contextual policy search. Journal of Machine Learning Research 15, pp. 3371–3399. External Links: Document, ISSN 15337928 Cited by: §2.
  • Z. Fang, G. Bartels, and M. Beetz (2016) Learning models for constraint-based motion parameterization from interactive physics-based simulation. IEEE International Conference on Intelligent Robots and Systems 2016-Novem (288533), pp. 4005–4012. External Links: Document, ISBN 9781509037629, ISSN 21530866 Cited by: §2.
  • B. Finkemeyer, U. Thomas, and F. M. Wahl (2004) Compliant motion programming: the task frame formalism revisited. In Mechatronics & Robotics, Cited by: item 1.
  • F. Flacco, A. De Luca, and O. Khatib (2012) Prioritized multi-task motion control of redundant robots under hard joint constraints. IEEE International Conference on Intelligent Robots and Systems, pp. 3970–3977. External Links: Document, ISBN 9781467317375, ISSN 21530858 Cited by: §1.
  • R. Hartley, J. Trumpf, Y. Dai, and H. Li (2013) Rotation averaging. International Journal of Computer Vision 103 (3), pp. 267–305. External Links: Document, ISSN 09205691 Cited by: §4.2.
  • J. Kober and J. Peters (2010) Imitation and reinforcement learning. Robotics & Automation Magazine, IEEE 17, pp. 55 – 62. External Links: Document Cited by: §2.
  • A. Kupcsik, M. P. Deisenroth, J. Peters, A. P. Loh, P. Vadakkepat, and G. Neumann (2017) Model-based contextual policy search for data-efficient generalization of robot skills. Artificial Intelligence 247, pp. 415 – 439. Note: Special Issue on AI and Robotics External Links: ISSN 0004-3702, Document, Link Cited by: §2.
  • D. Leidner, A. Dietrich, M. Beetz, and A. Albu-Schäffer (2016) Knowledge-enabled parameterization of whole-body control strategies for compliant service robots. Autonomous Robots 40 (3), pp. 519–536. External Links: Document, ISBN 0929-5593, ISSN 15737527 Cited by: §2.
  • M. Liu, Y. Tan, and V. Padois (2016) Generalized hierarchical control. Autonomous Robots 40 (1), pp. 17–31. External Links: Document, ISSN 15737527 Cited by: §1.
  • R. Lober, V. Padois, and O. Sigaud (2016) Efficient reinforcement learning for humanoid whole-body control. IEEE-RAS International Conference on Humanoid Robots, pp. 684–689. External Links: Document, ISBN 9781509047185, ISSN 21640580 Cited by: §2.
  • K.M. Lynch and F.C. Park (2017) Modern robotics. Cambridge University Press. External Links: ISBN 9781107156302, LCCN 2017302004, Link Cited by: Table 1, §3, §4.2.
  • A. A. Maciejewski and C. A. Klein (1988) Numerical filtering for the operation of robotic manipulators through kinematically singular configurations. J. Field Robotics 5, pp. 527–552. Cited by: §3.
  • V. Modugno, U. Chervet, G. Oriolo, and S. Ivaldi (2016a) Learning soft task priorities for safe control of humanoid robots with constrained stochastic optimization. IEEE-RAS International Conference on Humanoid Robots, pp. 101–108. External Links: Document, ISBN 9781509047185, ISSN 21640580 Cited by: §2.
  • V. Modugno, G. Neumann, E. A. Rückert, G. Oriolo, J. Peters, and S. Ivaldi (2016b) Learning soft task priorities for control of redundant robots. 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 221–226. Cited by: §2.
  • D. Mronga, T. Knobloch, J. de Gea Fernández, and F. Kirchner (2020) A constraint-based approach for human–robot collision avoidance. Advanced Robotics 34 (5), pp. 265–281. External Links: Document, Link, https://doi.org/10.1080/01691864.2020.1721322 Cited by: §2.
  • R. M. Neal (1992) Bayesian mixture modeling. In Maximum Entropy and Bayesian Methods: Seattle, 1991, C. R. Smith, G. J. Erickson, and P. O. Neudorfer (Eds.), pp. 197–211. External Links: ISBN 978-94-017-2219-3, Document, Link Cited by: §1.
  • C. A. V. Perico, J. De Schutter, and E. Aertbelien (2019) Combining Imitation Learning with Constraint-Based Task Specification and Control. IEEE Robotics and Automation Letters 4 (2), pp. 1892–1899. External Links: Document, ISSN 23773766 Cited by: §2.
  • C. E. Rasmussen (2004) Gaussian processes in machine learning. In Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2 - 14, 2003, Tübingen, Germany, August 4 - 16, 2003, Revised Lectures, O. Bousquet, U. von Luxburg, and G. Rätsch (Eds.), pp. 63–71. External Links: ISBN 978-3-540-28650-9, Document, Link Cited by: §4.3.
  • L. Sentis and O. Khatib (2006) A whole-body control framework for humanoids operating in human environments. Proceedings - IEEE International Conference on Robotics and Automation 2006 (May), pp. 2641–2648. External Links: Document, ISBN 0780395069, ISSN 10504729 Cited by: §1, §3.
  • J. Silvério, S. Calinon, L. Rozo, and D. G. Caldwell (2017) Learning Competing Constraints and Task Priorities from Demonstrations of Bimanual Skills. pp. 1–14. External Links: 1707.06791, Link Cited by: §2.
  • R. Smits, T. D. Laet, K. Claes, H. Bruyninckx, and J. D. Schutter (2008) iTASC : a Tool for Multi-Sensor Integration in Robot Manipulation. Control 2. External Links: ISBN 9781424421442 Cited by: §1, §3.
  • F. Stulp and O. Sigaud (2015) Many regression algorithms, one unified model - A review. Neural Networks, pp. 28. External Links: Link Cited by: §4.3.
  • M. Tenorth, G. Bartels, and M. Beetz (2014) Knowledge-based Specification of Robot Motions. Proceedings of the European Conference on Artificial Intelligence (ECAI). Cited by: §2.
  • D. Wilbers, R. Lioutikov, and J. Peters (2017) Context-driven movement primitive adaptation. In IEEE International Conference on Robotics and Automation (ICRA), pp. 3469–3475. Cited by: §2.