With the advances in automation and robotics, robots are more frequently working in close proximity to humans. This can be seen in collaborative manufacturing, where humans and robots work together to assemble components, and in household and assistive robotics, where robots provide physical assistance to humans. Thus, there is a growing need for robots that can effectively and safely interact with humans in close proximity.
A key challenge in robot-human close-proximity interaction is the generation of robot trajectories that are safe, i.e. they do not physically harm the human, and are comfortable, i.e. the human is able to interpret and anticipate the robot’s behavior. Collaborative robotic systems can create safe trajectories with frequent monitoring and replanning [11, 3], but at the cost of efficiency. Anticipatory methods that use predictions of human motion can instead be used to generate safer trajectories using learned models of human motion.
Several different factors define the safety and comfort of a robot’s trajectory. While a trajectory may be safe for a nearby human, it might not be comfortable. The visibility of the robot’s end effector in the peripheral vision of the human can increase comfort. Inference of the robot’s intent through partial observation of its trajectory can also greatly increase comfort. Finally, sudden and unexpected robot behavior can be a major source of discomfort. Effective robot trajectories in human-robot collaboration must take all of these factors into account. Further, user experience factors are not necessarily complementary to trajectory efficiency, as previously shown for robot-human handover tasks. Thus, collaborative robot trajectory generation must effectively balance efficiency, safety, and comfort factors.
In this paper, we address the problem of adapting robot trajectories for human-robot collaborative environments with an overall goal of improving human safety and comfort, as well as increasing task efficiency. We define the default trajectory executed by the robot for a particular task as the nominal trajectory. Given some observation of human motion directly preceding execution of the nominal trajectory, we use a prediction of the human’s motion to adapt the nominal trajectory for improved safety and comfort.
Figure 1 shows an overview of our approach. We combine multiple objective functions to satisfy several factors of human comfort in addition to safety. We use time-sampled stochastic predictions of human motion to generate the objective functions, using the uncertainty of the prediction to generate appropriately conservative trajectories. We call our approach Collaborative Multi-Objective Trajectory Optimization (CoMOTO).
In order to evaluate CoMOTO, we define several metrics that incorporate key factors in safety and comfort of humans in collaborative environments. We perform experiments in three collaborative picking test cases, and compare the results against established baselines. Our results show that CoMOTO performs consistently well for all of our metrics across different close-proximity collaborative picking scenarios, while the baselines are able to perform well in only a particular metric or for only a specific test scenario.
II Related Works
Human-robot collaborative manipulation is a well-studied topic. Several works focus on reactive systems for safety in collaborative environments. Lasota et al. propose a reactive speed control system for collaborative robots. Their system monitors human pose to measure the separation distance between the human and the robot, scales the robot’s execution speed by the separation distance, and stops execution completely below a specified separation threshold. Dumonteil et al. propose a similar reactive approach for collaborative robots in industrial applications, using a state machine to reactively replan trajectories and avoid potential collisions. Reactive systems based on unadapted trajectories are inefficient at the task level due to repeated replanning, so we instead focus on using human motion prediction to generate trajectories that are safer from the outset.
Several prior works have utilized human motion prediction in order to proactively adapt robot trajectories. Mainprice and Berenson propose a prediction-based planning framework that generates swept volumes for collision detection based on human motion prediction using a Gaussian mixture model (GMM), and interleaves planning and execution to update the prediction. However, since their system selects an updated predicted human trajectory at each iteration, their framework requires constant replanning. Fishman et al. address the problem of coordinated human-robot collaboration, specifically in a handover task, using a joint optimal control model to simultaneously plan the robot’s behavior and predict the human’s behavior by inferring human goals. Stouraitis et al. develop a method that involves estimating a human partner’s policy to optimize trajectories for dyadic collaborative manipulation, where a human and robot work together to manipulate a single large object. Huang and Mutlu present an anticipatory control method based on inference from human gaze, highlighting the task efficiency benefits of using predictive and anticipatory planning methods. Maeda et al. use early human action recognition to initiate a corresponding robot response. Maeda et al. also present a Probabilistic Movement Primitive framework for learning a mixture model of human-robot interaction primitives, used to identify human tasks as well as to coordinate robot movement with the observed human movement. These works either focus on improving task efficiency through predicting human intent, or use human motion prediction to create safe trajectories. Our work builds on these ideas by also using human motion prediction to improve human comfort factors.
Our objective of combining several factors of safety and comfort for trajectory optimization is similar to Mainprice et al.’s work, which considers distance for safety, and visibility and reachability for comfort in robot handover tasks. However, their work assumes the human will remain still during the robot’s trajectory execution, and it does not consider the pose of the human’s arm. We extend collaborative multi-objective trajectory optimization to account for a moving human, with an articulated human model.
Human comfort factors beyond collision avoidance are important considerations in human-robot collaborative environments. Dragan et al. propose legibility and predictability as properties of robot motion as seen by a human observer [1, 2]. A legible motion is one from which an observer can quickly and confidently infer the motion’s goal after only partial observation, and a predictable motion is the most expected motion to reach a goal. Stulp et al. present legibility as a task-specific behavior that can be learned rather than a general characteristic of a trajectory. Medina et al. emphasize the importance of smoothness for robot-human handover trajectories. In order to account for multiple factors that affect human comfort, our work considers legibility, predictability (through efficient execution), and smoothness.
III Collaborative Trajectory Optimization
Our framework, CoMOTO, uses stochastic human motion prediction to calculate an objective function, composed of a set of costs relevant to close-proximity interaction, which is minimized using a trajectory optimization framework. Specifically, the trajectory generation pipeline begins with a brief 1-second observation period of the human’s motion, which is used to predict the remaining trajectory of the human (see Section V-A for details). The predicted trajectory is then used as input to calculate a set of costs, including separation distance, visibility, legibility, deviation from a nominal trajectory, and smoothness, that account for the stochastic nature of the prediction. The costs themselves are detailed in Section IV. We formulate an objective function using a weighted combination of the costs, which is then minimized to generate a robot trajectory using TrajOpt, although other trajectory optimization methods, such as [20, 9], and cost-based planning algorithms, such as [8], can be used instead. The generated trajectory is then executed concurrently with the remainder of the human’s motion.
Our work does not include a reactive safety system. While reactive methods are necessary for absolute collision prevention, they inherently reduce efficiency by requiring replanning and re-execution of trajectories. We instead address safety at the planning step itself. By leveraging human motion prediction with a separation distance cost to generate trajectories that are inherently safer, we reduce the intervention frequency of a reactive safety system, thereby increasing task efficiency.
IV Trajectory Adaptation Costs
Our objective function is split into several costs that cover different elements of safety and comfort in a collaborative environment. Each cost is a function of the time-parameterized robot joint trajectory and the predicted human motion at each timestep.
IV-A Distance Cost
Distance between the human and the robot is the most critical factor in safe collaborative manipulation. Thus, we formulate a cost that penalizes lower separation distances between the human and the robot. The cost is further scaled by the covariance of the prediction: higher covariance yields a higher cost, which produces more conservative trajectories when the predicted motion has higher uncertainty. The cost is computed from the mean and covariance of the predicted 3D position of each human joint at each timestep, together with the 3D position of each robot joint at the same timestep.
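The behavior described above can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, that each predicted covariance is diagonal and that the penalty is the inverse squared Mahalanobis distance between a robot joint and the predicted human joint. The function names and the `eps` guard are hypothetical, and this is not necessarily the exact formulation used by CoMOTO.

```python
from typing import Sequence

Vec3 = Sequence[float]


def inv_sq_mahalanobis(mu: Vec3, var: Vec3, x: Vec3, eps: float = 1e-9) -> float:
    """Inverse squared Mahalanobis distance from x to N(mu, diag(var)).

    Assumption: diagonal covariance given as per-axis variances.
    """
    d2 = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
    return 1.0 / (d2 + eps)  # eps avoids division by zero at exact overlap


def distance_cost(human_mu, human_var, robot_pos) -> float:
    """Sum the penalty over timesteps, human joints, and robot joints.

    human_mu[t][i]  : predicted mean position of human joint i at timestep t
    human_var[t][i] : per-axis variance of that prediction
    robot_pos[t][j] : position of robot joint j at timestep t
    """
    total = 0.0
    for mus, variances, xs in zip(human_mu, human_var, robot_pos):
        for mu, var in zip(mus, variances):
            for x in xs:
                total += inv_sq_mahalanobis(mu, var, x)
    return total
```

With this form, a larger predicted variance shrinks the Mahalanobis distance and so raises the cost, which pushes the optimizer toward wider clearance when the prediction is uncertain.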
IV-B Visibility Cost
During trajectory execution, visibility of the robot’s end effector is an essential factor for human comfort. If the robot is out of the field of view of the human, the human may be distracted and try to locate it, thus decreasing both human comfort and task efficiency. This reaction reflects a basic human instinct for safety around unpredictable moving objects. The visibility cost penalizes the end effector for being farther from the human’s gaze.
We define the visibility cost as the angle between the predicted human gaze and the line between the position of the robot end effector and the human’s head. We define the predicted human gaze as the line from the predicted position of the human head to the position of the object with which the human is interacting. The cost is scaled inversely to the variance of the prediction of the human head pose. It is therefore computed from the 3D position of the object with which the human is interacting, the mean and variance of the predicted 3D position of the human head, and the 3D end-effector position at each timestep.
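As a rough illustration of the description above, the sketch below sums, over timesteps, the angle between the gaze direction (head to object) and the head-to-end-effector direction, divided by the head-position variance. Treating the variance as a single positive scalar per timestep is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math


def angle_between(u, v) -> float:
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.acos(cos_angle)


def visibility_cost(head_mu, head_var, obj, ee_pos) -> float:
    """Variance-scaled gaze angle summed over timesteps.

    head_mu[t]  : predicted mean head position at timestep t
    head_var[t] : scalar variance of that prediction (assumed > 0)
    obj         : position of the object the human is interacting with
    ee_pos[t]   : robot end-effector position at timestep t
    """
    total = 0.0
    for mu, var, ee in zip(head_mu, head_var, ee_pos):
        gaze = [o - m for o, m in zip(obj, mu)]
        to_ee = [e - m for e, m in zip(ee, mu)]
        total += angle_between(gaze, to_ee) / var
    return total
```

An end effector that lies on the gaze line contributes zero cost, while one at a right angle to the gaze contributes π/2 divided by the variance at that timestep.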
IV-C Legibility Cost
The robot’s motion must be legible, that is, it must convey its intent through its trajectory. Dragan et al. define a legible robot trajectory as one from which the user can quickly and confidently infer the task goal after only partial trajectory execution. We implement a legibility cost in order to improve the human’s ability to understand the robot’s intent, replicating the legibility cost of Dragan et al.
The cost is evaluated on the partial trajectory from the start to each timestep, and depends on the robot’s start configuration, its goal configuration, and its configuration at that timestep. A weighting function places more weight on legibility towards the beginning of the trajectory. The optimal trajectory is a linear trajectory in Cartesian space, and the cost of a trajectory is its length in Cartesian space. Dragan et al. include a regularizer term in order to prevent excessively long trajectories, which we exclude from our cost as that requirement is met by the nominal trajectory cost.
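A discretized sketch of a Dragan-style legibility score may help make these ingredients concrete. It assumes a uniform prior over a small set of candidate goals, path length as the trajectory cost, straight-line distance as the optimal cost-to-go, and a weighting function f(t) = T − t. The function names are hypothetical, and the cost used in an optimizer would be some decreasing function of this score (for example, its negative).

```python
import math


def goal_probability(path, t, goals, target) -> float:
    """P(target goal | partial path up to index t), uniform prior over goals."""
    start, current = path[0], path[t]
    travelled = sum(math.dist(path[k], path[k + 1]) for k in range(t))
    scores = [
        math.exp(-travelled - math.dist(current, g)) / math.exp(-math.dist(start, g))
        for g in goals
    ]
    return scores[target] / sum(scores)


def legibility(path, goals, target) -> float:
    """Weighted average of the goal probability, emphasizing early timesteps."""
    last = len(path) - 1
    weights = [last - t for t in range(last + 1)]  # f(t) = T - t
    weighted = sum(
        w * goal_probability(path, t, goals, target) for t, w in enumerate(weights)
    )
    return weighted / sum(weights)


goals = [(1.0, 1.0), (1.0, -1.0)]
direct = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
exaggerated = [(0.0, 0.0), (0.3, 0.8), (1.0, 1.0)]
print(legibility(direct, goals, 0), legibility(exaggerated, goals, 0))
```

In this toy setup the path that bends away from the competing goal scores higher than the straight line, which is the exaggeration effect that legibility optimization tends to produce and the reason a regularizing term is needed.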
IV-D Nominal Trajectory Cost
The nominal trajectory is the default trajectory executed by the robot without any adaptation. This trajectory is calculated using a collision cost and a joint velocity cost in TrajOpt. The nominal trajectory can be viewed as one that optimizes smoothness, collision avoidance (with objects), and efficiency in the absence of a human. While the costs defined thus far focus solely on the human, our nominal trajectory cost brings balance to the overall cost function, and acts as a regularizer to preserve efficiency.
The nominal trajectory cost penalizes deviation from the nominal trajectory. The cost is calculated as the sum, over all timesteps, of the Cartesian distance between the end-effector position in the adapted trajectory and the end-effector position at the same timestep in the nominal trajectory.
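The deviation cost described here is simple enough to sketch directly. The version below assumes both trajectories are sampled at the same timesteps and given as lists of 3D end-effector positions; the function name is hypothetical.

```python
import math


def nominal_cost(adapted_ee, nominal_ee) -> float:
    """Sum of per-timestep Cartesian distances between two end-effector paths."""
    if len(adapted_ee) != len(nominal_ee):
        raise ValueError("trajectories must have the same number of timesteps")
    return sum(math.dist(a, n) for a, n in zip(adapted_ee, nominal_ee))
```

The cost is zero only when the adapted end-effector path coincides with the nominal one, so it acts as a pull back toward the efficient default motion.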
IV-E Smoothness Cost
Smooth robot motion is a necessary component for a comfortable collaborative environment. Several dynamical quantities can be minimized across the trajectory to generate smooth motion. Prior trajectory optimization frameworks such as CHOMP [20] use the sum of squared velocities of the robot as a smoothing cost. However, in order to better decrease the jerkiness of adapted trajectories and to even out speed across the execution of the trajectory, we use the sum of squared accelerations of the robot over the trajectory.
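One way to evaluate a squared-acceleration cost on a discretized joint trajectory is with central finite differences. The sketch below assumes uniformly spaced timesteps of length `dt` and joint-space accelerations; both the discretization and the function name are assumptions made for illustration.

```python
def smoothness_cost(joints, dt: float) -> float:
    """Sum of squared finite-difference joint accelerations.

    joints[t] is the list of joint angles at timestep t; dt is the timestep.
    """
    total = 0.0
    for prev, cur, nxt in zip(joints, joints[1:], joints[2:]):
        for p, c, n in zip(prev, cur, nxt):
            acc = (n - 2.0 * c + p) / (dt * dt)  # central second difference
            total += acc * acc
    return total
```

A constant-velocity trajectory has zero cost under this measure, whereas a squared-velocity cost would still penalize it, which is why an acceleration cost favors even speed rather than short paths.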
IV-F Cost Balancing
The final objective function is the weighted sum of all the above costs, where each cost has a pre-specified weight. The overall optimization problem is to find the robot joint trajectory that minimizes this objective subject to the trajectory ending at the desired goal location.
We note that the costs used in the objective function do not necessarily incentivize the same behavior. For instance, minimizing the distance cost will push the robot trajectory away from the human. On the other hand, the visibility cost will work to pull the trajectory closer to the human. Nevertheless, each cost function considers an important aspect of the interaction. Thus, it is necessary to carefully balance the relative influence of each cost function.
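The tension between costs can be seen in a deliberately simplified one-dimensional toy problem. Here the only decision variable is the robot's offset `d` from the human, the "distance" cost is taken to be 1/d, and the "visibility" cost is taken to grow linearly with d. These toy costs and all names are hypothetical stand-ins, not the costs defined in this section.

```python
def weighted_objective(costs: dict, weights: dict) -> float:
    """Weighted sum of named cost values."""
    return sum(weights[name] * value for name, value in costs.items())


def toy_costs(d: float) -> dict:
    """Toy costs: distance cost falls with offset d, visibility cost rises."""
    return {"distance": 1.0 / d, "visibility": d}


def best_offset(weights: dict, candidates) -> float:
    """Grid search for the offset that minimizes the weighted objective."""
    return min(candidates, key=lambda d: weighted_objective(toy_costs(d), weights))


candidates = [k / 10 for k in range(1, 31)]  # offsets 0.1 .. 3.0
print(best_offset({"distance": 1.0, "visibility": 1.0}, candidates))
print(best_offset({"distance": 4.0, "visibility": 1.0}, candidates))
```

Raising the distance weight moves the minimizer farther from the human, so in this toy the balance point is set entirely by the ratio of the weights.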
V Experiments and Results
We perform a series of experiments on three different test cases involving a human and a robot working in a collaborative environment in order to evaluate the performance of our approach. We define four key metrics to evaluate CoMOTO and compare our results against several baselines. The human motion predictions are generated in MATLAB. All trajectory optimization is performed using TrajOpt. The experiments are run on a KUKA LBR iiwa R820 robot in a ROS Gazebo simulation, a visualization of which is shown in Figure 2. The cost coefficients are chosen empirically.
We present three test cases involving close human-robot collaboration, categorized by the behavior of the human:
Stationary: Stationary human with robot reaching for an object.
Reaching-far: Human and robot reaching for distant objects.
Reaching-near: Human and robot reaching for closely-positioned objects.
For each test case, we pair a set of unique human trajectories with a set of unique nominal robot trajectories to form the experiments for that test case.
V-A Human Motion Prediction
We use the framework of [12] for stochastic human motion prediction. The provided code includes ground truth trajectories that are split into training and testing datasets. The trajectories are 3D positions of the human’s right arm recorded at 100 Hz. The GMM is trained on 100 trajectories of a human reaching for an object and 100 trajectories of a still human with arm stretched out.
Since the dataset only contains recorded trajectories for the right arm (shoulder, elbow, wrist and palm), the remaining human skeleton, consisting of the neck, head, torso and left arm, is extrapolated using fixed offsets. The same offsets are applied to the mean of the prediction of the right shoulder to generate the remaining predictions. The covariances for the remaining skeleton are identical to that of the right shoulder.
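The extrapolation step can be sketched as follows. The offset values below are placeholders chosen only for illustration (the text does not give the actual offsets), and the joint names and function name are likewise hypothetical.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical offsets in metres relative to the right shoulder (placeholders).
OFFSETS: Dict[str, Vec3] = {
    "neck": (-0.20, 0.0, 0.05),
    "head": (-0.20, 0.0, 0.25),
    "left_shoulder": (-0.40, 0.0, 0.0),
}


def extrapolate_skeleton(shoulder_mu: Vec3, shoulder_cov, offsets=OFFSETS):
    """Shift the right-shoulder mean by each fixed offset and reuse its covariance."""
    return {
        name: (tuple(m + o for m, o in zip(shoulder_mu, off)), shoulder_cov)
        for name, off in offsets.items()
    }
```

Because the covariance is copied unchanged, the extrapolated joints inherit exactly the uncertainty of the right-shoulder prediction at every timestep.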
For each experiment, the ground truth human trajectory is split into two. The first 100 samples (1 second) are used as the observation. The prediction is subsampled to 10 Hz by taking every 10th sample, and extrapolated to a length of 20 samples (2 seconds) by repeating the final prediction sample. The entire ground truth human trajectory is used for measuring the metrics for each test case.
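The subsampling and padding step can be sketched in a few lines. This version assumes that "every 10th sample" starts from the first predicted sample and that the repeated sample is the last one kept after subsampling; sequences already longer than the target length are returned unpadded. The function name is hypothetical.

```python
def prepare_prediction(pred_100hz, step: int = 10, length: int = 20):
    """Subsample a 100 Hz prediction to 10 Hz and pad it to `length` samples."""
    subsampled = list(pred_100hz[::step])
    if not subsampled:
        raise ValueError("prediction must contain at least one sample")
    missing = length - len(subsampled)
    if missing > 0:
        subsampled.extend([subsampled[-1]] * missing)  # hold the final sample
    return subsampled
```

For example, a 1.5-second prediction at 100 Hz (150 samples) yields 15 subsampled entries followed by 5 copies of the last one, which models the human holding their final predicted pose.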
V-B Baselines
We evaluate CoMOTO against the following baselines:
Nominal: the non-adapted nominal trajectory generated by TrajOpt using common costs and constraints, including collision, joint velocity, and joint target constraints. We include the nominal trajectory alone to show how our approach improves this trajectory’s performance with respect to the full set of metrics described in Section V-C.
Speed-Adjusted: the nominal trajectory executed with real-time speed adjustment based on human-robot separation distance, as described by Lasota et al. [11].
Legible: the legible motion optimization algorithm of Dragan et al. [2]. The baseline implementation uses a legibility cost identical to the one used in our approach. The optimal trajectory is again a linear trajectory in Cartesian space.
Distant+Visible: local path optimization using the method of Mainprice et al. [15], optimizing for costs based on human-robot separation distance and human visibility. To provide a direct multi-objective optimization comparison to our approach, we use their cost-based optimization to adapt the nominal trajectory, rather than a path generated by a T-RRT planner.
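To make the Speed-Adjusted baseline more concrete, the sketch below shows one possible speed-scaling rule: full stop at or below a stopping threshold, full speed beyond a slow-down distance, and a linear ramp in between. The 6 cm stopping threshold matches the value reported in the results, but the 0.5 m slow-down distance and the linear ramp are assumptions made here, not details taken from the baseline's implementation.

```python
def speed_scale(separation_m: float, stop_m: float = 0.06, full_speed_m: float = 0.5) -> float:
    """Fraction of nominal speed allowed at a given human-robot separation.

    stop_m       : separation at or below which the robot stops
    full_speed_m : hypothetical separation at or above which speed is unscaled
    """
    if separation_m <= stop_m:
        return 0.0
    if separation_m >= full_speed_m:
        return 1.0
    return (separation_m - stop_m) / (full_speed_m - stop_m)
```

Under such a rule, a goal located within the stopping threshold of the human can never be reached while the human stays in place, since the allowed speed drops to zero before the robot arrives.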
V-C Evaluation Metrics
Comparison plots between CoMOTO and each baseline with respect to all evaluation metrics (mean ± SD).
We measure the performance of each algorithm according to the following metrics:
Separation distance (Dst.): percentage of the trajectory where the separation distance between the robot and the human exceeds 20cm.
End effector visibility (Vis.): percentage of the trajectory where the robot’s end effector is within the human’s field of view. When calculating this metric, we assume the human is looking at the object for which they are currently reaching.
Legibility (Leg.): legibility of the robot’s end effector motion to the human observer, calculated as described by Dragan et al. [1].
Deviation from the nominal trajectory (Nom.): sum of squared distance between the adapted trajectory and the nominal trajectory.
All trajectories are evaluated using the complete ground truth trajectory. The human is assumed to maintain the final pose in their trajectory once their execution is complete.
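Two of these metrics can be sketched directly from their definitions. The version below assumes that separation is measured as the minimum distance between any robot point and any human point at each timestep, and that trajectories are sampled at matching timesteps; the function names are hypothetical.

```python
import math


def separation_metric(robot_pts, human_pts, threshold_m: float = 0.20) -> float:
    """Percentage of timesteps whose minimum robot-human distance exceeds the threshold.

    robot_pts[t] and human_pts[t] are lists of 3D points at timestep t.
    """
    if not robot_pts:
        raise ValueError("trajectory must contain at least one timestep")
    above = 0
    for robot_t, human_t in zip(robot_pts, human_pts):
        closest = min(math.dist(r, h) for r in robot_t for h in human_t)
        if closest > threshold_m:
            above += 1
    return 100.0 * above / len(robot_pts)


def deviation_metric(adapted_ee, nominal_ee) -> float:
    """Sum of squared end-effector distances between adapted and nominal paths."""
    return sum(math.dist(a, n) ** 2 for a, n in zip(adapted_ee, nominal_ee))
```

Higher values are better for the separation metric, while lower values are better for the deviation metric, which matters when the two are compared side by side.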
V-D Results and Discussion
We provide the performance of our method and the baselines for all of our metrics in Table I. Additionally, we ran a one-way analysis of variance (ANOVA) with correlated samples for each metric, to determine whether differences in our measurements were statistically significant. Where ANOVA showed a statistically significant effect of trajectory optimization approach on a metric, we conducted post hoc comparisons between the approaches with Tukey’s HSD test. For brevity, we show significant differences in Table I only for the best performing approach in each metric, where * or ** denotes that the approach performed significantly better than all other approaches. We also provide visual comparisons of our approach to each baseline, showing all metrics at once, in the radar plots of Figure 3. We break down the results individually for each test case below.
We first consider the scenario where the robot must pick an object with a stationary human present in the workspace, testing CoMOTO’s performance where human motion prediction may not be required. The results for this test case are shown in Table I and Figure 2(a). Since the motion prediction model was trained on stationary trajectories as well as moving trajectories, the predictions are observed to have minimal motion. With a stationary human, CoMOTO and the Distant+Visible baseline perform comparably on visibility, with no significant difference. However, consideration of the human’s arm in CoMOTO’s distance cost results in significantly better performance in separation distance compared to the other methods. The Legible baseline, with no other costs to minimize, performs exceptionally well in legibility, though CoMOTO still significantly outperforms the other baselines. The Speed-Adjusted baseline, though following the nominal trajectory, is not always able to complete execution of the trajectory, since the goal position of the robot may be within its stopping threshold of 6 cm.
We next consider the scenario where the human and robot are concurrently reaching for different objects. The results for this test case are shown in Table I and Figure 2(b). CoMOTO significantly improves the performance in separation distance, visibility and legibility over the Nominal baseline. The anticipatory nature of CoMOTO allows it to outperform the Distant+Visible baseline in both separation distance and visibility. Similar to the stationary case, while CoMOTO has lower legibility than the Legible baseline, it significantly outperforms the other baselines.
Finally, we consider the scenario where the human and robot are concurrently reaching for objects that are close together, providing a scenario where high separation distance cannot be maintained. As can be seen in Table I and Figure 2(c), all methods perform poorly in the distance metric. However, CoMOTO is able to perform significantly better in separation distance while still maintaining good performance in the other metrics. The reactive Speed-Adjusted baseline is often unable to complete the trajectory due to the close proximity, resulting in poor performance in nominal trajectory deviation.
Summary: Our experiments demonstrate that CoMOTO is a highly adaptable framework that can optimize trajectories to improve factors of safety, comfort, and efficiency across different operating scenarios. We note that, across all three scenarios, CoMOTO scores consistently better than all baselines both in terms of the distance metric and deviation from the nominal trajectory (see Table I). This observation suggests that our approach outperforms all the baselines in terms of maintaining a safe and comfortable distance from the human while simultaneously ensuring that the trajectory deviates minimally from the nominal trajectory. As one would expect, the Distant+Visible baseline results in slightly better visibility than CoMOTO only in the stationary scenario. In the other two scenarios, in which the human moves, CoMOTO outperforms all baselines, including Distant+Visible, in terms of the visibility metric. Finally, the Legible baseline consistently scores the highest in terms of legibility across all the scenarios. We also note, however, that CoMOTO achieves the second-highest legibility scores across all scenarios, while still performing well on the other competing metrics.
We have presented CoMOTO, a novel multi-objective trajectory adaptation framework for human-robot collaborative tasks based on stochastic human motion prediction. We have presented several metrics to measure the performance of adapted trajectories with regard to key factors of human safety and comfort, and have evaluated CoMOTO against established baselines. Analysis of the results of our experiments shows that CoMOTO performs comparably to or better than the established baselines over the full set of metrics, whereas the baselines show strong performance only on individual safety, comfort, or efficiency metrics, or only in single collaborative scenarios. Future work will explore the addition of a reactive replanning system that considers updated human motion predictions at regular time intervals to estimate collision probabilities and re-adapt robot trajectories.
[1] (2013) Legibility and predictability of robot motion. In Proceedings of Human-Robot Interaction.
[2] (2013) Generating legible motion.
[3] (2015) Reactive planning on a collaborative robot for industrial applications. In 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO), Vol. 2, pp. 450–457.
[4] (2019) Trajectory optimization for coordinated human-robot collaboration. arXiv preprint arXiv:1910.04339.
[5] (2017) Towards MRI-based autonomous robotic US acquisitions: a first feasibility study. IEEE Transactions on Medical Imaging 36 (2), pp. 538–548.
[6] (2015) Adaptive coordination strategies for human-robot handovers. In Robotics: Science and Systems, Vol. 11.
[7] (2016) Anticipatory robot control for efficient human-robot collaboration. In 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 83–90.
[8] (2008) Transition-based RRT for path planning in continuous cost spaces. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2145–2150.
[9] (2011) STOMP: stochastic trajectory optimization for motion planning. In IEEE International Conference on Robotics and Automation, pp. 4569–4574.
[10] (2017) A survey of methods for safe human-robot interaction. Foundations and Trends® in Robotics 5 (4), pp. 261–349.
[11] (2014) Toward safe close-proximity human-robot interaction with standard industrial robots. In IEEE International Conference on Automation Science and Engineering (CASE), pp. 339–344.
[12] (2018) Unsupervised early prediction of human reaching for human–robot collaboration in shared workspaces. Autonomous Robots 42 (3), pp. 631–648.
[13] (2016) Anticipative interaction primitives for human-robot collaboration. In AAAI Fall Symposium Series.
[14] (2013) Human-robot collaborative manipulation planning using early prediction of human motion. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 299–306.
[15] (2011) Planning human-aware motions using a sampling-based costmap planner. In IEEE International Conference on Robotics and Automation, pp. 5012–5017.
[16] (2016) A human-inspired controller for fluid human-robot handovers. In IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pp. 324–331.
[17] (2014) Motion planning with sequential convex optimization and convex collision checking. The International Journal of Robotics Research 33 (9), pp. 1251–1270.
[18] (2007) Spatial reasoning for human-robot interaction. In IEEE/RSJ International Conference on Intelligent Robots and Systems.
[19] (2018) Dyadic collaborative manipulation through hybrid trajectory optimization. In CoRL, pp. 869–878.
[20] (2013) CHOMP: covariant Hamiltonian optimization for motion planning. The International Journal of Robotics Research 32 (9-10), pp. 1164–1193.