Workspace monitoring and planning for safe mobile manipulation

by   Christian Frese, et al.

In order to enable physical human-robot interaction where humans and (mobile) manipulators share their workspace and work together, robots have to be equipped with important capabilities to guarantee human safety. The robots have to recognize possible collisions with the human co-worker and react anticipatorily by adapting their motion to avert dangerous situations while they are executing their task. Therefore, methods have been developed that allow to monitor the workspace of mobile manipulators using multiple depth sensors to gather information about the robot environment. This encompasses both 3D information about obstacles in the close robot surroundings and the prediction of obstacle motions in the entire monitored space. Based on this information, a collision-free robot motion is planned and during the execution the robot continuously reacts to unforeseen dangerous situations by adapting its planned motion, slowing down or stopping. For the demonstration of a manufacturing scenario, the developed methods have been implemented on a prototypical mobile manipulator. The algorithms handle both robot platform and manipulator in a uniform manner so that an overall optimization of the path and of the collision avoidance behavior is possible. By integrating the monitoring, planning, and interaction control components, the task of grasping, placing and delivering objects to humans in a shared workspace is demonstrated.



There are no comments yet.


page 1

page 2

page 5

page 8

page 11

page 13

page 18

page 20


Contact Anticipation for Physical Human-Robot Interaction with Robotic Manipulators using Onboard Proximity Sensors

In this paper, we present a framework that unites obstacle avoidance and...

Exploiting Ergonomic Priors in Human-to-Robot Task Transfer

In recent years, there has been a booming shift in the development of ve...

Utilizing Bluetooth Low Energy to recognize proximity, touch and humans

Interacting with humans is one of the main challenges for mobile robots ...

Human-centered manipulation and navigation with Robot DE NIRO

Social assistance robots in health and elderly care have the potential t...

KittingBot: A Mobile Manipulation Robot for Collaborative Kitting in Automotive Logistics

Individualized manufacturing of cars requires kitting: the collection of...

A Multi-Behavior Planning Framework for Robot Guide

The guiding task of a mobile robot requires not only human-aware navigat...

A Scalable Framework For Real-Time Multi-Robot, Multi-Human Collision Avoidance

Robust motion planning is a well-studied problem in the robotics literat...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Workspace monitoring

Monitoring the robot’s environment by sensors in order to detect humans and other static or dynamic objects in the robot workspace is a prerequisite for shared human–robot workspaces and for a close collaboration between human and robot. The acquired 3D information about obstacles in the robot’s environment is needed by motion control and planning algorithms for collision avoidance. Due to the weakly structured human–robot workspace also information about the objects that have to be handled by the robot and the collaborating humans have to be captured by the robot.

With respect to collision avoidance for fixed-base and mobile manipulators, the information about obstacles should be available in 3D and in a preferably large space around robot in order to be aware of possible dangerous situations at an early stage. On the other hand, the sensor data processing has to be real-time capable due to safety reasons. To cope with these contrary requirements two approaches are combined:

  1. A 3D obstacle representation is computed in the close robot surroundings only [8]. This 3D obstacle model describes the current situation.

  2. In the complete robot workspace, dynamic obstacles are tracked and their motion is predicted in order to predict possible future collisions [10]. Obstacle tracking can be performed in a D grid representation, which enables multi-sensor fusion with real-time performance.

1.1 Obstacle perception using a 3D environment representation

For gathering 3D information, depth sensors with different measurement principles are available: time-of-flight sensors (e.g., laser scanner, ToF camera), triangulation based sensors (e.g., Kinect for Xbox 360), or stereo cameras. Several existing approaches concentrate on the usage of one type of depth sensor (e.g., [9]). But all these sensors can be described by the well-known ray based depth sensor model with a set of rays beginning in the sensor origin. The sensor measures the distances to the points where the rays hit the first object. The measurements of all these different depth sensors can be transformed into point clouds, so that the further data processing becomes independent of the used measurement principle. Exploiting this fact, the developed monitoring algorithm becomes generally applicable to different types of depth sensors.

The sensors can be installed in the workspace or mounted on the robot. Many monitoring approaches deal with sensors fixed in the workspace [21, 26]. But especially in the case of mobile robots, onboard sensors allow to cover the relevant part of large workspaces with only few sensors. So, the developed monitoring algorithm is designed for handling data from different kinds of depth sensors that may be both installed in the workspace and on the robot.

To ensure safe human–robot interaction, the 3D obstacle representation has to consider occlusions of the robot and of the obstacles. This is particularly important when an object is located between the sensor and the robot (see Fig. 1). Then, considering only the measured obstacle points would lead to an underestimate of the robot–obstacle distance and of the collision risk. In order to reduce these occlusions and to enhance the overall observed space, information from multiple depth sensors is fused.

Figure 1: Example of occlusion when an obstacle is located between sensor and robot.

The 3D obstacle model is based on an octree data structure. An octree represents the 3D space by a set of nodes. Each node corresponds to a voxel. If necessary, the node is recursively subdivided into eight sub-voxels (nodes), till the minimum voxel size is reached [12, 13]. The octree is located in the robot’s base frame and is limited to the close robot surroundings that have to be monitored in 3D.

Processing the data of multiple sensors is performed in two steps: Firstly, the data of each sensor is pre-processed independently, in order to allow for parallel computing. Then in the second step, the information is fused to obtain the final obstacle representation.

During the pre-processing procedure, the sensor data is filtered to distinguish between measurements representing obstacles and measurements representing the robot itself, which might be in the field of view of the sensor [2]. The filter relies on a geometric robot model. The filtered sensor data is then converted into an octree-based workspace model of the sensor at the current time : The obstacle measurements are used to compute the set of octree nodes that contain an obstacle point . By means of ray tracing, all nodes that are occupied or occluded by an obstacle and all nodes that are occupied or occluded by the robot are computed. Based on the sensor properties, also the set of nodes in the sensor field of view is known. For this reason, the space that can be assumed to be free due to the measurements acquired by sensor is given by

Figure 2: 3D workspace monitoring principle with two sensors: fusion of information about occupied and occluded space results in the final obstacle representation .

The information from all available sensors is fused by comparing the obstacle space of one sensor with the free space of all other sensors:


That means contains only the occluded nodes that are not detected as free by another sensor and the occluded space is reduced as far as the sensor arrangement permits. In the final obstacle representation, the obstacle nodes of all sensors are merged:


The 3D monitoring principle is illustrated in Fig. 2 showing an example of two depth sensors. The sensors’ fields of view contain the known robot (blue object) and an obstacle (red object). For each sensor, the two upper drawings show the nodes occupied or occluded by the obstacle (red), the nodes occupied or occluded by the robot (blue), and the free nodes (green). The red cells in the lower picture represent the resulting obstacle nodes in the final octree. The occluded space is reduced considerably thanks to the multi-sensor fusion.

Fig. 3 shows an example of a 3D obstacle representation obtained from real sensor data on a mobile manipulator. For better understanding, the occupied cells are visualized as red cubes and the occluded cells as rose cubes. The obstacle represents a human standing next to the robot. The human’s arm is located between the manipulator and two sensors that are mounted on the platform (see Fig. 10(a)) so that parts of the human in the vicinity of the robot are not visible to the sensors. Considering the occluded space prevents an underestimate of the robot–obstacle distance.

1.2 Obstacle tracking and prediction

The various object tracking methods presented in the literature can be categorized, among other criteria, in 2D and 3D approaches, and in methods for general objects or specific ones, e.g., vehicles or pedestrians [16, 18, 22].

In the context of robot workspace monitoring, it is important to reliably detect moving objects of any kind: humans, trolleys, forklifts, etc. Moreover, multiple sensors are usually required to achieve the desired spatial coverage. These can be homogeneous sensors with complementary fields of view, as usually employed in the literature [4], or heterogeneous sensors with considerably different range, resolution, or even sensing principle. In the latter case, particular attention must be paid to the association step in order to achieve the desired robustness for objects moving from one sensor’s field of view to a different sensor’s field of view.

To enable safe human–robot collaboration, an obstacle tracking approach has been developed which is based on a generic object model and supports multiple heterogeneous sensors, e.g., 2D linescan lasers, 3D laserscanners, and depth cameras. For multi-sensor fusion, the acquired data is mapped into a D grid representation, in which both 2D and 3D sensor data can be integrated [10]

. Indoor environments allow a rather simple obstacle detection: after robot point filtering and possibly outlier detection, any measured 3D point between floor and ceiling can be classified as obstacle point and inserted into the grid. Each obstacle grid cell is annotated with features such as the density of the obstacle points detected by each sensor and the height above ground of the obstacle, if available.

Object hypotheses are obtained by clustering connected components of adjacent obstacle cells in the grid. They are associated to existing tracks by means of a distance function

which rates the spatial distance between the estimated centroid position of

and the predicted position of for the time of the measurement as well as the dissimilarity of the computed features. For the latter, the heterogeneity of the sensors has to be taken into account. For instance, the point densities are compared per sensor, and the height above ground is included in the distance function only if both and have been measured by 3D sensors. This approach allows to improve the association quality by incorporating all relevant features available from the sensor data and at the same time to continuously track objects moving between the fields of view of multiple, possibly heterogeneous sensors (Fig. 3).

Kalman filtering based on a linear motion model is applied to estimate the state vector of each obstacle which contains its position and velocity projected onto the ground plane. Based on the estimated velocity and the state covariance, moving obstacles can be detected and their future trajectories can be predicted. The area that is likely to be occupied in the future by the moving obstacle is enlarged depending on the uncertainties estimated by the Kalman filter.

Figure 3: Examples of monitoring results: LABEL: 3D obstacle representation with occupied obstacle cells (red), occluded obstacle cells (rose), and distances between the obstacle and the robot (yellow); LABEL: continuous tracking of a walking person using heterogeneous 2D and 3D sensors.

1.3 Collision avoidance

The information about obstacles detected by the monitoring algorithms is used for a basic collision avoidance strategy. Based on the known geometric robot model, the current minimum distance between the robot and the 3D obstacle representation (see Section 1.1) is computed. The future minimum distance between the robot and the dynamic obstacles within a certain prediction interval is estimated taking the planned robot motion and the predicted obstacle motion from Section 1.2 into account. According to the minimum robot–obstacle distance the robot slows down or even stops. In the vicinity of humans this behavior is necessary to ensure safety but also to enhance the comfort level in human–robot cooperation. Additionally to this basic collision avoidance strategy, safety and the task execution velocity may be improved by collision-free motion planning as discussed in Section 2.

So far, all objects except the robot itself are interpreted as obstacles. But in typical applications as, e.g., pick and place tasks, the classification of what is an obstacle or another kind of object or even which measurement is belonging to the robot may change over time. Generally, all objects have to be considered as obstacles to avoid undesired collisions. But when one object that was an obstacle till now has to be grasped, it should no longer be interpreted as obstacle in order to allow the robot to move close to the object and even to touch it. When the object is grasped by the robot, it has to be (temporary) interpreted as part of the robot, as not only collisions between the robot and its environment but also collisions between the grasped object and the environment have to be avoided. Therefore, the robot models used for filtering the sensor data and for distance computation are adapted at runtime by adding and removing objects that change their purpose (obstacle, object to be handled, robot part, etc.) depending on the task execution state.

2 Path planning and adaptation

Robot path planning provides the flexibility required for task accomplishment in dynamic workspaces shared with humans. For typical tasks like pick, place, and handover, the goals are specified as Cartesian end-effector poses. Because the pick/place goal positions are not known beforehand and the scene may change at any time due to dynamic obstacles, this is a typical single-query planning scenario. Moreover, obstacle motions may invalidate the planned path during its execution and require an appropriate reaction, e.g., stopping the robot or adapting the path online.

2.1 Path planning to Cartesian goal poses

The computational complexity of path planning depends on the degrees of freedom (DoF) of the robot. A mobile platform has 3 degrees of freedom: planar translation in 2 coordinate directions and planar rotation. For these robots, it is feasible to compute optimal paths in a discretized configuration space, using, e.g., A* graph search

[5]. Another possibility is the construction of a state time lattice based on precomputed motion primitives. These search-based planners allow to consider dynamic obstacles, non-holonomic kinematics, and sequences of multiple goal points [19].

For robots with more degrees of freedom, optimal planning is no longer feasible under online requirements. Robot arms usually have at least 6 DoF, and mobile manipulators have about 10 DoF. For these robots, sampling-based planning algorithms can be applied [15].

The usual approach to solve such path planning problems consists of the following two stages:

  1. Computation of an inverse kinematics solution, i.e., a collision-free joint configuration in which the robot end-effector is located at the desired Cartesian goal pose.

  2. Sampling-based path planning from start to goal in configuration space.

This approach is well suited for 6 DoF industrial robots which have a unique inverse kinematics in most cases. For redundant robots having more degrees of freedom than the 6-dimensional Cartesian workspace pose, an infinite number of configurations corresponds to the same end-effector pose. In this case, the two-stage approach is inherently suboptimal: as the goal configuration is chosen disregarding the start position, the resulting paths may become considerably longer than necessary. Fig. 4 illustrates this effect by means of a 10 DoF mobile manipulator (7 DoF lightweight arm plus 3 DoF for position and orientation of the mobile platform in the ground plane). Goal configuration 1 (depicted orange) is much closer to the start configuration (green) than goal configuration 2 (red). The chosen goal configuration might even be unreachable from the start position depending on the obstacle constellation and the resulting configuration space connectivity.

The suboptimal robot behavior resulting from the two-stage planner may also be unexpected for humans and may thus result in dangerous situations, e.g., when the mobile platform starts to rotate in front of the table even if a motion of the manipulator would be sufficient to accomplish the task.

Figure 4: Different goal configurations corresponding to the same Cartesian end-effector goal pose may require paths of considerably different lengths from the start configuration.

Therefore, single-stage planning methods have been proposed in order to exploit the optimization potential of redundant robot kinematics [1, 23, 24, 25]. By considering the start configuration and the lengths of possible collision-free paths, the goal configuration can be optimized to achieve shorter path lengths and reduced task cycle times.

For instance, the well-known rapidly-exploring random tree algorithm [15] can be modified to accept a Cartesian goal pose instead of a goal configuration . The distance between an arbitrary configuration sampled from the configuration space and the goal configuration is quantified by means of computing the direct kinematics of the robot and applying a distance measure in the 6-DoF Cartesian space (3D position plus 3 DoF for orientation in space),


The expansion of the RRT tree proceeds as usual by sampling collision-free configurations and connecting them to their nearest neighbors in the tree using linear paths in the configuration space. However, it is not possible to directly connect the goal to a vertex of the tree as in the standard RRT algorithm because the goal configuration is not known beforehand. Instead, so-called approach attempts are initiated starting from tree vertices which are close to the goal pose according to (4). An approach is a sequence of vertices connecting the tree vertex to the goal corresponding to a linear path of the end-effector in the Cartesian space (cf. Fig. 5). The configurations of the approach vertices are computed from the Cartesian end-effector poses by a gradient descent using the Jacobian of the robot kinematics subject to constraints such as joint limits [7]. If an approach is successful, it leads to a goal configuration that satisfies and is connected to the tree. If a collision occurs in one of the approach configurations, the last collision-free configuration is added to the tree.

Figure 5: Visualization of goal approach attempts and of collision-free RRT vertices found during exploration. Cartesian end-effector positions of the vertex configurations are shown.

An evaluation based on the simulation of a sequence of pick, place, and handover tasks for the mobile manipulator has shown that the described single-stage planner can find shorter paths compared to a two-stage method without increasing the overall computation time [23]. Thus it is possible to exploit the kinematic redundancy of the robot to choose an appropriate collision-free goal configuration from the infinite manifold of inverse kinematics solutions and at the same time to reduce the path length.

2.2 Path smoothing

The first collision-free path obtained by connecting randomly sampled configurations is usually far from the optimal solution and may contain large detours. Two methods to alleviate the suboptimality are

  1. path optimization within the sampling-based planner by considering multiple alternative connections for newly sampled configurations and by continuing the path improvement for a certain time after the first solution has been found—as performed, e.g., in the RRT* algorithm [14, 23], and

  2. heuristic path smoothing as a post-processing step.

Heuristic post-processing of the path can be carried out rather fast and may be beneficial even in combination with optimizing sampling-based planners such as RRT*. The path computed by a sampling-based planner is usually represented by a sequence of vertices corresponding to configurations of the robot which are connected by collision-free linear paths in the configuration space. A simple but effective smoothing technique is to remove unnecessary vertices from the sequence. Thanks to the triangle inequality, removal of one or more vertices is ensured to decrease the overall path length (Fig. 6). Of course, it has to be verified that the resulting path is collision-free. As the collision checking is the main computational cost of path smoothing, several heuristic strategies have been proposed in order to minimize the path segments to be checked, including

  • iteratively removing a single vertex by directly connecting its predecessor and its successor ,

  • binary interval search to find the largest subsequence of vertices that can be removed, and

  • attempting to directly connect two randomly chosen, non-adjacent vertices , , , of the path.

Another smoothing technique does not remove or add vertices, but modifies the configuration of a certain joint in a path vertex. A joint showing a local extremum at vertex with respect to the preceding value and the succeeding value of the joint is set to the intermediate value corresponding to a straight-line motion from to in configuration space—provided that the resulting path segment is collision-free (Fig. 6). In the special case , joint does not need to move at all. As sampling-based planners choose all joint configurations at random, it is very unlikely that identical values occur in adjacent vertices. So unnecessary motions result which suggest to a human observer the impression of a purposeless robot behavior. For instance, a mobile manipulator may move its arm back and forth while driving, although it would suffice to move the mobile platform. Such apparently erratic behavior can be substantially reduced by the described path smoothing technique.

Figure 6: Path smoothing techniques: LABEL: removing the unneeded vertex , LABEL: changing extremum values of individual joints (note that , , and do not generally constitute a straight line if more than two dimensions are considered), LABEL: adding the connection from to in order to avoid the detour via .

Moreover, the joint motions can be interleaved as much as possible in order to reduce the total path execution time. For each path segment, the joint requiring the longest motion time is identified, given the maximum attainable joint velocities. Then it is attempted to shift part of the joint motion to the adjacent path segments if this reduces the required motion time.

In some cases, it can also be beneficial to add new vertices to the path. For adjacent segments of a certain minimum length, it is tested whether the connection of their centers is collision-free (Fig. 6). This procedure is applied recursively. The modified path is again guaranteed to be shorter as the triangle inequality applied to yields .

2.3 Path adaptation by elastic bands

In dynamic environments, the planned path may be invalidated during its execution due to obstacles moving unexpectedly into the robot’s path. In shared human–robot workspaces, humans may step or grasp into the robot’s path at any time. Besides stopping the robot before hitting obstacles as described in Section 1.3, the robot may also adapt its plan and continue to move towards its goal. The following principles for plan adaptation can be distinguished:

  1. Discarding the invalidated plan and replanning from the current robot state to the goal. Using efficient search-based planners, replanning is possible in real-time for low-dimensional configuration spaces, e.g., for mobile robots [19]. The advantage of the replanning principle is that the optimal solution given the current robot state and the current obstacle constellation can be found. Disadvantages include potential oscillations between alternative paths having similar costs and a robot behavior which may be unexpected to humans due to sudden plan changes.

  2. Iteratively adapting the planned path to environment changes. Such methods may have lower computational complexity so that real-time path adaptation is feasible for higher-dimensional configuration spaces, e.g., for mobile manipulators [28]. Another advantage is that the incremental path adaptation creates a behavior which is more predictable for humans. On the other hand, the adapted plans may be suboptimal or the plan adaptation method may fail to find a valid path even if a collision-free solution exists. For instance, these methods usually do not allow the robot to pass a moving obstacle on the opposite side than planned originally.

A method realizing the second principle is the elastic band framework [20]. It has mostly been applied to mobile robots in a 2D environment, but can also handle fixed-base and mobile manipulators. The adapted motion plan is always ensured to be collision-free so that human safety can be guaranteed. The robot follows a smooth path which is adapted smoothly to changes in the environment without unexpected changes of motion direction that could scare human co-workers.

The method has to be initialized with a valid path which can be obtained as described in Section 2.1. This initial path is then smoothed and adapted to dynamic obstacles observed by the robot’s sensors.

The elastic band representation of a path consists of a sequence of so-called bubbles which are robot configurations annotated with a quantification of the free space at these configurations, i.e., the distance to the nearest obstacle. Adjacent bubbles always overlap, so that not only the sequence of configurations, but the whole connecting path is ensured to be collision-free (see Fig. 7). In narrow workspace regions, the bubbles are smaller and thus a greater number of them is required to cover the path.

Figure 7: Path adaptation for a mobile manipulator using elastic bands.

Artificial forces are computed which push the elastic band towards a smooth curve and repulse it from obstacles. Internal forces model a mutual attraction of neighboring bubbles in order to smooth and shorten the path. For the computation of external forces, the information of the distance and direction to the nearest obstacle is used. For each joint, the algorithm tests whether a motion in any direction increases the distance to the nearest obstacle and chooses the repulsing force accordingly. In this way, the obstacle information is transferred from the 3D workspace to the higher-dimensional configuration space of the robot. It is beneficial to compute the obstacle distances separately for the individual links of the robot, at least for platform and manipulator. Otherwise the motion of the upper manipulator joints has no effect on the computed obstacle distance in many cases, so that no valid force can be obtained.

In more detail, a bubble around a joint configuration can be characterized as follows. If the th revolute joint rotates by , any part of the robot may move at most in Cartesian workspace, where is the radius of a cylinder which is aligned to the rotation axis of the joint and contains all subsequent robot links in their considered configuration [3]. For a planar translation of a mobile robot by , the motion distance of any robot part is simply given by the length of the translation, . Altogether, a bubble can be defined as


where denotes a lower bound of the clearance at configuration , i.e., of the minimum Cartesian distance between any part of the robot and any obstacle in the environment [20]. Because bubbles are convex, two bubbles and are guaranteed to overlap if any configuration along the straight line from to in configuration space is contained in both bubbles according to (5). A good candidate for checking this condition is the configuration


If adjacent bubbles do not overlap, a new intermediate bubble is inserted into the elastic band. On the other hand, a bubble can be removed from the sequence if and overlap.

The internal contraction force


attracts bubble to the adjacent configurations and within the elastic band so that detours are avoided and a smooth motion results [20].

The obstacle force pushes each bubble away from the nearest obstacle. To compute the obstacle force, not only the distance bound , but also the locations of the nearest obstacle points and the corresponding robot points for each robot link are retrieved from the collision test and distance computation library. The obstacle repulsion force for the translational motion of a mobile robot platform can be directly obtained from these values as . Therein, the scaling factor


increases the repulsion for closer obstacles, while obstacles with a distance greater than from the robot no longer cause a repulsion. For each revolute joint , it is tested whether a small motion increases or decreases the robot–obstacle distance , where the robot link position corresponding to the modified configuration is computed by means of the kinematic robot model, while remains fixed. The force for the revolute joint is then given by


This method of force computation has the advantage that one distance computation is sufficient per iteration and bubble. An alternative method is to call the distance computation library for each configuration modification , which may yield more accurate results if multiple obstacles are close to the robot at the cost of a considerably increased computational effort.

The total force to be applied to each bubble is given by a weighted sum of internal and obstacle forces, . In each iteration, the bubble configuration is modified according to , where is chosen to satisfy in order to ensure that the modified configuration is located inside the bubble and is thus collision-free. Additionally, force components tangential to the elastic band can be suppressed, which helps to avoid oscillations [20].

Thanks to the obstacle force, the clearance of the path is increased to an appropriate minimum robot–obstacle distance. In contrast, sampling-based planners usually do not maximize the clearance, but compute collision-free paths which may get arbitrary close to obstacles.

Figure 8: Evasive motion of both mobile platform and manipulator computed by the elastic band method.

By applying the artificial forces, the elastic band adapts to dynamic obstacles which are detected by the sensors of the robot and represented in the octree model from Section 1.1. Smooth evasive motions result which are intuitively understandable to humans. Figure 8 shows an experiment in which the mobile manipulator avoids a human crossing its path. Both mobile platform and manipulator perform a simultaneous, coordinated evasive motion.

If the path is blocked by an obstacle while the robot is moving, this is detected by the elastic band method as the corresponding bubble configurations are no longer collision-free. The robot may then continue to move as long as a sufficient clearance is guaranteed by the bubbles and come to a safe stop before it hits the obstacle. Meanwhile, the elastic band optimization continues so that a collision-free path may be recovered if either a suitable detour is found thanks to the obstacle force or the obstacle is removed from the path.

3 Physical interaction with humans and objects

In shared human–robot workspaces, the robot physically interacts with a dynamic environment. This encompasses both interaction with objects (tools, parts, etc.) and interaction with human co-workers.

3.1 Grasping objects

An important example of interaction with objects is grasping. When an object is to be grasped, its position and shape are often not known exactly, for instance, because humans may have repositioned the object. Therefore, the robot has to be equipped with appropriate sensors to estimate the object’s position and shape. For instance, a depth camera may be mounted near the gripper of the robot, as shown in Fig. 9.

Figure 9: Grasping box-shaped objects: LABEL: depth camera mounted near the robot hand, LABEL: acquired 3D point cloud (height encoded in false colors) and estimated bounding box models of the objects located on the planar surface, LABEL: positioning the robot hand exactly above the object, with fingers oriented according to the chosen grasping points.

In the following, a possible processing pipeline for the depth camera data is sketched. A hand–eye calibration is necessary to determine the pose of the camera relative to the tool center point of the robot [11]. Based on the acquired 3D point cloud, planar surfaces like tables are estimated and objects located on top of the surface are segmented. Bounding box models of the objects are estimated (see Fig. 9). Considering the estimated model of the considered object and the limitations imposed by the geometry of the hand, suitable grasping points are selected.

To grasp an object, the robot arm first moves to a position above the table so that it can observe the region of interest in which the object is supposed to be located. Once the object is detected, the hand is positioned exactly above the object and oriented according to the computed grasping points (see Fig. 9). Then, the robot can move its hand down towards the object without the risk of touching the object accidentally with its fingers. Finally, the fingers are closed to grasp the object.

The grasping behavior control can be implemented by means of a hierarchical state machine. Appropriate reactions to unexpected events and deviations from the planned workflow, as well as other tasks such as placing objects on workbenches or on the mobile platform can also be integrated in this state machine.

3.2 Physical human–robot interaction

Interaction with humans occurs, for example, when the robot hands over an object to a human (see Fig. 10(b)). The robot hand releases the object as soon as it detects the force in the robot joints resulting from the human grasping the object. In this context, it is important to distinguish between intended and unintended interactions, e.g., based on the current interaction situation known from the state machine representation. For example, objects handed over to a human again have to be considered as obstacles by the monitoring algorithms.

Intended contacts can furthermore constitute a means to teach the robot or to reposition it. Contact control strategies allow a fast reaction to imminent or detected physical contacts. For example, the kinematic redundancy can be exploited to control the robot joints in a way that allows to reposition the arm in reaction to the contact while maintaining the Cartesian end-effector pose [27].

4 Prototypes and use cases

4.1 Mobile manipulator

The described monitoring, planning and interaction methods have been integrated on a mobile manipulator and their functionality is demonstrated in a typical use case.

The mobile manipulator consists of the omni-directional mobile platform KUKA OmniRob and the lightweight arm KUKA LWR4. Overall, the mobile manipulator has 10 DoF. For grasping tasks, the robot is equipped with a 3-fingered BarrettHand and a PMD CamBoard nano depth camera. The depth camera is mounted close to the tool center point of the manipulator so that it can observe the object to be grasped.

(a) Mobile manipulator with sensor setup.
(b) Object hand-over with visualization of the 3D obstacle representation. (c) Co-worker crossing the robot path with visualization of the obstacle motion prediction.
Figure 10: Demonstrator setup and snap-shots of the use case.

For monitoring, onboard sensors are installed on the robot in order to cover the large workspace of the mobile manipulator with only few sensors (see Fig. 10(a)). Two 2D laser scanners (SICK S300) with a field of view are mounted at two opposite corners of the platform and monitor a plane around the platform. They detect, e.g., legs of humans next to the platform and objects standing on the floor. Two depth cameras (Microsoft Kinect for Xbox 360) observe the 3D surroundings of the robot arm. They detect, e.g., body and arms of interacting humans. Their placement is chosen in order to achieve a high sensing volume and small occlusions and is determined based on a 3D simulation of the covered space. The extrinsic sensor calibration is performed by registering the acquired 3D point cloud of the manipulator to the robot model, as described in [10].

In the considered use case, the mobile manipulator works in a shared human–robot workspace in an industrial environment. The robot’s main task is to transport parts as for example boxes containing screws or tools. The robot has to pick up these parts from a workbench. Their approximate position on the workbench is known, but the exact position and orientation may vary due to the humans interacting with the same parts. Therefore, the objects are detected and localized by means of the grasping methods presented in Section 3.1. The robot transports the parts to a second workbench, where it places the parts, or it hands the parts over to a human worker (see Section 3.2). Meanwhile, other humans may work at the same workbenches or walk around in the workspace so that they cross the robot path.

Due to the dynamic environment, the robot paths to grasping, placing, and hand-over poses are planned online as described in Section 2.1. The planning takes the human co-workers, the furnishings in the workspace and further objects (e.g., on the workbenches) into account. They are detected by the monitoring algorithms (see Section 1) that deliver both a 3D-representation of the obstacles in the close robot surroundings and a prediction of the obstacle motions (Fig. 10b,c). During the robot motion execution, the planned path is permanently adapted to the changing environment (see Section 2.3). If it is not possible to prevent the robot from a collision by an evasive movement, the robot is slowed down or stopped.

4.2 Other applications

Many of the presented monitoring and planning algorithms have also been adapted to other robotic systems, including indoor and outdoor robots. Two examples are:

  • A collaborative assembly station, in which two fixed-base lightweight robot arms support a human worker in performing assembly tasks [17].

  • The autonomous excavator IOSB.BoB [6]. The excavator is equipped with several 3D laserscanners for environment perception and workspace monitoring. From the planning perspective, an excavator can be considered as a mobile manipulator, to which sampling-based planning is applicable as described in Section 2.

5 Conclusions

In shared human–robot workspaces, it is essential that robots can perceive the dynamic environment and react to moving obstacles in order to avoid imminent collisions. Multi-sensor fusion is necessary in most cases to detect obstacles in the entire relevant space around the robot with sufficient robustness. Motion planning algorithms allow to find collision-free (near-)optimal configuration space paths to goal poses specified online in Cartesian workspace coordinates. Safe and intuitive human–robot interaction can be achieved by combining various sensing, planning, and control methods.

By employing a generic robot model, the described methods are applicable to mobile robots, fixed-base manipulators as well as mobile manipulators, as shown in the presented use case incorporating a 10 DoF mobile manipulator.


The presented research has been supported by the European Commission’s 7th Framework Programme as part of the project SAPHARI (Safe and Autonomous Physical Human-Aware Robot Interaction) under grant agreement ICT-287513.


  • [1] Bertram, D., Kuffner, J., Dillmann, R., Asfour, T.: An integrated approach to inverse kinematics and path planning for redundant manipulators. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 1874–1879 (2006)
  • [2] Blodow, N.: Realtime URDF filter.
  • [3] Brock, O.: Generating robot motion: The integration of planning and execution. Ph.D. thesis, Stanford University (1999)
  • [4] Chen, L., Wang, W., Knoll, A.: Global optimal data association for multiple people tracking. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 4713–4719 (2013)
  • [5] Dolgov, D., Thrun, S., Montemerlo, M., Diebel, J.: Path planning for autonomous vehicles in unknown semi-structured environments. International Journal of Robotics Research 29(5), 485–501 (2010)
  • [6] Emter, T., Frese, C., Zube, A., Petereit, J.: Algorithm toolbox for autonomous mobile robotic systems. ATZoffhighway worldwide 10(3), 48–53 (2017)
  • [7] Escande, A., Mansard, N., Wieber, P.B.: Hierarchical quadratic programming: Fast online humanoid-robot motion generation. International Journal of Robotics Research 33(7), 1006–1028 (2014)
  • [8] Fetzner, A., Frese, C., Frey, C.: A 3D representation of obstacles in the robot’s reachable area considering occlusions. In: Joint 45th International Symposium on Robotics and 8th German Conference on Robotics (ISR/Robotik), pp. 85–92 (2014)
  • [9] Flacco, F., Kröger, T., De Luca, A., Khatib, O.: A depth space approach to human-robot collision avoidance. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 338–345 (2012).
  • [10] Frese, C., Fetzner, A., Frey, C.: Multi-sensor obstacle tracking for safe human-robot interaction. In: Joint 45th International Symposium on Robotics and 8th German Conference on Robotics (ISR/Robotik), pp. 784–791 (2014)
  • [11] Fuchs, S., Hirzinger, G.: Extrinsic and depth calibration of ToF-cameras.

    In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)

  • [12] Hornung, A., Wurm, K.M., Bennewitz, M., Stachniss, C., Burgard, W.: OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots 34(3), 189–206 (2013). Software available at
  • [13] Jackins, C.L., Tanimoto, S.L.: Oct-trees and their use in representing three-dimensional objects. Computer Graphics and Image Processing 14(3), 249–270 (1980)
  • [14] Karaman, S., Frazzoli, E.: Sampling-based algorithms for optimal motion planning. International Journal of Robotics Research 30(7), 846–894 (2011)
  • [15] Kuffner, J.J., LaValle, S.M.: RRT-connect: An efficient approach to single-query path planning. In: IEEE International Conference on Robotics and Automation (ICRA), vol. 2, pp. 995–1001 (2000)
  • [16] Lee, J.H., Tsubouchi, T., Yamamoto, K., Egawa, S.: People tracking using a robot in motion with laser range finder. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2936–2942 (2006)
  • [17] Lengenfelder, C., Frese, C., Zube, A., Voit, M., Beyerer, J.: A cooperative HCI assembly station with dynamic projections. In: International Symposium on Robotics (ISR) (2020). Accepted for publication
  • [18] Moosmann, F., Stiller, C.: Joint self-localization and tracking of generic objects in 3D range data. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 1138–1144 (2013)
  • [19] Petereit, J., Emter, T., Frey, C.W.: Safe mobile robot motion planning for waypoint sequences in a dynamic environment. In: IEEE International Conference on Information Technology (ICIT), pp. 181–186 (2013)
  • [20] Quinlan, S., Khatib, O.: Elastic bands: Connecting path planning and control. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 802–807 (1993)
  • [21] Rybski, P., Anderson-Sprecher, P., Huber, D., Niessl, C., Simmons, R.: Sensor fusion for human safety in industrial workcells. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3612–3619 (2012)
  • [22] Schulz, D., Burgard, W., Fox, D., Cremers, A.B.: People tracking with mobile robots using sample-based joint probabilistic data association filters. International Journal of Robotics Research 22, 99–116 (2003)
  • [23] Seyboldt, R., Frese, C., Zube, A.: Sampling-based path planning to cartesian goal positions for a mobile manipulator exploiting kinematic redundancy. In: 47th International Symposium on Robotics (ISR), pp. 1–9 (2016)
  • [24] Vahrenkamp, N., Asfour, T., Dillmann, R.: Simultaneous grasp and motion planning. IEEE Robotics & Automation Magazine 19(2), 43–57 (2012)
  • [25] Vande Weghe, M., Ferguson, D., Srinivasa, S.S.: Randomized path planning for redundant manipulators without inverse kinematics. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 477–482 (2007)
  • [26] Wang, L.: Collaborations towards adaptive manufacturing. In: 16th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 14–21 (2012)
  • [27] Zube, A., Hofmann, J., Frese, C.: Model predictive contact control for human-robot interaction. In: 47th International Symposium on Robotics (ISR), pp. 279–285 (2016)
  • [28] Zube, A., Jung, A., Frese, C.: Robot path adaptation for shared human-robot workspaces. In: Workshop on Human-Friendly Robotics (2014)