Sensor-Based Control for Collaborative Robots: Fundamentals, Challenges and Opportunities

by Andrea Cherubini, et al.

The objective of this paper is to present a systematic review of existing sensor-based control methodologies for applications that involve direct interaction between humans and robots, in the form of either physical collaboration or safe coexistence. To this end, we first introduce the basic formulation of the sensor-servo problem, then present the most common approaches: vision-based, touch-based, audio-based, and distance-based control. Afterwards, we discuss and formalize the methods that integrate heterogeneous sensors at the control level. The surveyed body of literature is classified according to the type of sensor, to the way multiple measurements are combined, and to the target objectives and applications. Finally, we discuss open problems, potential applications, and future research directions.




1 Introduction

Robot control is a mature field: one that is already being heavily commercialized in industry. However, the methods required to regulate interaction and collaboration between humans and robots have not been fully established yet. These issues are the subject of research in the fields of physical human-robot interaction (pHRI) [13] and collaborative robotics (CoBots) [24]. The authors of [27] present a paradigm that specifies the three nested layers of consistent behaviors that the robot must follow to achieve safe pHRI:

  • Safety is the first and most important feature in collaborative robots. Although there has been a recent push towards standardization of robot safety (e.g., ISO 13482:2014, Robots and robotic devices: Safety requirements for personal care robots [46]), we are still in the initial stages. Safety is generally addressed through collision avoidance (with both humans and obstacles [48]), a feature that requires high reactivity (high bandwidth) and robustness at both the perception and control layers.

  • Coexistence is the robot capability of sharing the workspace with humans. This includes applications involving a passive human (e.g., medical operations where the robot is intervening on the patient’s body [7]), as well as scenarios where robot and human work together on the same task, without contact or coordination.

  • Collaboration is the capability of performing robot tasks with direct human interaction and coordination. There are two modes for this: physical collaboration (with explicit and intentional contact between human and robot), and contactless collaboration (where the actions are guided by an exchange of information, e.g., in the form of body gestures, voice commands, or other modalities). Especially for the second mode, it is crucial to establish means for intuitive control by the human operators, who may be non-expert users. The robot should be proactive in realizing the requested tasks, and it should be capable of inferring the user’s intentions, to interact more naturally from the human viewpoint.

All three layers are hampered by the unpredictability of human actions, which vary according to situations and individuals, complicating both modeling [74] and the use of classic control.

In the robotics literature, two major approaches for task execution have emerged: path/motion planning [53] and sensor-based control [18]. The planning methods rely on a priori knowledge of the future robot and environment states over a time window. Although they have proved their efficiency in well-structured applications, these methods are hardly applicable to human-robot collaboration, because of the unpredictable and dynamic nature of humans. In the authors’ view, sensor-based control is more efficient and flexible for pHRI, since it closes the perception-to-action loop at a lower level than path/motion planning. Note also that sensor-based control strategies strongly resemble the processes of our central nervous system [12], and can trace their origins back to the servomechanism problem [26]. The best-known example is image-based visual servoing [18], which relies directly on visual feedback to control robot motion, without requiring either a cognitive layer or a precise model of the environment.

The aim of this article is to survey the current state of the art in sensor-based control, as a means to facilitate the interaction between robots, humans, and surrounding environments. Although we acknowledge the need for other techniques within a complete human-robot collaboration framework (e.g., path planning as mentioned, machine learning, etc.), here we review and classify the works which exploit sensory feedback to directly command the robot motion.

The timeliness and relevance of this survey are twofold. On one hand, while there have been previous reviews on topics such as (general) human-robot collaboration [5, 92] and human-robot safety [41], there is no specific review on the use of sensor-based control for human-robot collaborative tasks. On the other hand, we introduce a unifying paradigm for designing controllers with four sensing modalities, which gives our survey a valuable tutorial-like nature.

The rest of this manuscript is organized as follows: Section 2 introduces the sensing modalities considered in this survey; Section 3 presents the basic formulation of the sensor-based control problem and its application to each modality; Section 4 describes the common approaches that integrate multiple sensors at the control level. Section 5 provides several classifications of the reviewed works. Section 6 presents insights and discusses open problems and areas of opportunity. Conclusions are given in Section 7.

Figure 1: Examples of artificial sensors. Clockwise from the top left: Microsoft Kinect and Intel Realsense (vision and distance), Sony D-Link DCS-5222L and AVT GT (vision), Syntouch BioTac and ATI Nano 43 (touch), sound sensor LM393 and 3Dio Free Space Pro II Binaural Microphone (audition), proximity sensor Sharp GP2Y0A02YK0F, Laser SICK, Hokuyo URG and proximity sensor SICK CM18-08BPP-KC1 (distance). Note that Intel Realsense and Microsoft Kinect provide both the senses of vision and of distance.

2 Sensing Modalities for Control

Recent developments in bio-inspired measurement technologies have made sensors affordable and lightweight, easing their use on robots. These sensors include RGB-D cameras, tactile skins, force/moment transducers, etc. (see Fig. 1). The works reviewed here rely on different combinations of sensing modalities, depending on the task at stake. We consider the following four robot senses:

  • Vision. This includes methods for processing and understanding images, to produce numeric or symbolic information reproducing human sight. Although image processing is complex and computationally expensive, the richness of this sense is unique. Robotic vision is fundamental for understanding the environment and human intention, so as to react accordingly.

  • Touch. In this review, touch includes both proprioceptive force and tact, with the latter involving direct physical contact with an external object. Proprioceptive force is analogous to the sense of muscle force [76]. The robot can measure it either from the joint position errors or via torque sensors embedded in the joints; with either measurement, it can then infer and adapt to human intentions by relying on force control [43, 91, 78, 63]. Human tact (somatosensation), on the other hand, results from activation of neural receptors, mostly in the skin. These have inspired the design of artificial tactile skins [94, 83], widely used for human-robot collaboration.

  • Audition. In humans, localization of sound is performed by using binaural audition (i.e., two ears). They exploit auditory cues in the form of level/time/phase differences between left and right ears to determine the source’s horizontal position and its elevation [79]. Microphones artificially emulate this sense, and allow robots to “blindly” locate sound sources. Although robotic hearing typically uses two microphones mounted on a motorized head, other non-biological configurations exist, e.g. a head instrumented with a single microphone or an array of several omni-directional microphones [64].

  • Distance. This is the only sense among the four that humans cannot directly measure. Yet, numerous examples exist among mammals (e.g., bats and whales), in the form of echolocation. Robots measure distance with optical (e.g., infrared or lidar), ultrasonic, or capacitive [37] sensors. The relevance of this particular “sense” in human-robot collaboration is motivated by the direct relationship existing between the distance from obstacles (here, the human) and safety.

Roboticists have designed other bio-inspired sensors, for smell (see [50] for a comprehensive survey and [80, 36, 77] for 3D tracking applications) and taste [85, 49, 40]. However, in our opinion, artificial smell and taste are not yet mature enough for human-robot collaboration. Most of the current work on these senses is for localization/identification of hazardous gases/substances. For these reasons, this review will focus only on the four senses mentioned above, namely vision, touch, audition and distance.

3 Sensor-Based Control

3.1 Bio-Inspired Strategy

Currently, one of the most disruptive theories in cognitive science, the embodied cognition theory, states that sophisticated bodily behaviors (e.g., choreographed limb motions) result from perceptually-guided actions of the body and of its interaction with the environment [96]. This revolutionary idea (referred to by Shapiro as the “replacement hypothesis” [84]) challenges traditional cognitive science by proposing the total replacement of complex symbolic task representations with simpler perception-to-action regulatory models. A classic example of this theory is the baseball outfielder problem. For traditional cognitive science, it is solved by computing a physics-based simulation of the flying ball to predict its trajectory and landing point [81]. Instead, the embodied cognition counterpart formulates the solution as the explicit control of the optical ball movements (as perceived by the outfielder), by running lateral paths that maintain a linear optical trajectory, so as to anticipate the ball trajectory and perform a successful catch [62]. From the perspective of robotics, the latter approach is interesting as it models the apparently complex motion task as a simple feedback control problem, which can be solved with sensor-based strategies, without simulations or symbolic task representations. The following section reviews the basic formulation of sensor-based feedback control, since it is the most used in the papers that we reviewed.

3.2 Formulation of Sensor-Based Control

Sensor-based control aims at deriving the robot control input $u \in \mathbb{R}^m$ (operational space velocity, joint velocity, displacement, etc.) that minimizes a trajectory error $e = e(u) \in \mathbb{R}^n$, which can be estimated by sensors and depends on $u$. A general way of formulating this controller (accounting for actuation redundancy $m > n$, sensing redundancy $m < n$, and task constraints) is as the quadratic minimization problem:

$$u^* = \arg\min_{u} \frac{1}{2}\,\|e(u)\|^2 \quad \text{subject to task constraints.} \tag{1}$$

This formulation encompasses the classic inverse kinematics problem [95] of controlling the robot joint velocities ($u = \dot{q}$), so that the end-effector operational space position $x$ converges to a desired value $x^*$. By defining the desired end-effector rate as $\dot{x}^* = -\lambda (x - x^*)$, for $\lambda > 0$, and setting $e = J\dot{q} - \dot{x}^*$, with $J = \partial x / \partial q$ as the Jacobian matrix, it is easy to show that the solution to (1) (in the absence of constraints) is $\dot{q}^* = J^{+}\dot{x}^*$, with $J^{+}$ the generalized inverse of $J$. This leads to the set-point controller¹:

$$u^* = -\lambda J^{+} (x - x^*). \tag{2}$$

¹ Throughout the paper, $\lambda$ is a positive tuning scalar that determines the convergence rate of the task error $e$ to $0$.
In the following sections, we show how each of the four senses (vision, touch, audition and distance) has been used for robot control, either with (1), or with similar techniques. Figure 2 shows relevant variables for the four cases. For simplicity, we assume there are no constraints in (1), although off-the-shelf quadratic programming solvers [69] could account for them.
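
To make controller (2) concrete, here is a minimal Python sketch that applies it to a hypothetical planar two-link arm and integrates the joint velocities with explicit Euler steps. The link lengths, gain, time step and all function names are illustrative assumptions, not taken from the surveyed works. Since this Jacobian is square, the generalized inverse $J^{+}$ reduces to $J^{-1}$ away from singular configurations.

```python
import math

L1, L2 = 1.0, 1.0  # hypothetical link lengths [m]


def forward_kinematics(q):
    """End-effector position x = f(q) of the planar two-link arm."""
    q1, q2 = q
    return [L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2)]


def jacobian(q):
    """J = dx/dq (2x2) of the planar two-link arm."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; here J is square, so J^+ = J^-1."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("Jacobian is (nearly) singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def set_point_control(q, x_star, lam):
    """Controller (2): u* = -lam J^+ (x - x*), with u the joint velocities."""
    x = forward_kinematics(q)
    err = [x[0] - x_star[0], x[1] - x_star[1]]
    j_inv = inverse_2x2(jacobian(q))
    return [-lam * (j_inv[0][0] * err[0] + j_inv[0][1] * err[1]),
            -lam * (j_inv[1][0] * err[0] + j_inv[1][1] * err[1])]


def simulate(q0, x_star, lam=2.0, dt=0.01, steps=1000):
    """Integrate q_dot = u* with explicit Euler and return the final joints."""
    q = list(q0)
    for _ in range(steps):
        u = set_point_control(q, x_star, lam)
        q = [q[0] + dt * u[0], q[1] + dt * u[1]]
    return q


if __name__ == "__main__":
    q_final = simulate([0.3, 0.8], [1.2, 0.8])
    print(forward_kinematics(q_final))  # approximately [1.2, 0.8]
```

Each step imposes $\dot{x} = -\lambda (x - x^*)$ to first order, so the end-effector error decays roughly as $e^{-\lambda t}$, as long as the arm stays away from singularities (here, a fully stretched or folded elbow).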

3.3 Visual Servoing

3.3.1 Formulation

Visual servoing refers to the use of vision to control the robot motion [18]. The camera may be mounted on a moving part of the robot, or fixed in the workspace. These two configurations are referred to as “eye-in-hand” and “eye-to-hand” visual servoing, respectively. The error is defined with regard to some image features, here denoted by $s$, to be regulated to a desired configuration $s^*$ ($s$ is analogous to $x$ in the inverse kinematic formulation above). The visual error is:

$$e = \dot{s} - \dot{s}^*. \tag{3}$$

Visual servoing schemes are called image-based if $s$ is defined in image space, and position-based if $s$ is defined in the 3D operational space. Here we only briefly recall the image-based approach (in its eye-in-hand modality), since the position-based one consists in projecting the task from the image to the operational space to obtain $x$ and then applying (2).

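The simplest image-based controller uses $s = [x, y]^\top$, with $x$ and $y$ as the (normalized) image coordinates of a point, to generate $u$ that drives $s$ to a reference $s^* = [x^*, y^*]^\top$ (in Fig. 2a the centroid of the human hand). This is done by defining $\dot{s}^*$ as:

$$\dot{s}^* = -\lambda (s - s^*). \tag{4}$$

If we use the camera’s 6D velocity as the control input $u = v_c$, the image Jacobian matrix² $L_s$ relating $\dot{s}$ and $u$ (i.e., $\dot{s} = L_s u$) is:

$$L_s = \begin{bmatrix} -\frac{1}{Z} & 0 & \frac{x}{Z} & xy & -(1+x^2) & y \\ 0 & -\frac{1}{Z} & \frac{y}{Z} & 1+y^2 & -xy & -x \end{bmatrix}, \tag{5}$$

where $Z$ denotes the depth of the point with respect to the camera. In the absence of constraints, the solution of (1) is:

$$u^* = -\lambda L_s^{+} (s - s^*). \tag{6}$$

² Also known as interaction matrix in the visual servoing literature.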

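As a minimal sketch of (4)-(6), the Python snippet below computes the camera twist that centers a single image point, assuming the point’s normalized image coordinates and its depth $Z$ are known (e.g., from an RGB-D sensor); the function names are hypothetical. The generalized inverse is computed as $L_s^{+} = L_s^\top (L_s L_s^\top)^{-1}$, which is valid because $L_s$ has full row rank for any finite depth.

```python
def interaction_matrix(x, y, Z):
    """Interaction matrix (5) of a point with normalized image coordinates
    (x, y) and depth Z, for the camera twist u = [vx, vy, vz, wx, wy, wz]."""
    return [[-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
            [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x]]


def pinv_full_row_rank(L):
    """Moore-Penrose inverse L^+ = L^T (L L^T)^-1 of a 2x6 full-row-rank matrix."""
    g = [[sum(L[i][k] * L[j][k] for k in range(6)) for j in range(2)]
         for i in range(2)]                      # G = L L^T (2x2)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    return [[sum(L[i][k] * g_inv[i][j] for i in range(2)) for j in range(2)]
            for k in range(6)]                   # L^T G^-1 (6x2)


def ibvs_velocity(s, s_star, Z, lam=0.5):
    """Controller (6): u* = -lam L_s^+ (s - s*) for a single image point."""
    Lp = pinv_full_row_rank(interaction_matrix(s[0], s[1], Z))
    e = [s[0] - s_star[0], s[1] - s_star[1]]
    return [-lam * (Lp[k][0] * e[0] + Lp[k][1] * e[1]) for k in range(6)]


if __name__ == "__main__":
    # Hypothetical case: hand centroid slightly off-center, 1.5 m away.
    print(ibvs_velocity([0.2, -0.1], [0.0, 0.0], Z=1.5))
```

Because $L_s L_s^{+} = I$, the returned twist reproduces the feature velocity (4) exactly, and it is the minimum-norm twist doing so; in practice $Z$ must be estimated or approximated.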

3.3.2 Application to Human-Robot Collaboration

Figure 2: Examples of four sensor-based servo controllers. (a) Visual servoing: the user hand is centered in the camera image. (b) Indirect force control: by applying a wrench, the user deviates the contact point away from a reference trajectory. (c) Audio-based control: a microphone rig is automatically oriented towards the sound source (the user’s mouth). (d) Distance-based control: the user acts as a repulsive force, related to his/her distance from the robot.

Humans generally use vision to teach the robot relevant configurations for collaborative tasks. For example, [15] demonstrates an application where a human operator uses a QR code to specify the target poses for a 6 degrees-of-freedom (dof) robot arm. In [39], the user provides the target tasks via a tablet-like interface that shows the robot the desired reference view. The human can specify various motions such as point-to-point, line-to-line, etc., that are automatically performed via visual feedback. The authors of [38] present a grasping system for a tele-operated dual arm robot. The user specifies the object to be manipulated, and the robot completes the task using visual servoing.

Assistive robotics represents another very common application domain for visual servoing. The motion of robotic wheelchairs has been semi-automated to various degrees. For instance, [47] presents a corridor following method that exploits the projection of parallel lines. The user provides target directions with a haptic interface, and the robot corrects the trajectories with visual feedback. Other works focus on mobile manipulation. The authors of [90] develop a vision-based controller for a robotic arm mounted on a wheelchair: the user specifies the object to be grasped and retrieved by the robot. A similar approach is reported in [32], where the desired poses are input with “clicks” on a screen interface.

Medical robotics is another area that involves sensor-based interactions between humans and robots, and where vision has huge potential (see [7] for a comprehensive review). The authors of [4] design a laparoscopic camera, which regulates its pan/tilt motions to track human-held instruments.

3.4 Touch (or Force) Control

3.4.1 Formulation

Touch (or force) control requires the measurement of one or multiple (in the case of tactile skins) wrenches $h$, which are (at most) composed of 3 translational forces and 3 torques; $h$ is fed to the controller that moves the robot so that it exerts a desired interaction force with the human or environment. Force control strategies can be grouped into the following two classes [91]:

  • Direct control regulates the contact wrench $h$ to obtain a desired wrench $h^*$. Specifying $h^*$ requires an explicit model of the task and environment. A widely adopted strategy is hybrid position/force control [78], which regulates the velocity and wrench along unconstrained and constrained task directions, respectively. Referring to (1), this is equivalent to setting

    $$e = S(\dot{x} - \dot{x}^*) + (I - S)(h - h^*), \tag{7}$$

    with $S$ a binary diagonal selection matrix, and $I$ as the identity matrix. Applying a motion $u$ that nullifies $e$ in (7) guarantees that the components of $\dot{x}$ (respectively $h$) specified via $S$ (respectively $I - S$) converge to $\dot{x}^*$ (respectively $h^*$).

  • Indirect control (illustrated in Fig. 2b) does not require an explicit force feedback loop. To this category belong impedance control and its dual, admittance control [43]. It consists of modelling the deviation of the contact point $x$ from a reference trajectory $x_r$ associated to the desired $h^*$, via a virtual mechanical impedance with adjustable parameters (inertia $M$, damping $B$ and stiffness $K$). Referring to (1), this is equivalent to setting:

    $$e = M(\ddot{x} - \ddot{x}_r) + B(\dot{x} - \dot{x}_r) + K(x - x_r) - h. \tag{8}$$

    Here, $x$ represents the “deviated” contact point pose, with $\dot{x}$ and $\ddot{x}$ as its time derivatives. When $e = 0$, the displacement $x - x_r$ responds as a mass-spring-damping system under the action of an external force $h$. In most cases, $x_r$ is defined for motion in free space ($h^* = 0$). The general formulation in (1) and (8) can account for both impedance control ($x$ is measured and $u = h$) and admittance control ($h$ is measured and $u = x$).

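Both classes can be sketched per axis in Python. This is a minimal illustration under simplifying assumptions (scalar axes, diagonal $M$, $B$, $K$, a free-space reference $x_r$); the names and parameter values are hypothetical. `hybrid_error` evaluates (7) component-wise, while `admittance_position` implements admittance control by integrating (8) with $e = 0$, turning the measured force into a deviation of the commanded position from $x_r$.

```python
def hybrid_error(S_diag, xdot, xdot_star, h, h_star):
    """Error (7), component-wise: e = S (xdot - xdot*) + (I - S)(h - h*).

    S_diag holds the 0/1 diagonal of S: 1 marks a velocity-controlled axis,
    0 a force-controlled one.
    """
    return [s * (v - vs) + (1 - s) * (f - fs)
            for s, v, vs, f, fs in zip(S_diag, xdot, xdot_star, h, h_star)]


def admittance_step(d, d_dot, h, M, B, K, dt):
    """One semi-implicit Euler step of (8) with e = 0 on a single axis:
    M d_ddot + B d_dot + K d = h, where d = x - x_r."""
    d_ddot = (h - B * d_dot - K * d) / M
    d_dot = d_dot + dt * d_ddot      # update velocity first...
    d = d + dt * d_dot               # ...then position with the new velocity
    return d, d_dot


def admittance_position(x_r, h, M=2.0, B=60.0, K=500.0, dt=0.001):
    """Admittance control (h measured, u = x): commanded positions x = x_r + d
    for sampled reference positions x_r and measured forces h."""
    d, d_dot = 0.0, 0.0
    commands = []
    for xr_k, h_k in zip(x_r, h):
        d, d_dot = admittance_step(d, d_dot, h_k, M, B, K, dt)
        commands.append(xr_k + d)
    return commands


if __name__ == "__main__":
    # Hypothetical case: the user pushes with a constant 10 N for 5 s
    # while the free-space reference stays at 0.
    x_cmd = admittance_position([0.0] * 5000, [10.0] * 5000)
    print(x_cmd[-1])  # settles near h / K = 0.02
```

Under a constant force the deviation settles at $h/K$, so the stiffness sets how far the human can push the robot away from its reference, while $B$ and $M$ shape the transient.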
3.4.2 Application to Human-Robot Collaboration

The authors of [10] use direct force control for collaborative human-robot laparoscopic surgery. They control the instruments with a hybrid position/force approach. In [25], a robot regulates the applied forces onto a beating human heart. Since the end-effector’s 3 linear dof are fully constrained, position control cannot be performed: $S = 0$ in (7).

A drawback of direct control is that it can only realize tasks which can be described via constraint surfaces. If their location is unknown and/or the contact geometry is complex, as is often the case in human-robot collaboration, indirect control is more suitable since: i) it allows one to define a priori how the robot should react to unknown external force disturbances, and ii) it can use a reference trajectory output by another sensor (e.g., vision). In the next paragraph, we review indirect force control methods.

By sensing force, the robot can infer the motion commands (e.g., pushing, pulling) from the human user. For example, Maeda et al. [57] use force sensing and human motion estimation (based on minimum jerk) within an indirect (admittance) control framework for cooperative manipulation. In [88, 87], an assistant robot suppresses the involuntary vibrations of a human, who controls welding direction and speed. By exploiting kinematic redundancy, [34] also addresses manually guided robot operation. The papers [93, 14] present admittance controllers for two-arm robots moving a table in collaboration with a human. In [9], a human can control a medical robot arm with an admittance controller. Robot tele-operation is another common human-robot collaboration application where force feedback plays a crucial role; [73] is a comprehensive review on the topic.

All these works rely on local force/moment measures. To date, tactile sensors and skins (measuring the wrench along the robot body, see [6] for a review) have been used for object exploration [65] or recognition [1], but not for control as expressed in (1). One reason is that they are at a preliminary design stage, which still requires complex calibration [31, 56] that constitutes a research topic per se. An exception is [55], which uses tactile measures within (1). Similarly, in [98], tactile sensing regulates interaction with the environment. Yet, neither of these works considers pHRI. In our opinion, there is huge potential in the use of skins and tactile displays for human-robot collaboration.

3.5 Audio-Based control

3.5.1 Formulation

The purpose of audio-based control is to locate the sound source, and move the robot towards it. For simplicity, we present the two-dimensional binaural (i.e., with two microphones) configuration in Fig. 2c, with the angular velocity of the microphone rig as control input: $u = \omega$. Here we review the two most popular methods for defining the error $e$ in (1): Interaural Time Difference (ITD)³ and Interaural Level Difference (ILD). The following is based on [58]:

³ ITD also has a frequency-domain counterpart, the Interaural Phase Difference (IPD).

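  • ITD-based aural servoing uses $\tau$, the difference between the arrival times of the sound on each microphone; $\tau$ must be regulated to a desired $\tau^*$. The controller can be represented with (1), by setting $e = \dot{\tau} - \dot{\tau}^*$, with $\dot{\tau}^* = -\lambda(\tau - \tau^*)$ the desired rate (to obtain set-point regulation to $\tau^*$). Feature $\tau$ can be derived in real time by using standard cross-correlation of the signals [97]. Under a far-field assumption:

    $$\dot{\tau} = -\sqrt{\left(\frac{b}{c}\right)^2 - \tau^2}\;\omega, \tag{9}$$

    with $c$ the speed of sound and $b$ the microphone baseline. From (9), the scalar ITD Jacobian is $J_\tau = -\sqrt{(b/c)^2 - \tau^2}$. The motion that minimizes $e$ is:

    $$\omega^* = \frac{\lambda(\tau - \tau^*)}{\sqrt{(b/c)^2 - \tau^2}}, \tag{10}$$

    which is locally defined for $|\tau| < b/c$, to ensure that $J_\tau \neq 0$.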

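  • ILD-based aural servoing uses $\rho$, the level difference between the left and right signals, expressed as an energy ratio. This can be obtained in a time window of size $N$ as $\rho = E_l / E_r$, where the $E_{l,r} = \sum_{n=1}^{N} \gamma_{l,r}[n]^2$ denote the signals’ sound energies and the $\gamma_{l,r}[n]$ are the intensities at iteration $n$. To regulate $\rho$ to a desired $\rho^*$, one can set $e = \dot{\rho} - \dot{\rho}^*$ with $\dot{\rho}^* = -\lambda(\rho - \rho^*)$. Assuming spherical propagation and a slowly varying signal:

    $$\dot{\rho} = -\frac{b\, x_s}{d_r^2}\,\rho(\rho + 1)\,\omega, \tag{11}$$

    where $x_s$ is the sound source frontal coordinate in the moving auditory frame, and $d_r$ the distance between the right microphone and the source. From (11), the scalar ILD Jacobian is $J_\rho = -b\,x_s\,\rho(\rho+1)/d_r^2$. The motion that minimizes $e$ is:

    $$\omega^* = \frac{\lambda\, d_r^2\, (\rho - \rho^*)}{b\, x_s\, \rho(\rho + 1)}, \tag{12}$$

    which is defined for $x_s > 0$, i.e., for sources located in front of the rig. In contrast with ITD-servoing, here the source location (i.e., $x_s$ and $d_r$) must be known or estimated.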

While the methods above only control the angular velocity of the rig ($u = \omega$), Magassouba has extended both to also regulate the 2D translations of a mobile platform (ITD in [59, 60] and ILD in [61]).

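The ITD and ILD servoing laws (10) and (12) can be sketched in Python for a static source and a rig rotating in the plane. This is a minimal illustration under stated assumptions: the baseline and speed of sound are example values, the left/right microphones sit at $(0, \pm b/2)$ in the rig frame, the received amplitude decays as $1/\text{distance}$ (spherical propagation), and, for ILD, $x_s$ and $d_r$ are taken as known, as (12) requires. All names are hypothetical.

```python
import math

SOUND_SPEED = 343.0  # c: speed of sound in air at about 20 degrees C [m/s]
BASELINE = 0.2       # b: microphone baseline [m], an illustrative value


def itd_far_field(alpha, b=BASELINE, c=SOUND_SPEED):
    """Far-field ITD of a source at azimuth alpha [rad] in the rig frame."""
    return (b / c) * math.sin(alpha)


def itd_control(tau, tau_star, lam, b=BASELINE, c=SOUND_SPEED):
    """Controller (10): omega* = lam (tau - tau*) / sqrt((b/c)^2 - tau^2)."""
    margin = (b / c) ** 2 - tau ** 2
    if margin <= 0.0:
        raise ValueError("(10) is only defined for |tau| < b/c")
    return lam * (tau - tau_star) / math.sqrt(margin)


def ild_from_window(left, right):
    """rho = E_l / E_r, with E the sum of squared samples over the window."""
    return sum(v * v for v in left) / sum(v * v for v in right)


def ild_control(rho, rho_star, lam, x_s, d_r, b=BASELINE):
    """Controller (12): omega* = lam d_r^2 (rho - rho*) / (b x_s rho (rho + 1))."""
    if x_s <= 0.0:
        raise ValueError("(12) assumes a source in front of the rig (x_s > 0)")
    return lam * d_r ** 2 * (rho - rho_star) / (b * x_s * rho * (rho + 1.0))


def simulate_itd(alpha0, lam=1.0, dt=0.01, steps=1000):
    """Rotate the rig with (10); a static source's azimuth obeys alpha_dot = -omega."""
    alpha = alpha0
    for _ in range(steps):
        omega = itd_control(itd_far_field(alpha), 0.0, lam)
        alpha -= dt * omega
    return alpha


def simulate_ild(x_s, y_s, lam=1.0, dt=0.01, steps=2000, b=BASELINE, n=64):
    """Rotate the rig with (12); each microphone receives the same tone,
    scaled by 1/distance (hypothetical 440 Hz tone sampled at 16 kHz)."""
    tone = [math.sin(2.0 * math.pi * 440.0 * k / 16000.0) for k in range(n)]
    for _ in range(steps):
        d_l = math.hypot(x_s, y_s - b / 2.0)   # left microphone at (0, +b/2)
        d_r = math.hypot(x_s, y_s + b / 2.0)   # right microphone at (0, -b/2)
        rho = ild_from_window([v / d_l for v in tone], [v / d_r for v in tone])
        omega = ild_control(rho, 1.0, lam, x_s, d_r)
        # Coordinates of a static source seen from a frame rotating at omega.
        x_s, y_s = x_s + dt * omega * y_s, y_s - dt * omega * x_s
    return x_s, y_s


if __name__ == "__main__":
    print(simulate_itd(0.6))        # azimuth driven towards 0
    print(simulate_ild(2.0, 1.0))   # lateral coordinate driven towards 0
```

With $\tau^* = 0$ or $\rho^* = 1$, both laws rotate the rig until it faces the source; ITD servoing needs only the measured $\tau$, whereas the ILD gain depends on the source geometry.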
3.5.2 Application to Human-Robot Collaboration

Due to the nature of this sense, audio-based controllers are mostly used in contact-less applications, to enrich other senses (e.g., force, distance) with sound, or to design intuitive interfaces.

Audio-based control is currently (in our opinion) an underdeveloped research area with great potential for human-robot collaboration, e.g., for tracking a speaker. Besides the cited works [58, 59, 60, 61], which closely follow the framework of Sec. 3, others formulate the problem differently. For example, the authors of [52, 51] propose a linear model to describe the relation between the pan motion of a robot head and the difference of intensity between its two microphones. The resulting controllers are much simpler than (10) and (12). Yet, their operating range is smaller, making them less robust than their more analytical counterparts.

3.6 Distance-Based control

3.6.1 Formulation

The simplest (and most popular) distance-based controller is the artificial potential fields method [48]. Despite being prone to local minima, it has been widely deployed both on manipulators and on autonomous vehicles for obstacle avoidance. Besides, it is acceptable for a collaborative robot to stop (e.g., because of local minima) as long as it avoids the human user. The potential fields method consists of modeling each obstacle as a source of repulsive forces, related to the robot distance from the obstacle (see Fig. 2d). All the forces are summed up, resulting in a velocity in the most promising direction. Given $d$, the position of the nearest obstacle in the robot frame, the original version [48] consists in applying the operational space velocity

$$u = \begin{cases} -\lambda \left(\dfrac{1}{\|d\|} - \dfrac{1}{d_0}\right)\dfrac{1}{\|d\|^2}\,\dfrac{d}{\|d\|} & \text{if } \|d\| < d_0, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

Here $d_0$ is the (arbitrarily tuned) activation distance: the controller acts only when the obstacle is closer than $d_0$. Since the quadratic denominator in (13) yields abrupt accelerations, more recent versions adopt a linear behavior. Referring to (1), this can be obtained by setting $e = \dot{x} - \dot{x}^*$, with $\dot{x}^*$ as reference velocity:

$$\dot{x}^* = \begin{cases} \lambda (\|d\| - d_0)\,\dfrac{d}{\|d\|} & \text{if } \|d\| < d_0, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

By defining $\dot{x}$ as control input $u$, the solution to (1) is:

$$u^* = \dot{x}^*. \tag{15}$$

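The two repulsive laws (13) and (14)-(15) can be compared with a short Python sketch. It assumes $d$ is given as a 3D vector from the robot to the nearest obstacle point; the function names are hypothetical, and `total_repulsion` illustrates the summation over several obstacles using the linear law.

```python
import math


def _norm(d):
    n = math.sqrt(sum(c * c for c in d))
    if n == 0.0:
        raise ValueError("d must be non-zero")
    return n


def khatib_velocity(d, d0, lam):
    """Original repulsive velocity (13); d points from the robot to the obstacle."""
    n = _norm(d)
    if n >= d0:
        return [0.0] * len(d)
    gain = lam * (1.0 / n - 1.0 / d0) / n ** 2
    return [-gain * c / n for c in d]           # directed away from the obstacle


def linear_repulsive_velocity(d, d0, lam):
    """Linear variant (14)-(15): u* = lam (||d|| - d0) d / ||d|| when ||d|| < d0."""
    n = _norm(d)
    if n >= d0:
        return [0.0] * len(d)
    return [lam * (n - d0) * c / n for c in d]  # n - d0 < 0: away from the obstacle


def total_repulsion(obstacles, d0, lam):
    """Sum of the linear repulsive velocities generated by several obstacles."""
    total = [0.0, 0.0, 0.0]
    for d in obstacles:
        total = [a + b for a, b in zip(total, linear_repulsive_velocity(d, d0, lam))]
    return total
```

As $\|d\| \to 0$, the magnitude of (13) grows without bound (as $1/\|d\|^3$), whereas the linear law saturates at $\lambda d_0$, hence the smoother accelerations. Two symmetric obstacles cancel each other and the robot stops, a simple instance of the local minima mentioned above.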
3.6.2 Application to Human-Robot Collaboration

Many works use these (or similar) distance-based methods for pHRI. To avoid human-robot collisions, the authors of [28] apply (15), by estimating $d$ between the human head and the robot with vision. Recently, these approaches have been boosted by the advent of 3D vision sensors (e.g., the Microsoft Kinect and Intel RealSense), which can provide feedback for both vision and distance control. The authors of [35] design a Kinect-based distance controller (again, for human collision avoidance) with an expression similar to (15), but smoothed by a sigmoid.

Proximity servoing is a similar technique, which regulates—via capacitive sensors—the distance between the robot surface and the human. In [82], these sensors modify the position and velocity of a robot arm when a human approaches it, to avoid collisions. The authors of [11, 54, 30] have developed a new capacitive skin for a dual-arm robot. They design a collision avoidance method based on an admittance model similar to (8), which relies on the joint torques (measured by the skin) to control the robot motion.

4 Integration of Multiple Sensors

In Sect. 3, we presented the most common sensor-based methods used for collaborative robots. Just like natural senses, artificial senses provide complementary information about the environment. Hence, to effectively perform a task, the robot should measure (and use for control) multiple feedback modalities. In this section, we review various methods for integrating multiple sensors in a single controller.

Inspired by how humans merge their percepts [33], researchers have traditionally fused heterogeneous sensors to estimate the state of the environment. This can be done in the sensors’ Cartesian frames [86] by relying on an Extended Kalman Filter (EKF) [89]. Yet the sensors must be related to a single quantity, which is seldom the case when measuring different physical phenomena [68]. An alternative is to use the sensed feedback directly in (1). This idea, proposed for position-force control in [78] and extended to vision in [23], brings new challenges to the control design, e.g., sensor synchronization, task compatibility and task representation. For instance, the designer should take care when transforming 6D velocities or wrenches to a single frame. This requires (when mapping from frame $A$ to frame $B$) multiplication by

$${}^{B}V_{A} = \begin{bmatrix} {}^{B}R_{A} & [{}^{B}t_{A}]_\times\, {}^{B}R_{A} \\ 0_3 & {}^{B}R_{A} \end{bmatrix} \tag{16}$$

for a velocity, and by $({}^{B}V_{A})^{-\top}$ for a wrench. In (16), ${}^{B}R_{A}$ is the rotation matrix from $A$ to $B$ and $[{}^{B}t_{A}]_\times$ the skew-symmetric matrix associated to the translation ${}^{B}t_{A}$.

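The frame changes in (16) can be checked numerically. The Python sketch below assumes twists ordered as $[v; \omega]$ and wrenches as $[f; m]$ (force, then moment), with `R` $= {}^{B}R_{A}$ and `t` $= {}^{B}t_{A}$; it builds $({}^{B}V_{A})^{-\top}$ in closed form as $\begin{bmatrix} R & 0_3 \\ [t]_\times R & R \end{bmatrix}$ instead of inverting a $6 \times 6$ matrix. All function names are hypothetical.

```python
import math


def skew(t):
    """Skew-symmetric matrix [t]x, such that [t]x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def blocks(tl, tr, bl, br):
    """Assemble a 6x6 matrix from four 3x3 blocks."""
    return [tl[i] + tr[i] for i in range(3)] + [bl[i] + br[i] for i in range(3)]


ZERO3 = [[0.0] * 3 for _ in range(3)]


def twist_transform(R, t):
    """(16): bVa = [[R, [t]x R], [0, R]] maps a twist [v; w] from A to B."""
    return blocks(R, matmul(skew(t), R), ZERO3, R)


def wrench_transform(R, t):
    """(bVa)^-T = [[R, 0], [[t]x R, R]] maps a wrench [f; m] from A to B."""
    return blocks(R, ZERO3, matmul(skew(t), R), R)


def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def rot_z(theta):
    """Rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

A quick consistency test is that the power $v^\top h$ is frame-invariant: transforming the twist with ${}^{B}V_{A}$ and the wrench with $({}^{B}V_{A})^{-\top}$ leaves it unchanged.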
According to [23], the three methods for combining sensors within a controller are:

  • Traded: the sensors control the robot one at a time. Predefined conditions on the task trigger the switches:

    $$e = e_j \quad \text{if condition } j \text{ holds}, \qquad j = 1, \dots, k. \tag{17}$$

  • Shared: all sensors control the robot throughout operation. A common way is via nested control loops, as shown (for shared vision/touch control) in Fig. 3. Researchers have used at most two loops, denoted $o$ for outer and $i$ for inner loop:

    $$e = e_i\left(u, e_o\right). \tag{18}$$

    In the example of Fig. 3, the outer loop is visual and the inner loop is touch-based: $e_o$ is obtained by applying (3), and $e_i$ by applying (8).

  • Hybrid: the sensors act simultaneously, but on different axes of a predefined Cartesian task-frame [8]. The directions are selected by binary diagonal matrices $S_j \in \mathbb{R}^{n \times n}$, with $n$ the dimension of the task space, and such that $\sum_j S_j = I_n$:

    $$e = \sum_j S_j\, e_j. \tag{19}$$

    To express all $e_j$ in the same task frame, one should apply ${}^{B}V_{A}$ and/or $({}^{B}V_{A})^{-\top}$ from (16). Note the analogy between (19) and the hybrid position/force control framework (7).

We will use this classification to characterize the works reviewed in the rest of this Section.
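
Traded and hybrid combination can be sketched in Python under simplifying assumptions: each sensor $j$ already provides its own control input $u_j^*$, all inputs are expressed in the same task frame, and $u = \dot{x}$ (identity task Jacobian). With these assumptions, minimizing (19) gives $u^* = \sum_j S_j u_j^*$. The function names are hypothetical.

```python
def traded(candidates):
    """Traded control (17): candidates is a priority-ordered list of
    (condition_holds, u_j) pairs; the first sensor whose switching
    condition holds drives the robot alone."""
    for condition_holds, u in candidates:
        if condition_holds:
            return list(u)
    raise ValueError("no switching condition holds")


def hybrid(commands, selections, tol=1e-12):
    """Hybrid control (19): minimizing ||sum_j S_j (u - u_j*)||^2 gives
    u* = sum_j S_j u_j*. selections holds the 0/1 diagonals of the S_j,
    which must sum to the identity (each axis owned by exactly one sensor)."""
    n = len(commands[0])
    for k in range(n):
        if abs(sum(S[k] for S in selections) - 1.0) > tol:
            raise ValueError("the S_j must partition the task axes")
    return [sum(S[k] * u[k] for S, u in zip(selections, commands))
            for k in range(n)]
```

For instance, a vision/audio traded controller in the spirit of [70, 71] would list the visual command first, with the condition “face visible”, followed by the audio command as fallback.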

Figure 3: The most common scheme for shared vision/touch (admittance) control, used in [63], [3, 2]. The goal is to obtain desired visual features $s^*$ and wrench $h^*$, based on the current image features $s$ and wrench $h$. The outer visual servoing loop based on error (3) outputs a reference velocity $\dot{x}_r$ that is then deformed by the inner admittance control loop based on error (8), to obtain the desired robot position $x$.

4.1 Traded Control

The paper [20] presents a human-robot manufacturing cell for collaborative assembly of car joints. The approach (traded vision/touch) can manage physical contact between robot and human, and between robot and environment, via admittance control (8). Vision takes over in dangerous situations to trigger emergency stops. The switching condition is determined by the position of the human with respect to the robot.

In [70, 71], a traded vision/audio controller enables a mobile robot to exploit sound source localization for visual control. The robot head automatically rotates towards the estimated direction of the human speaker, and then visually tracks him/her. The switching condition is that the sound source is visible. The audio-based task is equivalent to regulating $\tau$ to $0$ or $\rho$ to $1$, as discussed in Sect. 3.5. Paper [44] presents another traded vision/audio controller for the iCub robot head to localize a human speaker. This method constructs audio-motor maps and integrates visual feedback to update the map. Again, the switching condition is that the speaker’s face is visible. In [16], another traded vision/audio controller is deployed on a mobile robot, to drive it towards an unknown sound source; the switching condition is defined by a threshold on the frontal localization error.

The authors of [72] present a mobile assistant for people with walking impairments. The robot is equipped with: two wrench sensors to measure physical interaction with the human, an array of microphones for audio commands, laser sensors for detecting obstacles, and an RGB-D camera for estimating the users’ state. Its controller integrates audio, touch, vision and distance in a traded manner, with switching conditions determined by a knowledge-based layer.

The work [67] presents an object manipulation strategy, integrating distance (capacitive proximity sensors) and touch (tactile sensors). While the method does not explicitly consider humans, it may be applied for human-robot collaboration, since proximity sensors can detect humans if vision is occluded. The switching condition between the two modes is the contact with the object.

Another example of traded control—here, audio/distance—is [45], which presents a method for driving a mobile robot towards hidden sound sources, via an omnidirectional array of microphones. The controller switches to ultrasound-based obstacle avoidance in the presence of humans/objects. The detection of a nearby obstacle is the switching condition.

4.2 Shared Control

In applications where the robot and environment/human are in permanent contact (e.g., collaborative object transportation), shared control is preferable. Let us first review a pioneering controller [63] that relies on shared vision/touch, as outlined in Fig. 3. The work [63] addresses tele-operated peg-in-hole assembly, by placing the visual loop outside the force loop. The reference trajectory $x_r$ output by visual servoing is deformed in the presence of contact by the admittance controller, to obtain the robot position command $x$. Human interaction is not considered in this work.

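The nested structure of Fig. 3 can be illustrated with a one-dimensional Python sketch; this is a hypothetical setup, not the implementation of [63]. It assumes a camera translating along one axis, a target at lateral position $X$ and known depth $Z$, an image feature $s = (X - x)/Z$ with $s^* = 0$, and a position command that is tracked perfectly. The outer loop applies (6) to produce the reference velocity $\dot{x}_r$, and the inner loop deforms the integrated reference with the admittance model (8).

```python
def shared_vision_admittance(X, Z, force, lam=1.0, M=2.0, B=60.0, K=500.0,
                             dt=0.001, steps=10000):
    """1-D sketch of the shared scheme of Fig. 3 (hypothetical setup).

    force(t) returns the external force h measured along the axis.
    Returns the final position command x and admittance deviation d.
    """
    x_r, d, d_dot = 0.0, 0.0, 0.0
    for k in range(steps):
        x = x_r + d                       # position command, assumed perfectly tracked
        s = (X - x) / Z                   # current visual feature (s* = 0)
        L = -1.0 / Z                      # 1-D interaction "matrix": s_dot = L x_dot
        x_r_dot = -lam * (s - 0.0) / L    # outer loop (6): -lam L^+ (s - s*)
        x_r += dt * x_r_dot               # integrate the visual reference trajectory
        h = force(k * dt)
        d_ddot = (h - B * d_dot - K * d) / M   # inner loop: (8) with e = 0
        d_dot += dt * d_ddot
        d += dt * d_dot                   # deviation of x from the reference x_r
    return x_r + d, d


if __name__ == "__main__":
    # Hypothetical case: target 0.5 m to the side, 1.5 m deep, human pushing with 5 N.
    print(shared_vision_admittance(0.5, 1.5, lambda t: 5.0))
```

With a constant force, the visual loop still drives $s$ to $s^*$, while the deviation settles at $h/K$ and the reference $x_r$ settles at $X - h/K$: the force shifts the reference trajectory rather than the visual goal.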
The authors of [66] estimate sensory-motor responses to control a pan-tilt robot head with shared visual/audio feedback from humans. They assume local linear relations between the robot motions and the ITD/ILD measures. This results in a controller that is simpler than the one presented in Sect. 3.5. The scheme is similar to Fig. 3, with an outer vision loop generating a reference motion, and audio modifying it.

4.3 Hybrid Control

Pomares et al. [75] propose a hybrid vision/touch controller for grasping objects, using a robot arm equipped with a hand. Visual feedback drives an active camera (installed on the robot tip) to observe the object and detect humans to be avoided, whereas touch feedback moves the fingers to grasp the object. The authors define the matrix in (7) so that the arm and the fingers are controlled independently, each with its respective sensor.

In [17], a hybrid scheme controls an ultrasonic probe in contact with the abdomen of a patient. The goal is to centre the lesions in the ultrasound image observed by the surgeon. The probe is moved by projecting the touch and vision (from the ultrasound image) tasks onto orthogonal directions.
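Hybrid integration can be illustrated with a diagonal binary selection matrix, in the style of the hybrid position/force control of [78]: each degree of freedom is assigned to exactly one sense, so the two tasks act on orthogonal subspaces. We assume here 6-D velocity commands and a boolean mask; the actual matrix in (7) and the task definitions in [75] and [17] may differ.

```python
import numpy as np


def hybrid_command(u_vision, u_touch, selected_by_vision):
    """Combine two 6-D velocity commands with a diagonal selection matrix S.

    selected_by_vision[i] is True when dof i is driven by vision; otherwise
    it is driven by touch. Since S and (I - S) project onto complementary,
    orthogonal subspaces, the two sensor tasks never act on the same dof.
    """
    S = np.diag(np.asarray(selected_by_vision, dtype=float))
    I = np.eye(len(selected_by_vision))
    return S @ np.asarray(u_vision) + (I - S) @ np.asarray(u_touch)
```

For instance, in an ultrasound-probe setting one could assign the translation along the probe axis to touch and the remaining directions to vision; this particular assignment is only illustrative.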

4.4 Other Control Schemes

Some works do not strictly follow the classification given above. These are reviewed below.

The authors of [3, 2] combine vision and touch to address joint human-humanoid table carrying. The table must stay flat, to prevent objects on top from falling off. Vision controls the table inclination, whereas the forces exchanged with the human make the robot follow his/her intention. The approach is shared, with visual servoing in the outer loop of admittance control (Fig. 3), to make all dof compliant. However, it is also hybrid, since some dof are controlled only with admittance. Specifically, vision regulates only the table height in [3], and both table height and roll angle in [2].

The works [19, 22] merge vision and distance to guarantee lidar-based obstacle avoidance during camera-based navigation. While following a pre-taught path, the robot must avoid obstacles that were not present before. Meanwhile, it moves the camera pan angle to maintain scene visibility. Here, the selection matrix in (19) is a scalar function of the time-to-collision, ranging from 0 to 1. In the safe context (function equal to 0), the robot follows the taught path, with the camera looking forward. In the unsafe context (function equal to 1), the robot circumnavigates the obstacles. Therefore, the scheme is hybrid when the function equals 0 or 1 (i.e., vision and distance operate on independent components of the task vector), and shared when it takes intermediate values.
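The role of the scalar, time-to-collision-dependent function of [19, 22] can be sketched as a convex blend of two commands: one from path following (vision) and one from obstacle circumnavigation (distance). We assume a linear ramp between two hypothetical thresholds (t_unsafe = 1 s, t_safe = 3 s); the actual function used in [19, 22] is not reproduced here.

```python
def risk(ttc, t_unsafe=1.0, t_safe=3.0):
    # Hypothetical ramp: 0 in the safe context, 1 in the unsafe context,
    # linear in the time-to-collision (ttc, in seconds) in between.
    if ttc >= t_safe:
        return 0.0
    if ttc <= t_unsafe:
        return 1.0
    return (t_safe - ttc) / (t_safe - t_unsafe)


def blended_command(u_path, u_avoid, ttc):
    # Convex combination of the path-following and obstacle-avoidance commands.
    h = risk(ttc)
    return [(1.0 - h) * a + h * b for a, b in zip(u_path, u_avoid)]
```

At the two extremes only one command is active, while intermediate values mix both, which is what makes the transition shared. In [19, 22], even in the unsafe context some components (e.g., the camera pan) remain vision-driven; this sketch blends full command vectors for brevity.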

In [29], proximity (distance) and tactile (touch) measurements control a robot arm in a pHRI scenario to avoid obstacles or, when contact is inevitable, to generate compliant behaviors. The framework linearly combines the two senses and provides this signal to an inner admittance-like control loop (as in the shared scheme of Fig. 3). Since the operation principles of the two senses are complementary (one requires contact while the other does not), the integration can also be seen as traded.
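The linear combination of proximity and tactile signals fed to an admittance-like loop can be sketched as follows. Here the proximity reading is assumed to be normalised in [0, 1], both signals are assumed to push the robot link away from the stimulus, and the weights and mass-damper parameters are hypothetical; [29] defines its own signals and gains.

```python
def combined_signal(proximity, tactile_force, w_prox=5.0, w_touch=1.0):
    # proximity: hypothetical normalised reading in [0, 1] (1 = object very close)
    # tactile_force: normal contact force on the skin [N]
    # Both repel the link from the stimulus, so they add with positive weights.
    return w_prox * proximity + w_touch * tactile_force


def yield_velocity(signals, dt=0.01, m=2.0, b=20.0):
    """Integrate the admittance m*v_dot + b*v = signal; return the velocity history."""
    v, history = 0.0, []
    for s in signals:
        v += dt * (s - b * v) / m   # explicit Euler step of the mass-damper
        history.append(v)
    return history
```

Before contact, only the proximity term is non-zero and the link starts retreating; at contact, the tactile force adds to it. Under a constant combined signal S, the velocity settles at S/b (0.2 m/s for S = 4 and b = 20).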

The authors of [21] enable a robot to adapt to changes in human behaviour during a human-robot collaborative screwing task. In contrast with classic hybrid vision–touch–position control, their scheme enables smooth transitions via weighted combinations of the tasks. The robot can execute vision and force tasks either exclusively on different dof (hybrid approach) or simultaneously (shared approach).
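Smooth transitions between hybrid and shared integration can be sketched with per-dof weights that vary continuously in [0, 1] instead of a binary selection. We assume a cubic ease-in/ease-out profile and a diagonal weighting matrix; this illustrates the idea of weighted task combination, not the formulation of [21].

```python
import numpy as np


def smooth_weight(t, t0, duration):
    """Cubic ease from 0 (t <= t0) to 1 (t >= t0 + duration), zero slope at both ends."""
    s = np.clip((t - t0) / duration, 0.0, 1.0)
    return 3 * s**2 - 2 * s**3


def weighted_command(u_vision, u_force, weights):
    # weights[i] = 1: dof i driven by vision only; 0: by force only;
    # intermediate values: both tasks act simultaneously on dof i.
    W = np.diag(weights)
    return W @ np.asarray(u_vision) + (np.eye(len(weights)) - W) @ np.asarray(u_force)
```

Weights of exactly 0 or 1 reproduce hybrid control, intermediate values yield a shared behaviour, and because the cubic profile has zero slope at both ends, the blend starts and stops changing gradually.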

5 Classification of Works and Discussion

Paper | Sense(s) | Control objective | Sector | Robot
--- | --- | --- | --- | ---
[15][39] | Vision | Contactless guidance | Service | Arm
[38] | Vision | Remote guidance | Service | Arm
[47]-[32] | Vision | Contactless guidance | Medical | Wheeled
[4] | Vision | Contact w/humans | Medical | Arm
[10] | Touch | Contact w/humans, Remote guidance | Medical | Arm
[25] | Touch | Contact w/humans | Medical | Arm
[57]-[34] | Touch | Direct guidance | Production | Arm
[93] | Touch | Carrying | Production | Wheeled
[14] | Touch | Carrying | Production | Humanoid
[9] | Touch | Remote guidance | Medical | Arm
[58][52][51] | Audition | Contactless guidance | Service | Heads
[59]-[61] | Audition | Contactless guidance | Service | Wheeled
[28]-[82] | Distance | Collision avoidance | Production | Arm
[11]-[30] | Distance | Collision avoidance | Service | Arm
[20] | V+T (tra.) | Assembly | Production | Arm
[70]-[44] | V+A (tra.) | Contactless guidance | Service | Heads
[16] | V+A (tra.) | Contactless guidance | Service | Wheeled
[72] | V+T+A+D (tra.) | Direct guidance | Medical | Wheeled
[67] | D+T (tra.) | Collision avoidance | Production | Arm
[45] | D+A (tra.) | Collision avoidance | Service | Wheeled
[66] | V+A (sh.) | Contactless guidance | Service | Heads
[75] | V+T (hyb.) | Collision avoidance | Production | Arm
[17] | V+T (hyb.) | Contact w/humans, Remote guidance | Medical | Arm
[3][2] | V+T (sh.+hyb.) | Contact w/humans | Production | Humanoid
[19][22] | D+V (sh.+hyb.) | Collision avoidance | Production | Wheeled
[29] | D+T (sh.+tra.) | Direct guidance | Service | Arm
[21] | V+T (sh.+hyb.) | Assembly | Production | Arm
Table 1: Classification of all papers according to four criteria.

In this section, we use five criteria to classify all the surveyed papers that apply sensor-based control to collaborative robots. This taxonomy then drives the following discussion on design choices, limitations and future challenges.

In total, we refer to the forty-five papers reviewed above. These include the works with only one sensor, discussed in Sect. 3 ([15][9], [58][30]), and those that integrate multiple sensors, discussed in Sect. 4 ([20][21]). The five criteria are: sensor(s), integration method (when multiple sensors are used), control objective, target sector and robot platform. In Table 1, we indicate these characteristics for each paper. Then, we focus on each characteristic in Tables 2-5. In the tables, we use the following notation: V, T, A, D for Vision, Touch, Audition and Distance, and sh., hyb., tra. for shared, hybrid and traded.

Table 2 classifies the papers according to the sensor(s). The Mono column indicates the papers relying on a single sensor; for the others, we specify the integration approach (see Sect. 4). Note that vision (alone or combined) is by far the most popular sense, used in 22 papers. This comes as no surprise, since even for humans, vision provides the richest perceptual information to structure the world and perform motion [42]. Touch is the second most commonly used sense (18 papers) and is fundamental in pHRI, since it is the only one of the four that can be exploited directly to modulate contact.

Also note that, apart from [72], no paper integrates more than two sensors. Since sensors are widely accessible and computing power has progressed considerably, this is probably due to the difficulty of designing a framework capable of managing such diverse and broad data. Another reason may be the presumed (but disputable) redundancy of the three contactless senses, which biases designers towards vision, given its diffusion and popularity (also in terms of software). Touch, the only sensor measuring contact, is irreplaceable. This may also explain why, when merging two sensors, researchers have generally opted for vision+touch (7 out of 17 papers). The most popular of the three integration methods is traded control, probably because it is the easiest to set up. In recent years, however, there has been growing interest in the shared+hybrid combination, which provides smoother control.

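Sensor | Mono | Vision | Touch | Audition
--- | --- | --- | --- | ---
Vision | [15]-[4] | | |
Touch | [10][25][57]-[34][93][14][9] | tra. [20]; hyb. [75][17]; sh.+hyb. [3][2][21] | |
Audition | [58]-[51] | tra. [70]-[72]; sh. [66] | tra. [72] |
Distance | [28]-[30] | sh.+hyb. [19][22] | sh.+tra. [29]; tra. [67] | tra. [72][45]
Table 2: Classification based on the sensors.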

An unexploited application of shared control is the combination of vision and distance (proximity sensors) to avoid collisions with humans. This can be formulated as in Fig. 3, by replacing the touch control error with an admittance-like distance control error:

$$e_d = d - d^{*},$$

where $d$ and $d^{*}$ represent the measured and desired distance to obstacles. With this approach, the robot can stabilize at a given “safe” distance from an obstacle, or move away from it.
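A one-dimensional sketch of this vision/distance shared scheme: a constant visual reference would bring the robot too close to an obstacle, and a distance-driven admittance deforms it. We assume a mass-damper admittance (no spring), a virtual force proportional to d - d* that is applied only when d < d* (so the robot is pushed back but never pulled towards the obstacle), and hypothetical values for M, B and the gain k_d.

```python
def distance_admittance(x_ref, x_obs, d_des, steps=5000, dt=0.001,
                        M=1.0, B=20.0, k_d=100.0):
    """Deform a constant visual reference x_ref so that the robot keeps at
    least d_des from an obstacle at x_obs (1-D, obstacle ahead of the robot).

    Returns the final position command and the final distance to the obstacle.
    """
    delta, delta_d = 0.0, 0.0
    for _ in range(steps):
        d = x_obs - (x_ref + delta)           # measured distance to the obstacle
        f = k_d * min(0.0, d - d_des)          # virtual force, active only when too close
        delta_dd = (f - B * delta_d) / M       # mass-damper admittance (no spring term)
        delta_d += delta_dd * dt               # semi-implicit Euler integration
        delta += delta_d * dt
    x = x_ref + delta
    return x, x_obs - x
```

With the obstacle at 1 m, d* = 0.3 m and a visual reference at 0.9 m, the command settles near 0.7 m, i.e., at the safe distance; when the reference is already safe, it is left untouched. Omitting the spring term is what makes the robot settle at d* rather than at a compromise between the reference and the safe distance.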

In the authors’ opinion, no sensor (or set of sensors) and no integration method is universally the best; the designer should choose according to the objective at stake. For this, nature and evolution can be extremely inspiring, but technological constraints (e.g., hardware and software availability) must also be accounted for, following the golden rule of engineering that “simpler is better”.

Table 3 classifies the papers according to the control objective. In the table, we also apply the taxonomy of pHRI layers introduced in [27] and recalled in the introduction: safety, coexistence, collaboration. Works that focus on collision avoidance address safety, and works where the robot acts on passive humans address coexistence. For the collaboration layer, we distinguish two main classes of works: first, those where the human guides the robot (without contact, with direct contact, or with remote physical contact as in tele-operation); then, those where the two collaborate (e.g., for part assembly or object carrying). The idea (also in line with [27]) is that the lower rows of the table generally require higher cognitive capabilities (e.g., better modelling of environment and task). Some works, particularly in the field of medical robotics [4, 10, 17], cover both coexistence and collaboration, since the human guides the robot to operate on another human. Interestingly, the senses appear in the table following a trend analogous to biology. Distance is fundamental for collision avoidance, when the human is far and his/her role in the interaction is basic (s/he is mainly perceived as an obstacle). Then, audio is used for contactless guidance. As human and robot get closer, touch takes over the role of audio. As mentioned above, vision is a transversal sense, capable of covering most distance ranges. Yet, when contact is present (i.e., in the four lower rows), it is systematically complemented by touch, a popular pairing, as also shown in Table 2 and discussed above.

Control objective (pHRI layer) | Sense(s) and papers
--- | ---
Collision avoidance (safety) | distance [28]-[30]; distance+touch [67]; distance+audition [45]; vision+touch [75]; vision+distance [19, 22]
Contact with passive humans (coexistence) | vision [4]; touch [10, 25]; vision+touch [17]
Contactless guidance (collaboration) | vision [15, 39], [47]-[32]; audition [58]-[51]; vision+audition [70]-[16][66]
Direct guidance (collaboration) | touch+audition+distance+vision [72]; touch [57]-[34]; touch+distance [29]
Remote guidance (collaboration) | vision [38, 4]; touch [10, 9]; vision+touch [17]
Collaborative assembly (collaboration) | vision+touch [20, 21]
Collaborative carrying (collaboration) | touch [93, 14]; vision+touch [3, 2]
Table 3: Classification based on the control objective with corresponding pHRI layer as proposed in [27] (in parenthesis).

Table 4 classifies the papers according to the target (or potential) sector. We propose three sectors: Production, Medical, and Service. Production is the historical sector of robotics; applications include manufacturing (assembly, welding, pick-and-place), transportation (autonomous guided vehicles, logistics) and construction (material and brick transfer). The medical sector has become very popular in recent years, with applications spanning robotic surgery (surgical gripper and needle manipulation), diagnosis (positioning of ultrasonic probes or endoscopes), and assistance (intelligent wheelchairs, feeding and walking aids). In the authors’ opinion, the service sector presents the highest potential for growth in the coming years. Its applications include companionship (elderly and child care), domestic tasks (cleaning, object retrieval) and personal assistance (chat partners, tele-presence). The table shows that all four senses have been deployed in all three sectors, with one exception: audition is not used in production applications, probably because of the noise common in industrial environments.

Finally, Table 5 gives a classification based on the robotic platform. Unsurprisingly, most works use fixed-base arms. The second most used platforms are wheeled robots, followed by humanoids, i.e., robots with an anthropomorphic design (two arms and biped locomotion capabilities). Finally, we consider robot heads, which in the surveyed works are used only for audio-based control (alone or combined with vision). Although robot heads are commonly used for face tracking in social human-robot interaction, such works are not reviewed in this survey, as they do not generally involve contact.

Sector | Sense(s) and papers
--- | ---
Production (manufacturing, transportation, construction) | touch [57]-[14]; distance [28]-[82]; D+T [67]; V+T [20][75][3][2][21]; V+D [19, 22]
Medical (surgery, diagnosis, assistance) | vision [47]-[4]; touch [10][25][9]; V+T+A+D [72]; V+T [17]
Service (companionship, domestic, personal) | vision [15]-[38]; audition [58]-[51]; distance [11]-[30]; V+A [70]-[16][66]; D+A [45]; T+D [29]
Table 4: Classification based on target/potential sectors.

6 Conclusions

This work presents a systematic review of sensor-based controllers that enable collaboration and/or interaction between humans and robots. We consider four senses: vision, touch, audition and distance. First, we introduce a tutorial-like general formulation of sensor-based control, which we instantiate for visual servoing, touch control, aural servoing, and distance-based control, while reviewing representative papers. Next, with the same formulation, we model the methods that integrate multiple sensors, again discussing related works. Finally, we classify the surveyed body of literature according to: sense(s) used, integration method, control objective, target application and platform.

Although vision and touch (here, mostly proprioceptive force sensing rather than tactile sensing) emerge nowadays as the most popular senses on collaborative robots, the advent of cheap, precise and easy-to-integrate tactile, distance and audio sensors presents great opportunities for the future. In particular, we believe that robot skins (e.g., on arms and hands) will simplify interaction, boosting the opportunities for human-robot collaboration. It is imperative that researchers develop the appropriate tools for this. Distance/proximity feedback is promising for fully perceiving the human operating near the robot (something monocular vision cannot do). Audio feedback is key for developing robotic heads that can interact naturally with human speakers.

Platform | Sense(s) and papers
--- | ---
Arms | vision [15]-[38][4]; touch [10]-[34][9]; distance [28]-[30]; V+T [20, 75, 17, 21]; D+T [67, 29]
Wheeled | vision [47]-[32]; touch [93]; audition [59]-[61]; V+A [16]; V+T+A+D [72]; D+A [45]; V+D [19, 22]
Humanoids | touch [14]; V+T [3, 2]
Heads | audition [58, 52, 51]; V+A [70][44][66]
Table 5: Classification based on the type of robot platform.

Finally, some open problems must be addressed to develop robust controllers for real-world applications. For example, the use of task constraints has not been sufficiently explored when multiple sensors are integrated. Also, the difficulty of obtaining models that describe and predict human behavior hampers the implementation of human-robot collaborative tasks. Multimodal data, for instance from RGB-D cameras combined with multiple proximity sensors, may offer an interesting solution to this human motion sensing and estimation problem. More research needs to be conducted in this direction.


  • [1] Z. Abderrahmane, G. Ganesh, A. Crosnier, and A. Cherubini (2018) Haptic zero-shot learning: recognition of objects never touched before. Robotics and Autonomous Systems 105, pp. 11–25. Cited by: §3.4.2.
  • [2] D. J. Agravante, A. Cherubini, A. Bussy, P. Gergondet, and A. Kheddar (2014) Collaborative human-humanoid carrying using vision and haptic sensing. In IEEE Int. Conf. on Robotics and Automation, ICRA, Cited by: Figure 3, §4.4, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [3] D. J. Agravante, A. Cherubini, A. Bussy, and A. Kheddar (2013) Human-humanoid joint haptic table carrying task with height stabilization using vision. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Cited by: Figure 3, §4.4, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [4] A. Agustinos, R. Wolf, J. A. Long, P. Cinquin, and S. Voros (2014-08) Visual servoing of a robotic endoscope holder based on surgical instrument tracking. In IEEE RAS/EMBS Int. Conf. on Biomedical Robotics and Biomechatronics, Vol. , pp. 13–18. Cited by: §3.3.2, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [5] A. Ajoudani, A. M. Zanchettin, S. Ivaldi, A. Albu-Schäffer, K. Kosuge, and O. Khatib (2017-10-31) Progress and prospects of the human–robot collaboration. Autonomous Robots 42, pp. 957–975. Cited by: §1.
  • [6] B. D. Argall and A. G. Billard (2010) A survey of tactile human-robot interactions. Robotics and Autonomous Systems 58 (10), pp. 1159 – 1176. External Links: ISSN 0921-8890 Cited by: §3.4.2.
  • [7] M. Azizian, M. Khoshnam, N. Najmaei, and R. V. Patel (2014) Visual Servoing in medical robotics: a survey. Part I: endoscopic and direct vision imaging – techniques and applications. Int. J. of Med. Robot. 10 (3), pp. 263–274. Cited by: 2nd item, §3.3.2.
  • [8] J. Baeten, H. Bruyninckx, and J. De Schutter (2003) Integrated vision/force robotic servoing in the task frame formalism. Int. Journal of Robotics Research 22 (10-11), pp. 941–954. Cited by: 3rd item.
  • [9] J. Baumeyer, P. Vieyres, S. Miossec, C. Novales, G. Poisson, and S. Pinault (2015-06) Robotic co-manipulation with 6 dof admittance control: application to patient positioning in proton-therapy. In IEEE Int. Work. on Advanced Robotics and its Social Impacts, Vol. , pp. 1–6. Cited by: §3.4.2, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [10] E. Bauzano, B. Estebanez, I. Garcia-Morales, and V. F. Munoz (2016-Sept) Collaborative human-robot system for HALS suture procedures. IEEE Systems Journal 10 (3), pp. 957–966. Cited by: §3.4.2, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [11] F. Bergner, E. Dean-Leon, and G. Cheng (2017-05) Efficient event-driven reactive control for large scale robot skin. In IEEE Int. Conf. on Robotics and Automation, ICRA, Vol. , pp. 394–400. Cited by: §3.6.2, Table 1, Table 4.
  • [12] A. Berthoz (2002) The brain’s sense of movement. Harvard Univ. Press. Cited by: §1.
  • [13] A. Bicchi, M. Peshkin, and J. Colgate (B. Siciliano and O. Khatib (Eds.), Springer, pp. 1335-1348, 2008) Safety for physical human-robot interaction. Springer Handbook of Robotics. Cited by: §1.
  • [14] A. Bussy, A. Kheddar, A. Crosnier, and F. Keith (2012) Human-humanoid haptic joint object transportation case study. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, pp. 3633–3638. Cited by: §3.4.2, Table 1, Table 3, Table 4, Table 5.
  • [15] C. Cai, N. Somani, and A. Knoll (2016-04) Orthogonal image features for visual servoing of a 6-dof manipulator with uncalibrated stereo cameras. IEEE Trans. on Robotics 32 (2), pp. 452–461. Cited by: §3.3.2, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [16] V. Chan, C. Jin, and A. van Schaik (2012) Neuromorphic audio-visual sensor fusion on a sound-localising robot. Frontiers in Neuroscience 6, pp. 21. Cited by: §4.1, Table 1, Table 3, Table 4, Table 5.
  • [17] P. Chatelain, A. Krupa, and N. Navab (2017-12) Confidence-driven control of an ultrasound probe. IEEE Transactions on Robotics 33 (6), pp. 1410–1424. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [18] F. Chaumette and S. Hutchinson (2006) Visual servo control, Part I: Basic approaches. IEEE Robotics and Automation Magazine 13 (4), pp. 82–90. Cited by: §1, §3.3.1.
  • [19] A. Cherubini and F. Chaumette (2013) Visual navigation of a mobile robot with laser-based collision avoidance.. Int. Journal of Robotics Research 32 (2), pp. 189–209. Cited by: §4.4, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [20] A. Cherubini, R. Passama, A. Crosnier, A. Lasnier, and P. Fraisse (2016-08) Collaborative manufacturing with physical human-robot interaction. Robotics and Computer-Integrated Manufacturing 40, pp. 1–13. Cited by: §4.1, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [21] A. Cherubini, R. Passama, P. Fraisse, and A. Crosnier (2015) A unified multimodal control framework for human-robot interaction. Robotics and Autonomous Systems 70, pp. 106–115. Cited by: §4.4, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [22] A. Cherubini, F. Spindler, and F. Chaumette (2014) Autonomous visual navigation and laser-based moving obstacle avoidance. IEEE Trans. on Int. Transportation Systems 15 (5), pp. 2101–2110. Cited by: §4.4, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [23] B. J. Nelson, J. D. Morrow, and P. K. Khosla (1995) Improved force control through visual servoing. In Proc. of the American Control Conference, Vol. 1, pp. 380–386. Cited by: §4, §4.
  • [24] J. E. Colgate, W. Wannasuphoprasit, and M. A. Peshkin (1996-12) Cobots: robots for collaboration with human operators. In Proc ASME Dynamic Systems and Control Division, Vol. 58, pp. 433–439. Cited by: §1.
  • [25] R. Cortesao and M. Dominici (2017-08) Robot force control on a beating heart. IEEE/ASME Transactions on Mechatronics 22 (4), pp. 1736–1743. Cited by: §3.4.2, Table 1, Table 3, Table 4.
  • [26] E.J. Davison and A. Goldenberg (1975) Robust control of a general servomechanism problem: the servo compensator. IFAC Proceedings Volumes 8 (1, Part 1), pp. 231 – 239. Cited by: §1.
  • [27] A. De Luca and F. Flacco (2012) Integrated control for pHRI: collision avoidance, detection, reaction and collaboration. In IEEE RAS/EMBS Int. Conf. on Biomedical Robotics and Biomechatronics, BIOROB, Cited by: §1, Table 3, §5.
  • [28] A. De Santis, V. Lippiello, B. Siciliano, and L. Villani (2007) Human-robot interaction control using force and vision. Advances in Control Theory and Applications 353, pp. 51–70. Cited by: §3.6.2, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [29] E. Dean-Leon, F. Bergner, K. Ramirez-Amaro, and G. Cheng (2016-11) From multi-modal tactile signals to a compliant control. In IEEE-RAS Int. Conf. on Humanoid Robots, Vol. , pp. 892–898. Cited by: §4.4, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [30] E. Dean-Leon, B. Pierce, F. Bergner, P. Mittendorfer, K. Ramirez-Amaro, W. Burger, and G. Cheng (2017-05) TOMM: tactile omnidirectional mobile manipulator. In IEEE Int. Conf. on Robotics and Automation, ICRA, Vol. , pp. 2441–2447. Cited by: §3.6.2, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [31] A. Del Prete, S. Denei, L. Natale, F. Mastrogiovanni, F. Nori, G. Cannata, and G. Metta (2011) Skin spatial calibration using force/torque measurements. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Cited by: §3.4.2.
  • [32] C. Dune, A. Remazeilles, E. Marchand, and C. Leroux (2008) Vision-based grasping of unknown objects to improve disabled people autonomy.. In Robotics: Science and Systems, Cited by: §3.3.2, Table 1, Table 3, Table 5.
  • [33] M. O. Ernst and M. S. Banks (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415, pp. 429–433. Cited by: §4.
  • [34] F. Ficuciello, A. Romano, L. Villani, and B. Siciliano (2013) Cartesian impedance control of redundant manipulators for human-robot co-manipulation. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Cited by: §3.4.2, Table 1, Table 3, Table 5.
  • [35] F. Flacco, T. Kroger, A. De Luca, and O. Khatib (2012) A Depth Space Approach to Human-Robot Collision Avoidance. In IEEE Int. Conf. on Robotics and Automation, ICRA, Cited by: §3.6.2.
  • [36] B. Gao, H. Li, W. Li, and F. Sun (2016) 3D moth-inspired chemical plume tracking and adaptive step control strategy. Adaptive Behavior 24 (1), pp. 52–65. Cited by: §2.
  • [37] D. Göger, M. Blankertz, and H. Wörn (2010) A tactile proximity sensor. IEEE Sensors, pp. 589–594. Cited by: 4th item.
  • [38] M. Gridseth, K. Hertkorn, and M. Jagersand (2015-06) On visual servoing to improve performance of robotic grasping. In Conf. on Computer and Robot Vision, Vol. , pp. 245–252. Cited by: §3.3.2, Table 1, Table 3, Table 4, Table 5.
  • [39] M. Gridseth, O. Ramirez, C. P. Quintero, and M. Jagersand (2016) ViTa: visual task specification interface for manipulation with uncalibrated visual servoing. In IEEE Int. Conf. on Robotics and Automation, ICRA, Cited by: §3.3.2, Table 1, Table 3.
  • [40] D. Ha, Q. Sun, K. Su, H. Wan, H. Li, N. Xu, F. Sun, L. Zhuang, N. Hu, and P. Wang (2015) Recent achievements in electronic tongue and bioelectronic tongue as taste sensors. Sensors and Actuators Part B: Chemical 207, pp. 1136 – 1146. Cited by: §2.
  • [41] S. Haddadin, A. De Luca, and A. Albu-Schäffer (2017) Robot collisions: a survey on detection, isolation, and identification. IEEE Trans. on Robotics 33 (6), pp. 1292 – 1312. Cited by: §1.
  • [42] D.D. Hoffman (1998) Visual intelligence: how we create what we see. W. W. Norton and Company. Cited by: §5.
  • [43] N. Hogan (1985) Impedance control: an approach to manipulation: parts I-III. ASME Journal of Dynamic Systems, Measurement, and Control 107, pp. 1–24. Cited by: 2nd item, 2nd item.
  • [44] J. Hornstein, M. Lopes, J. Santos-Victor, and F. Lacerda (2006-10) Sound localization for humanoid robots - building audio-motor maps based on the HRTF. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. , pp. 1170–1176. Cited by: §4.1, Table 1, Table 5.
  • [45] J. Huang, T. Supaongprapa, I. Terakura, F. Wang, N. Ohnishi, and N. Sugie (1999) A model-based sound localization system and its application to robot navigation. Robotics and Autonomous Systems 27 (4), pp. 199 – 209. Cited by: §4.1, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [46] (2014) ISO 13482:2014 Robots and robotic devices - Safety requirements for personal care robots. Technical report International Organization for Standardization, Geneva, Switzerland. Cited by: 1st item.
  • [47] V. K. Narayanan, F. Pasteau, M. Marchal, A. Krupa, and M. Babel (2016-08) Vision-based adaptive assistance and haptic guidance for safe wheelchair corridor following. Comput. Vis. Image Underst. 149 (C), pp. 171–185. Cited by: §3.3.2, Table 1, Table 3, Table 4, Table 5.
  • [48] O. Khatib (1985) Real-time obstacle avoidance for manipulators and mobile robots. In IEEE Int. Conf. on Robotics and Automation, ICRA, Cited by: 1st item, §3.6.1.
  • [49] Y. Kobayashi, M. Habara, H. Ikezazki, R. Chen, Y. Naito, and K. Toko (2010) Advanced taste sensors based on artificial lipids with global selectivity to basic taste qualities and high correlation to sensory scores. Sensors 10 (4), pp. 3411–3443. Cited by: §2.
  • [50] G. Kowadlo and R. A. Russell (2008) Robot odor localization: a taxonomy and survey. Int. Journal of Robotics Research 27 (8), pp. 869–894. Cited by: §2.
  • [51] M. Kumon, T. Shimoda, R. Kohzawa, I. Mizumoto, and Z. Iwai (2005-08) Audio servo for robotic systems with pinnae. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. , pp. 1881–1886. Cited by: §3.5.2, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [52] M. Kumon, T. Sugawara, K. Miike, I. Mizumoto, and Z. Iwai (2003-10) Adaptive audio servo for multirate robot systems. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. 1, pp. 182–187. Cited by: §3.5.2, Table 1, Table 5.
  • [53] S. M. La Valle (2006) Planning algorithms. Cambridge Univ. Press. Cited by: §1.
  • [54] Q. Leboutet, E. Dean-León, and G. Cheng (2016) Tactile-based compliance with hierarchical force propagation for omnidirectional mobile manipulators. In IEEE-RAS Int. Conf. on Humanoid Robots, Cited by: §3.6.2.
  • [55] Q. Li, C. Schürman, R. Haschke, and H. Ritter (2013) A control framework for tactile servoing. In Robotics: Science and Systems (RSS), Cited by: §3.4.2.
  • [56] C. H. Lin, J. A. Fishel, and G. E. Loeb (2013) Estimating point of contact, force and torque in a biomimetic tactile sensor with deformable skin. Technical report SynTouch LLC. Cited by: §3.4.2.
  • [57] Y. Maeda, T. Hara, and T. Arai (2001) Human-robot cooperative manipulation with motion estimation. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. 4, pp. 2240–2245. Cited by: §3.4.2, Table 1, Table 3, Table 4.
  • [58] A. Magassouba, N. Bertin, and F. Chaumette (2016) Binaural auditory interaction without HRTF for humanoid robots: a sensor-based control approach. In See, Touch, and Hear: 2nd Workshop on Multimodal Sensor-based Robot Control for HRI and Soft Manipulation, IROS, Cited by: §3.5.1, §3.5.2, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [59] A. Magassouba, N. Bertin, and F. Chaumette (2015-Sept) Sound-based control with two microphones. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. , pp. 5568–5573. Cited by: §3.5.1, §3.5.2, Table 1, Table 5.
  • [60] A. Magassouba, N. Bertin, and F. Chaumette (2016-05) First applications of sound-based control on a mobile robot equipped with two microphones. In IEEE Int. Conf. on Robotics and Automation, ICRA, Vol. , pp. 2557–2562. Cited by: §3.5.1, §3.5.2.
  • [61] A. Magassouba, N. Bertin, and F. Chaumette (2016-10) Audio-based robot control from interchannel level difference and absolute sound energy. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. , pp. 1992–1999. Cited by: §3.5.1, §3.5.2, Table 1, Table 5.
  • [62] M. McBeath, D. Shaffer, and M. Kaiser (1995) How baseball outfielders determine where to run to catch fly balls. Science 268 (5210), pp. 569–573. Cited by: §3.1.
  • [63] G. Morel, E. Malis, and S. Boudet (1998) Impedance based combination of visual and force control. In IEEE Int. Conf. on Robotics and Automation, ICRA, Vol. 2, pp. 1743–1748. Cited by: 2nd item, Figure 3, §4.2.
  • [64] K. Nakadai, H. Nakajima, M. Murase, S. Kaijiri, K. Yamada, T. Nakamura, Y. Hasegawa, H. G. Okuno, and H. Tsujino (2006) Robust tracking of multiple sound sources by spatial integration of room and robot microphone arrays. In IEEE Int. Conf. on Acoustics Speech and Signal Processing, Cited by: 3rd item.
  • [65] L. Natale and E. Torres-Jara (2006) A sensitive approach to grasping. In Proc. of the 6th Int. Workshop on Epigenetic Robotics, Cited by: §3.4.2.
  • [66] L. Natale, G. Metta, and G. Sandini (2002) Development of auditory-evoked reflexes: visuo-acoustic cues integration in a binocular head. Robotics and Autonomous Systems 39 (2), pp. 87 – 106. Cited by: §4.2, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [67] S. E. Navarro, M. Schonert, B. Hein, and H. WWörn (2014) 6D proximity servoing for preshaping and haptic exploration using capacitive tactile proximity sensors. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Cited by: §4.1, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [68] B. J. Nelson and P. K. Khosla (1996) Force and vision resolvability for assimilating disparate sensory feedback. IEEE Trans. on Robotics and Automation 12 (5), pp. 714–731. Cited by: §4.
  • [69] J. Nocedal and S. Wright (2000) Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Cited by: §3.2.
  • [70] H. G. Okuno, K. Nakadai, K. I. Hidai, H. Mizoguchi, and H. Kitano (2001) Human-robot interaction through real-time auditory and visual multiple-talker tracking. In IEEE/RSJ Int. Conf. on Robots and Intelligent Systems, IROS, Vol. 3, pp. 1402–1409. Cited by: §4.1, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [71] H. G. Okuno, K. Nakadai, T. Lourens, and H. Kitano (2004-05) Sound and visual tracking for humanoid robot. Applied Intelligence 20 (3), pp. 253–266. Cited by: §4.1.
  • [72] X. S. Papageorgiou, C. S. Tzafestas, P. Maragos, G. Pavlakos, G. Chalvatzaki, G. Moustris, I. Kokkinos, A. Peer, B. Stanczyk, E. Fotinea, and E. Efthimiou (2014) Advances in intelligent mobility assistance robot integrating multimodal sensory processing. In Universal Access in Human-Computer Interaction. Aging and Assistive Environments, pp. 692–703. Cited by: §4.1, Table 1, Table 2, Table 3, Table 4, §5, Table 5.
  • [73] C. Passenberg, A. Peer, and M. Buss (2010) A survey of environment- operator- and task-adapted controllers for teleoperation systems. Mechatronics 20 (7), pp. 787 – 801. Cited by: §3.4.2.
  • [74] S. Phoha (2014) Machine perception and learning grand challenge: situational intelligence using cross-sensory fusion. Frontiers in Robotics and AI 1, pp. 7. Cited by: §1.
  • [75] J. Pomares, I. Perea, G. J. García, C. A. Jara, J. A. Corrales, and F. Torres (2011) A multi-sensorial hybrid control for robotic manipulation in human-robot workspaces. Sensors 11 (10), pp. 9839–9862. Cited by: §4.3, Table 1, Table 2, Table 3, Table 4, Table 5.
  • [76] U. Proske and S. C. Gandevia (2012) The proprioceptive senses: their roles in signaling body shape, body position and movement, and muscle force. Physiol. Rev. 92 (4), pp. 1651–1697. Cited by: 2nd item.
  • [77] F. Rahbar, A. Marjovi, P. Kibleur, and A. Martinoli (2017) A 3-D bio-inspired odor source localization and its validation in realistic environmental conditions. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS, pp. 3983–3989. Cited by: §2.
  • [78] M. H. Raibert and J. J. Craig (1981) Hybrid position/force control of manipulators. ASME J. Dyn. Syst. Meas. Control 103, pp. 126–133. Cited by: 2nd item, 1st item, §4.
  • [79] L. Rayleigh (1907) On our perception of sound direction. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 13 (74), pp. 214–232. Cited by: 3rd item.
  • [80] R. A. Russell (2006) Tracking chemical plumes in 3-dimensions. In IEEE Int. Conf. on Robotics and Biomimetics, pp. 31–36. Cited by: §2.
  • [81] B. V. H. Saxberg (1987-05-01) Projected free fall trajectories. Biol. Cybern. 56 (2), pp. 159–175. Cited by: §3.1.
  • [82] T. Schlegl, T. Kröger, A. Gaschler, O. Khatib, and H. Zangl (2013) Virtual whiskers – highly responsive robot collision avoidance. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS. Cited by: §3.6.2, Table 1, Table 4.
  • [83] A. Schmitz, P. Maiolino, M. Maggiali, L. Natale, G. Cannata, and G. Metta (2011) Methods and technologies for the implementation of large-scale robot tactile sensors. IEEE Trans. on Robotics 27 (3), pp. 389–400. Cited by: 2nd item.
  • [84] L. Shapiro (2010) Embodied cognition. New Problems of Philosophy, Taylor & Francis. External Links: ISBN 9780203850664 Cited by: §3.1.
  • [85] H. Shimazu, K. Kobayashi, A. Hashimoto, and T. Kameoka (2007) Tasting robot with an optical tongue: real time examining and advice giving on food and drink. In Human Interface and the Management of Information. Methods, Techniques and Tools in Information Design, M. J. Smith and G. Salvendy (Eds.), Cited by: §2.
  • [86] R. Smits, T. De Laet, K. Claes, H. Bruyninckx, and J. De Schutter (2008) iTASC: a tool for multi-sensor integration in robot manipulation. In IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 426–433. Cited by: §4.
  • [87] M. Suphi Erden and B. Maric (2011) Assisting manual welding with robot. Robotics and Computer Integrated Manufacturing 27, pp. 818–828. Cited by: §3.4.2.
  • [88] M. Suphi Erden and T. Tomiyama (2010) Human intent detection and physically interactive control of a robot without force sensors. IEEE Trans. on Robotics 26 (2), pp. 370–382. Cited by: §3.4.2.
  • [89] G. Taylor and L. Kleeman (2006) Visual Perception and Robotic Manipulation: 3D Object Recognition, Tracking and Hand-Eye Coordination. Springer Tracts in Advanced Robotics, Springer. Cited by: §4.
  • [90] K. M. Tsui, D. Kim, A. Behal, D. Kontak, and H. A. Yanco (2011-01) I want that: human-in-the-loop control of a wheelchair-mounted robotic arm. Appl. Bionics Biomechanics 8 (1), pp. 127–147. External Links: ISSN 1176-2322 Cited by: §3.3.2.
  • [91] L. Villani and J. De Schutter (2008) Force Control. In Springer Handbook of Robotics, B. Siciliano and O. Khatib (Eds.), pp. 161–185. Cited by: 2nd item, §3.4.1.
  • [92] V. Villani, F. Pini, F. Leali, and C. Secchi (2018) Survey on human–robot collaboration in industrial settings: Safety, intuitive interfaces and applications. Mechatronics 55, pp. 248–266. Cited by: §1.
  • [93] Y. Wang, C. Smith, Y. Karayiannidis, and P. Ögren (2015-09) Cooperative control of a serial-to-parallel structure using a virtual kinematic chain in a mobile dual-arm manipulation application. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS, pp. 2372–2379. Cited by: §3.4.2, Table 1, Table 3, Table 5.
  • [94] N. Wettels, V. J. Santos, R. S. Johansson, and G. Loeb (2008) Biomimetic tactile sensor array. Advanced Robotics 22 (8), pp. 829–849. Cited by: 2nd item.
  • [95] D. E. Whitney (1969) Resolved motion rate control of manipulators and human prostheses. IEEE Trans. Man-Mach. Syst. 10 (2), pp. 47–53. Cited by: §3.2.
  • [96] A. Wilson and S. Golonka (2013) Embodied cognition is not what you think it is. Front. Psychol. 4, pp. 58. Cited by: §3.1.
  • [97] K. Youssef, S. Argentieri, and J. L. Zarader (2012) Towards a systematic study of binaural cues. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS, pp. 1004–1009. Cited by: 1st item.
  • [98] H. Zhang and N. N. Chen (2000-10) Control of contact via tactile sensing. IEEE Trans. on Robotics and Automation 16 (5), pp. 482–495. Cited by: §3.4.2.