Efficient and Trustworthy Social Navigation Via Explicit and Implicit Robot-Human Communication

10/26/2018 · by Yuhang Che, et al. · Stanford University

In this paper, we present a planning framework that uses a combination of implicit (robot motion) and explicit (visual/audio/haptic feedback) communication during mobile robot navigation in a manner that humans find understandable and trustworthy. First, we developed a model that approximates both continuous movements and discrete decisions in human navigation, considering the effects of implicit and explicit communication on human decision making. The model approximates the human as an optimal agent, with a reward function obtained through inverse reinforcement learning. Second, a planner uses this model to generate communicative actions that maximize the robot's transparency and efficiency. We implemented the planner on a mobile robot, using a wearable haptic device for explicit communication. In a user study of navigation in an indoor environment, the robot was able to actively communicate its intent to users in order to avoid collisions and facilitate efficient trajectories. Results showed that the planner generated plans that were easier to understand, reduced users' effort, and increased users' trust of the robot, compared to simply performing collision avoidance.


I Introduction

Mobile robots are entering human environments, with applications ranging from delivery and support in warehouses to home and social services. In this work we focus on social navigation, in which the movements and decisions of robots and humans affect each other. Previously, researchers have investigated the problem of generating human-aware or human-like motions for mobile robots [1, 2, 3, 4, 5, 6]. Such motions are important because intelligent robots in human environments are expected to comply with social norms. However, this approach has limitations. First, many mobile robots neither look nor move like humans, so attempting to mimic human behavior may cause misunderstanding. Second, although humans usually avoid each other's space smoothly and efficiently, they sometimes rely on communication to resolve conflicts; robot motions alone cannot handle such complex navigation scenarios.

Fig. 1: A social navigation scenario in which the robot communicates its intent (to yield to the human) to the human both implicitly and explicitly. Implicit communication is achieved via robot motion (slowing down and stopping), and explicit communication is achieved through a wearable haptic interface.

To address these challenges, we incorporate two types of communication in robot social navigation. We distinguish the concept of explicit communication from implicit communication (Fig. 1). In the context of human-human interaction, explicit communication often refers to verbal communication, and implicit communication often refers to non-verbal communication. Here, we extend the definition of explicit communication to include any non-verbal communication modality that carries explicit information that can be interpreted with little or no ambiguity. Examples include visual displays (e.g., colors and symbols) and haptic feedback (e.g., vibration, force, and skin deformation). In this work, we focus on the use of haptic feedback because it is immediate and targeted. In addition, haptic feedback is advantageous when users’ visual and auditory channels are already overloaded. However, the algorithms developed in this work are not limited to haptic communication and can be directly applied to other forms of explicit communication.

We refer to implicit communication as non-verbal actions, such as eye contact and body language, that indirectly convey information. These actions could be intentional or unintentional, and humans must infer the underlying information. The most common type of implicit communication in navigation scenarios is movement. For example, we can often infer whether a person is in a hurry by observing how quickly they move. Similarly, a robot’s motion carries information about its intent.

We consider the effects of both implicit and explicit communication on humans' navigation behaviors. We develop models that capture these effects and predict human behaviors, including both continuous movements and discrete decisions. Leveraging the learned human models, we develop an algorithm that plans a robot's motion and its explicit communication through haptic feedback. Our approach relies on the assumption that users are cooperative – they will not intentionally thwart the robot. This should be generally true in environments where humans and robots work together, such as offices, homes, hospitals, and warehouses. We also assume that users are equipped with wearable haptic interfaces, and the robot can send feedback via these interfaces. Our approach is applicable to other communication modalities; the haptic interface can be replaced with sound or visual cues if appropriate in the scenario.

The main contributions of this work are:

  1. A predictive model of human navigation behavior in response to both implicit and explicit robot communication.

  2. An interactive planning algorithm based on the human model that enables a robot to proactively communicate through either implicit or explicit communication, with the goal of efficient and transparent interaction.

  3. Implementation of the proposed algorithm on a physical mobile robot platform, and analysis and verification of the algorithm with user studies.

The remainder of the paper is organized as follows. In Section II, we review existing techniques in social navigation and human-robot interaction. We present an overview of the proposed planning framework in Section III. Sections IV and V explain our model and planning algorithm. Initial evaluations in simulation are discussed in Section VI. Finally, we present our experimental evaluation and results in Section VII, and discuss directions for future work in Section VIII.

II Background

II-A Social Navigation

In traditional robot motion planning frameworks, humans are usually treated as obstacles that the robot should avoid. However, in social navigation scenarios, humans differ from most other obstacles – they have a purpose, follow social rules, and react to other agents in the environment, including the robot. Researchers have explored various methods for socially-aware motion planning.

Early work in social navigation was inspired by social rules for human navigation, such as keeping appropriate distances between people and passing on a specific side [7]. For example, Nakauchi and Simmons developed a robot that stands in line while respecting personal space [8], and Pacchierotti et al. presented a system for navigation in hallways based on the rules of proxemics [9]. Kirby et al. and Svenstrup et al. proposed motion planning methods that incorporate proxemics-based costs [10, 11]. Sisbot et al. combined several factors, including accessibility and field of view, in a human-aware planner [1]. In addition to spatial relationships, temporal factors have been considered to generate more natural motions [12, 4].

An important aspect of social navigation is that humans are reactive – their behaviors can be affected by the robot. Understanding and modeling such behaviors is necessary for planning socially acceptable actions. A popular framework for modeling human navigation in response to other agents is the Social Force Model (SFM) [13], which treats pedestrians as driven by interactive forces that attract them toward their goals and repel them from each other. Variations of SFM have been studied [14, 15] to extend the original idea. SFM has been applied to human-robot interaction and social navigation scenarios [16, 3, 17, 18], and is popular because of its high computational efficiency. Its drawback is that its predictions are not very accurate and do not capture the stochasticity of human behavior.

Researchers have applied probabilistic methods to modeling human behavior and planning for social navigation. The Interactive Gaussian Process model was proposed in [19]; it takes into account the variability of human behavior and mutual interactions. Trautman et al. used this model to navigate a robot through a crowded cafeteria [2]. Inverse Reinforcement Learning (IRL) [20, 21, 22, 23] is another framework that allows a probabilistic representation of human behavior. IRL was applied in [24, 6, 5] to learn pedestrian models and plan for socially compliant navigation. In general, these approaches aim to generate “human-like” actions – the robot should behave similarly to a human in order to be socially acceptable.

Inspired by the aforementioned works, we also use IRL as part of our learning algorithm to predict humans' responses. Our work differs from previous approaches in that we jointly model the effects of both explicit communication and robot motion on human navigation behavior. Further, we use this learned model to plan proactive communication that facilitates social navigation and helps avoid collisions with people.

II-B Expressive Motion in Robotics

Besides social navigation, researchers have investigated the problem of planning interactive and communicative motions in the field of computational human-robot interaction [25]. Dragan et al. formalized the idea of legibility in [26] and proposed methods to generate legible motions for robot manipulators [27]. Legible motions are motions that express the robot’s intent to observers, and can be different from predictable motions [26]. Sadigh et al. modeled a human-robot system jointly as a dynamical system, and proposed methods to plan motions that actively influence human behaviors [28] and further actively gather information of human’s internal state [29, 30]. Similar ideas were explored in human-robot collaboration scenarios. Bestick et al. developed a method to enable a robot to purposefully influence the choices of a person in handover tasks to help the person avoid suboptimal grasps [31]. Nikolaidis et al. studied human-robot mutual adaptation, and proposed a decision framework to allow the robot to guide participants towards a better way of completing the task [32]. In our work, we consider robot motions that can inform or affect people as a form of implicit communication.

II-C Non-Verbal Communication Methods

In this work, we extend the traditional definitions of explicit and implicit communication. Besides verbal communication, we also treat non-verbal communication that carries explicit information as explicit communication, such as pointing gestures [33, 34], head nods [35], and visual displays [36]. Haptic feedback has been used as an explicit communication mechanism in human-robot interaction. Scheggi et al. designed a vibrotactile haptic bracelet to guide the user along trajectories that are feasible for human leader-robot follower formation tasks [37, 38]. Sieber et al. used haptic feedback to assist a user teleoperating a team of robots that collaboratively manipulated an object [39]. Here, we use haptic feedback to explicitly convey the robot's intent to the user in collision avoidance scenarios during navigation.

Non-verbal communication, such as gaze, is considered implicit if it indirectly conveys information [40, 41, 42]. In this work, we use the motion of the mobile robot, which does not impose any constraints on the robot's appearance and can naturally be incorporated into social navigation.

III A Planning Framework for Socially-Aware Robot Navigation

Fig. 2: An example scenario where the robot needs to plan for appropriate movement and communication.

Humans have an impressive ability to navigate seamlessly around other people. They still walk smoothly and efficiently when faced with large groups of people walking toward or around them in busy streets, concerts, cafeterias, conferences, or office spaces. However, robots lack the same elegance when navigating around people. Our goal is to design mobile robots that navigate like humans in spaces shared by humans and robots. Our insight for social navigation is that mobile robots – like humans – need to proactively plan for interactions with people.

In this work, we focus on a scenario in which one robot interacts with one human. To illustrate this, we use the running example illustrated in Fig. 2: a robot and a human need to pass each other either on the left side or on the right side. To achieve this pass smoothly, the robot and human must understand each other’s intent and coordinate their movements. The objective of our planning framework is to generate appropriate explicit communication and robot motions (which also serve as implicit communication) to facilitate human-robot interactions in such scenarios. The explicit communication consists of a finite number of discrete signals, for example, expressing a plan to pass on the left or the right side. The robot motions consist of continuous actions such as the robot’s linear and angular velocities.

We model the joint human-robot system as a fully-observable interactive dynamical system in which the actions of the human and the robot can influence each other. In addition, we consider the stochasticity of the human’s actions and the effect of communication in our planning framework.

Let $x_H^t$ and $x_R^t$ denote the physical state of the human and the robot at time step $t$, respectively. The states are continuous, and include positions and velocities of each agent. The robot can apply continuous control inputs $u_R^t$ that change its state through the following dynamics:

$x_R^{t+1} = f_R\left(x_R^t, u_R^t\right)$   (1)

Similarly, the human's state is directly changed by her action $u_H^t$:

$x_H^{t+1} = f_H\left(x_H^t, u_H^t\right)$   (2)

We let $\mathbf{u}_R = [u_R^0, \dots, u_R^{T-1}]$ and $\mathbf{u}_H = [u_H^0, \dots, u_H^{T-1}]$ denote the sequences of robot and human actions over a finite horizon $T$. $\mathbf{x}_R$ and $\mathbf{x}_H$ are then the state-action trajectories of the robot and the human over the horizon $T$, respectively. We use $u_c$ to refer to the explicit communication actions. Each of the actions $u_R^t$ and $u_H^t$ is a vector with appropriate dimensions. In this work, we focus on a one-dimensional communication action, i.e., $u_c \in \mathcal{U}_c \cup \{\varnothing\}$, where $\mathcal{U}_c$ is a finite set of discrete signals and $\varnothing$ denotes no communication.

We define an overall reward function $R_r(\mathbf{x}_R, \mathbf{u}_R, \mathbf{x}_H, u_c)$ for the robot. We let this reward function depend on both the robot's trajectory and the human's trajectory. This reward function encodes properties such as efficiency (e.g., robot's velocity, distance to goal), safety (distance to human), and other metrics. In this work, we use an optimization-based planning algorithm, where the robot optimizes the expected sum of its reward function $R_r$. Eq. (3) represents this expected reward, which is computed over the human's predicted trajectories $\mathbf{x}_H$. We emphasize that the distribution of $\mathbf{x}_H$ is affected by the robot's actions. In the next section, we present a hybrid model that predicts this distribution.

We will discuss our planning algorithm in Sec. V. We use a Model Predictive Control (MPC) [43] algorithm, where at every time step the robot computes a finite-horizon sequence of actions to maximize its expected reward:

$\mathbf{u}_R^*,\, u_c^* = \arg\max_{\mathbf{u}_R,\, u_c} \; \mathbb{E}_{\mathbf{x}_H}\left[ R_r\left(\mathbf{x}_R, \mathbf{u}_R, \mathbf{x}_H, u_c\right) \right]$   (3)

In an MPC scheme, at each time step the robot executes only the first control $u_R^{*,0}$ in the optimal sequence $\mathbf{u}_R^*$, and then replans at the next time step. The robot also plans an explicit communication $u_c$: it decides either to communicate ($u_c \in \mathcal{U}_c$) or to provide no communication ($u_c = \varnothing$). When computing the plan at each time step, we assume that explicit communication can only happen at the beginning of the planning horizon. This is because only the first set of actions in the planned sequence is executed, and a new plan will be generated at the next time step.

The key insight of this framework is that the robot is aware of the stochasticity of human behavior and the influence of communication on humans' actions. As a result, the robot will proactively communicate its intent to avoid undesirable situations. Take the example scenario in Fig. 2: if the robot and the human choose to pass each other on different sides, a collision will happen. By expressing its intent explicitly, the robot minimizes the chance of a potential collision.

IV Hybrid Model of Human Navigation

To compute the optimal robot actions using Eq. (3), the robot is required to predict human actions over a finite time horizon. Since human actions are stochastic, the robot needs to predict the distribution of $\mathbf{u}_H$. We assume the human dynamics $f_H$ are known; therefore, the distribution of $\mathbf{u}_H$ induces an equivalent distribution over human trajectories $\mathbf{x}_H$ for a given initial state. Modeling this distribution exactly is quite challenging and may be computationally expensive. Our objective is to develop an approximate model that captures the interactive nature of human behavior, i.e., how the human reacts to the robot's movements and explicit communication.

At a high level, we model human navigation behavior as a joint mixture distribution over the trajectory, similar to [5]:

$p(\mathbf{u}_H) = \sum_{d \in \mathcal{D}} p(\mathbf{u}_H \mid d)\, p(d)$   (4)

The first part of the mixture, $p(\mathbf{u}_H \mid d)$, models the distribution of the continuous state-actions given a specific discrete decision $d$. The second part, $p(d)$, models the distribution of the discrete decision $d$, which represents whether the human decides to pass the robot on the left or the right side in the example shown in Fig. 2. The discrete decision affects the distribution of the continuous actions – if the human decides to pass on the left side, the possible trajectories are restricted to one homotopy class.

For computational efficiency, we use relatively simple models for human and robot dynamics. The human state consists of positions and velocities in the 2D plane, and the robot state is simply its pose. We use a constant-acceleration model for human dynamics; thus, the human control input $u_H^t$ is her acceleration. For the robot dynamics, we use a differential-drive kinematic model, with control input $u_R^t = [v^t, \omega^t]$ consisting of linear and angular velocities.
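To make these models concrete, the following is a minimal sketch of the two dynamics in Python; the state layouts and the discrete-time integration step are our own assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

DT = 0.5  # time step in seconds, matching the planner's discretization

def human_dynamics(x_h, u_h, dt=DT):
    """Constant-acceleration model (Eq. (2)).
    State x_h = [px, py, vx, vy]; control u_h = [ax, ay]."""
    px, py, vx, vy = x_h
    ax, ay = u_h
    return np.array([px + vx * dt + 0.5 * ax * dt**2,
                     py + vy * dt + 0.5 * ay * dt**2,
                     vx + ax * dt,
                     vy + ay * dt])

def robot_dynamics(x_r, u_r, dt=DT):
    """Differential-drive kinematics (Eq. (1)).
    State (pose) x_r = [px, py, theta]; control u_r = [v, omega]."""
    px, py, th = x_r
    v, w = u_r
    return np.array([px + v * np.cos(th) * dt,
                     py + v * np.sin(th) * dt,
                     th + w * dt])
```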

The formulation in Eq. (4) describes our modeling approach in its most general form. In our specific setup, the distribution should be conditioned on the history of human and robot trajectories, the history of explicit communication, and future robot actions over the prediction horizon. To numerically compute the distribution and use it for planning, we make two assumptions:

Assumption 1. The human's trajectory over the prediction horizon depends on the environment $E$ (including the human's goal) and the future robot trajectory $\mathbf{x}_R$:

$p(\mathbf{u}_H \mid d) = p(\mathbf{u}_H \mid d, E, \mathbf{x}_R)$   (5)

Here the human's actions depend on the robot's actions. However, the robot also needs to predict the human's actions and plan accordingly. This inter-dependency leads to the infinite regress problem, i.e., a sequence of reasonings about each other's actions that can never come to an end. To resolve this problem, we assume that the human has access to the future robot trajectory, and we model the interaction between the human and the robot as a two-player Stackelberg (leader-follower) game. We argue that this assumption is reasonable: given a relatively short planning horizon, humans are usually able to predict the immediate actions of other agents. Since the focus of this work is the interaction between the robot and the human, the goals and environment are assumed to be known and fixed.

Assumption 2. We assume that humans make discrete decisions using only past information. With this, we can express the second term in Eq. (4) as:

$p(d) = p\left(d \mid \mathbf{x}_R^{\mathrm{past}}, \mathbf{x}_H^{\mathrm{past}}, \mathbf{u}_c^{\mathrm{past}}\right)$   (6)

where $\mathbf{x}_R^{\mathrm{past}}$ and $\mathbf{x}_H^{\mathrm{past}}$ are the past trajectories of the robot and the human, and $\mathbf{u}_c^{\mathrm{past}}$ is the past explicit communications over a horizon of length $T_p$. We further assume that only the most recent explicit communication actually affects the decision:

$p(d) = p\left(d \mid \mathbf{x}_R^{\mathrm{past}}, \mathbf{x}_H^{\mathrm{past}}, u_c, t_c\right)$   (7)

Here $u_c$ represents the most recent explicit communication, and $t_c$ represents the time when $u_c$ was communicated. Note that at each time step, the robot can decide not to perform any explicit communication ($u_c = \varnothing$). The most recent communication refers to the most recent $u_c \neq \varnothing$, e.g., the last time haptic feedback was provided.

In the next two subsections, we discuss details of modeling humans’ continuous navigation actions as in Eq. (5) and the humans’ discrete decision actions as in Eq. (7).

IV-A Modeling a Human's Continuous Navigation Actions

Inverse Reinforcement Learning. We employ a data-driven approach to model humans' continuous navigation actions. Specifically, we use maximum entropy inverse reinforcement learning (MaxEnt IRL) [23, 21] to learn a distribution of human actions that matches a set of provided demonstrations in expectation. We assume that humans approximately optimize a reward function $R_H(\mathbf{x}_H, \mathbf{u}_H; d)$ for a given discrete class $d$. Under this model, the probability of the actions $\mathbf{u}_H$ is proportional to the exponential of the total reward of the trajectory:

$p(\mathbf{u}_H \mid d, E, \mathbf{x}_R) \propto \exp\left( R_H(\mathbf{x}_H, \mathbf{u}_H; d) \right)$   (8)

We parameterize the human reward function as a linear combination of features $\mathbf{f}$ that capture relevant properties of the navigation behavior: $R_H = \mathbf{w}^\top \mathbf{f}$. The features are weighted by $\mathbf{w}$, which can be learned by maximizing the overall likelihood of the provided demonstrations $\mathcal{D}$:

$\mathbf{w}^* = \arg\max_{\mathbf{w}} \prod_{\mathbf{u}_H \in \mathcal{D}} p(\mathbf{u}_H \mid \mathbf{w})$   (9)

The weight vector is a function of the discrete decision variable: $\mathbf{w} = \mathbf{w}(d)$. Given a selected set of features such as distance to the other agents, heading, or velocity, we learn the weights corresponding to the humans' reward functions for each discrete class from collected demonstrations. We discuss the specific features used in this work in Sec. VI.
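To illustrate Eq. (9), here is a minimal MaxEnt IRL sketch in Python. For a linear reward, the gradient of the log-likelihood is the demonstrated feature expectation minus the model's expected features; below, that expectation is approximated by importance-weighting a fixed set of candidate trajectories, which is a rough approximation we assume for illustration rather than the paper's exact training procedure.

```python
import numpy as np

def maxent_irl(demo_features, candidate_features, n_iters=200, lr=0.05):
    """Learn weights w of a linear reward R = w . f (Eq. (9)) by gradient
    ascent on the MaxEnt log-likelihood. demo_features: (N, K) features of
    demonstrated trajectories; candidate_features: (M, K) features of a
    fixed candidate set used to approximate the partition function."""
    w = np.zeros(demo_features.shape[1])
    f_demo = demo_features.mean(axis=0)        # empirical feature expectation
    for _ in range(n_iters):
        logits = candidate_features @ w        # reward of each candidate
        probs = np.exp(logits - logits.max())  # p(traj) proportional to exp(w . f)
        probs /= probs.sum()
        f_model = probs @ candidate_features   # model feature expectation
        w += lr * (f_demo - f_model)           # ascend the log-likelihood
    return w

# Toy usage: weights are fit so the model's expected features match the demos.
rng = np.random.default_rng(0)
w = maxent_irl(rng.normal(loc=[1.0, -0.5, 0.0], scale=0.3, size=(50, 3)),
               rng.normal(size=(500, 3)))
```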

IV-B Modeling a Human's Discrete Navigation Decisions

When modeling the distribution of a human's discrete decision, we consider the effect of both implicit communication (via robot movements) and explicit communication (via haptic feedback). We assume that the human will infer the robot's intent $I_R$, and act cooperatively during the interaction. Mathematically, this suggests that the distribution of the human's decision is related to her belief over the robot's intent. In our particular social navigation problem, we assume this belief is equal to the probability of choosing the corresponding discrete action:

$p(d) = p\left(I_R \mid \mathbf{x}_R^{\mathrm{past}}, u_c, t_c\right)$   (10)

In other words, if the human believes that there is an 80% chance that the robot will yield priority, she will decide to take priority with the same probability of 0.8.

Using Bayes' rule, we can transform Eq. (10) to:

$p\left(I_R \mid \mathbf{x}_R^{\mathrm{past}}, u_c, t_c\right) \propto p\left(\mathbf{x}_R^{\mathrm{past}}, u_c, t_c \mid I_R\right) p(I_R) = p\left(\mathbf{x}_R^{\mathrm{past}} \mid I_R\right) p\left(u_c, t_c \mid I_R\right) p(I_R)$   (11)

The second step in Eq. (11) assumes conditional independence of the robot trajectory $\mathbf{x}_R^{\mathrm{past}}$ and the explicit communication $u_c$. With this factorization, we can separately model the effect of robot motion (implicit communication) and explicit communication on the human's decision making. The last term, $p(I_R)$, is the prior on the robot's intent, which should be chosen based on the application. In our implementation, the robot is equally likely to communicate each intent, so we choose a uniform prior.

The formulation casts the backward inference problem (from action to intent) into forward prediction problems (from intent to action). Next, we derive methods to compute each part of Eq. (11).

Implicit Communication. We assume that the human in general expects the robot to be rational and efficient according to some reward function. Applying the principle of maximum entropy, we model the human as expecting robot movements with probability:

$p\left(\mathbf{x}_R^{\mathrm{past}} \mid I_R\right) = \dfrac{\exp\left( Q_R\left(\mathbf{x}_R^{\mathrm{past}}; I_R\right) \right)}{\int \exp\left( Q_R\left(\tilde{\mathbf{x}}_R; I_R\right) \right) d\tilde{\mathbf{x}}_R}$   (12)

where $Q_R$ is a reward function that the human expects the robot to optimize, given its intent $I_R$. To compute the integral in the denominator of Eq. (12), we use a second-order Taylor series to approximate $Q_R$.
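For intuition, the normalization integral in Eq. (12) can be approximated with Laplace's method, which is one way to realize the second-order Taylor approximation mentioned above; the quadratic toy reward below is an assumption used only to check the result.

```python
import numpy as np

def laplace_log_partition(q, hess, x_star):
    """Approximate log integral of exp(q(x)) dx with a second-order Taylor
    expansion of q around its maximizer x_star:
    log Z ~= q(x*) + (n/2) log(2*pi) - (1/2) log det(-H(x*))."""
    n = len(x_star)
    H = hess(x_star)  # Hessian of q at the maximum; must be negative definite
    sign, logdet = np.linalg.slogdet(-H)
    assert sign > 0, "Hessian must be negative definite at the maximum"
    return q(x_star) + 0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet

# For a quadratic q(x) = -0.5 x^T A x the approximation is exact:
# integral of exp(q) dx = (2*pi)^{n/2} det(A)^{-1/2}.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
log_z = laplace_log_partition(lambda x: -0.5 * x @ A @ x,
                              lambda x: -A, np.zeros(2))
print(np.isclose(np.exp(log_z), 2 * np.pi / np.sqrt(np.linalg.det(A))))  # True
```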

Explicit Communication. The human's belief over the robot's intent is strongly affected by explicit communication, because the intent is directly conveyed. However, the effect of explicit communication should decay over time, as the robot's intent may change, and only short-term intents are communicated. Inspired by a model of human short-term verbal retention [44], we propose:

$p\left(u_c, t_c \mid I_R\right) = \eta \left( \mathbb{1}\left[u_c = I_R\right] e^{-(t - t_c)/\tau} + c \right)$   (13)

Here $\tau$ and $c$ are parameters that determine the characteristics of the distribution, and $\eta$ is a normalization factor. The explicit communication initially reflects the true intent with very high probability. However, the inference strength decays over time, and eventually the communication becomes irrelevant to the robot's latest intent.

Fig. 3: Inference of the robot's intent based on explicit communication, assuming that the communication $u_c = I_1$ happened at $t_c = 0$.

To clarify this model, let us consider a simpler version of Eq. (10), $p(I_R \mid u_c, t_c)$ – inferring the robot's intent from only the explicit communication. Assume that the intent is binary, $I_R \in \{I_1, I_2\}$, that $u_c = I_1$ was communicated at $t_c = 0$, and use Bayes' rule again:

$p\left(I_1 \mid u_c, t_c\right) = \dfrac{p\left(u_c, t_c \mid I_1\right) p(I_1)}{p\left(u_c, t_c \mid I_1\right) p(I_1) + p\left(u_c, t_c \mid I_2\right) p(I_2)}$   (14)

The change of $p(I_1 \mid u_c, t_c)$ over time is plotted in Fig. 3. Initially, the robot's intent can be inferred with high probability given the explicit communication. As time passes, the inference strength decreases and eventually the belief over the robot's intent becomes uniform (0.5).

In the scenarios we consider, the human infers the robot’s intent with both implicit and explicit communication. The combined effect can be computed with Eqs. (11) – (13) given in this section.
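A minimal sketch of this combined inference, following our reconstructed Eqs. (11)-(13), is shown below; the reward callback `q_robot` and the decay parameters `tau` and `c` are placeholders for the designed quantities in the text, not the paper's exact values.

```python
import numpy as np

INTENTS = ("human_priority", "robot_priority")

def belief_over_intent(past_robot_traj, u_c, t_c, t_now, q_robot,
                       tau=2.0, c=0.1):
    """Posterior over the robot's intent I_R (Eq. (11)) with a uniform
    prior: implicit evidence from past motion (Eq. (12)) times explicit
    evidence from the most recent communication (Eq. (13))."""
    scores = {}
    for intent in INTENTS:
        implicit = np.exp(q_robot(past_robot_traj, intent))  # ~ p(x_R | I_R)
        if u_c is None:
            explicit = 1.0  # no communication yet: uninformative likelihood
        else:
            match = 1.0 if u_c == intent else 0.0
            explicit = match * np.exp(-(t_now - t_c) / tau) + c
        scores[intent] = implicit * explicit  # uniform prior p(I_R) cancels
    z = sum(scores.values())
    return {intent: s / z for intent, s in scores.items()}
```

As in Fig. 3, the explicit term dominates right after a communication and decays toward an uninformative likelihood, at which point the belief is driven by the robot's motion alone.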

V Planning for Communication

The general planning framework was described in Section III. In this section, we discuss the details of the algorithm, including the design of reward functions, derivation of the solution to the optimization, and an outline of our implementation.

V-A Robot Reward Function

The overall reward that the robot optimizes in Eq. (3) consists of four parts that quantify robot efficiency, human comfort, safety, and the reward of explicit communication:

Robot Efficiency. The robot should get as close as possible to its target with minimum effort:

$R_r^{\mathrm{goal}} = -\sum_{t=1}^{T} \left\| u_R^t \right\|^2 - \lambda_r \left\| p_R^T - p_R^{\mathrm{goal}} \right\|^2$   (15)

where $\lambda_r$ is a weight that balances effort and goal. This component determines how fast the robot approaches the goal. $p_R^t$ is the robot position at time step $t$, and $p_R^{\mathrm{goal}}$ is the goal position.

Human Comfort. We want the human to spend less effort to achieve her goal:

$R_r^{\mathrm{human}} = -\sum_{t=1}^{T} \left\| u_H^t \right\|^2 - \lambda_h \left\| p_H^T - p_H^{\mathrm{goal}} \right\|^2$   (16)

Similar to $\lambda_r$, the weight $\lambda_h$ balances between the human reaching her goal and her effort.

Safety. The robot should avoid collisions with the human:

$R_r^{\mathrm{safety}} = -\sum_{t=1}^{T} \exp\left( -\left\| p_R^t - p_H^t \right\|^2 / \sigma_s^2 \right)$   (17)

Explicit Communication. Too much explicit communication will distract or annoy the human. Therefore, we set a constant negative reward for performing each explicit communication:

$R_r^{\mathrm{comm}} = -c_{\mathrm{comm}}\, \mathbb{1}\left[ u_c \neq \varnothing \right]$   (18)

The components defined above are then combined linearly with weights $c_1, \dots, c_4$:

$R_r = c_1 R_r^{\mathrm{goal}} + c_2 R_r^{\mathrm{human}} + c_3 R_r^{\mathrm{safety}} + c_4 R_r^{\mathrm{comm}}$   (19)
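Assembled in code, the overall reward of Eq. (19) might look as follows; the functional forms mirror our reconstructed Eqs. (15)-(18), and the weights and scale parameters are illustrative assumptions.

```python
import numpy as np

def robot_reward(x_r, u_r, x_h, u_h, goal_r, goal_h, u_c,
                 c=(1.0, 1.0, 1.0, 1.0), lam_r=0.5, lam_h=0.5,
                 sigma_s=0.8, comm_cost=0.2):
    """Overall reward of Eq. (19). x_r, x_h: (T, 2) position trajectories;
    u_r, u_h: (T, m) control sequences; u_c: explicit communication or None."""
    # Eq. (15): robot efficiency - control effort plus final distance to goal
    r_goal = -np.sum(u_r**2) - lam_r * np.sum((x_r[-1] - goal_r)**2)
    # Eq. (16): human comfort - her effort plus her final distance to goal
    r_human = -np.sum(u_h**2) - lam_h * np.sum((x_h[-1] - goal_h)**2)
    # Eq. (17): safety - penalize proximity between the two agents
    d2 = np.sum((x_r - x_h)**2, axis=1)
    r_safety = -np.sum(np.exp(-d2 / sigma_s**2))
    # Eq. (18): constant cost whenever an explicit communication is issued
    r_comm = -comm_cost if u_c is not None else 0.0
    return c[0]*r_goal + c[1]*r_human + c[2]*r_safety + c[3]*r_comm
```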

V-B Human-Aware Planning

To solve for the optimal robot actions using Eq. (3), we need to compute the expectation of the robot reward over the distribution of $\mathbf{u}_H$. In the last section, we derived a hybrid model of human behavior. To speed up computation, here we only consider the most likely trajectory given each possible human decision. With this, the expectation in Eq. (3) can be expressed as:

$\mathbb{E}_{\mathbf{x}_H}\left[ R_r \right] = \sum_{d \in \mathcal{D}} p(d)\, R_r\left(\mathbf{x}_R, \mathbf{u}_R, \mathbf{x}_H^*(d), u_c\right)$   (20)

where $\mathbf{x}_H^*(d)$ is the most likely human response given $d$, and can be computed as:

$\mathbf{u}_H^*(d) = \arg\max_{\mathbf{u}_H} R_H\left(\mathbf{x}_H, \mathbf{u}_H; d\right)$   (21)

The distribution $p(d)$ is given by Eqs. (10) and (11).

We use gradient-based optimization to solve for the optimal robot controls $\mathbf{u}_R^*$. This requires the gradient of the expected reward in Eq. (20). Letting $g = \mathbb{E}_{\mathbf{x}_H}\left[ R_r \right]$, we aim to find:

$\dfrac{dg}{d\mathbf{u}_R} = \dfrac{\partial g}{\partial \mathbf{u}_R} + \dfrac{\partial g}{\partial \mathbf{u}_H^*} \dfrac{\partial \mathbf{u}_H^*}{\partial \mathbf{u}_R}$   (22)

where $\mathbf{u}_H^*$ is the most likely human action sequence. As we have a symbolic representation of $g$, both $\partial g / \partial \mathbf{u}_R$ and $\partial g / \partial \mathbf{u}_H^*$ can be computed analytically. Following the implicit differentiation method discussed in [28], we compute the last unknown term using the following expression:

$\dfrac{\partial \mathbf{u}_H^*}{\partial \mathbf{u}_R} = -\left[ \dfrac{\partial^2 R_H}{\partial \mathbf{u}_H^2} \right]^{-1} \dfrac{\partial^2 R_H}{\partial \mathbf{u}_R\, \partial \mathbf{u}_H}$   (23)
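The chain rule of Eq. (22) and the implicit gradient of Eq. (23) can be checked numerically on a toy quadratic human reward, for which the best response $\mathbf{u}_H^*(\mathbf{u}_R)$ is available in closed form; the matrices below are illustrative assumptions.

```python
import numpy as np

# Toy human reward R_H(u_H; u_R) = -0.5 u_H^T A u_H + u_H^T B u_R, whose best
# response is u_H*(u_R) = A^{-1} B u_R (used here only to verify Eq. (23)).
A = np.array([[2.0, 0.2], [0.2, 1.5]])   # so that d2R_H/du_H2 = -A
B = np.array([[0.5, 0.0], [0.1, 0.3]])   # cross term d2R_H/du_R du_H = B

def best_response_jacobian():
    """Eq. (23): du_H*/du_R = -[d2R_H/du_H2]^{-1} d2R_H/du_R du_H."""
    return -np.linalg.solve(-A, B)        # = A^{-1} B

def total_gradient(dg_duR, dg_duH):
    """Eq. (22): dg/du_R = partial g/partial u_R + J^T partial g/partial u_H*,
    where J is the best-response Jacobian."""
    return dg_duR + best_response_jacobian().T @ dg_duH

print(np.allclose(best_response_jacobian(), np.linalg.solve(A, B)))  # True
```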

We have assumed a discrete set of communication actions $\mathcal{U}_c$. To obtain the optimal explicit communication $u_c^*$, we enumerate all possible communication actions and solve the optimization for each $u_c$. This can be done in parallel because the optimizations do not depend on each other.

Data: $\mathcal{U}_c$ – set of explicit communicative actions
repeat
      /* get current + predicted states */
      $x_R, x_H \leftarrow$ GetTrackingInfo()
      $\hat{x}_R, \hat{x}_H \leftarrow$ PredictOneStep($x_R$, $x_H$)
      /* update the hybrid model */
      UpdateModel($\hat{x}_R$, $\hat{x}_H$, $u_c$, $t_c$)
      /* find the optimal actions */
      $\mathbf{u}_H^{\mathrm{init}} \leftarrow$ GenerateInitGuess()
      $\mathbf{u}_R^*, r^* \leftarrow$ ComputePlan($\varnothing$, $\mathbf{u}_H^{\mathrm{init}}$)
      $u_c^* \leftarrow \varnothing$
      for each $u_c' \in \mathcal{U}_c$ do
            $\mathbf{u}_R, r \leftarrow$ ComputePlan($u_c'$, $\mathbf{u}_H^{\mathrm{init}}$)
            if $r > r^*$ then
                  $\mathbf{u}_R^* \leftarrow \mathbf{u}_R$; $r^* \leftarrow r$; $u_c^* \leftarrow u_c'$
            end if
      end for
      /* execute the actions */
      if $u_c^* \neq \varnothing$ then
            Communicate($u_c^*$)
            $u_c \leftarrow u_c^*$; $t_c \leftarrow t$
      end if
      ExecuteControl($u_R^{*,0}$)
      $t \leftarrow t + 1$
until GoalReached($x_R$, $x_H$)
Algorithm 1: Outline of the planning algorithm.

The planning algorithm is outlined in Alg. 1. At every time step, the algorithm first retrieves the states of the robot and the human, and performs a one-step prediction of the robot and human states to compensate for the time spent planning. It then updates the belief over the human's discrete decision with new observations, using Eqs. (10) and (11). Before optimizing the robot actions, the algorithm needs an initial guess of the human trajectory. If there is a plan from the previous time step, that plan is used as the initial guess. When there is no previous plan (at the first time step, or upon first detection of the human), we generate the initial guess as follows. First, we compute the robot actions ignoring the human, using a feedback-control policy that steers the robot toward its goal [45]. We then compute the human actions using an attracting potential field at the goal position and a repelling potential field centered at the robot. Finally, with this initial guess, the algorithm performs the optimization and computes the robot movements $\mathbf{u}_R^*$ and the explicit communication $u_c^*$.

We implemented the planning algorithm in C++, and used the software package NLopt [46] to perform the numerical optimization. In our implementation, we chose a fixed planning horizon $T$ and a time step of 0.5 seconds.

VI Simulation Results

VI-A Social Navigation Scenario

We consider the scenario shown in Fig. 1 for our simulation and experimental evaluation: a human and a robot move in (approximately) orthogonal directions and encounter each other. In this scenario, they have to coordinate with each other and decide who should pass first to avoid collision. The robot can explicitly communicate its intent: to yield priority to the human (human priority) or not (robot priority). To numerically evaluate the model and compute the optimal plan, we need to define the features of the learned reward function, which is used to model the human's continuous actions (Eq. (8)) as well as her discrete decisions (Eq. (12)).

We select features that are commonly used to characterize social navigation of pedestrians [47]:


Velocity. Pedestrians tend to maintain a constant desired velocity. We use a feature that sums the squared velocity along the trajectory:

$f_{\mathrm{vel}} = \sum_{t=1}^{T} \left\| \dot{p}_H^t \right\|^2$   (24)

Here, and in the following equations, $\| \cdot \|$ denotes the L2 norm.

Acceleration. To navigate efficiently, pedestrians usually avoid unnecessary accelerations:

$f_{\mathrm{acc}} = \sum_{t=1}^{T} \left\| \ddot{p}_H^t \right\|^2$   (25)
Distance to goal. Pedestrians typically try to get as close as possible to the target position within a given time horizon:

$f_{\mathrm{goal}} = \left\| p_H^T - p_H^{\mathrm{goal}} \right\|^2$   (26)
Avoiding static obstacles. Pedestrians avoid static obstacles in the environment:

$f_{\mathrm{obs}} = \sum_{t=1}^{T} \exp\left( -\left\| p_H^t - p_{\mathrm{obs}}^t \right\|^2 / \sigma_o^2 \right)$   (27)

where $p_{\mathrm{obs}}^t$ is the position of the obstacle closest to the human at time $t$, and $\sigma_o$ is a scaling factor.

Collision avoidance with the robot. Pedestrians tend to avoid each other while navigating. We assume that they avoid the robot in a similar fashion:

$f_{\mathrm{robot}} = \sum_{t=1}^{T} \exp\left( -\left\| p_H^t - p_R^t \right\|^2 / \sigma_r^2 \right)$   (28)
Avoiding the front side of the robot. In addition to avoiding the robot, we observe that humans tend not to cut in front of the robot, especially when they think that the robot has priority. This behavior is captured by the feature:

$f_{\mathrm{front}} = \sum_{t=1}^{T} \exp\left( -\left( p_H^t - p_{\mathrm{front}}^t \right)^{\top} \Sigma^{-1} \left( p_H^t - p_{\mathrm{front}}^t \right) \right)$   (29)

where $p_{\mathrm{front}}^t$ is a position in front of the robot, and $\Sigma$ scales the Gaussian and aligns it with the robot's orientation.
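As a reference implementation of the feature set, the sketch below evaluates our reconstructed Eqs. (24)-(29) over a discretized trajectory; the scale constants and the construction of the point in front of the robot are assumptions.

```python
import numpy as np

def navigation_features(p_h, v_h, a_h, p_r, th_r, goal, obstacles,
                        sigma_o=0.5, sigma_r=0.8, d_front=0.7,
                        sig_lon=1.2, sig_lat=0.6):
    """Feature vector for one trajectory. p_h, v_h, a_h: (T, 2) human
    positions/velocities/accelerations; p_r: (T, 2) robot positions;
    th_r: (T,) robot headings; obstacles: (K, 2) static obstacle points."""
    f_vel = np.sum(v_h**2)                                  # Eq. (24)
    f_acc = np.sum(a_h**2)                                  # Eq. (25)
    f_goal = np.sum((p_h[-1] - goal)**2)                    # Eq. (26)
    # Eq. (27): Gaussian penalty around the closest static obstacle
    d_obs = np.linalg.norm(p_h[:, None] - obstacles[None], axis=2).min(axis=1)
    f_obs = np.sum(np.exp(-d_obs**2 / sigma_o**2))
    # Eq. (28): Gaussian penalty around the robot
    d_rob = np.linalg.norm(p_h - p_r, axis=1)
    f_rob = np.sum(np.exp(-d_rob**2 / sigma_r**2))
    # Eq. (29): anisotropic Gaussian ahead of the robot, aligned with heading
    front = p_r + d_front * np.stack([np.cos(th_r), np.sin(th_r)], axis=1)
    dx = p_h - front
    c, s = np.cos(th_r), np.sin(th_r)
    lon = c * dx[:, 0] + s * dx[:, 1]     # offset along the robot's heading
    lat = -s * dx[:, 0] + c * dx[:, 1]    # offset perpendicular to heading
    f_front = np.sum(np.exp(-(lon**2 / sig_lon**2 + lat**2 / sig_lat**2)))
    return np.array([f_vel, f_acc, f_goal, f_obs, f_rob, f_front])
```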

Fig. 4: Visualization of (a subset of) the features. Warm color indicates high reward and cool color indicates low reward. Human trajectory is in black, and robot trajectory is in red. Arrows indicate the positions and moving directions at the specified time step.

The features (except for velocity and acceleration) are visualized in Fig. 4. Human and robot trajectories are from a demonstration we collected to train the IRL model. The figure shows that the human indeed avoided low reward regions (cool color) and navigated to the high reward region (warm color).

Fig. 5: Sample predictions using the learned reward functions and cross validation. (a)-(b) show an example where the prediction matches the actual measurement. (c)-(d) show an example where initially the prediction doesn’t match the measurement. We use the model to re-predict human actions at each time step, and the prediction starts to match measurement after two seconds. (e) Cross validation of prediction accuracy. Type I corresponds to cases where the model initially predicts the same homotopy class as the demonstration ((a), (b)), and type II corresponds to cases where the model initially predicts a different homotopy class ((c), (d)). Blue indicates errors of the IRL model, and orange indicates errors of the social force (SF) model.

Rewards for Discrete Decision Model

In general, it is challenging to define the human's expected reward function $Q_R$, as it is not directly measurable. Here we design two reward functions based on our observations. Data-driven methods like IRL could potentially produce more accurate results. Our goal, however, is to derive an approximate model that can inform the planning algorithm, so we prefer to start with simpler designs:

Human Priority ($I_R = $ human priority). When the robot's intent is to yield priority to the human, it should avoid moving directly toward the human and getting too close. We use the reward function:

$Q_R\left(\mathbf{x}_R^{\mathrm{past}};\, I_R\right) = -\sum_{t < 0} \dfrac{\dot{p}_R^t \cdot \hat{e}_{RH}^t}{\max\left( \left\| p_H^t - p_R^t \right\|,\, d_0 \right)}$   (30)

where $\hat{e}_{RH}^t$ is a unit vector that points from the robot to the human: $\hat{e}_{RH}^t = \left( p_H^t - p_R^t \right) / \left\| p_H^t - p_R^t \right\|$. Note that $t < 0$ means that the summation is over the previous time steps.

Robot Priority ($I_R = $ robot priority). When the robot has priority, the human expects it to move at a desired velocity, regardless of the human's state:

$Q_R\left(\mathbf{x}_R^{\mathrm{past}};\, I_R\right) = -\sum_{t < 0} \dfrac{\left( v_R^t - v_d \right)^2}{\max\left( \left\| p_H^t - p_R^t \right\|,\, d_0 \right)}$   (31)

Here $v_R^t$ is the robot's speed and $v_d$ is the desired speed.

In Eqs. (30) and (31), the reward at each time step is divided by the robot-human distance because, when the robot is relatively far from the user, its actions carry less information about its intent. We use the $\max(\cdot, d_0)$ operation to achieve numerically stable results.
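Under this reconstruction, the two expected-reward terms could be computed as below; the distance floor `d0` and the desired speed `v_des` are assumed values.

```python
import numpy as np

def q_human_priority(p_r, v_r, p_h, d0=0.5):
    """Eq. (30): penalize past robot velocity directed at the human,
    weighted more strongly when the robot is close."""
    diff = p_h - p_r                                   # (T, 2), robot -> human
    dist = np.maximum(np.linalg.norm(diff, axis=1), d0)
    e_rh = diff / dist[:, None]   # robot-to-human directions (floored norm)
    return -np.sum(np.sum(v_r * e_rh, axis=1) / dist)

def q_robot_priority(p_r, v_r, p_h, v_des=0.6, d0=0.5):
    """Eq. (31): expect the robot to hold its desired speed regardless of
    the human, with the same distance-based weighting."""
    dist = np.maximum(np.linalg.norm(p_h - p_r, axis=1), d0)
    speed = np.linalg.norm(v_r, axis=1)
    return -np.sum((speed - v_des)**2 / dist)
```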

VI-B Human Model Evaluation

Data Collection for Learning Human Model. We collected navigation demonstrations from four users. Each user was asked to walk back and forth between two target locations. At the same time, a mobile robot (TurtleBot2, Open Source Robotics Foundation, Inc.) also moved around in the environment. We collected data for two scenarios: the human priority scenario and the robot priority scenario. When the user had priority, the robot would slow down and yield to the user upon encounter. When the robot had priority, it would simply ignore the user and continue moving. To distract the human from focusing solely on the robot, we asked the users to remember two numbers at one target location, and answer arithmetic problems using the two numbers at the other target location.

For each user, we recorded 64 trials for each scenario (a trial is moving from one target location to the other). We used 80% of the data to train the model, and the other 20% as the testing set.

Prediction with Learned Reward Functions. Using MaxEnt IRL, we recover the human reward functions for the human priority and robot priority scenarios. The reward functions can be used to compute the most likely continuous navigation actions using Eq. (21), which computes the actions over a fixed time horizon. To predict the entire trajectory, we use an MPC-style approach: we compute the most likely actions starting from time step $t$, predict the human state at $t+1$, and then recompute the most likely actions starting from $t+1$.
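This receding-horizon prediction can be sketched as follows; `most_likely_actions` stands in for the optimization of Eq. (21) and `human_dynamics` for Eq. (2), both passed in as assumed callbacks.

```python
def predict_trajectory(x_h0, robot_plan, most_likely_actions, human_dynamics,
                       n_steps, horizon=4):
    """MPC-style prediction: at each step, solve Eq. (21) over a short
    horizon, apply only the first action, and re-solve from the new state."""
    x_h, trajectory = x_h0, [x_h0]
    for t in range(n_steps):
        u_seq = most_likely_actions(x_h, robot_plan, t, horizon)  # Eq. (21)
        x_h = human_dynamics(x_h, u_seq[0])  # roll forward one step only
        trajectory.append(x_h)
    return trajectory
```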

Fig. 5 presents two prediction examples. In the first example ((a)-(b)), the predicted trajectory matches the measured trajectory very well (same homotopy class, type I). In the second example ((c)-(d)), the model initially predicts a trajectory that passes the obstacle on a different side (different homotopy class, type II). Although the prediction does not match the measured trajectory, it is still reasonable. In this example, both the predicted and measured trajectories pass behind the robot; passing on either side of the obstacle has little effect on the reward and is almost equally likely. When the prediction is recomputed later, after the human has already started to move toward one side of the obstacle, the model generates a more accurate prediction.

The cross-validation results are shown in Fig. 5(e). The prediction error is calculated as the average Euclidean distance between the predicted and measured trajectories:

$err = \dfrac{1}{N} \sum_{t=1}^{N} \left\| p_H^{\mathrm{pred},t} - p_H^{\mathrm{meas},t} \right\|$   (32)

where $N$ is the total number of time steps. We compute the prediction error separately for cases where the initial prediction is in the same homotopy class as the measurement (type I), and cases where the initial prediction and the measurement are in different homotopy classes (type II). The initial prediction error is much smaller for type I, but both types become more accurate as the human gets closer to the goal. Compared with the social force model we used previously [18], the model described here performs better in this scenario.

Evaluation of the Discrete Decision Model.

Fig. 6: Demonstrations of the discrete decision model in different scenarios. (a) A scenario where the robot slowed down to let the user pass first. Top plots show the trajectories of the robot and the user, and a map of the environment, at three different time steps. The bottom plot shows the user's belief over the robot's intent (human priority) over time, given different explicit communication early in the interaction, as predicted by our model. (b) A scenario where the robot did not slow down and passed first.

In addition to continuous trajectories, our model can also predict discrete decisions, or equivalently, the human's belief over the robot's intent. As it is impossible to measure this belief directly in an experiment, we instead aim to show that the predictions are reasonable in a few test scenarios.

Fig. 6(a) presents a scenario where the robot slowed down to let the user pass first. The scenario is illustrated by the three plots in the top row, showing the trajectories and the environment at different time steps. The data is from one demonstration we collected for training the IRL model. The plot in the bottom row shows the predicted belief over time, given no explicit communication and given two different explicit communication actions early in the trial: communicating human priority and communicating robot priority. In the case of no communication, the belief decreases initially, but rises as the robot stops to yield to the user. The model suggests that the user can infer the robot's intent (human priority) to some extent by observing its movement (slowing down). When the robot communicates human priority at the beginning, the belief jumps to a high value and maintains it afterwards. An interesting test scenario is when the robot communicates robot priority, which is not its true intent. According to our model, the belief initially drops, indicating that the user believes the explicit communication. However, when the robot slows down to let the user pass first, the belief starts to increase, indicating that the user starts to believe otherwise as she observes the robot's movements.

We also tested scenarios where the robot's intent is to give itself priority; an example is shown in Fig. 6(b). Similarly, we show the change in belief given different explicit communication in the bottom plot. These examples demonstrate that our model can capture the effects of both explicit and implicit communication, and that its predictions match our intuition.

VI-C Case Study in Simulation

Fig. 7: (a) Simulated scenario using the proposed interactive planner. In this scenario, the robot slows down and explicitly communicates its intent (human priority) to the human. Top: trajectory snippets at five individual time steps. Black and red arrows represent the positions and moving directions of the human and the robot at the given time step. Thin solid red lines represent the robot's planned trajectories; thin solid/dashed black lines represent predicted human trajectories. Bottom: velocity profiles of the robot and the human. (b) Robot starting positions vs. whether explicit communication is used by the planner for the same scenario. The red region represents the starting positions for which explicit communication is used.

Before conducting a user study with a physical mobile robot (described in Section VII), we first tested our algorithm in simulation. The purpose of the simulation is two-fold: first, to validate that the proposed algorithm works in idealized setups, and second, to select appropriate parameters that produce reasonable behaviors.

Fig. 7(a) illustrates a simulated scenario where the human and the robot move along orthogonal paths. Here the simulated human follows a predefined trajectory that is deterministic regardless of the robot's actions. While this is not realistic, the purpose is to test whether the planner can generate reasonable robot behaviors; we further validate the effectiveness of the planner with real-world user studies. The subplots in the top row show the scenario at five different time steps, and the subplot in the bottom row shows the speed of the user and the robot over time. We can observe that the robot first starts to slow down, and shortly afterwards explicitly communicates its intent to the user. The robot continues moving slowly to allow the user to pass first, and then speeds up toward its goal.

Fig. 8: (a) Experimental field. (b-c) Different starting and goal positions for the robot. (d) The wearable haptic interface and the vibration motor.

To understand why the planner does this, we visualize the generated plan and the predictions of human movements, as shown in Fig. 7(a). As the robot speeds up and approaches the intersection, the user becomes unsure of how to avoid the robot. This is reflected in the second subplot in the first row, where the user's possible future trajectories diverge. In this simulation, we set the reward function coefficients so that the planner cares more about the human user's efficiency and comfort. As a result, the planner explicitly communicates human priority to the user to minimize the chance that the user slows down to yield to the robot.

We tested the planner with various robot starting positions and target positions. Fig. 7(b) presents the relationship between the robot's starting position and whether explicit communication is generated by the planner, for one specific target position. We can see that explicit communication is only used when the robot starts within the red banded region. When the robot starts too close to the intersection (upper-right grey region), the planner predicts that the user will let the robot pass first, as it is closer to the intersection. When the robot starts too far away (bottom-left grey region), the planner predicts that the user will pass first. Explicit communication is used only when it is ambiguous who should pass first.

We also studied the effect of the coefficients $c_1$ (robot efficiency) and $c_2$ (human comfort) in the reward function in Eq. (19). Weighing human comfort more heavily resulted in a submissive robot that would yield to the user when there was a potential collision; conversely, weighing robot efficiency more heavily resulted in aggressive robot behaviors. With balanced weights, the robot tends to be submissive: in our implementation, the robot is usually slower than the human, so it is often more efficient to let the user pass first. Based on the simulation results, we decided to experimentally test two sets of parameters: one that favors robot efficiency (robot-prioritized scenario) and one that favors human comfort (human-prioritized scenario).

VII Experimental Results

VII-A Experimental Setup

We again used a TurtleBot2 as the mobile robot platform. The robot was equipped with a Hokuyo URG-04LX-UG01 laser range finder and an ASUS Zenbook UX303UB as the onboard computer. The planning algorithm ran on a desktop computer with an Intel Core i7 processor and 16 GB RAM. The two computers communicated with each other over a wireless network.

The onboard computer processes the laser range finder readings to localize the robot and to estimate the position and velocity of nearby pedestrians. We localize the robot in a static map using laser-based Monte Carlo localization [48]. The robot detects and tracks pedestrians using a leg detection algorithm provided by ROS [49]. The algorithm first attempts to segment leg objects from the laser range finder readings with a pre-trained classifier, then pairs nearby legs and associates the detection with a tracker for each person. The localization and tracking results are sent to the desktop computer that runs our planning algorithm, which computes the plan and sends it back to the robot.

The explicit communication is displayed to the user via a wearable haptic interface with a single vibration motor (Haptuator Mark II, shown in Fig. 8(d)). The interface can render distinct signals by modulating the vibration pattern. In this experiment, we display haptic cues with different vibration amplitudes and durations to indicate the robot's intent. Robot priority is represented by a single long vibration (duration 1.5 s, max current 250 mA), and human priority is represented by three short pulses (0.2 s vibration with 0.2 s pauses in between, max current 150 mA).
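For reference, the two cues could be rendered with a sketch like the following, where `set_motor` is a hypothetical hardware hook taking a normalized vibration amplitude in [0, 1]; the amplitude values are illustrative stand-ins for the current levels above.

```python
import time

def render_intent_cue(set_motor, intent):
    """Render the robot's intent on a single vibration motor: one long
    vibration for robot priority, three short pulses for human priority
    (timings as described above)."""
    if intent == "robot_priority":
        set_motor(1.0)            # single long vibration
        time.sleep(1.5)
        set_motor(0.0)
    else:
        for _ in range(3):        # three short pulses with pauses
            set_motor(0.6)
            time.sleep(0.2)
            set_motor(0.0)
            time.sleep(0.2)
```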

The experimental field, illustrated in Fig. 8(a), is a room of size 8 m × 10 m. Two tables are placed at the two ends of the field as targets for the user. To distract the user from focusing solely on the robot, we place questionnaires on the tables and ask the user to remember and answer arithmetic questions. The entire experiment is recorded with an overhead camera (GoPro Hero 4, recording at 60 Hz), and we post-process the video to extract the trajectories of the user and the robot. To facilitate tracking, we ask the user to wear a purple hat, and we attach an ArUco marker [50] to the top of the robot.

VII-B Experimental Design

Fig. 9: Comparison of different metrics for the three experimental conditions. (a) Percentage of trials in which the user passed in front of the robot. (b) Average path length; a base length (6.2 m) is subtracted from all values for visualization purposes. (c) The user's trust in the robot. Brackets indicate statistical significance (*, **, ***).


Manipulated Factors.

We manipulate three factors: task priority, robot starting position, and communication mode. Overall, the experiment is divided into three sessions based on the communication mode:

  • explicit + implicit: the robot communicates its intent to the user both explicitly (via haptic feedback) and implicitly (by changing speed and direction), using a model to predict the user’s movement.

  • implicit only: the robot does not plan for explicit communication, but still changes speed and direction according to a model that predicts the user’s movement.

  • baseline: the robot simply performs collision avoidance with the user without predicting the user’s movement.

Each session consists of 20 trials (10 for training). In each trial, the human user and the robot both move from a starting position to a goal position, as shown in Fig. 8(b). The robot is assigned one of the two task priorities in each trial: robot priority or human priority. We also vary the starting position of the robot. The starting positions are classified as far (4 out of 20 trials), close (4/20), and normal (12/20), based on the distance to the user's starting position (Fig. 8(b)). There are equal numbers of robot priority and human priority trials for each type of starting position. In addition to the orthogonal encounter scenario, we also tested the planner in other experimental setups. Fig. 8(c) presents human-robot encounters at different angles. We found that the planner worked similarly in these scenarios.

Dependent Measures.

We measure the user’s path length for each trial, whether the user passes the robot in the front, and the user’s trust in the robot. Trust is measured using a post-experiment survey where the user rates his/her trust in the robot on a scale from 1 to 7, 1 being the least trust and 7 being the most trust. In the session where explicit communication is allowed, we also record whether explicit communication happens and the time at which it happens.

Hypotheses.

We hypothesize that:

  I. Using explicit + implicit communication conveys the robot's intent better than implicit only and baseline, such that users will elect to pass in front of or behind the robot as appropriate for a given priority.

  II. The user's average path length is shorter when the robot plans for communication with the human model (explicit + implicit and implicit only modes).

  III. The user is more trustful of the robot when the robot plans for explicit + implicit communication.

Subject Allocation.

A total of 12 people (7 males and 5 females) participated in the experiment after giving informed consent, under a protocol that was approved by the Stanford University Institutional Review Board. We used a within-subjects design and counterbalanced the order of the three sessions.

VII-C Analysis and Results

Fig. 9 summarizes major results. We describe the analysis and results in detail in the following paragraphs.

Understanding Robot Intent. To characterize how often the user correctly understood the robot's intent, we compute the percentage of trials in which the user passed in front of the robot, for each communication mode and task priority. One-way repeated measures ANOVAs, run separately for human priority and robot priority trials, revealed a significant effect of communication mode on the percentage of front passes. We performed post-hoc analyses with Tukey HSD to determine pairwise differences. For human priority trials, results show that the baseline mode is significantly different from the implicit only mode and the implicit + explicit mode. For robot priority trials, the implicit + explicit mode is significantly different from the implicit only mode and the baseline mode. This supports Hypothesis I: the user can better understand the robot's intent with explicit communication.

Path Length. We computed the average path length for each user. As with the front-pass metric, one-way repeated measures ANOVAs revealed a significant effect of communication mode for both human priority and robot priority trials. Post-hoc comparisons reveal that, for human priority trials, the average path length in the implicit + explicit mode is significantly shorter than in the baseline mode. For robot priority trials, the average path length in the baseline mode is significantly longer than in the implicit only mode and the implicit + explicit mode. This result suggests that the user can navigate more efficiently when the robot plans for communication (Hypothesis II).

Trust. Users' trust in the robot was measured with a post-experiment survey, as explained in the previous subsection. A one-way repeated measures ANOVA revealed a significant effect of communication mode. Post-hoc Tukey HSD analysis showed that all three pairs of modes are significantly different from each other, which supports Hypothesis III.

Fig. 10: Velocity profiles from sample trials. Top: human priority trials. Bottom: robot priority trials.

VII-D Effect of Communication

Fig. 9(a) shows that when there was both implicit and explicit communication, users were able to understand the robot's intent most of the time and acted accordingly – users passed in front of the robot 92% of the time in human priority trials, and only 9% of the time in robot priority trials. When there was only implicit communication, users were more confused, especially in robot priority trials (about 50% of the time, users passed in front of the robot even though the robot's intent was not to yield). When the robot performed collision avoidance naively (baseline condition), users were the most confused and acted most conservatively.

Fig. 10 depicts the robot's velocity over time in sample trials of different communication modes and task priorities. In human priority trials (top row), when the robot planned for communication (implicit only and implicit + explicit modes), it started to slow down relatively early compared to the baseline mode. By doing so, the robot implicitly expressed its intent. Indeed, significantly more users chose to pass in front of the robot in these communication modes (Fig. 9(a)). The planner generated this behavior because, with the human model, it was able to predict the effect of robot motions on the user's navigation behavior. The baseline planner, by contrast, did not perform any prediction and simply solved for a collision-free trajectory. As a result, the robot did not slow down until it got very close to the user, which caused the user to believe that it did not want to yield. This result suggests that the planned motion is more legible when the human's reactions are taken into consideration during planning.

Our results also suggest that users navigate more efficiently when the robot plans for communication. Fig. 9(b) shows that for human priority trials, users' path lengths are significantly shorter in the implicit + explicit mode than in the baseline mode. For robot priority trials, both the implicit + explicit mode and the implicit only mode result in significantly shorter path lengths than the baseline mode. Knowing the robot's intent, users can better coordinate their movements with the robot to navigate more efficiently. In the implicit + explicit mode, users generally received feedback early in the trial. In the implicit only mode, users had to infer the robot's intent from its motion, and tended to be less efficient; this difference does not reach statistical significance because users' behaviors have high variance in the implicit only mode. Some users were able to interpret the robot's intent via its motion, while others could not. In Fig. 9(a), we observe the same high variance in the implicit only mode.

Finally, Fig. 9(c) shows that users gained more trust in the robot with the proposed planner. We found statistically significant differences between each pair of communication modes. This result suggests that transparency is very important in social navigation: the easier it is for users to understand the robot's intent, the more they trust the robot. When users do not trust the robot, they tend to act conservatively and less efficiently (fewer trials in which the user passed in front of the robot, and longer path lengths, in the baseline mode).

VIII Conclusion and Future Work

In this paper, we presented a planning framework that leverages implicit and explicit communication to generate efficient and transparent social navigation behaviors. This approach for mobile robot proactive communication is inspired by humans’ ability to use both explicit and implicit communication to avoid collisions during navigation. The planner relies on a new probabilistic model that predicts human movements. We evaluated the planner both in simulation and in a user study with a physical mobile robot. Results showed that the proposed planning algorithm can generate proactive communicative actions, which better expressed the robot’s intent, reduced users’ effort, and increased users’ trust of the robot compared to collision avoidance with and without a model that predicts users’ movements.

There are numerous ways to expand on this work. First, the model of human navigation behavior makes certain assumptions and approximations, as described in Section IV. As a result, the planner cannot handle certain scenarios (e.g., if the human suddenly stops and remains stationary). Expanding the data collection for IRL, including additional behaviors, and relaxing assumptions would improve the model and thus the planner. Second, our planner is computationally expensive, and can only find actions that are locally optimal. Although we have not observed any unusual behaviors generated by the planner, there is no theoretical guarantee of optimality or safety. One approach to reducing the computational complexity of planning is to use sampling-based methods instead of performing optimization. Besides speeding up the computation, sampling-based methods could also exploit our probabilistic model more fully: the model captures the whole distribution, whereas we currently compute only the most probable motion of each mode. Third, we can generalize our approach to richer interaction scenarios and other communication modalities. Currently, our algorithm only handles interaction between a single human and a single robot. It is possible to extend the framework to multiple humans interacting with the robot at the same time; however, extension to densely populated scenarios is not feasible with the current modeling approach. The advantages and disadvantages of various forms of explicit communication (haptic, verbal, and visual) could also be identified in the context of specific application scenarios.

References

  • [1] E. A. Sisbot, L. F. Marin-Urias, R. Alami, and T. Simeon, “A human aware mobile robot motion planner,” IEEE Transactions on Robotics, vol. 23, no. 5, pp. 874–883, 2007.
  • [2] P. Trautman, J. Ma, R. M. Murray, and A. Krause, “Robot navigation in dense human crowds: the case for cooperation,” in IEEE International Conference on Robotics and Automation, 2013, pp. 2153–2160.
  • [3] G. Ferrer and A. Sanfeliu, “Proactive kinodynamic planning using the extended social force model and human motion prediction in urban environments,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 1730–1735.
  • [4] M. Kollmitz, K. Hsiao, J. Gaa, and W. Burgard, “Time dependent planning on a layered social cost map for human-aware robot navigation,” in IEEE European Conference on Mobile Robots, 2015, pp. 1–6.
  • [5] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially compliant mobile robot navigation via inverse reinforcement learning,” The International Journal of Robotics Research, vol. 35, no. 11, pp. 1289–1307, 2016.
  • [6] B. Kim and J. Pineau, “Socially adaptive path planning in human environments using inverse reinforcement learning,” International Journal of Social Robotics, vol. 8, no. 1, pp. 51–66, 2016.
  • [7] E. Hall, The Hidden Dimension: Man’s Use of Space in Public and Private.   Bodley Head, 1969.
  • [8] Y. Nakauchi and R. Simmons, “A social robot that stands in line,” Autonomous Robots, vol. 12, no. 3, pp. 313–324, 2002.
  • [9] E. Pacchierotti, H. I. Christensen, and P. Jensfelt, “Embodied social interaction for service robots in hallway environments,” in Field and Service Robotics.   Springer, 2006, pp. 293–304.
  • [10] R. Kirby, R. Simmons, and J. Forlizzi, “Companion: A constraint-optimizing method for person-acceptable navigation,” in IEEE International Symposium on Robot and Human Interactive Communication, 2009, pp. 607–612.
  • [11] M. Svenstrup, T. Bak, and H. J. Andersen, “Trajectory planning for robots in dynamic human environments,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 4293–4298.
  • [12] T. Kruse, A. Kirsch, H. Khambhaita, and R. Alami, “Evaluating directional cost models in navigation,” in ACM/IEEE International Conference on Human-Robot Interaction, 2014, pp. 350–357.
  • [13] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical Review E, vol. 51, no. 5, p. 4282, 1995.
  • [14] F. Zanlungo, T. Ikeda, and T. Kanda, “Social force model with explicit collision prediction,” Europhysics Letters, vol. 93, no. 6, p. 68005, 2011.
  • [15] F. Farina, D. Fontanelli, A. Garulli, A. Giannitrapani, and D. Prattichizzo, “Walking ahead: The headed social force model,” PLoS ONE, vol. 12, no. 1, p. e0169734, 2017.
  • [16] P. Ratsamee, Y. Mae, K. Ohara, M. Kojima, and T. Arai, “Social navigation model based on human intention analysis using face orientation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 1682–1687.
  • [17] D. Mehta, G. Ferrer, and E. Olson, “Autonomous navigation in dynamic social environments using multi-policy decision making,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016, pp. 1190–1197.
  • [18] Y. Che, C. T. Sun, and A. M. Okamura, “Avoiding human-robot collisions using haptic communication,” in IEEE International Conference on Robotics and Automation, 2018, pp. 5828–5834.
  • [19] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, interacting crowds,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 797–803.
  • [20] A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in International Conference on Machine Learning, 2000, pp. 663–670.
  • [21] S. Levine and V. Koltun, “Continuous inverse optimal control with locally optimal examples,” in International Conference on Machine Learning, 2012, pp. 475–482.
  • [22] P. Abbeel and A. Y. Ng, “Exploration and apprenticeship learning in reinforcement learning,” in International Conference on Machine Learning.   ACM, 2005, pp. 1–8.
  • [23] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in AAAI Conference on Artificial Intelligence, 2008, pp. 1433–1438.
  • [24] H. Kretzschmar, M. Kuderer, and W. Burgard, “Learning to predict trajectories of cooperatively navigating agents,” in IEEE International Conference on Robotics and Automation, 2014, pp. 4015–4020.
  • [25] A. Thomaz, G. Hoffman, M. Cakmak et al., “Computational human-robot interaction,” Foundations and Trends in Robotics, vol. 4, no. 2-3, pp. 105–223, 2016.
  • [26] A. D. Dragan, K. C. Lee, and S. S. Srinivasa, “Legibility and predictability of robot motion,” in ACM/IEEE International Conference on Human-Robot Interaction, 2013, pp. 301–308.
  • [27] A. D. Dragan and S. S. Srinivasa, “Generating legible motion,” in Robotics: Science and Systems, 2013.
  • [28] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for autonomous cars that leverage effects on human actions,” in Robotics: Science and Systems, 2016.
  • [29] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. Dragan, “Information gathering actions over human internal state,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016, pp. 66–73.
  • [30] D. Sadigh, N. Landolfi, S. S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for cars that coordinate with people: Leveraging effects on human actions for planning and active information gathering over human internal state,” Autonomous Robots (AURO), vol. 42, no. 7, pp. 1405–1426, 2018.
  • [31] A. Bestick, R. Bajcsy, and A. D. Dragan, “Implicitly assisting humans to choose good grasps in robot to human handovers,” in International Symposium on Experimental Robotics.   Springer, 2016, pp. 341–354.
  • [32] S. Nikolaidis, D. Hsu, and S. Srinivasa, “Human-robot mutual adaptation in collaborative tasks: Models and experiments,” The International Journal of Robotics Research, vol. 36, no. 5-7, pp. 618–634, 2017.
  • [33] C.-M. Huang and B. Mutlu, “Modeling and evaluating narrative gestures for humanlike robots,” in Robotics: Science and Systems, 2013, pp. 57–64.
  • [34] M. Lohse, R. Rothuis, J. Gallego-Pérez, D. E. Karreman, and V. Evers, “Robot gestures make difficult tasks easier: the impact of gestures on perceived workload and task performance,” in SIGCHI Conference on Human Factors in Computing Systems.   ACM, 2014, pp. 1459–1466.
  • [35] T. Hashimoto, S. Hiramatsu, T. Tsuji, and H. Kobayashi, “Realization and evaluation of realistic nod with receptionist robot Saya,” in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2007, pp. 326–331.
  • [36] K. Baraka, S. Rosenthal, and M. Veloso, “Enhancing human understanding of a mobile robot’s state and actions using expressive lights,” in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016, pp. 652–657.
  • [37] S. Scheggi, F. Chinello, and D. Prattichizzo, “Vibrotactile haptic feedback for human-robot interaction in leader-follower tasks,” in International Conference on Pervasive Technologies Related to Assistive Environments, 2012, pp. 51:1–51:4.
  • [38] S. Scheggi, F. Morbidi, and D. Prattichizzo, “Human-robot formation control via visual and vibrotactile haptic feedback,” IEEE Transactions on Haptics, vol. 7, no. 4, pp. 499–511, 2014.
  • [39] D. Sieber, S. Musić, and S. Hirche, “Multi-robot manipulation controlled by a human with haptic feedback,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015, pp. 2440–2446.
  • [40] B. Mutlu, F. Yamaoka, T. Kanda, H. Ishiguro, and N. Hagita, “Nonverbal leakage in robots: communication of intentions through seemingly unintentional behavior,” in Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction, 2009, pp. 69–76.
  • [41] H. Admoni, A. Dragan, S. S. Srinivasa, and B. Scassellati, “Deliberate delays during robot-to-human handovers improve compliance with gaze communication,” in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014, pp. 49–56.
  • [42] A. Moon, D. M. Troniak, B. Gleeson, M. K. Pan, M. Zheng, B. A. Blumer, K. MacLean, and E. A. Croft, “Meet me where I’m gazing: how shared attention gaze affects human-robot handover timing,” in ACM/IEEE International Conference on Human-Robot Interaction, 2014, pp. 334–341.
  • [43] M. Morari, C. Garcia, J. Lee, and D. Prett, Model Predictive Control.   Prentice Hall, Englewood Cliffs, NJ, 1993.
  • [44] L. Peterson and M. J. Peterson, “Short-term retention of individual verbal items,” Journal of Experimental Psychology, vol. 58, no. 3, p. 193, 1959.
  • [45] A. Astolfi, “Exponential stabilization of a wheeled mobile robot via discontinuous control,” ASME Journal of Dynamic Systems, Measurement, and Control, vol. 121, no. 1, pp. 121–126, 1999.
  • [46] S. Johnson, “The NLopt nonlinear-optimization package [software],” 2014.
  • [47] S. Hoogendoorn and P. H. L. Bovy, “Simulation of pedestrian flows by optimal control and differential games,” Optimal Control Applications and Methods, vol. 24, no. 3, pp. 153–172, 2003.
  • [48] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, “Robust monte carlo localization for mobile robots,” Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.
  • [49] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating system,” in ICRA workshop on open source software, vol. 3, no. 3.2, 2009, p. 5.
  • [50] S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, and M. Marín-Jiménez, “Automatic generation and detection of highly reliable fiducial markers under occlusion,” Pattern Recognition, vol. 47, no. 6, pp. 2280–2292, 2014.