When robots and humans must navigate together in a shared space, conflicts may arise when they choose conflicting trajectories. Humans are able to disambiguate such conflicts between one another by communicating, often passively, through non-verbal communicative cues. The process by which people share spaces such as hallways involves cues such as body posture, motion trajectories, and gaze. In cases where this communication breaks down, the parties involved may do a “Hallway Dance” (https://www.urbandictionary.com/define.php?term=Hallway20dance), wherein they navigate into the same space, sometimes several times while trying to deconflict from each other’s paths, rather than gracefully passing each other. This occurrence, however, is rare and socially awkward for the participants.
Robots generate trajectories that are often difficult for people to interpret, and generally communicate very little about their internal states passively. This behavior can lead to situations similar to the hallway dance, wherein they clog the traffic arteries in confined spaces such as hallways or even in crowded but open spaces such as atria. The Building-Wide Intelligence project at UT Austin intends to create an ever-present fleet of general-purpose mobile service robots. With multiple robots continually navigating our Computer Science Department, we have had many opportunities to witness these robots come into conflict with people when passing them in shared spaces. The most common type of conflict occurs when a human and a robot should simply pass each other in a hallway, but instead stop in front of each other, thus inconveniencing the human and possibly causing the robot to choose a different path.
In previous work, our group sought to overcome this difficulty by incorporating LED turn signals onto the robot. That study found that the turn signals are not easily interpreted by participants, but that the introduction of a “passive demonstration” showing their use allows the signals to be understood. A passive demonstration is a training episode wherein the robot demonstrates the use of the turn signal in front of the user without explicitly telling the user that they are being instructed. In the case of our previous study, the robot simply makes a turn, using the turn signal, within the field of view of the user. Thus, the user has the opportunity to witness the signal before it is important for interaction with the robot, but is not explicitly told its purpose. This technique has two limitations, however: the robot must recognize when it is first interacting with a new user so that it can perform the demonstration, and an opportunity to perform the demonstration must arise before the signal is needed in practice.
This work designs and tests a more naturalistic signaling mechanism, hypothesizing that naturalistic signals will not require such a training period. Signaling mechanisms such as gaze or body language, mimicking human non-verbal communicative cues, may be far more easily understood by untrained users. Gaze is an important cue used to disambiguate human navigational intentions. A person will look in the direction that they intend to walk simply to ensure that the path is safe and free of obstacles, but doing so enables others to observe their gaze. Observers can infer the trajectory that the person performing the gaze is likely to follow, and coordinate their own behavior accordingly. From this observation follows the design of a series of two studies. The first study is a human field experiment exploring the importance of gaze in the navigation of a shared space. The second study is a human-robot study contrasting a robot using an LED turn signal with a gaze cue rendered on a virtual agent head. These studies support the hypotheses that gaze is an important social cue used when navigating shared spaces and that gaze, as a naturalistic communicative cue, is clearer to human observers than the artificial cue of an LED signal when used in this context. A video demonstrating different test conditions and responses can be found online (https://youtu.be/mOwZo6uRREc).
Of recent interest to the robotics research community is the study of humans and robots navigating in a shared space [3, 8, 14, 17, 21, 23, 24, 26]. Our prior study introduced the concept of a “passive demonstration” in order to disambiguate the intention of a robot’s LED turn signal. Baraka and Veloso (Baraka2018) used an LED configuration on their CoBot to indicate a number of robot states, including turning, focusing on the design of LED animations to address legibility. They performed a study showing that the use of these signals increases participants’ willingness to aid the robot. Szafir, Mutlu, and Fong (Szafir:2015:CDF:2696454.2696475) equipped quad-rotor drones with LEDs mounted in a ring at the base, providing four different signal designs along this strip. They found that their LEDs improve participants’ ability to quickly infer the intended motion of the drone. Shrestha, Onishi, Kobayashi, and Kamezaki (8525528) performed a study similar to ours, in which a robot crosses a human’s path in several different ways, indicating its motion intention with an arrow projected onto the floor using a color video projector. They found that in the scenario of a person and the robot passing each other in a hallway, similar to that presented in this paper, their method is effective in expressing the robot’s intended motion trajectory. They intend to explore the use of shoulder-height turn signals in future work.
Gaze has been studied heavily in HRI. It is a common hypothesis that gaze following is “hard-wired” in the brain. Generating gaze on the behalf of the robot is a naturalistic signal, emulating human behavior. A significant portion of human communication takes place implicitly, based on a combination of the context in which communication takes place and factors such as body language. A robot walking down a hallway, crossing a person’s path and getting out of the way leveraging the motion predictor from prior work is leveraging implicit social cues to coordinate its behavior. In this work, the robot generates a gaze signal to convey its intention to a person, despite the fact that doing so has no impact on its actual vision. This gaze signal can be contrasted with communicative signals designed specifically for non-humanoid robots and other devices, such as LED turn signals.
People rely heavily on non-verbal communication in social interactions [2, 4]. The present work builds on the demonstrated concept that humans infer other people’s movement trajectories from their gaze direction, and on the relationship between head pose and gaze direction. Norman (norman2009design) speculated that bicycle riders know how to avoid collision with pedestrians since the latter group’s members are consistent with their gaze.
Other works have dealt with visible changes of posture when a pedestrian is about to change course, such as weight shifting, foot location, and leg and pelvic rotations [16, 22]. Patla, Adkin, and Ballard’s (Patla1999) detailed description of whole-body kinematics during walking motion was leveraged by Unhelkar, Perez-D’Arino, Stirling, and Shah (unhelkar2015human) to create a predictor for discretized human motion trajectories. Unhelkar et al. (unhelkar2015human) found that head pose is a significant predictor of the direction that a person intends to walk. In their study, they discretized trajectories in terms of a decision problem of which target a person would walk towards.
Following a similar line of thought, Khambhaita, Rios-Martinez, and Alami (khambhaita2016head) propose a motion planner which coordinates head motion to the path a robot will take seconds in the future. In a video survey in which their robot approaches a T-intersection in a hallway, they found that study participants were significantly more able to determine the intended path of the robot in terms of the left or right branch of the intersection when the robot used the gaze cue as opposed to when it did not. Using a different gaze cue, Lynch, Pettré, Bruneau, Julien, Kulpa, Crétual, and Olivier (lynch2018effect) performed a study in a virtual environment in which virtual agents established mutual gaze with participants during path-crossing events in a virtual hallway, finding no significant effect in helping participants to disambiguate their paths from those of the virtual agents.
Of course, this gaze behavior extends beyond walking and bicycling. Recent work in our laboratory has studied the use of gaze as a cue for interacting with copilot systems in cars [9, 10], also with the aim of inferring the driver’s intended trajectory. Gaze is also often fixated on objects being manipulated, which can be leveraged to improve algorithms which learn from human demonstrations. Though the use of instrumentation such as head-mounted gaze trackers or static gaze tracking cameras is limiting for mobile robots, recent work in the development of gaze trackers which work without such equipment may soon allow us to repeat the robot experiments presented in this paper with the robot reacting to human gaze, rather than only generating a gaze cue.
Human Field Study
This human ecological field study observes the effect of violating expected human gaze patterns while navigating a shared space. In this study, research confederates sometimes look opposite to the direction in which they intend to walk, violating the expectations observed by the work of Patla et al. (Patla1999) and Unhelkar et al. (unhelkar2015human), in which head pose is predictive of trajectory. We hypothesize that this violation causes problems in interpreting the navigational intent of the confederate, and can lead to confusion or near-collisions.
The Student Activity Center at UT Austin is a busy, public building hosting meeting rooms for student activities and a variety of restaurants in its food court. It has predictable busy times, centering around the schedule of class changes at the university. This study was performed in a busy hallway, Figure 1, which becomes crowded during class changes.
Prior to this study, two of the authors of this paper trained each other to proficiently look counter to the direction in which they walk, and acted as confederates who interacted with study participants. Both of the confederates who participated in this study are female. A third author acted as a passive observer to interactions between these confederates and other pedestrians walking through the hallway.
This experiment is organized as a study, controlling whether the interaction occurs in a “crowded” or “uncrowded” hallway, and whether the confederate looks in the direction in which they intend to go (their gaze is “congruent”) or opposite to this direction (their gaze is “incongruent”). Here, “crowded” is defined as a state in which it is difficult for two people to pass each other in the hallway without coming within m of each other. It can be observed that the busiest walkways at times form “lanes” in which pedestrians walk directly in lines when traversing these spaces. This study was not performed under these conditions, as walking directly toward another pedestrian would require additionally breaking these lanes, introducing another variable into the study.
The passive observer annotated all interactions in which the confederate and a pedestrian walked directly toward each other. If the confederate and the pedestrian collided with each other or nearly collided with each other, the interaction was annotated as a “conflict.” Conflicts are further divided into “full” collisions, in which the two parties bumped into each other; “partial,” in which the confederate and pedestrian brushed against each other; or “shift,” in which the two parties shifted to the left or right to deconflict each other’s paths after coming into conflict.
A total of interactions were observed ( female / male), with in crowded conditions and in uncrowded conditions. The confederate looked in the incongruent direction in of the of the uncrowded interactions and of the crowded interactions. A one-way ANOVA found no significant main effect between the crowded and uncrowded conditions (), whether the confederate went to the pedestrian’s right or left during the interaction (), or based on gaze direction (). Whether the gaze direction was congruent with the walking direction, however, was significant (). A breakdown of conflicts based on the congruent condition versus the incongruent gaze condition can be found in Table 1. These results support the hypothesis that humans use gaze to deconflict their navigational trajectories when crossing each other’s paths in an ecologically-valid setting.
| Conflict type | Congruent gaze | Incongruent gaze |
|---|---|---|
| Partial | 8 (18%) | 9 (12%) |
| Quick Shift | 5 (12%) | 24 (33%) |
| Full | 0 (0%) | 3 (4%) |
| Any Conflict | 13 (30%) | 36 (49%) |
| No Conflict | 30 (70%) | 37 (51%) |
| Total | 43 (100%) | 73 (100%) |
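As a rough illustration of how the counts in Table 1 translate into conflict rates, the sketch below recomputes the "Any Conflict" proportions and runs a Pearson chi-square test on the resulting 2×2 table. This is not the analysis reported in the paper, which uses a one-way ANOVA; the chi-square test is swapped in here only because it can be computed directly from the published counts. The sketch also assumes that the first column of Table 1 is the congruent-gaze condition and the second is the incongruent-gaze condition. All function and variable names are hypothetical.

```python
import math

# "Any Conflict" / "No Conflict" counts copied from Table 1.
# ASSUMPTION: the first column is congruent gaze, the second incongruent gaze.
TABLE_1 = {
    "congruent": {"conflict": 13, "no_conflict": 30},
    "incongruent": {"conflict": 36, "no_conflict": 37},
}


def conflict_rate(counts):
    """Fraction of interactions in one condition that ended in any conflict."""
    total = counts["conflict"] + counts["no_conflict"]
    return counts["conflict"] / total


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


congruent_rate = conflict_rate(TABLE_1["congruent"])
incongruent_rate = conflict_rate(TABLE_1["incongruent"])
chi2, p_value = pearson_chi_square_2x2(
    TABLE_1["congruent"]["conflict"],
    TABLE_1["congruent"]["no_conflict"],
    TABLE_1["incongruent"]["conflict"],
    TABLE_1["incongruent"]["no_conflict"],
)

print(f"congruent conflict rate:   {congruent_rate:.2f}")
print(f"incongruent conflict rate: {incongruent_rate:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Because the test differs from the ANOVA used in the study, its statistic should be read as a sanity check on the direction of the effect rather than as a replacement for the reported result.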
Gaze as a Navigational Cue for HRI
Motivated by the use of gaze as a naturalistic cue for implicitly communicating navigational intent, we engineered a system in which the BWIBot uses a virtual agent head to look in the direction it intends to steer toward, in order to deconflict its trajectory from that of a human navigating a shared space. In these experiments, a human starts at one end of a hallway, and the robot starts at the other end. The human is instructed to traverse the hallway to the other end, and the robot also autonomously traverses it. As a proxy for measuring understanding of the cue, the number of times that the human and robot come into conflict with each other is measured for two conditions: one in which the robot uses a turn signal to indicate the side of the hallway on which it intends to pass the person (the robot’s motion is intended to be similar to a car changing lanes), and one in which it uses a gaze cue to make this indication. This experiment is motivated by prior work which introduced the concept of a “passive demonstration” in order to overcome the challenge of people not understanding the intention of the turn signal. The question asked here is whether the gaze cue, based on natural behavior and implicit communication, will outperform the artificial LED turn signal.
This study is set in a test hallway built from cubicle furniture, shown in Figure 2. The hallway is m long by m wide. The system models the problem of traversing the hallway as one in which the hallway is divided into three lanes, similar to traffic lanes on a roadway, which is illustrated in Figure 3. If the human and the robot are within m of each other when they cross each other’s path, they are considered to be in conflict with each other. This distance is based on the m safety buffer engineered into the robot’s navigational software, which also causes the robot to stop. If the human and the robot both start in the middle lane, at opposing ends of the hallway, with the navigational goal of traversing to the opposing ends of the hallway, the signal is intended to indicate an intended lane change on behalf of the robot, so that the human may interpret that signal, and shift into the opposing lane if necessary.
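The three-lane model and the conflict criterion described above can be sketched as follows. This is a minimal illustration, not the BWIBot navigation software: the hallway width and conflict distance below are hypothetical placeholders (the study's actual values are not reproduced here), positions are expressed in a hallway-fixed frame, and all names are invented for the example.

```python
import math
from enum import Enum


class Lane(Enum):
    """Lanes in a hallway-fixed frame, numbered from one wall to the other."""
    FIRST = 0
    MIDDLE = 1
    THIRD = 2


def lane_for_offset(lateral_offset_m, hallway_width_m):
    """Map a lateral offset from the reference wall to one of three equal lanes."""
    if not 0.0 <= lateral_offset_m <= hallway_width_m:
        raise ValueError("offset lies outside the hallway")
    lane_width_m = hallway_width_m / 3.0
    # Clamp so that an agent exactly at the far wall falls in the third lane.
    index = min(int(lateral_offset_m // lane_width_m), 2)
    return Lane(index)


def in_conflict(robot_xy, human_xy, conflict_distance_m):
    """True if the robot and the human are closer than the conflict distance."""
    return math.dist(robot_xy, human_xy) < conflict_distance_m


# HYPOTHETICAL placeholder values, chosen only to make the example concrete.
HALLWAY_WIDTH_M = 1.8
CONFLICT_DISTANCE_M = 1.0

# Both agents in the middle lane and close together: a conflict.
print(lane_for_offset(0.9, HALLWAY_WIDTH_M))
print(in_conflict((0.0, 0.9), (0.5, 0.9), CONFLICT_DISTANCE_M))
# Agents side by side in opposite outer lanes: no conflict.
print(in_conflict((0.0, 0.3), (0.0, 1.5), CONFLICT_DISTANCE_M))
```

Using a hallway-fixed frame avoids the ambiguity that "left" and "right" are mirrored for two agents walking toward each other.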
To display gaze cues on the BWIBot, we developed a 3D-rendered version of the Maki 3D-printable robot head (https://www.hello-robo.com/maki). The decision to use this head is motivated by the ability to both render it as a virtual agent and, in future work, to 3D print and assemble a head. The virtual version of the head was developed by converting the 3D-printable STL files into Wavefront .obj files and importing them into the Unity game engine (https://unity.com/). To control the head and its gestures, custom software was developed using ROSBridgeLib (https://github.com/MathiasCiarlo/ROSBridgeLib). The head is displayed on a inch monitor mounted to the front of the robot. When signaling, the robot turns its head and remains in this pose. The eyes are not animated to move independently of the head. The head turn takes seconds. These timings and angles were hand-tuned and pilot tested on members of the laboratory. The gaze signal can be seen in Figure 4.
The LED cue is a re-implementation of the LED turn signals from our previous work. Strips of LEDs m long with LEDs line the 8020 extrusion on the chassis of the front of the BWIBot. They are controlled using an Arduino Uno microcontroller, and blink twice per second with seconds on and seconds off each time they blink. In the condition that the LEDs are used, the monitor is removed from the robot. The LED signals can be seen in Figure 5.
To test the effectiveness of gaze in coordinating navigation through a shared space, we conducted a human-robot interaction study in the hallway test environment. After obtaining informed consent and, optionally, media release, participants are guided to one end of the hallway, where the robot is already set up at the opposite end. The participant is instructed to navigate to the opposing end of the hallway, and both the participant and the robot start in the “middle lane,” as per the model in Section Gaze as a Navigational Cue for HRI. When the participant starts walking down the hallway, the robot is also started. The study presented here uses an inter-participant design, in which each participant sees exactly one of the two cues – gaze or LED. Each participant traverses the hallway with the robot exactly once in order to avoid training effects. After completing the task of walking down the hallway, each participant responds to a brief post-interaction survey.
To ensure that the study results are reflective of the robot’s motion signaling behavior, rather than the participants’ motion out of the robot’s path, the study is tuned to give the participant enough time to get out of the robot’s way by reacting to its gaze or LED cue. The problem is modeled using three distances: , and . The distance (m) is the distance at which the robot signals its intention to change lanes, which is based on the distance at which the robot can accurately detect a person in the hallway using a leg detector and its on-board LiDAR sensor. The distance at which the robot will execute its turn, (m), was hand-tuned to be at a range at which it is unlikely that the participant will have time to react to the robot’s gaze or LED cue. This design is so that if the participant has not already started changing lanes by the time the robot begins its turn, it is highly likely that the person and robot will experience a conflict. Thus, this study tests interpretation of the cue, not reaction to the turn. The distance at which the robot determines that its motion is in conflict with that of the study participant is , which is set to m. This design is based on the safety buffer used when the robot is autonomously operating in our building.
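The role of the three distances can be sketched as a simple threshold rule that maps the current robot-to-person distance onto a behavior phase. This is an illustrative sketch only: the parameter names `d_signal`, `d_execute`, and `d_conflict` are hypothetical labels for the signaling, turn-execution, and conflict distances, the ordering `d_signal > d_execute > d_conflict` is an assumption drawn from the description above, and the numeric values in the usage lines are placeholders rather than the study's settings.

```python
from enum import Enum


class Phase(Enum):
    APPROACH = "approach"        # person not yet within signaling range
    SIGNAL = "signal"            # display the gaze or LED cue, hold the current lane
    CHANGE_LANE = "change_lane"  # execute the lane change
    CONFLICT = "conflict"        # inside the safety buffer; the robot stops


def phase_for_distance(distance_m, d_signal, d_execute, d_conflict):
    """Select the robot's behavior phase from its distance to the detected person.

    ASSUMPTION: the thresholds are strictly ordered, d_signal > d_execute > d_conflict.
    """
    if not d_signal > d_execute > d_conflict > 0.0:
        raise ValueError("expected d_signal > d_execute > d_conflict > 0")
    if distance_m <= d_conflict:
        return Phase.CONFLICT
    if distance_m <= d_execute:
        return Phase.CHANGE_LANE
    if distance_m <= d_signal:
        return Phase.SIGNAL
    return Phase.APPROACH


# HYPOTHETICAL placeholder thresholds in meters, not the values used in the study.
D_SIGNAL, D_EXECUTE, D_CONFLICT = 6.0, 2.5, 1.0

for distance in (8.0, 4.0, 2.0, 0.5):
    print(distance, phase_for_distance(distance, D_SIGNAL, D_EXECUTE, D_CONFLICT))
```

The gap between the signaling and execution thresholds is the window in which a participant can react to the cue, which is why the study's conclusions depend on how that window is tuned.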
In addition to using the three ranges to control the robot’s behavior, the robot also always moves itself into the “left” lane. This choice was made because, in North America, pedestrians usually walk to the right of each other in order to deconflict each other’s paths. Preliminary testing of the combination of these two behaviors showed that the pedestrians and the robot came into conflict of the time. As such, when participants move out of the robot’s path, it can be attributed mainly to the robot’s signaling.
The post-interaction survey comprises questions, consisting of -point Likert and cognitive-differences scales, and one free-response question. Five demographic questions on the survey ask whether people in the country where the participant grew up drive or walk on the left or right-hand side of the road and about their familiarity with robots. Ten questions concern factors such as the clarity of the signaling method used by the robot. Finally, twenty-nine questions ask about perceptions of the robot in terms of personality factors such as selfishness or whether the robot is perceived as threatening; factors of safety and usefulness; and other factors such as appropriate environments for the robot, such as the workplace or home. The free-response question presents the following prompt to participants in the LED condition: “There would be a better position for the signals, and it is:” The full survey as given to the participants is available online (https://docs.google.com/forms/d/1aTVx˙cdhLMZPosKktS5FxqJZbanXiXJs6Ey8XQh7tQg/prefill).
We recruited participants ( male / female), ranging in age from to years. The data from participants is excluded from our analysis. Last-minute software changes before the first day of testing led to a software failure that was only detected after participants had completed the experiment (after a participant asked, “What face?” we investigated). Two of the participants, early on, were robotics students who had read the previous paper and were familiar with the hypotheses of the study. After reviewing participant recruitment and carefully repairing the software and re-piloting the study, testing resumed. The final excluded data comes from a participant who failed to participate in the experimental protocol.
The remaining pool of participants includes participants in the LED condition and in the gaze condition. Table 2 shows the results from the robot signaling experiment in these two conditions. A pre-test for homogeneity of variances confirms the validity of a one-way ANOVA for analysis of the collected data. A one-way ANOVA shows a significant main effect (, ). Post-hoc tests of between-groups differences using the Bonferroni criteria show significant mean differences between the gaze group and the LED group (gaze versus LED: ), but no significant mean difference between the gaze and LED conditions (). None of the post-interaction survey responses revealed significant results. A video of two example interactions can be found at https://youtu.be/MHT3NU3NueM. These results support the hypothesis that the robot’s gaze can be more readily interpreted in order to deconflict its trajectory from that of a person navigating in a shared space.
Discussion & Future Work
The goal of these studies is to evaluate whether a naturalistic, implicit communicative gaze cue outperforms a more synthetic LED turn signal in coordinating the behavior of people and robots when navigating a shared space. This work follows previous work, finding that LED turn signals are not readily interpreted by people when interacting with the BWIBot, but that a brief, passive demonstration of the signal is sufficient to disambiguate its meaning. This study asks whether gaze can be used without such a demonstration.
The human ecological field study presented here validates the use of gaze as an implicit communicative cue for coordinating trajectories. Gaze may even be a more salient cue than a person’s actual trajectory in this interaction.
In the human-robot study that follows, we compared the performance of an LED turn signal against a gaze cue presented on a custom virtual agent head. In this condition, the robot turns its head and “looks” in the direction of the lane that it intends to take when passing the study participant. Our results demonstrate that the gaze cue significantly outperforms the LED signal in preventing the human and robot from choosing conflicting trajectories. We interpret this result to mean that people naturally understand this cue when the robot makes it, transferring their knowledge of interactions with other people onto the template of their interaction with the robot.
The gaze cue does not perform perfectly in the context of this study. There are several potential contributing factors. The first is that, while the entire head rotates, the eyes do not move to focus on any point in front of the robot. There are also minor errors in the construction of the 3D model that make it look slightly unnatural at times (these have been addressed and will be rectified in a future study). The distance window between signaling and performing the lane-change is also briefer in this study than in the previous study. This change in timing is because we had to change robots due to electrical problems (the robot is a similar, custom BWIBot with a slightly different base and sensor suite). The leg detector does not work as reliably on the updated platform. In the previous study, participants are signaled at m, as opposed to m in this study, as a result of issues with leg detection. Finally, interpreting gaze direction on a virtual agent head may be difficult due to the so-called “Mona Lisa Effect.” In follow-up studies, we intend to both tune the behavior of the head, and contrast its performance against a 3D printed version of the same head. The decision to use a virtual agent version of the Maki head is driven by our ability to contrast results in future experiments (upon construction of the hardware) between virtual agents and robotic heads.
Subtle differences between the robot’s behavior and human behavior may make the signal’s intention ambiguous, or it may simply be that robot gaze is interpreted differently from human gaze. Additionally, the lack of physical embodiment of the head possibly plays heavily into the performance of displayed cues. A detailed survey of the gaze literature discussing these factors is beyond the scope of this paper, but many are addressed by Admoni and Scassellati (admoni2017social). Significant future work to analyze these factors is in the planning phases.
The overall results of this study are highly encouraging. Many current-generation service robots avoid what may be perceived by their designers as overly-humanoid, perhaps unnecessary facial features and expressions. However, the findings in this work indicate that such features may be more readily interpreted by people interacting with these devices, and thus be highly beneficial.
This work has taken place in the Learning Agents Research Group (LARG) at the Artificial Intelligence Laboratory, The University of Texas at Austin. LARG research is supported in part by grants from the National Science Foundation (IIS-1637736, CPS-1739964, IIS-1724157), the Office of Naval Research (N00014-18-2243), Future of Life Institute (RFP2-000), Army Research Laboratory, DARPA, and Lockheed Martin. Peter Stone serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.
- (2017) Social eye gaze in human-robot interaction: a review. Journal of Human-Robot Interaction 6 (1), pp. 25–63.
- (2013) Bodily communication. Routledge.
- (2018) Mobile service robot state revealing through expressive lights: formalism, design, and evaluation. International Journal of Social Robotics 10 (1), pp. 65–92.
- (2005) Effects of nonverbal communication on efficiency and robustness in human-robot teamwork. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 708–713.
- (2018) A survey of nonverbal signaling methods for non-humanoid robots. Foundations and Trends® in Robotics 6 (4), pp. 211–323.
- (2013) Legibility and predictability of robot motion. In Human-Robot Interaction.
- (2000) The eyes have it: the neuroethology, function and evolution of social gaze. Neuroscience & Biobehavioral Reviews 24 (6), pp. 581–604.
- (2018) Passive demonstrations of light-based robot signals for improved human interpretability. In 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 234–239.
- (2018) A study of human-robot copilot systems for en-route destination changing. In Proceedings of the 27th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN2018), Nanjing, China.
- (2018) Inferring user intention using gaze in vehicles. In The 20th ACM International Conference on Multimodal Interaction (ICMI), Boulder, Colorado.
- (2017) A review and analysis of eye-gaze estimation systems, algorithms and performance evaluation methods in consumer platforms. IEEE Access 5, pp. 16495–16519.
- (2017) Bwibots: a platform for bridging the gap between ai and human–robot interaction research. The International Journal of Robotics Research 36 (5-7), pp. 635–659.
- (2015) Person tracking and following with 2d laser scanners. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 726–733.
- (2002) Collision avoidance by observing pedestrians’ faces for intelligent wheelchairs. Journal of the Robotics Society of Japan 20 (2), pp. 206–213.
- (2009) I’ll walk this way: eyes reveal the direction of locomotion and make passersby look and go the other way. Psychological Science 20 (12), pp. 1454–1458.
- (1999) Online steering: coordination and control of body center of mass, head and body reorientation. Experimental Brain Research 129 (4), pp. 629–634.
- (2013) Human–robot collision avoidance using a modified social force model with body pose and face orientation. International Journal of Humanoid Robotics 10 (01), pp. 1350008.
- (2015) A review of eye gaze in virtual agents, social robotics and HCI: behaviour generation, user interaction and perception. Computer Graphics Forum 34 (6), pp. 299–326.
- (2018) Human gaze following for human-robot interaction. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8615–8621.
- (2019) Understanding teacher gaze patterns for robot learning. arXiv preprint arXiv:1907.07202.
- (2018) Communicating directional intent in robot navigation using projection indicators. In 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 746–751.
- (2013) Examining anticipatory turn signaling in typically developing 4-and 5-year-old children for applications in active orthotic devices. Gait & Posture 37 (3), pp. 349–353.
- (2015) Communicating directionality in flying robots. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI ’15, New York, NY, USA, pp. 19–26.
- (2010) Smooth collision avoidance in human-robot coexisting environment. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3887–3892.
- (2015) Human-robot co-navigation using anticipatory indicators of human walking motion. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 6183–6190.
- (2015) Communicating robotic navigational intentions. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5763–5769.