TOKCS: Tool for Organizing Key Characteristics of VAM-HRI Systems

08/07/2021 ∙ by Thomas R. Groechel, et al. ∙ Brown University ∙ University of Colorado Boulder ∙ University of Southern California

Frameworks have begun to emerge to categorize Virtual, Augmented, and Mixed Reality (VAM) technologies that provide immersive, intuitive interfaces to facilitate Human-Robot Interaction. These frameworks, however, fail to capture key characteristics of the growing subfield of VAM-HRI and can be difficult to consistently apply. This work builds upon these prior frameworks through the creation of a Tool for Organizing Key Characteristics of VAM-HRI Systems (TOKCS). TOKCS discretizes the continuous scales used within prior works for more consistent classification and adds additional characteristics related to a robot's internal model, anchor locations, manipulability, and the system's software and hardware. To showcase the tool's capability, TOKCS is applied to find trends and takeaways from the fourth VAM-HRI workshop. These trends highlight the expressive capability of TOKCS while also helping frame newer trends and future work recommendations for VAM-HRI research.




1 Introduction

The need to help identify growing trends within Virtual, Augmented, and Mixed Reality for Human Robot Interaction (VAM-HRI) is evidenced by four consecutive years of a VAM-HRI workshop consistently spanning 60-100+ attendees. This nascent sub-field of HRI addresses challenges in mixed reality interactions between humans and robots, involving applications such as remote teleoperation, mental model alignment for effective partnering, facilitating robot learning, and comparing the capabilities and perceptions of robots and virtual agents. VAM-HRI research is becoming even more accessible to the robotics community due in part to the wide-spread availability of commercial virtual reality (VR), augmented reality (AR), and mixed reality (MR) platforms and the rise of readily-accessible 3D game engines for supporting virtual environment interactions.

To understand what challenges and solutions have been focused on by this new community, Williams et al. [23] proposed the Reality-Virtuality Interaction Cube as a tool for clustering VAM-HRI research. The Interaction Cube is a three-dimensional conceptual framework that captures characteristics about the design elements involved (expressivity of the view and flexibility of control) as well as the virtuality they implement (from real to fully virtual). While the Interaction Cube provides a useful lens for roughly characterizing research involving interactive technologies within VAM-HRI, the continuous nature of the cube makes it challenging to position design elements and environments exactly within the cube. Furthermore, the Interaction Cube does not address other characteristics of VAM-HRI research that have recently gained attention, such as robot internal models, software, hardware, and experimental evaluation methods.

To help advance the characterization of VAM-HRI systems, we introduce a Tool for Organizing Key Characteristics of VAM-HRI Systems (TOKCS). TOKCS builds on the Interaction Cube, discretizing its continuous scales and adding new key characteristics for classification. The tool is applied to the 10 workshop papers from the International Workshop on VAM-HRI to validate its usefulness within the growing subfield. These classifications help inform current and future trends found within the workshop.

2 Interaction Cube Framework

The Interaction Cube uses three dimensions to characterize VAM-HRI work: the 2D Plane of Interaction to represent interactive design elements and the 1D Reality-Virtuality Continuum from Milgram to characterize the environment.

Figure 1: The Reality-Virtuality Interaction Cube used to visually categorize MRIDEs according to their Flexibility of Control (FC), Expressivity of View (EV), and where they lie upon the Reality-Virtuality Continuum (RV).

2.1 MRIDEs: View & Control Enhancing Interface Elements

The first two dimensions of the Interaction Cube are defined by the Plane of Interaction, which captures both (1) the opportunities to view into the robot’s internal model, and (2) the degree of control the human has over the internal model. These two levels of interactivity (termed the expressivity of view (EV) and flexibility of controller (FC) respectively) are the conceptual pillars for characterizing interactivity within the Interaction Cube, and any components that contribute to or impact either EV or FC are called interaction design elements. This is similar to the Model-View-Controller design pattern. However, in this case the 2D placement on the Interaction Plane depends on a vector whose direction results from the impact a design element has on EV and the impact a design element has on FC. The magnitude of the vector is scaled by the complexity of the robot’s internal model. According to Williams et al. [23], “while it is likely infeasible to explicitly determine the position of a technology on this plane, it is nevertheless instructive to consider the formal relationship between interaction design elements and the position of a technology on this plane.”

The Interaction Cube categorizes the study of VAM virtual objects as MRIDEs (Mixed-Reality Interaction Design Elements), which can fall into one of three categories:

  • User-Anchored Interface Elements: Objects attached to the user’s view. These are similar to traditional GUI elements that are anchored to the user’s camera coordinate frame and do not change along with the user’s field of view.

  • Environment-Anchored Interface Elements: Objects anchored to the environment or robot. For example, while the robot performs 3D SLAM, it visualizes the estimated map to a user wearing an MR headset (like in Figure

  • Virtual Artifacts: Objects that can be manipulated by humans or robots or may move “under their own ostensible volition”. For example, a robot that plans to move in a direction may visualize an arrow to indicate that direction to a user wearing an MR headset (like in Figure 2(B)).

2.2 The Reality-Virtuality Continuum & VAM-HRI

The third axis of the Reality-Virtuality Interaction Cube illustrates where an MRIDE falls on the Reality-Virtuality Continuum [13]. This continuum classifies environments and interfaces with respect to how much virtual and/or real content they contain. On one end of the spectrum lies reality, which is any interface that does not use any virtual content and makes use of only real objects and imagery. The opposite end of the spectrum is virtual reality, which would be an interface that consists of pure virtual content without any integration of the real world (for example, a simulated world presented in VR). Between these two extremes is mixed reality, which captures all interfaces that incorporate a portion of both reality and virtuality in their design. There are two sub-sections of mixed reality: (1) augmented reality, where virtual objects are integrated into the real world; and (2) augmented virtuality, where real objects are inserted within virtual environments.
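The four regions of the continuum described above can be illustrated with a small decision rule. The following is only a minimal sketch: the names `Milgram`, `classify_interface`, and `is_mixed_reality`, and the reduction of the continuum to two yes/no questions, are assumptions made for illustration and are not part of the Interaction Cube or of TOKCS.

```python
from enum import Enum


class Milgram(Enum):
    """Coarse regions of the Reality-Virtuality Continuum (illustrative names)."""
    REALITY = "reality"
    AR = "augmented reality"
    AV = "augmented virtuality"
    VR = "virtual reality"


def classify_interface(base_is_real: bool, includes_other_kind: bool) -> Milgram:
    """Place an interface on the continuum.

    base_is_real: True if the surrounding environment is the real world,
        False if it is a virtual environment.
    includes_other_kind: True if content of the opposite kind is integrated
        (virtual objects in a real scene, or real objects in a virtual scene).
    """
    if base_is_real:
        return Milgram.AR if includes_other_kind else Milgram.REALITY
    return Milgram.AV if includes_other_kind else Milgram.VR


def is_mixed_reality(category: Milgram) -> bool:
    """Mixed reality covers everything between the two extremes."""
    return category in (Milgram.AR, Milgram.AV)


if __name__ == "__main__":
    # A virtual battery indicator hovering over a real robot.
    print(classify_interface(base_is_real=True, includes_other_kind=True))
    # Real 3D sensor data inserted into a virtual control room.
    print(classify_interface(base_is_real=False, includes_other_kind=True))
```

A real system may blend both kinds of content in ways that two booleans cannot capture, so this rule is a reading aid for the definitions above and not a classification procedure.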

Augmented reality interfaces in VAM-HRI often communicate the state and/or intentions of a real robot. For example, the battery levels of a robot can be displayed with a virtual object that hovers over a real robot, or a robot’s planned trajectory can be drawn on the floor with a virtual line to indicate the robot’s future movement intentions.

Virtual reality interfaces are often used to provide simulated environments where human users can interact with virtual robots. In these virtual settings user interactions with robots can be monitored and evaluated without risk of physical harm for either robot or human. Additionally, the virtual robot models can be easily and quickly altered to allow for rapid prototyping of both robot and interface design. Without the need for physical hardware, robots can be added to any virtual scene without the typical costs associated with real robots.

Virtual environments can also be used to teleoperate and/or supervise real robots in the physical world. In cases like these, 3D data collected by the real robot about its surrounding environment is integrated within virtual settings to create augmented virtuality interfaces. Cyber-physical interfaces and virtual control rooms are two common VAM-HRI augmented virtuality methods of enhancing remote robot operators’ abilities by increasing situational awareness of their robot’s state and location while mitigating the limitations of virtual interfaces such as cyber sickness [11].

3 Additional Key Components of VAM-HRI Systems

The key insight of this work is the addition of key characteristics of VAM-HRI not covered by the Interaction Cube to create TOKCS. These include VAM-HRI system hardware, research that seeks to increase the robot’s model of the world around it, and additional granularity to mixed-reality interaction design elements (MRIDEs). The characteristics are part of TOKCS which is then applied to the VAM-HRI workshop’s papers in Sec. 4. The application informs the insights and future work recommendations outlined in Sec. 5.

3.1 Hardware

While hardware used for virtual, augmented, and mixed reality can vary widely, there are certain types of hardware that are commonly used in VAM-HRI. Here we outline the most common, which enable experiences along the Reality-Virtuality Continuum: head-mounted displays (HMDs), projectors, displays, and peripherals. Because hardware technology is making significant advances every year, labeling the specific technology (e.g., HoloLens 2) is important when classifying hardware within TOKCS. These hardware technologies then fall under these categories.

HMDs. Virtual, mixed, and augmented reality all commonly use head-mounted displays. The Oculus Quest and HTC Vive both allow for a full virtual reality experience, visually immersing the user in a completely virtual environment. The HTC Vive also allows for augmented virtuality, such as in Wadgaonkar et al. [20], where the user is in a virtual setting but the virtual robot being manipulated is also moving in the real world. The Microsoft HoloLens and the Magic Leap are strictly augmented reality headsets, where virtual images are rendered on top of the real world view of the user.

Projectors. Onboard projectors can provide a way for the robot itself to display virtual objects or information. Alternately, static projectors allow an area to contain augmented reality elements. Images might be projected onto an object, on the floor, or onto a robot.

Displays. This category of hardware ranges from handheld smartphones or tablets to room-size displays. Two-dimensional and three-dimensional monitors fall within this range. Some of these exist in a single location, while mobile displays can be carried by a person or moved by a robot. A cave automated virtual environment (or CAVE) immerses the user in virtual reality using 3 to 6 walls to partially or fully enclose the space. An augmented reality display might include a realtime camera with overlaid virtual graphics, while a virtual reality display contains completely virtual graphics. Displays can be an especially effective way to conduct user studies without investing in expensive hardware, for example by showing recorded videos to participants on Amazon Mechanical Turk [16].

Peripherals. Peripheral devices allow for a richer interaction within virtual, augmented, or mixed reality. Leap Motion hand tracking can be combined with a headset such as the HTC Vive (as in [12]) to provide recording and playback of motions and commands. Oculus Quest controllers are handheld and can be used individually or in tandem, giving the user a modality for both gesturing and selecting with the use of buttons on the device. Peripherals might frequently be used to enhance the Flexibility of Control (FC) of a MRIDE.

3.2 Robot Internal Complexity of Model

The Interaction Cube emphasizes the impact that projected visual objects have on the expressivity of view and flexibility of controller of the robot’s underlying model. This fails to explore, however, the sensing capabilities and data afforded by VAM technologies (e.g., ARHMD). The framework can be expanded by including the technologies’ ability to aid the robot’s internal model of the world, namely by increasing the robot’s internal complexity of model (CM). The robot’s internal CM benefits from data that are typically difficult to gather (e.g., eye-gaze) as well as from assumptions about the data that the technology affords (e.g., a headset with various sensors being anchored to the user’s head). These data aid a robot’s model of the environment and/or its model of the user.

Environment - Data from the VAM technology further increases the robot’s understanding of an environment. An example is provided in Fig. 2. Given a mobile robot with 2D SLAM, a 3D map from an ARHMD’s SLAM can be transformed into the robot’s coordinate frame. The map can then be used for more accurate navigation. In another situation, a mobile phone camera can help with object recognition both in front of and behind the robot.

Figure 2: Demonstrates a navigation situation where the robot 2D SLAM map (B) benefits from the 3D SLAM map from the ARHMD (A). The robot only maps the two front table legs (bottom left) as it is only equipped with a 2D lidar. The robot, however, is too tall to move past the table so it will collide if it does not use the 3D map from the ARHMD. A combined SLAM map would be created from feature matching such as the table legs (circles).
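To make the map-sharing idea concrete, the sketch below transforms a few 3D points from a headset map into the robot's frame and keeps those low enough to hit the robot's body. It assumes the planar alignment between the two maps (`yaw`, `tx`, `ty`) has already been estimated, for example by matching features such as the table legs. The function names, the parameters, and the sample points are hypothetical and do not come from any particular SLAM library.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def to_robot_frame(points: List[Point3], yaw: float, tx: float, ty: float) -> List[Point3]:
    """Rotate points about the vertical axis by `yaw`, then translate by (tx, ty).

    Heights (z) are left unchanged, which assumes both maps share the same floor plane.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty, z) for x, y, z in points]


def blocking_points(points: List[Point3], robot_height: float,
                    floor_eps: float = 0.05) -> List[Point2]:
    """Return (x, y) of points above the floor but at or below the robot's height."""
    return [(x, y) for x, y, z in points if floor_eps < z <= robot_height]


if __name__ == "__main__":
    # Hypothetical headset-map points: two on a tabletop (0.75 m) and one on the ceiling.
    headset_points = [(1.0, 0.0, 0.75), (1.0, 0.5, 0.75), (2.0, 0.0, 2.4)]
    in_robot = to_robot_frame(headset_points, yaw=math.pi / 2, tx=0.5, ty=0.0)
    obstacles = blocking_points(in_robot, robot_height=1.0)
    print(obstacles)  # tabletop points a 2D lidar mounted below the tabletop would miss
```

A real pipeline would estimate a full 3D transform and fuse the result into the robot's occupancy map; the planar version here only illustrates why the tabletop matters to a robot taller than the table.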

User - Data from VAM technology further increases the robot’s understanding of the user. For example, a robot can better infer a user’s intent to choose an object by using ARHMD eye-gaze [18]. Data gathered from motion sensors can also be used to infer affective state such as student curiosity [5].

3.3 User Perceived Anchor Locations and Manipulability

The MRIDE categories of the Interaction Cube (Sec. 2.1) fail to be mutually exclusive and lack needed granularity. For example, a virtual artifact can be user-anchored, such as a movable user-anchored element, or environment-anchored, such as an object that moves on its own. Granularity can also be added to benefit MRIDE classifications, such as distinguishing between robot-anchored and environment-anchored objects.

To this end, two important distinctions can be added to expand the current framework. First, we apply two characteristics: Anchor Location {User, Robot, Environment} and Perceived Manipulability {User, Robot, None}. Second, we distinguish MRIDEs based on the intended user perception of the virtual object (i.e., where does the user perceive the anchor to be and who can/does move a virtual object).

The latter distinction is important as any object can be translated into the environment’s coordinate frame. The granularity of Anchor Location combined with intended user perception allows for labeling virtual objects intended to be perceived as part of the robot, such as virtual robot appendages [19, 6]. These virtual objects are specifically designed to be perceived as part of the robot, even though the arms of a robot (or any virtual object) can be translated to environment coordinates.

This distinction also allows for multiple labels within each characteristic, such as objects that are manipulable by both the robot and the user. Visuals for path planning (e.g., [9]) further highlight the benefits of these granular distinctions. A planned robot pose visualized within the environment could be argued as both robot- and environment-anchored since the same trajectory can be defined within the robot’s local frame of reference or within a global frame of reference.

A key property of virtual object manipulation is the user’s action attribution of the manipulation (i.e., does the user perceive that they moved the object, the robot moved the object, or the object moved on its own). Perceived Manipulability is this action attribution: the perception the user has of the manipulation. For an object that the user manipulates (e.g., grabs), the Perceived Manipulability is the user. Virtual objects “manipulated” by the robotic system, however, are not necessarily directly manipulated by the robot nor perceived as such. In such a case, the virtual object may be scripted to move on its own to give the illusion of robot manipulation, yet the illusion may fail. When researching social robotics, this may have significant consequences on a user’s perception of the robot (e.g., the robot’s social presence). Therefore, to alleviate this complication and as stated above, TOKCS is applied from the intended user perception of the designed system (i.e., if the system attempts an illusion of robot manipulation of a virtual object, it is classified under Perceived Manipulability: Robot).

Lastly, these labels exist only for virtual objects and are not tied to classifying VAM-HRI research under model, view, and control described in Sec. 3.2. VAM-HRI studies a variety of modalities provided by VAM technologies. HMD data used for improving a robot’s SLAM, for example, still firmly sits under increasing the robot’s internal complexity of model but is not applicable under Anchor Location or Perceived Manipulability.

4 Paper Classifications of the VAM-HRI Workshop

TOKCS consists of characterizing VAM-HRI systems with: Anchor Location {User, Env, Robot}, Perceived Manipulability {User, Robot, None}, Increases Expressivity of View (EV) {0,1}, Increases Flexibility of Controller (FC) {0,1}, Increases Complexity of Model (CM) {0,1}, Milgram Continuum {AR, AV, VR}, Software Description, and Hardware Description.
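One way to record these characteristics consistently is as a small typed record. The sketch below is an assumption about how such a record could look in code, not an artifact released with TOKCS: the class and field names are hypothetical, and the example entry describes an invented system instead of any workshop paper. Anchor Location and Perceived Manipulability are stored as sets because Sec. 3.3 allows multiple labels within each characteristic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Anchor(Enum):
    USER = "User"
    ENV = "Env"
    ROBOT = "Robot"


class Manipulator(Enum):
    USER = "User"
    ROBOT = "Robot"
    NONE = "None"


class Continuum(Enum):
    AR = "AR"
    AV = "AV"
    VR = "VR"


@dataclass(frozen=True)
class TokcsEntry:
    """One classified system; field names are illustrative."""
    paper: str
    anchor_location: FrozenSet[Anchor]
    perceived_manipulability: FrozenSet[Manipulator]
    increases_ev: bool  # Expressivity of View, {0,1}
    increases_fc: bool  # Flexibility of Controller, {0,1}
    increases_cm: bool  # Complexity of Model, {0,1}
    continuum: Continuum
    software: str
    hardware: str

    def dimensions_increased(self) -> int:
        """Count how many of EV, FC, and CM the system increases."""
        return sum([self.increases_ev, self.increases_fc, self.increases_cm])


if __name__ == "__main__":
    # Invented example, not taken from Table 1.
    example = TokcsEntry(
        paper="Hypothetical AR waypoint interface",
        anchor_location=frozenset({Anchor.ENV, Anchor.ROBOT}),
        perceived_manipulability=frozenset({Manipulator.USER}),
        increases_ev=True,
        increases_fc=True,
        increases_cm=False,
        continuum=Continuum.AR,
        software="Unity",
        hardware="HoloLens 2",
    )
    print(example.dimensions_increased())
```

Keeping the three binary characteristics as booleans mirrors the {0,1} discretization, and leaving software and hardware as free text follows the recommendation to name the specific technology.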

We apply TOKCS to papers from the International Workshop on VAM-HRI to understand the ways in which researchers have been developing new technologies that leverage virtual, augmented, and mixed reality. The ten papers and their categorization within TOKCS are summarized in Table 1.

Within these ten papers a variety of contributions were observed. In most cases, a given system focused its improvements on a specific dimension of TOKCS; five of the ten papers developed improvements within a single dimension. The two that contributed expansions along all three axes leveraged AR/VR in a domain that had previously not utilized AR/VR. Higgins et al. [7] developed a method for training grounded-language models in VR, instead of with real world robots. Ikeda and Szafir [8] leverage AR headsets for robotic debugging, where previous methods had used 2D screens. Four papers of the ten increased expressivity of view (EV), four increased the flexibility of the controller (FC), and three improved upon the robot internal complexity of model (CM). Of these papers, half can be described as virtual reality, two are augmented virtuality, two are augmented reality, and one is mixed reality. The majority of methods are anchored at the environment level. Two methods are anchored at the robot and two at the user. Where perceived manipulability applies, it is typically at the user level.

We also observe a broad range of utilized hardware and software. Unity was overwhelmingly popular among papers as the 3D game engine of choice; nine of the ten papers explicitly mention Unity3D. The most popular HMD mentioned was the Hololens, which was used in three of the papers. Oculus Quest, HTC Vive, and MTurk are each used in two of the ten papers.

Paper Anchor Location Perceived Manipulability Expressivity of View Flexibility of Controller Complexity of Model Milgram Continuum [13] Software Hardware
Boateng and Zhang [2] Robot, Env AR Unity Hololens video recordings via MTurk
Ikeda and Szafir [8] Env User AR Unity Hololens
LeMasurier et al. [9] Env, Robot User AV Unity, ROSNET, ROS HTC Vive
Puljiz et al. [17] AV Unity Hololens
Wadgaonkar et al. [20] Env, Robot AV Unity HTC VIVE
Barentine et al. [1] Env VR Unity, TagUp Oculus Quest VR headset & controllers
Higgins et al. [7] User User VR Unity, ROS, ROS, Gazebo StreamVR headset
Mara et al. [12] Env Robot, User VR Unity HTC VIVE Pro Eye & Leap Motion
Mimnaugh et al. [15] VR Unity Oculus Rift S
Mott et al. [16] Env, User VR Unity MTurk Web Video of VR
Table 1: Summary of TOKCS. Up arrow symbols (↑) indicate that the work increases the functionality within this aspect of TOKCS. Blank entries indicate that the contributions of the paper for this aspect are on par with prior work.

4.1 Evaluations: Subjective and Objective Metrics

In addition to TOKCS, we further evaluated the measures and metrics applied to VAM-HRI research. An important component of VAM-HRI research programs is to evaluate and benchmark new approaches by using both objective and subjective metrics. An objective metric is any metric that can be directly determined through sensors or measurements and does not involve a human’s subjective experience. Examples of objective metrics include task completion time, the number of successful and failed trials, and accuracy and precision of visualization alignment. A subjective metric is any metric that depends on the perceived experience of the users involved. Examples of subjective metrics include mental workload, levels of immersiveness, and perceived system usability. Both subjective and objective metrics are important and complementary benchmarks for determining how effective new VAM-HRI contributions are compared to existing approaches. There are a wide variety of metrics that can be used for both subjective and objective measurements, and understanding which metrics VAM-HRI researchers are using helps highlight what aspects of interaction these technologies are improving on.

The most popular method of evaluating effectiveness of a given design was conducting surveys of study participants. Additional evaluation metrics focused on quantitative performance metrics on an evaluation task and subjective experience (see Table


Paper | Objective Metrics | Subjective Metrics
Boateng and Zhang [2] | | NASA TLX, identification of robot position, orientation, and movement
Ikeda and Szafir [8] | | System Usability Scale, Think out loud process
Wadgaonkar et al. [20] | | Post-Experiment Interviews, Custom Survey Questions
Higgins et al. [7] | Task Accuracy, Amount of training data | User ratings of robot, Custom Survey Questions
Mara et al. [12] | Task Completion Time, Task Completion Rate | Custom Survey Questions
Mimnaugh et al. [15] | | Simulator Sickness Questionnaire, Custom Survey Questions
Mott et al. [16] | | Post-Assessment of Situation Awareness, Analog Situation Awareness Global Assessment Technique Questions
Table 2: Description of objective and subjective metrics in 4th VAM-HRI Workshop papers. Blank spaces indicate a lack of metric of that type for that paper. Papers omitted from the table did not report metrics.
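The contents of Table 2 can also be tallied programmatically. The sketch below encodes the table as a dictionary and counts how many papers report objective metrics and how often each subjective instrument appears. The assignment of each entry to the objective or subjective column is an assumption: it follows the statement in Sec. 5.1 that only two submissions gathered objective data. The variable names are illustrative.

```python
from collections import Counter

# Table 2, with the column split assumed as described in the lead-in.
table2 = {
    "Boateng and Zhang [2]": {
        "objective": [],
        "subjective": ["NASA TLX",
                       "identification of robot position, orientation, and movement"],
    },
    "Ikeda and Szafir [8]": {
        "objective": [],
        "subjective": ["System Usability Scale", "Think out loud process"],
    },
    "Wadgaonkar et al. [20]": {
        "objective": [],
        "subjective": ["Post-Experiment Interviews", "Custom Survey Questions"],
    },
    "Higgins et al. [7]": {
        "objective": ["Task Accuracy", "Amount of training data"],
        "subjective": ["User ratings of robot", "Custom Survey Questions"],
    },
    "Mara et al. [12]": {
        "objective": ["Task Completion Time", "Task Completion Rate"],
        "subjective": ["Custom Survey Questions"],
    },
    "Mimnaugh et al. [15]": {
        "objective": [],
        "subjective": ["Simulator Sickness Questionnaire", "Custom Survey Questions"],
    },
    "Mott et al. [16]": {
        "objective": [],
        "subjective": ["Post-Assessment of Situation Awareness",
                       "Analog Situation Awareness Global Assessment Technique Questions"],
    },
}

# Papers that report at least one objective metric.
with_objective = [paper for paper, m in table2.items() if m["objective"]]

# How often each subjective instrument is used across papers.
subjective_counts = Counter(
    metric for m in table2.values() for metric in m["subjective"]
)

if __name__ == "__main__":
    print(with_objective)
    print(subjective_counts.most_common(1))
```

Under this reading, custom survey questions are the most frequently reported instrument, which agrees with the observation that surveys were the most popular evaluation method.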

5 Current Trends & the Future of VAM-HRI

In this paper, the VAM-HRI Workshop is used as a case study for MRIDE classification and categorization within the Reality-Virtuality Interaction Cube; however, the papers submitted to this workshop can also be used to exemplify and project current and future trends in the field of VAM-HRI. This growing sub-field of HRI is showing promise in enhancing all areas of HRI from robot control (e.g., teleoperation and supervision interfaces) to collaborative robotics and improving teamwork with autonomous systems. The following covers some of the key insights gathered from this year’s workshop that show how VAM-HRI is evolving and improving the field of HRI as a whole.

Figure 3: VAM-HRI user studies allow for the precise recording and playback of user interactions with experimental stimuli (e.g., human postures and posture shifts, task-related human movements, gestures, and gaze behaviors, etc.) as demonstrated in the above images from Mara et al.’s CoBot Studio [12].

5.1 Experimental Evaluation of VAM-HRI Systems

Research in HRI heavily features user studies in the evaluation of robotic systems and their interfaces. It has been an ongoing challenge to adequately record and play back human interactions with robots, to answer questions such as: ‘Where was the user looking at X time?,’ ‘How close was the human positioned relative to the robot at Y moment?,’ ‘What were the user’s joint values when using a new interface and how are the physical ergonomics evaluated?’ As a possible solution to many of these challenges, VAM-HRI allows for unprecedented recording, playback, and analysis of user interactions with virtual or real robots and objects in an experimental setting due to the inherent ability of HMDs (and other devices like a Leap Motion) to record body/hand/head position/orientation and gaze direction from a seemingly limitless number of virtual cameras recording from different angles [22]. This is exemplified at a highly polished level in CoBot Studio [12] (see Figure 3).

However, it is interesting to note that although precise objective measures can be gathered relatively easily from VAM-HRI experiments, only 2 of the 10 submissions to the VAM-HRI Workshop gathered any objective data (see Table 2). The lack of objective measures may be due to a handful of factors, such as the work being in a preliminary stage best suited for a workshop or the research questions being more focused on social responses and subjective opinions from users. Regardless of the reason, we encourage authors of future VAM-HRI submissions to any venue to take full advantage of the objective measurements that VAM-HRI systems inherently provide, as objective observations are still useful for evaluating a multitude of social interactions (e.g., user pose for evaluating body language, user-robot proxemics, user gaze).

Although virtual reality interfaces have the aforementioned strengths for enhancing experimental evaluation, they have their own set of unique evaluation challenges as well, one of which is the use of online studies with crowdworkers (e.g., on Amazon Mechanical Turk). HRI in general has made prolific use of online user studies (especially during the COVID-19 pandemic) that take advantage of cheap and readily available participants. However, VAM-HRI heavily draws upon 3D visualizations (as often seen with HMD-based interfaces), which cannot be properly displayed to crowdworkers who lack HMDs and/or 3D monitors. Additionally, a strength of AR interfaces is that 3D data and visualizations can be rendered contextually in the user’s environment and can be observed from any angle desired by the user. However, VAM-HRI studies featuring AR interfaces that involve crowdworkers, such as those performed by Mott et al. [16], are restricted to images and videos that restrict the user’s viewpoint and do not allow for a traditional and full AR experience. It remains an open question whether crowdsourced VAM-HRI studies provide results comparable to those of VAM-HRI studies run in person. The initial evaluation of object and interaction design, however, can be a vital step towards creating the full, in-person interaction.

5.2 VAM-HRI as an Interdisciplinary Study

HRI is well known to be an interdisciplinary field and VAM-HRI is proving to be no exception. The CoBot Studio project brings together roboticists, psychologists, AI experts, multi-modal communication researchers, VR developers, and professionals in interaction design and game design [12]. As the VAM-HRI field grows, it will likely become increasingly common (and needed) to see teams with varied experiences and skill sets contributing to collaborative research.

A field poised to inform VAM-HRI is that of multi-robot systems. Research in this area is an under-explored inspiration for VAM-HRI in regard to enhancing the complexity of model (CM) for robots utilizing VAM systems. The frameworks and techniques of the adjacent field may be able to be modified or even directly applied when treating the human user as an autonomous mobile sensor platform, akin to the human being treated as though they are another robot in the system. For example, spatial and semantic scene understanding are important perceptual capabilities for active robots (to navigate their environment) and passive VAM technologies (to localize the user’s field of view).

Additionally, experimentation techniques seen in the field of general Virtual Reality may aid in administering questionnaires and gathering participant feedback. Typical questionnaires administered by VAM-HRI researchers can be quite jarring for participants who experience extreme context shifts between virtual worlds (where the study took place) and the real world (where the feedback is gathered). This poses a potential confounding factor for participants who can no longer visually reference what they are evaluating and may romanticize or incorrectly remember experimental stimuli they can no longer see. The field of Virtual Reality has similar challenges and some studies have started to provide in situ evaluations where questionnaires are posed to users within the virtual environments [10]. We are beginning to see this trend of in situ surveys in VAM-HRI as well. In the CoBot Studio project, surveys are administered within the experiment’s virtual setting, removing the confounding factors of: (1) reality-virtuality context shifts (having to leave the immersive virtual environment by taking off an HMD to take a mid-task survey); and (2) retrospective surveys provided well after exposure to experimental stimulus [12].

These cross-disciplinary trends are not uni-directional, however, as VAM-HRI is currently poised to inform and improve the field of VR in return. Enhancing immersion has been a primary goal of the field of VR since its inception many decades ago. With the rise of mass-produced consumer grade HMDs, visual immersion has reached new heights for users around the world. However, the challenge of providing physical immersion through the use of haptics has largely remained an open question: how can a user reach out and touch a dynamic character in a virtual world? Research in VAM-HRI has proposed a potential solution for dynamic haptics, where robots mimic the pose and movements of virtual dynamic objects. Work by Wadgaonkar et al. [20] exemplifies the notion of VAM-HRI supporting the field of VR with robots acting as dynamic haptic devices and allowing users to touch characters in virtual worlds and further enhance immersion in VR settings.

5.3 Advancements in VAM-HRI

A strength of VAM-HRI is the ability to alter a robot’s morphology with virtual imagery. This technique can take the form of body extensions where virtual appendages are added to a real robot, such as limbs [6], or form transformations where the robot’s entire morphology is altered, such as transforming a drone into a floating eye [21]. Recent VAM-HRI developments have further expanded upon this idea of changing a real robot’s appearance through the aforementioned morphological alterations to include superficial alterations as well, where virtual imagery can be used to change a robot’s cosmetic traits. Prior work has demonstrated that robot cosmetic alterations can communicate robot internal states (e.g., robotic system faults) [4]; however, to our knowledge, this is the first time such superficial alterations have been used to manipulate social interactions between human and robot [20].

Although the interactions studied in HRI are typically focused on that of the end-user, a lesser studied category of interaction exists, which is that between robots and their developers and designers. Debugging robots often proves to be a challenging and tedious task with robot faults and unexpected behavior being hard to understand or explain without parsing through command lines and error logs. To address this issue, prior work in VAM-HRI has used AR interfaces to enhance debugging capabilities [3, 14]. Work by Ikeda and Szafir [8] in VAM-HRI ’21 has built upon these concepts by providing in situ AR visualizations of robot state and intentions, allowing users to better compare robots’ plans with their actions when debugging autonomous robots. As AR hardware becomes increasingly intertwined with robotic systems, debugging tools such as these will likely become more commonplace to increase the efficiency and enjoyment of robot design.

Finally, VAM-HRI interfaces have been a popular topic of study within HRI for many years, and many standard methods of interacting with robots through MR or VR have emerged (e.g., AR waypoints for navigation or AR lines for displaying robot trajectory [21]). However, novel methods of interacting with robots are still being designed today. One example is persistent virtual shadows, which address the problem of knowing a robot’s location when it is out of the user’s line of sight. Prior solutions have used 2D top-down radars to show robot locations [21], but such interfaces require the user to repeatedly shift context between the physical surroundings and the radar. Persistent virtual shadows circumvent this limitation by embedding robot location data into the user’s environment, providing a natural method of displaying a robot’s location: shadows are a location cue that humans have learned to interpret almost subconsciously throughout the course of their lives. Creative advances such as these will continue to emerge in this relatively nascent sub-field of HRI, presenting an exciting new future for both VAM-HRI and the field of HRI as a whole.


This work was supported by the National Science Foundation (NSF) under awards IIS-1764092 and IIS-1925083.


  • [1] C. M. Barentine, A. McNay, R. Pfaffenbichler, A. Smith, E. Rosen, and E. Phillips (2021) Manipulation assist for teleoperation in VR. In Proc. of the 3rd International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI).
  • [2] A. Boateng and Y. Zhang (2021-03) Virtual shadow rendering for maintaining situation awareness in proximal Human-Robot teaming. In Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’21 Companion, New York, NY, USA, pp. 494–498.
  • [3] T. H. J. Collett and B. A. Macdonald (2010) An augmented reality debugging system for mobile robot software engineers.
  • [4] F. De Pace, F. Manuri, A. Sanna, and D. Zappia (2018) An augmented interface to display industrial robot faults. In International Conference on Augmented Reality, Virtual Reality and Computer Graphics, pp. 403–421.
  • [5] T. Groechel, R. Pakkar, R. Dasgupta, C. Kuo, H. Lee, J. Cordero, K. Mahajan, and M. J. Matarić (2021) Kinesthetic curiosity: towards personalized embodied learning with a robot tutor teaching programming in mixed reality. In Experimental Robotics: The 17th International Symposium, Vol. 19, pp. 245.
  • [6] T. Groechel, Z. Shi, R. Pakkar, and M. J. Matarić (2019) Using socially expressive mixed reality arms for enhancing low-expressivity robots. In 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1–8.
  • [7] P. Higgins, G. Y. Kebe, A. Berlier, K. Darvish, D. Engel, F. Ferraro, and C. Matuszek (2021) Towards making virtual Human-Robot interaction a reality. In Proc. of the 3rd International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI).
  • [8] B. Ikeda and D. Szafir (2021) An AR debugging tool for robotics programmers.
  • [9] G. LeMasurier, J. Allspaw, and H. A. Yanco (2021) Semi-autonomous planning and visualization in virtual reality.
  • [10] L. Lin, A. Normoyle, A. Adkins, Y. Sun, A. Robb, Y. Ye, M. Di Luca, and S. Jörg (2019) The effect of hand size and interaction modality on the virtual hand illusion. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 510–518.
  • [11] J. I. Lipton, A. J. Fay, and D. Rus (2017) Baxter’s homunculus: virtual reality spaces for teleoperation in manufacturing. IEEE Robotics and Automation Letters 3 (1), pp. 179–186.
  • [12] M. Mara, K. Meyer, M. Heiml, H. Pichler, R. Haring, B. Krenn, S. Gross, B. Reiterer, and T. Layer-Wagner (2021) CoBot Studio VR: a virtual reality game environment for transdisciplinary research on interpretability and trust in human-robot collaboration.
  • [13] P. Milgram, H. Takemura, A. Utsumi, and F. Kishino (1995) Augmented reality: a class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies, Vol. 2351, pp. 282–292.
  • [14] A. G. Millard, R. Redpath, A. M. Jewers, C. Arndt, R. Joyce, J. A. Hilder, L. J. McDaid, and D. M. Halliday (2018) ARDebug: an augmented reality tool for analysing and debugging swarm robotic systems. Frontiers in Robotics and AI 5, pp. 87.
  • [15] K. J. Mimnaugh, M. Suomalainen, I. Becerra, E. Lozano, R. Murrieta, and S. LaValle (2021) Defining preferred and natural robot motions in immersive telepresence from a First-Person perspective. In Proc. of the 3rd International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI).
  • [16] T. Mott, T. Williams, H. Zhang, and C. Reardon (2021) You have time to explore over here!: augmented reality for enhanced situation awareness in human-robot collaborative exploration.
  • [17] D. Puljiz, B. Zhou, K. Ma, and B. Hein (2021) HAIR: head-mounted AR intention recognition. In Proc. of the 3rd International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI).
  • [18] E. Rosen, D. Whitney, M. Fishman, D. Ullman, and S. Tellex (2020) Mixed reality as a bidirectional communication interface for human-robot interaction. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11431–11438.
  • [19] N. Tran, K. Mizuno, T. Grant, T. Phung, L. Hirshfield, and T. Williams (2020) Exploring mixed reality robot communication under different types of mental workload. In International Workshop on Virtual, Augmented, and Mixed Reality for Human-Robot Interaction, Vol. 3.
  • [20] C. P. Wadgaonkar, J. Freischuetz, A. Agrawal, and H. Knight (2021) Exploring behavioral anthropomorphism with robots in virtual reality.
  • [21] M. Walker, H. Hedayati, J. Lee, and D. Szafir (2018) Communicating robot motion intent with augmented reality. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, pp. 316–324.
  • [22] T. Williams, L. Hirshfield, N. Tran, T. Grant, and N. Woodward (2020) Using augmented reality to better study human-robot interaction. In International Conference on Human-Computer Interaction, pp. 643–654.
  • [23] T. Williams, D. Szafir, and T. Chakraborti (2019) The reality-virtuality interaction cube. In Proceedings of the 2nd International Workshop on Virtual, Augmented, and Mixed Reality for HRI.