Virtual, Augmented, and Mixed Reality for Human-Robot Interaction: A Survey and Virtual Design Element Taxonomy

Virtual, Augmented, and Mixed Reality for Human-Robot Interaction (VAM-HRI) has been gaining considerable attention in research in recent years. However, the HRI community lacks a set of shared terminology and framework for characterizing aspects of mixed reality interfaces, presenting serious problems for future research. Therefore, it is important to have a common set of terms and concepts that can be used to precisely describe and organize the diverse array of work being done within the field. In this paper, we present a novel taxonomic framework for different types of VAM-HRI interfaces, composed of four main categories of virtual design elements (VDEs). We present and justify our taxonomy and explain how its elements have been developed over the last 30 years as well as the current directions VAM-HRI is headed in the coming decade.





1. Virtual, Augmented, and Mixed Reality for Human-Robot Interaction

Although robots are poised to increasingly support human society across a multitude of critical industries (e.g., healthcare, manufacturing, space exploration, agriculture), robot usage has remained limited due to the difficulties of robot control, supervision, and collaboration. A major source of this limitation arises from the bi-directional challenge of human-robot communication. Robots are often found to be incomprehensible, and humans struggle to predict robot capabilities or intentions. Simultaneously, robots lack the ability to reason about complex human behaviors: a skill inherently required for effective collaboration with humans.

At the heart of this problem lies the issue of poor information exchange between humans and robots, where neither can understand what the other is explicitly or implicitly conveying. This is analogous to the Gulf of Execution and Gulf of Evaluation concepts within the Human Action Cycle, a model from the cognitive engineering and human-computer interaction communities describing human interactions with complex systems (Norman and Draper, 1986). Humans regularly have trouble conveying their high-level goals as inputs a robot can understand (the gulf of execution), while robots often provide ineffective or no feedback to allow humans to assess the robotic system’s state (the gulf of evaluation).

An example gulf of evaluation found in human-robot interaction is the motion inference problem, where robot users find the task of predicting when, where, and how a robot teammate will move difficult due to a lack of information communicated from the robot. Information regarding a robot’s planned movement is often invisible to human users, and even in circumstances where a robot is made to communicate its movement intent, the robot may lack the ability to share its motion plans as a human teammate would. A large amount of work in human-robot interaction (HRI) has looked at addressing this gulf of evaluation, such as by having robots use human-inspired social cues (e.g., gaze, gestures, etc.) to communicate their intentions (Sanghvi et al., 2011), altering robot trajectories to be more legible or expressive (Dragan et al., 2013; Szafir et al., 2014), or using various other means such as auditory indicators (Tellex et al., 2014). Although such techniques have shown effectiveness in reducing this gulf of evaluation by making robot motion more predictable, common constraints (e.g., computational, platform, environmental) may limit their feasibility when deployed in the real world. For example, an aerial robot’s morphology prevents it from performing hand gestures or gaze, a dynamic or cluttered environment may restrict robots from deviating from optimal trajectories to be more legible, and robot auditory indicators may be rendered ineffective if deployed on a noisy robot (e.g., aerial robots) or in a noisy environment.

To mitigate these issues, HRI researchers are exploring new methods of human-robot communication that leverage more than the verbal and non-verbal cues seen in traditional human communication. New forms of visual communication have shown great promise in enhancing human-robot interaction, ranging from enhanced graphical displays that improve robot control and reduce gulfs of execution, to LED lights that communicate various robot signals and reduce gulfs of evaluation (Szafir et al., 2015). Recently, the rise of consumer-grade, standardized virtual, augmented, and mixed reality (VAM) technologies (including the iPad, Microsoft HoloLens, Meta 2, Magic Leap, Oculus Rift/Quest, HTC Vive, etc.) has created a promising new medium for information exchange between users and robots that is well suited to enhance human-robot interactions in a variety of ways. VAM interfaces allow users to see 3D virtual imagery in a virtual space or contextually embedded within their environment. Up to this point, robot users have been forced to use traditional 2D screens to analyze the rich 3D data a robot often collects about its environment. VAM technology can also be used hands-free when in the form of a head-mounted display (HMD), which allows for more fluid and natural interactions with robots in a shared physical environment. Users can also be immersed in purely virtual worlds and interact with virtual robots, which allows HRI researchers to evaluate interactions that would otherwise be impossible to observe due to safety concerns or lack of access to an expensive physical robot(s). Finally, VAM interfaces allow HRI researchers to record and analyze human-robot interactions unlike ever before by leveraging the body, head, and gaze tracking inherent to VAM HMDs.

This paper traces the development of early work merging HRI and VAM technology (which, while promising, was often hampered by limitations in underlying VAM technologies) and highlights more recent work that leverages modern systems. While this interdisciplinary surge of research is exciting and valuable, such surges can raise disciplinary challenges. For example, many researchers are likely (unbeknownst to each other) to be simultaneously working on similar problems or using similar techniques, while potentially using significantly different terminology and conceptual frameworks to ground and disseminate their work. Such a lack of shared awareness and shared terminology may introduce problems for future research. Even the most basic terms used within this new wave of VAM-HRI work that might naturally be assumed to have obvious, commonly agreed upon meanings — words like “virtual reality,” “augmented reality,” “mixed reality,” “user interface,” and “visualization” — are rendered imprecise by the multifarious uses of mixed reality visualizations and of virtual and mixed reality systems. It is thus critical for the research community to have a common set of terms and concepts that can be used to accurately and precisely describe and organize the wide array of work being done within this field.

In this paper, we present a taxonomy for VAM-HRI 3D command sequencing paradigms and Virtual Design Elements (VDEs) to provide researchers with a shared, descriptive basis for characterizing the types of systems HRI researchers are developing and deploying in both mixed and virtual reality systems. This taxonomy is the result of surveying 175 papers published in 99 conferences and journals over a time span of 30 years and is informed by recent (but nascent) attempts (Williams et al., 2019b) to grapple with the breadth and complexity of this field through the series of Virtual, Augmented, and Mixed Reality for Human-Robot Interaction (VAM-HRI) workshops held in conjunction with the ACM/IEEE International Conference on Human-Robot Interaction (Williams et al., 2018). Each category, class, and VDE within our proposed taxonomy aims to provide HRI researchers (both those working within VAM-HRI and otherwise) with the shared language necessary to advance this subfield along a productive and coherent path.

Our goal in creating this taxonomy is not only to provide a shared language for researchers to use to describe and disseminate their work, but also to aid researchers in connecting with the host of complementary work being performed in parallel to their own. This taxonomy may enable researchers to better understand the benefits of different types of virtual imagery, more quickly identify promising graphical representations across different domains and contexts of use, and build knowledge regarding what types of representations may currently be under- (or over-) explored.

As a final contribution of this work, we produce an online version of this taxonomy that visualizes the categorization of all 175 papers. Notably, this online platform enables any VAM-HRI researcher to issue a pull request to add their research to this categorization scheme. It is our hope that this platform serves as a living resource that the VAM-HRI community may use to track the continued progress and growth of our nascent field.

2. VAM-HRI Advancement Over Time

To begin to understand how the field of VAM-HRI reached its current state, one can trace the development of VAM technologies back to Sutherland’s vision of “The Ultimate Display” (itself influenced by Vannevar Bush’s conception of the Memex) (Sutherland, 1965) and later developments with the Sword of Damocles system (Sutherland, 1968). The earliest major work leveraging VAM for HRI appears to date back to a push in the late 1980’s and early 1990’s with various work exploring robot teleoperation systems (Bejczy et al., 1990; Kim et al., 1987). Perhaps the most fully-developed instance of these early systems was the ARGOS interface for augmented reality robot teleoperation (Milgram et al., 1993). While the ARGOS interface used a stereo monitor, rather than the head-mounted displays in vogue today, the system introduced several design elements for displaying graphical information to improve human-robot communication and introduced concepts such as virtual pointers, tape measures, tethers, landmarks, and object overlays that would influence many subsequent designs. Later developments throughout the 1990’s introduced several other important concepts, such as the use of virtual reality for both actual robot control and teleoperator training (Hine et al., 1995), the integration of HMDs (including the first use of an HMD to control an aerial robot) (de Vries and Padmos, 1997), projective virtual reality where a user “reaches through” a VR system to control a robot that manipulates objects in the real world (Freund and Rossmann, 1999), the rise of VAM applications for robotics in medicine and surgery (Burdea, 1996), and continued work on ARGOS and ARGOS-like systems (Milgram et al., 1995). At a high level, major themes appear that focus on using VR for simulation or training purposes, VR and/or AR as new forms of information displays (e.g., for data from robot sensors), and VAM-based robotic control interfaces.
While many of these developments appear initially promising, it is interesting to note that following an initial period of intense early research on HRI and VAM, later growth throughout the 1990’s appears to have happened at a relatively stable rate, rather than rapidly expanding. In addition, efforts to take research developments beyond laboratory environments into commercial/industrial systems appear to have been largely unsuccessful (indeed, even today robot teleoperation interfaces are still typically based on standard 2D displays rather than leveraging VAM).

In recent years, research in the field of VAM-HRI has seen explosive growth. This recent explosion is due in part to the emergence of commercial head-mounted displays (HoloLens, Vive, Oculus, etc.) as well as enhanced computer performance, which together have shifted the field from one requiring specialized hardware developed in research labs to one in which significant advances can be made immediately using standardized, inexpensive hardware systems that provide expansive software development libraries.

The rise of new VAM-HRI research has demonstrated the potential for virtual reality (VR), augmented reality (AR), and mixed reality (MR) across a number of application domains that are of significant historical interest to the field of HRI, including education (García et al., 2011; Chang et al., 2010), social support (Quintero et al., 2015; Hedayati et al., 2018), and task-based collaboration (Chandan et al., 2019; Frank et al., 2016), while illustrating how MR and VR can be used as new tools in a HRI researcher’s toolkit for robot design (Cao et al., 2018), human-subject experimentation (Wijnen et al., 2020; Williams et al., 2020), and robot programming and debugging (Hoppenstedt et al., 2019).

AR visualization techniques offer significant promise in HRI due to their capability to effortlessly communicate robot intent and behavior. For example, researchers have shown how AR can be used to visualize various aspects of robot navigational state, including heading, destination, and intended trajectory (Walker et al., 2018). Similarly, others have shown how AR can be used to visualize a robot’s beliefs and perceptions by displaying both exteroceptive (e.g., laser range finder data) and proprioceptive (e.g., battery status) sensor data (Avalle et al., 2019; Cao et al., 2018). Such virtual imagery can improve the situational awareness of human teammates during human-robot collaboration and teleoperation, making interaction more fluid, intuitive, safe, and enjoyable.

VR techniques, on the other hand, provide unique opportunities for safe, flexible, and novel environments in which to explore HRI. For instance, researchers have explored how virtual environments can help humans learn how to work with new robots (Pérez et al., 2019) and help robots learn new skills (Dyrstad et al., 2018; Iuzzolino et al., 2018). Using VR interfaces for these purposes reduces risks, addresses spatial and monetary limitations, and promotes new opportunities to experiment with new (or not yet physically feasible) robots and environments. Moreover, VR interfaces can be used to visualize the real world in new ways, allowing teleoperators and supervisors to view robots’ real environments through video streams or point cloud sensor displays in a more immersive and intuitive manner than traditional 2D displays (Bosch et al., 2016; Sun et al., 2020). In addition, VR systems may enable users to exert more fine-grained control over teleoperated robots, including unmanned aerial and nautical vehicles, by leveraging related technologies such as head tracking, haptic controllers, and tactile gloves (Ibrahimov et al., 2019).

3. VAM Interfaces for Robotics

The advancements in hardware accessibility have created a host of new opportunities for exploring VAM technology as an interaction medium for enhancing various aspects of HRI. Various VAM displays, degrees of reality, coordinate system calibration techniques, and VAM-HRI interface paradigms have all been used in different combinations to successfully apply VAM interfaces to robotics.

3.1. The Reality-Virtuality Continuum

VAM displays are computer displays that either immerse users in an entirely synthetic world or merge the real world with a synthetic one. Note that although VAM can encompass more than just the visual sense (e.g., haptic, auditory), for the duration of this paper VAM will refer only to synthetic imagery.

VAM technology is capable of mixing varying degrees of reality and virtuality. As observed by Milgram et al. in 1994, all VAM displays fall upon a “Reality-Virtuality Continuum” (Milgram and Kishino, 1994). This taxonomy has served as a useful tool for classifying VAM interfaces and coined the term “Mixed Reality.” Per the Reality-Virtuality Continuum, interfaces that place users in environments consisting of only synthetic imagery are considered “Virtual Reality” (VR), while interfaces that consist of only real imagery are considered to be based in reality. This leaves a middle ground, where synthetic and real imagery are combined, that forms the space of “Mixed Reality” (MR). Within the design space of MR lie two sub-categories: “Augmented Reality” (AR), where synthetic imagery is added to a real environment, and “Augmented Virtuality” (AV), where real imagery is added to a synthetic environment. Together, VR and MR (consisting of AR and AV) allow one to categorize VAM interfaces and provide more specific design guidelines for HCI interfaces.

Figure 1. Milgram’s Reality-Virtuality Continuum (Milgram and Kishino, 1994).

3.2. Display Hardware

VAM interfaces can be implemented by various display hardware. The following is a list of common VAM display types.

2D Monitor Video Displays: 2D monitor video displays provide a means of overlaying AR synthetic imagery on real images or video feeds, acting as “window-on-the-world” displays. Due to the inherently 2D nature of traditional monitors, users of these displays cannot perceive depth when viewing interfaces on them. Additionally, these displays tether users to computer terminals and do not allow hands-free, free-roaming exploration of the environments to which AR imagery is being added.

3D Monitor Video Displays: 3D monitor video displays are similar to the above AR 2D monitor video displays, with the difference being that users can perceive interfaces with depth, either via glasses-free 3D monitors or via 3D glasses (i.e., anaglyph, polarized, or active shutter).

Tablet Displays: Tablet displays are also similar to AR 2D monitor video displays, with the major difference that users can freely explore their environments due to the portable nature of tablets. Additionally, tablets provide users with touch-based interactions, which are uncommon in monitor-based displays.

Projectors: Projector-based displays add AR synthetic imagery directly to a user’s environment, which users can see without the need for special equipment such as glasses or head-mounted displays (HMDs), while remaining free to roam their environment. Drawbacks to this display type include: the projected imagery is inherently 2D; occlusions (from the environment, users, or robots) disrupt and/or block the AR imagery; and the projected imagery can get washed out in environments that are too bright.

CAVEs: CAVEs, or Cave Automatic Virtual Environments, are VR displays in which users are placed in a three- to six-walled environment. The walls display 3D imagery that, when paired with 3D glasses, immerses users in AV or VR environments. CAVEs allow multiple users to experience VR or AV environments simultaneously, but unfortunately are expensive, confining, and immovable.

VR HMDs: VR HMDs are 3D stereoscopic displays worn on users’ heads that fully immerse users in VR or AV environments. Recent advances in VR HMD technology allow users to control the display not only with head motion but with hand and body motion as well, permitting users to remain hands-free while walking around synthetic environments naturally with their own bodies. A downside of these displays is that they cut users off from the real world, with no option of seeing reality through the display while it is worn.

Optical See-Through HMDs: Optical see-through HMDs present AR through transparent lenses, allowing users to see virtual imagery overlaid on the real world. Similarly to VR HMDs, these displays provide users with a hands-free, free-roaming experience. However, current technology restricts the virtual imagery to a narrow field of view. Additionally, the imagery provided by these HMDs is easily washed out in bright light.

Video Pass-Through HMDs: Video pass-through HMDs are unique in that they can provide AR, AV, and VR experiences to users. By mounting a stereo camera pair to the front of a VR HMD, video pass-through HMDs can pipe video imagery from the real outside world to the dual lenses within the HMD. The video feed can be intercepted between camera and lens, allowing virtual imagery to be added to the captured image frames. This process of displaying AR imagery allows these HMDs to show AR imagery in environments with bright light, such as the outdoors.

3.3. VAM Interface Coordinate Frame Rectification

Finally, to successfully merge reality and virtual reality in a single interface and to have VAM imagery appear in its appropriate position and orientation within the world, a single coordinate frame must be maintained to manage the positions of the various virtual imagery. However, VAM interfaces for HRI present the unique challenge of multiple physical agents (users and robots) each having their own unique perspectives and coordinate frames. These agents’ frames are also often moving and changing at any given time, requiring real-time tracking of users, robots, and objects within an environment. Research has explored the following methods for extracting and unifying coordinate frames:

Fiducial Markers: Fiducial markers are images whose optical properties are known to a VAM interface beforehand and act as visual reference points for the system. When a VAM interface’s camera detects a marker, it can determine the marker’s pose relative to that of the camera. These markers are often placed in a robot’s environment, on robots, or on objects within a robot’s environment. This method of frame rectification is popular due to its high portability and low cost (Hashimoto et al., 2011; Borrero and Márquez, 2012; Frank et al., 2017; Kobayashi et al., 2007).
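As a concrete illustration, marker-based frame rectification reduces to composing homogeneous transforms. The following minimal sketch (with illustrative values and names, not drawn from any cited system) recovers a robot's pose in the camera frame from a detected marker pose and the marker's known mounting pose on the robot:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of the marker in the camera frame, as a marker detector would report it
# (identity rotation and illustrative translation here).
T_cam_marker = make_T(np.eye(3), [0.0, 0.0, 1.0])

# Known mounting pose of the marker in the robot's own frame.
T_robot_marker = make_T(np.eye(3), [0.1, 0.0, 0.0])

# Rectified pose of the robot in the camera frame, using the identity
# T_cam_robot @ T_robot_marker == T_cam_marker.
T_cam_robot = T_cam_marker @ np.linalg.inv(T_robot_marker)
```

In a real system, `T_cam_marker` would come from a marker-detection library and both transforms would include nontrivial rotations; the composition itself is unchanged.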

Motion Capture Cameras: Motion capture cameras can obtain poses of robots, users, and objects in an environment with high precision by tracking patterns of infrared reflecting markers. Unfortunately, these systems are expensive and immovable, often making interfaces that use this method of frame rectification constrained to laboratory environments (Walker et al., 2018; Hedayati et al., 2018; Walker et al., 2019).


: Another method of unifying coordinate frames between user and robot is by using odometry (including visual odometry with computer vision algorithms such as SLAM

(Mur-Artal et al., 2015)). If agent coordinate frames are initially synced at the initialization of a VAM-HRI interface, odometry can track the relative pose changes agents have undergone over time, which can be used to maintain a rectified coordinate system (Reardon et al., 2018; Gregory et al., 2019).
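A minimal sketch of this odometry-based approach in the planar SE(2) case (all values illustrative): after a one-time frame synchronization at startup, accumulated odometry deltas keep the robot's pose in the shared world frame rectified.

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous SE(2) transform for planar pose composition."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

# One-time synchronization at interface startup: the robot's initial pose
# expressed in the shared world frame.
T_world_robot0 = se2(2.0, 0.0, 0.0)

# Relative pose changes reported by the robot's (visual) odometry since startup.
deltas = [se2(1.0, 0.0, 0.0),        # drive 1 m forward
          se2(0.0, 0.0, np.pi / 2),  # turn left 90 degrees
          se2(1.0, 0.0, 0.0)]        # drive 1 m forward again

# Accumulate odometry to maintain the rectified world-frame pose.
T_robot0_robot = np.eye(3)
for d in deltas:
    T_robot0_robot = T_robot0_robot @ d
T_world_robot = T_world_robot0 @ T_robot0_robot
```

The same composition generalizes to SE(3) for 6-DOF agents; the key point is that only the initial sync requires an external reference, after which odometry alone maintains the shared frame (subject to drift).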

ICP Alignment: Coordinate frame rectification is also made possible by using point cloud alignment algorithms, such as Iterative Closest Point (ICP) (Segal et al., 2009), which can take in separately collected point clouds and output the transformation between the two perspectives. This method requires all agents to have some overlap between their collected point clouds which makes this method most suitable for initial frame synchronization between multiple agents that is followed by odometry-based frame rectification methods above (Gregory et al., 2019; Reardon et al., 2018).
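To illustrate the underlying idea, the following is a minimal brute-force point-to-point ICP sketch (a simplification, not the generalized ICP of Segal et al.); it assumes a reasonable initial alignment and small point sets:

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rigid transform (R, t) mapping point set A onto B (Kabsch/SVD)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=20):
    """Brute-force nearest-neighbour ICP aligning src to dst."""
    dim = src.shape[1]
    R_total, t_total = np.eye(dim), np.zeros(dim)
    cur = src.copy()
    for _ in range(iters):
        # Match each current point to its nearest neighbour in dst.
        nn = np.linalg.norm(cur[:, None] - dst[None], axis=2).argmin(axis=1)
        R, t = best_fit_transform(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Recover a known small rigid transform applied to a grid of 2D points.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
src = np.stack([xs.ravel(), ys.ravel()], axis=1)
th = 0.03
R_true = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
dst = src @ R_true.T + np.array([0.2, 0.1])
R_est, t_est = icp(src, dst)
```

Production systems would use k-d trees for the correspondence search and outlier rejection for partially overlapping clouds, but the align-match-iterate loop is the same.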

Machine Learning Image-Based Pose Estimation: Relative poses between agents can also be found through machine learning algorithms trained to estimate an object’s pose from image frames. Current technology limits this method to known, simple objects; however, as machine learning algorithms strengthen (Xiang et al., 2017), it is feasible to imagine this method becoming more popular for agent frame rectification, especially if it means fiducial markers or motion capture cameras can be eliminated (Bolano et al., 2019).

3.4. VAM-HRI Interaction Design Paradigms

VAM-HRI interfaces often end up following higher-level implementation paradigms for various types of interactions. This can be seen especially in the cases of HMD teleoperation interfaces for remote robots and robot command sequencing.

HMD-Mediated Remote Robot Teleoperation Interfaces

The recent advent of mass-produced HMD technology has seen a rise in HMD interfaces that mediate remote robot teleoperation, for either navigational tasks or manipulation tasks. As noted by Lipton et al. (Lipton et al., 2017), these interfaces can fall into three classes: Direct Interfaces, Virtual Control Room Interfaces, and Cyber-Physical Interfaces.

Direct HMD Teleoperation Interfaces: Direct HMD teleoperation interfaces live-stream stereo video feeds from remote robots to the HMD’s lenses. This allows users to see from the robot’s ‘eyes’ with immersive 3D stereoscopic vision as if they were embodying the remote robot, especially if user head motions control robot head motions with a one-to-one mapping. These HMD-based interfaces have been shown to significantly improve robot teleoperation tasks. Additionally, if virtual imagery is overlaid on the video stream, the interface becomes an AR interface (Higuchi and Rekimoto, 2013). However, there is a significant drawback associated with direct HMD interfaces that stems from miscues in the user’s proprioceptive system combined with the communication delays inherent in current network technology. When a remote robot moves and the local user does not, the user’s proprioceptive system receives conflicting cues (visual cues of movement vs. no body motion detected), causing nausea. The same happens in reverse when a local user turns their head and the robot’s head does not immediately turn to match the movement (due to mechanical limitations or communication delays), which creates conflicting proprioceptive cues (no visual cues of movement vs. body motion detected) that also cause nausea.

AV HMD Teleoperation Interfaces: To mitigate the nauseating effects of direct HMD interfaces, two AV HMD teleoperation interface paradigms have arisen from research in the VAM-HRI field: Virtual Control Room and Cyber-Physical Interfaces (Lipton et al., 2017). In both interface styles, the state of the user’s eyes is decoupled from the robotic system’s state to remove the conflicting proprioceptive cues. By placing the user in an AV environment, the user’s eyes are represented by virtual cameras in the virtual space that move freely with the user’s head and body movements. This decoupling helps mitigate nausea caused by communication/hardware delays and/or imperfect mappings between user head motion and robot head motion.


Virtual Control Room Model: In the Virtual Control Room Model, the user is placed in a virtual room that serves as a supervisory command and control center for a remote robot. Within the control room, the user is able to interact with displays and objects in the virtual space itself, and can view 3D stereo video streams projected on the room’s walls, thus still allowing the user to see from the robot’s perspective with depth (Lipton et al., 2017; Kot and Novák, 2014) (see Figures 3-A and 4-C).


Cyber-Physical Model: In the Cyber-Physical Model, a shared AV virtual space is created (typically with a one-to-one mapping) between: (1) a remote robot and a virtual environment; and (2) a human operator(s) and that virtual environment. Additionally, a 3D reconstruction of the robot’s remote environment is rendered (typically with dense RGB point clouds) within the virtual environment to provide situational context and awareness to the human operator. A virtual robot replica of the remote physical robot is also added to the virtual environment, in the same relative location within the virtual environment as in the real remote environment. This virtual robot mimics the remote real robot’s state and actions. The user can also use the virtual robot to send commands to the remote real robot or to visualize the current state or actions being undertaken by the physical robot. A benefit of this interface paradigm over the Virtual Control Room is that the user can freely change their viewpoint within the remote environment by walking around the 3D reconstruction in the AV environment, as they are not limited to the view from the robot’s camera(s). However, the sense of immersion from virtually embodying the remote robot is lost (Rosen et al., 2020; Sun et al., 2020; Allspaw et al., 2018) (see Figure 3-B).

VAM-HRI 3D Command Sequencing

Our VAM-HRI literature review revealed recurring high-level themes for robot 3D command sequencing. We propose the following three paradigms that capture these methods of controlling robots with VAM interfaces: Direct Manipulation, Environment Markup, and Digital Twins.

Direct Manipulation: The paradigm of Direct Manipulation leverages more traditional methods to directly control robots and utilizes 3D translation and/or rotation input from either physical or virtual sources, commonly to send teleoperation commands to end effectors or navigational systems. Physical inputs include body/head tracking and VAM-based 3D controllers (Whitney et al., 2018). Virtual controllers can act as metaphors for existing physical control input devices (i.e., levers, handles, joysticks, etc.) (Hashimoto et al., 2011) or utilize novel designs unconstrained by physics, such as floating control spheres (Krupke et al., 2016).

Environment Markup: Under the Environment Markup paradigm, users send commands to robots by adding virtual annotations to a robot’s environment. Examples of such environmentally-anchored annotations include waypoints, trajectories, and planned future poses of manipulable objects. These annotations can take the form of a simple single command or can be chained/combined together to form a series of commands or a singular complex command (Chan et al., 2018; Ishii et al., 2009).

Digital Twins: In contrast to commands sent under the Environment Markup paradigm, Digital Twin command sequencing does not add virtual annotations to a robot’s environment. Instead, command sequences are generated by the manipulation of a digital twin which is a virtual replica (or representation) of a real object or robot. For example, in the case of a robot with a digital twin, a user could press a virtual button on the robot’s digital twin, at which point, the real robot would respond as if its own physical button was pressed. Additionally, robot teleoperation can be achieved by directly manipulating the robot digital twin (arm, body, etc.), after which, the real robot imitates the action taken by the digital twin (e.g., the robot moves itself in the exact trajectory taken by the digital twin or moves its end effector to the final position taken by the digital twin). Real object manipulation is performed in a similar manner, but in this case, a robot mimics a user’s object digital twin manipulations on the associated real object (Sun et al., 2020; Frank et al., 2017; Hashimoto et al., 2011; Krupke et al., 2018).
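The control flow of the Digital Twin paradigm can be sketched minimally as follows (all class and method names here are hypothetical, for illustration only): manipulating the virtual replica queues commands that are then mirrored onto the real robot.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """Minimal digital twin: manipulating the virtual replica queues
    commands that are later mirrored onto the real robot."""
    pose: tuple = (0.0, 0.0, 0.0)                  # x, y, yaw of the replica
    command_queue: list = field(default_factory=list)

    def drag_to(self, pose):
        """User drags the twin in the VAM interface; the real robot should follow."""
        self.pose = pose
        self.command_queue.append(("move_to", pose))

    def press_button(self, name):
        """A virtual button press triggers the real robot's physical equivalent."""
        self.command_queue.append(("press_button", name))

    def flush(self, send):
        """Dispatch queued commands to the real robot via a transport callback."""
        while self.command_queue:
            send(self.command_queue.pop(0))

# The send callback stands in for whatever transport reaches the real robot
# (here it just records the dispatched commands).
sent = []
twin = DigitalTwin()
twin.drag_to((1.0, 2.0, 0.0))
twin.press_button("gripper")
twin.flush(sent.append)
```

Queueing rather than sending immediately reflects the "sequencing" aspect of the paradigm: a user can compose several twin manipulations before committing them to the physical robot.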

Figure 2. Virtual Design Element Taxonomy Table

4. VAM-HRI Virtual Design Element Taxonomy

In this paper we propose a novel taxonomy for identifying and categorizing VAM-HRI Virtual Design Elements (VDEs) (Williams et al., 2019b). VAM-HRI VDEs are VAM-based visualizations that impact robot interactivity by providing new or alternate means of interacting with robots. VDEs can appear in two ways: (1) user-anchored, attached to points in the coordinate system of the user’s camera, remaining fixed in view as the user changes their field of view (see Figure 4-D); or (2) environment-anchored, attached to points in the coordinate system of a robot or some other element of the environment, rather than to the interface itself (Williams et al., 2019b) (see Figure 3-C).
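The anchoring distinction can be made concrete with a small pose-composition sketch. The function and frame names below are hypothetical; this is a minimal illustration using homogeneous transforms, not the API of any particular VAM toolkit:

```python
import numpy as np

def pose(R=np.eye(3), t=(0.0, 0.0, 0.0)):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def render_pose(anchor, T_world_camera, T_offset):
    """Return a VDE's pose in camera coordinates for one rendered frame.

    anchor: 'user' (T_offset is a fixed offset in the camera frame, so the
            element follows the view) or 'environment' (T_offset is a fixed
            world-frame pose). Hypothetical API for illustration only.
    """
    if anchor == "user":
        return T_offset  # constant in the camera frame, frame after frame
    # environment-anchored: re-express the fixed world pose in the camera frame
    return np.linalg.inv(T_world_camera) @ T_offset
```

As the camera moves, a user-anchored element keeps the same camera-frame pose, while an environment-anchored element’s camera-frame pose is recomputed every frame so that it appears fixed in the world.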

Over the course of the following sections, we will detail each category and class of VDE within the taxonomy, which we organize into the following categories: Virtual Entities, Virtual Alterations, Robot Status Visualizations, and Robot Comprehension Visualizations (see Figure 2). Each VDE category was identified after surveying the aforementioned 175 VAM-HRI research papers, and acts as a high-level categorical grouping of VDEs that share common purposes for enhancing and/or manipulating human-robot interactions with virtual or mixed reality. Note that VDEs are instantiations of the taxonomic sub-classes and can be used both in isolation and in synergistic conjunction by combining two or more VDEs (e.g., using a Cosmetic Alteration to communicate the Internal Readiness of a robot joint (Avalle et al., 2019)).

4.1. Virtual Entities

The first category of VDEs we will examine are Virtual Entities: visualizations in which virtual entities, such as robots or objects, are added to real or virtual user environments. These VDEs can be standalone VAM-based graphical elements that act as visualization aids, input devices, or simulations of entities found in reality; however, they can also take the form of digital twins that are directly associated with a physical entity in the user’s and/or robot’s environment. We detail each of the three types of Virtual Entity classes below.

Virtual Entities – Robots

The Robots class of the Virtual Entity VDE category encompasses visualizations of robots that can either be provided to users as a visual tool for inspection or have full kinematic models that allow for complex interactions with users. These VDEs provide a level of immersion (a requirement for HRI simulations) unparalleled by simulations viewed on traditional 2D displays (such as Gazebo (Meyer et al., 2012)), since the virtual robot is visually integrated within the user’s environment. We identify three sub-classes of such virtual robot entities:

Visualization Robots improve user understandings of a robot’s current state or future actions but do not afford any two-way interactions with users (i.e., users cannot direct input to the graphical representation). These VDEs typically model a robot’s 3D morphology, in whole or in part. Uses include providing users with a means of understanding a robot’s current state, in which case a virtual 3D model of a real robot mimics the physical robot’s joint configurations in real time. Visualization Robots are particularly useful in situations of limited situational awareness when the user cannot directly see the physical robot (or portions of the robot), such as in remote teleoperation tasks (Kot and Novák, 2014) (see Figure 3-A). Visualization Robot VDEs can also provide users with a preview of proposed or planned robot motion by overlaying a 3D robot model onto the environment, allowing users to better assess how a robot will navigate through an environment and whether it will successfully travel to a desired location without collisions. Finally, Visualization Robots can also show the locations of robots that are partially or fully occluded by the environment, such as behind a wall or door (Rosen et al., 2020) (see Figures 3-B and 5-D).

Simulated Robots are not linked to a specific instance of a physical robot and are instead independent robot simulations used to evaluate robots in fully virtual settings when using a real robot is not ideal, such as when robot hardware is unavailable, unsafe, has limited battery life, and/or faces physical depreciation when operated repeatedly (as required for sample-inefficient learning algorithms and/or interaction studies). These VDEs are also used for evaluating simulated interactions with autonomous or manually teleoperated robots, e.g., in situations where collocated interaction is infeasible or hazardous for humans (i.e., working near large industrial robots, testing space exploration robots in zero-gravity environments, etc.). Simulated Robots are also useful for user training without putting robot hardware at risk of being damaged by inexperienced users, and for training real robots through virtual-to-real-world transfer learning techniques (Iuzzolino et al., 2018) that require direct user interaction, such as learning from demonstration (Argall et al., 2009; De Pace et al., 2018; Stilman et al., 2005; Meyer zu Borgsen et al., 2018) (see Figure 3-D).

Robot Digital Twins operate in tandem with real robots and provide users with an immersive virtual robot that can be interacted with in lieu of a real, physical robot. By interacting first with a Robot Digital Twin, users can better predict how their actions will affect the system and gain foresight into the eventual pose and position of the physical robot as it mimics the actions taken by the virtual robot. For instance, mappings between the Robot Digital Twin and real robot include instantaneous duplication (the physical robot moves to match the Robot Digital Twin’s position/attitude immediately), delayed duplication (the physical robot moves to match the Robot Digital Twin’s position/attitude after a set period of time), confirmed duplication (the physical robot matches the Robot Digital Twin when triggered by the user), and more planning-oriented systems, such as using the Robot Digital Twin to denote waypoints or actions for future execution by the physical robot (Krupke et al., 2018; Sun et al., 2020; Walker et al., 2019; Hashimoto et al., 2011) (see Figure 3-C).
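These duplication mappings can be sketched as a small dispatch over the twin’s pose history. The class and method names here are hypothetical, a simplified illustration of the mapping logic rather than an implementation from any surveyed system:

```python
from enum import Enum, auto

class DuplicationMode(Enum):
    INSTANTANEOUS = auto()  # mirror the twin immediately
    DELAYED = auto()        # mirror the twin after a fixed lag
    CONFIRMED = auto()      # mirror only once the user confirms

class TwinMirror:
    """Hypothetical controller mapping a digital twin's pose onto a real robot."""

    def __init__(self, mode, delay_s=1.0):
        self.mode = mode
        self.delay_s = delay_s
        self.history = []          # (timestamp, twin_pose) pairs
        self.confirmed_pose = None

    def update_twin(self, t, twin_pose):
        """Record the twin's pose as the user manipulates it."""
        self.history.append((t, twin_pose))

    def confirm(self):
        """User trigger for confirmed duplication."""
        if self.history:
            self.confirmed_pose = self.history[-1][1]

    def target_pose(self, t):
        """Pose the physical robot should track at time t (None = hold still)."""
        if not self.history:
            return None
        if self.mode is DuplicationMode.INSTANTANEOUS:
            return self.history[-1][1]
        if self.mode is DuplicationMode.DELAYED:
            past = [p for ts, p in self.history if ts <= t - self.delay_s]
            return past[-1] if past else None
        return self.confirmed_pose  # CONFIRMED mode
```

Under delayed duplication the robot replays the twin’s motion with a lag, and under confirmed duplication it holds still until the user triggers execution.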

Virtual Entities – Control Objects

These VDEs represent virtual objects that users can interact with to send direct commands that control robotic systems. Control Objects can be 2D or 3D and can be user-anchored (such as 2D buttons on an AR tablet that remain in static positions regardless of where the display is pointed; see Figure 4-D) or environment-anchored (such as virtual 3D handles that remain fixed to a robot chassis; see Figure 5-F). We identify two sub-classes of Control Objects that allow developers to rapidly prototype and evaluate various interfaces and designs for robot input without procuring or engineering actual hardware:

Panels & Buttons are virtual control objects that look and act like panels and buttons found in real life and 2D GUIs (e.g., buttons, sliders, switches, etc.) (Li et al., 2019) (see Figure 4-D).

Controllers emulate physical 3D input devices and leverage the 3D capabilities of VAM displays. Controllers often act as metaphors for existing physical control input devices (e.g., levers, handles, joysticks, etc.). However, Controller VDEs have also opened a nascent design space that allows robot designers to create interface input devices unconstrained by physics, in the form of objects that could not be built in reality, such as manipulable control toruses (Hashimoto et al., 2011) or floating control spheres (Krupke et al., 2016), allowing for robot interactions that would otherwise be impossible to implement and/or evaluate with traditional non-VAM interfaces (see Figure 5-F).

Virtual Entities – Environmental

The class of Environmental Virtual Entities encapsulates VAM-based visualizations of entities found in a user’s or robot’s environment. For instance, virtual representations may be used to simulate agents, objects, and entire environments that do not physically exist in a robot’s current development environment but will physically exist when a robot is deployed (e.g., to enable testing in a laboratory environment that mimics conditions a robot would encounter in the field, as robots are unable to distinguish between the simulated virtual objects and real objects). In addition, these VDEs can be used for simulating how a robot might interact with such objects. These simulated entities often enhance interactions between robots and developers when debugging/assessing robotic systems by freeing developers from needing real agents or terrain to test a system. Alternatively, Environmental Virtual Entities can be associated with physical agents, terrain, and objects already present in a robot’s environment in the form of digital twins. We detail five VDEs in the Environmental sub-class of Virtual Entities:

Simulated Agents simulate, with virtual imagery, physical entities that normally have independent agency (e.g., autonomous robots, humans, animals, etc.). In the case of these simulations, the robot is not able to differentiate between real agents and Simulated Agents. The primary use of these VDEs is to enable robot testing and debugging without requiring the presence of key agents with whom robots would need to interact. For example, a large industrial robot might practice object handovers with a simulated human, without putting any human lives in harm’s way, while the robot’s developers observe and evaluate the mock interactions. Alternatively, a Simulated Agent VDE may be used in tandem with an autonomous Simulated Robot VDE for real humans to interact with, allowing for the testing of autonomous interactions with intelligent robots in situations where a physical robot is unavailable or currently infeasible (Meyer zu Borgsen et al., 2018) (see Figure 3-D).

Simulated Objects use virtual imagery to simulate the presence of physical objects in a robot’s environment. It is important to note that these nonexistent virtual objects hold no association with any real objects in a robot’s setting. These nonexistent objects can be used to simulate obstacles in debugging sessions with real robots (e.g., a virtual wall, table, chair, etc.), without robot developers needing to procure physical objects, enabling rapid modification, deletion, and duplication of objects (Borrero and Márquez, 2012) (see Figure 5-G).

Simulated Environments use virtual imagery to synthetically create the presence of environment areas or terrains. These VDEs can be used to evaluate autonomous robot responses to hazardous terrain (e.g., loose gravel, sand, water features, etc.) without endangering robots. Simulated Environments can also be used to evaluate robot interactions in environments that are difficult to find or recreate on Earth, such as a lunar surface with decreased gravity. Finally, Simulated Environments can provide a realistic setting to evaluate interactions (e.g., object hand-offs between user and robot) between Simulated Robots and users (Meyer zu Borgsen et al., 2018) (see Figure 3-D).

Object Digital Twins act as virtual replicas of associated real objects to sequence actions to be taken on their real-world equivalents. These VDEs can allow users to preview actions to be taken on the real object prior to robot execution. For example, a real cup to be moved by a real robot might have a virtual cup overlaid on its current position. A user could then interact with the virtual cup and move it to a new location, fine tune its final placement, and then command a robot to move the real cup to the position of the virtual cup. This interaction pattern can also enable robot action previewing similar to (and potentially in conjunction with) Robot Digital Twins (Krupke et al., 2018; Frank et al., 2017) (see Figure 5-H).

Environment Digital Twins are virtual replicas of real environments rendered as a VAM-based visualization. These environments can be man-made structures/areas or outdoor terrain that are made to be exact replicas of their associated real world environment. As in Object Digital Twins, users can alter the state of the Environment Digital Twin to have a robot take action on the real environment so its state matches its digital twin. Examples of such systems include interfaces that visualize real satellite terrain data as an Environment Digital Twin VDE to test and/or supervise aerial robot systems scouting wildland forest fires across the associated real expanse of wilderness (Omidshafiei et al., 2015) (see Figure 4-B).

4.2. Virtual Alterations

The second category of VDEs we will examine are Robot Virtual Alterations: graphical elements that allow a robot’s appearance to become a design variable that is fast, easy, and cheap to prototype and manipulate. This category of VDE enables exciting new opportunities for HRI researchers and designers, especially since modifications to robot morphology are traditionally prohibitive due to cost, time, and/or constraints stemming from task or environment. We divide this category into classes involving (1) superficial alterations to robot appearance and (2) morphological alterations that substantially adjust robot form and/or perceived capabilities.

Virtual Alterations – Superficial

Superficial Virtual Alterations use virtual imagery to change the appearance of physical parts of the robot. This change in appearance does not occur by altering the robot’s form or morphology (e.g., adding a virtual arm, making the head invisible, etc.) but instead by changing the appearance of robots’ physical surfaces or the space adjacent to those surfaces. We identify two sub-classes of such elements:

Cosmetic Alterations alter the color, pattern, or texture of the robot’s physical surfaces. The manipulation of robot surfaces enables new interaction patterns. These VDEs are considered cosmetic with respect to the robot’s morphology and can be combined with additional VDEs to provide a function. For instance, changing the color of a robot arm in a manufacturing context might call attention to a malfunctioning actuator, indicate a hot surface temperature, or discourage touching. In an educational setting, superficial alterations might change the texture of a robot arm to look soft or furry to encourage interaction with children. Additionally, as robots increasingly expand to new consumer domains in the near future, designers could use Cosmetic Alterations to make robots more eye catching in public spaces or enable end-user customization of personal robots in private living spaces to match home decor or personal taste (Avalle et al., 2019) (see Figure 3-E).

Special Effect Alterations add virtual imagery around robots’ physical surfaces to change their appearance indirectly. These effects can take various forms, such as a glow effect added around a robot’s body, virtual streamers that render behind a robot’s arm as it moves, virtual flames that spray out of a robot’s end effectors, or virtual light sources that indirectly alter the reflective appearance of a robot’s physical surface. Although we did not come across this VDE during our literature survey, VAM-HRI is still a growing field, and we envision this VDE holding value for manipulating human-robot interactions (e.g., adding virtual sparkles to a robot to make it more engaging to children in educational settings).

Virtual Alterations – Morphological

Morphological Virtual Alterations connect or overlay virtual imagery on a robot platform to fundamentally alter a robot’s perceived form and/or function by creating new “virtually/physically embodied” cues, where cues that are traditionally generated using physical aspects of the robot are instead generated using indistinguishable virtual imagery. For example, rather than directly modifying a robot platform to include signaling lights as in (Szafir et al., 2015), an AR interface might overlay virtual signaling lights on the robot in an identical manner. Alternatively, virtual imagery might be used to give anthropomorphic or zoomorphic features to robots that don’t have this physical capacity (e.g., adding a virtual body to a single manipulator or a virtual head to an aerial robot). Virtual imagery might also be used to obscure or make more salient various aspects of robot morphology based on user role (e.g., an override switch might be hidden for normal users but visible for a technician). These alterations may also enable new forms of interaction not previously possible for a given morphology, such as enabling functional robots to provide gestural cues (Szafir et al., 2015). We identify three sub-classes of such morphological Virtual Alterations:

Body Extensions add virtual parts to a robot without changing its underlying form; for example, an aerial robot with a virtual arm added to its chassis is still recognizable as a UAV, which would not be the case if virtual imagery were overlaid on the aerial robot to make it look like a floating robotic eye (Walker et al., 2018) (see Figure 3-H). Extensions do not necessarily need to be traditional robot parts and might instead appear as human heads, animal limbs, or even imagined parts like magical wings (Cao et al., 2018; Groechel et al., 2019) (see Figure 3-F).

Body Diminishments visually remove, rather than add, portions of a robot (e.g., grippers, heads, arms, wheels) through diminished reality (DR) techniques (Mori et al., 2017). An important use of this VDE is to resolve teleoperation occlusions that occur when a robot arm blocks the line-of-sight between its camera and the object being manipulated (Taylor et al., ) (see Figure 3-G).

Form Transformations overlay virtual imagery onto real robots to change the robot’s underlying form and/or make it appear as something other than a robot entirely. Similar to Body Diminishments, this VDE can change the form of the robot to make it more or less appealing to targeted user groups or to enable new communication methods, depending on the designer’s intentions. These form alterations need not be limited to new mechanical forms, but can include any form, such as that of a human, animal, or fictional character, all of varying degrees of realism (Walker et al., 2018; Zhang et al., 2019) (see Figure 3-H).

Figure 3. A: Virtual Control Room, Visualization Robot, and Internal Readings (Kot and Novák, 2014), B: Cyber-Physical Interface, Visualization Robot, Spatial Previews, External Sensor 3D Data, and Trajectories (Rosen et al., 2020), C: Robot Digital Twin, Headings, Waypoints, and Trajectories (Walker et al., 2019), D: Simulated Robots, Simulated Agents, and Simulated Environment (Meyer zu Borgsen et al., 2018), E: Cosmetic Alterations and Internal Readiness (Avalle et al., 2019), F: Body Extensions and Callouts (Groechel et al., 2019), G: Body Diminishment (Taylor et al., ), H: Form Transformation and Heading (Walker et al., 2018).

4.3. Robot Status Visualizations

The third category of VDEs we present are Robot Status Visualizations: a set of elements focused on enabling designers to rapidly and easily assess the current state of a robot. We divide this category into classes focused on (1) internal and (2) external robot status.

Robot Status Visualizations – Internal

Internal Robot Status Visualization VDEs convey internal sensor readings and/or the operational status of robot sensors and actuators. We identify two sub-classes of such elements:

Internal Reading VDEs display data returned from internal sensors (e.g., battery levels, robot temperatures, wheel speeds). Making this information more readily available to users may enhance situational awareness and prevent mishaps such as robots running out of battery in the middle of mission-critical tasks or traveling too fast through environmental hazards (Kot and Novák, 2014) (see Figure 3-A).

Internal Readiness VDEs display data about sensors and actuators, such as whether a sensor is ready to collect data or whether an actuator is ready to function, and if not, why (e.g., whether a sensor is disconnected, an actuator is experiencing a fault, etc.). Such information may improve debugging and help prevent robots from operating as black-boxes, with users left wondering why a robot is not functioning as expected (Avalle et al., 2019) (see Figure 3-E).

Robot Status Visualizations – External

External Robot Status Visualization VDEs communicate the robot’s external state (as the robot perceives it) by providing information regarding its current pose and location. We identify two sub-classes of such elements:

Robot Pose VDEs convey a robot’s knowledge of its own pose (i.e., configuration and orientation). These VDEs can take different forms such as a textual display of numerical joint angles, a 3D model of a State Visualization Virtual Robot, or a rendering of a virtual axis anchored to a robot’s joints (Kot and Novák, 2014; Nawab et al., 2007) (see Figure 4-A).

Robot Location VDEs convey where a robot is in the environment, e.g., as an occluded robot’s outline with a Spatial Visualization Virtual Robot or a virtual indicator on the other side of a wall or door, as a top-down radar-like display showing user and robot locations, or as an off-screen indicator to direct a user’s attention to a robot outside the current field-of-view (Chandan et al., 2019; Walker et al., 2018) (see Figure 5-D).

4.4. Robot Comprehension Visualizations

The fourth and final category of VDEs we examine are Robot Comprehension Visualizations: visualizations that convey what a robot believes about its environment, and its current or planned task. VAM-HRI VDEs present a powerful medium for conveying this information, as VDEs can be directly overlaid on a robot’s environment. For example, a visual trajectory spline rendered on a floor can wrap around a wall, indicating not only that the robot sees the wall, but also that the robot’s planned actions will avoid the wall.

Robot Comprehension Visualizations – Environment

Environment-based Robot Comprehension Visualization VDEs communicate to the user what the robot believes about its environment, including where environment information is being collected from, what environment information has been collected, and/or what a robot has inferred from such information. We identify seven sub-classes of such elements:

External Sensor Purviews are environment-anchored visualizations that show users where a robot’s external sensors (LiDAR, cameras, etc.) are collecting data and/or how and where those sensors are positioned (Hedayati et al., 2018; Kobayashi et al., 2007) (see Figure 4-B).

External Sensor Numerical Readings convey numerical data returned from a robot’s external sensors. These readings can be user– or environment–anchored and can be shown explicitly with digits or as more abstract visualizations such as progress bars, virtual thermometers, or virtual weight scales (Lipton et al., 2017; Cao et al., 2018) (see Figure 4-C). Note that the external sensors need not be physically attached to a robot; for example, this includes data from motion capture cameras regarding the distance between a robot and a nearby object.

External Sensor Images & Videos allow the user to see remote environments or to see from a robot’s perspective. Images and videos can be presented as either user- or environment-anchored visualizations from cameras on the robot or in the robot’s environment. When stereo cameras are paired with a stereo interface (e.g., an HMD, 3D monitor, or CAVE), users can see the images and videos with depth, granting enhanced immersion and teleoperation capability (Lipton et al., 2017; Kot and Novák, 2014; Li et al., 2019) (see Figure 4-D).

External Sensor 3D Data convey depth information to recreate remote robots’ environments. These reconstructions can take various forms (e.g., point clouds, voxel maps, 3D meshes, etc.) and aim to present sensed depth data in a manner that allows users to perceive remote environments as if they were there in-person. Unlike Images and Video VDEs, the 3D reconstructions utilized by External Sensor 3D Data VDEs enable immersive and free exploration of a remote robot’s environment, without being restricted to the robot’s perspective (Sun et al., 2020; Rosen et al., 2020) (see Figure 3-B). As in the case of the previous two VDEs, the external sensors do not need to be physically attached to the robot and this VDE includes 3D data collected from user-worn HMDs, such as the spatial map generated from a HoloLens.

Sensed Spatial Regions, the first of three region-based VDEs, visualize regions that a robot has identified within its environment. These visualized regions are produced by a robot through the analysis of sensed environment data, such as exploration frontiers or traversable vs. non-traversable areas depicted through an occupancy grid (unlike 3D reconstructions, which do not perform any logical analysis and categorization of environmental regions). Regions can be annotated with information (e.g., estimated information gain if the area were to be explored) (Reardon et al., 2019) (see Figure 4-E).
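A minimal sketch of how such regions might be derived from an occupancy grid. The cell encoding and function name are assumptions for illustration; surveyed systems use richer analyses:

```python
import numpy as np

# assumed cell encoding for this sketch
FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def sensed_regions(grid):
    """Label occupancy-grid cells for region visualization.

    grid: 2D array of FREE / OCCUPIED / UNKNOWN cells.
    Returns boolean masks: traversable cells, and frontier cells
    (free cells bordering unknown space -- candidate exploration targets).
    """
    grid = np.asarray(grid)
    traversable = grid == FREE
    unknown = grid == UNKNOWN
    # a free cell is a frontier if any 4-neighbor is unknown
    pad = np.pad(unknown, 1, constant_values=False)
    neighbor_unknown = (pad[:-2, 1:-1] | pad[2:, 1:-1] |
                        pad[1:-1, :-2] | pad[1:-1, 2:])
    frontier = traversable & neighbor_unknown
    return traversable, frontier
```

The resulting masks could then be rendered as colored overlays anchored to the floor of the robot’s environment.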

Robot Inherent Spatial Regions are not informed by data sensed from the environment, but are instead inherent to the robot based on its form or operational mode. For example, these may depict the regions a robot can physically reach or show areas a robot may operate best in, such as the optimal area in which to perform an object handover (Frank et al., 2017) (see Figure 4-F).

User-Defined Spatial Regions are regions defined by user input, rather than by the robot’s sensors or physical constraints (e.g., a user drawing a bounding box on the floor of a robot’s environment with a tablet display). These regions have many potential uses, but are most commonly used to define areas a robot should not enter, in the form of virtual boundaries (Sprute et al., 2019a) (see Figure 4-G).
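Enforcing such a virtual boundary can be sketched as a simple containment check. The function name is hypothetical, and axis-aligned rectangles stand in for arbitrary user-drawn boundaries:

```python
def inside_forbidden(point, regions):
    """Return True if a 2D robot position falls inside any keep-out region.

    regions: list of (xmin, ymin, xmax, ymax) rectangles, a simplified
    stand-in for arbitrary user-drawn virtual boundaries.
    """
    x, y = point
    return any(x0 <= x <= x1 and y0 <= y <= y1
               for x0, y0, x1, y1 in regions)
```

A planner could call such a check on each candidate waypoint and reject any that fall inside a user-drawn region.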

Robot Comprehension Visualizations – Entity

Entity-based Robot Comprehension Visualization VDEs convey what a robot knows or believes about an entity (e.g., an object, human, another robot, etc.), such as where an entity is, what an entity is, and attributes an entity holds. We identify four sub-classes of such elements:

Entity Labels act as identifiers for entities known by a robotic system. These visualizations enable users to easily reference entities in spoken commands without the need for complex referring expressions, e.g., enabling users to instruct robots through commands such as “pick up cube B” (Bolano et al., 2019; Sibirtseva et al., 2018) (see Figure 4-H). Additionally, these VDEs allow robots to label points of importance in the environment, such as a room’s primary access point (Chandan et al., 2019) (see Figure 5-D).

Figure 4. A: Robot Poses (Nawab et al., 2007), B: External Sensor Purviews and Environmental Digital Twin (Omidshafiei et al., 2015), C: Virtual Control Room and External Sensor Numerical Readings (Lipton et al., 2017), D: External Sensor Images & Videos and Virtual Panels & Buttons (Li et al., 2019), E: Sensed Spatial Regions (Reardon et al., 2019), F: Robot Inherent Spatial Regions (Frank et al., 2017), G: User-Defined Spatial Regions (Sprute et al., 2019a), H: Entity Labels, Entity Locations, and Task Status (Bolano et al., 2019).

Entity Attributes convey information a robot knows about an entity’s characteristics, such as whether an entity is heavy, delicate, or dangerous; information known about the entity’s affordances; the current state of an entity (e.g., an entity that’s too hot, in a dangerous location, still drying, sleeping, charging its battery, etc.); or an entity’s geometry and shape (e.g., optimal grasp points or surface normals) (Chan et al., 2018) (see Figure 5-A).

Entity Locations highlight the locations of entities within the robot’s environment through rings, arrows, bounding boxes, etc. This VDE is especially useful when the location of an entity is occluded by walls or containers or outside of the user’s field-of-view (Dima et al., 2020). These can also be used to highlight task- and dialogue-relevant entities, either by allowing robots to passively highlight entities that are of interest to the current task or the subject of the robot’s current attention, or to actively call interlocutors’ attention to entities in the same way that humans typically would through deictic gaze and deictic gesture (Williams et al., 2019a; Sibirtseva et al., 2018; Quintero et al., 2015; Bolano et al., 2019) (see Figure 4-H).

Entity Appearances show what an entity looks like, unlike Entity Location VDEs, which communicate the location of an entity. These VDEs are primarily used when an entity is occluded from view and draw analogies to “X-ray vision” by showing users the appearance of real-life entities that are partially or fully visually hidden within the users’ environment (e.g., inside a box, behind a robot arm, or on the other side of a wall or door). This interface feature may be particularly useful in environments, such as warehouses, where objects are stored in sealed containers with contents only knowable through data representations that are exclusively computer-readable (e.g., barcodes) (Ganesan et al., 2018) (see Figure 5-B).

Robot Comprehension Visualizations – Task

Task-based Robot Comprehension Visualization VDEs display what a robot understands about its current or planned task, including where to move, how to move, what objects to act upon, and how to act on those objects. These VDEs can also convey information regarding general task understanding, such as task status and outcomes. We identify nine sub-classes of such elements:

Headings, the first type of VDE in this class, do not show the actual path the robot or its manipulators will take, but simply the direction they are currently traveling in or will be traveling next. These visualizations commonly take the form of arrows pointing in the direction of planned movement; however, they can also take more unique forms, e.g., the utilization of a Form Transformation VDE to provide an eyeless robot with virtual eyes that designers can use to provide gaze cues communicating future movement intentions (Walker et al., 2018) (see Figure 3-H). Headings may be useful for autonomous robots in crowded, shared spaces with human pedestrians or dynamic obstacles, in which navigational plans need to be recalculated by the robot quickly and frequently. Researchers have also shown how headings can be particularly effective when displayed using projectors, enabling all bystanders to see the intended movements of robots without observers needing to each wear or use specialized hardware (Shrestha et al., 2018) (see Figure 3-C).

Waypoints are environmentally-anchored visualizations of intermediate navigation points. These are typically used to visualize robot intentions but can also be used by robots to suggest spatial destinations indicating where users should move. Waypoints provide another method of previewing robot motion and can either be automatically placed in an environment by a robot trajectory planner or manually placed in the environment by a user. Waypoints are also often combined with other VDEs to show additional information known about each waypoint or what will be performed at each waypoint (Chan et al., 2018; Walker et al., 2018, 2019) (see Figure 3-C).

Callouts are visualizations that communicate where a user should focus their attention. These VDEs use visualizations to attract attention to an object or location, such as using a virtual arrow to show where a robot heard a sound or pointing at an object a user should look at (Groechel et al., 2019) (see Figure 3-F).

Spatial Previews use environment-anchored visualizations to show future poses of robots, objects, and other environmental entities. These VDEs can explicitly communicate the expected future position and/or orientation of an entity during a task. A common use for these previews is to depict where a robot will move or where a robot will move an object during a manipulation task. However, robots can also use these VDEs to make requests to users by indicating where a user should place an object. These VDEs can be depicted in various ways, including 2D circles on the ground, complex 3D wireframe or shaded models, combining waypoints with flags indicating orientation at those waypoints, or with one or more Spatial Visualization Virtual Robots (Rosen et al., 2020; Frank et al., 2017) (see Figures 3-B & 5-H).

Trajectories display spatial paths that a robot intends to follow or that it believes an object or agent will follow. These environment-anchored visualizations can show both robot navigation paths and manipulator paths. Trajectories can be visualized in various ways, such as lines, splines, or dense stroboscopic Future Robot Pose VDEs in the form of State Visualization Virtual Robots. For example, one common implementation of Trajectory VDEs consists of rendering a trajectory for each wheel on a ground robot, which helps users anticipate whether a wheel will collide with or fall into a terrain hazard. Trajectories can be used not only to indicate the path a robot will take in the future, but also the path a robot (or human) has taken in the past. Trajectory VDEs are typically used only to enhance a user’s view into a robot’s internal model, but they can also provide new opportunities for control over the robot (e.g., a user directly manipulating the trajectory visualization by grabbing the trajectory line or spline and moving it with their hands). Finally, trajectories can encode data along their paths, such as robot velocities (Walker et al., 2018; Rosen et al., 2020; Leutert et al., 2013; Walker et al., 2019) (see Figure 3-C).
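The idea of encoding velocity along a rendered path can be sketched as follows: sample the planned path at fixed time steps, attach the robot's speed at each sample, and map speed to a color for the drawn line. This is a simplified illustration under our own assumptions (a 2D pose function and a red-green gradient), not an implementation from any cited system.

```python
import math

# Sample a planned path at fixed time steps and attach the robot's speed at
# each sample, so a renderer could color the trajectory line by velocity.
def sample_trajectory(pose_fn, t_end, dt=0.1):
    """pose_fn(t) -> (x, y); returns [(x, y, speed), ...] along the path."""
    n = round(t_end / dt)
    samples = []
    prev = pose_fn(0.0)
    for i in range(1, n + 1):
        cur = pose_fn(i * dt)
        speed = math.hypot(cur[0] - prev[0], cur[1] - prev[1]) / dt
        samples.append((cur[0], cur[1], speed))
        prev = cur
    return samples

def speed_to_color(speed, max_speed):
    """Map speed to a red-green gradient (fast = red) for the rendered line."""
    frac = min(speed / max_speed, 1.0)
    return (frac, 1.0 - frac, 0.0)  # RGB in [0, 1]

# Example: a straight-line path traversed at a constant 0.5 m/s.
path = sample_trajectory(lambda t: (0.5 * t, 0.0), t_end=2.0)
```

The same sampled structure could equally be used to replay a past trajectory rather than preview a future one.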

Alteration Previews show the intended permanent modifications a robot plans to make to an object. These rendered previews give the user a chance to verify whether the robot’s plan matches the user’s task goal(s) prior to execution, and to cancel or modify the plan if needed. These previews typically show how an object will appear during or after the modifications take place (e.g., displaying the proposed path of an etching tool on an object’s surface, where holes will be drilled, how a wall will look after being painted, how a steel bar will look after being bent, etc.). Additionally, these visualizations may communicate what actions will be applied to an object, such as varying pressures applied along the surface of an object. This type of VDE is especially useful in circumstances where robot errors arising from command misinterpretation would leave the object being acted upon unusable, potentially wasting hours or days of time and resources to replace the incorrectly modified object (Chan et al., 2018; Leutert et al., 2013) (see Figure 5-A).

Command Options present to users what actions a robot can (or cannot) take in a given state. This may take the form of a robot displaying virtual imagery that indicates what object(s) it can currently pick up and/or potential grasp points the robot can utilize (Quintero et al., 2015). In addition to showing users what actions a robot can take, these VDEs also allow a robot to inform users that it is incapable of performing an action it believes the user wishes or might wish it to perform, sometimes with an explanation of why the action is not possible. These VDEs may reduce user frustration with robotic systems by avoiding situations where a robot silently fails to execute commands, and may improve user efficiency by helping users preemptively realize that a task will not be performed correctly (or at all) (Arévalo Arboleda et al., 2020) (see Figure 5-C).

Task Status VDEs convey a robot’s beliefs regarding the status of a task currently being executed or previously executed. These VDEs may be represented as traditional textual or numerical visualizations, or as more abstract visual representations, such as progress bars. These visualizations can facilitate human-robot collaborative task planning by helping users quickly and easily understand which task state a robot is currently executing, improve debugging by enabling users to compare the state a robot thinks it is in with its actual state, and indicate how much longer a task will take to complete (Bolano et al., 2019; Walker et al., 2018; Ganesan et al., 2018) (see Figure 4-H). Additionally, these VDEs can convey the status of a concluded task, whether it has resulted in success, failure, or error. Such visualizations help users understand what a robot believes the outcome of a task to be (even if that belief is incorrect, which may aid in robot debugging). They can also inform a user as to why a task resulted in failure or error, which can otherwise be a mystery to users who would need to consult complex error logs (Ganesan et al., 2018; De Pace et al., 2018).

Task Instructions enable humans and robots to effectively instruct and guide each other by communicating next steps in collaborative tasks. These instructions can take various forms, from explicit instructions written in text to more abstract cues that inform a user what to do to accomplish a task, such as using a Robot Extension Morphological Alteration VDE to add virtual arms to a robot that point at an object for a user to interact with (Groechel et al., 2019; Ganesan et al., 2018) (see Figure 5-E).

Figure 5. A: Entity Attributes and Alteration Previews (Chan et al., 2018), B: Entity Appearances (Ganesan et al., 2018), C: Command Options (Quintero et al., 2015), D: Visualization Robots, Robot Locations, and Entity Labels (Chandan et al., 2019), E: Task Instructions (Ganesan et al., 2018), F: Virtual Controllers (Krupke et al., 2016), G: Simulated Objects (Borrero and Márquez, 2012), H: Spatial Previews and Object Digital Twins (Frank et al., 2017).

Robot Functionalities and Domains Supported by VAM-HRI

Now that we have described the space of Virtual Design Elements used in VAM-HRI applications, we are ready to examine the ways in which these different VDEs have been used to enhance certain robot functionalities as well as the application domains in which those VAM-enhanced robotic functionalities have proven most useful.

VAM VDEs have been leveraged to enhance a number of fundamental robotic needs. We divide these robotic functionalities into the following categories: navigation, object manipulation, prototyping, human training, robot training, debugging, swarm supervision, and social interaction. In this section we describe the ways in which VAM VDEs have been used for each of these purposes.

4.4.1. Robot Navigation

For robotic applications that do not involve stationary robots, indoor and outdoor environments of varying scale must be safely and efficiently navigated. Our survey of the literature identified 51 papers in which AR and VR were used for this purpose. Kästner and Lambrecht (Kästner and Lambrecht, 2019), for example, present an approach in which trajectories are used to visualize a robot’s intended navigation path within human teammates’ HMDs. Moreover, they allow users to re-specify the robot’s destination by manipulating a virtual arrow used to denote that destination. Similarly, Stotko et al. (Stotko et al., 2019) present a VR-based system for remote robotic teleoperation, in which robotic sensor data is visualized directly within the user’s HMD, allowing the user to explore the robot’s environment from the robot’s perspective with a high level of immersion.

4.4.2. Robot Manipulation

While not necessary in some social domains, most task-oriented robotics applications require robots to physically manipulate objects in their environment, e.g. in assembly or sorting tasks. Our survey of the literature identified 105 papers in which AR or VR were used for this purpose. Many of these papers used AR to help humans teleoperate robotic manipulators more accurately and safely. Krupke et al. (Krupke et al., 2018), for example, present a system that superimposes a virtual Robot Digital Twin over a real robot in augmented reality. Users of this system can then control the real robot to perform a pick-and-place task through manipulation of the Robot Digital Twin and Object Digital Twins shown in their HMD. After each user command, the virtual robot simulates performance of the commanded task; if the user is satisfied, they can then trigger the real robot to perform that command. While in this use case the robot operator is present alongside the robot, VAM techniques can also help users control robot manipulation remotely. Naceri et al. (Naceri et al., 2019), for example, present a VR interface for real-time robot teleoperation in which the remote environment is visualized using streaming External Sensor Images and Videos and External Sensor 3D Data, making robot teleoperation more effective.

4.4.3. Robot Training

In many circumstances, robot end-users are tasked with teaching and/or programming a more general-purpose robot to perform a specific job, such as how to sort dishes, open and pour a bottle, or build furniture. More traditional robot training methods, such as learning from demonstration (Argall et al., 2009), can be especially challenging when minimal feedback is provided to the user regarding how well the robot is learning, why the robot is not learning, or how to provide teaching inputs to the robot. VR and AR can be used in these contexts to visualize task-relevant objects and obstacles as virtual objects. Our survey of the literature identified 31 papers that used VR or AR for such purposes. Sprute et al. (Sprute et al., 2019b), for example, present an AR system for teaching a robot the extent of its operating environment through User-Defined Spatial Regions that serve as virtual walls. These virtual borders are defined with an AR tablet interface, which the robot then uses to generate its navigation plans.

4.4.4. Human Training

Alternatively, in some domains, humans need to be trained to operate robots within a safe and constrained environment. Additionally, robots may need to share task-relevant information with their human teammates, such as instructions on how to complete a shared human-robot task. Our survey of the literature identified 35 papers in which AR and VR were used for this purpose. In these cases, VR and AR are often used to develop personalized, low-cost training environments. Many of these use cases are in the context of training users to operate vehicles (autonomous or otherwise) (Ropelato et al., 2018; Arppe et al., 2019) or training humans to train robots (Gadre et al., 2019). Driving simulations provide learners with a safer place to improve their skills without worrying about causing disturbances to others.

4.4.5. Robot Debugging

Similarly, even outside of training procedures, robot programmers as well as end users often need to determine on the fly why a robot is acting in a particular way, especially when unexpected behavior is displayed. Current robots typically require these users to parse detailed error logs to answer fairly simple questions, such as why a robot’s end effector stopped moving. This sort of question can have multiple answers, ranging from motor faults to payloads that exceed maximum weights. Our survey of the literature identified five papers in which AR and VR were used to enhance robot debugging. VAM techniques can be added to robotic systems to aid in quickly answering common debugging questions. For example, robot debugging can be enhanced by rendering virtual imagery that localizes and efficiently explains robot faults. VAM-HRI systems have been designed that visualize robotic faults within user HMDs through Virtual Cosmetic Alterations that highlight the robot parts currently experiencing faults, paired with visualizations that provide at-a-glance information about fault type (e.g., sensor fault, servo fault, end effector overloading, etc.) (Avalle et al., 2019; De Pace et al., 2018).
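The fault-highlighting pattern described above can be sketched as a simple lookup from fault type to a render directive: which robot part to highlight and what at-a-glance style to apply. The fault names, colors, and icons below are our own hypothetical choices, not taken from the cited systems.

```python
# Map a detected fault to a highlight style for the HMD overlay. All fault
# names and styles here are illustrative assumptions.
FAULT_STYLES = {
    "sensor_fault": {"color": "yellow", "icon": "sensor"},
    "servo_fault":  {"color": "red",    "icon": "gear"},
    "overload":     {"color": "orange", "icon": "weight"},
}

def fault_overlay(part_name, fault_type):
    """Return a render directive: which part to highlight and how."""
    style = FAULT_STYLES.get(fault_type, {"color": "gray", "icon": "question"})
    return {"highlight_part": part_name, **style}

overlay = fault_overlay("end_effector", "overload")
```

A renderer consuming such directives could then apply the Virtual Cosmetic Alteration to the named part and place the paired icon beside it.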

It is important to note that HRI is not restricted to interactions between robots and end-users, but includes interactions between robots and developers as well. When creating a new robot or robot algorithm, robot designers go through extensive cycles of debugging. To iterate on and improve a robotic system, these users must understand why a robot or algorithm is not performing as expected. During this testing phase, VAM-HRI techniques can allow robot designers to more easily see what a robot is thinking through virtual imagery, such as whether a robot detects an obstacle while testing an autonomous navigation algorithm (Kobayashi et al., 2007).

4.4.6. Robot Prototyping

Similarly, while many of the previous use cases have focused on on-line robot tasking, VAM can also be used in the initial design of robots before they are ready for deployment. When VAM is used for robot prototyping, virtual imagery is used to preview robot designs and/or functionality. This virtual imagery can either represent a completely virtual robot or add virtual parts to a robot - all without using physical robotic hardware. VAM robot prototyping saves both the monetary cost of robotic hardware and the hours of labor that would otherwise be needed to install or program robot parts during the design process. Our survey of the literature identified 10 papers in which AR or VR were used for this purpose. Cao et al. (Cao et al., 2018), for example, introduce a mixed reality robot prototyping system for people building DIY robots that allows users to virtually assemble and construct robots in AR with Visualization Robots and Body Extensions. Using this system, hobbyists can test their designs in mixed reality with Simulated Robots and Simulated Objects before executing those designs in real life.

4.4.7. Social Interactions

In the domain of social robotics, robot developers use a variety of design strategies to make users perceive robots as more trustworthy, engaging, and/or approachable. Our survey of the literature identified 17 papers in which AR or VR were used for this purpose. Many of these approaches operate by altering robot appearance or by enhancing robots’ communicative capabilities to enable communication that would otherwise be impossible given their inherent morphologies. For example, Zhang et al. (Zhang et al., 2019) present a system to enhance human perceptions of interaction proxemics with a mixed reality robotic avatar. In this case, the physical robot is non-humanoid, but a Form Transformation VDE is utilized by overlaying a 3D AR avatar of a human above the real robot; the avatar mimics human gaze and body motions while moving. Through its arm-swinging frequency, this visualization allows the robot to effectively communicate its moving speed to nearby humans and improves subjective perceptions of the robot.

4.4.8. Swarm Supervision

Finally, while many of the approaches above have focused on single robots, management of multi-robot systems is also a major challenge for robot designers and users. As more robots are added to a robotic system, the system becomes increasingly difficult to supervise and/or control. The number of robots that can be operated simultaneously is called the fan-out of a human-robot team; robots with high neglect tolerance and low interaction time achieve higher fan-out (Olsen Jr and Wood, 2004). Our survey of the literature identified three papers in which AR or VR were used for this purpose. In these papers, VAM-HRI researchers have investigated how VAM technologies can increase the fan-out of robotic systems and decrease the mental load of robot operators, for example by rendering virtual imagery that displays the location and status of many robots. For example, Ghiringhelli et al. (Ghiringhelli et al., 2014) present an AR interface for swarm supervision in which the supervisor is presented with Robot Status Visualization VDEs (Robot Location, Robot Pose, Waypoints, Headings, and Trajectories) overlaid on each robot.

Figure 6. Application Areas

4.5. Robot Domains

While in the previous section we explored the different core robotic functionalities that VAM-HRI technologies are being used to augment, in this section we turn our attention to the high-level application domains in which such solutions are being designed and deployed. Our coverage of these domains is guided and organized according to the set of application domains delineated by Bartneck et al. in their recent textbook (Bartneck et al., 2020): customer service robots, robots for learning, robots for entertainment, robots for healthcare and therapy, service robots, collaborative robots, self-driving cars, and remotely operated robots.

4.5.1. Collaborative Robots

Collaborative robots, unlike traditional industrial robots, have safety features and human-friendly designs that allow humans to work closely with them in manufacturing, shop-floor, or maintenance contexts. Mixed and virtual reality can provide otherwise unobservable information about the robots working in these domains, such as visualizations of the areas in which such robots work, the areas reachable by such robots, and the regions such robots are likely to move in the future. Making all of this information available to human teammates can make collaborative robots safer and more effective to work with. Our survey identified 34 papers in which VR and AR are being used in these domains.

Matsas et al. (Matsas et al., 2018), for example, prototype techniques in virtual reality to improve safety in human-robot collaborative manufacturing. Specifically, in this system a red-bordered circle is overlaid on human teammates’ view of their environment through a virtual reality interface to show the limits of the robot’s workspace, and a yellow wedge is drawn to show the movement range of the robot’s arms. If a human enters these spaces, the wedge turns red and a warning is displayed.
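The zone-warning logic described above can be sketched as a simple distance check against two nested regions. The radii and the flat 2D geometry below are our illustrative assumptions, not parameters from Matsas et al.'s system.

```python
import math

# Classify a human's position relative to a robot's arm-range wedge and
# workspace-limit circle, as in the warning scheme described above.
# Radii and the 2D simplification are hypothetical.
def zone_state(human_xy, robot_xy, arm_range=1.2, workspace_radius=2.0):
    d = math.dist(human_xy, robot_xy)
    if d <= arm_range:
        return "warning"   # wedge drawn red, warning displayed
    if d <= workspace_radius:
        return "caution"   # inside the workspace-limit circle
    return "safe"
```

A renderer would recompute this state every frame and recolor the overlaid shapes accordingly.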

4.5.2. Customer Service Robots

Customer service robots have been used as tour guides (e.g. providing information to visitors about points of interest and taking visitors to requested locations), receptionists (e.g. providing check-in processing for hotel guests), and sales promoters (e.g. providing store promotion information to customers). VAM technologies have not yet been applied widely to this domain: our survey identified a single paper within it. Specifically, Pereira et al. (Pereira et al., 2017) demonstrated an AR dialog interface that provides a compelling visual means for customers to interact with robots in restaurant contexts.

4.5.3. Robots for Learning

Robots for learning are those operating in classroom environments, as teachers, tutors, peers, or teaching assistants – or as the objects of study themselves – to help make lessons more effective and engaging. With augmented reality, new knowledge can be overlaid on physical robots so students can see this information in a spatially situated manner. Alternatively, learning content can be projected onto a surface on which students and peer robots can engage with that content together.

Our survey of prior literature identified 12 papers in this area. Johal et al. (Johal et al., 2019), for example, present an educational robot that is used to teach optical concepts regarding the visible and infrared (IR) light spectrum to K-12 students during Physics classes. Johal et al. use AR in this context to visualize information about the robot’s IR sensors, such as the direction of IR emitters, cone range, and intensity of the IR signal, all of which are visualized to students through Android tablets.

4.5.4. Robots for Entertainment

Robots have also been used for entertainment, as pets, toys, exhibitions, or in the performing arts. VAM has not been applied widely in entertainment robotics: we identified only one paper in this domain. The robot in this paper (Urbani et al., 2018) is not a traditional entertainment robot of the sort mentioned above, but a multipurpose wearable (wrist-worn) robot whose many functionalities are invoked through an AR system. It consists of six interlinked servomotors fastened together using plastic brackets. The robot follows users’ commands to change speeds, turn to specific angles, or set torque limits, and can wrap around or stand straight up from a user’s wrist. Through an AR headset, a window appears above the robot, from which users can see a robot status display, shape-changing menus, a media player on which they can watch videos, and a robot pose controller.

4.5.5. Robots for Healthcare and Therapy

Robots are widely applied in healthcare and therapy. They offer support for senior citizens by helping to detect adverse medical events or by providing enhanced mobility. They are also used in the context of therapy, e.g. for people with autism spectrum disorder, people undergoing rehabilitation, and people undergoing surgery (e.g. by providing assistance in laparoscopic surgery). To support robotic surgery, AR can highlight anatomical structures and overlay the surgery plan and robot and instrument status on the main visual source (e.g. the patient’s body or a monitor). AR can also be used to visualize the planned trajectory of a surgical robot’s needle so the surgeon can check whether it is valid.

Our survey identified 71 papers in which AR and VR are used with robots in such domains. AR has been used in this domain primarily to support robotic-assisted surgery (RAS). Qian et al. (Qian et al., 2019), for example, present an AR system called ARssist to aid the first assistant in robotic-assisted laparoscopic surgeries. This system visualizes the robotic instruments and endoscope inside the patient’s body using the assistant’s HMD. This approach has the potential to improve efficiency, navigation consistency, and safety for instrument insertion. VAM environments have also been used in the context of therapeutic and assistive robots such as robotic wheelchairs. Zolotas et al. (Zolotas and Demiris, 2019), for example, present an augmented reality system that can help users control their wheelchairs safely and independently. Users of this system observe a mini-map utility that shows the wheelchair’s future trajectory, potential obstacles, and potential collisions.

4.5.6. Service Robots

Service robots (as opposed to the previously discussed customer service robots) perform simple and repetitive tasks in service of humans, such as house-cleaning, delivery, or security operations, as well as other dull or dangerous tasks such as space exploration and emergency response. Many service robots work remotely by themselves. In these cases, the remote environment is displayed to the operator using 3D rendering from point cloud data or video streams. If humans and service robots are co-located, such as in certain types of search and rescue tasks, robot state information such as location, battery level, condition, trajectory, and so forth can be shown within their teammates’ HMDs.

Our survey identified 15 papers in which AR and VR are used with robots in such domains. San Martín and Kildal (San Martín and Kildal, 2019), for example, present a multimodal AR system that allows robots to warn human teammates more effectively about hazards during navigation through unfamiliar spaces, by way of hazard-area visualizations in teammates’ HoloLens interfaces. Service robots can also be helpful in facilitating repetitive tasks in field domains such as agriculture. Huuskonen and Oksanen (Huuskonen and Oksanen, 2019), for example, provide an AR interface that allows farmers to simultaneously monitor multiple autonomous tractors.

4.5.7. Self-Driving Cars

Self-driving cars (and other autonomous vehicles) are robots that can automatically navigate between locations in large-scale human environments, especially those that can operate on semi-structured human transportation infrastructure such as roads and highways. One of the main proposed benefits of self-driving cars is to allow autonomous driving when humans are too fatigued to safely operate their vehicles. However, current self-driving cars have a number of limitations that require humans to be able to quickly take over and intervene. Since on-road autonomous car training is dangerous and expensive, VAM technology has been applied to design autonomous car training programs. During VR training, simulators shown in HMDs present users with information about the virtual car such as the speed limit, distance traveled, and current speed. In AR training programs, the user is trained in a real car on a designated road while wearing a see-through AR HMD that can display introductory videos, the car’s state information, and instructions to the user. Our survey revealed two papers in this domain. For example, Sportillo et al. (Sportillo et al., 2018) present a virtual reality training program for autonomous vehicle operators that helps such operators improve their ability to quickly retake vehicle control when necessary.

4.5.8. Remotely Operated Robots

Remotely operated robots are robots that are controlled by humans from different places (a use case that overlaps with some of the other application domains described above). In this scenario, human operators usually receive a visualization of the remote environment as rendered point cloud data or a video stream. Via a VAM interface, operators can see robot status, future trajectory, heading, and grasping points overlaid on the environment visualization.

Our survey identified 39 papers in this domain. For example, Zollmann et al. (Zollmann et al., 2014) present an AR interface for piloting unmanned aerial vehicles (UAVs). In this paper, information about the UAV is displayed in the user’s AR HMD: a virtual sphere acts as a waypoint for the UAV, a virtual shadow is displayed on the ground, and a line is drawn between the waypoint and the shadow to show how high the UAV is above the ground. Multiple virtual spheres are connected to form the UAV’s trajectory, helping the user supervise the UAV’s path and intervene if they detect a potential collision. Kent et al. (Kent et al., 2017), on the other hand, design an AR interface to help users teleoperate robots in an object manipulation task. In their interface, when the remote operator clicks on an object, a semi-transparent sphere appears over the selected object with several blue grasping points. After the operator chooses one of the grasp points, a 3D model of the robot’s hand/end-effector appears, allowing the user to choose the grasping angle. Once the grasping pose is confirmed, the real robot arm executes the grasp.

Finally, Gharaybeh et al. (Chizeck and others, 2019) present an MR system for teleoperating robot arms to defuse hazardous undetonated underwater munitions, an otherwise very dangerous task for human divers. After a robot arm is submerged, its teleoperator can use visualizations of the arm’s LiDAR point cloud sensor data to see the ocean floor, the undetonated munition, and other helpful visual aids, including a 3D model of the robot arm. This 3D information gives operators depth perception, enhancing control of the remote robot arm so that munitions can be defused more safely and easily.

Taxonomy Takeaways and the Future of VAM-HRI

As seen in Figure 6, the majority of VAM-HRI papers surveyed fell into non-specified application areas, where the VAM-HRI research aimed to improve robot interactions in a more general sense by enhancing common robot functionalities (i.e., navigation, manipulation, etc.) that are required across a wide range of domains.

However, one of the most common specified application areas in recent VAM-HRI research is that of collaborative industrial robots. This trend is unsurprising, as traditional industrial robots are being replaced by a new generation of robots that are more easily deployable and can be programmed and set up in-house. In the past, industrial robots were housed behind fences due to their large size and capacity to harm nearby humans, but collaborative robots are breaking down this physical barrier and working alongside their human counterparts. With humans now working directly with robots in industrial settings, more effort has been made to improve interactions with such robots, including the use of VAM-HRI interfaces to assist communication between human and robot during collaborative tasks.

As a second step in this analysis, we consider VDE use trends. While task comprehension VDEs (including sub-classes such as waypoints, trajectories, and task instructions) have historically been the most well-explored variety of VAM-HRI VDEs, other classes, such as environment comprehension VDEs, are now attracting increased attention. Object comprehension and virtual robot alteration VDEs are the newest variants to be explored, first introduced in 2004 and 2006, respectively. These VDEs are still relatively unexplored and may be fruitful avenues for further research.

Overall, in this paper we have explored a wide range of research that demonstrates that VAM-HRI is an active and rapidly growing research area. We believe, however, that this field is currently hindered by a lack of precise terminology and theoretical models that explain how work in the field may connect and build on each other. The taxonomy we have presented for identifying, grouping, and classifying key design elements across VAM-HRI systems helps to address this issue and highlights potential design elements that have yet to appear in the research literature, which may serve as fertile ground for future research. For example, in the creation of this taxonomy we realized that Special Effect Alteration VDEs had yet to be explored in the scientific literature; when the VAM-HRI work done to date is viewed from the high-level vantage of the VDE Table, this hole in the research landscape becomes clear. It is our hope that other researchers can use the table in a similar fashion to guide their own efforts. We also hope our work will help VAM-HRI grow into a mainstream field by providing researchers in the community with the lexicon needed to easily understand, describe, and relate the designs in their own systems to other relevant work, while also helping researchers reason about which areas require further exploration or represent entirely novel areas of inquiry.

Additionally, the VAM VDE Taxonomy can be used as a catalogue and/or cookbook for robot designers interested in enhancing their robots and their interactions through VAM technology. We envision developers being able to see all the VDEs available, including their hierarchical categories and classes, when deploying VAM-HRI systems, and being able to pick and choose the VDE(s) that best address their VAM-HRI design’s challenges and purpose.

Finally, it is important to note that as research and technology in this emerging field mature, the taxonomy presented in this article is destined to change and grow. It thus serves as a jumping-off point for organizing and inspiring the future work performed within this field.


  • J. Allspaw, J. Roche, N. Lemiesz, M. Yannuzzi, and H. A. Yanco (2018) Remotely teleoperating a humanoid robot to perform fine motor tasks with virtual reality. In Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI).
  • S. Arévalo Arboleda, T. Dierks, F. Rücker, and J. Gerken (2020) There’s more than meets the eye: enhancing robot control through augmented visual cues. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, pp. 104–106.
  • B. D. Argall, S. Chernova, M. Veloso, and B. Browning (2009) A survey of robot learning from demonstration. Robotics and Autonomous Systems 57 (5), pp. 469–483.
  • D. F. Arppe, L. Zaman, R. W. Pazzi, and K. El-Khatib (2019) UniNet: a mixed reality driving simulator.
  • G. Avalle, F. De Pace, C. Fornaro, F. Manuri, and A. Sanna (2019) An augmented reality system to support fault visualization in industrial robotic tasks. IEEE Access 7, pp. 132343–132359.
  • C. Bartneck, T. Belpaeme, F. Eyssel, T. Kanda, M. Keijsers, and S. Šabanović (2020) Human-robot interaction: an introduction. Cambridge University Press.
  • A. K. Bejczy, W. S. Kim, and S. C. Venema (1990) The phantom robot: predictive displays for teleoperation with time delay. In Proceedings, IEEE International Conference on Robotics and Automation, pp. 546–551.
  • G. Bolano, C. Juelg, A. Roennau, and R. Dillmann (2019) Transparent robot behavior using augmented reality in close human-robot interaction. In 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1–7.
  • A. M. Borrero and J. A. Márquez (2012) A pilot study of the effectiveness of augmented reality to enhance the use of remote labs in electrical engineering education. Journal of Science Education and Technology 21 (5), pp. 540–557.
  • J. Bosch, P. Ridao, R. Garcia, and N. Gracias (2016) Towards omnidirectional immersion for ROV teleoperation. Proceedings of Jornadas de Automática, Madrid, Spain.
  • G. C. Burdea (1996) Virtual reality and robotics in medicine. In Proceedings 5th IEEE International Workshop on Robot and Human Communication (RO-MAN’96 TSUKUBA), pp. 16–25.
  • Y. Cao, Z. Xu, T. Glenn, K. Huo, and K. Ramani (2018) Ani-bot: a modular robotics system supporting creation, tweaking, and usage with mixed-reality interactions. In Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction, pp. 419–428.
  • W. P. Chan, C. P. Quintero, M. K. Pan, M. Sakr, H. M. Van der Loos, and E. Croft (2018) A multimodal system using augmented reality, gestures, and tactile feedback for robot trajectory programming and execution. In Proceedings of the ICRA Workshop on Robotics in Virtual Reality, Brisbane, Australia, pp. 21–25.
  • K. Chandan, V. Kudalkar, X. Li, and S. Zhang (2019) Negotiation-based human-robot collaboration via augmented reality. arXiv preprint arXiv:1909.11227.
  • C. Chang, J. Lee, C. Wang, and G. Chen (2010) Improving the authentic learning experience by integrating robots into the mixed-reality environment. Computers & Education 55 (4), pp. 1572–1578. Cited by: §2.
  • H. Chizeck et al. (2019) Telerobotic control in virtual reality. Ph.D. Thesis. Cited by: §4.5.8.
  • F. De Pace, F. Manuri, A. Sanna, and D. Zappia (2018) An augmented interface to display industrial robot faults. In International Conference on Augmented Reality, Virtual Reality and Computer Graphics, pp. 403–421. Cited by: §4, §4, §4.4.5.
  • S. C. de Vries and P. Padmos (1997) Steering a simulated unmanned aerial vehicle using a head-slaved camera and hmd. In Head-Mounted Displays II, Vol. 3058, pp. 24–33. Cited by: §2.
  • E. Dima, K. Brunnström, M. Sjöström, M. Andersson, J. Edlund, M. Johanson, and T. Qureshi (2020) Joint effects of depth-aiding augmentations and viewing positions on the quality of experience in augmented telepresence. Quality and User Experience 5 (1), pp. 2. Cited by: §4.
  • A. D. Dragan, K. C. Lee, and S. S. Srinivasa (2013) Legibility and predictability of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 301–308. Cited by: §1.
  • J. S. Dyrstad, E. R. Øye, A. Stahl, and J. R. Mathiassen (2018) Teaching a robot to grasp real fish by imitation learning from a human supervisor in virtual reality. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7185–7192. Cited by: §2.
  • J. A. Frank, M. Moorhead, and V. Kapila (2016) Realizing mixed-reality environments with tablets for intuitive human-robot collaboration for object manipulation tasks. In 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 302–307. Cited by: §2.
  • J. A. Frank, M. Moorhead, and V. Kapila (2017) Mobile mixed-reality interfaces that enhance human–robot interaction in shared spaces. Frontiers in Robotics and AI 4, pp. 20. Cited by: §3.3, §3, Figure 4, Figure 5, §4, §4, §4.
  • E. Freund and J. Rossmann (1999) Projective virtual reality: bridging the gap between virtual reality and robotics. IEEE transactions on robotics and automation 15 (3), pp. 411–422. Cited by: §2.
  • S. Y. Gadre, E. Rosen, G. Chien, E. Phillips, S. Tellex, and G. Konidaris (2019) End-user robot programming using mixed reality. In 2019 International Conference on Robotics and Automation (ICRA), pp. 2707–2713. Cited by: §4.4.4.
  • R. K. Ganesan, Y. K. Rathore, H. M. Ross, and H. B. Amor (2018) Better teaming through visual cues: how projecting imagery in a workspace can improve human-robot collaboration. IEEE Robotics & Automation Magazine 25 (2), pp. 59–71. Cited by: Figure 5, §4, §4, §4.
  • A. P. García, G. V. Fernández, B. M. P. Torres, and F. López-Peña (2011) Mixed reality educational environment for robotics. In 2011 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems Proceedings, pp. 1–6. Cited by: §2.
  • F. Ghiringhelli, J. Guzzi, G. A. Di Caro, V. Caglioti, L. M. Gambardella, and A. Giusti (2014) Interactive augmented reality for understanding and analyzing multi-robot systems. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1195–1201. Cited by: §4.4.8.
  • J. M. Gregory, C. Reardon, K. Lee, G. White, K. Ng, and C. Sims (2019) Enabling intuitive human-robot teaming using augmented reality and gesture control. arXiv preprint arXiv:1909.06415. Cited by: §3.3, §3.3.
  • T. Groechel, Z. Shi, R. Pakkar, and M. J. Matarić (2019) Using socially expressive mixed reality arms for enhancing low-expressivity robots. In 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1–8. Cited by: Figure 3, §4, §4, §4.
  • S. Hashimoto, A. Ishida, M. Inami, and T. Igarashi (2011) Touchme: an augmented reality based remote robot manipulation. In The 21st International Conference on Artificial Reality and Telexistence, Proceedings of ICAT2011, Vol. 2. Cited by: §3.3, §3, §3, §4, §4.
  • H. Hedayati, M. Walker, and D. Szafir (2018) Improving collocated robot teleoperation with augmented reality. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, pp. 78–86. Cited by: §2, §3.3, §4.
  • K. Higuchi and J. Rekimoto (2013) Flying head: a head motion synchronization mechanism for unmanned aerial vehicle control. In CHI’13 Extended Abstracts on Human Factors in Computing Systems, pp. 2029–2038. Cited by: §3.
  • B. Hine, P. Hontalas, T. Fong, L. Piguet, E. Nygren, and A. Kline (1995) VEVI: a virtual environment teleoperations interface for planetary exploration. SAE transactions, pp. 615–628. Cited by: §2.
  • B. Hoppenstedt, T. Witte, J. Ruof, K. Kammerer, M. Tichy, M. Reichert, and R. Pryss (2019) Debugging quadrocopter trajectories in mixed reality. In International Conference on Augmented Reality, Virtual Reality and Computer Graphics, pp. 43–50. Cited by: §2.
  • J. Huuskonen and T. Oksanen (2019) Augmented reality for supervising multirobot system in agricultural field operation. IFAC-PapersOnLine 52 (30), pp. 367–372. Cited by: §4.5.6.
  • R. Ibrahimov, E. Tsykunov, V. Shirokun, A. Somov, and D. Tsetserukou (2019) Dronepick: object picking and delivery teleoperation with the drone controlled by a wearable tactile display. In 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1–6. Cited by: §2.
  • K. Ishii, S. Zhao, M. Inami, T. Igarashi, and M. Imai (2009) Designing laser gesture interface for robot control. In IFIP Conference on Human-Computer Interaction, pp. 479–492. Cited by: §3.
  • M. L. Iuzzolino, M. E. Walker, and D. Szafir (2018) Virtual-to-real-world transfer learning for robots on wilderness trails. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 576–582. Cited by: §2, §4.
  • W. Johal, O. Robu, A. Dame, S. Magnenat, and F. Mondada (2019) Augmented robotics for learners: a case study on optics. In 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1–6. Cited by: §4.5.3.
  • L. Kästner and J. Lambrecht (2019) Augmented-reality-based visualization of navigation data of mobile robots on the microsoft hololens-possibilities and limitations. In 2019 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM), pp. 344–349. Cited by: §4.4.1.
  • D. Kent, C. Saldanha, and S. Chernova (2017) A comparison of remote robot teleoperation interfaces for general object manipulation. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, pp. 371–379. Cited by: §4.5.8.
  • W. Kim, F. Tendick, and L. Stark (1987) Visual enhancements in pick-and-place tasks: human operators controlling a simulated cylindrical manipulator. IEEE Journal on Robotics and Automation 3 (5), pp. 418–425. Cited by: §2.
  • K. Kobayashi, K. Nishiwaki, S. Uchiyama, H. Yamamoto, S. Kagami, and T. Kanade (2007) Overlay what humanoid robot perceives and thinks to the real-world by mixed reality system. In 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 275–276. Cited by: §3.3, §4, §4.4.5.
  • T. Kot and P. Novák (2014) Utilization of the oculus rift hmd in mobile robot teleoperation. In Applied Mechanics and Materials, Vol. 555, pp. 199–208. Cited by: item, Figure 3, §4, §4, §4, §4.
  • D. Krupke, L. Einig, E. Langbehn, J. Zhang, and F. Steinicke (2016) Immersive remote grasping: realtime gripper control by a heterogenous robot control system. In Proceedings of the 22nd ACM Conference on Virtual Reality Software and Technology, pp. 337–338. Cited by: §3, Figure 5, §4.
  • D. Krupke, F. Steinicke, P. Lubos, Y. Jonetzko, M. Görner, and J. Zhang (2018) Comparison of multimodal heading and pointing gestures for co-located mixed reality human-robot interaction. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9. Cited by: §3, §4, §4, §4.4.2.
  • F. Leutert, C. Herrmann, and K. Schilling (2013) A spatial augmented reality system for intuitive display of robotic data. In 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 179–180. Cited by: §4, §4.
  • J. Li, R. Balakrishnan, and T. Grossman (2019) StarHopper: a touch interface for remote object-centric drone navigation. Cited by: Figure 4, §4, §4.
  • J. I. Lipton, A. J. Fay, and D. Rus (2017) Baxter’s homunculus: virtual reality spaces for teleoperation in manufacturing. IEEE Robotics and Automation Letters 3 (1), pp. 179–186. Cited by: item, §3, §3, Figure 4, §4, §4.
  • E. Matsas, G. Vosniakos, and D. Batras (2018) Prototyping proactive and adaptive techniques for human-robot collaboration in manufacturing using virtual reality. Robotics and Computer-Integrated Manufacturing 50, pp. 168–180. Cited by: §4.5.1.
  • J. Meyer, A. Sendobry, S. Kohlbrecher, U. Klingauf, and O. Von Stryk (2012) Comprehensive simulation of quadrotor uavs using ros and gazebo. In International conference on simulation, modeling, and programming for autonomous robots, pp. 400–411. Cited by: §4.
  • S. Meyer zu Borgsen, P. Renner, F. Lier, T. Pfeiffer, and S. Wachsmuth (2018) Improving human-robot handover research by mixed reality techniques. In VAM-HRI 2018. The Inaugural International Workshop on Virtual, Augmented and Mixed Reality for Human-Robot Interaction. Proceedings, Cited by: Figure 3, §4, §4, §4.
  • P. Milgram and F. Kishino (1994) A taxonomy of mixed reality visual displays. IEICE TRANSACTIONS on Information and Systems 77 (12), pp. 1321–1329. Cited by: Figure 1, §3.1.
  • P. Milgram, A. Rastogi, and J. J. Grodski (1995) Telerobotic control using augmented reality. In Proceedings 4th IEEE International Workshop on Robot and Human Communication, pp. 21–29. Cited by: §2.
  • P. Milgram, S. Zhai, D. Drascic, and J. Grodski (1993) Applications of augmented reality for human-robot communication. In Proceedings of 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’93), Vol. 3, pp. 1467–1472. Cited by: §2.
  • S. Mori, S. Ikeda, and H. Saito (2017) A survey of diminished reality: techniques for visually concealing, eliminating, and seeing through real objects. IPSJ Transactions on Computer Vision and Applications 9 (1), pp. 1–14. Cited by: §4.
  • R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos (2015) ORB-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics 31 (5), pp. 1147–1163. Cited by: §3.3.
  • A. Naceri, D. Mazzanti, J. Bimbo, D. Prattichizzo, D. G. Caldwell, L. S. Mattos, and N. Deshpande (2019) Towards a virtual reality interface for remote robotic teleoperation. In 2019 19th International Conference on Advanced Robotics (ICAR), pp. 284–289. Cited by: §4.4.2.
  • A. Nawab, K. Chintamani, D. Ellis, G. Auner, and A. Pandya (2007) Joystick mapped augmented reality cues for end-effector controlled tele-operated robots. In 2007 IEEE Virtual Reality Conference, pp. 263–266. Cited by: Figure 4, §4.
  • D. A. Norman and S. W. Draper (1986) User centered system design; new perspectives on human-computer interaction. L. Erlbaum Associates Inc.. Cited by: §1.
  • D. R. Olsen Jr and S. B. Wood (2004) Fan-out: measuring human control of multiple robots. In Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 231–238. Cited by: §4.4.8.
  • S. Omidshafiei, A. Agha-Mohammadi, Y. F. Chen, N. K. Üre, J. P. How, J. L. Vian, and R. Surati (2015) Mar-cps: measurable augmented reality for prototyping cyber-physical systems. In AIAA Infotech@ Aerospace, pp. 0643. Cited by: Figure 4, §4.
  • A. Pereira, E. J. Carter, I. Leite, J. Mars, and J. F. Lehman (2017) Augmented reality dialog interface for multimodal teleoperation. In 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 764–771. Cited by: §4.5.2.
  • L. Pérez, E. Diez, R. Usamentiaga, and D. F. García (2019) Industrial robot control and operator training using virtual reality interfaces. Computers in Industry 109, pp. 114–120. Cited by: §2.
  • L. Qian, A. Deguet, Z. Wang, Y. Liu, and P. Kazanzides (2019) Augmented reality assisted instrument insertion and tool manipulation for the first assistant in robotic surgery. In 2019 International Conference on Robotics and Automation (ICRA), pp. 5173–5179. Cited by: §4.5.5.
  • C. P. Quintero, O. Ramirez, and M. Jägersand (2015) Vibi: assistive vision-based interface for robot manipulation. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4458–4463. Cited by: §2, Figure 5, §4, §4.
  • C. Reardon, K. Lee, and J. Fink (2018) Come see this! augmented reality to enable human-robot cooperative search. In 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pp. 1–7. Cited by: §3.3, §3.3.
  • C. Reardon, K. Lee, J. G. Rogers, and J. Fink (2019) Augmented reality for human-robot teaming in field environments. In International Conference on Human-Computer Interaction, pp. 79–92. Cited by: Figure 4, §4.
  • S. Ropelato, F. Zünd, S. Magnenat, M. Menozzi, and R. Sumner (2018) Adaptive tutoring on a virtual reality driving simulator. International SERIES on Information Systems and Management in Creative EMedia (CreMedia) 2017 (2), pp. 12–17. Cited by: §4.4.4.
  • E. Rosen, D. Whitney, E. Phillips, G. Chien, J. Tompkin, G. Konidaris, and S. Tellex (2020) Communicating robot arm motion intent through mixed reality head-mounted displays. In Robotics Research, pp. 301–316. Cited by: item, Figure 3, §4, §4, §4, §4.
  • A. San Martín and J. Kildal (2019) Audio-visual ar to improve awareness of hazard zones around robots. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–6. Cited by: §4.5.6.
  • J. Sanghvi, G. Castellano, I. Leite, A. Pereira, P. W. McOwan, and A. Paiva (2011) Automatic analysis of affective postures and body motion to detect engagement with a game companion. In Proceedings of the 6th international conference on Human-robot interaction, pp. 305–312. Cited by: §1.
  • A. Segal, D. Haehnel, and S. Thrun (2009) Generalized-ICP. In Robotics: Science and Systems, Vol. 2, pp. 435. Cited by: §3.3.
  • M. C. Shrestha, T. Onishi, A. Kobayashi, M. Kamezaki, and S. Sugano (2018) Communicating directional intent in robot navigation using projection indicators. In 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 746–751. Cited by: §4.
  • E. Sibirtseva, D. Kontogiorgos, O. Nykvist, H. Karaoguz, I. Leite, J. Gustafson, and D. Kragic (2018) A comparison of visualisation methods for disambiguating verbal requests in human-robot interaction. In 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 43–50. Cited by: §4, §4.
  • D. Sportillo, A. Paljic, and L. Ojeda (2018) Get ready for automated driving using virtual reality. Accident Analysis & Prevention 118, pp. 102–113. Cited by: §4.5.7.
  • D. Sprute, K. Tönnies, and M. König (2019a) A study on different user interfaces for teaching virtual borders to mobile robots. International Journal of Social Robotics 11 (3), pp. 373–388. Cited by: Figure 4, §4.
  • D. Sprute, P. Viertel, K. Tönnies, and M. König (2019b) Learning virtual borders through semantic scene understanding and augmented reality. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4607–4614. Cited by: §4.4.3.
  • M. Stilman, P. Michel, J. Chestnutt, K. Nishiwaki, S. Kagami, and J. Kuffner (2005) Augmented reality for robot development and experimentation. Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-05-55 2 (3). Cited by: §4.
  • P. Stotko, S. Krumpen, M. Schwarz, C. Lenz, S. Behnke, R. Klein, and M. Weinmann (2019) A vr system for immersive teleoperation and live exploration with a mobile robot. arXiv preprint arXiv:1908.02949. Cited by: §4.4.1.
  • D. Sun, A. Kiselev, Q. Liao, T. Stoyanov, and A. Loutfi (2020) A new mixed-reality-based teleoperation system for telepresence and maneuverability enhancement. IEEE Transactions on Human-Machine Systems 50 (1), pp. 55–67. Cited by: §2, item, §3, §4, §4.
  • I. E. Sutherland (1965) The ultimate display. Multimedia: From Wagner to virtual reality 1. Cited by: §2.
  • I. E. Sutherland (1968) A head-mounted three dimensional display. In Proceedings of the December 9-11, 1968, fall joint computer conference, part I, pp. 757–764. Cited by: §2.
  • D. Szafir, B. Mutlu, and T. Fong (2014) Communication of intent in assistive free flyers. In Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, pp. 358–365. Cited by: §1.
  • D. Szafir, B. Mutlu, and T. Fong (2015) Communicating directionality in flying robots. In 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 19–26. Cited by: §1, §4.
  • A. V. Taylor, A. Matsumoto, E. J. Carter, A. Plopski, and H. Admoni Diminished reality for close quarters robotic telemanipulation. Cited by: Figure 3, §4.
  • S. Tellex, R. Knepper, A. Li, D. Rus, and N. Roy (2014) Asking for help using inverse semantics. Robotics: Science and Systems Foundation. Cited by: §1.
  • J. Urbani, M. Al-Sada, T. Nakajima, and T. Höglund (2018) Exploring augmented reality interaction for everyday multipurpose wearable robots. In 2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 209–216. Cited by: §4.5.4.
  • M. E. Walker, H. Hedayati, and D. Szafir (2019) Robot teleoperation with augmented reality virtual surrogates. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 202–210. Cited by: §3.3, Figure 3, §4, §4, §4.
  • M. Walker, H. Hedayati, J. Lee, and D. Szafir (2018) Communicating robot motion intent with augmented reality. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, pp. 316–324. Cited by: §2, §3.3, Figure 3, §4, §4, §4, §4, §4, §4, §4.
  • D. Whitney, E. Rosen, D. Ullman, E. Phillips, and S. Tellex (2018) Ros reality: a virtual reality framework using consumer-grade hardware for ros-enabled robots. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9. Cited by: §3.
  • L. Wijnen, S. Lemaignan, and P. Bremner (2020) Towards using virtual reality for replicating hri studies. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, pp. 514–516. Cited by: §2.
  • T. Williams, M. Bussing, S. Cabrol, E. Boyle, and N. Tran (2019a) Mixed reality deictic gesture for multi-modal robot communication. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 191–201. Cited by: §4.
  • T. Williams, L. Hirshfield, N. Tran, T. Grant, and N. Woodward (2020) Using augmented reality to better study human-robot interaction. In International Conference on Human-Computer Interaction, pp. 643–654. Cited by: §2.
  • T. Williams, D. Szafir, T. Chakraborti, and H. Ben Amor (2018) Virtual, augmented, and mixed reality for human-robot interaction. In Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, pp. 403–404. Cited by: §1.
  • T. Williams, D. Szafir, and T. Chakraborti (2019b) The reality-virtuality interaction cube. In Proceedings of the 2nd International Workshop on Virtual, Augmented, and Mixed Reality for HRI, Cited by: §1, §4.
  • Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox (2017) PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199. Cited by: §3.3.
  • J. Zhang, O. Janeh, N. Katzakis, D. Krupke, and F. Steinicke (2019) Evaluation of proxemics in dynamic interaction with a mixed reality avatar robot.. In ICAT-EGVE, pp. 37–44. Cited by: §4, §4.4.7.
  • S. Zollmann, C. Hoppe, T. Langlotz, and G. Reitmayr (2014) FlyAR: augmented reality supported micro aerial vehicle navigation. IEEE transactions on visualization and computer graphics 20 (4), pp. 560–568. Cited by: §4.5.8.
  • M. Zolotas and Y. Demiris (2019) Towards explainable shared control using augmented reality. Cited by: §4.5.5.