Recent advances in Virtual Reality (VR) and Augmented Reality (AR) have blurred the boundaries between the virtual and the physical world, introducing a new dimension for Human-Robot Interaction (HRI). With new dedicated hardware [21, 26, 28], VR affords easy modifications of the environment and its physical laws for HRI; it has already facilitated various applications that are otherwise difficult to conduct in the physical world, such as psychology studies [37, 51, 46] and AI agent training [23, 38, 50, 49].
In comparison, AR is not designed to alter the physical laws. By overlaying symbolic/semantic information and visual aids as holograms, its existing applications primarily focus on assistance in HRI, e.g., interfacing [48, 56, 55], [4, 9, 45], robot control [19, 60], and programming [27, 35]. Such a confined range of applications limits its usefulness in broader fields.
We argue that this deficiency is due to the setting adopted in prior AR work; we call it an active human, passive robot paradigm, as illustrated by the red arrows in Fig. 1. In such a paradigm, the virtual holograms displayed in AR introduce asymmetric perceptions to humans and robots; from two different views, the robot and human agents may possess different amounts of information. This form of information asymmetry prevents the robot from properly assisting humans during collaborations. This paradigm also relies heavily on a one-way communication channel, which comes with a significant intrinsic limit: only human agents can initiate communication, whereas a robot can only passively execute the commands sent by humans and cannot proactively manipulate or interact with the augmented and physical environment.
To overcome these issues, we introduce a new active human, active robot paradigm and propose a shared AR workspace, which affords shared perception and manipulation for both human agents and robots; see Fig. 1:
Shared perception among human agents and robots. In contrast to existing work in AR that only enhances human agents’ understanding of robotic systems, the shared AR workspace dispatches perceptual information of the augmented environment to both human agents and robots equivalently. By sharing the same augmented knowledge, a robot can properly assist its human partner during HRI; the robot can accomplish Level 1 Visual Perspective Taking (VPT1) by inferring whether a human agent perceives certain holograms and estimating the associated costs.
Shared manipulation on AR holograms. In addition to manipulating physical objects, the shared AR workspace endows a robot with the capability to manipulate holograms proactively, in the same way as a human agent does, which instantly triggers an update of the shared perception. As a result, HRI in the shared AR workspace permits a more seamless and harmonious collaboration.
We develop a prototype system using a Microsoft HoloLens and a TurtleBot2, and demonstrate the efficacy of the shared AR workspace in a case study of a resource collection game.
The remainder of the paper is organized as follows. Section II introduces the system setup and details some critical system components. The two essential functions, shared perception and shared manipulation of the proposed shared AR workspace, are described in Section III. Section IV demonstrates the efficacy of the proposed system by a case study, and Section V concludes the paper with discussions on some related fields the system could potentially promote.
II System Setup
In this section, we describe the prototype system that demonstrates the concept of the shared AR workspace; Fig. 2 depicts the system architecture. Our prototype system assumes (i) a human agent wearing an AR device and (ii) a robot with perception sensors; however, the system should be able to scale up to multi-human, multi-robot settings.
We choose a TurtleBot2 mobile robot with a ROS-compatible laptop as the robot platform; see Fig. 2a. The robot’s perception module includes a Kinect 2 RGB-D sensor and a Hokuyo Lidar, which together construct the environment’s 3D structure using RTAB-Map. Once the map is built, the robot only needs to localize itself within the map by fusing visual and wheel odometry.
Human agents in the present study wear a Microsoft HoloLens as the AR device; see Fig. 2b. The HoloLens headset integrates a 32-bit Intel Atom processor and runs the Windows 10 operating system onboard. Using Microsoft’s Holographic Processing Unit, users can realistically view the augmented contents as holograms. The AR environment is created using the Unity3D game engine.
Real-time interactions in the shared AR workspace demand timely communication between HoloLens (human agents) and robots, which is established using ROS#. Between the two parties, HoloLens serves as the client, which publishes the poses of holograms, whereas the robot serves as the server, which receives these messages and integrates them into ROS. In addition to the perceptual information obtained by its sensors, the robot also has access to the 3D models of holograms so that they can be rendered appropriately and augmented to the shared perception.
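To make the client-to-server message flow concrete, the following is a minimal sketch of one hologram pose message. It assumes the ROS# client reaches ROS through a rosbridge WebSocket server and sends JSON "publish" operations; the topic name, frame name, and the choice of a ROS 1 geometry_msgs/PoseStamped layout are illustrative assumptions, not details taken from the prototype.

```python
import json


def make_pose_publish(topic, frame_id, position, orientation):
    """Build a rosbridge-style 'publish' operation carrying a PoseStamped.

    topic and frame_id are hypothetical names chosen for illustration.
    position is (x, y, z); orientation is a quaternion (x, y, z, w).
    """
    x, y, z = position
    qx, qy, qz, qw = orientation
    message = {
        "op": "publish",
        "topic": topic,
        "msg": {
            # ROS 1 style header; the stamp is left at zero in this sketch.
            "header": {
                "seq": 0,
                "stamp": {"secs": 0, "nsecs": 0},
                "frame_id": frame_id,
            },
            "pose": {
                "position": {"x": x, "y": y, "z": z},
                "orientation": {"x": qx, "y": qy, "z": qz, "w": qw},
            },
        },
    }
    return json.dumps(message)


# Example: one hologram one meter in front of the (assumed) headset world origin.
payload = make_pose_publish(
    "/holograms/tomato/pose", "hololens_world", (1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0)
)
```

On the robot side, a subscriber to the same topic would receive the pose as an ordinary ROS message, which is what lets the hologram be treated like any other perceptual input.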
The shared perception in the shared AR workspace allows a robot to perceive virtual holograms at three levels of increasing depth: (i) know the existence of holograms in the environment, (ii) see the holograms from the robot’s current pose, obtained by localizing itself using physical sensors, and (iii) infer the human agent’s utility/cost of seeing holograms. Take the example shown in Fig. 2e: human agents can directly see objects in the yellow region as it is within their Field-of-View (FoV), but they need to change their view to perceive the objects marked in light blue; objects in dark blue are fully occluded. Only with such multi-level inference can the robot properly initiate interactions or collaborations with the human, forming bi-directional communication. For instance, in Fig. 2f, the robot estimates that a hologram is occluded from the human agent’s current view, then plans a path and carries this occluded hologram to help the human agent accomplish a task. Since the robot proactively helps the human agent, this new AR paradigm contrasts with the prior one-directional communication.
III Shared AR Workspace
Below we describe the shared perception and shared manipulation implemented in the shared AR workspace.
III-A Detection and Transformation
A key feature of the shared AR workspace is the ability to know where the holograms are at all times, which requires localizing human agents, robots, and holograms, and constructing transformations among them. Using an AR headset, the human agent’s location is directly obtained. Given the corresponding transformation between a human agent and a hologram, the AR headset can render the hologram in the human agent’s egocentric view.
By estimating the human pose from a single RGB-D image, the robot establishes a transformation to the human agent; Fig. 2c shows one example. Specifically, the frame of a human agent is attached to the head, whose axis is aligned with the human face’s orientation estimated by three key points: two eyes and the neck. When the human agent is partially or completely outside of the robot’s FoV, the frame of the human agent is directly estimated by leveraging the visual odometry provided by the HoloLens.
By combining the above two transformations, the transformation from the robot to a hologram can be computed by composing the robot-to-human and human-to-hologram transformations. The transformations and the coordinates of human agents, robots, and virtual holograms are represented in the same coordinate frame for easy retrieval by the robot.
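The composition step can be illustrated with homogeneous matrices. The sketch below assumes 4x4 transforms restricted to a rotation about the vertical axis plus a translation; the names T_robot_human, T_human_hologram, and T_robot_hologram are notation introduced here for illustration, since the paper's own symbols are not recoverable from this text.

```python
import math


def make_transform(yaw, tx, ty, tz):
    """4x4 homogeneous transform: rotate by yaw about z, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a * b for 4x4 nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


# Robot-to-human transform (e.g., from pose estimation): the human stands
# 2 m ahead of the robot and is rotated 90 degrees relative to it.
T_robot_human = make_transform(math.pi / 2, 2.0, 0.0, 0.0)
# Human-to-hologram transform (e.g., from the headset): 1 m in front of the human.
T_human_hologram = make_transform(0.0, 1.0, 0.0, 0.0)

# Chaining the two gives the hologram pose in the robot frame.
T_robot_hologram = compose(T_robot_human, T_human_hologram)
hologram_in_robot_frame = [row[3] for row in T_robot_hologram[:3]]
```

With these illustrative values, the hologram lands at roughly (2, 1, 0) in the robot frame: 2 m ahead and 1 m to the side.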
III-B Augmenting Holograms
Only knowing the existence of holograms is insufficient; the robot ought to “see” the holograms in a way that can be naturally processed by its internal modules (e.g., planning, reconstruction). We design a rendering schema to “augment” holograms to the robot and incorporate them into the robot’s ROS data messages, such as 3D point clouds and 2D images.
3D Point Clouds
The holograms rendered for human agents are stored in a mesh format. To render them in 3D for robots, we use a sampling-based method to convert holograms to point clouds. With the established transformations, these holograms are augmented to the robot’s point cloud input with both position and color information; see Fig. 4(a) for examples of rendered holograms for the robot.
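As an illustration of mesh-to-point-cloud conversion, the sketch below uses a generic area-weighted surface sampler. It is not necessarily the sampling method used in the prototype; the function names are hypothetical, and color is omitted for brevity.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edges."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = [
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    ]
    return 0.5 * math.sqrt(sum(v * v for v in cross))


def sample_mesh(vertices, triangles, n_points, rng=None):
    """Sample n_points uniformly over the mesh surface.

    Triangles are chosen with probability proportional to their area, and a
    point is drawn inside each chosen triangle via barycentric coordinates.
    """
    rng = rng or random.Random(0)
    areas = [triangle_area(*(vertices[i] for i in tri)) for tri in triangles]
    points = []
    for tri in rng.choices(triangles, weights=areas, k=n_points):
        a, b, c = (vertices[i] for i in tri)
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)  # the square root keeps the density uniform
        u, v, w = 1.0 - s, s * (1.0 - r2), s * r2
        points.append(tuple(u * a[i] + v * b[i] + w * c[i] for i in range(3)))
    return points


# A unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_triangles = [(0, 1, 2), (0, 2, 3)]
cloud = sample_mesh(square_vertices, square_triangles, 200)
```

The sampled points would then be transformed into the robot's frame and merged with the sensor point cloud.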
2D Image Projection
We render the holograms by projecting them onto the robot’s received images. Following a general rendering pipeline, we retrieve the hologram’s coordinates with respect to the camera frame using the established transformation and calculate the 2D pixel position, given the camera’s intrinsic matrix.
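The projection step can be sketched with the standard pinhole camera model. The intrinsic parameters fx, fy, cx, and cy below are placeholder values, not the calibration of the robot's actual camera, and lens distortion is ignored.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame (z forward) to pixel coordinates.

    Returns None when the point lies on or behind the image plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


# Placeholder intrinsics for a 640x480 image.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

center_pixel = project_point((0.0, 0.0, 2.0), FX, FY, CX, CY)
offset_pixel = project_point((1.0, 0.0, 2.0), FX, FY, CX, CY)
```

A point on the optical axis maps to the principal point (cx, cy); pixels that fall outside the image bounds would simply be skipped when drawing the hologram.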
III-C Visual Perspective Taking
Simply “knowing” and “seeing” holograms would not be sufficient for a robot to help the human agent in the shared AR workspace proactively. Instead, to collaborate, plan, and execute properly, the robot would need to possess the ability to infer whether others can see an object. Such an ability to attribute others’ perspective is known as Level 1 Visual Perspective Taking (VPT1) [13, 22]. Specifically, we hope to endow the robot in the shared AR workspace with capabilities of inferring (i) whether the human agent can see certain objects, and (ii) how difficult it is.
VPT1 of a robot is devised and implemented at both the object level and the scene level. At the object level, we define the human agent’s cost to see an object as a function proportional to the angle between the human agent’s current facing direction and the direction from the human agent to the object; see an illustration in Fig. 3. The facing direction is jointly determined by the pose detection from the robot’s view and the IMU embedded in HoloLens. The system also accounts for the visibility of objects as they may be occluded by other real/virtual objects in the environment. To identify an occluded object, multiple virtual rays are emitted from the AR headset’s FoV to the points in a standard plane whose pose is updated along with the human agent’s pose. The object is identified as occluded if any of those rays intersects with (i) other holograms whose poses are known in the system, or (ii) real objects or structures whose surfaces are detected by HoloLens’s spatial mapping.
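The object-level cost and the occlusion test can be sketched as follows. This is a simplified illustration: the cost is taken to be the angle itself, a single line of sight replaces the multiple rays described above, and blockers are approximated by spheres rather than by hologram meshes or spatial-mapping surfaces. All function names are hypothetical.

```python
import math


def viewing_cost(head_pos, facing_dir, object_pos):
    """Angle in radians between the facing direction and the direction to the object."""
    to_obj = [o - h for o, h in zip(object_pos, head_pos)]
    dot = sum(f * t for f, t in zip(facing_dir, to_obj))
    norm = math.sqrt(sum(f * f for f in facing_dir)) * math.sqrt(sum(t * t for t in to_obj))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def is_occluded(head_pos, object_pos, blockers):
    """True if the head-to-object segment passes through any blocker sphere.

    blockers is a list of (center, radius) pairs standing in for other
    holograms or detected real surfaces.
    """
    d = [o - h for o, h in zip(object_pos, head_pos)]
    seg_len2 = sum(v * v for v in d)
    for center, radius in blockers:
        # Parameter of the closest point on the segment to the sphere center.
        t = sum((c - h) * v for c, h, v in zip(center, head_pos, d)) / seg_len2
        t = max(0.0, min(1.0, t))
        closest = [h + t * v for h, v in zip(head_pos, d)]
        dist2 = sum((p - c) ** 2 for p, c in zip(closest, center))
        if dist2 < radius ** 2:
            return True
    return False


head = (0.0, 0.0, 0.0)
facing = (1.0, 0.0, 0.0)
cost_ahead = viewing_cost(head, facing, (3.0, 0.0, 0.0))
cost_side = viewing_cost(head, facing, (0.0, 2.0, 0.0))
blocked = is_occluded(head, (4.0, 0.0, 0.0), [((2.0, 0.0, 0.0), 0.5)])
```

An object straight ahead has zero cost, one directly to the side costs a quarter turn, and a blocker sitting on the line of sight marks the object as occluded.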
At the scene level, we categorize the augmented environment into three regions: (i) Focusing region, highlighted in yellow in Fig. 2e, is considered within the human’s FoV excluding occluded regions; it is determined by the FoV of HoloLens, an area centered at the human’s eye. (ii) Transition region, highlighted in light blue, does not directly appear in the human’s FoV, but it can be perceived with minimal effort (e.g., by turning the head). (iii) Blocked region, highlighted in dark blue, is occluded and cannot be seen by merely rotating the view angle; the human agent has to traverse the space with large body movements, e.g., spaces under tables are typical Blocked regions.
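The three-region categorization can be expressed as a small decision rule. The sketch below is an assumption-laden simplification: the half-FoV angles are placeholder values rather than the headset's specification, and the flag occluded_from_all_headings is a hypothetical input standing in for the outcome of the occlusion analysis.

```python
import math


def classify_region(yaw_offset, pitch_offset, occluded_from_all_headings,
                    half_fov_h, half_fov_v):
    """Assign a location to the focusing, transition, or blocked region.

    yaw_offset and pitch_offset are the angular offsets (radians) of the
    location from the human's facing direction.
    """
    if occluded_from_all_headings:
        return "blocked"  # not visible by head rotation alone
    if abs(yaw_offset) <= half_fov_h and abs(pitch_offset) <= half_fov_v:
        return "focusing"  # inside the current field of view
    return "transition"  # visible after turning the head


# Placeholder half-angles, chosen only for illustration.
HALF_FOV_H = math.radians(15.0)
HALF_FOV_V = math.radians(8.5)

in_view = classify_region(math.radians(5.0), 0.0, False, HALF_FOV_H, HALF_FOV_V)
to_the_side = classify_region(math.radians(60.0), 0.0, False, HALF_FOV_H, HALF_FOV_V)
under_table = classify_region(math.radians(5.0), 0.0, True, HALF_FOV_H, HALF_FOV_V)
```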
III-D Interacting with Holograms
By “knowing” and “seeing” holograms, and even “inferring” how human agents perceive them in the shared AR workspace, the robot could subsequently plan and manipulate these holograms as an active user in the very same way as a human agent does. However, the holograms are not yet tangible for the robot to “interact” with. In our prototype system, we devise a simple rule-based algorithm to determine the conditions under which a robot’s interaction with holograms is triggered.
Fig. 4 illustrates the core idea. After obtaining a hologram’s 3D mesh, the algorithm fits a circumscribed sphere to the mesh and an enlarged sphere to the robot itself. Once the robot’s sphere is sufficiently close to the hologram’s (i.e., the two spheres intersect), a manipulation mode is triggered: the hologram is attached to the robot and moves together with it. The movements are also synced in the shared perception to the human agent in real time; see Fig. 2f. Since the present study adopts a ground mobile robot, we project the spheres to circles on the floor plane to simplify the intersection check. More sophisticated interactions, such as a mobile manipulator grasping a hologram in 3D space, are achievable using standard collision checking methods.
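The floor-plane trigger can be sketched as a circle intersection test. The sketch assumes a z-up frame (so the floor is the xy-plane), uses a centroid-centered enclosing circle rather than a minimal circumscribed one, and treats the enlargement factor as a hypothetical tuning parameter.

```python
import math


def enclosing_circle(vertices):
    """Circle on the floor plane that encloses all mesh vertices.

    The center is the centroid of the projected vertices; the radius is the
    largest distance from it. This circle encloses the mesh but is not
    guaranteed to be the smallest such circle.
    """
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    radius = max(math.hypot(x - cx, y - cy) for x, y in zip(xs, ys))
    return (cx, cy), radius


def should_attach(robot_center, robot_radius, holo_center, holo_radius,
                  enlargement=1.2):
    """True when the enlarged robot circle intersects the hologram circle."""
    dist = math.hypot(robot_center[0] - holo_center[0],
                      robot_center[1] - holo_center[1])
    return dist <= robot_radius * enlargement + holo_radius


# Hologram mesh vertices (x, y, z); z is dropped by the floor projection.
mesh = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (2.0, 2.0, 0.0), (0.0, 2.0, 1.0)]
holo_center, holo_radius = enclosing_circle(mesh)

far_away = should_attach((5.0, 1.0), 0.2, holo_center, holo_radius)
close_by = should_attach((2.5, 1.0), 0.2, holo_center, holo_radius)
```

Once should_attach returns True, the hologram's pose would be updated rigidly with the robot's pose until it is released.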
The last component of the system is the planner. In fact, the shared AR workspace poses no constraints on task and motion planning algorithms; the decision should be made mainly based on robot platforms (e.g., ground mobile robot, mobile manipulator, humanoid) and executed tasks (e.g., HRI, navigation, prediction) during the interactions; see the next section for the planning schema adopted in this paper.
IV-A Experimental Setup
We design a resource collection game in the shared AR workspace to demonstrate the efficacy of the system. Fig. 4(a) depicts the environment. Six holograms, rendered as point clouds and highlighted in circles with zoomed-in views, are placed around the human agent (marked by a red skeleton at the center of the room), whose facing direction is indicated in yellow. Some holograms can be easily seen, whereas others are harder to see due to their tricky locations in 3D or occlusion (e.g., object 6). The human agent’s task is to collect all holograms and move them to the table as fast as possible. The robot, stationed at the green dot, helps the human collect the resources.
As described in Section III-C, the robot first estimates the cost for a human agent to see the holograms and whether they are occluded; the result is shown in Fig. 4(b). In our prototype system, the robot prioritizes occluded holograms and then switches to the one with the highest cost. In the future, it is possible to integrate prediction models (e.g., [14, 12, 34]) that anticipate human behaviors.
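The prioritization rule can be written as a short selection function. The data layout (dictionaries with name, cost, and occluded fields) and the example values are hypothetical; ordering several occluded holograms by cost is an assumption, since the rule above does not specify how ties among occluded holograms are broken.

```python
def next_target(holograms):
    """Pick the hologram the robot should fetch next.

    Occluded holograms come first; among equally (un)occluded ones, the
    hologram with the highest viewing cost wins.
    """
    if not holograms:
        return None
    best = max(holograms, key=lambda h: (h["occluded"], h["cost"]))
    return best["name"]


# Illustrative values only, not measurements from the study.
remaining = [
    {"name": "bottle", "cost": 0.3, "occluded": False},
    {"name": "tomato", "cost": 2.1, "occluded": False},
    {"name": "cabbage", "cost": 1.0, "occluded": True},
]

first_pick = next_target(remaining)
second_pick = next_target([h for h in remaining if h["name"] != first_pick])
```

In this illustrative setup the robot first fetches the occluded hologram and then the visible one with the highest cost, leaving the cheapest one to the human.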
IV-B Qualitative Results
Intuitively, we should see a better overall performance during HRI via the shared AR workspace, since its shared perception and manipulation enable a robot to proactively and collaboratively help the human agent complete the task.
Fig. 6 gives an example of a complete process, demonstrating a natural interaction between the human agent and the robot to accomplish a given task collaboratively. The top row shows the human agent’s egocentric views through the HoloLens, which overlays the holograms on the image captured by its PV camera. The middle row is a sequence of the interactions between the robot and holograms from a third-person view. The bottom row reveals the robot’s knowledge of the workspace and its plans. In this particular trial, the human agent first collected the roman tomato and the bottle as they appeared to have a lower cost. In parallel, the robot carried two holograms, the occluded cabbage and the tomato with the highest cost, to the human agent.
IV-C Quantitative Results
We conduct a pilot study to evaluate the shared AR workspace quantitatively. Twenty participants were recruited to assess the robot’s performance in a between-subject setting (ten for each group). The participants in the Human group were asked to find and collect all six holograms by themselves. The participants in the Human+Robot group used the shared AR workspace system, where the robot proactively helped the participants accomplish the task. No participant was familiar with the physical environment beforehand, but each received simple training on how to use the AR device right before the experiment started.
Fig. 7 compares the results between the two groups. The difference in completion time is statistically significant. Participants with the robot’s help take significantly less time to complete the given task. In contrast, the baseline group requires much more time with a larger variance. This finding indicates a new role that a robot can play in the shared AR workspace by assisting human agents to accomplish a task collaboratively.
V Related Work and Discussion
We design, implement, and demonstrate how the shared perception and manipulation provided by the shared AR workspace improve HRI with a proof-of-concept system using a resource collection game. In the future, more complex and diverse HRI studies are needed to further examine the benefits and limits of the shared AR workspace by (i) varying the degree of the human agent’s and/or robot’s perception and manipulation capability; e.g., only the robot can see and act on holograms while the human agent cannot, the opposite of the current AR setup, and (ii) introducing virtual components to avoid certain costly and dangerous setups in the physical world. Below, we briefly review related work and scenarios that the shared AR workspace could potentially facilitate.
The idea of creating a shared workspace for human agents and robots has been implemented in VR, where they can re-target views to each other to interact with virtual objects. Prior studies have demonstrated advantages in teleoperation and robot policy learning. More recently, systems [11, 54] that allow multiple users to interact with the same AR elements have been devised. In comparison, the shared AR workspace deals with the perceptual noise in the physical world and promotes robots to become active users in AR that work on tasks with humans collaboratively.
In recent years, Human-Robot Interaction and Collaboration have been developing with increasing breadth and depth. One core challenge of the field is to determine how the robot or the human should act to promote understanding and trust, usually in terms of predictability, with the other. From a robot’s angle, it models humans by inferring goals [25, 32], tracking mental states [6, 52], predicting actions, and recognizing intention and attention [31, 16]. From a human agent’s perspective, the robot needs to be more expressive, to promote human trust, to assist properly [18, 30], and to generate proper explanations of its behavior. We believe the proposed shared AR workspace is an ideal platform for evaluating and benchmarking existing and new algorithms and models.
Human-robot teaming [10, 41] poses new challenges to computational models aiming to endow robots with Theory of Mind abilities, which are usually studied in a dyadic scenario. With its adaptability to multi-party settings and fine-grained control over users’ situational awareness, the proposed shared AR workspace offers a unique solution to test the robot’s ability to maintain the beliefs, intentions, and desires [15, 47, 52] of other agents. Crucially, the robot would play the role of a collaborator to help and of a moderator to accommodate each agent. The ultimate goal is to forge a shared agency [42, 40] between robots and human agents for seamless collaboration.
How human cognition emerges and develops is a fundamental question. Researchers have looked into primates’ collaboration and communication, imitation, and crows’ high-level reasoning, planning, and tool making for deeper insights. Cognitive robots are still in their infancy in developing such advanced cognitive capabilities, despite various research efforts [5, 58]. These experimental settings can be replicated relatively easily in the shared AR workspace, which would open up new avenues to study how similar behaviors could emerge in a robot.
References
- (2001) Interactive computer graphics: a top-down approach with OpenGL primer package. Prentice-Hall, Inc.
- ROS sharp. https://github.com/siemens/rossharp, accessed 2020-01-15.
- (1998) Collaborative virtual environments: an introductory review of issues and systems. Virtual Reality 3 (1), pp. 3–15.
- (2006) Augmented reality visualisation for Player. In International Conference on Robotics and Automation (ICRA).
- (2019) Embodiment in socially interactive robots. Foundations and Trends® in Robotics 7 (4), pp. 251–356.
- (2016) An implemented theory of mind to improve human-robot shared plans execution. In ACM/IEEE International Conference on Human-Robot Interaction (HRI).
- (2019) A tale of two explanations: enhancing human trust by explaining robot behavior. Science Robotics 4 (37).
- (1996) Action recognition in the premotor cortex. Brain 119 (2), pp. 593–609.
- (2014) Interactive augmented reality for understanding and analyzing multi-robot systems. In International Conference on Intelligent Robots and Systems (IROS).
- (2017) Computational design of mixed-initiative human–robot teaming that considers human factors: situational awareness, workload, and workflow preferences. International Journal of Robotics Research (IJRR) 36 (5-7), pp. 597–617.
- (2017) Design and evaluation of a handheld-based 3D user interface for collaborative object manipulation. In ACM Conference on Human Factors in Computing Systems (CHI).
- (2018) Preference-based assistance prediction for human-robot collaboration tasks. In International Conference on Intelligent Robots and Systems (IROS).
- (2009) Visual perspective taking impairment in children with autistic spectrum disorder. Cognition 113 (1), pp. 37–44.
- (2007) Cost-based anticipatory action selection for human–robot fluency. Transactions on Robotics (T-RO) 23 (5), pp. 952–961.
- (2016) Inferring human intent from video by sampling hierarchical plans. In International Conference on Intelligent Robots and Systems (IROS).
- (2016) Anticipatory robot control for efficient human-robot collaboration. In ACM/IEEE International Conference on Human-Robot Interaction (HRI).
- (1996) Manufacture and use of hook-tools by New Caledonian crows. Nature 379 (6562), pp. 249–251.
- (2015) May I help you?: design of human-like polite approaching behavior. In ACM/IEEE International Conference on Human-Robot Interaction (HRI).
- (2015) Intuitive visual teleoperation for UGVs using free-look augmented reality displays. In International Conference on Robotics and Automation (ICRA).
- (2014) Online global loop closure detection for large-scale multi-session graph-based SLAM. In International Conference on Intelligent Robots and Systems (IROS).
- (2014) Head tracking for the Oculus Rift. In International Conference on Robotics and Automation (ICRA).
- (1977) The development in very young children of tacit knowledge concerning visual perception. Genetic Psychology Monographs.
- (2016) A virtual reality platform for dynamic human-scene interaction. In SIGGRAPH ASIA 2016 Virtual Reality meets Physical Reality: Modelling and Simulating Virtual Humans and Environments.
- (2017) Baxter’s homunculus: virtual reality spaces for teleoperation in manufacturing. Robotics and Automation Letters (RA-L) 3 (1), pp. 179–186.
- (2016) Goal inference improves objective and perceived performance in human-robot collaboration. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
- (2017) A glove-based system for studying hand-object manipulation via joint pose and force sensing. In International Conference on Intelligent Robots and Systems (IROS).
- (2018) Interactive robot knowledge patching using augmented reality. In International Conference on Robotics and Automation (ICRA).
- (2019) High-fidelity grasping in virtual reality using a glove-based system. In International Conference on Robotics and Automation (ICRA).
- (2019) Chimpanzees (Pan troglodytes) coordinate by communicating in a collaborative problem-solving task. Proceedings of the Royal Society B 286 (1901), pp. 20190408.
- (2016) A multi-modal perception based architecture for a non-intrusive domestic assistant robot. In ACM/IEEE International Conference on Human-Robot Interaction (HRI).
- (2010) Mightability maps: a perceptual level decisional framework for co-operative and competitive human-robot interaction. In International Conference on Intelligent Robots and Systems (IROS).
- (2016) Human-robot shared workspace collaboration via hindsight optimization. In International Conference on Intelligent Robots and Systems (IROS).
- (1978) Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences 1 (4), pp. 515–526.
- (2020) A generalized Earley parser for human activity parsing and prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
- (2018) Robot programming through augmented trajectories in augmented reality. In International Conference on Intelligent Robots and Systems (IROS).
- (2011) 3D is here: Point Cloud Library (PCL). In International Conference on Robotics and Automation (ICRA).
- (2016) Who turned the clock? Effects of manipulated zeitgebers, cognitive load and immersion on time estimation. IEEE Transactions on Visualization & Computer Graphics (TVCG) 22 (4), pp. 1387–1395.
- (2018) AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics.
- (2017) Robot moderation of a collaborative game: towards socially assistive robotics in group interactions. In International Symposium on Robot and Human Interactive Communication (RO-MAN).
- (2020) Intuitive signaling through an “imagined w”. In the Annual Meeting of the Cognitive Science Society (CogSci).
- (2010) Planning for human-robot teaming in open worlds. Transactions on Intelligent Systems and Technology (TIST) 1 (2), pp. 1–24.
- (2020) Bootstrapping an imagined we for cooperation. In the Annual Meeting of the Cognitive Science Society (CogSci).
- (2009) Do New Caledonian crows solve physical problems through causal reasoning? Proceedings of the Royal Society B 276 (1655), pp. 247–254.
- (2018) Human-aware robotic assistant for collaborative assembly: integrating human motion prediction with planning in time. Robotics and Automation Letters (RA-L) 3 (3), pp. 2394–2401.
- (2018) Communicating robot motion intent with augmented reality. In ACM/IEEE International Conference on Human-Robot Interaction (HRI).
- (2018) Spatially perturbed collision sounds attenuate perceived causality in 3D launching events. In Conference on Virtual Reality and 3D User Interfaces (VR).
- (2018) Where and why are they looking? Jointly inferring human attention and intentions in complex tasks.
- (2017) Assistive grasping with an augmented reality user interface. International Journal of Robotics Research (IJRR) 36 (5-7), pp. 543–562.
- Learning virtual grasp with failed demonstrations via Bayesian inverse reinforcement learning. In International Conference on Intelligent Robots and Systems (IROS).
- (2019) VRGym: a virtual testbed for physical and interactive AI. In ACM Turing Celebration Conference-China.
- (2017) The Martian: examining human physical judgments across virtual gravity fields. IEEE Transactions on Visualization & Computer Graphics (TVCG) 23 (4), pp. 1399–1408.
- (2020) Joint inference of states, robot knowledge, and human (false-)beliefs. In International Conference on Robotics and Automation (ICRA).
- Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. In International Conference on Robotics and Automation (ICRA).
- (2018) CARS: collaborative augmented reality for socialization. In International Workshop on Mobile Computing Systems & Applications.
- (2020) Congestion-aware evacuation routing using augmented reality devices. In International Conference on Robotics and Automation (ICRA).
- (2019) Vision-tangible interactive display method for mixed and virtual reality: toward the human-centered editable reality. Journal of the Society for Information Display.
- (2018) Cost functions for robot motion style. In International Conference on Intelligent Robots and Systems (IROS).
- (2020) Dark, beyond deep: a paradigm shift to cognitive AI with humanlike common sense. Engineering 6 (3), pp. 310–345.
- (2018) 3D human pose estimation in RGBD images for robotic task learning. In International Conference on Robotics and Automation (ICRA).
- (2018) Head-mounted augmented reality for explainable robotic wheelchair assistance. In International Conference on Intelligent Robots and Systems (IROS).