Effect of Visual Cues on Pointing Tasks in Co-located Augmented Reality Collaboration

by Lei Chen et al.
Xi'an Jiaotong-Liverpool University

Visual cues are essential in computer-mediated communication. They are especially important when communication happens in a collaboration scenario that requires focusing several users' attention on a specific object among other similar ones. This paper explores the effect of visual cues on pointing tasks in co-located Augmented Reality (AR) collaboration. A user study (N = 32, 16 pairs) was conducted to compare two types of visual cues: Pointing Line (PL) and Moving Track (MT). Both are head-based visual techniques. Through a series of collaborative pointing tasks on objects with different states (static and dynamic) and density levels (low, medium, and high), the results showed that PL was better on task performance and usability, but MT was rated higher on social presence and user preference. Based on our results, some design implications are provided for pointing tasks in co-located AR collaboration.






1. Introduction

Augmented Reality (AR) has been used for collaborative experiences. It allows users to interact with shared virtual content while having a view of the real world (Lukosch et al., 2015; Billinghurst and Kato, 2002; Yamazoe and Yonezawa, 2014). For these collaborative experiences to be efficient and positive, there needs to be fluid communication between the collaborators. In addition, knowing what the collaborator is doing or which objects are being looked at can improve the sense of awareness and social presence (Kim et al., 2018, 2019; Gerbaud and Arnaldi, 2008; Kraut et al., 1996). One advantage of using AR head-mounted displays (HMDs) is their ability to capture users’ head and gaze movements and use the data to provide visual augmentation cues (Erickson et al., 2020; Huang et al., 2018b; Xu et al., 2019). Such cues can enhance situational awareness and social presence (Bai et al., 2020) and improve user performance and usability in collaborative AR (Piumsomboon et al., 2019).

Previous research on AR primarily focused on remote collaboration (e.g., (Piumsomboon et al., 2017; Higuch et al., 2016; Huang et al., 2018a, b)). It is not clear how visual cues could help enhance collaboration when users are in the same physical environment. Many applications (e.g., games, education, training) involve pointing tasks during co-located collaboration, so it is important to understand which techniques are appropriate for such tasks across different object states and density levels. Users could be interacting with objects that are not easily touchable (e.g., moving objects) and that are clustered in dense regions alongside other objects with similar properties. Attempting to describe or pinpoint an object of interest using verbal descriptions and hand gestures may not be practical or efficient. In this research, we explore the use of visual cues to enhance the identification of target objects in collaborative AR, where these objects can be static or dynamic and appear at various levels of density.

Pointers (Erickson et al., 2020; Kim et al., 2013b; Yu et al., 2019) and annotations (Huang et al., 2019; LaViola et al., 1998) are the two main visual cues that have been explored. The majority of prior research focused on hand-based techniques, e.g., carrying a pointer or drawing an annotation in a virtual environment. However, these hand-based techniques may not be efficient or ideal for AR systems (Lu et al., 2021, 2020). For one, the hands can cause severe occlusion because the field-of-view of AR HMDs is usually narrow. Also, most AR systems, with the exception of the Magic Leap, do not come with a pointing device or controllers.

Therefore, in this research, we explore two hands-free, head-based techniques to simulate visual cues provided by pointers and annotations. Pointing Line (PL) is used to simulate pointers. It indicates a user’s line of sight and focus of attention. Moving Track (MT) simulates the use of annotations. It records a continuous moving track to help identify a target. Based on the two techniques, we investigate the effect of visual cues on task performance, usability, and social presence when interacting with virtual objects in a co-located AR. To do this, we ran a user study with two visual cues (PL and MT) to allow paired users to share information about the object of interest and identify it. The experiment involved both static and dynamic objects in three levels of density (low, medium, and high). Figure 1 (above) shows an overview of the experiment setup.

In short, the paper makes the following two contributions:

  • We report the results of a user study comparing the two types of visual cues based on hands-free, head-based techniques in co-located AR scenarios for pointing and selection tasks;

  • Based on the results, we propose implications on the design and use of these visual cues for co-located AR collaboration.

2. Related Work

2.1. Collaborative AR Systems and Social Presence

Collaboration is the “mutual engagement of participants in a coordinated effort to solve a problem together” (Roschelle and Teasley, 1995). Collaborative AR systems allow users to interact with shared AR content as naturally as with physical objects, and to complete a task or achieve a common goal (Billinghurst and Kato, 1999; Li et al., 2017; Chen et al., 2020). Researchers have shown that AR systems can effectively support a group of users to perform collaborative activities (Tait and Billinghurst, 2015; Kaufmann, 2003; Szalavári et al., 1998; Billinghurst et al., 2002b).

During collaboration, it is key to be aware of where and which object(s) the collaborator is interested in or interacting with (Lacoche et al., 2017; Antunes et al., 2001; Dourish and Bellotti, 1992). Providing visual, nonverbal cues is beneficial for improving the awareness of users' actions and the sense of being together, namely social presence (Buxton, 2009; Gergle et al., 2013). In the early days, telepointers and cursors (Greenberg et al., 1996) were explored to support user awareness of others' actions in a shared workspace on traditional platforms (e.g., desktop displays). More recently, multimodal cues have been used in collaborative environments, typically combining auditory and visual elements. For example, virtual avatars have been explored to represent each collaborator and to provide an increased awareness of others in the shared environment (Gutwin et al., 1996; Piumsomboon et al., 2018; Jo et al., 2016). In AR environments, researchers explored the use of virtual arrows to represent collaborators' head directions (Chastine et al., 2007) and miniature virtual avatars to show collaborators' gaze directions and body gestures (Piumsomboon et al., 2018). Although using avatars can contribute to users' perceived social presence, it adds extra visual elements to the limited display and field-of-view of current AR HMDs. Thus, this solution may not be ideal when there are multiple objects in the environment. Using pointer and annotation cues is an uncluttered alternative to support social presence and collaboration in AR. Specifically, sharing gaze pointing cues has been explored in remote collaboration with AR devices (Ishii et al., 1993; Lee et al., 2017; Higuch et al., 2016). For example, Ishii et al. (Ishii et al., 1993) and Higuchi et al. (Higuch et al., 2016) reported that when users shared their workspace, gaze pointing cues provided a better understanding of where their partner was looking. Lee et al. (Lee et al., 2017) and Higuchi et al. (Higuch et al., 2016) found that sharing gaze pointing cues significantly improved users' awareness of each other's focus and joint attention.

In addition to gaze pointing cues, augmenting hand pointing cues via gestures was shown to be able to facilitate users’ perception of others’ actions. For example, Piumsomboon et al. (Piumsomboon et al., 2018) reported that redirected gaze and gestures in their Mini-Me system improved users’ awareness of the partner in a collaborative AR interface. Yang et al. (Yang et al., 2020) stated that visual head frustum and hand gestures intuitively demonstrated the remote user’s movements and target positions. Kim et al. (Kim et al., 2019) added sketch cues in addition to gestures and demonstrated an improved task efficiency with the enhanced visual cues.

Although previous research has proven the positive effects of visual cues on supporting social presence in collaborative activities, most of these studies focused on remote AR systems. The effect of visual cues in co-located AR collaboration is still largely underexplored. It is not clear how different visual cues may enable and enhance users’ perceived social presence when they are co-located in the same physical space. Therefore, we put our focus on the study of visual cues in co-located AR environments.

2.2. Visual Cues in Collaboration

Two noticeable visual cues were presented in previous work on collaboration: pointer and annotation (Kim et al., 2013b; Huang et al., 2019; Teo et al., 2018; Erickson et al., 2020). Pointer cues provide a pointing line, indicating a user’s line of sight and focus of attention. Annotation cues record a moving track to identify a target, such as a track of hand or head positions. Here, we discuss these two types of visual cues in detail.

Pointing Line Cues

Previous work showed that pointing line cues can effectively support communication (Greenberg et al., 1996; Duval et al., 2014; Oda et al., 2015; Sousa et al., 2019; Sakata et al., 2003; Gupta et al., 2016). For example, Gupta et al. (Gupta et al., 2016) reported that presenting users' gaze directions using a pointer significantly improved the sense of co-presence between users in remote collaboration. Piumsomboon et al. (Piumsomboon et al., 2019) visualized three types of pointing line cues: a field-of-view frustum, an eye-gaze ray, and a head-gaze ray. They reported that these cues significantly improved user performance, usability, and subjective preference. They also found that the head-gaze ray was significantly less confusing to use than the field-of-view frustum.

Most previous research explored the effect of pointing line cues driven by hand gestures or eye gaze (Bai et al., 2020; Sousa et al., 2019). With hand gestures, if users' hands are occupied, they must deliberately release the object in their hand before pointing to a target. Some research used eye gaze to provide pointing line cues (Špakov et al., 2019; Blattgerste et al., 2018), stating that eye gaze can lead to better performance and better teamwork than head gaze, i.e., the direction a user is facing towards but not necessarily looking at. However, eye gaze exhibits many micro-movements, and having to keep the eyes stable when pointing at an object may lead to fatigue. Head-based techniques, on the other hand, could mitigate this issue: a gaze ray from a user's head to a target object can support observers' awareness of their collaborator's attention and allow them to view the same object (Anthes and Volkert, 2005). However, studies on head-based pointing line cues are very limited, especially in co-located AR collaboration.

Moving Track Cues

Moving track is one of the most widely studied visual communication cues (Tang and Minneman, 1991; Rekimoto and Nagao, 1995). It has been found to be more effective than pointing line cues for communicating spatial information (Kim et al., 2013b; Fussell et al., 2004). While pointing line cues present a point around the target object, moving track cues provide users with a track, namely a series of points leading to the target, and can help guide users' focus to key locations (Lu et al., 2021). Early research studied the use of moving track cues on a shared video view (Tang and Minneman, 1991; Ishii et al., 1994). Researchers have since explored extensively the use of moving track cues in remote collaborative physical tasks. For example, Billinghurst et al. (Billinghurst et al., 2002a) presented a method to record stabilized moving tracks in an annotation-based AR system. Teo et al. (Teo et al., 2018) reported that local users in remote collaboration felt the task was easier and more enjoyable with moving track cues, and that the cues helped them understand the attention and focus of the remote user. A recent study (Kim et al., 2019) compared four combinations of visual cues provided by hand gesture, pointer, and sketch. It reported that participants completed tasks faster and felt a higher level of usability when sketch (moving track) cues were provided.

Although there have been studies on moving track cues, such as annotations and drawings, this research mainly used hand-based techniques and focused on remote collaboration (Huang et al., 2018b; Teo et al., 2018). Previous research on moving track cues also demonstrated some limitations. Much of this work explored these cues as an annotation tool, in which case the track traces needed to be erased after completing each step of a task (Kim et al., 2013b). Huang et al. (Huang et al., 2018b) also mentioned that moving track cues that blocked users' view could have a negative impact on user experience and task performance. Based on these findings, we made the moving track cues gradually fade away and disappear in our study. This solution neither requires users to actively erase any virtual element, nor clutters users' view of the workspace with additional visual elements.

In summary, our research is framed on previous work and compares two head-based visual techniques, pointing line cues and moving track cues, to explore their effect on pointing tasks in co-located AR collaboration.

3. System Setup

To conduct this research, we created a multi-user AR collaborative system (see Figure 1a). The system was developed with Unity (version 2019.2.0f1) and ran on the Windows platform. Each of the two users wore a Meta2 connected to a laptop, and the two laptops communicated over a private network using Photon Unity Networking. Either laptop could be the host server: once connected, one assumed the host role and the two laptops' timers were synchronized to begin the experiment.

The equipment in this study consisted of (1) two Meta2 HMDs, an AR device with optical see-through displays with a 2560 x 1440 pixel resolution, 90° field of view, and a 60 Hz frame rate (see Figure 1a); and (2) two Windows 10 laptops, each with an Intel Core i9-8950HK at 2.9 GHz, 32 GB RAM, and an NVIDIA GeForce GTX 1080. This setup allowed a pair of users to work in the same virtual environment using the two Meta2 HMDs and jointly complete tasks together. The system collected data such as completion time and accuracy.

3.1. Visual Techniques

To determine the techniques for this study, we first conducted a pilot study exploring both head-based and hand-based approaches, each with and without a visual cue (i.e., a baseline). The results showed that head-based techniques tended to be better in performance and user preference for pointing tasks in co-located AR, in line with other studies (e.g., (Hansen et al., 2004; Yang et al., 2020)). Given our pilot study results and those from the literature, we did not include a baseline technique, to keep the study focused and within a reasonable time. Our study therefore compares two head-based visual techniques (i.e., Pointing Line and Moving Track) in terms of performance and usability under different scenarios.

Both head-gaze techniques receive input from the user's head movement, which is captured by the Meta2's tracking function. The cursor endpoint is based on a ray cast from the head's center toward the AR interface.
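The cursor mapping can be illustrated with a minimal ray-plane intersection sketch (our own illustration, not the system's Unity implementation; modeling the AR interface as a plane is an assumption):

```python
def head_cursor(head_pos, head_dir, plane_point, plane_normal):
    """Cast a ray from the head's center along the viewing direction and
    return the cursor endpoint on the interface plane (None if no hit)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    denom = dot(plane_normal, head_dir)
    if abs(denom) < 1e-9:
        return None  # gazing parallel to the interface plane
    offset = tuple(p - h for p, h in zip(plane_point, head_pos))
    t = dot(plane_normal, offset) / denom
    if t < 0:
        return None  # the interface plane is behind the user
    return tuple(h + t * d for h, d in zip(head_pos, head_dir))

# Looking straight ahead (+z) at an interface plane 2 m away:
print(head_cursor((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                  (0.0, 0.0, 2.0), (0.0, 0.0, -1.0)))  # (0.0, 0.0, 2.0)
```

In the actual system this would correspond to a per-frame raycast from the tracked head pose against the AR interface.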

Figure 2. The two visual techniques (a) Pointing Line (PL) and (b) Moving Track (MT) with an example each.
Pointing Line (PL)

PL is shown as a red ray representing the user's viewing direction. It is emitted from the center of the user's head toward the user interface in the AR environment. When it intersects with an object, the object becomes highlighted and can be selected. As shown in Figure 2a (upper row), when a user moves their head, the ray moves in sync with the head's directional movements.

Moving Track (MT)

MT uses a different approach to show directional movement. Instead of showing the ray, it shows a trail that follows the cursor. To reduce excessive visual clutter, the technique limits the length of the trail to 10 pixels; the excess part of the trail fades out and gradually disappears. As shown in Figure 2b (lower row), as the user moves their head, the trail is displayed, with its end part becoming lighter and fading out gradually.
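The trail behavior can be sketched with a fixed-length buffer whose older samples fade out (an illustrative sketch only; the 10-sample cap stands in for the paper's 10-pixel limit, and the linear alpha falloff is our assumption):

```python
from collections import deque

class MovingTrack:
    """Fixed-length trail that follows the cursor: the newest point is
    fully opaque, older points fade, and excess points drop off the end."""

    def __init__(self, max_points=10):
        self.points = deque(maxlen=max_points)  # oldest samples fall off

    def update(self, cursor_pos):
        self.points.append(cursor_pos)

    def render_list(self):
        """(position, alpha) pairs; alpha shrinks toward the trail's end."""
        n = len(self.points)
        return [(p, (i + 1) / n) for i, p in enumerate(self.points)]

trail = MovingTrack(max_points=3)
for x in range(5):  # cursor sweeping to the right
    trail.update((x, 0))
print(trail.render_list())
```

Because the buffer is bounded, the user never needs to erase the trail manually: old segments simply fade out and vanish, matching the design choice described in Section 2.2.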

4. User Study

4.1. Participants

16 pairs of participants (17 males) aged 18 to 23 (M = 20.78) were recruited from a local university for this experiment. Only 3 pairs did not know each other before participating in the experiment. All participants had normal or corrected-to-normal vision and had no issues distinguishing the colors we used in the AR device. They reported an average of 3.69 for cooperative ability on a scale from 1 ('so bad') to 5 ('good'). 13 of them (41%) had limited prior experience with AR HMDs.

4.2. Experiment Design

The experiment followed a 2 × 2 × 3 (2 Technique × 2 Object State × 3 Density) within-subjects design to study the effects of visual cues on pointing tasks in co-located AR collaboration. In this experiment, we have the following three independent variables:

  1. Technique. We tested two visual techniques: Pointing Line (PL) and Moving Track (MT). As mentioned above, PL is based on the ray-casting approach to show a line to the object on the direction of the AR HMD (see Figure 3a). MT provides a continuous trace that follows the pointer (see Figure 3b).

  2. Object State. We explored two states: Static and Dynamic. Static objects were immobile, while dynamic ones moved around. As the arrows in Figure 3c indicate, dynamic objects were randomly assigned a path without any particular pattern. In dynamic scenarios, the objects performed uniform rectilinear motion (i.e., constant velocity) with speeds between 50-120 pixels/second and random directions. The placement and movement of the objects were predefined to be the same for each pair. All objects (target and non-targets) would move and were programmed to avoid each other. A virtual boundary of 120 degrees was set for the objects. As this is wider than the 90-degree field of view of the Meta2, users needed to turn their heads during the tasks.

  3. Density. The size of objects used in our study was 70 pixels in diameter. There were three levels of object density: Low (6 objects with no occlusion, Figure 3c), Medium (12 objects with slight occlusion, Figure 3d) and High (18 objects with severe occlusion, Figure 3e).
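The dynamic-object motion described in point 2 can be sketched as follows (illustration only: the 50-120 pixels/second speed range comes from the description above, while modeling the 120° boundary as a reflecting wall at ±60° is purely our assumption; the object-avoidance behavior is omitted here):

```python
import math
import random

def make_velocity(rng, speed_min=50.0, speed_max=120.0):
    """Uniform rectilinear motion: random direction, random constant
    speed drawn from the 50-120 pixels/second range."""
    speed = rng.uniform(speed_min, speed_max)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (speed * math.cos(angle), speed * math.sin(angle))

def step(pos, vel, dt, half_width=60.0):
    """Advance one frame; reflect at the assumed ±60° horizontal bound."""
    x, y = pos[0] + vel[0] * dt, pos[1] + vel[1] * dt
    vx, vy = vel
    if abs(x) > half_width:
        x = 2.0 * math.copysign(half_width, x) - x  # mirror back inside
        vx = -vx
    return (x, y), (vx, vy)

pos, vel = step((59.0, 0.0), (10.0, 0.0), dt=1.0)
print(pos, vel)  # (51.0, 0.0) (-10.0, 0.0)
```

Seeding the random generator per pair would reproduce the paper's "predefined to be the same for each pair" behavior.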

Figure 3. Two users completing the task with (a) Pointing Line (PL) and (b) Moving Track (MT) technique. There were (c) Low, (d) Medium and (e) High density for Static and Dynamic objects. Red indicates the target object; green indicates the object selected.

As a within-subjects experiment, each participant experienced all 12 conditions. Technique was counterbalanced according to a balanced Latin square design to minimize learning effects. Object State and Density were randomly and equally distributed. There were 6 trials for each condition and 72 trials in total for each pair of participants.
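Counterbalancing via a balanced Latin square can be generated with the standard zig-zag construction (a generic sketch, not the authors' code; with only two techniques it simply alternates PL-first and MT-first across pairs):

```python
def balanced_latin_square(conditions):
    """Each row is one presentation order. For an even number of
    conditions, every condition appears in each position, and directly
    follows every other condition, equally often across rows."""
    n = len(conditions)
    # zig-zag base order: 0, 1, n-1, 2, n-2, ...
    seq, lo, hi, take_lo = [0], 1, n - 1, True
    while len(seq) < n:
        if take_lo:
            seq.append(lo)
            lo += 1
        else:
            seq.append(hi)
            hi -= 1
        take_lo = not take_lo
    # shift the base order by one for each subsequent row
    return [[conditions[(s + r) % n] for s in seq] for r in range(n)]

print(balanced_latin_square(["PL", "MT"]))  # [['PL', 'MT'], ['MT', 'PL']]
```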

This experiment was classified as low risk research and was approved by the University Ethics Committee at Xi’an Jiaotong-Liverpool University (#21-01-09).

4.3. Hypotheses

Based on our literature review and design of our experiment, we postulated the following four hypotheses:

H1: PL and MT would both perform better with static targets than with dynamic objects.

H2: PL and MT would have better performance in lower density scenarios. High density situations could be crowded and have object occlusion, making it more difficult to identify and select a target.

H3: Because PL provides a more explicit cue showing the viewing direction, it would lead to better task efficiency and accuracy than MT.

H4: In terms of subjective measures, MT would in general be preferred by users over PL, as MT would make it easier to track the cursor's movement and the partner's intention (in our case, User A's). As such, it would increase collaborators' awareness and co-presence.

Figure 4. An example of all the steps involved in a typical trial.
                  During the selection of User A                      During the selection of User B
Viewer            A's cue   B's cue   Target to be selected           A's cue   B's cue   Target to be selected
User A            Yes       No        Highlighted by system           No        Yes       Visible (already selected by this user)
User B            Yes       No        Visible but not highlighted     No        Yes       Visible and highlighted (selected by User A)
Table 1. Visibility of User A's and User B's visual cues, and the appearance of the target to be selected, for each viewer during each selection stage.

4.4. Task and Procedure


For each trial, User A was required to choose the target object (see Step (1) in Figure 4, highlighted in red) by moving the cursor to the object using head motions and confirming it via a mouse click. The selected object would turn green, which could be seen by User B. User B would then need to locate the same object by moving the cursor and confirming the selection. As such, the task consisted of the following steps (see Figure 4): (1) the system randomly highlighted one object in the scene in red. Only User A could see the highlighted target; at this moment, the objects in User B's view were all grey. (2) User A chose the object using the pre-defined technique. Once confirmed, the selected object turned green and both users could see it. (3) User B was then required to locate the green object and confirm it. (4) The trial was finished, and the next began when both users were ready to proceed.

In our experiment, the two users in each pair did the selection in turns. We designed the experiment so that users could not see their partner's visual cue and their own at the same time. During selection, each user could only see one visual cue (the cue of the user who needed to do the selection). In other words, during User A's selection, only User A's cue was visible; during User B's selection, only User B's cue was shown. Besides, at the beginning, only User A could see the target highlighted by the system. Only after User A completed the selection would User B be able to see the target, now selected by User A (see Table 1 for a summary of this process). This approach was chosen to minimize visual clutter and to better align with how collaboration takes place in more realistic scenarios.
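The visibility rules summarized in Table 1 can be expressed as a small sketch (our paraphrase of the table; function names and state labels are hypothetical):

```python
def cue_visible(cue_owner, selecting_user):
    """Only the cue of the user currently making the selection is shown,
    and it is shown to both users."""
    return cue_owner == selecting_user

def target_appearance(viewer, selecting_user):
    """How the target object looks to each viewer at each stage:
    during User A's turn, only A sees the system's red highlight;
    once A has selected, the (green) target is visible to both."""
    if selecting_user == "A":
        return "highlighted" if viewer == "A" else "visible, not highlighted"
    return "visible, highlighted"  # B's turn: A's selection shown to both

print(target_appearance("B", "A"))  # visible, not highlighted
```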


The experiment procedure was divided into 4 phases: (P1) informing participants of the experiment goal and the ethics regulations governing it, then completing the consent form plus a short questionnaire collecting anonymized demographic data (5 minutes); (P2) giving participants several practice trials to become familiar with the AR device and the task (5 minutes); (P3) completing the trials, with each participant filling in the Social Presence and Usability questionnaires between conditions (25 minutes); and (P4) interviewing participants to collect further feedback and comments (5 minutes). The whole experiment took about 40 minutes per pair.

Figure 5. Mean completion time (Left) and accuracy rate (Right) results in all conditions: 2 Technique (PL and MT) × 2 Object State (Static and Dynamic) × 3 Density (L: Low, M: Medium and H: High).
Figure 6. Completion Time (a) and Accuracy Rate (b) according to Technique, Object State, and Density.

4.5. Results

For the objective data analysis (completion time and accuracy rate), we employed three-way repeated-measures ANOVAs with an alpha value of 0.05 to determine any differences across conditions, followed by pairwise comparisons with Bonferroni correction where a significant difference was found. We used a Greenhouse-Geisser adjustment to correct for violations of the sphericity assumption, and we report effect sizes as partial eta squared (η²p). For the ordinal, non-parametric data from the subjective questionnaires (e.g., subjective ratings or rankings), we applied the Wilcoxon signed-rank test to look for differences. For simplicity, M and SD denote mean and standard deviation values.

4.5.1. Task Performance

Participants’ task completion time and accuracy rate were collected to assess their performance. We recorded the time taken by paired participants to complete each trial. Accuracy rate was measured by the number of correct trials among the total number of trials. Figure 5 shows mean time and accuracy rate for each condition (2 Technique × 2 Object State × 3 Density).

Completion Time

We found an interaction effect of Technique × Object State (F = 3.993, p = .049, η²p = .040), Technique × Density (F = 7.461, p = .001, η²p = .073), and Technique × Object State × Density (F = 4.065, p = .019, η²p = .041) on completion time, but no significance was found for Object State × Density (p = .758). For Technique × Object State, pairwise comparisons revealed that PL_Dynamic took significantly less time than MT_Dynamic (p = .001). PL_Static also took less time than MT_Static, but the difference was not significant (p = .149). In addition, both PL and MT took significantly less time in the static condition than in the dynamic one (both p < .001).

For Technique × Density, PL_Medium was significantly faster than MT_Medium (p < .001). Although we noticed that PL_High and PL_Low were respectively faster than MT_High and MT_Low, no significant differences were found (both p > .05). Besides, PL_High was significantly slower than PL_Medium (p = .027) and PL_Low (p < .001). MT_Low was significantly faster than MT_High (p = .001) and MT_Medium (p < .001). For Technique × Object State × Density, PL_Dynamic_Medium took less time than MT_Dynamic_Medium (p < .001). For the other conditions, although using PL was faster than MT, the differences were not significant (p > .05).

     Static                                                        Dynamic
     Low            Medium         High           Overall Mean     Low            Medium         High           Overall Mean
PL   97.9% (14.4%)  93.8% (24.3%)  92.7% (26.1%)  94.8% (22.3%)    95.8% (20.1%)  95.8% (20.1%)  94.8% (22.3%)  95.5% (20.8%)
MT   93.8% (24.3%)  87.5% (33.2%)  91.7% (27.8%)  91.0% (28.7%)    90.6% (29.3%)  92.7% (26.1%)  87.5% (33.2%)  90.3% (29.7%)
Table 2. Means (Standard Deviations) of accuracy rate in the different conditions.
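As a quick sanity check, the overall means in Table 2 are consistent with unweighted averages of the three density levels:

```python
def overall_mean(low, medium, high):
    """Unweighted average of the three density-level accuracy rates (%)."""
    return round((low + medium + high) / 3.0, 1)

print(overall_mean(97.9, 93.8, 92.7))  # 94.8  (PL, Static row of Table 2)
print(overall_mean(90.6, 92.7, 87.5))  # 90.3  (MT, Dynamic row of Table 2)
```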
Figure 7. Users’ ratings on Social presence in all conditions (2 Techniques × 2 Object States); Significance results are highlighted in Bold; PL: Pointing Line, MT: Moving Track; S: Static, D: Dynamic; Subscales, CP: Co-presence, AA: Attention Allocation, and PMU: Perceived Message Understanding.

In addition, there was a significant main effect of Technique (F = 14.559, p < .001, η²p = .133), Object State (F = 33.360, p < .001, η²p = .260), and Density (F = 15.303, p < .001, η²p = .139) on completion time. As shown in Figure 6a, post-hoc analyses indicated that using PL was significantly faster than using MT. Besides, the time spent on trials with Static objects was significantly lower than with Dynamic ones. Post-hoc analyses also showed that completing High and Medium density trials took significantly longer than Low density trials (both p < .001). We also found an increasing trend in the mean time spent on Low, Medium, and High density tasks. More details of the main results on completion time are provided in the table in Appendix A.

Accuracy Rate

Overall, we found a high average accuracy rate for all conditions (M = 92.8%, SD = .257) in all trials (see Figure 5 (Right)). A further analysis showed that there was no significant interaction effect of Technique × Object State (p = .688), Technique × Density (p = .988), and Technique × Object State × Density (p = .410) on accuracy rate. When looking at the descriptive data, we found PL_Dynamic and PL_Static got higher rates than MT_Dynamic and MT_Static, respectively. Besides, for mean results of accuracy rate, PL performed better than MT in static trials, while MT performed better in dynamic trials. For Density, the accuracy rate of two techniques gradually decreased with the increase in density. In addition, from Figure 5 (Right), we can see that PL led to a higher rate than MT in general in any density condition. More details of all conditions on accuracy rate are shown in Table 2.

Besides, we found a significant effect of Technique on accuracy rate (F = 7.355, p = .008, η²p = .072). As shown in Figure 6b, post-hoc analyses showed that participants achieved a significantly higher accuracy rate with PL than with MT. In addition, we noticed that the rate decreased gradually with the increase in object density. The Static and Dynamic trials led to the same rate. However, we did not find any significant effect of Object State or Density (all p > .05).

4.5.2. User Experience

Participants’ collaboration and user experience were quantified using the data from post-experiment questionnaires that contained Likert-scale questions. We collected three sets of subjective data: Social Presence, System Usability Scale, and User Preference.

Social Presence

We adapted the Social Presence Questionnaire (Harms and Biocca, 2004) according to the nature of our trials. Three subscales, Co-presence (CP), Attention Allocation (AA), and Perceived Message Understanding (PMU), were used, consisting of 9 rating items on a 7-point Likert scale (1: Strongly Disagree, 7: Strongly Agree). Overall, participants had a high feeling of social presence when interacting in AR (M = 5.507, SD = .895). Besides, we found that MT was rated higher than PL in all conditions in terms of median scores (see Figure 7). A Wilcoxon signed-rank test revealed a significant difference in social presence between PL and MT in dynamic trials (Z = -1.993, p = .046) but no significant difference in static trials (p = .180). Besides, there was a significant effect of Technique on social presence (Z = -2.618, p = .009). Pairwise comparisons revealed that overall MT got a significantly higher score than PL.

For the subscales, the Wilcoxon signed-rank test revealed a significant difference between PL and MT on CP (Z = -2.142, p = .032). Post-hoc tests showed that MT got a significantly higher score than PL (Z = -1.961, p = .050) in Dynamic trials but there was no significance in Static trials (p = .310). For AA, we found a significant difference between PL and MT (Z = -2.128, p = .033). MT got a significantly higher score than PL in Dynamic trials (Z = -1.976, p = .048), but there was no significance in Static trials (p = .163). We did not find any significance on the PMU subscale (all p > .05).

Figure 8. Users’ ratings of SUS on the technique’s usability with significance (*: statistically significant); SUS score [0, 100], the higher, the better.
Figure 9. User preference of the two visual techniques.
System Usability Scale (SUS)

SUS (Brooke and others, 1996) was selected to measure and evaluate the participants’ responses towards the usability of the two visual techniques. It consisted of 10 items using 5-point Likert scales (1: Strongly Disagree, 5: Strongly Agree). Participants gave an average of 78.162 (SD = 12.958) and 73.235 (SD = 14.414) for PL and MT, respectively. Figure 8 shows the users’ ratings with medians and significance results for all conditions. A Wilcoxon signed-rank test showed a significant difference between PL and MT (Z = -2.611, p = .009). Pairwise comparisons showed a significant effect between PL and MT in Dynamic trials (Z = -2.042, p = .041). No other significance was found (all p > .05).
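For reference, the SUS scores in the ranges reported above are computed from the ten 1–5 responses by Brooke's standard rule: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to land on a 0–100 scale. A minimal sketch:

```python
def sus_score(responses):
    """SUS score (0-100) from ten 1-5 Likert responses.

    Odd-numbered items are positively worded and contribute (r - 1);
    even-numbered items are negatively worded and contribute (5 - r).
    The summed contributions are scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# A neutral response on every item lands at the midpoint of the scale.
print(sus_score([3] * 10))  # -> 50.0
```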

User Preference

At the end of the experiment, we asked participants to choose their preferred technique for each type of trial (see Figure 9). Overall, MT (54.69%) was favored more by participants than PL (45.31%). For the Static_Low and Dynamic_Low trials, MT did not receive a higher proportion of votes than PL; in all other trial types, MT always got a higher number of votes. One interesting finding was that, in both Static and Dynamic trials, the higher the density of objects, the more participants chose MT.

5. Discussion

In this paper, we explored the use of two visual techniques (Pointing Line (PL) and Moving Track (MT)) for providing awareness cues during collaborative exploration in AR systems for static and dynamic objects and with different levels of object density (Low, Medium and High). Overall, results from our user study indicated that these visual cues provided benefits. We next discuss the findings in more detail.

5.1. User Performance and Usability: PL > MT

When completing the trials, the density and state of objects affected the participants’ performance. This was expected, and our results verified that the time needed for selecting dynamic objects was significantly longer while the accuracy was lower, which supports H1. The accuracy rate decreased with increasing object density, which also increased the time spent on the tasks. In particular, the time spent on lower density trials was significantly less than on higher density trials, which is in line with our expectations and also aligns with H2.

Our results show that PL led to significantly higher performance than MT, which aligns with H3. This result aligns with the findings of Kim et al. (Kim et al., 2019), who found that a pointer could serve as a main visual communication cue with fast completion times. On the other hand, other studies (Fussell et al., 2004; Kim et al., 2013a) found that users completed assembly tasks faster with drawing (moving track) cues than with pointing cues, which differs from our results. However, this difference can be explained by the nature of the tasks in the two experiments. Once drawn, annotation cues remain in the shared task space, so the information stays available until it is erased. This can be beneficial for assembly tasks, as in the cited experiments. In our scenario, instead, keeping the moving track cues would add further visual detail to an already busy AR environment and, as such, may not be convenient (e.g., in high-density cases). For completion time specifically, PL contributed to significantly higher efficiency than MT in dynamic trials with medium density; in the other conditions, PL still performed better than MT in general. In other words, despite the possibility of PL cues occluding more of the users’ view when objects were in motion, this did not lower their performance. One participant said that ”PL was easy to control even if the objects were moving”, which could have provided some degree of counterbalance. As such, this result suggests that PL cues are more beneficial to task performance regardless of whether objects were moving and regardless of the density level.

While the accuracy rate was high for both visual techniques, PL was significantly better than MT, which supports H3. In terms of mean values, PL led to higher accuracy than MT in all conditions. It seems that PL was more intuitive in showing the position of the target that one user was watching. Some participants (N = 3) mentioned that since the pointing ray was emitted from the user’s head, they had control over the start point and orientation of the ray, much like a physical laser pointer. Participants found MT slightly more difficult to use, which resulted in significantly longer completion times. In general, our results show that PL was significantly better than MT on task performance, especially for improving task efficiency in dynamic trials with a medium density level of objects.

Results from SUS show that participants rated PL significantly higher than MT. It was preferred significantly more for dynamic trials. In general, participants found that showing a visual line from the head to a target allowed them to see the direction of pointing and this extra implicit information led to improved performance and higher accuracy. Participants used the words ’intuitive’ and ’natural’ to describe PL but not for MT.

5.2. Social Presence and User Preference: PL < MT

We found that MT yielded significantly higher ratings on social presence than PL overall and, specifically, in dynamic trials, which supports H4. In particular, MT was rated significantly higher than PL on Co-presence and Attention Allocation, both overall and in dynamic trials. The visual tail of MT moving behind the cursor seems to have helped users focus and encouraged them to predict the end target. By focusing on the path, participants said that they ‘felt more connected’ with the thinking of the other user and could sense their presence better. MT’s emphasis on the trajectory path in real time seems to have enhanced their feeling of social presence with the other user. One user mentioned that ”it is very interesting to follow the movement path”, and another said that ”I can not only know what the target location is, but also where the focus view starts. This way, I can strongly feel my partner being with me together.”

Interestingly, when asked which technique they preferred, MT was rated higher than PL in most conditions, except for low density. This further supports H4. Participants in general said that they preferred MT over PL, while generally agreeing that PL had better usability, was more intuitive to use, and helped them perform faster; this also explains the findings from the SUS results. This result shows that, when working with another person, the feeling of being together and connected with the collaborator (even via a simple visual cue) is an important factor affecting user experience. Some prior studies support our results. For example, Teo et al. (Teo et al., 2018) reported that all of their participants preferred having visual annotation cues, and that participants felt the task was easier and more enjoyable with dynamic visual cues. Their findings are aligned with our results. Unlike the performance results, these results are more subjective and reflect the emotional side of working with other users, which is an important aspect of collaborative systems.

In short, the results show that PL was better on task performance and usability, but MT was rated higher on social presence and user preference.

5.3. Design Implications of our Findings

Our results and findings point to the following implications for the design and use of visual cues in co-located AR that involves pointing tasks:

  • The objective results showed that users performed better with Pointing Lines (PL). Therefore, if the goal is to maximize task performance, especially efficiency and accuracy, a technique like PL could be helpful.

  • PL is considered easy to control and seems more usable, which can transfer to improved task performance. As such, when ease of control and high usability are important, a PL-based technique could be beneficial, especially when dealing with dynamic objects within a medium level of density (slightly occluded and crowded conditions).

  • Based on users’ preference and feedback, it seems that being able to follow the movement of the partner’s viewing direction supports higher focus and attention. In this context, participants rated Moving Track (MT) higher than PL. Therefore, if the goal is to provide a richer collaborative experience and higher social presence, a technique similar to MT is more suitable.

6. Limitations and Future Work

This research has some limitations, which can serve as directions for future work. As stated earlier, given the results of our pilot study and our literature review, we did not include a baseline technique, because including one would have unnecessarily lengthened our experiment without providing additional insights. Future work could involve more variations and different implementations of the two visual techniques; a baseline technique could then help pinpoint the effect of more specific aspects of visual techniques for pointing tasks in collaborative scenarios.

Although the size of our sample population is in line with some publications reporting similar experiments (e.g., (Huang et al., 2019; Piumsomboon et al., 2019)), the sample size was not very large. However, our data still had enough power to show significant differences across the various conditions of the experiment, leading to some interesting findings. In the future, it will be useful to run a study with a larger, more diverse sample (including pairs who are familiar and unfamiliar with each other) and see whether the same results hold or new insights emerge. Users in the current study had specific roles, i.e., one initiated the task and the other followed. It would be interesting to study more complex tasks where users’ roles are not clearly specified (and can switch back and forth between roles). Future work may also investigate how visual cues can facilitate communication without a specific task and in more fluid and flexible scenarios.

We did not consider the effect of the position arrangement of each pair of users, or cases where they are located in separate physical spaces. There may not be a significant effect with MT. However, it is not clear what the effect would be with PL, because it provides implicit directional information about where the user is looking. In the future, it will be helpful to explore cases where users are moving, positioned at different locations and orientations, or physically separated.

In this experiment, we did not consider other types of non-visual sensorial cues, such as audio and haptic feedback. While the implementation of such cues requires careful design, exploring them is interesting and useful because AR systems are multimodal and well suited to combining several modes of sensorial feedback to create a more immersive work environment.

7. Conclusion

In this paper, we evaluated the effect of visual cues on pointing task performance, usability, and social presence in co-located AR collaboration. A user study was conducted by comparing two different visual cues (Pointing Line (PL) and Moving Track (MT)) during collaboration with Static and Dynamic tasks in Low, Medium, and High levels of object density. Based on the results of an experiment following a 2 × 2 × 3 (2 Technique × 2 Object State × 3 Density) within-subjects design with 16 pairs of participants, we found that users with PL cues performed better than with MT cues on task performance. Users were more positive about the usability of PL than MT, especially in dynamic tasks with a medium level of object density. Besides, we found that MT cues were useful for enhancing the sense of social presence and user experience when completing pointing tasks in co-located AR. Overall, our results show that PL was better on task performance and usability, while MT was better on social presence and user preference. With these findings, we discussed the implications for the design and use of visual cues for co-located AR collaboration.

The authors wish to thank the participants for their time and the reviewers for their insightful comments that have helped improve our paper. This research was funded in part by Xi’an Jiaotong-Liverpool University’s Key Special Fund (#KSF-A-03 and #KSF-A-19) and Research Development Fund (#RDF-16-02-43), the Natural Science Foundation of Guangdong Province (#2021A1515012629), and Guangzhou Basic and Applied Basic Foundation (#202102021131).


  • C. Anthes and J. Volkert (2005) A toolbox supporting collaboration in networked virtual environments. In International Conference on Computational Science, pp. 383–390. Cited by: §2.2.
  • M. Antunes, A. R. Silva, and J. Martins (2001) An abstraction for awareness management in collaborative virtual environments. In Proceedings of the ACM symposium on Virtual reality software and technology, pp. 33–39. Cited by: §2.1.
  • H. Bai, P. Sasikumar, J. Yang, and M. Billinghurst (2020) A user study on mixed reality remote collaboration with eye gaze and hand gesture sharing. In Proceedings of the 2020 CHI conference on human factors in computing systems, pp. 1–13. Cited by: §1, §2.2.
  • M. Billinghurst, A. Cheok, S. Prince, and H. Kato (2002a) Real world teleconferencing. IEEE Computer Graphics and Applications 22 (6), pp. 11–13. Cited by: §2.2.
  • M. Billinghurst, H. Kato, K. Kiyokawa, D. Belcher, and I. Poupyrev (2002b) Experiments with face-to-face collaborative ar interfaces. Virtual Reality 6 (3), pp. 107–121. Cited by: §2.1.
  • M. Billinghurst and H. Kato (1999) Collaborative mixed reality. In Proceedings of the First International Symposium on Mixed Reality, pp. 261–284. Cited by: §2.1.
  • M. Billinghurst and H. Kato (2002) Collaborative augmented reality. Communications of the ACM 45 (7), pp. 64–70. Cited by: §1.
  • J. Blattgerste, P. Renner, and T. Pfeiffer (2018) Advantages of eye-gaze over head-gaze-based selection in virtual and augmented reality under varying field of views. In Proceedings of the Workshop on Communication by Gaze Interaction, pp. 1–9. Cited by: §2.2.
  • J. Brooke et al. (1996) SUS-a quick and dirty usability scale. Usability evaluation in industry 189 (194), pp. 4–7. Cited by: §4.5.2.
  • B. Buxton (2009) Mediaspace–meaningspace–meetingspace. In Media space 20+ years of mediated life, pp. 217–231. Cited by: §2.1.
  • J. W. Chastine, K. Nagel, Y. Zhu, and L. Yearsovich (2007) Understanding the design space of referencing in collaborative augmented reality environments. In Proceedings of graphics interface 2007, pp. 207–214. Cited by: §2.1.
  • L. Chen, H. Liang, F. Lu, K. Papangelis, K. L. Man, and Y. Yue (2020) Collaborative behavior, performance and engagement with visual analytics tasks using mobile devices. Human-centric Computing and Information Sciences 10 (1), pp. 1–24. Cited by: §2.1.
  • P. Dourish and V. Bellotti (1992) Awareness and coordination in shared workspaces. In Proceedings of the 1992 ACM conference on Computer-supported cooperative work, pp. 107–114. Cited by: §2.1.
  • T. Duval, T. T. H. Nguyen, C. Fleury, A. Chauffaut, G. Dumont, and V. Gouranton (2014) Improving awareness for 3d virtual collaboration by embedding the features of users’ physical environments and by augmenting interaction tools with cognitive feedback cues. Journal on Multimodal User Interfaces 8 (2), pp. 187–197. Cited by: §2.2.
  • A. Erickson, N. Norouzi, K. Kim, R. Schubert, J. Jules, J. J. LaViola, G. Bruder, and G. F. Welch (2020) Sharing gaze rays for visual target identification tasks in collaborative augmented reality. Journal on Multimodal User Interfaces 14 (4), pp. 353–371. Cited by: §1, §1, §2.2.
  • S. R. Fussell, L. D. Setlock, J. Yang, J. Ou, E. Mauer, and A. D. Kramer (2004) Gestures over video streams to support remote collaboration on physical tasks. Human-Computer Interaction 19 (3), pp. 273–309. Cited by: §2.2, §5.1.
  • S. Gerbaud and B. Arnaldi (2008) Scenario sharing in a collaborative virtual environment for training. In Proceedings of the 2008 ACM symposium on Virtual Reality Software and Technology, pp. 109–112. Cited by: §1.
  • D. Gergle, R. E. Kraut, and S. R. Fussell (2013) Using visual information for grounding and awareness in collaborative tasks. Human–Computer Interaction 28 (1), pp. 1–39. Cited by: §2.1.
  • S. Greenberg, C. Gutwin, and M. Roseman (1996) Semantic telepointers for groupware. In Proceedings Sixth Australian Conference on Computer-Human Interaction, pp. 54–61. Cited by: §2.1, §2.2.
  • K. Gupta, G. A. Lee, and M. Billinghurst (2016) Do you see what i see? the effect of gaze tracking on task space remote collaboration. IEEE transactions on visualization and computer graphics 22 (11), pp. 2413–2422. Cited by: §2.2.
  • C. Gutwin, S. Greenberg, and M. Roseman (1996) Workspace awareness in real-time distributed groupware: framework, widgets, and evaluation. In People and Computers XI, pp. 281–298. Cited by: §2.1.
  • J. P. Hansen, K. Tørning, A. S. Johansen, K. Itoh, and H. Aoki (2004) Gaze typing compared with input by head and hand. In Proceedings of the 2004 symposium on Eye tracking research & applications, pp. 131–138. Cited by: §3.1.
  • C. Harms and F. Biocca (2004) Internal consistency and reliability of the networked minds measure of social presence. Cited by: §4.5.2.
  • K. Higuch, R. Yonetani, and Y. Sato (2016) Can eye help you? effects of visualizing eye fixations on remote collaboration scenarios for physical tasks. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 5180–5190. Cited by: §1, §2.1.
  • W. Huang, L. Alem, F. Tecchia, and H. B. Duh (2018a) Augmented 3d hands: a gesture-based mixed reality system for distributed collaboration. Journal on Multimodal User Interfaces 12 (2), pp. 77–89. Cited by: §1.
  • W. Huang, M. Billinghurst, L. Alem, and S. Kim (2018b) HandsInTouch: sharing gestures in remote collaboration. In Proceedings of the 30th Australian Conference on Computer-Human Interaction, pp. 396–400. Cited by: §1, §1, §2.2.
  • W. Huang, S. Kim, M. Billinghurst, and L. Alem (2019) Sharing hand gesture and sketch cues in remote collaboration. Journal of Visual Communication and Image Representation 58, pp. 428–438. Cited by: §1, §2.2, §6.
  • H. Ishii, M. Kobayashi, and K. Arita (1994) Iterative design of seamless collaboration media. Communications of the ACM 37 (8), pp. 83–97. Cited by: §2.2.
  • H. Ishii, M. Kobayashi, and J. Grudin (1993) Integration of interpersonal space and shared workspace: clearboard design and experiments. ACM Transactions on Information Systems (TOIS) 11 (4), pp. 349–375. Cited by: §2.1.
  • D. Jo, K. Kim, and G. J. Kim (2016) Effects of avatar and background representation forms to co-presence in mixed reality (mr) tele-conference systems. In SIGGRAPH ASIA 2016 virtual reality meets physical reality: modelling and simulating virtual humans and environments, pp. 1–4. Cited by: §2.1.
  • H. Kaufmann (2003) Collaborative augmented reality in education. Institute of Software Technology and Interactive Systems, Vienna University of Technology. Cited by: §2.1.
  • S. Kim, M. Billinghurst, and G. Lee (2018) The effect of collaboration styles and view independence on video-mediated remote collaboration. Computer Supported Cooperative Work (CSCW) 27 (3), pp. 569–607. Cited by: §1.
  • S. Kim, G. A. Lee, N. Sakata, A. Dünser, E. Vartiainen, and M. Billinghurst (2013a) Study of augmented gesture communication cues and view sharing in remote collaboration. In 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 261–262. Cited by: §5.1.
  • S. Kim, G. A. Lee, and N. Sakata (2013b) Comparing pointing and drawing for remote collaboration. In 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 1–6. Cited by: §1, §2.2, §2.2, §2.2.
  • S. Kim, G. Lee, W. Huang, H. Kim, W. Woo, and M. Billinghurst (2019) Evaluating the combination of visual communication cues for hmd-based mixed reality remote collaboration. In Proceedings of the 2019 CHI conference on human factors in computing systems, pp. 1–13. Cited by: §1, §2.1, §2.2, §5.1.
  • R. E. Kraut, M. D. Miller, and J. Siegel (1996) Collaboration in performance of physical tasks: effects on outcomes and communication. In Proceedings of the 1996 ACM conference on Computer supported cooperative work, pp. 57–66. Cited by: §1.
  • J. Lacoche, N. Pallamin, T. Boggini, and J. Royan (2017) Collaborators awareness for user cohabitation in co-located collaborative virtual environments. In Proceedings of the 23rd ACM Symposium on Virtual Reality Software and Technology, pp. 1–9. Cited by: §2.1.
  • J. LaViola, L. S. Holden, A. S. Forsberg, D. S. Bhuphaibool, and R. C. Zeleznik (1998) Collaborative conceptual modeling using the sketch framework. Cited by: §1.
  • G. A. Lee, S. Kim, Y. Lee, A. Dey, T. Piumsomboon, M. Norman, and M. Billinghurst (2017) Improving collaboration in augmented video conference using mutually shared gaze.. In ICAT-EGVE, pp. 197–204. Cited by: §2.1.
  • B. Li, R. Lou, J. Posselt, F. Segonds, F. Merienne, and A. Kemeny (2017) Multi-view vr system for co-located multidisciplinary collaboration and its application in ergonomic design. In Proceedings of the 23rd ACM Symposium on Virtual Reality Software and Technology, pp. 1–2. Cited by: §2.1.
  • X. Lu, D. Yu, H. Liang, and J. Goncalves (2021) IText: hands-free text entry on an imaginary keyboard for augmented reality systems. pp. 1–11. External Links: Document Cited by: §1, §2.2.
  • X. Lu, D. Yu, H. Liang, W. Xu, Y. Chen, X. Li, and K. Hasan (2020) Exploration of hands-free text entry techniques for virtual reality. In 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 344–349. External Links: Document Cited by: §1.
  • S. Lukosch, M. Billinghurst, L. Alem, and K. Kiyokawa (2015) Collaboration in augmented reality. Computer Supported Cooperative Work (CSCW) 24 (6), pp. 515–525. Cited by: §1.
  • O. Oda, C. Elvezio, M. Sukan, S. Feiner, and B. Tversky (2015) Virtual replicas for remote assistance in virtual and augmented reality. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pp. 405–415. Cited by: §2.2.
  • T. Piumsomboon, A. Dey, B. Ens, G. Lee, and M. Billinghurst (2019) The effects of sharing awareness cues in collaborative mixed reality. Frontiers in Robotics and AI 6, pp. 5. Cited by: §1, §2.2, §6.
  • T. Piumsomboon, G. A. Lee, J. D. Hart, B. Ens, R. W. Lindeman, B. H. Thomas, and M. Billinghurst (2018) Mini-me: an adaptive avatar for mixed reality remote collaboration. In Proceedings of the 2018 CHI conference on human factors in computing systems, pp. 1–13. Cited by: §2.1, §2.1.
  • T. Piumsomboon, Y. Lee, G. Lee, and M. Billinghurst (2017) CoVAR: a collaborative virtual and augmented reality system for remote collaboration. In SIGGRAPH Asia 2017 Emerging Technologies, pp. 1–2. Cited by: §1.
  • J. Rekimoto and K. Nagao (1995) The world through the computer: computer augmented interaction with real world environments. In Proceedings of the 8th annual ACM symposium on User interface and software technology, pp. 29–36. Cited by: §2.2.
  • J. Roschelle and S. D. Teasley (1995) The construction of shared knowledge in collaborative problem solving. In Computer supported collaborative learning, pp. 69–97. Cited by: §2.1.
  • N. Sakata, T. Kurata, T. Kato, M. Kourogi, and H. Kuzuoka (2003) WACL: supporting telecommunications using wearable active camera with laser pointer.. In ISWC, Vol. 2003, pp. 7th. Cited by: §2.2.
  • M. Sousa, R. K. dos Anjos, D. Mendes, M. Billinghurst, and J. Jorge (2019) Warping deixis: distorting gestures to enhance collaboration. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–12. Cited by: §2.2, §2.2.
  • O. Špakov, H. Istance, K. Räihä, T. Viitanen, and H. Siirtola (2019) Eye gaze and head gaze in collaborative games. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, pp. 1–9. Cited by: §2.2.
  • Z. Szalavári, E. Eckstein, and M. Gervautz (1998) Collaborative gaming in augmented reality. In Proceedings of the ACM symposium on Virtual reality software and technology, pp. 195–204. Cited by: §2.1.
  • M. Tait and M. Billinghurst (2015) The effect of view independence in a collaborative ar system. Computer Supported Cooperative Work (CSCW) 24 (6), pp. 563–589. Cited by: §2.1.
  • J. C. Tang and S. L. Minneman (1991) VideoDraw: a video interface for collaborative drawing. ACM Transactions on Information Systems (TOIS) 9 (2), pp. 170–184. Cited by: §2.2.
  • T. Teo, G. A. Lee, M. Billinghurst, and M. Adcock (2018) Hand gestures and visual annotation in live 360 panorama-based mixed reality remote collaboration. In Proceedings of the 30th Australian Conference on Computer-Human Interaction, pp. 406–410. Cited by: §2.2, §2.2, §2.2, §5.2.
  • W. Xu, H. Liang, A. He, and Z. Wang (2019) Pointing and selection methods for text entry in augmented reality head mounted displays. pp. 279–288. External Links: Document Cited by: §1.
  • H. Yamazoe and T. Yonezawa (2014) Synchronized ar environment for multiple users using animation markers. In Proceedings of the 20th ACM Symposium on Virtual Reality Software and Technology, pp. 237–238. Cited by: §1.
  • J. Yang, P. Sasikumar, H. Bai, A. Barde, G. Sörös, and M. Billinghurst (2020) The effects of spatial auditory and visual cues on mixed reality remote collaboration. Journal on Multimodal User Interfaces 14 (4), pp. 337–352. Cited by: §2.1, §3.1.
  • D. Yu, H. Liang, X. Lu, K. Fan, and B. Ens (2019) Modeling endpoint distribution of pointing selection tasks in virtual reality environments. ACM Trans. Graph. 38 (6). External Links: Link, Document Cited by: §1.

Appendix A: Summary table of main results of completion time, showing significant differences between conditions.

Variable | Condition | Mean (SD) / s | ANOVA p | Post-hoc Test Results
Technique | PL | 3.339 (1.103) | p < .001*** | PL < MT (p < .001***)
 | MT | 3.556 (1.205) | |
Object State | S | 3.245 (0.960) | p < .001*** | S < D (p < .001***)
 | D | 3.651 (1.299) | |
Density | L | 3.202 (1.103) | p < .001*** | L < H (p < .001***)
 | M | 3.554 (1.181) | | L < M (p < .001***)
 | H | 3.587 (1.157) | | M < H (p = .998)
Technique × Object State | PL_S | 3.195 (0.981) | p = .049* | PL_D < MT_D (p = .001**)
 | PL_D | 3.483 (1.197) | | PL_S < MT_S (p = .149)
 | MT_S | 3.294 (0.937) | |
 | MT_D | 3.818 (1.376) | |
Technique × Density | PL_L | 3.161 (1.114) | p = .001** | PL_M < MT_M (p < .001***)
 | PL_M | 3.392 (0.937) | | PL_H < MT_H (p = .381)
 | PL_H | 3.551 (1.200) | | PL_L < MT_L (p = .428)
 | MT_L | 3.243 (1.094) | |
 | MT_M | 3.717 (1.366) | |
 | MT_H | 3.622 (1.115) | |
Technique × Object State × Density | PL_S_L | 2.951 (0.894) | p = .019* | PL_D_M < MT_D_M (p < .001***)
 | PL_S_M | 3.250 (1.006) | | PL_D_H < MT_D_H (p = .494)
 | PL_S_H | 3.385 (0.998) | | PL_D_L < MT_D_L (p = .579)
 | PL_D_L | 3.372 (1.267) | | PL_S_H < MT_S_H (p = .588)
 | PL_D_M | 3.360 (0.894) | | PL_S_M < MT_S_M (p = .103)
 | PL_D_H | 3.718 (1.357) | | PL_S_L < MT_S_L (p = .516)
 | MT_S_L | 3.024 (0.909) | |
 | MT_S_M | 3.424 (0.982) | |
 | MT_S_H | 3.435 (0.866) | |
 | MT_D_L | 3.461 (1.218) | |
 | MT_D_M | 4.184 (1.516) | |
 | MT_D_H | 3.809 (1.295) | |
Note: Level of significance: * < 0.05, ** < 0.01, and *** < 0.001; significant results are highlighted in bold; PL: Pointing Line, MT: Moving Track; S: Static, D: Dynamic; L: Low, M: Medium, H: High.