Virtual Border Teaching Using a Network Robot System

Virtual borders are employed to allow users the flexible and interactive definition of their mobile robots' workspaces and to ensure socially aware navigation in human-centered environments. They have been successfully defined using methods from human-robot interaction in which a user directly interacts with the robot. However, since we have recently witnessed the emergence of network robot systems (NRS) that enhance the perceptual and interaction abilities of a robot, we investigate the effect of such a NRS on the teaching of virtual borders and answer the question whether an intelligent environment can improve the teaching process of virtual borders. For this purpose, we propose an interaction method based on a NRS and a laser pointer as interaction device. This interaction method comprises an architecture that integrates robots into intelligent environments with the purpose of supporting the teaching process in terms of interaction and feedback, the cooperation between stationary and mobile cameras to perceive laser spots, and an algorithm allowing the extraction of virtual borders from multiple camera observations. Our experimental results, acquired from the performances of 15 participants, show that our system is equally successful and accurate while featuring a significantly lower teaching time and a higher user experience compared to an approach without support of a NRS.

1 Introduction

Socially aware robot navigation is an essential building block for the pervasive deployment of service robots in human-centered environments, e.g. home environments or offices. In order to allow users the interactive and flexible definition of their mobile robots’ workspace according to their needs and to enable socially aware navigation, virtual borders have turned out to be an effective method [(1)]. These are non-physical borders that are not directly visible to the user but that are respected by mobile robots during navigation tasks. Thus, the definition of virtual borders is especially relevant for users living in a human-robot shared space. Current works focus on human-robot interaction (HRI) to define virtual borders, but these methods are limited to the robots’ on-board sensors. Hence, the interaction space for the user is limited.

However, we have recently observed a trend towards intelligent environments, e.g. smart homes or offices, and the integration of service robots into such ubiquitous spaces [(2)], [(3)]. This integration aims to overcome the robot’s limitations in perception, such as a limited field of view, by utilizing sensors from the smart environment and to provide additional interaction abilities. Due to this opportunity, we want to investigate the effect of an intelligent environment on the teaching of virtual borders. Hence, the interaction is no longer limited to the human and the robot but also encompasses the intelligent environment, which we refer to as a network robot system (NRS). Our central scientific question is: Can the integration of service robots into an intelligent environment improve or support the teaching of virtual borders? In this context, we characterize an intelligent environment by three main components: (1) a set of sensors, e.g. cameras or microphones, allows the perception of the environment, (2) a set of actuators, e.g. lights or loudspeakers, allows the interaction of the environment with its residents and (3) connectivity between sensors and actuators [(4)].

In this work, we contribute the following aspects to the state of the art. We investigate the effect of a NRS on the teaching process of virtual borders. Therefore, we propose (1) an interaction method based on a NRS and a laser pointer as interaction device to support the process of teaching virtual borders. This interaction method encompasses an architecture for a NRS to support the teaching process, the cooperation of stationary and mobile cameras to perceive laser spots and an algorithm to extract virtual borders from multiple camera observations. We also contribute (2) a comprehensive experimental evaluation assessing the performance of the proposed interaction method compared to an interaction method without support of an intelligent environment. The evaluation comprises criteria such as success rate, teaching time, accuracy and user experience, and is based on the performances of 15 participants. This evaluation and comparison with a baseline method gives (3) an answer to the question whether a NRS can improve or support the teaching process of virtual borders.

The remainder of this paper is structured as follows: in the next section, we give an overview of related works concerning virtual border teaching, the integration of robots into intelligent environments, and we point out the opportunity for our interaction method. Afterwards, we formally define a virtual border and describe how it can be integrated into a map of the environment. This is the basis for our novel interaction method that we explain in detail in Sect. 4. This interaction method is then evaluated and compared to a baseline method in the following section to answer the question regarding the usefulness of an intelligent environment in the teaching process. Finally, we conclude our work and point out work for the future based on the results of the evaluation.

2 Related Work

Commercial solutions to restrict the workspace of a mobile robot comprise magnetic stripes placed on the ground [(5)] and virtual wall systems based on beacon devices [(6)]. Despite their effectiveness, they are intrusive, power-consuming or inflexible. Moreover, there are methods employing sensors from an intelligent environment in the teaching process. Liu et al. use several bird’s eye view cameras mounted in the environment to visualize a top view of the environment on a tablet’s screen on which the user can draw arbitrary virtual borders [(7)]. However, this approach relies on a full camera coverage of the environment and does not deal with partial occlusions, e.g. by furniture. In order to address these shortcomings, Sprute et al. introduced a framework for interactive teaching of virtual borders [(1)]. This framework is based on robot guidance, allowing a user to guide a mobile robot using any interaction device, e.g. visual markers. The robot’s trajectory and a seed point are then used to specify a virtual border which is respected by the robot. In another work, the authors extend this framework by considering different types of virtual borders to make the teaching process more flexible [(8)]. However, both approaches feature a teaching time that is linear in the border length and lack a user-friendly feedback system. In addition to these indirect teaching methods, i.e. methods in which the user guides the robot to define virtual borders in the environment, direct teaching methods have also been explored, e.g. drawing on a map of the environment containing occupancy probabilities [(9)]. These maps are created by today’s home robots and are used for navigation purposes. Although this is simple for the user, this interaction method is inaccurate because of a correspondence problem between points on the map and in the environment. A comprehensive comparison between different HRI methods for teaching virtual borders revealed augmented reality as the most powerful interface for the given task [(9)]. However, this approach requires specialized hardware, i.e. a Google Tango-enabled device, which limits the potential number of users. Nonetheless, 36% of the participants of the user study, i.e. the second largest group, also preferred a laser pointer as interaction device due to its simplicity, comfort and intuitiveness during teaching. The advantages of a laser pointer have also been revealed in other tasks, such as guiding a mobile robot to a 3D location [(10)], controlling a robot using stroke gestures [(11)] and teaching visual objects [(12)].

Another related research field, apart from the restriction of a robot’s workspace, is the integration of service robots into intelligent environments, which is described by several closely related terms. Kim et al. coined the term ubiquitous robots, which refers to the integration of several heterogeneous components into a ubiquitous space [(13)]. For example, Sandygulova et al. developed a portable ubiquitous robotics testbed consisting of a kitchen equipped with a wireless sensor network, a camera and a mobile robot [(14)], and Djaid et al. integrated a wheelchair with a manipulator arm into an intelligent environment [(15)]. A similar concept is a network robot system that is characterized by five elements: physical embodiment, autonomous capabilities, network-based cooperation, environment sensors and actuators and human-robot interaction [(16)]. Such an environment is realized in the physically embedded intelligent systems (PEIS) project [(17)]. Other terms for this cooperation are Kukanchi [(18)] and informationally structured environment [(19)]. All these approaches show that the cooperation between robots and their intelligent environment can improve the quality of robot applications, e.g. through enhanced perception and interaction abilities [(20)]. Other applications comprise the improvement of a robot’s object search [(21)], the development of a human-aware task planner based on observations through the smart environment [(22)] and a gesture-based object localization approach for robot applications in an intelligent environment [(23)].

3 Virtual Borders

A 2D occupancy grid map (OGM) [(24)] models the physical environment, i.e. walls or furniture, in terms of discrete cells containing occupancy probabilities of the corresponding areas. These OGMs contain free, occupied and unknown areas. Based on an OGM of an environment, a navigation algorithm calculates a path from a robot’s start pose to a goal pose. The areas that can be reached by a mobile robot are known as the robot’s workspace. In order to interactively and flexibly restrict this workspace, users can employ virtual borders. These are non-physical borders that are not directly visible to humans but are respected by robots during navigation tasks. Thus, a mobile robot changes its navigational behavior according to the user-defined restrictions. In a previous work [(25)], the authors defined a virtual border as a triple (P, s, o). The three components are described below:

  1. Virtual border points P: These are points on the ground plane organized as a polygonal chain that specifies the boundary of the virtual border. There are two types of virtual borders: (1) a closed polygonal chain (polygon) and (2) a simple polygonal chain (separating curve). The former divides an environment into an inner and an outer area, while the latter does not directly partition an environment. Thus, its first and last line segments are linearly extended to the physical borders of the environment to allow the separation of the area.

  2. Seed point s: This is a point on the ground plane indicating the area to be modified by the user in the teaching process.

  3. Occupancy probability o: This component specifies the occupancy probability of the area to be modified as indicated by the seed point s.

To integrate a virtual border into a map of the environment, we employ a map integration algorithm [(25)]. The input is a 2D OGM of the physical environment and a user-defined virtual border, and the algorithm outputs a 2D OGM containing the physical as well as the user-defined virtual borders. Furthermore, this algorithm can be applied iteratively with different virtual borders, i.e. the output of the i-th teaching process becomes the input of the (i+1)-th teaching process. This map integration algorithm has already been proven to be correct, i.e. it changes the mobile robot’s navigational behavior, and flexible, i.e. a user can define arbitrary virtual borders [(9)]. For this reason, we also employ the algorithm in this work.
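To make the integration step more tangible, the following is a minimal sketch, not the algorithm from [(25)]: it assumes the OGM is a NumPy integer array using the common 0 (free), 100 (occupied) and -1 (unknown) convention, rasterizes the border cells into the grid and marks the connected free region around the seed cell. The function name, the grid convention and the labeling shortcut are our own assumptions.

import numpy as np
from scipy.ndimage import label
from skimage.draw import line

FREE, OCCUPIED, UNKNOWN = 0, 100, -1  # assumed cell values

def integrate_virtual_border(ogm, border_cells, seed_cell, occupancy=OCCUPIED):
    """Sketch: rasterize a polygonal chain of border cells into an OGM and
    set the occupancy of the connected free area containing the seed cell.

    border_cells: list of integer (row, col) grid indices along the chain.
    seed_cell: integer (row, col) grid index of the seed point."""
    out = ogm.copy()
    # Rasterize each segment of the polygonal chain as occupied cells.
    for (r0, c0), (r1, c1) in zip(border_cells[:-1], border_cells[1:]):
        rr, cc = line(r0, c0, r1, c1)
        out[rr, cc] = OCCUPIED
    # Label connected free regions; physical and virtual borders separate them.
    regions, _ = label(out == FREE)
    seed_region = regions[seed_cell]
    if seed_region > 0:  # seed lies in a free region
        out[regions == seed_region] = occupancy
    return out

Applying such a function repeatedly with different borders mirrors the iterative use of the map integration algorithm described above.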

4 Virtual Border Teaching with a NRS

Due to the advantages of robots integrated into intelligent environments and the drawbacks of current solutions for defining mobile robots’ workspaces, we leverage the enhanced perception and feedback abilities of such environments in this work to improve the teaching of virtual borders. To this end, we compose a NRS consisting of a set of stationary and mobile cameras as depicted in Fig. 1. Stationary cameras are integrated into the environment (yellow and red fields of view) covering certain areas of the environment, while a mobile camera mounted on a robot (blue field of view) can observe areas that are not covered by the stationary cameras due to their installation or occlusions, e.g. under the table. Hence, this combination allows virtual border teaching in a NRS even if the stationary cameras do not cover all areas of the environment. A user defines virtual borders by “drawing” directly in the environment using a common laser pointer, and the laser spot is perceived by the stationary cameras in the environment and/or a camera on a mobile robot. We chose a laser pointer as interaction device because of its intuitive handling, its better accuracy compared to human pointing gestures and the reasons mentioned in the related literature in Sect. 2. In addition to the cameras, we install a display and a voice control in the environment. This combination enhances the perceptual as well as the feedback abilities of the system and aims to improve the process of teaching virtual borders.

Figure 1: A person defines a virtual border in the environment using a laser pointer. The spot is observed by stationary cameras in the environment (yellow and red) and a mobile camera on a robot (blue).

4.1 Architecture

Fig. 2 gives an overview of our system consisting of stationary cameras integrated into the environment and mobile cameras on mobile robots. Each smart camera independently performs laser point detection in image space, resulting in a 2D point for each detected laser spot. Subsequently, each point is projected into 3D world space using either a ground plane model in the case of the stationary cameras or an additional depth image in the case of the mobile cameras. All 3D points are then incorporated into a single 3D point set describing all laser point positions in space independent of the camera source. Finally, a polygonal chain or a seed point is extracted from the point set as required by the algorithm for the integration of a virtual border into a map (see Sect. 3).

Figure 2: Architecture of the network robot system for defining virtual borders based on multiple camera views.

4.2 Laser Point Localization

Since a user employs a (green) laser pointer to define virtual borders in the environment, the detection of such a laser spot is an essential aspect. For this purpose, we apply a laser point detection algorithm that is based on illumination and morphologic properties of a laser spot, i.e. circular shape, specific size and extreme brightness compared to its local environment [(8)]. The result of the algorithm is the position of a laser spot in 2D image space. In order to localize a laser spot with respect to the world coordinate frame, the spot’s 3D position is determined. In the case of a stationary camera, we employ a ground plane model to calculate the intersection between a ray through the laser spot pixel and the plane. In the case of a mobile camera, i.e. an RGB-D camera on a mobile robot, we use the additional depth information and the camera calibration parameters to estimate the 3D position. Since these are actually points on the ground plane, they degenerate to 2D positions on the ground plane with respect to the world coordinate frame. All camera transformations with respect to the world coordinate frame are known to ensure transformations from the cameras’ coordinate frames to the world’s coordinate frame. These transformations belong to the Special Euclidean group SE(3) and are determined during installation. In case of a mobile robot, this transformation is dynamically adapted according to the localization of the robot in the environment.
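For the stationary-camera case, the ray-plane intersection can be sketched as follows. This assumes a pinhole camera with intrinsics K and a known camera-to-world pose (R, t) from SE(3); the function name and interfaces are our own illustration, not the paper's implementation.

import numpy as np

def pixel_to_ground(u, v, K, R, t, ground_z=0.0):
    """Project an image pixel (u, v) onto the ground plane z = ground_z.

    K: 3x3 camera intrinsics; R, t: camera-to-world rotation and translation,
    i.e. a world point is p_w = R @ p_c + t for a point p_c in the camera frame."""
    # Back-project the pixel to a viewing ray in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate the ray into the world frame; the camera center in world coordinates is t.
    ray_world = R @ ray_cam
    if abs(ray_world[2]) < 1e-9:
        return None  # ray is parallel to the ground plane
    # Solve t_z + s * ray_z = ground_z for the scale s along the ray.
    s = (ground_z - t[2]) / ray_world[2]
    if s < 0:
        return None  # intersection lies behind the camera
    p = t + s * ray_world
    return p[:2]  # 2D position on the ground plane in world coordinates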

4.3 Polygon Extraction

Figure 3: Processing steps extracting a polygonal chain from a point set including noise. Each row corresponds to a user-defined point set, a polygon in the first and a separating curve in the second row. The first column visualizes the input point set containing virtual border points but also noise and other clusters. The points assigned to the virtual border cluster are colored green in the second column. Thinning the virtual border cluster yields the red point set in column three. This is used to generate a polygonal chain as shown in the last column.

The resulting point set from the previous step contains data points of a single user-defined virtual border but also noisy data points and possibly additional clusters. These can occur due to errors in the laser point detection algorithm or other areas in the environment that have the same characteristics as a laser point. Furthermore, the data points of the virtual border can be spatially redundant since the points are obtained from different cameras that may have an overlap of their fields of view. Moreover, calibration inaccuracies of the cameras and localization errors of a mobile robot can lead to small deviations from a desired user-defined point. We address these challenges in multiple processing steps as depicted in Fig. 3 to robustly extract a virtual border from the data points. The figure shows a closed and simple polygonal chain in the first and second row, respectively.

Input: pointsetIn
Output: pointsetOut
Params: eps, minPts, minExp, maxExp, minSize
1  Function clustering(pointsetIn, eps, minPts, minExp, maxExp, minSize)
2      pointsetOut = ∅;
3      clusters = DBSCAN(pointsetIn, eps, minPts);
4      clusters = orderClustersBySize(clusters, minSize);
5      foreach c in clusters do
6          if minExp < expansion(c) < maxExp then
7              pointsetOut = c;
8              break;
9      return pointsetOut;
Algorithm 1 Clustering step.

The first step is the clustering step as denoted in Algorithm 1, and it is designed to extract the user-defined virtual border components and to discard noisy data points and irrelevant clusters. The input of this step is a 2D point set as shown in the first column of Fig. 3, and the cluster points of the virtual border are the result. Due to the flexibility of a virtual border, i.e. a user can define arbitrary shapes, we applied the DBSCAN algorithm [(26)] for clustering the data points (l. 3). This is a density-based clustering algorithm that can find clusters with different shapes. It is parameterized by eps, which defines the distance threshold for a neighboring point, and minPts, which defines a core point, i.e. a point that has at least minPts points within its distance eps. The result of the DBSCAN algorithm is a set of clusters where each point of the input point set is assigned to a cluster; noisy data points that do not belong to any cluster are discarded. Afterwards, we order the clusters by their sizes (number of points) in descending order (l. 4) and iterate over the clusters beginning with the largest cluster (l. 5ff.). The additional parameter minSize is a lower threshold for the size of a cluster to exclude small clusters due to noise. In each iteration, it is checked whether the expansion of the current cluster lies within the thresholds defined by minExp and maxExp to ignore irrelevant clusters (l. 6). The first cluster that fulfills this condition is returned as the cluster of the virtual border. Thus, the algorithm selects the largest cluster with certain expansion characteristics. The result is visualized in the second column of Fig. 3 as green points. The black points are either noise or irrelevant clusters.
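A compact Python rendering of this clustering step using scikit-learn's DBSCAN is sketched below. The parameter roles and defaults mirror the Border column of Tab. 1; the expansion measure (bounding-box diagonal) and the helper names are our own assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

def expansion(points):
    """Spatial expansion of a cluster, here the bounding-box diagonal (assumption)."""
    return float(np.linalg.norm(points.max(axis=0) - points.min(axis=0)))

def clustering(pointset, eps=0.5, min_pts=1, min_exp=0.3,
               max_exp=np.inf, min_size=10):
    """Select the largest DBSCAN cluster whose expansion lies in (min_exp, max_exp)."""
    pointset = np.asarray(pointset)
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(pointset)
    clusters = [pointset[labels == lbl] for lbl in set(labels) if lbl != -1]  # -1 = noise
    clusters = [c for c in clusters if len(c) >= min_size]
    for c in sorted(clusters, key=len, reverse=True):  # largest cluster first
        if min_exp < expansion(c) < max_exp:
            return c
    return np.empty((0, 2))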

Input: pointsetIn
Output: pointsetOut
Params: maxNeighborDist
1   Function thinning(pointsetIn, maxNeighborDist)
2       pointsetOut = ∅;
3       while true do
4           p = getPointWithMostNeighbors(pointsetIn, maxNeighborDist);
5           n = getNeighbors(p, maxNeighborDist);
6           if n ≠ ∅ then
7               mean = getMean({p} ∪ n);
8               pointsetIn = pointsetIn \ ({p} ∪ n);
9               pointsetOut = pointsetOut ∪ {mean};
10          else
11              break;
12      pointsetOut = pointsetOut ∪ pointsetIn;
13      return pointsetOut;
Algorithm 2 Thinning step.

The second step, i.e. the thinning step (Algorithm 2), is designed to remove spatially redundant data points and to smooth data points affected by localization errors and calibration inaccuracies. Its input is the largest cluster from the previous step, and the result is a thinned cluster. This is an iterative algorithm that looks for spatially nearby data points and replaces them by their mean value. For this purpose, the point p with the most neighbors within a distance maxNeighborDist is selected (l. 4) and its neighboring points n are determined (l. 5). If there is at least one neighboring point (l. 6), the mean point of p and its neighbors is calculated (l. 7). Afterwards, these points are removed from the input point set (l. 8) and the mean point is added to the output point set (l. 9). In case no data point has at least one neighboring point, the iterative procedure terminates (l. 11). Finally, all remaining points of the input set, i.e. points without neighbors, are added to the set containing the thinned points (l. 12). Thus, the thinned cluster includes the initial points that do not have neighboring points and mean points representing subsets of the initial points. The result is shown as red points in the third column of Fig. 3. Compared to the second column containing the virtual border cluster as green points, there are fewer points since neighboring points are summarized by their mean.
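A possible NumPy/SciPy rendering of the thinning step is sketched below; the KD-tree neighbor queries are our own choice, while the logic follows Algorithm 2.

import numpy as np
from scipy.spatial import cKDTree

def thinning(points, max_neighbor_dist=0.1):
    """Iteratively replace the densest group of nearby points by its mean (sketch)."""
    points = [np.asarray(p, dtype=float) for p in points]
    out = []
    while points:
        tree = cKDTree(points)
        # Neighbor lists include the query point itself (distance 0).
        neighbors = tree.query_ball_point(points, r=max_neighbor_dist)
        idx = max(range(len(points)), key=lambda i: len(neighbors[i]))
        group = neighbors[idx]
        if len(group) <= 1:
            break  # no point has a neighbor anymore
        out.append(np.mean([points[i] for i in group], axis=0))
        points = [p for i, p in enumerate(points) if i not in set(group)]
    # Remaining points without neighbors are kept unchanged.
    return np.array(out + points)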

Input: pointsetIn
Output: polygon
Params: maxNeighborDist
1   Function polyGeneration(pointsetIn, maxNeighborDist)
2       temp1 = ∅; temp2 = ∅;
3       forward = true;
4       p = pointsetIn(0);
5       setMarked(p);
6       temp1 = concat(temp1, p);
7       while not allMarked(pointsetIn) do
8           n = getNearestUnmarkedNeighbor(p, maxNeighborDist);
9           if n ≠ ∅ then
10              if forward then
11                  temp1 = concat(temp1, n);
12              else
13                  temp2 = concat(temp2, n);
14              setMarked(n);
15              p = n;
16          else
17              if forward then
18                  p = pointsetIn(0);
19                  forward = false;
20              else
21                  break;
22      reverse(temp1);
23      polygon = concat(temp1, temp2);
24      return polygon;
Algorithm 3 Polygon generation step.

Finally, the thinned point set is the input for the last step in which the polygonal chain is generated (Algorithm 3). This algorithm consists of two phases. First, we select an arbitrary point of the thinned point set as starting point and collect neighboring points in one direction. If no more neighboring point is available, we again select our starting point and collect neighboring points in the other direction. Afterwards, the selected points are concatenated. To this end, we first initialize two empty polygonal chains temp1 and temp2 (l. 2), and we set the variable forward, which indicates the phase of the algorithm, to true (l. 3). We then select an arbitrary point (here at index 0) of the input point set, mark it and append it to temp1 (l. 4ff.). Afterwards, the nearest neighboring point n within a distance maxNeighborDist that is not already marked is selected (l. 8). If there is such a neighboring point (l. 9), we append n to one of the temporary polygonal chains depending on the variable forward, mark n and select it as the current point p (l. 10ff.). This procedure is repeated until there is no neighboring point available for the current point (l. 16). In this case, we select our initial point again as current point and switch the variable forward to collect neighboring points along the other direction (l. 17ff.). Subsequently, the same procedure is performed until there is again no neighboring point available (l. 21) or all points of the input set are marked (l. 7). As a last step, the order of the temporary polygonal chain temp1 is reversed (l. 22), and temp2 is appended, resulting in the final polygonal chain (l. 23). The result is visualized in the last column of Fig. 3 where the red data points, i.e. the result from the thinning step, are connected by a polygonal chain.

We use the parameter values shown in Tab. 1 for the algorithms. These were determined experimentally. Since the parameter values also depend on the robot’s state, i.e. which component of a virtual border is being defined, there are two columns in the table. The states differ in the clustering step because the expansion of the cluster for a seed point s can be smaller than for the virtual border points P. Moreover, the thinning and polygon generation steps are not performed if the user specifies a seed point s. Instead, the mean value of the resulting points from the clustering step is calculated.

Step                Parameter        Border   Seed
Clustering          eps              0.5 m    0.5 m
                    minPts           1        1
                    minExp           0.3 m    0.05 m
                    maxExp           +∞ m     5.0 m
                    minSize          10       10
Thinning            maxNeighborDist  0.1 m    -
Polygon generation  maxNeighborDist  0.5 m    -
Table 1: Parameter values for the algorithms depending on the robot state.

4.4 Human-Robot-Environment Interaction

A user interacts with our system in a multimodal way. The primary interaction device is the laser pointer to specify virtual borders. As described in Sect. 3, a virtual border consists of three components. Our system reflects this in different internal states, which are described as follows:

  • Default: The system is in an inactive state and ignores all user interactions.

  • Border: The system recognizes laser spots that are used for the definition of virtual border points P. If the stationary cameras perceive a user’s laser spot, the system automatically sends a mobile robot to this area. Thus, the mobile robot autonomously navigates to this area and can act as mobile camera if the stationary cameras lose track of the laser spot.

  • Seed: The system recognizes laser spots and calculates the seed point s. Similar to the Border state, a mobile robot simultaneously moves to the laser spot position if a stationary camera perceives a laser spot. The seed point indicates the keep-off area, i.e. the area whose occupancy probability o is set to occupied.

  • Guide: This state is similar to the Default state, but the user can guide the mobile robot using the laser pointer. This state should never be reached when at least one of the stationary cameras’ fields of view covers a part of the virtual border. However, we incorporate this state in our interaction method to ensure its functionality in the absence of stationary cameras, i.e. to guide the robot to the virtual border. In this case, the system degenerates to a system without support of a NRS.

To interact with the system, a user can employ speech commands, e.g. to switch between the different states. Our supported user events are listed below:

  • Define border: This command is used to start the definition of virtual border points P, thus switching to state Border.

  • Define seed: This command is used to start the specification of a seed point s, thus switching to state Seed.

  • Guide robot: The system’s internal state switches to Guide so that a user can guide the mobile robot using the laser pointer.

  • Save: This command is employed when a user wants to integrate and save his or her user-defined virtual border into the map of the environment.

  • Cancel: If a user does not want to save his or her user-defined virtual border, he or she can cancel the teaching process. Hence, the internal state changes to Default.

Furthermore, we provide multimodal feedback to increase the user experience of the system. For example, a change of the internal state of the system, e.g. a switch from the Default to the Border state, is signaled with sound feedback (a beep tone) and a color change of the mobile robot’s on-board LED. Besides, if the system is in the Border or Seed state, it also provides sound feedback (a beep tone) whenever a laser spot is detected. Hence, a user can easily notice if a laser spot is detected and used for virtual border definition. Although a laser spot provides inherent visual feedback, we additionally integrate a display into the environment showing the 2D OGM of the environment. The map also visualizes the virtual borders, allowing direct feedback of the teaching result to the user.
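The state and command handling described above can be summarized in a small sketch; the enum values mirror the internal states, while the class and method names are our own illustration rather than the paper's implementation.

from enum import Enum, auto

class State(Enum):
    DEFAULT = auto()
    BORDER = auto()
    SEED = auto()
    GUIDE = auto()

class TeachingStateMachine:
    """Hypothetical sketch of the speech-command handling described in Sect. 4.4."""

    def __init__(self):
        self.state = State.DEFAULT

    def on_command(self, command):
        if command == "define border":
            self.state = State.BORDER
        elif command == "define seed":
            self.state = State.SEED
        elif command == "guide robot":
            self.state = State.GUIDE
        elif command == "save":
            # Integrate the virtual border into the map (see Sect. 3), then go inactive.
            self.state = State.DEFAULT
        elif command == "cancel":
            self.state = State.DEFAULT

    def laser_spot_is_used(self):
        # Laser spots only contribute to border or seed definition in active states.
        return self.state in (State.BORDER, State.SEED)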

5 Evaluation

We evaluated our method with respect to the success rate, teaching time, accuracy and user experience. To this end, we conducted an experiment with 15 participants (11 male, 4 female). Participants also rated their prior experience with robots on a 5-point Likert scale.

5.1 Setup

The experiment was performed in an 8 m × 5 m lab environment comprising free space, walls and furniture, such as tables and chairs. Additionally, we integrated adjustable walls with a height of 0.5 m to model two different rooms in the environment. Besides, we placed a carpet on the ground and some dirt in one area of the environment as basis for the evaluation scenarios. An image of the environment is shown in Fig. 4(a). As a mobile robotic platform, we employed a TurtleBot v2 equipped with a laser scanner for localization and a front-mounted RGB-D camera. The color images were captured with a resolution of 640 × 480 pixels, and the depth sensor had a resolution of 160 × 120 pixels. Additionally, the robot had a colored on-board LED, three push buttons and a buzzer. In order to take advantage of an intelligent environment, we mounted three RGB cameras on the ceiling (2.95 m height, pitch angle of 90°), thus providing top views of the environment. Their fields of view are shown in Fig. 4(b). All RGB cameras were calibrated, i.e. their intrinsic camera parameters were known, and their relative transformations with respect to the world coordinate frame were determined. We refer to these stationary cameras as the red, green and blue cameras. In order to allow a participant to use speech commands, we relied on a Wizard of Oz method in which a human operator reacted to the speech commands, i.e. switched between the system’s states via remote control. We did not use a voice-controlled intelligent personal assistant, such as Amazon’s Alexa or Google Assistant, due to network restrictions in the university’s network. However, this method did not change the way in which a participant interacted with the system. Moreover, a 22 inch display served as additional feedback device. The display showed the current OGM and virtual borders if defined by the user. A prior OGM of the environment was created beforehand using a Simultaneous Localization and Mapping (SLAM) algorithm running on the mobile robot [(27)]. Finally, the mobile robot was localized inside the environment using the adaptive Monte Carlo localization approach [(28)]. This allowed the robot to determine its pose with respect to the world coordinate frame.

(a)
(b)
Figure 4: (a) shows an image of the lab environment consisting of two areas separated by an adjustable wall with a door. (b) depicts a 3D sketch of the environment visualizing the three cameras of the intelligent environment and their fields of view (red, green and blue).

5.2 Tools

We implemented all components of the system as ROS nodes [(29)]. ROS is a modular middleware architecture that allows communication between several components of a system, called nodes. ROS is the de facto standard for robot applications and provides a large set of tools to accelerate prototyping and error diagnosis. We organized all our nodes in packages to allow the easy distribution of our implementation. In order to save resulting maps of a teaching process for evaluation purposes, we implemented a node that directly stores the map on the hard disk whenever a new map is defined by a user. Furthermore, we used the integrated time functionality of ROS to perform time measurements for the evaluation.
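As an illustration of such a map-saving node, a minimal rospy sketch is given below; the topic name and the file naming scheme are our assumptions, not the paper's implementation.

#!/usr/bin/env python
import rospy
import numpy as np
from nav_msgs.msg import OccupancyGrid

class MapSaver:
    """Store every newly published OGM on disk for later evaluation (sketch)."""

    def __init__(self):
        self.counter = 0
        # "map_with_virtual_borders" is a hypothetical topic name.
        rospy.Subscriber("map_with_virtual_borders", OccupancyGrid, self.callback)

    def callback(self, msg):
        # Reshape the flat int8 cell data into a 2D grid and save it with a timestamp.
        grid = np.array(msg.data, dtype=np.int8).reshape(msg.info.height, msg.info.width)
        np.save("run_%03d_%d.npy" % (self.counter, rospy.Time.now().to_nsec()), grid)
        self.counter += 1

if __name__ == "__main__":
    rospy.init_node("map_saver")
    MapSaver()
    rospy.spin()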

5.3 Procedure

The participants were asked to define virtual borders in three different scenarios that are typical use cases of our interaction method. They cover all supported border types. These scenarios were introduced to each participant and are described as follows:

  1. Room exclusion: The user wants the mobile robot to not enter a certain room due to privacy concerns. For this purpose, he or she draws a line separating the room from the rest of the environment. This area is in the fields of view of the red and green cameras. The polygonal chain is 0.70 m long, and the excluded area has a size of approximately 8.00 m².

  2. Carpet exclusion: The user wants the mobile robot to circumvent a carpet area (2.00 m × 1.25 m) while working. To this end, he or she draws a polygon around the carpet and specifies the inner area as keep-off area. This area is in the fields of view of the green and blue cameras.

  3. Spot cleaning: The user wants his or her vacuum cleaning robot to perform a spot cleaning in a corner of a room. Hence, he or she draws a separating curve around the area and specifies the rest of the room as keep-off area. This dirty area is indicated by paper snippets on the ground and is covered by the blue camera. The polygonal chain has a length of 3.60 m and encompasses an area of 3.20 m².

For each scenario, a ground truth map was manually created beforehand as shown in the first row of Fig. 7 to evaluate the participants’ performances concerning their accuracy. The ground truth maps contain the physical environment as well as the virtual borders.

At the beginning of each run of the experiment, an experimenter showed the participant how to employ the interaction method, i.e. gave a short introduction into the different states of the system, how to switch between states and how to use the laser pointer. Afterwards, the participant had some time to get familiar with the interaction device before the first run started, i.e. the participant could use the laser pointer and guide the robot. The robot’s pose at the beginning of each scenario was set to a predefined pose to allow the comparison between the results. This pose is shown in the bottom right of Fig. 4(b). The shortest paths to the virtual borders in the three scenarios were between 2.50 m and 5.40 m (scenario 1: 5.40 m, scenario 2: 2.50 m and scenario 3: 3.00 m). Every participant defined a virtual border with two different interaction methods:

  1. Robot only: This is the baseline approach similar to Sprute et al. [(8)], and it only employs a single TurtleBot v2 as described in Subsect. 5.1. A user directly interacts with the robot by showing the laser spot in the on-board camera’s field of view. The robot follows this laser spot using a visual servoing technique so that a user can guide the robot with the laser pointer. In order to switch between the different states of the system, the user can push buttons on the backside of the robot. Feedback about the system’s current state is provided through the on-board colored LED and sound when switching between states. Additionally, beep tones are uttered when the system acquires virtual border points P or a seed point s.

  2. Network robot system: This is our proposed approach based on a NRS as described in Sect. 4. In addition to the mobile robot of the baseline interaction method, the NRS features stationary cameras in the environment as additional sensors to perceive the laser spots. A voice control also allows switching between the system’s states using voice commands. Apart from the colored LED and sound, a display integrated into the environment acts as additional feedback device, showing a 2D OGM of the environment and the user-defined virtual borders.

This procedure resulted in six runs per participant (three scenarios and two interaction methods) and 90 runs in total. After performing the practical part of the experiment, each participant was asked to answer a post-study questionnaire concerning his or her user experience with the interaction methods. Since we used a within-subjects design, a participant could compare both interaction methods. The questionnaire comprised general questions about age, gender and experience with robots and the following statements about the experiment (translated from German). These could be rated on a 5-point Likert scale:

  1. I had problems to define the virtual borders
    (1 = big problems, 5 = no problems)

  2. It was intuitive to define the virtual borders
    (1 = not intuitive, 5 = intuitive)

  3. It was physically or mentally demanding to define the virtual borders (1 = hard, 5 = easy)

  4. It was easy to learn the handling of the interaction method (1 = hard, 5 = easy)

  5. I liked the feedback of the system
    (1 = bad/no feedback, 5 = good feedback)

Furthermore, the participants were asked if the intelligent environment supported the teaching process. They could answer the question with yes or no.

During the experiment, an experimenter measured the time needed for teaching the virtual borders. The time started with the change from the Default state to one of the active states and ended with the integration of the virtual border into the prior map. The resulting posterior map was saved after each run and associated with its ground truth map for evaluation of the accuracy (see Sect. 5.6). Additionally, the experimenter documented the success rate that indicates if the participant successfully defined the virtual border. An experiment with a single participant took approximately 20 minutes in total.

5.4 Success Rate

The first criterion for the evaluation is the success rate, which indicates whether a participant could correctly define a desired virtual border using one of the interaction methods. Thus, a high success rate shows that participants could correctly employ the interaction method.

5.4.1 Method & Metric

We registered a successful run if a participant could correctly define the virtual border points P (independent of their accuracy) and could correctly specify a seed point s. Otherwise, we assessed the run as unsuccessful. In order to determine the success rate, we related the number of successful runs per scenario and interaction method to the total number of runs for this setting, i.e. 15 participants’ runs.

5.4.2 Results

Figure 5: Success rate for both interaction methods depending on the scenarios.

The success rates are summarized in Fig. 5 as bars. Both interaction methods feature the same high success rate, i.e. 91.1% on average. Furthermore, the success rates per scenario are the same for both interaction methods (93.3% for scenarios 1 and 2; 86.7% for scenario 3). There were nine participants who performed all their runs successfully, four participants who failed in one of their six runs, and two participants who incorrectly defined a virtual border in two of their six runs. The reason for these incorrect runs was the definition of the seed point s. While some participants were confused where to specify the seed point s, especially in scenario 3, other participants were inattentive and noticed their mistake on their own after performing the experiments. There were no problems with the definition of the virtual border points P.

5.5 Teaching Time

Another criterion is the teaching time, which is an important indicator for the usability of the interaction method. A short teaching time is desired to reduce the effort that is necessary to define a mobile robot’s workspace. A high teaching time increases a user’s effort and simultaneously decreases his or her acceptance of the system.

5.5.1 Method & Metric

The time was measured during the experiment for each run. Thus, we obtained six measurements for each participant representing the teaching times for each scenario and interaction method. To identify reasons for potential time differences between the interaction methods, we decomposed each time measurement according to the three active states of the system, i.e. guiding the mobile robot (Guide), defining virtual border points (Border) and specifying the seed point (Seed).

5.5.2 Results

Figure 6: Teaching time for both interaction methods (blue and red contours around bars) based on the scenarios. Each bar is subdivided into the three active states of the system.

The teaching times of the runs are summarized in Fig. 6 depending on the scenario. Each bar comprises the measurements of all 15 participants for an interaction method and scenario. The teaching time for the NRS is significantly lower in all scenarios compared to the approach without support of an intelligent environment. This results in speedups of 2.8, 2.2 and 2.2 for scenarios 1, 2 and 3, respectively. The reason for this significant difference is revealed by the decomposition of the time measurements. While the time for the Robot only approach is composed of all active states of the system, our NRS approach does not include the Guide state. This is because the user directly interacts with the environment, and the laser spot is detected by the stationary cameras mounted in the environment. Simultaneously, the mobile robot autonomously navigates to the drawing area of the user to act as additional mobile camera. This is especially important when the stationary cameras cannot detect a laser spot due to their installation or occlusions, e.g. by furniture or the user himself or herself. Therefore, the user does not explicitly interact with and guide the robot as in the Robot only approach. Thus, the NRS approach avoids the time spent in the Guide state. This time primarily depends on the distance between the robot’s start pose and the virtual borders. As described in Subsect. 5.3, the distances in our scenarios ranged from 2.5 m to 5.4 m. If the distances were smaller, the time in the Guide state would also decrease. However, we think that we chose the distances quite conservatively since much larger distances would be realistic as well, e.g. in large households with a charging station for the robot. Another reason for the time difference between the interaction methods is the time in the Border state, which is linear with respect to the border length [(8)]. Thus, if the user-defined virtual border is short, e.g. 0.70 m for scenario 1, our NRS approach is only slightly faster in this state, i.e. 4 seconds difference. But if we consider a larger virtual border, e.g. the 6.50 m long border around the carpet (scenario 2), our approach is even 26 seconds faster on average. The reason for this is the mobile robot’s velocity limitation (0.2 m/s) to allow a safe and smooth guiding of the robot. With our NRS approach, the user is also limited, e.g. by the frame rate of the cameras for laser point detection, but not as strongly as with the Robot only approach. Hence, our interaction method also features a linear teaching time, but with a smaller gradient compared to the baseline method. The times in the Seed state only differ minimally because they are independent of the border length and robot characteristics. Thus, this state’s influence on the overall time measurement is small.

5.6 Accuracy

Figure 7: Visualization of representative accuracy results. The first row shows ground truth maps for all three scenarios containing occupied (black), free (white) and unknown (gray) areas. The yellow cells indicate the area to be defined by the user during the teaching process. The robot’s start pose is visualized as a red arrow. The following rows visualize the qualitative accuracy results for both interaction methods. Green pixels indicate the overlap of ground truth and user-defined areas (GT ∩ UD), while red pixels show areas defined by the user but not contained in the ground truth map (UD \ GT). A blue contour surrounds the union set of both areas (GT ∪ UD). Colors are only used for visualization.

This criterion describes how accurately a user can define a virtual border. It is especially important for precise tasks, such as spot cleaning or cleaning around a forbidden area. The accuracy depends on the users’ performances and on the proposed algorithm that combines the detected laser spots from different camera views and extracts a polygonal chain.

5.6.1 Method & Metric

We assessed the accuracy of a participant’s performance by calculating the Jaccard similarity index (JSI) between a user-defined and its associated ground truth virtual border:

JSI(GT, UD) = |GT ∩ UD| / |GT ∪ UD|    (1)

A ground truth virtual border GT contains all cells in the OGM that belong to a manually created virtual border for each of the three scenarios. These are the yellow cells in the first row of Fig. 7. In contrast to this, a user-defined virtual border UD comprises the cells that have been defined in the teaching process by a participant. They are visualized as green and red cells in the following rows. Green cells indicate the correctly (GT ∩ UD) and red cells the incorrectly (UD \ GT) user-defined areas. The blue contour comprises the union set of both virtual borders (GT ∪ UD). Hence, the JSI can be interpreted visually as the fill degree of the green cells with respect to the cells encompassed by the blue contour.
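A small sketch of this metric on binary grids, assuming GT and UD are boolean NumPy masks of the cells belonging to the respective virtual border areas:

import numpy as np

def jaccard_similarity_index(gt, ud):
    """JSI between ground truth (gt) and user-defined (ud) boolean cell masks."""
    intersection = np.logical_and(gt, ud).sum()
    union = np.logical_or(gt, ud).sum()
    return intersection / union if union > 0 else 1.0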

5.6.2 Results

Figure 8: Quantitative accuracy results for both interaction methods depending on the scenarios.

Six qualitative accuracy results are depicted in Fig. 7. These are representative examples of different users whose accuracy values coincide with the overall mean per scenario and interaction method. The accuracy for scenarios 1 and 3 is similar, but there is a difference for scenario 2. Our interaction method features a higher fill degree of the green area with respect to the area surrounded by the blue contour compared to the Robot only approach. These observations are confirmed by the JSI values presented in Fig. 8. There are no significant differences for scenarios 1 and 3, but our approach is more accurate than the baseline approach for scenario 2. A reason could be that it is harder to specify the carpet’s corners when guiding the robot compared to our NRS approach. The highest accuracy value is achieved in scenario 1 because the user excludes a relatively large area compared to the length of the virtual border. Thus, there is only minimal room for errors. The overall mean accuracies for the NRS and Robot only approach are 90.0% and 87.7%, respectively, and they do not differ significantly.

5.7 User Experience

Finally, we evaluated the user experience to reflect the subjective opinion of the participants concerning our proposed interaction method. Therefore, it is an important factor for the acceptance of the system. This criterion comprises several aspects related to the user experience, such as problems, intuitiveness, demand, learnability and feedback.

5.7.1 Method & Metric

In order to assess these aspects, we considered the participants’ answers to the questionnaire as introduced in Subsect. 5.3. We ran paired t-tests on the 15 answers per statement to determine whether there were statistically significant differences between the mean values of both interaction methods. The test checks the null hypothesis that both means are the same. Therefore, we report the values of the corresponding t-distribution and the resulting p-value. The t-distribution depends on the sample size n, and the p-value gives the probability of observing the test results under the null hypothesis. We chose a significance level α. Thus, if p < α, we reject the null hypothesis and accept the alternative hypothesis, i.e. both mean values are not the same. We also calculated Cohen’s d as an indicator of the effect size:

d = (μ₁ − μ₂) / s    (2)

where μ₁ and μ₂ denote the mean ratings of the two interaction methods and s the pooled standard deviation.
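A brief sketch of this analysis using SciPy is given below; the rating arrays are hypothetical placeholders, and the pooled-standard-deviation form of Cohen's d as well as the conventional significance level of 0.05 are our assumptions.

import numpy as np
from scipy.stats import ttest_rel

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (equal group sizes assumed)."""
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2.0)
    return (np.mean(a) - np.mean(b)) / pooled_sd

# Hypothetical 5-point Likert ratings of one statement for both interaction methods.
nrs_ratings = np.array([5, 4, 5, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5, 5, 4])
robot_only_ratings = np.array([3, 4, 3, 2, 4, 3, 3, 2, 4, 3, 3, 2, 4, 3, 3])

t_stat, p_value = ttest_rel(nrs_ratings, robot_only_ratings)  # paired t-test
alpha = 0.05  # conventional choice, assumed here
print("t = %.2f, p = %.4f, d = %.2f, significant: %s"
      % (t_stat, p_value, cohens_d(nrs_ratings, robot_only_ratings), p_value < alpha))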

5.7.2 Results

Figure 9: User experience for both interaction methods showing mean and standard deviation depending on the statements.

The participants’ answers to the questionnaire are visualized in Fig. 9 with their mean (μ) and standard deviation (σ) per statement and interaction method. Since there were no outliers in the data, running paired t-tests was valid. We found statistically significant differences for all statements in the questionnaire:

  • S1:

  • S2:

  • S3:

  • S4:

  • S5:

Our proposed interaction method based on a NRS is rated better by the participants for all statements compared to the baseline approach. Participants had significantly fewer problems defining the virtual borders with a NRS than without support of an intelligent environment. A reason could be that some participants had problems rotating the mobile robot around its vertical axis, e.g. at the corners of the carpet (scenario 2). In this case, they moved the laser spot too fast so that the robot’s on-board camera could not follow the spot on the ground. In contrast to this, our approach avoids this problem through the additional cameras in the environment. Participants also found our approach more intuitive because they could interact more directly with the environment, i.e. they did not need to guide the robot explicitly. Furthermore, the speech commands provided a more intuitive interaction medium to change the system’s internal state than pushing the robot’s buttons. The strongest effect was measured for statement 3, which shows that our approach is less physically or mentally demanding than the baseline approach. This coincides with the results of the teaching time as reported in Subsect. 5.5. Only a medium effect was observed for the learnability of the interaction methods, but with a higher rating for the NRS approach. Since both interaction methods are based on a laser pointer as interaction device, the handling of the laser pointer does not influence the learnability. However, it could be easier to learn the speech commands than the assignments of the buttons to change between states of the system. Furthermore, the guiding of the mobile robot could affect the rating. Finally, there is a significant difference for the feedback of the interaction methods. Since the feedback system of the NRS is a superset of the baseline’s feedback system, a major reason for this difference is the additional display that shows the OGM with the user-defined virtual borders. This is missing in the Robot only approach, which only features sound and colored LED feedback. The answers to the question whether the intelligent environment supported the teaching process are summarized in Fig. 10. Most of the participants (14 out of 15) felt that the NRS supported the teaching process.

Figure 10: Answers of the participants to the question if the intelligent environment supported the teaching process.

5.8 Discussion

We designed our experiment according to the idea that most of the environment is covered by stationary cameras and that areas not covered by the stationary cameras are observed by a camera on a mobile robot. In this case, our results show that a NRS cannot significantly improve the success rate and accuracy compared to an approach without support of an intelligent environment. Nonetheless, the overall success rate of 91.1% and the mean accuracy of 90.0% are high. Apart from these criteria, the results show a significant reduction of the teaching time and an increase of the user experience. In the case of the user experience, participants appreciated aspects such as a reduced mental or physical demand during teaching and an improved feedback of the system. In the case of the teaching time, we reach speedups of a factor of more than two, but this is strongly influenced by the distance between the robot’s start pose and the virtual border. Therefore, it is not possible to report a single representative speedup value. If we decreased the number of cameras in the environment, so that only a minority of the environment were covered by stationary cameras, the reduction of the teaching time would be smaller; in the limit, the approach would degenerate into the Robot only approach.

6 Conclusion & Future Work

In this work, we answered the question whether an intelligent (indoor) environment can support the teaching process of virtual borders. To this end, we developed an interaction method based on a NRS and a laser pointer as interaction device. This NRS was designed to support the teaching process in terms of perception and feedback abilities. In particular, the cooperation of multiple stationary cameras integrated into the environment and a mobile camera on a robot compensated for the mobile robot’s perceptual limitations during the user interaction. Furthermore, we proposed an algorithm to extract user-defined virtual borders from these multiple camera observations. In an experimental evaluation of the interaction method and a comparison with a baseline interaction method that does not build on the support of an intelligent environment, we found that a NRS can significantly reduce the teaching time and increase the user experience. Simultaneously, the proposed interaction method is equally successful and accurate compared to the baseline. Hence, we conclude that an intelligent environment can improve the teaching of virtual borders in terms of a reduced teaching time and an increased usability.

As a result of the user study, future work should focus on more sophisticated feedback of the NRS. Although our interaction method already achieved good ratings regarding this aspect, participants wished for even stronger feedback, e.g. projectors in the environment to directly visualize the user-defined virtual border instead of a map shown on a display. Moreover, we evaluated the NRS with a single mobile robot, which is realistic for most households. However, it would be interesting to incorporate multiple robots to increase the number of mobile cameras. Our proposed architecture already considers this aspect (Fig. 2), but more work on the cooperation between multiple mobile robots in such a scenario needs to be done.

References

  • (1) D. Sprute, R. Rasch, K. Tönnies and M. König, A Framework for Interactive Teaching of Virtual Borders to Mobile Robots, in: 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2017, pp. 1175–1181.
  • (2) K. Qian, X. Ma, X. Dai and F. Fang, Knowledge-enabled decision making for robotic system utilizing ambient service components, Journal of Ambient Intelligence and Smart Environments 6(1) (2014), 5–19.
  • (3) T. Zhang, B. Yuan, T. Meng, Y. Ren, H. Liu and X. Wang, Ubiquitous Robot: A New Paradigm for Intelligence, in: Intelligent Data Engineering and Automated Learning – IDEAL 2016, 2016, pp. 202–211.
  • (4) J.C. Augusto, V. Callaghan, D. Cook, A. Kameas and I. Satoh, Intelligent Environments: a manifesto, Human-centric Computing and Information Sciences 3(12) (2013).
  • (5) Neato, Boundary Markers for Neato Botvac Series, 2018, [Access: 07/18].
  • (6) T.-Y. Chiu, Virtual wall system for a mobile robotic device, 2011, European Patent Application EP000002388673A1.
  • (7) K. Liu, D. Sakamoto, M. Inami and T. Igarashi, Roboshop: Multi-layered Sketching Interface for Robot Housework Assignment and Management, in: SIGCHI Conference on Human Factors in Computing Systems, 2011, pp. 647–656.
  • (8) D. Sprute, K. Tönnies and M. König, This Far, No Further: Introducing Virtual Borders to Mobile Robots Using a Laser Pointer, 2017, arXiv:1708.06274, [Currently under review].
  • (9) D. Sprute, K. Tönnies and M. König, A Study on Different User Interfaces for Teaching Virtual Borders to Mobile Robots, International Journal of Social Robotics (2018).
  • (10) C.C. Kemp, C.D. Anderson, H. Nguyen, A.J. Trevor and Z. Xu, A Point-and-click Interface for the Real World: Laser Designation of Objects for Mobile Manipulation, in: Proceedings of the 3rd ACM/IEEE International Conference on Human Robot Interaction, 2008, pp. 241–248.
  • (11) K. Ishii, S. Zhao, M. Inami, T. Igarashi and M. Imai, Designing Laser Gesture Interface for Robot Control, in: Human-Computer Interaction – INTERACT 2009, 2009, pp. 479–492.
  • (12) P. Rouanet, P.Y. Oudeyer, F. Danieau and D. Filliat, The Impact of Human-Robot Interfaces on the Learning of Visual Objects, IEEE Transactions on Robotics 29(2) (2013), 525–541.
  • (13) J.H. Kim, K.H. Lee, Y.D. Kim, N.S. Kuppuswamy and J. Jo, Ubiquitous Robot: A New Paradigm for Integrated Services, in: Proceedings 2007 IEEE International Conference on Robotics and Automation, 2007, pp. 2853–2858.
  • (14) A. Sandygulova, M. Dragone and G.M. O’Hare, Privet–a portable ubiquitous robotics testbed for adaptive human-robot interaction, Journal of Ambient Intelligence and Smart Environments 8(1) (2016), 5–19.
  • (15) N.T. Djaid, S. Dourlens, N. Saadia and A. Ramdane-Cherif, Fusion and fission engine for an assistant robot using an ontology knowledge base, Journal of Ambient Intelligence and Smart Environments 9(6) (2017), 757–781.
  • (16) A. Sanfeliu, N. Hagita and A. Saffiotti, Network Robot Systems, Robotics and Autonomous Systems 56(10) (2008), 793–797.
  • (17) A. Saffiotti, M. Broxvall, M. Gritti, K. LeBlanc, R. Lundh, J. Rashid, B.S. Seo and Y.J. Cho, The PEIS-Ecology project: Vision and results, in: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 2329–2335.
  • (18) N.S.M. Nor and M. Mizukawa, Robotic Services at Home: An Initialization System Based on Robots’ Information and User Preferences in Unknown Environments, International Journal of Advanced Robotic Systems 11(7) (2014), 112.
  • (19) Y. Pyo, K. Nakashima, S. Kuwahata, R. Kurazume, T. Tsuji, K. Morooka and T. Hasegawa, Service robot system with an informationally structured environment, Robotics and Autonomous Systems 74(Part A) (2015), 148–165.
  • (20) P. Simoens, M. Dragone and A. Saffiotti, The Internet of Robotic Things: A review of concept, added value and applications, International Journal of Advanced Robotic Systems 15(1) (2018).
  • (21) D. Sprute, A. Pörtner, R. Rasch, S. Battermann and M. König, Ambient Assisted Robot Object Search, in: 15th International Conference on Smart Homes and Health Telematics (ICOST), 2017, pp. 112–123.
  • (22) M. Cirillo, L. Karlsson and A. Saffiotti, Human-aware Task Planning: An Application to Mobile Robots, ACM Trans. Intell. Syst. Technol. 1(2) (2010), 15–11526.
  • (23) D. Sprute, R. Rasch, A. Pörtner, S. Battermann and M. König, Gesture-Based Object Localization for Robot Applications in Intelligent Environments, in: 2018 International Conference on Intelligent Environments (IE), 2018, pp. 48–55.
  • (24) H. Moravec and A. Elfes, High resolution maps from wide angle sonar, in: Robotics and Automation. Proceedings. 1985 IEEE International Conference on, Vol. 2, 1985, pp. 116–121.
  • (25) D. Sprute, K. Tönnies and M. König, Virtual Borders: Accurate Definition of a Mobile Robot’s Workspace Using Augmented Reality, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)., 2018, pp. 8574–8581.
  • (26) M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A Density-based Algorithm for Discovering Clusters a Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, in: Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
  • (27) G. Grisetti, C. Stachniss and W. Burgard, Improved Techniques for Grid Mapping With Rao-Blackwellized Particle Filters, IEEE Transactions on Robotics 23(1) (2007), 34–46.
  • (28) D. Fox, Adapting the Sample Size in Particle Filters Through KLD-Sampling, The International Journal of Robotics Research 22(12) (2003), 985–1003.
  • (29) M. Quigley, K. Conley, B. Gerkey, J. Faust, T.B. Foote, J. Leibs, R. Wheeler and A.Y. Ng, ROS: an open-source Robot Operating System, in: ICRA Workshop on Open Source Software, 2009.