Adapting to social conventions is an unavoidable requirement for the acceptance of assistive and social robots. While the scientific community broadly accepts that assistive robots and social robot companions are unlikely to have widespread use in the near future, their presence in health-care and other medium-sized institutions is becoming a reality. These robots will have a beneficial impact in industry (see [15, 14]) and other fields such as health care (see [6, 7]). The growing number of research contributions to social navigation is also indicative of the importance of the topic. To foster the future prevalence of these robots, they must be useful, but also socially accepted. As first proposed by  and later by  and , robots should navigate politely, actively asking for permission or collaboration when necessary. The first step towards actively asking for collaboration or permission is to estimate whether the robot would otherwise make people feel uncomfortable, and that is precisely the goal of algorithms evaluating social navigation compliance. Some approaches provide analytic models, whereas others use machine learning techniques such as neural networks (see ). Regardless of the approach followed, modelling social conventions is very challenging: firstly, because the problem itself is subjective; secondly, because the number and weight of the variables involved are undetermined and changing. This data report presents and describes SocNav1, a dataset for social navigation conventions. The aims of SocNav1 are two-fold: a) enabling comparison of the algorithms that robots use to assess the convenience of their presence in a particular position when navigating; b) providing a sufficient amount of data so that modern machine learning algorithms such as deep neural networks can be used.
Because of the structured nature of the data, SocNav1 is particularly well-suited to benchmarking non-Euclidean machine learning algorithms such as Graph Neural Networks (see ). The dataset has been made available in a public repository (https://github.com/ljmanso/SocNav1).
There are many different factors that influence robot social acceptance (), including visual appearance, interaction skills and an appropriate management of the interaction spaces. The study of how humans manage their interaction distances with other people is called proxemics (). Multiple social navigation approaches build on the idea of proxemics to improve robots’ social acceptability in navigation (e.g., [9, 16]). However, as pointed out by , there are other factors that should be taken into account to avoid disturbing humans, such as human interaction groups, Information Process Spaces or Affordance and Activity Spaces. Some of these concepts have been incorporated in studies where an analytic solution is provided (e.g., [2, 19]), whereas others follow a machine learning approach (e.g., [9, 16]). Regardless of how it is implemented, the importance of social navigation makes appropriate datasets essential, not only for benchmarking but also for learning purposes.
Several public datasets have been used in social navigation. In , the authors use the Edinburgh Informatics Forum Pedestrian Database (EIPD) to make a robot learn the behaviour of pedestrians. Another interesting dataset is the one used in , which contains recorded action sequences that correspond to social interactions. The authors use it in a social mapping approach. In , a dataset for public space surveillance tasks was also made public. It consists of 28 video sequences of 6 different scenarios. Two datasets for multiple-people tracking are also described in . They were acquired from a bird's-eye view and manually annotated.
To the best of our knowledge, the social navigation datasets available in the literature provide data to benchmark and/or learn route estimators based on the behaviour of humans. The first motivation to generate a new dataset is that, especially while the technology readiness level is not high enough, the behaviour that humans expect from robots might be different from the one expected from fellow humans. Generally, humans would expect robots to keep a safer distance in comparison to other humans. Among the possible causes of this phenomenon we can highlight the noise made when robots move, and the apparent unpredictability of their behaviour in comparison to that of humans. The second motivation of the dataset is that SocNav1 aims at evaluating the robots’ ability to assess the level of discomfort that their presence might generate among humans. This ability would be used by robot navigation systems to estimate path costs, but SocNav1 does not directly deal with path costs.
2 Data collection methods
In order to acquire data at a feasible cost and gather robot-specific information (i.e., not imitating the behaviour of humans), it was decided to develop an ad hoc application depicting the scenarios that humans had to manually assess (see Figure 1).
The interface of the tool has two main areas. The canvas on the left-hand side is used to depict the scenarios where subjects were asked to assess the robot’s behaviour in terms of the disturbance caused to humans. On the right-hand side, users have a slider whose value goes from 0 (unacceptable) to 100 (perfect); the intermediate labels are undesirable, acceptable, good and very good. The interface smoothly transitions from one label to another using font transparency to make it easier to select intermediate values. Also on the right-hand side, users can make use of two buttons: one to assess the current scene and generate a new one (button on top), and another to skip the current scene if they are unsure of how to label it (button on the bottom).
The scenarios, rooms randomly generated under some restrictions to make them feasible, depict metres square areas where different elements can be found: the robot, walls, humans, objects and interaction indicators. The representation is robot-centric, so the robot (in red) is always in the centre of the canvas and aligned with the axes. There is always a room composed of at least four walls, represented by black lines. Humans (in blue) and objects (in green) can be anywhere in the room. They are only generated within the canvas, which ensures that users will not miss any element even if the room is bigger than the canvas. Interactions are represented by parallel lines. These might exist between humans (for human-to-human interactions) or between a human and an object (to represent any kind of interaction with objects). Given this information, the subjects were asked to estimate to what extent the robot interrupts humans.
Although some guidelines were provided, subjects were asked to feel free to express how they thought they would feel in the scenarios. The guidelines were the following:
The closer the robot is to humans from their perspective, the more it disturbs.
A collision with a human should have a 0 score (unacceptable).
We want to consider not only the personal spaces, but also the spaces that humans need to interact with other humans or objects. The closer the robot gets to the interaction space (human to human, or human to object), the lower the score, up to a non-critical limit.
A collision with an interaction area should have a maximum score of 20 (undesirable).
The score should decrease as the number of people it is interrupting increases.
In small rooms with a high number of people, closer distances are acceptable in comparison to big rooms with fewer people. It is somewhat acceptable to get closer to people in crowded environments. Therefore, in general terms, the higher the density, the higher the score.
You should consider only social aspects, not robot’s intelligence. Even if the robot seems to be having a close look at one of the walls, it should have a decent score as long as it is not disturbing anyone. The variable to assess is not related to the robot’s performance or whether or not the robot collides with walls and objects. We are only asking about social aspects.
The dataset was generated using two sets of possible scenarios. Using the first subset, composed of 2500 scenarios, 3 subjects generated a total of 5522 labels. These scenarios were classified multiple times with some level of disagreement between humans, as the nature of the problem is subjective. Using the second subset, composed of 10000 scenarios, 9 subjects generated a total of 3758 labelled scenarios with a low number of duplicates. As a result, 12 subjects generated 9280 labels for the scenarios provided. Three of the subjects were researchers involved in the project; the rest were computer science students with no domain knowledge beyond the instructions they were given. A total of 5735 different scenarios were used: 2761 were labelled once, 2406 were labelled twice and 568 were labelled three or more times. When the dataset was designed, labelling scenarios multiple times was considered beneficial to evaluate to what extent humans agree on the labelling (see Section 3). The whole data collection process took place between April 13th and April 27th, 2019.
The dataset is composed of four JSON files: three files for training, development and testing, and a fourth file for training with data augmentation. The percentages of samples for the training, development and testing datasets were 88%, 6% and 6% respectively (8168, 556 and 556 of the 9280 labels). Augmentation was carried out by mirroring the scene over the frontal axis, assuming that mirrored scenarios should have the same labels. The samples were shuffled before splitting the dataset into train/dev/test. The augmented dataset was also shuffled after the augmentation process. The main files in the dataset are located in the data subdirectory:
socnav_training.json: Training dataset. No data augmentation. 8168 labels/scenarios.
socnav_training_dup.json: Training dataset with data augmentation. 16336 labels/scenarios.
socnav_dev.json: Development dataset. 556 labels/scenarios.
socnav_test.json: Testing dataset. 556 labels/scenarios.
Each line in these files describes a labelled scenario using a map that contains the following elements.
identifier: A string that identifies the scenario. Several instances of the same labelled scenario might exist.
robot: It is a dictionary containing the identifier of the robot in the scenario.
humans: A list of humans. Each human is implemented as a dictionary with the following keys: id (identifying the human in the scenario), xPos, yPos (the location of the human, expressed in centimetres), orientation (expressed in degrees). Humans are assumed to be 40cm wide, 20cm deep.
objects: A list of objects. Each object is implemented as a dictionary with the following keys: id (identifying the object in the scenario), xPos, yPos (the location of the object, expressed in centimetres), orientation (expressed in degrees). Objects are assumed to be 40x40 cm.
links: A list of interaction tuples, where the first element of the tuple is a human who is interacting with the second element in the tuple, which can be an object or another human.
score: The score assigned to the robot in the scenario. From 0 to 100.
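The format above can be read with a few lines of Python. The following is a minimal sketch, assuming one JSON object per line as described; the sample line, its identifier and the `id` key inside `robot` are hypothetical values made up for illustration, and the helper names `parse_scenarios` and `summarise` are not part of the repository.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_scenarios(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one labelled scenario (a JSON object) per non-empty line."""
    scenarios = []
    for line in lines:
        line = line.strip()
        if line:
            scenarios.append(json.loads(line))
    return scenarios


def summarise(scenario: Dict[str, Any]) -> Dict[str, Any]:
    """Count the elements of a scenario using only the keys listed above."""
    return {
        "identifier": scenario["identifier"],
        "n_humans": len(scenario["humans"]),
        "n_objects": len(scenario["objects"]),
        "n_links": len(scenario["links"]),
        "score": scenario["score"],
    }


# Hypothetical sample line, invented for this sketch (not taken from the dataset).
SAMPLE_LINE = json.dumps({
    "identifier": "example-0001",
    "robot": {"id": 0},  # key name "id" is an assumption
    "humans": [
        {"id": 1, "xPos": 100.0, "yPos": -50.0, "orientation": 90.0},
        {"id": 2, "xPos": 180.0, "yPos": -50.0, "orientation": -90.0},
    ],
    "objects": [{"id": 3, "xPos": -120.0, "yPos": 200.0, "orientation": 0.0}],
    "links": [[1, 2], [2, 3]],
    "score": 55,
})

scenarios = parse_scenarios([SAMPLE_LINE])
summary = summarise(scenarios[0])
print(summary)

# With the repository checked out, a file can be passed directly, e.g.:
#   with open("data/socnav_training.json") as f:
#       scenarios = parse_scenarios(f)
```

Because several instances of the same scenario may appear with different scores, code that aggregates labels should group by `identifier` rather than assume one line per scenario.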
Besides the data sub-directory, the repository has two other sub-directories: raw_data, which contains the data collected by each of the 12 subjects, and unlabelled, where the two subsets of scenarios used can be found (following the same file format and a score of 0 for all the scenarios). All angles are expressed in degrees and all distances in centimetres.
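The mirroring augmentation described earlier can be sketched as follows. This is an illustration under stated assumptions, not the code used to build socnav_training_dup.json: it assumes that the frontal axis is the robot-centric y axis, so that mirroring negates `xPos` and `orientation`, and it only handles the keys listed in the file format (any wall description would need the same treatment). The function name `mirror_scenario` is hypothetical.

```python
import copy
from typing import Any, Dict


def mirror_scenario(scenario: Dict[str, Any]) -> Dict[str, Any]:
    """Return a mirrored copy of a scenario; the score is kept unchanged.

    Assumption: the frontal axis is the y axis of the robot-centric frame,
    so x coordinates and orientation angles change sign.
    """
    mirrored = copy.deepcopy(scenario)
    for entity in mirrored["humans"] + mirrored["objects"]:
        entity["xPos"] = -entity["xPos"]
        entity["orientation"] = -entity["orientation"]
    return mirrored


# Hypothetical scenario, invented for this sketch.
original = {
    "identifier": "example-0001",
    "robot": {"id": 0},
    "humans": [{"id": 1, "xPos": 100.0, "yPos": -50.0, "orientation": 90.0}],
    "objects": [{"id": 2, "xPos": -120.0, "yPos": 200.0, "orientation": 30.0}],
    "links": [[1, 2]],
    "score": 70,
}

mirrored = mirror_scenario(original)
print(mirrored["humans"][0], mirrored["objects"][0])
```

Interaction links refer to entity identifiers rather than positions, so under these assumptions they carry over to the mirrored scenario unchanged.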
3 Basic analysis
This section provides a brief analysis of the data to facilitate understanding its relatively subjective nature and how the labels are distributed. To this end, a subset of the scenarios which were labelled by three subjects is used (see Figure 2).
Figure (2) depicts a histogram of the labels provided by three different subjects for 500 common scenarios. Each label represents a score range: for perfect, for very good, for good, for acceptable, for undesirable, and for unacceptable. The figure shows some variability in the opinions of the three subjects: subjects 1 and 3 tend to give more extreme scores than subject 2, who assigns intermediate labels to a higher number of situations. Despite these variations, the three subjects assign a score greater than 50, the limit between good and acceptable, to a similar number of scenarios (around 320). This suggests that opinions on a common scenario do not diverge substantially.
Figure (2) provides additional data that reinforce the above observation. This figure represents the difference between the score of a subject and the mean score of the three subjects for 150 common scenarios. The four scenarios of Figure 1 have been marked in the chart with vertical dotted lines. The standard deviation for each of the three subjects considering the 150 scenarios is around 10 points. Given that this value is lower than the width of the label ranges, the variation of the score provided by the three subjects can be considered moderately low. For a considerable number of scenarios, such as scenarios (A) and (D) in Figure 1, the three subjects assign similar scores. Nevertheless, other scenarios produce more variability. This is the case of scenarios (B) and (C) of Figure 1, which are more likely to elicit different feelings than (A) and (D).
Considering the whole set of 5735 different scenarios labelled by the three subjects, similar results are observed. The variability in these scenarios has been individually measured for different subsets, grouping the scenarios according to the number of times they have been labelled. For each subset, the pooled standard deviation () () has been computed as a measure of dispersion. The resulting values show that the dispersion remains below points in all the subsets. Moreover, combining the dispersion of all the subsets, a global of is obtained, which is in line with the results of Figure 2.
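The grouping and dispersion measure described above can be sketched in Python. The sketch assumes the usual definition of the pooled standard deviation, the square root of the sum of (n_i - 1) times the sample variance of each group divided by the sum of (n_i - 1), where each group holds the n_i scores given to one scenario. The exact formula and symbol used in the report are not recoverable from the text, so this definition is an assumption, and the function names and sample scores are hypothetical.

```python
import math
from collections import defaultdict
from statistics import variance
from typing import Dict, Iterable, List


def group_scores(samples: Iterable[dict]) -> Dict[str, List[float]]:
    """Collect all scores given to each scenario identifier."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["identifier"]].append(sample["score"])
    return groups


def pooled_std(groups: Iterable[List[float]]) -> float:
    """Pooled standard deviation over groups with at least two scores."""
    weighted_var = 0.0
    dof = 0
    for scores in groups:
        n = len(scores)
        if n < 2:
            continue  # a single label carries no dispersion information
        weighted_var += (n - 1) * variance(scores)
        dof += n - 1
    if dof == 0:
        raise ValueError("need at least one group with two or more scores")
    return math.sqrt(weighted_var / dof)


def pooled_std_by_label_count(samples: Iterable[dict]) -> Dict[int, float]:
    """Pooled standard deviation for each 'labelled k times' subset."""
    by_count = defaultdict(list)
    for scores in group_scores(samples).values():
        if len(scores) >= 2:
            by_count[len(scores)].append(scores)
    return {k: pooled_std(g) for k, g in sorted(by_count.items())}


# Hypothetical labels, invented for this sketch.
samples = [
    {"identifier": "S1", "score": 40}, {"identifier": "S1", "score": 60},
    {"identifier": "S2", "score": 10}, {"identifier": "S2", "score": 30},
    {"identifier": "S2", "score": 20}, {"identifier": "S3", "score": 80},
]

per_subset = pooled_std_by_label_count(samples)
overall = pooled_std(group_scores(samples).values())
print(per_subset, overall)
```

Scenarios labelled only once are skipped because they contribute no degrees of freedom to the pooled estimate.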
Nowadays, datasets are extremely important in many scientific disciplines. They are essential for benchmarking and algorithm comparison, and with the emergence of deep learning, datasets have become the basic support on which it rests. For the problem at hand, the variable size of the data and its structured nature are among the challenges from a learning point of view. The 9280 samples generated in SocNav1 seem to be enough for machine learning purposes given the size of the data structures describing the scenarios. Initial results using the dataset with deep neural networks support this idea.
Regarding the design of the experiments, it is worth noting that the labels describe how humans think they would feel in the situation, not how they would feel if they were actually there. Generating a dataset providing direct measurements would be extremely challenging from a technological point of view (how the measurements would be taken) as well as from a managerial perspective (time and resources needed). Even though the data for each scenario might not seem very complex, the datasets currently available do not consider interactions between people or objects, as discussed in more detail in the introduction. We are, however, open to extending the dataset with new features in the future if this is found useful.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All authors contributed to the design of the dataset and the manuscript. All authors read and approved the final version of the manuscript. The software to collect and process the data was developed by Luis J. Manso and Pilar Bachiller.
No ethical review process was conducted for this study because no personal data was recorded, only assessments of the scenarios shown. Additionally, the identity of the subjects was anonymised. Before any data was recorded, the subjects were given a brief document explaining what kind of data was going to be recorded, along with the instructions to use the program.
This work has partly been supported by grant 0043_EUROAGE_4_E, from the European Union (Interreg project), and by grants GR18133 and IB18056, from the Government of Extremadura.
Data Availability Statement
The dataset generated can be found in the GitHub repository ljmanso/SocNav1: https://github.com/ljmanso/SocNav1.
[1] (2018) Relational inductive biases, deep learning, and graph networks. arXiv.
[2] (2010) Towards semantic navigation in mobile robotics. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5765 LNCS, pp. 719–748.
[3] (1988) Statistical power analysis for the behavioral sciences. 2nd edition.
[4] (2019) Enabling Socially Competent navigation through incorporating HRI. arXiv, pp. 9–12.
[5] (2004) The PETS04 surveillance ground-truth data sets. Technical report, School of Informatics, University of Edinburgh.
[6] (2018) Did I Tell You My New Therapist is a Robot? Ethical, Legal, and Societal Issues of Healthcare and Therapeutic Robots.
[7] (2018) Challenges on the Application of Automated Planning for Comprehensive Geriatric Assessment Using an Autonomous Social Robot. In Workshop of Physical Agents, pp. 179–194.
[8] (1969) The hidden dimension: man’s use of space in public and private. The Bodley Head, London, Sydney, Toronto 121.
[9] (2009) Adaptive human aware navigation based on motion pattern analysis. In RO-MAN 2009-The 18th IEEE International Symposium on Robot and Human Interactive Communication, pp. 927–932.
[10] (2012) Socially-Aware Robot Navigation: A Learning Approach. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 902–907.
[11] (2013) Social Mapping of Human-Populated Environments by Implicit Function Learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1701–1707.
[12] You’ll never walk alone: Modeling social behavior for multi-target tracking. Proceedings of the IEEE International Conference on Computer Vision, pp. 261–268.
[13] (2018) Learning human-aware path planning with fully convolutional networks. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–5.
[14] (2018) The Impact of Robotics and Automation on Working Conditions and Employment: Ethical, Legal, and Societal Issues. IEEE Robotics & Automation Magazine 25 (2), pp. 126–128.
[15] (2017) Labor market risks of industry 4.0, digitization, robots and AI. In 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), pp. 000343–000346.
[16] (2014) Transferring human navigation behaviors into a robot local planner. In The 23rd IEEE International Symposium on Robot and Human Interactive Communication, pp. 774–779.
[17] (2015) From Proxemics Theory to Socially-Aware Navigation: A Survey. International Journal of Social Robotics 7 (2), pp. 137–153.
[18] (2015) “Go Ahead, Please”: Recognition and Resolution of Conflict Situations in Narrow Passages for Polite Mobile Robot Navigation. In International Conference on Social Robotics, pp. 643–653.
[19] (2019) Socially aware robot navigation system in human-populated and interactive environments based on an adaptive spatial density function and space affordances. Pattern Recognition Letters 118, pp. 72–84.