I. Introduction
The ability to navigate safely in populated scenes, e.g. airports, shopping malls, and social events, is essential for autonomous robots. The difficulty comes from the fact that people walk close to the robot, cutting in front of it or between the robot and its goal point, pushing the safety margin to a minimum. In such cases, the navigation system has to trade off between driving safely around people and reaching the goal quickly. Furthermore, a previous study of socially compliant navigation [9] identifies three aspects of robot behavior: comfort, the absence of annoyance and stress for humans interacting with robots; naturalness, the similarity between robot and human behaviors; and sociability, adherence to general cultural conventions. Among these three aspects, the first essentially reflects the safety of the navigation.
Previous studies on socially compliant navigation attempt to solve the problem with various methods, including data-driven approaches for human trajectory prediction [2, 1], potential field-based approaches [7], and social force model-based approaches [6]. In particular, reinforcement learning-based methods use reward functions to penalize improper robot behaviors, eliminating causes of discomfort [4, 3]. Inverse reinforcement learning-based methods learn from expert demonstrations [8]; these methods are hard to generalize because large sets of comprehensive expert demonstrations are difficult to acquire. The study in this paper builds on our previous work, which uses deep learning to solve the socially compliant navigation problem
[14]. This paper extends that work in two ways. First, we build on the finding from a previous study [12] that 70% of people walk in social groups. Crowd behavior can be summarized as flows of social groups, and humans tend to move along the flow. It is our understanding that joining a flow with a similar heading direction is more socially compliant, causing fewer collisions and disturbances to surrounding pedestrians. Our method recognizes social groups and selects the flow to follow. Second, we ensure safety with a multi-layer navigation system, in which a deep learning-based global planning layer makes high-level socially compliant behavioral decisions while a geometry-based local planning layer handles low-level collision avoidance. The paper is further related to previous work on modeling aggregate interactions among social groups [12] and leveraging learned social relations in tracking group formations [11]. Our main contributions are a deep learning-based method for socially compliant navigation with an emphasis on tracking and joining the crowd flow, and an overall system integrating the deep learning method, capable of safe autonomous navigation in dense crowds.
II. Method
II-A System Overview

Fig. 1 gives an overview of the autonomous navigation system, which consists of three subsystems as follows.
- State Estimation Subsystem involves a multi-layer data processing pipeline leveraging lidar, vision, and inertial sensing [16]. The subsystem computes the 6-DOF pose of the vehicle and registers laser scan data with the computed pose.
- Local Planning Subsystem is a low-level planning subsystem in charge of obstacle avoidance in the vicinity of the vehicle. The planning algorithm involves a trajectory library and computes collision-free paths for the vehicle to navigate [15].
- Social Navigation Planning Subsystem takes in observations consisting only of pedestrians, obtained by subtracting the prior map. The subsystem tracks pedestrians in the surroundings of the vehicle and extracts grouping information from pedestrian walking patterns, with which it generates waypoints (as input to the Local Planning Subsystem), leveraging Group-Navi GAN, a generative planning algorithm in an adversarial training framework based on a deep neural network, Navi-GAN [14].
II-B Group-Navi GAN
Following the extended social force model [12], we propose Group-Navi GAN, a framework that jointly addresses the safety and naturalness aspects at the group level. Group-Navi GAN is inspired by our previous work, Navi-GAN [14], which models social forces at the individual level. An intention-force generator in the Group-Navi GAN deep network models the driving force that moves the target agent toward its goal. A group-force generator models the repulsive force from other pedestrians and the interaction force from other group members. The joint output of the intention-force generator and the group-force generator defines the path for the robot to navigate.
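To make the force decomposition above concrete, the following sketch composes a goal-directed driving force with group-level repulsion and cohesion terms into a waypoint. All function names, gain values, and the exponential repulsion form are illustrative assumptions in the spirit of the social force model; in Group-Navi GAN these hand-crafted terms are replaced by the learned intention-force and group-force generators.

```python
import numpy as np

def intention_force(pos, goal, desired_speed=1.2):
    """Driving force pulling the agent toward its goal (illustrative gains)."""
    d = goal - pos
    n = np.linalg.norm(d)
    return desired_speed * d / n if n > 1e-6 else np.zeros(2)

def group_force(pos, others, group_mask, repulse_gain=2.0, cohere_gain=0.5):
    """Repulsion from non-group pedestrians plus cohesion toward the group centroid."""
    f = np.zeros(2)
    for p, same_group in zip(others, group_mask):
        d = pos - p
        n = np.linalg.norm(d) + 1e-6
        if not same_group:
            f += repulse_gain * np.exp(-n) * d / n  # push away, decaying with distance
    members = [p for p, m in zip(others, group_mask) if m]
    if members:
        f += cohere_gain * (np.mean(members, axis=0) - pos)  # stay with the group
    return f

def next_waypoint(pos, goal, others, group_mask, dt=0.4):
    """One integration step of the combined forces."""
    return pos + dt * (intention_force(pos, goal) + group_force(pos, others, group_mask))
```

With no pedestrians nearby, the waypoint simply advances toward the goal; a nearby non-group pedestrian deflects it away.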
In the group-force generator, a group pooling module first associates the target agent to a group based on motion information (see Fig. 2). Then, the group pooling module computes path adjustments which essentially guide the robot to follow the group. We apply a support vector machine classifier [5], trained by [11], to determine whether two agents belong to the same group. This uses the local spatio-temporal relation to cluster agents with similar motions based on coherent motion indicators, i.e. the differences in walking speed, spatial location, and heading. We use the following equation to aggregate the hidden states of grouped agents,
(1)
where the indicator denotes whether two agents are in the same group,
(2)
based on the agent headings. The resulting embedding of the hidden state is computed as a row vector consisting of the maximum elements across all other agents. The embedding is further concatenated for decoding,
(3)
where the decoder input is concatenated with random noise.
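A minimal numerical sketch of the group pooling step (pairwise grouping test followed by element-wise max-pooling of hidden states, per Eqs. (1)-(3)) is given below. The threshold-based `same_group` is a stand-in for the trained SVM of [5, 11], and all feature names, threshold values, and dimensions are our own illustrative assumptions.

```python
import numpy as np

def coherent_motion_features(vel_i, vel_j, pos_i, pos_j):
    """Pairwise coherent motion indicators: speed, spatial, and heading differences."""
    speed_diff = abs(np.linalg.norm(vel_i) - np.linalg.norm(vel_j))
    dist = np.linalg.norm(pos_i - pos_j)
    heading_diff = abs(np.arctan2(vel_i[1], vel_i[0]) - np.arctan2(vel_j[1], vel_j[0]))
    return np.array([speed_diff, dist, heading_diff])

def same_group(feat, thresholds=(0.3, 2.0, 0.5)):
    """Stand-in for the SVM classifier: agents group if all indicators are small."""
    return bool(np.all(feat < np.array(thresholds)))

def group_pool(h_target, hidden_states, group_mask):
    """Element-wise max over grouped agents' hidden states, concatenated for decoding."""
    grouped = [h for h, m in zip(hidden_states, group_mask) if m]
    pooled = np.max(grouped, axis=0) if grouped else np.zeros_like(h_target)
    return np.concatenate([h_target, pooled])
```

The max-pooled row vector retains, per dimension, the strongest activation among group members, and the concatenated embedding is what the decoder would consume.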

TABLE I

Metric | Dataset | Group Percentage | Linear | SGAN [1] | Navi-GAN [14] | Group-Navi GAN
---|---|---|---|---|---|---
ADE | ETH [13] | 18% | 0.84 | 0.60 | 0.95 | 1.33
 | HOTEL [13] | 19% | 0.35 | 0.48 | 0.43 | 0.39
 | UNIV [10] | 73% | 0.56 | 0.36 | 0.85 | 0.29
 | ZARA1 [10] | 70% | 0.41 | 0.21 | 0.40 | 0.21
 | ZARA2 [10] | 69% | 0.53 | 0.27 | 0.47 | 0.30
 | AVG | 50% | 0.54 | 0.39 | 0.62 | 0.50
FDE | ETH [13] | 18% | 1.60 | 1.22 | 1.64 | 1.98
 | HOTEL [13] | 19% | 0.60 | 0.95 | 0.74 | 0.93
 | UNIV [10] | 73% | 1.01 | 0.75 | 1.36 | 0.68
 | ZARA1 [10] | 70% | 0.74 | 0.42 | 0.66 | 0.40
 | ZARA2 [10] | 69% | 0.95 | 0.54 | 0.72 | 0.85
 | AVG | 50% | 0.98 | 0.78 | 1.02 | 0.96
Fig. 3: Simulation results in the test area (rows: Without Social Model, With Social Model, Trajectories; columns: cases (1)-(3)). The tests involve 18 people walking in 6 groups. Each group moves in a different direction. The three columns present three representative cases. The first and second rows show screenshots of the simulation environment. The coordinate frame indicates the robot. The goal point is marked as the magenta dot. The red dots are the tracked pedestrians using laser scan data. The third row displays the trajectories of the pedestrians (gray and green) and the robot (yellow and red). The dots are the start points and the star is the goal point of the robot. When using Navigation without Social Model, the robot produces the yellow path. When using Navigation with Social Model, the robot follows the group in green and produces the red path. A blue square is labeled on each robot path where the corresponding screenshot is captured in the first and second rows. Specifically, in the first row, the screenshots show moments when the robot drives overly close to people due to not using the social model. In the second row, the screenshots are taken while the robot follows a group during the navigation.
III. Experiments
III-A Social Compliance Evaluation
We evaluate our method on two publicly available datasets: ETH [13] and UCY [10]. These datasets include rich social interactions in real-world scenarios. We follow the leave-one-out evaluation methodology and the error metrics used in prior work [1]:
- Average Displacement Error (ADE): the average L2 distance between predicted waypoints and ground-truth trajectories over the predicted time steps.
- Final Displacement Error (FDE): the L2 distance between the predicted waypoint and the true final position at the last predicted time step.
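The two metrics above reduce to a few lines of code. This sketch assumes trajectories are arrays of shape (timesteps, 2); the function names are ours.

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean L2 distance over all predicted steps."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def fde(pred, gt):
    """Final Displacement Error: L2 distance at the last predicted step."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))
```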
We compare against a linear regressor that only predicts straight paths, Social-GAN (SGAN) [1], and Navi-GAN [14]. We use the past eight time steps to predict the future eight time steps. As shown in TABLE I, our method yields considerable accuracy improvements on the datasets where rich group interactions are prevalent. In particular, UNIV and ZARA1 have more than 70% of the pedestrians moving in social groups, and thus our model performs better on them. Our model performs slightly worse than the state-of-the-art approaches on the ETH and HOTEL datasets due to the lack of social group interactions. Further, our method assumes the existence of a goal point for each person in the dataset; lacking precise goal point information results in relatively low accuracy. In the following experiments, we show results with author-collected data where the strength of our method is more apparent.
III-B Group Following Evaluation
We further evaluate the method with a robot vehicle as shown in Fig. 4. The robot is equipped with a Velodyne Puck laser scanner for collision avoidance and pedestrian tracking. Our method is evaluated in two configurations: Navigation with Social Model refers to the full navigation system as shown in Fig. 1, while Navigation without Social Model has the Social Navigation Planning Subsystem removed and the State Estimation and Local Planning Subsystems directly coupled. In the latter, the robot navigates directly toward the goal and uses the Local Planning Subsystem to avoid collisions locally.

We show results in both simulation and real-world experiments with pedestrian data collected by the robot. In simulation, we show scenarios with 18 people walking around the robot in 6 groups. In the real-world experiments, we have 6 people walking in 2 groups. One group moves along the robot navigation direction and the other group moves in the opposite direction. The results are shown in Fig. 3 and Fig. 5. In each scenario, the robot selects a group to follow with the full navigation system (Navigation with Social Model). With Navigation without Social Model, the robot drives directly toward the goal and interferes with groups moving in other directions.
Finally, we conduct an Amazon Mechanical Turk (AMT) study to further understand the safety and naturalness of the robot navigation. A total of 466 participants evaluate the simulation and real-world results. As shown in TABLE II, 90% of the participants consider Navigation without Social Model to be unsafe (with collisions), while the ratio drops to 40% with Navigation with Social Model. With the real-world results, 95% of the participants report that the robot forces other pedestrians to change their paths when using Navigation without Social Model; with Navigation with Social Model, the ratio drops to 4%. The survey validates that our method reduces disturbances to other pedestrians and improves the safety of the navigation. A video of these results can be seen at www.youtube.com/watch?v=I_SkA9rmxYE.
TABLE II

Metric | Scene | Without Social Model | With Social Model
---|---|---|---
Collision (Safety) | (1) | 97% | 42%
 | (2) | 92% | 6%
 | (3) | 92% | 36%
 | AVG | 93% | 28%
Path Change (Naturalness) | Real world | 95% | 4%
IV. Conclusion
The paper proposes an autonomous navigation system capable of operating in dense crowds. In this system, a Social Navigation Planning Subsystem incorporating a deep neural network generates socially compliant behaviors. This involves a group pooling mechanism that infers social relationships to encourage the robot to join the flow of a social group sharing its moving direction. We show the effectiveness of our method through quantitative and empirical studies in both simulation and real-world experiments. By joining the crowd flow, the robot has fewer collisions with people crossing sideways or walking toward it, and creates fewer disturbances to the pedestrians. As a result, the robot navigates in a safe and natural manner. Since this paper focuses on human-robot interactions at the group level, future work can extend it to model interactions between groups and scattered individuals.
Acknowledgment
Special thanks to C.-E. Tsai, Y. Song, and D. Zhao for facilitating the experiments.
References
- [1] (2018) Social GAN: socially acceptable trajectories with generative adversarial networks. arXiv preprint arXiv:1803.10892. Cited by: §I, TABLE I, §III-A.
- [2] (2016) Social LSTM: human trajectory prediction in crowded spaces. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Cited by: §I.
- [3] (2019) Crowd-robot interaction: crowd-aware robot navigation with attention-based deep reinforcement learning. arXiv preprint arXiv:1809.08835. Cited by: §I.
- [4] (2018) Socially aware motion planning with deep reinforcement learning. arXiv preprint arXiv:1703.08862. Cited by: §I.
- [5] (1995) Support-vector networks. Machine Learning (20). Cited by: §II-B.
- [6] (1998) Social force model for pedestrian dynamics. Physical Review E 51. Cited by: §I.
- [7] (2007) Accompanying persons with a mobile robot using motion prediction and probabilistic roadmaps. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS). Cited by: §I.
- [8] (2016) Socially compliant mobile robot navigation via inverse reinforcement learning. The International Journal of Robotics Research 35 (11). Cited by: §I.
- [9] (2013) Human-aware robot navigation: a survey. Robotics and Autonomous Systems 61 (12), pp. 1726–1743. Cited by: §I.
- [10] (2014) Improving data association by joint modeling of pedestrian trajectories and groupings. In CVPR. Cited by: TABLE I, §III-A.
- [11] (2014) Multi-model hypothesis tracking of groups of people in RGB-D data. In Proc. of the 17th Int'l Conf. on Information Fusion. Cited by: §I, §II-B.
- [12] (2010) The walking behaviour of pedestrian social groups and its impact on crowd dynamics. arXiv preprint arXiv:1003.3894. Cited by: §I, §II-B.
- [13] (2010) Improving data association by joint modeling of pedestrian trajectories and groupings. In Computer Vision–ECCV. Cited by: TABLE I, §III-A.
- [14] (2019) A generative approach for socially compliant navigation. Master's Thesis, Pittsburgh, PA. Cited by: §I, §II-A, §II-B, TABLE I, §III-A.
- [15] (2019) Maximum likelihood path planning for fast aerial maneuvers and collision avoidance. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS). Cited by: §II-A.
- [16] (2018) Random field topic model for semantic region analysis in crowded scenes from tracklets. Journal of Field Robotics, Vol. 35. Cited by: §II-A.