Following Social Groups: Socially Compliant Autonomous Navigation in Dense Crowds

In densely populated environments, socially compliant navigation is critical for autonomous robots because driving close to people is unavoidable. Such social navigation is challenging given the constraints of human comfort and social rules. Traditional methods based on hand-crafted cost functions have difficulty operating in the complex real world. Other learning-based approaches fail to address the naturalness aspect from the perspective of collective formation behaviors. We present an autonomous navigation system capable of operating in dense crowds by utilizing information about social groups. The underlying system incorporates a deep neural network to track social groups and join the flow of a social group to facilitate navigation. A collision avoidance layer in the system further ensures navigation safety. In experiments, our method generates socially compliant behaviors on par with state-of-the-art methods. More importantly, the system is capable of navigating safely in a densely populated area (10+ people in a 10m x 20m area), following crowd flows to reach the goal.




I Introduction

The ability to safely navigate in populated scenes, e.g. airports, shopping malls, and social events, is essential for autonomous robots. The difficulty comes from the fact that people walk close to the robot, cutting in front of it or between the robot and the goal point. The safety margin for the robot to drive in crowded scenes is pushed to the minimum. In such a case, the navigation system has to trade off between driving safely close to people and reaching the goal quickly. Furthermore, a previous study of socially compliant navigation [9] states three aspects of robot behavior: comfort as the absence of annoyance and stress for humans interacting with robots, naturalness as the similarity between robot and human behaviors, and sociability as abiding by general cultural conventions. Among these three aspects, the first essentially reflects the safety of the navigation.

Previous studies on socially compliant navigation attempt to solve the problem with various methods, including data-driven approaches for human trajectory prediction [2, 1], as well as potential field-based [7] and social force model-based [6] approaches. In particular, reinforcement learning-based methods use reward functions to penalize improper robot behaviors, eliminating the cause of discomfort [4, 3]. Inverse reinforcement learning-based methods learn from expert demonstrations [8]. These methods are hard to generalize because large sets of comprehensive expert demonstrations are hard to acquire.

The study of this paper is based on our previous work which uses deep learning to solve the socially compliant navigation problem [14]. This paper extends the work in two ways. First, we consider the finding from a previous study [12] that 70% of people walk in social groups. Crowd behavior can be summarized as flows of social groups, and humans tend to move along the flow. It is our understanding that joining the flow that shares a similar heading direction is more socially compliant, causing fewer collisions and disturbances to surrounding pedestrians. Our method recognizes social groups and selects the flow to follow. Second, we ensure safety with a multi-layer navigation system: a deep learning-based global planning layer makes high-level socially compliant behavioral decisions, while a geometry-based local planning layer handles collision avoidance at a low level.
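The flow-selection idea above can be sketched as a small heuristic: pick the social group whose mean heading best aligns with the robot-to-goal direction. This is an illustrative sketch under assumed inputs (per-group member headings as angles), not the paper's learned policy; all function names are hypothetical.

```python
import math

def direction(dx, dy):
    """Heading angle (radians) of a displacement vector."""
    return math.atan2(dy, dx)

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def select_flow(groups, robot_pos, goal_pos):
    """Pick the index of the group whose mean heading best aligns with
    the robot-to-goal direction; returns None if no group aligns at all.
    `groups` is a list of lists of member headings (radians)."""
    goal_dir = direction(goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1])
    best, best_diff = None, math.pi  # groups moving fully against the goal are ignored
    for idx, headings in enumerate(groups):
        # circular mean of the members' headings
        mean_dir = direction(sum(math.cos(h) for h in headings),
                             sum(math.sin(h) for h in headings))
        diff = angle_diff(mean_dir, goal_dir)
        if diff < best_diff:
            best, best_diff = idx, diff
    return best
```

For example, with one group heading roughly toward the goal and another heading away, the heuristic selects the former.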

The paper is further related to previous work on modeling aggregate interactions among social groups [12] and leveraging learned social relations in tracking group formations [11]. Our main contributions are a deep learning-based method for socially compliant navigation with an emphasis on tracking and joining the crowd flow, and an overall system integrating the deep learning method, capable of safe autonomous navigation in dense crowds.

II Method

II-A System Overview

Fig. 1: Navigation software system diagram.

Fig. 1 gives an overview of the autonomous navigation system which consists of three subsystems as follows.

  • State Estimation Subsystem involves a multi-layer data processing pipeline which leverages lidar, vision, and inertial sensing [16]. The subsystem computes the 6-DOF pose of the vehicle and registers laser scan data with the computed pose.

  • Local Planning Subsystem is a low-level planning subsystem in charge of obstacle avoidance in the vicinity of the vehicle. The planning algorithm involves a trajectory library and computes collision-free paths for the vehicle to navigate [15].

  • Social Navigation Planning Subsystem takes in observations consisting only of pedestrians, obtained by subtracting the prior map. The subsystem tracks pedestrians in the surroundings of the vehicle and extracts grouping information from their walking patterns. With this information, the subsystem generates waypoints (as input to the Local Planning Subsystem) leveraging Group-Navi GAN, a generative planning algorithm in an adversarial training framework based on a deep neural network, Navi-GAN [14].
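As a rough illustration of the dataflow between the three subsystems, the sketch below wires pose estimation, social waypoint generation, and local planning together. Every name and signature here is a hypothetical stand-in, not the authors' software interface; the bodies are deliberately trivial placeholders for the real estimators and planners.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float
    yaw: float

def estimate_state(sensor_frame):
    """Stand-in for the lidar/visual/inertial state estimator:
    here, the frame is assumed to already contain (x, y, yaw)."""
    return Pose(*sensor_frame)

def social_waypoint(pose, goal, tracked_groups):
    """Stand-in for the Social Navigation Planning Subsystem: if a group
    is being followed, steer toward its centroid; otherwise head to goal."""
    if tracked_groups:
        members = tracked_groups[0]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        return (cx, cy)
    return goal

def local_plan(pose, waypoint):
    """Stand-in for the trajectory-library local planner: just a unit
    step toward the waypoint (the real planner checks collisions)."""
    dx, dy = waypoint[0] - pose.x, waypoint[1] - pose.y
    norm = max((dx * dx + dy * dy) ** 0.5, 1e-9)
    return (dx / norm, dy / norm)
```

The point of the sketch is the layering: the social planner only chooses where to go next, while collision avoidance stays inside the local planner.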

II-B Group-Navi GAN

Following the extended social force model [12], we propose Group-Navi GAN, a framework to jointly address the safety and naturalness aspects at a group's level. Group-Navi GAN is inspired by our previous work Navi-GAN [14], which models social forces at an individual's level. An intention-force generator in the Group-Navi GAN deep network models the driving force that moves the target agent toward its goal. A group-force generator models the repulsive force from other pedestrians and the interaction force from other group members. The joint output of the intention-force generator and the group-force generator defines the path for the robot to navigate.

In the group-force generator, a group pooling module first associates the target agent to a group based on motion information (see Fig. 2). Then, the group pooling module computes path adjustments which essentially guide the robot to follow the group. We apply a support vector machine classifier [5], trained by [11], to determine if two agents belong to the same group. The classifier uses the local spatio-temporal relation to cluster agents with similar motions based on coherent motion indicators, i.e. the differences in walking speed, spatial location, and heading.
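The group association itself uses a trained SVM [5, 11]; as a minimal stand-in, the same three coherent-motion indicators can be thresholded directly. The feature definitions and threshold values below are illustrative assumptions, not the trained classifier.

```python
import math

def coherent_motion_features(track_a, track_b):
    """Pairwise features from two equal-length tracks [(x, y), ...]:
    speed difference, average spatial distance, heading difference."""
    def speed(tr):
        return math.dist(tr[-1], tr[0]) / max(len(tr) - 1, 1)
    def heading(tr):
        return math.atan2(tr[-1][1] - tr[0][1], tr[-1][0] - tr[0][0])
    avg_dist = sum(math.dist(a, b) for a, b in zip(track_a, track_b)) / len(track_a)
    dh = abs(heading(track_a) - heading(track_b)) % (2 * math.pi)
    dh = min(dh, 2 * math.pi - dh)  # wrap to [0, pi]
    return abs(speed(track_a) - speed(track_b)), avg_dist, dh

def same_group(track_a, track_b, max_dv=0.5, max_dist=2.0, max_dh=0.5):
    """Threshold stand-in for the trained SVM decision: two agents are
    grouped if they walk at similar speed, close together, and with
    similar headings (thresholds are assumed, not learned)."""
    dv, dist, dh = coherent_motion_features(track_a, track_b)
    return dv <= max_dv and dist <= max_dist and dh <= max_dh
```

Two parallel tracks a meter apart pass the test; a nearby track moving in the opposite direction fails on the heading indicator.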

We use the following equation to aggregate the hidden state from agent $j$ to agent $i$,

$$P_i = \max_{j \neq i} \left( g_{ij}\, h_j \right),$$

where $g_{ij} \in \{0, 1\}$ indicates if two agents are in the same group,

$$g_{ij} = \mathbb{1}\left[\, |\theta_i - \theta_j| < \theta_{th} \,\right],$$

and $\theta_i$, $\theta_j$ are the agent headings. The resulting embedding of hidden state $P_i$ is computed as a row vector which consists of the maximum elements from all other agents. The embedding is further concatenated for decoding,

$$c_i = \left[\, h_i ;\ P_i ;\ z \,\right],$$

where $z$ is random noise drawn from $\mathcal{N}(0, 1)$.
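The group-gated pooling and concatenation described above can be sketched in plain Python. The gating rule (heading difference under a threshold) and all names are reconstructions for illustration, not the paper's exact network code.

```python
import math

def group_pool(hidden, headings, i, theta_th=0.5):
    """Element-wise max over the hidden states of the agents grouped
    with agent i (heading within theta_th radians); zeros if agent i
    walks alone. `hidden` is a list of equal-length state vectors."""
    dim = len(hidden[0])
    pooled, found = [0.0] * dim, False
    for j, h in enumerate(hidden):
        d = abs((headings[j] - headings[i] + math.pi) % (2 * math.pi) - math.pi)
        if j != i and d < theta_th:
            pooled = [max(p, v) for p, v in zip(pooled, h)] if found else list(h)
            found = True
    return pooled

def decoder_input(hidden, headings, i, z):
    """Concatenation [h_i ; pooled ; z] fed to the decoder,
    where z is a noise vector sampled by the caller."""
    return list(hidden[i]) + group_pool(hidden, headings, i) + list(z)
```

Agents whose headings fall outside the threshold contribute nothing to the pooled row vector, so the decoder only sees the flow of the agent's own group.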

Fig. 2: Group pooling module in the Group-Navi GAN deep network. The input of the module is the relative displacements of the surrounding pedestrians w.r.t. the target agent. The module associates the target agent to a group based on the motion information and outputs path adjustments for the robot to follow the group.
Metric Dataset Group Percentage Linear SGAN [1] Navi-GAN[14] Group-Navi GAN
ADE ETH[13] 18% 0.84 0.60 0.95 1.33
HOTEL[13] 19% 0.35 0.48 0.43 0.39
UNIV[10] 73% 0.56 0.36 0.85 0.29
ZARA1[10] 70% 0.41 0.21 0.40 0.21
ZARA2[10] 69% 0.53 0.27 0.47 0.30
AVG 50% 0.54 0.39 0.62 0.50
FDE ETH[13] 18% 1.60 1.22 1.64 1.98
HOTEL[13] 19% 0.60 0.95 0.74 0.93
UNIV[10] 73% 1.01 0.75 1.36 0.68
ZARA1[10] 70% 0.74 0.42 0.66 0.40
ZARA2[10] 69% 0.95 0.54 0.72 0.85
AVG 50% 0.98 0.78 1.02 0.96
TABLE I: Social compliance evaluation of Group-Navi GAN and other baseline approaches. Two error metrics, Average Displacement Error and Final Displacement Error, are reported (in meters) for predicting eight future time steps from eight observed time steps. We manually count the number of pedestrians moving in social groups. Our method outperforms the prior work on the UNIV and ZARA1 datasets, where social groups are richly available.

Fig. 3: Simulation results. The tests involve 18 people walking in 6 groups, each moving in a different direction. The three columns present three representative cases. The first and second rows show screenshots of the simulation environment, without and with the social model, respectively. The coordinate frame indicates the robot. The goal point is marked as the magenta dot. The red dots are the pedestrians tracked using laser scan data. The third row displays the trajectories of the pedestrians (gray and green) and the robot (yellow and red). The dots are the start points and the star is the goal point of the robot. When using Navigation without Social Model, the robot produces the yellow path. When using Navigation with Social Model, the robot follows the group in green and produces the red path. A blue square on each robot path marks where the corresponding screenshot was captured. Specifically, the first row shows moments when the robot drives overly close to people due to not using the social model; the second row shows the robot following a group during navigation.

III Experiments

III-A Social Compliance Evaluation

We evaluate our method on two publicly available datasets: ETH [13] and UCY [10]. These datasets include rich social interactions in real-world scenarios. We follow the same leave-one-out evaluation methodology and error metrics used in the prior work [1]:

  1. Average Displacement Error: The average L2 distance between predicted way-points and ground-truth trajectories over the predicted time steps.

  2. Final Displacement Error: The L2 distance between the predicted way-point and true final position at the last predicted time step.
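The two metrics can be computed directly from predicted and ground-truth waypoint sequences; a minimal sketch:

```python
import math

def ade(pred, gt):
    """Average Displacement Error: mean L2 distance between predicted
    and ground-truth positions over all predicted time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def fde(pred, gt):
    """Final Displacement Error: L2 distance between the predicted and
    true position at the last predicted time step."""
    return math.dist(pred[-1], gt[-1])
```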

We compare against a linear regressor that only predicts straight paths, Social-GAN (SGAN) [1], and Navi-GAN [14]. We use the past eight time steps to predict the future eight time steps. As shown in TABLE I, our method yields considerable accuracy improvements on the datasets where rich group interactions are prevalent. In particular, UNIV and ZARA1 have more than 70% of the pedestrians moving in social groups, and thus our model performs better there. Our model performs slightly worse than the state-of-the-art approaches on the ETH and HOTEL datasets due to the lack of social group interactions. Further, our method assumes the existence of a goal point for each person in the dataset; lacking precise goal point information results in relatively low accuracy. In the next experiments, we show results with author-collected data where the strength of our method is more obvious.

III-B Group Following Evaluation

We further evaluate the method with a robot vehicle as shown in Fig. 4. The robot is equipped with a Velodyne Puck laser scanner for collision avoidance and pedestrian tracking. Our method is evaluated in two configurations – Navigation with Social Model refers to the full navigation system as shown in Fig. 1, and Navigation without Social Model has the Social Navigation Planning Subsystem removed. The State Estimation Subsystem and the Local Planning Subsystem are directly coupled. The robot navigates directly toward the goal and uses the Local Planning Subsystem to avoid collisions locally.

Fig. 4: Experiment platform. A wheelchair-based robot carries a sensor pack on the top. The sensor pack consists of a Velodyne Puck laser scanner, a camera, and a low-grade IMU. The scan data is used for collision avoidance and pedestrian tracking. A laptop computer carries out all onboard processing.
Fig. 5: Real-world experiments. The first row shows photos of 6 people walking in 2 groups; one group moves along the robot navigation direction and the other moves in the opposite direction. The second row shows the corresponding trajectories of the people (blue and green) and the robot (orange). Dots indicate the start points and the star indicates the goal point of the robot. In (a), when using Navigation without Social Model, the robot drives directly toward the goal point, cutting through the group on the left that moves against the robot. In (b), when using Navigation with Social Model, the robot follows the group on the right and avoids disturbing the pedestrians.

We show results in both simulation and real-world experiments with pedestrian data collected by the robot. In simulation, we show scenarios with 18 people walking around the robot in 6 groups. In the real-world experiments, we have 6 people walking in 2 groups. One group moves along the robot navigation direction and the other group moves in the opposite direction. The results are shown in Fig. 3 and Fig. 5. In each scenario, the robot selects a group to follow with the full navigation system (Navigation with Social Model). When using Navigation without Social Model, the robot drives directly toward the goal, resulting in interference with groups moving in other directions.

Finally, we conduct an Amazon Mechanical Turk (AMT) study to further understand the safety and naturalness of the robot navigation. A total of 466 participants evaluate the simulation and real-world results. As shown in Table II, 90% of the participants consider Navigation without Social Model to be unsafe (with collisions) while the ratio reduces to 40% using Navigation with Social Model. With the real-world results, 95% of the participants report that the robot forces other pedestrians to change their paths if using Navigation without Social Model. When using Navigation with Social Model, the ratio reduces to 4%. The survey result validates that our method helps reduce disturbances to other pedestrians as well as improves safety of the navigation. A video of these results can be seen at

Metric Scene Without Social Model With Social Model
Collision (Safety) (1) 97% 42%
(2) 92% 6%
(3) 92% 36%
AVG 93% 28%
Path Change (Naturalness) Real world 95% 4%
TABLE II: Results of the survey study. A total of 466 participants evaluate the simulation results in Fig. 3 and the real-world results in Fig. 5. 90% of the participants consider Navigation without Social Model to have collisions; for Navigation with Social Model, the ratio reduces to 40%. Further, 95% of the participants report that the robot forces other pedestrians to change their paths when using Navigation without Social Model; with Navigation with Social Model, the ratio reduces to 4%. The ratios reduce by 3 times in terms of collision and 20 times in terms of path change, validating that our method helps reduce disturbances to other pedestrians as well as improves safety.

IV Conclusion

The paper proposes an autonomous navigation system capable of operating in dense crowds. In this system, a Social Navigation Planning Subsystem incorporating a deep neural network generates socially compliant behaviors. This involves a group pooling mechanism that infers social relationships to encourage the robot to join the flow of a social group sharing its moving direction. We show the effectiveness of our method through quantitative and empirical studies in both simulation and real-world experiments. By joining the crowd flow, the robot has fewer collisions with people crossing sideways or walking toward the robot, and creates fewer disturbances to the pedestrians. As a result, the robot navigates in a safe and natural manner. Since this paper focuses on human-robot interactions at a group's level, future extensions of the work can model interactions between groups and scattered individuals.


Special thanks are given to C.-E. Tsai, Y. Song, D. Zhao for facilitating experiments.


  • [1] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi (2018) Social GAN: socially acceptable trajectories with generative adversarial networks. arXiv preprint arXiv:1803.10892. Cited by: §I, TABLE I, §III-A, §III-A.
  • [2] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, F. Li, and S. Savarese (2016) Social LSTM: human trajectory prediction in crowded spaces. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Cited by: §I.
  • [3] C. Chen, Y. Liu, S. Kreiss, and A. Alahi (2019) Crowd-robot interaction: crowd-aware robot navigation with attention-based deep reinforcement learning. arXiv preprint arXiv:1809.08835. Cited by: §I.
  • [4] Y. F. Chen, M. Everett, M. Liu, and J. P. How (2018) Socially aware motion planning with deep reinforcement learning. arXiv preprint arXiv:1703.08862. Cited by: §I.
  • [5] C. Cortes and V. N. Vapnik (1995) Support-vector networks. Machine Learning (20). Cited by: §II-B.
  • [6] D. Helbing and P. Molnar (1995) Social force model for pedestrian dynamics. Physical Review E 51. Cited by: §I.
  • [7] F. Hoeller, D. Schulz, M. Moors, and E. F. Schneider (2007) Accompanying persons with a mobile robot using motion prediction and probabilistic roadmaps. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), Cited by: §I.
  • [8] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard (2016) Socially compliant mobile robot navigation via inverse reinforcement learning. The International Journal of Robotics Research 35 (11). Cited by: §I.
  • [9] T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch (2013-12) Human-aware robot navigation: a survey. Robotics and Autonomous Systems 61 (12), pp. 1726–1743. Cited by: §I.
  • [10] L. Leal-Taixe, M. Fenzi, A. Kuznetsova, B. Rosenhahn, and S. Savarese (2014) Learning an image-based motion context for multiple people tracking. In CVPR, Cited by: TABLE I, §III-A.
  • [11] T. Linder and K. O. Arras (2014) Multi-model hypothesis tracking of groups of people in rgb-d data. In Proc.of the 17th Int’l Conf. on Information Fusion, Cited by: §I, §II-B.
  • [12] M. Moussaïd, N. Perozo, S. Garnier, D. Helbing, and G. Theraulaz (2010) The walking behaviour of pedestrian social groups and its impact on crowd dynamics. arXiv preprint arXiv:1003.3894. Cited by: §I, §I, §II-B.
  • [13] S. Pellegrini, A. Ess, and L. V. Gool (2010) Improving data association by joint modeling of pedestrian trajectories and groupings. In Computer Vision–ECCV, Cited by: TABLE I, §III-A.
  • [14] C. Tsai (2019-06) A generative approach for socially compliant navigation. Master's Thesis, Pittsburgh, PA. Cited by: §I, 3rd item, §II-B, TABLE I, §III-A.
  • [15] J. Zhang, C. Hu, R. Gupta Chadha, and S. Singh (2019) Maximum likelihood path planning for fast aerial maneuvers and collision avoidance. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), Cited by: 2nd item.
  • [16] J. Zhang and S. Singh (2018) Laser–visual–inertial odometry and mapping with high robustness and low drift. Journal of Field Robotics 35. Cited by: 1st item.