Relational Graph Learning for Crowd Navigation

09/28/2019 ∙ by Changan Chen, et al.

We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future. Our approach reasons about the relations between all agents based on their latent features and uses a Graph Convolutional Network to encode higher-order interactions in each agent's state representation, which is subsequently leveraged for state prediction and value estimation. The ability to predict human motion allows us to perform multi-step lookahead planning, taking into account the temporal evolution of human crowds. We evaluate our approach against a state-of-the-art baseline for crowd navigation and ablations of our model to demonstrate that navigation with our approach is more efficient, results in fewer collisions, and avoids failure cases involving oscillatory and freezing behaviors.




I Introduction

Inferring the underlying relations between components of complex dynamic systems can inform decision making for autonomous agents. One natural system with complex dynamics is crowd navigation (i.e., navigation in the presence of multiple humans). The crowd navigation task is challenging as the agent must predict and plan relative to likely human motions so as to avoid collisions and remain at safe and socially appropriate distances from people. Some prior work predicts human trajectories using hand-crafted social interaction models [18] or by modeling the temporal behavior of humans [2]. Although these methods can estimate human trajectories, they do not use the prediction to inform the navigation policy. Other recent works [10, 12, 8] use deep reinforcement learning (DRL) to learn a socially compliant policy. These policies either do not leverage human interactions or approximate them with heuristics. They also simplify the human motion prediction problem with unrealistic assumptions such as linear human motion, and typically consider only the current state of humans instead of incorporating predicted human motion to inform the policy.

More broadly, interacting systems have been studied extensively in recent work [4, 19, 22, 20, 11], and Graph Neural Networks (GNNs) are among the most powerful tools for modeling objects and their relations (interactions). A variant of GNNs is the Graph Convolutional Network (GCN) [23], where relations between nodes are defined by an adjacency matrix. Whereas in GCNs the relations between all nodes are given, Wang et al. [36] and Grover et al. [15] propose to learn the relations between objects and use the learned attention to compute new features. Inspired by this work on relational reasoning and GNN models, we propose a relational graph learning model for crowd navigation that reasons about relations between agents (robot and humans) and then uses a GCN to compute interactions between the agents. With the predicted interaction features of both the robot and humans, our approach jointly plans efficient robot navigation and predicts the motion of present humans. Figure 1 illustrates how interaction reasoning between all agents and explicit prediction can yield a farsighted navigation policy.

Planning and prediction of multi-agent states is fundamental in many problems, including the general case of decision making in an environment with non-cooperative agents. In this paper, we address this problem with a relational graph learning approach for model-based RL, which predicts future human motions and plans for crowd navigation simultaneously. We show that our model outperforms baselines from prior work and carry out ablation studies to measure the effect of the planning budget on navigation performance. We also qualitatively demonstrate that our approach mitigates undesirable robot behaviors in challenging scenarios. The code is available at

Fig. 1: Illustration of our relational graph learning approach (relation graph in blue). We infer interactions among the robot and humans and predict trajectories (line thickness indicates the strength of interaction; large/medium/small figures show current positions, the predicted position in next step, and the predicted position in two steps). By inferring a strong relation between the robot and the red human as well as the green and red human, and planning conditional on predicted human trajectories, we can find a safe path towards the goal.

II Related Work

Crowd navigation. Mobile robot navigation in crowds is challenging due to the complex ways in which human intent and social interactions determine human motions. Prior work has used rule-based algorithms to characterize the interactions between agents. Helbing et al. [18] proposed to model the interactions as "social forces". RVO [5] and ORCA [32] solve for collision avoidance under reciprocal assumptions. Interacting Gaussian Processes (IGP) [29, 30, 31] model the trajectory of each agent as an individual Gaussian Process (GP) and introduce an interaction potential term that couples the individual GPs. Rule-based methods rely heavily on hand-crafting the model for navigation. Social-LSTM [2] used an LSTM to predict human trajectories directly from large-scale datasets. However, for crowd navigation, forecasting models do not directly yield an action policy. Recently, Chen et al. [10, 9] proposed to use DRL for crowd navigation by learning a value network encoding state values.

LM-SARL [8] improved on previous work by learning the robot state representation with attentive pooling over pairwise interaction features. However, this model is limited to partial modeling of crowd interactions, due to significant simplifying assumptions for the underlying state transition and the lack of explicit human motion modeling. Most recently, LeTS-Drive [6] used online belief-tree search to learn value and policy functions for autonomous driving in a crowded space. Although this approach models intentions and interactions between the vehicle and humans, the interaction is coarse-grained: Convolutional Neural Networks (CNNs) process stacked frames of the environment along with the history and intended path of the vehicle. In contrast, we include pairwise interactions among all agents, which, coupled with our graph learning, explicitly captures relations between agents and models higher-order interactions.

Relational reasoning. Relational reasoning [3, 33] aims to capture relationships between entities such as decision-making agents [40, 1], image pixels [36], words [24], or humans and objects [7, 14]. The relations and entities are typically represented as a connected graph [38, 34], and standard tools for graph-based learning such as Graph Neural Networks (GNNs) [17, 39, 23, 16] are applied. GNNs are a class of neural networks that learn functions on graphs, representing objects as nodes and relations as edges. Knowledge about object dynamics can be encoded in the GNN node update function, and interaction dynamics can be encoded in the edge update function. Most current GNNs operate on interacting systems with known and fixed graph inputs, such as human skeletons or particles connected by springs [4, 23]. However, many interacting systems have unknown relations, for example, humans in sports [20] and in crowds [35]. Thus, it is important to infer relations among the entities and learn based on the inferred relations. In crowd navigation, the temporal dynamics of these relations are useful for planning safe and efficient crowd navigation (e.g., understanding the joint motion of a pair of friends as they cross the robot's path while walking close to each other, in contrast to the motion of a pair of strangers walking in opposite directions and passing each other). Inspired by Kipf et al. [22], who estimate the graph connectivity map from trajectories using an auto-encoder architecture, our model dynamically infers the crowd-robot relation graph at each time step and learns the state representation for each agent based on this graph.

MPC, MCTS and model-based DRL. Model predictive control (MPC) is a family of control algorithms that leverage models of the system to predict state changes and plan control accordingly. Traditional MPC [37, 25] usually assumes access to a known environment model, which is frequently unrealistic. Monte-Carlo Tree Search (MCTS) has been used for decision-time planning by estimating action values based on many simulated trajectories in complex search problems such as the game of Go [27]. More recent model-based RL methods first acquire a predictive model of the world, and then use that model to make decisions. Finn et al. [13] learned a state transition model by predicting future frames to achieve goal-directed robotic manipulation. VPN [26] and Predictron [28] learned to predict future abstract states that are informative for planning in Atari Games. In contrast, our model predictive relational graph takes a set of raw human states (e.g., positions, velocities) as input and predicts multiple interacting human trajectories. To our knowledge, we are the first to integrate relational learning with a model-based DRL algorithm for crowd navigation.

Fig. 2: Given the crowd state as input, our approach uses d-step planning to discover the best action sequence for safe and efficient crowd navigation: a) a relational graph learning model reasons about relations between agents and encodes local interactions; b) a value network predicts the value of the robot state representation; c) a motion network predicts future human states; d) with the learned value estimation and motion prediction models, our approach rolls out d steps into the future and searches for the best action sequence.

III Approach

We first describe how we formulate the crowd navigation problem with deep RL, then introduce our relational graph learning model for modeling interactions in the crowd. In addition, we show how this model can be integrated into a simplified Monte-Carlo Tree Search (MCTS) procedure for d-step planning. Figure 2 shows an overview of our approach.

III-A Deep Reinforcement Learning for Crowd Navigation

In this work, we address the crowd navigation task where the robot navigates through a crowd of n humans to a goal position as efficiently and safely as possible. This task is formulated as a sequential decision making problem in recent works [10, 12, 8]. Each agent (either human or the robot) observes the others' observable state, including position p = (p_x, p_y), velocity v = (v_x, v_y) and radius r (an abstract measure of size). Each agent also maintains an internal state, such as a goal position g and preferred speed v_pref. We assume actions are instantaneous and the agent always arrives at the target position in the next time step. We use s_t and w_t^i to denote the robot state and the observed state of human i at time t, respectively. The robot input state is defined as the joint state S_t = [s_t, w_t^1, ..., w_t^n]. The optimal policy π* maximizes the expected return:

π*(S_t) = argmax_{a_t} R(S_t, a_t) + γ^{Δt · v_pref} ∫ P(S_{t+Δt} | S_t, a_t) V*(S_{t+Δt}) dS_{t+Δt},

where R(S_t, a_t) is the reward received at time t, γ ∈ (0, 1) is the discount factor, V* is the optimal value function, and P(S_{t+Δt} | S_t, a_t) is the transition probability from time t to time t + Δt. The preferred velocity v_pref is used as a normalization term in the exponent of the discount factor for numerical reasons [10].

We follow the formulation of the reward function defined in Chen et al. [8], which rewards accomplishing the task while penalizing collisions or uncomfortable distances:

R_t(S_t, a_t) =
    -0.25            if d_t < 0,
    -0.1 + d_t / 2   else if d_t < 0.2,
    1                else if the goal is reached,
    0                otherwise,

where d_t is the minimum separation distance between the robot and the humans during the time period [t - Δt, t].
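The piecewise reward structure described here can be sketched as a small function. The constants below (collision penalty -0.25, discomfort distance 0.2 m, goal reward 1) are the ones commonly used in the CrowdNav environment and should be read as assumptions rather than the definitive values:

```python
def reward(d_min: float, reached_goal: bool,
           collision_penalty: float = -0.25,
           discomfort_dist: float = 0.2,
           goal_reward: float = 1.0) -> float:
    """d_min: minimum robot-human separation during the last time step."""
    if d_min < 0:                  # robot overlapped with a human: collision
        return collision_penalty
    if d_min < discomfort_dist:    # too close: graded discomfort penalty
        return -0.1 + d_min / 2
    if reached_goal:               # task accomplished
        return goal_reward
    return 0.0                     # otherwise, no reward
```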

We do not make assumptions about the state transition model as in previous work [8]. Instead, we use a model-based approach and learn a state transition model for predicting human motion.

This problem statement applies to a set of more general tasks in which there are non-cooperative agents and the decision-maker receives only their observable states, without knowing their intents or hidden policies.

III-B Model Predictive Relational Graph Learning

The interactions (i.e., spatial relations) between humans are important for robot navigation and for predicting future human motion. Previous work does not learn such interaction features for the robot and humans simultaneously. Here, we model the crowd as a graph, reason about the relations between all agents and use a GCN to compute the robot and human state representations. Using the graph model as a building block, we further construct two other modules: a value estimation module which estimates the value of the current state and a state prediction module which predicts the state at the next time step.

Relational graph learning. The key challenge in crowd navigation is to learn a good representation of the crowd encoding interactions among all agents. Chen et al. [8] show that attentive crowd aggregation improves both interpretation and performance by modeling one-way human-robot interactions. This motivates us to model the crowd and the robot as a directed graph G = (V, E) with n + 1 nodes, one per agent. The edge e_{ij} indicates how much attention agent i pays to agent j, or the importance of agent j to agent i. This pairwise relation is not known a priori, so it is inferred with a pairwise similarity function (relation inference). After the relations between all agents are inferred, a graph neural network propagates information from node to node and computes the state representations of all agents (interaction modeling), as shown in Figure 2 a).

LM-SARL [8] can be viewed as an approximation of our Relational Graph Learning (RGL) formulation, in that it learns robot interaction features with respect to all humans and uses attentive pooling to aggregate the interaction features. Our RGL formulation not only learns the attention of the robot to humans but also from humans to humans. Apart from learning the robot state representation, RGL simultaneously learns the state representations of the other agents and propagates these features to the robot node by message passing. In this way, RGL also models higher-order interactions. For example, it fully models human-human interaction rather than using a local map (as in LM-SARL) to approximate the local interactions of humans. Our approach also compares favorably to LeTS-Drive [6], which does not reason about pairwise state interactions or model human-human interactions.

III-B1 Relation Inference

The initial value of each vertex is the state of the corresponding agent: the robot state s_t and the human states w_t^1, ..., w_t^n. Since robot and human states have different dimensions, we use two multilayer perceptrons (MLPs), f_r and f_h, to embed them into a latent space, resulting in a feature matrix X, where the first row is the latent state of the robot and the remaining rows are the latent states of the humans. Given the feature matrix X, a relation matrix A is computed using a pairwise similarity function. Following Wang et al. [36], we use an embedded Gaussian as the similarity function: the pairwise form is given by A_{ij} ∝ e^{θ(x_i)^T φ(x_j)}, and the matrix form by A = softmax(X W_θ (X W_φ)^T), where θ and φ are learned linear embeddings with weight matrices W_θ and W_φ, and the softmax normalizes each row. A learned relation is illustrated in Figure 1, where the thickness of a line indicates the strength of the pairwise interaction.
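The embedded-Gaussian relation inference can be sketched in a few lines of numpy. The weight names `W_theta` and `W_phi` and all dimensions are illustrative placeholders for the learned embeddings, not the paper's exact configuration:

```python
import numpy as np

def relation_matrix(X, W_theta, W_phi):
    """Embedded-Gaussian pairwise similarity, row-normalized with softmax.

    X: (N, l) latent agent features (robot first, then humans).
    W_theta, W_phi: (l, k) learned projection matrices (hypothetical names).
    Returns A: (N, N), where A[i, j] is the attention agent i pays to agent j.
    """
    scores = (X @ W_theta) @ (X @ W_phi).T       # e_ij = theta(x_i)^T phi(x_j)
    scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)      # softmax over each row

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 16))                 # robot + 5 humans, latent dim 16
W_theta = rng.standard_normal((16, 8))
W_phi = rng.standard_normal((16, 8))
A = relation_matrix(X, W_theta, W_phi)
```

Each row of `A` is a probability distribution over the other agents, which is what the GCN uses as the adjacency matrix in the next step.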

III-B2 Interaction Modeling

With the feature matrix X and relation matrix A, we use a GCN to compute the pairwise interaction features. The message passing rule is defined by

H^{(l+1)} = σ(A H^{(l)} W^{(l)}),

where W^{(l)} is a layer-specific trainable weight matrix, H^{(l)} is the matrix of node-level features at layer l, and σ is an activation function. The feature of node i at layer l + 1 aggregates its neighbors' features at layer l, weighted by the relations stored in matrix A. Let H^{(0)} = X; after L layer propagations, we obtain the state representation matrix H^{(L)}, whose i-th row h_i is the state representation for agent i, encoding its local interactions.
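The message-passing rule amounts to two matrix multiplications per layer. A minimal sketch with two propagation layers, using illustrative dimensions and a placeholder uniform relation matrix in place of the learned one:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One message-passing step: H' = relu(A @ H @ W)."""
    return np.maximum(A @ H @ W, 0.0)

rng = np.random.default_rng(1)
N, l = 6, 16                             # robot + 5 humans, latent dim (illustrative)
H = rng.standard_normal((N, l))          # H^(0) = X, the latent agent states
A = np.full((N, N), 1.0 / N)             # placeholder relation matrix (rows sum to 1)
W1 = rng.standard_normal((l, l))
W2 = rng.standard_normal((l, l))

# two layer propagations, as used for the RGL models in this paper
H = gcn_layer(gcn_layer(H, A, W1), A, W2)
robot_repr, human_reprs = H[0], H[1:]    # per-agent representations encoding interactions
```

The robot row feeds the value network and the human rows feed the motion prediction network.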

Value estimation. Our value estimation module consists of two models: a relational graph model that infers the robot state representation and a subsequent value network that predicts the value of that state. The value network is an MLP; its value estimates over a discretized action space are used for lookahead, as shown in Figure 2 b).

1:function V-Plan(s, d)
2:     if d = 0 then
3:          return V(s)
4:     end if
5:     A_w ← top-w actions a ranked by R(s, a) + γ V(f(s, a))
6:     for a ∈ A_w do
7:          s' ← f(s, a)
8:          Q_a ← R(s, a) + γ · V-Plan(s', d − 1)
9:     end for
10:     return max_{a ∈ A_w} Q_a
11:end function
Algorithm 1 d-step planning with V and f

State prediction. We assume the robot's action is executed perfectly, so the next robot state s_{t+1} can be computed directly from the current state s_t and action a_t. Our state prediction module models interactions between human states to predict future human states, as shown in Figure 2 c). In the action selection phase, previous works [10, 8] rely on either querying the ground truth or off-the-shelf algorithms to approximate the next human states, which does not benefit from end-to-end learning. Our state prediction module consists of two models: first, a relational graph model infers relations between all agents and their layer-wise interaction features; then a successor motion prediction model uses the human state representations to predict their next states ŵ_{t+1}^i for each human i. The motion prediction network is modeled by an MLP. In addition, to simplify the prediction task, we use a hand-crafted function to estimate the reward based on the predicted human states.
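The per-human motion prediction step can be sketched as a shared MLP applied to each human's graph representation. The five-dimensional output (position, velocity, radius of the observable state) and all weight names are illustrative assumptions:

```python
import numpy as np

def predict_next_states(H_humans, W1, b1, W2, b2):
    """Shared two-layer MLP mapping each human representation to a
    predicted next observable state, e.g., (px, py, vx, vy, r)."""
    hidden = np.maximum(H_humans @ W1 + b1, 0.0)  # applied row-wise, one row per human
    return hidden @ W2 + b2                       # (n, 5) predicted human states

rng = np.random.default_rng(2)
n, l, h, s = 5, 16, 32, 5                         # humans, latent, hidden, state dims
H_humans = rng.standard_normal((n, l))            # representations from the graph model
W1, b1 = rng.standard_normal((l, h)), np.zeros(h)
W2, b2 = rng.standard_normal((h, s)), np.zeros(s)
next_states = predict_next_states(H_humans, W1, b1, W2, b2)
```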

III-C d-step Planning

The use of DRL for crowd navigation addresses the problem of shortsighted collision avoidance in traditional methods [32] by learning a state value function. However, with imperfect value estimation and unknown human behavior policies, the robot can sometimes be stuck in a local minimum with suboptimal behaviors. By predicting future states and future rewards with our approach, we can simulate d steps into the future to provide a better estimate of the state values. Furthermore, by controlling the depth of this simulation, we can trade off computation for better performance. Our approach is visualized in the tree search diagram shown in Figure 2 d).

Compared to Chen et al. [8], our model does not need to query the environment or use off-the-shelf trajectory predictions to obtain the next human states. We follow the d-step planning method proposed in Oh et al. [26], performing rollouts using the crowd state prediction and value estimation up to d steps in the future, and select the action with the maximum d-step predicted return, defined recursively as:

R^{(d)}(S_t, a_t) = r̂_t + γ max_{a_{t+1}} R^{(d-1)}(Ŝ_{t+1}, a_{t+1}),

where R^{(1)}(S_t, a_t) = r̂_t + γ V(Ŝ_{t+1}), r̂_t is the predicted reward, and Ŝ_{t+1} is the predicted next state.

With d-step planning, computation increases exponentially with search depth and width. Due to our large action space, we use action space clipping to reduce the computational cost. Intuitively, the value function estimates the quality of entering a state. Using this estimate, we recursively search the top-w next states with one-step lookahead. Compared to only considering the top action, d-step planning provides a better state value estimate when the agent encounters unseen states. Pseudocode for the d-step planning is summarized in Algorithm 1: V-Plan is a recursive function that takes a raw state as input and simulates future trajectories up to depth d with action space clipping width w. Figure 2 d) shows a toy example of tree search with small depth and clipping width.
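The recursive search with action-space clipping can be sketched as follows. Here `value`, `step`, and the action list are hypothetical stand-ins for the learned value network, the learned state prediction plus hand-crafted reward, and the discretized action space:

```python
def v_plan(s, d, value, step, actions, w=2, gamma=0.9):
    """d-step planning with top-w action clipping (sketch).

    value(s)  -> scalar state-value estimate (stand-in for the value network).
    step(s,a) -> (next_state, predicted_reward) (stand-in for the learned
                 state prediction plus the hand-crafted reward estimate).
    """
    if d == 0:
        return value(s)  # leaf: fall back to the learned value estimate
    # clip the action space to the top-w actions by one-step lookahead value
    scored = sorted(actions,
                    key=lambda a: step(s, a)[1] + gamma * value(step(s, a)[0]),
                    reverse=True)[:w]
    # recurse on the clipped set, simulating one more step into the future
    return max(step(s, a)[1] + gamma * v_plan(step(s, a)[0], d - 1,
                                              value, step, actions, w, gamma)
               for a in scored)
```

On a toy one-dimensional task (states are numbers, actions move the agent by ±1 toward a goal), deeper search returns estimates that account for the simulated future rewards rather than a single value query.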

III-D Joint Value Estimation and State Prediction Learning

Pseudocode for the joint state prediction and value estimation learning scheme is given in Algorithm 2. We first use imitation learning with experience collected from a demonstrator ORCA [32] policy to initialize the model, and then use RL to refine the policy. During RL training, we use temporal-difference learning with experience replay and a fixed target network to train the value estimation function V, and use supervised learning to train the state prediction function f. Sharing graph models between V and f makes RL training unstable and harder to optimize, so we use separate graph models for these two functions. Compared to the training scheme in [8], in addition to the extra steps for training the state prediction model, our training algorithm uses the d-step return as the target value rather than the one-step return, which provides more accurate value estimation.

1:Initialize V and f with demonstration data D
2:Initialize target value network V̂ ← V
3:Initialize experience replay memory E ← D
4:for episode = 1, M do
5:     Initialize random starting state s_0
6:     repeat
7:          Select a_t by d-step planning with V and f
8:          Execute a_t and obtain r_t and s_{t+1}
9:          Store tuple (s_t, a_t, r_t, s_{t+1}) in E
10:          Sample random minibatch of tuples from E
11:          Set target for value network: y_i = r_i + γ V̂(s_{i+1})
12:          Update V by minimizing the value loss
13:          Set target for prediction network: the observed next human states
14:          Update f by minimizing the prediction loss
15:     until terminal state or timeout
16:     Update target value network V̂ ← V
17:end for
Algorithm 2 Learning for V and f
Method Success Collision Extra Time Avg. Return Max Diff.
LM-SARL-Linear [8]
RGL-Linear (Ours)
MP-RGL-Onestep (Ours)
MP-RGL-Multistep (Ours full)
TABLE I: Quantitative results in circle crossing with five humans. The metrics are defined as follows: “Success”: the rate of robot reaching its goal without a collision; “Collision”: the rate of robot colliding with humans; “Extra Time”: extra navigation time to reach goal in seconds; “Avg. Return”: returns averaged over steps across episodes and all test cases. “Max Diff.”: the difference between actual average returns and the upper bound of average returns.

± indicates standard deviation measured over five independently seeded training runs.

IV Experimental Results

Implementation details. The architectures of the embedding MLPs, the value network, and the motion prediction network follow our released implementation; for fair comparison with the baseline method, we use the value network for the baseline as reported in Chen et al. [8]. All parameters are trained using Adam [21]. The exploration rate of the ε-greedy policy decays linearly over the first training episodes. We assume holonomic kinematics for the robot, i.e., it can move in any direction. The action space consists of discrete actions: speeds exponentially spaced in (0, v_pref] and headings evenly spaced in [0, 2π).

Simulation setup. We use the CrowdNav simulation environment. In this simulation, humans are controlled by ORCA [32], the parameters of which are sampled from a Gaussian distribution to introduce behavioral diversity. We use circle crossing scenarios for our experiments: n humans are randomly positioned on a circle, with random perturbations added to their x, y coordinates. The maximum speed of the agent, v_pref, is 1 m/s, and the goal is fixed so that it can be reached in a minimum time of 8 seconds. To fully evaluate the effectiveness of the proposed model, we study the simulation setting where the robot is invisible to humans; as a result, the simulated humans react only to other humans but not to the robot. This setting is a clean testbed for validating the model's ability to reason about human-robot and human-human interaction without affecting human behaviors. All models are evaluated with 500 random test cases.

Quantitative evaluation. The state-of-the-art method LM-SARL [8] assumes that the next ground truth state is given during training. To demonstrate the effectiveness of our relational graph learning model, we use a linear motion model (agents keep the velocities from their last state) both in LM-SARL and as our state prediction model, so that the comparison is purely about robot state representation learning. These two models are denoted LM-SARL-Linear and RGL-Linear, respectively. We refer to our full model as MP-RGL-Multistep and the model with one-step lookahead as MP-RGL-Onestep. For all RGL-based models, we use a two-layer GCN; MP-RGL-Multistep uses two-step lookahead. We do not compare with LeTS-Drive [6], as it focuses on components other than interaction modeling and assumes different inputs. Table I reports the rates of success and collision, the average extra navigation time (defined as navigation time beyond the minimum possible 8 seconds), and the average return, which is the cumulative discounted reward averaged over all steps in the 500 test cases. To provide a clearer interpretation of the average return metric, we add one more metric: the difference between the average return and its upper bound. The upper bound is attained only in the imaginary case where there are no other humans and the agent can head straight to the goal at all times.

As expected, the performance of LM-SARL-Linear drops compared to the results reported in [8] once the unrealistic assumption of access to the next ground truth state is replaced with a linear motion model. The first comparison, between LM-SARL-Linear and RGL-Linear, reflects the improvement brought by modeling the crowd interaction with graph neural networks. The comparison between RGL-Linear and MP-RGL-Onestep shows that a learned state prediction model gives a better estimate of human motions and thus yields a better navigation policy. Finally, the best model in the table is our MP-RGL-Multistep: comparing it to the one-step version, we see that d-step planning improves the success rate, reduces the extra time to reach the goal, and increases the average return. Even the best-performing model is not collision-free: the robot is invisible to humans and the policy does not have access to the next ground truth state, making some collisions unavoidable (e.g., when the robot is surrounded by humans converging towards it).

Effect of planning budget. In our d-step planning, both the tree depth d and the action space clipping width w influence the computation time and planning performance. With larger d, our approach looks further into the future and plans further ahead. With larger w, we consider more actions at each depth and can predict a wider range of possible outcomes, potentially finding a better path. We study how the planning budget influences performance by testing the MP-RGL-Onestep model from Table I with various test-time planning budgets. Simply increasing the depth beyond one already decreases the extra time and improves the average return, and larger w improves performance further. From these experiments, we conclude that our d-step planning leads to better performance both during learning and at decision time.

(a) Scenario 1. (b) Scenario 2. (c)-(f) Value heatmaps for the two scenarios under different planning depths.
Fig. 3: Value estimates for two scenarios using different planning depths d. The top row shows the top-down view of the crowd, and the two heatmaps below each scenario visualize the estimated values over the action space of the robot for one-step and two-step lookahead. The red star in each value map indicates the action with the highest value.

Investigation of running time. The GNN-based state feature extractor is computationally efficient compared to attention-based crowd navigation work [8]: the GNN computation amounts to a matrix multiplication, whose cost changes negligibly when the number of agents N is relatively small. Search complexity is mostly determined by the depth d and width w. The running time of our full model is 0.095 s, 0.101 s, 0.114 s, and 0.115 s per decision in scenarios with 5, 10, 15, and 20 humans, respectively, on an i7-4790 CPU. This shows that i) our approach runs at roughly 10 FPS, which is sufficient for real-time settings when the robot moves at relatively slow speeds, and ii) the impact of scaling the number of humans is small.

Qualitative evaluation. We further investigate how our full model handles two challenging scenarios: in the first, shortsighted behaviors lead to collisions; in the second, the robot freezes when it is close to both the goal and potential danger.

The scenario shown in Figure 3(a) is a typical example of the first case. The one-step lookahead policy assigns almost equally high values to actions in two open directions, which is reasonable for a shortsighted policy since no human currently occupies either of them. However, taking the highest-valued of these actions results in a collision with human 0. By looking two steps ahead, the policy anticipates an unavoidable collision in two steps if it moves in that direction now. The advantage of the multi-step lookahead policy also validates the effectiveness of encoding higher-order human interactions: by observing the strong relation between human 0 and human 3, who are moving in the same direction and are very close to each other, the robot predicts where human 3 will move and predicts negative rewards for actions toward that region. Thus, the two-step lookahead policy assigns low values to that direction and avoids the collision by moving the other way.

The scenario shown in Figure 3(b) is a typical example of the second challenging case, where human 3 is standing near the goal position of the robot. The one-step lookahead policy assigns the highest value to an action that detours around human 3 and a lower value to the action toward the goal, because moving toward the goal incurs a discomfort penalty for stepping into the comfort zone of human 3. The two-step lookahead policy can predict the large positive reward for reaching the goal two steps later, which compensates for the one-step discomfort penalty. Thus, the two-step lookahead policy assigns the highest value to the action toward the goal and makes a non-myopic decision.

V Real-world Demonstrations

We deploy our trained policy in the real world on a Pioneer robotic platform equipped with an Intel RealSense ZR300 camera (see Figure 4). Test episodes are shown in the video, available at

Fig. 4: Pioneer robot used for real-world demonstrations.

VI Conclusion

In this paper, we address crowd navigation with a relational graph learning approach. By formulating human-robot and human-human interactions as a graph and using GCNs to compute interaction features, our model can estimate state values as well as predict human motion. Augmented by d-step planning, our model explicitly plans into the future under a given search budget. We show that our model outperforms baseline methods by a large margin and can handle challenging navigation scenarios. The relational graph learning approach we propose can be extended in several ways; for example, we do not currently model the temporal dynamics of past human trajectories, which could help infer the intent of individuals and the group structure among humans.


  • [1] A. Agarwal, S. Kumar, and K. Sycara (2019-06-04) Learning transferable cooperative behavior in multi-agent teams. External Links: Link, 1906.01202 Cited by: §II.
  • [2] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese (2016-06) Social LSTM: human trajectory prediction in crowded spaces. In

    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    pp. 961–971. External Links: ISBN 978-1-4673-8851-1, Link, Document Cited by: §I, §II.
  • [3] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu (2018-06-04)

    Relational inductive biases, deep learning, and graph networks

    External Links: Link, 1806.01261 Cited by: §II.
  • [4] P. W. Battaglia, R. Pascanu, M. Lai, D. Rezende, and K. Kavukcuoglu (2016-12-01) Interaction networks for learning about objects, relations and physics. External Links: Link, 1612.00222 Cited by: §I, §II.
  • [5] J. v. d. Berg, Ming Lin, and D. Manocha (2008-05) Reciprocal velocity obstacles for real-time multi-agent navigation. In 2008 IEEE International Conference on Robotics and Automation, pp. 1928–1935. External Links: Document Cited by: §II.
  • [6] P. Cai, Y. Luo, A. Saxena, D. Hsu, and W. S. Lee (2019-05-28) LeTS-drive: driving in a crowd by learning from tree search. External Links: Link, 1905.12197 Cited by: §II, §III-B, §IV.
  • [7] Y. Chao, Z. Wang, Y. He, J. Wang, and J. Deng (2015-12) HICO: a benchmark for recognizing human-object interactions in images. In 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1017–1025. External Links: ISBN 978-1-4673-8391-2, Link, Document Cited by: §II.
  • [8] C. Chen, Y. Liu, S. Kreiss, and A. Alahi (2018-09-24) Crowd-robot interaction: crowd-aware robot navigation with attention-based deep reinforcement learning. External Links: Link, 1809.08835 Cited by: §I, §II, §III-A, §III-A, §III-A, §III-B2, §III-B, §III-B, §III-C, §III-D, TABLE I, §IV, §IV, §IV, §IV.
  • [9] Y. F. Chen, M. Everett, M. Liu, and J. P. How (2017-03-26) Socially aware motion planning with deep reinforcement learning. External Links: Link, 1703.08862 Cited by: §II.
  • [10] Y. F. Chen, M. Liu, M. Everett, and J. P. How (2016-09-26) Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. External Links: Link, 1609.07845 Cited by: §I, §II, §III-A, §III-B2.
  • [11] Z. Deng, A. Vahdat, H. Hu, and G. Mori (2015-11-13) Structure inference machines: recurrent neural networks for analyzing relations in group activity recognition. External Links: Link, 1511.04196 Cited by: §I.
  • [12] M. Everett, Y. F. Chen, and J. P. How (2018-05-04) Motion planning among dynamic, decision-making agents with deep reinforcement learning. External Links: Link, 1805.01956 Cited by: §I, §III-A.
  • [13] C. Finn and S. Levine (2016-10-03) Deep visual foresight for planning robot motion. External Links: Link, 1610.00696 Cited by: §II.
  • [14] G. Gkioxari, R. Girshick, P. Dollár, and K. He (2017-04-24) Detecting and recognizing human-object interactions. External Links: Link, 1704.07333 Cited by: §II.
  • [15] A. Grover, M. Al-Shedivat, J. K. Gupta, Y. Burda, and H. Edwards (2018-06-17) Learning policy representations in multiagent systems. External Links: Link, 1806.06464 Cited by: §I.
  • [16] W. L. Hamilton, R. Ying, and J. Leskovec (2017-06-07) Inductive representation learning on large graphs. External Links: Link, 1706.02216 Cited by: §II.
  • [17] W. L. Hamilton, R. Ying, and J. Leskovec (2017-09-16) Representation learning on graphs: methods and applications. External Links: Link, 1709.05584 Cited by: §II.
  • [18] D. Helbing and P. Molnár (1995-05-01) Social force model for pedestrian dynamics. Physical Review E 51 (5), pp. 4282–4286. External Links: Link, Document Cited by: §I, §II.
  • [19] Y. Hoshen (2017-06-19) VAIN: attentional multi-agent predictive modeling. External Links: Link, 1706.06122 Cited by: §I.
  • [20] M. S. Ibrahim and G. Mori (2018) Hierarchical relational networks for group activity recognition and retrieval. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 721–736. Cited by: §I, §II.
  • [21] D. P. Kingma and J. Ba (2014-12) Adam: a method for stochastic optimization. Note: arXiv:1412.6980. Published as a conference paper at the 3rd International Conference on Learning Representations, San Diego, 2015. External Links: Link Cited by: §IV.
  • [22] T. Kipf, E. Fetaya, K. Wang, M. Welling, and R. Zemel (2018-02-13) Neural relational inference for interacting systems. External Links: Link, 1802.04687 Cited by: §I, §II.
  • [23] T. N. Kipf and M. Welling (2016-09-09) Semi-supervised classification with graph convolutional networks. External Links: Link, 1609.02907 Cited by: §I, §II.
  • [24] Z. Lin, M. Feng, C. N. d. Santos, M. Yu, B. Xiang, B. Zhou, and Y. Bengio (2017-03-08) A structured self-attentive sentence embedding. External Links: Link, 1703.03130 Cited by: §II.
  • [25] D. Q. Mayne, M. M. Seron, and S. V. Raković (2005-02-01) Robust model predictive control of constrained linear systems with bounded disturbances. Automatica 41 (2), pp. 219–224. External Links: ISSN 0005-1098, Link, Document Cited by: §II.
  • [26] J. Oh, S. Singh, and H. Lee (2017-07-11) Value prediction network. External Links: Link, 1707.03497 Cited by: §II, §III-C.
  • [27] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis (2016-01) Mastering the game of Go with deep neural networks and tree search. Nature 529 (7587), pp. 484–489. External Links: ISSN 0028-0836, 1476-4687, Link, Document Cited by: §II.
  • [28] D. Silver, H. van Hasselt, M. Hessel, T. Schaul, A. Guez, T. Harley, G. Dulac-Arnold, D. Reichert, N. Rabinowitz, A. Barreto, and T. Degris. The predictron: end-to-end learning and planning. pp. 9. Cited by: §II.
  • [29] P. Trautman and A. Krause (2010-10) Unfreezing the robot: navigation in dense, interacting crowds. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 797–803. External Links: Document Cited by: §II.
  • [30] P. Trautman, J. Ma, R. M. Murray, and A. Krause (2013-05) Robot navigation in dense human crowds: the case for cooperation. In 2013 IEEE International Conference on Robotics and Automation, pp. 2153–2160. External Links: Document Cited by: §II.
  • [31] P. Trautman (2017-12) Sparse interacting gaussian processes: efficiency and optimality theorems of autonomous crowd navigation. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 327–334. External Links: Document Cited by: §II.
  • [32] J. van den Berg, S. J. Guy, M. Lin, and D. Manocha (2011) Reciprocal n-body collision avoidance. In Robotics Research, C. Pradalier, R. Siegwart, and G. Hirzinger (Eds.), Springer Tracts in Advanced Robotics, pp. 3–19. External Links: ISBN 978-3-642-19457-3 Cited by: §II, §III-C, §III-D, §IV.
  • [33] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017-06-12) Attention is all you need. External Links: Link, 1706.03762 Cited by: §II.
  • [34] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2017-10-30) Graph attention networks. External Links: Link, 1710.10903 Cited by: §II.
  • [35] A. Vemula, K. Muelling, and J. Oh (2017-10-12) Social attention: modeling attention in human crowds. External Links: Link, 1710.04689 Cited by: §II.
  • [36] X. Wang, R. Girshick, A. Gupta, and K. He (2017-11-21) Non-local neural networks. External Links: Link, 1711.07971 Cited by: §I, §II, §III-B1.
  • [37] P. Wieber (2006-12) Trajectory free linear model predictive control for stable walking in the presence of strong perturbations. In 2006 6th IEEE-RAS International Conference on Humanoid Robots, pp. 137–142. External Links: Document Cited by: §II.
  • [38] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei (2017-01-09) Scene graph generation by iterative message passing. External Links: Link, 1701.02426 Cited by: §II.
  • [39] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019) How powerful are graph neural networks? pp. 17. Cited by: §II.
  • [40] V. Zambaldi, D. Raposo, A. Santoro, V. Bapst, Y. Li, I. Babuschkin, K. Tuyls, D. Reichert, T. Lillicrap, E. Lockhart, M. Shanahan, V. Langston, R. Pascanu, M. Botvinick, O. Vinyals, and P. Battaglia (2018-06-05) Relational deep reinforcement learning. External Links: Link, 1806.01830 Cited by: §II.