I Introduction
Inferring the underlying relations between components of complex dynamic systems can inform decision making for autonomous agents. One natural system with complex dynamics is crowd navigation (i.e., navigation in the presence of multiple humans). The crowd navigation task is challenging as the agent must predict and plan relative to likely human motions so as to avoid collisions and remain at safe and socially appropriate distances from people. Some prior work predicts human trajectories using handcrafted social interaction models [18] or by modeling the temporal behavior of humans [2]. Although these methods can estimate human trajectories, they do not use the predictions to inform the navigation policy. Other recent works [10, 12, 8] use deep reinforcement learning (DRL) to learn a socially compliant policy. These policies either do not model interactions among humans or approximate them with heuristics. They also simplify the human motion prediction problem with unrealistic assumptions, such as linear human motion, and typically consider only the current state of humans instead of incorporating predicted human motion to inform the policy.
More broadly, interacting systems have been studied extensively in recent work [4, 19, 22, 20, 11]
and Graph Neural Networks (GNNs) are one of the most powerful tools for modeling objects and their relations (interactions). A variant of GNNs is Graph Convolutional Networks (GCNs)
[23], where relations between nodes are defined as an adjacency matrix. Whereas in GCNs the relations between all nodes are given, Wang et al. [36] and Grover et al. [15] propose to learn the relations between objects and use learned attention to compute new features. Inspired by this work on relational reasoning and GNN models, we propose a relational graph learning model for crowd navigation that reasons about relations between agents (robot and humans) and then uses a GCN to compute interactions between the agents. With the predicted interaction features of both the robot and humans, our approach jointly plans efficient robot navigation and predicts the motion of nearby humans. Figure 1 illustrates how interaction reasoning between all agents and explicit prediction can yield a farsighted navigation policy.

Planning and prediction of multi-agent states is fundamental in many problems, including the general case of decision making in an environment with non-cooperative agents. In this paper, we address this problem with a relational graph learning approach for model-based RL, which predicts future human motions and plans for crowd navigation simultaneously. We show that our model outperforms baselines from prior work and carry out ablation studies to measure the effect of the planning budget on navigation performance. We also qualitatively demonstrate that our approach mitigates undesirable robot behaviors in challenging scenarios. The code is available at github.com/ChanganVR/RelationalGraphLearning.
II Related Work
Crowd navigation. Mobile robot navigation in crowds is challenging due to the complex ways in which human intent and social interactions determine human motions. Prior work has used rule-based algorithms to characterize the interactions between agents. Helbing et al. [18] proposed to model the interactions as "social forces". RVO [5] and ORCA [32] solve for collision avoidance under reciprocal assumptions. Interacting Gaussian Processes (IGP) [29, 30, 31] model the trajectory of each agent as an individual Gaussian process and introduce an interaction potential term that couples the individual GPs. Rule-based methods rely heavily on handcrafting the model for navigation. Social LSTM [2] used an LSTM to predict human trajectories directly from large-scale datasets. However, for crowd navigation, forecasting models do not directly yield an action policy. Recently, Chen et al. [10, 9] proposed to use DRL for crowd navigation by learning a value network encoding the state values.
LM-SARL [8] improved on previous work by learning the robot state representation with attentive pooling over pairwise interaction features. However, this model only partially captures crowd interactions, since it makes significant simplifying assumptions about the underlying state transition instead of explicitly modeling human motions. Most recently, LeTS-Drive [6] used online belief-tree search to learn a value and policy function for autonomous driving in a crowded space. Although this approach models intentions and interactions between the vehicle and humans, the interaction is coarse-grained, using Convolutional Neural Networks (CNNs) to process stacked frames of the environment together with the history and intended path of the vehicle. In contrast, we include pairwise interactions among all agents, which, coupled with our graph learning, explicitly capture relations between agents and model higher-order interactions.
Relational reasoning. Relational reasoning [3, 33] aims to capture relationships between entities such as decision-making agents [40, 1], image pixels [36], words [24], or humans and objects [7, 14]. The relations and entities are typically represented as a connected graph [38, 34], and standard tools for graph-based learning such as Graph Neural Networks (GNNs) [17, 39, 23, 16] are applied. GNNs are a class of neural networks that learn functions on graphs, representing objects as nodes and relations as edges. Knowledge about object dynamics can be encoded in the GNN node update function, and interaction dynamics can be encoded in the edge update function. Most current GNNs operate on interacting systems with known and fixed graph inputs, such as human skeletons or particles connected by springs [4, 23]. However, many interacting systems have unknown relations, for example among humans in sports [20] and crowds [35]. Thus, it is important to infer relations among the entities and learn based on the inferred relations. In crowd navigation, the temporal dynamics of these relations are useful for planning safe and efficient crowd navigation (e.g., understanding the joint motion of a pair of friends as they cross the robot's path while walking close to each other, in contrast to the motion of a pair of strangers walking in opposite directions and passing each other). Inspired by Kipf et al. [22], who estimate the graph connectivity from trajectories using an autoencoder architecture, our model dynamically infers the crowd–robot relation graph at each time step and learns the state representation for each agent based on this graph.
MPC, MCTS and model-based DRL. Model predictive control (MPC) is a family of control algorithms that leverage models of the system to predict state changes and plan control accordingly. Traditional MPC [37, 25] usually assumes access to a known environment model, which is frequently unrealistic. Monte-Carlo Tree Search (MCTS) has been used for decision-time planning by estimating action values based on many simulated trajectories in complex search problems such as the game of Go [27]. More recent model-based RL methods first acquire a predictive model of the world, and then use that model to make decisions. Finn et al. [13] learned a state transition model by predicting future frames to achieve goal-directed robotic manipulation. VPN [26] and Predictron [28] learned to predict future abstract states that are informative for planning in Atari games. In contrast, our model predictive relational graph takes a set of raw human states (e.g., positions, velocities) as input and predicts multiple interacting human trajectories. To our knowledge, we are the first to integrate relational learning with a model-based DRL algorithm for crowd navigation.
III Approach
We first describe how we formulate the crowd navigation problem with deep RL, then introduce our relational graph learning model for modeling interactions in the crowd. In addition, we show how this model can be integrated into a simplified Monte-Carlo Tree Search (MCTS) approach for d-step planning. Figure 2 shows an overview of our approach.
III-A Deep Reinforcement Learning for Crowd Navigation
In this work, we address the crowd navigation task where the robot navigates through a crowd of $n$ humans to a goal position as efficiently and safely as possible. This task is formulated as a sequential decision making problem in recent works [10, 12, 8]. Each agent (either human or the robot) observes the others' observable state, consisting of position, velocity and radius (an abstract measure of size). Each agent also maintains an internal state, such as a goal position and preferred speed $v_{pref}$. We assume actions are instantaneous and the agent always arrives at the target position in the next time step. We use $s_t$ and $w_t^i$ to denote the robot state and the observed state of human $i$ at time $t$, respectively. The robot input state is defined as the joint state $s_t^{jn} = [s_t, w_t^1, \ldots, w_t^n]$. The optimal policy $\pi^*$ maximizes the expected return:

$$\pi^*(s_t^{jn}) = \operatorname*{argmax}_{a_t} \; R(s_t^{jn}, a_t) + \gamma^{\Delta t \cdot v_{pref}} \int P(s_{t+\Delta t}^{jn} \mid s_t^{jn}, a_t)\, V^*(s_{t+\Delta t}^{jn})\, \mathrm{d}s_{t+\Delta t}^{jn},$$

where $R(s_t^{jn}, a_t)$ is the reward received at time $t$, $\gamma \in (0,1)$ is the discount factor, $V^*$ is the optimal value function, and $P(s_{t+\Delta t}^{jn} \mid s_t^{jn}, a_t)$ is the transition probability from time $t$ to time $t+\Delta t$. The preferred velocity $v_{pref}$ is used as a normalization term in the discount factor for numerical reasons [10]. We follow the formulation of the reward function defined in Chen et al. [8], which awards accomplishing the task while penalizing collisions or uncomfortable distances:

$$R(s_t^{jn}, a_t) = \begin{cases} -0.25 & \text{if } d_t < 0 \\ -0.1 + d_t/2 & \text{else if } d_t < 0.2 \\ 1 & \text{else if the goal is reached} \\ 0 & \text{otherwise,} \end{cases}$$

where $d_t$ is the minimum separation distance between the robot and the humans during the time period $[t-\Delta t, t]$.
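As a concrete illustration, the comfort-aware reward of Chen et al. [8] can be sketched as a small function. The constants follow the values commonly used in the CrowdNav environment; treat them as illustrative assumptions rather than a definitive specification.

```python
def reward(d_min, reached_goal, collided):
    """Sketch of the comfort-aware reward from Chen et al. [8].

    d_min: minimum separation (meters) between robot and any human
    over the last step. Constants are the commonly used CrowdNav
    values and are illustrative.
    """
    discomfort_dist = 0.2       # humans are uncomfortable within 0.2 m
    if collided:
        return -0.25            # collision penalty
    if reached_goal:
        return 1.0              # task completion bonus
    if d_min < discomfort_dist:
        return -0.1 + 0.5 * d_min   # graded discomfort penalty
    return 0.0
```

Note that the discomfort penalty is continuous at the 0.2 m boundary, so small changes in separation never cause a jump in reward.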
Unlike previous work [8], we do not make assumptions about the state transition model. Instead, we take a model-based approach and learn a state transition model for predicting human motion.
This problem statement applies to a set of more general tasks in which there are non-cooperative agents and the decision-maker receives only their observable states, without knowing their intents or hidden policies.
III-B Model Predictive Relational Graph Learning
The interactions (i.e., spatial relations) between humans are important for robot navigation and for predicting future human motion. Previous work does not learn such interaction features for the robot and humans simultaneously. Here, we model the crowd as a graph, reason about the relations between all agents and use a GCN to compute the robot and human state representations. Using the graph model as a building block, we further construct two other modules: a value estimation module which estimates the value of the current state and a state prediction module which predicts the state at the next time step.
Relational graph learning. The key challenge in crowd navigation is to learn a good representation of the crowd encoding interactions among all agents. Chen et al. [8] show that attentive crowd aggregation improves both interpretation and performance by modeling one-way human–robot interactions. This motivates us to model the crowd and the robot as a directed graph $G = (V, E)$. The edge $e_{ij}$ indicates how much attention agent $i$ pays to agent $j$, or the importance of agent $j$ to agent $i$. This pairwise relation is not known a priori, so it is inferred with a pairwise similarity function (relation inference). After the relations between all agents are inferred, a graph neural network propagates information from node to node and computes the state representations for all agents (interaction modeling), as shown in Figure 2 a).
LM-SARL [8] can be viewed as an approximation of our Relational Graph Learning (RGL) formulation, in that it learns robot interaction features with respect to all humans and uses attentive pooling to aggregate the interaction features. Our RGL formulation learns not only the attention of the robot to humans but also the attention from humans to humans. Apart from learning the robot state representation, RGL also learns the state representation for other agents simultaneously and propagates these features to the robot node by message passing. In this way, RGL also models higher-order interactions. For example, RGL fully models human–human interaction rather than using a local map (as in LM-SARL) to approximate the local interactions of humans. This approach also compares favorably to LeTS-Drive [6], which does not reason about pairwise state interactions or model human–human interactions.
III-B1 Relation Inference
The initial values of the vertices are the state values of the agents: $[s_t, w_t^1, \ldots, w_t^n]$. Since robot and human states have different dimensions, we use two multi-layer perceptrons (MLPs) to embed them into a latent space, resulting in a feature matrix $X$, where the first row is the latent state of the robot and the remaining rows are the latent states of humans. Given the feature matrix $X$, a relation matrix $A$ is computed using a pairwise similarity function. Following Wang et al. [36], we use an embedded Gaussian as the similarity function: the pairwise form is $f(X_i, X_j) = \exp(\theta(X_i)^\top \phi(X_j))$, normalized over $j$, and the matrix form is $A = \mathrm{softmax}(X W_\theta W_\phi^\top X^\top)$, where $\theta(\cdot)$ and $\phi(\cdot)$ are linear embeddings with weights $W_\theta$ and $W_\phi$. A learned relation is illustrated in Figure 1, where the thickness of a line indicates the strength of the pairwise interaction.

III-B2 Interaction Modeling
With the feature matrix $X$ and relation matrix $A$, we use a GCN to compute the pairwise interaction features. The message passing rule is defined by

$$H^{(l+1)} = \sigma\!\left(A H^{(l)} W^{(l)}\right),$$

where $W^{(l)}$ is a layer-specific trainable weight matrix, $H^{(l)}$ is the node-level feature matrix of layer $l$, and $\sigma(\cdot)$ is an activation function. The feature of node $i$ at level $l+1$ aggregates its neighbors' node features at level $l$, weighted by the relations stored in matrix $A$. Let $H^{(0)} = X$; after $L$ layer propagations, we obtain the state representation matrix $Z = H^{(L)}$, whose $i$-th row $Z_i$ is the state representation for agent $i$ encoding its local interactions.
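The relation inference and message-passing steps can be sketched in a few lines of NumPy. The parameter names, layer count, and dimensions below are illustrative assumptions, not values from the paper; the similarity uses the embedded-Gaussian form, and ReLU stands in for the unspecified activation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rgl_forward(robot_state, human_states, params):
    """Sketch of one RGL forward pass (shapes/names are illustrative).

    robot_state: (d_r,); human_states: (n, d_h).
    Returns per-agent representations after L rounds of message passing:
    row 0 is the robot, rows 1..n are the humans.
    """
    # 1. Embed robot and human states into a shared latent space.
    X_r = np.tanh(robot_state @ params["W_r"])   # (d,)
    X_h = np.tanh(human_states @ params["W_h"])  # (n, d)
    X = np.vstack([X_r, X_h])                    # (n+1, d)

    # 2. Relation inference: embedded-Gaussian similarity,
    #    A = softmax(X W_a X^T), normalized over each row.
    A = softmax(X @ params["W_a"] @ X.T, axis=1)  # (n+1, n+1)

    # 3. Interaction modeling: GCN message passing
    #    H^{l+1} = relu(A H^l W^l), with H^0 = X.
    H = X
    for W in params["W_gcn"]:                    # L layers
        H = np.maximum(A @ H @ W, 0.0)
    return H
```

Because the relation matrix is recomputed from the current states at every call, the graph connectivity adapts to the crowd configuration at each time step, as described above.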
Value estimation. Our value estimation module consists of two models: a relational graph model that infers the robot state representation, and a subsequent value network that predicts the value of the state. The value network is an MLP and predicts values over a discretized action space by looking ahead, as shown in Figure 2 b).
State prediction. We assume the action of the robot can be perfectly executed, so the next robot state can be computed directly from the current state and action. Our state prediction module models interactions between human states to predict future human states, as shown in Figure 2 c). In the action selection phase, previous works [10, 8] rely on either querying the ground truth or off-the-shelf algorithms to approximate the next human states, which does not benefit from end-to-end learning. Our state prediction module consists of two models: first, a relational graph model predicts the relations between all agents and their $L$-layer interaction features; then, a successor motion prediction model uses the human state representations to predict each human's next observable state. The motion prediction network is modeled by an MLP. In addition, to simplify the prediction task, we use a handcrafted function to estimate the reward based on the predicted human states.
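One simulated step of this module can be sketched as follows. The robot transition is deterministic under the holonomic-kinematics assumption, while human motion comes from the learned predictor; here `human_predictor` stands in for the relational-graph prediction model, and the control interval `dt` is an assumed value.

```python
import numpy as np

def step_state(robot_state, human_states, action, human_predictor, dt=0.25):
    """Sketch of one model-predictive step (names and dt are assumptions).

    robot_state: (px, py, vx, vy); action: commanded velocity (vx, vy).
    """
    # Known part: holonomic robot executes the commanded velocity perfectly.
    px, py = robot_state[:2]
    vx, vy = action
    next_robot = np.array([px + vx * dt, py + vy * dt, vx, vy])

    # Learned part: jointly predict every human's next observable state,
    # conditioned on the full crowd (robot + humans).
    next_humans = human_predictor(robot_state, human_states)
    return next_robot, next_humans
```

The reward for the simulated step would then be estimated from `next_robot` and `next_humans` with the handcrafted reward function, so no environment query is needed during planning.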
III-C d-step Planning
The use of DRL for crowd navigation addresses the short-sighted collision avoidance of traditional methods [32] by learning a state value function. However, with imperfect value estimation and unknown human behavior policies, the robot can sometimes be stuck in a local minimum with suboptimal behaviors. By predicting future states and future rewards with our approach, we can simulate d steps into the future to provide a better estimate of the state values. Furthermore, by controlling the depth of this simulation, we can trade off computation for better performance. Our approach is visualized in the tree search diagram shown in Figure 2 d).
Compared to Chen et al. [8], our model does not need to query the environment or use off-the-shelf trajectory predictions to obtain the next human states. We follow the d-step planning method proposed in Oh et al. [26], performing rollouts using the crowd state prediction and value estimation up to d steps into the future, and select the action with the maximum d-step predicted return, which accumulates the predicted rewards along a rollout together with the discounted value estimate of the final predicted state.
With d-step planning, computation increases exponentially with search depth and width. Due to our large action space, we use action space clipping to reduce the computational cost. Intuitively, the value function estimates the quality of entering a state. Using this estimate, we recursively search the top-w next states with one-step lookahead. Compared to only considering the top action, d-step planning provides a better state value estimate when the agent encounters unseen states. Pseudocode for the d-step planning is summarized in Algorithm 1. VPLAN is a recursive function that takes a raw state as input and simulates future trajectories up to depth d with action space clipping width w. Figure 2 d) shows a toy example of tree search with a given depth and clipping width.
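The recursion can be sketched as follows. This is a reconstruction from the description above, not a reproduction of Algorithm 1: the function names (`value_fn`, `predict_fn`, `reward_fn`) are illustrative, and the discount is a plain constant, omitting the v_pref normalization used in the paper.

```python
def v_plan(state, depth, width, value_fn, predict_fn, reward_fn,
           actions, gamma=0.9):
    """Sketch of recursive d-step planning with action space clipping.

    At each level, score every action with a one-step lookahead, keep
    only the `width` most promising successor states, and recurse on
    those. Returns the estimated return of `state`.
    """
    if depth == 0:
        return value_fn(state)  # leaf: fall back on the learned value

    # One-step lookahead over the full action space.
    successors = []
    for a in actions:
        next_state = predict_fn(state, a)        # model-predicted successor
        r = reward_fn(state, a, next_state)      # estimated reward
        successors.append((r + gamma * value_fn(next_state), r, next_state))

    # Action space clipping: expand only the top-`width` successors.
    successors.sort(key=lambda s: s[0], reverse=True)
    best = -float("inf")
    for _, r, next_state in successors[:width]:
        ret = r + gamma * v_plan(next_state, depth - 1, width,
                                 value_fn, predict_fn, reward_fn,
                                 actions, gamma)
        best = max(best, ret)
    return best
```

With `width` actions expanded at each of `depth` levels, the number of simulated states grows as O(|actions| * width^(depth-1)) rather than O(|actions|^depth), which is what makes multi-step search affordable over a large action space.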
III-D Joint Value Estimation and State Prediction Learning
Pseudocode for the joint state prediction and value estimation learning scheme is given in Algorithm 2. We first use imitation learning with experience collected from a demonstrator ORCA [32] policy to initialize the model, and then use RL to refine the policy. During RL training, we use temporal-difference learning with experience replay and a fixed target network to train the value estimation function, and use supervised learning to train the state prediction function. Sharing graph models between the two functions makes RL training unstable and harder to optimize, so we use separate graph models for them. Compared to the training scheme in [8], in addition to the extra steps for training the state prediction model, our training algorithm uses the d-step return as the target value rather than the one-step return, which provides more accurate value estimation.

TABLE I
Method | Success | Collision | Extra Time | Avg. Return | Max Diff.

LM-SARL-Linear [8]
RGL-Linear (Ours)
MP-RGL-Onestep (Ours)
MP-RGL-Multistep (Ours, full)
± indicates standard deviation measured using five independently seeded training runs.
IV Experimental Results
Implementation details. The embedding functions, value network, and motion prediction network are small MLPs; for fair comparison with the baseline method, we use the value network for the baseline as reported in Chen et al. [8]. All parameters are trained using Adam [21]. The exploration rate of the ε-greedy policy decays linearly over the first training episodes. We assume holonomic kinematics for the robot, i.e., it can move in any direction. The action space consists of discrete actions: speeds exponentially spaced in (0, v_pref] and headings evenly spaced in [0, 2π).
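Such an action space can be sketched as below. The specific counts (5 speeds × 16 headings, giving 80 actions) follow the common CrowdNav setup and are assumptions rather than values stated here; only the spacing scheme (exponential speeds, even headings) is from the text.

```python
import math

def build_action_space(v_pref=1.0, n_speeds=5, n_headings=16):
    """Sketch of the discretized holonomic action space.

    Speeds are exponentially spaced in (0, v_pref] and headings evenly
    spaced in [0, 2*pi). The counts are illustrative assumptions.
    """
    # (e^{(i+1)/n} - 1) / (e - 1) grows from ~0 up to exactly 1,
    # giving a finer resolution at low speeds.
    speeds = [(math.e ** ((i + 1) / n_speeds) - 1) / (math.e - 1) * v_pref
              for i in range(n_speeds)]
    headings = [2 * math.pi * i / n_headings for i in range(n_headings)]
    return [(v * math.cos(h), v * math.sin(h))  # (vx, vy) commands
            for v in speeds for h in headings]
```

Denser sampling of low speeds is useful in crowded scenes, where careful slow maneuvers matter more than fine distinctions between fast ones.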
Simulation setup. We use the CrowdNav simulation environment (https://github.com/vita-epfl/CrowdNav). In this simulation, humans are controlled by ORCA [32], the parameters of which are sampled from a Gaussian distribution to introduce behavioral diversity. We use circle crossing scenarios for our experiments, in which humans are randomly positioned on a circle of fixed radius with random perturbations added to their x, y coordinates. The maximum speed of the agent, v_pref, is 1 m/s, and the goal is fixed so that it can be reached in a minimum time of 8 seconds. To fully evaluate the effectiveness of the proposed model, we study the simulation setting where the robot is invisible to humans. As a result, the simulated humans react only to other humans but not to the robot. This setting is a clean testbed for validating the model's ability to reason about human–robot and human–human interaction without affecting human behaviors. All models are evaluated with 500 random test cases.

Quantitative evaluation. The state-of-the-art method LM-SARL [8] assumes that the next ground truth state is given during training. To demonstrate the effectiveness of our relational graph learning model, we use a linear motion model (agents keep the velocities from their last state) both in LM-SARL and as our state prediction model, so that the comparison is purely about robot state representation learning. These two models are denoted LM-SARL-Linear and RGL-Linear, respectively. We refer to our full model as MP-RGL-Multistep and the model with one-step lookahead as MP-RGL-Onestep. For all RGL-based models, we use a two-layer GCN. For MP-RGL-Multistep, we use two-step lookahead with action space clipping. We do not compare with LeTS-Drive [6] as it focuses on components other than interaction modeling and assumes different inputs. Table I reports the rates of success and collision, the average extra navigation time (defined as navigation time beyond the minimum possible 8 seconds), and the average return, which is the cumulative discounted reward averaged over all steps in the 500 test cases. To provide a clearer interpretation of the average return metric, we add one more metric: the difference between the average return and its upper bound. The upper bound corresponds to the imaginary case in which there are no other humans and the agent heads straight to the goal at all times.
As expected, the performance of LM-SARL-Linear drops compared to the results reported in [8] once the unrealistic assumption of access to the next ground truth state is replaced with a linear motion model. The first comparison, between LM-SARL-Linear and RGL-Linear, reflects the improvement brought by modeling the crowd interaction with graph neural networks. The comparison between RGL-Linear and MP-RGL-Onestep then shows that using a learned state prediction model leads to a better estimate of human motions and thus a better navigation policy. Finally, the best model in the table is our MP-RGL-Multistep. Comparing it to the one-step version, we see that d-step planning improves the success rate, reduces the extra time to reach the goal, and increases the average return. Even the best-performing model is not collision-free: the robot is invisible to humans and the policy does not have access to the next ground truth state, making some collisions unavoidable (e.g., when the robot is surrounded by humans converging towards it).
Effect of planning budget. In d-step planning, both the tree depth d and the action space clipping width w influence the computation time and planning performance. With larger d, our approach looks further into the future and plans further ahead. With larger w, we consider more actions at each depth and can predict a wider range of possible outcomes, potentially finding a better path. We study how the planning budget influences performance by testing the MP-RGL-Onestep model from Table I with various test-time planning budgets. Simply increasing the planning depth at test time decreases the extra time and improves the average return, and larger w improves performance further. From these two experiments, we conclude that d-step planning leads to better performance at both learning and decision-making time.






Investigation of running time. The GNN-based state feature extractor is computationally efficient compared to attention-based crowd navigation work [8]: the GNN computation amounts to a matrix multiplication, whose cost changes negligibly when the number of agents N is relatively small. Search complexity is mostly determined by the depth d and width w. The running time of our full model is 0.095 s, 0.101 s, 0.114 s, and 0.115 s per decision in scenarios with 5, 10, 15, and 20 humans, respectively, on an i7-4790 CPU. This shows that (i) our approach runs at about 10 FPS, which is sufficient for real-time settings when the robot moves at relatively slow speeds, and (ii) the impact of scaling the number of human agents is small.
Qualitative evaluation. We further investigate how our full model handles two challenging scenarios: first, when short-sighted behaviors lead to collisions, and second, when the robot freezes because it is close to both the goal and potential danger.
The scenario shown in Figure 3(a) is a typical example of the first case. The one-step lookahead policy assigns almost equally high values to actions in two nearby directions, which is reasonable for a short-sighted policy since no human currently occupies either of them. However, taking the highest-valued of these actions would result in a collision with human 0. By looking two steps ahead, the policy anticipates an unavoidable collision in two steps if it moves in that direction now. The advantage of the multi-step lookahead policy also validates the effectiveness of encoding higher-order human interactions: by observing the strong relation between human 0 and human 3, who are moving in the same direction and are very close to each other, the robot predicts where human 3 will move and predicts negative rewards for actions toward that region. Thus, the two-step lookahead policy assigns low values to that direction and avoids the collision by steering away.
The scenario shown in Figure 3(b) is a typical example of the second case, where human 3 is standing near the goal position of the robot. The one-step lookahead policy assigns the highest value to an action that steers away from the goal and a lower value to the action heading toward it, because moving toward the goal incurs a discomfort penalty from stepping into the comfort zone of human 3. The two-step lookahead policy, in contrast, predicts the large positive reward obtained after taking the action toward the goal: the positive reward two steps later compensates for the one-step discomfort penalty. Thus, the two-step lookahead policy assigns the highest value to the action toward the goal and makes a non-myopic decision.
V Real-world Demonstrations
We deploy our trained policy in the real world on a Pioneer robotic platform equipped with an Intel RealSense ZR300 camera (see Figure 4). Test episodes are shown in the video available at https://youtu.be/U3quW30Eu3A.
VI Conclusion
In this paper, we address crowd navigation with a relational graph learning approach. By formulating human–robot and human–human interactions as a graph and using GCNs to compute interaction features, our model can estimate state values as well as predict human motion. Augmented by d-step planning, our model explicitly plans into the future under a given search budget. We show that our model outperforms baseline methods by a large margin and can handle challenging navigation scenarios. The relational graph learning approach we propose can be extended in several ways; for example, we do not yet model the temporal dynamics of past human trajectories, which could help infer the intent of individuals and the group structure among humans.
References
[1] (2019) Learning transferable cooperative behavior in multi-agent teams. arXiv:1906.01202. Cited by: §II.
[2] (2016) Social LSTM: human trajectory prediction in crowded spaces. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 961–971. Cited by: §I, §II.
[3] (2018) Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261. Cited by: §II.
[4] (2016) Interaction networks for learning about objects, relations and physics. arXiv:1612.00222. Cited by: §I, §II.
[5] (2008) Reciprocal velocity obstacles for real-time multi-agent navigation. In 2008 IEEE International Conference on Robotics and Automation, pp. 1928–1935. Cited by: §II.
[6] (2019) LeTS-Drive: driving in a crowd by learning from tree search. arXiv:1905.12197. Cited by: §II, §III-B, §IV.
[7] (2015) HICO: a benchmark for recognizing human-object interactions in images. In 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1017–1025. Cited by: §II.
[8] (2018) Crowd-robot interaction: crowd-aware robot navigation with attention-based deep reinforcement learning. arXiv:1809.08835. Cited by: §I, §II, §III, §IV, Table I.
[9] (2017) Socially aware motion planning with deep reinforcement learning. arXiv:1703.08862. Cited by: §II.
[10] (2016) Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. arXiv:1609.07845. Cited by: §I, §II, §III.
[11] (2015) Structure inference machines: recurrent neural networks for analyzing relations in group activity recognition. arXiv:1511.04196. Cited by: §I.
[12] (2018) Motion planning among dynamic, decision-making agents with deep reinforcement learning. arXiv:1805.01956. Cited by: §I, §III-A.
[13] (2016) Deep visual foresight for planning robot motion. arXiv:1610.00696. Cited by: §II.
[14] (2017) Detecting and recognizing human-object interactions. arXiv:1704.07333. Cited by: §II.
[15] (2018) Learning policy representations in multi-agent systems. arXiv:1806.06464. Cited by: §I.
[16] (2017) Inductive representation learning on large graphs. arXiv:1706.02216. Cited by: §II.
[17] (2017) Representation learning on graphs: methods and applications. arXiv:1709.05584. Cited by: §II.
[18] (1995) Social force model for pedestrian dynamics. 51 (5), pp. 4282–4286. Cited by: §I, §II.
[19] (2017) VAIN: attentional multi-agent predictive modeling. arXiv:1706.06122. Cited by: §I.
[20] (2018) Hierarchical relational networks for group activity recognition and retrieval. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 721–736. Cited by: §I, §II.
[21] (2014) Adam: a method for stochastic optimization. arXiv:1412.6980. Cited by: §IV.
[22] (2018) Neural relational inference for interacting systems. arXiv:1802.04687. Cited by: §I, §II.
[23] (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Cited by: §I, §II.
[24] (2017) A structured self-attentive sentence embedding. arXiv:1703.03130. Cited by: §II.
[25] (2005) Robust model predictive control of constrained linear systems with bounded disturbances. 41 (2), pp. 219–224. Cited by: §II.
[26] (2017) Value prediction network. arXiv:1707.03497. Cited by: §II, §III-C.
[27] (2016) Mastering the game of Go with deep neural networks and tree search. 529 (7587), pp. 484–489. Cited by: §II.
[28] The predictron: end-to-end learning and planning. Cited by: §II.
[29] (2010) Unfreezing the robot: navigation in dense, interacting crowds. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 797–803. Cited by: §II.
[30] (2013) Robot navigation in dense human crowds: the case for cooperation. In 2013 IEEE International Conference on Robotics and Automation, pp. 2153–2160. Cited by: §II.
[31] (2017) Sparse interacting Gaussian processes: efficiency and optimality theorems of autonomous crowd navigation. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 327–334. Cited by: §II.
[32] (2011) Reciprocal n-body collision avoidance. In Robotics Research, C. Pradalier, R. Siegwart, and G. Hirzinger (Eds.), Springer Tracts in Advanced Robotics, pp. 3–19. Cited by: §II, §III, §IV.
[33] (2017) Attention is all you need. arXiv:1706.03762. Cited by: §II.
[34] (2017) Graph attention networks. arXiv:1710.10903. Cited by: §II.
[35] (2017) Social attention: modeling attention in human crowds. arXiv:1710.04689. Cited by: §II.
[36] (2017) Non-local neural networks. arXiv:1711.07971. Cited by: §I, §II, §III-B1.
[37] (2006) Trajectory free linear model predictive control for stable walking in the presence of strong perturbations. In 2006 6th IEEE-RAS International Conference on Humanoid Robots, pp. 137–142. Cited by: §II.
[38] (2017) Scene graph generation by iterative message passing. arXiv:1701.02426. Cited by: §II.
[39] (2019) How powerful are graph neural networks? Cited by: §II.
[40] (2018) Relational deep reinforcement learning. arXiv:1806.01830. Cited by: §II.