Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks

by   Ekaterina Tolstaya, et al.

We consider the problem of finding distributed controllers for large networks of mobile robots with interacting dynamics and sparsely available communications. Our approach is to learn local controllers which require only local information and local communications at test time by imitating the policy of centralized controllers using global information at training time. By extending aggregation graph neural networks to time varying signals and time varying network support, we learn a single common local controller which exploits information from distant teammates using only local communication interchanges. We apply this approach to a decentralized linear quadratic regulator problem and observe how faster communication rates and smaller network degree increase the value of multi-hop information. Separate experiments learning a decentralized flocking controller demonstrate performance on communication graphs that change as the robots move.


I Introduction

The modern world depends on the fixed infrastructure provided by large scale cyber-physical systems. Smart-grid utilities, road traffic control systems, and industrial processes are all engineering challenges which involve complex plant dynamics, sparse placements of sensors and actuators, and communication constraints which require local controllers to perform with locally available information.

In the future, we envision hundreds of mobile robots working cooperatively to perform services such as providing on-demand wireless networks [21], environmental mapping [27, 26], search after natural disasters [1, 13], or sensor coverage in cluttered and communications denied environments [32]. Mobile robots force distributed solutions. The task may require shared information, but computation, sensing, and actuation are local to each robot. Mobility also poses unique challenges not present with fixed infrastructure: the communication graph between agents is no longer fixed. In addition, we wish robot teams to obtain resiliency through redundancy and interchangeability, which motivates deploying a single shared local policy for each agent instead of a suite of specialized local controllers.

Our goal is to learn local agent policies which can collectively control a large scale dynamical process using information gathered through a sparse, potentially time-varying communication network. Our approach is to use imitation learning to approximate a global optimal policy using networked local controllers which at test time require only local information and connectivity. To improve the quality of these local policies, information from non-adjacent neighbors is distilled using Aggregation Graph Neural Networks.

Related work in decentralized optimal control is concerned with the design of optimal controllers given foreknowledge of both the plant model and of information-sharing constraints imposed on the controller by an existing communication network. In the case of a linear time invariant (LTI) system with the special property that these network constraints happen to be quadratically invariant with respect to the plant, the optimal control problem has a convex formulation [17]. Unfortunately, the resulting controller implementations are not scalable for many systems of interest. In particular, quadratic invariance requires all locally collected information to be exchanged in the common case where all system states may directly or indirectly influence each other. This observation has led to further work emphasizing the synthesis of localized controllers by imposing structure on the closed loop system response as part of the design process [28, 29]. However, these methods still assume fixed communication graphs and yield controllers specialized to each node and so are not directly applicable to teams of interchangeable mobile robots.

We address these difficulties by learning local controllers. However, the behavior cloning techniques which have been successfully applied to robotics applications such as autonomous driving [16] and quadrotor navigation [11] cannot be directly applied to multi-robot teams. From the point of view of each agent, its local control system is partially observable, with other agents’ unobservable states affecting future values. Therefore, each agent must accumulate information from neighbors to gain a better understanding of the state of the system, and to choose better control actions.

Graph neural networks (GNNs) offer a solution to this problem of aggregating network information by exploiting the graph structure [3, 5, 14, 9, 18]. In particular, the aggregation GNN developed in [9, 10] offers an architecture that operates in an entirely local fashion, involving communication only with nearby neighbors, making it especially suited for teams of agents interacting over a physical network. In the context of multi-agent systems, [23] uses imitation learning to learn decentralized policies from expert policies originally trained using Actor-Critic methods. In that approach, a graph of relationships between agents is explicitly learned by a neural network. In contrast, we leverage the known relationships and connectivity between agents in order to use graph convolutions to extract features, following the approach of [9]. Exploiting the known network structure allows us to consider teams of agents an order of magnitude larger, and highlights the value of using information from multi-hop neighbors.

We begin by describing the optimal control problem for a dynamical process, and then pose the additional information constraints which define the more difficult decentralized control problem for a networked system. In Section III we introduce an extension of Aggregation Graph Neural Networks to time varying signals supported on time varying networks and their application to information exchange among teams of agents. This framework is applied to the specific problem of distributed LQR control in Section IV, and experiments illustrate the value of aggregating multi-hop information as the communication rate or degree of the communication network are varied. Successful transfer of models learned on one network to a different network motivates the study of flocking dynamics in Section V, where now the communication network is explicitly time varying as the agents move.

Notation. For a matrix $A$ we use $[A]_{ij}$ to represent its $(i,j)$th entry and $[A]_i$ to represent its $i$th row. For matrices $A$ and $B$ we use $[A, B]$ to represent their block column concatenation and $[A; B]$ for their block row stacking.

II Control of Networked Systems

We consider a team of $N$ agents distributed in space and use $\mathbf{r}_i(t)$ to denote the possibly time varying position of agent $i$ at time $t$. A dynamical process is evolving in the space where the team is deployed. We characterize this dynamical process by the collection of state values $\mathbf{x}_i(t)$ observed at the locations of each agent $i$, as well as the control actions $\mathbf{u}_i(t)$ that agents take, and a noise term $\mathbf{w}_i(t)$ to model uncertainties and model mismatch. Grouping local states into the joint system state matrix $\mathbf{X}(t) = [\mathbf{x}_1(t); \ldots; \mathbf{x}_N(t)]$, local actions into the overall system action $\mathbf{U}(t) = [\mathbf{u}_1(t); \ldots; \mathbf{u}_N(t)]$, and noise terms into the vector $\mathbf{W}(t) = [\mathbf{w}_1(t); \ldots; \mathbf{w}_N(t)]$, we can write the evolution of the dynamical process through a differential equation of the form

$$\dot{\mathbf{X}}(t) = f\big(\mathbf{X}(t), \mathbf{U}(t)\big) + \mathbf{W}(t). \tag{1}$$

Although not formally required, we are interested in processes in which the function $f$ has spatial locality in the sense that the effect of $\mathbf{x}_j(t)$ and $\mathbf{u}_j(t)$ on the state of a different agent $i$ diminishes with their distance (see Section IV for an example).

In order to design a controller to affect the behavior of the dynamical system in (1), we operate in discrete time. To do so, introduce a sampling time $T_s$ and a discrete time index $n$. Define the discrete time state $\mathbf{X}_n := \mathbf{X}(nT_s)$ and denote as $\mathbf{U}_n$ the action that the system takes at time $nT_s$ and holds until $(n+1)T_s$. Solving (1) between times $nT_s$ and $(n+1)T_s$ we end up with the discrete dynamical system

$$\mathbf{X}_{n+1} = f_d\big(\mathbf{X}_n, \mathbf{U}_n\big) + \mathbf{W}_n. \tag{2}$$
At each point in (discrete) time, we consider a cost function $c(\mathbf{X}_n, \mathbf{U}_n)$. The objective of the control system is to choose actions that reduce the accumulated cost $\sum_n c(\mathbf{X}_n, \mathbf{U}_n)$. When the collection of state observations is available at a central location it is possible for us to consider centralized policies that choose control actions that depend on global information. In such a case the optimal policy $\pi^*$ is the one that minimizes the expected long term cost

$$\pi^* = \operatorname*{argmin}_{\pi}\ \mathbb{E}\Big[\sum_{n} c\big(\mathbf{X}_n, \pi(\mathbf{X}_n)\big)\Big]. \tag{3}$$
If the dynamics in (2) and the costs are known, as we assume here, there are several techniques to find the optimal policy [33, 2]. In this paper we are interested in decentralized controllers that operate without access to global information, and we interpret (3) as a benchmark that decentralized controllers try to imitate.

II-A Decentralized Control via Imitation of Central Control

The locations of agents at time $t$ determine a connectivity graph $\mathcal{G}(t)$ with an asymmetric edge set $\mathcal{E}(t)$ composed of pairs $(i,j)$ having associated weights $w_{ij}(t)$. As in the case of the dynamical process in (1) we think of the weights as decreasing with the agents’ distance but do not formally require it (see Section IV for an example). We further define the weighted adjacency matrix $A(t)$ as a sparse matrix with nonzero entries $[A(t)]_{ij} = w_{ij}(t)$ for $(i,j) \in \mathcal{E}(t)$. For all $i$ we have $[A(t)]_{ii} = 0$.

The presence of the edge $(i,j) \in \mathcal{E}(t)$ means that it is possible for $j$ to send data to $i$ at time $t$. When this happens we say that $j$ is a neighbor of $i$ and define the neighborhood of $i$ at time $t$ as the collection of all its neighbors

$$\mathcal{N}_i(t) = \big\{ j : (i,j) \in \mathcal{E}(t) \big\}. \tag{4}$$
It is also of interest to define multihop neighborhoods of a node. To do so begin by convening that the 0-hop neighborhood of $i$ is $\mathcal{N}_i^0(t) = \{i\}$; namely, the node itself. Further rename the neighborhood of $i$ as the 1-hop neighborhood and denote $\mathcal{N}_i^1(t) = \mathcal{N}_i(t)$. We can now define the $k$-hop neighborhood of $i$ as the set of nodes that can reach node $i$ in exactly $k$ hops. Their formal definition can be given by the recursion

$$\mathcal{N}_i^k(t) = \bigcup_{j \in \mathcal{N}_i(t)} \mathcal{N}_j^{k-1}(t-1). \tag{5}$$
As per (5), the $k$-hop neighbors of $i$ are the nodes that are $(k-1)$-hop neighbors at time $t-1$ of the neighbors of $i$ at time $t$. This recursive neighborhood definition is made in order to characterize the information that is available to node $i$ at time $t$. This information includes the local state $\mathbf{x}_i(t)$ that can be directly observed by node $i$ at time $t$, as well as the value of the state at time $t-1$ for all nodes that are 1-hop neighbors of $i$ at time $t$, since this information can be communicated to node $i$. Node $i$ can also learn the state of 2-hop neighbors at time $t-2$, since that information can be relayed from neighbors. In general, we can define the information history of node $i$ at time $t$ as the collection of state observations

$$\mathcal{H}_i(t) = \bigcup_{k=0}^{K-1} \Big\{ \mathbf{x}_j(t-k) : j \in \mathcal{N}_i^k(t) \Big\}, \tag{6}$$

where we have chosen a maximal history depth $K$. The decentralized control problem consists in finding a policy that minimizes the long term cost restricted to the information structure in (6). This leads to problems in which finding optimal controllers is famously difficult [30] except in some particularly simple scenarios [6]. This complexity motivates the introduction of a method that learns to mimic the centralized controller in (3). Formally, we introduce a parametrized policy $\pi_\theta$ that maps local information histories to local actions $\mathbf{u}_i(t) = \pi_\theta(\mathcal{H}_i(t))$, as well as a loss function $\ell$ to measure the difference between the optimal centralized policy $\pi^*$ and a system where all agents (locally) execute the (local) policy $\pi_\theta$. Our goal is to find the tensor of parameters $\theta$ that solves the optimization problem

$$\theta^* = \operatorname*{argmin}_{\theta}\ \mathbb{E}_{\pi^*}\Big[\sum_t \ell\big(\pi^*(\mathbf{X}_t),\ \{\pi_\theta(\mathcal{H}_i(t))\}_{i=1}^N\big)\Big], \tag{7}$$

where we use the notation $\mathbb{E}_{\pi^*}$ to emphasize that the distribution of observed states over which we compare the policies $\pi^*$ and $\pi_\theta$ is that of a system that follows the optimal policy $\pi^*$.
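As a concrete sketch of the recursion in (5), the multi-hop neighborhoods can be computed from a history of 1-hop neighbor sets. The list-of-dictionaries representation of the time-varying graph below is an illustrative assumption, not the paper's data structure:

```python
def k_hop_neighborhood(adj_seq, i, k, t):
    """Compute the k-hop neighborhood N_i^k(t) under the recursion
    N_i^k(t) = union over j in N_i(t) of N_j^{k-1}(t-1).

    adj_seq[t] maps each node to the set of its 1-hop neighbors at time t
    (an illustrative representation of the time-varying graph).
    """
    if k == 0:
        return {i}  # the 0-hop neighborhood is the node itself
    out = set()
    for j in adj_seq[t].get(i, set()):
        # j's (k-1)-hop neighbors, taken one time step earlier
        out |= k_hop_neighborhood(adj_seq, j, k - 1, t - 1)
    return out
```

On a path graph 0–1–2 that is static over time, node 0's 2-hop neighborhood is {0, 2}: its neighbor 1 relays the states of its own neighbors, observed one step earlier.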

The formulation in (7) is one in which we want to learn a policy $\pi_\theta$ that mimics $\pi^*$ to the extent that this is possible with the information that is available to each individual node. The success of this effort depends on the appropriate choice of the parametrization, which determines the family of policies that can be represented by different choices of the parameters $\theta$. In this work, we advocate the use of an aggregation graph neural network (Section III) and demonstrate its applications to the problem of distributed control of a linear system (Section IV) and to the problem of flocking with collision avoidance (Section V).



Fig. 1: Aggregation Graph Neural Networks. Perform successive local exchanges between each node and its neighbors. For each $k$-hop neighborhood (illustrated by the increasing disks), record $[\mathbf{y}_k(t)]_i$ [cf. (9)] to build the signal $\mathbf{z}_i(t)$, which exhibits a regular structure [cf. (10)]. Once the regular time-structure signal is obtained, we take each row $\mathbf{z}_i(t)$, representing the information collected at node $i$, and process it through a CNN to obtain the value of the decentralized controller at node $i$ at time $t$ [cf. (11)].

III Aggregation Graph Neural Networks

We propose the adaptation of an aggregation graph neural network (GNN) [9, 10] for the learning parametrization of the policy in (7). We begin by following the graph signal processing literature to introduce a graph shift operator $S(t)$ as a matrix whose components $[S(t)]_{ij}$ can be nonzero only if $(i,j) \in \mathcal{E}(t)$ or if $i = j$ [22, 19, 4, 8]. This property means that the shift operator abides by the sparsity pattern of the graph and that we can therefore implement multiplications by $S(t)$ using information exchanges between neighboring nodes. Indeed, recall that $\mathbf{x}_i(t)$ denotes the state observed by node $i$ at time $t$ and further distribute the product $S(t)\mathbf{X}(t)$ so that $[S(t)\mathbf{X}(t)]_i$ is associated with node $i$. Since $[S(t)]_{ij} \neq 0$ only if $(i,j) \in \mathcal{E}(t)$ or if $i = j$, we can write

$$\big[S(t)\mathbf{X}(t)\big]_i = \sum_{j \in \mathcal{N}_i(t) \cup \{i\}} [S(t)]_{ij}\, \mathbf{x}_j(t). \tag{8}$$

Thus, node $i$ can carry out its part of the multiplication operation by receiving information from neighboring nodes. We point out that the adjacency matrix $A(t)$ has a sparsity pattern that makes it a valid shift operator. Other eligible shift operators are unweighted and normalized adjacency matrices as well as weighted, unweighted, or normalized Laplacians. In the experiments in Sections IV-V, we set $S(t) = A(t)$, but our methods are applicable to any matrix for which the local computation in (8) is feasible.

Aggregation GNNs leverage the locality of (8) to build a sequence of recursive $k$-hop neighborhood [cf. (5)] aggregations to which a neural network can be applied; see Fig. 1. More precisely, consider a sequence of signals $\mathbf{y}_k(t)$ that we define through the recursion

$$\mathbf{y}_k(t) = S(t)\, \mathbf{y}_{k-1}(t-1), \tag{9}$$

with the initialization $\mathbf{y}_0(t) = \mathbf{X}(t)$. If we fix the time $t$ and consider increasing values of $k$, the recursion in (9) produces a sequence of signals where the first element is $\mathbf{X}(t)$, the second element is $S(t)\mathbf{X}(t-1)$, and, in general, the $(k+1)$st element is $S(t)S(t-1)\cdots S(t-k+1)\mathbf{X}(t-k)$. Thus, (9) is modeling the diffusion of the state through the sequence of time varying networks $S(t-k+1)$ through $S(t)$. This diffusion can be equally interpreted as an aggregation. Indeed, if we restrict attention to node $i$ and limit the diffusion to $K$ elements, we can define the aggregation sequence at node $i$ as

$$\mathbf{z}_i(t) = \Big[ [\mathbf{y}_0(t)]_i,\ [\mathbf{y}_1(t)]_i,\ \ldots,\ [\mathbf{y}_{K-1}(t)]_i \Big]. \tag{10}$$

The first element of this sequence is $[\mathbf{y}_0(t)]_i = \mathbf{x}_i(t)$, which represents the local state of node $i$. The second element of this sequence is $[S(t)\mathbf{X}(t-1)]_i$, which aggregates the states of 1-hop neighboring nodes observed at time $t-1$ with a weighted average. In fact, this element is precisely the outcome of the local average shown in (8). If we now focus on the third element, we see that $[S(t)S(t-1)\mathbf{X}(t-2)]_i$ is an average of the states of 2-hop neighbors at time $t-2$. In general, the $(k+1)$st element of $\mathbf{z}_i(t)$ is $[S(t)\cdots S(t-k+1)\mathbf{X}(t-k)]_i$, which is an average of the states of $k$-hop neighbors observed at time $t-k$. From this explanation we conclude that the sequence $\mathbf{z}_i(t)$ is constructed with state information that is part of the local history defined in (6) and is therefore a valid basis for a decentralized controller. We highlight in equations (8) and (9) that agents forward the aggregation of their neighbors’ information, rather than lists of neighbors’ states, further along to multi-hop neighbors; see Algorithm 1.
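A centralized sketch of the recursion in (9) and the per-node aggregation in (10), assuming scalar states per node and histories stored as lists with the most recent element last; in deployment each agent computes only its own row locally, as in Algorithm 1. Function and variable names are illustrative:

```python
import numpy as np

def aggregation_sequence(S_hist, X_hist, K):
    """Build the N x K matrix whose row i is the aggregation sequence z_i(t),
    following y_k(t) = S(t) y_{k-1}(t-1) with y_0(t) = X(t).

    S_hist[-1] is S(t), S_hist[-2] is S(t-1), ...; same convention for X_hist.
    """
    N = X_hist[-1].shape[0]
    Z = np.zeros((N, K))
    for k in range(K):
        y = X_hist[-1 - k]                      # start from X(t-k)
        for tau in range(k):                    # apply S(t-k+1), ..., S(t)
            y = S_hist[-1 - (k - 1 - tau)] @ y
        Z[:, k] = y
    return Z
```

For $K = 2$ the first column is $\mathbf{X}(t)$ and the second is $S(t)\mathbf{X}(t-1)$, matching the description above.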

1: for $n = 0, 1, \ldots$ do
2:     Receive aggregation sequences $\mathbf{z}_j$ from neighbors $j \in \mathcal{N}_i$ [cf. (10)]
3:     Update aggregation sequence components [cf. (9) and (8)]
4:     Observe system state $\mathbf{x}_i$
5:     Update local aggregation sequence $\mathbf{z}_i$ [cf. (10)]
6:     Compute next local action using the learned controller [cf. (11)]
7:     Transmit local aggregation sequence $\mathbf{z}_i$ to neighbors $\mathcal{N}_i$
8: end for
Algorithm 1 Aggregation Graph Neural Network at Agent $i$.

An important property of the aggregation sequence $\mathbf{z}_i(t)$ is that it exhibits a regular temporal structure, as it is made up of nested aggregation neighborhoods. This regular structure allows for the application of a regular convolutional neural network (CNN) [9] of depth $L$, where for each layer $\ell = 1, \ldots, L$, we have

$$\mathbf{z}^{\ell} = \sigma_\ell\big( H^{\ell}\, \mathbf{z}^{\ell-1} \big), \tag{11}$$

with $\sigma_\ell$ a pointwise nonlinearity and $H^{\ell}$ a bank of small-support filters containing the learnable parameters. For each node $i$, we set $\mathbf{z}^{0} = \mathbf{z}_i(t)$ and collect the output $\mathbf{z}^{L}$ to be the decentralized control action $\mathbf{u}_i(t)$ at node $i$, at time $t$. We note that the filters $H^{\ell}$ are shared across all nodes.

Algorithm 1 summarizes the inference methodology for the aggregation GNN at a single node of the network. At each time step, agent $i$ receives aggregation sequences from its neighbors $\mathcal{N}_i$. Then, the agent pools this information from neighbors to form the current aggregation vector $\mathbf{z}_i$, which is input to the learned controller to compute the new action $\mathbf{u}_i$. Finally, the agent transmits its aggregation vector to its current neighbors $\mathcal{N}_i$.

The aggregation GNN architecture, described in equations (9)-(11), constitutes a local parametrization of the policy that exploits the network structure and involves communication exchanges only with neighboring nodes. To learn the parameters $\theta$ for $\pi_\theta$, we use a training set $\mathcal{T}$ consisting of sample trajectories obtained from the centralized controller $\pi^*$, cf. (3). We thus minimize the loss function over this training set, cf. (7),

$$\theta^* = \operatorname*{argmin}_{\theta} \sum_{\mathbf{X}_t \in \mathcal{T}} \ell\big(\pi^*(\mathbf{X}_t),\ \Pi_\theta(\mathbf{X}_t)\big), \tag{12}$$

where $\Pi_\theta(\mathbf{X}_t)$ collects the output of (11) at each node $i$.

Remark 1

We note that the policy learned from (12) can be extended to any network since the filters $H^{\ell}$ can be applied independently of the network size, facilitating transfer learning. This transfer is enabled by sharing the filter weights $H^{\ell}$ among nodes at training time. The learned aggregation GNN models are, therefore, network and node independent.

IV Distributed Linear Quadratic Regulator

To illustrate these ideas, consider a linear time invariant system with white Gaussian noise in which the state and action at node $i$ are scalars. In this case we have matrices $A$ and $B$ so that the generic model in (1) reduces to

$$\dot{\mathbf{X}}(t) = A\mathbf{X}(t) + B\mathbf{U}(t) + \mathbf{W}(t), \tag{13}$$

where the noise $\mathbf{W}(t)$ is a Wiener process with covariance $W$. This system, when sampled with a rate $T_s$, produces a linear system in discrete time characterized by the linear difference equation

$$\mathbf{X}_{n+1} = \bar{A}\mathbf{X}_n + \bar{B}\mathbf{U}_n + \mathbf{W}_n, \tag{14}$$

where the matrices are given by $\bar{A} = e^{AT_s}$ and $\bar{B} = \big(\int_0^{T_s} e^{As}\,ds\big) B$, and the noise $\mathbf{W}_n$ is normal white with covariance $\bar{W}$ [7]. We further choose the cost function to be of the quadratic form

$$c(\mathbf{X}_n, \mathbf{U}_n) = \mathbf{X}_n^\top Q\, \mathbf{X}_n + \mathbf{U}_n^\top R\, \mathbf{U}_n. \tag{15}$$
For the linear system in (14) and the quadratic cost in (15), it is well known that the optimal policy is the linear map

$$\mathbf{U}_n = -G\,\mathbf{X}_n, \tag{16}$$

in which the gain $G$ is obtained from the solution of the algebraic Riccati equation [33]. The controller in (16) cannot be implemented in a distributed manner. For one, we have made no assumption on the sparsity patterns of any of the matrices involved. But even if we assume that the matrices $\bar{A}$, $Q$, and $R$ match the sparsity of the graph, the gain $G$ may be a full matrix as it involves matrix inverses. Therefore, we adopt the strategy of learning a decentralized controller by imitating the optimal policy using only the local information histories [cf. (6)]. Our choice of parametrization is the aggregation Graph Neural Network described in Section III.
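To make the centralized expert in (16) concrete, the discrete-time Riccati equation can be solved numerically. The fixed-point (value) iteration below is a minimal sketch; a library routine such as `scipy.linalg.solve_discrete_are` would serve equally well, and the function name is illustrative:

```python
import numpy as np

def dare_iterate(A, B, Q, R, iters=500):
    """Solve the discrete algebraic Riccati equation by value iteration:
    P <- Q + A' P (A - B K)  with  K = (R + B' P B)^{-1} B' P A."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return P, K  # the optimal policy applies U_n = -K X_n
```

For a stabilizable and detectable system the iteration converges to the unique stabilizing solution, and the resulting closed loop $\bar{A} - \bar{B}K$ has spectral radius below one.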

IV-A LQR Distributed over a Geometric Network

We now consider the special case of distributed LQR control for a linear system whose dynamic structure and communication network are both linked to an underlying random geometric graph induced by the positions of agents [15]. We are given the locations of the $N$ agents distributed over the unit square, where $\mathbf{r}_i$ is the location of agent $i$. The relationship between the states of all agents is encoded using a linear system (14) with an $N \times N$ matrix $A$ whose entries are determined by, and decay with, the distance between agents $i$ and $j$. The constant $\alpha$ regulates the dependence between agents. To reduce interference, we regulate the transmission power to keep the number of neighbors constant: for a given degree, the cardinality of the set of in-neighbors of each agent is fixed. To construct the set of neighbors of each agent $i$, we allow connections to the nearest agents. Each agent can receive information from its nearest in-neighbors, and knows their weights in the linear system. The time-invariant communication network, denoted $S$, has weighted connections to each node from its nearest neighbors: $[S]_{ij} \neq 0$ if node $j$ is among the nearest neighbors of node $i$, and 0 otherwise.

The fixed network degree allows us to illuminate the effects of aggregation in systems with limited connectivity. Also, it is necessary to divide the matrix $S$ by its largest eigenvalue to ensure the numerical stability of the aggregation operations.

Iv-B Neural Network Architecture

The agent architecture follows the construction of the Aggregation GNN in Section III. The aggregated vectors of length $K$ were input to a fully connected neural network with one hidden layer and a ReLU activation function. Following the notation of (11), this architecture corresponds to using $L = 2$ and a full matrix for $H^{\ell}$. The network was trained using the smooth L1 training loss function with an L2 regularizer, and the training process was implemented using the Numpy and PyTorch libraries.
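A minimal numpy sketch of this shared per-node readout; layer sizes and initialization are illustrative, and the actual model is trained in PyTorch:

```python
import numpy as np

class SharedReadout:
    """One-hidden-layer MLP with ReLU, applied independently to each node's
    length-K aggregation vector; the same weights are shared by every node
    (sizes and initialization are illustrative)."""
    def __init__(self, K, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (hidden, K))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (1, hidden))
        self.b2 = np.zeros(1)

    def __call__(self, Z):
        # Z has shape (N, K): one aggregation vector per node
        H = np.maximum(0.0, Z @ self.W1.T + self.b1)   # ReLU hidden layer
        return (H @ self.W2.T + self.b2).ravel()       # one scalar action per node
```

Because the weights are shared, two nodes with identical aggregation vectors always receive identical actions, which is what makes the learned controller node independent.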

Iv-C Experiment Parameters

Next, we provide a summary of the experimental parameters used for the LQR experiments. We used systems of $N$ agents with system matrices constructed as described in Section IV-A. The state of the system was initialized by drawing a sample from a uniform distribution. The covariance of the white noise, the default sampling time $T_s$, and the parameter $\alpha$ regulating the strength of connections between nodes were held fixed across experiments unless otherwise noted.

To train the Aggregation GNN, we used 500 system trajectories of length 120. The reported LQR cost was averaged over 200 trajectories of length 100 at steady state for each of 10 random graphs for both the optimal solution and the neural network. For the discretization and degree experiments, we report the ratio of the LQR cost obtained by the GNN divided by the cost obtained by the optimal controller. For the transfer experiment, we provide the median trajectory cost, due to the presence of a small number of divergent trajectories.

Iv-D Sampling Time and Aggregation Filter Length

It was shown in [17] that the system dynamics need to be sufficiently slow relative to the communication and control clocks to enable the use of a distributed controller. The effect is two-fold: changing the discretization of the linear system slows down the dynamics of the system and makes the control system less sensitive to errors in the control due to the decreased eigenvalues of $\bar{A}$. For discretizations below, but not including, $T_s = 0.05$, the network can be controlled using the Aggregation GNN. Networks with smaller discretizations, $T_s = 0.005$ and $T_s = 0.001$, benefit slightly from a longer aggregation length of $K = 3$, while the systems for $T_s = 0.01$ do not. We hypothesize that for larger discretizations, the delay in the information aggregated from distant multi-hop neighbors renders this information useless to the controller.


K \ Sampling Time $T_s$:   0.05    0.01    0.005   0.001

1          -       -       -       -
2          -      0.88    1.43    1.32
3          -      0.98    1.03    1.00
4          -      0.97    1.01    1.01
5          -      0.98    1.01    1.01
6          -      0.98    1.01    1.02


Fig. 2: The control cost obtained by the GNN controller divided by the optimal LQR cost, at steady state, as a function of the system discretization $T_s$ (columns) and the filter length $K$ (rows). Missing values indicate divergent trajectories with costs over 1000. We used a fixed network degree for this experiment.

Iv-E Transfer to New Systems

A transfer learning experiment was performed to examine the performance of a GNN trained on one system, but tested on another. For the aggregation, we change the graph to that of the current test system, but use the old neural network weights from the model trained on the original system. This enables the generalization of our learned controller to new graphs.

Five linear systems were generated, and for each degree, the corresponding networks were computed. Then, a total of 30 GNN models were trained, five for each degree (columns). The baseline results for testing a model on the same system it was trained on are provided in the “Self” row. Then, 5 new systems were generated for testing of the 30 models. For each row, a new network was computed for each of the 5 systems. The median cost value across the 25 combinations of systems, for 200 trajectories each, is reported in each cell of the table. The remaining parameters were fixed for these experiments. Most of the GNN controllers generalized well to the new systems, except the controller trained for a network degree of 4 (Fig. 3).

These results suggest that training one model for all nodes of a large linear system already allows the network to behave correctly for nodes in previously-unseen systems. The weight sharing approach improves generalization, but differences in connectivity can change the controller performance. We observed a large variation in the LQR cost, with some experiments showing better performance on the previously-unseen systems.


Tested on \ Trained on (degree):   4       6       8       10      12      14

Self      53.04   28.88   24.11   22.42   21.53   21.32

4          1.20    1.01    1.02    1.09    1.32    1.33
6          1.27    1.07    1.02    0.98    0.98    0.99
8          1.00    1.01    0.98    0.96    0.94    0.93
10         1.42    1.21    1.13    1.08    1.05    1.02
12         2.64    1.63    1.39    1.30    1.23    1.19
14         1.63    1.35    1.25    1.18    1.12    1.09


Fig. 3: The transfer learning experiment examines the performance of a GNN trained on one system and tested on another. 5 models are trained per degree (columns), and 5 new systems are used for testing (rows). The median trajectory cost among 200 trajectories for each of the 25 combinations of systems is reported in each cell as a ratio of the “Self” cost.

Iv-F Network Degree and Aggregation Filter Length

We aim to understand the relationship between the connectivity of the network and the impact of aggregation (Fig. 4). In the problem formulation, we fix the degree of the communication network to emphasize the effects of multi-hop aggregation. In networks with fewer connections, increasing the length of the aggregation filter is essential for ensuring good control performance, while aggregation is less impactful for highly connected networks. For a degree of 4, each node can obtain information from only the 4 nearest neighbors. In this case, an aggregation filter length of $K = 4$ is necessary to obtain acceptable control performance; this allows each node to indirectly obtain information from 3-hop neighbors. On the other hand, for a network of degree 14, aggregation past the 1-hop neighborhood ($K = 2$) is unnecessary because the information from the fourteen 1-hop neighbors is sufficient for control.


K \ Degree:   4       6       8       10      12      14

1          -       -       -       -       -       -
2          -      56.28   517.59  1.53    1.33    1.25
3         25.08    1.57    1.25    1.15    1.11    1.09
4          4.36    1.58    1.24    1.14    1.10    1.07
5          4.65    1.57    1.23    1.14    1.09    1.07
6          4.47    1.55    1.24    1.14    1.10    1.07


Fig. 4: Control cost obtained by the GNN controller divided by the optimal LQR cost, at steady state, as a function of the graph degree (columns) and the filter length $K$ (rows). Missing values indicate divergent trajectories with costs over 1000.

V Flocking

We examine flocking to highlight the ability of our approach to handle dynamic communication networks. The broader applications of flocking include transportation and platooning of autonomous vehicles [24], in which agents must align their velocities and regulate their spacing for safety and efficiency. Our goal is to approximate a global controller for flocking using an Aggregation GNN that has access only to local information.

Ideally, we want a controller that works as well as the global controller but respects the communication constraints in our large-scale distributed system. We demonstrate that a global controller outperforms the local controller proposed by [25]. The novelty of our approach to flocking is the ability to aggregate and use information from multi-hop neighbors. Previous local approaches to flocking have used only single-hop neighbors’ information [25, 12]. Prior to this work, there have been no principled approaches for augmenting the communication between neighbors to pass on information aggregated from multi-hop neighbors.

Consensus algorithms do consider multi-hop communication, but, in contrast to [12], we are interested in controllers that incorporate collision avoidance and go beyond simple consensus. We follow the approach of [25], which incorporates a potential function to regulate the spacing of agents. We note, though, that this approach to collision avoidance guarantees neither the absence of collisions nor the preservation of connectivity within the flock. We observe that the flock often splits up into smaller groups: agents in the sub-flocks remain connected to each other but become disconnected from the majority of the flock.

The Aggregation GNN controller is trained using a point-mass physics model to imitate a global controller derived from the local controller of [25]. Our experiments test the impact of longer aggregation filter lengths and we demonstrate that aggregating from 2-hop neighbors does help with control, but we believe that longer filter lengths are not useful due to the continuing changes in the connectivity of the network and agent states. We also demonstrate the capability of our trained controllers to generalize from a point-mass control system to a simulated flock of quadrotors in the AirSim simulator [20].

V-A Flocking Dynamics

Following the approach of [25], we describe the acceleration-controlled dynamics of each agent $i$ to be

$$\dot{\mathbf{r}}_i = \mathbf{v}_i, \qquad \dot{\mathbf{v}}_i = \mathbf{u}_i, \tag{18}$$

where $\mathbf{r}_i$ and $\mathbf{v}_i$ are the two-dimensional position and velocity of agent $i$. The input to the system is $\mathbf{u}_i$, the acceleration of the agent in each of the two dimensions. We discretize this system for each agent at time step $T_s$ and add noise $\mathbf{w}_i$ to the velocity component:

$$\mathbf{r}_i(n+1) = \mathbf{r}_i(n) + T_s\,\mathbf{v}_i(n), \qquad \mathbf{v}_i(n+1) = \mathbf{v}_i(n) + T_s\,\mathbf{u}_i(n) + \mathbf{w}_i(n). \tag{19}$$
In the next section, we assume that each agent can measure its neighbors’ positions and velocities instantaneously, so the implicit time index is the same for all quantities of the global and local controllers. We drop the time index for clarity in the section below.

V-B Global and Local Control for Flocking

We denote the relative position between agents $i$ and $j$ as $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$. An agent $j$ is in agent $i$'s set of neighbors $\mathcal{N}_i$ if the distance between the agents is less than a threshold $\rho$. Only agents within range $\rho$ can observe each other's states and participate in the aggregation of information. In the case of the global controller, the velocity and position of all agents are measured simultaneously. Therefore, the flock’s network is an unweighted adjacency matrix with no self loops, where the edge weight $[S]_{ij} = 1$ if $\|\mathbf{r}_{ij}\| < \rho$ and 0 otherwise.

We follow the approach of [25] to derive a feedback controller for flocking based on potential functions that regulate the distances between agents. One potential function that satisfies these conditions depends on the distance $d_{ij} = \|\mathbf{r}_{ij}\|$ between the two agents:

$$U(d_{ij}) = \frac{1}{d_{ij}^2} + \log d_{ij}^2. \tag{20}$$

We restrict this function to be continuous at $d_{ij} = \rho$ and compute the saturation value $U(\rho)$ accordingly. Next, we define the potential function $U_i$ as the aggregation of the potentials of the neighbors of node $i$:

$$U_i = \sum_{j \in \mathcal{N}_i} U(d_{ij}). \tag{21}$$
We use this potential function to define a global controller that relies on communication among all agents:

$$u_i^* = -\sum_{j=1}^{N} (v_i - v_j) - \sum_{j \in \mathcal{N}_i} \nabla_{r_i} U(r_i, r_j).$$
Our goal will be to learn to approximate the global controller using only local information.

We can compare the global controller to a local controller that uses only information from the neighbors of node $i$:

$$u_i = -\sum_{j \in \mathcal{N}_i} (v_i - v_j) - \sum_{j \in \mathcal{N}_i} \nabla_{r_i} U(r_i, r_j).$$
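Assuming the collision-avoidance potential $U = 1/\|r_{ij}\|^2 + \log\|r_{ij}\|^2$ within the interaction range, a choice consistent with [25], the local controller can be sketched as follows. The gradient of this potential with respect to $r_i$ is $2\,r_{ij}(1/\|r_{ij}\|^2 - 1/\|r_{ij}\|^4)$, and the double loop is written for clarity rather than speed.

```python
import numpy as np

def local_controller(r, v, comm_radius):
    """Potential-based local flocking controller (sketch).

    u_i = -sum_{j in N_i} (v_i - v_j) - sum_{j in N_i} grad_{r_i} U(r_i, r_j),
    with U = 1/||r_ij||^2 + log ||r_ij||^2 inside the communication radius.
    """
    n = r.shape[0]
    u = np.zeros_like(v)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = r[i] - r[j]
            d2 = r_ij @ r_ij
            if np.sqrt(d2) >= comm_radius:
                continue                                  # j not a neighbor of i
            u[i] -= v[i] - v[j]                           # velocity-consensus term
            u[i] -= 2.0 * r_ij * (1.0 / d2 - 1.0 / d2**2)  # potential gradient
    return u
```

With this potential, the pairwise equilibrium spacing is $\|r_{ij}\| = 1$, where the gradient vanishes; agents farther apart are attracted and closer agents are repelled.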
The global and local controllers are both non-linear in the states of the agents. The classical Aggregation GNN approach does not allow for non-linear operations prior to aggregation, so we cannot use the position and velocity vectors alone to imitate the potential function-based controllers. Rather than directly using the state of each node during aggregation, we design the relevant features needed to replicate the non-linear controller using only a linear aggregation operation, where $x_i \in \mathbb{R}^6$:

$$x_i = \left[ \sum_{j \in \mathcal{N}_i} (v_i - v_j); \;\; \sum_{j \in \mathcal{N}_i} \frac{r_{ij}}{\|r_{ij}\|^4}; \;\; \sum_{j \in \mathcal{N}_i} \frac{r_{ij}}{\|r_{ij}\|^2} \right].$$
This observation vector is then used during aggregation as described in (9)-(10). The local controller also requires the computation of (24), so we are not giving the GNN an unfair advantage by providing the instantaneous measurements of neighbors’ states.
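A sketch of this six-dimensional feature computation, assuming the three neighbor-aggregated quantities implied by the velocity-consensus term and the two terms of the potential gradient (function and variable names are ours):

```python
import numpy as np

def flocking_features(r, v, comm_radius):
    """Per-agent observation x_i in R^6 built from neighbor sums:
    [sum (v_i - v_j), sum r_ij/||r_ij||^4, sum r_ij/||r_ij||^2]."""
    n = r.shape[0]
    x = np.zeros((n, 6))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = r[i] - r[j]
            d2 = r_ij @ r_ij
            if np.sqrt(d2) >= comm_radius:
                continue
            x[i, 0:2] += v[i] - v[j]       # relative-velocity sum
            x[i, 2:4] += r_ij / d2**2      # repulsive gradient component
            x[i, 4:6] += r_ij / d2         # attractive gradient component
    return x
```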

We quantify the cost of an observed trajectory by the variance in velocities, where

$$\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i$$

is the mean velocity among all agents:

$$c = \frac{1}{N} \sum_{i=1}^{N} \|v_i - \bar{v}\|^2.$$
The variance in velocities measures how far the system is from consensus in velocities [31].
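The velocity-variance cost can be computed directly from the stacked velocities (a minimal sketch; the function name is ours):

```python
import numpy as np

def velocity_variance(v):
    """Mean squared deviation of agent velocities from the flock mean.

    v: (N, 2) array of velocities; returns a scalar that is zero exactly
    when all agents share the same velocity (velocity consensus).
    """
    v_bar = v.mean(axis=0)                         # mean velocity across agents
    return np.mean(np.sum((v - v_bar) ** 2, axis=1))
```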

V-C Methods and Results

We explore flocking as an application of the GNN controller to demonstrate the effect of aggregation in a rapidly changing network. We also explore the transfer of policies trained on point-masses to testing in the AirSim simulator.

V-C1 Experiment Parameters

For the point-mass system, a flock of agents was used, with initial positions sampled uniformly within a fixed radius of the origin and initial velocities sampled uniformly on a bounded range. Default values were chosen for the sampling time and the aggregation filter length. Agents were able to communicate within a fixed radius. The GNN architecture was modified to accommodate the vector of six features per agent, with one input layer and one hidden layer. For training, 600 trajectories of length 200 were used. For testing, 50 trajectories were used.


Filter length K | Global | Local | GNN
--------------- | ------ | ----- | -----------
1               | 1.10   | 11.0  | 3.26 ± 0.41
2               | 1.10   | 11.0  | 2.65 ± 0.42
3               | 1.10   | 11.0  | 2.71 ± 0.37
4               | 1.10   | 11.0  | 2.63 ± 0.41
5               | 1.10   | 11.0  | 2.93 ± 0.60
Fig. 5: The median trajectory cost for the global, local, and GNN controllers trained and tested on the point-mass problem. We report the median trajectory cost over 50 test trajectories for each of 10 separately trained models. The error bound is one half of the IQR.

V-C2 Aggregation for Flocking

We demonstrate the effect of changing the aggregation filter length on the performance of the Aggregation GNN controller for flocking (Fig. 5), using the global and local controllers as benchmarks. The reported cost values were computed by taking the median over 50 test trajectories for each of 10 separately trained GNN models. The error bound is one half of the interquartile range (IQR). The median was used because random initializations of the flock produce a large number of outliers. We observe that all GNN controllers outperform the local controller, but fall behind the global controller. We hypothesize that even the GNN with the shortest filter length, which has access to the exact same information as the local controller, outperforms the local controller due to the GNN's model capacity, including the ability to learn weights other than those of the hand-derived controller, and the non-linear activation. These capabilities may allow the controller to memorize the steady-state configuration and reach it more quickly with the same information.
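For intuition about the filter length, the aggregation sequence described earlier can be sketched as repeated neighborhood exchanges: each extra hop of the filter corresponds to one more multiplication by the adjacency matrix, i.e. one more communication round (function and variable names are ours).

```python
import numpy as np

def aggregate(x, a, k):
    """Build the K-hop aggregation sequence [x, Ax, A^2 x, ...] per agent.

    x: (N, F) local features; a: (N, N) adjacency matrix. Each hop uses
    only one exchange with direct neighbors, so the full sequence is
    computable locally over k-1 communication rounds.
    """
    seq = [x]
    z = x
    for _ in range(k - 1):
        z = a @ z            # one neighborhood exchange per hop
        seq.append(z)
    return np.concatenate(seq, axis=1)   # (N, F*k) stacked aggregation sequence
```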

Fig. 6: The GNN flocking controllers were tested on quadrotors in AirSim. Pictured is the steady state configuration of a flock, with regular spacing between drones and aligned velocities.

V-C3 Testing in AirSim

Next, we evaluate the performance of GNN controllers trained on the point-mass flocking problem by testing on 50 drones in the AirSim simulator [20]. The GNN controller that was trained on point-masses with ideal dynamics was tested in the presence of the latency of the dynamics and observations of a real drone. The absolute locations and velocities of the drones were scaled by a constant factor before evaluation of the GNN so that the optimal spacing dictated by the potential function (21) does not result in collisions. The API provided by AirSim allows us to send velocity commands to the drones in the flock, but the trained GNN model produces acceleration outputs. If the current velocity of drone $i$ is given by $v_i(t)$, we compute the next velocity command by integrating the commanded acceleration $u_i(t)$ over the control interval:

$$v_i(t+1) = v_i(t) + T_s u_i(t).$$
The altitude of the drones was fixed at 40 meters, so that we could apply the learned two-dimensional flocking controller. The mean time interval of the control loop was measured to be 133 ms, corresponding to an average control and communication frequency of 7.5 Hz.
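The acceleration-to-velocity-command conversion above is a single forward-Euler step; a minimal sketch (the function and parameter names are ours, not AirSim's API):

```python
def velocity_command(v_current, u_gnn, dt):
    """Integrate the GNN's acceleration output into a velocity setpoint
    for a velocity-controlled drone API (forward Euler over the loop period dt)."""
    return v_current + dt * u_gnn
```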

We test our algorithm on a task in which two flocks of 25 quadrotors move towards each other with equal and opposite velocities. The desired flocking behavior is pictured in Figure 6, showing a regular spacing of 0.5 m between drones, with aligned velocities. The results of the global and local controllers are provided as benchmarks. The mean cost is reported over 20 trajectories of 300 steps, with a one-standard-deviation error bound. Aggregation provides the best results out of the non-centralized controllers (Fig. 7). We believe that the controller with the longest filter may be struggling with the increased latency between subsequent observations in the simulation as compared to the point-mass system, and therefore performs worse than controllers with shorter filters.

Both in simulation and in the point-mass experiments, we observed a failure mode in which a small group of agents moves too quickly and escapes from the rest of the group. This small sub-flock typically exhibits flocking behavior among its several agents, but has no ability to re-join the flock, because it is permanently outside the communication range of the remaining agents. This drawback results from the lack of hard constraints on the connectivity of the system.



Fig. 7: The trained GNN controllers were tested on a simulation of 50 drones for the two-flock task in AirSim. The mean cost is reported over 10 trajectories of 100 steps in length, with a one-standard-deviation error bound.

VI Conclusion

We have demonstrated the utility of aggregation graph neural networks as a tool for automatically learning distributed controllers for large teams of agents with coupled state dynamics and sparse communication links. We envision the use of an Aggregation GNN-based controller in large-scale drone teams deployed into communication-limited environments to accomplish coverage, surveillance, or mapping tasks. In these settings it is critical for agents to incorporate information from distant teammates in spite of local communication constraints; we show Aggregation GNNs can be extended to accomplish this even with the time-varying agent states and time-varying communication networks typical of mobile robots. Our experiments on learning decentralized controllers for networked linear systems and learning flocking behaviors confirm the value of multi-hop information for performance, as well as robustness to varying control rates and to the degree of the communication graph.

In the future, we plan to train similar agent architectures in a reinforcement learning setting in order to improve the quality of the learned controllers as well as address tasks where no optimal global policy is available. While these learned controllers can operate on dynamically changing communication graphs, the flocking experiments illustrate that a loss of connectivity is not always recoverable. In future work, enforcing state or input constraints could help avoid these failure modes. Finally, we would like to apply this same architecture to large heterogeneous teams of robots.


  • Baxter et al. [2007] Joseph L Baxter, EK Burke, Jonathan M Garibaldi, and Mark Norman. Multi-robot search and rescue: A potential field based approach. In Autonomous robots and agents, pages 9–16. Springer, 2007.
  • Bemporad et al. [2002] Alberto Bemporad, Manfred Morari, Vivek Dua, and Efstratios N Pistikopoulos. The explicit linear quadratic regulator for constrained systems. Automatica, 38(1):3–20, 2002.
  • Bruna et al. [2014] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and deep locally connected networks on graphs. arXiv:1312.6203v3 [cs.LG], 21 May 2014. URL http://arxiv.org/abs/1312.6203.
  • Chen et al. [2015] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević. Discrete signal processing on graphs: Sampling theory. IEEE Trans. Signal Process., 63(24):6510–6523, Dec. 2015.
  • Defferrard et al. [2016] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Annu. Conf. Neural Inform. Process. Syst. 2016, Barcelona, Spain, 5-10 Dec. 2016. NIPS Foundation.
  • Eksin et al. [2013] Ceyhun Eksin, Pooya Molavi, Alejandro Ribeiro, and Ali Jadbabaie. Bayesian quadratic network game filters. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 4589–4593. IEEE, 2013.
  • Franklin et al. [1980] G.F. Franklin, J.D. Powell, and M.L. Workman. Digital Control of Dynamic Systems, Second Edition. Addison-Wesley, 1980.
  • G. Marques et al. [2016] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro. Sampling of graph signals with successive local aggregations. IEEE Trans. Signal Process., 64(7):1832–1843, Apr. 2016.
  • Gama et al. [2019a] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro. Convolutional neural network architectures for signals supported on graphs. IEEE Trans. Signal Process., 67(4):1034–1049, Feb. 2019a.
  • Gama et al. [2019b] F. Gama, A. G. Marques, A. Ribeiro, and G. Leus. Aggregation graph neural networks. In 44th IEEE Int. Conf. Acoust., Speech and Signal Process., Brighton, UK, 12-17 May 2019b. IEEE.
  • Giusti et al. [2016] Alessandro Giusti, Jérôme Guzzi, Dan C Ciresan, Fang-Lin He, Juan P Rodríguez, Flavio Fontana, Matthias Faessler, Christian Forster, Jürgen Schmidhuber, Gianni Di Caro, et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 1(2):661–667, 2016.
  • Jadbabaie et al. [2003] Ali Jadbabaie, Jie Lin, and A Stephen Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on automatic control, 48(6):988–1001, 2003.
  • Jennings et al. [1997] James S Jennings, Greg Whelan, and William F Evans. Cooperative search and rescue with a team of mobile robots. In Advanced Robotics, 1997. ICAR’97. Proceedings., 8th International Conference on, pages 193–200. IEEE, 1997.
  • Kipf and Welling [2017] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In 5th Int. Conf. Learning Representations, Toulon, France, 24-26 Apr. 2017. Assoc. Comput. Linguistics.
  • Penrose et al. [2003] Mathew Penrose et al. Random geometric graphs. Number 5. Oxford university press, 2003.
  • Pomerleau [1989] Dean A Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information processing systems, pages 305–313, 1989.
  • Rotkowitz and Lall [2005] M. Rotkowitz and S. Lall. A characterization of convex problems in decentralized control. IEEE Transactions on Automatic Control, 50(12):1984–1996, Dec. 2005. doi: 10.1109/TAC.2005.860365.
  • Ruiz et al. [2019] L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro. Median activation functions for graph neural networks. In 44th IEEE Int. Conf. Acoust., Speech and Signal Process., Brighton, UK, 12-17 May 2019. IEEE. URL http://arxiv.org/abs/1810.12165.
  • Sandryhaila and Moura [2014] A. Sandryhaila and J. M. F. Moura. Big data analysis with signal processing on graphs. IEEE Signal Process. Mag., 31(5):80–90, Sep. 2014.
  • Shah et al. [2017] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics, 2017. URL https://arxiv.org/abs/1705.05065.
  • Sharma et al. [2016] Vishal Sharma, Mehdi Bennis, and Rajesh Kumar. Uav-assisted heterogeneous networks for capacity enhancement. IEEE Communications Letters, 20(6):1207–1210, 2016.
  • Shuman et al. [2013] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag., 30(3):83–98, May 2013.
  • Tacchetti et al. [2018] Andrea Tacchetti, H Francis Song, Pedro AM Mediano, Vinicius Zambaldi, Neil C Rabinowitz, Thore Graepel, Matthew Botvinick, and Peter W Battaglia. Relational forward models for multi-agent learning. arXiv preprint arXiv:1809.11044, 2018.
  • Tanner [2004] Herbert G Tanner. Flocking with obstacle avoidance in switching networks of interconnected vehicles. In IEEE International Conference on Robotics and Automation, volume 3, pages 3006–3011. Citeseer, 2004.
  • Tanner et al. [2003] Herbert G Tanner, Ali Jadbabaie, and George J Pappas. Stable flocking of mobile agents part ii: dynamic topology. In Decision and Control, 2003. Proceedings. 42nd IEEE Conference on, volume 2, pages 2016–2021. IEEE, 2003.
  • Thrun and Liu [2005] Sebastian Thrun and Yufeng Liu. Multi-robot slam with sparse extended information filers. In Robotics Research. The Eleventh International Symposium, pages 254–266. Springer, 2005.
  • Thrun et al. [2000] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3d mapping. In Robotics and Automation, 2000. Proceedings. ICRA’00. IEEE International Conference on, volume 1, pages 321–328. IEEE, 2000.
  • Wang et al. [2018] YuhShyang Wang, Nikolai Matni, and John C Doyle. Separable and localized system level synthesis for large-scale systems. IEEE Transactions on Automatic Control, 2018.
  • Wang et al. [2019] YuhShyang Wang, Nikolai Matni, and John C Doyle. A system level approach to controller synthesis. IEEE Transactions on Automatic Control, 2019.
  • Witsenhausen [1968] Hans S Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control, 6(1):131–147, 1968.
  • Xiao et al. [2007] Lin Xiao, Stephen Boyd, and Seung-Jean Kim. Distributed average consensus with least-mean-square deviation. Journal of parallel and distributed computing, 67(1):33–46, 2007.
  • Zhang and Hou [2005] Honghai Zhang and Jennifer C Hou. Maintaining sensing coverage and connectivity in large sensor networks. Ad Hoc & Sensor Wireless Networks, 1(1-2):89–124, 2005.
  • Zhou et al. [1996] Kemin Zhou, John Comstock Doyle, Keith Glover, et al. Robust and optimal control, volume 40. Prentice hall New Jersey, 1996.