Over recent years, we have seen UAVs participating in fighting wildfires in California, robotic snakes assisting in the localization of earthquake victims, and UUVs deployed for underwater ocean mapping . In such scenarios, autonomous robots need to make sequential decisions that allow them, for example, to better estimate the extent of a fire and decide upon extinguishing locations. The process of designing efficient paths to actively estimate a hidden state that expresses such a phenomenon of interest, using the measurement readings of on-board sensors, is known in the literature as Active Information Acquisition (AIA). A wide range of applications can be formulated in the AIA setup, including environmental monitoring, search & rescue, surveillance and coverage, target tracking and localization, and active SLAM [30, 15, 24, 8, 10, 11, 35, 21]. A plethora of works [13, 19, 17, 14, 25, 36, 34, 4, 26, 20, 23, 7, 39] have been proposed to solve the AIA problem. Myopic approaches [13, 19, 17, 14] compute controllers that yield the maximum immediate decrease of an uncertainty measure, while search-based nonmyopic schemes [25, 36, 34, 4] solve the problem by pruning the exploration process. The latter return suboptimal solutions, and their decentralized counterparts rely on coordinate descent, which makes them computationally intractable as the planning horizon and/or the number of robots increases. Nonmyopic sampling-based approaches [26, 20, 7, 23, 39] have gained popularity due to their ability to compute informative paths quickly. All these approaches suffer from at least one of the following limitations: they 1) do not scale, e.g., beyond dozens of robots, 2) are not robust to changes in the problem's parameters, e.g., the number of agents or the connectivity of the robots, and/or 3) do not address dynamic phenomena of interest.
Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs. Owing to their distributed structure, GNNs can easily be adapted to multi-robot applications, where the nodes and edges represent the robots and the communication links, respectively. They follow a neighborhood aggregation scheme and are therefore inherently suitable for distributed implementations. GNNs are also promising for their transferability across different graphs and can be designed to generalize to previously unseen multi-robot scenarios. Their use has the potential to substantially increase (a) robustness to changes in the topology of the graph and (b) scalability with respect to the number of robots.
In this paper, we propose a novel GNN-based architecture that solves the AIA problem in a distributed manner and addresses the aforementioned challenges. Specifically, we translate the multi-robot information gathering problem into a graph representation and formulate it as a sequential decision-making problem, where an Information-aware Graph Block Network (I-GBNet) learns to derive control actions that drive the robots to actively estimate the state of the phenomenon of interest. At every timestep, each robot collects positions and estimates from its neighbors and feeds them, along with its own attributes, to the I-GBNet, which produces the next control inputs to be applied; see also Fig. 1. To train our network, we use imitation learning, where the expert is a centralized sampling-based algorithm. To this end, we created a dataset that contains frames of episodes on randomly generated environments with fixed dimensions, a fixed number of robots, and a fixed dimensionality of the state across all episodes.
Literature Review: An extensive body of literature addresses the AIA problem. A centralized sampling-based approach that explores the joint physical and information space of the robots is introduced in .
In  the authors propose a decentralized Monte Carlo Tree Search  method that samples the individual action space of each robot and then creates a sparse approximation of the joint action space. In , the authors design a distributed sampling-based approach where robots build their own trees and share information about randomly sampled nodes of their trees with their neighbors. The authors in  propose a more scalable approach by tessellating the environment into Voronoi cells, but they assume all-to-all communication and static targets. Data-driven methods [12, 18, 33, 31] have also been proposed to allow for online implementations. Prior work in data-driven information acquisition utilizes imitation learning via a clairvoyant oracle , . In  the authors develop a deep reinforcement learning technique that combines tree search with an offline-learned neural network; however, it is only applicable to a single robot. To the best of our knowledge, GNNs have never been used before in the AIA setting. Notably, they have recently started gaining popularity in multi-robot applications [27, 28, 16, 37, 38]. In , GNNs are employed for path planning, where an attention mechanism prioritizes important information. The authors in  propose a GNN-based method for coverage. These works do not model noisy observations of the robots' surroundings.
Contributions: This paper proposes the first method for Active Information Acquisition using Graph Neural Networks. We design a network that is scalable with respect to the number of robots and the dimensionality of the hidden state to be estimated, and that generalizes to previously unseen multi-robot configurations. Our method is robust to communication failures and can be applied to time-varying communication graphs and dynamic hidden states. Additionally, we provide a variety of quantitative and qualitative experiments that illustrate the efficacy of the proposed architecture on target localization and tracking.
II Problem Formulation
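Consider a homogeneous team of $N$ mobile robots that reside in a complex environment $\Omega \subset \mathbb{R}^d$. The environment is cluttered with a set of arbitrarily shaped obstacles $\mathcal{O}$, thus forming the obstacle-free area $\Omega_{\text{free}} = \Omega \setminus \mathcal{O}$. The robots are governed by the dynamics $p_i(t+1) = f_i(p_i(t), u_i(t))$, where $p_i(t)$ and $u_i(t) \in \mathcal{U}$ denote the state of robot $i$ at time $t$ and the control action selected from a finite set $\mathcal{U}$ of admissible control actions, respectively. The robots are tasked with collaboratively estimating a hidden state $x(t)$ with hidden state dynamics:
$$x(t+1) = A\,x(t) + w(t), \quad w(t) \sim \mathcal{N}(0, Q(t)), \tag{1}$$
where $Q(t)$ is the noise covariance matrix. The robots are equipped with sensors capable of collecting noisy measurements of the hidden state as per the observation model
$$y_i(t) = M_i(p_i(t))\,x(t) + v_i(t), \quad v_i(t) \sim \mathcal{N}(0, R_i(p_i(t))), \tag{2}$$
where $y_i(t)$ is the measurement of robot $i$ at time $t$. We assume that the transition matrices and the noise covariance matrices are known, similar to , and that the robots perfectly localize themselves in a global frame. Hereafter, when the subscript $i$ is omitted, we compactly denote the attributes of all robots, e.g., $p(t) = [p_1(t)^\top, \dots, p_N(t)^\top]^\top$.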
II-A Multi-Robot Active Information Acquisition
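The AIA problem requires the robots to collaboratively estimate the hidden state $x(t)$. To do so, the robots maintain a prior Gaussian distribution for the hidden state, $\mathcal{N}(\hat{x}(0), \Sigma(0))$. Given the prior distribution and the measurements received up to time $t$, $y_{0:t}$, the à-posteriori distribution is computed, i.e., $\mathbb{P}(x(t) \mid y_{0:t}) = \mathcal{N}(\hat{x}(t), \Sigma(t))$, where $\hat{x}(t)$ and $\Sigma(t)$ denote the à-posteriori mean and covariance matrix, respectively. An indicative measure of the overall quality of the measurements collected up to time $t$ for estimating $x(t)$ is the determinant of the à-posteriori covariance matrix, i.e., $\det \Sigma(t)$. Alternative information measures, such as the trace or the maximum eigenvalue of the covariance, could also be used. Given the robots' initial positions $p(0)$ and the prior distribution $\mathcal{N}(\hat{x}(0), \Sigma(0))$, the goal is to compute a planning horizon $F$ and a sequence of control inputs $u_{0:F}$ that solve the following deterministic optimal control problem:
$$\begin{aligned}
\min_{F,\,u_{0:F}} \quad & J(F, u_{0:F}) = \sum_{t=0}^{F} \det \Sigma(t) && \text{(3a)}\\
\text{s.t.} \quad & \det \Sigma(F) \le \gamma && \text{(3b)}\\
& p_i(t) \in \Omega_{\text{free}}, \quad \forall i, t && \text{(3c)}\\
& p_i(t+1) = f_i(p_i(t), u_i(t)), \quad \forall i, t && \text{(3d)}\\
& [\hat{x}(t), \Sigma(t)] = \rho\big(\hat{x}(t-1), \Sigma(t-1), y(t)\big) && \text{(3e)}
\end{aligned}$$
where the objective (3a) captures the cumulative uncertainty in the estimation of $x(t)$ after fusing the information collected by all robots from $t = 0$ up to time $F$, and the constraint (3b) requires the final uncertainty of the state to be below a user-specified threshold $\gamma$. Constraint (3c) requires obstacle-free paths, (3d) enforces the robot dynamics, and in (3e), $\rho$ stands for the Kalman Filter update rule that computes the à-posteriori distribution. A Centralized Sampling-Based (C-SB) method that solves Problem (3) is introduced in [23, Section III]; it is probabilistically complete and asymptotically optimal.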
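As a concrete reference for the update rule $\rho$ in (3e) and the cost (3a), the following is a minimal NumPy sketch of one textbook Kalman filter step for the linear-Gaussian model of this section, together with the cumulative-determinant cost. It rests on our own assumptions: all robots' measurements are stacked into a single $M$ and $R$ (the centralized case), and the function names and the threshold value are hypothetical.

```python
import numpy as np

def kf_update(x_hat, Sigma, A, Q, M, R, y):
    """One Kalman filter step (predict, then correct) for the linear-Gaussian
    model x(t+1) = A x(t) + w(t), y(t) = M x(t) + v(t)."""
    # Predict: propagate mean and covariance through the hidden-state dynamics.
    x_pred = A @ x_hat
    S_pred = A @ Sigma @ A.T + Q
    # Correct: fuse the (stacked) measurement y with noise covariance R.
    S_innov = M @ S_pred @ M.T + R
    K = S_pred @ M.T @ np.linalg.inv(S_innov)   # Kalman gain
    x_new = x_pred + K @ (y - M @ x_pred)
    Sigma_new = (np.eye(len(x_hat)) - K @ M) @ S_pred
    return x_new, Sigma_new

def cumulative_uncertainty(Sigmas):
    """Cost (3a): sum of det(Sigma(t)) over the planning horizon."""
    return float(sum(np.linalg.det(S) for S in Sigmas))

# Usage: a static 2-D target (A = I, Q = 0) observed directly (M = I).
A, Q = np.eye(2), np.zeros((2, 2))
M, R = np.eye(2), 0.5 * np.eye(2)
x1, S1 = kf_update(np.zeros(2), np.eye(2), A, Q, M, R, np.array([3.0, 0.0]))
J = cumulative_uncertainty([np.eye(2), S1])
gamma = 0.2                                   # hypothetical threshold for (3b)
terminal_ok = np.linalg.det(S1) <= gamma
```

Note that the covariance recursion never uses the measured values $y$, only $M$ and $R$, which depend on the robots' positions. The cost (3a) is therefore fully determined by the robots' paths, which is why Problem (3) can be treated as a deterministic optimal control problem.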
II-B Problem Statement
Given $N$ robots with initial positions $p(0)$ that take measurements as per the observation model (2) and communicate via an underlying communication network, and given prior estimates $\mathcal{N}(\hat{x}(0), \Sigma(0))$, we design a GNN architecture that (i) solves (3) in a distributed manner, (ii) is trained using imitation learning to mimic the C-SB expert, and (iii) derives sequential control actions that allow the robots to actively decrease their uncertainty over $x(t)$.
III Graph Neural Networks for AIA
To solve the Multi-Robot Active Information Acquisition problem described in Section II in a distributed way with GNNs, we develop a suitable graph representation of the problem. At every timestep, the graph is given as input to a network that consists of an Information-aware Graph Block Network (I-GBNet) and a Multi-Layer Perceptron (MLP). The I-GBNet updates the node attributes of the incoming graph, and the updated attributes are fed to the MLP, which derives the control actions for all robots.
III-A Graph Representation of the AIA Problem
In the distributed setting, we assume that each robot maintains a local estimate of the hidden state, expressed via its à-posteriori Gaussian distribution . For simplicity, we define and the information matrix . Let the robots communicate through an underlying communication network modeled as a directed communication graph . The set of vertices is indexed by the robots, and an edge defines direct communication between robots and . Robot can exchange information with the set of its neighbors . Hereafter, we define . Given the aforementioned communication graph , let the state of the problem at timestep be described by the following homogeneous, weighted, directed network graph:
where is the set of node attributes of cardinality , indexed by the robots , is the set of edges, is the weight matrix and represents the global attributes of the graph (e.g. environment ). Each node (robot) contains the following node attribute:
The weight matrix quantifies the importance of information being propagated from robot to and is defined as , where and .
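The exact definition of the weight matrix is not reproduced in this text. The sketch below builds the communication graph under an assumed disk model (robots within a hypothetical communication range are neighbors) and, purely as a stand-in, fills the weight matrix with Metropolis-Hastings weights, a common choice for consensus-based estimators; neither choice is claimed to match the paper.

```python
import numpy as np

def disk_graph(positions, comm_range):
    """Adjacency under an assumed disk model: robots i and j are neighbors
    when their distance is at most comm_range (no self-loops)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= comm_range) & ~np.eye(len(positions), dtype=bool)

def metropolis_weights(adj):
    """Stand-in weights: w_ij = 1 / (1 + max(d_i, d_j)) for neighbors and
    w_ii = 1 - sum_{j != i} w_ij, so every row sums to one."""
    deg = adj.sum(axis=1)
    n = len(adj)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.flatnonzero(adj[i]):
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # diagonal is still 0 here, so this sums the off-diagonal
    return W

positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
adj = disk_graph(positions, comm_range=1.5)
neighbors = [np.flatnonzero(adj[i]).tolist() for i in range(len(adj))]  # [[1], [0], []]
W = metropolis_weights(adj)
```

With identical ranges the disk graph is symmetric; a directed graph, as allowed in the text, would simply use a non-symmetric adjacency matrix.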
III-B Information-aware Graph Block Network
The core of our GNN is the Information-aware Graph Block Network (I-GBNet) , illustrated in Fig. 2 and inspired by . I-GBNet receives a graph signal and updates the node attributes by propagating information through the graph, while keeping the graph topology unchanged, i.e., . For the design of I-GBNet, we finely discretize the environment and represent it via an occupancy grid map of size . The network comprises three parts: (i) two aggregation functions , (ii) a heatmap projection function, and (iii) a node-update function . Aggregation: We assume that at timestep , robot receives information of the form , where is a function that transforms the robot's position into a binary robot-grid map. Let and . Inspired by the active information acquisition setup and constraint (3e), the aggregation function is defined as the Distributed Kalman Filter (DKF) proposed in , where robot updates its à-posteriori covariance matrix as follows:
Each robot also aggregates the positions of its neighbors into a neighbors-grid map as:
The selected aggregation functions are invariant to permutations of their inputs by construction. Heatmap Projection: Once the robot computes the aggregated covariance matrix , it projects it into the occupancy grid map via the projection function , where is the resulting grid heatmap that represents the magnitude of uncertainty in . In the case of multi-target localization, the projection module would produce a combination of the posterior Gaussian heatmaps. Node-Update: Given the aggregated neighbors-grid map , the robot-grid map , the heatmap-grid and the occupancy-grid map , the node-update function produces the control actions
hence the updated node attributes are . The node-update function is parametrized by a Convolutional Neural Network with learnable parameters, composed with an Action-MLP of parameters . The output of the Action-MLP is a categorical distribution over the set of admissible control inputs, from which the most probable action is selected.
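The non-learned stages of I-GBNet can be sketched in NumPy as follows. Since the DKF formula (6) and the neighbors-grid formula (7) are not reproduced in this text, the code substitutes assumptions: a consensus information-filter form for the covariance aggregation, a logical OR over neighbor cells for the neighbors-grid map, and a sum of per-target Gaussian densities for the heatmap. Positions are (row, column) coordinates in the same units as the hypothetical `cell_size`, and all function names are illustrative. The last function only shows how an action is picked from the Action-MLP's logits.

```python
import numpy as np

def dkf_covariance(i, Sigmas, W, A, Q, M_i, R_i):
    """Hypothetical stand-in for the DKF aggregation (6), assumed form:
    Sigma_i^-1 = sum_j W[i, j] (A Sigma_j A^T + Q)^-1 + M_i^T R_i^-1 M_i."""
    n = Sigmas[0].shape[0]
    info = np.zeros((n, n))
    for j in np.flatnonzero(W[i]):            # robot i and neighbors with nonzero weight
        info += W[i, j] * np.linalg.inv(A @ Sigmas[j] @ A.T + Q)
    info += M_i.T @ np.linalg.inv(R_i) @ M_i  # robot i's own measurement information
    return np.linalg.inv(info)

def to_cell(p, cell_size):
    """Map a (row, col) position to its grid cell index."""
    return tuple((np.asarray(p, dtype=float) // cell_size).astype(int))

def neighbors_grid(neighbor_positions, grid_shape, cell_size):
    """Binary map with ones at the neighbors' cells (a logical OR, so the
    result does not depend on the order of the neighbors)."""
    grid = np.zeros(grid_shape)
    for p in neighbor_positions:
        grid[to_cell(p, cell_size)] = 1.0
    return grid

def gaussian_heatmap(means, covs, grid_shape, cell_size):
    """Evaluate each target's 2-D Gaussian density at the cell centers and
    sum them (the combination rule is an assumption)."""
    rows, cols = np.indices(grid_shape)
    centers = np.stack([rows, cols], axis=-1) * cell_size + cell_size / 2
    heat = np.zeros(grid_shape)
    for mu, S in zip(means, covs):
        d = centers - mu
        mahal = np.einsum('...i,ij,...j->...', d, np.linalg.inv(S), d)
        heat += np.exp(-0.5 * mahal) / (2 * np.pi * np.sqrt(np.linalg.det(S)))
    return heat

def select_action(logits):
    """Softmax over the admissible actions, then pick the most probable one."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```

Both aggregations are sums or ORs over the neighbor set, which is what makes them invariant to the order in which neighbors are listed.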
III-C Properties of the Graph Block Network
The graph topology is expected to vary over time owing to the mobility of the agents. Therefore, the Graph Block is required to perform consistently for all permutations of , i.e., any reordering of the robot indices, and regardless of time shifts. The permutation equivariance of the Graph Block also reduces the sample complexity required to train the network, since the network does not need to learn each ordering of the robots separately.
Proposition 1 (Permutation equivariance of Graph Block). Given the aforementioned graph , for any permutation of , with corresponding permutation matrix , and the Graph Block defined in Section III-B, it holds that .
For the purposes of the proof, we denote , , , and , where the operator denotes a concatenation of matrices along the first dimension. Let denote the adjacency matrix of the graph . The per-agent aggregation process, described by (6) and (7), can be reformulated as
Then, the permutation of the aggregated node attributes is the aggregation of the permuted node attributes as
Since the update function is applied on a per-node basis to the aggregated permuted node attributes,
Thus, as the topology of the graph remains unchanged, we conclude .
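The key step of the proof, namely that relabeling the robots commutes with aggregation followed by a per-node update, can be checked numerically on a toy graph block. The block below is a hypothetical simplification (sum aggregation over the adjacency matrix and a shared tanh layer), not I-GBNet itself; it only illustrates why $f(PX, PAP^\top) = P f(X, A)$ holds for any permutation matrix $P$.

```python
import numpy as np

def graph_block(X, adj, theta):
    """Toy stand-in for a graph block: sum aggregation over neighbors, then the
    same per-node update (a tanh layer with shared weights) for every robot."""
    aggregated = adj @ X                      # row i sums the features of i's neighbors
    h = np.concatenate([X, aggregated], axis=1)
    return np.tanh(h @ theta)

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))                   # node attributes, one row per robot
adj = (rng.random((n, n)) < 0.5).astype(float)
np.fill_diagonal(adj, 0.0)
theta = rng.normal(size=(2 * d, 4))

P = np.eye(n)[rng.permutation(n)]             # random permutation matrix
lhs = graph_block(P @ X, P @ adj @ P.T, theta)   # relabel robots, then apply the block
rhs = P @ graph_block(X, adj, theta)             # apply the block, then relabel
```

The identity $(PAP^\top)(PX) = PAX$, which follows from $P^\top P = I$, is exactly what makes `lhs` and `rhs` coincide; the shared per-node update then preserves the row permutation.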
Proposition 2 (Time invariance of Graph Block).
For , given , where , and the Graph Block defined in subsection III-B, it holds that .
This proposition holds naturally owing to the imitation learning method explained in the following section and to the fact that none of the components in is an explicit function of time. The Graph Block network is trained to enable each robot to predict actions consistently and independently of , given a graph , imitating expert decision-making.
IV Training and Architecture
IV-A Dataset and Training
For the dataset creation, we ran the C-SB expert on 1500 randomly generated environments cluttered with a fixed number of obstacles, with robots with first-order dynamics and a sensing radius of , a static hidden state and by . The uncertainty threshold was fixed to . For each instantiation, we ran entire episodes and randomly selected 10 timesteps from each episode to collect the sets . These sets are then transformed via the graph representation process (Section III-A), resulting in our dataset . Given dataset and the learnable parameters of the I-GBNet, and , we minimize the cross-entropy loss:
For training, the dataset is divided into a training set (90%) and a validation set (10%). Training is conducted with the Adam optimizer, and the learning rate is scheduled to decay from to within 20 epochs with cosine annealing. Finally, we trained with a batch size of 16 and a weight decay equal to .
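The training ingredients above (the cross-entropy imitation loss, the cosine-annealed learning rate over 20 epochs, and the 90/10 split) can be sketched framework-free as below. The learning-rate endpoints are not given in this text, so `lr_max` and `lr_min` are placeholder values, and the function names are hypothetical.

```python
import math
import numpy as np

def cross_entropy(logits, expert_actions):
    """Mean negative log-likelihood of the expert's action under the
    softmax policy; logits has shape (batch, n_actions)."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(expert_actions)), expert_actions].mean())

def cosine_lr(epoch, total_epochs=20, lr_max=1e-3, lr_min=1e-5):
    """Cosine annealing from lr_max at epoch 0 to lr_min at total_epochs.
    lr_max and lr_min are placeholder values."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

def train_val_split(n_samples, val_fraction=0.1, seed=0):
    """Random 90/10 split of sample indices into training and validation sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_val = int(round(val_fraction * n_samples))
    return idx[n_val:], idx[:n_val]
```

In PyTorch, the same pieces correspond to `torch.nn.CrossEntropyLoss` (which takes raw logits), the `weight_decay` argument of `torch.optim.Adam`, and `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=20` and `eta_min` set to the final learning rate.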
IV-B Network Architecture
The architecture features a single-layer Graph Block, described in Section III-B, which provides the backbone of the distributed coordination scheme. We parametrize the learnable node-update function , illustrated in Fig. 2, as follows. First, a ResNet module performs feature extraction from the 4 input channels and comprises 3 sequential 2-layer (Conv2d-BatchNorm2d-ReLU) Residual Blocks with skip connections, followed by an adaptive average-pooling module. The latter gives the network the flexibility to process environments of variable size by projecting them to a fixed-size representation. The final layer is an MLP with parameters , which translates this representation into probabilities per control action.
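Why adaptive average pooling lets the same network handle environments of different sizes can be illustrated with a NumPy re-implementation of the pooling step alone; the ResNet feature extractor and the MLP are omitted. The bin boundaries mirror, to our understanding, the floor/ceil binning of PyTorch's `torch.nn.AdaptiveAvgPool2d`; the function itself is an illustrative sketch, not the network's code.

```python
import math
import numpy as np

def adaptive_avg_pool2d(x, out_hw):
    """Average-pool a (C, H, W) feature map to (C, out_h, out_w); bin i along
    an axis of length L covers [floor(i*L/out), ceil((i+1)*L/out))."""
    c, h, w = x.shape
    oh, ow = out_hw
    out = np.empty((c, oh, ow))
    for i in range(oh):
        r0, r1 = (i * h) // oh, math.ceil((i + 1) * h / oh)
        for j in range(ow):
            c0, c1 = (j * w) // ow, math.ceil((j + 1) * w / ow)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

# Feature maps from grids of different sizes map to the same fixed shape.
small = adaptive_avg_pool2d(np.ones((4, 20, 20)), (4, 4))
large = adaptive_avg_pool2d(np.ones((4, 37, 29)), (4, 4))
```

Because the output shape depends only on `out_hw`, the MLP that follows always receives an input of the same dimension, whatever the size of the occupancy grid.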
In this Section, we provide simulations to demonstrate the efficacy of our method. All case studies have been implemented using Python3 on a computer with a 3.2 GHz, 8-core Intel Core i11-800H CPU, 64GB RAM and an Nvidia RTX 3080Ti GPU with 16GB of memory. The experiments were conducted assuming that the robots can take noisy measurements with a camera that provides the -coordinates of any visible object within its sensing range, expressed in the robot's frame, where the sensing radius is selected to be equal to and the noise covariance matrix is . We assume robots with first-order dynamics that select any action from the set by, and the DKF parameters are chosen as and . We test our algorithm on target localization and tracking, where the hidden state collects the positions of all targets at time , i.e., , where is the position of target at time and is the total number of targets. In this setting, we require that the uncertainty of every target be driven below the threshold by at least one robot, i.e., . To achieve collision avoidance, whenever the network outputs an invalid action, the robot randomly selects a collision-free action from ; if none exists, it stays idle. A more sophisticated collision avoidance method [29, 2, 1] could be used instead. Unless stated otherwise, we assume a fully connected graph. It is important to stress that for all the experiments below we use the same network, trained as described in Section IV-A, without any task-specific fine-tuning.
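The two experimental conventions above, range-limited relative-position measurements and the random collision-free fallback, can be sketched as follows. The 9-action grid set, the (row, column) cell coordinates and the helper names are hypothetical, since the actual action set and sensor parameters are not reproduced in this text; the sensing model also ignores robot orientation, i.e., it assumes the robot frame is only translated with respect to the global frame.

```python
import numpy as np

# Hypothetical admissible action set: stay, plus the 8 moves on the grid.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]

def observe(robot, target, radius, R, rng):
    """Range-limited camera: noisy target position relative to the robot
    (frame assumed translated only), or None if the target is out of range."""
    rel = np.asarray(target, dtype=float) - np.asarray(robot, dtype=float)
    if np.linalg.norm(rel) > radius:
        return None
    return rel + rng.multivariate_normal(np.zeros(len(rel)), R)

def is_free(cell, occupancy):
    """True if the cell lies inside the grid and is not occupied."""
    r, c = cell
    return 0 <= r < occupancy.shape[0] and 0 <= c < occupancy.shape[1] and not occupancy[r, c]

def safe_step(cell, action_idx, occupancy, rng):
    """Apply the network's action; if it collides, pick a random collision-free
    move instead, and stay idle if no such move exists."""
    dr, dc = ACTIONS[action_idx]
    nxt = (cell[0] + dr, cell[1] + dc)
    if is_free(nxt, occupancy):
        return nxt
    free = [(cell[0] + a, cell[1] + b) for a, b in ACTIONS[1:]
            if is_free((cell[0] + a, cell[1] + b), occupancy)]
    return free[rng.integers(len(free))] if free else cell
```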
V-A Scalability and Generalization
We are interested in examining how our network, trained on ( occupancy grid maps) with robots and a hidden state , performs on previously unseen environments and varying numbers of robots and targets . For this experiment, we run the expert (C-SB) on each of the following instances for random initializations (obstacle, robot and target positions) and compare our GNN approach with two baselines: (a) a Random-Walker and (b) a Distributed Sampling-Based (Dec-SB) AIA algorithm . Similar to , we evaluate the algorithms based on the following metrics:
1) Flowtime Increase , which expresses the percentage change between the returned planning horizon and the expert's horizon . For each configuration, we collect of the 's and report their mean as the flowtime increase of the configuration.
2) , where is the number of successful cases out of the cases of the expert. A case is considered successful when a solution is acquired with a planning horizon .
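Both metrics can be computed as sketched below. The names are hypothetical, `None` marks a case where a method returned no solution, and since the text does not state which cases enter the average, we assume that the flowtime increase is averaged over successful cases only and that a horizon equal to the bound counts as a success.

```python
import numpy as np

def flowtime_increase(F, F_expert):
    """Percentage change of a returned horizon F w.r.t. the expert's horizon."""
    return 100.0 * (F - F_expert) / F_expert

def evaluate(horizons, expert_horizons, F_max):
    """Success rate and mean flowtime increase over the expert's cases.
    horizons[k] is None when the method returned no solution for case k."""
    ok = [k for k, F in enumerate(horizons) if F is not None and F <= F_max]
    success_rate = len(ok) / len(expert_horizons)
    deltas = [flowtime_increase(horizons[k], expert_horizons[k]) for k in ok]
    mean_delta = float(np.mean(deltas)) if deltas else float("nan")
    return success_rate, mean_delta

# Usage: four cases with a placeholder horizon bound F_max = 20.
print(evaluate([12, None, 30, 9], [10, 10, 10, 10], F_max=20))  # (0.5, 5.0)
```

A negative flowtime increase, as for the last case above, means the method finished earlier than the expert.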
The results are presented in Fig. 3. From Fig. 3(a), we can see that the proposed GNN method performs as well as, and occasionally better than, the Dec-SB algorithm and always outperforms the Random-Walker. The Random-Walker performs better in more condensed environments (robots randomly initialized closer to the targets), while the Dec-SB frequently failed to return solutions with planning horizons less than . Even in condensed environments, where we expect the expert to return smaller planning horizons, our method completes the tasks within the horizon bounds. In Fig. 3(b), the mean flowtime increase of our method is systematically smaller than those of the baselines. We conclude that our network, trained only on environments with 10 targets and 10 robots, learns policies that generalize to novel environments with much larger graphs and hidden states. Moreover, by imitating a centralized expert, the network learns a distributed policy that is usually more effective than hand-designed distributed policies for the same problem.
V-B Target Localization and Tracking
In this experiment, we validate our method qualitatively, with the uncertainty threshold set for each of the targets. In the first scenario, the robots reside in a randomly generated environment cluttered with a dense set of obstacles. The robots are tasked to localize 3 static landmarks while avoiding the obstacles. From Fig. 3(a), we can see that the robots successfully executed their task and localized the targets. In Fig. 3(c), we plot the evolution of the determinants of the robots' covariance matrices and verify that their uncertainties decrease with time. The robot depicted in green is the first to satisfy the desired termination condition. In the second scenario, we present how our algorithm responds to dynamic targets. Specifically, two robots are tasked to localize a static landmark and a dynamic target with known dynamics (see Section II). In Fig. 3(b), we present snapshots of the online execution. From left to right and top to bottom, the robots start moving towards target 1. Once they localize it, they head towards the next target. Meanwhile, as the target moves, the increase in its uncertainty becomes evident in the robots' heatmaps, and the robots therefore return to re-localize it. Fig. 3(d) shows the global uncertainty of each target, defined at each timestep as the minimum of the determinants of the robots' covariance matrices. The figure shows that the robots successfully drove the uncertainty of both targets below the threshold, depicted as a dashed blue line. The online execution of our proposed scheme allows the estimated location of the dynamic target to be updated whenever the target is sensed, in contrast to the centralized approach, where the estimated locations remain the same.
This flexibility makes our algorithm more robust to poor prior estimates and eliminates the need for replanning, which is required in the expert's case ([40, Section VI.C]), since the updated mean is immediately reflected in the heatmap.
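The global per-target uncertainty plotted in Fig. 3(d), and the localization requirement stated at the beginning of this section, can be sketched as below. We assume that each target contributes a 2-D block to the stacked hidden state, so its uncertainty in a robot's estimate is the determinant of the corresponding diagonal block of that robot's covariance matrix; the function names and block layout are our assumptions.

```python
import numpy as np

def per_target_uncertainty(Sigmas, target_dim=2):
    """Global uncertainty of each target: minimum over robots of the
    determinant of that target's diagonal block in the robot's covariance."""
    n_targets = Sigmas[0].shape[0] // target_dim
    u = np.empty(n_targets)
    for m in range(n_targets):
        s = slice(m * target_dim, (m + 1) * target_dim)
        u[m] = min(np.linalg.det(S[s, s]) for S in Sigmas)
    return u

def all_localized(Sigmas, threshold, target_dim=2):
    """True once every target is below the threshold for at least one robot."""
    return bool(np.all(per_target_uncertainty(Sigmas, target_dim) <= threshold))

# Robot a knows target 1 well, robot b knows target 2 well.
S_a = np.diag([0.1, 0.1, 1.0, 1.0])
S_b = np.diag([1.0, 1.0, 0.2, 0.2])
u = per_target_uncertainty([S_a, S_b])
```

Taking the minimum over robots means a target counts as localized as soon as any single robot is confident about it, which matches the requirement that at least one robot drive each target's uncertainty below the threshold.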
V-C Robustness to Robot & Communication failure
This experiment examines the adaptability of I-GBNet to robot failure in target localization missions. In an obstacle-riddled environment, we assign robots to localize targets. The graph connectivity varies over time owing to the mobility of the agents for a given communication range . We account for possible packet loss during communication by introducing edge deletions based on a Poisson distribution. At time , a randomly selected agent exhibits catastrophic failure and is permanently isolated from the swarm. Our GNN-based approach demonstrates online robustness to changes in the composition of the swarm. Although robot fails, depriving the swarm of any information it individually acquired through its sensing capabilities (see Figures 4(c) and 4(d); the bottom-left target has been localized only by the agent that is about to fail, without propagating that information), the swarm redeploys other robots to recover the measurements of the failed robot that were not transmitted through the graph prior to failure, thus managing to localize all targets (see Figure 4(e); the swarm has managed to localize the bottom-left target). All active robots reach a precision consensus regarding the determinants of their covariance matrices, i.e., , below the required localization threshold (see Fig. 4(a)). However, robot and communication failures appear to have prolonged the average mission time by (see Fig. 4(b)), as information propagation in the network is constantly impeded.
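One way to emulate the two disturbances of this experiment is sketched below. The text only states that edge deletions follow a Poisson distribution, so the specific model here, deleting k ~ Poisson(lam) randomly chosen undirected edges at a given timestep, is an assumption, as are the function names; the failed robot is isolated by zeroing its row and column of the adjacency matrix.

```python
import numpy as np

def drop_edges(adj, lam, rng):
    """Assumed packet-loss model: remove k ~ Poisson(lam) randomly chosen
    undirected edges from a symmetric 0/1 adjacency matrix."""
    adj = adj.copy()
    iu, ju = np.nonzero(np.triu(adj, k=1))        # list each undirected edge once
    k = min(int(rng.poisson(lam)), len(iu))
    if k > 0:
        for e in rng.choice(len(iu), size=k, replace=False):
            adj[iu[e], ju[e]] = 0
            adj[ju[e], iu[e]] = 0
    return adj

def isolate(adj, robot):
    """Permanent failure: the robot can neither send nor receive messages."""
    adj = adj.copy()
    adj[robot, :] = 0
    adj[:, robot] = 0
    return adj

rng = np.random.default_rng(0)
adj = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)   # fully connected, 4 robots
adj_t = isolate(drop_edges(adj, lam=1.0, rng=rng), robot=2)
```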
In this paper, we cast the multi-robot Active Information Acquisition problem as a learning problem over graphs and introduce I-GBNet, an AIA-inspired GNN that offers distributed decision-making by imitating a centralized expert. In contrast to other works, our method can deal with dynamic phenomena and is robust to changes in the communication graph. Experiments demonstrated that, despite being trained on small graphs, our network successfully generalizes to previously unseen environments and robot configurations and scales well. In the future, we will investigate more applications of Active Information Acquisition, such as occupancy mapping, and explore the applicability of attention layers to further improve scalability and time efficiency.
-  (2018) Cooperative collision avoidance for nonholonomic robots. IEEE Transactions on Robotics 34 (2), pp. 404–420. Cited by: §V.
-  (2013) Optimal reciprocal collision avoidance for multiple non-holonomic robots. In Distributed autonomous robotic systems, pp. 203–216. Cited by: §V.
-  (2014) Information acquisition with sensing robots: algorithms and error bounds. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 6447–6454. Cited by: Remark 1.
-  (2015) Decentralized active information acquisition: theory and application to multi-robot slam. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4775–4782. Cited by: §I.
-  (2014) Joint estimation and localization in sensor networks. In 53rd IEEE Conference on Decision and Control, pp. 6875–6882. Cited by: §II-A, §III-B.
-  (2018) Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261. Cited by: §III-B.
-  (2019) Dec-mcts: decentralized planning for multi-robot active perception. The International Journal of Robotics Research 38 (2-3), pp. 316–337. Cited by: §I, §I.
-  (2019) Collaborative visual area coverage using aerial agents equipped with ptz-cameras under localization uncertainty. In 2019 European Control Conference (ECC), Cited by: §I.
-  (2012) A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games 4 (1), pp. 1–43. Cited by: §I.
-  (2019) Decentralized multi-target tracking in urban environments: overview and challenges. In 2019 22th International Conference on Information Fusion (FUSION), pp. 1–8. Cited by: §I.
-  (2014) Active slam and exploration with particle filters using kullback-leibler divergence. Journal of Intelligent & Robotic Systems 75 (2), pp. 291–311. Cited by: §I.
-  (2017) Learning to gather information via imitation. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 908–915. Cited by: §I.
-  (2006) A decentralized motion coordination strategy for dynamic target tracking. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pp. 2416–2422. Cited by: §I.
-  (2012) A decentralized control policy for adaptive information gathering in hazardous environments. In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 2807–2813. Cited by: §I.
-  (2008) Target assignment for integrated search and tracking by active robot networks. In 2008 IEEE International Conference on Robotics and Automation, pp. 2354–2359. Cited by: §I.
-  (2022) Coverage control in multi-robot systems via graph neural networks. In 2022 International Conference on Robotics and Automation (ICRA), pp. 8787–8793. Cited by: §I.
-  (2005) Near-optimal sensor placements in gaussian processes. In Proceedings of the 22nd international conference on Machine learning, pp. 265–272. Cited by: §I.
-  (2016) Active information acquisition. arXiv preprint arXiv:1602.02181. Cited by: §I.
-  (2009) Mobile sensor network control using mutual information methods and particle filters. IEEE Transactions on Automatic Control 55 (1), pp. 32–47. Cited by: §I.
-  (2014) Sampling-based robotic information gathering algorithms. The International Journal of Robotics Research 33 (9), pp. 1271–1287. Cited by: §I.
-  (2019) Learning q-network for active information acquisition. CoRR abs/1910.10754. Cited by: §I.
-  (2021) Scalable active information acquisition for multi-robot systems. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 7987–7993. Cited by: §I.
-  (2019) Asymptotically optimal planning for non-myopic multi-robot information gathering.. In Robotics: Science and Systems, pp. 22–26. Cited by: §I, §I, §II-A.
-  (2004-11) Robot and sensor networks for first responders. Pervasive Computing, IEEE 3, pp. 24–33. Cited by: §I.
-  (2009) On trajectory optimization for active sensing in gaussian process models. In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, pp. 6286–6292. Cited by: §I.
-  (2010) Information-rich path planning with general constraints using rapidly-exploring random trees. In AIAA Infotech at Aerospace Conference, Atlanta, GA, Cited by: §I.
-  (2020) Graph neural networks for decentralized multi-robot path planning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11785–11792. Cited by: §I, §V-A.
-  (2021) Message-aware graph attention networks for large-scale multi-robot path planning. IEEE Robotics and Automation Letters 6 (3), pp. 5533–5540. Cited by: §I.
-  (2018) Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6252–6259. Cited by: §V.
-  (2017) Informative planning and online learning with sparse gaussian processes. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4292–4298. Cited by: §I.
-  (2022) Multi-robot persistent environmental monitoring based on constraint-driven execution of learned robot tasks. In ICRA 2022-IEEE International Conference on Robotics and Automation, Cited by: §I.
-  (2019) A review of mobile robots: concepts, methods, theoretical framework, and applications. International Journal of Advanced Robotic Systems 16 (2), pp. 1729881419839596. Cited by: §I.
-  (2022) Adaptive informative path planning using deep reinforcement learning for uav-based active sensing. In 2022 International Conference on Robotics and Automation (ICRA), pp. 4473–4479. Cited by: §I.
-  (2018) Anytime planning for decentralized multirobot active information gathering. IEEE Robotics and Automation Letters 3 (2), pp. 1025–1032. Cited by: §I.
-  (2005-05) Global a-optimal robot exploration in slam. pp. 661–666. Cited by: §I.
-  (2009) Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research 34, pp. 707–755. Cited by: §I.
-  (2020) Learning decentralized controllers for robot swarms with graph neural networks. In Conference on robot learning, pp. 671–682. Cited by: §I.
-  (2021) Multi-robot coverage and exploration using spatial graph neural networks. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8944–8950. Cited by: §I.
-  (2021) Distributed sampling-based planning for non-myopic active information gathering. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5872–5877. Cited by: §I, §I, §II, §V-A, Remark 1.
-  (2022) Reactive informative planning for mobile manipulation tasks under sensing and environmental uncertainty. In 2022 International Conference on Robotics and Automation (ICRA), pp. 7320–7326. Cited by: §V-B.