Recent advances in sensing and perception have enabled the deployment of mobile robots in unknown environments for information gathering missions such as environmental monitoring , , surveillance , coverage , , target tracking , and active-SLAM , . These tasks require spatio-temporal information collection, which can be achieved more efficiently by multi-robot systems, rather than relying on individual robots. To avoid the need for a central user computing sensor-based control policies for large multi-robot systems, distributed control frameworks are needed allowing robots to make decisions locally. In this paper, we are interested in designing distributed control policies for a team of mobile sensing robots tasked with actively reducing the accumulated uncertainty of a dynamic hidden state over an a priori unknown horizon while satisfying user-specified accuracy requirements. In particular, first we formulate the Active Information Acquisition (AIA) problem as a stochastic optimal control problem. Then, building upon the separation principle presented in , we convert the stochastic optimal control problem into a deterministic optimal control problem via the use of a Distributed Kalman Filter (DKF) for which offline/open-loop control policies are optimal. To solve the resulting deterministic optimal control problem, we propose a novel distributed sampling-based method under which all robot incrementally and in parallel build their own directed trees that explore both their respective robot motion and information space. To build these trees, the robots communicate with each other over an underlying connected communication network to exchange their beliefs about the hidden state which are then fused using a DKF . We show that the proposed algorithm is probabilistically complete and asymptotically optimal. The proposed scheme is evaluated through extensive simulations for a target localization and tracking scenario. Literature Review: Relevant works that address informative planning problems can be categorized into myopic/greedy and non-myopic (sampling-based and search-based). Myopic approaches typically rely on gradient-based controllers that although they enjoy computational efficiency, they suffer from local optima [12, 13, 14, 15, 16, 17, 18]. In  a decentralized, myopic approach is introduced where the robots are driven along the gradient of the Mutual Information (MI) between the targets and the sensor observations. To mitigate the issue of local optimality, non-myopic search-based approaches have been proposed that can design optimal paths . Typically, these methods are computationally expensive as they require exploring exhaustively both the robot motion and the information space in a centralized fashion. More computationally efficient but suboptimal controllers have also been proposed that rely on pruning the exploration process and on addressing the information gathering problem in a decentralized way via coordinate descent [20, 9, 21]. However, these approaches become computationally intractable as the planning horizon or the number of robots increases as decisions are made locally but sequentially across the robots. Monte Carlo Tree Search  has recently gain popularity for online planning in robotics. Best et. al  suggest the Dec-MCTS algorithm that efficiently samples the individual action space of each robot on contrary to our proposed method and then coordinates with a sparse approximation of the joint action space. To do so, the robots must exchange branches of their Monte-Carlo trees, burdening further the communication needs. Nonmyopic sampling-based approaches have also been proposed due to their ability to find feasible solutions very fast, see e.g., [24, 25, 26, 27]. Common in these works is that they are centralized and, therefore, as the number of robots or the dimensions of the hidden states increase, the state-space that needs to be explored grows exponentially and, as result, sampling-based approaches also fail to compute sensor policies because of either excessive runtime or memory requirements. A more scalable but centralized sampling-based approach is proposed in 
that requires communication to a central user for both estimation and control purposes. Building upon our previous work, we propose a novel distributed sampling-based algorithm that scales well with the number of robots and planning horizon while allowing the robots to locally and simultaneously explore their physical and reachable information space, exchange their beliefs over an underlying communication network, and fuse them via a DKF. Contributions: The contribution of this paper can be summarized as follows. First, we propose a distributed nonmyopic sampling-based approach for information-gathering tasks. Second, we propose the first distributed sampling-based information gathering algorithm that is probabilistically complete and asymptotically optimal, while it signfinicantly decreases the computational complexity per iteration of its centralized counterpart . Third, we provide extensive simulation results that show that the proposed method can efficiently handle large-scale estimation tasks.
Ii Problem Formulation
Ii-a Robot Dynamics, Hidden State, and Observation Model
Consider a team of mobile robots that reside in a complex environment where is the dimension of the workspace. Obstacles of arbitrary shape are denoted as and the obstacle-free area is The dynamics of the robots are governed by the following equation
where describes the state of robot at discrete time (e.g position), is the control input applied to robot at time from a finite space of admissible control inputs. The task of the robots is to collaboratively estimate a hidden state evolving in governed by the following dynamics
where and denote the hidden states and the process noise at time , respectively. Robots are equipped with sensors (e.g., cameras) which allow them to take measurements associated with the unknown state . Hereafter, we assume that the robots can generate measurements as per the following observation model
where is the measurement signal at time and is a sensor-state dependent measurement noise. Signals observed by robot are independent of other robots’ observations.
The dynamics of the state in (2), the control input the observation model (3), and process and measurement noise covariances and are known for all time instants . This assumption is common in the literature ,  and it is required for the application of a Kalman filter to estimate the hidden states.
Ii-B Distributed Kalman Filter for State Estimation
Given a Gaussian prior distribution for , i.e., , and measurements, denoted by , that robot has collected until a time instant , robot
computes a Gaussian distribution that the hidden state follows at time, denoted by , where and denote the a-posteriori mean and covariance matrix. To compute this local Gaussian distribution, we adopt the Distributed Kalman Filter (DKF) algorithm proposed in . To this end, we assume that the robots formulate an underlying communication network modeled as an undirected graph . The sets and denote the set of vertices, defined as and indexed by the robots, and set of edges where existence of an edge means that robots and can directly communicate with each other. Given, such a communication graph, every robot updates its respective Gaussian distribution as follows:
where , is the set of nodes (neighbors) connected to robot , .
Ii-C Active Information Acquisition
The quality of measurements taken by robot up to a time instant , , can be evaluated using information measures, such as the mutual information between and or the conditional entropy of given . Given initial robot states and a prior distribution of the hidden states , our goal is to compute a sequence of control policies and a planning horizon , for all robots and time instants which solves the following stochastic optimal control problem:
where the objective in (5a) captures the accumulated mutual information between the measurements received up to time and the state of all robots and stands for the concatenated sequence of control inputs applied to the robots from until . The constraint in (5b) requires that at the end of the planning there exists at least one robot that has a comprehensive view of the state with a certain confidence . Moreover not including constraint (5b) would result in robots to stay put, i.e . The constraints (5c), (5e) capture the robot and sensor model, respectively, while constraint (5f) is the hidden state model. Obstacle avoidance is ensured through constraint (5d). Active Information Problem (5) is a stochastic optimal control problem for which closed-loop policies outclass open loop ones. Nonetheless, the linear relation between measurement and state in (3) and the gaussian assumptions transform the stochastic optimal control problem (5) into a deterministic optimal control problem where open loop policies are optimal . The principle is extended for our case where robots communicate with neighbors according to a communication graph and fuse their beliefs via DKF (4) and an open-loop control sequence exists which is optimal in (5). Furthermore Problem (5) can be transformed to the following deterministic control problem
where is the DKF update rule in (4), is the concatenated sequence of control inputs of all robots from until and is the resulting uncertainty threshold from the definition of mutual information and (5b). The robots collaboratively solve the objective (6a) with each constraint applying individually to them. Note that solving (6) requires the robots to exchange their covariance matrices over the communication network due to (6e).
Iii Distributed Sampling Based Active Information Acquisition
A novel distributed sampling-based method to solve (6) is proposed which is summarized in Algorithm 1. In the proposed method, each robot builds, in parallel with all other robots, a local directed tree that explores both the information space and its robot motion space while exchanging information with neighboring robots collected in , opposed to the global tree built in  resulting in decreasing the computational complexity significantly.111Throughout the paper robot shall be used as a reference. Same procedure is followed for all the robots unless stated differently. The proposed distributed sampling-based algorithm is presented in Algorithm 1. In what follows, we denote the tree built by robot as , where is the set of nodes and denotes the set of edges. The set of nodes collects states of the form where is a set that collects nodes for , that participate in the update rule of as per (4) and , where denotes the -th element in the set ; see Section III-A2. The root of the tree is defined as where are the initial state of the robot and prior covariance, respectively, and . The cost function describes the cost of reaching a node starting from the root of the tree . The cost of the root is , while the cost of is equal to
where is the parent node of . Applying (7) recursively results in which is equal to the interior sum of the objective in (6). Thus, the summation of costs of individual tree paths over all robots yields the objective function in (6a). The tree is initialized so that and [line 1, Alg. 1]. The construction of the trees lasts for iterations and is based on two procedures sampling [line 1-1, Alg. 1] and extending-the-tree [line 1-1, Alg. 1]. A solution consisting of a terminal horizon and a sequence of paths control inputs is returned after iterations.
To extract such a solution, we need first to define the goal set that collects all states in the -th tree that satisfy [lines 1-1, Algorithm 1]. After iterations, every robot with non-empty goal set, collected in the set , selects a node in and computes the path connecting it to the root of its tree. This path is then propagated to all robots in the network in a multi-hop fashion through the communication graph . Using the sets , all robots compute a path in their trees that they should follow so that robot can follow its selected/transmitted path. The resulting paths are then transmitted back to robot which computes the total cost as per (6a). This process is repeated for all nodes in and for all robots in , [lines 1-1, Alg. 1]. Among all resulting team paths, the robots select the team path with the minimum cost as (6a), [lines 1-1, Alg. 1].
Iii-a Incremental Construction of trees
Iii-A1 Sampling Strategy
The sampling procedure begins by first dividing the set of nodes into a finite number of sets . If is the number of created sets during iteration then . The set collects nodes that share the same robot state ; see Fig. 0(a). At iteration it holds that and [line 1, Alg. 1]. Once the sets are created, the nodes will be expanded where index is sampled from a given distribution and points to the set [line 1, Alg. 1]. Given nodes to be expanded with corresponding robot state , a control input is sampled from a distribution . The new robot state is created by applying the selected control input to according to (1) [line 1, Alg. 1].
Remark 1 (Diversity of trees)
The selection of which group is expanded is independent between different robots. Due to the randomness imposed by the aforementioned sampling procedures, the constructed trees at iteration may have different structures among the robots.
Iii-A2 Extending the tree
Once is constructed, we check whether it belongs to the obstacle-free area or not. In the latter case, the sample is rejected and the sampling procedure is repeated [line 1, Alg. 1]. Otherwise, the state is constructed; see Fig. 0(b). Specifically, given the parent node and , the construction of requires the computation of and . To what follows we assume that and lie at depth and respectively.222To simplify notation we drop the rand and new subscripts since the relation breaks down to parent and child node where depths and discriminates them. The update rule in (4) requires covariance matrix obtained from node and from neighbors . From Remark 1, at depth there may exist more than one nodes to provide or no nodes. Moreover, the covariance matrix of node was once constructed using information received by the set of nodes , with and . To maintain consistency with the update rule in (4) only children of the set of nodes are allowed to distribute their corresponding covariance matrices. For every robot the set of admissible nodes to share with robot is . Given the set node is sampled from a given discrete distribution and the set is created by collecting the sampled nodes from all [lines 1-1, Alg. 1]; see also Fig. 2. The covariance matrix is then computed by applying the update rule in (4) and node is completed [lines 1-1, Alg. 1]. Next, the set of nodes and edges are updated and the cost of node is computed as [lines 1-1, Alg. 1]. Finally, the sets are updated, so that if there already exists a subset with the same configuration as the state , then . Otherwise, a new subset is created and the number of subsets increases by one, i.e. [lines 1-1, Alg. 1].
Remark 2 (Communication)
The update of in Section III-A2 is the only procedure for which robots need to communicate with each other to build their trees. The information robot sends to neighbors is the set . In exchange it receives covariance matrices .
Remark 3 (Set of candidate nodes)
In the case where the set of candidate nodes is empty, covariance matrix of node will be transmitted.
Iv Complexity, Completeness and Optimality
In this section, we show that Algorithm 1 reduces the computational complexity per iteration of its centralized counterpart . We further show that it is probabilistically complete and asymptotically optimal under the following assumptions regarding the mass functions . The proofs of the following results can be found in Section VIII.
Assumption 2 (Probability mass function )
obability mass function satisfies for some that remains constant across all iterations (ii) are drawn independently across iterations.
Assumption 3 (Probability mass function )
obability mass function satisfies , for some that remains constant across all iterations (ii) Samples are drawn independently across iterations.
Assumption 4 (Probability mass function )
obability mass function satisfies , where (ii) Samples are drawn independently across iterations.
Remark 4 (Mass functions)
Assumptions 2-4 are defined in such a way to ensure that the sets (i) , (ii) and (iii) have non-zero measure. They are important to ensure the exploration of the entire physical and reachable information space and the fusion of potential beliefs for the DKF. All of them are very flexible, since they allow and to change with iterations of Algorithm 1 , as the trees grow and to be different among the robots. Any mass functions could be used as long as they satisfy the aforementioned assumptions. A typical example is the discrete uniform distribution. In Section
, as the trees grow and to be different among the robots. Any mass functions could be used as long as they satisfy the aforementioned assumptions. A typical example is the discrete uniform distribution. In SectionV non-uniform distributions are suggested for a target-tracking scenario.
Theorem 1 (Probabilistic Completeness)
Theorem 2 (Asymptotic Optimality)
Theorem 3 (Complexity Per Iteration)
Let denote the maximum degree of the vertices of the communication graph , be the number of robots and be the number of nodes to be expanded in one iteration. Then the computational complexity per iteration of Algorithm 1 and the centralized approach  is and respectively, under a worst-case scenario complexity analysis.
V Biased Sampling - Target Tracking
In large-scale estimation problems where a large number of robots are equipped to estimate high-dimensional hidden states, mass functions and introduced in Section III could be designed such that the trees tend to explore regions that are expected to be informative. In this section, we consider an application to target localization for Algorithm 1. The hidden state now collects the positions of all targets at time , i.e , where is the position of target at time and is the number of targets. In this application, we require the constraint (6f) to hold for all targets for some , i.e where is the corresponding covariance matrix of target . In what follows, we design mass functions that allow us to address large-scale estimation tasks that involve large teams of robots and targets.
V-a Mass function
Let denote the depth of tree at iteration and be the set that collects indices that point to subsets for which there exists at least a node that lays at depth Given the set the mass function is designed as follows
where is the probability of selecting any group indexed by .
V-B Mass function
The mass function is designed such that control inputs that drive robot with position closer to target of predicted position at time are selected more often. If is the control input that minimizes the geodesic distance between and , i.e the mass function is constructed as follows
where (i) is the probability that input is selected (ii) is computed as per (1) (iii) is the predicted position of target from the Kalman filter prediction step and (iv) denotes the sensing range. Once the predicted position is within the robot’s sensing range, controllers are selected randomly.
V-C Mass function
The mass function is designed such that when more than one candidate nodes are available, node with the smallest uncertainty, i.e is selected with higher probability. Given the set the mass function is the following
where is the probability that node is selected to provide covariance matrix .
V-D On-the-fly Target Assignment
In Algorithm 2 we propose a target assignment process executed once the state is created. Initially robot inherits the assigned target of the parent node . Initially the sets and are created [Alg. 2, lines 2-2]. In particular, the set is the sorted set of indices of increasing order according to the geodesic distance . Moreover, the set collects the indices of the satisfied targets, i.e [line 2, Alg. 2]. Given the aforementioned sets, we can compute the sorted set of the unsatisfied targets [line 2, Alg. 2]. The set collects the indices that point to targets already assigned to neighbors , i.e [line 2, Alg. 2]. Once the sets are computed the robot sequentially seeks for the new target that is not already assigned to any of the neighbors and has not satisfied the uncertainty threshold [Alg. 2, lines 2-2].
Vi Numerical Experiments
In this section, we present numerical experiments that illustrate the performance of Algorithm 1 for the target localization and tracking problem, described in Section V. We are interested in examining the scalability of Algorithm 1 and compare performances for different communication networks. All case studies have been implemented using Python 3.6 on a computer with Intel Core i10 1.3GHz and 16Gb RAM. In this section, each robot uses differential drive dynamics with describing the position and orientation of robot at time . The control commands use motion primitives Furthermore, we assume that the robots are equipped with omnidirectional, range-only, line-of-sight sensors with limited range of 1m. The measurement received by robot regarding target is given by if where is the distance between robot and target , denotes the field-of-view of robot and is the measurement noise with if ; otherwise is infinite. For separation principle to hold and offline control policies to be optimal we linearize the observation model about the predicted target position. In all case studies the targets are modeled as linear systems and the threshold parameter is selected to be for all targets. The robots reside in the environment shown in Fig. 4. The DKF parameters are chosen as and if for a target ; otherwise the values interchange.
Vi-a Scalability Performance
In this Section we examine the scalability of the proposed distributed scheme with respect to the number of robots and targets for different communication networks. In Table I we provide averaged results from ten Monte-Carlo trials for random and fully connected communication networks and the centralized approach in . Especially for the random communication networks we state the average degree (A.D) of the vertices pointing to the robots. The results come from a sequential implementation of Algorithm 1 and the time required by the ”slowest” robot to build its tree is reported in Table I for the distributed scheme.
|N/M||A.D||Runtime / F||Runtime / F||Runtime / F|
|10/10||5.60||2.09 secs / 28||1.45 secs / 15||7.670 secs / 11|
|10/20||5.00||3.63 secs / 33||2.93 secs / 26||15.03 secs / 18|
|10/35||5.40||5.69 secs / 40||3.37 secs / 25||25.37 secs / 19|
|15/20||8.10||2.72 secs / 26||2.60 secs / 26||14.30 secs / 13|
|15/35||8.50||4.41 secs / 34||3.77 secs / 26||25.24 secs / 15|
|20/20||11.1||2.93 secs / 28||2.11 secs / 17||23.14 secs / 14|
|20/25||10.5||4.00 secs / 33||2.51 secs / 19||21.27 secs / 12|
|20/35||10.7||4.64 secs / 32||3.51 secs / 24||26.02 secs / 11|
|30/50||17.0||5.32 secs / 26||4.29 secs / 19||46.02 secs / 10|
We can notice that feasible paths are computed very fast for all connectivity cases regardless of the number of robots and targets. The planning horizon of the random connected networks is always larger since any transmitted information needs more timesteps until diffused. The centralized approach returns the smallest , as in contrast to DKF the update rule of Kalman Filter fuses measurement models of all robots and therefore the desired uncertainty threshold is reached faster. In Fig. 3 the average time of completion of one iteration is presented for all scenarios of Table I. We can see that the distributed approach improves the computational complexity per iteration by at least a factor of 10.
Vi-B Different communication networks
In this section we examine the performance of Algorithm 1 for different communication graph densities. Specifically, in Fig. 4, the trajectories for three different communication scenarios are presented for robots and targets (i) fully connected graph (ii) connected graph with three edges average degree and (iii) no communication. In the current example, the planning terminates once a robot realizes that the targets have been identified either by itself and/or by any of the neighboring robots.
The density of the communication network affects the returned paths and the planning horizon . Specifically, in the case of a fully connected network as in Fig. 3(a) the speed of fusing beliefs is high and thus robots reduce their uncertainty even for targets that have not been visited by themselves. The same idea extends in the case of a random connected network as in Fig. 3(b) with the only difference being that the fusion of beliefs has a smaller rate and therefore robots may revisit targets that have already been satisfied. In fact the less dense a graph is the more the planning horizon increases. In the case of no communication in Fig. 3(c) each robot plans completely on its own and therefore every target is visited. Kalman Filter updates may lead to an increase of the uncertainty over a dynamic target even if in the past it was well estimated. The robots then as in Fig. 3(c) oscillate between the two moving targets leading to a further increase of their planning horizon. The returned planning horizons of the aforementioned cases are , where for the disconnected network the maximum planning horizon out of the individual planning horizons of each robot is stated. In Fig. 3(d) the evolution of the uncertainties over time is depicted for the ”purple” robot of the fully connected case of Fig. 3(a). Thanks to the connectivity capabilities, we can see that the current robot has not reduced the uncertainty for the entire group of targets since it realizes that there exist other robots that took the responsibility of identifying the remaining targets.
A new distributed sampling-based algorithm for active information gathering is introduced in this paper followed by formal guarantees. The proposed scheme is tested via extensive simulations showing that feasible paths are quickly computed for large-scale problems, improving significantly also the computational time of existing scalable centralized method. In the future, time-varying communication graphs will be incorporated in the proposed framework.
-  K.-C. Ma, L. Liu, and G. S. Sukhatme, “Informative planning and online learning with sparse gaussian processes,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 4292–4298.
-  Q. Lu and Q.-L. Han, “Mobile robot networks for environmental monitoring: A cooperative receding horizon temporal logic control approach,” IEEE transactions on cybernetics, vol. 49, no. 2, pp. 698–711, 2018.
-  E. W. Frew and J. Elston, “Target assignment for integrated search and tracking by active robot networks,” in 2008 IEEE International Conference on Robotics and Automation. IEEE, 2008, pp. 2354–2359.
-  M. Tzes, S. Papatheodorou, and A. Tzes, “Visual area coverage by heterogeneous aerial agents under imprecise localization,” IEEE control systems letters, vol. 2, no. 4, pp. 623–628, 2018.
-  T. Kusnur, D. M. Saxena, and M. Likhachev, “Search-based planning for active sensing in goal-directed coverage tasks,” arXiv preprint arXiv:2011.07383, 2020.
-  D. J. Bucci and P. K. Varshney, “Decentralized multi-target tracking in urban environments: Overview and challenges,” in 2019 22th International Conference on Information Fusion (FUSION). IEEE, 2019, pp. 1–8.
-  G. Huang, K. Zhou, N. Trawny, and S. I. Roumeliotis, “A bank of maximum a posteriori (map) estimators for target tracking,” IEEE Transactions on Robotics, vol. 31, no. 1, pp. 85–103, 2015.
L. Carlone, J. Du, M. K. Ng, B. Bona, and M. Indri, “Active slam and exploration with particle filters using kullback-leibler divergence,”Journal of Intelligent & Robotic Systems, vol. 75, no. 2, pp. 291–311, 2014.
-  N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, “Decentralized active information acquisition: Theory and application to multi-robot slam,” in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 4775–4782.
-  ——, “Information acquisition with sensing robots: Algorithms and error bounds,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 6447–6454.
-  N. Atanasov, R. Tron, V. M. Preciado, and G. J. Pappas, “Joint estimation and localization in sensor networks,” in 53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 6875–6882.
-  T. H. Chung, J. W. Burdick, and R. M. Murray, “A decentralized motion coordination strategy for dynamic target tracking,” in Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006. IEEE, 2006, pp. 2416–2422.
-  R. Olfati-Saber, “Distributed tracking for mobile sensor networks with information-driven mobility,” in 2007 American Control Conference. IEEE, 2007, pp. 4606–4612.
-  G. M. Hoffmann and C. J. Tomlin, “Mobile sensor network control using mutual information methods and particle filters,” IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 32–47, 2009.
-  P. Dames, M. Schwager, V. Kumar, and D. Rus, “A decentralized control policy for adaptive information gathering in hazardous environments,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC). IEEE, 2012, pp. 2807–2813.
-  B. Charrow, V. Kumar, and N. Michael, “Approximate representations for multi-robot control policies that maximize mutual information,” Autonomous Robots, vol. 37, no. 4, pp. 383–400, 2014.
-  F. Meyer, H. Wymeersch, M. Fröhle, and F. Hlawatsch, “Distributed estimation with information-seeking control in agent networks,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 11, pp. 2439–2456, 2015.
-  M. Corah and N. Michael, “Distributed submodular maximization on partition matroids for planning on large sensor networks,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 6792–6799.
-  J. Le Ny and G. J. Pappas, “On trajectory optimization for active sensing in gaussian process models,” in Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference. IEEE, 2009, pp. 6286–6292.
A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, “Efficient informative
sensing using multiple robots,”
Journal of Artificial Intelligence Research, vol. 34, pp. 707–755, 2009.
-  B. Schlotfeldt, D. Thakur, N. Atanasov, V. Kumar, and G. J. Pappas, “Anytime planning for decentralized multirobot active information gathering,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1025–1032, 2018.
-  C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of monte carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in games, vol. 4, no. 1, pp. 1–43, 2012.
-  G. Best, O. M. Cliff, T. Patten, R. R. Mettu, and R. Fitch, “Dec-mcts: Decentralized planning for multi-robot active perception,” The International Journal of Robotics Research, vol. 38, no. 2-3, pp. 316–337, 2019.
-  D. Levine, B. Luders, and J. P. How, “Information-rich path planning with general constraints using rapidly-exploring random trees,” in AIAA Infotech at Aerospace Conference, Atlanta, GA, 2010.
-  G. A. Hollinger and G. S. Sukhatme, “Sampling-based robotic information gathering algorithms,” The International Journal of Robotics Research, vol. 33, no. 9, pp. 1271–1287, 2014.
-  R. Khodayi-mehr, Y. Kantaros, and M. M. Zavlanos, “Distributed state estimation using intermittently connected robot networks,” IEEE Transactions on Robotics, vol. 35, no. 3, pp. 709–724, 2019.
-  X. Lan and M. Schwager, “Rapidly exploring random cycles: Persistent estimation of spatiotemporal fields with multiple sensing robots,” IEEE Transactions on Robotics, vol. 32, no. 5, pp. 1230–1244, 2016.
-  Y. Kantaros, B. Schlotfeldt, N. Atanasov, and G. J. Pappas, “Asymptotically optimal planning for non-myopic multi-robot information gathering.” in Robotics: Science and Systems, 2019, pp. 22–26.
Viii Appendix: Proof of Separation Principle, Completeness & Optimality
(Sampling ) Consider any subset , any fixed iteration index and any fixed . Then, there exists an infinite number of subsequent iterations , where and is a subsequence of