I Introduction
Multirobot systems are of interest for their potential in performing tasks which may not be feasible or desirable to do with only a single robot in applications such as perimeter surveillance [27, 31], patrolling missions [15, 28], searching operations [17, 2], and formation control [8, 34, 20]. For example, the task of surveilling a large area is often infeasible for one robot due to the robot’s limited coverage but can be accomplished by a team of robots under proper coordination. A major requirement for these applications is that the robots need to be localized within a common reference frame. That way, each robot can execute its designated subtask correctly and the team can collaboratively complete the full task. This requirement becomes trivial when there is a single global coordinate system can provide the state estimate for all robots, such as GPS, motion capture systems, aerial images [29]. However, such systems are often not available or reliable.
Another solution to this problem is to launch the robots in a predetermined spatial configuration with a common frame and let robots localize within this frame. This solution is obviously time consuming and requires significant effort since we either need to displace the robots at predetermined poses or to measure the relative poses between the robots at the beginning.
Alternative solutions rely on local sensing modalities such as bearing, range, and camera imaging to measure the instantaneous, relative pose between pairs of robots. These modalities can provide measurements which are either landmark features of the environment or direct relative poses. By allowing robots to collaboratively localize using these relative measurements, they can selflocalize in a common frame.
Methods relying on exchanging environment features are called map localization [13, 23, 37, 12, 35]. For example, Montijano et al. [23] and Leahy et al. [14] propose to use homography estimation [26] to compute the relative pose between two robots. The main problem with this approach is that the robots either need to maintain a map of features or have overlapping views with shared features, not to mention the challenge in finding good features in low or textureless environments. Another methods, including this study, rely on direct robottorobot measurements, is called mutual localization in [9].
Our study focus on localization for systems of multiple micro Aerial Vehicles (MAVs) which have relatively small sizes and weights than unmanned aerial vehicles and suitable for multiple applications such as surveillance, and search and rescue operations. Unlike ground vehicles, MAVs are subject to size, weight and payload (SWaP) constraints. As a result, cameras are often preferred over range finders and LiDARs to do the same task due to their compactness and light weight. The system of robots studied in this work, as shown in Fig. 1, are equipped with two types of sensor modalities: visionbased detection and measurement of other robots within each robot’s field of view; and visual inertial odometry (VIO) using stereo cameras and inertial measurement unit measurements. These robots are in length, inexpensive and lightweight, making them a great candidate for research and civilian applications.
Despite its appeals, mutual localization in such multiMAV systems is challenging due to a couple of reasons. First, our system of robots is homogeneous, visually similar, and using tagging or specific sensors for identifying robots are neither practical or desirable due to the SWaP constraints. Thus, the visionbased detection provides no identify information, leading to data association ambiguity. Secondly, the visionbased detection often yields false positives and false negatives. Furthermore, unlike range finders used in previous studies such as [11], the visionbased measurements of distance and bearing are quite noisy. These factors make the data association problem become even more challenging.
In short, we study the problem of multiMAV mutual localization under following assumptions,

Initial relative poses between robots are unknown.

Robot detection provides no identity information.

Robot detection can include false negatives and false positives.

Visionbased distance and bearing measurements are noisy.
Our main contributions in this work include:

We introduce a vision plus IMU framework for localization that uses relative distance and bearing measurements, on a SWaPconstraint platform.

We propose an extension of CPDAF with a simple but effective gating and evaluating mechanism to keep the number of hypotheses manageable.

We demonstrate how onboard sensing, estimation and control can be used for formation flight.
Our method is applicable to a system with any kind of distance and bearing sensors but the implementation introduced in this work focuses on using camera sensors for these measurements.
Ii Related Work
Mutual localization has attracted a large amount of research works. For example, Spica et al. [32] address the problem of estimating the formation scale in the context of bearingbased formation localization for multiple robots. In [16]
, authors propose an Extended Kalman Filter for the estimation of each follower position and orientation with respect to the leader, using bearing information only. However, those works do not consider the problem caused by unknown data association which plays an important role in mutual localization. For instance, Mehrez et al.
[18] assume that robots are able to uniquely identify each of the observed robots in their field of view and measure their relative range and bearing.The literature has investigated in providing relative measurements with robot identities via tagging. Some recent examples using this method for relative localization of a team of aerial robots include [33] where authors use colored circular markers on the robots to obtain relative bearing between the robots. De silva et al. [6] develop 3D sensor nodes employing ultrasonicbased range measurement and infraredbased bearing measurement for spatial localization of robots. Dias et al. [7] utilize active markers to identify unique ID of quadrotors based on pulsating at a predefined frequency. The main disadvantage of these methods is that they do not scale well with the number of robots.
Recent approaches directly deal with unknown robots’ identities and attempt to estimate these identities together with robot localization. Chang et al. [4] propose a maximum likelihood data association algorithm with a threshold gating on the Mahalanobis distance between the incoming measurement and the expected measurement. The problem with this method is that the selected measurement may not be the correct one due to the inaccuracy of the measurements, leading to filter divergence. In this work, we propose a probabilistic data association framework that can handle the noisy measurements.
The problem of mutual localization with anonymous relative measurements was first considered by Franchi et al. [9, 10, 11]. In [9], authors introduce a twophase localization system in which a multiple registration algorithm to build data association hypotheses is followed by a MultiHypothesis EKF to prune out hypotheses inconsistent with the current belief. Their successive work [10] proposes to feed back the belief in system’s spatial configuration to handle the worst case scenario where the computation can be a factorial function of number of agents when the spatial arrangement of the robots is rotational symmetric. In [11]
, they improve the algorithm further by using particle filters to replace EFK filters. Compared to our work, theirs rely on an assumption that the posterior probability distribution functions of robot states are independent so that each particle filter can be feasibly updated in a separate manner. Their frameworks also suffer from adding computation to maintain and update particle filters, especially when scaling up the number of robots or in case the robot state has a high dimension.
Iii Problem Formulation
Let us consider a problem formulation with a team of homogeneous robots , is known. Beside the the attached moving frame , each robot is attached to a fixed frame as shown in Fig. 1 such that the axis of the frame is on the same direction with the gravity. In the follows, we describe the mutual localization problem whose objective is to localize every to a common frame, which can be any in the set . Without loosing the generalization, we choose to be .
Suppose and , denote the translation and rotation between robot and frame , respectively. Then, localizing robot in frame is equivalent to estimate . We can define a set involving all variables that we aim to estimate. Our problem becomes,
(1) 
where is the ground truth.
Iv The Stochastic Model
Before representing the proposed approach, we first define discrete models for the system dynamics and observation measurements.
Iva The System State Model
Looking from the chosen common frame , we have,
(2) 
Thus, the odometry measurement comes from the VIO system of robot ,
(3) 
The detection measurement generating from robot detected by robot can be presented as,
(4) 
The first two equations in Eq. 2 show that to achieve the rotation and translation of robot in frame , given only local measurements, we need to know the rotation and translation of frame in . Thus, we define the state of robot as
Note that, we substitute by to make it convenient to define the state equation and that these two variables can be derived from each other. The coupled state system can be defined as
We can decompose the rotation into two parts,
corresponds to the rotation around the gravity vector, and
corresponds to the rotation on the plane perpendicular to the gravity vector,In a VIO system, only rotation along axis is unobservable. Thus, we can assume that is known, leaving only needs to estimate. Furthermore, every frame is defined to be different only in the rotation around axis, denoted as . Thus, and we can rewrite the individual robot state,
We utilize a linear system whose input is the velocity. The velocity input is assumed to be corrupted with i.i.d zeromean Guassian noise. The robots’ position and yaw can be modelled as follows
Frames have unknown, fixed transformations with respect to frame but there can be drifts due to errors from VIO systems,
We can write the state equation in a standard form
where
is the covariance matrix of the i.i.d Gaussian noise. We discretize this continuous time system using zerohold for the input
(5) 
where is the current time step, , , , is the sampling time.
IvB The Measurement Model
During the update process, we update two types of measurements in a decoupled manner.
IvB1 Odometry Measurements
Odometry measurements from VIO systems directly provide each robot’s individual state with respect to its own frame. We rewrite Eq. 3 as follows, taking into account a Gaussian noise,
where denotes noises. There is no data association involved in this partial update step.
IvB2 Detection Measurements
Detection measurements from the visionbased detection can help estimate the pose of robots detected. We assume that robots can detect all other robots with detection probability and some probability of false positive, false negative. This assumption can be achieved using cameras and that robots are in the range of detection. Eq. 4
can be rewritten with white noise added, for true measurements,
Let
be random variable representing the number of false positives at time
. We assumeto has Poisson distribution,
IvB3 Detection Measurement Permutation
To handle the unknown data association as well as the false positives, false negatives, we define the following helper variables, similar to those in [3]. Note that for each robot, we have targets.

[leftmargin=*]

, number of measurements at current time on robot

, an indicator that tells whether robot is detected by robot among target measurements

, a dimensional vector stack of all

, the number of detected robots on robot

, a , a permutation of true measurements among relative measurements at current time
Given measurements on robot , it can be understood that is a possible outcome on which robots are actually detected among targets. There can be such outcomes. Given an outcome , there could be many possible ways to match detection measurements with the detected robots. We do not know for sure since there is no identity information in the detection measurements. Each is a possible match. Thus, by combining and , we can cover all possible association events happened to the detection measurements on robot . We call a tuple a data association hypothesis, or simply a hypothesis. The approach introduced in this work centers around finding all feasible hypotheses and updating the system state based on the probability of each hypothesis, for every robot .
V Method
As can be seen from Eq. 4, each detection measurement depends on the state of multiple robots, making it improper to use the standard JPDA filter [1] for state estimation. Instead, we develop an extension of CPDA filter [3] for the nonlinear measurement model to update the system state over time using both detection measurements and odometry measurements. Our approach iteratively treats each robot as a station while others are targets and applies CPDAF on the system. To simplify the notations, in the follows, we represent our extension of CPDAF in case a robot served as the station with targets, detection measurements and D dimensional states.
Va CPDAF Extension for Nonlinear Measurement Model
VA1 CPDAF Step 1  Prediction
We denote as the state and covariance of the state at time step , respectively. A prior estimate for the state and covariance at time step is obtained by probagating through the system model in Eq. 5,
(6)  
VA2 CPDAF Step 2  Gating
This step aims to reduce the number of possible measurements in the measurement set that can be assigned to each target robot. In our problem, the detection measurement is nonlinear, due to Eq. 4. Thus the gating for target robot
(7) 
Where is the innovation covariance matrix computed as same as in UKF [36] and is the block of this matrix corresponding to target . is a threshold taken from the inverse chisquared cumulative distribution at a significance level
and the degree of freedom equal to dimension of
.VA3 CPDAF Step 3  Evaluation of Hypotheses
As [5] pointed out, the total number of hypotheses for a set of measurement on a robot is
(8) 
This can make evaluating them over time intractable when and are large. However, this assumption is not valid in our case where a detection measurement depends on the state of two robots. To tackle this problem, we propose an efficient evaluating algorithm. This algorithm, inspired by [38], starts by creating an association hypothesis tree of depth where each level of the tree represents the matching for a target robot. Each level consists of some nodes representing all possible detection measurements that can be assigned to that target, including  an indication that the target is not detected. Thus, a valid hypothesis is a path connecting the root with one single node on every level such that each node exists exactly one time on the path, except . The remaining of this algorithm is to do a depthfirst traversal to find all those paths. We omitted the algorithm’s details here due to the space limit. An illustration of the algorithm is shown in Fig. 2(a).
VA4 CPDAF Step 4  Measurementbased Update
As mentioned in section IV, the measurementbased update is separated into two update steps. The first update is based on the odometry measurement, as same as in UKF [36], and the second update is based on the detection measurement.
The later update is based upon the list of valid hypotheses obtained after step 2 and step 3. We extend CPDAF to compute the probability of a hypothesis [3] in case the measurements are nonlinear,
where is the normalization factor, is the false observation spatial density, is the detection probability of a target robot. can vary among robots.
where
with is the system state at current time which is dimensional, is the dimensional observation matrix, is the dimensional vector of stacked detection observations, is a binary matrix with row equal to nonzero row of diag, and is the innovation covariance matrix, computed as same as in UKF [36].
Based on [3], we derive the state and covariance udpate as follows.
where is the Kalman gain, is the measurementstate crosscovariance matrix.
VB Time Complexity Analysis
Step 1  executing model prediction takes .
Step 2 iterates over target measurements on every robot to prune out measurements that are not in the validation gate. Step 2 has time complexity.
Step 3 essentially carries out a depthfirst traversal over all nodes and edges. In the worst case, each level, excepting the root, consists of nodes and since every node in a level is connected to every node in the next level, there will be edges connecting two levels. Thus, the total of nodes and edges, or the worst case time complexity for traversing the tree is .
The running time for the first update in step 4 is as same as in UKF [36]. The second update depends on how many valid hypotheses selected. In the worst case, this number is exponential with respect to the number of robots and measurements, making step 4 exponential in time of computation. In practice, this number is largely reduced thanks to step 2 and step 3. In addition, some technique such as best hypothesis can be used to make step 4 manageable. best hypothesis algorithm [24, 21] is . In our case, the complexity of finding best hypotheses on a tree is
Vi Experiments
We evaluate the efficiency of the proposed framework and algorithms on a simulation derived from experimentallyobtained measurement models.
Via Simulation Settings
The simulation is in ROS where the robots simulate our FLA Falcon 250 platforms as shown in Fig. 1. In reality, each robot is featured with an Open Vision Computer [30] consisting two grayscale Python1300 cameras to provide odometry measurements as well as detection measurements. Robots are simulated with in diameter and weigh . The measurements that robots receive simulate measurement models that we obtain by doing real experiments on real robot platforms.
ViA1 Odometry Measurement Model
The VIO error is modeled as a multivariate Gaussian distribution,
where the standard deviation
is for elements in the transition and for euler angle elements in the orientation.ViA2 Detection Model
We utilize MAVNet [25], a lightweight and fast network for visionbased robot detection. The output segmentation is used to estimate the distance from the camera to the target as well as the bearing, assuming the robot’s dimension in 3D world is known.
ViB Evaluation Metrics
We use two metrics: 1) in Eq. 1 that evaluates the absolute state  the state of robots within the common fixed frame, , and 2) that evaluates the relative state  the state of robots within the common moving frame, i.e. frame of robot .
(9) 
where is the set of relative poses, is the ground truth.
ViC State Estimation Performance
We evaluate the performance of our algorithm in comparison with a nave method which assumes to know the initial system’s configuration and integrates the VIO measurements over time. In each experiment, a team of homogeneous robots are controlled to form different spatial configurations in a centralized manner. Each robot is controlled to follow a predefined path using its true state. The first experiment results depicted in Fig. 4, show that our state estimation converges very fast and matches the nave method in the absolute state error while performs better in the relative state error. Fig. 5 also shows our method’s superior performance in other three experiments in relative state errors. Absolute state errors are omitted since both methods perform similarly.
ViD Formation Control Using Estimated State
We demonstrate an use case of our approach by designing an openloop controller for formation flight. Our controller takes as input the initial estimate of the system state and determines an optimal path for every robot using minimum snap trajectory generation [19]. Each robot is then controlled to the final state using its current estimated state as the feedback. All steps, from state estimation to control is done onboard.
We launch robots forming a line of and evaluate which measures the convergence of the relative state estimate versus the final relative state.
(10) 
Fig. 5(d) shows the error converges to illustrating that our controller is able to converge to the desire state when the robot team forms the desirable line.
ViE Effectiveness of Gating and Hypothesis Evaluation
Fig. 2(b) shows the computation time of our proposed approach with various number of robots in two different settings: with and without using gating and hypothesis evaluation steps. Since these two steps eliminate unnecessary hypotheses, they keeps the algorithm run much faster. The more number of robots, the higher time saving can be achieved, starting from times with robots to times for robots.
Vii Conclusions
This work introduces CPDAF algorithm to address localization in SWaPconstrainted aerial platforms. We address three main challenges: 1) unknown data association in visionbased relative measurements and robot targets, 2) the need to boostrap the system from an unknown initial configuration, and 3) noisy visionbased measurements with false negatives, false positives. Experiments in simulation based on experimentallyderived models of measurements demonstrate the superior performance of our approach. We show how our state estimation can be used in a simple openloop controller, extending the capability of using onboard sensing for estimation and control in formation flight. Our future work is to develop a closeloop controller as well as improve the hypothesis evaluation step.
References
 [1] (2009) The probabilistic data association filter. IEEE Control Systems Magazine 29 (6), pp. 82–100. Cited by: §V.
 [2] (2011) Exploration strategies based on multicriteria decision making for searching environments in rescue operations. Autonomous Robots 31 (4), pp. 401. Cited by: §I.
 [3] (2000) Probabilistic data association avoiding track coalescence. IEEE Transactions on Automatic Control 45 (2), pp. 247–259. Cited by: Visionbased MultiMAV Localization with Anonymous Relative Measurements Using Coupled Probabilistic Data Association Filter, §IVB3, §VA4, §VA4, §V.
 [4] (2011) Visionbased cooperative simultaneous localization and tracking. In 2011 IEEE International Conference on Robotics and Automation, pp. 5191–5197. Cited by: §II.
 [5] (2010) The jpdaf in practical systems: computation and snake oil. In Signal and Data Processing of Small Targets 2010, Vol. 7698, pp. 769813. Cited by: §VA3.
 [6] (2014) An ultrasonic and visionbased relative positioning sensor for multirobot localization. IEEE Sensors Journal 15 (3), pp. 1716–1726. Cited by: §II.
 [7] (2016) Onboard visionbased 3d relative localization system for multiple quadrotors. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1181–1187. Cited by: §II.
 [8] (2012) Modeling and control of uav bearing formations with bilateral highlevel steering. The International Journal of Robotics Research 31 (12), pp. 1504–1525. Cited by: §I.
 [9] (2009) Mutual localization in a multirobot system with anonymous relative position measures. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3974–3980. Cited by: §I, §II.
 [10] (2010) On the solvability of the mutual localization problem with anonymous position measures. In 2010 IEEE International Conference on Robotics and Automation, pp. 3193–3199. Cited by: §II.
 [11] (2013) Mutual localization in multirobot systems using anonymous relative measurements. The International Journal of Robotics Research 32 (11), pp. 1302–1322. Cited by: §I, §II.
 [12] (2017) Ultrawideband based cooperative relative localization algorithm and experiments for multiple unmanned aerial vehicles in gps denied environments. International Journal of Micro Air Vehicles 9 (3), pp. 169–186. Cited by: §I.

[13]
(2014)
Multirobot pose graph localization and data association from unknown initial relative poses via expectation maximization
. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 593–600. Cited by: §I.  [14] (2019) Control in belief space with temporal logic specifications using visionbased localization. The International Journal of Robotics Research 38 (6), pp. 702–722. Cited by: §I.
 [15] (2013) A decentralized architecture for multirobot systems based on the nullspacebehavioral control with application to multirobot border patrolling. Journal of Intelligent & Robotic Systems 71 (34), pp. 423–444. Cited by: §I.
 [16] (2005) Visionbased localization of leaderfollower formations. In Proceedings of the 44th IEEE Conference on Decision and Control, pp. 635–640. Cited by: §II.
 [17] (2009) Multirobot exploration and fire searching. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1929–1934. Cited by: §I.
 [18] (2017) An optimization based approach for relative localization and relative tracking control in multirobot systems. Journal of Intelligent & Robotic Systems 85 (2), pp. 385–408. Cited by: §II.
 [19] (2011) Minimum snap trajectory generation and control for quadrotors. In 2011 IEEE International Conference on Robotics and Automation, pp. 2520–2525. Cited by: §VID.
 [20] (2008) Distributed multirobot task assignment and formation control. In 2008 IEEE International Conference on Robotics and Automation, pp. 128–133. Cited by: §I.
 [21] (1997) Optimizing murty’s ranked assignment method. IEEE Transactions on Aerospace and Electronic Systems 33 (3), pp. 851–862. Cited by: §VB.
 [22] (2018) Fast, autonomous flight in gpsdenied and cluttered environments. Journal of Field Robotics 35 (1), pp. 101–120. Cited by: Fig. 1.
 [23] (2016) Visionbased distributed formation control without an external positioning system. IEEE Transactions on Robotics 32 (2), pp. 339–351. Cited by: §I.
 [24] (1968) An algorithm for ranking all the assignments in order of increasing cost. Operations Research 16 (3), pp. 682–687. External Links: ISSN 0030364X, 15265463, Link Cited by: §VB.
 [25] (201910) MAVNet: an effective semantic segmentation micronetwork for mavbased tasks. IEEE Robotics and Automation Letters 4 (4), pp. 3908–3915. External Links: Document, ISSN 23773766 Cited by: §VIA2.
 [26] (2018) Unsupervised deep homography: a fast and robust homography estimation model. IEEE Robotics and Automation Letters 3 (3), pp. 2346–2353. Cited by: §I.
 [27] (2013) Decentralized controllers for perimeter surveillance with teams of aerial robots. Advanced Robotics 27 (9), pp. 697–709. Cited by: §I.
 [28] (2016) Cooperative multirobot patrol with bayesian learning. Autonomous Robots 40 (5), pp. 929–953. Cited by: §I.
 [29] (2018) Selflocalization of anonymous mobile robots from aerial images. In 2018 European Control Conference (ECC), pp. 1094–1099. Cited by: §I.
 [30] (2019) The open vision computer: an integrated sensing and compute system for mobile robots. In 2019 IEEE International Conference on Robotics and Automation (ICRA), Cited by: §VIA.
 [31] (2016) Dynamic perimeter surveillance with a team of robots. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 5289–5294. Cited by: §I.
 [32] (2016) Active decentralized scale estimation for bearingbased localization. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5084–5091. Cited by: §II.
 [33] (2016) A distributed optimization framework for localization and formation control: applications to visionbased measurements. IEEE Control Systems Magazine 36 (4), pp. 22–44. Cited by: §II.
 [34] (2012) Decentralized formation control with variable shapes for aerial robots. In 2012 IEEE international conference on robotics and automation, pp. 23–30. Cited by: §I.
 [35] (2016) Vision based collaborative localization for multirotor vehicles. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1653–1658. Cited by: §I.
 [36] (2000) The unscented kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), pp. 153–158. Cited by: §VA2, §VA4, §VA4, §VB.
 [37] (2019) A robust localization system for multirobot formations based on an extension of a gaussian mixture probability hypothesis density filter. Autonomous Robots, pp. 1–20. Cited by: §I.
 [38] (1993) Multitarget tracking in clutter: fast algorithms for data association. IEEE Transactions on aerospace and electronic systems 29 (2), pp. 352–363. Cited by: §VA3.
Comments
There are no comments yet.