Multi-robot systems are of interest for their potential in performing tasks which may not be feasible or desirable to do with only a single robot in applications such as perimeter surveillance [27, 31], patrolling missions [15, 28], searching operations [17, 2], and formation control [8, 34, 20]. For example, the task of surveilling a large area is often infeasible for one robot due to the robot’s limited coverage but can be accomplished by a team of robots under proper coordination. A major requirement for these applications is that the robots need to be localized within a common reference frame. That way, each robot can execute its designated subtask correctly and the team can collaboratively complete the full task. This requirement becomes trivial when there is a single global coordinate system can provide the state estimate for all robots, such as GPS, motion capture systems, aerial images . However, such systems are often not available or reliable.
Another solution to this problem is to launch the robots in a predetermined spatial configuration with a common frame and let robots localize within this frame. This solution is obviously time consuming and requires significant effort since we either need to displace the robots at predetermined poses or to measure the relative poses between the robots at the beginning.
Alternative solutions rely on local sensing modalities such as bearing, range, and camera imaging to measure the instantaneous, relative pose between pairs of robots. These modalities can provide measurements which are either landmark features of the environment or direct relative poses. By allowing robots to collaboratively localize using these relative measurements, they can self-localize in a common frame.
Methods relying on exchanging environment features are called map localization [13, 23, 37, 12, 35]. For example, Montijano et al.  and Leahy et al.  propose to use homography estimation  to compute the relative pose between two robots. The main problem with this approach is that the robots either need to maintain a map of features or have overlapping views with shared features, not to mention the challenge in finding good features in low or texture-less environments. Another methods, including this study, rely on direct robot-to-robot measurements, is called mutual localization in .
Our study focus on localization for systems of multiple micro Aerial Vehicles (MAVs) which have relatively small sizes and weights than unmanned aerial vehicles and suitable for multiple applications such as surveillance, and search and rescue operations. Unlike ground vehicles, MAVs are subject to size, weight and payload (SWaP) constraints. As a result, cameras are often preferred over range finders and LiDARs to do the same task due to their compactness and light weight. The system of robots studied in this work, as shown in Fig. 1, are equipped with two types of sensor modalities: vision-based detection and measurement of other robots within each robot’s field of view; and visual inertial odometry (VIO) using stereo cameras and inertial measurement unit measurements. These robots are in length, inexpensive and light-weight, making them a great candidate for research and civilian applications.
Despite its appeals, mutual localization in such multi-MAV systems is challenging due to a couple of reasons. First, our system of robots is homogeneous, visually similar, and using tagging or specific sensors for identifying robots are neither practical or desirable due to the SWaP constraints. Thus, the vision-based detection provides no identify information, leading to data association ambiguity. Secondly, the vision-based detection often yields false positives and false negatives. Furthermore, unlike range finders used in previous studies such as , the vision-based measurements of distance and bearing are quite noisy. These factors make the data association problem become even more challenging.
In short, we study the problem of multi-MAV mutual localization under following assumptions,
Initial relative poses between robots are unknown.
Robot detection provides no identity information.
Robot detection can include false negatives and false positives.
Vision-based distance and bearing measurements are noisy.
Our main contributions in this work include:
We introduce a vision plus IMU framework for localization that uses relative distance and bearing measurements, on a SWaP-constraint platform.
We propose an extension of CPDAF with a simple but effective gating and evaluating mechanism to keep the number of hypotheses manageable.
We demonstrate how on-board sensing, estimation and control can be used for formation flight.
Our method is applicable to a system with any kind of distance and bearing sensors but the implementation introduced in this work focuses on using camera sensors for these measurements.
Ii Related Work
Mutual localization has attracted a large amount of research works. For example, Spica et al.  address the problem of estimating the formation scale in the context of bearing-based formation localization for multiple robots. In 
, authors propose an Extended Kalman Filter for the estimation of each follower position and orientation with respect to the leader, using bearing information only. However, those works do not consider the problem caused by unknown data association which plays an important role in mutual localization. For instance, Mehrez et al. assume that robots are able to uniquely identify each of the observed robots in their field of view and measure their relative range and bearing.
The literature has investigated in providing relative measurements with robot identities via tagging. Some recent examples using this method for relative localization of a team of aerial robots include  where authors use colored circular markers on the robots to obtain relative bearing between the robots. De silva et al.  develop 3D sensor nodes employing ultrasonic-based range measurement and infrared-based bearing measurement for spatial localization of robots. Dias et al.  utilize active markers to identify unique ID of quadrotors based on pulsating at a predefined frequency. The main disadvantage of these methods is that they do not scale well with the number of robots.
Recent approaches directly deal with unknown robots’ identities and attempt to estimate these identities together with robot localization. Chang et al.  propose a maximum likelihood data association algorithm with a threshold gating on the Mahalanobis distance between the incoming measurement and the expected measurement. The problem with this method is that the selected measurement may not be the correct one due to the inaccuracy of the measurements, leading to filter divergence. In this work, we propose a probabilistic data association framework that can handle the noisy measurements.
The problem of mutual localization with anonymous relative measurements was first considered by Franchi et al. [9, 10, 11]. In , authors introduce a two-phase localization system in which a multiple registration algorithm to build data association hypotheses is followed by a Multi-Hypothesis EKF to prune out hypotheses inconsistent with the current belief. Their successive work  proposes to feed back the belief in system’s spatial configuration to handle the worst case scenario where the computation can be a factorial function of number of agents when the spatial arrangement of the robots is rotational symmetric. In 
, they improve the algorithm further by using particle filters to replace EFK filters. Compared to our work, theirs rely on an assumption that the posterior probability distribution functions of robot states are independent so that each particle filter can be feasibly updated in a separate manner. Their frameworks also suffer from adding computation to maintain and update particle filters, especially when scaling up the number of robots or in case the robot state has a high dimension.
Iii Problem Formulation
Let us consider a problem formulation with a team of homogeneous robots , is known. Beside the the attached moving frame , each robot is attached to a fixed frame as shown in Fig. 1 such that the axis of the frame is on the same direction with the gravity. In the follows, we describe the mutual localization problem whose objective is to localize every to a common frame, which can be any in the set . Without loosing the generalization, we choose to be .
Suppose and , denote the translation and rotation between robot and frame , respectively. Then, localizing robot in frame is equivalent to estimate . We can define a set involving all variables that we aim to estimate. Our problem becomes,
where is the ground truth.
Iv The Stochastic Model
Before representing the proposed approach, we first define discrete models for the system dynamics and observation measurements.
Iv-a The System State Model
Looking from the chosen common frame , we have,
Thus, the odometry measurement comes from the VIO system of robot ,
The detection measurement generating from robot detected by robot can be presented as,
The first two equations in Eq. 2 show that to achieve the rotation and translation of robot in frame , given only local measurements, we need to know the rotation and translation of frame in . Thus, we define the state of robot as
Note that, we substitute by to make it convenient to define the state equation and that these two variables can be derived from each other. The coupled state system can be defined as
We can decompose the rotation into two parts,
corresponds to the rotation around the gravity vector, andcorresponds to the rotation on the plane perpendicular to the gravity vector,
In a VIO system, only rotation along axis is unobservable. Thus, we can assume that is known, leaving only needs to estimate. Furthermore, every frame is defined to be different only in the rotation around axis, denoted as . Thus, and we can rewrite the individual robot state,
We utilize a linear system whose input is the velocity. The velocity input is assumed to be corrupted with i.i.d zero-mean Guassian noise. The robots’ position and yaw can be modelled as follows
Frames have unknown, fixed transformations with respect to frame but there can be drifts due to errors from VIO systems,
We can write the state equation in a standard form
is the covariance matrix of the i.i.d Gaussian noise. We discretize this continuous time system using zero-hold for the input
where is the current time step, , , , is the sampling time.
Iv-B The Measurement Model
During the update process, we update two types of measurements in a decoupled manner.
Iv-B1 Odometry Measurements
Odometry measurements from VIO systems directly provide each robot’s individual state with respect to its own frame. We rewrite Eq. 3 as follows, taking into account a Gaussian noise,
where denotes noises. There is no data association involved in this partial update step.
Iv-B2 Detection Measurements
Detection measurements from the vision-based detection can help estimate the pose of robots detected. We assume that robots can detect all other robots with detection probability and some probability of false positive, false negative. This assumption can be achieved using -cameras and that robots are in the range of detection. Eq. 4
can be rewritten with white noise added, for true measurements,
be random variable representing the number of false positives at time. We assume
to has Poisson distribution,
Iv-B3 Detection Measurement Permutation
To handle the unknown data association as well as the false positives, false negatives, we define the following helper variables, similar to those in . Note that for each robot, we have targets.
, number of measurements at current time on robot
, an indicator that tells whether robot is detected by robot among target measurements
, a -dimensional vector stack of all
, the number of detected robots on robot
, a , a permutation of true measurements among relative measurements at current time
Given measurements on robot , it can be understood that is a possible outcome on which robots are actually detected among targets. There can be such outcomes. Given an outcome , there could be many possible ways to match detection measurements with the detected robots. We do not know for sure since there is no identity information in the detection measurements. Each is a possible match. Thus, by combining and , we can cover all possible association events happened to the detection measurements on robot . We call a tuple a data association hypothesis, or simply a hypothesis. The approach introduced in this work centers around finding all feasible -hypotheses and updating the system state based on the probability of each hypothesis, for every robot .
As can be seen from Eq. 4, each detection measurement depends on the state of multiple robots, making it improper to use the standard JPDA filter  for state estimation. Instead, we develop an extension of CPDA filter  for the nonlinear measurement model to update the system state over time using both detection measurements and odometry measurements. Our approach iteratively treats each robot as a station while others are targets and applies CPDAF on the system. To simplify the notations, in the follows, we represent our extension of CPDAF in case a robot served as the station with targets, detection measurements and -D dimensional states.
V-a CPDAF Extension for Nonlinear Measurement Model
V-A1 CPDAF Step 1 - Prediction
We denote as the state and covariance of the state at time step , respectively. A prior estimate for the state and covariance at time step is obtained by probagating through the system model in Eq. 5,
V-A2 CPDAF Step 2 - Gating
This step aims to reduce the number of possible measurements in the -measurement set that can be assigned to each target robot. In our problem, the detection measurement is nonlinear, due to Eq. 4. Thus the gating for target robot
Where is the innovation covariance matrix computed as same as in UKF  and is the block of this matrix corresponding to target . is a threshold taken from the inverse chi-squared cumulative distribution at a significance level
and the degree of freedom equal to dimension of.
V-A3 CPDAF Step 3 - Evaluation of Hypotheses
As  pointed out, the total number of -hypotheses for a set of measurement on a robot is
This can make evaluating them over time intractable when and are large. However, this assumption is not valid in our case where a detection measurement depends on the state of two robots. To tackle this problem, we propose an efficient evaluating algorithm. This algorithm, inspired by , starts by creating an association hypothesis tree of depth where each level of the tree represents the matching for a target robot. Each level consists of some nodes representing all possible detection measurements that can be assigned to that target, including - an indication that the target is not detected. Thus, a valid hypothesis is a path connecting the root with one single node on every level such that each node exists exactly one time on the path, except . The remaining of this algorithm is to do a depth-first traversal to find all those paths. We omitted the algorithm’s details here due to the space limit. An illustration of the algorithm is shown in Fig. 2(a).
V-A4 CPDAF Step 4 - Measurement-based Update
As mentioned in section IV, the measurement-based update is separated into two update steps. The first update is based on the odometry measurement, as same as in UKF , and the second update is based on the detection measurement.
The later update is based upon the list of valid hypotheses obtained after step 2 and step 3. We extend CPDAF to compute the probability of a hypothesis  in case the measurements are nonlinear,
where is the normalization factor, is the false observation spatial density, is the detection probability of a target robot. can vary among robots.
with is the system state at current time which is dimensional, is the dimensional observation matrix, is the -dimensional vector of stacked detection observations, is a binary matrix with row equal to non-zero row of diag, and is the innovation covariance matrix, computed as same as in UKF .
Based on , we derive the state and covariance udpate as follows.
where is the Kalman gain, is the measurement-state cross-covariance matrix.
V-B Time Complexity Analysis
Step 1 - executing model prediction takes .
Step 2 iterates over target measurements on every robot to prune out measurements that are not in the validation gate. Step 2 has time complexity.
Step 3 essentially carries out a depth-first traversal over all nodes and edges. In the worst case, each level, excepting the root, consists of nodes and since every node in a level is connected to every node in the next level, there will be edges connecting two levels. Thus, the total of nodes and edges, or the worst case time complexity for traversing the tree is .
The running time for the first update in step 4 is as same as in UKF . The second update depends on how many valid hypotheses selected. In the worst case, this number is exponential with respect to the number of robots and measurements, making step 4 exponential in time of computation. In practice, this number is largely reduced thanks to step 2 and step 3. In addition, some technique such as -best hypothesis can be used to make step 4 manageable. -best hypothesis algorithm [24, 21] is . In our case, the complexity of finding -best hypotheses on a tree is
We evaluate the efficiency of the proposed framework and algorithms on a simulation derived from experimentally-obtained measurement models.
Vi-a Simulation Settings
The simulation is in ROS where the robots simulate our FLA Falcon 250 platforms as shown in Fig. 1. In reality, each robot is featured with an Open Vision Computer  consisting two gray-scale Python-1300 cameras to provide odometry measurements as well as detection measurements. Robots are simulated with in diameter and weigh . The measurements that robots receive simulate measurement models that we obtain by doing real experiments on real robot platforms.
Vi-A1 Odometry Measurement Model
Vi-A2 Detection Model
We utilize MAVNet , a light-weight and fast network for vision-based robot detection. The output segmentation is used to estimate the distance from the camera to the target as well as the bearing, assuming the robot’s dimension in 3D world is known.
The bearing measurement, as shown in Fig. 3(a), is modelled as,
The distance measurement as shown in Fig. 3(b), is modelled as,
Vi-B Evaluation Metrics
We use two metrics: 1) in Eq. 1 that evaluates the absolute state - the state of robots within the common fixed frame, , and 2) that evaluates the relative state - the state of robots within the common moving frame, i.e. frame of robot .
where is the set of relative poses, is the ground truth.
Vi-C State Estimation Performance
We evaluate the performance of our algorithm in comparison with a nave method which assumes to know the initial system’s configuration and integrates the VIO measurements over time. In each experiment, a team of homogeneous robots are controlled to form different spatial configurations in a centralized manner. Each robot is controlled to follow a predefined path using its true state. The first experiment results depicted in Fig. 4, show that our state estimation converges very fast and matches the nave method in the absolute state error while performs better in the relative state error. Fig. 5 also shows our method’s superior performance in other three experiments in relative state errors. Absolute state errors are omitted since both methods perform similarly.
Vi-D Formation Control Using Estimated State
We demonstrate an use case of our approach by designing an open-loop controller for formation flight. Our controller takes as input the initial estimate of the system state and determines an optimal path for every robot using minimum snap trajectory generation . Each robot is then controlled to the final state using its current estimated state as the feedback. All steps, from state estimation to control is done on-board.
We launch robots forming a line of and evaluate which measures the convergence of the relative state estimate versus the final relative state.
Fig. 5(d) shows the error converges to illustrating that our controller is able to converge to the desire state when the robot team forms the desirable line.
Vi-E Effectiveness of Gating and Hypothesis Evaluation
Fig. 2(b) shows the computation time of our proposed approach with various number of robots in two different settings: with and without using gating and hypothesis evaluation steps. Since these two steps eliminate unnecessary hypotheses, they keeps the algorithm run much faster. The more number of robots, the higher time saving can be achieved, starting from times with robots to times for robots.
This work introduces CPDAF algorithm to address localization in SWaP-constrainted aerial platforms. We address three main challenges: 1) unknown data association in vision-based relative measurements and robot targets, 2) the need to boostrap the system from an unknown initial configuration, and 3) noisy vision-based measurements with false negatives, false positives. Experiments in simulation based on experimentally-derived models of measurements demonstrate the superior performance of our approach. We show how our state estimation can be used in a simple open-loop controller, extending the capability of using on-board sensing for estimation and control in formation flight. Our future work is to develop a close-loop controller as well as improve the hypothesis evaluation step.
-  (2009) The probabilistic data association filter. IEEE Control Systems Magazine 29 (6), pp. 82–100. Cited by: §V.
-  (2011) Exploration strategies based on multi-criteria decision making for searching environments in rescue operations. Autonomous Robots 31 (4), pp. 401. Cited by: §I.
-  (2000) Probabilistic data association avoiding track coalescence. IEEE Transactions on Automatic Control 45 (2), pp. 247–259. Cited by: Vision-based Multi-MAV Localization with Anonymous Relative Measurements Using Coupled Probabilistic Data Association Filter, §IV-B3, §V-A4, §V-A4, §V.
-  (2011) Vision-based cooperative simultaneous localization and tracking. In 2011 IEEE International Conference on Robotics and Automation, pp. 5191–5197. Cited by: §II.
-  (2010) The jpdaf in practical systems: computation and snake oil. In Signal and Data Processing of Small Targets 2010, Vol. 7698, pp. 769813. Cited by: §V-A3.
-  (2014) An ultrasonic and vision-based relative positioning sensor for multirobot localization. IEEE Sensors Journal 15 (3), pp. 1716–1726. Cited by: §II.
-  (2016) On-board vision-based 3d relative localization system for multiple quadrotors. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1181–1187. Cited by: §II.
-  (2012) Modeling and control of uav bearing formations with bilateral high-level steering. The International Journal of Robotics Research 31 (12), pp. 1504–1525. Cited by: §I.
-  (2009) Mutual localization in a multi-robot system with anonymous relative position measures. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3974–3980. Cited by: §I, §II.
-  (2010) On the solvability of the mutual localization problem with anonymous position measures. In 2010 IEEE International Conference on Robotics and Automation, pp. 3193–3199. Cited by: §II.
-  (2013) Mutual localization in multi-robot systems using anonymous relative measurements. The International Journal of Robotics Research 32 (11), pp. 1302–1322. Cited by: §I, §II.
-  (2017) Ultra-wideband based cooperative relative localization algorithm and experiments for multiple unmanned aerial vehicles in gps denied environments. International Journal of Micro Air Vehicles 9 (3), pp. 169–186. Cited by: §I.
Multi-robot pose graph localization and data association from unknown initial relative poses via expectation maximization. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 593–600. Cited by: §I.
-  (2019) Control in belief space with temporal logic specifications using vision-based localization. The International Journal of Robotics Research 38 (6), pp. 702–722. Cited by: §I.
-  (2013) A decentralized architecture for multi-robot systems based on the null-space-behavioral control with application to multi-robot border patrolling. Journal of Intelligent & Robotic Systems 71 (3-4), pp. 423–444. Cited by: §I.
-  (2005) Vision-based localization of leader-follower formations. In Proceedings of the 44th IEEE Conference on Decision and Control, pp. 635–640. Cited by: §II.
-  (2009) Multi-robot exploration and fire searching. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1929–1934. Cited by: §I.
-  (2017) An optimization based approach for relative localization and relative tracking control in multi-robot systems. Journal of Intelligent & Robotic Systems 85 (2), pp. 385–408. Cited by: §II.
-  (2011) Minimum snap trajectory generation and control for quadrotors. In 2011 IEEE International Conference on Robotics and Automation, pp. 2520–2525. Cited by: §VI-D.
-  (2008) Distributed multi-robot task assignment and formation control. In 2008 IEEE International Conference on Robotics and Automation, pp. 128–133. Cited by: §I.
-  (1997) Optimizing murty’s ranked assignment method. IEEE Transactions on Aerospace and Electronic Systems 33 (3), pp. 851–862. Cited by: §V-B.
-  (2018) Fast, autonomous flight in gps-denied and cluttered environments. Journal of Field Robotics 35 (1), pp. 101–120. Cited by: Fig. 1.
-  (2016) Vision-based distributed formation control without an external positioning system. IEEE Transactions on Robotics 32 (2), pp. 339–351. Cited by: §I.
-  (1968) An algorithm for ranking all the assignments in order of increasing cost. Operations Research 16 (3), pp. 682–687. External Links: Cited by: §V-B.
-  (2019-10) MAVNet: an effective semantic segmentation micro-network for mav-based tasks. IEEE Robotics and Automation Letters 4 (4), pp. 3908–3915. External Links: Cited by: §VI-A2.
-  (2018) Unsupervised deep homography: a fast and robust homography estimation model. IEEE Robotics and Automation Letters 3 (3), pp. 2346–2353. Cited by: §I.
-  (2013) Decentralized controllers for perimeter surveillance with teams of aerial robots. Advanced Robotics 27 (9), pp. 697–709. Cited by: §I.
-  (2016) Cooperative multi-robot patrol with bayesian learning. Autonomous Robots 40 (5), pp. 929–953. Cited by: §I.
-  (2018) Self-localization of anonymous mobile robots from aerial images. In 2018 European Control Conference (ECC), pp. 1094–1099. Cited by: §I.
-  (2019) The open vision computer: an integrated sensing and compute system for mobile robots. In 2019 IEEE International Conference on Robotics and Automation (ICRA), Cited by: §VI-A.
-  (2016) Dynamic perimeter surveillance with a team of robots. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 5289–5294. Cited by: §I.
-  (2016) Active decentralized scale estimation for bearing-based localization. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5084–5091. Cited by: §II.
-  (2016) A distributed optimization framework for localization and formation control: applications to vision-based measurements. IEEE Control Systems Magazine 36 (4), pp. 22–44. Cited by: §II.
-  (2012) Decentralized formation control with variable shapes for aerial robots. In 2012 IEEE international conference on robotics and automation, pp. 23–30. Cited by: §I.
-  (2016) Vision based collaborative localization for multirotor vehicles. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1653–1658. Cited by: §I.
-  (2000) The unscented kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), pp. 153–158. Cited by: §V-A2, §V-A4, §V-A4, §V-B.
-  (2019) A robust localization system for multi-robot formations based on an extension of a gaussian mixture probability hypothesis density filter. Autonomous Robots, pp. 1–20. Cited by: §I.
-  (1993) Multitarget tracking in clutter: fast algorithms for data association. IEEE Transactions on aerospace and electronic systems 29 (2), pp. 352–363. Cited by: §V-A3.