I Introduction
Aerial motion capture (mocap) of humans in unstructured, outdoor scenarios is a challenging research problem that has recently attracted widespread attention [1, 2, 3, 4, 5]. It facilitates human pose and shape analysis in sports medicine, high-accuracy cinematography, rescue operations in disaster scenarios, and other applications that require estimating a human's 3D position or full-body skeleton pose and shape in a natural environment. Our aerial mocap system consists of a team of micro aerial vehicles (MAVs) that autonomously detect, track and follow a person. The online task is for the MAVs to cooperatively and continuously estimate the 3D global position of the person and always keep him/her centered in the field of view of their onboard cameras while he/she performs activities, such as walking, running or jumping, in an unstructured, outdoor environment. The online task involves only onboard, real-time computation, which is the core focus of this article. The offline task, which is not addressed in this paper, is to estimate the full-body skeleton pose and shape from the image stream acquired and saved onboard.
In our previous work [1], we developed a markerless, deep neural network-based cooperative detection and tracking (CDT) algorithm for our aerial mocap system. However, that work has two major shortcomings. First, the motion planner on each MAV, which computes the 3D waypoints for the MAV to follow the person, does not consider the uncertainty in the tracked person's 3D position estimate. Consequently, the MAV formations attain viewpoint configurations that are non-optimal for minimizing the jointly perceived target uncertainty. Second, the local motion plans generated by the MAVs using a model-predictive controller (MPC) are subsequently modified using potential fields for collision avoidance. This further leads to suboptimal trajectories.
In this work, we solve both of the aforementioned problems through a novel convex decentralized formation controller based on MPC. It actively minimizes the joint uncertainty in the tracked person's 3D position estimate while following him/her, and generates safe motion plans that avoid dynamic and static obstacles. We achieve this by decoupling the joint uncertainty minimization into i) a convex quadratic objective that maintains a threshold distance to the tracked person, and ii) linear constraints that enforce angular configurations of the MAVs with respect to (w.r.t.) the person. We theoretically derive this decoupling based on Gaussian observation model assumptions within the CDT algorithm. Dynamic collision avoidance w.r.t. other MAVs, the tracked person and static obstacles is also achieved using linear constraints in the MPC. Both dynamic collision avoidance and angular configuration constraints are essentially non-convex. We preserve convexity in our MPC formulation by converting both of these constraints into external control input terms for the MPC dynamics, in addition to the default MPC control input. To determine the equivalent external control input, we use a repulsive potential field function to compute the corresponding force for each non-convex constraint, at every time step, over the whole planning horizon, and then feed it directly into the MPC dynamics. A feasible solution to the optimization guarantees obstacle avoidance and enables active minimization of the jointly estimated uncertainty in the person's tracked state. We also provide an analysis of our collision avoidance guarantees that takes into account the real-time motion plans of neighboring agents, real-time message communication delays between agents, and the uncertainties in the poses of the MAVs and in the position of the tracked person.
Thus, the key novelties of this article are:
- a convex MPC-based formation controller that actively minimizes the jointly estimated uncertainty in the tracked person's 3D position while he/she performs activities in an unstructured, outdoor environment,
- guaranteed dynamic collision avoidance within a convex optimization framework, and
- open-source ROS-based [6] code of our method.
We demonstrate the effectiveness of our approach in both simulation and real robot experiments. Real robot experiments involving 3 MAVs in several challenging scenarios are presented, along with comparisons to state-of-the-art approaches. Figure 1 presents a multi-exposure image of a part of one of these experiments. Extensive simulation results demonstrate the robustness of our approach and its scalability w.r.t. the number of robots.
II State-of-the-Art
In this section, we review state-of-the-art literature in the context of multi-robot active tracking. In [7], multi-robot trajectories are analytically derived and uncertainty bounds are identified for optimizing the fused estimate of the target position. In [8], a non-convex optimization problem is solved analytically to minimize the trace of the fused target covariance. However, that approach is centralized and uses Gauss-Seidel relaxation to compute optimal solutions for multiple robots. Centralized non-convex optimization [9, 10] is used to track a target in stochastic environments and locally minimize the fused estimate of the target position. In [11], active perception based formation control is addressed using a decentralized nonlinear model predictive controller. However, the method is validated only for ground robots and identifies only suboptimal control inputs, owing to the non-convexity of the optimization and the lack of collision avoidance guarantees. In [12], a perception-aware model-predictive controller generates real-time motion plans that maximize the visibility of a desired target. The motion plans are, however, generated only for a single aerial robot. A marker-based multi-robot aerial motion capture system is presented in [5], where formation control is achieved using a decentralized nonlinear model-predictive controller. Scalability in the number of robots and formation collision avoidance behaviors are not explicitly addressed in their approach. Our previous work [1] uses convolutional neural networks for person detection and performs cooperative person detection using distributed extended Kalman filters. However, the person-tracking formation controller used in [1] does not optimize motion plans to minimize the uncertainty in the joint estimate of the target position. Furthermore, inter-agent and target collision avoidance is not guaranteed, which leads to suboptimal and unsafe local formation motion plans.

Formation Configuration: In order to identify formation configurations which optimize the target observation, we refer to the analytically derived sensor geometries in [13]. These geometries provide a useful basis for leveraging numerical optimization techniques for decentralized motion planning. Optimization ensures that the emergent behavior of the decentralized system of robots minimizes the uncertainty in the fused target position estimate.
Obstacle Avoidance: In our previous work [14], we developed a convex optimization program to generate local collision-free motion plans while tracking a movable pick-and-place target using multiple aerial vehicles. This approach generates fast, feasible motion plans and has a computational complexity linear in the number of environmental obstacles. However, that work was validated only in a simulation environment, and obstacle avoidance was not guaranteed. In our current work, we consider real robots, stochasticity in the environment and bounds on the repulsive potential fields to guarantee collision avoidance and generate safe local motion plans.
III Proposed Approach
In this section, we present our proposed approach for active perception based formation control. Sec. III-A introduces the system setup and the mathematical notation used in the paper. Sec. III-B details the modules of the cooperative detection and tracking algorithm used in this work. Sec. III-C derives the person observation model for each MAV and identifies multi-robot configurations which jointly minimize the uncertainty in the target person's state estimate. Sec. III-D uses these models and the formation constraints from Sec. III-C to formulate an MPC for decentralized motion planning.
III-A Preliminaries and Problem Statement
Let there be K MAVs tracking a person P. Each MAV maintains its pose (position and orientation) in the world frame at each time step, along with a pose uncertainty covariance matrix. Each MAV has an onboard, monocular, perspective camera, rigidly attached to its body frame. The observation measurement of the person's centroid made by a MAV in its local spherical coordinates is given by a mean range, bearing and azimuth (see Fig. 2). We assume zero correlation between measurements along these axes. The measurements are associated with zero-mean measurement noise along the range, bearing and azimuth, respectively, with a corresponding representation in local Cartesian coordinates. Each MAV's fused estimate of the person's centroid position and its uncertainty covariance in the world frame are computed by fusing the person observation measurements made by the MAV itself with the measurements received from its teammates.
The MAVs operate in an environment with known static obstacles (our approach is agnostic to how the static obstacles are detected) and treat neighboring MAVs as dynamic obstacles. The goal of each MAV is to cooperatively track the target person using a replicated instance of the proposed formation control algorithm. This involves (a) minimizing the MAV's fused estimate of the measurement uncertainty covariance and (b) avoiding environmental obstacles based on the positions of the M static obstacles and the local horizon motion plans received from its teammates.
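As an illustration of the notation, the spherical measurement (range, bearing, azimuth) can be mapped to local Cartesian coordinates. The sketch below assumes a common convention (bearing in the horizontal plane, azimuth measured from it), which may differ from the exact frame definitions in Fig. 2:

```python
import math

def spherical_to_cartesian(r, bearing, azimuth):
    """Convert a (range, bearing, azimuth) measurement to local Cartesian
    coordinates. The angle convention here is an assumption: bearing is
    measured in the horizontal plane, azimuth upward from that plane."""
    x = r * math.cos(azimuth) * math.cos(bearing)
    y = r * math.cos(azimuth) * math.sin(bearing)
    z = r * math.sin(azimuth)
    return x, y, z
```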
III-B Cooperative Detection and Tracking
Fig. 3 details the various modules used in our approach. The MAVs detect a moving person using a deep neural network-based markerless CDT algorithm, which was developed and presented in our previous work [1]. As part of CDT, each vehicle runs its own instance of an extended Kalman filter (EKF) to fuse the detections made by the MAV and all its teammates, which results in a replicated joint state estimate of the person at each MAV. This estimate is then used to predict a region of interest (ROI) in future camera frames. The ROI guides the yaw of each MAV, thereby ensuring that the target person stays in the MAV's field of view. In Sec. III-D, we use these joint state estimates to drive our decentralized model-predictive formation controller. The focus of [1] was on developing the cooperative person detection and ROI prediction algorithms, i.e., the green blocks. The work in this paper focuses on developing the module in blue, which utilizes information from the other modules. At every time step, the controller of a MAV generates waypoint positions and velocities for itself using (a) the jointly estimated position and velocity of the tracked person, (b) the predicted horizon motion plans communicated by its teammates and (c) the positions of obstacles. The generated waypoints guide the low-level position and yaw controller of the MAV.
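The fusion of teammate detections can be illustrated with a simplified, information-form stand-in for the EKF update step (a sketch of the idea, not the paper's exact filter):

```python
import numpy as np

def fuse_measurements(means, covs):
    """Information-form fusion of independent Gaussian 3D position
    measurements: a simplified stand-in for the per-MAV EKF update.
    `means` is a list of 3-vectors, `covs` the matching covariances."""
    info = np.zeros((3, 3))       # accumulated information matrix
    info_mean = np.zeros(3)       # accumulated information vector
    for z, R in zip(means, covs):
        Rinv = np.linalg.inv(R)
        info += Rinv
        info_mean += Rinv @ z
    P = np.linalg.inv(info)       # fused covariance
    return P @ info_mean, P       # fused mean, fused covariance
```

With two equally confident measurements, the fused mean is their average and the fused covariance is halved, matching the intuition that cooperation reduces uncertainty.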
III-C Observation Measurement Models and Joint Uncertainty Minimization
In this subsection, we first derive a person observation model for a single MAV in its local-frame Cartesian coordinates. Subsequently, we state the assumptions under which the MAV observation model holds in our scenario. Finally, for multiple MAVs, we identify formation objectives and constraints which facilitate the minimization of the fused uncertainty.
III-C1 Observation Measurement Model for a MAV
To derive the person observation model for a single MAV, we drop the superscripts and subscripts of the spherical and Cartesian coordinate measurements introduced in the schematic of Fig. 2. The variance in the range measurement grows quadratically with the distance to the person, because the projected size of the person in the image plane varies quadratically with this distance [15]. Second, the change in the person's projected size on the image plane is negligible under independent variations in bearing and azimuth. Hence, we can assume a constant variance w.r.t. these angles; the corresponding noise parameters are positive constants specific to the system. We now derive the MAV observation model in the local Cartesian coordinates as follows, using (1)-(5), the state equations useful for deriving the person observation model.

(1)
where the random variables are uncorrelated. (2) lists expected values for functions of normally distributed random variables.

(2)

where the coefficients are constants. Using (1), we derive the following.
(3)  
(4)  
(5) 
Using equations (1)-(5), we derive the person uncertainty covariance model in Cartesian coordinates as follows.
(6)  
(7) 
where the constituent terms follow from (1)-(5).
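Since the exact closed-form expressions (6)-(7) are not reproduced here, a first-order (Jacobian-based) propagation of the spherical measurement noise to Cartesian coordinates gives the qualitative behavior. The function name and angle conventions below are ours, not the paper's:

```python
import numpy as np

def cartesian_covariance(r, bearing, azimuth, var_r, var_b, var_a):
    """First-order propagation of independent spherical measurement noise
    to Cartesian coordinates: Sigma_xyz = J S J^T with
    S = diag(var_r, var_b, var_a). Assumed convention:
    x = r*cos(a)*cos(b), y = r*cos(a)*sin(b), z = r*sin(a)."""
    cb, sb = np.cos(bearing), np.sin(bearing)
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    # Jacobian of (x, y, z) w.r.t. (r, bearing, azimuth).
    J = np.array([
        [ca * cb, -r * ca * sb, -r * sa * cb],
        [ca * sb,  r * ca * cb, -r * sa * sb],
        [sa,       0.0,          r * ca],
    ])
    S = np.diag([var_r, var_b, var_a])
    return J @ S @ J.T
```

At zero bearing and azimuth this reduces to diag(var_r, r^2 var_b, r^2 var_a), showing how the angular noise inflates with range.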
III-C2 Assumptions in Our Case Scenario
We make the following assumptions on the MAV observation model.
Bearing: We employ a separate yaw controller [1] to guide the orientation of the MAVs towards the target person. Due to this, we assume that the measured bearing is almost always approximately zero.
Azimuth: We assume that the azimuth angle remains close to constant, as observed in Fig. 2. The rationale for this assumption is that, in our approach, the MAVs always maintain a constant altitude above the person. The only change that affects the radial distance to the person is the horizontal distance to him/her from the MAVs. Although the change in the radial distance with horizontal distance is not negligible, the corresponding change in the azimuth is small enough to be neglected. Therefore, the expected values of the measurements along the range, bearing and azimuth are assumed accordingly. Using the aforementioned assumptions for our scenario, the measurement noise for a single MAV is as follows.
(8) 
III-C3 Minimizing Fused Uncertainty for MAV Formations
The minimum uncertainty in the person's fused state estimate is achieved when the following conditions are met.

1) The angles between the MAVs about the person's estimated position are optimal.
Given measurements from observations of different robots (or sensors), [13] analytically derives optimal sensor geometries for unbiased and efficient passive target localization, identifying geometries which maximize the determinant of the Fisher information matrix. For K independent measurements, the angle between sensors resulting in minimum fused uncertainty takes one of two optimal values. We validate this analysis by computing the merged covariance of the individual MAV measurement uncertainties, using the recursive covariance merging equations from [16]. The angular configurations which minimize the trace of the merged covariance are highlighted in the yellow boxes of Fig. 3(c). The angular configuration which minimizes the merged covariance is given in Fig. 3(a) (this also corresponds to the red-encircled box in Fig. 3(c)). Fig. 3(b) illustrates some other configurations. Of these, Fig. 3(b)1 and 2 are examples of non-optimal configurations, while Fig. 3(b)3 is an example of one of several other minimum-covariance configurations. In this work, however, the angular configuration of Fig. 3(a) is used, because it allows the tracked person to be viewed uniformly from different directions and reduces the possibility of losing the target person to environmental occlusions. Moreover, as our system is motivated by the application of outdoor motion capture, observing an asymmetric target (at least in one axis) from different directions increases visual coverage.

2) The measurement uncertainty for each individual MAV is minimized.
The previous angular constraint condition minimizes the fused uncertainty for any given set of measurements from the MAVs. However, the fused covariance remains a function of each individual MAV’s measurement uncertainty, . is a function of the relative position of MAV w.r.t. the tracked person. Notice that each MAV’s position remains controllable without changing the angular configuration of the whole formation. Therefore, the fused uncertainty is completely minimized by minimizing the trace of for each MAV . How this is done is discussed in the following subsection.
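The benefit of well-separated viewing directions can be checked numerically: for two sensors with anisotropic (radial vs. tangential) noise observing the same target, the trace of the fused 2D covariance is smallest at a 90-degree separation. A small sketch with assumed variance values:

```python
import numpy as np

def fused_trace(delta, var_r=1.0, var_t=0.1):
    """Trace of the fused 2D covariance for two sensors observing the same
    target, separated by angle `delta` about the target. Each sensor has
    radial variance var_r and tangential variance var_t (assumed values).
    Fusion is done in information form: I_fused = I_1 + I_2."""
    def info(theta):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])  # rotate the sensor's noise frame
        return R @ np.diag([1.0 / var_r, 1.0 / var_t]) @ R.T
    I = info(0.0) + info(delta)
    return np.trace(np.linalg.inv(I))
```

Evaluating `fused_trace` at 0, 45 and 90 degrees shows the trace decreasing monotonically towards the orthogonal configuration, consistent with the geometry analysis cited above.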
III-D Decentralized Quadratic Model Predictive Control
The trace of the covariance derived in Sec. III-C2 is shown below.
(9) 
Minimizing (9) ensures that each MAV improves its person observation measurement uncertainty. However, unconstrained minimization would lead the MAVs to collide with the target person and with each other. In order to ensure the safety of the target person, we instead drive each MAV towards a desired circular safety surface at a desired radius and height from the target person. Rewriting (9) in terms of the MAV's position and a desired position chosen on this safety surface, we obtain the following equations.
(10)  
The desired position is chosen along the MAV's current orientation to the target person. We define the objective of our MPC running on each MAV as minimizing
- the distance to the desired target surface (safety surface) around the tracked person,
- the difference in velocity between the MAV and the fused velocity estimate of the target person, and
- the MAV control effort.
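The choice of the desired position on the safety surface can be sketched as follows; the variable names (`d_safe`, `h_des`) are ours and the surface parameterization is an assumption consistent with the text:

```python
import numpy as np

def desired_position(p_mav, p_person, d_safe, h_des):
    """Project the MAV's current horizontal bearing to the person onto the
    circular safety surface of radius d_safe at height h_des above the
    person. Names and parameterization are ours, not the paper's."""
    offset = p_mav[:2] - p_person[:2]
    n = offset / np.linalg.norm(offset)       # current approach direction
    xy = p_person[:2] + d_safe * n            # point on the safety circle
    return np.array([xy[0], xy[1], p_person[2] + h_des])
```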
Consequently, the optimization objective for active tracking using MPC is as given below.
(11) 
where . The first part of this objective minimizes the input control signal, and the second part ensures that the distance between the desired terminal state (position and velocity) and the final horizon robot state is minimized. In order to ensure continuous and smooth person tracking, we minimize the difference between the velocity of the MAV and the fused velocity estimate of the person available from the EKF. is experimentally determined. are custom-defined positive definite weight matrices. Furthermore, the active tracking MPC is subject to the following constraints:
- a non-convex constraint of maintaining a desired angular difference w.r.t. the other MAVs about the fused estimate of the person's position,
- a non-convex constraint to maintain at least a minimum distance from other obstacles, and
- regional and control saturation bounds on the MAV states and control inputs over the horizon.
The non-convex constraints are embedded into the MPC dynamics as repulsive potential field forces [14]. The computation of these forces is discussed in Sec. III-E. The convex MPC is thus defined by the following equations.
(12) 
s.t.
(13)  
(14)  
(15) 
The 3D translational motion of each MAV is governed partly by the acceleration control inputs and partly by an external control input at each horizon time step. The state-space translational dynamics of the robot are given by (13). The dynamics and control transfer matrices are given by,
(16) 
where the sampling time parameterizes the discretization. The real-time computed external control input represents the non-convex constraints of formation configuration and obstacle avoidance. The following section details the computation of these forces.
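The per-axis dynamics in (13) and (16) correspond to a standard discretized double integrator. The sketch below uses this standard form with an illustrative sampling time; the paper's exact matrices in (16) may differ in detail:

```python
import numpy as np

dt = 0.1  # sampling time; the value here is illustrative only

# Per-axis double-integrator: state [position, velocity], driven by the
# MPC acceleration u plus the external control input f_ext.
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2],
              [dt]])

def step(x, u, f_ext):
    """One horizon step of (13): x_{t+1} = A x_t + B (u_t + f_ext_t)."""
    return A @ x + B.flatten() * (u + f_ext)
```

Note that the external input enters through the same channel as the acceleration command, which is what lets the non-convex terms influence the dynamics without appearing as constraints in the QP.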
III-E Computation of External Control Input
A constraint enforcing angular configurations w.r.t. other MAVs involves computing the angle of each neighboring MAV w.r.t. the target person using an inverse tangent function, which is a non-convex operation. Moreover, avoiding collisions by maintaining a minimum distance w.r.t. (a) other MAVs, (b) obstacles in the environment or (c) the tracked person constrains the robot to operate in non-convex regions. In our previous work [14], we introduced the concept of converting non-convex constraints into repulsive potential field force values. These exact values are computed at every time step and fed into the MPC dynamics to enforce the non-convex dependencies and ensure quick convergence to an optimal solution. In this work, we utilize these potential field values to enforce the formation angular configuration and collision avoidance. The repulsive potential field function used in this work is referred to as the cotangential repulsive force function. The cotangential repulsive force acting on a MAV due to a non-convex constraint with respect to an entity (teammate, obstacle or target person) is as given below.
(17) 
where the distance metric is chosen as either the absolute angular difference or the Euclidean distance between the MAV and the entity. (17) varies hyperbolically with this distance. The potential field introduces two threshold radii: the outer radius defines the region of influence of the potential field, and the inner radius is the distance below which the value of the potential field tends to infinity. In practice, the force is clamped to a large positive value, which is derived to guarantee obstacle avoidance (see Sec. III-E3).
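A plausible implementation of a cotangential repulsive magnitude with the qualitative properties described above (zero beyond the outer radius, hyperbolic growth towards the inner radius, clamped below it) is sketched here; the paper's exact expression (17) may differ:

```python
import math

def cotangential_force(d, r_min, r_max, f_clamp):
    """Repulsive force magnitude as a function of distance d.
    Zero beyond r_max, unbounded (clamped at f_clamp) as d -> r_min.
    This reproduces the described shape of (17), not its exact form."""
    if d >= r_max:
        return 0.0
    if d <= r_min:
        return f_clamp
    # Map (r_min, r_max) onto (0, pi/2); cot(0+) -> infinity, cot(pi/2) = 0.
    phase = 0.5 * math.pi * (d - r_min) / (r_max - r_min)
    return min(1.0 / math.tan(phase), f_clamp)
```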
III-E1 Active Tracking Input
To satisfy the inter-MAV angular configuration constraints, we compute (i) the angle of the robot w.r.t. the tracked person (we overload this notation to represent the angle of the MAV in the world frame w.r.t. the person for the remainder of this paper), (ii) the angle of each neighboring MAV w.r.t. the tracked person, and (iii) the distance metric for (17) as the absolute angular difference, at each horizon time step. We then apply a cotangential force along the direction normal (in the plane of MAV motion) to the MAV's approach towards the target person. The neighbor's angle is computed using the MAV's horizon motion plan and the teammate's communicated horizon motion plan. The total active tracking force per teammate is computed as follows.
(18) 
The scaling factor in (18) helps avoid field local minima: if two robots have similar angles of approach, the MAV farther away from the desired distance is repelled with a higher force than the MAV near the desired distance (refer to [14] for a detailed explanation of how these forces avoid the local-minima problems associated with potential-field-based planners). A small positive constant ensures that the force magnitude is non-zero at the target surface if the desired angular difference is not yet achieved. The force acts in a direction normal to the direction of approach to the target person. There are two possible direction choices in the plane of approach towards the target person; the direction pointing away from the neighboring MAV is chosen to ensure a natural deadlock resolution for robots having similar angles of approach to the target person (see horizon motion plans in Fig. 5). The total active tracking external control input of a MAV is then computed as follows.
(19) 
Notice that the non-convex constraint requiring the inequality of two angles and maintaining a desired inter-MAV angular difference is converted into an external control input, computed at every iteration of the MPC and used in (13), which preserves the convexity of the optimization. For MAVs, the value of for the active tracking force.
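The non-convexity stems from the inverse tangent: the approach angle about the person and its wrapped difference can be computed as below (a minimal sketch with our own function names):

```python
import math

def approach_angle(p_mav, p_person):
    """Angle of the MAV about the person in the horizontal plane;
    the atan2 that makes the angular constraint non-convex."""
    return math.atan2(p_mav[1] - p_person[1], p_mav[0] - p_person[0])

def angular_difference(a, b):
    """Absolute angular difference wrapped to [0, pi]; used as the
    distance metric for the active tracking force."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)
```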
III-E2 Obstacle Avoidance Input
We use (17) to compute external control inputs for obstacle avoidance, the formulation of which is discussed below.
Dynamic Obstacle Avoidance:
The teammates of a MAV are dynamic obstacles for that MAV. The Euclidean distance between a MAV and a teammate is computed using the MAV's horizon motion plan and the teammate's communicated horizon motion plan. The external control input acts along the unit vector pointing away from the teammate MAV's horizon motion plan. The total dynamic obstacle avoidance external input is as follows.

(20)
The choices of the potential field radii for dynamic obstacle avoidance are discussed in Sec. III-E3.
Static Obstacle Avoidance:
We consider M static obstacles in the environment. To compute the external input w.r.t. an obstacle, the Euclidean distance between the MAV's horizon motion plan and the location of the static obstacle is computed. The total static obstacle avoidance force is as given below.
(21) 
Further, multiple static obstacles placed close together might cause the MAV to get stuck in a field local minimum (as shown in [14]). To avoid such scenarios, we compute a force which penalizes MAV approach angles w.r.t. the target person that have static obstacles in the line of approach. The computation of this force is similar to that of the active tracking force described in the previous section.
Target Person Obstacle Avoidance:
The robots must also maintain a safety distance w.r.t. the target person. The objective of the model-predictive controller guides the MAVs towards this safety distance but does not by itself guarantee the safety of the person. To guarantee collision avoidance, we generate a repulsive potential field around the target person: we treat the person as another obstacle, with an external input computed similarly to (21), so that the MAVs are repelled away from the tracked person.
The total external control input for obstacle avoidance is the sum of all the aforementioned obstacle-avoiding potential field forces, as given below.
(22) 
Subsequently, the total external control input acting on MAV is determined.
(23) 
The following subsection details the considerations for choosing the threshold radii and the force clamp value to guarantee obstacle avoidance in stochastic environments.
III-E3 Obstacle Avoidance Guarantee
Fig. 5 details a typical obstacle avoidance maneuver of two MAVs running the model-predictive controller. To ensure a stable repulsive force and guarantee obstacle avoidance, we take into account the following important considerations:
- the maximum tracking error of the low-level controller w.r.t. the generated waypoint for one time step,
- the maximum velocity of the robot,
- the self-localization uncertainty of the MAV,
- the localization uncertainty of neighboring MAVs,
- communication delays or packet loss with neighboring MAVs, and
- the maximum cotangential force magnitude.
To guarantee obstacle avoidance, the threshold radii of the potential field are defined as follows,
(24)  
(25) 
where the empirically determined maximum tracking error in one time step along the direction of the waypoint and the maximum translational motion of the robot in any direction are accounted for, together with the length of the maximum eigenvalue of the MAV's self-localization covariance and an experimentally determined maximum possible localization uncertainty for a neighboring MAV. Wireless communication delays are unavoidable in real-time implementations and cannot be ignored, as they affect a MAV's estimate of its neighbors' positions. The inner threshold radius therefore increases in proportion to the communication delay between two MAVs: for a given delay, it increases by the maximum distance a neighboring MAV can travel in that time. Finally, the maximum value of the cotangential force should be at least equal to the maximum control input. Therefore, the resulting vector sum acts along a direction pointing away from the obstacle.
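A hedged sketch of how the error sources above could inflate the inner clamping radius; the paper's exact combination in (24)-(25) may differ, and all parameter names here are ours:

```python
def min_clamp_radius(d_safe, eps_track, v_max, dt,
                     sigma_self, sigma_neighbor, t_delay):
    """Worst-case inflation of the clamping radius: the physical safety
    distance plus every listed error source (tracking error, one-step
    motion, 3-sigma pose uncertainties, and motion during the
    communication delay). An illustrative combination, not (24)-(25)."""
    return (d_safe
            + eps_track              # low-level controller tracking error
            + v_max * dt             # motion within one time step
            + 3.0 * sigma_self       # own localization uncertainty
            + 3.0 * sigma_neighbor   # neighbor localization uncertainty
            + v_max * t_delay)       # neighbor motion during comm delay
```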
III-F Algorithm for Active Perception Based Formation Control
Algorithm 1 summarizes the steps of our proposed decentralized formation controller. In line 1, the current angle w.r.t. the fused estimate of the target person's position is computed. In line 2, the desired robot position is computed using this angle, the fused estimate of the person's position from the EKF, and the desired horizontal distance to the target person. Since the MPC state vector also includes the current velocity, we minimize the difference between the robot velocity and the EKF-estimated person velocity. In lines 3 and 4, an external control input is computed over the horizon to enforce the non-convex formation constraints and ensure obstacle avoidance. In line 5, the external input is bounded to guarantee obstacle avoidance (discussed in Sec. III-E3). In line 6, the convex model-predictive optimization (12) is numerically evaluated. The output motion plan consists of a set of acceleration control inputs and a horizon trajectory. In line 7, the predicted position and velocity for the horizon are used as desired inputs to the low-level flight controller.
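The structure of Algorithm 1 can be mirrored in a short sketch, with the MPC solver and the external-input computation abstracted as callables (all names are ours; this is a structural sketch, not the paper's implementation):

```python
import numpy as np

def formation_control_step(p_k, v_k, person_pos, person_vel,
                           d_safe, h_des, horizon,
                           solve_mpc, external_input):
    """One iteration of the decentralized controller, mirroring the
    line-by-line structure of Algorithm 1. `solve_mpc` and
    `external_input` are caller-supplied callables."""
    # Line 1: current angle w.r.t. the fused person position estimate.
    theta = np.arctan2(p_k[1] - person_pos[1], p_k[0] - person_pos[0])
    # Line 2: desired position on the safety surface at that angle.
    p_des = person_pos + np.array([d_safe * np.cos(theta),
                                   d_safe * np.sin(theta), h_des])
    # Lines 3-5: bounded external input over the horizon.
    f_ext = external_input(p_k, horizon)
    # Lines 6-7: solve the convex MPC and return the motion plan.
    return solve_mpc(p_k, v_k, p_des, person_vel, f_ext)
```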
IV Experiments and Results
IV-A Real Robot Experiments
IV-A1 Setup
We use three self-developed and customized octocopter MAVs in the field experiments. Each MAV is capable of running the CDT algorithm (Sec. III-B) onboard and in real time, including the deep neural network. The MAVs are equipped with an Intel i7 CPU, an NVidia Jetson TX1 GPU and an Open Pilot Revolution flight controller system on a chip. The MAVs use onboard GPS and IMU data, logged at 100 Hz, for localization. We place an Emlid Reach differential GPS receiver on each MAV and on either shoulder of the tracked person to acquire ground truth locations. The data from the differential GPS is not used during the real-time experiments and is stored for comparison only. For a decentralized implementation, a ROS multi-master setup with communication over WiFi is used. Each MAV continuously captures and stores the acquired camera images. The onboard GPU runs SSD Multibox [17]. The model-predictive controller is evaluated using the CVXGEN convex optimization library, with state, velocity and control limits enforced for each MAV over the planning horizon. The distance to the target and the desired height above the joint target estimate's centroid are fixed in all experiments. We observe that, on average, the tracking error w.r.t. the MPC-generated waypoint is small, and the inner threshold radius is computed accordingly using (24); the radius w.r.t. neighboring MAVs varies with changes in self-localization uncertainty and communication losses. The task of each MAV in all experiments is to track the person using the CDT pipeline while running the proposed decentralized active-perception-based target tracking algorithm. We conducted five real-MAV tracking experiments (RE 1-RE 5), each with a 3-MAV formation, as follows.
In each experiment, the person remains stationary at first. Data collection starts after all MAVs have acquired visual line of sight to the person and converged on a stable formation around him/her. The person then walks randomly at moderate speeds. Data collection for each experiment is stopped once any of the MAVs indicates a low battery status. The MAVs are then manually landed.
After the experiments, the error in the tracked position estimate is calculated as the 3D Euclidean distance between the estimated position and the corresponding ground truth provided by the differential GPS system. The error in each MAV's pose is calculated similarly.
IV-A2 Analysis of Results
Fig. 8 compares the ground truth and the tracked trajectory, with the estimated uncertainty, for Exp. RE 3 using our proposed approach. We analyzed the person tracking accuracy, tracking uncertainty and MAV self-pose accuracy for all experiments in Fig. 7(a-b). Exp. RE 3 achieves significantly more accurate person track estimates than the state-of-the-art method in Exp. RE 1, despite the worse self-pose estimates of the MAVs in Exp. RE 3 (the self-localization errors were due to high errors in the MAVs' GPS localization and may vary arbitrarily over different experiments). This showcases the ability of our approach to attain high tracking accuracy despite bad GPS reception.
Even though the state-of-the-art approach in Exp. RE 2 achieves accuracy similar to that of our approach in Exp. RE 3, our approach outperforms it in keeping the person always in the camera image and close to the image center, as shown by the bar and box plots in Fig. 7(c-d). Moreover, our approach achieves the lowest uncertainty in the tracked person estimate compared to all the other state-of-the-art methods.
The presence of obstacles that the MAVs need to navigate around affects the target tracking and self-pose accuracy in Exps. RE 4 and RE 5. This is mainly due to the additional maneuvering overhead of avoiding obstacles. However, the ability to keep the person close to the center of the image is only slightly affected.
Fig. 7(c-d) compares the different approaches' ability to keep the tracked person centered in each MAV's camera view. Our approach in Exp. RE 3 not only reduces the average distance between the tracked person and each camera's image center, but also ensures that the person is completely covered by the camera image in the vast majority of camera frames. This is a crucial property if the formation is used for aerial motion capture.
IV-B Simulation Experiments
IV-B1 Setup
The proposed algorithm was simulated in the Gazebo environment [18]. An identical algorithm codebase is used for experimentation on both real and simulated MAVs. The simulations were conducted on a standalone Intel i7-3970X CPU with an NVIDIA GTX 1080 GPU. The environment consists of multiple AscTec Firefly hexarotor MAVs confined to operate in a bounded world. Each MAV is equipped with a rigidly attached Asus Xtion camera, with its parameters (resolution and focal length) set to those of the real MAV cameras. To be as close as possible to real scenarios, we simulate GPS and IMU drift by imposing a random-walk offset on the ground truth position. We simulate a human actor model and guide it along a predefined trajectory with random motions. The actor traverses randomly varying terrain interspersed with emulated trees. The trees create obstacles and occlusions for the hovering MAVs and make it challenging to track the person effectively.
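The random-walk drift emulation can be sketched as a cumulative sum of Gaussian steps added to ground-truth positions (the noise scale is an assumed parameter, not a value from the paper):

```python
import numpy as np

def random_walk_drift(n_steps, step_std, rng=None):
    """Emulate GPS/IMU drift as a cumulative 3D random walk to be added
    to ground-truth positions; step_std is an assumed per-step noise
    scale in meters."""
    rng = np.random.default_rng(rng)
    steps = rng.normal(0.0, step_std, size=(n_steps, 3))
    return np.cumsum(steps, axis=0)
```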
IV-B2 Comparison of methods
We conducted 10 simulation runs for each of the following methods with a MAV formation.
- SE 1: Approach of Price et al. [1].
- SE 2: DQMPC (Tallamraju et al. [14]).
- SE 3: Our approach.
- SE 4: Our approach with all MAVs avoiding emulated virtual obstacles.
Each run was 120 seconds long.
Figure 9 shows the statistics of this set of experiments. Our proposed approach, with or without static obstacle avoidance, clearly outperforms both state-of-the-art approaches. Moreover, when static obstacles are absent, our approach keeps the person fully within the camera image frame for almost of the experiment duration. It also significantly outperforms all other methods in minimizing the joint uncertainty in the target position estimate.
IV-B3 Scalability Experiments
Fig. 10(a) shows target-tracking results with our method for an increasing number of robots. We conducted 10 simulation trials for each robot-count configuration. The real-time factor for all runs in this experiment was set to . We validate the scalability of our model predictive controller by running these experiments in an environment with several static, known obstacles (in the form of trees) and perfect communication, for formations of 3 to 16 MAVs. We observe a nearly linear improvement in tracking error with a higher number of robots. At the same time, computational requirements did not affect real-time performance for up to 16 robots.
IV-B4 Experiments with Communication Failure
Fig. 10(b) demonstrates the effect of inter-robot communication failure on tracking for our approach, through experiments with 8 MAVs and simulated static obstacles. Communication losses varying from to were simulated, and the results were averaged over trials per communication-loss percentage. Tracking gets progressively worse with a higher percentage of communication loss. At communication loss, each robot relies only on its own detections and does not cooperatively improve the target position estimate. Nevertheless, we observe that our approach maintains an accurate target state estimate for up to 25% communication loss.
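A communication failure of a given percentage can be emulated by dropping each incoming neighbor message independently with that probability. The sketch below illustrates the idea only; the fusion by component-wise averaging is a deliberate simplification of the actual cooperative detection and tracking estimator, and all names are ours.

```python
import random

def fuse_measurements(own_meas, neighbor_meas, loss_prob, rng=None):
    """Fuse this robot's target measurement with neighbors' measurements,
    dropping each incoming message independently with probability loss_prob.

    own_meas: this robot's 3D measurement of the target, (x, y, z).
    neighbor_meas: list of neighbors' 3D measurements.
    Returns a fused estimate (component-wise mean of all received measurements).
    """
    rng = rng or random.Random(0)
    # Each neighbor message survives with probability (1 - loss_prob).
    received = [m for m in neighbor_meas if rng.random() >= loss_prob]
    # The robot's own detection is always available, even at 100% loss,
    # which is why tracking degrades gracefully rather than failing outright.
    all_meas = [own_meas] + received
    n = len(all_meas)
    return tuple(sum(m[i] for m in all_meas) / n for i in range(3))
```

At `loss_prob = 1.0` the function degenerates to the robot's own detection, mirroring the experiment in which each robot no longer benefits from cooperation.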
V Conclusions and Future Work
In this article we introduced a novel robotic front-end for an aerial mocap system consisting of multiple MAVs. We proposed a decentralized, convex MPC-based algorithm for the MAVs to actively track and follow a moving person in outdoor environments and in the presence of static and dynamic obstacles. In contrast to cooperatively tracking only the 3D position of the person, the MAVs now actively compute optimal local motion plans, resulting in optimal viewpoint configurations that minimize the uncertainty in the tracked estimate. We achieved this by decoupling the goal of active tracking into a convex quadratic objective and nonconvex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We showed how this decoupling is derived using Gaussian observation model assumptions within the cooperative detection and tracking module, and how we embed all the nonconvex constraints, including those for dynamic and static obstacle avoidance, as external control inputs in the MPC dynamics. We evaluated our approach through multiple rigorous real-robot field experiments in an outdoor scenario involving 3 self-built 8-rotor MAVs. These experiments validated our approach and showed that it significantly improves accuracy over previous methods. In simulations, we showed results with up to 16 MAVs, demonstrating the scalability of our method as well as its robustness to communication failures. Future work includes extending our method to full-body human pose detection and offline reconstruction of human pose and shape.
Acknowledgments
The authors would like to thank Igor Martinovic and Mason Landry for their help with the field experiments.
References
[1] E. Price, G. Lawless, R. Ludwig, I. Martinovic, H. H. Bülthoff, M. J. Black, and A. Ahmad, "Deep neural network-based cooperative visual tracking through multiple micro aerial vehicles," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3193–3200, Oct 2018.
[2] C. Huang, F. Gao, J. Pan, Z. Yang, W. Qiu, P. Chen, X. Yang, S. Shen, and K.-T. T. Cheng, "Act: An autonomous drone cinematography system for action scenes," in IEEE International Conference on Robotics and Automation (ICRA), 2018.
[3] X. Zhou, S. Liu, G. Pavlakos, V. Kumar, and K. Daniilidis, "Human motion capture using a drone," arXiv preprint arXiv:1804.06112, 2018.
[4] L. Xu, Y. Liu, W. Cheng, K. Guo, G. Zhou, Q. Dai, and L. Fang, "FlyCap: Markerless motion capture using multiple autonomous flying cameras," IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 8, 2018.
[5] T. Nägeli, S. Oberholzer, S. Plüss, J. Alonso-Mora, and O. Hilliges, "FlyCon: Real-time environment-independent multi-view human pose estimation with aerial vehicles," in SIGGRAPH Asia 2018 Technical Papers. ACM, 2018, p. 182.
[6] "AirCap: Perception-based control," http://ps.is.tuebingen.mpg.de/research_fields/robotperceptiongroup, accessed: 2019.
[7] F. Morbidi and G. L. Mariottini, "On active target tracking and cooperative localization for multiple aerial vehicles," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2011, pp. 2229–2234.
[8] K. Zhou, S. I. Roumeliotis et al., "Multi-robot active target tracking with combinations of relative observations," IEEE Transactions on Robotics, vol. 27, no. 4, pp. 678–695, 2011.
[9] K. Hausman, J. Müller, A. Hariharan, N. Ayanian, and G. S. Sukhatme, "Cooperative multi-robot control for target tracking with onboard sensing," The International Journal of Robotics Research, vol. 34, no. 13, pp. 1660–1677, 2015.
[10] K. Hausman, G. Kahn, S. Patil, J. Müller, K. Goldberg, P. Abbeel, and G. S. Sukhatme, "Occlusion-aware multi-robot 3D tracking," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 1863–1870.
[11] P. U. Lima, A. Ahmad, A. Dias, A. G. Conceição, A. P. Moreira, E. Silva, L. Almeida, L. Oliveira, and T. P. Nascimento, "Formation control driven by cooperative object tracking," Robotics and Autonomous Systems, vol. 63, pp. 68–79, 2015.
[12] D. Falanga, P. Foehn, P. Lu, and D. Scaramuzza, "PAMPC: Perception-aware model predictive control for quadrotors," arXiv preprint arXiv:1804.04811, 2018.
[13] A. N. Bishop, B. Fidan, B. D. Anderson, K. Doğançay, and P. N. Pathirana, "Optimality analysis of sensor-target localization geometries," Automatica, vol. 46, no. 3, pp. 479–492, 2010.
[14] R. Tallamraju, S. Rajappa, M. J. Black, K. Karlapalem, and A. Ahmad, "Decentralized MPC based obstacle avoidance for multi-robot target tracking scenarios," in Proceedings of the IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), 2018.
[15] S. J. Prince, Computer Vision: Models, Learning, and Inference. Cambridge University Press, 2012.
[16] R. C. Smith and P. Cheeseman, "On the representation and estimation of spatial uncertainty," The International Journal of Robotics Research, vol. 5, no. 4, pp. 56–68, 1986.
[17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision. Springer, 2016, pp. 21–37.
[18] F. Furrer, M. Burri, M. Achtelik, and R. Siegwart, Robot Operating System (ROS): The Complete Reference (Volume 1). Cham: Springer International Publishing, 2016, ch. RotorS—A Modular Gazebo MAV Simulator Framework, pp. 595–625. [Online]. Available: http://dx.doi.org/10.1007/9783319260549_23