Active Perception based Formation Control for Multiple Aerial Vehicles

by Rahul Tallamraju, et al.
IIIT Hyderabad

Autonomous motion capture (mocap) systems for outdoor scenarios involving flying or mobile cameras rely on i) a robotic front-end to track and follow a human subject in real time while he/she performs physical activities, and ii) an algorithmic back-end that estimates full-body human pose and shape from the saved videos. In this paper we present a novel front-end for our aerial mocap system that consists of multiple micro aerial vehicles (MAVs) with only on-board cameras and computation. In previous work, we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple MAVs. However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person's cooperatively tracked 3D position estimate. In this article we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can now actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking into a convex quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive it using Gaussian observation model assumptions within the CDT algorithm. We also show how we embed all the non-convex constraints, including those for dynamic and static obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented, along with a video link. Extensive simulation results demonstrate the scalability and robustness of our approach. ROS-based source code is also provided.








I Introduction

Aerial motion capture (mocap) of humans in unstructured and outdoor scenarios is a challenging research problem that has recently attracted widespread attention [1, 2, 3, 4, 5]. It facilitates human pose and shape analysis in sports medicine, high-accuracy cinematography, rescue operations in disaster scenarios, and other applications that require estimation of human 3D position or full-body skeleton pose and shape in a natural environment. Our aerial mocap system consists of a team of micro aerial vehicles (MAVs) autonomously detecting, tracking and following a person. In the online task, the MAVs cooperatively and continuously estimate the 3D global position of the person and always keep him/her centered in the field of view of their on-board cameras while he/she performs activities such as walking, running and jumping in an unstructured, outdoor environment. The online task implies only on-board and real-time computation, which is the core focus of this article. The offline task, which is not addressed in this paper, is to estimate full-body skeleton pose and shape from the on-board acquired and saved image stream.

Fig. 1: Multi-exposure image of three of our octocopter MAVs cooperatively tracking a fast walking person while maintaining a perception-driven formation using the approach proposed in this paper. The person walks from right to left. The initial positions of the MAVs in their trajectories are highlighted.

In our previous work [1]

, we developed a marker-less, deep neural network based cooperative detection and tracking (CDT) algorithm for our aerial mocap system. However, that work has two major shortcomings. First, the motion planner on each MAV, which computes the 3D way-points for the MAV to follow the person, does not consider the uncertainty in the tracked person’s 3D position estimate. This implies that the MAV formations are non-optimal view-point configurations for minimizing the jointly perceived target uncertainty. Second, the local motion plans generated by the MAVs using a model-predictive controller (MPC) are subsequently modified using potential fields for collision avoidance. This further leads to sub-optimal trajectories.

In this work, we solve both of the aforementioned problems through a novel convex decentralized formation controller based on MPC. It actively minimizes the joint uncertainty in the tracked person’s 3D position estimate while following him/her and generates safe motion plans which avoid dynamic and static obstacles. We achieve this by decoupling the joint uncertainty minimization into i) a convex quadratic objective that maintains a threshold distance to the tracked person, and ii) linear constraints that enforce angular configurations of the MAVs with respect to (w.r.t.) the person. We theoretically derive this decoupling based on Gaussian observation model assumptions within the CDT algorithm. Dynamic collision avoidance w.r.t. other MAVs, the tracked person and static obstacles is also achieved using linear constraints in the MPC. Both dynamic collision avoidance and angular configuration constraints are essentially non-convex. We preserve convexity in our MPC formulation by converting both of these constraints to external control input terms for the MPC dynamics, in addition to the default MPC control input. To determine the equivalent external control input, we use a repulsive potential field function to compute the corresponding force for each non-convex constraint, at every time step, over the whole planning horizon, and then feed it directly to the MPC dynamics. A feasible solution to the optimization guarantees obstacle avoidance and enables active minimization of the jointly estimated uncertainty in the person’s tracked state. We also provide an analysis of our collision avoidance guarantees that takes into account the real-time motion plans of neighboring agents, real-time message communication delays between agents and the uncertainties in the poses of the MAVs and the position of the tracked person.

Thus, the following are the key novelties in this article.

  • A convex MPC-based formation controller that actively minimizes the jointly estimated uncertainty in the tracked person’s 3D position, when he/she performs activities in an unstructured and outdoor environment,

  • guaranteed dynamic collision avoidance within a convex optimization framework,

  • open-source ROS-based [6] source code of our method.

We demonstrate the efficiency of our approach in both simulation and real robot experiments. Real robot experiments involving 3 MAVs in several challenging scenarios are presented, along with comparisons to state-of-the-art approaches. Figure 1 presents a multi-exposure image of a part of one of these experiments. Extensive simulation results demonstrate the robustness of our approach and its scalability w.r.t. the number of robots.

II State-of-the-Art

In this section, we review state-of-the-art literature in the context of multi-robot active tracking. In [7], multi-robot trajectories are analytically derived and uncertainty bounds are identified for optimizing the fused estimate of target position. In [8], a non-convex optimization problem is solved analytically to minimize the trace of the fused target covariance. However, their approach is centralized and uses Gauss-Seidel relaxation to compute optimal solutions for multiple robots. Centralized non-convex optimization [9, 10] is used to track a target in stochastic environments and locally minimize the fused estimate of the target position. In [11], active perception based formation control is addressed using a decentralized non-linear model predictive controller. However, their method is validated only for ground robots and only identifies sub-optimal control inputs, owing to the non-convexity of the optimization and the lack of collision avoidance guarantees. In [12], a perception-aware model-predictive controller generates real-time motion plans which maximize the visibility of a desired target. The motion plans are, however, generated only for a single aerial robot. A marker-based multi-robot aerial motion capture system is presented in [5], where formation control is achieved using a decentralized non-linear model-predictive controller. Scalability in the number of robots and formation collision avoidance behaviors are not explicitly addressed in their approach. Our previous work [1] uses convolutional neural networks for person detection and performs cooperative person detection using distributed extended Kalman filters. However, the person tracking formation controller used in [1] does not optimize motion plans to minimize the uncertainty in the joint estimate of target position. Furthermore, inter-agent and target collision avoidance is not guaranteed, which leads to sub-optimal and unsafe formation local motion plans.
Formation Configuration: In order to identify formation configurations which optimize the target observation, we refer to the analytically derived sensor geometries in [13]. These geometries provide a useful basis to leverage numerical optimization techniques for decentralized motion planning. Optimization ensures that the emergent behavior of the decentralized system of robots minimizes the uncertainty in fused target position estimate.
Obstacle Avoidance: In our previous work [14], we developed a convex optimization program to generate local collision-free motion plans while tracking a movable pick-and-place static target using multiple aerial vehicles. This approach generates fast, feasible motion plans and has linear computational complexity in the number of environmental obstacles. However, that work was validated only in a simulation environment, and obstacle avoidance was not guaranteed. In our current work, we consider real robots, stochasticity in the environment and bounds on repulsive potential fields to guarantee collision avoidance and generate safe local motion plans.

III Proposed Approach

In this section we present our proposed approach for active perception based formation control. Sec. III-A introduces the system setup and the mathematical notations used in the paper. Sec. III-B details the modules involved in the cooperative detection and tracking algorithm used for this work. Sec. III-C derives the person observation model for each MAV and identifies multi-robot configurations which jointly minimize uncertainty in target person’s state estimation. Sec. III-D uses these models and formation constraints from Sec. III-C to formulate an MPC for decentralized motion planning.

III-A Preliminaries and Problem Statement

Fig. 2: Formation of MAVs with the mathematical notations used in this paper.

Let there be K MAVs tracking a person P. The pose (position and orientation) of each MAV in the world frame is maintained at every time step, along with its pose uncertainty covariance matrix. Each MAV has an on-board, monocular, perspective camera; it is important to note that the camera is rigidly attached to the MAV's body frame. The observation measurement of the person's centroid made by a MAV in its local spherical coordinates is given by a mean range, bearing and azimuth (see Fig. 2). We assume zero correlation for measurements along these axes. The measurements are associated with zero-mean measurement noise along each axis, with corresponding quantities in local Cartesian coordinates. Each MAV's fused estimate of the person's centroid position and uncertainty covariance are expressed in the world frame. These estimates are computed by fusing the person observation measurements made by the MAV itself with the measurements received from its teammates j.

The MAVs operate in an environment with known static obstacles (our approach is agnostic to how the static obstacles are detected) and with neighboring MAVs as dynamic obstacles. The goal of each MAV is to cooperatively track the target person using a replicated instance of the proposed formation control algorithm. This involves (a) minimizing the MAV's fused estimate of the measurement uncertainty covariance and (b) avoiding environmental obstacles based on the positions of the M static obstacles and the horizon motion plans received from its teammates.

III-B Cooperative Detection and Tracking

Fig. 3 details the various modules used in our approach. The MAVs detect a moving person using a deep neural network based markerless CDT algorithm, which was developed and presented in our previous work [1]. As part of CDT, each vehicle runs its own instance of an extended Kalman filter (EKF) for fusing detections made by the MAV and all its teammates, which results in a replicated joint state estimate of the person at each MAV. This estimate is then used to predict a region of interest (ROI) in future camera frames. The ROI guides the yaw of each MAV, thereby ensuring that the target person is in the MAV's field of view. In Section III-D, we use these joint state estimates to drive our decentralized model-predictive formation controller. In [1], the focus was on developing the cooperative person detection and ROI prediction algorithms, and the contribution was in the green blocks. The work in this paper focuses on developing the module in blue, which utilizes information from the other modules. At every time step, the controller of MAV k generates way-point positions and velocities for itself using (a) the jointly estimated position and velocity of the tracked person, (b) the predicted horizon motion plans communicated by teammates and (c) the positions of obstacles. The generated way-points guide the low-level position and yaw controller of MAV k.

Fig. 3: CDT Architecture. Our current work focuses on the module in blue
Fig. 4: (a) Optimal angular configuration for MAVs. (b) First and second figures represent arbitrary configurations with large joint uncertainties, while the third is another optimal configuration but does not allow complete coverage. (c) Variation of the trace of fused covariance for different angular configurations.

III-C Observation Measurement Models and Joint Uncertainty Minimization

In this subsection, we first derive a person observation model for a single MAV in its local-frame Cartesian coordinates. Subsequently, we state assumptions for determining the MAV observation model in our scenario. Finally, for multiple MAVs, we identify formation objectives and constraints which facilitate the minimization of the fused uncertainty.

III-C1 Observation Measurement Model for a MAV

To derive the person observation model for a single MAV, we drop the superscripts and subscripts for the spherical and Cartesian coordinate measurements introduced in the schematic of Fig. 2. The variance in the range measurement evolves quadratically with the distance to the person, because the projected size of the person in the image plane varies quadratically with that distance [15]. Secondly, the change in the person's projected size on the image plane is negligible with independent variations in the bearing and azimuth angles. Hence, we can assume constant variances w.r.t. these angles; these variances are positive constants specific to the system. We now derive the MAV observation model in local Cartesian coordinates using the following state equations.


where the random variables along the three measurement axes are uncorrelated. (2) lists the expected values for functions of normally distributed random variables, where the coefficients are constants. Using (1), we derive the intermediate expressions below.

Using equations (1)-(5), we derive the person uncertainty covariance model in Cartesian coordinates as follows.
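The shape of this covariance model can be reproduced numerically. The following is a minimal sketch, not the paper's exact derivation: the constants `k_r`, `var_theta` and `var_phi` are hypothetical system-specific parameters, the range variance grows quadratically with distance, and the Cartesian covariance is obtained by linearizing the spherical-to-Cartesian transform.

```python
import numpy as np

def spherical_to_cartesian_cov(d, theta, phi, k_r=0.01,
                               var_theta=1e-4, var_phi=1e-4):
    """Approximate the person-observation covariance in local Cartesian
    coordinates. k_r, var_theta, var_phi are hypothetical constants; the
    range variance k_r * d**2 grows quadratically with the distance d."""
    # Spherical measurement covariance (axes assumed uncorrelated)
    S = np.diag([k_r * d**2, var_theta, var_phi])
    # Jacobian of (x, y, z) = d * (cos(phi)cos(theta),
    #                              cos(phi)sin(theta), sin(phi))
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    J = np.array([
        [cp * ct, -d * cp * st, -d * sp * ct],
        [cp * st,  d * cp * ct, -d * sp * st],
        [sp,       0.0,          d * cp],
    ])
    return J @ S @ J.T  # first-order Cartesian covariance
```

Under this sketch, the trace of the Cartesian covariance grows with the distance to the person, which is the property the formation controller later exploits.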

III-C2 Assumptions in Our Case Scenario

We make the following assumptions on the MAV observation model.
Bearing: We employ a separate yaw controller [1] to guide the orientation of each MAV towards the target person. Due to this, we assume that the measured bearing is almost always approximately zero.
Azimuth: We assume that the azimuth angle remains close to a constant value, as observed in Fig. 2. The rationale for this assumption is that, in our approach, the MAVs always maintain a constant altitude above the person. The only change that affects the radial distance to the person is the horizontal distance to him/her from the MAVs. Although the change in the radial distance with horizontal distance is not negligible, the corresponding change in the azimuth is small enough to assume it constant. Therefore, the expected values of the measurements along the range, bearing and azimuth are assumed accordingly. Using the aforementioned assumptions for our scenario, the measurement noise for a single MAV is as follows.


III-C3 Minimizing Fused Uncertainty for MAV Formations

The minimum uncertainty in the person’s fused state estimate is achieved when the following conditions are met.

  1. The angles between the MAVs about the person's estimated position are at the optimal separation.
    Given measurements from observations of different robots (or sensors), [13] analytically derives optimal sensor geometries for unbiased and efficient passive target localization. Geometries which maximize the determinant of the Fisher information matrix are identified. For K independent measurements, the analysis yields two families of inter-sensor angles resulting in minimum fused uncertainty. We validate this analysis by computing the merged covariance of the individual MAV measurement uncertainties, using the recursive covariance merging equations from [16]. The angular configurations which minimize the trace of the merged covariance are highlighted in yellow boxes of Fig. 3(c).

    The angular configurations which minimize the merged covariance are as given in Fig. 3(a) (this also corresponds to the red-encircled box in Fig. 3(c)). Fig. 3(b) illustrates some other configurations. Out of these, Fig. 3(b) 1 and 2 are examples of non-optimal configurations. Fig. 3(b) 3 is an example of several other minimum covariance configurations. However, in this work the angular configuration of Fig. 3(a) is used. This is because it allows the tracked person to be viewed uniformly from different directions. Also, it reduces the possibility of losing the target person to environmental occlusions. Moreover, as our system is motivated by the application of outdoor motion capture, observing an asymmetric target (at least in one axis) from different directions increases visual coverage.

  2. The measurement uncertainty for each individual MAV is minimized.
    The previous angular constraint condition minimizes the fused uncertainty for any given set of measurements from the MAVs. However, the fused covariance remains a function of each individual MAV's measurement uncertainty, which in turn is a function of the relative position of the MAV w.r.t. the tracked person. Notice that each MAV's position remains controllable without changing the angular configuration of the whole formation. Therefore, the fused uncertainty is completely minimized by minimizing the trace of the measurement uncertainty covariance of each MAV. How this is done is discussed in the following subsection.
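The effect of the angular configuration on the fused covariance can be reproduced with a small numerical experiment. The sketch below uses simple information-form fusion as a stand-in for the recursive covariance merging of [16], and the variances are illustrative, not the system's:

```python
import numpy as np

def sensor_cov(angle, var_range=0.5, var_tangential=0.05):
    """2D covariance of one range-bearing sensor viewing the target from
    `angle`: elongated along the viewing ray, tight tangentially."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([var_range, var_tangential]) @ R.T

def fused_trace(angles):
    """Trace of the fused covariance under information-form fusion:
    sum the inverse covariances, invert the sum, take the trace."""
    info = sum(np.linalg.inv(sensor_cov(a)) for a in angles)
    return np.trace(np.linalg.inv(info))
```

Under these assumptions, three viewing directions spread uniformly about the target (e.g. `fused_trace([0.0, 2*np.pi/3, 4*np.pi/3])`) yield a lower fused trace than a clustered formation (e.g. `fused_trace([0.0, 0.2, 0.4])`), consistent with the analysis of [13].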

III-D Decentralized Quadratic Model Predictive Control

The trace of the covariance model derived in Sec. III-C2 is as shown below.


Minimizing (9) ensures that each MAV improves its person observation measurement uncertainty. However, doing so naively would lead the MAVs to collide with the target person and with each other. In order to ensure the safety of the target person, we instead drive each MAV towards a desired circular safety surface centered on the target person, at the desired height above him/her. Rewriting (9) in terms of the MAV's position and a desired position chosen on the safety surface, we obtain the following equations.


The desired position is chosen along the MAV's current orientation to the target person. We define the objective of our MPC running on each MAV k as follows. Minimize,

  • the distance to the desired target surface (safety surface) around the tracked person,

  • the difference in velocity between the MAV and the fused velocity estimate of the target person and,

  • MAV control effort.

Consequently, the optimization objective for active tracking using MPC is as given below.


The first part of this objective minimizes the input control signal, and the second part ensures that the distance between the desired terminal state (position and velocity) and the final horizon robot state is minimized. In order to ensure continuous and smooth person tracking, we minimize the difference between the velocity of the MAV and the fused velocity estimate of the person available from the EKF. The terminal weight is experimentally determined, and the weight matrices are custom-defined positive definite matrices. Furthermore, the active tracking MPC is subject to the following constraints.

  • A non-convex constraint of maintaining a desired angular difference w.r.t. the other MAVs about the fused estimate of the person's position,

  • a non-convex constraint to maintain at least a minimum distance from other obstacles, and

  • regional and control saturation bounds on the MAV states and control inputs over the horizon.

The non-convex constraints are embedded into the MPC dynamics as repulsive potential field forces [14]. The computation of these forces is discussed in Sec. III-E. The convex MPC is thus defined by the following equations.


The 3D translational motion of each MAV is governed partly by the default acceleration control input and partly by an external control input at each horizon time step. The state-space translational dynamics of the robot is given by (13). The dynamics and control transfer matrices are given by,


where the sampling time defines the discretization. The external control input is computed in real time and represents the non-convex constraints of formation configuration and obstacle avoidance. The following section details the computation of these forces.
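How the external control input enters the dynamics while keeping the program convex can be sketched as follows. This is an unconstrained, numpy-only illustration of the horizon optimization: the paper's CVXGEN implementation additionally enforces state and control saturation bounds, and the weights, horizon length and helper name `plan_horizon` are illustrative assumptions, not the paper's.

```python
import numpy as np

def plan_horizon(x0, x_des, f_ext, N=15, dt=0.1, lam=0.1, w_term=100.0):
    """Sketch of the horizon optimization for one MAV: discrete
    double-integrator dynamics with a precomputed external control input
    f_ext (3 x N) added to the default MPC input at every step."""
    A = np.block([[np.eye(3), dt * np.eye(3)],
                  [np.zeros((3, 3)), np.eye(3)]])
    B = np.vstack([0.5 * dt**2 * np.eye(3), dt * np.eye(3)])
    # Batch form: x_N = A^N x0 + G (u + f), with G stacking A^{N-1-t} B
    G = np.hstack([np.linalg.matrix_power(A, N - 1 - t) @ B
                   for t in range(N)])
    f = f_ext.T.reshape(-1)                 # flatten horizon of external inputs
    drift = np.linalg.matrix_power(A, N) @ x0 + G @ f
    # Minimize w_term * ||x_N - x_des||^2 + lam * ||u||^2 (ridge least squares)
    H = w_term * (G.T @ G) + lam * np.eye(3 * N)
    u = np.linalg.solve(H, w_term * G.T @ (x_des - drift))
    x_N = drift + G @ u
    return u.reshape(N, 3).T, x_N
```

Because `f_ext` appears as a known additive term in the dynamics, the optimization over the default input remains a convex quadratic program, which is the key property the formulation above relies on.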

III-E Computation of the External Control Input

A constraint for enforcing angular configurations w.r.t. other MAVs involves computing the angle of each neighboring MAV w.r.t. the target person using an inverse tangent function, which is a non-convex operation. Moreover, avoiding collisions by maintaining a minimum distance w.r.t. (a) other MAVs, (b) obstacles in the environment or (c) the tracked person constrains the robot to operate in non-convex regions. In our previous work [14], we introduced the concept of converting non-convex constraints into repulsive potential field force values. These values are computed at every time step and fed into the MPC dynamics to enforce the non-convex dependencies and ensure quick convergence to an optimal solution. In this work, we utilize these potential field values to enforce the formation angular configuration and collision avoidance. The repulsive potential field function used in this work is referred to as the cotangential repulsive force function. The cotangential repulsive force acting on a MAV due to a non-convex constraint with respect to an entity (teammate, obstacle or target person) is as given below.


where the argument is a chosen distance metric (absolute angular difference or Euclidean distance) between the MAV and the entity. (17) varies hyperbolically with this distance. The potential field introduces two threshold radii: the outer radius defines the region of influence of the potential field, and the inner radius is the distance below which the value of the potential field tends to infinity. Practically, this is clamped to a large positive value which is derived to guarantee obstacle avoidance (see Sec. III-E3).
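A minimal sketch of such a cotangential repulsive magnitude follows. The exact expression used in the paper may differ; this reproduces the qualitative shape described above: zero beyond the outer radius, hyperbolic growth as the distance approaches the inner radius, and a clamp below it.

```python
import numpy as np

def cotangential_force(d, d_inner, d_outer, f_clamp=1e3):
    """Illustrative cotangential repulsive magnitude for distance d.
    d_outer is the region of influence; below d_inner the field would
    tend to infinity, so it is clamped to f_clamp."""
    if d >= d_outer:
        return 0.0                    # outside the region of influence
    if d <= d_inner:
        return f_clamp                # clamped near-collision value
    # map d in (d_inner, d_outer) to (0, pi/2); cot(s) -> inf as s -> 0
    s = (d - d_inner) / (d_outer - d_inner) * (np.pi / 2)
    return min(1.0 / np.tan(s), f_clamp)
```

The clamp is what later permits a finite bound on the external control input while still guaranteeing that the repulsion dominates the attractive terms near the inner radius.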

III-E1 Active Tracking Input

To satisfy the inter-MAV angular configuration constraints, we compute (i) the angle of the robot w.r.t. the tracked person (we overload this notation to represent the angle of the MAV in the world frame w.r.t. the person until the end of this paper), (ii) the angle of each neighboring MAV w.r.t. the tracked person, and (iii) the distance metric for (17) as the absolute angular difference at each horizon time step. We then apply a cotangential force along the direction normal (in the plane of MAV motion) to the MAV's approach towards the target person. The neighbor's angle is computed using the MAV's horizon motion plan and the teammate's communicated horizon motion plan. The total active tracking force per teammate is computed as follows.


The scaling factor in (18) helps avoid field local minima: if two robots have similar angles of approach, the MAV farther away from the desired distance is repelled with a higher force than the MAV near the desired distance (refer to [14] for a detailed explanation of how these forces avoid the field local-minima problems associated with potential field based planners). A small positive constant ensures that the force magnitude is non-zero at the target surface if the desired angular difference is not yet achieved. The force acts in a direction normal to the direction of approach to the target person. There are two possible normal directions in the plane of approach towards the target person. The direction pointing away from the neighboring MAV is chosen to ensure a natural deadlock resolution for robots having similar angles of approach to the target person (see horizon motion plans in Fig. 5). The total active tracking external control input of the MAV is then computed as follows.


Notice that the non-convex constraint requiring the inequality of two angles and maintaining a desired inter-MAV angular difference is converted into an external control input computed at every iteration of the MPC (used in (13)), which preserves the convexity of the optimization. For K MAVs, the region of influence of the active tracking force is chosen accordingly.
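The choice of normal direction described above can be sketched as follows. `active_tracking_force` is a hypothetical helper, and the `magnitude` argument stands in for the cotangential force value computed from the angular difference:

```python
import numpy as np

def active_tracking_force(p_k, p_j, p_person, magnitude):
    """Sketch of the active-tracking repulsion for MAV k w.r.t. teammate j:
    the force acts normal (in the horizontal plane) to MAV k's direction of
    approach to the person, pointing away from the neighboring MAV."""
    approach = p_person[:2] - p_k[:2]
    approach = approach / np.linalg.norm(approach)
    normal = np.array([-approach[1], approach[0]])  # one of the two normals
    # choose the normal pointing away from teammate j (deadlock resolution)
    if normal @ (p_j[:2] - p_k[:2]) > 0:
        normal = -normal
    return magnitude * np.append(normal, 0.0)       # horizontal-plane force
```

Because the force is tangential to the safety surface, it rotates the MAV about the person towards the desired angular separation without pushing it off the desired distance.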

Fig. 5: Schematic of an obstacle avoidance maneuver by MAVs using the proposed MPC and repulsive potential fields in one iteration of Algorithm 1

III-E2 Obstacle Avoidance Input

We use (17) to compute external control inputs for obstacle avoidance, the formulation of which is discussed below.
Dynamic Obstacle Avoidance: The teammates of a MAV are dynamic obstacles for that MAV. The Euclidean distance between a MAV and its teammate is computed using the MAV's horizon motion plan and the teammate's communicated horizon motion plan. The external control input acts along a unit vector pointing away from the teammate MAV's horizon motion plan. The total dynamic obstacle avoidance external input is as follows.


The choices of the threshold radii for dynamic obstacle avoidance are discussed in Sec. III-E3 below.
Static Obstacle Avoidance: We consider static obstacles in the environment. To compute the external input w.r.t. an obstacle, the Euclidean distance between the MAV's horizon motion plan and the location of the static obstacle is computed. The total static obstacle avoidance force is as given below.


Further, multiple static obstacles placed close together might cause the MAV to get stuck in a field local minimum (as shown in [14]). To avoid such scenarios, we compute a force which penalizes MAV approach angles w.r.t. the target person that have static obstacles in the line of approach. The computation of this force is similar to the active tracking force described in the previous section.
Target Person Obstacle Avoidance: The robots must also maintain a safety distance w.r.t. the target person. The objective of the model-predictive controller guides the MAVs to this safety distance but does not guarantee the safety of the person. To guarantee collision avoidance, we generate a repulsive potential field around the target person. We treat the person as another obstacle, with an external input computation similar to (21), so that the MAVs are repelled away from the tracked person. The total external control input for obstacle avoidance is the summation of all the aforementioned obstacle-avoiding potential field forces, as given below.


Subsequently, the total external control input acting on the MAV is determined.


The following subsection details considerations for choosing the threshold radii and the force clamp to guarantee obstacle avoidance in stochastic environments.

III-E3 Obstacle Avoidance Guarantee

Fig. 5 details a typical obstacle avoidance maneuver of two MAVs running the model-predictive controller. To ensure a stable repulsive force and guarantee obstacle avoidance, we take into account the following important considerations.

  • the maximum tracking error of the low-level controller w.r.t. the generated way-point for one time step,

  • the maximum velocity of the robot,

  • the self-localization uncertainty of the MAV,

  • the localization uncertainty of neighboring MAVs,

  • communication delays or packet loss with neighboring MAVs, and

  • the maximum cotangential force magnitude.

To guarantee obstacle avoidance, the threshold radii of the potential field are defined as follows,


where the first term is the empirically determined maximum tracking error in one time step along the direction of the way-point, and the second is the maximum translational motion of the robot in any direction. The next terms are the length of the maximum eigenvalue of the MAV's self-localization covariance and an experimentally determined maximum possible localization uncertainty for a neighboring MAV. Wireless communication delays are unavoidable in real-time implementations and cannot be ignored, as they affect a MAV's estimate of its neighbors' positions. The inner threshold radius therefore increases in proportion to the communication delay between two MAVs: for a given delay, it increases by the maximum distance a neighboring MAV can travel in that time. Finally, the maximum value of the cotangential force should be at least equal to or greater than the maximum control input. Therefore, the vector sum of the repulsive force and the control input acts along a direction pointing away from the obstacle.
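The considerations above can be collected into a simple sketch of how the inner threshold radius might be assembled. The individual terms follow the listed considerations, but the function name, the `margin` parameter and the exact combination are illustrative assumptions, not the paper's formula:

```python
def guarantee_radii(track_err, v_max, dt, sigma_self, sigma_neighbor,
                    comm_delay=0.0, margin=0.5):
    """Illustrative inner/outer threshold radii: the inner radius absorbs
    the worst-case one-step tracking error, one step of motion at maximum
    velocity, self and neighbor localization uncertainty, and the extra
    distance a neighbor can cover during a communication delay."""
    d_inner = (track_err + v_max * dt + sigma_self + sigma_neighbor
               + v_max * comm_delay)
    d_outer = d_inner + margin   # region of influence extends beyond d_inner
    return d_inner, d_outer
```

As the text notes, the inner radius grows in proportion to the communication delay, since a neighbor can travel at most `v_max * comm_delay` before its updated plan arrives.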

III-F Algorithm for Active Perception based Formation Control

1:   Compute Current Angle to Person
2:   Compute Destination Position
3:   Compute Active Tracking Input
4:   Compute Obstacle Avoidance Input
5:   Bound External Control Input
6:   Solve Convex MPC
7:   Send Horizon Plan to Low-level Controller
Algorithm 1 Active Tracking Convex MPC for MAV with inputs

Algorithm 1 summarizes all the steps involved in our proposed decentralized formation controller. In line 1, the current angle w.r.t. the fused estimate of the target person's position is computed. In line 2, the desired robot position is computed using this angle, the fused estimate of the person's position from the EKF and the desired horizontal distance to the target person. Since the MPC state vector also includes the current velocity, we minimize the difference between the robot velocity and the EKF-estimated person velocity. In lines 3 and 4, an external control input is computed over the horizon to enforce the non-convex formation constraints and ensure obstacle avoidance. In line 5, the external input is bounded to guarantee obstacle avoidance (discussed in Sec. III-E3). In line 6, the convex model-predictive optimization (12) is numerically evaluated. The output motion plan consists of a set of acceleration control inputs and a horizon trajectory. In line 7, the predicted position and velocity for the horizon are used as desired inputs to the low-level flight controller.

IV Experiments and Results

IV-A Real Robot Experiments

Fig. 6: Real world experiments: multi-exposure images of short sequences from the five real robot experiments (RE 1 to RE 5). (a) Showcases the formation controller based on our previous work [1]. It can be observed that the MAVs fly extremely close to the target person and are never uniformly distributed about the target person's position. (b) Shows MAV formation control based on our previous work [14]. Notice that two of the MAVs are quite close to each other and the resulting formation is non-optimal for uncertainty minimization. In (c), (d) and (e) the approach presented in the current paper is used. These show footage of the active perception based formation control experiments RE 3 to RE 5, respectively. Notice that the MAVs nearly maintain an angular configuration with a difference of 120° w.r.t. each other. Moreover, the MAVs always stay at a safe distance and height from the person and are uniformly distributed about the person's position. This configuration minimizes the fused uncertainty of the person's position estimate. In (d) and (e), virtual obstacles are placed at fixed positions in the showcased grass field. In (d), one MAV's trajectory can be seen avoiding a virtual obstacle; in (e), another MAV's trajectory can be seen avoiding a virtual obstacle. The experimental footage is best viewed in the accompanying video.

IV-A1 Setup

We use three self-developed and customized octocopter MAVs in the field experiments. Each MAV is capable of running the CDT algorithm (Sec. III-B) on board and in real time, including the deep neural network. The MAVs are equipped with an Intel i7 CPU, an NVIDIA Jetson TX1 GPU and an OpenPilot Revolution flight controller system-on-chip. The MAVs use on-board GPS and IMU data, logged at 100 Hz, for localization. We place an Emlid Reach differential GPS receiver on each MAV and on either shoulder of the tracked person to acquire ground-truth locations. The data from the differential GPS is not used during the real-time experiments and is stored for comparison only. For a decentralized implementation, a ROS multi-master setup with communication over WiFi is used. Each MAV continuously captures and stores camera images. The on-board GPU runs the SSD-Multibox detector [17]. The model-predictive controller is evaluated at a fixed rate using the CVXGEN convex optimization library, over a horizon of several time steps. The state, velocity and control limits of each MAV are set empirically, as are the desired horizontal distance to the target and the desired height above the joint target estimate's centroid. We observe that on average the tracking error w.r.t. the MPC-generated way-point is small; the corresponding bound follows from (24). The localization uncertainty w.r.t. neighboring MAVs varies with changes in self-localization uncertainty and communication losses. The task for each MAV in all experiments is to track the person using the CDT pipeline while running the proposed decentralized active perception based target tracking algorithm. We conducted five real-MAV tracking experiments, each with a 3-MAV formation, as follows.

  • RE 1: Approach of Price et al. [1] (duration 257s)

  • RE 2: DQMPC (Tallamraju et al.) [14] (231s)

  • RE 3: Our approach (269s)

  • RE 4: Our approach with one MAV avoiding emulated virtual obstacles (278s)

  • RE 5: Our approach with all MAVs avoiding emulated virtual obstacles (106s)

In each experiment, the person remains stationary at first. Data collection starts after all MAVs have acquired visual line of sight to the person and converged to a stable formation around him/her. The person then walks randomly at moderate speeds. Data collection for each experiment is stopped once any of the MAVs indicates a low battery status. The MAVs are then manually landed.

After the experiments, the error in the tracked position estimate is calculated as the 3D Euclidean distance between the estimate and the corresponding ground truth provided by the differential GPS system. The error in each MAV's self-pose is calculated similarly.
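The evaluation metric above is straightforward; as an illustration (with hypothetical array names), the per-frame Euclidean error and its mean can be computed as:

```python
import numpy as np

def tracking_error(estimates, ground_truth):
    """Mean 3D Euclidean error between per-frame tracked position
    estimates and differential-GPS ground truth (both N x 3 arrays)."""
    errors = np.linalg.norm(estimates - ground_truth, axis=1)
    return errors.mean()
```

The same function applies unchanged to MAV self-pose error, since both metrics are plain point-to-point distances.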

IV-A2 Analysis of results

Fig. 8 compares the ground truth and the tracked trajectory, with estimated uncertainty, during the tracking experiment using our proposed approach in Exp. RE 3. We analyze person tracking accuracy, uncertainty, and MAV self-pose accuracy for all experiments in Fig. 7(a-b). Exp. RE 3 achieves significantly more accurate person track estimates than the state-of-the-art method in Exp. RE 1, despite the worse self-pose estimates of the MAVs in Exp. RE 3 (the self-localization errors were due to high errors in the MAVs' GPS localization and may vary arbitrarily over different experiments). This showcases the ability of our approach to attain high tracking accuracy despite bad GPS reception.

Even though the state-of-the-art approach in Exp. RE 2 achieves accuracy similar to that of our approach in Exp. RE 3, our approach outperforms it in keeping the person always in the camera image and close to the image center, as shown by the bar and box plots in Fig. 7(c-d). Moreover, our approach achieves the lowest uncertainty in the tracked person estimate compared to all other state-of-the-art methods.

The presence of obstacles that the MAVs need to navigate around affects the target tracking and self pose accuracy in Exps. RE 4 and RE 5. This is mainly due to the additional maneuvering overhead to avoid obstacles. However, the ability to keep the person close to the center of the image is only slightly affected.

Fig. 7: Comparison of different methods in real world experiments. Tracked person accuracy, MAV self-pose accuracy, person tracking uncertainty, percentage of complete person coverage in camera frames, and the distribution of the tracked person's distance from the image center are analysed.
Fig. 8: Real world experiment RE 3: comparison of the tracked person's estimated (blue) and ground truth (green) trajectories using our active perception approach (without simulated stationary obstacles). The ground truth remains within the 3σ error bound of the tracked estimate throughout the experiment, except for a few initial seconds.

Fig. 7(c-d) compares the different approaches' ability to keep the tracked person centered in each MAV's camera view. Our approach in Exp. RE 3 not only reduces the average distance between the tracked person and each camera's image center, but also ensures the person is completely covered by the camera image in the vast majority of frames. This is a crucial property if the formation is used for aerial motion capture.

IV-B Simulation Experiments

IV-B1 Setup

The proposed algorithm was simulated in the Gazebo simulation environment [18]. An identical algorithm codebase is used for experimentation on both real and simulated MAVs. The simulations were conducted on a standalone Intel i7-3970X CPU with an NVIDIA GTX 1080 GPU. The environment consists of multiple AscTec Firefly hexarotor MAVs confined to operate in a bounded world. Each MAV is equipped with a rigidly attached Asus Xtion camera whose parameters are set to match those of the real MAV cameras (resolution and focal length). To be as close as possible to real scenarios, we simulate GPS and IMU drift by imposing a random-walk offset on the ground-truth position. We simulate a human actor model and guide it along a pre-defined trajectory with random motions. The actor traverses randomly varying terrain interspersed with emulated trees, which create obstacles and occlusions for the hovering MAVs and make it challenging to track the person effectively.
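The random-walk drift described above can be sketched as follows; the per-step standard deviation `sigma` is a hypothetical parameter, not a value from the paper.

```python
import numpy as np

def random_walk_drift(n_steps, sigma, seed=None):
    """Simulated GPS/IMU drift: a 3D random-walk offset to be added to
    ground-truth positions. sigma is the per-step step std-dev (meters)."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, sigma, size=(n_steps, 3))
    # Cumulative sum turns i.i.d. Gaussian steps into a random walk
    return np.cumsum(steps, axis=0)
```

Adding this offset to the simulated poses reproduces the slowly wandering localization error seen on real GPS/IMU hardware, rather than simple white noise.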

IV-B2 Comparison of methods

Fig. 9: Comparison of different methods in simulation experiments.

We conducted 10 simulation runs for each of the following methods with a 3-MAV formation.

  • SE 1: Approach of Price et al. [1]

  • SE 2: DQMPC (Tallamraju et al.) [14]

  • SE 3: Our approach

  • SE 4: Our approach with all MAVs avoiding emulated virtual obstacles.

Each run was 120 seconds long.

Figure 9 shows the statistics of this set of experiments. We clearly observe that our proposed approach, with or without static obstacle avoidance, outperforms both state-of-the-art approaches. Moreover, when static obstacles are not present, our approach keeps the person fully in the camera image frame for almost the entire experiment duration. It also significantly outperforms all other methods in minimizing the joint uncertainty in the target position estimate.

IV-B3 Scalability Experiments

Fig. 10: Scalability w.r.t. number of robots and robustness of our method under communication loss.

Fig. 10(a) shows the result of target tracking with our method for an increasing number of robots. We conduct 10 simulation trials for each robot-count configuration; the real-time factor was fixed across all runs in this experiment. We validate the scalability of our model predictive controller by running these experiments in an environment with several static and known obstacles (in the form of trees) and perfect communication, for formations of 3 to 16 MAVs. We observe a nearly linear improvement in tracking error with a higher number of robots. At the same time, the computational requirements did not affect real-time performance for up to 16 robots.

IV-B4 Experiments with Communication Failure

In Fig. 10(b), the effect of inter-robot communication failure on tracking is demonstrated for our approach through experiments with 8 MAVs and simulated static obstacles. Communication losses of varying severity were simulated, and the results were averaged over multiple trials per communication-loss percentage. Tracking gets progressively worse with a higher percentage of communication loss. At 100% communication loss, each robot relies only on its own detections and does not cooperatively improve the target position estimate. Nevertheless, we observe that our approach is able to maintain an accurate target state estimate for up to 25% communication loss.
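A minimal sketch of this failure model, with hypothetical names: each neighbor message is dropped independently with probability `loss_rate`, and the receiving robot fuses (here, simply averages) whatever estimates survive, falling back to its own detection when nothing arrives. The actual system fuses via the EKF rather than a plain mean.

```python
import numpy as np

def fused_estimate(own_est, neighbor_ests, loss_rate, rng):
    """Fuse a robot's own target estimate with neighbor estimates that
    survive a simulated communication loss of probability loss_rate.
    At 100% loss the robot relies solely on its own detection."""
    received = [e for e in neighbor_ests if rng.random() >= loss_rate]
    if received:
        all_est = np.vstack([own_est] + received)
    else:
        all_est = own_est[None, :]
    return all_est.mean(axis=0)
```

Sweeping `loss_rate` from 0 to 1 over repeated trials reproduces the qualitative trend of Fig. 10(b): estimates degrade gracefully until cooperation is lost entirely.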

V Conclusions and Future Work

In this article we introduced a novel robotic front-end for an aerial mocap system consisting of multiple MAVs. We proposed a decentralized, convex-MPC-based algorithm that enables the MAVs to actively track and follow a moving person in outdoor environments and in the presence of static and dynamic obstacles. In contrast to cooperatively tracking only the 3D position of the person, the MAVs now actively compute optimal local motion plans, resulting in optimal view-point configurations that minimize the uncertainty in the tracked estimate. We achieved this by decoupling the goal of active tracking into a convex quadratic objective and non-convex constraints corresponding to the angular configurations of the MAVs w.r.t. the person, derived using Gaussian observation model assumptions within the cooperative detection and tracking module. We also showed how we embed all the non-convex constraints, including those for dynamic and static obstacle avoidance, as external control inputs in the MPC dynamics. We evaluated our approach through multiple rigorous real-robot field experiments in an outdoor scenario involving 3 self-built 8-rotor MAVs. These experiments validated our approach and showed that it significantly improves accuracy over previous methods. In simulations we showed results with up to 16 MAVs, demonstrating the scalability of our method as well as its robustness to communication failures. Future work includes extending our method to full-body human pose detection and offline reconstruction of human pose and shape.


The authors would like to thank Igor Martinovic and Mason Landry for their help with the field experiments.


  • [1] E. Price, G. Lawless, R. Ludwig, I. Martinovic, H. H. Bülthoff, M. J. Black, and A. Ahmad, “Deep neural network-based cooperative visual tracking through multiple micro aerial vehicles,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3193–3200, Oct 2018.
  • [2] C. Huang, F. Gao, J. Pan, Z. Yang, W. Qiu, P. Chen, X. Yang, S. Shen, and K.-T. T. Cheng, “Act: An autonomous drone cinematography system for action scenes,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
  • [3] X. Zhou, S. Liu, G. Pavlakos, V. Kumar, and K. Daniilidis, “Human motion capture using a drone,” arXiv preprint arXiv:1804.06112, 2018.
  • [4] L. Xu, Y. Liu, W. Cheng, K. Guo, G. Zhou, Q. Dai, and L. Fang, “Flycap: Markerless motion capture using multiple autonomous flying cameras,” IEEE transactions on visualization and computer graphics, vol. 24, no. 8, 2018.
  • [5] T. Nägeli, S. Oberholzer, S. Plüss, J. Alonso-Mora, and O. Hilliges, “Flycon: real-time environment-independent multi-view human pose estimation with aerial vehicles,” in SIGGRAPH Asia 2018 Technical Papers.   ACM, 2018, p. 182.
  • [6] “Aircap: Perception-based control,” accessed: 2019.
  • [7] F. Morbidi and G. L. Mariottini, “On active target tracking and cooperative localization for multiple aerial vehicles,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.   IEEE, 2011, pp. 2229–2234.
  • [8] K. Zhou, S. I. Roumeliotis et al., “Multirobot active target tracking with combinations of relative observations,” IEEE Transactions on Robotics, vol. 27, no. 4, pp. 678–695, 2011.
  • [9] K. Hausman, J. Müller, A. Hariharan, N. Ayanian, and G. S. Sukhatme, “Cooperative multi-robot control for target tracking with onboard sensing,” The International Journal of Robotics Research, vol. 34, no. 13, pp. 1660–1677, 2015.
  • [10] K. Hausman, G. Kahn, S. Patil, J. Müller, K. Goldberg, P. Abbeel, and G. S. Sukhatme, “Occlusion-aware multi-robot 3d tracking,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.   IEEE, 2016, pp. 1863–1870.
  • [11] P. U. Lima, A. Ahmad, A. Dias, A. G. Conceição, A. P. Moreira, E. Silva, L. Almeida, L. Oliveira, and T. P. Nascimento, “Formation control driven by cooperative object tracking,” Robotics and Autonomous Systems, vol. 63, pp. 68–79, 2015.
  • [12] D. Falanga, P. Foehn, P. Lu, and D. Scaramuzza, “Pampc: Perception-aware model predictive control for quadrotors,” arXiv preprint arXiv:1804.04811, 2018.
  • [13] A. N. Bishop, B. Fidan, B. D. Anderson, K. Doğançay, and P. N. Pathirana, “Optimality analysis of sensor-target localization geometries,” Automatica, vol. 46, no. 3, pp. 479–492, 2010.
  • [14] R. Tallamraju, S. Rajappa, M. J. Black, K. Karlapalem, and A. Ahmad, “Decentralized mpc based obstacle avoidance for multi-robot target tracking scenarios,” in Proceedings of IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), 2018.
  • [15] S. J. Prince, Computer vision: models, learning, and inference.   Cambridge University Press, 2012.
  • [16] R. C. Smith and P. Cheeseman, “On the representation and estimation of spatial uncertainty,” The international journal of Robotics Research, vol. 5, no. 4, pp. 56–68, 1986.
  • [17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European Conference on Computer Vision.   Springer, 2016, pp. 21–37.
  • [18] F. Furrer, M. Burri, M. Achtelik, and R. Siegwart, Robot Operating System (ROS): The Complete Reference (Volume 1).   Cham: Springer International Publishing, 2016, ch. RotorS—A Modular Gazebo MAV Simulator Framework, pp. 595–625. [Online]. Available: