Distributed Multi-Target Tracking for Autonomous Vehicle Fleets

04/13/2020 ∙ by Ola Shorinwa, et al. ∙ Stanford University

We present a scalable distributed target tracking algorithm based on the alternating direction method of multipliers that is well-suited for a fleet of autonomous cars communicating over a vehicle-to-vehicle network. Each sensing vehicle communicates with its neighbors to execute iterations of a Kalman filter-like update such that each agent's estimate approximates the centralized maximum a posteriori estimate without requiring the communication of measurements. We show that our method outperforms the Consensus Kalman Filter in recovering the centralized estimate given a fixed communication bandwidth. We also demonstrate the algorithm in a high-fidelity urban driving simulator (CARLA), in which 50 autonomous cars connected on a time-varying communication network track the positions and velocities of 50 target vehicles using on-board cameras.




I Introduction

A key challenge in integrating autonomous vehicles into the transportation infrastructure is ensuring their safe operation in the presence of potential hazards, such as human-operated vehicles and pedestrians. However, tracking the paths of these safety-critical targets using on-board sensors is difficult in urban environments due to the presence of occlusions. Collaborative estimation among networked autonomous vehicles has the potential to alleviate the limitations of each vehicle’s individual perception capabilities. Networked fleets of autonomous vehicles operating in urban environments can collectively improve the safety of their planning and decision-making by collaboratively tracking the trajectories of nearby vehicles in real-time.

Constraints on communication and computation impose fundamental challenges on collaborative tracking. Given limited communication bandwidth, information communicated between vehicles must be succinct and actionable. Communication channels must also be free to form and dissolve responsively given the highly dynamic nature of urban traffic. Relying on centralized computation is neither robust to single points of failure, nor communication-efficient in disseminating information to those vehicles to whom it is relevant. Rather, a fully-distributed scheme that exploits the computational and communication resources of an autonomous fleet is crucial to reliable tracking.

Fig. 1: Autonomous vehicles (in green) track the trajectory of target vehicles (in blue and red) with images from on-board cameras at a four-way intersection using our algorithm.

In this paper, we consider the problem of distributed target tracking in a fleet of vehicles collaborating over a dynamic communication network, posed as a Maximum A Posteriori (MAP) optimization problem. Our key contribution is a scalable Distributed Rolling Window Tracking (DRWT) algorithm derived from the Alternating Direction Method of Multipliers (ADMM) distributed optimization framework. The algorithm consists of closed-form algebraic iterations reminiscent of the Kalman filter and Kalman smoother, but guarantees that the network of vehicles converges to the centralized MAP estimate of the targets’ trajectories over a designated sliding time window. We show in extensive simulations that our DRWT algorithm converges to the centralized estimate orders of magnitude faster than a state-of-the-art Consensus Kalman Filter for the same bandwidth. We demonstrate our algorithm in a realistic urban driving scenario in the CARLA simulator, in which 50 autonomous cars track 50 target vehicles in real time using only segmented images from their on-board cameras.

The paper is organized as follows. We give related work in Sec. II and pose the distributed estimation problem in Sec. III. In Sec. IV, we formulate the centralized MAP optimization problem, and we derive our DRWT algorithm in Sec. V. Sec. VI presents results comparing our DRWT to the Consensus Kalman Filter, and describes large-scale simulations in a CARLA urban driving scenario.

II Related Work

Several approaches have previously been applied to solving distributed estimation problems. In distributed filtering methods, consensus techniques enable the asymptotic diffusion of information throughout the communication network, allowing individual computation nodes to approximate the joint estimate in the Consensus Kalman Filter [20, 21, 22, 3]. Alternatively, using finite consensus techniques can improve communication efficiency [27]. Similar techniques have also been applied to particle filtering [23, 1]. However, the messages communicated in these consensus-based methods contain both information vectors and information matrices, so the communication cost scales superlinearly with the size of the estimate. Our approach recovers the same centralized solution while only communicating the estimate vector.

Sensor fusion techniques accomplish distributed estimation by computing a centralized approximation given individual estimates throughout the network [5]. A key challenge in sensor fusion is keeping track of cross-correlation in conditioning individual estimates on previously-fused centralized estimates [26]. Covariance Intersection (CI) addresses this issue by computing a consistent centralized estimate that accounts for any possible cross-correlation between individual estimates [18, 12, 13, 19]. However, in ensuring consistency, CI is often extremely conservative and therefore significantly suboptimal, especially for large networks.

Other estimation techniques approach distributed estimation using optimization. One approach is to aggregate all observations of each target to form a non-linear least squares objective function which recovers the MAP estimate [2], though such an approach requires all-to-all communication. In [17], each robot communicates its measurement and state estimate to its neighbors to solve the MAP least-squares problem using the conjugate gradient method. However, this approach still requires each node to communicate its measurements to its neighbors. Alternatively, some methods have been proposed to divide targets among the trackers using Voronoi partitions [7], and to track multiple targets using the Probability Hypothesis Density (PHD) filter [8].


In this paper, we apply a novel approach to the problem of target tracking. We pose target tracking as a MAP estimate over a rolling window that bears some similarity to [25]. We apply ADMM, a technique that allows for distributed optimization of problems with separable objectives, to distribute the resulting MAP optimization problem (see [4, 15] for a detailed survey of ADMM). This approach guarantees convergence to the centralized solution [24].

III Problem Formulation

III-A Communication model

We consider the scenario of camera-equipped autonomous vehicles (“sensors”) navigating a city that also contains other vehicles (“targets”). Each sensor takes measurements of the positions of the targets in its vicinity and can communicate with other nearby sensors. We model the communication network among the sensors at time $t$ as a dynamic undirected graph $\mathcal{G}_t = (\mathcal{V}, \mathcal{E}_t)$, with vertices $\mathcal{V}$ corresponding to sensors and edges $\mathcal{E}_t$ containing pairs of sensors that can directly share information with each other. The presence of an edge $(i, j)$ depends on the proximity between sensors $i$ and $j$ at time $t$. The neighbor set $\mathcal{N}_{i,t}$ consists of sensors that can communicate with sensor $i$ at time $t$.
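
As a minimal sketch of this communication model, the snippet below builds an undirected proximity (disk) graph from sensor positions at one time step and derives each sensor's neighbor set. The function name `disk_graph`, the example positions, and the 60 m radius are hypothetical values chosen for illustration, not parameters taken from the paper.

```python
import math


def disk_graph(positions, radius):
    """Return (edges, neighbors) of an undirected disk graph.

    An edge (i, j) with i < j exists when sensors i and j are within
    `radius` of each other; neighbors[i] is the neighbor set of sensor i.
    """
    n = len(positions)
    edges = set()
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                edges.add((i, j))
                neighbors[i].add(j)
                neighbors[j].add(i)
    return edges, neighbors


# Hypothetical sensor positions (meters) at a single time step.
positions_t = [(0.0, 0.0), (30.0, 0.0), (30.0, 40.0), (200.0, 0.0)]
edges_t, neighbors_t = disk_graph(positions_t, radius=60.0)
```

Recomputing the graph whenever the positions change gives the time-varying edge set; in this example the fourth sensor is out of range of the others and has an empty neighbor set.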

III-B Target assignment

We assume that each target in the environment has a unique identifier known to all sensors. This data association task is addressed in [16], and can be performed in a completely distributed fashion.

The set of sensors observing any given target changes due to occlusions coupled with the limited sensing-range of the cameras. At each time that a sensor observes one or more targets, it generates a set of features for each target ([14], [11], [10]) which identify the target. This identifier is communicated to its neighbors. Considering the case of a particular target, we denote the set of sensors that observe it over the time horizon as . The subgraph of sensors that are relevant to the target in the time horizon is , such that and . Sensor knows that sensor belongs to since sensor communicates a descriptor of each observed target. We assume that the subgraph is connected at all times (that is, there exists a set of edges that form a path between any ).

III-C Distributed estimation

Given a particular target, each sensor has the task of estimating the target’s state $x_t$, which includes its position and velocity, over discrete timesteps. The target is modeled as a linear Gaussian system in which

$$x_{t+1} = A_t x_t + w_t, \tag{1}$$

with linear dynamics $A_t$ and additive noise $w_t \sim \mathcal{N}(0, Q_t)$. In the following, we represent the trajectory over the time horizon using the notation . Sensor $i$ makes an observation of the target at time $t$ according to

$$y_{i,t} = C_{i,t} x_t + v_{i,t}, \tag{2}$$

with measurement vector $y_{i,t}$, measurement matrix $C_{i,t}$, and additive noise $v_{i,t} \sim \mathcal{N}(0, R_{i,t})$. We also refer to the joint set of observations across all sensors in the network as

$$y_t = C_t x_t + v_t, \tag{3}$$

where the joint variables $y_t$, $C_t$, and $v_t$ are the column-wise concatenations over all $i$ of $y_{i,t}$, $C_{i,t}$, and $v_{i,t}$, respectively.

While the joint measurements (3) are not available to any single sensing agent, each agent uses its individual measurements (2) as well as communication with its neighbors to estimate the target’s state. We compare the sensor’s estimated mean and covariance with the mean and covariance computed with full knowledge of all measurements. In the distributed estimation problem, each sensor seeks to approximate the centralized (best-possible) estimate using only individual measurements and local communication.
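
The following sketch simulates one step of such a linear Gaussian model and stacks the local measurements into a joint measurement. It assumes a planar double-integrator target (the dynamics model mentioned in Sec. VI-B) and position-only measurements; the time step, noise covariances, and all variable names are illustrative assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1  # assumed time step in seconds

# State [px, py, vx, vy] under a double-integrator model.
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
Q = 0.01 * np.eye(4)                            # assumed process noise covariance
C = np.hstack([np.eye(2), np.zeros((2, 2))])    # each sensor measures position only
R_list = [0.5 * np.eye(2), 2.0 * np.eye(2), 1.0 * np.eye(2)]  # per-sensor noise


def step(x):
    """Propagate the target state one step with additive Gaussian noise."""
    return A @ x + rng.multivariate_normal(np.zeros(4), Q)


def observe(x, R):
    """Generate one sensor's noisy position measurement of the target."""
    return C @ x + rng.multivariate_normal(np.zeros(2), R)


x0 = np.array([0.0, 0.0, 1.0, 0.5])
x1 = step(x0)
y_local = [observe(x1, R) for R in R_list]   # what each sensor sees individually
y_joint = np.concatenate(y_local)            # joint measurement, never held by one sensor
C_joint = np.vstack([C] * len(R_list))       # stacked measurement matrices
```

Only the entries of `y_local` are available to the individual sensors; `y_joint` and `C_joint` exist here solely to show what a centralized estimator would use.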

IV Centralized Estimation

The centralized estimate, which is conditioned on all measurements and priors in the network, gives the best estimate of a target’s state and therefore represents the best possible performance. The MAP batch estimate maximizes the probability of the estimated target trajectory conditioned on the full set of measurements and a prior of mean and covariance :


Given Gaussian conditional probabilities, the posterior in (5) is the Gaussian distribution


In the case of linear Gaussian systems, we can solve (5) as a linear system of equations. However, recursively estimating the trajectory reduces the size of the system of equations, improving computational efficiency. Instead of maximizing , the Kalman filter infers from the result of the previous timestep’s estimate, the prior distribution :


However, the Kalman filter only exactly replicates the result of the batch estimate for the final timestep . For some intermediate , is conditioned on the full measurement set in the batch approach, but only on in the filtering approach. Employing the Rauch-Tung-Striebel smoother exactly recovers the batch solution by computing for (a backward pass of the trajectory performed after the Kalman filter’s forward pass).

For our application of persistently tracking targets, a MAP rolling window approach is appropriate as it incorporates smoothing effects into a single Kalman filter-like update. The rolling window refers to a time horizon over which we compute the MAP estimate. Given the prior , we compute the window’s posterior distribution by factoring the original MAP solution as


The estimate is conditioned on , and is equivalent to performing a filtering pass for the times and a smoothing pass from time to time . We then increment the rolling window forward to , retaining the estimate as the prior for that window. Therefore, the rolling window approach preserves much of the smoothing effect of the batch estimate while maintaining a constant problem size at each time step.

Applying (1) and (3) to (7) yields


for which the minimizing is the solution to (7).
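
As a hedged sketch of the kind of objective referred to here, the negative log-posterior of a linear Gaussian model over a window of $T$ steps is a sum of weighted least-squares terms. The notation below ($x_\tau$ for the state, $A_\tau$ and $C_\tau$ for the dynamics and joint measurement matrices, $Q_\tau$ and $R_\tau$ for the noise covariances, $\bar{x}$ and $\bar{P}$ for the prior mean and covariance) and the exact index ranges are assumptions made for illustration and may differ from the paper's equation (8).

```latex
J(x_{t-T:t}) =
  \tfrac{1}{2}\,\lVert x_{t-T} - \bar{x} \rVert^{2}_{\bar{P}^{-1}}
  + \tfrac{1}{2} \sum_{\tau = t-T}^{t-1}
      \lVert x_{\tau+1} - A_\tau x_\tau \rVert^{2}_{Q_\tau^{-1}}
  + \tfrac{1}{2} \sum_{\tau = t-T+1}^{t}
      \lVert y_\tau - C_\tau x_\tau \rVert^{2}_{R_\tau^{-1}},
\qquad \lVert v \rVert^{2}_{W} := v^{\top} W v .
```

Because every term is quadratic in the trajectory, setting the gradient to zero yields a linear system, and since each term couples at most two consecutive time steps, the system matrix is block tridiagonal.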

We can express the MAP rolling window estimate as


given the block matrices

We implement this procedure recursively by retaining the lower-right block of the covariance matrix as the prior covariance for the next timestep’s estimate. The estimate over all but timestep becomes the prior mean . Therefore, we have a tractable centralized target tracking method that serves as a benchmark for our distributed target tracking algorithm.
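
A minimal numerical sketch of a centralized window estimate is given below. It assembles the information matrix and vector of a linear Gaussian window (prior on the first state, one measurement for each later state), solves for the trajectory, and extracts the lower-right covariance block. The function names, the time-invariant matrices, and the test data are assumptions for illustration; the comparison with a standard Kalman filter reflects the statement in the text that filtering matches the batch estimate at the final timestep.

```python
import numpy as np


def rolling_window_map(x_bar, P_bar, A, Q, C, R, ys):
    """Batch MAP over states x_0..x_T; ys[k] is the measurement of x_{k+1}."""
    n, T = x_bar.size, len(ys)
    H = np.zeros((n * (T + 1), n * (T + 1)))   # information (Hessian) matrix
    b = np.zeros(n * (T + 1))                  # information vector

    def blk(k):
        return slice(k * n, (k + 1) * n)

    Qi, Ri, Pi = np.linalg.inv(Q), np.linalg.inv(R), np.linalg.inv(P_bar)
    H[blk(0), blk(0)] += Pi                    # prior term on x_0
    b[blk(0)] += Pi @ x_bar
    for k in range(T):
        # Dynamics term coupling x_k and x_{k+1}.
        H[blk(k), blk(k)] += A.T @ Qi @ A
        H[blk(k), blk(k + 1)] -= A.T @ Qi
        H[blk(k + 1), blk(k)] -= Qi @ A
        H[blk(k + 1), blk(k + 1)] += Qi
        # Measurement term on x_{k+1}.
        H[blk(k + 1), blk(k + 1)] += C.T @ Ri @ C
        b[blk(k + 1)] += C.T @ Ri @ ys[k]
    x_hat = np.linalg.solve(H, b).reshape(T + 1, n)
    P_last = np.linalg.inv(H)[blk(T), blk(T)]  # lower-right covariance block
    return x_hat, P_last


def kalman_filter(x_bar, P_bar, A, Q, C, R, ys):
    """Standard predict/update recursion, returning the final mean and covariance."""
    x, P = x_bar.copy(), P_bar.copy()
    for y in ys:
        x, P = A @ x, A @ P @ A.T + Q
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + K @ (y - C @ x)
        P = (np.eye(x.size) - K @ C) @ P
    return x, P


rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
Q, C, R = 0.01 * np.eye(2), np.array([[1.0, 0.0]]), np.array([[0.25]])
x_bar, P_bar = np.zeros(2), np.eye(2)
ys = [rng.normal(size=1) for _ in range(5)]
x_hat, P_last = rolling_window_map(x_bar, P_bar, A, Q, C, R, ys)
x_kf, P_kf = kalman_filter(x_bar, P_bar, A, Q, C, R, ys)
```

In this sketch `x_hat[-1]` and `P_last` agree with the Kalman filter output, while the earlier rows of `x_hat` are smoothed estimates that the filter alone does not provide. The dense inverse is used only for clarity.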

V Distributed Estimation

One typical approach for the distributed implementation of the MAP estimate is to use consensus techniques to diffuse information across the network, enabling each agent to minimize (8). This is true of Consensus Kalman Filter (CKF) approaches, in which each agent maintains local measurement information (2) rather than the joint measurements [21, 22, 3, 20, 27]. The CKF uses asymptotic consensus with Metropolis weights to sum and over all , where . The fused observations are then fused with local copies of the dynamics terms and prior terms of the cost function. The consensus rounds diffuse the joint measurement information to each sensor, enabling local computation of (9) and (10).
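
To illustrate the consensus step that the CKF relies on, here is a small sketch of average consensus with Metropolis weights on a four-node path graph. Scalars stand in for the information matrices and vectors that the CKF actually exchanges, and the graph, initial values, and iteration count are hypothetical. Each node recovers the network-wide sum by multiplying the agreed average by the number of nodes.

```python
def metropolis_weights(neighbors):
    """Weight 1 / (1 + max(deg_i, deg_j)) for every directed pair (i, j) of neighbors."""
    deg = {i: len(nb) for i, nb in neighbors.items()}
    return {(i, j): 1.0 / (1 + max(deg[i], deg[j]))
            for i, nb in neighbors.items() for j in nb}


def consensus_round(values, neighbors, weights):
    """One synchronous round: each node moves toward its neighbors' values."""
    return {i: v + sum(weights[(i, j)] * (values[j] - v) for j in neighbors[i])
            for i, v in values.items()}


# Hypothetical path graph 0 - 1 - 2 - 3 with scalar local quantities.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 4.0, 1: 0.0, 2: 8.0, 3: 0.0}
weights = metropolis_weights(neighbors)
for _ in range(200):
    values = consensus_round(values, neighbors, weights)

# Each node's estimate of the network-wide sum (N times the agreed average).
network_sum = {i: len(values) * v for i, v in values.items()}
```

The convergence is asymptotic, which is why many rounds are needed before every node holds an accurate sum, and in the matrix-valued case each round transmits a full information matrix.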

The CKF requires communication of local information matrices and information vectors during consensus, a communication-intensive process that is a drawback of the method. Furthermore, performing an approximation of the centralized estimate at each node is redundant, failing to take advantage of the distributed nature of the computational resources in the network. In contrast to the CKF, we propose a Distributed Rolling Window Tracking (DRWT) algorithm that uses an ADMM-based approach to enable each sensor to replicate the centralized estimate without reconstructing the centralized cost function. First, we pose the centralized cost function (8) as a separable problem with linear constraints:


for which . In the following, we express the cost function in (11) as and omit the subscript from the primal variable . The slack variable encodes agreement constraints between neighbors and . The ADMM approach to solving problems of this form uses the augmented Lagrangian, which adds to the cost function a quadratic penalty for constraint violations, . The augmented problem is equivalent to the original problem as the added penalty is zero for the feasible set of estimates. We find the saddle point of the augmented Lagrangian


by alternating between minimizing with respect to the primal variables and and performing a gradient ascent step on the dual variables and . Each can update its respective , , , and for all in parallel since minimizing with respect to these variables does not depend on the values of its neighbors’ variables. Furthermore, as shown in [15, 6], substituting and assuming the initialization yields the minimization of as . Initializing and , the following iterations alternate between a gradient ascent step on and a minimization step on , converging to the centralized estimate when run in parallel across all :


Furthermore, due to our assumption of a linear Gaussian system, (14) can be expressed in closed form as


using the local versions of the block matrices in (9) (replacing , , , , , and with , , , , , and , respectively). We note that the matrix inverse in (15) only needs to be computed once rather than at every primal update iteration.
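
For orientation, consensus ADMM iterations of the kind described in [15, 6] take the following general form for a local cost $f_i$, penalty parameter $\rho$, neighbor set $\mathcal{N}_i$, and dual variable $p_i$. These symbols are assumptions for illustration, and the expressions are a sketch of the standard consensus ADMM updates rather than a transcription of (13) to (15).

```latex
% Dual (gradient ascent) step
p_i^{k+1} = p_i^{k} + \rho \sum_{j \in \mathcal{N}_i} \bigl( x_i^{k} - x_j^{k} \bigr)

% Primal (minimization) step
x_i^{k+1} = \arg\min_{x} \Bigl\{ f_i(x) + x^{\top} p_i^{k+1}
   + \rho \sum_{j \in \mathcal{N}_i}
     \bigl\lVert x - \tfrac{1}{2}\bigl(x_i^{k} + x_j^{k}\bigr) \bigr\rVert_2^{2} \Bigr\}

% Closed form for a quadratic local cost f_i(x) = \tfrac{1}{2} x^{\top} H_i x - b_i^{\top} x
x_i^{k+1} = \bigl( H_i + 2 \rho\, \lvert \mathcal{N}_i \rvert\, I \bigr)^{-1}
   \Bigl( b_i - p_i^{k+1} + \rho \sum_{j \in \mathcal{N}_i} \bigl( x_i^{k} + x_j^{k} \bigr) \Bigr)
```

In the quadratic case the matrix $H_i + 2\rho\,\lvert\mathcal{N}_i\rvert\,I$ does not depend on the iteration index $k$, so it can be inverted or factored once and reused, which is consistent with the remark above about the matrix inverse.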

Lemma 1.

Given a connected and priors and such that and , there is a saddle point of (12) at


where is the centralized MAP rolling window estimate given priors , .


Proof. The Hessian of with respect to the primal variables is positive definite. Observing that

for each , is minimized with respect to the primal variables at . Substituting the primal solution into the dual update, we see that . Substituting (16) into (15) yields (17). ∎

In other words, the network can minimize (8) in a fully distributed manner using only independent measurements and local communication. By decomposing the centralized problem according to (11), each estimate converges to the solution of (9). A key assumption, however, is the decomposability of prior information, i.e., . Given that the distributed prior inverse covariances sum to the centralized prior inverse covariances, the distributed posterior inverse covariances (where is the Hessian of the local cost function ) also sum to the centralized posterior inverse covariances. However, this assumption weakens in implementing DRWT recursively. In performing the marginalization step in which is the block of , the distributed implementation is not exactly equivalent to the centralized one. It is always true that . Consequently, the distributed marginalization is conservative with respect to the centralized solution. The conservativeness of the estimated covariance is a feature of other distributed algorithms as well; as Figure 3 shows, the CKF has an even more conservative covariance estimate. Therefore, while DRWT remains an unbiased estimator, it does not exactly replicate the centralized covariance in its recursive implementation, as the prior mean is under-weighted. Lemma 1 holds, with the modification that the saddle point is the solution to a centralized optimization problem with a potentially overestimated prior covariance. As we show in Sec. VI, this effect is minimal in practice.

Finally, we propose a “hand-off” protocol by which sensor removes itself from estimating a target after not directly observing it in the most recent timesteps. If there exists (i.e., neighbor is continuing to estimate the target), then transfers the Hessian of its local cost function to a single neighbor at the end of the ADMM iterations. Sensor fuses the new information matrix with its own, thereby preserving the same joint information across the entire network. Algorithm 1 summarizes DRWT, including the hand-off protocol.
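
The bookkeeping behind such a hand-off can be sketched as follows, assuming that fusing information matrices means adding them (the usual rule in information form). The function name `hand_off`, the sensor indices, and the matrices are hypothetical.

```python
import numpy as np


def hand_off(info, leaving, receiver):
    """Return a new dict in which `leaving` has passed its information matrix to `receiver`.

    Fusion is modeled as matrix addition, an assumption consistent with
    information-form estimators.
    """
    updated = {k: v.copy() for k, v in info.items() if k != leaving}
    updated[receiver] = updated[receiver] + info[leaving]
    return updated


# Hypothetical local information matrices held by three sensors for one target.
info = {0: np.diag([2.0, 0.5]), 1: np.diag([0.5, 3.0]), 2: np.diag([1.0, 1.0])}
joint_before = sum(info.values())
info_after = hand_off(info, leaving=2, receiver=1)
joint_after = sum(info_after.values())
```

The sum over the network is unchanged by the transfer, which is the property the protocol is meant to preserve; only one neighbor receives the matrix so that the information is not counted twice.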

function DRWT()
    for  do
    end for
    while stopping criterion is unmet do
        for  do
              Equation (13) dual update
              Equation (14) primal update
        end for
    end while
    for  do
         hand-off
    end for
    return
end function
Algorithm 1 Distributed Rolling Window Tracking
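
The dual and primal loop of Algorithm 1 can be illustrated with a small consensus ADMM sketch for quadratic local costs. This is not the authors' implementation: it estimates a single two-dimensional state rather than a trajectory window, uses the standard consensus ADMM updates from the literature, splits the prior information evenly among the sensors, and runs a fixed number of iterations in place of a stopping criterion. All names and numerical values are assumptions.

```python
import numpy as np


def consensus_admm(H_list, b_list, neighbors, rho=1.0, iters=200):
    """Minimize sum_i (0.5 x^T H_i x - b_i^T x) using neighbor-to-neighbor exchange only."""
    N, n = len(H_list), b_list[0].size
    x = [np.zeros(n) for _ in range(N)]   # local estimates (the only quantity communicated)
    p = [np.zeros(n) for _ in range(N)]   # local dual variables
    # The system matrix is fixed across iterations, so invert it once per sensor.
    M = [np.linalg.inv(H_list[i] + 2 * rho * len(neighbors[i]) * np.eye(n))
         for i in range(N)]
    for _ in range(iters):
        # Dual update: gradient ascent on disagreement with neighbors.
        p = [p[i] + rho * sum(x[i] - x[j] for j in neighbors[i]) for i in range(N)]
        # Primal update: closed-form minimizer of the local augmented cost.
        x = [M[i] @ (b_list[i] - p[i] + rho * sum(x[i] + x[j] for j in neighbors[i]))
             for i in range(N)]
    return x


# Hypothetical ring of four sensors; even sensors measure the first coordinate,
# odd sensors measure the second.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
N = 4
prior_info, x_bar = np.eye(2), np.array([1.0, -1.0])
C = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])] * 2
y = [np.array([2.0]), np.array([0.5]), np.array([2.2]), np.array([0.3])]
r = 0.5  # assumed measurement noise variance

H_list = [prior_info / N + C[i].T @ C[i] / r for i in range(N)]
b_list = [prior_info @ x_bar / N + C[i].T @ y[i] / r for i in range(N)]

x_central = np.linalg.solve(sum(H_list), sum(b_list))   # centralized MAP estimate
x_local = consensus_admm(H_list, b_list, neighbors)
```

Every entry of `x_local` approaches `x_central` even though no sensor ever sees another sensor's measurement or information matrix; only the current estimate vectors are exchanged between neighbors.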

After each communication round per timestep, sensor updates its estimate of the target’s trajectory (14) by inverting the Hessian of its local objective function which requires floating point operations (flops), posing a bottleneck for long window lengths. Here, we provide an efficient algorithm for performing this update in flops rather than cubic complexity, without any matrix inversion. We factor the Hessian using Cholesky decomposition to obtain a lower triangular matrix for each and compute to update using forward and backward iterations, reminiscent of the Kalman smoothing procedure. The Cholesky decomposition of the Hessian takes flops, along with the forward and backward iterations. We present the algorithm in Algorithm 2.

function PrimalUpdate()
    initialization
    forward pass
        for  do
        end for
    backward pass
        for  do
        end for
    return
end function
Algorithm 2 DRWT Primal Update
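
The idea of replacing an explicit inverse with a Cholesky factorization followed by forward and backward substitution can be sketched as below. This is a dense illustration only: it does not exploit the block-banded structure that the algorithm above relies on for its reduced complexity, and the matrix and right-hand side are hypothetical.

```python
import numpy as np


def forward_substitution(L, b):
    """Solve L z = b for lower-triangular L."""
    z = np.zeros_like(b, dtype=float)
    for i in range(b.size):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z


def backward_substitution(U, z):
    """Solve U x = z for upper-triangular U."""
    x = np.zeros_like(z, dtype=float)
    for i in reversed(range(z.size)):
        x[i] = (z[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x


def cholesky_solve(H, b):
    """Solve H x = b for symmetric positive definite H without forming H^{-1}."""
    L = np.linalg.cholesky(H)            # H = L L^T with L lower triangular
    return backward_substitution(L.T, forward_substitution(L, b))


# Hypothetical tridiagonal, symmetric positive definite system.
H = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = cholesky_solve(H, b)
```

When the Hessian is block tridiagonal, as it is for a trajectory window, the factor `L` is block bidiagonal, so the two substitution sweeps visit each time step once, much like the forward and backward passes of a Kalman smoother.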

VI Simulation Results

VI-A Performance Comparison

We compare the performance of the DRWT method in Algorithm 1 to the CKF in a distributed estimation problem involving a static network with and . All sensors acquire noisy measurements of the target at each time step, and perform DRWT with . During each estimation phase, the same bandwidth limitations are imposed on the CKF and DRWT. We benchmark both distributed methods against the centralized MAP estimate.

Results from 2000 Monte Carlo simulations of this scenario show that the DRWT method outperforms the CKF. DRWT is significantly more communication-efficient, as sensors communicate only their target estimates. As Figure 2 shows, DRWT yields better convergence to the centralized estimate than the CKF as a function of the total number of communication bits per node. As Figure 3 shows, the improved convergence of DRWT contributes to improved estimation performance over entire trajectories. The estimated trajectories and covariances of the DRWT method closely match the centralized estimates. The CKF does not track the centralized estimate as closely and is also more conservative in its covariance estimate.

Fig. 2: Convergence of distributed estimation methods to the centralized estimate as a function of bits of communication passed on a 100 node, 400 edge network for a single timestep’s estimate.
Fig. 3: Mean squared error of estimation methods on a 100 node, 400 edge network with respect to ground truth, averaged over 4000 Monte Carlo simulations. Solid lines indicate the mean squared error, while dashed lines represent estimated covariances, computed as .

VI-B CARLA Simulations

We demonstrate our algorithm in a scenario involving a network of 50 sensor vehicles and 50 target vehicles within CARLA [9], a simulation test-bed for autonomous driving systems. For the simulation trials, each sensor vehicle is equipped with a forward and a backward-facing camera, each with a field of view. As shown in Figure 4, sensor vehicles acquire semantic segmentation and depth images at . The sensing radius of the vehicles is limited to 100m.

Fig. 4: CARLA frame showing raw and segmented camera images.

The relative position of each target vehicle is deduced from the depth and segmentation images and the camera’s projection matrix. Each sensor uses its odometry information to transform the relative position of the target into the global coordinate frame corresponding to the measurement used by the vehicle in DRWT. The sensor estimates trajectories of in length. For this simulation, we assume that the target labeling is known a priori. The communication network between sensor vehicles is modeled as a disk graph with a radius and is updated at . DRWT uses a simple double integrator model for the vehicle dynamics.
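
The odometry-based change of frame described above can be sketched in the plane as a rotation by the sensor's heading followed by a translation by its position. The function name, the yaw-only pose, and the numbers are hypothetical simplifications; the step that recovers the relative position from depth and segmentation images is not shown.

```python
import math


def relative_to_global(sensor_xy, sensor_yaw, rel_xy):
    """Map a target position from the sensor's body frame to the global frame.

    sensor_xy  : (x, y) of the sensor in the global frame
    sensor_yaw : heading of the sensor in radians, counter-clockwise from the global x-axis
    rel_xy     : (forward, left) position of the target relative to the sensor
    """
    c, s = math.cos(sensor_yaw), math.sin(sensor_yaw)
    gx = sensor_xy[0] + c * rel_xy[0] - s * rel_xy[1]
    gy = sensor_xy[1] + s * rel_xy[0] + c * rel_xy[1]
    return gx, gy


# A sensor at (5, 5) facing along the global y-axis sees a target 10 m straight ahead.
target_global = relative_to_global((5.0, 5.0), math.pi / 2, (10.0, 0.0))
```

The resulting global position plays the role of the local measurement that each sensor feeds into its estimator.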

Fig. 5: Mean squared error to the centralized estimate across the full trajectories of all 50 targets. Red lines are the positional estimate errors for each individual sensor (with no communication), and the blue lines are for the DRWT positional estimates.
Fig. 6: The sum of the traces of information matrices maintained by sensor vehicles using DRWT for a single target in a CARLA simulation. Each colored band represents the information of one sensor. Although any one sensor possesses only a fraction of the joint information, the sum over the network closely matches the information of a centralized estimator. Spikes in individual bands correspond to execution of the hand-off procedure.

Figure 5 shows the mean squared error of the estimated target trajectories of all target vehicles for all the sensor vehicles with respect to the centralized trajectory estimate. Collaborative target tracking using DRWT significantly outperforms the estimates made by any single agent. Increasing the number of iterations of DRWT in each estimation round can further reduce the remaining error.

Figure 6 shows how the information (represented as the trace of the inverse covariance) corresponding to a given target is apportioned across the network. As the set of sensors tracking a target changes in time, the hand-off procedure enables their joint information to closely match the information of the centralized estimate.

VII Conclusion

The DRWT algorithm enables a fleet of autonomous vehicles to track other vehicles in urban environments in the presence of occlusions. In this method, each sensor-equipped vehicle estimates the target’s state over a rolling window, leading to a scalable algorithm that can be parallelized across multiple targets. We show that DRWT converges to the centralized estimate with fewer communication bits per node than the CKF. Future work will focus on target tracking by vehicles with non-linear dynamics and non-linear sensors such as radar and lidar.


  • [1] A. Ahmad and P. U. Lima (2011) Multi-robot cooperative object tracking based on particle filters. In ECMR, pp. 37–42. Cited by: §II.
  • [2] A. Ahmad, G. D. Tipaldi, P. Lima, and W. Burgard (2013) Cooperative robot localization and target tracking based on least squares minimization. In 2013 IEEE International Conference on Robotics and Automation, pp. 5696–5701. Cited by: §II.
  • [3] G. Battistelli and L. Chisci (2016) Stability of consensus extended Kalman filter for distributed state estimation. Automatica 68, pp. 169–178. Cited by: §II, §V.
  • [4] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3 (1), pp. 1–122. Cited by: §II.
  • [5] R. Carli, A. Chiuso, L. Schenato, and S. Zampieri (2008) Distributed Kalman filtering based on consensus strategies. IEEE Journal on Selected Areas in Communications 26 (4), pp. 622–633. Cited by: §II.
  • [6] T. Chang, M. Hong, and X. Wang (2014) Multi-agent distributed optimization via inexact consensus ADMM. IEEE Transactions on Signal Processing 63 (2), pp. 482–497. Cited by: §V.
  • [7] P. Dames, P. Tokekar, and V. Kumar (2017) Detecting, localizing, and tracking an unknown number of moving targets using a team of mobile robots. The International Journal of Robotics Research 36 (13-14), pp. 1540–1553. Cited by: §II.
  • [8] P. Dames (2017) Distributed multi-target search and tracking using the PHD filter. In 2017 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pp. 1–8. Cited by: §II.
  • [9] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun (2017) CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pp. 1–16. Cited by: §VI-B.
  • [10] S. Du, M. Ibrahim, M. Shehata, and W. Badawy (2012) Automatic license plate recognition: a state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23 (2), pp. 311–325. Cited by: §III-B.
  • [11] J. Hsieh, L. Chen, and D. Chen (2014) Symmetrical SURF and its applications to vehicle detection and vehicle make and model recognition. IEEE Transactions on Intelligent Transportation Systems 15 (1), pp. 6–20. Cited by: §III-B.
  • [12] S. J. Julier and J. K. Uhlmann (2007) Using covariance intersection for SLAM. Robotics and Autonomous Systems 55 (1), pp. 3–20. Cited by: §II.
  • [13] H. Li and F. Nashashibi (2013) Cooperative multi-vehicle localization using split covariance intersection filter. IEEE Intelligent Transportation Systems Magazine 5 (2), pp. 33–44. Cited by: §II.
  • [14] M. A. Manzoor, Y. Morgan, and A. Bais (2019) Real-time vehicle make and model recognition system. Machine Learning and Knowledge Extraction 1 (2), pp. 611–629. Cited by: §III-B.
  • [15] G. Mateos, J. A. Bazerque, and G. B. Giannakis (2010) Distributed sparse linear regression. IEEE Transactions on Signal Processing 58 (10), pp. 5262–5276. Cited by: §II, §V.
  • [16] E. Montijano, R. Aragues, and C. Sagüés (2013) Distributed data association in robotic networks with cameras and limited communications. IEEE Transactions on Robotics 29 (6), pp. 1408–1423. Cited by: §III-B.
  • [17] E. D. Nerurkar, S. I. Roumeliotis, and A. Martinelli (2009) Distributed maximum a posteriori estimation for multi-robot cooperative localization. In 2009 IEEE International Conference on Robotics and Automation, pp. 1402–1409. Cited by: §II.
  • [18] W. Niehsen (2002) Information fusion based on fast covariance intersection filtering. In Proceedings of the Fifth International Conference on Information Fusion. FUSION 2002.(IEEE Cat. No. 02EX5997), Vol. 2, pp. 901–904. Cited by: §II.
  • [19] B. Noack, J. Sijs, M. Reinhardt, and U. D. Hanebeck (2017) Decentralized data fusion with inverse covariance intersection. Automatica 79, pp. 35–41. Cited by: §II.
  • [20] R. Olfati-Saber (2005) Distributed Kalman filter with embedded consensus filters. In Proceedings of the 44th IEEE Conference on Decision and Control, pp. 8179–8184. Cited by: §II, §V.
  • [21] R. Olfati-Saber (2007) Distributed Kalman filtering for sensor networks. In Decision and Control, 2007 46th IEEE Conference on, pp. 5492–5498. Cited by: §II, §V.
  • [22] R. Olfati-Saber (2009) Kalman-consensus filter: optimality, stability, and performance. In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, pp. 7036–7042. Cited by: §II, §V.
  • [23] L. Ong, B. Upcroft, T. Bailey, M. Ridley, S. Sukkarieh, and H. Durrant-Whyte (2006) A decentralised particle filtering algorithm for multi-target tracking across multiple flight vehicles. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4539–4544. Cited by: §II.
  • [24] R. T. Rockafellar (1976) Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization 14 (5), pp. 877–898. Cited by: §II.
  • [25] G. Sibley (2006) Sliding window filters for SLAM. University of Southern California, Tech. Rep. Cited by: §II.
  • [26] A. W. Stroupe, M. C. Martin, and T. Balch (2001) Distributed sensor fusion for object position estimation by multi-robot systems. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No. 01CH37164), Vol. 2, pp. 1092–1098. Cited by: §II.
  • [27] Z. Wu, M. Fu, Y. Xu, and R. Lu (2018) A distributed Kalman filtering algorithm with fast finite-time convergence for sensor networks. Automatica 95, pp. 63–72. Cited by: §II, §V.