Multiple object tracking (MOT) is a critical module that enables an autonomous vehicle to perceive its environment robustly and, consequently, to maneuver safely within it. The main challenges for MOT in autonomous driving applications are threefold: (1) uncertainty in the number of objects; (2) uncertainty regarding when and where objects may appear and disappear; (3) uncertainty in the objects' states. Traditional filtering-based methods, such as Kalman filtering [7, 18, 3], perform well in state update and estimation but can hardly model the unknown number of objects or the so-called birth and death phenomena of objects. Meanwhile, the emergence of random finite set (RFS) [15, 11, 16] based approaches has opened the door to theoretically sound Bayesian frameworks that naturally model all of the aforementioned uncertainties accurately and elegantly.
RFS-based MOT algorithms have been shown to be very effective for radar-based MOT applications [17, 13]. In particular, Poisson multi-Bernoulli mixture (PMBM) filtering has shown superior tracking performance and favourable computational cost [19] when compared to other RFS-based approaches. Consequently, in this work, we propose a PMBM filter to solve the amodal MOT problem for autonomous driving applications (Fig. 1). Applying RFS-based trackers to 3D LiDAR data and/or to 2D/3D amodal detections (bounding boxes) has not been well explored. Existing works in this area either under-perform state-of-the-art trackers or have been tested only on small datasets that do not reflect broad and truly challenging scenarios [8, 10, 6]. We believe that RFS-based methods could provide a robust and highly effective solution for these emerging detection modalities.
The contributions of our paper are as follows: (1) We propose a PMBM filter to solve the amodal MOT problem for autonomous driving applications. To the best of our knowledge, this represents a first attempt at employing an RFS-based approach in conjunction with 3D LiDAR data and neural network-based detectors. (2) We demonstrate that our PMBM tracker has low complexity and can run at an average rate of 20 Hz on a standard desktop. (3) We validate and test the performance of our PMBM tracker using two extensive open datasets provided by two industry leaders, Waymo [14] and Argoverse [2]. These datasets, which have more than 80,000 diverse testing frames, clearly demonstrate that our tracker outperforms many state-of-the-art methods under realistic driving conditions. It is worth noting that our PMBM tracker ranks No. 2 in average MOTA and No. 1 in vehicle MOTA among all the entries that use the organizer-provided detections on the Argoverse dataset.
The PMBM filter is a relatively new approach to the MOT problem. It tracks objects within a Bayesian framework and models undetected objects and detected objects using two distinct probability distributions. The high-level architecture of the proposed PMBM tracker is shown in Fig. 2. As shown in the figure, the tracker consists of four primary components: (1) PMBM Prediction; (2) Data Association; (3) PMBM Update; and (4) Reduction.
2.1 Object State
The object state used in this work is defined as $\mathbf{x} = [x, y, v_x, v_y]^{\top}$, where $x$ and $y$ represent the 2D location of the object, and $v_x$ and $v_y$ are the velocities along the $x$ and $y$ directions, respectively. We define the state in this compressed way for the following reasons: first, the $z$ value does not change dramatically between consecutive frames; second, the object dimensions output by a neural network-based detector are already precise, so it is not necessary to incorporate all 3D information in the state; third, reducing the state dimension lowers the computational cost of the tracking system, which helps real-time performance. In this work, our PMBM tracker is designed as a point-based tracker. The center points of the objects, which are initially output by the detector, are tracked with unique track IDs, while the bounding box dimensions (height, width, length) are taken directly from the detection measurements.
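To make the point-based design concrete, the following minimal Python sketch builds the compressed state from a single detection and re-attaches the box dimensions on output. The `Detection` container, the function names, and the zero initial velocity are assumptions made for illustration; they are not taken from the tracker described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical 3D detection: box center, box size and confidence score."""
    center: Tuple[float, float, float]  # (x, y, z) in meters
    size: Tuple[float, float, float]    # (length, width, height) in meters
    score: float                        # detector confidence in [0, 1]


def init_state(det: Detection) -> List[float]:
    """Build the compressed state [x, y, vx, vy] from a detection.

    The velocity is initialized to zero here, which is an assumption.
    """
    x, y, _z = det.center
    return [x, y, 0.0, 0.0]


def to_output_box(state: List[float], det: Detection) -> Dict[str, tuple]:
    """Combine the tracked (x, y) with z and box size copied from the detection."""
    return {"center": (state[0], state[1], det.center[2]), "size": det.size}
```

Only the 2D center and velocity are filtered in this sketch; the height coordinate and the box size pass straight through from the associated detection.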
2.2 Detected and Undetected Objects
Under the PMBM model, the set of objects $\mathbf{X}_t$ at timestamp $t$ is the union of detected objects $\mathbf{X}^d_t$ and undetected objects $\mathbf{X}^u_t$. Detected objects are objects that have been detected at least once. Undetected objects are objects that have never been detected. Note that we do not explicitly track the undetected objects, which is impossible under a tracking-by-detection framework. Instead, we maintain a representation of their possible existence. For example, consider an autonomous car in a scenario where a large truck blocks part of the view. Some objects may be located in the occluded area behind the truck, and these objects are inherently undetected.
2.3 Data Association Hypotheses
For each timestamp, there are multiple hypotheses for data association. In our measurement-driven framework, each measurement is either a newly detected target, a previously detected target, or a false positive detection. We form different global association hypotheses from possible combinations of the single target hypotheses (STHs). Gating is used to reduce the total number of hypotheses by keeping only the reasonable ones. Murty's algorithm [12], an extension of the Hungarian algorithm [9], is used to generate the $K$ best global hypotheses instead of only one.
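The idea of keeping several ranked global hypotheses, rather than a single best assignment, can be illustrated with a small Python sketch. It enumerates all gated assignments by brute force, which is a stand-in for Murty's algorithm and only practical for a handful of objects. The cost matrix, the gate value, and the `miss_cost` and `new_cost` penalties are hypothetical.

```python
import itertools
import math


def k_best_hypotheses(cost, k, gate=math.inf, miss_cost=5.0, new_cost=5.0):
    """Return the k lowest-cost global hypotheses as (total_cost, assignment).

    cost[i][j] is the association cost between track i and measurement j.
    assignment[i] is the measurement index given to track i, or None if the
    track is missed. Measurements left unassigned are charged new_cost each
    (new object or false positive). Brute-force enumeration, not Murty's method.
    """
    n_meas = len(cost[0]) if cost else 0
    # Gating: each track may only take measurements whose cost is inside the gate.
    options = [
        [None] + [j for j in range(n_meas) if row[j] <= gate] for row in cost
    ]
    hypotheses = []
    for assignment in itertools.product(*options):
        used = [j for j in assignment if j is not None]
        if len(used) != len(set(used)):
            continue  # a measurement may be assigned to at most one track
        total = sum(
            miss_cost if j is None else cost[i][j]
            for i, j in enumerate(assignment)
        )
        total += new_cost * (n_meas - len(used))
        hypotheses.append((total, assignment))
    hypotheses.sort(key=lambda h: h[0])
    return hypotheses[:k]


if __name__ == "__main__":
    example_cost = [[1.0, 4.0], [3.0, 1.5]]  # hypothetical 2 tracks x 2 measurements
    for total, assignment in k_best_hypotheses(example_cost, k=2, gate=10.0):
        print(total, assignment)
```

Tightening the gate removes candidate pairings before enumeration, which is what keeps the number of hypotheses manageable in practice.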
2.4 PMBM Density
Under the PMBM model, we use a Poisson RFS, also known as a Poisson point process (PPP), to represent undetected objects, and a multi-Bernoulli mixture (MBM) RFS to represent detected objects. The PMBM density is defined as a convolution of a PPP density for undetected objects and an MBM density for detected objects:

$$\mathcal{PMBM}_t(\mathbf{X}) = \sum_{\mathbf{X}^u \uplus \mathbf{X}^d = \mathbf{X}} \mathcal{P}_t(\mathbf{X}^u)\, \mathcal{MBM}_t(\mathbf{X}^d),$$

where $\mathbf{X}$ represents all the objects in the surveillance area and is the disjoint union of the set of undetected objects $\mathbf{X}^u$ and the set of detected objects $\mathbf{X}^d$. $\mathcal{P}_t(\cdot)$ and $\mathcal{MBM}_t(\cdot)$ are the Poisson point process density and the multi-Bernoulli mixture density, respectively.
2.5 PMBM Prediction
A crucial aspect of the PMBM filter is its conjugacy property, which was proved in [5]. Conjugacy is critical for robust and accurate Bayesian MOT. In summary, the conjugacy of the PMBM filter implies that if the prior is in PMBM form, then the distribution after the Bayesian prediction and update steps is also in PMBM form. Therefore, the prediction stage of a PMBM filter can be written as:

$$\mathcal{PMBM}_{t+1|t}(\mathbf{X}_{t+1}) = \int p(\mathbf{X}_{t+1} \mid \mathbf{X}_t)\, \mathcal{PMBM}_{t|t}(\mathbf{X}_t)\, \delta\mathbf{X}_t,$$

where $p(\mathbf{X}_{t+1} \mid \mathbf{X}_t)$ represents the transition density. A constant velocity model is used as the motion model in this work for simplicity. Under the PMBM filter, undetected and detected objects can be predicted independently. We define $P_s$ as the probability of survival, which models the probability that an object survives from one time step to the next. For undetected objects, the predicted parameters consist of the parameters predicted from the previous timestamp and the PPP birth parameters. The weight of each undetected object is scaled by $P_s$ in the prediction step. For detected objects, which are modeled as multi-Bernoulli mixture RFSs, each multi-Bernoulli (MB) process can also be predicted independently of the other MB processes. The probability of existence of each MB-modeled object is decreased by a factor $P_s$ to account for the higher uncertainty of existence in the prediction stage.
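The prediction step described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the dictionary layout of tracks and PPP components, the function names, and the parameter `p_survival` (the probability of survival) are hypothetical, and covariance propagation is omitted for brevity.

```python
def predict_state(state, dt):
    """Constant velocity motion model for the state [x, y, vx, vy]."""
    x, y, vx, vy = state
    return [x + vx * dt, y + vy * dt, vx, vy]


def predict_detected(tracks, p_survival, dt):
    """Predict each Bernoulli component independently.

    Each track is a dict with a 'state' and an existence probability 'r';
    the existence probability is scaled by the survival probability.
    """
    return [
        {"state": predict_state(t["state"], dt), "r": t["r"] * p_survival}
        for t in tracks
    ]


def predict_undetected(ppp, births, p_survival, dt):
    """Predict the PPP: scale surviving weights, then append birth components."""
    survived = [
        {"state": predict_state(c["state"], dt), "w": c["w"] * p_survival}
        for c in ppp
    ]
    return survived + [dict(b) for b in births]
```

For example, a track with existence probability 0.8 and `p_survival = 0.9` has a predicted existence probability of about 0.72, while its position advances by velocity times `dt`.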
2.6 PMBM Update
Furthermore, by adding information from the measurement model $p(\mathbf{Z} \mid \mathbf{X}_{t+1})$, the PMBM density can be updated with:

$$\mathcal{PMBM}_{t+1|t+1}(\mathbf{X}_{t+1}) = \frac{p(\mathbf{Z} \mid \mathbf{X}_{t+1})\, \mathcal{PMBM}_{t+1|t}(\mathbf{X}_{t+1})}{\int p(\mathbf{Z} \mid \mathbf{X}')\, \mathcal{PMBM}_{t+1|t}(\mathbf{X}')\, \delta\mathbf{X}'}.$$

In the update step, undetected objects that have no measurement associated with them remain undetected. The Bayesian update therefore does not change the states or variances of the Poisson components, since no new information is added. Here we define $P_d$ as the probability of detection, i.e., the probability that an existing object is detected. For undetected objects with no associated measurement, the weight is therefore decreased by a factor $(1 - P_d)$ to account for the decreased probability of existence. For detected objects, the predicted state is updated by weighting in the information contained in the measurement.
There are two types of updates for detected objects: objects detected for the first time and objects already detected at a previous timestamp. Our tracker is a measurement-driven framework: an object must be connected to a measurement in order to be classified as detected for the first time. All the undetected PPP intensity components and the corresponding gated measurements are considered when generating the fused distribution. Note that the detections provided by a neural network always have confidence scores attached to them. The confidence score is an invaluable indicator of the object's probability of existence. So, unlike a standard PMBM filter, we incorporate the detection confidence score into the update step for objects detected for the first time. We obtain a new Bernoulli process for each first-time detected object. For objects detected at a previous timestamp, there are two cases. If a measurement is associated with the object, then for each hypothesis a standard Kalman filter is used to update the state vector, and the updated probability of existence is set to 1, because one cannot associate a measurement with an object that does not exist. If no measurement is associated with the object, we keep the predicted state unchanged and decrease the probability of existence and the weight by a factor $(1 - P_d)$. Note that $P_d$ here is related to the detection confidence scores associated with the object in past frames. Hence, unlike other standard Kalman filter-based trackers, the survival time of a detected object without measurements varies with its tracking status in the preceding time period.
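The effect of a missed detection on the probability of existence can be illustrated with a short Python sketch. Two variants are shown: a direct scaling, which is a literal reading of the description above, and the textbook Bernoulli missed-detection update from the general PMBM literature, which also renormalizes. Which variant matches the tracker in this work is not confirmed by the text; the function names and the parameter `p_d` (the probability of detection) are assumptions.

```python
def update_associated():
    """A track with an associated measurement exists with probability 1."""
    return 1.0


def update_missed_scaled(r_pred, weight, p_d):
    """Literal reading of the text: scale existence and weight by (1 - p_d)."""
    return r_pred * (1.0 - p_d), weight * (1.0 - p_d)


def update_missed_bayes(r_pred, weight, p_d):
    """Textbook Bernoulli missed-detection update (an assumption here).

    The likelihood of receiving no measurement is (1 - r) + r * (1 - p_d),
    which scales the hypothesis weight and normalizes the new existence
    probability.
    """
    no_meas_likelihood = 1.0 - r_pred * p_d
    r_new = r_pred * (1.0 - p_d) / no_meas_likelihood
    return r_new, weight * no_meas_likelihood
```

With `r_pred = 0.8` and `p_d = 0.9`, the scaled variant gives an existence probability of about 0.08 and the normalized variant about 0.29. In both cases a larger `p_d` removes an unobserved track faster, which is consistent with tying `p_d` to past confidence scores so that confidently detected objects that suddenly vanish are dropped sooner.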
Table 1: Tracking results on the Argoverse dataset.

| Method | Split | Class | MOTA (%) | MOTP | #False Positives | #Misses | #IDS | #FRAG |
|---|---|---|---|---|---|---|---|---|
| Argoverse Baseline | validation | Vehicle | 63.15 | 0.37 | 17385 | 19811 | 122 | 323 |
| Our PMBM Tracker | validation | Vehicle | 68.70 | 0.375 | 10644 | 20782 | 271 | 1116 |
| Argoverse Baseline | validation | Pedestrian | 41.70 | 0.257 | 3783 | 14411 | 113 | 123 |
| Our PMBM Tracker | validation | Pedestrian | 45.50 | 0.249 | 4935 | 11653 | 201 | 903 |
| Argoverse Baseline | test | Vehicle | 65.90 | 0.34 | 15693 | 23594 | 200 | 393 |
| Our PMBM Tracker | test | Vehicle | 71.67 | 0.34 | 8278 | 24165 | 362 | 1217 |
| Argoverse Baseline | test | Pedestrian | 48.31 | 0.37 | 4933 | 25780 | 424 | 387 |
| Our PMBM Tracker | test | Pedestrian | 48.56 | 0.4 | 5924 | 24278 | 783 | 1750 |
Table 2: Tracking results on the Waymo dataset.

| Method | Split | Class | MOTA (Primary) (%) | MOTP | False Positives (%) | Misses (%) |
|---|---|---|---|---|---|---|
| Waymo Baseline | | | | | | |
| Argoverse Baseline | test | All | 29.14 | 0.270 | 17.14 | 53.47 |
| Probabilistic KF | test | All | 36.57 | 0.270 | 8.32 | 54.02 |
| Our PMBM Tracker | test | All | 38.51 | 0.270 | 7.74 | 52.86 |
2.7 Reduction
The assignment problem in MOT theoretically becomes NP-hard [4], and hence reducing the number of hypotheses is necessary to decrease the computational complexity and maintain real-time performance. Five reduction techniques are used in this work: pruning, capping, gating, recycling and merging. Pruning removes objects and global hypotheses with low weights. Capping sets an upper bound on the number of global hypotheses and detected objects. Gating limits the search distance for data association; the Mahalanobis distance is used here instead of the Euclidean distance. Recycling is applied to detected objects with a low probability of existence: instead of discarding these objects, we recycle them by moving them from the detected object set to the undetected object set. Finally, there may be non-unique global hypotheses, and merging combines these identical global hypotheses into one.
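The reduction steps can be sketched with a few standalone Python helpers. This is a minimal illustration under stated assumptions: the data layouts, function names, and threshold parameters are hypothetical, and the recycled component reuses the existence probability as its PPP weight, which is the usual approximation in the PMBM literature rather than something stated here.

```python
def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance for a 2D position and 2x2 covariance S."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    # v^T S^-1 v with the closed-form inverse of a 2x2 matrix.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def in_gate(z, z_pred, S, threshold=9.21):
    """Gating test; 9.21 is roughly the 99% chi-square quantile for 2 degrees of freedom."""
    return mahalanobis_sq(z, z_pred, S) <= threshold


def prune_and_cap(hypotheses, min_weight, max_count):
    """Drop (weight, label) pairs below min_weight, keep the top max_count, renormalize."""
    kept = sorted(
        (h for h in hypotheses if h[0] >= min_weight),
        key=lambda h: h[0],
        reverse=True,
    )[:max_count]
    total = sum(w for w, _ in kept)
    return [(w / total, label) for w, label in kept] if total > 0 else []


def merge_duplicates(hypotheses):
    """Merge identical (weight, assignment) hypotheses by summing their weights."""
    merged = {}
    for w, assignment in hypotheses:
        merged[assignment] = merged.get(assignment, 0.0) + w
    return sorted(
        ((w, a) for a, w in merged.items()), key=lambda h: h[0], reverse=True
    )


def recycle(tracks, ppp, r_threshold):
    """Move tracks with low existence probability 'r' into the undetected (PPP) set."""
    kept = [t for t in tracks if t["r"] >= r_threshold]
    recycled = [
        {"state": t["state"], "w": t["r"]} for t in tracks if t["r"] < r_threshold
    ]
    return kept, ppp + recycled
```

Using the Mahalanobis distance means the gate stretches along directions in which the predicted position is uncertain, whereas a Euclidean gate would treat all directions equally.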
Dataset. We evaluate our method on two popular open datasets provided by two industry leaders: Waymo [14] and Argoverse [2]. The Waymo 3D tracking dataset contains 800 training segments, 202 validation segments and 150 testing segments. Each segment includes 200 frames covering 20 seconds. There are 40,400 frames in the validation set and 30,000 frames in the testing set. Three classes are evaluated: vehicle, pedestrian and cyclist.
The Argoverse 3D tracking dataset consists of 113 segments (scenes) in total, each lasting 15-30 seconds. The data is divided into three sets: 65 segments for training (about 13,000 frames), 24 segments for validation (about 5,000 frames) and 24 segments for testing (about 4,200 frames). Since our method is not learning-based, we do not need the training set; the validation set is used for tuning the parameters. The vehicle and pedestrian classes are evaluated. Both datasets have a 360-degree field of view.
Note that for both the Waymo and Argoverse datasets, ground truth labels are available only for the training and validation sets. To ensure a fair evaluation on the test sets, tracking results must be submitted to the relevant server, and the evaluation results are then published on the corresponding leaderboard.
Detections. We use the precomputed 3D object detections provided by the dataset organizers.
3.2 Experimental Results
The results for the Argoverse dataset and the Waymo dataset are shown in Table 1 and Table 2, respectively. As shown in these two tables, our method achieves a higher MOTA than the other listed trackers in every split and class. At the time of submission (May 2020), among all the entries that use organizer-provided detections, our PMBM tracker ranked No. 3 in averaged ranking (primary metric), No. 2 in averaged MOTA for all classes, No. 1 in Vehicle MOTA and No. 5 in Pedestrian MOTA on the Argoverse 3D Tracking Leaderboard (https://evalai.cloudcv.org/web/challenges/challenge-page/453/leaderboard/1278#leaderboardrank-1), and No. 5 on the Waymo 3D tracking leaderboard (https://waymo.com/open/challenges/3d-tracking/). For tracking-by-detection approaches, the quality of the input detections is inherently of paramount importance. Nevertheless, our performance remains very competitive with other entries that use better self-generated detections.
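For readers unfamiliar with the primary metric, MOTA in the CLEAR MOT definition [1] combines misses, false positives and identity switches into a single score relative to the number of ground-truth objects. The following Python sketch shows the computation; the counts in the usage example are hypothetical, since the total number of ground-truth objects is not reported in the tables.

```python
def mota(misses, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / ground truth.

    All counts are summed over every frame of the evaluated sequences.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (misses + false_positives + id_switches) / num_gt


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formula.
    score = mota(misses=100, false_positives=50, id_switches=10, num_gt=1000)
    print(f"MOTA = {100 * score:.2f}%")
```

Because all three error types are summed, a tracker can reach a higher MOTA with more ID switches if it removes enough false positives, which is the pattern visible for vehicles in Table 1.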
It is worth noting that our tracker performs better on vehicles than on pedestrians. We attribute this to the fact that the distances between pedestrians are much smaller than the distances between vehicles. Since our tracker is a point-based method, it cannot make use of object dimensions; consequently, hypotheses that lie close together may be merged even though they should be kept separate, which leads to erroneous data associations. Therefore, as shown in Table 1, for the pedestrian class our tracker has more false positives, ID switches and fragmentations, which are due to low-quality data association. Meanwhile, vehicle detections are better separated, because two vehicles cannot overlap each other, in contrast to the closely spaced pedestrian case.
In this paper, we propose a PMBM filter to solve the 3D amodal MOT problem with 3D LiDAR data for autonomous driving applications. Our framework can naturally model the uncertainties in the MOT problem. This represents a first attempt at employing an RFS-based approach in conjunction with 3D LiDAR data and neural network-based detectors, with comprehensive testing on large-scale datasets. The experimental results on the Waymo and Argoverse datasets demonstrate that our approach outperforms previous state-of-the-art methods by a large margin. Finally, we hope that our results motivate future research on RFS-based trackers for self-driving applications.
Acknowledgement: This work has been supported in part by the Semiconductor Research Corporation (SRC) and by Amazon Robotics under the Amazon Research Award (ARA) program.
[1] (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing 2008, pp. 1–10.
[2] (2019) Argoverse: 3D tracking and forecasting with rich maps. In , pp. 8748–8757.
[3] (2020) Probabilistic 3D multi-object tracking for autonomous driving. arXiv preprint arXiv:2001.05673.
[4] (2018) Machine learning methods for solving assignment problems in multi-target tracking. arXiv preprint arXiv:1802.06897.
[5] (2018) Poisson multi-Bernoulli mixture filter: direct derivation and implementation. IEEE Transactions on Aerospace and Electronic Systems 54 (4), pp. 1883–1901.
[6] (2017) Pedestrian tracking using Velodyne data—stochastic optimization for extended object tracking. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 39–46.
[7] (1960) A new approach to linear filtering and prediction problems.
[8] (2010) A random finite set based detection and tracking using 3D LiDAR in dynamic environments. In 2010 IEEE International Conference on Systems, Man and Cybernetics, pp. 2288–2292.
[9] (1955) The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2 (1-2), pp. 83–97.
[10] (2010) Tracking random finite objects using 3D-LiDAR in marine environments. In Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 1282–1287.
[11] (2007) Statistical multisource-multitarget information fusion. Vol. 685, Artech House Norwood, MA.
[12] (1968) An algorithm for ranking all the assignments in order of increasing costs. Operations Research 16 (3), pp. 682–687.
[13] (2013) Multi-target track-before-detect using labeled random finite set. In 2013 International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 116–121.
[14] (2020) Scalability in perception for autonomous driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454.
[15] (2005) Sequential Monte Carlo methods for multitarget filtering with random finite sets. IEEE Transactions on Aerospace and Electronic Systems 41 (4), pp. 1224–1245.
[16] (2008) Bayesian filtering with random finite set observations. IEEE Transactions on Signal Processing 56 (4), pp. 1313–1326.
[17] (2011) A random finite set conjugate prior and application to multi-target tracking. In 2011 Seventh International Conference on Intelligent Sensors, Sensor Networks and Information Processing, pp. 431–436.
[18] (2020) 3D multi-object tracking: a baseline and new evaluation metrics. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[19] (2017) Performance evaluation of multi-Bernoulli conjugate priors for multi-target filtering. In 2017 20th International Conference on Information Fusion (Fusion), pp. 1–8.