1 Introduction
Multiple object tracking (MOT) is a critical module that enables an autonomous vehicle to perceive its environment robustly and, consequently, to maneuver safely within it. The main challenges for MOT in autonomous driving applications are threefold: (1) uncertainty in the number of objects; (2) uncertainty regarding when and where objects may appear and disappear; (3) uncertainty in the objects' states. Traditional filtering-based methods, such as Kalman filtering [7, 18, 3], perform well in state update and estimation but can hardly model the unknown number of objects or the so-called birth and death phenomena of objects. Meanwhile, the emergence of random finite set (RFS) [15, 11, 16] based approaches has opened the door to theoretically sound Bayesian frameworks that naturally model all of the aforementioned uncertainties. RFS-based MOT algorithms have been shown to be very effective for radar-based MOT applications [17, 13]. In particular, Poisson multi-Bernoulli mixture (PMBM) filtering has shown superior tracking performance and favourable computational cost [19] when compared to other RFS-based approaches. Consequently, in this work, we propose a PMBM filter to solve the amodal MOT problem for autonomous driving applications (Fig. 1). Applying RFS-based trackers to 3D LiDAR data and/or to 2D/3D amodal detections (bounding boxes) has not been well explored. Existing works in this area either underperform state-of-the-art trackers or have been tested on small datasets that do not reflect broad and truly challenging scenarios [8, 10, 6]. We believe that RFS-based methods could provide a robust and highly effective solution for these emerging detection modalities.
The contributions of our paper are as follows: (1) We propose a PMBM filter to solve the amodal MOT problem for autonomous driving applications. To the best of our knowledge, this represents a first attempt at employing an RFS-based approach in conjunction with 3D LiDAR data and neural network-based detectors. (2) We demonstrate that our PMBM tracker has low complexity and can run at an average rate of 20 Hz on a standard desktop. (3) We validate and test the performance of our PMBM tracker using two extensive open datasets provided by two industry leaders: Waymo [14] and Argoverse [2]. These datasets, which have more than 80000 diverse testing frames, demonstrate that our tracker outperforms many state-of-the-art methods under realistic driving conditions. It is worth noting that our PMBM tracker ranks No. 2 in average MOTA and No. 1 in vehicle MOTA among all the entries that use the organizer-provided detections on the Argoverse dataset.
2 Approach
The PMBM filter is a relatively new approach to the MOT problem. It tracks objects within a Bayesian framework, and it models undetected objects and detected objects using two distinct probability distributions. The high-level architecture of the proposed PMBM MOT tracker is shown in Fig. 2. As shown in the figure, the tracker consists of four primary components: (1) PMBM Prediction; (2) Data Association; (3) PMBM Update; and (4) Reduction.
2.1 Object State
The object state used in this work is defined as $\mathbf{x} = [x, y, v_x, v_y]^T$, where $x$ and $y$ represent the 2D location of the object, and $v_x$ and $v_y$ are the velocities along the $x$ and $y$ directions, respectively. We define the state in this compressed way for three reasons: first, the $z$ value does not change dramatically between consecutive frames; second, the object dimensions provided by a neural network-based detector are already precise, so it is not necessary to incorporate all 3D information; third, reducing the state dimension inherently enables the tracking system to operate at a lower computational cost, which supports real-time performance. In this work, our PMBM tracker is designed as a point-based tracker. The center points of the objects, which are initially output by the detector, are tracked using unique track IDs, while the bounding box dimensions (height, width, length) are extracted directly from the detection measurements.
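The point-based state described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Detection` and `PointTrack` records, the zero initial velocity, and the choice to copy the $z$ coordinate from the latest detection are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: 3D box center, box size, confidence."""
    center: Tuple[float, float, float]  # (x, y, z) in meters
    size: Tuple[float, float, float]    # (height, width, length) in meters
    score: float                        # detector confidence in [0, 1]


@dataclass(frozen=True)
class PointTrack:
    """Tracked object: filtered 2D point state plus box data from the detector."""
    track_id: int
    state: Tuple[float, float, float, float]  # (x, y, v_x, v_y), the filtered part
    z: float                                  # not filtered; copied from a detection
    size: Tuple[float, float, float]          # not filtered; copied from a detection


def init_track(track_id: int, det: Detection) -> PointTrack:
    """Start a track at the detected center with zero velocity (an assumption)."""
    x, y, z = det.center
    return PointTrack(track_id, (x, y, 0.0, 0.0), z, det.size)


def attach_box(track: PointTrack, det: Detection) -> PointTrack:
    """Refresh only the non-filtered quantities (z and box size) from a detection."""
    return PointTrack(track.track_id, track.state, det.center[2], det.size)
```

Only the four-dimensional `state` would pass through the filter; the box dimensions ride along unchanged, which is what keeps the per-object cost low.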
2.2 Detected and Undetected Objects
Under the PMBM model, the set of objects $X_t$ at timestamp $t$ is the union of the detected objects $X_t^d$ and the undetected objects $X_t^u$. Detected objects are objects that have been detected at least once. Undetected objects are objects that have not yet been detected. Note that we do not explicitly track the undetected objects, which is impossible under a tracking-by-detection framework. Instead, we maintain a representation of their possible existence. For example, consider an autonomous car in a scenario where a large truck blocks part of the view. Some objects may be located in the occluded area behind the truck, and hence these objects are inherently undetected.
2.3 Data Association Hypotheses
For each timestamp, there are multiple hypotheses for data association. In our measurement-driven framework, each measurement is either a newly detected target, a previously detected target, or a false positive detection. We form different global association hypotheses from possible combinations of the single target hypotheses (STHs). Gating is used to reduce the total number of hypotheses by keeping only the reasonable ones. Murty's algorithm [12], an extension of the Hungarian algorithm [9], is used to generate the $K$ best global hypotheses instead of only one.
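To make the notion of ranked global hypotheses concrete, the sketch below enumerates every feasible assignment for a very small problem and keeps the cheapest ones. It is a brute-force stand-in and not Murty's algorithm, which obtains the same ranking without exhaustive enumeration. The cost layout (`cost`, `miss_cost`, `new_cost`) and the function names are assumptions made for illustration; gated-out pairs are marked with an infinite cost.

```python
import heapq
from typing import Iterator, List, Optional, Sequence, Tuple

INF = float("inf")
Assignment = Tuple[Optional[int], ...]  # assignment[i] = measurement index or None


def enumerate_assignments(n_tracks: int, n_meas: int) -> Iterator[Assignment]:
    """Yield every way to give each track a distinct measurement or none."""
    def extend(partial: List[Optional[int]], used: frozenset) -> Iterator[Assignment]:
        if len(partial) == n_tracks:
            yield tuple(partial)
            return
        yield from extend(partial + [None], used)  # this track is missed
        for j in range(n_meas):
            if j not in used:
                yield from extend(partial + [j], used | {j})
    yield from extend([], frozenset())


def k_best_hypotheses(
    cost: Sequence[Sequence[float]],  # cost[i][j]: track i explains measurement j
    miss_cost: Sequence[float],       # miss_cost[i]: track i is not detected
    new_cost: Sequence[float],        # new_cost[j]: measurement j is new or clutter
    k: int,
) -> List[Tuple[float, Assignment]]:
    """Return up to k (total_cost, assignment) pairs, cheapest first."""
    n_tracks, n_meas = len(miss_cost), len(new_cost)
    scored = []
    for assignment in enumerate_assignments(n_tracks, n_meas):
        total = 0.0
        for i, j in enumerate(assignment):
            total += miss_cost[i] if j is None else cost[i][j]
        used = {j for j in assignment if j is not None}
        total += sum(new_cost[j] for j in range(n_meas) if j not in used)
        if total < INF:  # drop hypotheses that use a gated-out pair
            scored.append((total, assignment))
    return heapq.nsmallest(k, scored, key=lambda h: h[0])
```

The enumeration grows factorially with the number of tracks and measurements, which is why a ranked-assignment algorithm combined with gating is needed in practice.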
2.4 PMBM Density
Under the PMBM model, we use a Poisson RFS, also known as a Poisson point process (PPP), to represent undetected objects, and a multi-Bernoulli mixture (MBM) RFS to represent detected objects. The PMBM density is defined as a convolution of a PPP density for undetected objects and an MBM density for detected objects:
$$f(X) = \sum_{X^u \uplus X^d = X} f^{ppp}(X^u)\, f^{mbm}(X^d) \tag{1}$$
where $X$ represents all the objects in the surveillance area, given as the disjoint union of the set of undetected objects $X^u$ and the set of detected objects $X^d$; $f^{ppp}(\cdot)$ and $f^{mbm}(\cdot)$ are the Poisson point process density and the multi-Bernoulli mixture density, respectively.
2.5 PMBM Prediction
A crucial aspect of the PMBM filter is its conjugacy property, which was proved in [5]. Conjugacy is critical for robust and accurate Bayesian MOT: it implies that if the prior is in PMBM form, then the distribution after the Bayesian prediction and update steps has the same form. Therefore, the prediction stage of a PMBM filter can be written as:
$$f_{t+1|t}(X_{t+1}) = \int p(X_{t+1} \mid X_t)\, f_{t|t}(X_t)\, \delta X_t \tag{2}$$
where $f_{t|t}$ is the PMBM density at time $t$, $f_{t+1|t}$ is the predicted PMBM density, and $p(X_{t+1} \mid X_t)$ represents the transition density. A constant velocity model is used as the motion model in this work for simplicity. Under the PMBM filter, undetected and detected objects can be predicted independently. We define $p_s$ as the probability of survival, which models the probability that an object survives from one time step to the next. For undetected objects, the predicted parameters consist of the predicted parameters from the previous timestamp and the PPP birth parameters. The weight of each undetected object is scaled by $p_s$ in the prediction step. For detected objects, which are modeled as multi-Bernoulli mixture RFSs, each multi-Bernoulli (MB) process can also be predicted independently of the other MB processes. The probability of existence of each MB-modeled object is decreased by a factor of $p_s$ to account for the higher uncertainty of existence in the prediction stage.
2.6 PMBM Update
Furthermore, by adding information from the measurement model $p(Z_{t+1} \mid X_{t+1})$, the PMBM density can be updated with:
$$f_{t+1|t+1}(X_{t+1}) = \frac{p(Z_{t+1} \mid X_{t+1})\, f_{t+1|t}(X_{t+1})}{\int p(Z_{t+1} \mid X')\, f_{t+1|t}(X')\, \delta X'} \tag{3}$$
where $f_{t+1|t}$ is the predicted PMBM density and $Z_{t+1}$ is the set of measurements at time $t+1$. In the update step, the undetected objects that do not have any measurement associated with them remain undetected. The Bayesian update thus does not change the states or variances of the Poisson components, since no new information is added. We define $p_d$ as the probability of detection, i.e., the probability that an existing object is detected. For undetected objects without an associated measurement, the weight is decreased by a factor of $(1 - p_d)$ to account for the decreased probability of existence. For detected objects, the predicted state is updated by weighting in the information contained in the measurement.
There are two different types of updates for detected objects: objects being detected for the first time, and objects already detected at a previous timestamp. Our tracker is a measurement-driven framework: an object must be connected to a measurement in order to be classified as detected for the first time. All the undetected PPP intensity components and the corresponding gated measurements are considered to generate the fused distribution. Note that the detections provided by a neural network always have confidence scores attached to them. This confidence score is an invaluable indicator of the object's probability of existence. So, unlike a standard PMBM filter, we incorporate the detection confidence score into the update step for objects detected for the first time. We get a new Bernoulli process for each first-time detected object.
For objects detected at a previous timestamp, there are two cases. If a measurement is associated with the object, then for each hypothesis a standard Kalman filter is used to update the state vector, and the updated probability of existence is set to 1, because one cannot associate a measurement with an object that does not exist. If no measurement is associated with an object that was detected in a previous frame, then we keep the object's predicted state unchanged. Furthermore, we decrease the probability of existence and weight with
a factor of $(1 - p_d)$. Note that here $p_d$, the probability of detection, is related to the detection confidence scores associated with the object in past frames. Hence, unlike other standard Kalman filter-based trackers, the survival time of detected objects without measurements varies based on the tracking status from the previous time period.

Table 1: Results on the Argoverse dataset.
Method                  Split       Class       MOTA (%)  MOTP   #False Positives  #Misses  #IDS  #FRAG
Argoverse Baseline [2]  validation  Vehicle     63.15     0.37   17385             19811    122   323
Our PMBM Tracker        validation  Vehicle     68.70     0.375  10644             20782    271   1116
Argoverse Baseline [2]  validation  Pedestrian  41.70     0.257  3783              14411    113   123
Our PMBM Tracker        validation  Pedestrian  45.50     0.249  4935              11653    201   903
Argoverse Baseline [2]  test        Vehicle     65.90     0.34   15693             23594    200   393
Our PMBM Tracker        test        Vehicle     71.67     0.34   8278              24165    362   1217
Argoverse Baseline [2]  test        Pedestrian  48.31     0.37   4933              25780    424   387
Our PMBM Tracker        test        Pedestrian  48.56     0.4    5924              24278    783   1750

Table 2: Results on the Waymo dataset.
Method                  Split  Class  MOTA (Primary) (%)  MOTP   False Positives (%)  Misses (%)
Waymo Baseline [14]     test   All    25.92               0.263  13.98                64.55
Argoverse Baseline [2]  test   All    29.14               0.270  17.14                53.47
Probabilistic KF [3]    test   All    36.57               0.270  8.32                 54.02
Our PMBM Tracker        test   All    38.51               0.270  7.74                 52.86
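The prediction and update rules of Sections 2.5 and 2.6 for a single previously detected object can be sketched as below. This is a simplified illustration under stated assumptions rather than the authors' code: the noise is assumed independent per axis, so the four-dimensional constant velocity filter splits into two identical two-state filters; the process noise is taken as `q` times the identity; and the names `Axis`, `BernoulliTrack`, `predict`, `update_detected` and `update_missed` are hypothetical. The missed-detection rule simply scales the existence probability by $(1 - p_d)$, as described in the text.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Axis:
    """Position/velocity estimate along one axis with its 2x2 covariance."""
    pos: float
    vel: float
    p00: float  # variance of position
    p01: float  # covariance of position and velocity
    p11: float  # variance of velocity


@dataclass(frozen=True)
class BernoulliTrack:
    r: float  # probability of existence
    x: Axis
    y: Axis


def predict_axis(a: Axis, dt: float, q: float) -> Axis:
    """Constant velocity step: x' = F x, P' = F P F^T + q I, F = [[1, dt], [0, 1]]."""
    return Axis(
        pos=a.pos + dt * a.vel,
        vel=a.vel,
        p00=a.p00 + 2.0 * dt * a.p01 + dt * dt * a.p11 + q,
        p01=a.p01 + dt * a.p11,
        p11=a.p11 + q,
    )


def update_axis(a: Axis, z: float, meas_var: float) -> Axis:
    """Kalman update with a direct position measurement z (H = [1, 0])."""
    s = a.p00 + meas_var               # innovation variance
    k0, k1 = a.p00 / s, a.p01 / s      # Kalman gain
    innov = z - a.pos
    return Axis(
        pos=a.pos + k0 * innov,
        vel=a.vel + k1 * innov,
        p00=(1.0 - k0) * a.p00,
        p01=(1.0 - k0) * a.p01,
        p11=a.p11 - k1 * a.p01,
    )


def predict(t: BernoulliTrack, dt: float, q: float, p_s: float) -> BernoulliTrack:
    """Predict the state and scale the existence probability by p_s."""
    return BernoulliTrack(p_s * t.r, predict_axis(t.x, dt, q), predict_axis(t.y, dt, q))


def update_detected(t: BernoulliTrack, z: Tuple[float, float], meas_var: float) -> BernoulliTrack:
    """A measurement is associated: Kalman update, existence probability set to 1."""
    return BernoulliTrack(1.0, update_axis(t.x, z[0], meas_var), update_axis(t.y, z[1], meas_var))


def update_missed(t: BernoulliTrack, p_d: float) -> BernoulliTrack:
    """No measurement is associated: keep the state, scale existence by (1 - p_d)."""
    return replace(t, r=t.r * (1.0 - p_d))
```

Because `p_d` can depend on past detection scores, a track that was detected with high confidence loses existence probability faster when it goes unobserved than one that was only weakly detected.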

2.7 Reduction
The assignment problem in MOT is theoretically NP-hard [4], and hence reducing the number of hypotheses is necessary for decreasing the computational complexity and maintaining real-time performance. Five reduction techniques are used in this work: pruning, capping, gating, recycling and merging. Pruning removes objects and global hypotheses with low weights. Capping sets an upper bound on the number of global hypotheses and detected objects. Gating limits the search distance for data association; the Mahalanobis distance is used here instead of the Euclidean distance. Recycling is applied to detected objects with a low probability of existence: instead of discarding these objects, we recycle them by moving them from the detected object set to the undetected object set. Finally, the global hypotheses may not be unique, and hence merging combines identical global hypotheses into one.
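The reduction steps above might be sketched as follows. The thresholds, the data layout (weighted tuples) and the function names are assumptions made for illustration; in particular, summing the weights of identical hypotheses when merging, and the 99% chi-square gate for two degrees of freedom, are choices of this sketch that the text does not specify.

```python
from typing import Hashable, List, Sequence, Tuple

CHI2_GATE_2DOF_99 = 9.21  # 99% quantile of a chi-square with 2 degrees of freedom


def mahalanobis_sq(dx: float, dy: float, s: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis distance of innovation (dx, dy) under 2x2 covariance s."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    return (dx * (d * dx - b * dy) + dy * (a * dy - c * dx)) / det


def in_gate(dx: float, dy: float, s: Sequence[Sequence[float]],
            gate: float = CHI2_GATE_2DOF_99) -> bool:
    """Gating: keep a track-measurement pair only if it is statistically close."""
    return mahalanobis_sq(dx, dy, s) <= gate


def merge_duplicates(hyps: List[Tuple[float, Hashable]]) -> List[Tuple[float, Hashable]]:
    """Merging: identical global hypotheses collapse into one with summed weight."""
    merged = {}
    for w, payload in hyps:
        merged[payload] = merged.get(payload, 0.0) + w
    return [(w, payload) for payload, w in merged.items()]


def prune_and_cap(hyps: List[Tuple[float, Hashable]], min_weight: float,
                  max_count: int) -> List[Tuple[float, Hashable]]:
    """Pruning and capping: drop low weights, keep the top ones, renormalize."""
    kept = sorted((h for h in hyps if h[0] >= min_weight),
                  key=lambda h: h[0], reverse=True)[:max_count]
    total = sum(w for w, _ in kept)
    return [(w / total, payload) for w, payload in kept] if total > 0.0 else []


def recycle(tracks: List[Tuple[float, Hashable]], r_min: float):
    """Recycling: split (r, state) tracks into those kept as detected and those
    moved to the undetected set because their existence probability is below r_min."""
    detected = [t for t in tracks if t[0] >= r_min]
    recycled = [t for t in tracks if t[0] < r_min]
    return detected, recycled
```

Recycling keeps weak tracks available as undetected components, so a later measurement can revive them instead of starting from the birth model alone.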
3 Experiment
3.1 Settings
Dataset. We evaluate our method on two popular open datasets provided by two industry leaders: Waymo [14] and Argoverse [2]. The Waymo 3D tracking dataset contains 800 training segments, 202 validation segments and 150 testing segments. Each segment includes 200 frames covering 20 seconds. There are 40400 frames in the validation set and 30000 frames in the testing set. Three classes are evaluated: vehicle, pedestrian and cyclist.
The Argoverse 3D tracking dataset consists of 113 segments (scenes), each lasting 15 to 30 seconds. The data is divided into three sets: 65 segments for training (approximately 13000 frames), 24 segments for validation (approximately 5000 frames) and 24 segments for testing (approximately 4200 frames). Since our method is not learning-based, we do not need the training set; the validation set is used for tuning the parameters. The vehicle and pedestrian classes are evaluated. Both datasets have a 360-degree field of view.
Note that for both Waymo and Argoverse datasets, ground truth labels are only available for training and validation sets. For fairness of evaluation of testing samples, one needs to submit the tracking results to the relevant server, and the evaluation results are subsequently published on the corresponding leaderboard.
Detections. We use precomputed 3D object detections provided by the dataset organizer.
Evaluation Metric. Standard evaluation metrics [1] for MOT are used. The details of the metrics for each dataset can be found in [14, 2].
3.2 Experimental Results
The results for the Argoverse dataset and the Waymo dataset are shown in Table 1 and Table 2, respectively. As shown in these two tables, our method achieves higher MOTA than the other trackers listed, with the largest gains for the vehicle class. At the time of submission (May 2020), among all the entries that use organizer-provided detections, our PMBM tracker ranked No. 3 in averaged ranking (primary metric), No. 2 in averaged MOTA for all classes, No. 1 in vehicle MOTA and No. 5 in pedestrian MOTA on the Argoverse 3D Tracking Leaderboard (https://evalai.cloudcv.org/web/challenges/challengepage/453/leaderboard/1278#leaderboardrank1), and No. 5 on the Waymo 3D tracking leaderboard (https://waymo.com/open/challenges/3dtracking/). For tracking-by-detection approaches, the quality of the input detections is inherently of paramount importance. Nevertheless, our performance remains very competitive compared to other entries that use better, self-generated detections.
It is worth noting that our tracker performs better on vehicles than on pedestrians. We attribute this to the fact that the distances between pedestrians are much smaller than the distances between vehicles. Since our tracker is a point-based method, it does not use the object dimensions, and consequently hypotheses that lie close together may be merged even though they should remain separate. This, in turn, can lead to erroneous data associations. Therefore, as shown in Table 1, for the pedestrian class our tracker has more false positives, ID switches and fragmentations, which are due to low-quality data association. Vehicle detections, in contrast, are naturally further apart, because two vehicles cannot overlap each other in the way that closely spaced pedestrians can.
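For reference, the MOTA values discussed above follow the CLEAR MOT definition [1], which combines false positives, misses and identity switches relative to the number of ground-truth objects. The sketch below is a minimal illustration with hypothetical counts; the function name and arguments are assumptions, and the matching rules that produce the counts are dataset-specific (see [14, 2]).

```python
def mota(false_positives: int, misses: int, id_switches: int,
         num_ground_truth: int) -> float:
    """Multiple object tracking accuracy, summed over all frames (CLEAR MOT).

    MOTA = 1 - (FP + misses + ID switches) / number of ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    return 1.0 - (false_positives + misses + id_switches) / num_ground_truth


# Hypothetical counts, not taken from the paper's tables.
example = mota(false_positives=10, misses=20, id_switches=5, num_ground_truth=100)
```

Under this definition, the percentages in Table 2 for our tracker (7.74 false positives and 52.86 misses against a MOTA of 38.51) would leave roughly 0.89 percentage points attributable to identity switches.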
4 Conclusion
In this paper, we propose a PMBM filter to solve the 3D amodal MOT problem with 3D LiDAR data for autonomous driving applications. Our framework naturally models the uncertainties in the MOT problem. This represents a first attempt at employing an RFS-based approach in conjunction with 3D LiDAR data and neural network-based detectors, together with comprehensive testing on large-scale datasets. The experimental results on the Waymo and Argoverse datasets demonstrate that our approach outperforms the compared baseline methods. Finally, we hope that our results motivate future research on RFS-based trackers for self-driving applications.
Acknowledgement: This work has been supported in part by the Semiconductor Research Corporation (SRC) and by Amazon Robotics under the Amazon Research Award (ARA) program.
References
[1] (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing 2008, pp. 1–10.
[2] (2019) Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8748–8757.
[3] (2020) Probabilistic 3D multi-object tracking for autonomous driving. arXiv preprint arXiv:2001.05673.
[4] (2018) Machine learning methods for solving assignment problems in multi-target tracking. arXiv preprint arXiv:1802.06897.
[5] (2018) Poisson multi-Bernoulli mixture filter: direct derivation and implementation. IEEE Transactions on Aerospace and Electronic Systems 54 (4), pp. 1883–1901.
[6] (2017) Pedestrian tracking using Velodyne data - stochastic optimization for extended object tracking. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 39–46.
[7] (1960) A new approach to linear filtering and prediction problems.
[8] (2010) A random finite set based detection and tracking using 3D LiDAR in dynamic environments. In 2010 IEEE International Conference on Systems, Man and Cybernetics, pp. 2288–2292.
[9] (1955) The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2 (1–2), pp. 83–97.
[10] (2010) Tracking random finite objects using 3D-LiDAR in marine environments. In Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 1282–1287.
[11] (2007) Statistical multisource-multitarget information fusion. Vol. 685, Artech House, Norwood, MA.
[12] (1968) An algorithm for ranking all the assignments in order of increasing costs. Operations Research 16 (3), pp. 682–687.
[13] (2013) Multi-target track-before-detect using labeled random finite set. In 2013 International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 116–121.
[14] (2020) Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454.
[15] (2005) Sequential Monte Carlo methods for multitarget filtering with random finite sets. IEEE Transactions on Aerospace and Electronic Systems 41 (4), pp. 1224–1245.
[16] (2008) Bayesian filtering with random finite set observations. IEEE Transactions on Signal Processing 56 (4), pp. 1313–1326.
[17] (2011) A random finite set conjugate prior and application to multi-target tracking. In 2011 Seventh International Conference on Intelligent Sensors, Sensor Networks and Information Processing, pp. 431–436.
[18] (2020) 3D multi-object tracking: a baseline and new evaluation metrics. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[19] (2017) Performance evaluation of multi-Bernoulli conjugate priors for multi-target filtering. In 2017 20th International Conference on Information Fusion (Fusion), pp. 1–8.