1 Introduction
Multi-object tracking (MOT) aims to track all objects of the categories of interest in a video sequence [18, 26]. It is crucial in applications such as video surveillance and autonomous driving, where multiple pedestrians and vehicles need to be tracked simultaneously [6, 25, 3]. In recent years, tracking-by-detection [9, 18, 23, 6, 1, 5] has become the predominant paradigm of MOT. This approach first detects objects in each frame, then extracts discriminative features to quantify the similarities between targets, and finally performs data association to assign detections to their most likely trajectories. During this process, several influential parameters must be set manually, such as the threshold that determines whether an association is established. Finding the optimal parameters requires an evaluation procedure that measures tracking performance. However, existing evaluation metrics, such as the event-based CLEAR MOT measures [3] and identity-based measures [16], all require ground-truth annotations, which limits optimization to the training data. Since parameters optimized this way could be suboptimal in test scenes, a self-evaluation metric that enables parameter optimization without ground truth is urgently needed.
To evaluate the accuracy and stability of a tracker without ground truth, we design a self quality evaluation metric that comprehensively considers the quantity, length, and feature distance information of the trajectory hypotheses. Our method can assess the quality of trajectories because their distance distributions take distinctive forms, as shown in Figure 1. The intra distance denotes the feature distance between a pair of detection boxes in the same trajectory, and all such pairs constitute the intra distance distribution. Similarly, the inter distance denotes the feature distance between a pair of detection boxes from different trajectories. Intuitively, when a trajectory contains different targets, the distance distribution scatters, and we demonstrate that it generally exhibits multiple peaks.
Our self-evaluation metric enables automatic parameter adaptation to different scenes. Designing a tracking algorithm that performs well in various video scenes is hard, whereas tuning the parameters of existing tracking algorithms can achieve comparably strong performance more easily. To the best of our knowledge, there is no previous work in this area. We believe that our approach is instructive and provides new ideas for future research.
In summary, our contributions are as follows: (1) we show that feature distance distributions can reflect the quality of trajectory hypotheses; (2) we propose a self quality evaluation metric based on a two-class Gaussian mixture model, which largely meets the requirements of self-evaluation; (3) we test the effectiveness of our method on various data sets and note its drawbacks. A future prospect of using distributions to estimate erroneous frames is discussed at the end.
2 Related Work
2.1 MOT algorithms
In the tracking-by-detection paradigm, trackers first detect objects in each frame and then associate detections over time to form trajectories for targeted individuals [9, 26, 12]. Online methods [14, 24, 4, 23] only use previous and current frames and are thus suitable for real-time applications. One straightforward implementation is simple online and real-time tracking (SORT) [4], which predicts the new locations of bounding boxes using a Kalman filter, followed by a data association procedure that uses intersection-over-union (IOU) to calculate the cost matrix. Although SORT achieves favourable speed and accuracy simultaneously, it suffers from frequent identity switches because it relies only on short-term motion information. Deep SORT [23], on the other hand, introduces object re-identification (REID) features as appearance information to handle long-term occlusions, leading to a more robust and effective algorithm. Owing to the rapid development of deep neural networks (DNNs), REID features with powerful discriminative capability have become popular in MOT algorithms [6, 22, 26, 25]. In addition, the frame-by-frame association problem is often treated as bipartite graph matching solved by the Hungarian algorithm [8].
By contrast, offline methods [27, 9, 2] have access to the whole sequence and can perform global optimization on data association. These batch methods generally formulate MOT as a network flow problem [27, 15], for which K-shortest paths (KSP) [2], successive shortest paths (SSP) [9], and dynamic programming (DP) [15] can be used to find the optimal solution. Offline methods can correct early errors made by online methods and often perform better, but they are not applicable to time-critical applications.
In this paper, we focus on a simple, efficient, and easy-to-implement tracking framework. We use REID features to calculate the cost between current object detections and existing tracklets, minimize the total cost with the Hungarian algorithm, and employ operations such as interpolation and merging to correct previous results. Among all the parameters that need to be set, the REID threshold and the merging threshold are the two most dominant: they determine whether an association is established and whether two tracklets are merged, respectively.
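The association step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each tracklet is represented by a single feature vector (for example, that of its most recent detection), the cost is the Euclidean REID distance, and an exhaustive search over one-to-one assignments stands in for the Hungarian algorithm. The exhaustive search finds the same optimum but only scales to a handful of objects. The names `euclidean` and `associate` are hypothetical.

```python
import math
from itertools import permutations


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def associate(track_feats, det_feats, reid_threshold):
    """Match tracklets to detections by minimizing the total REID distance.

    Exhaustive search over one-to-one assignments stands in for the
    Hungarian algorithm (same optimum, only practical for tiny inputs).
    Matches whose distance exceeds reid_threshold are rejected.
    """
    n_t, n_d = len(track_feats), len(det_feats)
    cost = [[euclidean(t, d) for d in det_feats] for t in track_feats]
    if n_t <= n_d:
        # Each tracklet i takes a distinct detection p[i].
        candidates = [list(enumerate(p)) for p in permutations(range(n_d), n_t)]
    else:
        # Each detection j takes a distinct tracklet p[j].
        candidates = [[(i, j) for j, i in enumerate(p)]
                      for p in permutations(range(n_t), n_d)]
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    # Gate the optimal assignment with the REID threshold.
    matches = [(i, j) for i, j in best if cost[i][j] <= reid_threshold]
    matched = {j for _, j in matches}
    unmatched = [j for j in range(n_d) if j not in matched]
    return matches, unmatched


tracks = [[0.0, 0.0], [1.0, 1.0]]
dets = [[1.1, 0.9], [0.1, 0.0], [5.0, 5.0]]
print(associate(tracks, dets, reid_threshold=0.5))  # → ([(0, 1), (1, 0)], [2])
```

Unmatched detections would typically start new tracklets; tightening the threshold (for example to 0.12 here) rejects the weaker match and leaves more detections unmatched. Interpolation and merging are not shown.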
2.2 Evaluation metrics
Quantitative evaluation of tracking performance is challenging due to the complexity of the multi-target tracking task. A large number of metrics have been proposed [10, 17, 20, 13], among which two main families serve different purposes. The first is the CLEAR MOT metrics [3, 11], which comprise multiple object tracking accuracy (MOTA) and multiple object tracking precision (MOTP):
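$$ \mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t} \tag{1} $$
$$ \mathrm{MOTP} = \frac{\sum_{t,i} d_t^i}{\sum_t c_t} \tag{2} $$
where $c_t$ denotes the number of matched targets in frame $t$, $d_t^i$ denotes the matching distance of target $i$ in frame $t$, and $\mathrm{FN}_t$, $\mathrm{FP}_t$, $\mathrm{IDSW}_t$, and $\mathrm{GT}_t$ are the numbers of false negatives, false positives, identity switches, and ground-truth objects in frame $t$. Compared with MOTP, which is mainly influenced by the localization accuracy of detections, MOTA combines several sources of error, including false negatives, false positives, and identity switches, and thus provides a better overall performance measure.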
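As a worked example of Eqs. (1) and (2), the sketch below accumulates per-frame counts into MOTA and MOTP. The per-frame dictionary keys are a hypothetical input format, and the matching of hypotheses to ground truth that produces these counts is assumed to have been done already. Note that MOTA can become negative when errors outnumber ground-truth objects, and that MOTP measured as a distance is better when lower.

```python
def clear_mot(frames):
    """Return (MOTA, MOTP) following Eqs. (1) and (2).

    Each frame is a dict with hypothetical keys:
      fn, fp, idsw: false negatives, false positives, identity switches
      gt:           number of ground-truth objects in the frame
      dists:        matching distances d_t^i of the matched targets
    """
    errors = sum(f["fn"] + f["fp"] + f["idsw"] for f in frames)
    num_gt = sum(f["gt"] for f in frames)
    num_matches = sum(len(f["dists"]) for f in frames)  # sum over t of c_t
    mota = 1.0 - errors / num_gt
    motp = sum(sum(f["dists"]) for f in frames) / num_matches
    return mota, motp


frames = [
    {"fn": 1, "fp": 0, "idsw": 0, "gt": 4, "dists": [0.2, 0.4, 0.6]},
    {"fn": 0, "fp": 1, "idsw": 1, "gt": 4, "dists": [0.1, 0.3, 0.5, 0.7]},
]
mota, motp = clear_mot(frames)
print(round(mota, 3), round(motp, 3))  # → 0.625 0.4
```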
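The other is the ID metrics [16], which comprise identification precision (IDP), identification recall (IDR), and the corresponding F1 score IDF1:
$$ \mathrm{IDP} = \frac{\mathrm{IDTP}}{\mathrm{IDTP} + \mathrm{IDFP}}, \quad \mathrm{IDR} = \frac{\mathrm{IDTP}}{\mathrm{IDTP} + \mathrm{IDFN}}, \quad \mathrm{IDF_1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}} \tag{3} $$
where IDTP, IDFP, and IDFN are computed from the truth-to-result match, i.e., a bipartite graph matching between ground-truth trajectories and hypothesis trajectories, after which each hypothesis is assigned to a unique target. Frames of hypotheses that overlap too little with their assigned target are counted as false positives, and the corresponding ground-truth frames are counted as false negatives.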
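Eq. (3) translates directly into code. The sketch assumes the identity-level counts IDTP, IDFP, and IDFN have already been obtained from the truth-to-result matching; `id_scores` is a hypothetical helper name. It also checks the fact that IDF1 is the harmonic mean of IDP and IDR.

```python
def id_scores(idtp, idfp, idfn):
    """IDP, IDR and IDF1 from identity-level counts, as in Eq. (3)."""
    idp = idtp / (idtp + idfp)
    idr = idtp / (idtp + idfn)
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)
    return idp, idr, idf1


idp, idr, idf1 = id_scores(idtp=60, idfp=20, idfn=40)
# IDF1 equals the harmonic mean of IDP and IDR.
assert abs(idf1 - 2 * idp * idr / (idp + idr)) < 1e-12
print(idp, idr, round(idf1, 4))  # → 0.75 0.6 0.6667
```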
Compared with MOTA, IDF1 better measures the consistency of ID matching. A simple example illustrating this is presented in Figure 2. In this paper, we focus on identification performance and thus use IDF1 as the reference for our self-evaluation metric.
3 Self Quality Evaluation
We design a novel self quality evaluation metric that measures tracking performance without ground-truth annotations, enabling parameter optimization for better tracking performance in real-world scenes. This metric should be positively correlated with IDF1, which generally measures tracking performance best. The guiding design criteria are provided below, highlighting some distinctive features an ideal tracker should possess. Both theoretical analysis and practical verification show that high-quality trajectories present single peaks in the feature distance distribution, while low-quality trajectories present multiple peaks.
3.1 Design criteria
To have a better understanding of the proposed metric, we first explain that an ideal MOT tracker should meet the following criteria. It should be able to: (1) track all targets continuously from appearing to leaving the tracking area; (2) track each target consistently, that is, each target should be assigned one and only one track ID over time; (3) locate the position of each target as accurately as possible.
As mentioned in Section 2.2, criterion (3) mainly quantifies detection performance in the tracking-by-detection paradigm, so it is not our main focus. For self-evaluation metric design, criterion (1) implies that the number and length of trajectories should be appropriate. Criterion (2) leads to the assumption that, for an outstanding tracker, REID features are as similar as possible if they come from the same trajectory and as different as possible otherwise. This can be characterized by the intra and inter distances of trajectories. We define the distance between two features $\mathbf{x}_i$ and $\mathbf{x}_j$ as their Euclidean distance:
$$ d(\mathbf{x}_i, \mathbf{x}_j) = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2 \tag{4} $$
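The intra and inter distances can be collected as below. This is a minimal sketch assuming each trajectory is given as a list of REID feature vectors (plain Python lists here); the helper names are hypothetical. A trajectory with L detections contributes L(L-1)/2 intra distances, and two trajectories with L_a and L_b detections contribute L_a L_b inter distances.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean feature distance, as in Eq. (4)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intra_distances(features):
    """Distances between every pair of detections in one trajectory."""
    return [euclidean(a, b) for a, b in combinations(features, 2)]


def inter_distances(features_a, features_b):
    """Distances between detections of two different trajectories."""
    return [euclidean(a, b) for a in features_a for b in features_b]


traj = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
print(intra_distances(traj))                                # → [5.0, 0.0, 5.0]
print(inter_distances([[0.0, 0.0]], [[3.0, 4.0], [6.0, 8.0]]))  # → [5.0, 10.0]
```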
Based on the above considerations, our self-evaluation metric should comprehensively take the quantity, length, and feature distance information into account. Since it is hard to establish a relationship between identification quality and absolute distance values, analysing the distance distribution is a more reasonable solution.
3.2 Distance distribution analysis
We show theoretically that the intra distance of the same target and the inter distance of different targets obey central and noncentral chi distributions, respectively.
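For object representation, low-quality inputs commonly lead to uncertain estimates, causing the computed REID features to fluctuate around their ideal values. Following the assumptions in [19], we model the distribution of features as a multivariate Gaussian distribution:
$$ p(\mathbf{x}) = \mathcal{N}\left(\mathbf{x};\, \boldsymbol{\mu},\, \operatorname{diag}(\boldsymbol{\sigma}^2)\right) \tag{5} $$
where $\mathbf{x} = (x_1, \dots, x_N)$ is an N-dimensional feature vector, and $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ represent the ideal value and the uncertainty along each dimension, respectively. Each dimension obeys an independent Gaussian distribution. We measure the Euclidean distance between a pair of features $(\mathbf{x}_i, \mathbf{x}_j)$:
$$ d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\sum_{l=1}^{N} \left(x_{i,l} - x_{j,l}\right)^2} \tag{6} $$
Because the difference of two independent Gaussian random variables is itself Gaussian, we have $x_{i,l} - x_{j,l} \sim \mathcal{N}\left(\mu_{i,l} - \mu_{j,l},\, \sigma_{i,l}^2 + \sigma_{j,l}^2\right)$. If $(\mathbf{x}_i, \mathbf{x}_j)$ comes from the same target, then $\mu_{i,l} = \mu_{j,l}$ and $\sigma_{i,l} = \sigma_{j,l} = \sigma_l$. Thus, the feature distance after standardization obeys a chi distribution with N degrees of freedom:
$$ \sqrt{\sum_{l=1}^{N} \left(\frac{x_{i,l} - x_{j,l}}{\sqrt{2}\,\sigma_l}\right)^2} \sim \chi_N \tag{7} $$
and if $(\mathbf{x}_i, \mathbf{x}_j)$ comes from different targets, the standardized distance obeys a noncentral chi distribution:
$$ \sqrt{\sum_{l=1}^{N} \frac{\left(x_{i,l} - x_{j,l}\right)^2}{\sigma_{i,l}^2 + \sigma_{j,l}^2}} \sim \chi_N(\lambda), \qquad \lambda = \sqrt{\sum_{l=1}^{N} \frac{\left(\mu_{i,l} - \mu_{j,l}\right)^2}{\sigma_{i,l}^2 + \sigma_{j,l}^2}} \tag{8} $$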
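The claim behind Eqs. (7) and (8) can be checked numerically. The Monte Carlo sketch below assumes the idealized model of Eq. (5) with independent dimensions and equal per-dimension uncertainty for both targets; the dimension N = 64, sigma = 0.5, and the mean offset of 1.0 are arbitrary illustrative choices. Standardized same-target distances should average close to the mean of a chi distribution with N degrees of freedom, sqrt(2) Γ((N+1)/2) / Γ(N/2), while different-target distances are shifted upward by the noncentrality. Real REID features violate the independence assumption, so this only exercises the idealized derivation.

```python
import math
import random


def standardized_distance(mu_i, mu_j, sigma, rng):
    """Draw x_i ~ N(mu_i, diag(sigma^2)), x_j ~ N(mu_j, diag(sigma^2)) and
    return their distance standardized per dimension by sqrt(2) * sigma_l."""
    total = 0.0
    for m_i, m_j, s in zip(mu_i, mu_j, sigma):
        delta = rng.gauss(m_i, s) - rng.gauss(m_j, s)
        total += (delta / (math.sqrt(2.0) * s)) ** 2
    return math.sqrt(total)


def chi_mean(k):
    """Mean of a central chi distribution with k degrees of freedom."""
    return math.sqrt(2.0) * math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2))


rng = random.Random(0)
N = 64                      # hypothetical feature dimension
sigma = [0.5] * N           # equal uncertainty for both targets
mu_a, mu_b = [0.0] * N, [1.0] * N
same = [standardized_distance(mu_a, mu_a, sigma, rng) for _ in range(2000)]
diff = [standardized_distance(mu_a, mu_b, sigma, rng) for _ in range(2000)]
same_mean = sum(same) / len(same)
diff_mean = sum(diff) / len(diff)
print(f"same target: {same_mean:.2f} (chi_{N} mean {chi_mean(N):.2f})")
print(f"different targets: {diff_mean:.2f}")
```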
Therefore, the intra and inter distance distributions of ideal trajectory hypotheses each present a single peak. Next, we consider a low-quality trajectory containing an identity switch between targets A and B. For ease of analysis, we assume that all targets and feature dimensions share the same variance. Then the distance between a feature of A and a feature of B obeys a noncentral chi distribution with a positive noncentrality parameter, while the distance within each target obeys the central chi distribution derived above. The intra distance distribution of this trajectory is therefore a mixture of central and noncentral chi distributions and shows a bimodal form. It can be inferred that low-quality trajectories with wrong identifications present multiple peaks in the intra and inter distance distributions.
3.3 Practical verification
We verify the above conclusions in practice by visualizing the intra and inter distance distributions of several tracking cases in Figure 3. High-quality trajectories present single peaks, such as ID 0, which consistently tracks a person moving forward and is well separated from ID 1. In contrast, low-quality trajectories present multiple peaks, such as ID 9, which contains an identity switch, and the overlapping trajectories ID 3 and ID 220.
To quantify the validity of the Gaussian assumption in Section 3.2, we use the descriptor provided in [21] to perform a normality test on the ground truth of the MOT16 train set, and find that 74% of the trajectories can be approximated by a Gaussian distribution at a significance level of 0.1. In low-density scenarios such as MOT16-05, the percentage rises to 88%. Since counterexamples may occur in practice, such as two similarly dressed people, we have also tested the performance of the descriptor on classifying unique person IDs in MOT16's detection boxes. At a precision of 0.95, the recall and mAP reach 0.94 and 0.98, respectively. Therefore, we consider such counterexamples to make up only a small portion.
However, due to non-ideal factors, the final distances do not fully obey the theoretical chi distribution. Take ID 0 as an example. Although the overall shape is similar, the hypothesis test yields an extremely low p-value of 0, indicating a statistically significant difference. This may have two causes: (1) bias is introduced when sample statistics replace the true mean and variance for standardization; (2) features extracted by the REID model are not independent across dimensions. The second cause is very common, since deep neural networks tend to produce strong correlations between feature dimensions.
Encouragingly, trajectories of different qualities still retain their distinctive single or multiple peaks. The more frames with wrong identification, the more pronounced the two peaks and the larger the gap between them. In practice, we found that fitting a two-class Gaussian mixture and setting a threshold on the mean difference can qualitatively detect the low-quality trajectories that significantly affect tracking performance. The visualization results also show that false-alarm trajectories are usually short, have large variance, and may interfere with the inter distances to produce multiple peaks. These trajectories, for which no real target exists, are also categorized as low-quality trajectories.
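The two-class fit with a mean-difference threshold can be sketched as follows. This is an illustration under assumptions, not the authors' code: a plain expectation-maximization (EM) loop fits a one-dimensional two-component Gaussian mixture to a trajectory's intra distances, and the trajectory is flagged when the two fitted means differ by more than a threshold. The threshold of 2.0 and the synthetic distances (a single peak versus a mixture standing in for an identity switch) are hypothetical values chosen for the example.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(data, iters=200, min_var=1e-6):
    """Fit a 1-D two-component Gaussian mixture by EM (needs len(data) >= 2).

    Returns (means, variances, weights) of the two components.
    """
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    # Initialize the components from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    mean = sum(xs) / n
    var0 = max(min_var, sum((x - mean) ** 2 for x in xs) / n)
    var, w = [var0, var0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 0.
        r0 = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], var[0])
            p1 = w[1] * normal_pdf(x, mu[1], var[1])
            r0.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weight, mean and variance of each component.
        for k, resp in enumerate((r0, [1.0 - r for r in r0])):
            nk = sum(resp)
            if nk < 1e-9:
                continue  # component has numerically vanished; keep old values
            w[k] = nk / n
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / nk
            var[k] = max(min_var, sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk)
    return mu, var, w


def has_multiple_targets(intra, gap_threshold):
    """Flag a trajectory whose two fitted means differ by more than gap_threshold."""
    mu, _, _ = fit_two_gaussians(intra)
    return abs(mu[0] - mu[1]) > gap_threshold


rng = random.Random(1)
clean = [rng.gauss(8.0, 0.25) for _ in range(300)]
switched = ([rng.gauss(8.0, 0.25) for _ in range(200)]
            + [rng.gauss(12.0, 0.25) for _ in range(100)])
print(has_multiple_targets(clean, 2.0), has_multiple_targets(switched, 2.0))  # → False True
```

Because each fitted mean is a weighted average of the data, the mean difference can never exceed the spread of the distances, which is why a single-peaked trajectory stays below a suitably chosen threshold.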
3.4 Metric
Based on the above criteria and distance distribution analyses, we propose a novel self quality evaluation metric, SQE, which can be expressed as:
(9) 
The evaluation process is summarized in Algorithm 1 and consists of four main steps, explained below:
(1) A trajectory with short length and large standard deviation is marked as a false alarm, and the false-alarm error is accumulated.
(2) For the remaining trajectories, we fit a two-class Gaussian mixture model to the intra distances and judge whether each is a low-quality trajectory according to the difference between the two component means. If the difference exceeds a certain threshold, we conclude that the trajectory contains more than one target and accumulate a difference error.
(3) Similarly, the inter distances of each pair of non-false-alarm trajectories are also fitted. If the mean difference is large, the two trajectories are considered to track the same target, and a similarity error is accumulated.
(4) Other internal characteristics like the number and mean length of trajectories are also embedded.
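Steps (1) to (4) can be outlined as a counting routine. This sketch makes several assumptions that the text leaves open: the standard deviation in step (1) is taken over the trajectory's intra distances, a trajectory too short to have any intra distance counts as a false alarm, and the two-class fit is passed in as a function. For brevity, the default `split_means` splits the sorted distances at their largest gap; it is a crude stand-in, and in practice a two-class Gaussian mixture fit would take its place. All names and threshold values are hypothetical, and the sketch stops at the counts and statistics rather than combining them into Eq. (9).

```python
from itertools import combinations
from statistics import pstdev


def split_means(ds):
    """Crude stand-in for a two-class fit: split sorted values at the largest gap."""
    xs = sorted(ds)
    k = max(range(1, len(xs)), key=lambda i: xs[i] - xs[i - 1])
    low, high = xs[:k], xs[k:]
    return sum(low) / len(low), sum(high) / len(high)


def sqe_error_counts(lengths, intra, inter, min_len, max_std, gap_thr,
                     two_means=split_means):
    """Count false alarms, difference errors and similarity errors.

    lengths: {track_id: number of frames}
    intra:   {track_id: list of intra distances}
    inter:   {(id_a, id_b) with id_a < id_b: list of inter distances}
    """
    # Step (1): short trajectories with widely spread distances are false alarms.
    false_alarms = {t for t, n in lengths.items()
                    if n < min_len and (len(intra[t]) == 0 or pstdev(intra[t]) > max_std)}
    kept = sorted(t for t in lengths if t not in false_alarms)

    def two_peaked(ds):
        if len(ds) < 2:
            return False
        m1, m2 = two_means(ds)
        return abs(m1 - m2) > gap_thr

    # Step (2): a large mean difference in intra distances suggests an ID switch.
    difference = sum(two_peaked(intra[t]) for t in kept)
    # Step (3): a large mean difference in inter distances suggests that two
    # trajectories follow the same target.
    similarity = sum(two_peaked(inter.get((a, b), [])) for a, b in combinations(kept, 2))
    # Step (4): number and mean length of the remaining trajectories.
    mean_len = sum(lengths[t] for t in kept) / len(kept) if kept else 0.0
    return {"false_alarm": len(false_alarms), "difference": difference,
            "similarity": similarity, "num_tracks": len(kept), "mean_length": mean_len}


lengths = {1: 50, 2: 40, 3: 3}
intra = {1: [1.0, 1.1, 0.9, 1.0], 2: [1.0, 1.1, 5.0, 5.2], 3: [0.5, 3.0, 6.0]}
inter = {(1, 2): [4.0, 4.1, 4.2, 3.9]}
print(sqe_error_counts(lengths, intra, inter, min_len=10, max_std=1.0, gap_thr=2.0))
```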
When the REID threshold is set too strictly, so many detection boxes are excluded that the number and the mean length of trajectories are both small; when their product, the total number of tracked boxes, remains almost constant, the two quantities move in opposite directions, and extreme cases such as excessively fragmented or concatenated trajectories lead to an imbalance between them. To penalize such poor tracking results, we combine the two quantities in the form of a harmonic mean, with a weight that accommodates the moving speed and density of the tracked objects. For pedestrian tracking on street videos, the two quantities have approximately the same magnitude, so this weight can be set simply. On top of this rough constraint, a correction term is added to the denominator. As shown in Section 3.3, the accumulated false-alarm, difference, and similarity errors reflect the number of low-quality trajectories; therefore, their sum is expected to be small while the harmonic-mean term is large. The correction term plays its key role when the number and the mean length of trajectories both take moderate values, and a further coefficient adjusts the ratio between the harmonic-mean term and the sum of errors.
Parameters in SQE are not difficult to set. is comparable to the video’s frame rate. With a highprecision ReID model, randomly selecting false alarms and ID switch examples from reference videos is adequate to observe and , so as to set and accordingly. Additionally, when the tracker and task (vehicle/pedestrian) are given, and could be set empirically.
4 Experiments
Implementation details. We assess our self-evaluation method mainly on the MOT16 Challenge data sets [11], which contain 14 video sequences (7 for training, 7 for testing) taken by both static and moving cameras from different angles in different scenes. We focus our study on pedestrian tracking and use the person ReID model provided by [21]. All the experiments are completed with the same parameter setting: , , , and takes 2 and 10 for the REID threshold and the merging threshold, respectively. The REID threshold varies from 0.3 to 1.6, beyond which remains invariant. Similarly, the merging threshold varies from 0.5 to 1.5. The parameter optimization process is based on grid search. The rest of this section demonstrates the accuracy, universality, and effectiveness of our self quality evaluation metric SQE.
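The grid search can be sketched as below. The step size of 0.05 is an assumption (it matches the granularity of the thresholds reported in the tables of this section), and `toy_score` is a hypothetical placeholder: in practice, each candidate threshold would require running the tracker on the video and computing SQE (or IDF1 for the ground-truth baseline) on its output.

```python
def frange(start, stop, step):
    """Inclusive range of floats, rounded to suppress accumulation error."""
    count = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(count + 1)]


def grid_search(score_fn, values):
    """Return (best_value, best_score); the first value wins ties."""
    best = max(values, key=score_fn)
    return best, score_fn(best)


def toy_score(reid_threshold):
    # Hypothetical stand-in for "run the tracker, then compute SQE".
    return -(reid_threshold - 0.8) ** 2


reid_values = frange(0.3, 1.6, 0.05)
best, score = grid_search(toy_score, reid_values)
print(len(reid_values), best)  # → 27 0.8
```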
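Table 1: Optimal REID threshold selected by IDF1 (baseline(gt)) and by SQE on the MOT16 training videos.
video | method | REID threshold | IDF1 | IDP | IDR | MOTA | IDs | ΔIDF1
02 | baseline(gt) | 0.80 | 58.3 | 79.3 | 46.0 | 51.9 | 69 | 0.0
02 | SQE† | 0.80 | 58.3 | 79.3 | 46.0 | 51.9 | 69 |
04 | baseline(gt) | 1.05 | 82.0 | 93.5 | 73.0 | 77.3 | 21 | 1.1
04 | SQE† | 0.80 | 80.9 | 93.0 | 71.5 | 76.2 | 32 |
05 | baseline(gt) | 0.90 | 71.2 | 79.2 | 64.6 | 62.0 | 23 | 0.1
05 | SQE† | 1.00 | 71.1 | 78.3 | 65.2 | 61.5 | 32 |
09 | baseline(gt) | 1.20 | 76.0 | 88.8 | 66.4 | 73.5 | 8 | 1.3
09 | SQE† | 0.80 | 74.7 | 88.6 | 64.6 | 72.2 | 7 |
10 | baseline(gt) | 0.95 | 72.4 | 76.6 | 68.7 | 71.5 | 79 | 1.9
10 | SQE† | 0.90 | 70.5 | 74.5 | 66.8 | 71.2 | 81 |
11 | baseline(gt) | 0.85 | 80.1 | 89.7 | 72.4 | 75.0 | 29 | 4.6
11 | SQE† | 1.00 | 75.5 | 83.8 | 68.7 | 73.5 | 34 |
13 | baseline(gt) | 0.75 | 58.2 | 74.6 | 47.7 | 47.0 | 73 | 2.3
13 | SQE† | 1.05 | 55.9 | 68.9 | 47.0 | 45.6 | 90 |
† For the SQE rows, the scores are calculated only after SQE has determined the parameter; they are not used to tune it.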
Comparison with supervised metrics. To demonstrate the effectiveness of our self-evaluation metric in evaluating tracking performance, we compare its score with commonly used supervised metrics on the MOT16-02 training video and visualize SQE and IDF1 in Figure 4. As the REID threshold increases, both SQE and IDF1 first increase and then decrease, reaching their highest values at 0.8, where the other supervised metrics are also relatively high. The two curves follow a very similar trend, which indicates that our metric largely achieves the desired positive correlation with IDF1, the metric that generally best measures identification performance.
The MOT16-02 video records a complex scene with a large number of people walking around a large square. In Figure 5, we further analyse the results on the MOT16-09 video, a simpler street scene with low density and the fewest tracks, filmed from a low angle. The close similarity of the two curves illustrates that our self-evaluation method generalizes to different viewpoints and scenarios. Detailed results on the other videos are provided in the supplementary material. We summarize the optimal REID thresholds determined by SQE and IDF1 in Table 1, together with the corresponding evaluation scores under these parameters. Our self-evaluation method approximately quantifies tracking performance: 5 of the 7 optimal parameter differences do not exceed 0.25, and 6 of the 7 (85%) corresponding IDF1 differences do not exceed 3.
Generalization to other tracking algorithms. To illustrate the robustness and universality of our method, we also test it on another tracking algorithm. We choose Deep SORT [23], a widely recognized open-source MOT algorithm of recent years. The REID threshold corresponds to the matching cosine threshold in Deep SORT. Deep SORT replaces our interpolation logic with IOU matching, so the features recorded during occlusion produce a small interference peak in the intra distance distribution; we therefore exclude the features of these frames when performing self-evaluation. As shown in Figure 6, SQE and IDF1 are strongly correlated, demonstrating that our method also works for other trackers.
Generalization to other parameters. We further test the universality of our method on other parameters. Besides the REID threshold, the merging threshold is another dominant factor affecting the final tracking performance. Similarly, we compare SQE and IDF1 in both complex and simple scenes in Figures 7 and 8. The results still show positive correlations. Table 2 shows high accuracy: 5 of the 7 videos have an optimal parameter difference of at most 0.1, and all but one of the corresponding IDF1 differences do not exceed 3.
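Table 2: Optimal merging threshold selected by IDF1 (baseline(gt)) and by SQE on the MOT16 training videos.
video | method | merging threshold | IDF1 | IDP | IDR | MOTA | IDs | ΔIDF1
02 | baseline(gt) | 1.00 | 57.9 | 72.7 | 46.1 | 51.7 | 82 | 0.0
02 | SQE | 1.00 | 57.9 | 72.7 | 46.1 | 51.7 | 82 |
04 | baseline(gt) | 1.05 | 82.3 | 94.2 | 73.1 | 76.8 | 23 | 0.6
04 | SQE | 1.00 | 81.7 | 93.7 | 72.4 | 76.5 | 27 |
05 | baseline(gt) | 0.85 | 73.6 | 82.5 | 66.5 | 61.9 | 34 | 0.5
05 | SQE | 0.75 | 73.1 | 82.8 | 65.5 | 61.4 | 46 |
09 | baseline(gt) | 1.00 | 75.9 | 89.2 | 66.0 | 73.2 | 7 | 3.1
09 | SQE | 0.70 | 72.8 | 85.8 | 63.2 | 72.8 | 12 |
10 | baseline(gt) | 1.05 | 71.5 | 75.1 | 68.2 | 70.8 | 77 | 1.6
10 | SQE | 0.95 | 69.9 | 74.5 | 65.9 | 71.5 | 82 |
11 | baseline(gt) | 1.10 | 78.2 | 87.6 | 70.5 | 75.0 | 30 | 2.8
11 | SQE | 0.85 | 75.4 | 84.7 | 68.0 | 73.9 | 35 |
13 | baseline(gt) | 1.05 | 56.7 | 70.7 | 47.3 | 46.2 | 68 | 0.6
13 | SQE | 1.10 | 56.1 | 69.2 | 47.1 | 45.4 | 72 |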
Practical testing. Our ultimate goal is to find the optimal parameters in realistic scenes where ground truth is unavailable. Moreover, in reality the training data is relatively small in scale compared with the unknown test environment. To test our method in a pragmatic manner, we regard the first 4 training videos as our test set and the last 3 training videos as our training set. Conventionally, parameters are tuned on the training set and kept constant during testing; in our simulation, we call these the baseline parameters. In contrast, our metric can guide the self-optimization of parameters without ground truth, so it is employed directly to tune the parameters of each of the 4 test videos individually.
In practice, we can first obtain baseline parameters as a reference on small-scale training data and then conduct self-evaluation to further optimize the parameters within a relatively small range. The customized parameters are computed as follows: (1) find the baseline parameters; (2) for each test video, fix one parameter at its baseline value and tune the other according to SQE, and then swap their roles; (3) combine the two tuned values into the customized parameters.
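The customization procedure can be sketched as follows, reading step (2) as tuning each parameter with the other held at its baseline value and step (3) as combining the two results. The search radius of 0.25 around the baseline, the step of 0.05, and `toy_sqe` are hypothetical; a real `score_fn` would run the tracker with the given (REID threshold, merging threshold) pair and return SQE.

```python
def local_grid(center, radius, step):
    """Inclusive grid of values within +/- radius of center."""
    count = int(round(2 * radius / step))
    return [round(center - radius + i * step, 10) for i in range(count + 1)]


def customize(score_fn, baseline, radius=0.25, step=0.05):
    """Tune each threshold with the other fixed at its baseline, then combine."""
    reid0, merge0 = baseline
    reid = max(local_grid(reid0, radius, step), key=lambda r: score_fn(r, merge0))
    merge = max(local_grid(merge0, radius, step), key=lambda m: score_fn(reid0, m))
    return reid, merge


def toy_sqe(reid, merge):
    # Hypothetical stand-in for running the tracker and computing SQE.
    return -(reid - 0.9) ** 2 - (merge - 1.0) ** 2


print(customize(toy_sqe, baseline=(0.75, 1.05)))  # → (0.9, 1.0)
```

Tuning each parameter around its baseline keeps the search small, which matters because every score evaluation requires a full tracking run.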
Our method is considered effective if the tracker using the customized parameters outperforms the tracker using constant training-set-tuned parameters. The results are shown in Table 3, where gt denotes the true optimal parameters on each video. To be rigorous, we use the best parameters found by grid search on the 3 assumed training videos as the baseline. The parameters tuned by SQE achieve a considerable improvement over the baseline, and the results are much closer to the true optimum, showing the effectiveness of our method in a practical setting.
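Table 3: Practical testing on the first 4 MOT16 training videos. Parameters are given as (REID threshold, merging threshold); gt denotes the true optimal parameters on each video.
video | method | parameters | IDF1 | IDP | IDR | MOTA | IDs
02 | gt | 0.85, 0.95 | 59.2 | 79.9 | 47.0 | 52.8 | 80
02 | baseline | 0.75, 1.00 | 56.2 | 78.3 | 43.8 | 50.1 | 61
02 | SQE | 0.90, 1.00 | 57.9 | 75.5 | 47.0 | 52.8 | 82
04 | gt | 0.85, 1.05 | 82.6 | 94.6 | 73.3 | 76.7 | 23
04 | baseline | 0.75, 1.05 | 80.9 | 92.9 | 71.6 | 76.3 | 25
04 | SQE | 0.75, 1.10 | 82.3 | 94.4 | 72.9 | 76.6 | 21
05 | gt | 0.90, 0.85 | 73.6 | 82.5 | 66.5 | 61.9 | 34
05 | baseline | 0.75, 1.05 | 68.4 | 80.1 | 59.7 | 57.6 | 24
05 | SQE | 1.00, 0.95 | 72.2 | 81.1 | 65.1 | 62.6 | 25
09 | gt | 1.20, 0.70 | 76.0 | 88.8 | 66.4 | 73.5 | 8
09 | baseline | 0.75, 1.05 | 71.4 | 84.7 | 61.7 | 72.2 | 6
09 | SQE | 0.85, 0.90 | 73.0 | 87.1 | 62.8 | 71.2 | 10
overall | gt | - | 76.4 | 90.3 | 66.2 | 69.7 | 145
overall | baseline | - | 74.0 | 88.5 | 63.5 | 68.4 | 116
overall | SQE | - | 75.5 | 88.8 | 65.6 | 69.5 | 138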
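Table 4: Parameter self-optimization on the MOT16 test set and the KITTI train set.
dataset | method | IDF1 | IDP | IDR | IDs
MOT16 test | baseline | 66.6 | 75.8 | 59.4 | 442
MOT16 test | ours | 68.3 | 83.4 | 57.8 | 456
KITTI train* | baseline | 67.4 | 67.2 | 67.7 | 37
KITTI train* | ours | 68.5 | 67.9 | 69.1 | 44
* For KITTI train, we use the 5 videos with the most pedestrians.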
To further illustrate the performance of self-optimization using SQE, we experiment on the MOT16 test set and the KITTI train set [7]. The baseline parameters are the best parameters found by grid search on the MOT16 training set, which already outperform empirical parameters by 5.8%. This setup follows the updated submission policy of KITTI (http://www.cvlibs.net/datasets/kitti/eval_tracking.php), and we believe it simulates pedestrian tracking in reality, where test scenes vary greatly from annotated videos. As shown in Table 4, the parameter self-optimization enabled by SQE improves the performance of the tracker on these data sets.
Drawbacks and prospects. The above experiments reflect the effectiveness of our proposed metric, but some drawbacks are worth noting. Firstly, due to randomness during model fitting, the accumulated error counts carry an uncertainty of several units, making the metric insufficiently sensitive to small changes in tracking performance. Secondly, the current metric lacks a consistent physical interpretation: IDF1 is calculated from IDTP, IDFP, and IDFN, while our method simply records the number of low-quality trajectories. A more precise idea is to estimate IDFP and IDFN from quantity information. Assume that in a trajectory where an identity switch occurs, target A appears in $n_A$ frames and target B in $n_B$ frames. The total length is $L$, and the number of distances in the class with larger values (the cross-target pairs) is $M$. Then $n_A$ and $n_B$ satisfy the following conditions:
$$ n_A + n_B = L, \qquad n_A\, n_B = M \tag{10} $$
which can be easily solved. Taking $n_A \ge n_B$, we can make the estimations:
$$ n_A = \frac{L + \sqrt{L^2 - 4M}}{2}, \qquad n_B = \frac{L - \sqrt{L^2 - 4M}}{2} \tag{11} $$
Such processing of the intra distance distribution can accurately estimate the number of erroneous frames. Furthermore, the inter distance distribution can help refine the estimates. For example, if another trajectory also tracks A, we only keep the longer one as the true match according to the calculation rule of IDF1. However, more detailed considerations are needed for globally precise estimates. In addition, categorizing low-quality trajectories and estimating erroneous frames may also benefit the tracker's post-processing and thus improve tracking performance. Finally, the adjustable parameters of SQE need to be defined more strictly. We plan to investigate these issues in the future.
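The frame-count estimate for an identity switch reduces to solving a quadratic. Suppose a trajectory of length L mixes n_A frames of target A and n_B frames of target B; among its L(L-1)/2 intra distances, the n_A n_B cross-target pairs form the larger-valued class, whose size M would in practice come from the two-class fit. The sketch below assumes M is already known; the function name is hypothetical, and with real, noisy fits the result would need rounding.

```python
import math
from itertools import combinations


def estimate_switch_split(total_len, num_cross):
    """Solve n_a + n_b = total_len and n_a * n_b = num_cross, returning n_a >= n_b."""
    disc = total_len ** 2 - 4 * num_cross
    if disc < 0:
        raise ValueError("num_cross cannot exceed total_len**2 / 4")
    root = math.sqrt(disc)
    return (total_len + root) / 2, (total_len - root) / 2


# Synthetic check: 7 frames of target A followed by 3 frames of target B.
labels = ["A"] * 7 + ["B"] * 3
cross = sum(1 for i, j in combinations(range(len(labels)), 2) if labels[i] != labels[j])
n_a, n_b = estimate_switch_split(len(labels), cross)
print(cross, n_a, n_b)  # → 21 7.0 3.0
```

If the longer segment is kept as the true match, the shorter one (n_b here) approximates the number of erroneous frames for that trajectory.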
5 Conclusion
In this paper, we propose a self quality evaluation metric that enables parameter optimization in test environments and realistic scenes where ground truth is unavailable. This new perspective bypasses the difficulty of designing an algorithm that performs well in various scenes. We demonstrate that trajectories of different qualities exhibit single or multiple peaks in the feature distance distribution, which inspires us to use a two-class Gaussian mixture model to estimate identification errors. Experiments, mainly on the MOT16 Challenge data sets, demonstrate the effectiveness of our method both in correlating with existing metrics and in enabling parameter self-optimization for better tracking performance. Finally, we summarize the drawbacks and prospects for future work. We believe that our work is instructive for further MOT research.
6 Acknowledgement
This research was supported by National Key R&D Program of China (No. 2017YFA0700800).
References

[1] (2012) Discrete-continuous optimization for multi-target tracking. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1926–1933.
[2] (2011) Multiple object tracking using k-shortest paths optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (9), pp. 1806–1819.
[3] (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. Journal on Image and Video Processing 2008, pp. 1.
[4] (2016) Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468.
[5] (2010) Online multiperson tracking-by-detection from a single, uncalibrated camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (9), pp. 1820–1833.
[6] (2019) Multi-object tracking with multiple cues and switcher-aware classification. arXiv preprint arXiv:1901.06129.
[7] (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
[8] (1955) The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2 (1–2), pp. 83–97.
[9] (2015) FollowMe: efficient online min-cost flow tracking with bounded memory and computation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4364–4372.
[10] (2009) Learning to associate: HybridBoosted multi-target tracker for crowded scene. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2953–2960.
[11] (2016) MOT16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831.
[12] (2017) Online multi-target tracking using recurrent neural networks. In Thirty-First AAAI Conference on Artificial Intelligence.
[13] (2007) ETISEO, performance evaluation for video surveillance systems. In 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 476–481.
[14] (2004) Markov chain Monte Carlo data association for general multiple-target tracking problems. In 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No. 04CH37601), Vol. 1, pp. 735–742.
[15] (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In CVPR 2011, pp. 1201–1208.
[16] (2016) Performance measures and a data set for multi-target, multi-camera tracking. In European Conference on Computer Vision, pp. 17–35.
[17] (2008) A consistent metric for performance evaluation of multi-object filters. IEEE Transactions on Signal Processing 56 (8), pp. 3447–3457.
[18] (2017) Deep network flow for multi-object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6951–6960.
[19] (2019) Probabilistic face embeddings. In The IEEE International Conference on Computer Vision (ICCV).
[20] (2005) Evaluating multi-object tracking. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) Workshops, pp. 36–36.
[21] (2018) Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), pp. 480–496.
[22] (2017) Multiple people tracking by lifted multicut and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3539–3548.
[23] (2017) Simple online and realtime tracking with a deep association metric. In 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649.
[24] (2015) Learning to track: online multi-object tracking by decision making. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4705–4713.
[25] (2019) Online multiple pedestrian tracking using deep temporal appearance matching association. arXiv preprint arXiv:1907.00831.
[26] (2019) Framewise motion and appearance for real-time multiple object tracking. arXiv preprint arXiv:1905.02292.
[27] (2008) Global data association for multi-object tracking using network flows. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.