While several tracking methods have been developed over the past decade [34, 15, 18, 8, 14, 42, 9] and have proven successful in many applications, such as robotics and video surveillance, tracking small objects in videos remains a challenging problem, in particular when complex scenarios and real-time constraints are considered. In this paper, small objects are targets whose size is less than 1% of the whole image. The challenge of small object tracking mainly stems from two facts: first, the visual features of small objects are extremely fickle, making feature representation difficult; second, compared with normal-sized objects, small objects often undergo sudden and large drift during tracking because of lens shake. By sudden and large drift we mean that the target displacement between two adjacent frames in the image coordinate system is more than twice the target size.
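To make these two definitions concrete, the following sketch (ours, not from the paper) checks the small-object criterion and the drift criterion for a bounding box; using the target diagonal as the "target size" is an assumption.

```python
import math

def is_small_object(target_w, target_h, img_w, img_h):
    """Small object: target area below 1% of the whole image area."""
    return (target_w * target_h) / (img_w * img_h) < 0.01

def is_sudden_large_drift(center_prev, center_curr, target_w, target_h):
    """Sudden and large drift: displacement between two adjacent frames
    exceeds twice the target size (here: twice the target diagonal)."""
    dx = center_curr[0] - center_prev[0]
    dy = center_curr[1] - center_prev[1]
    displacement = math.hypot(dx, dy)
    target_size = math.hypot(target_w, target_h)  # one possible size measure
    return displacement > 2 * target_size
```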
For a long time, researchers reported tracking results on common benchmarks using reasonably sized targets, but paid little attention to the small object tracking problem. Only a few existing algorithms address small object tracking, and most of them were designed to enhance the visual features of such targets, in the hope that tracked objects would no longer be lost if robust features were exploited. For instance, the methods in [2, 13] integrate both spatial and frequency domain features in order to localize the targets more accurately. Alternatively, the method in  tends to enhance the robustness of a tracker by strengthening the feature representations (e.g., target attributes) for small targets. Recently, Rozumnyi et al.  have proposed to deal with fast motion and motion blur of the objects, but the performance is unsatisfactory due to low resolution and complex background clutter. Even though deep learning methods  have been developed, high-level features seem not to be effective for small objects. Moreover, we doubt that continuous tracking of small-sized objects can be guaranteed even if robust visual features are exploited, considering that small targets are easily confused with noise and clutter in real scenes. In other words, it might be more realistic to allow small objects to get lost during tracking, while investigating a better solution to re-detect them.
The intuition here is the question "how do human beings recognize a small target when it is lost in a cluttered background?" Most likely, humans first look at the salient objects/regions popping up in the scene, and then verify whether one of the salient objects is the target of interest . A few works mimic this human behavior and involve saliency information in object tracking. For example, the method in  integrates saliency for the representation of context, while [39, 11, 25] incorporate saliency into appearance models in various ways in order to improve the robustness of the tracker. However, as they mostly focus on the target appearance in the image domain, their performance is not satisfactory, since appearance cues are inherently weak for small objects. Therefore, they might only be reliably applied to tracking normal-sized objects. In this paper, we propose a new saliency online learning framework, termed aggregation signature, and focus on small object tracking. To the best of our knowledge, no saliency-based method has yet utilized full context information, including intensity, saturation, saliency and motion, for small object tracking.
Unlike handcrafted image signatures, which are simple yet powerful tools to spatially match the sparse foreground objects in an image [17, 33], the explicit advantage of our aggregation signature lies in a learning mechanism exploited to build an adaptive target signature. As a result, it can quickly detect salient objects even when they are very small, which further improves the (re-)localization performance of the trackers. We open up a new direction for tracking small objects by mimicking the human attention mechanism. In particular, we provide theoretical evidence that it is more effective, and that the resulting foreground saliency map from our aggregation signature becomes more consistent with the target appearance along iterations, as shown in Fig. 1. Moreover, the aggregation signature is generic enough to be integrated into other trackers. In summary, the contributions of this paper include:
(i) The proposed aggregation signature is proven, in theoretical terms, to be more effective for sparse foreground detection, making the tracked target more salient compared to the background.
(ii) The aggregation signature improves the capacity of accumulating information about the target based on a learning mechanism, whereas conventional image signatures are handcrafted and thus prone to failing to adapt to the target.
(iii) New challenging datasets – small90 and small112 – are collected for small object tracking evaluation. The datasets are publicly available for further research development.
II Aggregation Signature
The image signature is a simple yet powerful tool to spatially match the sparse foreground of an image . By using the sign function of the DCT, the resulting handcrafted descriptor can efficiently detect approximately salient image regions. Rather than separating a color image into three channel images and computing image signatures separately, QDCT  can discriminate the relative importance of four components by introducing a quaternion representation. In general, both DCT and QDCT based image signatures are handcrafted methods with no learning process involved. Differently, the proposed aggregation signature improves the discriminative capability of the QDCT signature by learning multi-cue information, in particular the target prior information.
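As a reference point, the handcrafted DCT image signature can be sketched in a few lines of Python; this is a minimal illustration using SciPy, and the smoothing bandwidth is an arbitrary choice, not a value from the paper.

```python
import numpy as np
from scipy.fftpack import dct, idct
from scipy.ndimage import gaussian_filter

def dct2(x):
    """2-D type-II DCT with orthonormal scaling."""
    return dct(dct(x, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(x):
    """2-D inverse DCT with orthonormal scaling."""
    return idct(idct(x, axis=0, norm='ortho'), axis=1, norm='ortho')

def image_signature_saliency(img, sigma=3.0):
    """DCT image signature: reconstruct from the sign of the DCT,
    square entrywise, then smooth with a Gaussian kernel."""
    recon = idct2(np.sign(dct2(img)))
    return gaussian_filter(recon * recon, sigma)
```

On a synthetic image with a small bright patch on a flat background, the resulting saliency map concentrates on the patch, which is the sparse-foreground behavior the text describes.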
II-A Definition of Aggregation Signature
We begin by considering an image $\mathbf{x}$ which exhibits the following structure:

$$\mathbf{x} = \mathbf{f} + \mathbf{b},$$

where $\mathbf{f}$ represents the foreground and $\mathbf{b}$ represents the background. Please refer to Table I for the definitions used throughout the rest of this section. Formally, the aggregation signature (AS) is defined as:
where $\mathrm{sign}(\cdot)$ is the entrywise sign operator, $k$ represents the iteration and $i$ indexes the 4 channels in use. Then, the reconstructed image can be defined as:
where $\bar{\mathbf{x}}_k$¹ represents the reconstructed result in the $k$-th iteration with $\bar{\mathbf{x}}_k^*$ as its conjugate form, and $\circ$ represents the element-wise product. $x_2$, $x_3$, $x_4$ represent three different channels, such as any one channel of RGB, image intensity and image saturation (or motion in tracking). $\mathbf{p}$ is a two-dimensional prior related to the tracked target, which will be elaborated in Section IV.

¹ If the reconstruction is the image signature based on DCT, we have $\bar{\mathbf{x}}_k^* = \bar{\mathbf{x}}_k$.
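As an illustration only, the following single-channel Python sketch shows one plausible reading of the iterative definition above; the prior-reweighting feedback rule used here is our assumption, not the paper's exact formula.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(x):
    return dct(dct(x, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(x):
    return idct(idct(x, axis=0, norm='ortho'), axis=1, norm='ortho')

def aggregation_signature(channel, prior, iterations=4):
    """Single-channel sketch: iterate the sign-of-DCT reconstruction,
    feeding |reconstruction| * prior back in as the next input
    (the feedback rule is an assumed reading of the definition)."""
    x = channel * prior                       # inject the target prior
    recon = idct2(np.sign(dct2(x)))
    for _ in range(iterations - 1):
        x = np.abs(recon) * prior             # aggregate the prior over iterations
        recon = idct2(np.sign(dct2(x)))
    return recon * recon                      # foreground saliency map (unsmoothed)
```

With a prior concentrated around the target, the iterations suppress regions far from the prior, which matches the background-suppression property argued next.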
II-B Foreground Aggregation Signature Properties
In this section, we provide evidence that, for an image which adheres to a certain mathematical structure, the background can be suppressed by the aggregation signature.
Proposition: The image reconstructed from the aggregation signature matches the foreground object more accurately along the learning process, with high probability, as follows:
where $P$ stands for probability, $\epsilon$ is a small positive value, $N$ represents the total number of image pixels, $\|\cdot\|$ represents the norm, and $\langle \cdot, \cdot \rangle$ denotes the inner product. $E[\cdot]$ denotes expectation, which reveals the similarity between the foreground and the object saliency information obtained by the aggregation signature.
TABLE I: Notation.
|$\mathrm{sign}(\cdot)$|The entrywise sign operator.|
|$\mathbf{x}^*$|The conjugate form of $\mathbf{x}$.|
|$\bar{\mathbf{x}}$|The reconstructed image of DCT.|
|$\bar{\mathbf{x}}_k$|The reconstructed image at iteration $k$.|
|$E[\cdot]$|The expectation of a random variable.|
|$\|\cdot\|_p$|The $\ell_p$ norm of a vector ($p=2$ if omitted).|
|$\langle \cdot, \cdot \rangle$|The inner product of two vectors.|
|$\circ$|The Hadamard (entrywise) product operator.|
|$\Omega(\mathbf{x})$|The support set of $\mathbf{x}$.|
Proof: We know the transform between QDCT and DCT is
For ease of explanation, we only focus on one channel; the result can be generalized to the quaternion case in a straightforward way. Then we have
where the indices represent the points of the corresponding support set. We note that the proof is applicable to all channels in Equ. (6), so we take one channel as an example. Then, we have
Since the results obtained by DCT are independent of each other, we assume
where $\epsilon$ is very small, since the probability that the DCT output is equal to a certain value is very small. Then we have the following statement:
which means that the above relation holds with high probability, considering that $\epsilon$ is very small.
Similarly, we have
Since , if , then we have
Combining (11) and (12), we have
Based on the image signature proposed by Hou et al. , we have
where $\Omega(\cdot)$ represents the support set. Given the bound, we have
And then it becomes
For a spatially sparse foreground, we have the following statement:
Together with Equ. (10), we have
which proves the proposition.
Remark: Here, $\epsilon$ is very small as in Equ. (9), and hence the probability mentioned above is high. In other words, the background is suppressed more during learning of the aggregation signature with high probability. We also performed a statistical analysis of $\epsilon$ in Equ. (9) on the MSRA-B dataset , which confirms that $\epsilon$ is very small.
III Aggregation Signature Tracker
We exploit the aggregation signature to enhance the re-detection process for small object tracking; the resulting method is called the aggregation signature tracker (AST). More specifically, when target drifting is detected by a thresholding method, saliency detection with the tracked target as a prior is triggered, which enables the online aggregation signature to suppress the background. Together with the context information carried by the different channels, we re-detect the objects to relocate the tracked target. The whole tracking procedure is illustrated in Fig. 2(a) and Algorithm 1, and we elaborate each key component in the following.
Drifting detection: As evidenced by the output constraint transfer tracking method (OCT) , a simple distribution assumption is necessary and significant to achieve high efficiency. OCT builds upon the reasonable assumption that the response to the target image follows a Gaussian distribution, so we trigger the re-detection process based on a thresholding method as:
where $\bar{r}$ represents the mean response over all previous frames, $r_{\max}$ represents the maximum response of the current frame, and $\theta$ is the threshold. The target is considered lost if the response of the current frame is far from the average response. Once the target is occluded or out of view, this mechanism allows us to keep searching in the following frames.
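A minimal sketch of this thresholding rule follows; the relative-deviation form and the default threshold value are our assumptions, since the paper only states that the current peak response is compared against the running mean.

```python
def drifting_detected(max_response, response_history, threshold=0.4):
    """Flag drifting when the current frame's peak correlation response
    deviates too far from the mean peak response of previous frames."""
    mean_response = sum(response_history) / len(response_history)
    return abs(max_response - mean_response) > threshold * mean_response
```

In a tracker loop, `response_history` would accumulate the peak response of each successfully tracked frame, and a `True` result would trigger the saliency-based re-detection.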
Saliency map calculation: The aggregation signature is used to obtain the saliency map and to further coarsely re-localize the target. Through iterations, we gradually smooth the aggregation signature with a Gaussian kernel  to obtain the saliency map. The salient regions are regarded as coarse candidate positions of the target, on which a re-detection process is performed, still based on the selected base trackers. It should be mentioned that involving the tracked object as a prior in saliency detection does not occur in conventional methods. Two key components are elaborated as follows:
1) Channels design: We denote the input image captured at frame $t$ as $I_t$, with red, green and blue channels $R_t$, $G_t$ and $B_t$. Then, we obtain three channels used in our aggregation signature: intensity, saturation and movement, respectively. We deploy the image signature  to calculate the initial saliency map as the first channel.
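The exact channel formulas involve constants defined in the text; as a hedged illustration, the following sketch uses standard definitions (RGB-mean intensity, HSV-style saturation, absolute frame difference for movement) as assumptions.

```python
import numpy as np

def build_channels(frame_rgb, prev_gray, eps=1e-6):
    """Auxiliary channels for the aggregation signature (assumed standard
    definitions): intensity = RGB mean, saturation as in the HSV model,
    movement = absolute difference from the previous grayscale frame."""
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    intensity = (r + g + b) / 3.0
    mx = frame_rgb.max(axis=-1)
    mn = frame_rgb.min(axis=-1)
    saturation = (mx - mn) / (mx + eps)       # eps avoids division by zero
    movement = np.abs(intensity - prev_gray)
    return intensity, saturation, movement
```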
2) Target prior: As shown in Fig. 2(b), we select salient regions similar in size to the target in the last frame. Next, we assign each candidate a weight indicating its similarity to the target prior, measured simply by the Euclidean distance as:
where $w_i$ denotes the weight of the $i$-th candidate region in the saliency map at the $t$-th frame, and $\sigma$ is a constant. The distance is computed between $h_i$, the histogram of the candidate saliency region, and $h_t$, the target histogram at the $t$-th frame, calculated by
where the update rate $\rho$ is 0.5 in this paper. We note that the weights are set to zero for the regions outside the selected salient areas.
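The target-prior weighting described above can be sketched as follows; the Gaussian form of the weight and the hypothetical function names are our assumptions, while the Euclidean histogram distance and the 0.5 update rate come from the text.

```python
import numpy as np

def candidate_weights(candidate_hists, target_hist, sigma=0.5):
    """Weight each salient candidate by the Euclidean distance between
    its histogram and the target histogram (Gaussian form assumed)."""
    weights = []
    for h in candidate_hists:
        d = np.linalg.norm(h - target_hist)   # Euclidean histogram distance
        weights.append(np.exp(-d ** 2 / (2 * sigma ** 2)))
    return np.array(weights)

def update_target_hist(prev_hist, new_hist, rho=0.5):
    """Running update of the target histogram; rho = 0.5 as in the paper."""
    return (1 - rho) * prev_hist + rho * new_hist
```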
IV Experiments

In this section, we evaluate the aggregation signature on our small90 dataset and on the visual saliency benchmark MSRA-B . We further test the performance of our aggregation signature based tracker on small90, small112, UAV123_10fps  and UAV20L  according to the object tracking benchmark . The test platform is an Intel i7 2.7 GHz (4 cores) CPU with 8 GB RAM and an NVIDIA GeForce GTX 1070 GPU.
IV-A Datasets

Few datasets are available for the small object tracking task. We establish a comprehensive database, termed the small90 benchmark, consisting of 90 annotated small-sized object sequences, which encompass several additional challenges such as target drifting and low resolution. We add 22 more challenging sequences to small90 and obtain another new dataset termed small112. Each sequence is categorized with 11 attributes – illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC) and low resolution (LR) – for better analysis of the tracking approaches. The attribute distribution of our dataset is plotted in Fig. 3, which shows that some attributes, e.g., LR, occur more frequently than others. We note that one sequence is often annotated with multiple attributes. Example first frames from our datasets are illustrated in Fig. 4.
IV-B Aggregation Signature on Image
We first evaluate how the aggregation signature enhances saliency detection, based on commonly used metrics including the location-based metric normalized scanpath saliency (NSS) , the mean absolute error (MAE)  and the distribution-based metric similarity (SIM) . The DCT image signature (IS) and the QDCT image signature (QIS) are computed for comparison to extensively validate the effectiveness of our aggregation signature (AS), on both the MSRA-B  and small90 databases. There are 5000 images in MSRA-B, a large scale image database for the quantitative evaluation of visual attention algorithms. From the results in Table II, we observe that our method achieves overall better quantitative performance than IS and QIS in terms of the MAE, NSS and SIM measures, thus leading to a better estimate of the visual distance between the predicted saliency map and the ground truth. Fig. 5 provides the saliency maps of the different methods and the ground truth on images from small90, which shows that the background is suppressed more by the aggregation signature with respect to the other methods. In terms of running speed, the aggregation signature module achieves 32 frames per second (FPS) in our experiments.
IV-C Aggregation Signature on Tracking
We empirically set the iteration number to 4 and the number of saliency patches to 6. For the other parameters, we follow the previous work  and set , ,  in all experiments for fair comparison.
We then test the performance of the aggregation signature in tracking (AST) by comparing it with the DCT image signature and the QDCT image signature, each incorporated into KCF, on small90. The results in Fig. 6 reveal that the aggregation signature clearly outperforms the other signatures in small object tracking. We use one-pass evaluation (OPE)  to evaluate our results throughout the experiments section. Furthermore, we compare KCF_AST with other saliency-based trackers, including the saliency prior context model (SPC)  and the structuralist cognitive tracker (SCT) , in the same figure. KCF_AST (76.6%) is about 22% higher than SPC (54.9%) and 9% higher than SCT (67.7%) in terms of precision, while KCF_AST (46.6%) is about 16% higher than SPC (30.9%) and 5% higher than SCT (42.1%) in terms of average success rate.
We also compare our trackers with OCT, which exploits a similar failure detection scheme to improve KCF. One can note that KCF_AST outperforms OCT by 13.2% and 7.8% in terms of precision and success rate, respectively.
The small90 benchmark: In Fig. 7, we further show the precision and success plots of 30 state-of-the-art trackers including SiamRPN , LDES , SAT , TLD , LCT , OCT , CSK , CT , STC , KCF , ECO , MDNet , LCCF , SRDCF  and CPF , generated by the benchmark toolbox. While several baseline algorithms, e.g., LDES, DaSiamRPN and ECO, have shown promising potential in tracking small objects, our AST still helps achieve precision rates of 84.9% (LDES_AST), 83.1% (DaSiamRPN_AST) and 83.2% (ECO_AST), which improve their counterpart base trackers by 1.6%, 0.9% and 1.7%, respectively. Meanwhile, the above three trackers with our AST achieve success rates of 68.6%, 69.7% and 64.3%, outperforming the base trackers by 1.7%, 0.4% and 0.9%, respectively. Besides, our MDNet_AST outperforms MDNet by 7.1% and 4.0%, achieving a precision rate of 86.6% and a success rate of 65.9%. This again confirms that our aggregation signature can consistently improve the performance of base trackers. Likewise, LCCF_AST also shows a significant performance increment compared with the base tracker LCCF. Moreover, when compared with the state-of-the-art re-detection trackers, our LCCF_AST (54.8%) significantly outperforms its base tracker LCCF (46.4%), and also outperforms TLD (52.7%), LCT (46.7%) and OCT (54.2%) by 2.1%, 8.3% and 0.7% in terms of the success rate on small90, respectively. The superior tracking performance confirms that our method is more effective than state-of-the-art re-detection trackers such as TLD, LCT and OCT.
We illustrate some examples for KCF_AST in Fig. 8 to show how our aggregation signature helps to improve the tracking performance. In the sequences selected from small90, the tracked objects undergo severe image quality deterioration during tracking. In particular: 1) the background of the scene presents clutter, with many objects similar to the target in appearance, and 2) severe drifting or long-time out-of-view causes the target to drift far away. In addition, we adopt MDNet, LCCF (deep features) and KCF as base trackers in our framework for the visual tracking comparison. Results are shown in Fig. 11; our main goal here is to show how our method helps to drastically reduce tracking failures.
From the results in Fig. 8 and Fig. 11, we can conclude that the aggregation signature effectively improves the performance of base trackers, especially for small object tracking, and that both saliency detection and tracking are enhanced by incorporating our image signature. As a final consideration, the proposed method has the ability to relocate the target when drifting occurs, and performs very well on the small target sequences.
The small112 benchmark: We further collect a new benchmark dataset with 112 fully annotated sequences to facilitate the performance evaluation. On the basis of small90, the 22 added sequences are more difficult. As shown in Fig. 9, KCF_AST, LCCF_AST and ECO_AST improve the performance of KCF, LCCF and ECO from 58.0%, 64.7% and 77.9% to 71.0%, 77.1% and 81.9% in precision rate, and from 41.6%, 44.5% and 62.9% to 49.2%, 50.8% and 66.0% in success rate, which demonstrates that AST improves these base trackers significantly on complex small object tracking sequences. Though baseline trackers such as SiamRPN and LDES perform very well, AST still obtains improvements of 0.1% and 0.4% in precision and 0.5% and 0.5% in success rate, which validates its effectiveness. From the experimental results, all the trackers endowed with the aggregation signature module perform consistently better than their base trackers, which further validates the effectiveness of the proposed approach. The results also show that better base trackers gain smaller performance improvements. The reason might be that the aggregation signature is less useful when drifting is not obvious, which is the case for a better tracker.
The UAV123_10fps benchmark: We test ASTs on UAV123_10fps , which contains 123 sequences posing many challenges, as shown in Fig. 10. Compared to the base tracker MDNet, the aggregation signature (MDNet_AST) significantly improves the performance of MDNet from 50.2% to 54.2% in precision rate and from 42.2% to 47.5% in success rate, which further validates the effectiveness of the proposed method. KCF_AST is about 6% higher than KCF in precision and about 8% higher in success rate. As for the more recent state-of-the-art trackers such as LDES, DaSiamRPN and ECO, their corresponding ASTs still achieve better results than the base trackers.
The UAV20L benchmark: We also test ASTs on the well-known benchmark UAV20L , where some of the tracked objects are very small, as shown in Fig. 11. The state-of-the-art SRDCF is chosen as the base tracker, leading to our SRDCF_AST. SRDCF_AST obtains better performance with respect to the state-of-the-art: compared to the base tracker SRDCF, the aggregation signature (SRDCF_AST) significantly improves the precision rate of SRDCF from 50.7% to 53.1%, which further validates the effectiveness of the proposed method. LCCF_AST is about 7% higher than LCCF, while KCF_AST is about 3% higher than KCF in terms of precision. In addition, LCCF_AST and KCF_AST, though showing no outstanding performance in terms of success rate, still achieve better results than their respective base trackers. Furthermore, for the more recent state-of-the-art trackers LDES and DaSiamRPN, we also show that LDES_AST and DaSiamRPN_AST improve their base trackers by a clear margin.
Quantitative attribute evaluation of benchmarks: The full set of plots generated by the benchmark toolbox for small90 is shown in Table III. From the results, we conclude that AST trackers achieve much better performance in most cases for small-sized objects, especially under motion blur and fast motion, where all AST trackers improve dramatically, since saliency-based AST trackers are more robust than base trackers to the variations mentioned previously. To conclude, AST can consistently improve the results of base trackers in most cases, and AST trackers achieve new state-of-the-art results.
Speed analysis: In terms of tracking speed on small90, KCF_AST runs at 120.88 frames per second (FPS), while LCCF_AST, based on deep features, runs at 16.52 FPS, which shows that our proposed trackers not only achieve state-of-the-art results but also run in real time. Although the frame rate of the proposed tracking framework drops compared to the original base tracker, the tracking performance is significantly improved on small90, e.g., an 8.2% improvement over LCCF in terms of success rate.
V Conclusion

A new aggregation signature has been proposed to improve small target tracking performance. The aggregation signature uses the target as a prior to adaptively locate the salient object, and is deployed to re-detect the tracked object when drifting occurs. It is generic and can be used in conjunction with other trackers. We evaluated our tracking framework with KCF, SRDCF, LCCF, ECO, SAT, LDES, DaSiamRPN and MDNet. To validate the resulting aggregation signature tracker, we have also collected new video datasets named small90 and small112, which contain fully annotated video sequences for small target tracking. The experimental results have clearly demonstrated that our method improves performance in challenging situations, such as severe drifting, deformation and out-of-view. Furthermore, our approach will be extended to different applications in the future, such as large-scale retrieval  and classification .
This work was supported by the National Key Research and Development Program of China (Grant No. 2016YFB0502602), by the National Natural Science Foundation of China under Grant 61672079, and in part by the Shenzhen Science and Technology Program (No. KQTD2016112515134654).
-  (2015) Small dim object tracking using a multi objective particle swarm optimisation technique. IET Image Processing 9 (9), pp. 820–826. Cited by: §I.
-  (2016) Small dim object tracking using frequency and spatial domain information. Pattern Recognition 58, pp. 227–234. Cited by: §I.
-  (2013) Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study. IEEE Transactions on Image Processing 22 (1), pp. 55. Cited by: §IV-B.
-  (2016) Visual tracking using attention-modulated disintegration and integration. In Computer Vision and Pattern Recognition, pp. 4321–4330. Cited by: §IV-C.
-  (2017) Attentional correlation filter network for adaptive visual tracking. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4828–4837. Cited by: §I.
-  (2017) ECO: efficient convolution operators for tracking. In Computer Vision and Pattern Recognition, pp. 6931–6939. Cited by: §IV-C.
-  (2015) Learning spatially regularized correlation filters for visual tracking. In IEEE International Conference on Computer Vision, pp. 4310–4318. Cited by: §IV-C.
-  (2017) Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (8), pp. 1561–1575. Cited by: §I.
-  (2018) High performance visual tracking with extreme learning machine framework. IEEE Transactions on Cybernetics. Cited by: §I.
-  (2019) DECODE: deep confidence network for robust image classification. IEEE Transactions on Image Processing 28 (8), pp. 3752–3765. Cited by: §V.
-  (2010) Discriminative spatial attention for robust tracking. In European Conference on Computer Vision, pp. 480–493. Cited by: §I.
-  (2019) State-aware anti-drift object tracking. IEEE Transactions on Image Processing 28 (8), pp. 4075–4086. Cited by: §IV-C.
-  (2019) Spatial-temporal context-aware tracking. IEEE Signal Process. Lett. 26 (3), pp. 500–504. Cited by: §I.
-  (2016) Struck: structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (10), pp. 2096–2109. Cited by: §I.
-  (2013) Segmentation-based tracking by support fusion. Computer Vision and Image Understanding 117 (6), pp. 573–586. Cited by: §I.
-  (2015) High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (3), pp. 583–596. Cited by: §IV-C.
-  (2012) Image signature: highlighting sparse salient regions. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (1), pp. 194. Cited by: §I, §II-B, §II, §III, §III.
-  (2012) Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (7), pp. 1409–1422. Cited by: §I, §IV-C.
-  (2018) High performance visual tracking with siamese region proposal network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §IV-C.
-  (2015) A data-driven metric for comprehensive evaluation of saliency models. In IEEE International Conference on Computer Vision, pp. 190–198. Cited by: §IV-B.
-  (2019) Robust estimation of similarity transformation for visual object tracking. In AAAI Conference on Artificial Intelligence. Cited by: §IV-C.
-  (2019) RBCN: rectified binary convolutional networks for enhancing the performance of 1-bit DCNNs. Cited by: §I.
-  (2019) Circulant binary convolutional networks: enhancing the performance of 1-bit DCNNs with circulant back propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2691–2699. Cited by: §I.
-  (2007) Learning to detect a salient object. In Computer Vision and Pattern Recognition, pp. 1–8. Cited by: §II-B, §IV-B, §IV.
-  (2013) Salient object detection in videos by optimal spatio-temporal path discovery. In Acm International Conference on Multimedia, pp. 509–512. Cited by: §I.
-  (2015) Long-term correlation tracking. In Computer Vision and Pattern Recognition, pp. 5388–5396. Cited by: §IV-C.
-  (2017) A saliency prior context model for real-time object tracking. IEEE Transactions on Multimedia PP (99), pp. 1–1. Cited by: §I, §IV-C.
-  (2016) A benchmark and simulator for uav tracking. In European Conference on Computer Vision, pp. 445–461. Cited by: §IV-C, §IV-C, §IV.
-  (2016) Learning multi-domain convolutional neural networks for visual tracking. In Computer Vision and Pattern Recognition, pp. 4293–4302. Cited by: §IV-C.
-  (2002) Color-based probabilistic tracking. European Conference on Computer Vision I, pp. 661–675. Cited by: §IV-C.
-  (2017) The world of fast moving objects. In Computer Vision and Pattern Recognition, pp. 4838–4846. Cited by: §I.
-  (2012) Exploiting the circulant structure of tracking-by-detection with kernels. In European Conference on Computer Vision, pp. 702–715. Cited by: §IV-C.
-  (2012) Quaternion-based spectral saliency detection for eye fixation prediction. In European Conference on Computer Vision, pp. 116–129. Cited by: §I, §II.
-  (2010) Cascaded confidence filtering for improved tracking-by-detection. In European Conference on Computer Vision, pp. 369–382. Cited by: §I.
-  (2005) Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance. Climate Research 30 (1), pp. 79. Cited by: §IV-B.
-  (2018) Unsupervised deep video hashing via balanced code for large-scale video retrieval. IEEE Transactions on Image Processing 28 (4), pp. 1993–2007. Cited by: §V.
-  (2019) Joint image-text hashing for fast large-scale cross-media retrieval using self-supervised deep learning. IEEE Transactions on Industrial Electronics 66 (12), pp. 9868–9877. Cited by: §V.
-  (2015) Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9), pp. 1834–1848. Cited by: §IV-C, §IV.
-  (2014) Initialization-insensitive visual tracking through voting with salient local features. In IEEE International Conference on Computer Vision, pp. 2912–2919. Cited by: §I.
-  (2017) Output constraint transfer for kernelized correlation filter in tracking. IEEE Transactions on Systems Man and Cybernetics Systems 47 (4), pp. 693–703. Cited by: §III, §IV-C, §IV-C.
-  (2018) Latent constrained correlation filter. IEEE Transactions on Image Processing PP (99), pp. 1–1. Cited by: §IV-C.
-  (2016) Bounding multiple gaussians uncertainty with application to object tracking. International Journal of Computer Vision 118 (3), pp. 364–379. Cited by: §I.
-  (2014) Fast visual tracking via dense spatio-temporal context learning. In European Conference on Computer Vision, pp. 127–141. Cited by: §IV-C.
-  (2012) Real-time compressive tracking. In European Conference on Computer Vision, pp. 864–877. Cited by: §IV-C.
-  (2018) Distractor-aware siamese networks for visual object tracking. In European Conference on Computer Vision, Cited by: §IV-C.