Recently, the computer vision community has witnessed significant activity and impressive advances in the area of model-free short-term trackers [49, 30]. Short-term trackers localize a target in a video sequence given a single training example in the first frame. Modern short-term trackers [12, 37, 25, 1, 45, 18, 23] localize the target moderately well even in the presence of significant appearance and motion changes, and they are robust to short-term occlusions. Nevertheless, any adaptation at an inaccurate target position leads to gradual corruption of the visual model, drift, and irreversible failure. Other major sources of short-term tracking failure are significant target occlusion and target disappearance from the field of view.
These problems are addressed by long-term trackers which combine a short-term tracker with a detector that is capable of reinitializing the tracker. A long-term tracker has to consider several design choices: (i) design of the two core components, (ii) their interaction algorithm, (iii) adaptation strategy, and (iv) estimation of tracking and detection uncertainty.
The design complexity has led to ad hoc choices and heterogeneous, difficult-to-reproduce solutions. Initially, memoryless displacement estimators like flock-of-trackers  and optical flow computed at keypoints  were considered. Later methods applied keypoint detectors [42, 22, 43, 39], but this approach requires large and sufficiently well-textured targets. Cascades of classifiers [24, 38]
and, more recently, deep-feature object retrieval systems have been proposed to deal with diverse targets. The drawback is a significant increase in computational complexity and a corresponding reduction in the scope of viable applications. Recent long-term trackers either adapt the detector [38, 22], which makes them prone to failure due to learning from incorrect training examples, or train the detector only on the first frame [42, 15], thus losing the opportunity to exploit the most recently learned target appearance.
The main contribution of this paper is a novel fully correlational long-term (FCLT) tracker. The term “fully correlational” refers to the fact that both the short-term tracker and the detector of FCLT are discriminative correlation filters (DCFs) operating on the same representation. For some time, DCFs have been the state of the art in short-term tracking, topping a number of recent benchmarks [49, 48, 28, 32, 29, 27, 30]. However, with the standard learning algorithm , a correlation filter cannot be used for detection for two reasons: (i) the dominance of the background in the search region, which necessarily has the same size as the target model, and (ii) boundary effects caused by the periodic extension. Only recently have theoretical breakthroughs [8, 25, 37, 26] made it possible to constrain the non-zero filter response to the area covered by the target, effectively decoupling the sizes of the target and the search region.
The FCLT is the first long-term tracker that exploits this novel DCF learning method, adopting the ADMM optimization from CSRDCF , the best performing real-time tracker in a recent short-term benchmark , and it is the first tracker to use this optimization technique to build a fast detector. The FCLT thus uses a CSRDCF core  to maintain two correlation filters trained on different time scales that act as a short-term tracker and a detector (Figure 1). Since both the detector and the short-term tracker produce correlation responses over the same representation, the localization uncertainty can be estimated by inspecting the correlation response. As another contribution, a stabilizing mechanism is introduced that enables the detector to recover from model contamination.
The interaction between the short-term component and the detector allows long-term tracking even through long-lasting occlusions. Both components enjoy efficient implementation through FFT, which makes our tracker close to real-time. To the best of our knowledge this is the first long-term tracker fully formulated within the framework of discriminative correlation filters.
Extensive experiments show that the FCLT tracker outperforms all long-term trackers on a long-term benchmark and achieves excellent performance even on short-term benchmarks, while running at 15 fps.
2 Related work
We briefly overview the most closely related work in short-term DCFs and long-term trackers.
Short-term DCFs. Since their inception in the MOSSE tracker , several advancements have made discriminative correlation filters the most widely used methodology in short-term tracking . Major boosts in performance followed the introduction of kernels , multi-channel formulations [11, 16] and application to scale estimation [7, 35]. Hand-crafted features have recently been replaced with deep features trained for classification [9, 12, 6] as well as features trained for localization [45, 18]. Another line of advancement comprises constrained filter learning approaches [8, 25, 37] that allow learning a filter with an effective size smaller than the training patch.
Long-term trackers. The long-term trackers combine a short-term tracker with a detector. The seminal work of Kalal et al.  proposes a memory-less flock of flows as a short-term tracker and a template-based detector run in parallel. They propose a P-N learning approach in which the short-term tracker provides training examples for the detector and pruning events are used to reduce contamination of the detector model. The detector is implemented as a cascade to reduce the computational complexity.
Another major paradigm was pioneered by Pernici et al. . Their approach casts localization as matching of local keypoint descriptors with a weak geometric model, and they propose a mechanism to reduce contamination of the keypoint model that occurs when adapting during occlusion. Nebehay et al.  have shown that a keypoint tracker can be utilized even without updating, using pairs of correspondences in a GHT framework to track deformable models. Maresca and Petrosino  have extended the GHT approach by integrating various descriptors and introducing a conservative updating mechanism. The keypoint methods require a large and well-textured target, which constrains their application scenarios.
Recent long-term trackers have shifted back to the tracker-detector paradigm of Kalal et al. , mainly due to the advent of DCF trackers , which provide a robust and fast short-term tracking component. Ma et al.  proposed a combination of the KCF tracker  and a random ferns classifier as a detector that is used to correct the tracker. Similarly, Hong et al.  have proposed a method that combines the KCF tracker with a SIFT-based detector that is also used to detect occlusions.
The most extreme example of combining a fast tracker with a slow detector is the recent work of Fan and Ling . Their tracker combines a DSST  tracker with a CNN detector  that verifies and potentially corrects the proposals of the short-term tracker. Their tracker achieved excellent results on the challenging long-term benchmark , but it requires a GPU, has a very large memory footprint, and requires a parallel implementation with backtracking to achieve a reasonable runtime.
3 Fully correlational long-term tracker
In the following we describe our long-term tracking approach based on constrained discriminative correlation filters. The constrained DCF is overviewed in Section 3.1, Section 3.2 overviews the short-term component, Section 3.3 describes the detector, Section 3.4 describes detection of tracking uncertainty and the long-term tracker is described in Section 3.5.
3.1 Constrained discriminative filter formulation
Our tracker is formulated within the framework of discriminative correlation filters. Given a search region of size $W \times H$, a set of $N_d$ feature channels $\mathbf{f} = \{\mathbf{f}_d\}_{d=1:N_d}$, where $\mathbf{f}_d \in \mathbb{R}^{W \times H}$, is extracted. A set of correlation filters $\mathbf{h} = \{\mathbf{h}_d\}_{d=1:N_d}$ is correlated with the extracted features and the object position is estimated as the location of the maximum of the weighted average of the per-channel correlation responses,

$\mathbf{r} = \sum_{d=1}^{N_d} w_d \, (\mathbf{f}_d \star \mathbf{h}_d),$   (1)

where $\star$ represents circular correlation, which is efficiently implemented by a Fast Fourier Transform, and $w_d$ are channel weights. The target scale can be efficiently estimated by another correlation filter trained over the scale-space .
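As an illustration, the channel-weighted localization described above can be sketched in a few lines of NumPy. This is a simplified sketch, not the authors' Matlab implementation; the function and variable names are ours.

```python
import numpy as np

def localize(features, filters, weights):
    """Channel-weighted DCF localization sketch: correlate each feature
    channel with its filter via the FFT and return the argmax of the
    weighted average response."""
    resp = np.zeros(features[0].shape)
    for f, h, w in zip(features, filters, weights):
        # circular cross-correlation: IFFT(F(f) * conj(F(h)))
        resp += w * np.real(np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(h))))
    pos = np.unravel_index(np.argmax(resp), resp.shape)
    return pos, resp
```

Correlating a delta-like filter with a shifted delta-like feature recovers the shift, which is a quick sanity check of the FFT convention used.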
We use the recently proposed discriminative correlation filter with channel and spatial reliability (CSRDCF)  as the basic filter learning method. This tracker constrains the filter learning by a binary mask, which increases robustness, and achieves excellent results on a recent benchmark . The tracker also estimates per-channel reliability and uses it when averaging the responses for increased robustness. We provide only a brief overview of the learning framework here and refer the reader to the original paper  for details.
Constrained learning. Since CSRDCF  treats feature channels independently, we will assume a single feature channel (i.e., $N_d = 1$) in the following. A feature channel $\mathbf{f}$ is extracted from a learning region and a fast segmentation  is applied to produce a binary mask $\mathbf{m}$ that approximately separates the target from the background. Next, a filter $\mathbf{h}$ of the same size as the training region is learned, with support constrained by the mask $\mathbf{m}$. CSRDCF  learns the discriminative filter by minimizing the following augmented Lagrangian

$\mathcal{L}(\hat{\mathbf{h}}_c, \mathbf{h}, \hat{\mathbf{l}}) = \| \hat{\mathbf{h}}_c \odot \hat{\mathbf{f}} - \hat{\mathbf{g}} \|^2 + \tfrac{\lambda}{2} \| \mathbf{h}_m \|^2 + \big[ \hat{\mathbf{l}}^{H} (\hat{\mathbf{h}}_c - \hat{\mathbf{h}}_m) + \overline{\hat{\mathbf{l}}^{H} (\hat{\mathbf{h}}_c - \hat{\mathbf{h}}_m)} \big] + \mu \| \hat{\mathbf{h}}_c - \hat{\mathbf{h}}_m \|^2,$

where $\hat{\mathbf{g}}$ is the Fourier-transformed desired output, $\hat{\mathbf{l}}$ is a complex Lagrange multiplier, $\hat{\cdot}$ denotes a Fourier-transformed variable, $\mu > 0$, and we use the definition $\mathbf{h}_m \equiv \mathbf{m} \odot \mathbf{h}$ for compact notation. The solution is obtained via ADMM  iterations between two closed-form solutions, i.e.,

$\hat{\mathbf{h}}_c = \big( \overline{\hat{\mathbf{f}}} \odot \hat{\mathbf{g}} + \mu \hat{\mathbf{h}}_m - \hat{\mathbf{l}} \big) \oslash \big( \overline{\hat{\mathbf{f}}} \odot \hat{\mathbf{f}} + \mu \big), \qquad \mathbf{h} = \mathbf{m} \odot \mathcal{F}^{-1}\big[ \hat{\mathbf{l}} + \mu \hat{\mathbf{h}}_c \big] \big/ \big( \tfrac{\lambda}{2D} + \mu \big),$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform and $D$ is the number of elements in the filter. In the case of multiple channels, the approach independently learns a single filter per channel. Since the support of the learned filter is constrained to be smaller than the learning region, the maximum response on the training region reflects the reliability of the learned filter . These values are used as per-channel weights $w_d$ in (1) for improved target localization. After estimating the new filter, CSRDCF  updates the segmentation model as well.
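To make the ADMM procedure concrete, the following NumPy sketch implements a single-channel version of the constrained learning. It follows our reconstruction of the closed-form steps; the regularization, penalty values, and the penalty growth schedule are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def learn_constrained_filter(f, m, g, lam=0.01, mu=5.0, beta=3.0, iters=4):
    """Mask-constrained DCF learning via ADMM (single-channel sketch).
    f: training patch (2D), m: binary support mask, g: desired response."""
    D = f.size
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    # initialize from the masked ridge-regression solution
    h = m * np.real(np.fft.ifft2(np.conj(F) * G / (np.abs(F) ** 2 + lam)))
    l_hat = np.zeros_like(F)
    for _ in range(iters):
        Hm = np.fft.fft2(m * h)
        # closed-form solve for the unconstrained filter in the Fourier domain
        Hc = (np.conj(F) * G + mu * Hm - l_hat) / (np.abs(F) ** 2 + mu)
        # closed-form solve for the mask-constrained spatial filter
        h = m * np.real(np.fft.ifft2(l_hat + mu * Hc)) / (lam / (2 * D) + mu)
        # dual ascent and penalty schedule
        l_hat = l_hat + mu * (Hc - np.fft.fft2(m * h))
        mu *= beta
    return h
```

By construction the returned filter is exactly zero outside the mask, which is the property that later allows zero-padding it to an arbitrary search-region size.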
3.2 The short-term component
The CSRDCF  tracker is used as the short-term component in our long-term tracker. The short-term component is run within a search region centered on the target position estimated in the previous frame. The new target position hypothesis is estimated as the location of the maximum of the correlation response between the short-term filter $\mathbf{h}^{\mathrm{ST}}$ and the features extracted from the search region (see Figure 2).
The visual model of the short-term component is updated by an exponential moving average

$\mathbf{h}^{\mathrm{ST}}_t = (1 - \eta) \, \mathbf{h}^{\mathrm{ST}}_{t-1} + \eta \, \tilde{\mathbf{h}}_t,$

where $\mathbf{h}^{\mathrm{ST}}_{t-1}$ is the correlation filter used to localize the target, $\tilde{\mathbf{h}}_t$ is the filter estimated by constrained filter learning (Section 3.1) in the current time-step, and $\eta$ is the update factor.
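The update itself is a one-liner; a NumPy sketch follows (the default learning-rate value is illustrative, not taken from the paper).

```python
import numpy as np

def update_short_term(h_prev, h_new, eta=0.02):
    """Exponential moving average of the short-term filter: blend the
    previous model with the filter learned in the current frame."""
    return (1.0 - eta) * h_prev + eta * h_new
```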
3.3 The detector
The constrained filter learning from Section 3.1 estimates a filter implicitly padded with zeros to match the learning region size. In contrast to naively learning a filter with a standard unconstrained approach and multiplying it with a mask post hoc, the padding is explicitly enforced during learning, resulting in increased filter robustness. Even more importantly, since adding or removing zeros at the filter borders leaves the filter unchanged, correlation on an arbitrarily large region via FFT is possible by zero-padding the filter to match the region size. These properties make the constrained learning  an excellent candidate for training the detector in our long-term tracker.
Ideally, a visual model uncontaminated by false training examples is desired for reliable re-detection after a long period of target loss. The only filter guaranteed to be uncontaminated is the one learned at initialization, but for short-term occlusions the most recent uncontaminated model would likely yield a better detection.
While contamination of the short-term visual model (Section 3.2) is minimized by our long-term system (Section 3.5), it cannot be entirely prevented. We therefore construct the detector correlation filter as a convex combination of the visual model $\mathbf{h}_0$ learned by CSRDCF  at initialization and the most recent short-term visual model $\mathbf{h}^{\mathrm{ST}}$, i.e.,

$\mathbf{h}^{\mathrm{D}}_{\Delta t} = \lambda_{\Delta t} \, \mathbf{h}_0 + (1 - \lambda_{\Delta t}) \, \mathbf{h}^{\mathrm{ST}},$   (6)

where the mixing weight $\lambda_{\Delta t} \in [0, 1]$ depends on the mixing parameter $\alpha$ and on the number of frames $\Delta t$ since the last confidently estimated position, increasing monotonically with $\Delta t$. Thus the detector model starts as the last short-term visual model and gradually reverts to the uncontaminated initial model. This guarantees full recovery from potential contamination of the short-term visual model.
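A minimal sketch of the detector-filter construction follows. Only the qualitative behavior of the mixing weight is specified above, so its functional form here (a clipped linear ramp) is an illustrative assumption.

```python
import numpy as np

def detector_filter(h_init, h_recent, dt, alpha=0.1):
    """Convex combination of the initial (uncontaminated) filter and the
    most recent short-term filter.  As dt, the number of frames since the
    last confident localization, grows, the weight drifts from the recent
    model toward the initial one."""
    w = min(1.0, alpha * dt)  # 0 -> recent model, 1 -> initial model (assumed ramp)
    return w * h_init + (1.0 - w) * h_recent
```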
A motion model is added to increase robustness. We use a basic random walk, which models the likelihood of the target position $\mathbf{x}_t$ at time-step $t$ by a Gaussian with a diagonal covariance matrix $\Sigma_{\Delta t}$ centered at the last confidently estimated position. The variances in the motion model gradually increase with the number of frames $\Delta t$ since the last confident estimation, i.e., the standard deviations are proportional to $\Delta t \, \beta \, (w, h)$, where $\beta$ is a scale increase parameter, and $w$ and $h$ are the target width and height, respectively.
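The motion prior can be sketched as follows; the exact growth law of the standard deviations is an illustrative assumption.

```python
import numpy as np

def motion_prior(shape, center, dt, w, h, beta=0.05):
    """Random-walk motion prior: an (unnormalized) Gaussian centered on the
    last confident position whose diagonal variances grow with the number
    of frames dt since that estimate."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    sx = (1.0 + beta * dt) * w  # std grows with dt (assumed growth law)
    sy = (1.0 + beta * dt) * h
    return np.exp(-0.5 * (((xs - center[1]) / sx) ** 2 + ((ys - center[0]) / sy) ** 2))
```

Multiplying the detector's correlation response by this map suppresses detections far from the last confident position early on, while the widening Gaussian gradually allows detections anywhere in the image.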
3.4 Detection of tracking uncertainty
Tracking uncertainty detection is crucial for minimizing contamination of the short-term visual model as well as for triggering target re-detection after events like occlusion. Since our tracker is fully formulated within the discriminative correlation filter framework, the tracking quality can be evaluated by inspecting the correlation response used for target localization.
Confident localization produces a well-expressed local maximum in the correlation response, which can be measured by the peak-to-sidelobe ratio $\mathrm{PSR}(\mathbf{r}_t)$  as well as by the peak absolute value $\max(\mathbf{r}_t)$. The localization quality at time-step $t$ is thus defined as the product of the two, i.e.,

$q_t = \mathrm{PSR}(\mathbf{r}_t) \cdot \max(\mathbf{r}_t).$   (7)

Detrimental events like occlusion occur on a relatively short time-scale and are reflected in a significant reduction of the current localization quality. Let $\bar{q}_t$ be the average localization quality computed over the recent confidently tracked frames. Tracking is considered uncertain if the ratio between $\bar{q}_t$ and $q_t$ exceeds a predefined threshold $\theta_q$, i.e.,

$\bar{q}_t / q_t > \theta_q.$   (8)
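The quality measure and the uncertainty test can be sketched as follows. The size of the sidelobe-exclusion window and the threshold value are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def localization_quality(resp, excl=2):
    """Quality = peak-to-sidelobe ratio times the peak value.  The sidelobe
    statistics exclude a small window around the peak."""
    peak = resp.max()
    py, px = np.unravel_index(np.argmax(resp), resp.shape)
    side = resp.astype(float).copy()
    side[max(0, py - excl):py + excl + 1, max(0, px - excl):px + excl + 1] = np.nan
    side = side[~np.isnan(side)]
    psr = (peak - side.mean()) / (side.std() + 1e-12)
    return psr * peak

def is_uncertain(q_now, q_avg, theta=2.7):
    """Flag tracking as uncertain when the recent-average quality exceeds
    the current quality by more than a threshold (value illustrative)."""
    return q_avg / (q_now + 1e-12) > theta
```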
Figure 4 shows an example of confident tracking before and after an occlusion. The ratio between the average and the current localization quality increases significantly during the occlusion, indicating highly uncertain tracking.
3.5 Tracking with FCLT
Initialization. The FCLT tracker is initialized in the first frame and the learned initialization model $\mathbf{h}_0$ is stored. In the remaining frames, two visual models maintained at different time-scales are used for target localization: the short-term visual model $\mathbf{h}^{\mathrm{ST}}$ and the detector visual model $\mathbf{h}^{\mathrm{D}}$.
Localization. A tracking iteration at frame $t$ starts with the target position $\mathbf{x}_{t-1}$ from the previous frame as well as the tracking quality score $q_{t-1}$ and its mean $\bar{q}_{t-1}$ over the recent confidently tracked frames. A region is extracted at location $\mathbf{x}_{t-1}$ in the current image and the correlation response $\mathbf{r}^{\mathrm{ST}}_t$ is computed using the short-term component model (Section 3.2). A position $\mathbf{x}^{\mathrm{ST}}_t$ and its localization quality $q^{\mathrm{ST}}_t$ (7) are estimated from the correlation response $\mathbf{r}^{\mathrm{ST}}_t$. If tracking was confident at $t-1$, i.e., the uncertainty (8) was smaller than $\theta_q$, only the short-term component is run; otherwise the detector (Section 3.3) is activated as well to address potential target disappearance. A detector filter is constructed according to (6) and correlated with the features extracted from the entire image. The detection hypothesis $\mathbf{x}^{\mathrm{D}}_t$ is obtained as the location of the maximum of the correlation response multiplied by the motion model, while its localization quality $q^{\mathrm{D}}_t$ (7) is computed only on the correlation response.
Update. In case the detector has not been activated, the short-term position $\mathbf{x}^{\mathrm{ST}}_t$ is taken as the final target position estimate. Otherwise, both position hypotheses, i.e., the position estimated by the short-term component as well as the position estimated by the detector, are considered. The final target position is estimated as the one with the higher quality score, i.e., $\mathbf{x}_t = \mathbf{x}^{\mathrm{ST}}_t$ if $q^{\mathrm{ST}}_t \geq q^{\mathrm{D}}_t$, and $\mathbf{x}_t = \mathbf{x}^{\mathrm{D}}_t$ otherwise.
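The interaction logic of one tracking iteration reduces to a short control flow, sketched schematically below; the `short_term` and `detector` callables stand in for the components described above and are our own abstraction.

```python
def fclt_step(short_term, detector, was_confident):
    """One schematic FCLT iteration: always run the short-term component;
    run the detector only when the previous frame was uncertain, and keep
    the hypothesis with the higher quality score.
    `short_term` and `detector` are callables returning (position, quality)."""
    pos, q = short_term()
    if not was_confident:
        d_pos, d_q = detector()
        if d_q > q:
            pos, q = d_pos, d_q
    return pos, q
```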
4 Experiments
This section provides a comprehensive experimental evaluation of the FCLT tracker. Implementation details are discussed in Section 4.1. Although FCLT is a long-term tracker, we nevertheless start with an evaluation on the challenging short-term benchmarks OTB100 , UAV123  and VOT2016 , since the transition between long-term and short-term tracking is not abrupt and a long-term tracker without competitive short-term performance is of limited use. The results are presented in Section 4.2. An extensive evaluation on the long-term benchmark UAV20L , including per-sequence and attribute-based analysis, is reported in Section 4.3, while the importance of the detector is experimentally evaluated in Section 4.4.
4.1 Implementation details
We use the same standard HOG  and colorname [46, 11] features in the short-term component and in the detector. All parameters of the CSRDCF filter learning are the same as in , including the filter learning rate and the regularization . The parameter for filter mixing in the detector construction was set to  and the motion model scale increase parameter was set to . The uncertainty threshold was set to  and the “recent frames” parameter was . The parameters did not require fine-tuning and were kept constant throughout all experiments. Our Matlab implementation runs on average at 15 frames per second on the OTB100  dataset on an Intel Core i7 3.4 GHz standard desktop.
4.2 Performance on short-term benchmarks
For completeness, we first evaluate the performance of FCLT on the popular short-term benchmarks: OTB100 , UAV123  and VOT2016 . A standard no-reset evaluation (OPE ) is applied to focus on long-term behavior: a tracker is initialized in the first frame and left to track until the end of the sequence.
Tracking quality is measured by precision and success plots. The success plot shows, for all threshold values, the proportion of frames in which the overlap between the predicted and ground-truth bounding boxes is greater than the threshold. The results are summarized by the areas under these plots, which are shown in the legend. The precision plots in Figures 5, 6, 7 and 8 show a similar statistic computed from the center error. The results in the legends are summarized by the percentage of frames tracked with a center error of less than 20 pixels.
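For reference, both summary statistics are straightforward to compute from per-frame overlaps and center errors; the sketch below illustrates the standard OPE measures and is not the benchmark toolkit code.

```python
import numpy as np

def success_auc(overlaps, n_thr=101):
    """Area under the success plot: fraction of frames whose overlap
    exceeds each threshold in [0, 1], averaged over the thresholds."""
    thr = np.linspace(0.0, 1.0, n_thr)
    return float(np.mean([(np.asarray(overlaps) > t).mean() for t in thr]))

def precision_at(center_errors, pix=20):
    """Precision score: fraction of frames with center error below `pix`."""
    return float((np.asarray(center_errors) < pix).mean())
```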
The benchmark results already include some long-term trackers. We added the most recent PTAV  – currently the best-performing published long-term tracker. Since FCLT is derived from the recent CSRDCF , we include this tracker as well. We remark that PTAV is not causal, i.e., it uses future frames to predict the position of the tracked object, which limits its applicability.
4.2.1 OTB100  benchmark
The OTB100  contains results of 29 trackers evaluated on 100 sequences with an average sequence length of 589 frames. To reduce clutter in the graphs, we show only the results for top-performing recent baselines, i.e., Struck , TLD , CXT , ASLA , SCM , LSK , CSK , OAB , VTS , VTD , CMT , and for recent top-performing state-of-the-art trackers SRDCF , MUSTER , LCT , PTAV  and CSRDCF .
The FCLT ranks among the top trackers on this benchmark (Figure 5), outperforming all baselines as well as the state-of-the-art SRDCF, CSRDCF and MUSTER. Using only hand-crafted features, the FCLT achieves performance comparable to the non-causal PTAV , which uses deep features for re-detection and applies backtracking.
4.2.2 VOT2016  benchmark
The VOT2016  is the most challenging recent short-term tracking benchmark; it contains results of 70 trackers evaluated on 60 sequences with an average sequence length of 358 frames. The dataset was created using a methodology that selects sequences which are difficult to track, thus the target appearance varies much more than in other benchmarks. In the interest of visibility, we show only the top-performing trackers on the no-reset evaluation, i.e., SSAT [41, 28], TCNN [41, 28], CCOT , MDNetN [41, 28], GGTv2 , MLDF , DNT , DeepSRDCF , SiamRN  and FCF . In addition, we add CSRDCF  and the long-term trackers TLD , LCT , MUSTER , CMT  and PTAV .
The FCLT ranks fifth on this benchmark according to the tracking success measure, outperforming 65 trackers, including DeepSRDCF  with deep features, CSRDCF and PTAV. Note that the four trackers that achieve better performance than FCLT (SSAT, TCNN, CCOT and MDNetN) are CNN-based and computationally very expensive. They are optimized for accurate tracking on short sequences, without the ability to re-detect the target. The FCLT outperforms all long-term trackers on this benchmark (TLD, CMT, LCT, MUSTER and PTAV).
4.2.3 UAV123  benchmark
The UAV123  contains results of 14 trackers evaluated on 123 sequences with an average sequence length of 915 frames. To reduce clutter in the graphs, we show only the results for top-performing recent baselines, i.e., ASLA , Struck , SAMF , MEEM , LCT , TLD , CMT , and for recent top-performing state-of-the-art trackers SRDCF , MUSTER , PTAV  and CSRDCF .
Results are shown in Figure 7. The FCLT outperforms all recent short-term trackers, i.e., SRDCF and CSRDCF, as well as the long-term trackers (PTAV, MUSTER, LCT, CMT and TLD) by a clear margin in both measures.
4.3 Evaluation on a long-term benchmark
The long-term performance of the FCLT is analyzed on the recent long-term benchmark UAV20L , which contains results of 14 trackers on 20 long-term sequences with an average sequence length of 2934 frames. To reduce clutter in the plots, we include the top-performing trackers SRDCF , OAB , SAMF , MEEM , Struck , DSST  and all long-term trackers in the benchmark (MUSTER , TLD ). We include the most recent state-of-the-art long-term trackers CMT , LCT , and PTAV  in the analysis. Additionally, we add the recent state-of-the-art short-term DCF tracker CSRDCF  and the CNN-based CCOT .
Results in Figure 8 show that the FCLT outperforms by far all top baseline trackers on the benchmark as well as all the recent long-term state of the art. In particular, FCLT outperforms the recent long-term correlation filter tracker LCT  by  in precision and success measures. The FCLT also outperforms the currently best-performing published long-term tracker PTAV  by over  and  in precision and success measures, respectively. This is an excellent result, especially considering that FCLT does not apply deep features and backtracking like PTAV , and that it runs in near-real-time on a single-thread CPU.
Table 2 shows tracking performance with respect to the twelve attributes annotated in the UAV20L benchmark. The FCLT is the top-performing tracker across all attributes except fast motion, where PTAV exploits its backtracking mechanism. On the other hand, the FCLT achieves  and  better performance in tracking success at full occlusion and out-of-view compared to PTAV. These attributes are the most specific to long-term tracking, since target re-detection is required after they occur.
Figure 9 shows qualitative tracking examples for the proposed FCLT and four state-of-the-art trackers: PTAV , CSRDCF , MUSTER  and TLD . In the Group2 and Person19 sequences a long-lasting full occlusion occurs during tracking; only FCLT is able to re-detect the target in both situations. In the sequence Person10, the target disappears from the image and only FCLT, TLD and MUSTER are able to re-detect it, but TLD and MUSTER track it with much lower accuracy. In the Bolt2 sequence the trackers suffer from background clutter, while FCLT and CSRDCF are able to track the target to the end. The sequence Human3 contains many partial occlusions; only the FCLT and PTAV are able to track it successfully.
4.4 Impact of the detector in FCLT
To study the importance of the detector in FCLT, we compared per-sequence performance with CSRDCF , which is used as the short-term tracker in the FCLT. For each sequence in UAV20L  we calculated, as global performance measures, the average overlap between the predicted and ground-truth bounding boxes and the fraction of tracked frames with overlap greater than zero. The number of tracking recoveries is quantified by counting the number of times the overlap increased from zero to a positive value. The results show that the detector and recovery system in FCLT is successful in many sequences. As a result, FCLT manages to track significantly longer than CSRDCF. In some cases, like the person14 sequence, the recovery dramatically improves FCLT tracking performance, with a 1880 increase in the tracked sequence length compared to CSRDCF. On average, the FCLT outperforms the CSRDCF in overlap by over .
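The recovery count used above can be computed directly from the per-frame overlaps; a small sketch:

```python
import numpy as np

def count_recoveries(overlaps):
    """Number of times the per-frame overlap rises from zero back to a
    positive value, i.e. the tracker re-detects a lost target."""
    tracked = np.asarray(overlaps) > 0
    return int(np.sum(~tracked[:-1] & tracked[1:]))
```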
5 Conclusion
We proposed a fully-correlational long-term tracker (FCLT). The FCLT is the first long-term tracker that exploits the recent constrained DCF learning method from CSRDCF , the best performing real-time tracker in a recent short-term benchmark . The method is used in FCLT to maintain two correlation filters trained on different time scales that act as the short-term component and the detector. The short-term component localizes the target within a limited search range in each frame, while the detector exploits the properties of the recent constrained filter learning  and is able to efficiently re-detect the target in the whole image. A failure detection mechanism based on the correlation response quality is proposed and used for tracking uncertainty detection. The interaction between the short-term component and the detector allows long-term tracking even through long-lasting occlusions.
Experimental evaluation on short-term benchmarks [49, 28, 40] showed state-of-the-art performance. On the long-term benchmark , the FCLT outperforms the best method by over 18%. The FCLT also consistently outperforms the short-term state-of-the-art CSRDCF , while running at the same frame-rate. Our Matlab implementation runs at 15 fps and will be made publicly available.
-  L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr. Fully-convolutional siamese networks for object tracking. arXiv preprint arXiv:1606.09549, 2016.
-  D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In Comp. Vis. Patt. Recognition, pages 2544–2550. IEEE, 2010.
-  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
-  Z. Chi, H. Li, H. Lu, and M. H. Yang. Dual deep network for visual tracking. IEEE Trans. Image Proc., 26(4):2005–2015, 2017.
-  N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Comp. Vis. Patt. Recognition, volume 1, pages 886–893, June 2005.
-  M. Danelljan, G. Bhat, F. Shahbaz Khan, and M. Felsberg. Eco: Efficient convolution operators for tracking. In Comp. Vis. Patt. Recognition, pages 6638–6646, 2017.
-  M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In Proc. British Machine Vision Conference, pages 1–11, 2014.
-  M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In Int. Conf. Computer Vision, pages 4310–4318, 2015.
-  M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg. Convolutional features for correlation filter based visual tracking. In IEEE International Conference on Computer Vision Workshop (ICCVW), pages 621–629, Dec 2015.
-  M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg. Discriminative scale space tracking. IEEE Trans. Pattern Anal. Mach. Intell., 39(8):1561–1575, 2017.
-  M. Danelljan, F. S. Khan, M. Felsberg, and J. van de Weijer. Adaptive color attributes for real-time visual tracking. In Comp. Vis. Patt. Recognition, pages 1090–1097, 2014.
-  M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg. Beyond correlation filters: learning continuous convolution operators for visual tracking. In Proc. European Conf. Computer Vision, pages 472–488. Springer, 2016.
-  T. B. Dinh, N. Vo, and G. Medioni. Context tracker: Exploring supporters and distracters in unconstrained environments. In Comp. Vis. Patt. Recognition, pages 1177–1184, 2011.
-  D. Du, H. Qi, L. Wen, Q. Tian, Q. Huang, and S. Lyu. Geometric hypergraph learning for visual tracking. IEEE Transactions on Cybernetics, 47(12):4182–4195, 2017.
-  H. Fan and H. Ling. Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking. In Int. Conf. Computer Vision, pages 5486–5494, 2017.
-  H. K. Galoogahi, T. Sim, and S. Lucey. Multi-channel correlation filters. In Int. Conf. Computer Vision, pages 3072–3079, 2013.
-  H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In Proc. British Machine Vision Conference, volume 1, pages 47–56, 2006.
-  E. Gundogdu and A. A. Alatan. Good features to correlate for visual tracking. arXiv preprint arXiv:1704.06326, 2017.
-  S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. In Int. Conf. Computer Vision, pages 263–270, Washington, DC, USA, 2011. IEEE Computer Society.
-  J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. Exploiting the circulant structure of tracking-by-detection with kernels. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, Proc. European Conf. Computer Vision, pages 702–715, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
-  J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell., 37(3):583–596, 2015.
-  Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. Multi-store tracker (muster): A cognitive psychology inspired approach to object tracking. In Comp. Vis. Patt. Recognition, pages 749–758, June 2015.
-  C. Huang, S. Lucey, and D. Ramanan. Learning policies for adaptive tracking with deep feature cascades. In Int. Conf. Computer Vision, pages 105–114, 2017.
-  Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell., 34(7):1409–1422, July 2012.
-  H. Kiani Galoogahi, A. Fagg, and S. Lucey. Learning background-aware correlation filters for visual tracking. In Int. Conf. Computer Vision, pages 1135–1143, 2017.
-  H. Kiani Galoogahi, T. Sim, and S. Lucey. Correlation filters with limited boundaries. In Comp. Vis. Patt. Recognition, pages 4630–4638, 2015.
-  M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, L. Cehovin Zajc, T. Vojir, G. Hager, A. Lukezic, A. Eldesokey, and G. Fernandez. The visual object tracking vot2017 challenge results. In The IEEE International Conference on Computer Vision (ICCV), 2017.
-  M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, L. Čehovin, T. Vojir, G. Häger, A. Lukežič, G. Fernandez, et al. The visual object tracking vot2016 challenge results. In Proc. European Conf. Computer Vision, 2016.
-  M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Čehovin, G. Fernandez, T. Vojir, G. Häger, G. Nebehay, R. Pflugfelder, et al. The visual object tracking vot2015 challenge results. In Int. Conf. Computer Vision, 2015.
-  M. Kristan, J. Matas, A. Leonardis, T. Vojir, R. Pflugfelder, G. Fernandez, G. Nebehay, F. Porikli, and L. Cehovin. A novel performance evaluation methodology for single-target trackers. IEEE Trans. Pattern Anal. Mach. Intell., 2016.
-  M. Kristan, J. Perš, V. Sulič, and S. Kovačič. A graphical model for rapid obstacle image-map estimation from unmanned surface vehicles. In Proc. Asian Conf. Computer Vision, pages 391–406, 2014.
-  M. Kristan, R. Pflugfelder, A. Leonardis, J. Matas, L. Čehovin, G. Nebehay, T. Vojir, G. Fernandez, et al. The visual object tracking vot2014 challenge results. In Proc. European Conf. Computer Vision, pages 191–217, 2014.
-  J. Kwon and K. M. Lee. Visual tracking decomposition. In Comp. Vis. Patt. Recognition, pages 1269–1276, 2010.
-  J. Kwon and K. M. Lee. Tracking by sampling and integrating multiple trackers. IEEE Trans. Pattern Anal. Mach. Intell., 36(7):1428–1441, July 2014.
-  Y. Li and J. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In Proc. European Conf. Computer Vision, pages 254–265, 2014.
-  B. Liu, J. Huang, L. Yang, and C. Kulikowsk. Robust tracking using local sparse appearance model and k-selection. In Comp. Vis. Patt. Recognition, pages 1313–1320, June 2011.
-  A. Lukežič, T. Vojíř, L. Čehovin Zajc, J. Matas, and M. Kristan. Discriminative correlation filter with channel and spatial reliability. In Comp. Vis. Patt. Recognition, pages 6309–6318, 2017.
-  C. Ma, X. Yang, C. Zhang, and M.-H. Yang. Long-term correlation tracking. In Comp. Vis. Patt. Recognition, pages 5388–5396, 2015.
-  M. E. Maresca and A. Petrosino. Matrioska: A multi-level approach to fast tracking by learning. In Proc. Int. Conf. Image Analysis and Processing, pages 419–428, 2013.
-  M. Mueller, N. Smith, and B. Ghanem. A benchmark and simulator for uav tracking. In Proc. European Conf. Computer Vision, pages 445–461, 2016.
-  H. Nam and B. Han. Learning multi-domain convolutional neural networks for visual tracking. In Comp. Vis. Patt. Recognition, pages 4293–4302, June 2016.
-  G. Nebehay and R. Pflugfelder. Clustering of static-adaptive correspondences for deformable object tracking. In Comp. Vis. Patt. Recognition, pages 2784–2791, 2015.
-  F. Pernici and A. Del Bimbo. Object tracking by oversampling local features. IEEE Trans. Pattern Anal. Mach. Intell., 36(12):2538–2551, 2013.
-  R. Tao, E. Gavves, and A. W. M. Smeulders. Siamese instance search for tracking. In Comp. Vis. Patt. Recognition, pages 1420–1429, 2016.
-  J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. S. Torr. End-to-end representation learning for correlation filter based tracking. In Comp. Vis. Patt. Recognition, pages 2805–2813, 2017.
-  J. van de Weijer, C. Schmid, J. Verbeek, and D. Larlus. Learning color names for real-world applications. IEEE Trans. Image Proc., 18(7):1512–1523, July 2009.
-  L. Wang, W. Ouyang, X. Wang, and H. Lu. Visual tracking with fully convolutional networks. In Int. Conf. Computer Vision, pages 3119–3127, Dec 2015.
-  Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In Comp. Vis. Patt. Recognition, pages 2411–2418, 2013.
-  Y. Wu, J. Lim, and M. H. Yang. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell., 37(9):1834–1848, Sept 2015.
-  X. Jia, H. Lu, and M.-H. Yang. Visual tracking via adaptive structural local sparse appearance model. In Comp. Vis. Patt. Recognition, pages 1822–1829, 2012.
-  J. Zhang, S. Ma, and S. Sclaroff. MEEM: robust tracking via multiple experts using entropy minimization. In Proc. European Conf. Computer Vision, pages 188–203, 2014.
-  W. Zhong, H. Lu, and M. H. Yang. Robust object tracking via sparse collaborative appearance model. IEEE Trans. Image Proc., 23(5):2356–2368, 2014.