Best-Buddies Tracking

11/01/2016 · Shaul Oron, et al. · Tel Aviv University

Best-Buddies Tracking (BBT) applies the Best-Buddies Similarity (BBS) measure to the problem of model-free online tracking. BBS was introduced as a similarity measure between two point sets and was shown to be very effective for template matching. Originally, BBS was designed to work with point sets of equal size; we propose a modification that lets it handle point sets of different sizes. The modified BBS is better suited to handle scale changes of the template, and it supports a variable number of template images. We embed the modified BBS in a particle filter framework and obtain good results on a number of standard benchmarks.




1 Introduction

Online tracking plays an important role in many computer vision applications such as autonomous driving, surveillance systems and human-computer interaction, to name a few. Tracking is a challenging task because of changes in viewpoint, illumination, and non-rigid deformations of the object being tracked. A tracker must strike a delicate balance between adapting to legitimate changes in the appearance of an object and absorbing background clutter, which causes drift.

These sources of variability require either a representation that is invariant to them or similarity measures that can handle them. In its most basic form, online tracking boils down to template matching, where the goal is to find a given template in the current image. This requires a similarity measure, such as the intensity difference between the template and the target. To complicate things, it is common to represent the position of an object with a bounding box, which often includes some background pixels. These pixels are outliers that might confuse the matching function and cause drift.

We address these problems by using a new (dis)similarity measure called the Best-Buddies Similarity or BBS [8]. It is used within a particle filtering framework to produce what we call the Best-Buddies Tracker or BBT.

The template and candidate image regions are mapped to two point sets in some high-dimensional space where BBS takes place. The mapping simply involves breaking the image region into small patches and creating vectors (i.e., high-dimensional points) consisting of their pixel values and relative position. BBS counts the number of Best-Buddies Pairs (BBPs): pairs of points in the source and target sets where each point is the nearest neighbor of the other. This simple measure is quite robust to geometric deformations and a considerable amount of outliers, making it an ideal candidate for tracking.

Applying BBS to tracking requires computing BBS between sets of points of different sizes, for example, when the target and candidate have different scales, or when multiple templates are used. However, the original BBS formulation does not handle scale change properly. Specifically, our analysis shows that BBS scores increase if one set is made larger while the other is kept fixed. We analyze this phenomenon both theoretically and empirically, and suggest several ways to address the problem.

In addition to the scientific benefit, solving this issue is crucial for applying BBS to tracking. Otherwise, for example, comparing BBS scores of candidates of different scale (and hence point sets of different size) would be biased towards larger candidates.

Because BBS is a statistical measure, it is enough to sample equal-size point sets from the input point sets, even if the inputs differ in size. A nice side benefit of this sampling strategy is that we can increase computational efficiency by sampling small point sets. This is especially important in tracking, where run-time considerations matter.

In [8], BBS was used for template matching and was computed exhaustively over an image using a sliding window. In BBT we propose incorporating BBS into a particle filtering framework. This removes the need for an exhaustive search and allows us to account for scale changes as well. Additional features of BBT include online confidence estimation using a forward-backward consistency check, and the use of a tracker ensemble for more reliable scale estimation.

Finally, we would like BBS to reason about object appearance changes over time. To this end, we leverage the fact we are working with point sets, and propose a “bag-of-points” approach, augmenting together points from multiple templates, captured at different times. By doing so, we obtain a better non-parametric representation of the underlying appearance model of the object as it changes over time, leading to more reliable matching.

We perform extensive tests evaluating the performance of BBT. To this end, we use three commonly used tracking benchmarks and compare our performance to many recently published methods. Overall, BBT demonstrates good initial performance, with significant improvement achievable through better ensemble fusion.

To summarize, the contribution of our work is twofold: (i) We address the problem of computing BBS with unbalanced point sets. We analyze the problem both theoretically and empirically, and propose two effective solutions. Solving this problem is crucial for applying BBS to tracking. (ii) We propose a novel tracking framework, termed BBT, that uses BBS as a similarity measure. We perform extensive experiments evaluating its performance and comparing it to other recently published methods, showing promising initial results.

2 Related Work

One can categorize a tracking algorithm as taking either a discriminative or a generative approach to the problem. Within each category, one can use different representations. On top of that, recent work has demonstrated the benefits of running multiple trackers in parallel.

Discriminative trackers treat tracking as a binary classification problem where the goal is to develop a classifier that will separate the object from the background [1, 10, 11, 19, 12, 7, 36].

Generative trackers, on the other hand, build a model of the object and try to minimize reconstruction error when searching for the object in the next frame [5, 30, 23].

Within each approach one must determine the representation with which to work. Methods proposed in recent years often use multiple templates to better represent the time-varying appearance of the object. These templates are then used in various ways for computing the similarity. Some methods use the raw pixel values, for example, using the templates as a dictionary and solving a sparse optimization problem [41, 39], or treating the templates as a bag of patches and then using a patch-matching-based similarity measure [27]. Other methods take a discriminative approach, treating the similarity as a form of classification problem [15, 24], or some combination of the two [11, 19, 9].

Of course, one can mix and match different types of trackers. An early attempt was made by [17], which combined a short-term with a long-term tracker. More recently, [21] proposed a tracker-sampling framework in which new trackers are sampled in each new frame, and the MUSTer tracker [40] combined a short-term tracker with a long-term memory module to achieve excellent results. More general approaches for fusing results from an ensemble of trackers have also been proposed [3, 22, 33]. In most such approaches the trackers themselves are treated as "black boxes" and only their output signals are considered.

Cardinal to any tracking algorithm is the similarity measure between the appearance of an object and some candidate hypothesis. Such a measure must be defined for any type of tracker, be it a particle filtering (PF) approach [41, 26], a gradient descent based method [28], an exhaustive search over a region of interest [13, 15] or some other control scheme [27, 40]. Here we focus on the suitability of the BBS measure to the task.

3 BBS with Unbalanced Set Sizes

We briefly review BBS [8] first. Then we discuss the problem of computing BBS when the point sets have different sizes. Solving this problem is crucial for proper visual tracking, because the target and candidate point sets might differ in size, either because of a scale difference or the use of multiple templates.

3.1 BBS for Template Matching

BBS is a similarity measure between two point sets, so given a template and a target (in the form of two rectangular image regions) we must first convert them to point sets. We do this by breaking each region into distinct $k \times k$ patches. Each such patch is represented by a vector that is the concatenation of its $k^2$ pixels, each with $c$ color channels, and the $(x, y)$ location of its central pixel relative to the region's coordinate system. That is, each patch is a point in a $(k^2 c + 2)$-dimensional space. The template and the target are now represented as sets of points in this space. We are now ready to formally define BBS [8]:
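As a concrete illustration, the region-to-point-set conversion might look like the following sketch (the patch size $k=3$ and the normalized-location convention are assumptions for illustration, not necessarily the paper's exact settings):

```python
import numpy as np

def region_to_points(region, k=3):
    """Break a (H, W, C) region into non-overlapping k x k patches; each patch
    becomes one point: its flattened pixel values concatenated with the
    relative (x, y) location of the patch center."""
    H, W, C = region.shape
    points = []
    for y in range(0, H - k + 1, k):
        for x in range(0, W - k + 1, k):
            patch = region[y:y + k, x:x + k, :].reshape(-1)   # k*k*C values
            center = [(x + k / 2) / W, (y + k / 2) / H]       # relative location
            points.append(np.concatenate([patch, center]))
    return np.array(points)

# A 12x12 RGB region yields 4*4 = 16 points of dimension 3*3*3 + 2 = 29.
P = region_to_points(np.random.rand(12, 12, 3))
```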

BBS measures the similarity between two sets of points $P = \{p_i\}_{i=1}^{N}$ and $Q = \{q_j\}_{j=1}^{M}$, where $p_i, q_j \in \mathbb{R}^d$. The BBS is the fraction of Best-Buddies Pairs (BBPs) between the two sets. Specifically, a pair of points $\{p_i, q_j\}$ is a BBP if $p_i$ is the nearest neighbor of $q_j$ in the set $P$, and vice versa. Formally,

$$bb(p_i, q_j, P, Q) = \begin{cases} 1 & \text{if } NN(p_i, Q) = q_j \;\wedge\; NN(q_j, P) = p_i \\ 0 & \text{otherwise,} \end{cases} \quad (1)$$

where $NN(p_i, Q) = \operatorname{argmin}_{q \in Q} d(p_i, q)$, and $d$ is some distance measure. The BBS between the point sets $P$ and $Q$ is given by:

$$BBS(P, Q) = \frac{1}{\min(N, M)} \sum_{i=1}^{N} \sum_{j=1}^{M} bb(p_i, q_j, P, Q). \quad (2)$$
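A direct, if naive, $O(NM)$ implementation of this definition can be sketched as follows, using plain squared Euclidean distance as $d$:

```python
import numpy as np

def bbs(P, Q):
    """Best-Buddies Similarity: the number of mutual nearest-neighbor pairs,
    normalized by the size of the smaller set."""
    # Full pairwise squared-distance matrix D[i, j] = d(p_i, q_j).
    D = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    nn_of_p = D.argmin(axis=1)   # NN(p_i, Q) for every i
    nn_of_q = D.argmin(axis=0)   # NN(q_j, P) for every j
    # {p_i, q_j} is a BBP iff each point is the other's nearest neighbor.
    n_bbp = sum(1 for i, j in enumerate(nn_of_p) if nn_of_q[j] == i)
    return n_bbp / min(len(P), len(Q))

P = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
Q = np.array([[0.1, 0.0], [0.9, 1.1], [50.0, 50.0]])
# p0<->q0 and p1<->q1 are best buddies; p2's NN is q1, which prefers p1.
print(bbs(P, Q))  # -> 0.666...
```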
3.2 BBS with Uneven Sets

BBS counts the number of Best-Buddy Pairs and then normalizes by the number of points in the smaller set. As discussed in [8], BBS is a statistical property of the data: it is governed by the underlying density functions from which the data was sampled. Moreover, it is shown in [29] that BBS converges to the Chi-Square ($\chi^2$) distance between the distributions when the set sizes are sufficiently large.

In general, BBS is well defined and can be computed for uneven point sets. Unfortunately, in such cases BBS becomes biased. Specifically, when one set has fixed size and we increase the size of the other set, the probability of finding a BBP increases; however, the normalization factor stays constant, making the final BBS score higher.

To see this, consider the case where $P$ and $Q$ are drawn i.i.d. from some underlying multivariate distribution functions $f_P$ and $f_Q$. We keep the size of set $P$ fixed at $N$, and check what happens to the BBS score when $M = |Q| \to \infty$:

$$\lim_{M \to \infty} BBS(P, Q) = 1. \quad (3)$$

This means that the BBS score goes to one when the size of set $Q$ goes to infinity (and the size of $P$ is fixed).

Intuitively, every point $p \in P$ has some region around it such that if a point $q \in Q$ falls in this region then they are a BBP. As $M$ becomes larger, the probability that no point in $Q$ falls within this region gets smaller (for any point in $P$). This means that eventually all the points in $P$ will have a BBP, and since we normalize by the size of the smaller set, which is $N$, the BBS score will go to one. For a more rigorous proof of this claim see the Appendix.
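This behavior is easy to reproduce numerically. The following sketch (the distributions, sizes, and trial count are arbitrary choices for illustration) draws both sets from the same 2D Gaussian and shows the mean BBS score rising as $|Q|$ grows while $|P|$ stays fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

def bbs(P, Q):
    D = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    nn_p, nn_q = D.argmin(1), D.argmin(0)
    return sum(1 for i, j in enumerate(nn_p) if nn_q[j] == i) / min(len(P), len(Q))

def mean_bbs(N, M, trials=20):
    # Both sets drawn i.i.d. from the same distribution; only the sizes differ.
    return np.mean([bbs(rng.normal(size=(N, 2)), rng.normal(size=(M, 2)))
                    for _ in range(trials)])

# |P| = 50 throughout; the score inflates toward 1 as |Q| grows.
print(mean_bbs(50, 50), mean_bbs(50, 500), mean_bbs(50, 5000))
```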

3.3 Making BBS Work With Uneven Sets

Uneven point set sizes bias the BBS score, and we propose two solutions to this problem. Key to both solutions is the observation that if the sets were somehow made equal in size, the BBS computation would be unbiased.

Our first solution is clustering. Specifically, cluster the larger set down to the size of the smaller set. This makes the sets equal in size but poses several problems. First, BBS is now computed relative to cluster centers, which are not actual data points but some form of averaged information. Second, clustering may, in some cases, change the underlying distribution of the data. Third, BBS disregards cluster weights, which hold information regarding the underlying distribution. Finally, clustering adds computational load.

Figure 1: 2D point sets. An example of the point sets used in our synthetic experiment. Points are drawn from GMMs that share the same "Foreground" Gaussian (blue points) but have different "Background" Gaussians (red/green points). (a) Set $P$. (b) Set $Q$ of the same size as $P$. (c) A larger set $Q$. The BBS score between the even sets is shown in (b); BBS between the uneven sets, with and without sampling, is shown in (c). See text for more details.

These problems bring us to our second solution: random sampling instead of clustering. In this case we uniformly sample points from both sets. Similar to Monte Carlo or particle filtering approaches, more points will be sampled from dense areas, where there is a higher probability of finding points. This solution alleviates the problems associated with clustering. Specifically, the sampled points correctly reflect the true underlying distribution of the data, removing the cluster-weight problem. Unlike clustering, the sampled points are actual data points and not averaged cluster centers. No weights are needed, and the only pre-processing is the sampling itself.
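In code, the fix is a small preprocessing step before the BBS computation. Here is a hedged sketch; the uniform-sampling budget `n` and the seed are free parameters chosen for illustration:

```python
import numpy as np

def sample_points(S, n, rng):
    """Uniformly sample min(n, |S|) points without replacement."""
    return S[rng.choice(len(S), size=min(n, len(S)), replace=False)]

def bbs_sampled(P, Q, n=300, seed=0):
    """BBS over equal-size random subsets of P and Q, avoiding the bias
    caused by uneven set sizes (and bounding the computation cost)."""
    rng = np.random.default_rng(seed)
    n = min(n, len(P), len(Q))
    Ps, Qs = sample_points(P, n, rng), sample_points(Q, n, rng)
    D = ((Ps[:, None, :] - Qs[None, :, :]) ** 2).sum(-1)
    nn_p, nn_q = D.argmin(1), D.argmin(0)
    return sum(1 for i, j in enumerate(nn_p) if nn_q[j] == i) / n

rng = np.random.default_rng(1)
P, Q = rng.normal(size=(50, 2)), rng.normal(size=(5000, 2))
score = bbs_sampled(P, Q)   # both sets are reduced to 50 points here
```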

Finally, we note that a nice side product of the proposed techniques is the fact that they reduce the problem size, and are therefore expected to accelerate BBS computation.

Synthetic experiment:  The following synthetic experiment will illustrate the problem of computing BBS with uneven sets and demonstrate the effectiveness of our proposed solutions.

Figure 2: BBS score for different set size ratios: The number of points in $P$ is fixed while the size of $Q$ is increased. Increasing $|Q|$ makes it more likely for points in $P$ to find a BBP, resulting in higher BBS scores (magenta). Using clustering (red) or random sampling (green) alleviates the score bias. BBS is invariant to set size when both sets are grown equally (blue). See text for more details.

Point sets $P$ and $Q$ consist of 2D points drawn from underlying Gaussian mixture models (GMMs). Each GMM consists of two Gaussians: one is considered the "Foreground" Gaussian and the other the "Background" Gaussian. The "Foreground" Gaussian is the same for both GMMs, while the "Background" one is different. An example of point sets drawn from these distributions is shown in Figure 1.

Ideally, we want the BBS score of point sets $P$ and $Q$ to be invariant to the size of the point sets. Figure 1 reveals that this is not the case: the BBS score of $P$ and $Q$ is higher when the size of $Q$ grows.

Figure 2 shows the effect of our proposed solutions. Computing BBS using all points in $Q$ (magenta curve) results in an increasing BBS score as $|Q|$ increases. If both sets are increased equally, the BBS score remains constant (blue curve). Using clustering (red curve) or random sampling (green curve) eliminates the BBS bias and provides scores very close to this baseline. Note how random sampling provides a better approximation to the correct BBS score than clustering.

Using either clustering or random sampling reduces the number of points used for the BBS computation and thereby accelerates it. Figure 3 shows the average processing time of BBS as measured in our synthetic experiment. Note how random sampling (green) results in constant-time processing, a desirable property for many applications and specifically for tracking. Unfortunately, this is not the case for clustering (red), which takes more time to compute as the size of $Q$ increases.

Figure 3: BBS computation time: Computation time increases when both sets are made larger (blue). Using clustering (red) lowers the overall runtime but does not yield constant processing time, due to the increased clustering compute time. Random sampling (green), on the other hand, runs in constant time.

4 Best Buddies Tracker

In [8] BBS was successfully applied to template matching. It was shown to be robust to outliers and able to account for template deformations. These properties make BBS appealing as a similarity measure for visual tracking. Moreover, the data on which BBS was successfully evaluated in [8] consisted of pairs of frames with a wide temporal baseline taken from a tracking benchmark (although it was not evaluated for tracking in that work).

In the original work, BBS was exhaustively applied to a query image using a sliding window. Despite the efficient computation scheme proposed by the authors, this is still a computationally demanding process. Additionally, a sliding window does not account for scale changes, which are an inherent part of visual tracking.

Due to these limitations, the proposed Best Buddies Tracker (BBT) uses a particle filtering framework. Each particle state represents a bounding box in the image, so we can handle both scale and position changes while avoiding exhaustive computation. Our particle filtering framework closely follows the CONDENSATION algorithm [16], and uses BBS to infer the observation likelihood of the particles.
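The particle filter loop itself is standard. A minimal CONDENSATION-style sketch follows; the (x, y, w, h) state layout, the noise magnitudes, and the toy likelihood standing in for the BBS score are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def condensation_step(particles, weights, likelihood,
                      pos_sigma=5.0, scale_sigma=0.03):
    """One CONDENSATION-style update: resample states by weight, diffuse
    them, and re-weight with the observation likelihood (BBS in BBT)."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights / weights.sum())  # resample by weight
    particles = particles[idx].copy()
    particles[:, :2] += rng.normal(0, pos_sigma, size=(n, 2))            # position walk
    particles[:, 2:] *= np.exp(rng.normal(0, scale_sigma, size=(n, 2)))  # scale noise
    weights = np.array([likelihood(s) for s in particles])
    return particles, weights / weights.sum()

# Toy observation model peaked at x = 100 stands in for the BBS score.
like = lambda s: np.exp(-((s[0] - 100.0) ** 2) / 50.0)
parts = np.tile([80.0, 80.0, 32.0, 32.0], (200, 1))
w = np.ones(200)
for _ in range(10):
    parts, w = condensation_step(parts, w, like)
map_state = parts[w.argmax()]   # MAP estimate of the target state
```

With the selection pressure of the toy likelihood, the particle cloud drifts toward x = 100 within a few iterations.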

Computing BBS requires building a full distance matrix between all pairs of points in the template and candidate point sets. To this end, we use the same distance measure as in the original BBS work. Specifically,

$$d(p_i, q_j) = \|p_i^{(A)} - q_j^{(A)}\|_2^2 + \lambda \|p_i^{(L)} - q_j^{(L)}\|_2^2,$$

where superscript $(A)$ denotes a point's appearance descriptor, which in our case holds the color channel values, and superscript $(L)$ denotes a point's location. We set $\lambda$ as in the original BBS work.
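Vectorized, this distance matrix can be computed in one shot. In the sketch below the last two coordinates of each point are its location and the rest its appearance; the default `lam=2.0` follows the original BBS work and is an assumed value here:

```python
import numpy as np

def bbs_distance_matrix(P, Q, lam=2.0):
    """D[i, j] = ||p_A - q_A||^2 + lam * ||p_L - q_L||^2, splitting each
    point into appearance (all but last two dims) and location (last two)."""
    PA, PL = P[:, :-2], P[:, -2:]
    QA, QL = Q[:, :-2], Q[:, -2:]
    DA = ((PA[:, None, :] - QA[None, :, :]) ** 2).sum(-1)  # appearance term
    DL = ((PL[:, None, :] - QL[None, :, :]) ** 2).sum(-1)  # location term
    return DA + lam * DL

P = np.array([[1.0, 0.0, 0.0, 0.0]])   # appearance (1, 0), location (0, 0)
Q = np.array([[0.0, 0.0, 1.0, 0.0]])   # appearance (0, 0), location (1, 0)
print(bbs_distance_matrix(P, Q))  # [[3.]]  (1 + 2 * 1)
```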

We deploy the random sampling approach presented in Section 3.3 in order to avoid the BBS bias due to uneven set sizes. We expect this to also accelerate BBS computations.

In order to better handle object appearance changes over time we use multiple templates, as will be explained next. In addition, we add a forward-backward module to the particle filter. This module verifies that tracking from the current frame to some previous reference frame indeed lands at the position of the object at that frame. The output of the forward-backward module is a confidence score that is then used to determine whether to update the template set or not. This technique fits nicely with the BBS spirit that advocates the use of best buddies both in space and in time.

Finally, in order to better handle scale changes and support both small scale changes as well as large abrupt changes we use an ensemble of several BBT trackers each with a different scale parameter setting.

Detailed information on all the tracker components is provided next.

Using Multiple Templates  In visual tracking, objects may undergo a wide range of deformations as a result of in/out-of-plane rotation, illumination changes, occlusions, articulation and more. In such cases, using a static appearance of the object, e.g. the template from the first frame, will almost surely lead to drift. One of the common approaches for handling object appearance change over time is using multiple templates. In our case, BBT holds a template buffer comprising templates chosen based on our forward-backward confidence score, as will be explained next. For each input frame, a subset of evenly spaced templates is taken from the buffer and used for the BBS computation as follows.

Each of these templates is resized to the average particle size. This ensures that the spatial information embedded in the feature space can be correctly leveraged. All points from all templates are then embedded in our spatial-appearance space. This essentially generates a "bag" of weakly localized feature points. Given some candidate window, we embed the points from that candidate in the same spatial-appearance space. To ensure an unbiased BBS computation, we randomly sample points from both the candidate and template point sets and only then compute the BBS. This entire process is illustrated in Figure 4.
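The pooling-then-sampling step can be sketched as below. The patch size, sampling budget, and random seed are illustrative assumptions; `n_sample=300` mirrors the sampling budget reported in Section 5:

```python
import numpy as np

def to_points(region, k=3):
    # Minimal patch-to-point conversion (see Section 3.1); k=3 assumed.
    H, W, C = region.shape
    pts = [np.concatenate([region[y:y + k, x:x + k].reshape(-1),
                           [(x + k / 2) / W, (y + k / 2) / H]])
           for y in range(0, H - k + 1, k)
           for x in range(0, W - k + 1, k)]
    return np.array(pts)

def bag_of_points(templates, n_sample=300, rng=None):
    """Pool points from multiple (already resized) templates into one 'bag',
    then sample at most n_sample points so the BBS computation stays unbiased."""
    rng = np.random.default_rng(0) if rng is None else rng
    bag = np.vstack([to_points(t) for t in templates])
    if len(bag) > n_sample:
        bag = bag[rng.choice(len(bag), n_sample, replace=False)]
    return bag

templates = [np.random.rand(12, 12, 3) for _ in range(5)]  # 5 resized templates
bag = bag_of_points(templates)   # 5 * 16 = 80 points of dimension 29
```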

Figure 4: BBS with Multiple Templates: Multiple templates are taken from the template buffer (1), resized (2), and converted into one big "bag-of-points" representing the object's appearance (3). The candidate window is converted to a point set in the same manner. Both sets are then randomly sampled (4) to ensure the BBS score is not biased. Next, a distance matrix is built (5) and BBS is computed (6). Since BBS is a statistical measure, using a "bag-of-points" provides a better proxy for the underlying appearance of the object. See text for more details.

We note that using this approach means a candidate point can find a BBP in any one of the templates used, allowing BBT to account for global deformations as well as local deformations affecting only specific regions of the object. The reason this "bag-of-points" representation makes sense is that BBS is a statistical property of the data: it accounts for the probability that points were drawn from the same underlying appearance model, rather than treating points as actual physical correspondences. By putting points from multiple templates in our "bag" we effectively obtain a better non-parametric representation of the object's underlying appearance.

Forward-Backward Consistency  Inspired by [18], we use a forward-backward consistency check that produces a confidence score which estimates how well the tracker is locked on. This process is summarized in Algorithm 1.

We begin by measuring the BBS score between the target in the current frame and candidates in a reference frame, which is some previous frame at which confidence was high. Candidates are taken on a grid around the target position at that frame.

The confidence score is taken as the intersection over union between the state of the highest scoring candidate and the stored state at the reference frame.

Input:
- Target appearance in current frame
- Reference frame
- State at reference frame
Output: Confidence score
1 Convert target to point set
2 for states on grid around reference state do
3        Convert candidate to point set
4        Compute BBS score
5 end for
6 Confidence = IOU between highest scoring state and reference state
Algorithm 1 Forward-Backward Consistency Check
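The confidence score in the last step is a plain intersection-over-union. For boxes given as (x, y, w, h), a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union between two (x, y, w, h) boxes. Used as the
    forward-backward confidence: IOU of the best backward-tracked candidate
    with the stored reference state."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```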

Updating Template and Reference Frame  A new target template is added to the template buffer only if we were able to continue tracking from that frame with high confidence (confidence above a threshold $\theta_c$) for at least $\Delta$ consecutive frames, and no other template was added in the last $\Delta$ frames. That is, the template cropped at frame $t$ will be added to the template buffer at time $t + \Delta$, only if,

$$c_k \geq \theta_c, \quad \forall k \in \{t, t+1, \ldots, t+\Delta\}, \quad (4)$$

and no other template was buffered in the last $\Delta$ frames.
Templates are buffered in a first-in-first-out (FIFO) manner, with the only exception being the initial template, which is never removed. For computational reasons, only equally spaced templates are used for the actual template matching at each frame.
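The buffer logic can be sketched as follows; the capacity and the number of templates returned per frame are assumed parameters (the paper fixes their values in the experiments section), and the pinned-first-template behavior mirrors the FIFO exception described above:

```python
import numpy as np
from collections import deque

class TemplateBuffer:
    """FIFO template buffer; the initial template is pinned and never evicted."""
    def __init__(self, first_template, capacity=10):
        self.first = first_template              # pinned, never removed
        self.rest = deque(maxlen=capacity - 1)   # FIFO eviction for the rest

    def add(self, template):
        self.rest.append(template)               # oldest drops out when full

    def evenly_spaced(self, k):
        """Initial template plus up to k-1 evenly spaced buffered templates."""
        pool = list(self.rest)
        if not pool:
            return [self.first]
        n = min(k - 1, len(pool))
        idx = np.linspace(0, len(pool) - 1, n).round().astype(int)
        return [self.first] + [pool[i] for i in idx]
```

For example, with capacity 4, adding templates t1..t5 after the pinned t0 keeps t3, t4, t5, and requesting 3 templates returns t0, t3 and t5.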

Choosing reference frames is done in a process similar to updating templates. That is, frame $t$ will be considered a reference frame only at time $t + \Delta$ and only if,

$$c_k \geq \theta_c, \quad \forall k \in \{t, t+1, \ldots, t+\Delta\}. \quad (5)$$
The entire tracking flow for each incoming frame is summarized in Algorithm 2.

Input: New frame
Output:
- Target state for new frame
- Updated template buffer and reference frame
1 Take a set of evenly spaced templates from the buffer
2 Convert the templates to a point set
3 for every particle do // Forward pass
4        Convert candidate window to point set
5        Compute BBS score // Random sampling!
6 end for
7 Normalize particle weights
8 Take MAP state
9 Crop target template
10 Perform backward pass according to Alg. 1
11 Check if template can be updated (Eq. 4)
12 Check if reference frame can be updated (Eq. 5)
13 Draw new particles according to [16]
Algorithm 2 Best-Buddies Tracker

BBT Tracker Ensemble:  Setting the correct scale factor is critical when using particle filtering. A low scale factor will make the tracker conservative and more stable. However, it will not be able to cope with large and abrupt scale changes. A large scale factor on the other hand, can handle large scale changes, but is harder to control making the tracker less stable.

To cope with this problem we use an ensemble of several trackers with different scale settings. Each tracker is independent and does not exchange data with the other trackers. Tracker predictions are fused using the Online Trajectory Optimization method of Bailer et al. [3].

The final BBT ensemble tracker is summarized in Algorithm 3.

Input: New frame
Output:
- Final target state
- Updated tracker states
1 for every tracker in the ensemble do
2        Perform tracking according to Alg. 2
3        Keep the tracker's predicted state
4 end for
5 Fuse the tracker predictions using Online Trajectory Optimization [3] to obtain the final state
Algorithm 3 Best-Buddies Tracker Ensemble

5 Experimental Results

We evaluate the performance of BBT on three commonly used datasets: (i) Object Tracking Benchmark 50, OTB-50 [34], containing 50 sequences; (ii) Object Tracking Benchmark 100, OTB-100 [35], containing 100 sequences (the 50 from OTB-50 and 50 additional sequences); and (iii) the Princeton Tracking Benchmark, PTB [31], which contains 95 sequences (this dataset also provides depth data; however, we evaluate using RGB only).

The performance of BBT is compared to other recently published tracking algorithms. Specifically, performance on OTB-50 and OTB-100 is compared with the following trackers: HCF [25], MEEM [36], DSST [7], KCF [14], Struck [11], TGPR [9], SCM [41], STC [37], PCOM [32]. The performance on PTB is compared with 3D-T [4], ASKCF [6], TGPR [9], KCF [14], Struck [11], VTD [20], RGB [31], MIL [2], TLD [19], CT [38].

Performance is measured according to the OPE protocol [34] and is based on the intersection over union (IOU) criterion, which quantifies both position and scale accuracy. A success curve, measuring tracker success for overlap thresholds ranging from 0 to 1, is built by averaging over all the sequences in each dataset. The mean average precision (mAP) is taken to be the area under the curve (AUC) of the final overall success curve.
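For reference, this success-curve/AUC computation reduces to a few lines; the threshold grid density is an implementation choice, and the AUC is approximated by the mean over that grid:

```python
import numpy as np

def success_auc(ious, thresholds=np.linspace(0, 1, 101)):
    """OPE summary score: for each overlap threshold, the fraction of frames
    whose IOU exceeds it; the reported number is the area under this curve
    (approximated here by the mean over a uniform threshold grid)."""
    ious = np.asarray(ious, dtype=float)
    success = np.array([(ious > t).mean() for t in thresholds])
    return float(success.mean())

# Per-frame IOUs for a hypothetical 4-frame sequence:
auc = success_auc([0.9, 0.8, 0.5, 0.0])   # roughly the mean per-frame IOU
```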

All our experiments are conducted with fixed parameters. We use an ensemble of 4 trackers, each with a different scale parameter. All other parameters are fixed across trackers, including the template buffer size, the number of templates used for BBS computations, and the template and reference frame update parameters. We use 200 particles in each tracker and sample up to 300 points in the random sampling process described in Section 3.3. Using this configuration, our unoptimized Matlab code runs at around 1 fps.

Results for OTB-50 and OTB-100 are presented in Figures 5 and 6, respectively. In OTB-50 (Fig. 5), BBT comes in third place among the trackers evaluated, with an AUC of 0.535. It is able to outperform trackers such as KCF, DSST and STC, but is outperformed by HCF and MEEM. In OTB-100 (Fig. 6) the performance of all trackers decreases. Overall, BBT remains in third place, with similar margins relative to the competing trackers.

Figure 5: Success plot for OTB-50 [34]. AUC shown in legend. BBT, the proposed method, shown in green. Best viewed in color.
Figure 6: Success plot for OTB-100 [35]. AUC shown in legend. BBT, the proposed method, shown in green. Best viewed in color.

Results for the PTB are summarized in Table 1. As can be seen, BBT comes in second place after 3D-T. Again, BBT is able to outperform TGPR, KCF and other recently published methods. Overall, its performance across categories is consistent with the other tracking methods, producing better results in the easier scenarios, e.g., rigid objects and slow motion.

                         target type              target size      movement         occlusion        motion type
Method       Avg. Rank   human   animal  rigid    large   small    slow    fast     yes     no       passive active
3D-T [4]     1.09        0.81(1) 0.64(1) 0.73(1)  0.80(1) 0.71(1)  0.75(2) 0.75(1)  0.73(1) 0.78(1)  0.79(1) 0.73(1)
BBT (ours)   2.45        0.52(3) 0.55(2) 0.69(3)  0.58(3) 0.60(2)  0.77(1) 0.52(3)  0.52(3) 0.69(2)  0.70(3) 0.55(2)
ASKCF [6]    2.64        0.52(2) 0.50(4) 0.72(2)  0.59(2) 0.59(3)  0.67(3) 0.56(2)  0.52(2) 0.68(4)  0.72(2) 0.54(3)
TGPR* [9]    N/A         0.46    0.49    0.67     0.56    0.53     0.66    0.50     0.44    0.69     0.67    0.50
KCF [14]     3.82        0.42(4) 0.50(3) 0.65(4)  0.48(4) 0.55(4)  0.65(4) 0.47(4)  0.41(4) 0.68(3)  0.65(4) 0.47(4)
Struck [11]  5.91        0.35(5) 0.47(7) 0.53(7)  0.45(5) 0.44(7)  0.58(5) 0.39(5)  0.30(7) 0.64(5)  0.54(7) 0.41(5)
VTD [20]     6.09        0.31(7) 0.49(5) 0.54(6)  0.39(6) 0.46(5)  0.57(6) 0.37(6)  0.28(8) 0.63(6)  0.55(6) 0.38(6)
RGB [31]     7.27        0.27(10) 0.41(8) 0.55(5) 0.32(10) 0.46(6) 0.51(8) 0.36(7)  0.35(5) 0.47(9)  0.56(5) 0.34(7)
MIL [2]      8.64        0.32(6) 0.37(9) 0.38(9)  0.37(8) 0.35(9)  0.46(10) 0.31(8) 0.26(9) 0.49(8)  0.40(11) 0.34(8)
TLD [19]     8.64        0.29(9) 0.35(10) 0.44(8) 0.32(9) 0.38(8)  0.52(7) 0.30(10) 0.34(6) 0.39(10) 0.50(8) 0.31(10)
CT [38]      8.73        0.31(8) 0.47(6) 0.37(10) 0.39(7) 0.34(10) 0.49(9) 0.31(9)  0.23(11) 0.54(7) 0.42(10) 0.34(9)
* TGPR results were not formally submitted and are only available as raw data; therefore ranks are not provided.
Table 1: Tracking results for the Princeton Tracking Benchmark [31]. Success rates and rankings (in parentheses) for different categories. BBT, the proposed method, is in overall second place.

We note that the fusion step is a limiting factor on performance. For example, analyzing our performance on OTB-50 reveals that if we were able to choose the optimal tracker per sequence (i.e., the correct scale factor) we could reach a mAP of 0.571. Furthermore, choosing the best tracker per frame would result in a mAP of 0.641. In light of these findings, as part of our future research, we plan to search for a better fusion technique delivering better overall performance.

6 Conclusions

The Best-Buddies Similarity between point sets has been successfully applied to template matching, showcasing an ability to handle non-rigid deformations and automatically reject outliers, making it attractive for visual tracking.

Applying BBS to tracking requires it to handle point sets of arbitrary size in order to cope with things such as scale changes and the use of multiple templates. In this work we found BBS to be biased when computed between point sets of different sizes. A theoretical as well as empirical study of this problem led us to two effective solutions: clustering and random sampling. We found random sampling favorable and more accurate, as it requires no preprocessing, has no associated weights, does not alter the underlying distribution of the data, and can be computed in constant time.

Using random sampling we were able to successfully apply BBS to visual tracking. This was done by integrating BBS into a particle filtering framework. By augmenting data from multiple templates we extended BBS to handle the temporally varying appearance of objects being tracked, and an ensemble of BBT trackers was used to ensure good scale estimation.

Extensive experiments were performed using three commonly used tracking benchmarks. BBT demonstrated good initial performance, competitive with respect to other recently published tracking algorithms.

One of the main limiting factors on performance was found to be the fusion technique used. In light of this, our future research is aimed at finding a better fusion strategy that can lead to better performance.

Appendix - Proof of Claim from Section 3.2

Let the minimal distance between and any other point in be,


Where we use as a shorthand notation for which is the distance between points and . We note that since we are dealing with continuous distributions and since is finite then and therefore .

By construction, is the nearest neighbor of any point such that in other words, if then . If there is exactly one such point then and then . It is easy to see that, if there is more than one point that satisfies . Then, since is the nearest neighbor to all of these points, one of them will be the nearest neighbor of in and again .

This means that if there is at least one point such that then will have a BBP. Formally, we want to show that,


Similarly we can check what is the probability that no point in is -close to ,


The probability that a point was randomly drawn -close to , requires integrating over a -hypersphere around . We denote this integration region as , and then we have


Note this is a multivariate distribution, and we are integrating over all the dimensions.

Solving this integral for some arbitrary distribution can be very difficult. Fortunately, we are only interested in bounding it. Specifically, since is a smooth distribution function and since then,


The probability that a point $q$ is not $\epsilon_i/2$-close to $p_i$, i.e. $d(p_i, q) \geq \epsilon_i/2$, is the complement of the probability in equation (9). Since the points are i.i.d., we can factor over all the points and obtain the probability that all points in $Q$ are not $\epsilon_i/2$-close to $p_i$:

$Pr\big(\forall q \in Q : d(p_i, q) \geq \epsilon_i/2\big) = (1 - \delta_i)^{|Q|} \qquad (11)$
Using the bound in (10), the base of this power is smaller than one, and since the exponent $|Q| \to \infty$, we have

$\lim_{|Q| \to \infty} (1 - \delta_i)^{|Q|} = 0 \qquad (12)$
which means the limit in (7) holds, and thus

$\lim_{|Q| \to \infty} Pr\big(p_i \text{ has a BBP}\big) = 1 \qquad (13)$

That is, every point in $P$ finds a BBP, and since the BBS score is normalized by the size of $P$, which is the smaller set, the score goes to 1, and we are done.
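The convergence argument can also be illustrated numerically (a minimal sketch, not from the paper): sampling both point sets i.i.d. from a uniform distribution over the unit square, the BBS score of the fixed smaller set P approaches 1 as the size of Q grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def bbs(P, Q):
    """BBS: fraction of mutual nearest-neighbor pairs, normalized by the smaller set."""
    D = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    nn_p, nn_q = D.argmin(axis=1), D.argmin(axis=0)
    bbp = sum(1 for i, j in enumerate(nn_p) if nn_q[j] == i)
    return bbp / min(len(P), len(Q))

# P is the small, finite set; as |Q| grows, every p_i almost surely gets
# a point of Q within epsilon_i / 2, so the BBS score should approach 1.
P = rng.uniform(size=(20, 2))
scores = [bbs(P, rng.uniform(size=(m, 2))) for m in (20, 200, 2000, 20000)]
```

With enough samples in Q, each ball $B(p_i, \epsilon_i/2)$ receives a point with probability approaching 1, mirroring the bound in (11)-(12).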

