1 Introduction
Who is thy neighbor? The question is universal and as old as the Bible. In computer vision, images are typically converted into high-dimensional vectors known as image descriptors.
Neighborness of images is defined through distances between their respective descriptors. This approach has had mixed success. Descriptors excel at nearest-neighbor retrieval applications; however, descriptor distances are rarely effective in other neighbor-based machine learning tasks like clustering.

[Figure 1: Distribution-clusters from a subset of Flickr11k [43]; affinity matrix before and after clustering.]
Conventional wisdom attributes poor clustering performance to two intrinsic factors. a) Images are the product of a complex interplay of geometric, illumination and occlusion factors. These are seldom constant, causing even images of the same location to vary significantly from one another. This extreme variability makes clustering difficult. Mathematically, each image can be understood as an instance of some distribution, and the variability causes image datasets to be chaotic. Here, chaotic is defined as having distributions whose mean separation is significantly smaller than their standard deviation. Clustering chaotic data is ill-posed because data points of different distributions mingle. This can be alleviated by enhancing invariance with higher-dimensional image descriptors; however, that leads to a second problem. b) As dimensions increase, "contrast-loss"
[3, 10, 17] occurs: distances between points tend to a constant, and traditional clustering metrics become ill-defined [3, 5]. This is considered part of the curse of dimensionality
[17, 1]. We offer a different perspective in which "contrast-loss" is not a problem but the solution to clustering chaotic data. The core idea is simple: what was previously interpreted as "contrast-loss" is actually the law of large numbers causing instances of a distribution to concentrate on a thin "hyper-shell". The hollow shells mean data points from apparently overlapping distributions do not actually mingle, making chaotic data intrinsically separable. We encapsulate this constraint in a second-order cost that treats the rows of an affinity matrix as identifiers for instances of the same distribution. We term this distribution-clustering.
Distribution-clustering is fundamentally different from traditional clustering: it can disambiguate chaotic data, self-determine the number of clusters, and is intrinsically robust to "outliers", which form their own clusters. This surprising result provides an elegant solution to a problem previously deemed intractable. Thus, we feel it fair to conclude that "contrast-loss" is a blessing rather than a curse.
1.1 Related Works
To date, there is a wide variety of clustering algorithms [32, 26, 19, 15, 21, 38] customized to various tasks. A comprehensive survey is provided in [9, 42]. Despite the variety, we believe distribution-clustering is the first to utilize the peculiarities of high-dimensional space. The result is a fundamentally different clustering algorithm.
This work is also part of ongoing research into the properties of high-dimensional space. Pioneering research began with Beyer et al.'s [10] discovery of "contrast-loss". This was interpreted as an intrinsic hindrance to clustering and machine learning [3, 10, 17], motivating the development of subspace clustering [35, 18], projective-clustering [2, 27], and other techniques [44, 28, 20] for alleviating "contrast-loss". This simplistic view has begun to change, with recent papers observing that "contrast-loss" can be beneficial in detecting outliers [37, 47], detecting cluster centroids [40] and scoring clusters [40]. These results indicate a gap in our knowledge, but a high-level synthesis is still lacking.
While we do not agree with Aggarwal et al.'s [3] interpretation of "contrast-loss", we are inspired by their attempts to develop a general intuition about the behavior of algorithms in high-dimensional space. This motivates us to analyze the problem from both intuitive and mathematical perspectives. We hope it contributes to the general understanding of high-dimensional space.
Within the larger context of artificial intelligence research, our work can be considered research on similarity functions, surveyed by Cha [13]. Unlike most other similarity functions, ours is statistical in nature, relying on the extreme improbability of events to achieve separability. Such statistical similarity has been used both explicitly [11] and implicitly [30, 14, 36, 46, 39] in matching and retrieval. Many of these problems may be reformulated in terms of the law of large numbers, "contrast-loss" and high-dimensional features. This is a fascinating and as yet unaddressed question.

Finally, distribution-clustering builds on decades of research on image descriptors [33, 25, 6] and normalization [24, 7]. These works reduce variation, making the law of large numbers more impactful at lower dimensions. As distribution-clustering is based on the law of large numbers, its performance is correspondingly enhanced.
2 Visualizing High Dimensions
Our intuition about space was formed in two and three dimensions and is often misleading in high dimensions. In fact, it can be argued that the "contrast-loss" curse ultimately derives from misleading visualization. This section aims to correct that.
At low dimensions, our intuition is that solids with similar parameters have significant volumetric overlap. This is not true in high dimensions.
Consider two high-dimensional hyperspheres that are identical except for a small difference in radius, $r$ versus $r + \epsilon$. Since the volume of a $d$-dimensional hypersphere scales as its radius raised to the $d$-th power, their volume ratio is

$$\frac{V(r)}{V(r+\epsilon)} = \left(\frac{r}{r+\epsilon}\right)^{d}, \qquad (1)$$

which tends to zero as the number of dimensions $d \to \infty$. This implies almost all of a sphere's volume is concentrated at its surface. Thus, small changes in either radius or centroid cause apparently overlapping spheres to have near-zero intersecting volume, as illustrated in Fig. 2, i.e., they become volumetrically separable! A more rigorous proof can be found in [23].
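As a sanity check, the collapse of this volume ratio is easy to reproduce numerically (a minimal sketch; the radius and epsilon values are illustrative):

```python
# The volume of a d-dimensional hypersphere scales as r**d, so the ratio of
# two spheres whose radii differ by a small epsilon collapses as d grows.
def volume_ratio(r: float, eps: float, d: int) -> float:
    return (r / (r + eps)) ** d

for d in (2, 10, 100, 1000):
    print(d, volume_ratio(1.0, 0.01, d))
```

Even a 1% radius difference drives the ratio below 1e-4 by d = 1000, i.e., essentially all the volume sits in the outer shell.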
Intriguingly, instances of a distribution behave similarly to a hypersphere's volume. Section 3.3 shows that when distributions have many independent dimensions, their instances concentrate on thin "hyper-shells". Thus, instances of apparently overlapping distributions almost never mingle. This makes clustering chaotic data by distribution a well-posed problem, as illustrated in Fig. 3.
3 Distribution-Clustering (theory)
Images are often represented as high-dimensional feature vectors, such as the NetVLAD [6] descriptor. This section shows how we can create indicators to group images based on their generative distributions.
Definition 1.

$[a\!:\!b]$ denotes the set of consecutive positive integers from $a$ to $b$;

$\overline{\|\cdot\|^2}$ denotes a normalized squared norm operator, i.e., for $\mathbf{v} \in \mathbb{R}^d$, $\overline{\|\mathbf{v}\|^2} = \frac{1}{d}\sum_{i=1}^{d} v_i^2$;

$\mathbf{X} = [X_1, \ldots, X_d]^T$ denotes a $d$-dimensional random vector, where each $X_i$ is a random variable;

the $\overline{\|\cdot\|^2}$ operator can also be applied to a random vector: $\overline{\|\mathbf{X}\|^2}$ is the random variable formed by averaging $\mathbf{X}$'s squared elements;

$\boldsymbol{\mu} = E[\mathbf{X}]$ is a vector of each dimension's expectation;

with a slight abuse of notation, we define $\overline{\mathrm{var}}(\mathbf{X}) = \frac{1}{d}\sum_{i=1}^{d} \mathrm{var}(X_i)$ as the average variance over all dimensions.
3.1 Passive Sensing Model
Many data sources (like cameras) can be modeled as passive sensors. Data points (like image descriptors) are instances of random vectors representing environmental factors that influence the sensory outcome, e.g., camera position, time of day and weather conditions. As sensing does not influence the environment, all random vectors are mutually independent. Our goal is to cluster instances by their underlying distributions.
3.2 Quasi-ideal Features
An ideal feature descriptor has statistically independent dimensions. However, this is hard to ensure in practice. A more practical assumption is the quasi-independence of Condition 1.
Condition 1.
Quasi-independent: A set of random variables $\{X_1, X_2, \ldots\}$ is quasi-independent if and only if, as the number of variables tends to infinity, each random variable has a finite number of pairwise dependencies. That is, let $\mathcal{P}$ be the set of all pairwise-dependent variable pairs and $\mathbb{1}[\cdot]$ be an indicator function; there exists a finite $c > 0$ such that

$$\sum_{j} \mathbb{1}\big[(X_i, X_j) \in \mathcal{P}\big] \le c, \quad \forall i.$$
Quasi-independence is approximately equivalent to requiring that information increase proportionally with the number of random variables. When quasi-independent random variables are concatenated into a feature, we term the feature quasi-ideal.
Condition 2.
Quasi-ideal: A $d$-dimensional random vector $\mathbf{X}$ is quasi-ideal if and only if, as $d \to \infty$, the variances of all its elements are finite and the set of all its elements, $\{X_1, \ldots, X_d\}$, is quasi-independent.
Treating the links of an infinitely long Markov chain as feature dimensions would create a quasi-ideal feature. This is useful in computer vision, as pixel values have Markov-like properties: statistical dependence on neighbors but long-range statistical independence. Hence, many image-based descriptors can be modeled as quasi-ideal.
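The Markov-chain intuition can be made concrete with a toy generator: a stationary AR(1) process whose dimensions depend on their neighbors but decorrelate geometrically with distance. The chain length, correlation `rho` and Gaussian noise below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def markov_feature(d, rho=0.5, rng=rng):
    """One instance of a stationary AR(1) chain: each dimension depends on its
    neighbor, but correlation with dimension i-k decays as rho**k, so the set
    of dimensions is approximately quasi-independent."""
    x = np.empty(d)
    x[0] = rng.standard_normal()
    for i in range(1, d):
        x[i] = rho * x[i - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return x

# Adjacent dimensions correlate (~rho); far-apart dimensions are near independent.
samples = np.array([markov_feature(64) for _ in range(3000)])
print(round(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1], 2))   # close to 0.5
print(round(np.corrcoef(samples[:, 0], samples[:, 63])[0, 1], 2))  # close to 0
```

Each dimension carries fresh information despite local dependence, which is the sense in which information grows proportionally with the number of dimensions.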
Practicality aside, quasi-ideal features have useful mathematical properties, as they permit the law of large numbers to apply to distance metrics. This leads to the interesting results summarized in Lemmas 1 and 2.
Lemma 1.
Let $\mathbf{X}$ be a quasi-ideal random vector with dimension $d$. The normalized squared norm of any instance $\mathbf{x}$ is almost surely a constant:

$$\overline{\|\mathbf{x}\|^2} \xrightarrow{a.s.} \overline{\|\boldsymbol{\mu}\|^2} + \overline{\mathrm{var}}(\mathbf{X}), \quad \text{as } d \to \infty. \qquad (2)$$

Proof.
As $\mathbf{X}$ is quasi-ideal, the set of squared elements $\{X_1^2, \ldots, X_d^2\}$ forms a covariance matrix in which the sum of the elements in any row is bounded by some positive real number $c$, i.e.,

$$\sum_{j=1}^{d} \left|\mathrm{cov}(X_i^2, X_j^2)\right| \le c, \quad \forall i \in [1\!:\!d]. \qquad (3)$$

This implies

$$\mathrm{var}\!\left(\overline{\|\mathbf{X}\|^2}\right) = \frac{1}{d^2}\sum_{i=1}^{d}\sum_{j=1}^{d} \mathrm{cov}(X_i^2, X_j^2) \le \frac{c}{d}. \qquad (4)$$

Thus, as $d \to \infty$, the variance tends to zero. ∎
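Lemma 1 is easy to see numerically in the fully independent special case of a quasi-ideal vector. Here each dimension is i.i.d. Gaussian with illustrative per-dimension mean 0.5 and variance 1, so the normalized squared norm should concentrate on 0.25 + 1.0 = 1.25 as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, var = 0.5, 1.0   # per-dimension mean and variance (illustrative)

stds = []
for d in (10, 100, 10000):
    X = rng.normal(mu, np.sqrt(var), size=(500, d))   # 500 instances of dimension d
    n = (X ** 2).mean(axis=1)                         # normalized squared norm of each instance
    stds.append(n.std())
    print(d, round(n.mean(), 3), round(n.std(), 3))   # mean near 1.25, spread shrinking
```

The spread across instances shrinks roughly as 1/sqrt(d): the instances collapse onto a thin shell of squared radius 1.25 per dimension.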
Lemma 2.
Let $\mathbf{X}_a$ and $\mathbf{X}_b$ be statistically independent, quasi-ideal random vectors. As the dimension $d \to \infty$,

$$\overline{\|\mathbf{x}_a - \mathbf{x}_b\|^2} \xrightarrow{a.s.} k_{ab}, \qquad (5)$$

where $k_{ab} = \overline{\|\boldsymbol{\mu}_a - \boldsymbol{\mu}_b\|^2} + \overline{\mathrm{var}}(\mathbf{X}_a) + \overline{\mathrm{var}}(\mathbf{X}_b)$.

Proof.
As $\mathbf{X}_a$ and $\mathbf{X}_b$ are quasi-ideal, the random vector $\mathbf{X}_a - \mathbf{X}_b$ is also quasi-ideal. Using Lemma 1, we know that the variance of $\overline{\|\mathbf{X}_a - \mathbf{X}_b\|^2}$ tends to zero. By independence, its mean is:

$$E\!\left[\overline{\|\mathbf{X}_a - \mathbf{X}_b\|^2}\right] = \overline{\|\boldsymbol{\mu}_a - \boldsymbol{\mu}_b\|^2} + \overline{\mathrm{var}}(\mathbf{X}_a) + \overline{\mathrm{var}}(\mathbf{X}_b).$$
∎
Lemma 2 is similar in spirit to Beyer et al.'s [10] "contrast-loss" proof. However, it accommodates realizations from different distributions, introduces a more practical quasi-independence assumption and is simpler to derive.
Unlike [3], we consider "contrast-loss" an opportunity, not a liability. Lemma 2 proves that the distance between instances almost always depends only on the means and variances of the underlying distributions and not on the instances' values. This makes the distance between instances a potential proxy for identifying their underlying distributions.
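A quick simulation of Lemma 2, again with i.i.d. Gaussian dimensions as an illustrative special case: the normalized squared distance between instances of two different distributions concentrates on the constant predicted by their means and average variances, regardless of which instances are drawn:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10000

# Two distributions with nearly coincident means (illustrative values):
mu_a, mu_b, var_a, var_b = 0.0, 0.1, 1.0, 1.0
xa = rng.normal(mu_a, np.sqrt(var_a), size=(300, d))
xb = rng.normal(mu_b, np.sqrt(var_b), size=(300, d))

# Normalized squared distance between paired instances of the two distributions.
dist = ((xa - xb) ** 2).mean(axis=1)
k_ab = (mu_a - mu_b) ** 2 + var_a + var_b   # mean separation + both average variances
print(round(dist.mean(), 3), round(dist.std(), 3), k_ab)  # mean near k_ab, spread near 0
```

Note that the mean separation (0.1) is tiny compared to the standard deviation (1.0), i.e., the data is chaotic, yet the pairwise distances still lock onto the predicted constant.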
3.3 Distribution-clusters
Identifying data points from "similar" distributions requires a definition of "similarity". Ideally, we would follow Lemma 2's intuition and define "similarity" as having the same mean and average variance. However, the definition needs to accommodate dimensions tending to infinity. This leads to a distribution-cluster-based definition of "similarity".
Let $\mathcal{X} = \{\mathbf{X}_1, \ldots, \mathbf{X}_N\}$ be a set of independent, $d$-dimensional random vectors such that, as $d \to \infty$, the random vectors satisfy the quasi-ideal conditions.
Condition 3.
Distribution-cluster: $\mathcal{X}$ forms a distribution-cluster if and only if:

The normalized squared distance between any two distribution means is zero, i.e.,

$$\overline{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} = 0, \quad \forall i, j \in [1\!:\!N]; \qquad (6)$$

All distributions have the same average variance, i.e.,

$$\overline{\mathrm{var}}(\mathbf{X}_i) = \sigma^2, \quad \forall i \in [1\!:\!N]. \qquad (7)$$
As dimensions tend to infinity, instances of a distribution-cluster concentrate on a "thin shell". This is proved in Theorem 1 and validates Fig. 3's intuition. The "hollow center" means data points from apparently overlapping distributions almost never mingle, creating the potential for clustering chaotic data.
Theorem 1.
If $\mathcal{X}$ is a distribution-cluster with average variance $\sigma^2$, the normalized squared distance of its instances from the cluster centroid will almost surely be $\sigma^2$, i.e., $\mathcal{X}$'s instances form a thin annulus about its centroid.
3.4 Grouping Data by Distribution
We seek to group a set of data points by their underlying distribution-clusters. This is achieved by proving that data points of a distribution-cluster share unique identifiers that we term cluster-indicators.
Theorem 2.
Cluster-indicator: Let $\mathbf{Y}$ be a quasi-ideal random vector that is independent of all of $\mathcal{X}$'s random vectors.
$\mathcal{X}$ forms a distribution-cluster (c.f. Condition 3) if and only if, for any valid random vector $\mathbf{Y}$, there exists a real number $k$ such that

$$\overline{\|\mathbf{x}_i - \mathbf{y}\|^2} \xrightarrow{a.s.} k, \quad \forall i \in [1\!:\!N]. \qquad (9)$$
Proof.
First, the if part is proved. Given that $\mathcal{X}$ is a distribution-cluster, Eq. (9) is a direct result of Lemma 2, where the distance between instances of quasi-ideal distributions is almost surely determined by the distributions' means and average variances.

Moving on to the only if proof, where it is given that $\mathbf{Y}$ satisfies Eq. (9)'s cluster-indicator. W.l.o.g., we consider only the elements $\mathbf{X}_1, \mathbf{X}_2 \in \mathcal{X}$. Let $\mathbf{Y}$ be independent but identically distributed with $\mathbf{X}_1$.

From Lemma 2, we know that

$$\overline{\|\mathbf{x}_1 - \mathbf{y}\|^2} \xrightarrow{a.s.} k_1, \quad \text{where } k_1 = 2\,\overline{\mathrm{var}}(\mathbf{X}_1),$$

$$\overline{\|\mathbf{x}_2 - \mathbf{y}\|^2} \xrightarrow{a.s.} k_2, \quad \text{where } k_2 = \overline{\|\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1\|^2} + \overline{\mathrm{var}}(\mathbf{X}_2) + \overline{\mathrm{var}}(\mathbf{X}_1).$$

Equation (9) means $k_1 = k_2$, implying:

$$\overline{\mathrm{var}}(\mathbf{X}_1) - \overline{\mathrm{var}}(\mathbf{X}_2) = \overline{\|\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1\|^2} \ge 0. \qquad (10)$$

Similarly, treating $\mathbf{Y}$ as independent but identically distributed with $\mathbf{X}_2$ implies

$$\overline{\mathrm{var}}(\mathbf{X}_2) - \overline{\mathrm{var}}(\mathbf{X}_1) = \overline{\|\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\|^2} \ge 0. \qquad (11)$$

Together, Eqs. (10) and (11) force $\overline{\mathrm{var}}(\mathbf{X}_1) = \overline{\mathrm{var}}(\mathbf{X}_2)$ and $\overline{\|\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\|^2} = 0$. This proves that $\mathbf{X}_1, \mathbf{X}_2$ are members of a distribution-cluster (c.f. Condition 3). Repeating the process with all element pairs of $\mathcal{X}$ will show they belong to one distribution-cluster. This completes the only if proof.
∎
As argued in Sec. 3.1, image descriptors can be modeled as instances of independent, quasi-ideal random vectors, i.e., a set of image descriptors can be considered instances of the respective random vectors in $\mathcal{X}$. Theorem 2 implies that descriptors from the same distribution-cluster will (almost surely) be equidistant to any other descriptor. Further, this is a unique property of descriptors from the same distribution-cluster, which allows descriptors to be unambiguously assigned to distribution-clusters. In summary, distribution-clustering of images (and other passive sensing data) is a well-posed problem, per the definition in the McGraw-Hill dictionary of scientific and technical terms [34]:

A solution exists. This follows from Theorem 2's if condition, where cluster-indicators almost surely (in practice, it can be understood as surely) identify all data points of a distribution-cluster;

A solution is unique. This follows from Theorem 2's only if condition, which means cluster-indicators almost never confuse data points of different distributions. This can also be understood as proving the intrinsic separability of instances from different distribution-clusters.
4 Distribution-Clustering (practical)
Our goal is to use Theorem 2's cluster-indicators to group data points by their underlying distributions. From Theorem 2, we know that if data points are instances of the same distribution-cluster, the corresponding rows/columns of the affinity matrix will be near identical. To exploit this, we define second-order features as the columns of the affinity matrix. Clustering is achieved by grouping second-order features.
4.1 Second-order Affinity
Let $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ be a set of realizations, with an associated affinity matrix $A$:

$$A(i, j) = \overline{\|\mathbf{x}_i - \mathbf{x}_j\|^2}. \qquad (12)$$

The columns of $A$ are denoted $\mathbf{f}_1, \ldots, \mathbf{f}_N$. Treating columns as features yields a set of second-order features $\{\mathbf{f}_1, \ldots, \mathbf{f}_N\}$; the elements of $\mathbf{f}_i$ encode the distance between vector $\mathbf{x}_i$ and all others in the set.
From Theorem 2, we know that if and only if the distributions underlying $\mathbf{x}_i, \mathbf{x}_j$ come from the same distribution-cluster, all their elements, except the $i$-th and $j$-th entries, are almost surely identical. This is encapsulated as a second-order distance:

$$d_2(i, j) = \frac{1}{N-2}\sum_{k \ne i, j} \big(A(k, i) - A(k, j)\big)^2, \qquad (13)$$

which should be zero if $\mathbf{x}_i, \mathbf{x}_j$ belong to the same distribution-cluster. The presence of clusters of identical rows causes the post-clustering affinity matrix to display the distinctive blocky pattern shown in Fig. 1.
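The affinity matrix and the second-order distance can be sketched in a few lines. The dataset below, two Gaussian clusters with essentially the same mean but slightly different variances, is chaotic by the earlier definition; the exact masking and normalization in `second_order_distance` follow our reading of the second-order distance and are not verified against a reference implementation:

```python
import numpy as np

def affinity(X):
    """A[i, j] = (1/d) * ||x_i - x_j||^2, the normalized squared distance."""
    sq = (X ** 2).sum(axis=1)
    return (sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / X.shape[1]

def second_order_distance(A, i, j):
    """Distance between columns i and j of A, ignoring their i-th and j-th entries."""
    mask = np.ones(A.shape[0], dtype=bool)
    mask[[i, j]] = False
    return ((A[mask, i] - A[mask, j]) ** 2).mean()

# Two chaotic clusters: means almost coincide, variances differ slightly.
rng = np.random.default_rng(3)
d = 5000
X = np.vstack([rng.normal(0.00, 1.0, size=(10, d)),
               rng.normal(0.02, 1.2, size=(10, d))])
A = affinity(X)
print(round(second_order_distance(A, 0, 1), 3))    # same cluster: near zero
print(round(second_order_distance(A, 0, 10), 3))   # different clusters: clearly larger
```

Even though first-order distances barely separate the two clusters, the second-order distances differ by more than an order of magnitude.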
The second-order distance can be embedded in existing clustering algorithms. For techniques like spectral-clustering [45], which require an affinity matrix, a second-order affinity matrix can be constructed from the pairwise second-order distances. If $N$ is large, excluding the $i$-th and $j$-th entries has negligible effect, so $d_2(i, j)$ approximates the distance between the full second-order features $\mathbf{f}_i$ and $\mathbf{f}_j$. This allows second-order features to be used directly in clustering algorithms like k-means, which require feature inputs.
Incorporating second-order constraints into a prior clustering algorithm does not fully utilize Theorem 2. This is because realizations of the same distribution-cluster have zero second-order distance, while most clustering algorithms only apply a distance penalty. This motivates an alternative solution we term distribution-clustering.
4.2 Implementing Distribution-clustering
Distribution-clustering can be understood as identifying indices whose mutual second-order distance is near zero. These are grouped into one cluster and the process repeated to identify more clusters.
An algorithmic overview is as follows. Let $(i, j)$ be the indices of the smallest off-diagonal entry of the affinity matrix $A$. If $\mathbf{x}_i, \mathbf{x}_j$ are instances of a distribution-cluster, Lemma 2 states the average cluster variance is $A(i, j)/2$; thus they belong to the dataset's lowest average-variance distribution-cluster. Initialize $\{\mathbf{x}_i, \mathbf{x}_j\}$ as a candidate distribution-cluster. New members are recruited by finding vectors whose average second-order distance from all current candidate members is less than a threshold. If a candidate distribution-cluster grows to have no fewer than a minimum number of members, accept it. Irrespective of the outcome, remove $\mathbf{x}_i, \mathbf{x}_j$ from consideration as candidate cluster seeds. Repeat on the unclustered data till all data points are clustered or it is impossible to form a candidate cluster. Some data may not be accepted into any cluster and remain outliers. Details are in Algorithm 1. For whitened descriptors [24, 7], typical parameters are .
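The overview above can be sketched as follows. This is a greedy toy version, not the paper's Algorithm 1: `tau` and `min_size` stand in for the elided threshold and minimum-membership parameters, and the recruitment rule averages second-order distances over the current candidate members:

```python
import numpy as np

def affinity(X):
    """A[i, j] = (1/d) * ||x_i - x_j||^2."""
    sq = (X ** 2).sum(axis=1)
    return (sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / X.shape[1]

def second_order(A, i, j):
    """Second-order distance between columns i and j, masking their own entries."""
    mask = np.ones(A.shape[0], dtype=bool)
    mask[[i, j]] = False
    return ((A[mask, i] - A[mask, j]) ** 2).mean()

def distribution_clustering(A, tau=0.1, min_size=3):
    """Greedy sketch: seed with the smallest off-diagonal affinity entry,
    recruit points whose average second-order distance to the candidate is
    below tau, and accept clusters with at least min_size members."""
    N = A.shape[0]
    unclustered = set(range(N))
    labels = -np.ones(N, dtype=int)   # -1 marks outliers
    next_label = 0
    while len(unclustered) >= 2:
        idx = sorted(unclustered)
        sub = A[np.ix_(idx, idx)].copy()
        np.fill_diagonal(sub, np.inf)
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        seed = [idx[i], idx[j]]       # lowest average-variance candidate cluster
        cand = list(seed)
        grew = True
        while grew:                   # recruit by average second-order distance
            grew = False
            for p in sorted(unclustered - set(cand)):
                if np.mean([second_order(A, p, q) for q in cand]) < tau:
                    cand.append(p)
                    grew = True
        if len(cand) >= min_size:     # accept the candidate cluster
            for p in cand:
                labels[p] = next_label
                unclustered.discard(p)
            next_label += 1
        else:                         # reject: retire the seed pair
            for p in seed:
                unclustered.discard(p)
    return labels

# Chaotic demo: identical means, different variances.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, size=(6, 2000)),
               rng.normal(0.0, 1.5, size=(6, 2000))])
print(distribution_clustering(affinity(X)))
```

On this toy data the two variance-defined clusters are recovered exactly, even though their means coincide and no cluster count was supplied.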
Relative to other clustering techniques, distribution-clustering has many theoretical and practical advantages:

Clustering chaotic data is a well-posed problem (c.f. Sec. 3.4);

No predefinition of cluster numbers is required;

Innate robustness to "outliers", which form their own clusters.
5 Clustering
Simulation results use quasi-ideal features created from a mixture of uniform and Gaussian distributions. To evaluate the effect of increasing dimensionality, the number of dimensions is progressively increased. Two sets are evaluated: the "Easy" set has wide separation between the underlying distributions, while the "Difficult" set has little separation. Results are presented in Fig. 4. We compare three different distance measures under k-means clustering [31, 8]: the $\ell_1$ norm, the $\ell_2$ norm and our proposed second-order distance of Eq. (13). We also compare spectral clustering [45] with conventional and second-order distances. Finally, we provide a system-to-system comparison between our distribution-clustering, k-means and spectral-clustering. At low dimensions, the second-order distance gives results comparable to the other algorithms; however, its performance steadily improves with the number of dimensions. Notably, only algorithms that employ the second-order distance are effective on the "Difficult" set. This validates the theoretical prediction that (the previously ill-posed problem of) clustering chaotic data is made well-posed by the second-order distance.

To study the effect of mean separation on clustering performance, we repeat the previous experiment under similar conditions, except that the number of dimensions is kept constant and the mean separation is progressively reduced to zero. Results are presented in Fig. 5. Note that the second-order distance keeps clustering performance relatively invariant to mean separation.
[Figures 4 and 5: misclassification rates on the "Easy" and "Difficult" sets, and misclassified fraction vs. separation of distribution centers.]
Real images with NetVLAD [6] image descriptors are used to evaluate clustering on 5 datasets: handwritten numbers in MNIST [29]; a mixture of images from Google searches for "Osaka castle" and "Christ the redeemer statue"; 2 sets of 10 object types from CalTech 101 [22]; and a mixture of ImageNet [16] images from the Lion, Cat and Tiger classes. Distribution-clustering is evaluated against five baseline techniques: k-means [31, 8], spectral-clustering [45], projective-clustering [4], GMM [12] and quickshift [41]. For k-means and GMM, the number of clusters is derived from distribution-clustering. Spectral and projective-clustering are prohibitively slow with many clusters; thus, their cluster numbers are fixed at 20.

Cluster statistics are reported in Tab. 1. On standard silhouette and purity scores, distribution-clustering's performance is comparable to the benchmark techniques. The performance is decent for a new approach and validates Theorem 2's "contrast-loss" constraint in the real world. However, an interesting trend hides in the average statistics.
Breaking down the purity score to find the percentage of images deriving from pure clusters, i.e., clusters with no wrong elements, we find that distribution-clustering assigns a remarkable fraction of images to pure clusters. On average, it is about 1.4 times better than the next best algorithm and in some cases can nearly double the performance. This is important to data abstraction, where pure clusters allow a single average-feature to represent a set of features. In addition, distribution-clustering ensures pure clusters are readily identifiable. Figure 6 plots percentage error as clusters are processed in order of variance. Distribution-clustering keeps "outliers" packed into high-variance clusters, leaving low-variance clusters especially pure. This enables concepts like image "over-segmentation" to be transferred to unorganized image sets.
K-means and GMM are the closest alternatives to distribution-clustering. However, their clusters are less pure and they depend on distribution-clustering to initialize the number of clusters. This makes distribution-clustering one of the few (only?) methods effective on highly chaotic image data like Flickr11k [43], as demonstrated in Fig. 1.
[Figure 6: percentage error vs. number of clusters processed in order of variance, on MNIST, Internet Images, CalTech101 (Set 1), CalTech101 (Set 2) and Cats.]
Silhouette Score
Dataset  KMeans [31, 8]  Spectral [45]  PROCLUS [4]  GMM [12]  QS [41]  Ours
MNIST  0.0082  0.0267  0.0349  0.0076  0.0730  0.038 
Internet  0.084  0.041  0.038  0.0963  0.0488  0.003 
CalTech1  0.027  0.02  0.0045  0.0373  0.0999  0.074 
CalTech2  0.028  0.084  0.0186  0.0248  0.0350  0.042 
Cats  0.007  0  0.002  0.0236  0.0752  0.0239 
Average  0.0308  0.034  0.020  0.0379  0.0532  0.036 
Purity Score  
Dataset  KMeans  Spectral  PROCLUS  GMM  QS  Ours 
MNIST  0.77  0.45  0.41  0.81  0.55  0.79 
Internet  0.96  0.90  0.86  0.97  0.82  0.97 
CalTech1  0.71  0.44  0.36  0.80  0.31  0.82 
CalTech2  0.84  0.83  0.41  0.88  0.29  0.87 
Cats  0.88  0.63  0.50  0.90  0.35  0.93 
Average 
0.83  0.65  0.51  0.87  0.46  0.88 
% of images in pure clusters (excluding singletons)
Dataset  KMeans  Spectral  PROCLUS  GMM  QS  Ours 
MNIST  0.28  0  0  0.32  0.059  0.49 
Internet  0.83  0.82  0.40  0.83  0.031  0.92 
CalTech1  0.08  0.02  0.01  0.30  0.081  0.52 
CalTech2  0.47  0.24  0  0.52  0.16  0.65 
Cats  0.34  0  0  0.40  0.017  0.72 
Average  0.40  0.22  0.082  0.47  0.070  0.66 
% of pure clusters (excluding singletons)
Dataset  KMeans  Spectral  PROCLUS  GMM  QS  Ours 
MNIST  0.43  0  0  0.44  0.62  0.49 
Internet  0.80  0.90  0.8  0.83  0.40  0.93 
CalTech1  0.30  0.20  0.09  0.46  0.75  0.52 
CalTech2  0.54  0.20  0  0.57  0.67  0.67 
Cats  0.57  0  0  0.55  0.50  0.74 
Average  0.53  0.32  0.178  0.57  0.59  0.67 
Timing excludes feature extraction cost, which is common to all algorithms. Experiments are run on an i7 machine, with NetVLAD [6] features computed over the images of the Internet dataset. Our single-core, Matlab implementation of distribution-clustering takes seconds, of which seconds were spent computing the affinity matrix. Timings for the other algorithms are as follows: k-means [31, 8]: seconds; Quick Shift [41] (Python): seconds; spectral clustering [45] (20 clusters): seconds; GMM [12]: minutes (GMM's timing includes covariance estimation; a fixed covariance matrix permits convergence in seconds but is inappropriate for some data); and PROCLUS [4] (20 clusters on OpenSubspace V3.31): 9 minutes.

Qualitative inspection of distribution-clusters shows they have a purity not captured by quantitative evaluation. Figure 7 illustrates this on Colosseum images crawled from the web. Quantitatively, distribution-clustering and k-means are nearly equal, with few clusters mixing Colosseum and "outlier" images. However, distribution-clusters are qualitatively better, with the images in a cluster sharing a clear generative distribution.

Other things readers may want to note: more qualitative evaluation of clustering is available in the supplementary material. Code is available at http://www.kindofworks.com/.
[Figure 7: example clusters from distribution-clustering vs. k-means clustering.]
6 Conclusion
We have shown that chaotically overlapping distributions become intrinsically separable in high-dimensional space and proposed a distribution-clustering algorithm to exploit this. By turning a former curse of dimensionality into a blessing, distribution-clustering is a powerful technique for discovering patterns and trends in raw data. This can impact a wide range of disciplines, ranging from semi-supervised learning to bioinformatics.
Acknowledgment
This paper is dedicated to Professor Douglas L. Jones, Professor Minh N. Do and HongWei Ng.
Y. Matsushita is supported by JST CREST Grant Number JP17942373, Japan.
References
 [1] https://en.wikipedia.org/wiki/Curse_of_dimensionality. Accessed: 2017-11-10.
 [2] P. K. Agarwal and N. H. Mustafa. Kmeans projective clustering. In Proceedings of ACM SIGMODSIGACTSIGART symposium on Principles of Database Systems, pages 155–165, 2004.
 [3] C. C. Aggarwal, A. Hinneburg, and D. A. Keim. On the surprising behavior of distance metrics in high dimensional spaces. In Proceedings of International Conference on Database Theory (ICDT), volume 1, pages 420–434, 2001.
 [4] C. C. Aggarwal, J. L. Wolf, P. S. Yu, C. Procopiuc, and J. S. Park. Fast algorithms for projected clustering. In Proceedings of the ACM SIGMOD International Conference on Management of Data, number 2, pages 61–72, 1999.
 [5] C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, number 2, pages 37–46, 2001.

 [6] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic. Netvlad: CNN architecture for weakly supervised place recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5297–5307, 2016.
 [7] R. Arandjelović and A. Zisserman. Three things everyone should know to improve object retrieval. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2911–2918, 2012.
 [8] D. Arthur and S. Vassilvitskii. kmeans++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACMSIAM symposium on Discrete algorithms, pages 1027–1035, 2007.
 [9] P. Berkhin et al. A survey of clustering data mining techniques. Grouping Multidimensional Data, 25:71, 2006.
 [10] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbor meaningful? In Proceedings of International Conference on Database Theory (ICDT), pages 217–235, 1999.
 [11] J. Bian, W.Y. Lin, Y. Matsushita, S.K. Yeung, T.D. Nguyen, and M.M. Cheng. Gms: Gridbased motion statistics for fast, ultrarobust feature correspondence. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2828–2837, 2017.
 [12] C. Bishop. Pattern recognition and machine learning. Springer, 2007.

 [13] S.-H. Cha. Comprehensive survey on distance/similarity measures between probability density functions. International Journal of Mathematical Models and Methods in Applied Sciences, 1(4):300–307, 2007.
 [14] A. Delvinioti, H. Jégou, L. Amsaleg, and M. E. Houle. Image retrieval with reciprocal and shared nearest neighbors. In Proceedings of International Conference on Computer Vision Theory and Applications (VISAPP), volume 2, pages 321–328, 2014.
 [15] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society. Series B (methodological), pages 1–38, 1977.
 [16] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. FeiFei. Imagenet: A largescale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.
 [17] P. Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.
 [18] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.
 [19] M. Ester, H.P. Kriegel, J. Sander, X. Xu, et al. A densitybased algorithm for discovering clusters in large spatial databases with noise. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, volume 96, pages 226–231, 1996.
 [20] D. Francois, V. Wertz, and M. Verleysen. The concentration of fractional distances. IEEE Transactions on Knowledge and Data Engineering, 19(7):873–886, 2007.
 [21] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995.
 [22] G. Griffin, A. Holub, and P. Perona. Caltech256 object category dataset. 2007.

 [23] J. Hopcroft and R. Kannan. Foundations of data science. 2014.
 [24] H. Jégou and O. Chum. Negative evidences and co-occurences in image retrieval: The benefit of pca and whitening. In Proceedings of European Conference on Computer Vision (ECCV), pages 774–787, 2012.
 [25] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3304–3311, 2010.
 [26] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. An efficient kmeans clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881–892, 2002.
 [27] H.P. Kriegel, P. Kröger, and A. Zimek. Clustering highdimensional data: A survey on subspace clustering, patternbased clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1):1, 2009.
 [28] H.P. Kriegel, A. Zimek, et al. Anglebased outlier detection in highdimensional data. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 444–452, 2008.
 [29] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [30] W.Y. Lin, S. Liu, N. Jiang, M. N. Do, P. Tan, and J. Lu. Repmatch: Robust feature matching and pose for reconstructing modern cities. In Proceedings of European Conference on Computer Vision (ECCV), pages 562–579, 2016.
 [31] S. P. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28:129–137, 1982.

 [32] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of Advances in Neural Information Processing Systems (NIPS), pages 849–856, 2002.
 [33] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.
 [34] S. P. Parker. McGrawHill dictionary of scientific and technical terms. McGrawHill, 1984.
 [35] L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1):90–105, 2004.
 [36] D. Qin, S. Gammeter, L. Bossard, T. Quack, and L. Van Gool. Hello neighbor: Accurate object retrieval with kreciprocal nearest neighbors. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 777–784, 2011.
 [37] M. Radovanović, A. Nanopoulos, and M. Ivanović. Reverse nearest neighbors in unsupervised distancebased outlier detection. IEEE transactions on Knowledge and Data Engineering, 27(5):1369–1382, 2015.
 [38] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
 [39] M. Shi, Y. Avrithis, and H. Jégou. Early burst detection for memoryefficient image retrieval. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 605–613, 2015.
 [40] N. Tomasev, M. Radovanovic, D. Mladenic, and M. Ivanovic. The role of hubness in clustering highdimensional data. IEEE Transactions on Knowledge and Data Engineering, 26(3):739–751, 2014.
 [41] A. Vedaldi and S. Soatto. Quick shift and kernel methods for mode seeking. Proceedings of European Conference on Computer Vision (ECCV), 2008.
 [42] D. Xu and Y. Tian. A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2):165–193, 2015.
 [43] Y. H. Yang, P. T. Wu, C. W. Lee, K. H. Lin, W. H. Hsu, and H. H. Chen. Contextseer: context search and recommendation at query time for shared consumer photos. In Proceedings of ACM international conference on Multimedia, pages 199–208, 2008.
 [44] D. Yu, X. Yu, and A. Wu. Making the nearest neighbor meaningful for time series classification. In Proceedings of International Congress on Image and Signal Processing, pages 2481–2485, 2011.
 [45] L. ZelnikManor and P. Perona. Selftuning spectral clustering. In Proceedings of Advances in Neural Information Processing Systems (NIPS), pages 1601–1608, 2005.
 [46] S. Zhang, M. Yang, T. Cour, K. Yu, and D. N. Metaxas. Query specific fusion for image retrieval. In Proceedings of European Conference on Computer Vision (ECCV), pages 660–673. 2012.
 [47] A. Zimek, E. Schubert, and H.P. Kriegel. A survey on unsupervised outlier detection in highdimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5):363–387, 2012.