I. Introduction
Nearest neighbor search is an essential part of many machine learning algorithms. Often, it is also the most time-consuming stage of the procedure [1]. In application areas such as recommendation systems, robotics, and computer vision, where fast response times are critical, brute force linear search is often not feasible. The problem is further magnified by the availability of increasingly complex, high-dimensional data sources.
Applications that require frequent NN queries on large data sets include object recognition [2, 3], shape recognition [4], and image completion [5] using large databases of image descriptors, as well as content-based web recommendation systems [6].
Consequently, there is a vast body of literature on algorithms for fast nearest neighbor search. These algorithms can be divided into exact and approximate nearest neighbor search. While exact nearest neighbor search algorithms return the true nearest neighbors of the query point, they suffer from the curse of dimensionality: their performance degrades as the dimension of the data increases, rendering them no better than brute force linear search in the high-dimensional regime.
In approximate nearest neighbor search, the main interest is the tradeoff between query time and accuracy, which can be measured either in terms of distance ratios or the probability of finding the true nearest neighbors. Several different approaches have been proposed and their implementations are commonly used in practical applications.
In this section, we first define the approximate nearest neighbor search problem, then review the existing approaches to it, and finally outline a new method that avoids the main drawbacks of existing methods.
I-A Approximate nearest neighbor search
Consider a metric space $M$, a data set $X = \{x_1, \ldots, x_n\} \subset M$, a query point $q \in M$, and a distance metric $m$. The $k$-nearest-neighbor ($k$-NN) search is the task of finding the $k$ closest (w.r.t. $m$) points to $q$ from the data set $X$, i.e., finding a set $K \subseteq X$ for which it holds that $|K| = k$ and

$$m(q, x) \le m(q, y)$$

for all $x \in K$, $y \in X \setminus K$.
In this work, we consider applications where $M$ is the $d$-dimensional Euclidean space $\mathbb{R}^d$, and $m$ is the Euclidean distance $m(q, x) = \|q - x\|_2 = \big(\sum_{i=1}^{d} (q_i - x_i)^2\big)^{1/2}$.
An approximate nearest neighbor search algorithm can be divided into two separate phases: an offline phase and an online phase. In the offline phase an index is built, and in the online phase fast nearest neighbor queries are completed using the constructed index. In practical applications, the most critical considerations are the accuracy of the approximation (the definition of accuracy and the required accuracy level depend on the application) and the query time needed to reach it.
As a measure of accuracy, we use recall, which is especially relevant for information retrieval and recommendation settings. It is defined as the proportion of the true $k$ nearest neighbors of the query point returned by the algorithm:

$$\mathrm{recall} = \frac{|A \cap N|}{k}.$$

Here $A$ is the set of $k$ approximate nearest neighbors returned by the algorithm, and $N$ is the set of the true $k$ nearest neighbors of the query point.
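The recall measure and the exact brute force baseline it is computed against can be sketched as follows (a minimal illustration in Python; the function names are our own, not the authors' implementation):

```python
import numpy as np

def recall(approx_ids, true_ids):
    """Proportion of the true k nearest neighbors found by the algorithm."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

def true_knn(X, q, k):
    """Exact k-NN by brute force linear search (the baseline being approximated)."""
    dists = np.linalg.norm(X - q, axis=1)   # Euclidean distances to all n points
    return np.argsort(dists)[:k]            # indices of the k closest points
```
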
Of secondary importance are the index construction time in the offline stage and the memory requirement; these are usually not visible to the user but must be feasible. Note that in many applications the index must be updated regularly when new data points are added to the set from which the nearest neighbors are searched.
I-B Related work
Most effective methods for approximate nearest neighbor search in high-dimensional spaces can be classified into hashing-based, graph-based, or space-partitioning-tree-based strategies.
I-B1 Hashing-based algorithms
The most well-known and effective hashing-based algorithms are variants of locality-sensitive hashing (LSH) [7, 8, 9]. In LSH, several hash functions of the same type are used. These hash functions are locality-sensitive, which means that nearby points fall into the same hash bucket with higher probability than points that are far away from each other. When answering a NN query, the query point is hashed with all the hash functions, and then a linear search is performed over the data points in the buckets into which the query point falls. Multi-probe LSH [10, 11, 12] improves the efficiency of LSH by also searching the hash buckets near the bucket into which the query point is hashed; thus a smaller number of hash tables is required to reach a given accuracy level. The choice of the hash function affects the performance and must be made carefully in each case.
I-B2 Graph-based algorithms
Graph-based methods such as [13, 14, 15] build a k-NN graph of the data set in an offline phase, and then utilize it to answer fast NN queries. For example, in [15] several graphs built on random subsets of the data are used to approximate the exact k-NN graph of the whole data set. The data set is divided hierarchically and randomly into subsets, and the k-NN graphs of these subsets are built. This process is repeated several times, the resulting graphs are merged, and the final graph is used to answer nearest neighbor queries. Graph-based methods are in general efficient, but index construction is slow for large data sets.
I-B3 Space-partitioning trees
The first space-partitioning-tree-based strategy proposed for nearest neighbor search was the k-d tree [16], which divides the data set hierarchically into cells aligned with the coordinate axes. Nearest neighbors are then searched by performing a backtracking or priority search in this tree. k-d trees and other space-partitioning trees can be utilized in approximate nearest neighbor search by terminating the search when a predefined number of data points has been checked, or when all the points within some predefined error bound from the query point have been checked [17].
Instead of hyperplane splits, clustering algorithms can be used to partition the space hierarchically; for example, the k-means algorithm is used in k-means trees [18, 1]. Other examples of clustering-based partitioning trees are cover trees [19], VP trees [20], and ball trees [21]. Similarly to graph-based algorithms, clustering-based variants have the drawback of long index construction times due to hierarchical clustering on large data sets.
An efficient scheme to increase the accuracy of space-partitioning trees is to utilize several parallel randomized space-partitioning trees. Randomized k-d trees [22, 23] are grown by choosing a split direction at random from the dimensions in which the data has the highest variance. Nearest neighbor queries are answered using priority search: a single priority queue, ordered by the distance of the query point from the splitting point in the splitting direction, is maintained across all the trees.
Another variant of randomized space-partitioning trees is the random projection tree (RP tree), which was first proposed for vector quantization in [24], and later modified for approximate nearest neighbor search in [25]. In random projection trees, the splitting hyperplanes are aligned with random directions sampled from the unit sphere instead of with the coordinate axes. Nearest neighbor queries are answered using defeatist search: the query point is first routed down several trees, and then a brute force linear search is performed in the union of the points of all the leaves into which the query point fell.

I-C MRPT algorithm
One of the strengths of randomized space-partitioning trees is their relative simplicity compared to hashing and graph based methods: they have only a few tunable parameters, and their performance is quite robust with respect to them. They are also well suited to parallel implementation because the trees are independent, and thus can be easily parallelized either locally or over a network with minimal communication overhead.
However, both of the aforementioned variants of randomized space-partitioning trees have a few weak points that limit their efficiency and scalability. First, the randomization used in randomized k-d trees does not decorrelate the trees sufficiently to reach high accuracy levels efficiently. The randomization used in RP trees is sufficient, but it comes at a high computational cost: computing the random projections is time-consuming for high-dimensional data sets because the random vectors have the same dimension as the data.
Second, both the priority queue search used in [22, 23] and the defeatist search used in [25] require checking a large proportion of the data points to reach high accuracy levels, which leads to slow query times. Third, because each node of an RP tree has its own random vector, the number of random vectors required is exponential with respect to the tree depth. In the high-dimensional case this both increases the memory required by the index unnecessarily and slows down the index construction process.
Because of the first two reasons, the query times of randomized space-partitioning trees are not competitive with the fastest graph-based methods (cf. experimental results in Section IV).
In this article we propose a method that uses multiple random projection trees (MRPT); it incorporates two additional features that lead to fast query times and accurate results. The algorithm has the aforementioned strengths of randomized spacepartitioning trees but avoids their drawbacks.
More specifically, our contributions are:
- We show that sparse random projections can be used to obtain a fast variant of randomized space-partitioning trees.
- We propose the MRPT algorithm, which combines multiple trees by voting search, as a new and more efficient way to utilize randomized space-partitioning trees for approximate nearest neighbor search.
- We present a time and space complexity analysis of our algorithm.
- We demonstrate experimentally that the MRPT algorithm with sparse projections and voting search outperforms state-of-the-art methods on several real-world data sets across a wide range of sample sizes and dimensionalities.
In the following sections we describe the proposed method by first revisiting the classic RP tree method [25], and then describing the novel features that lead to increased accuracy and reduced memory footprint and query time.
II. Index construction
II-A Classic random projection trees
At the root node of the tree, the points are projected onto a random vector $r$ generated from the $d$-dimensional standard normal distribution $N(0, I_d)$. The data set is then split into two child nodes using the median of the points in the projected space: points whose projected values are less than or equal to the median are routed into the left child node, and points with projected values greater than the median are routed into the right child node. So the points $x_{(1)}, \ldots, x_{(\lceil n/2 \rceil)}$ fall into the left branch, and the points $x_{(\lceil n/2 \rceil + 1)}, \ldots, x_{(n)}$ fall into the right branch. Here $x_{(i)}$ denotes the ordering of the points with respect to the values of their projections $r^{\mathsf{T}} x_{(i)}$. This process is then repeated recursively for the subsets of the data in both of the child nodes, until a predefined depth is met.
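The recursive construction described above can be sketched as follows (an illustrative simplification; the function name and node layout are our own, and every node draws its own dense Gaussian direction as in classic RP trees):

```python
import numpy as np

def grow_tree(X, indices, depth, rng):
    """Recursively split a set of point indices by the median of a random projection.

    Minimal sketch of classic RP tree construction: an inner node stores its
    random direction and median cut point; a leaf stores its point indices.
    """
    if depth == 0:
        return {"leaf": indices}
    r = rng.standard_normal(X.shape[1])       # random direction ~ N(0, I_d)
    proj = X[indices] @ r                     # project this node's points onto r
    split = np.median(proj)                   # median split in the projected space
    left = indices[proj <= split]             # <= median goes left
    right = indices[proj > split]             # > median goes right
    return {"r": r, "split": split,
            "left": grow_tree(X, left, depth - 1, rng),
            "right": grow_tree(X, right, depth - 1, rng)}
```
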
For high-dimensional data, the computation of random projections is slow, and the memory requirement for storing the random vectors is high. To reduce both the computation time and the memory footprint, we propose two improvements:
- Instead of using dense vectors sampled from the $d$-dimensional standard normal distribution, we use sparse vectors to reduce both storage and computation costs.
- Instead of using a different projection vector for each inner node of a tree, we use one projection vector for all the nodes on the same level of a tree, which further reduces the storage requirement and maximizes low-level parallelism through vectorization.
In addition, instead of splitting at a fractile point chosen at random, as suggested in [25], we split at the median to make the performance more predictable and to enable saving the trees in a more compact format.
In the following subsections we briefly discuss the details of the above improvements. Pseudocode for constructing the index using sparse RP trees is given in detail in Algorithms 1–2 below. The proposed Algorithm 1 consists of three nested for loops. The outermost loop (lines 4–15) builds the RP trees by repeatedly calling the GROW_TREE function of Algorithm 2. For each individual RP tree, the two inner for loops (lines 6–12) prepare a sparse random matrix which is used to project the data set onto the random vectors (line 13). Specifically, the innermost loop constructs a $d$-dimensional sparse random vector for a given tree level. In Algorithm 2, the GROW_TREE function constructs a binary search tree by recursively splitting the high-dimensional space into subspaces using the previously computed projections.

II-B Sparse random projections
With high-dimensional data sets, the computation of the random vectors easily becomes a performance bottleneck of the algorithm. However, it is not necessary to use random vectors sampled from the $d$-dimensional standard normal distribution to approximately preserve the pairwise distances between the data points. Achlioptas [26] shows that the approximately distance-preserving low-dimensional embedding of the Johnson-Lindenstrauss lemma is obtained also with sparse random vectors whose components are sampled from $\{\sqrt{3}, 0, -\sqrt{3}\}$ with respective probabilities $\{\frac{1}{6}, \frac{2}{3}, \frac{1}{6}\}$. Li et al. [27] prove that the same components with respective probabilities $\{\frac{1}{2\sqrt{d}}, 1 - \frac{1}{\sqrt{d}}, \frac{1}{2\sqrt{d}}\}$, where $d$ is the dimension of the data, can be used to obtain a $\sqrt{d}$-fold speedup without significant loss in accuracy compared to using normally distributed random vectors.
We use sparse random vectors $r$, whose components are sampled from the standard normal distribution with probability $a$, and are zeros with probability $1 - a$:

$$r_i = \begin{cases} N(0,1) & \text{with probability } a \\ 0 & \text{with probability } 1 - a. \end{cases}$$
The sparsity parameter $a$ can be tuned to optimize performance, but we have observed that the value $a = 1/\sqrt{d}$ recommended in [27] tends to give near-optimal results on all the data sets we tested, which suggests that further fine-tuning of this parameter is unnecessary. This proportion of nonzero components is small enough to provide significant computational savings through the use of sparse matrix libraries.
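Generating such sparse random vectors is straightforward; the following sketch draws one vector per tree level as the columns of a $d \times \ell$ matrix (an illustration with a dense array for simplicity; in practice the matrix would be stored in a sparse format, and the function name is our own):

```python
import numpy as np

def sparse_random_matrix(d, depth, rng):
    """Draw one sparse random vector per tree level (columns of a d x depth matrix).

    Each component is N(0,1) with probability a = 1/sqrt(d) and zero otherwise,
    the sparsity level recommended in [27].
    """
    a = 1.0 / np.sqrt(d)                          # sparsity parameter
    mask = rng.random((d, depth)) < a             # choose the nonzero positions
    return np.where(mask, rng.standard_normal((d, depth)), 0.0)
```
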
II-C Compactness and speed with fewer vectors
In classic RP trees, a different random vector is used at each inner node of a tree, whereas we use the same random vector for all the nodes on the same level of a tree. This choice does not affect the accuracy at all, because a query point is routed down each of the trees only once; hence, the query point is projected onto a random vector sampled from the same distribution at each level of a tree. This means that the query point is projected onto $\ell$ i.i.d. random vectors in both scenarios.
An RP tree of depth $\ell$ has $2^\ell - 1$ inner nodes; therefore, if each node of a tree had a different random vector as in classic RP trees, $2^\ell - 1$ different random vectors would be required for one tree. However, when a single vector is used on each level, only $\ell$ vectors are required. This reduces the amount of memory required by the random vectors from exponential to linear with respect to the depth of the trees.
Having only $\ell$ random vectors per tree also speeds up the index construction significantly. While some of the observed speedup is explained by the decreased number of random vectors that have to be generated, most of it is due to enabling the computation of all the projections of a tree in one matrix multiplication: the projected data set $P$ can be computed from the data set $X \in \mathbb{R}^{n \times d}$ and a random matrix $R \in \mathbb{R}^{d \times \ell}$ as

$$P = XR.$$

Although the total amount of computation stays the same, in practice this speeds up the index construction significantly due to cache effects and low-level parallelization through vectorization.
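The one-matmul formulation can be illustrated as follows (the shapes are arbitrary example values, not from the paper):

```python
import numpy as np

# Illustrative shapes: n = 1000 points in d = 128 dimensions, tree depth ell = 8.
rng = np.random.default_rng(42)
n, d, ell = 1000, 128, 8
X = rng.standard_normal((n, d))     # data set, one point per row
R = rng.standard_normal((d, ell))   # one random vector per level, as columns

P = X @ R                           # all n * ell projections in a single matmul
```

Column $j$ of $P$ then serves every node on level $j$ of the tree, instead of each node computing its own projection separately.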
II-D Time and space complexity: Index construction
At each level of each tree the whole data set is projected onto a $d$-dimensional random vector that has on average $ad$ nonzero components, so the expected index construction time is $O(T \ell n a d)$. (The following complexity results hold exactly if the algorithm is modified so that each random vector has exactly $ad$ nonzero components instead of the expected number of nonzero components being $ad$.) For classic RP trees ($a = 1$) this is $O(T \ell n d)$, but for sparse trees ($a = 1/\sqrt{d}$) this is only $O(T \ell n \sqrt{d})$.
For each tree, we need to store the points allocated to each leaf node, so the memory required by the index is at least $O(Tn)$. At each inner node only a split point is saved; this does not increase the space complexity because there are only $2^\ell - 1$ inner nodes in one tree.
The expected amount of memory required by one random vector is $O(ad)$ when the random vectors are saved in sparse matrix form. This is $O(d)$ for dense RP trees, and $O(\sqrt{d})$ for sparse RP trees with the sparsity parameter fixed to $a = 1/\sqrt{d}$. Because an RP tree has $2^\ell - 1$ inner nodes, the memory requirement for classic RP trees, which have a different random vector for each node of a tree, is $O(T 2^\ell d)$. However, in our version, in which a single vector is used on each level, there are only $\ell$ vectors per tree; hence, the memory requirement for sparse RP trees is $O(T \ell \sqrt{d})$.
III. Query phase
In many approximate nearest neighbor search algorithms the query phase is further divided into two steps: a candidate generation step and an exact search step. In the candidate generation step, a candidate set $S$, for which usually $|S| \ll n$, is retrieved from the whole data set; in the exact search step, the approximate nearest neighbors of the query point are retrieved by performing a brute force linear search within the candidate set. In the MRPT algorithm, the candidate generation step consists of traversing the trees grown in the index construction phase.
The leaf to which a query point $q$ belongs is retrieved by first projecting $q$ at the root node of the tree onto the same random vector as the data points, and then assigning it into the left or right branch depending on the value of the projection. If the projected value is smaller than or equal to the cut point $s$ (the median of the data points belonging to that node in the projected space) saved at that node, i.e., if

$$r^{\mathsf{T}} q \le s,$$

the query point is routed into the left child node, and otherwise into the right child node. This process is then repeated recursively until a leaf is met.
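The routing loop can be sketched as follows (illustrative only; the node layout is a hypothetical one of our own, with each inner node storing the level's random vector, the median cut point, and its two children, and each leaf storing its point indices):

```python
def query_leaf(tree, q):
    """Route a query point q down one RP tree to its leaf.

    Assumed node layout: an inner node is a dict with keys 'r' (random
    vector), 'split' (median cut point), 'left' and 'right'; a leaf is a
    dict {'leaf': point_indices}. Goes left when r^T q <= split.
    """
    while "leaf" not in tree:
        proj = sum(r_i * q_i for r_i, q_i in zip(tree["r"], q))  # r^T q
        tree = tree["left"] if proj <= tree["split"] else tree["right"]
    return tree["leaf"]
```
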
The query point is routed down to a leaf in each of the trees obtained in the index construction phase. The query process is thus far similar to the one described in [25]. The principal difference is in the candidate set generation: in classic RP trees, the candidate set includes all the points that belong to the same leaf as the query point in at least one of the trees. A problem with this approach is that when a large number of trees is used in the tree traversal step, the size of the candidate set becomes excessively large.
In the following, we show how this frequency information (i.e., how frequently a point falls into the same cell as the query point) can be utilized to improve both query performance and accuracy.
III-A Voting search
Assume that we have constructed $T$ RP trees of depth $\ell$. Each of them partitions $\mathbb{R}^d$ into $2^\ell$ cells (leaves), all of which contain $\lceil n/2^\ell \rceil$ or $\lfloor n/2^\ell \rfloor$ data points. For $t = 1, \ldots, T$, let $f_t$ be an indicator function of the data point $x$ and the query $q$, which returns $1$ if $x$ and $q$ reside in the same cell in tree $t$, and $0$ otherwise:

$$f_t(x, q) = \begin{cases} 1 & \text{if } x \text{ and } q \text{ belong to the same leaf of tree } t \\ 0 & \text{otherwise.} \end{cases}$$

Further, let $F$ be a count function of the data point $x$, which returns the number of trees in which $x$ and $q$ belong to the same leaf:

$$F(x) = \sum_{t=1}^{T} f_t(x, q).$$

We propose a simple but effective voting search, in which we choose into the candidate set $S$ only the data points residing in the same leaf as the query point in at least $v$ trees:

$$S = \{x \in X : F(x) \ge v\}.$$
The vote threshold $v$ is a tuning parameter. A lower threshold value yields higher accuracy at the expense of increased query times.
This further pruning of the candidate set exploits the intuitive notion that the closer a data point is to the query point, the more probable it is that an RP tree divides the space so that both belong to the same leaf. We emphasize that our voting scheme is not restricted to RP trees; it can be used in combination with any space-partitioning algorithm, given that there is enough randomness involved in the process to render the partitions sufficiently independent.
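The voting candidate selection can be sketched as follows (an illustrative simplification; the function name and the input format, a list containing the indices of the points sharing the query's leaf in each tree, are our own assumptions):

```python
from collections import Counter

def voting_candidates(leaves_per_tree, v):
    """Candidate generation by voting over T trees.

    'leaves_per_tree' holds, for each tree, the indices of the points in the
    query's leaf. A point enters the candidate set only if its vote count
    F(x) reaches the threshold v.
    """
    votes = Counter()
    for leaf in leaves_per_tree:
        votes.update(leaf)                    # F(x) += 1 per co-occurrence
    return {x for x, count in votes.items() if count >= v}
```

With $v = 1$ this reduces to the defeatist union of all leaves; raising $v$ prunes the candidate set toward the most frequently co-occurring points.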
III-B Time and space complexity: Query execution
When the trees are grown to some predetermined depth $\ell$ using the median split, the expected running time of the tree traversal step is $O(T \ell a d)$, because at each level of each RP tree the query point is projected onto a $d$-dimensional random vector that has on average $ad$ nonzero components. When using random vectors sampled from the $d$-dimensional standard normal distribution this is $O(T \ell d)$, but when using the value $a = 1/\sqrt{d}$ for the sparsity parameter, it is only $O(T \ell \sqrt{d})$. (Adding the leaf points into the candidate set for each tree adds an $O(T n / 2^\ell)$ term to the time complexity of the tree traversal phase for defeatist search. Because checking the vote count is a constant-time operation, this term is the same for voting search. However, in both cases these terms are dominated by the exact search in the candidate set, which is an $O(T n d / 2^\ell)$ operation, so they are not taken into account here.)
When the median split is used, each leaf has either $\lceil n/2^\ell \rceil$ or $\lfloor n/2^\ell \rfloor$ data points. Therefore, the size of the final candidate set satisfies $|S| \le T \lceil n/2^\ell \rceil$ for both defeatist and voting search. Increasing the vote threshold $v$ decreases the size of the candidate set in practice, which explains the significantly reduced query times when using voting search (cf. experimental results in Section IV). However, because the effect of $v$ on $|S|$ depends on the distribution of the data, a detailed analysis of the time complexity requires further assumptions on the data source and is not addressed here. Without any assumptions, both defeatist search and voting search have the same theoretical upper bound for $|S|$.
Because of this upper bound on the size of the candidate set, the running time of the exact search in the candidate set is $O(T n d / 2^\ell)$ for both defeatist search and voting search. Thus, the total query time is $O(T \ell d + T n d / 2^\ell)$ for classic RP trees, and $O(T \ell \sqrt{d} + T n d / 2^\ell)$ for the MRPT algorithm.
  RP trees  MRPT

Query time  $O(T\ell d + Tnd/2^\ell)$  $O(T\ell\sqrt{d} + Tnd/2^\ell)$
Index construction time  $O(T\ell n d)$  $O(T\ell n\sqrt{d})$
Index memory  $O(Tn + T2^\ell d)$  $O(Tn + T\ell\sqrt{d})$
IV. Experimental results
We assess the efficiency of our algorithm by comparing it against state-of-the-art approximate nearest neighbor search algorithms. To make the comparison as practically relevant as possible, we chose widely used methods that are available as optimized libraries.
All of the compared libraries, including ours, are implemented in C++ and compiled with similar optimizations. We make our comparison and the implementation of our algorithm available as open source (https://github.com/ejaasaari/mrptcomparison).
All of the experiments were performed on a single computer with two Intel Xeon E5540 2.53GHz CPUs and 32GB of RAM. No parallelization beyond that achieved by the use of linear algebra libraries, such as caching and vectorization of matrix operations, was used in any of the experiments.
IV-A Data sets
We made the comparisons using several real-world data sets (see Table II) across a wide range of sample sizes and dimensionalities. The data sets are typical of applications in which fast nearest neighbor search is required. For all data sets, we use a disjoint random sample of 100 test points to evaluate the query time and recall.
The BIGANN and GIST data sets (http://corpus-texmex.irisa.fr/) consist of local 128-dimensional SIFT descriptors and global 960-dimensional color GIST descriptors extracted from images [28, 29], respectively. As the SIFT data set, we use a random sample of 2 500 000 data points from the BIGANN data set.
In addition to the feature descriptor data sets, we consider three image data sets for which no feature extraction has been performed: 1) the MNIST data set (http://yann.lecun.com/exdb/mnist/) consists of 28 × 28 pixel grayscale images, each of which is represented as a vector of dimension 784; 2) the Trevi data set (http://phototour.cs.washington.edu/patches/default.htm) consists of 64 × 64 pixel grayscale image patches, each represented as a vector of dimension 4096 [30]; and 3) the STL-10 data set (http://cs.stanford.edu/~acoates/stl10), an image recognition data set containing 96 × 96 pixel images from 10 different classes of objects [31].

We also use a news data set that contains web pages from different news feeds converted into a term frequency-inverse document frequency (TF-IDF) representation. The dimensionality of the data is reduced to 1000 by applying latent semantic analysis (LSA) to the TF-IDF data. A more elaborate description of the preprocessing of the news data set is found in [6].
Finally, we include results for a synthetic data set comprised of random vectors sampled from the 4096dimensional standard normal distribution and normalized to unit length.
Data set  n  d  type

GIST  1000000  960  image descriptors 
SIFT  2500000  128  image descriptors 
MNIST  60000  784  image 
Trevi  101120  4096  image 
STL10  100000  9216  image 
News  262144  1000  text (TFIDF + LSA) 
Random  50000  4096  synthetic 
IV-B Compared algorithms
We compare the MRPT algorithm to representatives from all three major groups of approximate nearest neighbor search algorithms: tree-, graph-, and hashing-based methods.
To test the efficiency of our proposed improvements over classic RP trees, we include our own implementation of the classic RP trees suggested in [25], and the MRPT algorithm with vote threshold $v = 1$, which amounts to using sparse RP trees without the voting scheme. (The version of RP trees used in the comparisons is already a slightly improved one: we used RP trees with the same random vector at every level and the median split (cf. Section II). Had we used a different random vector at each node of the tree, we would have run out of memory on the larger data sets. However, we emphasize that using the same random vector at every level actually slightly decreases the query times while keeping the accuracy intact; hence, the real performance gap between RP trees and the MRPT algorithm is even wider than shown in the results.)
Of the tree-based methods, we include randomized k-d trees [22] and hierarchical k-means trees [23]. The authors of [1] suggest that these two algorithms give the best performance on data sets with a wide range of dimensionality. Both algorithms are implemented in the FLANN library (http://www.cs.ubc.ca/research/flann/).
We also use as a benchmark the classic k-d tree algorithm [16], modified for approximate nearest neighbor search as described in [17]; it is implemented in the ANN library (http://www.cs.umd.edu/~mount/ANN/).
Among the graph-based methods, the NN-descent algorithm suggested in [14] is generally considered a leading method. It first constructs a k-NN graph using local search in the offline phase, and then utilizes this graph for fast NN queries in the online phase. The algorithm is implemented in the KGraph library (http://www.kgraph.org).
As a representative of hashing-based solutions, we include the multi-probe version of cross-polytope LSH suggested in [12], as implemented in the FALCONN library (http://www.falconn-lib.org). All these algorithms and implementations were selected because they represent the state of the art in their respective categories.
For the most important tuning parameters (as stated by the authors), we use grid search on appropriate ranges to find the optimal parameter combinations in each experiment, or use the default values whenever provided by the authors. The performance of the MRPT algorithm is controlled by the number of trees, the depth of the trees, the sparsity parameter $a$, and the vote threshold. We fix the sparsity parameter to $a = 1/\sqrt{d}$, and optimize the other parameters using the same grid search procedure as for the other methods. For ANN, we vary the error bound $\epsilon$, and for FLANN, we control the accuracy primarily by tuning the maximum number of leaves to search, but also vary the number of trees for k-d trees and the branching factor for hierarchical k-means trees. KGraph is controlled primarily by the search cost parameter, but we also build the graph with a varying number of nearest neighbors for each node. For FALCONN, we search for the optimal number of hash tables and hash functions. The supplementary material on GitHub provides the exact parameter ranges of the grid search as well as interactive graphs that show the optimal values for each obtained result.
IV-C Results
In the experiments we used three different values of $k$ that cover the range used in typical applications. However, due to space restrictions, we present results for multiple values of $k$ only for MNIST, which is representative of the general trend. For the other six data sets we provide results only for a single value of $k$. (The results for the remaining values of $k$ are available in the supplementary material on GitHub.)
Figure 1 shows the results on six data sets. The times required to reach a given recall level are shown in Table III. The MRPT algorithm is significantly faster than the other tree-based methods and LSH on all of the data sets except SIFT. It is worth noting that SIFT has a much lower dimension ($d = 128$) than the other data sets. These results suggest that the proposed algorithm is especially suitable for high-dimensional data sets.
For the MNIST data set, results for all the considered values of $k$ are shown in Figure 2. For the other data sets the results for the remaining values of $k$ follow a similar pattern as the results for the MNIST data set: the choice of $k$ has no significant effect on the relative performance of the methods, but the relative performance of KGraph seems to degrade significantly for small values of $k$.
KGraph is the fastest method at low recall levels, but its performance degrades rapidly at high recall levels (over 90%), making the MRPT algorithm the fastest method at very high recall levels (over 95%) on all the data sets except SIFT. Our hypothesis is that the flexibility obtained, on the one hand, by using completely random split directions and, on the other hand, by using a large number of trees, which is enabled by sparsity and voting, allows the MRPT algorithm to reach high recall levels on high-dimensional data sets very efficiently.
The performance of the MRPT algorithm is also superior to classic RP trees on all of the data sets. In addition, of our two main contributions, voting seems to be more important than sparsity with respect to query times: sparse RP trees are only slightly faster than dense RP trees, the gap being somewhat larger on the higher-dimensional data sets, whereas the voting search provides a marked improvement on all data sets. This shows that the voting search, especially when combined with sparsity, is an efficient way to reduce query times without compromising accuracy.
The numerical values in Table III indicate that at recall levels of 90% and higher, the MRPT method is the fastest in 14 out of 21 instances, while KGraph is the fastest in 7 out of 21. Compared to brute force search, MRPT is roughly 25–100 times faster on all six real-world data sets at the 90% recall level, and roughly 10–40 times faster even at the highest 99% recall level.
V. Conclusion
We propose the multiple random projection tree (MRPT) algorithm for approximate nearest neighbor search in high dimensions. The method is based on combining multiple sparse random projection trees using a novel voting scheme in which the final search is focused on the points occurring most frequently among the points retrieved by the multiple trees. The algorithm is straightforward to implement and exploits standard fast linear algebra libraries by combining calculations into large matrix-matrix operations.
We demonstrate through extensive experiments on both real and simulated data that the proposed method is faster than state-of-the-art space-partitioning tree and hashing based algorithms at a variety of accuracy levels. It is also faster than a leading graph-based algorithm (KGraph) at high accuracy levels, while being slightly slower at low accuracy levels. The good performance of MRPT is especially pronounced for high-dimensional data sets.
Due to its very competitive and consistent performance, and its simple and efficient index construction stage, especially compared to graph-based algorithms, the proposed MRPT method is well suited to a wide variety of applications involving large, high-dimensional data sets.
time (s), 100 queries  

data set  R (%)  ANN  FALCONN  FLANN k-d  FLANN k-means  KGraph  RP trees  MRPT ($v = 1$)  MRPT  brute force
MNIST  80  0.89  0.21  0.02  0.02  0.02  0.09  0.05  0.02  2.59 
90  1.57  0.36  0.04  0.04  0.04  0.16  0.08  0.03  
95  2.29  0.55  0.06  0.05  0.06  0.2  0.14  0.04  
99  3.46  1.23  0.15  0.1  0.44  0.39  0.22  0.07  
News  80  2.15  0.39  0.56  0.26  0.05  0.71  0.64  0.12  23.09 
90  4.42  0.88  2.92  0.45  0.11  1.37  1.05  0.2  
95  7.35  1.23  9.55  1.39  0.25  1.98  1.62  0.28  
99  11.83  2.58  38.81  7.5  3.24  4.3  3.87  0.5  
GIST  80  20.54  21.21  6.03  1.86  0.59  4.77  4.19  1.03  52.98 
90  36.18  29.19  13.35  3.63  1.87  7.69  7.54  1.83  
95  48.13  37.92  24.41  6.22  7.94  13.6  11.89  2.91  
99  76.57  50.29  59.12  14.14    24.11  20.33  6.1  
SIFT  80  0.81  0.93  0.21  0.09  0.05  0.4  0.38  0.24  21.32 
90  1.67  1.47  0.41  0.18  0.11  0.71  0.59  0.31  
95  3.1  2.06  0.84  0.31  0.18  0.92  0.97  0.37  
99  7.88  3.2  2.81  0.86  0.35  3.22  2.16  0.62  
Trevi  80  5.5  14.62  0.97  0.7  0.16  1.48  0.94  0.27  22.15 
90  12.43  17.88  1.81  1.01  0.75  2.44  1.46  0.45  
95  18.64  18.8  3.02  1.69  18.68  3.74  2.15  0.67  
99    25.54  10.54  7.61    8.13  5.25  2.07  
STL10  80  31.31  18.62  4.29  2.04  0.18  5.46  3.06  0.88  49.43 
90  39.38  28.62  8.76  2.72  0.39  9.45  6.32  1.52  
95  45.0  31.46  13.39  3.53  1.13  11.62  8.58  2.43  
99  49.67  45.42  29.96  6.1  32.12  18.56  16.89  4.28  
Random  80  23.86  10.1  12.55  9.73  7.59  8.55  8.2  4.55  10.9 
90  27.33  12.27  13.91  11.48  9.8  10.2  9.8  6.9  
95  29.07  14.28  14.5  12.55  10.93  10.88  10.71  8.6  
99  30.47  17.03  15.37  13.7  12.02  11.59  11.42  10.36 
References
 [1] M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227–2240, 2014.

 [2] D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 2. IEEE, 2006, pp. 2161–2168.
 [3] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
 [4] Y. Amit and D. Geman, “Shape quantization and recognition with randomized trees,” Neural computation, vol. 9, no. 7, pp. 1545–1588, 1997.
 [5] J. Hays and A. A. Efros, “Scene completion using millions of photographs,” in ACM Transactions on Graphics (TOG), vol. 26, no. 3. ACM, 2007, p. 4.
 [6] L. Wang, S. Tasoulis, T. Roos, and J. Kangasharju, “Kvasir: Scalable provision of semantically relevant web content on big data framework,” IEEE Transactions on Big Data, advance online publication, DOI: 10.1109/TBDATA.2016.2557348, 2016.

 [7] P. Indyk and R. Motwani, “Approximate nearest neighbors: towards removing the curse of dimensionality,” in Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. ACM, 1998, pp. 604–613.
 [8] A. Gionis, P. Indyk, R. Motwani et al., “Similarity search in high dimensions via hashing,” in VLDB, vol. 99, no. 6, 1999, pp. 518–529.
 [9] A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,” in Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on. IEEE, 2006, pp. 459–468.
 [10] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, “Multi-probe LSH: efficient indexing for high-dimensional similarity search,” in Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, pp. 950–961.
 [11] W. Dong, Z. Wang, W. Josephson, M. Charikar, and K. Li, “Modeling LSH for performance tuning,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, 2008, pp. 669–678.
 [12] A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt, “Practical and optimal LSH for angular distance,” in Advances in Neural Information Processing Systems 28. Curran Associates, Inc., 2015, pp. 1225–1233.

 [13] K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang, “Fast approximate nearest-neighbor search with k-nearest neighbor graph,” in IJCAI Proceedings–International Joint Conference on Artificial Intelligence, vol. 22, no. 1, 2011, p. 1312.
 [14] W. Dong, C. Moses, and K. Li, “Efficient k-nearest neighbor graph construction for generic similarity measures,” in Proceedings of the 20th International Conference on World Wide Web. ACM, 2011, pp. 577–586.
 [15] J. Wang, J. Wang, G. Zeng, Z. Tu, R. Gan, and S. Li, “Scalable k-NN graph construction for visual descriptors,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 1106–1113.
 [16] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, vol. 18, no. 9, pp. 509–517, 1975.
 [17] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, “An optimal algorithm for approximate nearest neighbor searching fixed dimensions,” Journal of the ACM (JACM), vol. 45, no. 6, pp. 891–923, 1998.
 [18] K. Fukunaga and P. M. Narendra, “A branch and bound algorithm for computing k-nearest neighbors,” IEEE Transactions on Computers, vol. 100, no. 7, pp. 750–753, 1975.
 [19] A. Beygelzimer, S. Kakade, and J. Langford, “Cover trees for nearest neighbor,” in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 97–104.
 [20] P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric spaces,” in SODA, vol. 93, no. 194, 1993, pp. 311–321.
 [21] B. Leibe, K. Mikolajczyk, and B. Schiele, “Efficient clustering and matching for object class recognition.” in BMVC, 2006, pp. 789–798.
 [22] C. Silpa-Anan and R. Hartley, “Optimised kd-trees for fast image descriptor matching,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
 [23] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration.” VISAPP (1), vol. 2, pp. 331–340, 2009.
 [24] S. Dasgupta and Y. Freund, “Random projection trees for vector quantization,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3229–3242, 2009.
 [25] S. Dasgupta and K. Sinha, “Randomized partition trees for nearest neighbor search,” Algorithmica, vol. 72, no. 1, pp. 237–263, 2015.
 [26] D. Achlioptas, “Database-friendly random projections,” in Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2001, pp. 274–281.
 [27] P. Li, T. J. Hastie, and K. W. Church, “Very sparse random projections,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006, pp. 287–296.
 [28] H. Jégou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011.
 [29] H. Jegou, R. Tavenard, M. Douze, and L. Amsaleg, “Searching in one billion vectors: Rerank with source coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, 2011, pp. 861–864.
 [30] S. A. J. Winder and M. A. Brown, “Learning local image descriptors,” in 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007.
 [31] A. Coates, A. Y. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, 2011, pp. 215–223.