LSH_kNN_graph
Creation of an approximate kNN graph for diffusion application
view repo
The application of the diffusion in many computer vision and artificial intelligence projects has been shown to give excellent improvements in performance. One of the main bottlenecks of this technique is the quadratic growth of the kNN graph size due to the high-quantity of new connections between nodes in the graph, resulting in long computation times. Several strategies have been proposed to address this, but none are effective and efficient. Our novel technique, based on LSH projections, obtains the same performance as the exact kNN graph after diffusion, but in less time (approximately 18 times faster on a dataset of a hundred thousand images). The proposed method was validated and compared with other state-of-the-art on several public image datasets, including Oxford5k, Paris6k, and Oxford105k.
READ FULL TEXT VIEW PDF
Several computer vision and artificial intelligence projects are nowaday...
read it
Diffusion is commonly used as a ranking or re-ranking method in retrieva...
read it
Diffusion has shown great success in improving accuracy of unsupervised ...
read it
Query expansion is a popular method to improve the quality of image retr...
read it
Dealing with the application of grading colorectal cancer images, this w...
read it
Nonlocal neural networks have been proposed and shown to be effective in...
read it
For several decades, image restoration remains an active research topic ...
read it
Creation of an approximate kNN graph for diffusion application
Content-Based Image Retrieval (CBIR) is concerned with finding the most similar images to a query in an image dataset, selected or photographed by the user. Recent improvements in features extraction through Convolutional Neural Networks (CNN) and algorithms for embedding, like the several R-MAC strategies
[20], [6], [15], have made it possible to obtain excellent results on datasets of hundreds of thousand images in reasonable time [14]. Recently, the application of diffusion process on CBIR datasets have allowed boosting retrieval performance [11]: it permits finding more neighbour, that are close to the query on the nearest-neighbour manifold but not in the Euclidean representation space. Diffusion propagates the similarities from a query point on a pairwise affinity matrix to all the dataset elements
[25]. To apply this process, it is necessary to create a kNN graph of all image embeddings in the dataset. Generally, the more discriminative the embeddings are, the better the results achievable through diffusion.Diffusion is an iterative process that simulates a random walk on the image similarity graph. It consists of walking on the graph, from the query point, with the objective of finding the best path, i.e. to retrieve the neighbours of the query point. This is possible exploiting the weights of the edges of the kNN graph, which indicate the similarity between two nodes: the greater the weight, the more similar the two nodes are.
Unfortunately, this recent method leads to new challenges: understanding the data distribution to correctly set the diffusion parameters, dealing with the size of the kNN graph, which grows quadratically with the dataset size, and reducing the convergence time for the resolution of the linear system related to the diffusion mechanism.
The kNN graph is needed to apply diffusion and the number of the edges in the graph is important for the final retrieval performance. Furthermore, it is impossible to know how many and which edges the graph needs for achieving good performance before applying diffusion. Therefore, a common strategy used in previous works was a brute-force approach, i.e. the graph is created with the connections between all the possible pairs of nodes. Obviously, increasing the number of edges increases the size of the entire graph. For example, considering a dataset composed by images, the exact or brute-force graph will have edges and the approach will have a complexity ; if , the number of edges will be 10 billion.
Several methods proposed to implement an approximated method for the creation of the kNN graph, drastically reducing the computational time [19], [3], [1], [24].
Following this idea that not all the edges are necessary for creating the kNN graph, we propose a fast approach for the creation of an approximate version of the kNN graph, based on LSH projections [9], which maintains only the useful edges to reduce computation time and memory requirements. The new graph achieves the same retrieval results as the exact kNN graph after diffusion [25] on several public image datasets, but does so more efficiently.
The main contributions of this paper are:
An efficient algorithm based on LSH projections for the creation of an approximate kNN graph that obtains the same performance as brute-force in less time.
Several optimizations in the implementation that reduce the computational time and memory use.
The use of the multi-probe LSH [13] for improving diffusion performance on the kNN graph.
Graphs have been used for different tasks in computer vision: applying diffusion for image retrieval [11], unsupervised fine-tuning [10]
, propagating labels for semi-supervised learning
[4], creating manifold embeddings [23], and training classifiers exploiting the cycles found in the graph
[12].In particular, a k-Nearest Neighbor (kNN) graph is an undirected graph denoted by , where represents the set of nodes and represents the set of edges
. The nodes represent all the images in the dataset and the edges represent the connections between nodes. The weight of each edge determines how much the two images are similar: the higher the weight, the more similar the two images are. The weights of the edges are set using cosine similarities between embeddings.
The problem of kNN graph creation has been addressed in the literature in several ways. The simplest and naive approach is brute-force, which tends to be very slow but usually obtains the best results.
To speed up the process while retaining good retrieval accuracy, approximated kNN graphs have been used. The methods for the construction of the approximate kNN graph can be divided in two strategies: methods following the strategy of divide and conquer, and methods using local search (e.g., NN-descent [3]). Divide-and-conquer methods generally consist of two stages: subdividing the dataset in parts (divide), and creating a kNN graph for each sample followed by merging all subgraphs to form the final kNN graph (conquer).
As foreseeable, the performance and the computational time depend on the number of subdivisions. The most famous approach following the divide and conquer is based on Locality Sensitive Hashing (LSH) projections [9] for creating the approximate kNN graph [24]. The authors in [24] used a spectral decomposition of a low-rank graph matrix that needs much time because it is supervised. Chen et al.[1] follow the same strategy, but they apply recursive Lanczos bisection [1]. They proposed two divide steps: the overlap and the glue method. In the former, the current set is divided in two overlapping subsets, while in the latter, the current set is divided into two disjoint subsets and a third, called gluing set, is used to merge the two resulting disjoint subsets. Wang et al.[21] implemented an algorithm for the creation of an approximate kNN graph through the application in several iterations of the divide-and-conquer strategy. The peculiarity of the method is that the subsets of the dataset elements used during the divide phase are randomly chosen. Repeating several times this process allows to theoretically cover the entire dataset. Our approach differs from these approaches in the way the LSH projections are created. Moreover, the strategy followed for creating the graph is different and more efficient.
Regarding the second strategy (local search), Dong et al.[3] proposed an approach called NN-descent [3], based on the idea that “a neighbour of my neighbour is my neighbour.” For each image descriptor, a random kNN list is created. The algorithm starts searching random pairs on the kNN list, then it calculates the similarity between elements and finally updates the kNN list of these elements. This process continues until the number of updates is smaller than a threshold. Obviously, by increasing the number of neighbours contained in the kNN list it tends to the brute-force approach; therefore, the trade-off between the speed and accuracy performance needs to be correctly evaluated. Park et al.[16], Houle et al.[7] and Debatty et al.[2] proposed variations to the NN-descent, by adapting the basic approach to their specific application domains. Sieranoja et al.[19] proposed a solution, called Random Pair Division, that exploits both the divide and conquer and the NN-descent techniques. The division of the dataset in subsamples is executed through the random selection of two dataset descriptors: if the descriptor to be clusterized is close to the first one, it will be put in the first set, otherwise in the second one. After that, all image descriptors are clustered: if the size of each set is greater than a threshold, the subdivision process continues in the same way only for this large set. The conquer phase is executed through the application of the brute-force approach. In the end, a one-step neighbour propagation is applied for improving the final performance. It also exploits the principle of NN-descent: similar nodes that are not connected will be connected.
The proposed approach uses LSH projections to divide the global descriptors of the dataset in many subsets. We first explain LSH and then detail a fast and efficient solution for kNN graph creation for diffusion in image retrieval.
Locality-Sensitive Hashing (LSH) [9]
is one of the first hashing technique proposed for compression and indexing tasks. After the creation of some projection functions, it allows projection of points close to each other into the same bucket with high probability. It is defined as follows
[22]: a family of hash functions is called -sensitive if, for any two items and , it holds that:if
if
with , , is a distance threshold, and is the hash function. In other words, the hash function must satisfy the property to project “similar” items (with a distance lower than the threshold R) to the same bucket with a probability higher than , and have a low probability (lower than ) to do the same for “dissimilar” items (with distance higher than ).
The hash function used in LSH for Hamming space embedding is a scalar projection:
where
is the feature vector and
is a fixed random vector sampled from an-dimensional isotropic Gaussian distribution
.The hashing process is repeated times, with different Gaussian samples to increase the probability of satisfying the above constraints.Subsequently, different hashing techniques may be implemented. The multi-probe LSH [13] has the objective to reduce the number of hash tables used for the projections, exploiting the fundamental principle of LSH that similar items will be projected in the same buckets or in near buckets with high probability. During the search phase, multi-probe LSH checks also the buckets near the query bucket. In the end, this approach allows to improve the final performance, but it increases the computational time.
Given a dataset , composed by images, and a similarity measure , the kNN graph for is a undirected graph , that contains edges between the nodes and with the value from the similarity measure . The similarity measure can be calculated in different ways related to the topic. In this case we use the cosine similarity as a metric, so the similarity is calculated through the application of the dot product between the image descriptors.
Our approach, called LSH kNN graph, follows the idea to first subdivide the entire dataset in many subsets, based on the concept of similarity between the images contained in the dataset. This process is done through the use of LSH projections and allows creation of a set of buckets from several hash tables . The number of buckets depends on the bits used for the projection () and to the number of the hash tables (): . Each of these buckets will contain the projected elements . The result of this process represents an approximate result because it is not generally possible to project every element the dataset into the same bucket an as all its neighbours. It is therefore necessary to find a trade-off between the number of the buckets for each hash table (), modifying the bits used () for the hashing, and the number of hash tables () used for the projection. For this first task, using a small number of buckets allows to project more data in the same bucket. It reduces the computational time for this phase, but it increases the entire computation of the overall approach. On the other hand, a high number of buckets in each hash table increases the computational time of this step, but it reduces overall computation.
For each bucket containing a subset of the dataset, a brute-force graph with edges is constructed. Applying a brute-force construction on many subsets is faster than apply one time the brute-force on the entire dataset. In the end all the subgraphs need to be merged in the final graph .
Unlike the usual LSH [24] based method, the proposed approach does not follow exactly the divide and conquer strategy. LSH projections are applied for dividing the dataset into subsets, but for reducing the computational time it is preferable to start creating the final graph, instead to create many approximate kNN graph and then merge them using the one-step neighbour propagation algorithm. The number of elements to sort in the kNN list and the number of similarity scores to calculate improves the quality of the final graph and the retrieval accuracy, but also reduces the computational time.
We also propose a multi-probe version, called multi LSH kNN graph, to reduce the number of hash tables used. Unlike the classic multi-probe LSH algorithm [13], in which the system checks neighboring buckets during the search phase, here all the elements are projected in the neighbors buckets during the projection phase, but only in the 1-neighbourhood. This represents the set of buckets differing by one bit to the analysed bucket (i.e. with Hamming distance ). More formally, the elements obtained with the application of the multi-probe LSH are the followings:
Note that the number of neighbours of each bucket scales with the bits used for the projection as: . Even though it increases the final retrieval performance, this approach requires more time for the kNN graph creation than the previous one. To get a good trade-off between the computational time required for the similarity measure calculations and the quality of the final graph, only a percentage of the elements projected on the 1-neighbour buckets are retained. The best trade-off is reached using , which means that each element is also projected randomly in the half of its 1-neighbour buckets.
During the conquer phase, as in the previous LSH kNN graph method, all the pairs of the indexes of the images found in the buckets will be connected through the calculation of the similarity measure.
Previous works have evaluated the methods for creating approximate kNN graphs by checking the number of common edges between the approximate and the exact kNN graph. In our case the kNN graph pipelines are evaluated after the diffusion and retrieval modules. The diffusion approach adopted and the parameters configuration is the same of the work of Iscen et al. [11]. The rationale for this lies in our objective to evaluate how effective (and efficient) are our proposals for the approximate kNN graph creation in terms of retrieval accuracy when diffusion is applied. R-MAC descriptors [11] are as image embeddings in all the experiments for the creation of the kNN graphs.
There are many different image datasets for Content-Based Image Retrieval that are used to evaluate algorithms. The most used are:
Oxford5k [17] containing 5063 images, subdivided in 11 classes. All the images are used as database images and there are 55 query images, which are cropped to make the querying phase more difficult;
Paris6k [18] containing 6412 images, subdivided in 12 classes. All the images are used as database images and there are 55 query images, again cropped;
Oxford105k that is a combination of Oxford5k and 100k distractors from Flickr1M[8].
Mean Average Precision (mAP) is used on all datasets to evaluate the retrieval accuracy. mAP identifies how many elements retrieved at different score thresholds are relevant to the query image, and averages this across all queries. We use distances to compare query images with the database.
It is worth emphasizing that there are a lot of null values in the affinity matrix. In fact, on Oxford5k the approximate kNN graph constructed with LSH kNN graph method has only the of the edges of the brute-force graph. Furthermore, not all the similarity measure are useful for the diffusion process, suggesting to remove or avoid to insert edges with weight less than a threshold (), without jeopardizing the final retrieval performance. Hence, each element of the matrix can be represented as following:
From our experiments, this threshold can be set to . Given the high number of null values in the affinity matrix, sparse matrices can be used to reduce the computational time and still obtain good results also on large datasets. Moreover, considering that the matrix is symmetric, only the upper or lower values of the matrix are needed:
If the similarity value is missing, the row and the column are switched.
Two different types of sparse matrix has been tested: Compressed Row Storage (CRS) format and Coordinate (COO) format [5]. The CRS sparse matrix is composed by three vectors: values (containing the values of the dense matrix different from zero); column indexes (containing the column indexes of the elements contained in the values vector); and row pointers (containing the locations of the values vector that indicate the beginning of a new row). Instead, the COO sparse matrix is composed by three vectors: a vector representing the non-zero elements (the values), the row and the column coordinate of each value contained in the values vector. The second solution is simpler than the first to implement, but it requires more space on disk.
However, using hash tables, it happens that the same edge weight is inserted multiple times. Therefore, every time a new value is inserted in a CRS matrix, checking whether the value is already in the matrix might be a possible solution. Unfortunately, this tends to be a time consuming process. Conversely, using a COO matrix, all the values (including repeated ones) are inserted, but a sorting is performed and duplicates are removed. Applying once the sorting and removing the duplicates is faster than performing times the search, given that sorting has a complexity which is lower than the complexity of the search.
Table 1 reports the retrieval results after diffusion application of different kNN graph techniques. Note that changing the values of LSH ( and ) produces different results. The best configuration is and applying the multi LSH kNN graph approach. the best trade-off between the computational time for kNN graph creation and the final retrieval performance is the LSH kNN graph with and . NN-descent produces good results, but it needs a lot of time for the graph creation (55 seconds). Furthermore, it does not obtain results comparable to the brute-force or the LSH kNN graph method.
Method | LSH projection | kNN graph creation | mAP |
LSH kNN graph () | 0.45 s | 0.52 s | 90.95% |
LSH kNN graph () | 0.4 s | 0.95 s | 88.98% |
multi LSH kNN graph () | 0.29 s | 1.54 s | 91.13% |
NN-descent [3]* | - | 55 s | 83.81% |
RP-div [19] (size = 50)* | - | 1.16 s | 82.68% |
Wang et al.[21]* | - | 1.5 s | 90.60% |
brute-force | - | 1.33 s | 90.79% |
RP-div [19] is very fast but collecting random elements from the dataset for the divide task does not give good results in retrieval after diffusion.
The method implemented by Wang et al.[21] obtains a different result each execution, so the reported performance is the average of ten experiments. The approach is very fast, but did not achieve the best mAP. Note also that the brute-force method is executed on GPU, instead all the other methods are executed on CPU.
We also perform some experiments with regional descriptors (Table 2). The use of regional descriptors demonstrates an improvement on the final performance due to high number of descriptors for each image (usually 21). In this case the total number of descriptors used for the creation of the kNN graph are approximately 100K. Note we omit testing RP-div and NN-descent here due to poor previous accuracy/computation performance.
Method | LSH projection | kNN graph creation | mAP |
LSH kNN graph (=6, L=20) | 9 s | 100 s | 94.67% |
LSH kNN graph (=8, L=10) | 6 s | 45 s | 93.68% |
multi LSH kNN graph (=6, L=2) | 6 s | 350 s | 93.96% |
Wang et al.[21]* | - | 148 s | 91.69% |
brute-force | - | 15816 s | 93.80% |
Table 3 shows the results on the Paris6k dataset, which are similar to those obtained to Oxford5k.
Method | LSH projection | kNN graph creation | mAP |
LSH kNN graph () | 1 s | 0.80 s | 97.01% |
LSH kNN graph () | 0.78 s | 0.28 s | 95.93% |
multi LSH kNN graph () | 0.35 s | 2.28 s | 96.81% |
NN-descent [3] (neighbours = 50)* | - | 60.10 s | 94.24% |
RP-div [19] (size = 50)* | - | 3.63 s | 96.25% |
Wang et al.[21]* | - | 1.95 s | 96.75% |
brute-force | - | 1.81 s | 96.83% |
However, in this case, LSH kNN graph is the fastest approach and also it obtains the best retrieval performance after the application of diffusion. Multi-LSH kNN method obtains a good result, but in more time than the brute-force approach.
Table 4 reports results for the experiments executed on Oxford105k. again RP-div and NN-descent are not tested due to poor trade-off previously obtained.
Method | LSH projection | kNN graph creation | mAP |
LSH kNN graph () | 23 s | 77 s | 92.50% |
LSH kNN graph () | 15 s | 145 s | 90.79% |
multi LSH kNN graph () | 5 s | 420 s | 92.85% |
Wang et al.[21]* | - | 150 s | 91.00% |
brute-force | - | 4733 s | 91.45% |
Increasing the dimension of the dataset illustrates the difference in accuracy and computational time between the proposed approach and brute-force. The proposed approaches obtain better results and trade-offs than other methods. In particular, LSH kNN graph ( and ) achieves 92.50% in only 77s for the graph creation process. The multi LSH kNN graph needs more time than the previous approach, but it reaches the best mAP on this dataset equals of 92.85%.
We presented an algorithm called LSH kNN graph for the creation of an approximate kNN graph exploiting LSH projections. First, the elements of the dataset are subdivided in several subsets using an unsupervised hashing function and, then, for each one of the subsets a subgraph is created applying the brute-force approach. The application of this algorithm with sparse matrices achieves very good results even on datasets that with a large number of images. The proposed methods can generate a kNN graph faster than the brute-force approach and other state-of-the-art approaches, obtaining the same or better accuracy results after diffusion. Furthermore, another version of the algorithm called multi LSH kNN graph was proposed, which uses multi-probe LSH instead of LSH for the subdivision of the elements in the subsets, increasing the quality of the final graph due to the greater number of elements found in the buckets of the hash tables. In future work, we are pursuing the distribution these approaches across several machines to allow processing even larger datasets.
This is work is partially funded by Regione Emilia Romagna under the “Piano triennale alte competenze per la ricerca, il trasferimento tecnologico e l’imprenditorialità”.
Chen, J., Fang, H.r., Saad, Y.: Fast approximate kNN graph construction for high dimensional data via recursive lanczos bisection. Journal of Machine Learning Research
10(Sep), 1989–2012 (2009)Douze, M., Szlam, A., Hariharan, B., Jégou, H.: Low-shot learning with large-scale diffusion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3349–3358 (2018)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing. pp. 604–613. ACM (1998)
Comments
There are no comments yet.