1 Introduction
For a query q, maximum inner product search (MIPS) finds an item that maximizes the inner product ⟨q, x⟩ in a dataset containing n items [18]. MIPS has a number of applications in recommender systems, computer vision and machine learning. Examples include recommendation based on user and item embeddings learned via matrix factorization [11], object matching with visual descriptors [5], and memory network training [3][10]. In practice, it is usually required to find the top-k items having the largest inner product with q. When the dataset is large and the dimension d is high, exact MIPS is usually too costly, and finding approximate MIPS (i.e., items with inner product close to the maximum) suffices for most applications. Therefore, we focus on approximate MIPS in this paper.

Related work. Due to its broad range of applications, many algorithms for MIPS have been proposed. Tree-based methods such as cone tree [18] and PCA tree [2] were used first, but they suffer from poor performance on high-dimensional datasets. Locality sensitive hashing (LSH) based methods, such as ALSH [19], SimpleLSH [17] and Norm-Range LSH [22], transform MIPS into Euclidean or angular similarity search and reuse existing hash functions. LEMP [20] and FEXIPRO [12] target exact MIPS and adopt various pruning rules to avoid unnecessary computation. Maximus [1] shows that the pruning-based methods do not always outperform brute-force linear scan using optimized computation libraries.
ipNSW. In a proximity graph, each item is connected to some items that are most similar to it w.r.t. a given similarity function [8]. A similarity search query is processed by a walk in the graph, which keeps moving towards items that are most similar to the query. Proximity graph based methods achieve excellent recall-time performance (the time taken to reach a given recall for query processing) for Euclidean nearest neighbor search (Euclidean NNS), and a number of variants have been proposed [6, 9, 21]. Among them, the navigable small world graph (NSW) [14] and its hierarchical version (HNSW) [13] represent the state of the art, and we introduce NSW in greater detail in Section 3. Morozov and Babenko [16] showed that NSW also works well for MIPS. They proposed the ipNSW algorithm, which directly uses inner product as the similarity function to construct and search NSW. ipNSW outperforms all existing MIPS algorithms (including those mentioned in the related work) by a large margin in terms of recall-time performance, and the speedup can be an order of magnitude for achieving the same recall [16].
In spite of its excellent performance, there lacks a good understanding of why ipNSW works well for MIPS. Morozov and Babenko [16] proved that a greedy walk in the proximity graph will find the exact MIPS of a query if the graph is the Delaunay graph for inner product. Nevertheless, the ipNSW graph is only an approximation of the Delaunay graph, which contains many more edges than the ipNSW graph. It is not clear how accurately the ipNSW graph approximates the Delaunay graph and how the quality of the approximation affects the performance of ipNSW. Moreover, their theory does not provide insights on how to improve the performance of ipNSW. For proximity graph based similarity search algorithms, a rigorous theoretical justification is usually difficult due to the complexity of real datasets. In this case, an intuitive explanation is helpful if it leads to a better understanding of the algorithm and provides insights for performance improvements.
Contributions.
We make three main contributions in this paper. Firstly, we identify an important property of the MIPS problem — strong norm bias, which means that large-norm items are much more likely to be the result of MIPS. Although it is common sense that MIPS is biased towards large-norm items, what is interesting is the intensity of the norm bias we observed. In the four datasets we experimented on, items ranking top 5% in norm occupy at least 87.5% and as much as 100% of the top-10 MIPS results. We also found that a skewed norm distribution, in which some items have much larger norm than others, is not necessary for the strong norm bias to appear; the large cardinality of modern datasets is also an important reason behind the strong norm bias.
Secondly, we explain the excellent performance of ipNSW as matching the norm bias of the MIPS problem. We found that items with large norm have much higher in-degree than the average in the proximity graph built by ipNSW, and a graph walk spends a dominant portion of its computation on these items. Therefore, ipNSW performs well for MIPS because it effectively avoids unnecessary computation on small-norm items, which are unlikely to be the results of MIPS.
Thirdly and most importantly, we propose the ipNSW+ algorithm, which significantly improves the performance of ipNSW. We found that the norm bias in ipNSW can harm the performance of MIPS by spending computation on many large-norm items that do not have a good inner product with the query. To tackle this problem, we introduce an additional angular proximity graph in ipNSW+ and utilize the fact that items pointing in similar directions are likely to share similar MIPS neighbors. By retrieving the MIPS neighbors of the angular neighbors of the query, ipNSW+ avoids computation on both small-norm items and large-norm items that do not have a good inner product with the query. To our knowledge, ipNSW+ is the first similarity search algorithm that uses two proximity graphs constructed from different similarity functions. Experimental results show that ipNSW+ not only significantly outperforms ipNSW but also provides more robust performance under different data distributions.
2 Norm Bias in MIPS
Table 1: Statistics of the datasets.

Dataset      | # items    | # dimensions
-------------|------------|-------------
Yahoo!Music  | 136,736    | 300
WordVector   | 1,000,000  | 300
ImageNet     | 2,340,373  | 150
Tiny5M       | 5,000,000  | 384
In this section, we show that there exists strong norm bias in the MIPS problem. We also argue that large dataset cardinality contributes to the norm bias.
To find out to what extent norm affects an item's chance of being the result of MIPS, we conducted the following experiment. We used four datasets, i.e., Yahoo!Music, WordVector, ImageNet and Tiny5M. Some statistics of the datasets can be found in Table 1 and more details are given in Section 5. For each dataset, we found the exact top-10 MIPS results (choosing top-10 MIPS is not arbitrary, as it is widely adopted in related works such as ALSH [19], SimpleLSH [17] and QUIP) of 1,000 randomly selected queries using linear scan, which gave us a result set containing 10,000 items (duplicate items exist as an item can be in the results of multiple queries). We also partitioned the items into groups according to their norm, e.g., items ranking top 5% in norm and items ranking top 20%-25% in norm. Finally, for the items in each norm group, we calculated the percentage they occupy in the result set, which is plotted in Figure 1.
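The measurement above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the paper's code: the dataset is a synthetic Gaussian sample and the sizes are toy values, but the linear-scan top-k and the norm-group bookkeeping follow the procedure described.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20000, 64, 10                 # toy sizes; the Table 1 datasets are larger
items = rng.normal(size=(n, d))
queries = rng.normal(size=(100, d))

# Exact top-k MIPS by linear scan: one matrix product, then a partial sort.
scores = queries @ items.T              # (num_queries, n) inner products
topk = np.argpartition(-scores, k, axis=1)[:, :k]

# Partition items by norm and measure the share of the result set held by
# the items ranking top 5% in norm.
norms = np.linalg.norm(items, axis=1)
top5pct = np.argsort(-norms)[: n // 20]
share = np.isin(topk, top5pct).mean()
print(f"top-5%-norm items hold {share:.1%} of the top-{k} MIPS results")
```

Even on isotropic synthetic data, the top-norm group holds far more than its 5% proportional share of the results.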
Figure 1 shows that items with large norm are much more likely to be the result of MIPS. Specifically, items ranking top 5% in norm take up 89.5%, 87.5%, 93.1% and 100% of the ground-truth top-10 MIPS results for Yahoo!Music, WordVector, ImageNet and Tiny5M, respectively. One may conjecture that the norm bias is caused by a skewed norm distribution, in which the top-ranking items have much larger norm than the others. We plot the norm distributions of the datasets in Figure 2, which shows that this conjecture does not hold for Yahoo!Music and Tiny5M, in which most items have a norm close to the maximum. In fact, the 95% percentile of the norm distribution (the norm value below which 95% of the items fall) is only 1.16 times the median norm for Yahoo!Music (1.37 for Tiny5M). Theorem 1 also shows that a skewed norm distribution alone is not enough to explain the strong norm bias we observed.
Theorem 1.
For two independent random vectors x and y in ℝ^d, the entries of x are independent with x_i ~ N(0, σ²) for 1 ≤ i ≤ d, and, with α > 1, the entries of y are also independent with y_i ~ N(0, α²σ²) for 1 ≤ i ≤ d. For a query q, we have P[⟨y, q⟩ > ⟨x, q⟩ | ⟨x, q⟩ ≥ 0, ⟨y, q⟩ ≥ 0] = P(α), a function of α alone.

The proof can be found in the supplementary material. Intuitively, Theorem 1 quantifies how likely a larger norm will result in a larger inner product. As x ~ N(0, σ²I) and y ~ N(0, α²σ²I), the norm of y is roughly α times that of x. We constrain the inner products to be non-negative because negative inner products are not interesting for many practical applications such as recommendation. P(α) is a function of α and we plot its curve in Figure 3 using numerical integration. The results show that a larger norm only brings a modest probability (compared with 0.5) of having a larger inner product. For example, the probability of having a larger inner product is only 0.56 at a moderate value of α. Recall that the 95% percentile norm is only 1.16 times the median for Yahoo!Music. The observed norm bias (items ranking top 5% in norm take up 89.5% of the top-10 MIPS results for Yahoo!Music) is much stronger than what is predicted by the norm distribution, and this is also true for WordVector, ImageNet and Tiny5M.

We find that large dataset cardinality also contributes to the norm bias. Consider an item x with a modest norm and suppose there are m items having larger norm than x in the dataset. Item x only has a probability of ∏_{i=1}^{m} p_i to be the MIPS of a query (if we assume all items are independent), in which p_i = P[⟨x, q⟩ > ⟨y_i, q⟩] and y_i is the i-th item that has larger norm than x. As p_i < 0.5 and m is large for large datasets, this probability is very small. This explanation suggests that the norm bias is stronger for larger datasets even if the norm distribution is the same. To validate this, we uniformly sample the ImageNet dataset and plot the percentage that items ranking top 5% in norm occupy in the top-10 MIPS results in Figure 3. Note that uniform sampling ensures that the shape of the norm distribution is the same across different sampling rates, but a lower sampling rate results in smaller dataset cardinality. The results show that the top-norm items take up a greater portion of the MIPS results under larger dataset cardinality, which validates our analysis. Our explanation justifies the extremely strong norm bias observed on the Tiny5M dataset even though its norm distribution is not skewed. Moreover, this explanation also implies that strong norm bias may be a universal phenomenon for modern datasets as they usually have large cardinality.
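The cardinality effect can be reproduced on synthetic data: uniform subsampling keeps the shape of the norm distribution while shrinking n, and the share of results held by top-norm items grows with n. This is a hedged sketch with toy sizes and our own `top5pct_share` helper, not the paper's experiment code.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 32, 10
base = rng.normal(size=(50000, d))      # fixed norm distribution (chi with d dof)
queries = rng.normal(size=(50, d))

def top5pct_share(items, queries, k=10):
    """Share of the exact top-k MIPS results held by items ranking top 5% in norm."""
    topk = np.argpartition(-(queries @ items.T), k, axis=1)[:, :k]
    norms = np.linalg.norm(items, axis=1)
    return np.isin(topk, np.argsort(-norms)[: len(items) // 20]).mean()

for rate in (0.02, 0.1, 1.0):           # uniform sampling: same distribution, smaller n
    sub = base[rng.random(len(base)) < rate]
    print(f"n={len(sub):>6d}  top-5%-norm share={top5pct_share(sub, queries, k):.3f}")
```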
3 Understanding the Performance of ipNSW
In this section, we briefly introduce the ipNSW algorithm and show that ipNSW has excellent performance because it matches the strong norm bias of the MIPS problem.
3.1 NSW
The query processing and index construction procedures of NSW are shown in Algorithm 1 and Algorithm 2, respectively. In Algorithm 1, a graph walk for a similarity search query q starts at an entry vertex (chosen randomly or deterministically) and keeps probing the neighbors of the unchecked vertex that is most similar to q in the candidate pool. The size of the candidate pool, ef, controls the quality of the search results, and the graph walk is more likely to get stuck at a local optimum under a small ef (a graph walk with ef = 1 is usually called greedy search).
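A minimal version of the graph walk in Algorithm 1 can be written as follows; `graph` is an adjacency dict, `sim` is any similarity function (inner product for ipNSW), and all names are our own choices for this sketch.

```python
import heapq

def graph_walk(graph, sim, q, entry, ef):
    """NSW search sketch (Algorithm 1): keep a pool of the ef most similar
    vertices seen so far; repeatedly expand the most similar unchecked vertex."""
    visited = {entry}
    candidates = [(-sim(q, entry), entry)]   # max-heap on similarity (negated)
    pool = [(sim(q, entry), entry)]          # min-heap holding the ef best results
    while candidates:
        neg_s, v = heapq.heappop(candidates)
        if len(pool) == ef and -neg_s < pool[0][0]:
            break                            # best unchecked vertex is worse than the pool
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            s = sim(q, u)
            if len(pool) < ef or s > pool[0][0]:
                heapq.heappush(candidates, (-s, u))
                heapq.heappush(pool, (s, u))
                if len(pool) > ef:
                    heapq.heappop(pool)      # evict the worst of the pool
    return sorted(pool, reverse=True)        # (similarity, vertex) pairs, best first
```

With ef = 1 this degenerates to greedy search; a larger ef trades time for a lower chance of stopping at a local optimum.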
For index construction, NSW does not require each item to connect to its exact top-M neighbors in the dataset. Items are inserted sequentially into the graph in Algorithm 2, and Algorithm 1 is used to find the approximate top-M neighbors for an item in the current graph. Therefore, constructing NSW is much more efficient than constructing an exact k-nearest neighbor graph (k-NN graph). ipNSW builds and searches the graph using inner product as the similarity function. We omit some details in Algorithm 1 and Algorithm 2 for conciseness; for example, ipNSW actually adopts multiple hierarchical layers of NSW (known as HNSW) to improve performance. Readers may refer to [13] for more details.

3.2 Norm Bias in ipNSW
We built ipNSW graphs for the four datasets in Table 1 and plot the average in-degree for the items in each norm group in Figure 4. The results show that the large-norm items have much higher in-degrees than the average. To be more specific, the average in-degrees for items ranking top 5% in norm are 3.2, 8.0, 11.1 and 19.8 times the dataset average for Yahoo!Music, WordVector, ImageNet and Tiny5M, respectively. This is not surprising, as the large-norm items are more likely to have large inner products with other items, as shown in Section 2. The insertion-based graph construction procedure of ipNSW may also contribute to the skewed in-degree distribution. A new item builds its connections by checking the neighbors of existing items, and the initially inserted items are likely to connect to the large-norm items, which means that graph construction tends to amplify the in-degree skewness. Having large in-degrees means that the large-norm items are well connected in the ipNSW graph, which makes it more likely for a graph walk to reach them.
To better understand a walk in the ipNSW graph, we conducted MIPS using ipNSW for 1,000 randomly selected queries. We recorded the ID of the item whenever an inner product was computed, and plot the percentage of inner product computation conducted on the items in each norm group in Figure 5. The results show that most of the inner product computation was conducted on the large-norm items. For Yahoo!Music, WordVector, ImageNet and Tiny5M, items ranking top 5% in norm take up 80.7%, 93.1%, 88.6% and 100% of the inner product computation, respectively. Compared with the in-degree distributions in Figure 4, the computation distributions are even more biased towards the large-norm items. This suggests that a walk in the ipNSW graph reaches the large-norm items very quickly and keeps moving among these items. With these results, we can conclude that ipNSW is also biased towards the large-norm items, in terms of both connectivity and computation. The norm bias of ipNSW allows it to effectively avoid unnecessary computation on small-norm items that are unlikely to be the result of MIPS. Therefore, ipNSW has excellent performance mainly because it matches the strong norm bias of the MIPS problem.
4 The ipNSW+ Algorithm
In this section, we present the ipNSW+ algorithm, which is motivated by an analysis indicating that the norm bias of ipNSW can lead to inefficient MIPS.
4.1 Motivation
We have shown in Section 3 that ipNSW has a strong norm bias, which helps to avoid computation on small-norm items. However, this norm bias can result in inefficient MIPS, and we illustrate this point with an example in Figure 6, in which y is an MIPS neighbor of x and z is an MIPS neighbor of y. As y and z are the MIPS neighbors of some item, they usually have large norms due to the norm bias of the MIPS problem, but the angles between the vectors are not necessarily small, especially when the norms of y and z are very large. Suppose that x is the query and the graph walk is now at y; in the next step, the graph walk will compute ⟨z, x⟩, but z may not have a good inner product with x due to the large angle between them. This example shows that ipNSW may spend computation on many large-norm items that do not have a good inner product with the query, because the large-norm items are well connected in the ipNSW graph.
The problem of ipNSW is caused by the rule it implicitly adopts — the MIPS neighbor of an MIPS neighbor is also likely to be an MIPS neighbor — which is not necessarily true. To improve ipNSW, we need a new rule that satisfies two requirements. First, it should match the norm bias of the MIPS problem and avoid computation on small-norm items, which ipNSW does well. Second, it should also avoid computation on large-norm items that do not have a good inner product with the query, which is the main problem of ipNSW.
We propose an alternative rule — the MIPS neighbor of an angular neighbor is likely to be an MIPS neighbor — which satisfies both requirements. We define the angular similarity between two vectors x and y as s(x, y) = ⟨x, y⟩ / (‖x‖‖y‖) (angular similarity is usually defined in terms of the angle itself, but the cosine s(x, y) is monotone w.r.t. the true angular similarity and cheaper to compute; thus, we refer to it as angular similarity and use it in the implementation), and say that x is an angular neighbor of q if s(q, x) is large. Specifically, this rule says that for a query q and its angular neighbor x in a dataset, if y is an MIPS neighbor of x in the dataset, then ⟨q, y⟩ is likely to be large. We provide an illustration of this rule in Figure 6. In the figure, y is an MIPS neighbor of x and thus usually has a large norm, meeting the first requirement. The angle between q and y is usually not too large, as x is an angular neighbor of q and the angle between x and y is small, and thus ⟨q, y⟩ is likely to be large, meeting the second requirement. Theorem 2 formally establishes that an MIPS neighbor of an angular neighbor is a good MIPS neighbor under an assumption about y.
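The cosine surrogate can be implemented directly; with item norms precomputed at indexing time, each evaluation costs one inner product. This is a sketch with our own function name and parameters.

```python
import numpy as np

def angular_sim(x, y, y_norm=None):
    """Cosine surrogate for angular similarity: monotone in the true angle and
    cheaper to evaluate than arccos. Passing a precomputed y_norm (stored with
    each indexed item) leaves one inner product per evaluation at query time."""
    if y_norm is None:
        y_norm = np.linalg.norm(y)
    return float(x @ y) / (np.linalg.norm(x) * y_norm)
```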
Theorem 2.
For two vectors q and x in ℝ^d having an angular similarity s = ⟨q, x⟩ / (‖q‖‖x‖), and a third vector y whose entries are independent with y_i ~ N(0, σ²) for 1 ≤ i ≤ d, given ⟨x, y⟩ = t, we have ⟨q, y⟩ | (⟨x, y⟩ = t) ~ N(‖q‖st/‖x‖, σ²‖q‖²(1 − s²)).
The proof can be found in the supplementary material. Suppose q is a query and x is an angular neighbor of q in the dataset, which means that s is large. If x and y are both in the dataset and y is an MIPS neighbor of x, we have ⟨x, y⟩ = t, in which t is large. Given these conditions and using Theorem 2, ⟨q, y⟩ follows a Gaussian distribution. The mean of the distribution (‖q‖st/‖x‖) is large, as both s and t are large. Moreover, the variance of the distribution (σ²‖q‖²(1 − s²)) is small, as s is large. Therefore, there is a good chance that ⟨q, y⟩ is large.

Theorem 2 is also supported by empirical results from the following experiment. We conducted search for 1,000 randomly selected queries on Yahoo!Music and ImageNet. For each query, we found its ground-truth top-10 angular neighbors in the dataset, and for each of these angular neighbors, we found its ground-truth top-10 MIPS neighbors in the dataset. This procedure gives a result set containing 100 candidates (with possible duplication) for each query, which can be used to calculate the recall for top-10 MIPS. The average recalls were 82.67% and 97.22% for Yahoo!Music and ImageNet, respectively, which suggests that aggregating the MIPS neighbors of the angular neighbors can obtain a good recall for MIPS. In contrast, aggregating the MIPS neighbors of the ground-truth top-10 MIPS neighbors of a query only provides a recall of 67.21% for ImageNet.
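The mean and variance in Theorem 2 can be checked numerically: sampling many y and regressing ⟨q, y⟩ on ⟨x, y⟩ should recover slope ‖q‖s/‖x‖ and residual variance σ²‖q‖²(1 − s²). This is our own verification sketch, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma = 50, 1.0
q, x = rng.normal(size=d), rng.normal(size=d)
s = q @ x / (np.linalg.norm(q) * np.linalg.norm(x))   # angular similarity of q and x

# y ~ N(0, sigma^2 I); conditionally on <x, y> = t, Theorem 2 says
# <q, y> ~ N(||q|| s t / ||x||, sigma^2 ||q||^2 (1 - s^2)).
Y = rng.normal(scale=sigma, size=(200_000, d))
t, u = Y @ x, Y @ q
slope = (t @ u) / (t @ t)                 # least-squares slope of <q, y> on <x, y>
resid_var = np.var(u - slope * t)

print(slope, np.linalg.norm(q) * s / np.linalg.norm(x))
print(resid_var, sigma**2 * np.linalg.norm(q)**2 * (1 - s**2))
```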
4.2 ipNSW+
Based on the new rule presented in Section 4.1, we present the query processing procedure of ipNSW+ in Algorithm 3.
To find the angular neighbors of the query, ipNSW+ searches an angular NSW graph (denoted G_a), as NSW provides excellent performance on many similarity search problems. Instead of finding the exact inner product neighbors of the angular search results, ipNSW+ uses their neighbors in the inner product graph (denoted G_ip) as an approximation. After this initialization (lines 2-5 in Algorithm 3), the candidate queue already contains a good portion of the MIPS results for the query, and the time a graph walk on the ipNSW graph would spend finding them is saved. To further refine the results in the candidate queue, a standard graph walk on the inner product graph is conducted in line 6 of Algorithm 3.
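The candidate-generation rule of Algorithm 3 can be illustrated with brute-force stand-ins for the two graph walks (exact angular and exact MIPS neighbors replace the NSW searches; all names and sizes here are ours, and real ipNSW+ uses graph walks instead of full sorts):

```python
import numpy as np

def ipnsw_plus_sketch(items, q, k=10, n_ang=10, n_ip=10):
    """ipNSW+ candidate generation, brute force: pool the (exact) MIPS neighbors
    of the query's (exact) angular neighbors, then refine by inner product with q."""
    unit = items / np.linalg.norm(items, axis=1, keepdims=True)
    ang = np.argsort(-(unit @ (q / np.linalg.norm(q))))[:n_ang]   # angular neighbors
    cand = set()
    for a in ang:                                                 # their MIPS neighbors
        cand.update(np.argsort(-(items @ items[a]))[:n_ip].tolist())
    cand = np.fromiter(cand, dtype=int)
    return cand[np.argsort(-(items[cand] @ q))][:k]               # refine by inner product

rng = np.random.default_rng(3)
items, q = rng.normal(size=(5000, 32)), rng.normal(size=32)
approx = ipnsw_plus_sketch(items, q)
exact = np.argsort(-(items @ q))[:10]
print("recall:", len(set(approx.tolist()) & set(exact.tolist())) / 10)
```

Even this crude version recovers a sizable fraction of the exact top-10 while scoring only the pooled candidates against q in the final step.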
For index construction, ipNSW+ builds the angular graph G_a and the inner product graph G_ip simultaneously, and the items are inserted sequentially (in a random order) into the two graphs. An item x is first inserted into G_a with Algorithm 2 using angular similarity as the similarity function. Then, x is inserted into G_ip, and the neighbors of x in G_ip are found using ipNSW+ (Algorithm 3) instead of ipNSW (Algorithm 1). Empirically, we found that this provides more accurate inner product neighbors for the items and hence better search performance. One subtlety of ipNSW+ is controlling the time spent on angular neighbor search (ANS). Spending too much time on ANS means only a short time is left for result refinement by a graph walk on the inner product graph G_ip, which harms performance. As the time consumption of a graph walk in NSW is controlled by the maximum degree M (the complexity of each step) and the candidate pool size ef (how many steps will be taken), we use smaller M and ef for the angular graph G_a than for the inner product graph G_ip. We show in Section 5 that ipNSW+ using fixed M and ef without dataset-specific tuning already performs significantly better than ipNSW.
The index construction complexity of ipNSW+ is approximately twice that of ipNSW, as ipNSW+ constructs two proximity graphs. The index size of ipNSW+ is less than twice that of ipNSW because we use a small M for the angular graph G_a. These additional costs are not a big problem because the insertion-based graph construction of NSW is efficient and the memory of a single machine is sufficient for most datasets. In return, ipNSW+ provides significantly better recall-time performance than ipNSW (see Section 5), which benefits many applications. Existing proximity-graph-based algorithms use a single proximity graph, and the same similarity function is used for both index construction and query processing. In contrast, ipNSW+ jointly uses two proximity graphs constructed from different similarity functions, which is a new paradigm for proximity-graph-based similarity search and may inspire future research.
5 Experimental Results
Datasets and settings. We used the four datasets listed in Table 1. Yahoo!Music is obtained by conducting ALS-based matrix factorization [24] on the user-item ratings in the Yahoo!Music dataset (https://webscope.sandbox.yahoo.com/catalog.php?datatype=r). We used the item embeddings as dataset items and the user embeddings as queries. WordVector is sampled from the word2vec embeddings released in [15], and ImageNet contains the visual descriptors of the ImageNet images [4]. Tiny5M is sampled from the Tiny80M dataset and contains visual descriptors of the Tiny images (http://horatio.cs.nyu.edu/mit/tiny/data/index.html). Unless otherwise stated, we test the performance of top-10 MIPS and use recall as the performance metric. For top-k MIPS, an algorithm returns only the best k items it finds. Denoting the results an algorithm returns for a query as A and the ground-truth top-k MIPS of the query as G, recall is defined as |A ∩ G| / k. We report the average recall of 1,000 randomly selected queries. We used fixed values of M and ef for the angular graph G_a in ipNSW+ in all experiments, and the parameter configuration of the inner product graph G_ip in ipNSW+ is the same as that of the inner product graph in ipNSW. The experiments were conducted in single-thread mode on a machine with an Intel Xeon E5-2620 CPU (6 cores) and 48 GB memory. For ipNSW+, the reported time includes searching both the angular graph G_a and the inner product graph G_ip. We implemented ipNSW+ by modifying the code of ipNSW and did not introduce extra optimizations to make ipNSW+ run faster.
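The recall metric above is simply the overlap between the returned set and the ground truth, divided by k (a sketch; the function name is ours):

```python
def topk_recall(returned, ground_truth):
    """Recall for top-k MIPS: |A intersect G| / k, where A is the returned id list
    and G the ground-truth top-k id list (both of length k)."""
    k = len(ground_truth)
    return len(set(returned) & set(ground_truth)) / k
```

For example, `topk_recall([3, 7, 9, 1], [7, 1, 5, 3])` returns 0.75, as three of the four ground-truth items were found.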
Direct comparison. We report the recall-time performance of ipNSW and ipNSW+ in Figure 7. We also tested SimpleLSH [17], the state-of-the-art LSH-based method for MIPS. We used the implementation provided in [23] and tuned the parameters following [16]. However, the performance of SimpleLSH is significantly poorer, and plotting its recall-time curve in Figure 7 would make the figure hard to read, so we report its curve in the supplementary material. As an example, SimpleLSH takes 598 ms to reach a recall of 0.732 for WordVector and 1035 ms to reach a recall of 0.735 for ImageNet. This is actually worse than the exact MIPS method FEXIPRO [12], which uses multiple pruning rules to speed up linear scan and takes 20.9 ms, 196.3 ms and 179.5 ms on average per query on Yahoo!Music, WordVector and ImageNet, respectively (we do not report FEXIPRO on Tiny5M as it runs out of memory). FEXIPRO, however, is at least an order of magnitude slower than ipNSW and ipNSW+, as shown in Figure 7, which confirms the finding in [16] that ipNSW outperforms existing algorithms. Importantly, ipNSW+ makes further significant improvements over ipNSW. For example, ipNSW+ reaches a recall of 0.9 at a speed 11 times faster than ipNSW (0.5 ms vs 5.5 ms) on the ImageNet dataset. Even on the Tiny5M dataset, which has the strongest norm bias (items ranking top 5% in norm occupy 100% of the top-10 MIPS results), ipNSW+ still outperforms ipNSW.
More experiments. We conducted this set of experiments on the ImageNet dataset to further examine the performance of ipNSW+. In Figure 8, we compare the recall of ipNSW and ipNSW+ with respect to the number of similarity function evaluations, since similarity function evaluation is usually the most time-consuming part of an algorithm. We count one similarity function evaluation when ipNSW computes the inner product with one item, or when ipNSW+ computes the angular similarity or the inner product with one item. The results show that ipNSW+ needs much less computation than ipNSW to reach the same recall, suggesting that the performance gain of ipNSW+ indeed comes from a better algorithm design. In Figure 8, we also compare ipNSW and ipNSW+ for top-5 MIPS and top-20 MIPS, which shows that ipNSW+ consistently outperforms ipNSW for different k.
One surprising phenomenon is that ipNSW+ provides more robust performance than ipNSW under different transformations of the norm distribution. We created two variants of the ImageNet dataset, ImageNet-A and ImageNet-B, by scaling the items without changing their directions: ImageNet-A and ImageNet-B add 0.18 and 0.36, respectively, to the Euclidean norm of each item. The norm distributions of the transformed datasets can be found in the supplementary material. We define the tailing factor (TF) of a dataset as the ratio between the 95% percentile of the norm distribution and the median norm, and say that the norm distribution is more skewed when the TF is large. The TFs of ImageNet, ImageNet-A and ImageNet-B are 2.05, 1.55 and 1.37, respectively. We report the performance of ipNSW and ipNSW+ on the three datasets in Figure 8. The results show that ipNSW+ has almost identical performance on the three ImageNet variants and consistently outperforms ipNSW. In contrast, the performance of ipNSW varies a lot: the best performance is achieved on ImageNet-B (which has the smallest TF), while the worst performance is observed on the original ImageNet (which has the largest TF). We tried more datasets and an alternative method of scaling the items in the supplementary material, and the results show that ipNSW+ consistently provides more robust performance than ipNSW. Moreover, there is a trend that ipNSW performs better when the TF is small.
We explain this phenomenon as follows. The norm bias in ipNSW is more severe when the norm distribution is more skewed. Therefore, ipNSW computes inner products with more large-norm items that do not have a good inner product with the query, and hence its performance worsens. In contrast, ipNSW+ collects the MIPS neighbors of the angular neighbors, and these neighbors are shown in Theorem 2 to have a good inner product with the query. The stable performance of ipNSW+ indicates that it effectively avoids computing inner products with items that have large norms but are unlikely to be the results of MIPS.
6 Conclusions
In this paper, we identified an interesting phenomenon of the MIPS problem — norm bias, which means that large-norm items are much more likely to be the results of MIPS. We showed that ipNSW achieves excellent performance for MIPS because it also has a strong norm bias: the large-norm items have large in-degrees in the ipNSW graph and the majority of computation is conducted on them. We also proposed the ipNSW+ algorithm, which avoids computation on large-norm items that do not have a good inner product with the query. Experimental results show that ipNSW+ significantly outperforms ipNSW and is more robust to different data distributions.
References
 [1] To index or not to index: optimizing exact maximum inner product search. Cited by: §1.
 [2] (2014) Speeding up the Xbox recommender system using a Euclidean transformation for inner-product spaces. In Proceedings of the 8th ACM Conference on Recommender Systems, pp. 257–264. Cited by: §1.
 [3] (2016) Hierarchical memory networks. arXiv preprint arXiv:1605.07427. Cited by: §1.
 [4] (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §5.
 [5] (2010) Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, pp. 1627–1645. Cited by: §1.
 [6] (2019) Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment 12 (5), pp. 461–474. Cited by: §1.
 [7] (2019) Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment 12 (5), pp. 461–474. Cited by: Algorithm 1.
 [8] (2011) Fast approximate nearest-neighbor search with k-nearest neighbor graph. In Twenty-Second International Joint Conference on Artificial Intelligence. Cited by: §1.
 [9] (2016) FANNG: fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5713–5722. Cited by: §1.
 [10] (2017) Scalable generalized linear bandits: online computation and hashing. In Advances in Neural Information Processing Systems, pp. 99–109. Cited by: §1.
 [11] (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42, pp. 30–37. Cited by: §1.
 [12] (2017) FEXIPRO: fast and exact inner product retrieval in recommender systems. In SIGMOD, pp. 835–850. Cited by: §1, §5.
 [13] (2018) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1, §3.1.
 [14] (2014) Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45, pp. 61–68. Cited by: §1.
 [15] (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §5.
 [16] (2018) Nonmetric similarity graphs for maximum inner product search. In NeurIPS, pp. 4726–4735. Cited by: §1, §5, Algorithm 2.
 [17] (2015) On symmetric and asymmetric LSHs for inner product search. In ICML, pp. 1926–1934. Cited by: §1, §5, footnote 2.
 [18] (2012) Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 931–939. Cited by: §1, §1.
 [19] (2014) Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In NIPS, pp. 2321–2329. Cited by: §1, footnote 2.
 [20] (2015) LEMP: fast retrieval of large entries in a matrix product. In SIGMOD, pp. 107–122. Cited by: §1.
 [21] (2013) Fast neighborhood graph search using cartesian concatenation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2128–2135. Cited by: §1.
 [22] (2018) Norm-ranging LSH for maximum inner product search. In NeurIPS 2018, pp. 2956–2965. Cited by: §1.
 [23] (2017) A greedy approach for budgeted maximum inner product search. In Advances in Neural Information Processing Systems, pp. 5453–5462. Cited by: §5.
 [24] (2013) NOMAD: non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion. CoRR. Cited by: §5.