1 Introduction
Recommendation systems are increasingly prevalent due to content delivery platforms, e-commerce websites, and mobile apps [35]. Classical collaborative filtering algorithms use matrix factorization to identify latent features that describe user preferences and item meta-topics from partially observed ratings [25]. In addition to rating information, many real-world recommendation datasets also have a wealth of side information in the form of graphs, and incorporating this information often leads to performance gains. For example, [31, 45] propose to add a graph regularization to the matrix factorization formulation to exploit additional graph structure, and [26] conduct a co-factorization of the graph and rating matrix. However, each of these only utilizes the immediate neighborhood information of each node in the side information graph. More recently, [3] incorporated graph information when learning features with a Graph Convolution Network (GCN) based recommendation algorithm. GCNs [24] constitute flexible methods for incorporating graph structure beyond first-order neighborhoods, but their training complexity typically scales rapidly with the depth, even with subsampling techniques [11]. Intuitively, exploiting higher-order neighborhood information could benefit the generalization performance, especially when the graph is sparse, which is usually the case in practice. The main caveat of exploiting higher-order graph information is the high computational and memory cost of computing higher-order neighbors, since the number of $d$-hop neighbors typically grows exponentially with $d$.
In this paper, we aim to utilize higher-order graph information without introducing much computational and memory overhead. To achieve this goal, we propose a Graph Deep Neighborhood Aware (Graph DNA) encoding, which approximately captures the higher-order neighborhood information of each node via Bloom filters [4]. Bloom filters encode neighborhood sets as $c$-dimensional 0/1 vectors, where $c \ll n$ for a graph with $n$ nodes, and approximately preserve membership information. This encoding can then be combined with both graph-regularized and feature-based collaborative filtering algorithms, with little computational and memory overhead. In addition to computational speedups, we find that Graph DNA achieves better performance over competitors, which we hypothesize is due to the unique nature of Graph DNA and its connection to the shortest path length distance. We make this connection precise with theoretical bounds in Section 2.2. We show that our Graph DNA encoding can be used with several collaborative filtering algorithms: graph-regularized matrix factorization with explicit and implicit feedback [45, 31], co-factoring [26], and GCN-based recommendation systems [28]. In some cases, using information from deeper neighborhoods yields a 15x increase in performance, with graph DNA encoding yielding a 6x speedup compared to directly using powers of the graph adjacency matrix.
Related Work
Matrix factorization has been used extensively in recommendation systems with both explicit [25] and implicit [21] feedback. Such methods compute low-dimensional user and item representations; their inner product approximates the observed (or to-be-predicted) entry in the target matrix. To incorporate graph side information in these systems, [31, 45] used a graph Laplacian based regularization framework that forces a pair of node representations to be similar if they are connected via an edge in the graph. In [43], this was extended to the implicit feedback setting. [26] proposed a method that incorporates first-order information of the rating bipartite graph into the model by considering item co-occurrences. More recently, GCMC [3] used a GCN approach performing convolutions on the main bipartite graph by treating the first-order side graph information as features, and [28] proposed combining GCNs and RNNs for the same task.
Methods that use higher-order graph information are typically based on taking random walks on the graphs [16]. [22] extended this approach to include graph side information in the model. Finally, the PageRank [29] algorithm can be seen as computing the steady state distribution of a Markov network, and similar methods for recommender systems were proposed in [1, 41].
For a complete list of related works on representation learning on graphs, we refer the interested reader to [18]. For the collaborative filtering setting, [28, 3] use Graph Convolutional Neural Networks [14], but with some modifications. Standard GCN methods without substantial modifications cannot be directly applied to collaborative filtering rating datasets, including well-known approaches like GCN [24] and GraphSage [17], because they are intended to solve semi-supervised classification problems over graphs with node features. PinSage [42] is the GraphSage extension to a non-personalized graph-based recommendation algorithm, but it is not meant for collaborative filtering problems. GCMC [3] extends GCN to collaborative filtering, albeit in a less scalable way than [42]. Our Graph DNA scheme can be used to obtain graph features in these extensions. In contrast to the above-mentioned methods involving GCNs, we do not use any loss function to train our graph encoder. This property makes our graph DNA suitable for both transductive and inductive problems.
Bloom filters have been used in machine learning for multi-label classification [12], and for hashing deep neural network model representations [36, 19, 13]. However, to the best of our knowledge, they have not been used to encode graphs, nor has this encoding been applied to recommender systems.

2 Methodology
We consider the recommender system problem with a partially observed rating matrix $R$ and a graph $G$ that encodes side information. In this section, we introduce the Graph DNA algorithm for encoding deep neighborhood information in $G$. In the next section, we will show how this encoded information can be applied to various graph-based recommender systems.
2.1 Bloom Filter
The Bloom filter [4] is a probabilistic data structure designed to represent a set of elements. Thanks to their space efficiency and simplicity, Bloom filters are applied in many real-world applications such as database systems [5, 10]. A Bloom filter consists of $k$ independent hash functions $h_1, \dots, h_k$, and a Bloom filter of size $c$ can be represented as a length-$c$ bit array. More details about Bloom filters can be found in [7]. Here we highlight a few properties of Bloom filters that are essential to our graph DNA encoding:
• Space efficiency: a classic Bloom filter uses $1.44 \log_2(1/\epsilon)$ bits of space per inserted key, where $\epsilon$ is the false positive rate associated with the Bloom filter.

• Support for the union operation of two Bloom filters: the Bloom filter for the union of two sets can be obtained by performing a bitwise OR operation on the underlying bit arrays of the two Bloom filters.

• Size estimation from the number of nonzeros in the underlying bit array: given a Bloom filter representation of a set $S$, the number of elements of $S$ can be estimated as $|S| \approx -\frac{c}{k} \ln\!\left(1 - \frac{z}{c}\right)$, where $z$ is the number of nonzero elements in the bit array. As a result, the number of common nonzero bits in the Bloom filters of two sets $S_1$ and $S_2$ can be used as a proxy for $|S_1 \cap S_2|$.
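As an illustration of these three properties, the following is a minimal pure-Python Bloom filter sketch. It is not the scalable library [2] used in our implementation; the class name, default sizes, and the seeded SHA-256 hashing are illustrative choices only.

```python
import hashlib
import math

class BloomFilter:
    """Minimal Bloom filter sketch: c bits, k hash functions (illustrative)."""

    def __init__(self, c=64, k=3):
        self.c, self.k = c, k
        self.bits = [0] * c

    def _positions(self, item):
        # Derive k bit positions from k seeded hashes of the item.
        for seed in range(self.k):
            h = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.c

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May return false positives, never false negatives.
        return all(self.bits[p] for p in self._positions(item))

    def union(self, other):
        # Union of two sets = bitwise OR of their bit arrays.
        out = BloomFilter(self.c, self.k)
        out.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return out

    def estimate_size(self):
        # |S| ~= -(c/k) * ln(1 - z/c), where z = number of set bits.
        z = sum(self.bits)
        return -(self.c / self.k) * math.log(1 - z / self.c)
```

Inserting a key sets at most $k$ bits, membership queries re-hash the key, and the union never needs access to the original sets, which is the property the encoding below relies on.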
2.2 Graph DNA Encoding Via Bloom Filters
Now we introduce our Graph DNA encoding. The main idea is to approximately encode a deep (multi-hop) neighborhood aware embedding for each node in the graph using Bloom filters, which helps avoid computationally expensive graph adjacency matrix multiplications. In Graph DNA, we maintain $n$ Bloom filters $B_1, \dots, B_n$, one per graph node, all sharing the same $k$ hash functions. The role of $B_i$ is to store the deep neighborhood information of the $i$-th node. Taking advantage of the union operation of Bloom filters, one node's neighborhood information can be propagated to its neighbors in an iterative manner using gossip algorithms [34]. Initially, each $B_i$ contains only the node itself. At each iteration, $B_i$ is updated by taking the union with the Bloom filters of node $i$'s immediate neighbors. By induction, after $d$ iterations, $B_i$ represents the neighborhood $\{j : s(i, j) \le d\}$, where $s(i, j)$ is the shortest path distance between nodes $i$ and $j$ in $G$. As the last step, we stack the bit-array representations of all Bloom filters to form a sparse matrix $\mathbf{B} \in \{0, 1\}^{n \times c}$, where the $i$-th row of $\mathbf{B}$ is the bit representation of $B_i$. As a practical measure, to prevent over-saturation of the Bloom filters of popular nodes, we add a hyperparameter $\theta$ to control the maximum saturation level allowed for the Bloom filters; this also prevents hub nodes from dominating the graph DNA encoding. The pseudocode for the proposed encoding algorithm is given in Algorithm 1. We use graph DNA-$d$ to denote the graph encoding obtained after applying Algorithm 1 for $d$ iterations. We also give a simple example illustrating how a graph is encoded into Bloom filter representations in Figure 1. Our usage of Bloom filters is very different from previous works [30, 33, 37], which use Bloom filters for standard hashing, unrelated to graph encoding.
It is intuitive that the number of 1-bits in common between two Bloom filters should be closely related to the size of the intersection of their neighborhoods. However, there may also be false positives in the bit representations. We precisely control the size of such false positives and the number of common bits in the following theorem, which applies to Bloom filters without the max saturation threshold $\theta$.
Theorem 1.
Suppose that the Bloom filters have $c$ bits and the $k$ hash functions are independent for all nodes. Consider two nodes $i, j$, their $d$-hop neighborhoods $\mathcal{N}_d(i), \mathcal{N}_d(j)$, and their depth-$d$ Bloom filters $B_i, B_j$, respectively. Let $Z$ be the number of common 1-bits in the Bloom filters of $i$ and $j$ (the inner product of the vectorized Bloom filters, $\mathbf{b}_i^\top \mathbf{b}_j$). There exist universal constants $C_1, C_2$ such that for any $\delta > 0$, with probability at least $1 - \delta$,

(1)

where $\triangle$ denotes the symmetric difference. Furthermore, for any $\epsilon > 0$ there exists a constant $C_3$ such that if $c$ is large enough, then

(2)
This theorem is a corollary of the more precise Theorem 2, which is stated in the Appendix. In order to establish these results, we provide Lemma 1, which demonstrates that the bits of Bloom filters are negatively associated (basic properties of negative association can be found in [15, 23]) and that this property is preserved under bitwise OR and AND operations on independent Bloom filters. As a result, $Z$ enjoys Chernoff–Hoeffding bounds, and the result follows by analyzing its expectation.
Remark 1.
When the neighborhoods have no intersection, $Z$ consists only of false-positive collisions and approaches $0$ when the number of bits in the Bloom filters is taken to be large enough, by (1).
Remark 2.
Graph DNA encodes deep neighborhood information efficiently: for any two nodes whose shortest path distance is at most $2d$, we only need to run Algorithm 1 for $d$ iterations before they share common bits. For example, in Figure 2, two nodes are 6 hops apart on the shortest path, but they start to share bits in their representations after 3 iterations, because the information of the node midway between them propagates to both of them after exactly 3 iterations. Theorem 1 and the remarks that follow it demonstrate that by increasing the number of hash functions $k$ and the number of bits $c$ in the Bloom filters, the number of common 1-bits becomes an accurate surrogate for the neighborhood intersection size $|\mathcal{N}_d(i) \cap \mathcal{N}_d(j)|$.
The Bloom filter matrix $\mathbf{B}$ can also be viewed as the adjacency matrix of a bipartite graph between the $n$ nodes of the original graph and $c$ meta-nodes corresponding to Bloom filter bits. In this view, nodes $i$ and $j$ have a bit in common in their Bloom filter representations if they are both connected to at least one common meta-node. This property saves memory and time required for graph encoding, allowing us to use $\mathbf{B}$ instead of a power of the adjacency matrix in graph Laplacian regularization methods [31], and to use $\mathbf{B}$ as side features in graph convolutional network based matrix factorization algorithms [28, 3], with little computational and memory overhead. We elaborate on this in the following section.
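Given the stacked bit matrix, the common-bit counts for all node pairs can be read off from a single matrix product. A small dense NumPy sketch (the function name is ours; a real implementation would keep $\mathbf{B}$ sparse):

```python
import numpy as np

def neighborhood_overlap(B):
    """Z[i, j] = number of common 1-bits between rows i and j of the n x c
    graph-DNA matrix B: the surrogate (up to false positives) for the size
    of the intersection of the depth-d neighborhoods of nodes i and j."""
    B = np.asarray(B, dtype=np.int64)
    return B @ B.T
```

For example, rows `[1,1,0,0]` and `[0,1,1,0]` share exactly one set bit, so the corresponding entry of `Z` is 1, while disjoint rows give 0.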
3 Collaborative Filtering with Graph DNA
Suppose we are given a sparse rating matrix $R$ with $m$ users and $n$ items, and a graph $G$ encoding relationships between users. For simplicity, we do not assume a graph on the items, though including one is straightforward.
3.1 Graph Regularized Matrix Factorization
The objective function of Graph Regularized Matrix Factorization (GRMF) [8, 31, 45] is:
$$\min_{U, V} \; \sum_{(i, j) \in \Omega} \left(R_{ij} - u_i^\top v_j\right)^2 + \lambda \left(\|U\|_F^2 + \|V\|_F^2\right) + \mu \cdot \mathrm{tr}\!\left(U^\top L U\right), \qquad (3)$$

where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ are the embeddings associated with users and items respectively, $\Omega$ is the set of observed entries, $\mathrm{tr}(\cdot)$ is the trace operator, $\lambda, \mu$ are tuning coefficients, and $L$ is the Laplacian of $G$.
The last term is called graph regularization; it encourages similar nodes (as measured by edge weights in $G$) to have similar embeddings. One naive way [9] to extend this to higher-order graph regularization is to replace the graph $G$ with its $K$-th power $G^K$ and then use the Laplacian of $G^K$ in place of $L$ in (3). Computing $G^K$ even for small $K$ is computationally infeasible for most real-world applications, and we quickly lose the sparsity of the graph, leading to memory issues. Sampling or thresholding can mitigate the problem but suffers from performance degradation.
In contrast, our graph DNA from Algorithm 1 does not suffer from any of these issues. Theorem 1 implies that the space complexity of our method is only of order $O(n \log n)$ for a graph with $n$ nodes, instead of the $O(n^2)$ required by dense graph powers. The reduced number of nonzero elements when using graph DNA leads to a significant speedup in many cases.
We can easily use graph DNA in GRMF as follows: we treat the $c$ Bloom filter bits as new pseudo-nodes and add them to the original graph $G$, giving $m + c$ nodes in a modified graph $\hat{G}$ that connects each user to the bits set in its DNA encoding:

$$\hat{G} = \begin{bmatrix} G & \mathbf{B} \\ \mathbf{B}^\top & 0 \end{bmatrix}. \qquad (4)$$

To account for the new nodes, we expand $U$ to $\hat{U} \in \mathbb{R}^{(m + c) \times r}$ by appending parameters for the meta-nodes. The objective function for GRMF with graph DNA is the same as (3), except that $U$ and $L$ are replaced by $\hat{U}$ and the Laplacian of $\hat{G}$. At the prediction stage, we discard the meta-node embeddings.
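A dense NumPy sketch of this construction and of the resulting objective follows. The block structure of $\hat{G}$, the unit edge weights between users and meta-nodes, and the function names are our assumptions; a real implementation would use sparse matrices throughout.

```python
import numpy as np

def augmented_laplacian(G, B):
    """Build the modified graph G_hat that joins the m users to the c
    Bloom-filter meta-nodes, and its graph Laplacian.

    G: (m, m) user adjacency matrix.  B: (m, c) graph-DNA bit matrix."""
    m, c = B.shape
    G_hat = np.block([[G, B],
                      [B.T, np.zeros((c, c))]])   # bipartite user <-> meta-node edges
    L_hat = np.diag(G_hat.sum(axis=1)) - G_hat    # L = D - A
    return G_hat, L_hat

def grmf_objective(mask, R, U_hat, V, lam, mu, L_hat):
    """Objective (3) with U, L replaced by the expanded U_hat and L_hat.
    mask is the 0/1 indicator of observed entries of R."""
    U = U_hat[: R.shape[0]]                       # meta-node rows are not used for prediction
    err = mask * (R - U @ V.T)                    # squared loss on observed entries
    return ((err ** 2).sum()
            + lam * ((U_hat ** 2).sum() + (V ** 2).sum())
            + mu * np.trace(U_hat.T @ L_hat @ U_hat))
```

Since $\hat{L}$ is a graph Laplacian, its rows sum to zero and the trace term is nonnegative, so the objective stays well defined after the augmentation.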
For implicit feedback data, when $R$ is a 0/1 matrix, weighted matrix factorization is a widely used algorithm [21, 20]. The only difference is that the squared loss in (3) is replaced by the weighted loss $\sum_{i, j} w_{ij} (R_{ij} - u_i^\top v_j)^2$, where the weight $w_{ij}$ on zero entries is a hyperparameter reflecting the lower confidence in them. In this case, we can apply the graph DNA encoding exactly as before. We also describe how to apply graph DNA to CoFactor [38, 26] and Graph Convolutional Matrix Completion [3] in the Appendix.
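A minimal dense sketch of this weighted loss (the uniform confidence weight `rho` on zero entries and the function name are illustrative choices):

```python
import numpy as np

def weighted_mf_loss(R, U, V, rho):
    """Weighted matrix factorization loss for implicit feedback:
    unit weight on observed (R_ij = 1) entries, weight rho on zeros."""
    P = U @ V.T
    W = np.where(R == 1, 1.0, rho)   # rho < 1 down-weights unobserved zeros
    return (W * (R - P) ** 2).sum()
```

The graph (or graph DNA) regularization terms are added to this loss exactly as in (3).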
4 Experiments
We show that our proposed Graph DNA encoding technique can improve the performance of four popular graph-based recommendation algorithms: graph-regularized matrix factorization, co-factorization, weighted matrix factorization, and GCN-based graph convolutional matrix completion. All experiments except the GCN ones are conducted on a server with an Intel Xeon E5-2699 v3 @ 2.30GHz CPU and 256 GB RAM. The GCN experiments are conducted on Google Cloud with an Nvidia V100 GPU.
Simulation Study
We first simulate a user/item rating dataset with a user graph as side information, generate its graph DNA, and use it in a downstream task: matrix factorization.

We randomly generate user and item embeddings from standard Gaussian distributions, and construct an Erdős–Rényi random graph over the users. User embeddings are then propagated along the graph following the simulation algorithm in the Appendix: at each propagation step, each user's embedding is updated to the average of its current embedding and its neighbors' embeddings. Based on the user and item embeddings after several iterations of propagation, we generate the underlying rating for each user-item pair as the inner product of their embeddings, and then sample a small portion of the dense rating matrix as training and test sets. We implement our graph DNA encoding algorithm in Python, using a scalable Bloom filter library [2] to generate the bit matrix $\mathbf{B}$. We adapt the GRMF C++ code to solve the objective function of GRMF_DNA-$K$ with our Bloom filter enhanced graph $\hat{G}$. We compare the following variants:
• MF: classical matrix factorization with $\ell_2$ regularization and no graph information.

• GRMF_$G^K$: GRMF with $\ell_2$ regularization, using the graph powers $G$, $G^2$, …, $G^K$ [9].

• GRMF_DNA-$K$: GRMF with $\ell_2$ regularization, but using our proposed graph DNA-$K$ encoding.
We report the prediction performance as Root Mean Squared Error (RMSE) on test data. All results are reported on the test set, with all relevant hyperparameters tuned on a held-out validation set. To accurately measure how large the relative gain is from using deeper information, we introduce a new metric called Relative Graph Gain (RGG) for a method using graph information, defined as:

$$\text{RGG} \,(\%) = \left( \frac{\text{RMSE}_{\text{MF}} - \text{RMSE}_{\text{method}}}{\text{RMSE}_{\text{MF}} - \text{RMSE}_{\text{GRMF}\_G}} - 1 \right) \times 100, \qquad (5)$$

where RMSE is measured for the same method with different graph information. This metric is 0 when only first-order graph information is utilized, and is only defined when the denominator is positive.
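The metric is a one-line computation; a small helper (the function name is ours):

```python
def relative_graph_gain(rmse_mf, rmse_g1, rmse_method):
    """% RGG of a graph-based method: its RMSE gain over plain MF, relative
    to the gain of the first-order method GRMF_G (0 for GRMF_G itself)."""
    denom = rmse_mf - rmse_g1
    if denom <= 0:
        raise ValueError("RGG is undefined when GRMF_G does not beat MF")
    return ((rmse_mf - rmse_method) / denom - 1.0) * 100.0
```

Plugging in the synthetic-data column of Table 1 for GRMF_$G^2$ (MF 2.9971, GRMF_$G$ 2.7823, GRMF_$G^2$ 2.6543) gives roughly 59.59%, matching the reported value.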
In Table 1, we can easily see that using a deeper neighborhood helps recommendation performance on this synthetic dataset: graph DNA-3's gain is 166% larger than that of using the first-order graph $G$. We also see the performance gain grow with the encoding depth up to the number of propagation steps used to create the dataset, which is expected.
Graph Regularized Matrix Factorization for Explicit Feedback
Next, we show that graph DNA can improve the performance of GRMF for explicit feedback. We conduct experiments on two real datasets: Douban [27] and Flixster [44]. Both contain explicit feedback with ratings from 1 to 5. Douban has 129,490 users and 58,541 items; Flixster has 147,612 users and 48,794 items. Both datasets have a graph defined over the respective sets of users.
We preprocessed Douban and Flixster following the same procedure as in [31, 39]. The experimental setup and comparisons are almost identical to the synthetic data experiment above. Due to the exponential growth in the number of nonzero elements of the graph powers as we go deeper (see the table of nonzero counts in the Appendix), we are unable to run the full GRMF_$G^K$ for larger $K$ on these datasets. In fact, even GRMF_$G^3$ is too slow, so we thresholded $G^3$ by only considering entries whose values are equal to or larger than 4. For the Bloom filters, we set a false positive rate of 0.1 and a capacity of 500, which determines the bit-array length $c$.
We can see from Table 1 that deeper graph information always helps. For Douban, graph DNA-3 is most effective, giving a relative graph gain of 82.79%, compared to only about a 2% gain when using $G^2$ naively (and no gain from $G^3$). Interestingly, for Flixster, using $G^2$ is better than using $G^3$, yet graph DNA-3 and DNA-4 still yield large performance improvements over both, lending credence to the implicit regularization property of graph DNA. For a fixed-size Bloom filter, the computational complexity of graph DNA scales linearly with the depth $d$, as compared to exponentially for GRMF_$G^d$. We measure the encoding speed in Table 2. The memory cost after hashing is only a fraction of that of the graph powers. Such low memory and computational complexity allow us to scale to larger $d$ than the baseline methods.
Table 1: Test RMSE and % Relative Graph Gain (RGG) on the Synthetic, Douban, and Flixster datasets (dashes denote not run or not applicable).

| Method | Synthetic RMSE | % RGG | Douban RMSE | % RGG | Flixster RMSE | % RGG |
| MF | 2.9971 | – | 7.3107 | – | 8.8111 | – |
| GRMF_G | 2.7823 | 0 | 7.2398 | 0 | 8.8049 | 0 |
| GRMF_G² | 2.6543 | 59.5903 | 7.2381 | 2.3977 | 8.7849 | 322.5806 |
| GRMF_G³ | 2.5687 | 99.4413 | 7.2432 | −4.7954 | 8.7932 | 188.7097 |
| GRMF_G⁴ | 2.5562 | 105.2607 | – | – | – | – |
| GRMF_G⁵ | 2.4853 | 138.2682 | – | – | – | – |
| GRMF_G⁶ | 2.4852 | 138.3147 | – | – | – | – |
| GRMF_DNA-1 | 2.4303 | 163.8734 | 7.2191 | 29.1960 | 8.8013 | 58.0645 |
| GRMF_DNA-2 | 2.4510 | 154.2365 | 7.2359 | 5.5007 | 8.8007 | 67.7419 |
| GRMF_DNA-3 | 2.4247 | 166.4804 | 7.1811 | 82.7927 | 8.7383 | 1074.1935 |
| GRMF_DNA-4 | 2.4466 | 156.2849 | 7.1971 | 60.2257 | 8.7122 | 1495.1613 |
| CoFactor_G | – | – | 7.2743 | 0 | 8.7957 | 0 |
| CoFactor_DNA-3 | – | – | 7.2623 | 32.9670 | 8.7354 | 391.5584 |
Table 2: Graph statistics and graph DNA encoding time (seconds).

| Dataset | Number of Nodes | Graph Density | DNA-1 | DNA-2 | DNA-3 | DNA-4 |
| Douban | 129,490 | 0.0102% | 132.2717 | 266.3740 | 403.9747 | 580.1547 |
| Flixster | 147,612 | 0.0117% | 157.3103 | 317.7706 | 482.0360 | 686.8048 |
Table 3: Implicit feedback results on Douban and Flixster.

| Dataset | Method | MAP | HLU | P@1 | P@ | N@1 | N@ |
| Douban | GRWMF_G | 8.340 | 13.033 | 14.944 | 10.371 | 14.944 | 12.564 |
| Douban | GRWMF_DNA-3 | 8.400 | 13.110 | 14.991 | 10.397 | 14.991 | 12.619 |
| Flixster | GRWMF_G | 10.889 | 14.909 | 12.303 | 7.9927 | 12.303 | 12.734 |
| Flixster | GRWMF_DNA-3 | 11.612 | 15.687 | 12.644 | 8.1583 | 12.644 | 13.399 |
Table 4: GCMC results: test RMSE, training time per epoch, % RGG, and speedup relative to GCMC_G.

| Dataset | Method | Test RMSE | Time/epoch (secs) | % RGG | Speedup |
| Douban | SRGCNN (reported by [3]) | 8.0100 | – | – | – |
| Douban | GCMC | 7.3109 ± 0.0150 | 0.0410 | – | 9.72x |
| Douban | GCMC_G | 7.3698 ± 0.0737 | 0.3985 | N/A | 1.00x |
| Douban | GCMC_G² | 7.3123 ± 0.0139 | 0.4221 | N/A | 0.94x |
| Douban | GCMC_DNA-2 | 7.3117 ± 0.0129 | 0.1709 | N/A | 2.33x |
| Flixster | SRGCNN (reported by [3]) | 9.2600 | – | – | – |
| Flixster | GCMC | 9.2614 ± 0.0578 | 0.0232 | – | 13.65x |
| Flixster | GCMC_G | 9.2374 ± 0.1045 | 0.3166 | 0 | 1.00x |
| Flixster | GCMC_G² | 8.9344 ± 0.0333 | 0.3291 | 1262.4999 | 0.96x |
| Flixster | GCMC_DNA-2 | 8.9536 ± 0.0770 | 0.0524 | 1182.4999 | 6.04x |
| Yahoo Music | SRGCNN (reported by [3]) | 22.4000 | – | – | – |
| Yahoo Music | GCMC | 22.6697 ± 0.3530 | 0.0684 | – | 1.75x |
| Yahoo Music | GCMC_G | 21.3672 ± 0.4190 | 0.1198 | 0 | 1.00x |
| Yahoo Music | GCMC_G² | 20.2189 ± 0.8664 | 0.1177 | 88.1612 | 1.02x |
| Yahoo Music | GCMC_DNA-2 | 19.3879 ± 0.2874 | 0.0896 | 151.9616 | 1.34x |
Co-Factorization with Graph Information for Explicit Feedback

We show that our graph DNA can improve CoFactor [38, 26] as well. The results are in Table 1: applying DNA-3 to the CoFactor method improves performance on both datasets, more so for Flixster. This is consistent with our observations for GRMF: deep graph information is more helpful for Flixster than for Douban. The details of applying graph DNA to CoFactor are in the Appendix.
Graph Regularized Weighted Matrix Factorization for Implicit Feedback

We follow the same procedure as in [40], setting ratings of 4 and above to 1 and the rest to 0. We compare the baseline graph-based weighted matrix factorization [21, 20] with our proposed weighted matrix factorization with DNA-3. We do not compare with Bayesian personalized ranking [32] or the recently proposed SQL-Rank [40], as they cannot easily utilize graph information.
The results are summarized in Table 3, with experimental details in the Appendix. Again, DNA-3 achieves better prediction results than the baseline in terms of every single metric on both the Douban and Flixster datasets.
Graph Convolutional Matrix Factorization

Graph Convolutional Matrix Completion (GCMC) is a graph convolutional network (GCN) based geometric matrix completion method [3]. In [3], the side graphs over users and items are represented as adjacency matrices, and these one-hot encodings are treated as features for the nodes in the graph; convolutions of these features are performed on the bipartite rating graph. We find in our experiments that using these one-hot encodings of the graph as features is an inferior choice, both in terms of performance and speed. To capture higher-order side graph information, it is better to use a higher power of the adjacency matrix as features; better still, we can use graph DNA to efficiently encode and store the higher-order information before feeding it into GCMC. The exact way graph DNA is used here is detailed in the Appendix. We use the same split of three real-world datasets and follow the exact procedures in [3, 28]. We tuned hyperparameters on a validation dataset and report the best test results found within 200 epochs using the optimal parameters. We repeated each experiment 6 times and report the mean and standard deviation of the test RMSE. After some tuning, we use a Bloom filter capacity of 10 for Douban and 60 for Flixster, as the latter has a much denser second-order graph. With a false positive rate of 0.1, this gives 96-bit Bloom filters for Douban and 960-bit Bloom filters for Flixster, so the feature dimension is reduced from 3000 to 96 and 960 respectively when using our graph DNA-2, which leads to a significant speedup. The original GCMC method does not scale well beyond 3000-by-3000 rating matrices with user and item side graphs, as it requires using the normalized adjacency matrix as user/item features. PinSage [42], while scalable, does not utilize the user/item side graphs; furthermore, it is not feasible to use $n$-dimensional features for the nodes, where $n$ is the number of nodes in the side graphs. By contrast, our method only requires $c$-dimensional features with $c \ll n$. We can see from Table 4 that we outperform the GCN-based methods of [3] and [28] in terms of both speed and performance by a large margin.

Speed Comparisons
Finally, we compare the speedup obtained by graph DNA-$d$ over GRMF with $G^d$. Since both algorithms scale with the number of edges in the constructed graph, the Bloom filter based method scales substantially better than computing and using the graph powers, as shown in Figure 3.
5 Conclusion
In this paper, we proposed Graph DNA, a deep neighborhood aware encoding scheme for collaborative filtering with graph information. We make use of Bloom filters to incorporate higher-order graph information without the need to explicitly minimize a loss function. The resulting encoding is extremely space- and computation-efficient, and lends itself well to multiple algorithms that make use of graph information, including Graph Convolutional Networks. Experiments show that Graph DNA encoding outperforms several baseline methods on multiple datasets in both speed and performance.
References
 Abbassi and Mirrokni [2007] Zeinab Abbassi and Vahab S Mirrokni. A recommender system based on local random walks and spectral methods. In Proceedings of the 9th WebKDD and 1st SNAKDD 2007 workshop on Web mining and social network analysis, pages 102–108. ACM, 2007.
 Almeida et al. [2007] Paulo Sérgio Almeida, Carlos Baquero, Nuno Preguiça, and David Hutchison. Scalable bloom filters. Information Processing Letters, 101(6):255–261, 2007.
 Berg et al. [2017] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017.
 Bloom [1970] Burton H Bloom. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
 Borthakur et al. [2011] Dhruba Borthakur, Jonathan Gray, Joydeep Sen Sarma, Kannan Muthukkaruppan, Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov, Aravind Menon, Samuel Rash, et al. Apache hadoop goes realtime at facebook. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 1071–1080. ACM, 2011.

 Breese et al. [1998] John S Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann Publishers Inc., 1998.
 Broder and Mitzenmacher [2004] Andrei Broder and Michael Mitzenmacher. Network applications of bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2004.
 Cai et al. [2011] Deng Cai, Xiaofei He, Jiawei Han, and Thomas S Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
 Cao et al. [2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pages 891–900. ACM, 2015.
 Chang et al. [2008] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.

 Chen et al. [2018] Jianfei Chen, Jun Zhu, and Le Song. Stochastic training of graph convolutional networks with variance reduction. In International Conference on Machine Learning, pages 941–949, 2018.
 Cisse et al. [2013] Moustapha M Cisse, Nicolas Usunier, Thierry Artieres, and Patrick Gallinari. Robust bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems, pages 1851–1859, 2013.
 Courbariaux et al. [2015] Matthieu Courbariaux, Yoshua Bengio, and JeanPierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems, pages 3123–3131, 2015.
 Defferrard et al. [2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
 Dubhashi and Ranjan [1998] Devdatt Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. Random Structures & Algorithms, 13(2):99–124, 1998.
 Gori et al. [2007] Marco Gori, Augusto Pucci, V Roma, and I Siena. Itemrank: A randomwalk based scoring algorithm for recommender engines. In IJCAI, volume 7, pages 2766–2771, 2007.
 Hamilton et al. [2017a] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017a.
 Hamilton et al. [2017b] William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017b.
 Han et al. [2015] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
 Hsieh et al. [2015] ChoJui Hsieh, Nagarajan Natarajan, and Inderjit Dhillon. Pu learning for matrix completion. In International Conference on Machine Learning, pages 2445–2453, 2015.
 Hu et al. [2008] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pages 263–272. IEEE, 2008.
 Jamali and Ester [2009] Mohsen Jamali and Martin Ester. Trustwalker: a random walk model for combining trustbased and itembased recommendation. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 397–406. ACM, 2009.

 Joag-Dev et al. [1983] Kumar Joag-Dev, Frank Proschan, et al. Negative association of random variables with applications. The Annals of Statistics, 11(1):286–295, 1983.
 Kipf and Welling [2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
 Koren et al. [2009] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009.
 Liang et al. [2016] Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M Blei. Factorization meets the item embedding: Regularizing matrix factorization with item cooccurrence. In Proceedings of the 10th ACM conference on recommender systems, pages 59–66. ACM, 2016.
 Ma et al. [2011] Hao Ma, Dengyong Zhou, Chao Liu, Michael R Lyu, and Irwin King. Recommender systems with social regularization. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 287–296. ACM, 2011.
 Monti et al. [2017] Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multigraph neural networks. In Advances in Neural Information Processing Systems, pages 3697–3707, 2017.
 Page et al. [1999] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
 Pozo et al. [2016] Manuel Pozo, Raja Chiky, Farid Meziane, and Elisabeth Métais. An item/user representation for recommender systems based on bloom filters. In 2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS), pages 1–12. IEEE, 2016.
 Rao et al. [2015] Nikhil Rao, HsiangFu Yu, Pradeep K Ravikumar, and Inderjit S Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In Advances in neural information processing systems, pages 2107–2115, 2015.
 Rendle et al. [2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 452–461. AUAI Press, 2009.
 Serrà and Karatzoglou [2017] Joan Serrà and Alexandros Karatzoglou. Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems, pages 279–287. ACM, 2017.
 Shah et al. [2009] Devavrat Shah et al. Gossip algorithms. Foundations and Trends® in Networking, 3(1):1–125, 2009.
 Shani et al. [2008] Guy Shani, Max Chickering, and Christopher Meek. Mining recommendations from the web. In Proceedings of the 2008 ACM conference on Recommender systems, pages 35–42. ACM, 2008.
 Shi et al. [2009] Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, and SVN Vishwanathan. Hash kernels for structured data. Journal of Machine Learning Research, 10(Nov):2615–2637, 2009.
Shinde and Savant [2016] Anita Shinde and Ila Savant. User based collaborative filtering using Bloom filter with MapReduce. In Proceedings of International Conference on ICT for Sustainable Development, pages 115–123. Springer, 2016.
 Singh and Gordon [2008] Ajit P Singh and Geoffrey J Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 650–658. ACM, 2008.
Wu et al. [2017] Liwei Wu, Cho-Jui Hsieh, and James Sharpnack. Large-scale collaborative ranking in near-linear time. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 515–524. ACM, 2017.
Wu et al. [2018] Liwei Wu, Cho-Jui Hsieh, and James Sharpnack. SQL-Rank: A listwise approach to collaborative ranking. In Proceedings of Machine Learning Research (35th International Conference on Machine Learning), volume 80, 2018.
Xie et al. [2015] Wenlei Xie, David Bindel, Alan Demers, and Johannes Gehrke. Edge-weighted personalized PageRank: breaking a decade-old performance barrier. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1325–1334. ACM, 2015.
Ying et al. [2018] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 974–983. ACM, 2018.
Yu et al. [2017] Hsiang-Fu Yu, Hsin-Yuan Huang, Inderjit S Dhillon, and Chih-Jen Lin. A unified algorithm for one-class structured matrix factorization with side information. In AAAI, pages 2845–2851, 2017.
 Zafarani and Liu [2009] R. Zafarani and H. Liu. Social computing data repository at ASU, 2009. URL http://socialcomputing.asu.edu.
 Zhou et al. [2012] Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, and Guillermo Sapiro. Kernelized probabilistic matrix factorization: Exploiting graphs and side information. In Proceedings of the 2012 SIAM international Conference on Data mining, pages 403–414. SIAM, 2012.
6 Appendix
6.1 Theory for Bloom Filters
Theorem 2.
Let $B_A, B_B \in \{0,1\}^c$ be the Bloom filter bit-arrays for the sets $A, B$, built with $k$ independent hash functions for all elements of $A, B$, and let $D = A \,\triangle\, B$ be their symmetric difference. Let $Z$ be the number of common 1-bits in $B_A, B_B$; then we have that, for any $t > 0$,
\[ \mathbb{P}\big(|Z - cq| \ge t\big) \le 2 e^{-2t^2/c}, \]
and $q \le p_{A \cap B} + (1 - p_{A \cap B})\big(1 - e^{-k|D|/(2(c-1))}\big)^2$, where
\[ q = p_{A \cap B} + (1 - p_{A \cap B})\, p_{A \setminus B}\, p_{B \setminus A}, \qquad p_S = 1 - (1 - 1/c)^{k|S|}. \]
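As a quick numeric illustration of these quantities, the sketch below computes the expected number of common 1-bits, $cq$, and the two-sided Hoeffding-style tail bound under the standard model of a $c$-bit Bloom filter with $k$ independent, uniform hash functions. The parameter values and function names are illustrative, not from the paper.

```python
import math

def p_one(s, c, k):
    """P(a fixed bit is 1) in a c-bit Bloom filter of a set of size s with k hashes."""
    return 1.0 - (1.0 - 1.0 / c) ** (k * s)

def expected_common_ones(n_only_a, n_only_b, n_both, c, k):
    """E[Z]: expected number of common 1-bits of the two filters, E[Z] = c * q."""
    p_int = p_one(n_both, c, k)
    q = p_int + (1 - p_int) * p_one(n_only_a, c, k) * p_one(n_only_b, c, k)
    return c * q

def hoeffding_tail(t, c):
    """Two-sided tail bound P(|Z - E[Z]| >= t) for a sum of c negatively associated bits."""
    return 2.0 * math.exp(-2.0 * t * t / c)

# Example: |A \ B| = |B \ A| = 3, |A ∩ B| = 4, c = 64 bits, k = 2 hashes.
print(expected_common_ones(3, 3, 4, c=64, k=2))
print(hoeffding_tail(16, 64))
```

Note that the deviation bound shrinks exponentially in $t^2/c$, so a $c$-bit filter pins $Z$ down to $O(\sqrt{c})$ accuracy.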
We prove Theorem 2 in subsection 6.1.2. For now, we prove Theorem 1, which is in fact a corollary of this main result.
Proof of Theorem 1.
By Theorem 2, we can see that there exists a $q$ such that for any $t > 0$, $\mathbb{P}(|Z - cq| \ge t) \le 2e^{-2t^2/c}$, with $q = p_{A \cap B} + (1 - p_{A \cap B})\, p_{A \setminus B}\, p_{B \setminus A}$ and $p_S = 1 - (1 - 1/c)^{k|S|}$.
Then we have that with probability $1 - \delta$,
\[ |Z - cq| \le \sqrt{\tfrac{c}{2} \ln \tfrac{2}{\delta}}. \]
Note that because $(1 - 1/c)^{k|S|} \ge 1 - k|S|/c$,
\[ p_S \le \frac{k|S|}{c}. \]
Moreover, for the product term, since $|A \setminus B|\,|B \setminus A| \le |D|^2/4$,
\[ (1 - p_{A \cap B})\, p_{A \setminus B}\, p_{B \setminus A} \le \frac{k^2 |D|^2}{4c^2}. \]
Hence,
\[ q \le \frac{k|A \cap B|}{c} + \frac{k^2 |D|^2}{4c^2}. \]
Suppose that $k|A \cap B| \le \eta c$ for some $\eta > 0$, then we have that
\[ q \ge p_{A \cap B} \ge 1 - e^{-k|A \cap B|/c} \ge \frac{1 - e^{-\eta}}{\eta} \cdot \frac{k|A \cap B|}{c}. \]
The function $x \mapsto (1 - e^{-x})/x$ is decreasing and the limit as $x \to 0$ is $1$. Thus, for any $\epsilon > 0$, there exists an $\eta > 0$ such that if $k|A \cap B| \le \eta c$ then $q \ge (1 - \epsilon)\, k|A \cap B|/c$. If this is the case then, with probability $1 - \delta$,
\[ (1 - \epsilon)\, k|A \cap B| - \sqrt{\tfrac{c}{2} \ln \tfrac{2}{\delta}} \;\le\; Z \;\le\; k|A \cap B| + \frac{k^2 |D|^2}{4c} + \sqrt{\tfrac{c}{2} \ln \tfrac{2}{\delta}}. \]
∎
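The elementary squeeze on the single-bit probability $p_S = 1 - (1 - 1/c)^{k|S|}$ that drives this kind of argument, namely $1 - e^{-k|S|/c} \le p_S \le k|S|/c$, can be sanity-checked numerically. This is a minimal sketch under the standard $c$-bit, $k$-hash Bloom filter model; the parameter values are illustrative.

```python
import math

def p_one(s, c, k):
    """P(a fixed bit is 1) in a c-bit Bloom filter of a set of size s with k hashes."""
    return 1 - (1 - 1 / c) ** (k * s)

# p_S is squeezed between 1 - exp(-ks/c) and ks/c; for sparse filters
# (ks much smaller than c) both ends approach ks/c, so counting 1-bits
# recovers k|S|/c up to a vanishing relative error.
c, k = 1024, 2
for s in (1, 5, 20, 100):
    lower = 1 - math.exp(-k * s / c)
    upper = k * s / c
    assert lower <= p_one(s, c, k) <= upper
print("bounds hold")
```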
6.1.1 Negative Association of Bloom Filters
First, let us go over the definition of negative association. Random variables $X_1, \dots, X_n$ are negatively associated (NA) if for any functions $f, g$, both monotonically increasing or both monotonically decreasing, and disjoint index sets $S_1, S_2 \subseteq \{1, \dots, n\}$,
\[ \mathrm{Cov}\big(f(X_{S_1}),\, g(X_{S_2})\big) \le 0, \]
where $X_{S_1}, X_{S_2}$ are the variables restricted to these sets.
Lemma 1.
(1) Let $x, y \in \{0,1\}^c$ be two independent random bit-arrays that are both NA. Then $x \vee y$, the elementwise ‘or’ operation, and $x \wedge y$, the elementwise ‘and’ operation, are both NA. So NA is closed under elementwise ‘or’ and ‘and’ operations.
(2) Let $B_i$ be the $i$th bit in any Bloom filter of the set $S$ with independent hash functions; then the random bits $B_1, \dots, B_c$ are NA.
Proof.
(1) We show that NA is closed under both elementwise operations. First, note that the concatenation $(x, y) \in \{0,1\}^{2c}$ is NA, by closure of NA under independent unions (Property P7 in [23]). Then, on the disjoint index sets $\{1, \dots, c\}$ and $\{c+1, \dots, 2c\}$, apply the bit operation coordinatewise to produce the resulting array. Operation ‘or’ is monotonically increasing because $a \vee b = \max\{a, b\}$; ‘and’ is as well because $a \wedge b = \min\{a, b\}$. Finally, we conclude by closure of NA under monotonically increasing functions on disjoint sets (Property P6 in [23]).
(2) Consider hash function $h_j$ for node $v$. Let $B^{(j,v)}$ be the $c$-bit Bloom filter bit-array for this vertex and hash function only; then $B^{(j,v)}$ has only a single bit that is $1$ and the rest are $0$. By the 0–1 property for binary bits, we know that $B^{(j,v)}$ has NA entries (Lemma 8 in [15]), since $\sum_i B^{(j,v)}_i = 1$. Then the Bloom filter $B$ of $S$ is $\bigvee_{j,v} B^{(j,v)}$—the ‘or’ operation applied to all hashes and vertices—and we conclude by property (1). ∎
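The construction in part (2)—a Bloom filter as the elementwise ‘or’ of one-hot arrays, one per (element, hash) pair—can be made concrete with a toy implementation. This is a sketch: `blake2b` keyed by the hash index stands in for the $k$ independent hash functions, and all names and parameters are illustrative.

```python
import hashlib

C_BITS = 32   # filter length c (illustrative)
K_HASH = 3    # number of hash functions k (illustrative)

def bit_index(element, j, c=C_BITS):
    """Deterministic stand-in for the j-th hash: maps an element to one of c bits."""
    digest = hashlib.blake2b(f"{j}:{element}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % c

def bloom(elements, c=C_BITS, k=K_HASH):
    """Bloom filter as a 0/1 list: the OR of one-hot arrays, one per (element, hash)."""
    bits = [0] * c
    for x in elements:
        for j in range(k):
            bits[bit_index(x, j, c)] = 1
    return bits

def bitwise_or(a, b):
    return [ai | bi for ai, bi in zip(a, b)]

A = {"u1", "u2", "u3"}
B = {"u4", "u5"}
# The filter of a union is the elementwise OR of the filters — exactly the
# composition over which Lemma 1(1) preserves negative association.
assert bloom(A | B) == bitwise_or(bloom(A), bloom(B))
print("OR composition holds")
```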
6.1.2 Proof of Theorem 2
Consider the partition of $A \cup B$ into $A \setminus B$, $B \setminus A$, $A \cap B$. Let $B_A, B_B$ be the Bloom filter bit-arrays for $A, B$ and let $B_1, B_2, B_\cap$ be those for $A \setminus B$, $B \setminus A$, $A \cap B$ respectively.
Notice that $B_A = B_1 \vee B_\cap$ and $B_B = B_2 \vee B_\cap$, where the bit operations are elementwise. If all hash functions are independent, then $B_1, B_2, B_\cap$ are independent. Notice that for a given node and hash function the bit selected is random, but unique, which means that the elements of the bit-arrays are not necessarily independent for any single Bloom filter. However, the bit-array is negatively associated by Lemma 1. Let $Z = \sum_{i=1}^c \mathbf{1}\{B_{A,i} = B_{B,i} = 1\}$; then we have that,
\[ \mathbb{E}[Z] = cq, \qquad q = \mathbb{P}(B_{A,i} = B_{B,i} = 1). \]
The probability that bit $i$ in one of the bit-arrays is $1$ is, for the Bloom filter of a set $S$,
\[ p_S = 1 - (1 - 1/c)^{k|S|}, \]
since each of the $k|S|$ independent hashes misses bit $i$ with probability $1 - 1/c$.
This can give us an expression in terms of $p_{A \cap B}, p_{A \setminus B}, p_{B \setminus A}$ for the expectation of $Z$:
\[ q = p_{A \cap B} + (1 - p_{A \cap B})\, p_{A \setminus B}\, p_{B \setminus A}. \]
We have that by Hoeffding’s inequality for negatively associated random variables [15],
\[ \mathbb{P}\big(|Z - cq| \ge t\big) \le 2 \exp\!\left(-\frac{2t^2}{c}\right). \]
It remains to provide intelligible bounds on $q$. By the inequalities $e^{-x/(1-x)} \le 1 - x \le e^{-x}$,
\[ 1 - e^{-k|S|/c} \le p_S \le 1 - e^{-k|S|/(c-1)}. \]
Also, writing $d_1 = |A \setminus B|$ and $d_2 = |B \setminus A|$ so that $d_1 + d_2 = |D|$,
\[ p_{A \setminus B}\, p_{B \setminus A} \le \big(1 - e^{-kd_1/(c-1)}\big)\big(1 - e^{-kd_2/(c-1)}\big) = 1 - \big(e^{-kd_1/(c-1)} + e^{-kd_2/(c-1)}\big) + e^{-k|D|/(c-1)}, \]
so by the inequality
\[ e^{-a} + e^{-b} \ge 2 e^{-(a+b)/2}, \]
we have that
\[ p_{A \setminus B}\, p_{B \setminus A} \le 1 - 2 e^{-k|D|/(2(c-1))} + e^{-k|D|/(c-1)} = \big(1 - e^{-k|D|/(2(c-1))}\big)^2. \]
Furthermore, notice that the LHS is minimized when $d_1 = d_2 = |D|/2$, by convexity of $x \mapsto e^{-x}$, which is exactly the case of equality above.
We then have that
\[ q \le p_{A \cap B} + (1 - p_{A \cap B})\big(1 - e^{-k|D|/(2(c-1))}\big)^2, \]
and
\[ q \ge p_{A \cap B} \ge 1 - e^{-k|A \cap B|/c}. \]
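The identity $\mathbb{E}[Z] = cq$ can also be checked by simulation under the independent-hash model used throughout this subsection, in which each (element, hash) pair draws a uniform bit. This is a sketch with illustrative parameters, not code from the paper.

```python
import random

def simulate_common_ones(d1, d2, inter, c, k, trials, seed=0):
    """Average number of common 1-bits Z over random independent hashes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        fa, fb = [0] * c, [0] * c
        # Elements only in A, only in B, and shared; every (element, hash)
        # pair selects a uniform bit. Shared elements hit the SAME bit in both.
        for _ in range(d1 * k):
            fa[rng.randrange(c)] = 1
        for _ in range(d2 * k):
            fb[rng.randrange(c)] = 1
        for _ in range(inter * k):
            i = rng.randrange(c)
            fa[i] = 1
            fb[i] = 1
        total += sum(a & b for a, b in zip(fa, fb))
    return total / trials

def q_formula(d1, d2, inter, c, k):
    """q = p_int + (1 - p_int) * p_1 * p_2 with p_S = 1 - (1 - 1/c)^(k|S|)."""
    p = lambda s: 1 - (1 - 1 / c) ** (k * s)
    return p(inter) + (1 - p(inter)) * p(d1) * p(d2)

c, k = 64, 2
empirical = simulate_common_ones(3, 3, 4, c, k, trials=20000)
print(empirical, c * q_formula(3, 3, 4, c, k))  # the two values should be close
```

With 20,000 trials the Monte Carlo mean lands well within the Hoeffding-scale fluctuation of $cq$, which is the concentration the theorem formalizes.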