Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering

05/29/2019 · Liwei Wu et al. · University of California, Davis

In this paper, we consider recommender systems with side information in the form of graphs. Existing collaborative filtering algorithms mainly utilize only immediate neighborhood information and have a hard time taking advantage of deeper neighborhoods beyond 1-2 hops. The main caveat of exploiting deeper graph information is the rapidly growing time and space complexity when incorporating information from these neighborhoods. In this paper, we propose using Graph DNA, a novel Deep Neighborhood Aware graph encoding algorithm, to exploit deeper neighborhood information. DNA encoding computes approximate deep neighborhood information in linear time using Bloom filters, a space-efficient probabilistic data structure, and results in a per-node encoding that is logarithmic in the number of nodes in the graph. It can be used in conjunction with both feature-based and graph-regularization-based collaborative filtering algorithms. Compared to directly using higher-order graph information, Graph DNA is memory and time efficient and provides additional regularization. We conduct experiments on real-world datasets, showing that graph DNA can be easily used with 4 popular collaborative filtering algorithms and consistently leads to a performance boost with little computational and memory overhead.


1 Introduction

Recommendation systems are increasingly prevalent due to content delivery platforms, e-commerce websites, and mobile apps [35]. Classical collaborative filtering algorithms use matrix factorization to identify latent features that describe user preferences and item meta-topics from partially observed ratings [25]. In addition to rating information, many real-world recommendation datasets also have a wealth of side information in the form of graphs, and incorporating this information often leads to performance gains. For example, [31, 45] propose adding a graph regularization term to the matrix factorization formulation to exploit additional graph structure, and [26] conduct a co-factorization of the graph and rating matrix. However, each of these only utilizes the immediate neighborhood information of each node in the side information graph. More recently, [3] incorporated graph information when learning features with a Graph Convolutional Network (GCN) based recommendation algorithm. GCNs [24] constitute flexible methods for incorporating graph structure beyond first-order neighborhoods, but their training complexity typically scales rapidly with depth, even with sub-sampling techniques [11]. Intuitively, exploiting higher-order neighborhood information could benefit generalization performance, especially when the graph is sparse, which is usually the case in practice. The main caveat of exploiting higher-order graph information is the high computational and memory cost of computing higher-order neighborhoods, since the number of $d$-hop neighbors typically grows exponentially with $d$.

In this paper, we aim to utilize higher-order graph information without introducing much computational and memory overhead. To achieve this goal, we propose a Graph Deep Neighborhood Aware (Graph DNA) encoding, which approximately captures the higher-order neighborhood information of each node via Bloom filters [4]. Bloom filters encode neighborhood sets as $c$-dimensional 0/1 vectors, where $c = O(\log n)$ for a graph with $n$ nodes, which approximately preserves membership information. This encoding can then be combined with both graph-regularization-based and feature-based collaborative filtering algorithms, with little computational and memory overhead. In addition to computational speedups, we find that Graph DNA achieves better performance over competitors, which we hypothesize is due to the unique nature of Graph DNA and its connection to the shortest-path-length distance. We make this connection precise with theoretical bounds in Section 2.2.

We show that our Graph DNA encoding can be used with several collaborative filtering algorithms: graph-regularized matrix factorization with explicit and implicit feedback [45, 31], co-factoring [26], and GCN-based recommendation systems [28]. In some cases, using information from deeper neighborhoods (such as $d = 4$) yields up to a 15x increase in the relative performance gain, with graph DNA encoding yielding up to a 6x speedup compared to directly using powers of the graph adjacency matrix.

Related Work

Matrix factorization has been used extensively in recommendation systems with both explicit [25] and implicit [21] feedback. Such methods compute low-dimensional user and item representations whose inner products approximate the observed (or to-be-predicted) entries of the target matrix. To incorporate graph side information in these systems, [31, 45] used a graph-Laplacian-based regularization framework that forces a pair of node representations to be similar if the nodes are connected by an edge in the graph. In [43], this was extended to the implicit feedback setting. [26] proposed a method that incorporates first-order information of the rating bipartite graph into the model by considering item co-occurrences. More recently, GC-MC [3] used a GCN approach that performs convolutions on the main bipartite graph while treating the first-order side graph information as features, and [28] proposed combining GCNs and RNNs for the same task.

Methods that use higher-order graph information are typically based on taking random walks on graphs [16]. [22] extended this approach to include graph side information in the model. Finally, the PageRank [29] algorithm can be seen as computing the steady-state distribution of a Markov chain, and similar methods for recommender systems were proposed in [1, 41].

For a complete list of related works on representation learning over graphs, we refer the interested reader to [18]. In the collaborative filtering setting, [28, 3] use Graph Convolutional Neural Networks [14], but with some modifications. Standard GCN methods without substantial modifications, including well-known approaches like GCN [24] and GraphSage [17], cannot be directly applied to collaborative filtering rating datasets because they are intended to solve semi-supervised classification problems over graphs with node features. PinSage [42] is an extension of GraphSage to non-personalized graph-based recommendation, but it is not meant for collaborative filtering problems. GC-MC [3] extends GCN to collaborative filtering, albeit less scalably than [42]. Our Graph DNA scheme can be used to obtain graph features in these extensions. In contrast to the above-mentioned methods involving GCNs, we do not use any loss function to train our graph encoder. This property makes our graph DNA suitable for both transductive and inductive problems.

Bloom filters have been used in machine learning for multi-label classification [12], and for hashing deep neural network representations [36, 19, 13]. However, to the best of our knowledge, they have not been used to encode graphs, nor has such an encoding been applied to recommender systems.

2 Methodology

We consider the problem of a recommender system with a partially observed rating matrix $R$ and a graph $G$ that encodes side information. In this section, we introduce the Graph DNA algorithm for encoding deep neighborhood information in $G$. In the next section, we show how this encoded information can be applied to various graph-based recommender systems.

2.1 Bloom Filter

The Bloom filter [4] is a probabilistic data structure designed to represent a set of elements. Thanks to their space-efficiency and simplicity, Bloom filters are applied in many real-world applications such as database systems [5, 10]. A Bloom filter consists of $h$ independent hash functions; a Bloom filter of size $c$ can be represented as a length-$c$ bit-array $b$. More details about Bloom filters can be found in [7]. Here we highlight a few desirable properties of Bloom filters essential to our graph DNA encoding:

  1. Space efficiency: a classic Bloom filter uses roughly $1.44 \log_2(1/\epsilon)$ bits of space per inserted key, where $\epsilon$ is the false positive rate associated with the Bloom filter.

  2. Support for the union operation of two Bloom filters: the Bloom filter for the union of two sets can be obtained by performing a bitwise 'OR' over the underlying bit-arrays of the two Bloom filters.

  3. The size of the underlying set can be approximated from the number of nonzeros in the bit-array: given the Bloom filter representation $b$ of a set $A$, the cardinality of $A$ can be estimated as $|A| \approx -\frac{c}{h} \ln\!\left(1 - \frac{\mathrm{nnz}(b)}{c}\right)$, where $\mathrm{nnz}(b)$ is the number of non-zero bits in $b$. As a result, the number of common nonzero bits of two Bloom filters $b_1, b_2$ can be used as a proxy for the intersection size $|A_1 \cap A_2|$. These properties are illustrated in the short sketch after this list.
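As a concrete illustration of properties 2 and 3, the following minimal Python sketch (the parameter choices and helper names are ours, not from the paper) inserts two overlapping sets into bit-array Bloom filters, unions them with a bitwise OR, and estimates set sizes from the bit counts:

import math
import hashlib

C, H = 1024, 3  # illustrative sizes: C bits per filter, H hash functions

def positions(key):
    # derive H hash positions by seeding a single digest (a common construction)
    return [int(hashlib.sha1(f"{s}:{key}".encode()).hexdigest(), 16) % C
            for s in range(H)]

def insert(bits, key):
    for p in positions(key):
        bits[p] = 1

def estimate_size(bits):
    # property 3: |A| is approximately -(C/H) * ln(1 - nnz/C)
    nnz = sum(bits)
    return -(C / H) * math.log(1.0 - nnz / C)

a, b = [0] * C, [0] * C
for key in range(50):
    insert(a, key)                       # A = {0, ..., 49}
for key in range(25, 75):
    insert(b, key)                       # B = {25, ..., 74}

union = [x | y for x, y in zip(a, b)]    # property 2: union via bitwise OR
print(round(estimate_size(a)), round(estimate_size(union)))  # about 50 and 75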

1: Input: $G$: a graph of $n$ nodes; $c$: the length of the codes; $h$: the number of hash functions; $d$: the number of iterations; $\theta$: a tuning parameter to control the number of elements hashed.
2: Output: $B \in \{0,1\}^{n \times c}$: a boolean matrix denoting the bipartite relationship between the $n$ nodes and the $c$ bits.
3: Pick $h$ independent hash functions
4: for $i = 1, \dots, n$ do ▷ GraphBloom initialization
5:   add node $i$ to Bloom filter $b_i$
6: end for
7: for $t = 1, \dots, d$ do ▷ $d$ rounds of neighborhood propagation
8:   for $i = 1, \dots, n$ do
9:     for $j \in \mathcal{N}_1(i)$ do ▷ degree-1 neighbors
10:       if $\mathrm{nnz}(b_j) > \theta \cdot c$ then break
11:       $b_i \leftarrow b_i \cup b_j$
12: stack $b_1, \dots, b_n$ row-wise to form $B$
Algorithm 1 Graph DNA Encoding with Bloom Filters

2.2 Graph DNA Encoding Via Bloom Filters

Now we introduce our Graph DNA encoding. The main idea is to approximately encode a deep (multi-hop) neighborhood aware embedding for each node in the graph using Bloom filters, which helps avoid performing computationally expensive graph adjacency matrix multiplications. In Graph DNA, we have $n$ Bloom filters $b_1, \dots, b_n$, one per graph node, and all the Bloom filters share the same $h$ hash functions. The role of $b_i$ is to store the deep neighborhood information of the $i$-th node. Taking advantage of the union operation of Bloom filters, one node's neighborhood information can be propagated to its neighbors in an iterative manner using gossip algorithms [34]. Initially, each $b_i$ contains only node $i$ itself. At the $t$-th iteration, $b_i$ is updated by taking the union with the Bloom filters of node $i$'s immediate neighbors. By induction, we see that after $d$ iterations, $b_i$ represents the depth-$d$ neighborhood $\mathcal{N}_d(i) = \{j : \rho(i, j) \le d\}$, where $\rho(i, j)$ is the shortest path distance between nodes $i$ and $j$ in $G$. As the last step, we stack the bit-array representations of all Bloom filters to form a sparse matrix $B \in \{0,1\}^{n \times c}$, where the $i$-th row of $B$ is the bit representation of $b_i$. As a practical measure, to prevent over-saturation of Bloom filters for popular nodes in the graph, we add a hyper-parameter $\theta$ to control the maximum saturation level allowed for Bloom filters. This also prevents hub nodes from dominating the graph DNA encoding. The pseudo-code for the proposed encoding algorithm is given in Algorithm 1. We use graph DNA-$d$ to denote the encoding obtained after applying Algorithm 1 with $d$ propagation iterations. We also give a simple example illustrating how the graph DNA is encoded into Bloom filter representations in Figure 1. Our usage of Bloom filters is very different from previous works [30, 33, 37], which use Bloom filters for standard hashing unrelated to graph encoding. A compact implementation sketch follows.
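The following NumPy sketch follows the propagation scheme just described; the bit-hashing scheme, the synchronous update order, and the exact placement of the saturation check $\theta$ are our assumptions rather than details fixed by the paper:

import numpy as np

def graph_dna(adj, c=256, h=2, d=3, theta=0.5, seed=0):
    # adj: list of neighbor lists for n nodes; returns B of shape (n, c)
    n = len(adj)
    B = np.zeros((n, c), dtype=bool)
    for i in range(n):                       # GraphBloom initialization: b_i = {i}
        for k in range(h):
            B[i, hash((seed, k, i)) % c] = True
    for _ in range(d):                       # d rounds of neighborhood propagation
        prev = B.copy()                      # synchronous, gossip-style update
        for i in range(n):
            for j in adj[i]:
                if prev[j].mean() > theta:   # skip over-saturated (hub) filters
                    continue
                B[i] |= prev[j]              # Bloom filter union = bitwise OR
    return B

After $d$ rounds, row $i$ of the returned matrix approximately encodes $\mathcal{N}_d(i)$, and the stacked rows form the matrix $B$ used by the downstream models of Section 3.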

It is intuitive that the number of 1-bits in common between two Bloom filters should be closely related to the size of the intersection of the underlying neighborhoods. However, there may also be false positives in the bit representations. We precisely control the effect of such false positives on the number of common bits in the following theorem, which applies to Bloom filters without the maximum saturation threshold $\theta$.

Theorem 1.

Suppose that the Bloom filters have $c$ bits and the $h$ hash functions are independent across all nodes. Consider two nodes $u, v$, their depth-$d$ neighborhoods $\mathcal{N}_d(u), \mathcal{N}_d(v)$, and their depth-$d$ Bloom filters $b_u, b_v$, respectively. Let $S$ be the number of common 1-bits in the Bloom filters of $u, v$ (the inner product of the vectorized Bloom filters, $S = \langle b_u, b_v \rangle$). There exist universal constants $C_1, C_2$ such that for any admissible tolerance $\epsilon$, with probability at least $1 - \delta$, $S/c$ is within $\epsilon$ of an explicit increasing function of $|\mathcal{N}_d(u) \cap \mathcal{N}_d(v)|$, with an error term controlled by the symmetric difference of the neighborhoods,

(1)

where $\triangle$ denotes the symmetric difference. Furthermore, for any $\delta > 0$ there exists a constant $C_\delta$ such that if $c \ge C_\delta$, then with probability at least $1 - \delta$, $S > 0$ whenever $\mathcal{N}_d(u) \cap \mathcal{N}_d(v) \neq \emptyset$:

(2)

This theorem is a corollary of the more precise Theorem 2, which is stated in the Appendix. In order to establish these results, we provide Lemma 1, which demonstrates that the bits of a Bloom filter are negatively associated (basic properties of negative association can be found in [15, 23]) and that this property is preserved under bitwise 'or' and 'and' operations on independent Bloom filters. As a result, $S$ enjoys Chernoff–Hoeffding bounds, and the result follows by analyzing its expectation, as sketched below.
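For reference, the concentration step uses a Chernoff–Hoeffding bound, which continues to hold for negatively associated 0/1 variables [15]. In our notation, with $S$ a sum of $c$ such bit products, the bound takes the shape (a sketch of its form, not the paper's exact constants):

$$\Pr\left( \left| S - \mathbb{E}[S] \right| \ge c\,\epsilon \right) \;\le\; 2\, e^{-2 c \epsilon^{2}}.$$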

Remark 1.

When the neighborhoods have no intersection, the common 1-bits of the two Bloom filters are due solely to hash collisions, and by (1) their number approaches 0 when $c$, the number of bits in the Bloom filters, is taken to be large enough.

Remark 2.

Generally, (2) states that when the number of hashed elements in the intersection is large, yet still dominated by the number of bits $c$, we have $S > 0$ almost surely. For fixed neighborhood sizes, we can take $h$ and $c$ large enough so that, by (1), $S/c$ concentrates and, by (2), $S$ is strictly positive.

Graph DNA encodes deep neighborhood information efficiently: for any two nodes whose shortest path distance is at most $2d$, we only need to run Algorithm 1 for $d$ iterations before they share bits. For example, in Figure 2, two nodes that are 6 hops apart on the shortest path will start to share bits in their representations after 3 iterations, because the information of the node midway between them is propagated to both endpoints after exactly 3 iterations. Theorem 1 and the remarks that follow it demonstrate that, by increasing the number of hash functions $h$ and the number of bits $c$ in the Bloom filters, the number of common 1-bits in these Bloom filters becomes an accurate surrogate for the neighborhood intersection size $|\mathcal{N}_d(u) \cap \mathcal{N}_d(v)|$, which can be computed as a simple inner product, as illustrated below.
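Given a matrix of stacked Bloom filter rows, such as the one produced by the hypothetical graph_dna helper sketched above, this surrogate is just an inner product of bit rows:

import numpy as np

adj = [[1], [0, 2], [1, 3], [2]]        # toy path graph 0 - 1 - 2 - 3
B = graph_dna(adj, c=64, h=2, d=2)
S = B.astype(int) @ B.astype(int).T     # S[u, v] = common 1-bits of b_u and b_v
print(S[0, 3])                          # proxy for |N_2(0) ∩ N_2(3)| = |{1, 2}|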

The Bloom filter matrix $B$ can also be viewed as the adjacency matrix of a bipartite graph between the $n$ nodes of the original graph and $c$ meta-nodes, one per Bloom filter bit. In this view, nodes $u$ and $v$ have a bit in common in their Bloom filter representations if they are both connected to at least one common meta-node in $B$. This property saves the memory and time required for graph encoding, allowing us to use $B$ instead of the adjacency matrix in graph Laplacian regularization methods [31], and to use $B$ as side features in GCN-based geometric matrix completion algorithms [28, 3], with little computational and memory overhead. We elaborate on this in the following section.

Figure 1: Illustration of Algorithm 1, the graph DNA encoding procedure. The curly brackets at each node indicate the nodes encoded at a particular step. At $d = 0$, each node's Bloom filter encodes only itself; multi-hop neighbors are included as $d$ increases.

Figure 2: Illustration of our proposed DNA encoding method (DNA-3), with the corresponding bipartite graph representation.

3 Collaborative Filtering with Graph DNA

Suppose we are given a sparse rating matrix $R$ with $m$ users and $n$ items, and a graph $G$ encoding relationships between users. For simplicity, we do not assume a graph over the items, though including one is straightforward.

3.1 Graph Regularized Matrix Factorization

The objective function of Graph Regularized Matrix Factorization (GRMF) [8, 31, 45] is:

$$\min_{U, V} \sum_{(i,j) \in \Omega} \left( R_{ij} - u_i^\top v_j \right)^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right) + \mu \, \mathrm{tr}\!\left( U^\top L U \right) \tag{3}$$

where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ are the embeddings associated with users and items respectively (with rows $u_i$ and $v_j$), $\Omega$ is the set of observed entries, $\mathrm{tr}(\cdot)$ is the trace operator, $\lambda, \mu$ are tuning coefficients, and $L$ is the Laplacian of $G$.

The last term is called the graph regularization, which tries to enforce that similar nodes (as measured by edge weights in $G$) have similar embeddings. One naive way [9] to extend this to higher-order graph regularization is to replace the graph $G$ with its matrix power $G^K$ and then use the graph Laplacian of $G^K$ in place of $L$ in (3). Computing $G^K$ for even small $K$ is computationally infeasible for most real-world applications, and we quickly lose the sparsity of the graph, leading to memory issues. Sampling or thresholding could mitigate the problem, but suffers from performance degradation. The blow-up is easy to observe directly, as the snippet below shows.
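A quick sketch of the non-zero growth (the graph here is a random SciPy sparse adjacency matrix of our own choosing):

import numpy as np
import scipy.sparse as sp

G = sp.random(2000, 2000, density=0.001, format="csr", random_state=0)
G = ((G + G.T) > 0).astype(np.int8)       # symmetrize into a 0/1 adjacency
Gk = G.copy()
for k in range(2, 5):
    Gk = ((Gk @ G) > 0).astype(np.int8)   # nonzero pattern of the k-th power
    print(k, Gk.nnz)                      # non-zeros grow rapidly with k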

In contrast, our graph DNA from Algorithm 1 suffers from none of these issues. Theorem 1 implies that the space complexity of our method is only $O(n \log n)$ bits for a graph with $n$ nodes, instead of the up to $O(n^2)$ non-zeros of a dense power $G^K$. The reduced number of non-zero elements when using graph DNA leads to a significant speed-up in many cases.

We can easily use graph DNA in GRMF as follows: we treat the $c$ bits as new pseudo-nodes and add them to the original graph $G$. We then have $m + c$ nodes in a modified graph $\tilde{G}$:

$$\tilde{G} = \begin{bmatrix} G & B \\ B^\top & 0 \end{bmatrix} \tag{4}$$

To account for the new nodes, we expand $U$ to $\tilde{U} \in \mathbb{R}^{(m + c) \times r}$ by appending parameters for the meta-nodes. The objective function for GRMF with graph DNA is the same as (3), except that $U$ and $L$ are replaced by $\tilde{U}$ and $\tilde{L}$, the Laplacian of $\tilde{G}$. At the prediction stage, we discard the meta-node embeddings. A construction sketch is given below.
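As a sketch of this construction (using SciPy sparse matrices; the helper name and shapes are our own), the augmented graph of (4) and its Laplacian can be assembled as follows:

import numpy as np
import scipy.sparse as sp

def augmented_laplacian(G, B):
    # G: (m, m) sparse user-user adjacency; B: (m, c) 0/1 Bloom filter matrix
    G = sp.csr_matrix(G, dtype=np.float64)
    B = sp.csr_matrix(B, dtype=np.float64)
    G_tilde = sp.bmat([[G, B], [B.T, None]], format="csr")  # (m+c) x (m+c), eq. (4)
    degrees = np.asarray(G_tilde.sum(axis=1)).ravel()
    return sp.diags(degrees) - G_tilde                      # L = D - G_tilde

The regularizer $\mathrm{tr}(\tilde{U}^\top \tilde{L} \tilde{U})$ then couples user embeddings through the shared meta-nodes.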

For implicit feedback data, where $R$ is a 0/1 matrix, weighted matrix factorization is a widely used algorithm [21, 20]. The only difference is that the squared loss in (3) is replaced by a weighted version, $\sum_{i,j} w_{ij} (R_{ij} - u_i^\top v_j)^2$, where $w_{ij} = 1$ for observed entries and $w_{ij} = \rho < 1$ otherwise, with $\rho$ a hyper-parameter reflecting the confidence in zero entries. In this case, we can apply the graph DNA encoding exactly as before. We also describe how to apply graph DNA to Co-Factor [38, 26] and Graph Convolutional Matrix Completion [3] in the Appendix.

4 Experiments

We show that our proposed graph DNA encoding technique can improve the performance of 4 popular graph-based recommendation algorithms: graph-regularized matrix factorization, co-factorization, weighted matrix factorization, and GCN-based graph convolutional matrix completion. All experiments except the GCN ones are conducted on a server with an Intel Xeon E5-2699 v3 @ 2.30GHz CPU and 256 GB of RAM. The GCN experiments are conducted on Google Cloud with an Nvidia V100 GPU.

Simulation Study

We first simulate a user/item rating dataset with a user graph as side information, generate its graph DNA, and use it for a downstream task: matrix factorization.

We randomly generate user and item embeddings from standard Gaussian distributions and construct an Erdős–Rényi random graph over the users. User embeddings are then propagated using Algorithm LABEL:alg:sim in the Appendix: at each propagation step, each user's embedding is updated to be an average of its current embedding and its neighbors' embeddings. Based on the user and item embeddings after $T$ iterations of propagation, we generate the underlying ratings for each user-item pair according to the inner product of their embeddings, and then sample a small portion of the dense rating matrix as training and test sets. A sketch of this generation process follows.
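The following NumPy sketch mirrors this generation process (all sizes and the mixing weight are illustrative assumptions):

import numpy as np

def simulate_ratings(m=500, n=300, r=10, p=0.01, T=3, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))           # user embeddings
    V = rng.standard_normal((n, r))           # item embeddings
    A = rng.random((m, m)) < p                # Erdos-Renyi user graph
    A = np.triu(A, 1)
    A = A | A.T
    for _ in range(T):                        # propagate along the user graph
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
        U = 0.5 * U + 0.5 * (A @ U) / deg     # average self and neighbor mean
    return U @ V.T, A                         # dense ground-truth ratings, graph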

We implement our graph DNA encoding algorithm in Python, using a scalable Bloom filter library [2] to generate the Bloom filter matrix $B$. We adapt the GRMF C++ code to solve the objective function of GRMF_DNA-$K$ with our Bloom-filter-enhanced graph $\tilde{G}$. We compare the following variants:

  1. MF: classical matrix factorization with $\ell_2$ regularization only, without graph information.

  2. GRMF_$G^K$: GRMF with $\ell_2$ regularization, using $G$, $G^2$, ..., $G^K$ [9].

  3. GRMF_DNA-$K$: GRMF with $\ell_2$ regularization, but using our proposed graph DNA-$K$.

We report the prediction performance with Root Mean Squared Error (RMSE) on test data. All results are reported on the test set, with all relevant hyperparameters tuned on a held-out validation set. To accurately measure how large the relative gain from using deeper information is, we introduce a new metric called Relative Graph Gain (RGG) for a method using graph information $X$, defined as:

$$\text{RGG}(X) = \left( \frac{\text{RMSE}_{\text{MF}} - \text{RMSE}_{X}}{\text{RMSE}_{\text{MF}} - \text{RMSE}_{G}} - 1 \right) \times 100\%, \tag{5}$$

where each RMSE is measured for the same method with different graph information. This metric is 0 when only the first-order graph information $G$ is utilized, and it is only defined when the denominator is positive. A small helper computing (5) is shown below.
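For concreteness, here is (5) in code (a trivial helper; the function name is ours):

def relative_graph_gain(rmse_mf, rmse_g, rmse_x):
    # Relative Graph Gain (5), in percent; requires rmse_mf > rmse_g
    return 100.0 * ((rmse_mf - rmse_x) / (rmse_mf - rmse_g) - 1.0)

# e.g., reproducing the Douban GRMF_DNA-3 entry of Table 1:
print(round(relative_graph_gain(7.3107, 7.2398, 7.1811), 4))  # 82.7927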

In Table 1, we can easily see that using a deeper neighborhood helps recommendation performance on this synthetic dataset: graph DNA-3's gain is 166% larger than that of using the first-order graph $G$. The performance gain generally grows with the encoding depth up to $d = 3$, which is expected because we used 3 propagation iterations when creating this dataset.

Graph Regularized Matrix Factorization for Explicit Feedback

Next, we show that graph DNA can improve the performance of GRMF for explicit feedback. We conduct experiments on two real datasets: Douban [27] and Flixster [44]. Both contain explicit feedback with ratings from 1 to 5. Douban has 129,490 users and 58,541 items; Flixster has 147,612 users and 48,794 items. Both datasets come with a graph defined over the respective sets of users.

We pre-processed Douban and Flixster following the same procedure as in [31, 39]. The experimental setups and comparisons are almost identical to the synthetic data experiment (see details in Section 4). Due to the exponential growth in the number of non-zero elements of $G^K$ as we go deeper (see Table LABEL:tab:nnz), we are unable to run full GRMF_$G^K$ for larger $K$ on these datasets. In fact, even GRMF_$G^3$ is too slow, so we thresholded $G^3$ by only considering entries whose values are equal to or larger than 4. For the Bloom filters, we set a false positive rate of 0.1 and a capacity of 500, which determines the resulting encoding length $c$.

We can see from Table 1 that deeper graph information always helps. For Douban, graph DNA-3 is most effective, giving a relative graph gain of 82.79%, compared to only a 2% gain (or even a loss) when using $G^2$ or $G^3$ naively. Interestingly, for Flixster, using $G^2$ is better than using $G^3$. However, graph DNA-3 and DNA-4 yield relative graph gains of roughly 1074% and 1495% respectively, lending credence to the implicit regularization property of graph DNA. For a fixed-size Bloom filter, the computational complexity of graph DNA scales linearly in the depth $d$, compared to exponentially for GRMF_$G^d$. We measure the encoding speed in Table 2. After hashing, the memory cost is only a small fraction of that needed to store $G^d$. Such low memory and computational complexity allow us to scale to larger $d$ than the baseline methods.

                   Synthetic               Douban                  Flixster
Method             RMSE (×10⁻¹)  % RGG     RMSE (×10⁻¹)  % RGG     RMSE (×10⁻¹)  % RGG
MF                 2.9971        -         7.3107        -         8.8111        -
GRMF_G             2.7823        0         7.2398        0         8.8049        0
GRMF_G^2           2.6543        59.5903   7.2381        2.3977    8.7849        322.5806
GRMF_G^3           2.5687        99.4413   7.2432        -4.7954   8.7932        188.7097
GRMF_G^4           2.5562        105.2607  -             -         -             -
GRMF_G^5           2.4853        138.2682  -             -         -             -
GRMF_G^6           2.4852        138.3147  -             -         -             -
GRMF_DNA-1         2.4303        163.8734  7.2191        29.1960   8.8013        58.0645
GRMF_DNA-2         2.4510        154.2365  7.2359        5.5007    8.8007        67.7419
GRMF_DNA-3         2.4247        166.4804  7.1811        82.7927   8.7383        1074.1935
GRMF_DNA-4         2.4466        156.2849  7.1971        60.2257   8.7122        1495.1613
Co-Factor_G        -             -         7.2743        0         8.7957        0
Co-Factor_DNA-3    -             -         7.2623        32.9670   8.7354        391.5584
Table 1: Comparison of graph regularized matrix factorization variants for explicit feedback on the Synthetic, Douban, and Flixster datasets. RMSE is shown ×10⁻¹ (lower is better) and the same rank is used for all methods. RGG is the Relative Graph Gain in (5).
             Graph Statistics                 Graph DNA Encoding Time (secs)
Dataset      Number of Nodes  Graph Density   DNA-1     DNA-2     DNA-3     DNA-4
Douban       129,490          0.0102%         132.2717  266.3740  403.9747  580.1547
Flixster     147,612          0.0117%         157.3103  317.7706  482.0360  686.8048
Table 2: Graph DNA (Algorithm 1) encoding speed. We implement graph DNA in single-core Python with a fixed code length $c$ and number of hash functions; encoding time scales linearly in the depth $d$ for a fixed $c$.
Dataset    Methods        MAP     HLU     P@1     P@5     N@1     N@5
Douban     GRWMF_G        8.340   13.033  14.944  10.371  14.944  12.564
           GRWMF_DNA-3    8.400   13.110  14.991  10.397  14.991  12.619
Flixster   GRWMF_G        10.889  14.909  12.303  7.9927  12.303  12.734
Flixster   GRWMF_DNA-3    11.612  15.687  12.644  8.1583  12.644  13.399
Table 3: Comparison of GRWMF variants for implicit feedback on the Douban and Flixster datasets. P stands for precision and N stands for NDCG. We use the same rank for all methods, and all results are shown ×10⁻².
Dataset      Methods                    Test RMSE (mean ± std)  Time/epoch (secs)  % RGG      Speedup
Douban       SRGCNN (reported by [3])   8.0100                  -                  -          -
             GC-MC                      7.3109 ± 0.0150         0.0410             -          9.72x
             GC-MC_G                    7.3698 ± 0.0737         0.3985             N/A        1.00x
             GC-MC_G^2                  7.3123 ± 0.0139         0.4221             N/A        0.94x
             GC-MC_DNA-2                7.3117 ± 0.0129         0.1709             N/A        2.33x
Flixster     SRGCNN (reported by [3])   9.2600                  -                  -          -
             GC-MC                      9.2614 ± 0.0578         0.0232             -          13.65x
             GC-MC_G                    9.2374 ± 0.1045         0.3166             0          1.00x
             GC-MC_G^2                  8.9344 ± 0.0333         0.3291             1262.4999  0.96x
             GC-MC_DNA-2                8.9536 ± 0.0770         0.0524             1182.4999  6.04x
Yahoo Music  SRGCNN (reported by [3])   22.4000                 -                  -          -
             GC-MC                      22.6697 ± 0.3530        0.0684             -          1.75x
             GC-MC_G                    21.3672 ± 0.4190        0.1198             0          1.00x
             GC-MC_G^2                  20.2189 ± 0.8664        0.1177             88.1612    1.02x
             GC-MC_DNA-2                19.3879 ± 0.2874        0.0896             151.9616   1.34x
Table 4: Comparison of GCN methods for explicit feedback on the Douban, Flixster, and Yahoo Music datasets (3000 by 3000 subsets, as in [3, 28]). RMSE is reported as mean ± standard deviation over 6 runs; speedups are relative to GC-MC_G. All methods except GC-MC utilize the side graph information.
Co-Factorization with Graph for Explicit Feedback

We show that our graph DNA can improve Co-Factor [38, 26] as well; the results are in Table 1. We find that applying DNA-3 to the Co-Factor method improves performance on both datasets, more so on Flixster. This is consistent with our observations for GRMF in Table 1: deep graph information is more helpful for Flixster than for Douban. The details of applying graph DNA to Co-Factor are in the Appendix.

Graph Regularized Weighted Matrix Factorization for Implicit feedback

We follow the same procedure as in [40], setting ratings of 4 and above to 1 and the rest to 0. We compare the baseline graph-based weighted matrix factorization [21, 20] with our proposed weighted matrix factorization with DNA-3. We do not compare with Bayesian personalized ranking [32] or the recently proposed SQL-rank [40], as they cannot easily utilize graph information.

The results are summarized in Table 3, with experimental details in the Appendix. Again, using DNA-3 achieves better prediction results than the baseline on every single metric on both the Douban and Flixster datasets.

Graph Convolutional Matrix Factorization

Graph Convolutional Matrix Completion (GC-MC) is a graph convolutional network (GCN) based geometric matrix completion method [3]. In [3], the side graphs over users and items are represented by their adjacency matrices, and these one-hot encodings are treated as features for the nodes in the graph; convolutions of these features are then performed on the bipartite rating graph. We find in our experiments that using these one-hot encodings of the graph as features is an inferior choice, both in terms of performance and speed. To capture higher-order side graph information, it is better to use $G^k$ for some small constant $k$. Again, we can instead use graph DNA to efficiently encode and store the higher-order information before feeding it into GC-MC. The exact way graph DNA is used is detailed in the Appendix, and a feature-construction sketch follows.
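As an illustration, instead of the $n$-dimensional one-hot adjacency features, one can hand GC-MC the compact DNA codes as user-side features (reusing the hypothetical graph_dna helper sketched in Section 2.2; user_adj is an assumed neighbor-list graph, and the dimensions follow the Douban setting described below):

import numpy as np

# c-dimensional DNA codes (c << n) replace the n-dimensional one-hot features
X_user = graph_dna(user_adj, c=96, h=2, d=2).astype(np.float32)  # shape (n, 96)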

Figure 3: Comparison of the training speed of GRMF with and without graph DNA.

We use the same splits of the three real-world datasets and follow the exact procedure of [3, 28]. We tuned hyperparameters on a validation dataset and report the best test results found within 200 epochs using the optimal parameters. We repeated each experiment 6 times and report the mean and standard deviation of the test RMSE. After some tuning, we use a Bloom filter capacity of 10 for Douban and 60 for Flixster, as the latter has a much denser second-order graph. With a false positive rate of 0.1, this implies 96-bit Bloom filters for Douban and 960-bit ones for Flixster, so the feature dimension is reduced from 3,000 to 96 and 960 respectively when using our graph DNA-2, which leads to a significant speed-up. The original GC-MC method does not scale well beyond 3,000-by-3,000 rating matrices with user and item side graphs, as it requires using the normalized adjacency matrix as user/item features. PinSage [42], while scalable, does not utilize the user/item side graphs; moreover, it is not feasible to use $n$-dimensional node features, where $n$ is the number of nodes in the side graphs. By contrast, our method only requires $c$-dimensional features with $c \ll n$. We can see from Table 4 that we outperform both GCN-based methods [3, 28] in terms of speed and performance by a large margin.

Speed Comparisons

Finally, we compare the speed-ups obtained by graph DNA-$d$ against GRMF with $G^d$. Since both algorithms scale with the number of edges in the constructed graph, the Bloom filter based method scales substantially better than computing and using $G^d$, as shown in Figure 3.

5 Conclusion

In this paper, we proposed Graph DNA, a deep neighborhood aware encoding scheme for collaborative filtering with graph information. We make use of Bloom filters to incorporate higher-order graph information, without the need to explicitly minimize a loss function. The resulting encoding is extremely space- and time-efficient, and lends itself well to multiple algorithms that make use of graph information, including Graph Convolutional Networks. Experiments show that graph DNA encoding outperforms several baseline methods on multiple datasets in both speed and performance.

References

  • Abbassi and Mirrokni [2007] Zeinab Abbassi and Vahab S Mirrokni. A recommender system based on local random walks and spectral methods. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 102–108. ACM, 2007.
  • Almeida et al. [2007] Paulo Sérgio Almeida, Carlos Baquero, Nuno Preguiça, and David Hutchison. Scalable bloom filters. Information Processing Letters, 101(6):255–261, 2007.
  • Berg et al. [2017] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017.
  • Bloom [1970] Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
  • Borthakur et al. [2011] Dhruba Borthakur, Jonathan Gray, Joydeep Sen Sarma, Kannan Muthukkaruppan, Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov, Aravind Menon, Samuel Rash, et al. Apache hadoop goes realtime at facebook. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 1071–1080. ACM, 2011.
  • Breese et al. [1998] John S Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann Publishers Inc., 1998.
  • Broder and Mitzenmacher [2004] Andrei Broder and Michael Mitzenmacher. Network applications of bloom filters: A survey. Internet mathematics, 1(4):485–509, 2004.
  • Cai et al. [2011] Deng Cai, Xiaofei He, Jiawei Han, and Thomas S Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
  • Cao et al. [2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, pages 891–900. ACM, 2015.
  • Chang et al. [2008] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.
  • Chen et al. [2018] Jianfei Chen, Jun Zhu, and Le Song. Stochastic training of graph convolutional networks with variance reduction. In International Conference on Machine Learning, pages 941–949, 2018.
  • Cisse et al. [2013] Moustapha M Cisse, Nicolas Usunier, Thierry Artieres, and Patrick Gallinari. Robust bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems, pages 1851–1859, 2013.
  • Courbariaux et al. [2015] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems, pages 3123–3131, 2015.
  • Defferrard et al. [2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
  • Dubhashi and Ranjan [1998] Devdatt Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. Random Structures & Algorithms, 13(2):99–124, 1998.
  • Gori et al. [2007] Marco Gori, Augusto Pucci, V Roma, and I Siena. Itemrank: A random-walk based scoring algorithm for recommender engines. In IJCAI, volume 7, pages 2766–2771, 2007.
  • Hamilton et al. [2017a] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017a.
  • Hamilton et al. [2017b] William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017b.
  • Han et al. [2015] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
  • Hsieh et al. [2015] Cho-Jui Hsieh, Nagarajan Natarajan, and Inderjit Dhillon. Pu learning for matrix completion. In International Conference on Machine Learning, pages 2445–2453, 2015.
  • Hu et al. [2008] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, pages 263–272. IEEE, 2008.
  • Jamali and Ester [2009] Mohsen Jamali and Martin Ester. Trustwalker: a random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 397–406. ACM, 2009.
  • Joag-Dev et al. [1983] Kumar Joag-Dev, Frank Proschan, et al. Negative association of random variables with applications. The Annals of Statistics, 11(1):286–295, 1983.
  • Kipf and Welling [2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Koren et al. [2009] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009.
  • Liang et al. [2016] Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M Blei. Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In Proceedings of the 10th ACM conference on recommender systems, pages 59–66. ACM, 2016.
  • Ma et al. [2011] Hao Ma, Dengyong Zhou, Chao Liu, Michael R Lyu, and Irwin King. Recommender systems with social regularization. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 287–296. ACM, 2011.
  • Monti et al. [2017] Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems, pages 3697–3707, 2017.
  • Page et al. [1999] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
  • Pozo et al. [2016] Manuel Pozo, Raja Chiky, Farid Meziane, and Elisabeth Métais. An item/user representation for recommender systems based on bloom filters. In 2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS), pages 1–12. IEEE, 2016.
  • Rao et al. [2015] Nikhil Rao, Hsiang-Fu Yu, Pradeep K Ravikumar, and Inderjit S Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In Advances in neural information processing systems, pages 2107–2115, 2015.
  • Rendle et al. [2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence, pages 452–461. AUAI Press, 2009.
  • Serrà and Karatzoglou [2017] Joan Serrà and Alexandros Karatzoglou. Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems, pages 279–287. ACM, 2017.
  • Shah et al. [2009] Devavrat Shah et al. Gossip algorithms. Foundations and Trends® in Networking, 3(1):1–125, 2009.
  • Shani et al. [2008] Guy Shani, Max Chickering, and Christopher Meek. Mining recommendations from the web. In Proceedings of the 2008 ACM conference on Recommender systems, pages 35–42. ACM, 2008.
  • Shi et al. [2009] Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, and SVN Vishwanathan. Hash kernels for structured data. Journal of Machine Learning Research, 10(Nov):2615–2637, 2009.
  • Shinde and Savant [2016] Anita Shinde and Ila Savant. User based collaborative filtering using bloom filter with mapreduce. In Proceedings of International Conference on ICT for Sustainable Development, pages 115–123. Springer, 2016.
  • Singh and Gordon [2008] Ajit P Singh and Geoffrey J Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 650–658. ACM, 2008.
  • Wu et al. [2017] Liwei Wu, Cho-Jui Hsieh, and James Sharpnack. Large-scale collaborative ranking in near-linear time. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 515–524. ACM, 2017.
  • Wu et al. [2018] Liwei Wu, Cho-Jui Hsieh, and James Sharpnack. Sql-rank: A listwise approach to collaborative ranking. In Proceedings of Machine Learning Research (35th International Conference on Machine Learning), volume 80, 2018.
  • Xie et al. [2015] Wenlei Xie, David Bindel, Alan Demers, and Johannes Gehrke. Edge-weighted personalized pagerank: breaking a decade-old performance barrier. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1325–1334. ACM, 2015.
  • Ying et al. [2018] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 974–983. ACM, 2018.
  • Yu et al. [2017] Hsiang-Fu Yu, Hsin-Yuan Huang, Inderjit S Dhillon, and Chih-Jen Lin. A unified algorithm for one-class structured matrix factorization with side information. In AAAI, pages 2845–2851, 2017.
  • Zafarani and Liu [2009] R. Zafarani and H. Liu. Social computing data repository at ASU, 2009. URL http://socialcomputing.asu.edu.
  • Zhou et al. [2012] Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, and Guillermo Sapiro. Kernelized probabilistic matrix factorization: Exploiting graphs and side information. In Proceedings of the 2012 SIAM international Conference on Data mining, pages 403–414. SIAM, 2012.

6 Appendix

6.1 Theory for Bloom Filters

Theorem 2.

Let $b_u, b_v$ be the Bloom filter bitarrays for the neighborhoods $\mathcal{N}_u, \mathcal{N}_v$, with $h$ independent hash functions for all elements of $\mathcal{N}_u \cup \mathcal{N}_v$, and let $\mathcal{N}_u \triangle \mathcal{N}_v$ be their symmetric difference. Let $S$ be the number of common 1-bits in $b_u, b_v$. Then $\mathbb{E}[S]$ admits a closed-form expression in terms of $c$, $h$, $|\mathcal{N}_u \cap \mathcal{N}_v|$, and $|\mathcal{N}_u \triangle \mathcal{N}_v|$, and $S$ satisfies a Hoeffding-type concentration inequality around $\mathbb{E}[S]$.

We prove Theorem 2 in Section 6.1.2. For now, we prove Theorem 1, which is in fact a corollary of this main result.

Proof of Theorem 1.

From Theorem 2, the expectation of $S$ is bounded above and below by explicit functions of $h$, $c$, $|\mathcal{N}_d(u) \cap \mathcal{N}_d(v)|$, and the symmetric difference, and with probability at least $1 - \delta$ the deviation of $S$ from its expectation is at most $c\epsilon$. Combining these two estimates yields (1). For (2), note that the relevant bounding function is decreasing in $c$ with limit 0; thus, for any $\delta > 0$ there exists a constant $C_\delta$ such that once $c \ge C_\delta$, the bound forces $S > 0$ with probability at least $1 - \delta$. ∎

6.1.1 Negative Associativity of Bloom Filters

First, let us go over the definition of negative association. Random variables $x_1, \dots, x_c$ are negatively associated (NA) if for any pair of functions $f, g$ that are both monotonically increasing (or both decreasing) and any disjoint index sets $I, J \subseteq \{1, \dots, c\}$,

$$\mathbb{E}\left[ f(x_I)\, g(x_J) \right] \le \mathbb{E}\left[ f(x_I) \right] \mathbb{E}\left[ g(x_J) \right],$$

where $x_I, x_J$ denote the variables restricted to these index sets.

Lemma 1.

(1) Let $x, y$ be two independent random bitarrays that are both NA. Then $x \lor y$, the elementwise 'or' operation, and $x \land y$, the elementwise 'and' operation, are both NA. So NA is closed under elementwise 'or' and 'and' operations on independent arrays.

(2) Let $z_k$ be the $k$-th bit in the Bloom filter of a set $\mathcal{A}$ with $h$ independent hash functions; then the random bits $z_1, \dots, z_c$ are NA.

Proof.

(1) We show that NA is closed under both elementwise operations. First, note that the concatenation $(x, y)$ is NA, by closure of NA under independent unions (Property P7 in [23]). Then, on the disjoint index pairs, apply the bit operation to produce the resulting array. The 'or' operation is monotonically increasing because $a \lor b = \max\{a, b\}$, and 'and' is as well because $a \land b = \min\{a, b\}$. Finally, we conclude by closure of NA under monotonically increasing functions on disjoint sets (Property P6 in [23]).

(2) Consider a hash function $f_k$ and a node $a \in \mathcal{A}$. Let $z^{(k,a)}$ be the $c$-bit Bloom filter bitarray for this vertex and hash function only; then $z^{(k,a)}$ has only a single bit that is 1 and the rest are 0. By the 0-1 property for binary bits (Lemma 8 in [15]), $z^{(k,a)}$ has NA entries, since its entries sum to 1. The Bloom filter of $\mathcal{A}$ is the elementwise 'or' of the arrays $z^{(k,a)}$ over all hash functions and vertices, and we conclude by property (1). ∎

6.1.2 Proof of Theorem 2

Consider the partition of $\mathcal{N}_u \cup \mathcal{N}_v$ into $\mathcal{N}_u \setminus \mathcal{N}_v$, $\mathcal{N}_v \setminus \mathcal{N}_u$, and $\mathcal{N}_u \cap \mathcal{N}_v$, and let $b_{u \setminus v}, b_{v \setminus u}, b_\cap$ be the Bloom filter bitarrays for these three sets respectively.

Notice that $b_u = b_{u \setminus v} \lor b_\cap$ and $b_v = b_{v \setminus u} \lor b_\cap$, where the bit operations are elementwise. If all hash functions are independent, then $b_{u \setminus v}, b_{v \setminus u}, b_\cap$ are independent. Notice that for a given node and hash function the bit selected is random but unique, which means that the elements of a bitarray are not necessarily independent for any single Bloom filter. However, each bitarray is negatively associated by Lemma 1.

The probability that a given bit in one of these bitarrays is 0 has a closed form in terms of $c$, $h$, and the corresponding set size, which yields an expression for the expectation of $S$. By Hoeffding's inequality for negatively associated random variables [15], $S$ concentrates around this expectation.

It remains to provide intelligible bounds on the expectation of $S$. These follow from elementary inequalities that bound the bit-occupancy probabilities from above and below; combining them with the concentration step completes the proof. ∎

class BloomFilter:

    def __init__(self, c, h):
        # c bits, h hash functions; positions are derived from Python's
        # built-in hash, so membership is consistent within one process
        self.c, self.h = c, h
        self.bits = [0] * c

    def add(self, key):
        for k in range(self.h):
            self.bits[hash((k, key)) % self.c] = 1

    def union(self, other):
        # bitwise OR of the underlying bit-arrays
        self.bits = [x | y for x, y in zip(self.bits, other.bits)]

    def __contains__(self, key):
        return all(self.bits[hash((k, key)) % self.c] for k in range(self.h))

Algorithm 2 A Standard Bloom Filter
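A quick usage check of the class above (our own example, not from the paper):

bf1, bf2 = BloomFilter(c=1024, h=3), BloomFilter(c=1024, h=3)
bf1.add("alice")
bf2.add("bob")
bf1.union(bf2)                         # set union via bitwise OR (property 2)
print("alice" in bf1, "bob" in bf1)    # True True (false positives are possible)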