Inductive Graph Pattern Learning for Recommender Systems Based on a Graph Neural Network

04/26/2019 ∙ by Muhan Zhang, et al. ∙ Washington University in St. Louis

Most modern successful recommender systems are based on matrix factorization techniques, i.e., learning a latent embedding for each user and each item from the given rating matrix and using the embeddings to complete the matrix. However, these learned latent embeddings are inherently transductive and are not designed to generalize to unseen users/items or new tasks. In this paper, we aim to learn an inductive model for recommender systems based on the local graph patterns around user-item pairs. The inductive model can generalize to unseen users/items, and potentially also transfer to other tasks. To learn such a model, we extract a local enclosing subgraph for each training (user, item) pair, and feed the subgraphs to a graph neural network (GNN) to train a rating prediction model. We show that our model achieves highly competitive performance with state-of-the-art transductive methods, and is more stable when the rating matrix is sparse. Furthermore, our transfer learning experiment validates that the learned model is transferable to new tasks.


1. Introduction

Collaborative filtering (CF) techniques for recommender systems leverage collected ratings of items by users to make new recommendations. These collected ratings can be written as entries of an $m \times n$ rating matrix, where $m$ is the number of users and $n$ is the number of items. Modern CF-based recommender systems try to solve the matrix completion problem through matrix factorization techniques, which have achieved great successes (Adomavicius and Tuzhilin, 2005; Schafer et al., 2007; Koren et al., 2009; Bobadilla et al., 2013).
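
To make the transductive nature of this setup concrete, below is a minimal NumPy sketch of matrix factorization (the name factorize and all hyperparameters are ours, purely for illustration, not any cited system's code); every user and item owns a learned embedding row, so a user unseen at training time has no row and forces retraining:

    import numpy as np

    def factorize(R, mask, k=8, lr=0.01, reg=0.1, epochs=200, seed=0):
        """Learn user/item embeddings U, V so that U @ V.T fits the observed
        entries of R (mask[i, j] = 1 where a rating exists). Hypothetical
        illustration of transductive matrix factorization."""
        rng = np.random.default_rng(seed)
        m, n = R.shape
        U = 0.1 * rng.standard_normal((m, k))   # one row per *seen* user
        V = 0.1 * rng.standard_normal((n, k))   # one row per *seen* item
        rows, cols = np.nonzero(mask)
        for _ in range(epochs):
            for i, j in zip(rows, cols):
                err = R[i, j] - U[i] @ V[j]
                U[i], V[j] = (U[i] + lr * (err * V[j] - reg * U[i]),
                              V[j] + lr * (err * U[i] - reg * V[j]))
        return U @ V.T   # a new user has no row in U, hence retraining is needed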

However, matrix factorization is intrinsically transductive, meaning that the learned latent features (embeddings) for users/items do not generalize to new users/items. When new users/items enter the system or new ratings are made, a complete retraining is often required to obtain the new embeddings. Such behavior makes matrix factorization unsuitable for applications that require timely recommendations in fast-evolving environments, such as news recommendation. Content-based recommender systems alleviate this problem by using user/item content features (Lops et al., 2011). However, these features are not always available and can be hard to extract. Therefore, in this paper, we aim to explore inductive CF methods for recommender systems, where a model learned from training user-item pairs is directly applicable to unseen user-item pairs without retraining.

So on which data can we train an inductive model for recommender systems? The answer is graphs. If for each existing rating we add an edge between the associated user and item, we can build a bipartite graph where an edge can only exist between a user and an item. Subsequently, predicting unknown ratings corresponds to predicting labeled links in this bipartite graph. This transforms the matrix completion problem into a link prediction problem (Lü and Zhou, 2011). One large category of link prediction methods are heuristic methods, which compute heuristic scores such as common neighbors (Liben-Nowell and Kleinberg, 2007) and the Katz index (Katz, 1953) based on local or global graph patterns. Heuristic methods use these predefined graph structure features for link prediction and are inductive, because the features are not restricted to certain links but are applicable to the entire graph.
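
For concreteness, a minimal sketch of this construction (hypothetical helper names, not the paper's code) stores, for each node, its neighbors together with the rating carried by the connecting edge:

    from collections import defaultdict

    def build_bipartite_graph(ratings):
        """ratings: iterable of (user, item, rating) tuples. Returns adjacency
        maps storing (neighbor, rating) pairs, so each edge carries its rating
        as an edge type. Illustrative sketch."""
        user_adj = defaultdict(list)   # user -> [(item, rating), ...]
        item_adj = defaultdict(list)   # item -> [(user, rating), ...]
        for u, v, r in ratings:
            user_adj[u].append((v, r))
            item_adj[v].append((u, r))
        return user_adj, item_adj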

Can we also find such heuristics for recommender systems? Intuitively, they should exist. For example, if a user $u_0$ likes an item $v_0$, we may expect to see very often that $v_0$ is also liked by some other user $u_1$ who shares a similar taste with $u_0$. By similar taste, we mean $u_0$ and $u_1$ have both liked some other item $v_1$. In the bipartite graph, such a pattern is realized as a "like" path connecting $(u_0, v_1, u_1, v_0)$. If there are many such paths between $u_0$ and $v_0$, we may infer that $u_0$ is highly likely to like $v_0$. Thus, we may count the number of such paths as an indicator of how likely $u_0$ likes $v_0$. In fact, many neighborhood-based recommender systems (Desrosiers and Karypis, 2011) rely on such heuristics. However, in this work, we do not use any predefined fixed heuristics, but instead learn heuristics from the existing bipartite graph.
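
Using the adjacency maps from the sketch above, such a path-counting heuristic can be written directly; the threshold defining a "like" (ratings of 4 and above) is an arbitrary assumption for illustration:

    def count_like_paths(u0, v0, user_adj, item_adj, like=lambda r: r >= 4):
        """Count paths u0 -> v1 -> u1 -> v0 in which every edge is a "like".
        Illustrative heuristic; the threshold r >= 4 is an arbitrary choice."""
        count = 0
        for v1, r1 in user_adj[u0]:            # items u0 liked
            if v1 == v0 or not like(r1):
                continue
            for u1, r2 in item_adj[v1]:        # other users who liked v1 too
                if u1 == u0 or not like(r2):
                    continue
                count += sum(1 for v2, r3 in user_adj[u1]
                             if v2 == v0 and like(r3))
        return count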

Present work Inspired by (Zhang and Chen, 2017, 2018), we aim to learn graph structure features related to ratings automatically from local enclosing subgraphs around user-item links. An $h$-hop enclosing subgraph for a user-item pair $(u, v)$ is defined to be the subgraph induced from the whole bipartite graph by $u$, $v$, and the neighbors of $u$ and $v$ within $h$ hops. Such local subgraphs contain rich structural information about link existence (Zhang and Chen, 2018). For example, the number of paths $(u_0, v_1, u_1, v_0)$ can be computed directly from $(u_0, v_0)$'s 1-hop enclosing subgraph. By feeding these enclosing subgraphs to a graph neural network (GNN) (Scarselli et al., 2009; Defferrard et al., 2016; Kipf and Welling, 2016; Zhang et al., 2018), we train a graph regression model that maps a subgraph to the rating of its target link. Due to their superior graph learning ability, GNNs can learn highly expressive graph structure features useful for rating prediction. Figure 1 illustrates the overall framework. Our model is inductive, as we can freely apply the trained model to unseen links' enclosing subgraphs without retraining. We can even transfer the model to other similar tasks. We evaluate our model on benchmark datasets, and show that it is highly competitive with state-of-the-art transductive methods. Our model also shows good performance under transfer learning and sparse rating settings.

Figure 1. Our framework. Red/blue mean likes/dislikes. Note that the features listed inside the box are only for illustration – the real learned features can be very complex.

2. Inductive Graph Pattern Learning (IGPL)

We present our inductive graph pattern learning (IGPL) framework for recommender systems in this section. Some related work is included in Appendix A. IGPL extracts a local enclosing subgraph around each user-item pair, and trains a GNN regression model on these enclosing subgraphs to predict the ratings. We will use $G$ to denote the undirected bipartite graph constructed from the training rating matrix. In $G$, a node is either a user-type node (denoted by $u$) or an item-type node (denoted by $v$). Edges exist only between user-type and item-type nodes. An edge also has a type $r$, corresponding to the rating that $u$ gives to $v$. We use $\mathcal{R}$ to denote the set of all possible ratings, and $\mathcal{N}_r(u)$ to denote the set of $u$'s neighbors that connect to $u$ with edge type $r$.

2.1. Enclosing subgraph extraction

The first part of the IGPL framework is enclosing subgraph extraction. For each training (user, item, rating) tuple, we extract from $G$ an $h$-hop enclosing subgraph around the user-item pair. We will feed these enclosing subgraphs to a GNN and regress on their ratings. Then, for each testing user-item pair, we again extract its $h$-hop enclosing subgraph, and use the trained GNN model to predict its rating. Algorithm 1 describes how we extract $h$-hop enclosing subgraphs.

1:  input: $h$, target user-item pair $(u, v)$, the bipartite graph $G$
2:  output: enclosing subgraph $G^h_{u,v}$ for $(u, v)$
3:  $U = U_{\text{fringe}} = \{u\}$, $V = V_{\text{fringe}} = \{v\}$
4:  for $i = 1, 2, \ldots, h$ do
5:     $U'_{\text{fringe}} = \{\text{neighbors of } V_{\text{fringe}} \text{ in } G\} \setminus U$
6:     $V'_{\text{fringe}} = \{\text{neighbors of } U_{\text{fringe}} \text{ in } G\} \setminus V$
7:     $U = U \cup U'_{\text{fringe}}$, $V = V \cup V'_{\text{fringe}}$
8:     $U_{\text{fringe}} = U'_{\text{fringe}}$, $V_{\text{fringe}} = V'_{\text{fringe}}$
9:  end for
10:  return $G^h_{u,v}$ induced by $U \cup V$ from $G$
Algorithm 1 Enclosing Subgraph Extraction
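
As an illustration, the following Python sketch (hypothetical names, built on the adjacency maps from the earlier sketch) implements Algorithm 1 and additionally records the hop at which each node enters the subgraph, which the node labeling of Section 2.2 will use:

    def extract_enclosing_subgraph(h, u, v, user_adj, item_adj):
        """Sketch of Algorithm 1 on the adjacency maps built earlier. Besides
        the induced subgraph, it records the hop at which each node entered."""
        u_nodes, v_nodes = {u: 0}, {v: 0}      # node -> hop of inclusion
        u_fringe, v_fringe = {u}, {v}
        for i in range(1, h + 1):
            new_u = {w for x in v_fringe for w, _ in item_adj[x]} - u_nodes.keys()
            new_v = {x for w in u_fringe for x, _ in user_adj[w]} - v_nodes.keys()
            u_nodes.update({w: i for w in new_u})
            v_nodes.update({x: i for x in new_v})
            u_fringe, v_fringe = new_u, new_v
        # edges induced by the selected users and items
        edges = [(w, x, r) for w in u_nodes for x, r in user_adj[w] if x in v_nodes]
        return u_nodes, v_nodes, edges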

2.2. Node labeling

The second part of IGPL is node labeling. Before we feed enclosing subgraphs to the GNN, we need to apply a node labeling to each enclosing subgraph. A node labeling is a function that returns an integer label for every node in the subgraph. The purpose is to use different labels to mark nodes' different roles in a subgraph. For example, 1) we need to differentiate the target user and item nodes, between which the target rating is located, and 2) we need to differentiate user-type nodes from item-type nodes. To achieve these goals, we propose the following node labeling scheme: we first give label 0 to the target user and label 1 to the target item. For every other node, we determine its label according to the hop at which it is included in the subgraph in Algorithm 1. If a user-type node is included at the $i$-th hop, we give it label $2i$; if an item-type node is included at the $i$-th hop, we give it label $2i+1$. Such a node labeling can sufficiently discriminate: 1) target nodes from "context" nodes, 2) users from items (users always have even labels), and 3) nodes at different distances from the target user/item. Note that this is not the only possible way of node labeling, but we empirically verified its excellent performance.
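
The scheme is mechanical enough to state in a few lines; here is a sketch consuming the hop maps from the extraction sketch above (and assuming user and item ids live in disjoint spaces):

    def node_labels(u_nodes, v_nodes):
        """Target user -> 0, target item -> 1; a user included at hop i -> 2*i,
        an item included at hop i -> 2*i + 1. Assumes disjoint user/item ids."""
        labels = {w: 2 * hop for w, hop in u_nodes.items()}
        labels.update({x: 2 * hop + 1 for x, hop in v_nodes.items()})
        return labels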

2.3. Graph Neural Network Training

The last part of IGPL is to train a graph neural network (GNN) model predicting ratings from enclosing subgraphs. A GNN is typically composed of: 1) message passing layers, which aggregate neighboring nodes' features to the center to extract a feature vector for each node, and 2) a global pooling layer to summarize a graph representation from node features. To handle different edge types, we adopt the relational graph convolutional operator (R-GCN) (Schlichtkrull et al., 2018) as our GNN's message passing layers. The R-GCN layer has the following form:

$$\mathbf{x}'_i = \mathbf{W}_0 \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{W}_r \mathbf{x}_j \qquad (1)$$

where $\mathbf{x}_i$ denotes node $i$'s input feature vector, $\mathbf{x}'_i$ denotes its output feature vector, and $\mathbf{W}_0$ and $\mathbf{W}_r$ are learnable parameter matrices. In an R-GCN layer, neighbors connected to $i$ with different edge types have different parameter matrices. Thus, it is able to learn from the rich graph patterns inside the edge types. We apply several R-GCN layers with tanh activations between layers. The node feature vectors from all layers are concatenated for each node as its final representation.
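
As an illustration of Eq. (1), here is a minimal dense PyTorch implementation (the class name RGCNLayer is ours; a practical system would use a sparse operator such as PyTorch Geometric's RGCNConv):

    import torch
    import torch.nn as nn

    class RGCNLayer(nn.Module):
        """Minimal dense implementation of Eq. (1): a self-transform W_0 plus
        one mean-aggregated transform W_r per edge type. Sketch only."""
        def __init__(self, in_dim, out_dim, num_relations):
            super().__init__()
            self.w0 = nn.Linear(in_dim, out_dim, bias=False)
            self.wr = nn.ModuleList(
                nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations))

        def forward(self, x, adj_per_rel):
            # adj_per_rel[r] is the dense (N, N) adjacency of edge type r
            out = self.w0(x)
            for r, adj in enumerate(adj_per_rel):
                deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # |N_r(i)|
                out = out + self.wr[r]((adj @ x) / deg)           # mean over N_r(i)
            return out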

To pool the node representations into a graph representation, we leverage the SortPooling layer from (Zhang et al., 2018). In SortPooling, node representations are sorted according to their continuous Weisfeiler-Lehman colors, represented by their last-layer features. Then, standard 1-D convolutional layers are applied to these sorted representations to learn the final graph representation from both individual nodes and the global topology contained in the node ordering. We empirically verified the superior performance of R-GCN and SortPooling over plain graph convolution (Kipf and Welling, 2016) and sum-pooling (Duvenaud et al., 2015).

After getting the final graph representation, we add a linear regression layer with a mean squared error (MSE) loss between predictions and ground-truth ratings. There are several additional notes: 1) We use the one-hot encodings of node labels as the initial node features in our experiments. However, one could concatenate them with additional node information, such as content features of nodes. To illustrate the power of learning graph patterns for recommender systems, we do not use any side information in our method, but learn from subgraphs only. 2) Before feeding a training enclosing subgraph to the GNN, we need to remove the edge between the target user and item, since it would otherwise leak the label.
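
Putting the pieces together, the following sketch assembles the described pipeline in PyTorch Geometric, using the architecture details from Section 3 (4 R-GCN layers with 32, 32, 32 and 1 channels, basis decomposition with 4 bases, two 1-D convolutions with 16 and 32 channels, a 128-unit regression head with dropout 0.5, and an MSE objective). The class name IGPLNet, the SortPooling size k, and the 1-D kernel sizes are our assumptions, and global_sort_pool assumes an older PyTorch Geometric API:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch_geometric.nn import RGCNConv, global_sort_pool

    class IGPLNet(nn.Module):
        """Sketch: stacked R-GCN layers with tanh, concatenated node features,
        SortPooling, 1-D convolutions, and a linear regression head."""
        def __init__(self, in_dim, num_relations, k=30):
            super().__init__()
            dims = [in_dim, 32, 32, 32, 1]
            self.convs = nn.ModuleList(
                RGCNConv(dims[i], dims[i + 1], num_relations, num_bases=4)
                for i in range(4))
            self.k, total = k, sum(dims[1:])
            self.conv1 = nn.Conv1d(total, 16, kernel_size=1)
            self.conv2 = nn.Conv1d(16, 32, kernel_size=3, padding=1)
            self.lin1 = nn.Linear(32 * k, 128)
            self.lin2 = nn.Linear(128, 1)

        def forward(self, x, edge_index, edge_type, batch):
            feats = []
            for conv in self.convs:
                x = torch.tanh(conv(x, edge_index, edge_type))
                feats.append(x)
            x = torch.cat(feats, dim=-1)             # per-node representation
            # sorts nodes by their last channel, i.e. the final 1-d R-GCN output
            x = global_sort_pool(x, batch, self.k)   # (B, k * total)
            x = x.view(x.size(0), self.k, -1).transpose(1, 2)
            x = F.relu(self.conv2(F.relu(self.conv1(x))))
            x = F.relu(self.lin1(x.flatten(1)))
            x = F.dropout(x, p=0.5, training=self.training)
            return self.lin2(x).squeeze(-1)          # predicted ratings

    # training objective: loss = F.mse_loss(model(x, ei, et, batch), ratings)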

3. Experiments

Following the setup of (Monti et al., 2017), we conduct experiments on four standard datasets: Flixster (Jamali and Ester, 2010), Douban (Ma et al., 2011), YahooMusic (Dror et al., 2011) and MovieLens (Miller et al., 2003). For MovieLens, we train and evaluate on the canonical u1.base/u1.test train/test split. For Flixster, Douban and YahooMusic, we use the preprocessed subsets provided by (Monti et al., 2017). Dataset statistics are summarized in Table 1. We implemented the GNN in IGPL using PyTorch_Geometric (Fey and Lenssen, 2019). We tuned all hyperparameters based on validation performance. The final architecture uses 4 R-GCN layers with 32, 32, 32, and 1 hidden dimensions, respectively. Basis decomposition with 4 bases (Schlichtkrull et al., 2018) is used to reduce the number of parameters. After SortPooling, we apply two 1-D convolutional layers with 16 and 32 output channels, respectively, following (Zhang et al., 2018). The final linear regression layer has 128 hidden units and a dropout rate of 0.5. We use 1-hop enclosing subgraphs for all datasets and find them sufficient; using 2-hop or larger subgraphs can slightly improve performance but takes longer to train. We train our model using the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.001, multiplied by 0.1 every 50 epochs. Our code is available at https://github.com/muhanzhang/IGPL.
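
The optimization schedule corresponds to a standard Adam-plus-StepLR setup in PyTorch; model, loader and train_one_epoch below are hypothetical placeholders:

    import torch

    # Adam with initial lr 0.001, decayed by 10x every 50 epochs, as above.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    num_epochs = 40   # 40 for Flixster/Douban/YahooMusic, 60 for MovieLens
    for epoch in range(num_epochs):
        train_one_epoch(model, loader, optimizer)   # hypothetical helper
        scheduler.step()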

Dataset Users Items Ratings Density Rating types
Flixster 3,000 3,000 26,173 0.0029 0.5, 1, 1.5, …, 5
Douban 3,000 3,000 136,891 0.0152 1, 2, 3, 4, 5
YahooMusic 3,000 3,000 5,335 0.0006 1, 2, 3, …, 100
MovieLens 943 1,682 100,000 0.0630 1, 2, 3, 4, 5
Table 1. Statistics of each dataset.

3.1. Flixster, Douban and YahooMusic

For these three datasets, we compare our IGPL with GRALS (Rao et al., 2015), sRGCNN (Monti et al., 2017), and GC-MC (Berg et al., 2017). Among them, GRALS is a graph regularized matrix completion algorithm. GC-MC and sRGCNN are GNN-assisted matrix completion methods, where GNNs are used to learn better user/item latent features to reconstruct the rating matrix. Thus, they are still transductive models. In contrast, our IGPL uses a GNN to inductively learn graph patterns which are not associated with particular nodes/edges, but are generally applicable to any part of the graph. Note that all baselines here use side information such as user-user or item-item graphs, while IGPL does not use any side information. We train our model for 40 epochs with a batch size of 50. Table 2 shows the results. Our model achieves state-of-the-art results on these three datasets, outperforming all three transductive baselines.

Model Flixster Douban YahooMusic
GRALS (Rao et al., 2015) 1.245 0.833 38.0
sRGCNN (Monti et al., 2017) 0.926 0.801 22.4
GC-MC (Berg et al., 2017) 0.917 0.734 20.5
IGPL (ours) 0.893 0.727 19.4
Table 2. RMSE test results on Flixster, Douban and YahooMusic. Baseline numbers are taken from (Berg et al., 2017).

3.2. Transfer learning

To verify the transferability of the learned model, we conduct a transfer learning experiment. We retrain a model on Flixster with its rating types rounded to 1, 2, …, 5 (the same scale as Douban), and then directly test this model on Douban (both Flixster and Douban are movie rating datasets). We get a test RMSE of 0.8365. Note that this result is obtained without using any Douban data for training, yet it is already comparable with the baseline GRALS (0.833). This experiment shows that the model learned by IGPL is transferable to new tasks, a property that transductive models can hardly achieve.
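
For concreteness, the rating-scale mapping could be as simple as the following sketch (the exact rounding rule used in the experiment is an assumption):

    def to_douban_scale(r):
        """Hypothetical mapping of Flixster's half-star scale {0.5, 1, ..., 5}
        onto Douban's {1, ..., 5}; the exact rounding rule is an assumption."""
        return min(5, max(1, int(r + 0.5)))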

3.3. MovieLens

We further conduct experiments on MovieLens. We compare against baselines including the matrix completion variants MC (Candès and Recht, 2009), IMC (Jain and Dhillon, 2013) and GMC (Kalofolias et al., 2014), as well as GRALS, sRGCNN and GC-MC. User/item side information is used in the baselines where available. For IGPL, we train our model for 60 epochs with a batch size of 50. Results are summarized in Table 3. As we can see, IGPL achieves excellent performance, outperforming all matrix completion baselines except GC-MC.

Model MovieLens
MC (Candès and Recht, 2009) 0.973
IMC (Jain and Dhillon, 2013) 1.653
GMC (Kalofolias et al., 2014) 0.996
GRALS (Rao et al., 2015) 0.945
sRGCNN (Monti et al., 2017) 0.929
GC-MC (Berg et al., 2017) 0.905
IGPL (ours) 0.924
Table 3. RMSE test results on MovieLens. Baseline numbers are taken from (Berg et al., 2017).

3.4. Sparse rating matrix analysis

To gain insight into when inductive graph pattern learning is more suitable than traditional transductive methods, we compare IGPL with GC-MC on MovieLens under different sparsity levels of the rating matrix. We sort all training ratings by timestamp, and sparsify the rating matrix by keeping only the first 20%, 40%, 60%, 80% or 100% of ratings, in order to simulate different phases of a recommender system's data collection. We train both models on each sparsified rating matrix, and evaluate on the original MovieLens test set. The results are shown in Figure 2. As we can see, IGPL performs consistently better than GC-MC when fewer than 80% of the ratings are kept. This indicates that IGPL is more stable on sparse ratings, and that useful graph patterns can still be learned even when the rating matrix is very sparse. It also suggests that during the very initial phase of a recommender system, using graph patterns for recommendation might be a better choice than matrix factorization.
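
The sparsification procedure amounts to a timestamp sort followed by a prefix cut; a sketch with a hypothetical helper name:

    import numpy as np

    def keep_first_fraction(ratings, timestamps, frac):
        """Keep only the earliest `frac` of training ratings by timestamp,
        simulating an early phase of data collection (hypothetical helper)."""
        order = np.argsort(timestamps)
        keep = order[: int(len(order) * frac)]
        return [ratings[i] for i in keep]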

Figure 2. MovieLens results under different sparsity ratios.

3.5. Visualization

Finally, we visualize the 10 testing enclosing subgraphs with the highest and lowest predicted ratings for Flixster in Figure 3. As expected, high-score and low-score subgraphs show substantially different patterns. For example, high-score subgraphs typically show both high user bias and high item bias, while low-score subgraphs show low user bias and have fewer ratings on the target item. See Appendix B for more visualization results.

Figure 3. Top 5 and bottom 5 are the testing enclosing subgraphs in Flixster with the highest and lowest predicted ratings, respectively. For each subgraph, red nodes on the left are users; blue nodes on the right are items; the predicted rating is shown below the subgraph. The bottom red and blue nodes are the target user and item. We visualize edge ratings using the color map shown on the right; higher ratings are redder.

4. Conclusion

We propose a new paradigm, IGPL, for recommender systems. Instead of learning transductive latent features, IGPL inductively learns graph patterns related to ratings. IGPL not only shows highly competitive performance with traditional matrix completion baselines in standard settings, but also shows exclusive advantages in transfer learning and sparse rating matrix settings. We believe IGPL will open a new direction for learning inductive recommender systems.

References

  • Adomavicius and Tuzhilin (2005) Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on 17, 6 (2005), 734–749.
  • Berg et al. (2017) Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).
  • Bobadilla et al. (2013) Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. 2013. Recommender systems survey. Knowledge-based systems 46 (2013), 109–132.
  • Bruna et al. (2013) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).
  • Candès and Recht (2009) Emmanuel J Candès and Benjamin Recht. 2009. Exact matrix completion via convex optimization. Foundations of Computational mathematics 9, 6 (2009), 717–772.
  • Chen et al. (2005) Hsinchun Chen, Xin Li, and Zan Huang. 2005. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’05). IEEE, 141–142.
  • Dai et al. (2016) Hanjun Dai, Bo Dai, and Le Song. 2016. Discriminative Embeddings of Latent Variable Models for Structured Data. In Proceedings of The 33rd International Conference on Machine Learning. 2702–2711.
  • Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3837–3845.
  • Desrosiers and Karypis (2011) Christian Desrosiers and George Karypis. 2011. A comprehensive survey of neighborhood-based recommendation methods. In Recommender systems handbook. Springer, 107–144.
  • Dror et al. (2011) Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. 2011. The Yahoo! Music dataset and KDD-Cup'11. In Proceedings of the 2011 International Conference on KDD Cup 2011, Volume 18. JMLR.org, 3–18.
  • Duvenaud et al. (2015) David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems. 2224–2232.
  • Fey and Lenssen (2019) Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
  • Hamilton et al. (2017) Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1025–1035.
  • Jain and Dhillon (2013) Prateek Jain and Inderjit S Dhillon. 2013. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626 (2013).
  • Jamali and Ester (2010) Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the fourth ACM conference on Recommender systems. ACM, 135–142.
  • Kalofolias et al. (2014) Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, and Pierre Vandergheynst. 2014. Matrix completion on graphs. arXiv preprint arXiv:1408.1717 (2014).
  • Katz (1953) Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1 (1953), 39–43.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
  • Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
  • Li and Chen (2013) Xin Li and Hsinchun Chen. 2013. Recommendation as link prediction in bipartite graphs: A graph kernel-based machine learning approach. Decision Support Systems 54, 2 (2013), 880–890.
  • Li et al. (2015) Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
  • Liben-Nowell and Kleinberg (2007) David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American society for information science and technology 58, 7 (2007), 1019–1031.
  • Lops et al. (2011) Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based recommender systems: State of the art and trends. In Recommender systems handbook. Springer, 73–105.
  • Lü and Zhou (2011) Linyuan Lü and Tao Zhou. 2011. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390, 6 (2011), 1150–1170.
  • Ma et al. (2011) Hao Ma, Dengyong Zhou, Chao Liu, Michael R Lyu, and Irwin King. 2011. Recommender systems with social regularization. In Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 287–296.
  • Miller et al. (2003) Bradley N Miller, Istvan Albert, Shyong K Lam, Joseph A Konstan, and John Riedl. 2003. MovieLens unplugged: experiences with an occasionally connected recommender system. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 263–266.
  • Monti et al. (2017) Federico Monti, Michael Bronstein, and Xavier Bresson. 2017. Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems. 3700–3710.
  • Niepert et al. (2016) Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In International conference on machine learning. 2014–2023.
  • Rao et al. (2015) Nikhil Rao, Hsiang-Fu Yu, Pradeep K Ravikumar, and Inderjit S Dhillon. 2015. Collaborative filtering with graph information: Consistency and scalable methods. In Advances in neural information processing systems. 2107–2115.
  • Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80.
  • Schafer et al. (2007) J Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative filtering recommender systems. In The adaptive web. Springer, 291–324.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
  • Wu et al. (2019) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. 2019. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596 (2019).
  • Zhang and Chen (2017) Muhan Zhang and Yixin Chen. 2017. Weisfeiler-Lehman neural machine for link prediction. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 575–583.
  • Zhang and Chen (2018) Muhan Zhang and Yixin Chen. 2018. Link Prediction Based on Graph Neural Networks. arXiv preprint arXiv:1802.09691 (2018).
  • Zhang et al. (2018) Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An End-to-End Deep Learning Architecture for Graph Classification. In AAAI. 4438–4445.
  • Zhou et al. (2007) Tao Zhou, Jie Ren, Matúš Medo, and Yi-Cheng Zhang. 2007. Bipartite network projection and personal recommendation. Physical Review E 76, 4 (2007), 046115.

Appendix A Related Work

Graph neural networks   A graph neural network (GNN) is a new type of neural network for learning over graphs (Scarselli et al., 2009; Bruna et al., 2013; Duvenaud et al., 2015; Kipf and Welling, 2016; Niepert et al., 2016; Li et al., 2015; Dai et al., 2016; Hamilton et al., 2017; Zhang et al., 2018). A GNN usually consists of 1) message passing layers, which iteratively pass messages between each node and its neighbors to extract local substructure features around nodes, and 2) a global pooling layer, which aggregates node features into a graph representation for graph-level tasks such as graph classification or regression. GNNs are parametric models: the learnable parameters in the message passing layers equip GNNs with excellent graph representation learning abilities and flexibility across different kinds of graphs. GNNs have gained great popularity in recent years, achieving state-of-the-art performance on semi-supervised node classification (Kipf and Welling, 2016), network embedding (Hamilton et al., 2017), graph classification (Zhang et al., 2018), etc. Please refer to (Wu et al., 2019) for an overview. Our work introduces a novel application of GNNs to the recommender system field.

Graph-based matrix completion   The matrix completion problem has been studied from a graph point of view previously. Monti et al. (2017) develop a multi-graph CNN model to extract user and item latent features from their respective networks and use these features to predict ratings. Berg et al. (2017) operate directly on the user-item bipartite graph to extract user and item latent features using a GNN. In (Chen et al., 2005; Zhou et al., 2007), traditional link prediction heuristics are adapted to bipartite graphs and show promising performance for recommender systems. Our work differs in that we do not use any predefined heuristics, but learn general graph structure features using a GNN. Another work similar to ours is (Li and Chen, 2013), where graph kernels are used to learn graph structure features. However, graph kernels require quadratic time and space to compute and store the kernel matrices, and are thus unsuitable for modern recommender systems.

Graph pattern learning for link prediction   Learning supervised heuristics (graph patterns) has been studied for link prediction in simple graphs. Zhang and Chen (2017) propose the Weisfeiler-Lehman Neural Machine (WLNM), which learns graph structure features using a fully-connected neural network on the subgraphs' adjacency matrices. They later improve this work by replacing the fully-connected neural network with a GNN, achieving state-of-the-art link prediction results (Zhang and Chen, 2018). Our work generalizes this line of research to predicting labeled links in bipartite graphs.

Appendix B More Visualization Results

Here we show the visualization results for the other three datasets, Douban, YahooMusic and MovieLens, in Figures 4, 5 and 6, respectively. As we can see, high-score and low-score subgraphs show vastly different patterns in every dataset.

Figure 4. Top 5 and bottom 5 are the testing enclosing subgraphs in Douban with the highest and lowest predicted ratings, respectively. For each subgraph, red nodes on the left are users; blue nodes on the right are items; the predicted rating is shown below the subgraph. The bottom red and blue nodes are the target user and item. We visualize edge ratings using the color map shown on the right; higher ratings are redder.
Figure 5. Top 5 and bottom 5 are the testing enclosing subgraphs in YahooMusic with the highest and lowest predicted ratings, respectively.
Figure 6. Top 5 and bottom 5 are the testing enclosing subgraphs in MovieLens with the highest and lowest predicted ratings, respectively.