Multi-Graph Convolution Collaborative Filtering

01/01/2020 ∙ by Jianing Sun, et al. ∙ HUAWEI Technologies Co., Ltd., McGill University

Personalized recommendation is ubiquitous, playing an important role in many online services. Substantial research has been dedicated to learning vector representations of users and items with the goal of predicting a user's preference for an item based on the similarity of the representations. Techniques range from classic matrix factorization to more recent deep learning based methods. However, we argue that existing methods do not make full use of the information that is available from user-item interaction data and the similarities between user pairs and item pairs. In this work, we develop a graph convolution-based recommendation framework, named Multi-Graph Convolution Collaborative Filtering (Multi-GCCF), which explicitly incorporates multiple graphs in the embedding learning process. Multi-GCCF not only expressively models the high-order information via a bipartite user-item interaction graph, but also integrates the proximal information by building and processing user-user and item-item graphs. Furthermore, we consider the intrinsic difference between user nodes and item nodes when performing graph convolution on the bipartite graph. We conduct extensive experiments on four publicly accessible benchmarks, showing significant improvements relative to several state-of-the-art collaborative filtering and graph neural network-based recommendation models. Further experiments quantitatively verify the effectiveness of each component of our proposed model and demonstrate that the learned embeddings capture the important relationship structure.

I Introduction

Rapid and accurate prediction of users’ preferences is the ultimate goal of today’s recommender systems [19]. Accurate personalized recommender systems benefit both demand-side and supply-side, including the content publisher and platform. Therefore, recommender systems not only attract great interest in academia [18, 9, 10], but also are widely developed in industry [1, 2]. The core method behind recommender systems is collaborative filtering (CF) [20, 12]. The basic assumptions underpinning collaborative filtering are that similar users tend to like the same item and items with similar audiences tend to receive similar ratings from an individual.

One of the most successful methods for performing collaborative filtering is matrix factorization (MF) [12, 8, 13]. MF models characterize both items and users by vectors in the same space, inferred from the observed entries of the user-item historical interaction. More recently, deep learning models have been introduced to boost the performance of traditional MF models. However, as observed in [22], deep learning-based recommendation models are not sufficient to yield optimal embeddings because they consider only user and item features. There is no explicit incorporation of user-item interactions when developing embeddings; the interactions are only used to define the learning objectives for the model training. A second limitation of the deep learning models is the reliance on the explicit feedback from users, which is usually relatively sparse.

Bearing these limitations in mind, a natural strategy is to develop mechanisms to directly involve the user-item interactions in the embedding construction. Recent works by Ying et al. [24] and Wang et al. [22] have demonstrated the effectiveness of processing the bipartite graph, reporting improvements over the state-of-the-art models.

Despite their effectiveness, we perceive two important limitations. First, these models ignore the intrinsic difference between the two types of nodes in the bipartite graph (users and items). When aggregating information from neighboring nodes in the graph during the embedding construction procedure, the architectures in [24, 22] combine the information in the same way, using a function that has no dependence on the nature of the node. However, there is an important intrinsic difference between users and items in a real environment. This suggests that the aggregation and transformation functions should be dependent on the type of entity. Second, user-user and item-item relationships are also very important signals. Although two-hop neighborhoods in the bipartite graph capture these to some extent, it is reasonable to assume that we can improve the recommendation quality by constructing and learning from graphs that directly model user-user and item-item relationships.

In this paper, we propose a novel graph convolutional neural network (GCNN)-based recommender system framework, Multi-GCCF, with two key innovations:

  • Capturing the intrinsic difference between users and items: we apply separate aggregation and transformation functions to process user nodes and item nodes when learning with a graph neural network. We find that the user and item embeddings are learned more precisely and the recommendation performance is improved.

  • Modeling user-user and item-item relationships explicitly: we construct separate user-user and item-item graphs. Multi-GCCF conducts learning simultaneously on all three graphs and employs a multi-graph encoding layer to integrate the information provided by the user-item, user-user, and item-item graphs.

We conduct empirical studies on four real-world datasets, which comprise more than one million user-item interactions. Extensive results demonstrate the superiority of Multi-GCCF over the strongest state-of-the-art models.

II Related Work

II-A Model-based Collaborative Filtering Methods

Model-based CF methods learn the similarities between items and users by fitting a model to the user-item interaction data. Latent factor models are common, such as probabilistic Latent Semantic Analysis (pLSA [8]) and the most widely used approach, Matrix Factorization (MF [12]). Koren proposed SVD++ [13], which combines matrix factorization with information about a user's “neighbors”, i.e. items she has previously interacted with, for prediction. Factorization machines [18, 10] provide a mechanism to incorporate side information such as user demographics and item attributes.

MF-based methods are limited because they are confined to the inner-product as a mechanism for measuring the similarity between embeddings of users and items. Recently, neural networks have been incorporated into collaborative filtering architectures [7, 15, 14, 4, 16]. These use a combination of fully-connected layers, convolution, inner-products and sub-nets to capture complex similarity relationships.

Fig. 1: The overall architecture of Multi-GCCF.

II-B Graph-based Recommendation

Graphs are a natural tool for representing rich pairwise relationship information in recommendation systems. Early works [3, 6, 23] used label propagation and random walks on the user-item interaction graph to derive similarity scores for user-item pairs. With the emergence of Graph Neural Networks (GNNs) [11, 5, 26, 25], more recent works have started to apply graph neural networks to recommendation [21, 24, 22]. Graph Convolutional Matrix Completion (GC-MC) [21] treats the recommendation problem as a matrix completion task and employs a graph convolution autoencoder. PinSage [24] applies a graph neural network on the item-item graph formed by modeling the similarity between items. Neural Graph Collaborative Filtering (NGCF) [22] processes the bipartite user-item interaction graph to learn user and item embeddings.

III Methodology

In this section, we explain the three key components of our method. First, we develop a Bipartite Graph Convolutional Neural Network (Bipar-GCN) that acts as an encoder to generate user and item embeddings, by processing the user-item interaction bipartite graph. Second, a Multi-Graph Encoding layer (MGE) encodes latent information by constructing and processing multiple graphs: besides the user-item bipartite graph, another two graphs represent user-user similarities and item-item similarities respectively. Third, a skip connection structure between the initial node feature and final embedding allows us to exploit any residual information in the raw feature that has not been captured by the graph processing. The overall framework of Multi-GCCF is depicted in Figure 1.

III-A Bipartite Graph Convolutional Neural Networks

In a recommendation scenario, the user-item interaction can be readily formulated as a bipartite graph with two types of nodes. We apply a Bipartite Graph Convolutional Neural Network (Bipar-GCN) with one side representing user nodes and the other side representing item nodes, as shown in Figure 2. The Bipar-GCN layer consists of two phases: forward sampling and backward aggregating. The forward sampling phase is designed to deal with the long-tailed nature of the degree distributions in the bipartite graph. For example, popular items may attract many interactions from users while other items may attract very few.
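The forward sampling phase can be illustrated with a fixed-size neighbor sampler, which caps the number of neighbors taken from popular nodes and pads the neighborhoods of sparse nodes. The following is only a minimal sketch: the function name `sample_neighbors`, the toy graph, and the choice to sample with replacement when a node has too few neighbors are assumptions for illustration, since the text does not specify these details.

```python
import random

def sample_neighbors(adj, node, size, rng):
    """Return `size` sampled neighbors of `node` from the adjacency dict `adj`.

    Assumption: sample without replacement when enough neighbors exist,
    otherwise sample with replacement so the output size stays fixed.
    """
    neighbors = adj.get(node, [])
    if not neighbors:
        return []
    if len(neighbors) >= size:
        return rng.sample(neighbors, size)
    return [rng.choice(neighbors) for _ in range(size)]

# Toy bipartite graph: each user maps to the items the user interacted with.
user_to_items = {"u1": ["i1", "i2", "i3", "i4", "i5"], "u2": ["i1"]}
rng = random.Random(0)

popular = sample_neighbors(user_to_items, "u1", 3, rng)  # 3 distinct items
sparse = sample_neighbors(user_to_items, "u2", 3, rng)   # "i1" repeated 3 times
```

Fixing the sample size keeps the cost of each layer bounded regardless of how skewed the degree distribution is.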

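After sampling the neighbors from layers $1$ to $K$, Bipar-GCN encodes the user and item nodes by iteratively aggregating $k$-hop neighborhood information via graph convolution. There are initial embeddings $e_u$ and $e_v$ that are learned for each user $u$ and item $v$. These embeddings are learned at the same time as the parameters of the GCNs. If there are informative input features $x_u$ or $x_v$, then the initial embedding can be a function of the features (e.g., the output of an MLP applied to $x_u$). The layer-$k$ embeddings of the target user $u$ can be represented as:

$$h_u^{k} = \sigma\left(W_u^{k} \cdot \left[h_u^{k-1};\, h_{N(u)}^{k-1}\right]\right), \quad h_u^{0} = e_u, \tag{1}$$

where $e_u$ are the initial user embeddings, $[\,;\,]$ represents concatenation, $\sigma(\cdot)$ is the activation function, and $W_u^{k}$ is the layer-$k$ (user) transformation weight matrix shared across all user nodes. $h_{N(u)}^{k-1}$ is the learned neighborhood embedding, with $N(u)$ the sampled neighborhood of $u$. To achieve permutation invariance in the neighborhood, we apply an element-wise weighted mean aggregator:

$$h_{N(u)}^{k-1} = \sigma\left(\mathrm{MEAN}\left(\left\{h_v^{k-1} \cdot Q_u^{k},\ v \in N(u)\right\}\right)\right). \tag{2}$$

Here $Q_u^{k}$ is the layer-$k$ (user) aggregator weight matrix, which is shared across all user nodes at layer $k$, and $\mathrm{MEAN}$ denotes the mean of the vectors in the argument set.

Similarly, the embedding of target item node $v$ can be generated using another set of (item) transformation and aggregator weight matrices:

$$h_v^{k} = \sigma\left(W_v^{k} \cdot \left[h_v^{k-1};\, h_{N(v)}^{k-1}\right]\right), \quad h_v^{0} = e_v, \qquad h_{N(v)}^{k-1} = \sigma\left(\mathrm{MEAN}\left(\left\{h_u^{k-1} \cdot Q_v^{k},\ u \in N(v)\right\}\right)\right). \tag{3}$$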
Fig. 2: The accumulation of information in the bipartite user-item interaction graph.
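To make the type-specific aggregation concrete, the sketch below implements one Bipar-GCN-style layer in plain Python: a mean aggregator over transformed neighbor embeddings, followed by a transformation of the concatenated self and neighborhood embeddings. It is an illustration under stated assumptions, not the authors' implementation: tanh is assumed as the activation, weights are applied as matrix-times-column-vector, and the names `matvec`, `mean_aggregate`, `bipar_gcn_layer`, and all weight values are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_aggregate(neighbor_vecs, Q):
    """Activation of the mean of the Q-transformed neighbor embeddings."""
    transformed = [matvec(Q, h) for h in neighbor_vecs]
    mean = [sum(col) / len(transformed) for col in zip(*transformed)]
    return [math.tanh(m) for m in mean]

def bipar_gcn_layer(h_self, neighbor_vecs, W, Q):
    """Activation of W applied to [h_self ; aggregated neighborhood]."""
    h_neigh = mean_aggregate(neighbor_vecs, Q)
    return [math.tanh(z) for z in matvec(W, h_self + h_neigh)]

# Separate (hypothetical) weights for user nodes and item nodes.
Q_user = [[1.0, 0.0], [0.0, 1.0]]
W_user = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
Q_item = [[0.5, 0.0], [0.0, 0.5]]
W_item = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

# A user is updated from its item neighbors using the user-side weights.
h_user = [0.5, -0.5]
item_neighbors = [[1.0, 0.0], [0.0, 1.0]]
new_user = bipar_gcn_layer(h_user, item_neighbors, W_user, Q_user)

# An item is updated from its user neighbors using the item-side weights.
h_item = [1.0, 0.0]
user_neighbors = [[0.5, -0.5]]
new_item = bipar_gcn_layer(h_item, user_neighbors, W_item, Q_item)
```

Because the aggregator averages over the neighbor set, reordering `item_neighbors` leaves `new_user` unchanged, which is the permutation invariance the text asks for.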

III-B Multi-Graph Encoding Layer

To alleviate the data sparsity problem in CF, we propose a Multi-Graph Encoding (MGE) layer, which generates an additional embedding for a target user or item node by constructing two additional graphs and applying graph convolutional learning on them.

In particular, in addition to the user-item bipartite graph, we construct a user-user graph and an item-item graph to capture the proximity information among users and items. This proximity information can make up for the very sparse user-item interaction bipartite graph. The graphs are constructed by computing pairwise cosine similarities on the rows or columns of the rating/click matrix.
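The graph construction step can be sketched as follows: rows of the interaction matrix give user vectors, columns give item vectors, and two nodes are linked when their cosine similarity exceeds a threshold. This is a minimal illustration; the helper names `cosine_similarity` and `similarity_graph`, the toy matrix `R`, and the threshold value 0.5 are assumptions, not values from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_graph(vectors, threshold):
    """Link nodes i != j whose cosine similarity exceeds `threshold`."""
    n = len(vectors)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(vectors[i], vectors[j]) > threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj

# Toy click matrix: 3 users (rows) by 4 items (columns).
R = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
user_graph = similarity_graph(R, 0.5)                            # rows
item_graph = similarity_graph([list(c) for c in zip(*R)], 0.5)   # columns
```

The quadratic pairwise loop is fine for a toy example; a dataset-scale version would rely on sparse matrix products instead.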

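In the MGE layer, we generate embeddings for target nodes by aggregating the neighborhood features using a one-hop graph convolution layer and a sum aggregator:

$$z_u = \sigma\Big(\sum_{i \in N'(u)} e_i \cdot M_u\Big), \qquad z_v = \sigma\Big(\sum_{j \in N'(v)} e_j \cdot M_v\Big). \tag{4}$$

Here $N'(u)$ denotes the one-hop neighbourhood of user $u$ in the user-user graph and $N'(v)$ denotes the one-hop neighbourhood of item $v$ in the item-item graph, with $e_i$ and $e_j$ the features (initial embeddings) of the neighbouring nodes. $M_u$ and $M_v$ are the learnable user and item aggregation weight matrices, respectively.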
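A one-hop sum aggregation of this kind can be sketched in a few lines. The code below is illustrative only: tanh is assumed as the activation, only neighbors (not the node itself) are summed, and the function name `mge_embedding`, the toy graph, and the weight matrix are hypothetical.

```python
import math

def mge_embedding(node, adj, embeddings, M):
    """Activation of M applied to the sum of one-hop neighbor embeddings."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for neighbor in adj[node]:
        total = [t + e for t, e in zip(total, embeddings[neighbor])]
    return [math.tanh(sum(w * t for w, t in zip(row, total))) for row in M]

# Toy user-user graph and initial user embeddings.
user_graph = {0: [1, 2], 1: [0], 2: [0]}
user_emb = {0: [1.0, 0.0], 1: [0.2, 0.1], 2: [0.3, -0.1]}
M_user = [[1.0, 0.0], [0.0, 1.0]]  # identity, for readability

z_0 = mge_embedding(0, user_graph, user_emb, M_user)
```

The item side would use the item-item graph with its own weight matrix in exactly the same way.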

In contrast to the Bipar-GCN layer, no additional neighbour sampling is performed in the MGE layer. We select thresholds based on the cosine similarity that lead to an average degree of 10 for each graph.
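One simple way to pick a similarity threshold for a target average degree is to note that an undirected graph with n nodes and E edges has average degree 2E/n, so keeping the top target × n / 2 most similar pairs gives the desired density. The sketch below follows that idea; it is an assumption about how such a threshold could be chosen, not a description of the authors' procedure, and `threshold_for_average_degree` is a hypothetical helper.

```python
def threshold_for_average_degree(pair_sims, num_nodes, target_degree):
    """Return the similarity value of the last edge kept for the target degree.

    Keeping every pair with similarity >= the returned value yields about
    target_degree * num_nodes / 2 edges (more if there are ties).
    Assumes `pair_sims` is a non-empty list with one value per node pair.
    """
    wanted_edges = round(target_degree * num_nodes / 2)
    wanted_edges = min(len(pair_sims), max(1, wanted_edges))
    ranked = sorted(pair_sims, reverse=True)
    return ranked[wanted_edges - 1]

# Toy example: 4 nodes, 6 node pairs, target average degree of 2.
sims = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
threshold = threshold_for_average_degree(sims, num_nodes=4, target_degree=2)
kept = [s for s in sims if s >= threshold]
average_degree = 2 * len(kept) / 4
```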

By merging the outputs of the Bipar-GCN and MGE layers together, we can take advantage of the different dependency relationships encoded by the three graphs. All three graphs can be easily constructed from historical interaction data alone, with very limited additional computation cost.

III-C Skip-connection with Original Node Features

We further refine the embedding with information passed directly from the original node features. The intuition behind this is that both Bipar-GCN and MGE focus on extracting latent information based on relationships. As a result, the impact of the initial node features becomes less dominant. The skip connections allow the architecture to re-emphasize these features.

We pass the original features through a single fully-connected layer to generate skip-connection embeddings.

III-D Information Fusion

The Bipar-GCN layer, the MGE layer and the skip connections reveal latent information from three perspectives. It is important to determine how to merge these different embeddings effectively. In this work we investigate three methods to summarize the individual embeddings into a single embedding vector: element-wise sum, concatenation, and an attention mechanism. The exact operation of these three methods is described in Table I. We experimentally compare them in Section IV.

Formula
Element-wise sum
Concatenation
Attention
TABLE I: Comparison of different message fusion methods.
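The three fusion options can be sketched as follows. The vectors `h`, `z`, and `s` are hypothetical names for the Bipar-GCN, MGE, and skip-connection embeddings of one node. The sum and concatenation follow directly from their names; the attention variant shown here (softmax over dot products with a learnable query vector) is only one common form and is an assumption, since the exact formula is not recoverable from the text.

```python
import math

def fuse_sum(h, z, s):
    """Element-wise sum: output has the same dimension as each input."""
    return [a + b + c for a, b, c in zip(h, z, s)]

def fuse_concat(h, z, s):
    """Concatenation: output dimension is the sum of the input dimensions."""
    return h + z + s

def fuse_attention(h, z, s, query):
    """Softmax-weighted sum of the three views (assumed attention form)."""
    views = [h, z, s]
    scores = [sum(q * x for q, x in zip(query, v)) for v in views]
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]  # stable softmax
    weights = [e / sum(exps) for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, views)) for i in range(len(h))]

h, z, s = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
summed = fuse_sum(h, z, s)
concatenated = fuse_concat(h, z, s)
attended = fuse_attention(h, z, s, query=[0.0, 0.0])  # zero query: uniform weights
```

Only the attention variant introduces extra learnable parameters (the query), while concatenation increases the output dimension and the sum changes neither.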

III-E Model Training

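We adapt our model to allow forward and backward propagation for mini-batches of triplet pairs $\{u, i, j\}$. To be more specific, we select unique user and item nodes $u$ and $v = \{i, j\}$ from mini-batch pairs, then obtain low-dimensional embeddings $\{e_u^{*}, e_i^{*}, e_j^{*}\}$ after information fusion, with stochastic gradient descent on the widely-used Bayesian Personalized Ranking (BPR) [17] loss for optimizing recommendation models. The objective function is as follows:

$$\mathrm{loss} = \sum_{(u,i,j) \in \mathcal{O}} -\log \sigma\left(e_u^{*} \cdot e_i^{*} - e_u^{*} \cdot e_j^{*}\right) + \lambda \lVert \Theta \rVert_2^2 + \beta\left(\lVert e_u^{*} \rVert_2^2 + \lVert e_i^{*} \rVert_2^2 + \lVert e_j^{*} \rVert_2^2\right), \tag{5}$$

where $\mathcal{O} = \{(u,i,j) \mid (u,i) \in R^{+}, (u,j) \in R^{-}\}$ denotes the training batch and $\sigma$ here is the sigmoid function. $R^{+}$ indicates observed positive interactions. $R^{-}$ indicates sampled unobserved negative interactions. $\Theta$ is the model parameter set and $e_u^{*}$, $e_i^{*}$, and $e_j^{*}$ are the learned embeddings. We conduct regularization on both model parameters and generated embeddings to prevent overfitting (regularization coefficients $\lambda$ and $\beta$).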
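The pairwise BPR objective can be sketched in plain Python as below. It is a minimal illustration of the standard BPR term plus an L2 penalty on the embeddings; the coefficient name `beta`, the function name `bpr_loss`, and the toy embeddings are assumptions, and the penalty on the model parameters is omitted for brevity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(triplets, user_emb, item_emb, beta=0.0):
    """Sum over (u, i, j) of -log(sigmoid(e_u.e_i - e_u.e_j)) + beta * L2."""
    loss = 0.0
    for u, i, j in triplets:
        e_u, e_i, e_j = user_emb[u], item_emb[i], item_emb[j]
        x = dot(e_u, e_i) - dot(e_u, e_j)
        # -log(sigmoid(x)) = log(1 + exp(-x)), written in a stable form.
        loss += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
        loss += beta * (dot(e_u, e_u) + dot(e_i, e_i) + dot(e_j, e_j))
    return loss

user_emb = {"u": [1.0, 0.0]}
item_emb = {"pos": [1.0, 0.0], "neg": [0.0, 1.0]}

# Observed item ranked above the sampled negative: small loss.
good = bpr_loss([("u", "pos", "neg")], user_emb, item_emb)
# Roles swapped: the model ranks the "positive" lower, so the loss grows.
bad = bpr_loss([("u", "neg", "pos")], user_emb, item_emb)
```

The loss only depends on the score difference between the positive and the negative item, which is why BPR suits implicit feedback where absolute ratings are unavailable.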

| Model | Gowalla Recall@20 | Gowalla NDCG@20 | Amazon-Books Recall@20 | Amazon-Books NDCG@20 | Amazon-CDs Recall@20 | Amazon-CDs NDCG@20 | Yelp2018 Recall@20 | Yelp2018 NDCG@20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BPRMF | 0.1291 | 0.1878 | 0.0250 | 0.0518 | 0.0865 | 0.0849 | 0.0494 | 0.0662 |
| NeuMF | 0.1326 | 0.1985 | 0.0253 | 0.0535 | 0.0913 | 0.1043 | 0.0513 | 0.0719 |
| GC-MC | 0.1395 | 0.1960 | 0.0288 | 0.0551 | 0.1245 | 0.1158 | 0.0597 | 0.0741 |
| PinSage | 0.1380 | 0.1947 | 0.0283 | 0.0545 | 0.1236 | 0.1118 | 0.0612 | 0.0750 |
| NGCF | 0.1547 | 0.2237 | 0.0344 | 0.0630 | 0.1239 | 0.1138 | 0.0581 | 0.0719 |
| Multi-GCCF (d=64) | 0.1595 | 0.2126 | 0.0363 | 0.0656 | 0.1390 | 0.1271 | 0.0667 | 0.0810 |
| Multi-GCCF (d=128) | 0.1649 | 0.2208 | 0.0391 | 0.0705 | 0.1543 | 0.1350 | 0.0686 | 0.0835 |

TABLE II: The overall performance comparison. Underline indicates the second best model performance. Asterisks denote scenarios where a Wilcoxon signed rank test indicates a statistically significant difference between the scores of the best and second-best algorithms.

IV Experimental Evaluation

We perform experiments on four real-world datasets to evaluate our model. Further, we conduct extensive ablation studies on each proposed component (Bipar-GCN, MGE and skip connection). We also provide a visualization of the learned representation.

IV-A Datasets and Evaluation Metrics

To evaluate the effectiveness of our method, we conduct extensive experiments on four benchmark datasets: Gowalla, Amazon-Books, Amazon-CDs and Yelp2018 (https://snap.stanford.edu/data/loc-gowalla.html; http://jmcauley.ucsd.edu/data/amazon/; https://www.yelp.com/dataset/challenge). These datasets are publicly accessible, real-world data with various domains, sizes, and sparsity. For all datasets, we filter out users and items with fewer than 10 interactions. Table III summarizes their statistics.
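The interaction-count filter can be sketched as below. Whether the filter is applied once or repeated until no user or item falls below the threshold is not stated in the text; this sketch assumes the iterative variant, and `filter_min_interactions` is a hypothetical helper name.

```python
from collections import Counter

def filter_min_interactions(interactions, min_count):
    """Repeatedly drop users and items with fewer than `min_count` interactions."""
    current = list(interactions)
    while True:
        user_counts = Counter(user for user, _ in current)
        item_counts = Counter(item for _, item in current)
        kept = [
            (user, item)
            for user, item in current
            if user_counts[user] >= min_count and item_counts[item] >= min_count
        ]
        if len(kept) == len(current):
            return kept
        current = kept

# Toy example with a threshold of 2 instead of 10.
pairs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a")]
filtered = filter_min_interactions(pairs, min_count=2)
```

Here user "u3" has a single interaction and is removed, after which every remaining user and item still meets the threshold.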

For all experiments, we evaluate our model and baselines in terms of Recall@k and NDCG@k (we report Recall@20 and NDCG@20). Recall@k indicates the coverage of true (preferred) items as a result of top-k recommendation. NDCG@k (normalized discounted cumulative gain) is a measure of ranking quality.
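Both metrics can be computed per user from a ranked recommendation list and the set of held-out items. The sketch below assumes binary relevance and the common log2 discount; the function names and the toy ranking are illustrative and not taken from the paper's evaluation code.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """DCG of the top-k list divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranked_items = ["a", "b", "c", "d"]
held_out = {"a", "c", "e"}
recall = recall_at_k(ranked_items, held_out, k=3)  # 2 of 3 relevant items found
ndcg = ndcg_at_k(ranked_items, held_out, k=3)
```

Recall@k ignores where in the top-k list a hit occurs, whereas NDCG@k rewards hits placed nearer the top.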

| Dataset | #Users | #Items | #Interactions | Density |
| --- | --- | --- | --- | --- |
| Gowalla | 29,858 | 40,981 | 1,027,370 | 0.084% |
| Yelp2018 | 45,919 | 45,538 | 1,185,065 | 0.056% |
| Amazon-Books | 52,643 | 91,599 | 2,984,108 | 0.062% |
| Amazon-CDs | 43,169 | 35,648 | 777,426 | 0.051% |

TABLE III: Statistics of evaluation datasets.
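The density column is the share of user-item pairs that have an observed interaction. As a quick check under that assumed definition, the sketch below recomputes it from the table counts for two of the datasets; `density_percent` is a hypothetical helper.

```python
def density_percent(num_users, num_items, num_interactions):
    """Observed interactions as a percentage of all user-item pairs."""
    return 100.0 * num_interactions / (num_users * num_items)

gowalla = density_percent(29858, 40981, 1027370)
amazon_books = density_percent(52643, 91599, 2984108)
```

Both values agree with the table after rounding to three decimal places, which illustrates how sparse the interaction matrices are: fewer than one in a thousand possible pairs is observed.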

IV-B Baseline Algorithms

We studied the performance of the following models.

Classical collaborative filtering methods: BPRMF [17] and NeuMF [7]. Graph neural network-based collaborative filtering methods: GC-MC [21], PinSage [24], and NGCF [22].

Our proposed method: Multi-GCCF, which contains two graph convolution layers on the user-item bipartite graph (2-hop aggregation), and one graph convolution layer on top of both the user-user graph and the item-item graph to model the similarities between user-pairs and item-pairs.

IV-C Parameter Settings

We optimize all models using the Adam optimizer with Xavier initialization. The embedding size is fixed to 64 and the batch size to 1024 for all baseline models. Grid search is applied to choose the learning rate and the coefficient of normalization over the ranges {0.0001, 0.001, 0.01, 0.1} and {, , …, }, respectively. As in [22], for GC-MC and NGCF, we also tune the dropout rate and network structure. Pre-training [7] is used in NGCF and GC-MC to improve performance. We implement our Multi-GCCF model in PyTorch and use two Bipar-GCN layers with neighborhood sampling sizes and . The output dimension of the first layer is fixed to 128; the final output dimension is selected from {64, 128} for different experiments. We set the input node embedding dimension to . The neighborhood dropout ratio is set to . The regularization parameters in the objective function are set to and .

IV-D Comparison with Baselines

Table II reports the overall performance compared with baselines. Each result is the average performance from 5 runs with random weight initializations.

We make the following observations:

  • Multi-GCCF consistently yields the best Recall@20 on all four datasets. More precisely, Multi-GCCF improves over the strongest baselines with respect to Recall@20 by %, %, %, and % for Yelp2018, Amazon-CDs, Amazon-Books and Gowalla, respectively. Multi-GCCF further outperforms the strongest baselines by %, %, % and % on Recall@20 for Yelp2018, Amazon-CDs, Amazon-Books and Gowalla, respectively, when increasing the latent dimension. For the NDCG@20 metric, Multi-GCCF outperforms the next best method by % to % on three datasets. This suggests that, by exploiting the latent information in multiple graphs and efficiently integrating the different embeddings, Multi-GCCF ranks relevant items higher in the recommendation list.

| Architecture | Yelp2018 Recall@20 | Yelp2018 NDCG@20 |
| --- | --- | --- |
| Best baseline (d=64) | 0.0612 | 0.0744 |
| Best baseline (d=128) | 0.0527 | 0.0641 |
| 1-hop Bipar-GCN | 0.0650 | 0.0791 |
| 2-hop Bipar-GCN | 0.0661 | 0.0804 |
| 2-hop Bipar-GCN + skip connect | 0.0675 | 0.0821 |
| 2-hop Bipar-GCN + MGE | 0.0672 | 0.0818 |
| Multi-GCCF (d=128) | 0.0686 | 0.0835 |

TABLE IV: Ablation studies.
Fig. 3: Visualization of the t-SNE transformed representations derived from (a) BPRMF and (b) Multi-GCCF on Amazon-CDs. Numbers in the legend are user IDs.

IV-E Ablation Analysis

To assess and verify the effectiveness of the individual components of our proposed Multi-GCCF model, we conduct an ablation analysis on Gowalla and Yelp2018 in Table IV. The table illustrates the performance contribution of each component. The output embedding size is 128 for all ablation experiments. We compare to the d=64 baselines because they outperform the d=128 versions.

We make the following observations:

  • All three main components of our proposed model, Bipar-GCN layer, MGE layer, and skip connection, are demonstrated to be effective.

  • Our designed Bipar-GCN can greatly boost the performance with even one graph convolution layer on both the user side and the item side. Increasing the number of graph convolution layers can slightly improve the performance.

  • Both MGE layer and skip connections lead to significant performance improvement.

  • Combining all three components leads to further improvement, indicating that the different embeddings are effectively capturing different information about users, items, and user-item relationships.

| Fusion method | Gowalla Recall@20 | Gowalla NDCG@20 | Amazon-CDs Recall@20 | Amazon-CDs NDCG@20 |
| --- | --- | --- | --- | --- |
| element-wise sum | 0.1649 | 0.2208 | 0.1543 | 0.1350 |
| concatenation | 0.1575 | 0.2179 | 0.1432 | 0.1253 |
| attention | 0.1615 | 0.2162 | 0.1426 | 0.1248 |

TABLE V: Comparison of different information fusion methods when d=128.

IV-F Effect of Different Information Fusion Methods

As we obtain three embeddings from different perspectives, we compare different methods to summarize them into one vector: element-wise sum, concatenation, and attention. Table V shows the experimental results for Gowalla and Amazon-CDs. We make the following observations: Summation performs much better than concatenation and attention. Summation generates an embedding of the same dimension as the component embeddings and does not involve any additional learnable parameters. The additional flexibility of attention and concatenation may harm the generalization capability of the model.

IV-G Embedding Visualization

Figure 3 provides a visualization of the representations derived from BPRMF and Multi-GCCF. Nodes with the same color represent all the item embeddings from one user’s clicked/visited history, including test items that remain unobserved during training. We find that both BPRMF and our proposed model have the tendency to encode the items that are preferred by the same user close to one another. However, Multi-GCCF generates tighter clusters, achieving a strong grouping effect for items that have been preferred by the same user.

V Conclusion

In this paper we have presented a novel collaborative filtering procedure that incorporates multiple graphs to explicitly represent user-item, user-user and item-item relationships. The proposed model, Multi-GCCF, constructs three embeddings learned from different perspectives on the available data. Extensive experiments on four real-world datasets demonstrate the effectiveness of our approach, and an ablation study quantitatively verifies that each component makes an important contribution. Our proposed Multi-GCCF approach is well supported under the user-friendly and efficient GNN library developed in MindSpore, a unified training and inference Huawei AI framework.

References

  • [1] H. Cheng, L. Koc, J. Harmsen, et al. (2016) Wide & deep learning for recommender systems. In Proc. Workshop Deep Learning for Recommender Systems.
  • [2] P. Covington, J. Adams, and E. Sargin (2016) Deep neural networks for youtube recommendations. In Proc. ACM Conf. Recommender Systems.
  • [3] M. Gori and A. Pucci (2007) ItemRank: A random-walk based scoring algorithm for recommender engines. In Proc. Int. Joint Conf. Artificial Intelligence, Hyderabad, India.
  • [4] H. Guo, R. Tang, Y. Ye, Z. Li, and X. He (2017) DeepFM: A factorization-machine based neural network for CTR prediction. In Proc. Int. Joint Conf. Artificial Intelligence.
  • [5] W. L. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Proc. Adv. Neural Inf. Proc. Systems.
  • [6] X. He, M. Gao, M. Kan, and D. Wang (2017) BiRank: towards ranking on bipartite graphs. CoRR abs/1708.04396.
  • [7] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In Proc. Int. Conf. World Wide Web.
  • [8] T. Hofmann (2004) Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst. 22, pp. 89–115.
  • [9] Y. Hu, Y. Koren, and C. Volinsky (2008) Collaborative filtering for implicit feedback datasets. In Proc. IEEE Int. Conf. Data Mining.
  • [10] Y. Juan, Y. Zhuang, W. Chin, and C. Lin (2016) Field-aware factorization machines for CTR prediction. In Proc. ACM Conf. Recommender Systems.
  • [11] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In Proc. Int. Conf. Learning Representations.
  • [12] Y. Koren, R. M. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42 (8), pp. 30–37.
  • [13] Y. Koren (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining.
  • [14] J. Lian, X. Zhou, F. Zhang, Z. Chen, X. Xie, and G. Sun (2018) xDeepFM: combining explicit and implicit feature interactions for recommender systems. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining.
  • [15] B. Liu, R. Tang, Y. Chen, J. Yu, H. Guo, and Y. Zhang (2019) Feature generation by convolutional neural network for click-through rate prediction. In Proc. World Wide Web Conference.
  • [16] Y. Qu, B. Fang, W. Zhang, R. Tang, M. Niu, H. Guo, Y. Yu, and X. He (2019) Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans. Inf. Syst. 37 (1), pp. 5:1–5:35.
  • [17] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009) BPR: Bayesian personalized ranking from implicit feedback. In Proc. Conf. Uncertainty in Artificial Intelligence.
  • [18] S. Rendle (2010) Factorization machines. In Proc. IEEE Int. Conf. Data Mining.
  • [19] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor (2010) Recommender systems handbook. 1st edition, Springer-Verlag, Berlin, Heidelberg. ISBN 0387858199, 9780387858197.
  • [20] J. B. Schafer, D. Frankowski, J. L. Herlocker, and S. Sen (2007) Collaborative filtering recommender systems. In The Adaptive Web, Methods and Strategies of Web Personalization.
  • [21] R. van den Berg, T. N. Kipf, and M. Welling (2018) Graph convolutional matrix completion. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining.
  • [22] X. Wang, X. He, M. Wang, F. Feng, and T. Chua (2019) Neural graph collaborative filtering. In Proc. ACM Int. Conf. Research and Development in Information Retrieval.
  • [23] J. Yang, C. Chen, C. Wang, and M. Tsai (2018) HOP-rec: high-order proximity for implicit recommendation. In Proc. ACM Conf. Recommender Systems.
  • [24] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proc. ACM Int. Conf. Knowledge Discovery & Data Mining.
  • [25] Y. Zhang, S. Pal, M. Coates, and D. Üstebay (2019) Bayesian graph convolutional neural networks for semi-supervised classification. In Proc. AAAI Int. Conf. Artificial Intelligence.
  • [26] Y. Zhang and M. Rabbat (2018) A graph-CNN for 3D point cloud classification. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing.