I. Introduction
Rapid and accurate prediction of users’ preferences is the ultimate goal of today’s recommender systems [19]. Accurate personalized recommender systems benefit both the demand side and the supply side, including content publishers and platforms. Recommender systems therefore not only attract great interest in academia [18, 9, 10], but are also widely deployed in industry [1, 2]. The core method behind recommender systems is collaborative filtering (CF) [20, 12]. The basic assumptions underpinning collaborative filtering are that similar users tend to like the same items, and that items with similar audiences tend to receive similar ratings from an individual.
One of the most successful methods for performing collaborative filtering is matrix factorization (MF) [12, 8, 13]. MF models characterize both items and users by vectors in the same space, inferred from the observed entries of the user-item historical interaction matrix. More recently, deep learning models have been introduced to boost the performance of traditional MF models. However, as observed in [22], deep learning-based recommendation models are not sufficient to yield optimal embeddings because they consider only user and item features. There is no explicit incorporation of user-item interactions when developing the embeddings; the interactions are only used to define the learning objectives for model training. A second limitation of the deep learning models is their reliance on explicit feedback from users, which is usually relatively sparse.
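To make the MF scoring mechanism described above concrete, here is a minimal illustrative sketch (not the paper's model): users and items live in one shared latent space, and the predicted preference is simply the inner product of their vectors. The embedding values below are made up for illustration.

```python
# Minimal matrix-factorization scoring sketch: predicted preference is
# the inner product of a user vector and an item vector in the same
# latent space. In practice these vectors are learned from the observed
# entries of the user-item interaction matrix.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical 3-dimensional embeddings (illustrative values only).
user_emb = {"u1": [0.9, 0.1, 0.0], "u2": [0.0, 0.8, 0.3]}
item_emb = {"i1": [1.0, 0.0, 0.0], "i2": [0.1, 0.9, 0.2]}

def predict(u, i):
    return dot(user_emb[u], item_emb[i])

# u1's factors align with i1's, so i1 scores higher than i2 for u1.
scores = {i: predict("u1", i) for i in item_emb}
```

Ranking the items of `scores` for each user yields the top-k recommendation list that later evaluation metrics operate on.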
Bearing these limitations in mind, a natural strategy is to develop mechanisms that directly involve the user-item interactions in the embedding construction. Recent works by Ying et al. [24] and Wang et al. [22] have demonstrated the effectiveness of processing the bipartite interaction graph, reporting improvements over the state-of-the-art models.
Despite their effectiveness, we perceive two important limitations. First, these models ignore the intrinsic difference between the two types of nodes in the bipartite graph (users and items). When aggregating information from neighboring nodes during the embedding construction procedure, the architectures in [24, 22] combine the information in the same way, using a function that has no dependence on the nature of the node. However, there is an important intrinsic difference between users and items in a real environment. This suggests that the aggregation and transformation functions should depend on the type of entity. Second, user-user and item-item relationships are also very important signals. Although two-hop neighborhoods in the bipartite graph capture these to some extent, it is reasonable to assume that we can improve recommendation quality by constructing and learning from graphs that directly model user-user and item-item relationships.
In this paper, we propose Multi-GCCF, a novel graph convolutional neural network (GCNN)-based recommender system framework with two key innovations:

Capturing the intrinsic difference between users and items: we apply separate aggregation and transformation functions to process user nodes and item nodes when learning with a graph neural network.
We find that the user and item embeddings are learned more precisely and the recommendation performance is improved.

Modeling user-user and item-item relationships explicitly: we construct separate user-user and item-item graphs. Multi-GCCF conducts learning simultaneously on all three graphs and employs a multi-graph encoding layer to integrate the information provided by the user-item, user-user, and item-item graphs.
We conduct empirical studies on four real-world datasets, which comprise more than one million user-item interactions. Extensive results demonstrate the superiority of Multi-GCCF over the strongest state-of-the-art models.
II. Related Work
II-A Model-based Collaborative Filtering Methods
Model-based CF methods learn the similarities between items and users by fitting a model to the user-item interaction data. Latent factor models are common, such as probabilistic Latent Semantic Analysis (pLSA) [8] and the most widely used approach, Matrix Factorization (MF) [12]. Koren proposed SVD++ [13], which combines matrix factorization with information about a user’s “neighbors”, i.e., the items she has previously interacted with, for prediction. Factorization machines [18, 10] provide a mechanism to incorporate side information such as user demographics and item attributes.
MF-based methods are limited because they are confined to the inner product as a mechanism for measuring the similarity between user and item embeddings. Recently, neural networks have been incorporated into collaborative filtering architectures [7, 15, 14, 4, 16]. These use a combination of fully-connected layers, convolutions, inner products and subnets to capture complex similarity relationships.
II-B Graph-based Recommendation
Graphs are a natural tool for representing rich pairwise relationship information in recommender systems. Early works [3, 6, 23] used label propagation and random walks on the user-item interaction graph to derive similarity scores for user-item pairs. With the emergence of graph neural networks (GNNs) [11, 5, 26, 25], more recent works have started to apply them to recommendation [21, 24, 22]. Graph Convolutional Matrix Completion (GCMC) [21] treats the recommendation problem as a matrix completion task and employs a graph convolutional autoencoder. PinSage [24] applies a graph neural network to the item-item graph formed by modeling the similarity between items. Neural Graph Collaborative Filtering (NGCF) [22] processes the bipartite user-item interaction graph to learn user and item embeddings.
III. Methodology
In this section, we explain the three key components of our method. First, we develop a Bipartite Graph Convolutional Neural Network (BiparGCN) that acts as an encoder to generate user and item embeddings by processing the user-item interaction bipartite graph. Second, a Multi-Graph Encoding layer (MGE) encodes latent information by constructing and processing multiple graphs: besides the user-item bipartite graph, two additional graphs represent user-user and item-item similarities, respectively. Third, a skip connection between the initial node features and the final embedding allows us to exploit any residual information in the raw features that has not been captured by the graph processing. The overall framework of Multi-GCCF is depicted in Figure 1.
III-A Bipartite Graph Convolutional Neural Networks
In a recommendation scenario, the user-item interactions can be readily formulated as a bipartite graph with two types of nodes. We apply a Bipartite Graph Convolutional Neural Network (BiparGCN) with one side representing user nodes and the other side representing item nodes, as shown in Figure 2. The BiparGCN layer consists of two phases: forward sampling and backward aggregating. The forward sampling phase is designed to deal with the long-tailed nature of the degree distributions in the bipartite graph. For example, popular items may attract many interactions from users while other items may attract very few.
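The forward sampling idea above can be sketched as drawing a fixed-size neighborhood for every node, so that very popular and very unpopular nodes contribute equally sized neighborhoods downstream. This is an illustrative sketch under the assumption of uniform sampling (the paper does not specify the sampling distribution); `adj` is a hypothetical adjacency list.

```python
import random

def sample_neighbors(adj, node, size, rng):
    """Uniformly sample a fixed number of neighbors for `node`.

    When the node has at least `size` neighbors we sample without
    replacement; otherwise we sample with replacement, so nodes in the
    long tail of the degree distribution still yield `size` neighbors.
    """
    nbrs = adj[node]
    if len(nbrs) >= size:
        return rng.sample(nbrs, size)
    return [rng.choice(nbrs) for _ in range(size)]
```

A popular item with thousands of interactions and a cold item with one interaction both produce the same fixed-size neighborhood for the aggregation step that follows.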
After sampling the neighbors at each layer, BiparGCN encodes the user and item nodes by iteratively aggregating $l$-hop neighborhood information via graph convolution. Initial embeddings $\mathbf{e}_u$ and $\mathbf{e}_i$ are learned for each user $u$ and item $i$, at the same time as the parameters of the GCNs. If informative input features $\mathbf{x}_u$ or $\mathbf{x}_i$ are available, the initial embedding can instead be a function of the features (e.g., the output of an MLP applied to $\mathbf{x}_u$). The layer-$(l+1)$ embedding of the target user $u$ can be represented as:
$\mathbf{h}_u^{(l+1)} = \sigma\big(\mathbf{W}_1^{(l)} \cdot [\mathbf{h}_u^{(l)} \,\|\, \mathbf{h}_{\mathcal{N}(u)}^{(l)}]\big), \qquad \mathbf{h}_u^{(0)} = \mathbf{e}_u \qquad (1)$
where $\mathbf{e}_u$ is the initial user embedding, $\|$ represents concatenation, $\sigma(\cdot)$ is the activation function, and $\mathbf{W}_1^{(l)}$ is the layer-$l$ (user) transformation weight matrix shared across all user nodes. $\mathbf{h}_{\mathcal{N}(u)}^{(l)}$ is the learned neighborhood embedding. To achieve permutation invariance in the neighborhood, we apply an element-wise weighted mean aggregator:
$\mathbf{h}_{\mathcal{N}(u)}^{(l)} = \mathrm{MEAN}\big(\{\mathbf{h}_i^{(l)} \mathbf{Q}_1^{(l)}, \ \forall i \in \mathcal{N}(u)\}\big) \qquad (2)$
Here $\mathbf{Q}_1^{(l)}$ is the layer-$l$ (user) aggregator weight matrix, which is shared across all user nodes at layer $l$, and $\mathrm{MEAN}(\cdot)$ denotes the element-wise mean of the vectors in the argument set.
Similarly, the embedding of a target item node $i$ can be generated using another set of (item) transformation and aggregator weight matrices, $\mathbf{W}_2^{(l)}$ and $\mathbf{Q}_2^{(l)}$:
$\mathbf{h}_i^{(l+1)} = \sigma\big(\mathbf{W}_2^{(l)} \cdot [\mathbf{h}_i^{(l)} \,\|\, \mathbf{h}_{\mathcal{N}(i)}^{(l)}]\big), \quad \mathbf{h}_{\mathcal{N}(i)}^{(l)} = \mathrm{MEAN}\big(\{\mathbf{h}_u^{(l)} \mathbf{Q}_2^{(l)}, \ \forall u \in \mathcal{N}(i)\}\big) \qquad (3)$
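One BiparGCN update for a user node, following Eqs. (1)-(2), can be sketched as follows: transform each sampled item neighbor with the aggregator matrix, take the element-wise mean, concatenate with the user's own embedding, then apply the transformation matrix and a nonlinearity. The tiny weight matrices in the test are illustrative values, not learned parameters.

```python
# Illustrative one-layer BiparGCN update for a single user node.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def relu(x):
    return [max(0.0, v) for v in x]

def bipar_gcn_user(h_u, item_nbrs, W1, Q1):
    # Eq. (2): element-wise mean of aggregator-transformed neighbors.
    h_nbh = mean([matvec(Q1, h_i) for h_i in item_nbrs])
    # Eq. (1): nonlinearity applied to the transformed concatenation
    # [h_u || h_N(u)] (list concatenation plays the role of ||).
    return relu(matvec(W1, h_u + h_nbh))
```

The item-side update of Eq. (3) is identical in shape, with a separate pair of weight matrices, which is exactly the "separate aggregation and transformation functions" design choice of the paper.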
III-B Multi-Graph Encoding Layer
To alleviate the data sparsity problem in CF, we propose a Multi-Graph Encoding (MGE) layer, which generates an additional embedding for a target user or item node by constructing two additional graphs and applying graph convolutional learning on them.
In particular, in addition to the user-item bipartite graph, we construct a user-user graph and an item-item graph to capture the proximity information among users and among items. This proximity information can compensate for the very sparse user-item interaction bipartite graph. The graphs are constructed by computing pairwise cosine similarities on the rows or columns of the rating/click matrix.
In the MGE layer, we generate embeddings for target nodes by aggregating the neighborhood features using a one-hop graph convolution layer and a sum aggregator:

$\mathbf{z}_u = \sigma\Big(\sum_{u' \in \tilde{\mathcal{N}}(u)} \mathbf{e}_{u'} \mathbf{M}_u\Big), \qquad \mathbf{z}_i = \sigma\Big(\sum_{i' \in \tilde{\mathcal{N}}(i)} \mathbf{e}_{i'} \mathbf{M}_i\Big) \qquad (4)$
Here $\tilde{\mathcal{N}}(u)$ denotes the one-hop neighbourhood of user $u$ in the user-user graph and $\tilde{\mathcal{N}}(i)$ denotes the one-hop neighbourhood of item $i$ in the item-item graph. $\mathbf{M}_u$ and $\mathbf{M}_i$ are the learnable user and item aggregation weight matrices, respectively.
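The MGE sum aggregator of Eq. (4) reduces to three steps: sum the one-hop neighbors' initial embeddings, apply the aggregation weight matrix, apply the nonlinearity. A minimal sketch (the weight matrix in the test is an illustrative placeholder, not a learned parameter):

```python
# Sketch of the MGE layer's sum aggregator (Eq. 4) for one target node.

def mge_embed(e_nbrs, M, sigma):
    """Sum the neighbor embeddings `e_nbrs` taken from the user-user
    (or item-item) similarity graph, apply the aggregation weight
    matrix M, then the nonlinearity `sigma`."""
    summed = [sum(e[k] for e in e_nbrs) for k in range(len(e_nbrs[0]))]
    return sigma([sum(row[k] * summed[k] for k in range(len(summed)))
                  for row in M])

def relu(xs):
    return [max(0.0, v) for v in xs]
```

Note the contrast with Eq. (2): a sum rather than a mean, and aggregation over similarity-graph neighbors of the same node type rather than bipartite-graph neighbors of the opposite type.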
In contrast to the BiparGCN layer, no additional neighbour sampling is performed in the MGE layer. We select cosine-similarity thresholds that lead to an average degree of 10 for each graph.
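The graph-construction recipe above (pairwise cosine similarity, then a threshold chosen to hit a target average degree) can be sketched as follows. This is an illustrative implementation choice: instead of searching for a threshold directly, it keeps the top similarity pairs, which is equivalent to thresholding at the similarity of the last kept pair. The interaction rows and target degree in the test are toy values, not the paper's setup.

```python
import math

def cosine(a, b):
    """Cosine similarity of two interaction vectors (rows or columns
    of the rating/click matrix)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def build_similarity_graph(rows, avg_degree):
    """Score all node pairs by cosine similarity and keep the most
    similar ones. An undirected graph on n nodes with average degree d
    has n*d/2 edges, so we keep that many top-scoring pairs."""
    n = len(rows)
    sims = [(cosine(rows[i], rows[j]), i, j)
            for i in range(n) for j in range(i + 1, n)]
    sims.sort(reverse=True)
    keep = sims[: max(1, n * avg_degree // 2)]
    return [(i, j) for _, i, j in keep]
```

With the paper's target average degree of 10, this would retain 5n edges for a graph of n users (or items).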
By merging the outputs of the BiparGCN and MGE layers together, we can take advantage of the different dependency relationships encoded by the three graphs. All three graphs can be easily constructed from historical interaction data alone, with very limited additional computation cost.
III-C Skip Connection with Original Node Features
We further refine the embedding with information passed directly from the original node features. The intuition behind this is that both BiparGCN and MGE focus on extracting latent information based on relationships. As a result, the impact of the initial node features becomes less dominant. The skip connections allow the architecture to re-emphasize these features.
We pass the original features through a single fully-connected layer to generate skip-connection embeddings.
III-D Information Fusion
The BiparGCN layer, MGE layer and skip connections reveal latent information from three perspectives. It is important to determine how to merge these different embeddings effectively. In this work we investigate three methods to summarize the individual embeddings into a single embedding vector: element-wise sum, concatenation, and an attention mechanism. The exact operation of these three methods is described in Table I. We experimentally compare them in Section IV.
Fusion method  Formula
Element-wise sum  $\mathbf{e}^{*} = \mathbf{h} + \mathbf{z} + \mathbf{s}$
Concatenation  $\mathbf{e}^{*} = \mathbf{h} \,\|\, \mathbf{z} \,\|\, \mathbf{s}$
Attention  $\mathbf{e}^{*} = \alpha_{1}\mathbf{h} + \alpha_{2}\mathbf{z} + \alpha_{3}\mathbf{s}$
where $\mathbf{h}$, $\mathbf{z}$ and $\mathbf{s}$ denote the BiparGCN, MGE and skip-connection embeddings, and the $\alpha_{k}$ are learned attention weights.
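The three fusion operators can be sketched directly on plain vectors. The attention weights here are placeholders for scores a small learned network would produce; everything else follows the definitions above.

```python
# Three ways to fuse the BiparGCN embedding h, the MGE embedding z,
# and the skip-connection embedding s into one vector.

def fuse_sum(h, z, s):
    """Element-wise sum: same dimension as the inputs, no extra
    parameters."""
    return [a + b + c for a, b, c in zip(h, z, s)]

def fuse_concat(h, z, s):
    """Concatenation: triples the dimension."""
    return h + z + s

def fuse_attention(h, z, s, alphas):
    """Weighted sum with (hypothetically learned) attention weights
    alphas = [a1, a2, a3]."""
    parts = [h, z, s]
    return [sum(a * v[k] for a, v in zip(alphas, parts))
            for k in range(len(h))]
```

Note that only the sum keeps the output dimension equal to the component dimension, which matters for the parameter-count comparison in Section IV-F.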
III-E Model Training
We adapt our model to allow forward and backward propagation for mini-batches of triplets $(u, i, j)$, where $i$ is an observed positive item for user $u$ and $j$ is a sampled negative item. To be more specific, we select the unique user and item nodes appearing in the mini-batch, obtain their low-dimensional embeddings after information fusion, and train with stochastic gradient descent on the widely used Bayesian Personalized Ranking (BPR) [17] loss for optimizing recommendation models. The objective function is as follows:

$\mathcal{L} = \sum_{(u,i,j) \in \mathcal{O}} -\ln \sigma\big(\mathbf{e}_u^{*\top} \mathbf{e}_i^{*} - \mathbf{e}_u^{*\top} \mathbf{e}_j^{*}\big) + \lambda \|\Theta\|_2^2 + \beta \big(\|\mathbf{e}_u^{*}\|_2^2 + \|\mathbf{e}_i^{*}\|_2^2 + \|\mathbf{e}_j^{*}\|_2^2\big) \qquad (5)$
where $\mathcal{O}$ denotes the training batch of triplets, in which $(u, i)$ is an observed positive interaction and $(u, j)$ a sampled unobserved negative interaction. $\Theta$ is the model parameter set and $\mathbf{e}_u^{*}$, $\mathbf{e}_i^{*}$, $\mathbf{e}_j^{*}$ are the learned embeddings. We apply regularization to both the model parameters and the generated embeddings to prevent overfitting, with regularization coefficients $\lambda$ and $\beta$.
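The per-triplet part of the BPR loss in Eq. (5) is simple enough to sketch directly (the regularization terms are omitted here): maximize the score gap between the observed item and the sampled negative.

```python
import math

# Per-triplet BPR loss from Eq. (5), without the regularization terms:
#   -ln sigmoid( e_u . e_i  -  e_u . e_j )
# The loss is small when the positive item i outscores the negative j.

def bpr_loss(e_u, e_i, e_j):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    x_uij = dot(e_u, e_i) - dot(e_u, e_j)
    return -math.log(1.0 / (1.0 + math.exp(-x_uij)))
```

When the model cannot distinguish the positive from the negative item ($x_{uij} = 0$), the loss is $\ln 2$; it decreases toward 0 as the gap grows.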
Gowalla  Amazon-Books  Amazon-CDs  Yelp2018  
Recall@20  NDCG@20  Recall@20  NDCG@20  Recall@20  NDCG@20  Recall@20  NDCG@20  
BPR-MF  0.1291  0.1878  0.0250  0.0518  0.0865  0.0849  0.0494  0.0662 
NeuMF  0.1326  0.1985  0.0253  0.0535  0.0913  0.1043  0.0513  0.0719 
GCMC  0.1395  0.1960  0.0288  0.0551  0.1245  0.1158  0.0597  0.0741 
PinSage  0.1380  0.1947  0.0283  0.0545  0.1236  0.1118  0.0612  0.0750 
NGCF  0.1547  0.2237  0.0344  0.0630  0.1239  0.1138  0.0581  0.0719 
Multi-GCCF ($d=64$)  0.1595  0.2126  0.0363  0.0656  0.1390  0.1271  0.0667  0.0810 
Multi-GCCF ($d=128$)  0.1649  0.2208  0.0391  0.0705  0.1543  0.1350  0.0686  0.0835 
IV. Experimental Evaluation
We perform experiments on four real-world datasets to evaluate our model. Further, we conduct extensive ablation studies on each proposed component (BiparGCN, MGE and skip connection). We also provide a visualization of the learned representations.
IV-A Datasets and Evaluation Metrics
To evaluate the effectiveness of our method, we conduct extensive experiments on four benchmark datasets: Gowalla, Amazon-Books, Amazon-CDs and Yelp2018 (available at https://snap.stanford.edu/data/loc-gowalla.html, http://jmcauley.ucsd.edu/data/amazon/, and https://www.yelp.com/dataset/challenge). These are publicly accessible, real-world datasets from various domains, with differing sizes and sparsity. For all datasets, we filter out users and items with fewer than 10 interactions. Table III summarizes their statistics.
For all experiments, we evaluate our model and the baselines in terms of Recall@k and NDCG@k (we report Recall@20 and NDCG@20). Recall@k indicates the coverage of true (preferred) items in the top-k recommendations. NDCG@k (normalized discounted cumulative gain) is a measure of ranking quality.
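The two reported metrics can be stated precisely in a few lines. This is a standard formulation (with the common binary-relevance, $\log_2(\text{pos}+2)$ discount), not code from the paper; `ranked` is a model's top-k list and `relevant` the held-out ground-truth set for one user.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's held-out items that appear in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain of the hits, normalized
    by the gain of an ideal ranking that puts all relevant items first."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2)
                for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```

Unlike Recall@k, NDCG@k rewards placing relevant items near the top of the list, which is why the two metrics can rank methods differently.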
Dataset  #Users  #Items  #Interactions  Density 
Gowalla  29,858  40,981  1,027,370  0.084% 
Yelp2018  45,919  45,538  1,185,065  0.056% 
Amazon-Books  52,643  91,599  2,984,108  0.062% 
Amazon-CDs  43,169  35,648  777,426  0.051% 
IV-B Baseline Algorithms
We study the performance of the following models. Classical collaborative filtering methods: BPR-MF [17] and NeuMF [7]. Graph neural network-based collaborative filtering methods: GCMC [21], PinSage [16], and NGCF [22].
Our proposed method: Multi-GCCF, which uses two graph convolution layers on the user-item bipartite graph (two-hop aggregation), plus one graph convolution layer on each of the user-user and item-item graphs to model the similarities between user pairs and item pairs.
IV-C Parameter Settings
We optimize all models using the Adam optimizer with Xavier initialization. For all baseline models, the embedding size is fixed to 64 and the batch size to 1024. Grid search is applied to choose the learning rate over {0.0001, 0.001, 0.01, 0.1} and the regularization coefficient over a logarithmically spaced range. As in [22], for GCMC and NGCF we also tune the dropout rate and network structure, and pre-training [7] is used in NGCF and GCMC to improve performance. We implement our Multi-GCCF model in PyTorch and use two BiparGCN layers with a fixed neighborhood sampling size at each hop. The output dimension of the first layer is fixed to 128; the final output dimension is varied across experiments. The input node embedding dimension, the neighborhood dropout ratio, and the regularization coefficients $\lambda$ and $\beta$ in the objective function are held fixed across experiments.
IV-D Comparison with Baselines
Table II reports the overall performance compared with the baselines. Each result is the average performance over 5 runs with random weight initializations.
We make the following observations:


Multi-GCCF consistently yields the best performance on all datasets. More precisely, at $d=64$ it improves over the strongest baselines with respect to Recall@20 on Yelp2018, Amazon-CDs, Amazon-Books and Gowalla, and the margins widen further when the latent dimension is increased to 128. For the NDCG@20 metric, Multi-GCCF outperforms the next best method on three of the four datasets. This suggests that, by exploiting the latent information in multiple graphs and efficiently integrating the different embeddings, Multi-GCCF ranks relevant items higher in the recommendation list.
Architecture  Yelp2018  
Recall@20  NDCG@20  
Best baseline ($d=64$)  0.0612  0.0744 
Best baseline ($d=128$)  0.0527  0.0641 
1-hop BiparGCN  0.0650  0.0791 
2-hop BiparGCN  0.0661  0.0804 
2-hop BiparGCN + skip connection  0.0675  0.0821 
2-hop BiparGCN + MGE  0.0672  0.0818 
Multi-GCCF ($d=128$)  0.0686  0.0835 
IV-E Ablation Analysis
To assess and verify the effectiveness of the individual components of our proposed Multi-GCCF model, we conduct an ablation analysis on Gowalla and Yelp2018 in Table IV. The table illustrates the performance contribution of each component. The output embedding size is 128 for all ablation experiments. We compare against the $d=64$ baselines because they outperform the $d=128$ versions.
We make the following observations:


All three main components of our proposed model, BiparGCN layer, MGE layer, and skip connection, are demonstrated to be effective.

Our designed BiparGCN greatly boosts performance even with a single graph convolution layer on both the user side and the item side. Increasing the number of graph convolution layers slightly improves performance further.

Both the MGE layer and the skip connections lead to significant performance improvements.

Combining all three components leads to further improvement, indicating that the different embeddings are effectively capturing different information about users, items, and useritem relationships.
Gowalla  Amazon-CDs  
Recall@20  NDCG@20  Recall@20  NDCG@20  
Element-wise sum  0.1649  0.2208  0.1543  0.1350 
Concatenation  0.1575  0.2179  0.1432  0.1253 
Attention  0.1615  0.2162  0.1426  0.1248 
IV-F Effect of Different Information Fusion Methods
As we obtain three embeddings from different perspectives, we compare different methods of summarizing them into one vector: element-wise sum, concatenation, and attention. Table V shows the experimental results for Gowalla and Amazon-CDs. We make the following observation: summation performs much better than concatenation and attention. Summation generates an embedding of the same dimension as the component embeddings and does not involve any additional learnable parameters. The additional flexibility of attention and concatenation may harm the generalization capability of the model.
IV-G Embedding Visualization
Figure 3 provides a visualization of the representations derived from BPR-MF and Multi-GCCF. Nodes with the same color represent all the item embeddings from one user’s clicked/visited history, including test items that remain unobserved during training. We find that both BPR-MF and our proposed model tend to encode items preferred by the same user close to one another. However, Multi-GCCF generates tighter clusters, achieving a stronger grouping effect for items that have been preferred by the same user.
V. Conclusion
In this paper we have presented a novel collaborative filtering procedure that incorporates multiple graphs to explicitly represent user-item, user-user and item-item relationships. The proposed model, Multi-GCCF, constructs three embeddings learned from different perspectives on the available data. Extensive experiments on four real-world datasets demonstrate the effectiveness of our approach, and an ablation study quantitatively verifies that each component makes an important contribution. Our proposed Multi-GCCF approach is well supported by the user-friendly and efficient GNN library developed in MindSpore, Huawei’s unified training and inference AI framework.
References
 [1] (2016) Wide & deep learning for recommender systems. In Proc. Workshop Deep Learning for Recommender Systems, Cited by: §I.
 [2] (2016) Deep neural networks for youtube recommendations. In Proc. ACM Conf. Recommender Systems, Cited by: §I.

 [3] (2007) ItemRank: a random-walk based scoring algorithm for recommender engines. In Proc. Int. Joint Conf. Artificial Intelligence, Hyderabad, India. Cited by: §II-B.
 [4] (2017) DeepFM: a factorization-machine based neural network for CTR prediction. In Proc. Int. Joint Conf. Artificial Intelligence. Cited by: §II-A.
 [5] (2017) Inductive representation learning on large graphs. In Proc. Adv. Neural Inf. Proc. Systems. Cited by: §II-B.
 [6] (2017) BiRank: towards ranking on bipartite graphs. CoRR abs/1708.04396. Cited by: §II-B.
 [7] (2017) Neural collaborative filtering. In Proc. Int. Conf. World Wide Web. Cited by: §II-A, §IV-B, §IV-C.
 [8] (2004) Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst. 22, pp. 89–115. Cited by: §I, §II-A.
 [9] (2008) Collaborative filtering for implicit feedback datasets. In Proc. IEEE Int. Conf. Data Mining. Cited by: §I.
 [10] (2016) Field-aware factorization machines for CTR prediction. In Proc. ACM Conf. Recommender Systems. Cited by: §I, §II-A.
 [11] (2017) Semi-supervised classification with graph convolutional networks. In Proc. Int. Conf. Learning Representations. Cited by: §II-B.
 [12] (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42 (8), pp. 30–37. Cited by: §I, §II-A.
 [13] (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining. Cited by: §I, §II-A.
 [14] (2018) xDeepFM: combining explicit and implicit feature interactions for recommender systems. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining. Cited by: §II-A.
 [15] (2019) Feature generation by convolutional neural network for click-through rate prediction. In Proc. World Wide Web Conference. Cited by: §II-A.
 [16] (2019) Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans. Inf. Syst. 37 (1), pp. 5:1–5:35. Cited by: §II-A, §IV-B.
 [17] (2009) BPR: Bayesian personalized ranking from implicit feedback. In Proc. Conf. Uncertainty in Artificial Intelligence. Cited by: §III-E, §IV-B.
 [18] (2010) Factorization machines. In Proc. IEEE Int. Conf. Data Mining. Cited by: §I, §II-A.
 [19] (2010) Recommender systems handbook. 1st edition, Springer-Verlag, Berlin, Heidelberg. ISBN 0387858199, 9780387858197. Cited by: §I.
 [20] (2007) Collaborative filtering recommender systems. In The Adaptive Web, Methods and Strategies of Web Personalization. Cited by: §I.
 [21] (2018) Graph convolutional matrix completion. In Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining. Cited by: §II-B, §IV-B.
 [22] (2019) Neural graph collaborative filtering. In Proc. ACM Int. Conf. Research and Development in Information Retrieval. Cited by: §I, §II-B, §IV-B, §IV-C.
 [23] (2018) HOP-rec: high-order proximity for implicit recommendation. In Proc. ACM Conf. Recommender Systems. Cited by: §II-B.
 [24] (2018) Graph convolutional neural networks for web-scale recommender systems. In Proc. ACM Int. Conf. Knowledge Discovery & Data Mining. Cited by: §I, §II-B.
 [25] (2019) Bayesian graph convolutional neural networks for semi-supervised classification. In Proc. AAAI Conf. Artificial Intelligence. Cited by: §II-B.
 [26] (2018) A graph-CNN for 3D point cloud classification. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing. Cited by: §II-B.