RGCF: Refined Graph Convolution Collaborative Filtering with concise and expressive embedding

07/07/2020 · Kang Liu, et al. · Hefei University of Technology

Graph Convolution Network (GCN) has attracted significant attention and become the most popular method for learning graph representations. In recent years, many efforts have focused on integrating GCN into recommender tasks, with remarkable progress. At its core is the explicit capture of high-order connectivities between the nodes of the user-item bipartite graph. However, we theoretically and empirically find an inherent drawback in these GCN-based recommendation methods: directly applying GCN to aggregate neighboring nodes introduces noise and information redundancy. Consequently, these models' capability of capturing high-order connectivities among different nodes is limited, leading to suboptimal performance on recommender tasks. The main reason is that the nonlinear network layers inside the GCN structure are not suited to extracting non-semantic features (such as one-hot ID features) in collaborative filtering scenarios. In this work, we develop a new GCN-based collaborative filtering model, named Refined Graph convolution Collaborative Filtering (RGCF), in which the construction of user (item) embeddings is delicately redesigned from several aspects during aggregation on the graph. Compared with state-of-the-art GCN-based recommendation methods, RGCF is more capable of capturing the implicit high-order connectivities inside the graph, and the resulting vector representations are more expressive. We conduct extensive experiments on three public million-size datasets, demonstrating that RGCF significantly outperforms state-of-the-art models. We release our code at https://github.com/hfutmars/RGCF.


1. Introduction

Modern recommendation systems have been widely applied to many online services, such as video recommendation (Sargin, 2016), music recommendation (Fang et al., 2017), E-commerce (He and McAuley, 2016b), and social networks (Wang, 2019). Collaborative Filtering (CF) is the mainstream of modern recommendation algorithms (Riedl, 2001)(Su and Khoshgoftaar, 2009). The basic assumption of CF is that similar users exhibit similar interest in the same items. Matrix Factorization (MF) is the most classical CF method: it vectorizes all users and items using only their ID features, and reconstructs their historical interactions with the inner product of these vectors (Koren Y, 2009). MF can achieve good performance with sufficient interaction data. However, since sparsity is ubiquitous in modern recommendation, MF often fails to learn expressive vector representations for users and items.

1.1. Why Graph Convolution Networks?

To overcome the performance bottleneck caused by dataset sparsity, many efforts have been devoted to constructing complex embedding functions; integrating all available useful information into the embedding representations can improve model performance. SVD++ (Koren, 2008) is the pioneering work that incorporates a user's historically interacted items into the construction of his/her embedding to model preference and obtain a more expressive representation. However, SVD++ only encodes explicit connectivities between users and items into the embedding function, forgoing the implicit connectivities, which can be viewed as the paths between the current node and its multi-hop neighboring nodes in the user-item bipartite graph (aka. high-order connectivities). Graph-based methods (Wang, 2017)(Tsai, 2018) can capture such high-order connectivities thanks to their ability to learn path information. For example, HOP-Rec (Tsai, 2018) indirectly integrates high-order connectivities into the embedding learning process by using random walks to enrich a user's interaction data with multi-hop connected items. Unlike graph-based methods, which use high-order connectivities only indirectly to enrich the training data, GCN-based methods (Welling, 2017)(Leskovec, 2018)(Kipf and Welling, 2017) directly encode high-order connectivities into the embedding function and achieve significant improvements over other CF methods, illustrating that GCN is the state-of-the-art approach for capturing high-order connectivities in the user-item interaction graph.

1.2. Why Not the Nonlinear Network Layers of GCN?

It is worth mentioning that in some GCN-based machine learning tasks, such as image classification (Gupta, 2018) and node classification (Kipf and Welling, 2017), nonlinear network layers are necessary for feature extraction, since the initial vector representations contain abundant and diverse information. In contrast, the IDs of users (or items) used by most CF methods carry no complicated patterns or diverse semantic information to be mined. We argue that directly using nonlinear graph convolution layers to process ID features, as in (Chua, 2019b)(Leskovec, 2018), inevitably brings noise into the learned embeddings, degrading the capacity to capture high-order connectivities. To be specific, the network layers in GCN fail to distill useful information and features from aggregated embedding inputs that are mapped from one-hot ID features only. Meanwhile, the many parameters of the network layers are prone to overfitting and introduce redundant information into the embedding outputs. As discussed above, the nonlinear network layers in the traditional GCN structure are not suitable for recommendation tasks. We elaborate on this in Section 2.2.

1.3. Why Not the Layer Aggregation Mechanism?

To the best of our knowledge, NGCF (Chua, 2019b) is the state-of-the-art GCN-based CF method. In NGCF, a layer-aggregation mechanism (Xu, 2018) concatenates the embeddings obtained at each convolution layer to form the final embeddings. Despite its effectiveness, we argue that such a layer-aggregation mechanism is unnecessary in CF scenarios once the negative impact of the nonlinear network layers is removed. Specifically, without the network layers, graph convolution is a linear aggregation process. For a target node, the embedding obtained at the N-th convolution layer is equivalent to a linear combination of the initial embeddings of all neighbors within N hops, and the concatenation of the embeddings obtained at each layer is a similar linear combination. As such, the layer-aggregation concatenation mechanism is redundant in CF scenarios. The reason it can still work in NGCF is that the information redundancy and noise generated by the network layers are weakened by concatenating the embeddings of each layer. In our research, however, we find that the nonlinear network structure and the layer-aggregation mechanism together limit the model's ability to capture high-order connectivities in CF scenarios; this assumption is detailed in Section 2.3. In addition, the element-wise product terms in the aggregation process of NGCF (Chua, 2019b) are also redundant for the representations of users and items, as we detail in Section 2.2. The above assumptions are verified in Section 3.3.

1.4. Our Proposal and Contributions

In this work, we discuss the limitations of the traditional GCN structure for capturing high-order relations among entity nodes in recommendation tasks, and propose a new GCN-based CF model, RGCF, in which the entities' embeddings are reconstructed with a refined graph convolution structure and several intuitive strategies reduce the noise and redundancy present in GCN-based methods. First, a linear weighted average operation replaces the complex nonlinear network layers in the embedding function of GCN-based methods. Second, we simply use the embeddings obtained at the last layer as the final representations, avoiding the information overlap caused by concatenating the embeddings of every convolution layer (the layer-aggregation mechanism). Third, the element-wise product terms are removed from the embedding generation process. In addition, we further improve performance by adjusting the weight of self-loop nodes in the aggregation over the user-item graph. We conduct extensive experiments on three public datasets, and the results show that RGCF achieves significant improvements over state-of-the-art baselines. Specifically, our model improves over NGCF w.r.t. recall@20 by 17.19%, 22.18%, and 40.70% on Gowalla, Yelp2018, and Amazon-Book, respectively.

The main contributions of this work are as follows.

  • We analyze and verify the redundancy defects of GCN-based recommendation methods, and highlight their negative impact on the model's capability to capture high-order connectivities.

  • We present the RGCF model, which eliminates the representation redundancies inside GCN-based methods through a refined graph convolution structure. In RGCF, the entities' embeddings capture high-order connectivities better than in previous methods.

  • We conduct extensive experiments on three public million-size datasets, empirically demonstrating the state-of-the-art performance of our RGCF.

The rest of this paper is organized as follows. Section 2 elaborates on our proposed RGCF and discusses information redundancy. In Section 3, we report the experimental results and analyse the effectiveness and rationality of RGCF. We give a brief review of related work in Section 4 and conclude in Section 5.

2. Methodology

In this section, we first briefly review the basic concepts of GCN (Kipf and Welling, 2017) and NGCF (Chua, 2019b), and then present the details of our model structure, as illustrated in Figure 1. Lastly, we discuss the negative impact of information redundancy on GCN-based methods.

Figure 1. Illustration of our RGCF model, which integrates high-order connectivities into the embeddings of users and items and outputs the matching score for a user-item pair; $\mathcal{N}_u$ is the set of items connected to user $u$, and $\mathcal{N}_i$ is the set of users connected to item $i$.

2.1. Preliminary

Graph Convolution Networks. The core idea of GCN (Kipf and Welling, 2017) is to capture graph structure information by transforming and aggregating the information of neighboring nodes. To be specific, GCN stacks multiple convolutional layers, in which layer $l+1$ depends on the output of layer $l$. In each layer, the information of the target entity is aggregated from its neighbor nodes. As such, high-order embeddings can be effectively captured by stacking multiple such convolutional layers. The convolutional operation can be formulated as follows:

$$E^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} E^{(l)} W^{(l)}\right) \qquad (1)$$

where $\tilde{A} = A + I \in \mathbb{R}^{N \times N}$ is the adjacency matrix with self-loops added, $N$ is the total number of nodes, $I$ is the identity matrix, $\tilde{D}$ denotes the diagonal node degree matrix with elements $\tilde{D}_{jj} = \sum_{k} \tilde{A}_{jk}$, $E^{(l)}, E^{(l+1)} \in \mathbb{R}^{N \times d}$ are the matrices collecting the embeddings of all nodes at layers $l$ and $l+1$, respectively, $d$ is the embedding length, $W^{(l)}$ is the trainable weight matrix of layer $l$, and $\sigma$ is a nonlinear activation function.
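For concreteness, here is a minimal NumPy sketch of one such convolution. Equation (1) fixes the math; the function and variable names and the choice of ReLU as $\sigma$ are ours:

```python
import numpy as np

def gcn_layer(A, E, W):
    """One GCN convolution, Equation (1):
    E_next = ReLU(D~^{-1/2} (A + I) D~^{-1/2} E W)."""
    A_tilde = A + np.eye(A.shape[0])            # adjacency with self-loops
    d = A_tilde.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ E @ W, 0.0)       # sigma chosen as ReLU here
```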

Neural Graph Collaborative Filtering. To the best of our knowledge, NGCF (Chua, 2019b) is the state-of-the-art GCN-based collaborative filtering method. Distinct from the standard GCN (Kipf and Welling, 2017), NGCF integrates the element-wise product of the target node and its neighboring nodes into the embedding function, and uses the concatenation of the embeddings obtained at each layer as the final representation. The propagation for user $u$ can be formulated as follows:

$$e_u^{(l+1)} = \sigma\left(W_1^{(l)} e_u^{(l)} + \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}} \left(W_1^{(l)} e_i^{(l)} + W_2^{(l)} \left(e_i^{(l)} \odot e_u^{(l)}\right)\right)\right) \qquad (2)$$

where $W_1^{(l)}$ and $W_2^{(l)}$ are the weight matrices at layer $l$, $e_u^{(l)}$ and $e_u^{(l+1)}$ are the embeddings of user $u$ at layers $l$ and $l+1$, respectively, $e_i^{(l)}$ is the embedding of item $i$ at layer $l$, $\frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}$ is the graph Laplacian norm (Kipf and Welling, 2017) that normalizes the embeddings aggregated from the previous layer, where $\mathcal{N}_u$ and $\mathcal{N}_i$ respectively denote $u$'s and $i$'s neighborhood, and $\sigma$ is the nonlinear activation function.
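A node-level NumPy sketch of this propagation rule may help; the variable names are ours, and we write plain ReLU where NGCF uses LeakyReLU:

```python
import numpy as np

def ngcf_layer(e_u, neigh_msgs, W1, W2):
    """One NGCF propagation step for user u, Equation (2).

    e_u:        (d,) embedding of u at layer l.
    neigh_msgs: list of (e_i, norm) pairs, where e_i is the layer-l embedding
                of a neighboring item i and norm = 1 / sqrt(|N_u| * |N_i|).
    W1, W2:     (d, d) trainable weight matrices of this layer.
    """
    msg = W1 @ e_u                                        # self message
    for e_i, norm in neigh_msgs:
        msg = msg + norm * (W1 @ e_i + W2 @ (e_i * e_u))  # neighbor + affinity term
    return np.maximum(msg, 0.0)                           # nonlinearity (ReLU here)
```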

2.2. Model

In this section, we present a detailed description of our RGCF model. As Figure 1 shows, the embeddings of users and items are generated separately. User (item) embeddings are generated by propagating information iteratively from the first layer to the last. In each layer, an entity (user or item) is embedded by aggregating information both from its neighbor nodes and from the entity itself.

We use the construction of a user embedding to detail the aggregation process over the graph (the light blue part of Figure 1); the item embedding aggregation is similar. As in the convolution operation of GCN, our model outputs the user's embedding as the sum of the embedding of the node itself and the vectors aggregated from its neighboring nodes. We formulate this iterative embedding process across multiple layers as Algorithm 1:

Input: initial embedding $e_u^{(0)}$ for node $u$; the embeddings $\{e_i^{(0)}\}_{i \in \mathcal{N}_u}$ of $u$'s neighborhood; self-loop weight $\gamma$; and depth of message aggregation layers $L$.
Output: the embedding representation $e_u^{(L)}$ obtained at the $L$-th convolution layer for node $u$.

1:  Let $l = 0$.
2:  while $l < L$ do
3:     $m_{\mathcal{N}_u}^{(l+1)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}\, e_i^{(l)}$
4:     $m_u^{(l+1)} = \gamma\, e_u^{(l)}$
5:     $e_u^{(l+1)} = m_{\mathcal{N}_u}^{(l+1)} + m_u^{(l+1)}$
6:     $l = l + 1$
7:  end while
8:  return $e_u^{(L)}$
Algorithm 1 Embedding Generation

Concretely, for a target user $u$, we first initialize the embeddings of the node itself and of its neighboring nodes as $e_u^{(0)}$ and $e_i^{(0)}$ by mapping from IDs. The embedding representation of user $u$ is then iteratively aggregated from layer 1 to layer $L$. In Algorithm 1, Line 3 gives the message aggregated from the neighboring nodes using the embeddings of the previous layer, and Line 4 defines the message aggregated from the user itself. The former reflects the historical interaction information of user $u$; the latter can be viewed as the intrinsic properties of the node itself. We argue that these two messages contribute differently to the final representation of node $u$, so the hyper-parameter $\gamma$ is set to control the weight of the message from the node itself. We report the model performance under different settings of $\gamma$ in Section 3.4. The term $\frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}$ is the graph Laplacian norm that normalizes the embeddings aggregated from the previous layer, where $\mathcal{N}_u$ and $\mathcal{N}_i$ respectively denote $u$'s and $i$'s neighborhood. After stacking $L$ such message aggregation operations, we obtain the final representation $e_u^{(L)}$ for node $u$. The representation of an item node is generated in the same way.
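To make Algorithm 1 concrete, here is a minimal NumPy sketch that propagates every node at once (a per-node loop needs all layer-$l$ embeddings to form layer $l+1$); the dictionary-based layout and names are ours:

```python
import numpy as np

def rgcf_propagate(E0, neighbors, gamma, num_layers):
    """Algorithm 1, run jointly for all nodes of the bipartite graph.

    E0:        dict {node: (d,) initial embedding mapped from the node's ID}
    neighbors: dict {node: list of neighboring nodes}
    gamma:     self-loop weight for the message from the node itself
    """
    E = dict(E0)
    for _ in range(num_layers):
        E_next = {}
        for u in E:
            # Line 3: Laplacian-normalized message from the neighborhood
            m_neigh = sum(
                E[i] / np.sqrt(len(neighbors[u]) * len(neighbors[i]))
                for i in neighbors[u]
            )
            # Line 4: weighted message from the node itself
            m_self = gamma * E[u]
            # Line 5: plain sum, no weight matrix and no nonlinearity
            E_next[u] = m_neigh + m_self
        E = E_next
    return E  # {node: e^(L)}, the last-layer representations
```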

Matrix Implementation. In practice, we use sparse matrix multiplications to implement the abovementioned embedding function. The detailed operation can be formulated as follows:

$$E^{(l+1)} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} E^{(l)} \qquad (3)$$

where $\tilde{A} = A + \gamma I \in \mathbb{R}^{(M+N) \times (M+N)}$ is the adjacency matrix with a weighted self-loop added, $M$ and $N$ are the numbers of users and items, $I$ is the identity matrix, $\gamma$ is the hyper-parameter controlling the weight of the self-loop, $\tilde{D}$ denotes the diagonal node degree matrix with elements $\tilde{D}_{jj} = \sum_k \tilde{A}_{jk}$, $E^{(l)}, E^{(l+1)} \in \mathbb{R}^{(M+N) \times d}$ are the matrices collecting the embeddings of all users and items at layers $l$ and $l+1$, respectively, and $d$ is the embedding length. It is worth noting that, distinct from traditional GCN-based methods (Chua, 2019b), the network layers are removed from our embedding generation process since they bring no benefit to model performance.
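A sketch of Equation (3) with SciPy sparse matrices, assuming `R` is the $(M \times N)$ user-item interaction matrix from which the bipartite adjacency is built; the helper names are ours:

```python
import numpy as np
import scipy.sparse as sp

def build_norm_adj(R, gamma):
    """D~^{-1/2} (A + gamma*I) D~^{-1/2} for the (M+N)x(M+N) bipartite graph.

    R: (M, N) sparse user-item interaction matrix.
    """
    M, N = R.shape
    A = sp.bmat([[None, R], [R.T, None]], format="csr")   # bipartite adjacency
    A_tilde = A + gamma * sp.eye(M + N, format="csr")     # weighted self-loop
    d = np.asarray(A_tilde.sum(axis=1)).flatten()
    D_inv_sqrt = sp.diags(np.power(d, -0.5))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def propagate(A_hat, E0, num_layers):
    """Stack num_layers linear convolutions; return last-layer embeddings."""
    E = E0
    for _ in range(num_layers):
        E = A_hat @ E          # Equation (3): no weight matrix, no activation
    return E
```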

2.3. Prediction

Distinct from NGCF (Chua, 2019b), which concatenates the representations obtained at every convolution layer, we use the embeddings obtained at the last layer as the final representations, as in the standard GCN (Kipf and Welling, 2017). The key reason is that concatenating representations from different layers results in information redundancy. To be specific, the embeddings obtained at layer $l$ already contain most of the information from the previous layers, since the aggregation operation at each layer is linear. Thereby, in RGCF, we obtain the final representations for a user and an item as follows:

$$e_u^{*} = e_u^{(L)}, \qquad e_i^{*} = e_i^{(L)} \qquad (4)$$

where $e_u^{(L)}$ and $e_i^{(L)}$ are the embeddings obtained at the last layer for user $u$ and item $i$, respectively.

The inner product is applied to predict the matching score of a user-item pair $(u, i)$. We formulate the prediction function as follows:

$$\hat{y}_{ui} = e_u^{*\top} e_i^{*} + b_u + b_i \qquad (5)$$

where $\hat{y}_{ui}$ is the predicted preference score of user $u$ towards the target item $i$, $e_u^{*}$ and $e_i^{*}$ are the final representations of user $u$ and item $i$, and $b_u$ and $b_i$ denote the biases of $u$ and $i$, respectively. Note that the bias terms help distinguish nodes with different popularity, since nodes with many interactions can learn a larger bias than nodes with few interactions; that is, the value of the bias depends on the popularity of the node. This term alleviates the negative impact of oversmoothing and improves model performance, especially for the top-N ranking task in recommendation.
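In vectorized form, scoring every pair for top-N ranking reduces to one matrix product plus broadcast biases; a sketch with our own names:

```python
import numpy as np

def predict_scores(E_user, E_item, b_user, b_item):
    """Equation (5) for every user-item pair at once.

    E_user: (M, d) final user embeddings,  E_item: (N, d) final item embeddings,
    b_user: (M,) user biases,              b_item: (N,) item biases.
    Returns the (M, N) matrix of predicted preference scores y_hat.
    """
    return E_user @ E_item.T + b_user[:, None] + b_item[None, :]
```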

2.4. Training

Loss Function. We use the Bayesian Personalized Ranking (BPR) loss (Schmidt-Thieme, 2009) to optimize the parameters of our model. The basic assumption of the BPR loss is that observed interactions reflect a stronger preference than unobserved ones; that is, the predicted score of an observed user-item pair should be higher than that of an unobserved one. The loss function for our model is formulated as follows:

$$\mathcal{L} = \sum_{(u,i,j) \in \mathcal{O}} -\ln \sigma\left(\hat{y}_{ui} - \hat{y}_{uj}\right) + \lambda_1 \left\| E^{(L)} \right\|_2^2 + \lambda_2 \left\| b \right\|_2^2 \qquad (6)$$

where $\mathcal{O} = \{(u,i,j) \mid i \in \mathcal{N}_u, j \notin \mathcal{N}_u\}$ is the training data, $\mathcal{N}_u$ denotes the observed item set of user $u$, and $\sigma(\cdot)$ is the sigmoid function; we apply $L_2$ regularization on $E^{(L)}$ and $b$, parameterized by $\lambda_1$ and $\lambda_2$ respectively, where $E^{(L)}$ and $b$ are the final embeddings obtained at the last layer and the biases of all users and items.
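A NumPy sketch of this objective; the scores are assumed to come from Equation (5), and the small epsilon inside the log is ours, added for numerical safety:

```python
import numpy as np

def bpr_loss(y_pos, y_neg, E, b, lambda1, lambda2):
    """Equation (6): pairwise BPR loss with L2 regularization.

    y_pos, y_neg: (B,) scores y_hat_ui and y_hat_uj for sampled triples (u, i, j).
    E, b:         last-layer embeddings and biases being regularized.
    """
    diff = y_pos - y_neg
    sigmoid = 1.0 / (1.0 + np.exp(-diff))
    ranking = -np.sum(np.log(sigmoid + 1e-10))        # -ln sigma(y_ui - y_uj)
    reg = lambda1 * np.sum(E ** 2) + lambda2 * np.sum(b ** 2)
    return ranking + reg
```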

Optimizer. The mini-batch Adam optimizer (Kingma and Ba, 2015) is applied to train the model and update its parameters. Note that the only parameters to be updated are the embeddings mapped from IDs and the biases of all users and items, a parameter count almost equal to that of BiasSVD (Jiang et al., 2013).

2.5. Discussion on Information Redundancy

Why is the network layer redundant? Distinct from traditional GCN-based methods, the nonlinear network layers are removed in our RGCF since they bring no benefit to model performance. Although network layers can find hidden patterns in complex input embeddings that carry rich side information, the expressiveness of the embeddings is limited when the inputs have no complex patterns (e.g., embeddings mapped from IDs). Meanwhile, the overfitting caused by the many parameters of the network layers cannot be completely eliminated even when dropout is applied.

Why is the layer-aggregation mechanism redundant? Because the embedding aggregation at each layer is a linear transformation, the embeddings obtained at layer $l$ already contain the information inside the embeddings of the previous layers. As such, concatenating the embeddings of every layer amounts to counting the contribution of low-order interactions multiple times, which relatively weakens the contribution of high-order interactions. This analysis supports our argument that the redundancies in traditional GCN-based recommendation methods lead to poor capacity for capturing high-order connectivities. We use the following simplified formula, which ignores the influence of the graph Laplacian norm, to justify this assumption:

$$E^{(2)} = (A + \gamma I)\, E^{(1)} = A E^{(1)} + \gamma E^{(1)} \qquad (7)$$

where $E^{(1)}$ and $E^{(2)}$ denote the embedding matrices obtained at the first and second layers. We can see that $E^{(2)}$ contains $\gamma E^{(1)}$. In this light, concatenating the embedding of each layer is unnecessary once the network layers are removed, as in RGCF. It is worth mentioning that the concatenation operation in NGCF can still be effective: in NGCF, the defective embeddings impaired by the nonlinear network layers may be remedied by the concatenation operation to some extent. We conduct experimental comparisons in Section 3.3 to verify this assumption.
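The claim is easy to check numerically; a toy sketch with a random adjacency matrix (normalization ignored, as in Equation (7)):

```python
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)   # toy unnormalized adjacency
gamma = 0.5
E0 = rng.standard_normal((5, 8))               # initial ID embeddings

E1 = (A + gamma * np.eye(5)) @ E0              # first-layer embeddings
E2 = (A + gamma * np.eye(5)) @ E1              # second-layer embeddings

# Equation (7): E2 = A @ E1 + gamma * E1, so E2 already "contains" E1
# and concatenating [E1, E2] re-counts the low-order signal.
assert np.allclose(E2, A @ E1 + gamma * E1)
```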

Why is the product term redundant? In NGCF, the product term in Equation (2) magnifies the preference score of a user-item pair, which increases the affinity of interacted nodes and helps speed up model convergence. In addition, this term can weaken the negative impact of the information redundancy and noise generated by the nonlinear graph convolution, similar to the abovementioned concatenation operation. In fact, the product term is itself redundant when the interaction function is the inner product: the inner product of $e_u^{*}$ and $e_i^{*}$ can reconstruct the information of the product term $e_u \odot e_i$. We further verify this assumption in Section 3.3.

3. Experiments

In this section, we conduct experiments on three public datasets to evaluate the performance of our proposed model. We aim to answer the following research questions:

  • RQ1: How does our proposed RGCF perform compared to other state-of-the-art CF models?

  • RQ2: Is the refined graph convolution structure helpful for capturing high-order connectivities and further improving model performance?

  • RQ3: How do the key hyper-parameter settings affect the performance of our proposed RGCF?

3.1. Experimental Settings

Dataset Description. We conduct experiments on three datasets: Gowalla (Blei, 2016), Yelp2018, and Amazon-Book, the same datasets used in NGCF (Chua, 2019b). Table 1 shows their statistics. To ensure data quality, the 10-core setting is applied so that every retained user and item has at least ten interactions. Following NGCF, for each dataset we sample 80% of each user's historical interactions as the training set and treat the remainder as the test set; from the training data, we additionally sample 10% of interactions as a validation set to tune the hyper-parameters.

Dataset User # Item # Interaction # Density
Gowalla 29,858 40,981 1,027,370 0.00084
Yelp2018 31,668 38,048 1,561,406 0.00130
Amazon-Book 52,643 91,599 2,984,108 0.00062
Table 1. Statistics of the datasets.

Evaluation Metrics. We adopt two widely-used evaluation protocols (Chua, 2019b): recall@20 and ndcg@20. Specifically, we compute the average recall@20 and ndcg@20 over all users in the test set.
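For reference, a sketch of how these per-user metrics are commonly computed at cutoff 20; the paper does not spell out its exact implementation, and the helper names are ours:

```python
import numpy as np

def recall_at_k(ranked_items, test_items, k=20):
    """Fraction of the user's held-out items recovered in the top-k."""
    hits = len(set(ranked_items[:k]) & set(test_items))
    return hits / len(test_items)

def ndcg_at_k(ranked_items, test_items, k=20):
    """Binary-relevance NDCG: DCG of the top-k over the ideal DCG."""
    test = set(test_items)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k]) if item in test)
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(len(test), k)))
    return dcg / idcg
```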

Baselines. We compare our proposed method with the following baselines:

  • MF (Koren Y, 2009): A matrix factorization method with the Bayesian Personalized Ranking (BPR) loss, which is a widely used recommendation baseline.

  • SVD++ (Koren, 2008): A variant of MF that uses the user's historical interactions to model the user's preferences. It can also be regarded as a one-layer GCN that passes messages only for user embeddings. For fair comparison, we optimize SVD++ with the BPR loss.

  • NeuMF (Chua, 2017b): A state-of-the-art neural collaborative filtering method that uses nonlinear neural networks as the interaction function.

  • HOP-Rec (Tsai, 2018): A state-of-the-art graph-based method that uses random walks to enrich the interaction data between users and their multi-hop connected items.

  • GC-MC (Welling, 2017): This model adopts the GCN technique with a single convolution layer to generate user and item representations.

  • NGCF (Chua, 2019b): A state-of-the-art GCN-based method that combines the embeddings obtained at different GCN layers as the final user and item representations.

Parameter Settings. For a fair comparison, we set the embedding size to 64 for all models. We apply a grid search to tune the hyper-parameters: the learning rate is searched in {0.001, 0.0005, 0.0001, 0.00005}, the coefficient of $L_2$ regularization is searched over a grid, and the self-loop weight $\gamma$ is searched in {0.0, 0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 1.7, 2.0}. In addition, we use an early stopping strategy to prevent overfitting. In our experiments, a single learning rate is optimal across datasets, while the optimal regularization coefficient differs for Gowalla, Yelp2018, and Amazon-Book.

Method Gowalla Yelp2018 Amazon-Book
recall ndcg recall ndcg recall ndcg
MF 0.1291 0.1878 0.0433 0.0864 0.0250 0.0518
SVD++ 0.1439 0.2198 0.0507 0.0975 0.0332 0.0607
NeuMF 0.1326 0.1985 0.0449 0.0886 0.0253 0.0535
HOP-Rec 0.1399 0.2128 0.0524 0.0989 0.0309 0.0606
GC-MC 0.1395 0.1960 0.0462 0.0922 0.0288 0.0551
NGCF 0.1547 0.2237 0.0559 0.1037 0.0344 0.0630
Ours 0.1813 0.2457 0.0683 0.1212 0.0484 0.0840
%Improv. 17.19% 9.83% 22.18% 16.87% 40.70% 33.33%
Table 2. Overall performance comparison w.r.t. recall@20 and ndcg@20 on Gowalla, Yelp2018, and Amazon-Book.

3.2. Performance Comparison (RQ1)

We compare the performance of all the methods in this section. Table 2 reports the performance of recall@20 and ndcg@20 for all compared methods. We have the following findings:

  • MF achieves poor performance on all three datasets, indicating that the simple inner product is insufficient to capture complex connectivities between users and items. NeuMF outperforms MF across all datasets, verifying the effectiveness of applying neural networks to distill nonlinear relations between users and items.

  • Compared to MF and NeuMF, the performance of GC-MC demonstrates that integrating first-order connectivities into the embedding process helps improve the expressiveness of the embeddings.

  • HOP-Rec generally outperforms GC-MC across all cases. The key reason is that HOP-Rec exploits the high-order neighbors to enrich the training data while GC-MC considers the first-order neighbors only.

  • The performance of SVD++ is significantly better than that of GC-MC, which also suggests that using a nonlinear network layer to process ID embeddings adds information redundancy and noise to the representations, thereby degrading model performance.

  • NGCF consistently outperforms HOP-Rec, which demonstrates that explicitly integrating high-order connectivities into the embedding process is more effective than exploiting high-order interactions to enrich the training data. Meanwhile, NGCF performs slightly better than SVD++; the main reason is that NGCF integrates high-order interactions into the embeddings of both users and items, while SVD++ integrates only first-order interactions into user embeddings.

  • Our RGCF yields the best performance on all datasets. Specifically, RGCF improves over the strongest baseline (NGCF) w.r.t. recall@20 by 17.19%, 22.18%, and 40.70% on Gowalla, Yelp2018, and Amazon-Book, respectively. These significant improvements across all cases verify that our refined graph convolution structure, together with the other strategies for reducing noise and information redundancy, is rational and effective.

3.3. Is Refined Graph Convolution Structure Effective? (RQ2)

In this section, we first verify that the three components of GCN-based methods introduced in Section 2.5 are redundant. We then compare different numbers of convolution layers to verify whether our proposed RGCF enhances the capacity to capture high-order connectivities.

Method Gowalla Yelp2018 Amazon-Book
recall ndcg recall ndcg recall ndcg
RGCF+np 0.0584 0.0777 0.0216 0.0465 0.0164 0.0355
RGCF+n 0.0608 0.0825 0.0238 0.0515 0.0166 0.0354
RGCF+npc 0.1547 0.2237 0.0559 0.1037 0.0344 0.0630
RGCF+nc 0.1616 0.2361 0.0562 0.1041 0.0359 0.0646
RGCF+pc 0.1579 0.2314 0.0584 0.1073 0.0366 0.0675
RGCF+p 0.1665 0.2392 0.0625 0.1139 0.0380 0.0709
RGCF+c 0.1680 0.2334 0.0585 0.1072 0.0373 0.0689
RGCF 0.1813 0.2457 0.0683 0.1212 0.0484 0.0840
Table 3. Performance of RGCF variants with different information redundancy.

3.3.1. Impact of information redundancy.

We have analyzed the redundancy issues of some state-of-the-art GCN models in Section 2.5, namely (1) non-linear network layers redundancy, (2) embedding concatenation redundancy, and (3) element-wise product redundancy. For the sake of presentation, we divide the experiment into two parts: Part A with non-linear network layers, and Part B without non-linear network layers.

For the experiments with network layers (Part A), we derive the following variants of RGCF:

  • RGCF+n denotes the variant in which only the nonlinear network layer redundancy is retained.

  • RGCF+np denotes the variant with the redundancies of nonlinear network layers and product terms, i.e., NGCF with the embedding concatenation redundancy removed.

  • RGCF+nc denotes the variant with the redundancies of nonlinear network layers and embedding concatenation, i.e., NGCF with the product term redundancy removed.

  • RGCF+npc denotes the variant containing all three redundancies, which is equivalent to NGCF.

For the experiments without nonlinear network layers (Part B), we similarly derive:

  • RGCF+c denotes the variant in which only the embedding concatenation redundancy is retained.

  • RGCF+p denotes the variant in which only the product term redundancy is retained.

  • RGCF+pc denotes the variant model with the redundancies of the embedding concatenation and product terms.

  • RGCF indicates that the three redundancies are all removed.

Figure 2. Performance of NGCF and RGCF with different numbers of convolution layers w.r.t. recall@20 and ndcg@20 on Gowalla, Yelp2018, and Amazon-Book. Panels: (a) Gowalla recall, (b) Yelp2018 recall, (c) Amazon-Book recall, (d) Gowalla ndcg, (e) Yelp2018 ndcg, (f) Amazon-Book ndcg.

Table 3 reports the experimental results. We have the following findings:

  • RGCF+n slightly outperforms RGCF+np, RGCF+nc slightly outperforms RGCF+npc, and RGCF+c outperforms RGCF+pc, all of which indicate that the element-wise product term is redundant and brings no benefit to model performance.

  • Compared to RGCF+np and RGCF+n, RGCF+npc and RGCF+nc achieve significant improvements; on the contrary, RGCF+p outperforms RGCF+pc. This verifies the assumption in Section 2.5: embedding concatenation can partially remedy the impairment of embedding quality caused by the nonlinear network layers. Moreover, RGCF+p outperforms RGCF+npc, which means that removing both redundancies at once further improves model performance.

  • Compared to RGCF+npc, RGCF+pc achieves better performance. This demonstrates that the network layers in GCN fail to extract useful features from inputs mapped from IDs, further limiting model performance.

  • RGCF+c and RGCF+p slightly outperform RGCF+pc across all cases, which verifies that the product terms and the concatenation operation in NGCF easily lead to information redundancy, and that removing them from the embedding function facilitates the recommendation task.

  • RGCF consistently achieves the best performance. This demonstrates that our refined graph convolution structure, which eliminates the above three redundancies, can greatly enhance the learning of high-order connectivities and further improve recommendation performance.

3.3.2. Effect of convolution layer numbers

To illustrate the impact of the number of convolution layers $L$ on RGCF, Figure 2 reports recall@20 and ndcg@20 on Gowalla, Yelp2018, and Amazon-Book for different $L$. Jointly analysing Figure 2, we have the following observations:

  • The recall@20 and ndcg@20 of both NGCF and RGCF improve significantly as the depth of layers increases in most cases. This demonstrates that high-order interactions are essential for modeling user preference.

  • As the depth of layers increases, the performance of NGCF improves only slightly, while RGCF improves impressively across all cases. RGCF benefits much more from the growth in layer depth than NGCF, verifying again that the refined structure in RGCF is capable of capturing the high-order connectivities in the user-item interaction graph.

  • When the depth of layers increases to four, the performance of both RGCF and NGCF decreases slightly due to overfitting. This result shows that three graph convolution layers are sufficient to model expressive embeddings for users and items.

3.4. Study of hyper-parameters (RQ3)

In this study, we investigate the impact of the self-loop weight $\gamma$ and the regularization coefficient on the performance of our proposed model.

3.4.1. Effect of self-loop weight

To investigate how the self-loop weight affects model performance, we search $\gamma$ in the range {0.0, 0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 1.7, 2.0}. Figure 3 plots the effect of the self-loop weight w.r.t. recall@20 and ndcg@20 on the three datasets. RGCF achieves its best performance at different values of $\gamma$ on Gowalla, Yelp2018, and Amazon-Book, showing that the importance of the self-loop differs across datasets. Finding an appropriate self-loop weight is therefore an effective strategy to further improve the recommendation task.

3.4.2. Effect of regularization coefficient

Figure 4 shows the test performance (recall@20 and ndcg@20) of RGCF under different regularization coefficient settings on the three datasets. From the experimental results, we find that the optimal regularization coefficient again differs for Gowalla, Yelp2018, and Amazon-Book.

(a) Gowalla. (b) Yelp2018. (c) Amazon-Book.
Figure 3. Performance of RGCF with different self-loop weights w.r.t. recall@20 and ndcg@20 on Gowalla, Yelp2018, and Amazon-Book.
(a) Gowalla. (b) Yelp2018. (c) Amazon-Book.
Figure 4. Performance of NGCF and RGCF with different regularization coefficients w.r.t. recall@20 and ndcg@20 on Gowalla, Yelp2018, and Amazon-Book.

4. Related Work

This section introduces factorization-based CF methods and GCN-based CF methods, which are most related to our work.

4.1. Factorization-based CF methods

The core idea of factorization-based methods is to parameterize all users and items and use the product of the user matrix and the item matrix to reconstruct the interaction matrix. For example, Matrix Factorization (MF) (Koren Y, 2009) obtains vector representations of users and items by mapping their IDs. To improve the expressiveness of user embeddings, SVD++ integrates the embeddings of historically interacted items into the user embeddings (Koren, 2008). Meanwhile, many works hold that auxiliary properties related to users and items, such as age, gender, occupation, price, and multimedia features (He and McAuley, 2016a)(He and McAuley, 2016b), are relevant to user preferences, and integrate such properties into the embeddings to improve model performance. Despite their effectiveness, these methods ignore the importance of modeling high-order connectivities. Some works can capture such high-order connectivities. For example, HOSLIM (Christakopoulou and Karypis, 2014) encodes high-order interactions into the embeddings, but its time complexity is too high for million-size datasets. DICF (Xue et al., 2019) and NCF (Chua, 2017b) apply nonlinear neural networks as the interaction function to capture high-order interactions. HOP-Rec (Tsai, 2018) fuses a graph method with matrix factorization, using random walks to find higher-order neighboring nodes as positive samples of the target node, and achieves convincing results. However, HOP-Rec only uses high-order interactions to enrich the training data; the embedding representations of users and items lack an explicit encoding of high-order connectivities.

4.2. GCN-based CF methods

GCN-based methods (Kipf and Welling, 2017)(Weinberger, 2019)(Leskovec, 2017) are capable of capturing high-order interaction connectivities between graph nodes and integrating them into the node embedding representations. In recent years, many works have applied GCN techniques to recommendation systems. GC-MC (Welling, 2017) uses GCN to construct an encoder that aggregates the information of first-order neighbors into the embedding representations of the target nodes. Compared with GC-MC, PinSage (Leskovec, 2018) extends the message aggregation function to higher-order cases and achieves better performance. Section 4.4.1 of NGCF (Chua, 2019b) shows that aggregating high-order neighborhood information can improve the expressiveness of the embeddings. NGCF is a recent work that combines GCN and MF to integrate high-order connectivities into the user and item embedding representations and predicts the preference score with their inner product.

Despite their effectiveness, we theoretically and empirically find that these methods suffer from the redundancy problems discussed in Section 2.5, and that their capability of capturing high-order connectivities is suboptimal. We design a refined graph convolution structure to avoid these information redundancy problems and achieve the significant performance improvements shown in Table 2.

5. Conclusion

In this work, we highlight that a refined graph convolution in the embedding generation process, together with other strategies for reducing information redundancy, is critically important for enhancing the model's capability of capturing high-order connectivities and further improving the expressiveness of user and item embeddings. We present a new GCN-based CF model, RGCF, which alleviates the negative impact of information redundancy and achieves significant improvements over other state-of-the-art recommendation models. Experimental results and further analysis demonstrate the effectiveness and rationality of our proposed RGCF.

In future work, we wish to further improve RGCF using the attention mechanism (Vaswani and Polosukhin, 2017)(Chua, 2017a) to precisely assign weights to neighboring nodes. Meanwhile, we are interested in integrating causal inference (Bonner and Vasile, 2018) and knowledge graphs (Chua, 2019c)(Chua, 2019a) into RGCF to improve the interpretability of recommendation.

References

  • D. M. Blei (2016) Modeling user exposure in recommendation. WWW. Cited by: §3.1.
  • S. Bonner and F. Vasile (2018) Causal embeddings for recommendation. RecSys. Cited by: §5.
  • E. Christakopoulou and G. Karypis (2014) HOSLIM: higher-order sparse linear method for top-n recommender systems. Cited by: §4.1.
  • T.-S. Chua (2017a) Attentive collaborative filtering: multimedia recommendation with item- and component-level attention. SIGIR. Cited by: §5.
  • T. Chua (2017b) Neural collaborative filtering. WWW (), pp. 173–182. Cited by: 3rd item, §4.1.
  • T. Chua (2019a) KGAT: knowledge graph attention network for recommendation. KDD. Cited by: §5.
  • T. Chua (2019b) Neural graph collaborative filtering. SIGIR (), pp. 165–174. Cited by: §1.2, §1.3, §2.1, §2.2, §2.3, §2, 6th item, §3.1, §3.1, §4.2.
  • T. Chua (2019c) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. WWW. Cited by: §5.
  • J. Fang, D. Grunberg, S. Lui, and Y. Wang (2017) Development of a music recommendation system for motivating exercise. pp. 83–86. Cited by: §1.
  • A. Gupta (2018) Zero-shot recognition via semantic embeddings and knowledge graphs. CVPR, pp. 6857–6866. Cited by: §1.2.
  • R. He and J. McAuley (2016a) Ups and downs:modeling the visual evolution of fashion trends with one-class collaborative filtering. WWW. Cited by: §4.1.
  • R. He and J. McAuley (2016b) VBPR: visual bayesian personalized ranking from implicit feedback. AAAI. Cited by: §1, §4.1.
  • X. Jiang, Z. Niu, J. Guo, G. Mustafa, Z. Lin, B. Chen, and Q. Zhou (2013) Novel boosting frameworks to improve the performance of collaborative filtering. pp. 87–99. Cited by: §2.4.
  • D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. ICLR (), pp. . Cited by: §2.4.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. ICLR (), pp. . Cited by: §1.1, §1.2, §2.1, §2.1, §2.1, §2.3, §2, §4.2.
  • V. C. Koren Y (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42 (8), pp. 30–37. Cited by: §1, 1st item, §4.1.
  • Y. Koren (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. KDD (), pp. 426–434. Cited by: §1.1, 2nd item, §4.1.
  • J. Leskovec (2017) Inductive representation learning on large graphs. NeurIPS, pp. 1025–1035. Cited by: §4.2.
  • J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. KDD (Data Science track), pp. 974–983. Cited by: §1.1, §1.2, §4.2.
  • J. Riedl (2001) Item-based collaborative filtering recommendation algorithms. WWW, pp. 285–295. Cited by: §1.
  • E. Sargin (2016) Deep neural networks for YouTube recommendations. RecSys, pp. 191–198. Cited by: §1.
  • L. Schmidt-Thieme (2009) BPR: bayesian personalized ranking from implicit feedback. UAI (), pp. 452–461. Cited by: §2.4.
  • X. Su and T. M. Khoshgoftaar (2009) A survey of collaborative filtering techniques. Advances in Artificial Intelligence, pp. 1–19. Cited by: §1.
  • M. Tsai (2018) HOP-rec: high-order proximity for implicit recommendation. RecSys (), pp. 140–144. Cited by: §1.1, 4th item, §4.1.
  • A. Vaswani and I. Polosukhin (2017) Attention is all you need. CoRR. Cited by: §5.
  • D. Wang (2017) BiRank:towards ranking on bipartite graphs. TKDE 29 (1), pp. 57–71. Cited by: §1.1.
  • M. Wang (2019) A neural influence diffusion model for social recommendation. SIGIR, pp. 235–244. Cited by: §1.
  • K. Q. Weinberger (2019) Simplifying graph convolutional networks. ICML (), pp. 6861–6871. Cited by: §4.2.
  • M. Welling (2017) Graph convolutional matrix completion. KDD. Cited by: §1.1, 5th item, §4.2.
  • K. Xu (2018) Representation learning on graphs with jumping knowledge networks. ICML 80, pp. 5449–5458. Cited by: §1.3.
  • F. Xue, X. He, X. Wang, J. Xu, K. Liu, and R. Hong (2019) Deep item-based collaborative filtering for top-n recommendation. TOIS (), pp. 33:1–33:25. Cited by: §4.1.