1. Introduction
Modern recommendation systems have been widely applied to many online services, such as video recommendation (Sargin, 2016), music recommendation (Fang et al., 2017), E-commerce (He and McAuley, 2016b), and social networks (Wang, 2019). Collaborative Filtering (CF) is the mainstream of modern recommendation algorithms (Riedl, 2001)(Su and Khoshgoftaar, 2009). The basic assumption of CF is that similar users exhibit similar interests in the same items. Matrix Factorization (MF) is the most classical CF method: it vectorizes all users and items using only their ID features, and reconstructs their historical interactions with the inner product of those vectors (Koren Y, 2009). MF can achieve good performance with sufficient interaction data. However, since sparsity is ubiquitous in modern recommendation, MF often fails to learn expressive vector representations for users and items.
1.1. Why Graph Convolution Networks?
In order to overcome the performance bottleneck caused by dataset sparsity, many efforts have been devoted to constructing complex embedding functions. Specifically, integrating all available useful information into the embedding representations can improve model performance. SVD++ (Koren, 2008) is the pioneering work that incorporates a user's historically interacted items into the construction of that user's embedding to model his/her preference and obtain a more expressive embedding. However, SVD++ only encodes explicit connectivities between users and items into the embedding function, while forgoing the modeling of implicit connectivities, which can be viewed as the paths between the current node and its multi-hop neighboring nodes in the user-item bipartite graph (aka high-order connectivities). Graph-based methods (Wang, 2017)(Tsai, 2018) are capable of capturing such high-order connectivities due to their capability of learning path information. For example, HOP-Rec (Tsai, 2018) indirectly integrates high-order connectivities into the embedding learning process by using random walks to enrich a user's interaction data with multi-hop connected items. Apart from graph-based methods, which indirectly use high-order connectivities to enrich the training data, GCN-based methods (Welling, 2017)(Leskovec, 2018)(Kipf and Welling, 2017) directly encode high-order connectivities into the embedding function and achieve significant improvements over other CF methods, illustrating that GCN is the state-of-the-art approach for capturing high-order connectivities inside the user-item interaction graph structure.
1.2. Why Not the Nonlinear Network Layers of GCN?
It is worth mentioning that in some GCN-based machine learning tasks, such as image classification (Gupta, 2018) and node classification (Kipf and Welling, 2017), nonlinear network layers are necessary for feature extraction since the initial vector representations contain abundant and diverse information. In contrast, the IDs of users (or items) used by most CF methods carry no complicated patterns or diverse semantic information to be mined. We argue that directly using nonlinear graph convolution layers to process ID features, as in (Chua, 2019b)(Leskovec, 2018), inevitably brings noise into the learned embeddings, degrading the capacity for capturing high-order connectivities. To be specific, the network layers in GCN fail to distill useful information and features from aggregated embedding inputs that are mapped from one-hot ID features only. Meanwhile, the large number of parameters in the network layers is prone to overfitting and introduces redundant information into the embedding outputs. As discussed above, the nonlinear network layers in the traditional GCN structure are not suitable for recommendation tasks. We elaborate on this in Section 2.2.
1.3. Why Not the Layer Aggregation Mechanism?
To the best of our knowledge, NGCF (Chua, 2019b) is the state-of-the-art GCN-based CF method. In NGCF, a layer-aggregation mechanism (Xu, 2018) is applied to concatenate the embeddings obtained at each convolution layer into the final embeddings. Despite its effectiveness, we argue that such a layer-aggregation mechanism is unnecessary in CF scenarios once the negative impact of the nonlinear network layers is removed. Specifically, without the network layers, graph convolution becomes a linear aggregation process. For a target node, the embedding obtained at the N-th convolution layer is equivalent to a linear combination of the initial embeddings of all neighbors within N hops, and the concatenation of the embeddings obtained at each layer is again such a linear combination. As such, the layer-aggregation concatenation mechanism is redundant in CF scenarios. The reason this mechanism works in NGCF is that the information redundancy and noise generated by the network layers can be weakened by concatenating the embeddings of each layer. However, in our research we find that the nonlinear network structure and the layer-aggregation mechanism actually limit the model's ability to capture high-order connectivities in CF scenarios. This assumption is detailed in Section 2.3. In addition, the element-wise product terms in the aggregation process of NGCF (Chua, 2019b) are also redundant for the representations of users and items; we detail this in Section 2.2. The above assumptions are verified in Section 3.3.
1.4. Our Proposal and Contributions
In this work, we discuss the limitations of the traditional GCN structure for capturing high-order relations among entity nodes in recommendation tasks, and propose a new GCN-based CF model, RGCF, in which the entities' embeddings are reconstructed with a refined graph convolution structure and several strategies are used to reduce the noise and redundancy existing in GCN-based methods. First, a linear weighted average operation is used in place of the complex nonlinear network layers in the embedding function of GCN-based methods. Second, we simply use the embeddings obtained at the last layer as the final representations, avoiding the information overlap caused by concatenating the embeddings of each convolution layer (the layer-aggregation mechanism). Third, the element-wise product terms are removed from the embedding generation process. In addition, we further improve model performance by adjusting the weight of self-loop nodes in the aggregation process on the user-item graph. We conduct extensive experiments on three public datasets, and the results show that RGCF achieves significant improvements over other state-of-the-art baselines. More specifically, our model improves over NGCF w.r.t. recall@20 by 17.19%, 22.18%, and 40.70% on Gowalla, Yelp2018, and Amazon-Book, respectively.
The main contributions of this work are as follows.

We analyze and verify the redundancy defects of GCN-based recommendation methods, and highlight their negative impact on the model's capability of capturing high-order connectivities.

We present the RGCF model, which eliminates the representation redundancies inside GCN-based methods through a refined graph convolution structure. In RGCF, the entities' embeddings capture high-order connectivities better than in previous methods.

We conduct extensive experiments on three public million-scale datasets, empirically demonstrating the state-of-the-art performance of our RGCF.
The rest of this paper is organized as follows. Section 2 elaborates our proposed RGCF and discusses the information redundancy. In Section 3, we report the experimental results and analyze the effectiveness and rationality of our proposed RGCF. We give a brief review of related work in Section 4 and conclude the paper in Section 5.
2. Methodology
In this section, we first review the basic concepts of GCN (Kipf and Welling, 2017) and NGCF (Chua, 2019b), and then present the details of our model structure, as illustrated in Figure 1. Lastly, we discuss the negative impact of information redundancy on GCN-based methods.
2.1. Preliminary
Graph Convolution Networks. The core idea of GCN (Kipf and Welling, 2017) is to capture graph structure information by transforming and aggregating the information of neighboring nodes. To be specific, GCN stacks multiple convolutional layers, in which layer $l+1$ depends on the output of layer $l$. In each layer, the information of a target entity is aggregated from its neighbor nodes. As such, high-order information can be effectively captured by stacking multiple such convolutional layers. The convolutional operation can be formulated as follows:
$$E^{(l+1)} = \sigma\big(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} E^{(l)} W^{(l)}\big) \qquad (1)$$

where $\hat{A} = A + I \in \mathbb{R}^{N \times N}$ is the adjacency matrix with self-loops added, $N$ is the total number of nodes, $I$ is the identity matrix, $\hat{D}$ denotes the diagonal node degree matrix with elements $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$, $E^{(l)}, E^{(l+1)} \in \mathbb{R}^{N \times d}$ are the matrices of embeddings for all nodes at layers $l$ and $l+1$ respectively, $d$ is the embedding length, $W^{(l)}$ is a trainable weight matrix, and $\sigma$ is a nonlinear activation function.

Neural Graph Collaborative Filtering. To the best of our knowledge, NGCF (Chua, 2019b) is the state-of-the-art GCN-based collaborative filtering method. Distinct from standard GCN (Kipf and Welling, 2017), NGCF integrates the element-wise product of the target node and its neighboring nodes into the embedding function, and uses the concatenation of the embeddings obtained at each layer as the final representations. The multi-layer aggregation process for user $u$ can be formulated as follows:
$$e_u^{(l+1)} = \sigma\Big(W_1^{(l)} e_u^{(l)} + \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}} \big(W_1^{(l)} e_i^{(l)} + W_2^{(l)} (e_i^{(l)} \odot e_u^{(l)})\big)\Big) \qquad (2)$$

where $W_1^{(l)}$ and $W_2^{(l)}$ are the weight matrices at layer $l$, $e_u^{(l)}$ and $e_u^{(l+1)}$ are the embeddings of the current user $u$ at layers $l$ and $l+1$ respectively, $e_i^{(l)}$ is the embedding of item $i$ at layer $l$, $\frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}$ is the graph Laplacian norm (Kipf and Welling, 2017) used to normalize the embeddings aggregated from the previous layer, where $\mathcal{N}_u$ and $\mathcal{N}_i$ respectively denote $u$'s and $i$'s neighborhood, and $\sigma$ is the nonlinear activation function.
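For concreteness, a single NGCF-style propagation step as in Eq. (2) can be sketched with dense NumPy operations. This is an illustrative toy only: the interaction matrix, sizes, and the row-vector convention (`e @ W` instead of `W e`) are assumptions, and a real implementation would train these weights end-to-end with sparse operations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# toy interaction matrix R (4 users x 5 items) and layer-l embeddings
R = np.array([[1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1]], dtype=float)
E_u = rng.normal(size=(4, d))        # user embeddings at layer l
E_i = rng.normal(size=(5, d))        # item embeddings at layer l
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))

def ngcf_user_layer(u):
    """One NGCF propagation step (Eq. 2) for user u, row-vector convention."""
    msg = E_u[u] @ W1                                    # self message W1 e_u
    for i in np.nonzero(R[u])[0]:                        # interacted items
        norm = 1.0 / np.sqrt(R[u].sum() * R[:, i].sum())  # Laplacian norm
        # neighbor message plus the element-wise product term W2 (e_i ⊙ e_u)
        msg = msg + norm * (E_i[i] @ W1 + (E_i[i] * E_u[u]) @ W2)
    return np.where(msg > 0, msg, 0.2 * msg)             # LeakyReLU activation

e_u1 = ngcf_user_layer(0)   # user 0's embedding at layer l+1
```

The element-wise product term `(E_i[i] * E_u[u]) @ W2` is exactly the component that RGCF later argues is redundant.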
2.2. Model
In this section, we present a detailed description of our RGCF model. As Figure 1 shows, the embeddings of users and items are generated separately. User (item) embeddings are generated by propagating information iteratively from the first layer to the last one. In each layer, an entity (user or item) is embedded by aggregating information both from its neighbor nodes and from the entity itself.
We use the user embedding construction to detail the aggregation process over the graph (the light blue part of Figure 1); the item embedding aggregation is similar. Similar to the convolution operation in GCN, our model outputs a user's embedding as the sum of its own embedding and the vectors aggregated from its neighboring nodes. We formulate this iterative embedding process across multiple layers as Algorithm 1:
Concretely, for a target user $u$, we first initialize the embeddings of the user and its neighboring nodes as $e_u^{(0)}$ and $\{e_i^{(0)}\}$ by mapping from their IDs. Then the embedding representation of user $u$ is iteratively aggregated from layer 1 to layer $L$. In Algorithm 1, Line 3 gives the message aggregated from the neighboring nodes using the embeddings from the previous layer, and Line 4 defines the message aggregated from the user itself. The former reflects the historical interaction information of user $u$; the latter can be viewed as the intrinsic properties of the node itself. We argue that these two messages contribute differently to the final representation of node $u$, so a hyperparameter $\alpha$ is set to control the weight of the message from the node itself. We report the model performance under different settings of $\alpha$ in Section 3.4. As in NGCF, $\frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}$ is the graph Laplacian norm used to normalize the embeddings aggregated from the previous layer, where $\mathcal{N}_u$ and $\mathcal{N}_i$ respectively denote $u$'s and $i$'s neighborhood. After stacking $L$ such message aggregation operations, we obtain the final representation for node $u$. The representation for an item node is generated in the same way.
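The per-node aggregation described above can be sketched as follows. This is a simplified illustration with toy neighbor sets; `alpha` is the self-loop weight hyperparameter, and for brevity the self message is not folded into the degree normalization as the matrix form does.

```python
import numpy as np

# toy bipartite graph: user -> interacted items, item -> interacting users
user_items = {0: [0, 2], 1: [1, 2]}
item_users = {0: [0], 1: [1], 2: [0, 1]}

d, L, alpha = 4, 2, 1.0
rng = np.random.default_rng(42)
E_user = {u: rng.normal(size=d) for u in user_items}   # e_u^(0), mapped from ID
E_item = {i: rng.normal(size=d) for i in item_users}   # e_i^(0), mapped from ID

def propagate(E_user, E_item):
    """One layer of RGCF-style linear aggregation for all nodes (no nonlinearity)."""
    new_u, new_i = {}, {}
    for u, items in user_items.items():
        # neighbor message, weighted by the Laplacian norm 1/sqrt(|N_u||N_i|)
        msg = sum(E_item[i] / np.sqrt(len(items) * len(item_users[i])) for i in items)
        # self message, weighted by the self-loop hyperparameter alpha
        new_u[u] = msg + alpha * E_user[u]
    for i, users in item_users.items():
        msg = sum(E_user[u] / np.sqrt(len(users) * len(user_items[u])) for u in users)
        new_i[i] = msg + alpha * E_item[i]
    return new_u, new_i

for _ in range(L):           # stack L aggregation layers
    E_user, E_item = propagate(E_user, E_item)
# E_user / E_item now hold the layer-L (final) representations
```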
Matrix Implementation. In practice, we use sparse matrix multiplications to implement the abovementioned embedding function. The detailed operations can be formulated as follows:
$$E^{(l+1)} = D^{-\frac{1}{2}} \tilde{A} D^{-\frac{1}{2}} E^{(l)} \qquad (3)$$

where $\tilde{A} = A + \alpha I \in \mathbb{R}^{(M+N) \times (M+N)}$ is the adjacency matrix with a weighted self-loop added, $M$ and $N$ are the numbers of users and items, $I$ is the identity matrix, $\alpha$ is a hyperparameter controlling the weight of the self-loop, $D$ denotes the diagonal node degree matrix with elements $D_{ii} = \sum_j \tilde{A}_{ij}$, $E^{(l)}, E^{(l+1)} \in \mathbb{R}^{(M+N) \times d}$ are the matrices of embeddings for all users and items at layers $l$ and $l+1$ respectively, and $d$ is the embedding length. It is worth noting that, distinct from traditional GCN-based methods (Chua, 2019b), the network layers are removed from our embedding generation process since they bring no benefit to model performance.
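A minimal dense NumPy sketch of the matrix form in Eq. (3) follows. The sizes and the interaction matrix are toy assumptions, and dense arrays stand in for the sparse matrices used in practice.

```python
import numpy as np

M, N, d, alpha, L = 3, 4, 8, 1.0, 3   # users, items, embed size, self-loop weight, layers
R = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]], dtype=float)        # user-item interaction matrix

# adjacency of the user-item bipartite graph, plus a weighted self-loop
A = np.block([[np.zeros((M, M)), R],
              [R.T, np.zeros((N, N))]])
A_tilde = A + alpha * np.eye(M + N)

deg = A_tilde.sum(axis=1)                        # D_ii = sum_j A_tilde_ij
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
P = D_inv_sqrt @ A_tilde @ D_inv_sqrt            # normalized propagation matrix

E = np.random.default_rng(0).normal(size=(M + N, d))   # E^(0), mapped from IDs
for _ in range(L):
    E = P @ E                                    # E^(l+1) = D^{-1/2} Ã D^{-1/2} E^(l)

E_final = E                                      # last-layer embeddings (Eq. 4)
```

Because the propagation contains no weight matrix and no nonlinearity, the whole stack is one fixed linear map applied `L` times to the ID embeddings.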
2.3. Prediction
Distinct from NGCF (Chua, 2019b), which concatenates the representations obtained at each convolution layer, we use the embeddings obtained at the last layer as the final representations, the same as standard GCN (Kipf and Welling, 2017). The key reason is that concatenating representations from different layers may lead to information redundancy. To be specific, the embeddings obtained at layer $l$ already contain most of the information from the previous layers, since the aggregation operation at each layer is linear. Thereby, in RGCF, we obtain the final representations for users and items as follows:
$$e_u^* = e_u^{(L)}, \qquad e_i^* = e_i^{(L)} \qquad (4)$$

where $e_u^{(L)}$ and $e_i^{(L)}$ are the embeddings obtained at the last layer $L$ for user $u$ and item $i$, respectively.
The inner product is applied to predict the matching score of a user-item pair $(u, i)$. We formulate the prediction function as follows:
$$\hat{y}_{ui} = {e_u^*}^{\top} e_i^* + b_u + b_i \qquad (5)$$

where $\hat{y}_{ui}$ is the predicted preference score of user $u$ towards the target item $i$, $e_u^*$ and $e_i^*$ are the final representations for user $u$ and item $i$, and $b_u$ and $b_i$ denote the biases for $u$ and $i$, respectively. Note that the bias terms help distinguish nodes with different popularity, since nodes with many interactions can learn a larger bias than nodes with few interactions. That is to say, the value of the bias depends on the popularity of the node. This term can alleviate the negative impact of over-smoothing and improve model performance, especially for the top-N ranking task in recommendation.
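The scoring function of Eq. (5) is a one-liner; the embeddings and bias values below are made up for illustration.

```python
import numpy as np

def predict(e_u, e_i, b_u, b_i):
    """Matching score of a user-item pair: inner product plus bias terms (Eq. 5)."""
    return float(e_u @ e_i + b_u + b_i)

# toy final representations and popularity biases
e_u = np.array([0.5, -0.2, 0.1])
e_i = np.array([0.3, 0.4, -0.1])
score = predict(e_u, e_i, b_u=0.2, b_i=-0.1)
```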
2.4. Training
Loss Function. We use the Bayesian Personalized Ranking (BPR) loss (Schmidt-Thieme, 2009) to optimize the parameters of our model. The basic assumption of the BPR loss is that observed interactions reflect stronger preference than unobserved ones; that is, the predicted score of an observed user-item pair should be higher than that of an unobserved one. The loss function of our model is formulated as follows:
$$\mathcal{L} = -\sum_{(u,i,j) \in O} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda_1 \|E^*\|_2^2 + \lambda_2 \|b\|_2^2 \qquad (6)$$

where $O = \{(u, i, j) \mid i \in \mathcal{I}_u^+, j \notin \mathcal{I}_u^+\}$ is the training data, $\mathcal{I}_u^+$ denotes the observed item set of user $u$, and $\sigma$ is the sigmoid function. We apply $L_2$ regularization on $E^*$ and $b$, parameterized by $\lambda_1$ and $\lambda_2$ respectively, where $E^*$ and $b$ are the final embeddings obtained at the last layer and the biases for all users and items, respectively.

Optimizer. The mini-batch Adam optimizer (Kingma and Ba, 2015) is applied to optimize our model and update its parameters. Note that the parameters to be updated are only the embeddings mapped from IDs and the biases for all users and items, which is almost equal to the parameter count of BiasSVD (Jiang et al., 2013).
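The BPR objective of Eq. (6) can be sketched for one mini-batch as below. The scores are toy values standing in for Eq. (5), and the regularization coefficients are placeholders; in practice the gradients would come from an autodiff framework rather than hand-written code.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores, emb, bias, lam_e=1e-4, lam_b=1e-4):
    """BPR loss: observed pairs should score higher than unobserved ones (Eq. 6)."""
    diff = pos_scores - neg_scores
    ranking = -np.log(1.0 / (1.0 + np.exp(-diff))).sum()   # -sum ln sigmoid(y_ui - y_uj)
    reg = lam_e * np.sum(emb ** 2) + lam_b * np.sum(bias ** 2)  # L2 regularization
    return ranking + reg

pos = np.array([2.0, 1.5])    # scores of observed (u, i) pairs
neg = np.array([0.5, 1.0])    # scores of sampled unobserved (u, j) pairs
emb = np.zeros((4, 8))        # stand-in for the regularized embeddings
bias = np.zeros(4)            # stand-in for the regularized biases
loss = bpr_loss(pos, neg, emb, bias)
```

Widening the gap between positive and negative scores lowers the loss, which is exactly the ranking behavior BPR optimizes for.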
2.5. Discussion on Information Redundancy
Why is the network layer redundant? Distinct from traditional GCN-based methods, the nonlinear network layers are removed in our RGCF since they bring no benefit to model performance. Although network layers can find hidden patterns in complex input embeddings that contain rich side information, the expressiveness of the embeddings is limited when the inputs carry no complex patterns (e.g., embeddings mapped from IDs). Meanwhile, the overfitting problem caused by the large number of network-layer parameters cannot be completely eliminated even when dropout is applied.
Why is the layer-aggregation mechanism redundant?
Because the embedding aggregation at each layer is a linear transformation, the embeddings obtained at layer $l$ already contain the information inside the embeddings of the previous layers. As such, concatenating the embeddings of each layer amounts to counting the contribution of low-order interactions multiple times, which relatively weakens the contribution of high-order interactions. This analysis supports our argument that the redundancies in traditional GCN-based recommendation methods lead to poor capacity for capturing high-order connectivities. We use the following simplified formula, which ignores the influence of the graph Laplacian norm, to justify this assumption:

$$E^{(1)} = \tilde{A} E^{(0)}, \qquad E^{(2)} = \tilde{A} E^{(1)} = \tilde{A}^2 E^{(0)} \qquad (7)$$

where $E^{(1)}$ and $E^{(2)}$ denote the embedding matrices obtained at the first and second layers. We can see that $E^{(2)}$ contains $E^{(1)}$. In this way, concatenating the embeddings of each layer is unnecessary once the network layers are removed, as in RGCF. It is worth mentioning that the concatenation operation in NGCF can still be effective: in NGCF, the defective embeddings impaired by the nonlinear network layers may be remedied by the concatenation operation to some extent. We conduct experimental comparisons in Section 3.3 to verify this assumption.
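The containment claim behind Eq. (7) can be checked numerically with toy matrices: since each layer applies the same fixed linear map, the second-layer embeddings are exactly that map applied to the first-layer embeddings, so concatenating both layers only re-weights information the deeper layer already carries.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4
A = rng.random((n, n))          # stand-in for the (normalized) adjacency matrix
E0 = rng.normal(size=(n, d))    # initial ID embeddings E^(0)

E1 = A @ E0                     # first-layer embeddings  E^(1) = A E^(0)
E2 = A @ E1                     # second-layer embeddings E^(2) = A E^(1) = A^2 E^(0)

# E^(2) is a fixed linear function of E^(1): no new information is created,
# so [E^(1) || E^(2)] double-counts low-order signal rather than adding to it.
same = np.allclose(E2, (A @ A) @ E0)
```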
Why is the product term redundant? In NGCF, the product term in Equation (2) magnifies the preference score of a user-item pair, which increases the affinity of the interacted nodes and helps speed up model convergence. In addition, this term can weaken the negative impact of the information redundancy and noise generated by the nonlinear graph convolution, similar to the above-mentioned concatenation operation. In fact, this product term is also redundant when the interaction function is the inner product: the inner product of $e_u$ and $e_i$ can reconstruct the information of the product term $e_u \odot e_i$. We further verify this assumption in Section 3.3.
3. Experiments
In this section, we conduct experiments on three public datasets to evaluate the performance of our proposed model. We aim to answer the following research questions:

RQ1: How does our proposed RGCF perform compared to other stateoftheart CF models?

RQ2: Is the refined graph convolution structure helpful for capturing high-order connectivities and further improving model performance?

RQ3: How do the key hyperparameter settings affect the performance of our proposed RGCF?
3.1. Experimental Settings
Dataset Description. We conduct experiments on three datasets: Gowalla (Blei, 2016), Yelp2018, and Amazon-Book, which are the same as those used in NGCF (Chua, 2019b). We show the statistics of the three datasets in Table 1. To ensure dataset quality, the 10-core setting is applied to retain only users and items with at least ten interactions. For each dataset, we sample a portion of each user's historical interactions as the training set and treat the remainder as the test set; we further resample a portion of the interactions from the training data as a validation set to tune the hyperparameters.
Dataset  User #  Item #  Interaction #  Density 

Gowalla  29,858  40,981  1,027,370  0.00084 
Yelp2018  31,668  38,048  1,561,406  0.00130 
AmazonBook  52,643  91,599  2,984,108  0.00062 
Evaluation Metrics. We adopt two widely-used evaluation protocols (Chua, 2019b): recall@20 and ndcg@20. Specifically, we compute the average recall and ndcg over all users in the test set.
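A per-user sketch of these metrics as commonly computed under this protocol is given below; the exact NDCG variant (binary relevance, ideal DCG truncated at the test-set size) is an assumption, though it matches common practice for this evaluation setup.

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, test_items, k=20):
    """recall@K and NDCG@K for one user, given a ranked item list."""
    topk = ranked_items[:k]
    hits = [1.0 if item in test_items else 0.0 for item in topk]
    recall = sum(hits) / len(test_items)
    dcg = sum(h / np.log2(pos + 2) for pos, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(test_items), k)))
    return recall, dcg / idcg

# toy example: recommended ranking vs. held-out test items
r, n = recall_ndcg_at_k(ranked_items=[3, 7, 1, 9], test_items={7, 9}, k=3)
```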
Baselines. We compare our proposed method with the following baselines:

MF (Koren Y, 2009): Matrix factorization with the Bayesian Personalized Ranking (BPR) loss, a widely used recommendation baseline.

SVD++ (Koren, 2008): A variant of MF that uses the user's historical interactions to model the user's preferences. It can also be regarded as a one-layer GCN that passes messages only to user embeddings. For fair comparison, we use the BPR loss to optimize SVD++.

NeuMF (Chua, 2017b): A state-of-the-art neural collaborative filtering method that uses nonlinear neural networks as the interaction function.

HOP-Rec (Tsai, 2018): A state-of-the-art graph-based method that uses random walks to enrich the interaction data between users and their multi-hop connected items.

GCMC (Welling, 2017): This model adopts the GCN technique with only one convolution layer to generate user and item representations.

NGCF (Chua, 2019b): A state-of-the-art GCN-based method that combines the embeddings obtained at different GCN layers as the final user and item representations.
Parameter Settings. For a fair comparison, we set the embedding size to 64 for all models. We apply a grid search to tune the following hyperparameters: the learning rate is searched in {0.001, 0.0005, 0.0001, 0.00005}, the coefficient of $L_2$ normalization is searched over a range of magnitudes, and the self-loop weight $\alpha$ is searched in {0.0, 0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 1.7, 2.0}. In addition, we use an early stopping strategy to prevent overfitting. Our experiments show that the optimal learning rate is shared across datasets, while the optimal normalization coefficient differs for Gowalla, Yelp2018, and Amazon-Book.
Method  Gowalla  Yelp2018  AmazonBook  
recall  ndcg  recall  ndcg  recall  ndcg  
MF  0.1291  0.1878  0.0433  0.0864  0.0250  0.0518 
SVD++  0.1439  0.2198  0.0507  0.0975  0.0332  0.0607 
NeuMF  0.1326  0.1985  0.0449  0.0886  0.0253  0.0535 
HOP-Rec  0.1399  0.2128  0.0524  0.0989  0.0309  0.0606 
GCMC  0.1395  0.1960  0.0462  0.0922  0.0288  0.0551 
NGCF  0.1547  0.2237  0.0559  0.1037  0.0344  0.0630 
Ours  0.1813  0.2457  0.0683  0.1212  0.0484  0.0840 
%Improv.  17.19%  9.83%  22.18%  16.87%  40.70%  33.33% 
3.2. Performance Comparison (RQ1)
We compare the performance of all methods in this section. Table 2 reports recall@20 and ndcg@20 for all compared methods. We have the following findings:

MF performs poorly on all three datasets, indicating that the simple inner product is insufficient to capture the complex connectivities between users and items. NeuMF outperforms MF across all datasets, which verifies the effectiveness of applying neural networks to distill the nonlinear relations between users and items.

Compared to MF and NeuMF, the performance of GCMC demonstrates that integrating first-order connectivities into the embedding process helps improve the expressiveness of the embeddings.

HOP-Rec generally outperforms GCMC across all cases. The key reason is that HOP-Rec exploits high-order neighbors to enrich the training data, while GCMC considers only first-order neighbors.

The performance of SVD++ is significantly better than that of GCMC, which also verifies that using a nonlinear network layer to process ID embeddings adds information redundancy and noise to the representations, thereby degrading model performance.

NGCF consistently outperforms HOP-Rec, which demonstrates that explicitly integrating high-order connectivities into the embedding process is more effective than exploiting high-order interactions to enrich the training data. Meanwhile, the performance of NGCF is slightly higher than that of SVD++. The main reason is that NGCF integrates high-order interactions into the embeddings of both users and items, while SVD++ only integrates first-order interactions into the user embeddings.

Our RGCF yields the best performance on all datasets compared to all baselines. Specifically, RGCF improves over the strongest baseline w.r.t. recall@20 by 17.19%, 22.18%, and 40.70% on Gowalla, Yelp2018, and Amazon-Book, respectively. The significant improvements across all cases verify that our refined graph convolution structure and the other strategies to reduce noise and information redundancy are rational and effective.
3.3. Is Refined Graph Convolution Structure Effective? (RQ2)
In this section, we first verify that the three components of GCN-based methods introduced in Section 2.5 are redundant. We then compare models with different numbers of convolution layers to verify whether our proposed RGCF enhances the capacity for capturing high-order connectivities.
Method  Gowalla  Yelp2018  AmazonBook  
recall  ndcg  recall  ndcg  recall  ndcg  
RGCF+np  0.0584  0.0777  0.0216  0.0465  0.0164  0.0355 
RGCF+n  0.0608  0.0825  0.0238  0.0515  0.0166  0.0354 
RGCF+npc  0.1547  0.2237  0.0559  0.1037  0.0344  0.0630 
RGCF+nc  0.1616  0.2361  0.0562  0.1041  0.0359  0.0646 
RGCF+pc  0.1579  0.2314  0.0584  0.1073  0.0366  0.0675 
RGCF+p  0.1665  0.2392  0.0625  0.1139  0.0380  0.0709 
RGCF+c  0.1680  0.2334  0.0585  0.1072  0.0373  0.0689 
RGCF  0.1813  0.2457  0.0683  0.1212  0.0484  0.0840 
3.3.1. Impact of information redundancy.
We analyzed the redundancy issues of some state-of-the-art GCN models in Section 2.5, namely (1) nonlinear network layer redundancy, (2) embedding concatenation redundancy, and (3) element-wise product redundancy. For ease of presentation, we divide the experiment into two parts: Part A with nonlinear network layers, and Part B without them.
For the experiments with network layers (Part A), we derive the following models from RGCF:

RGCF+n denotes the variant in which only the nonlinear network layer redundancy is retained.

RGCF+np denotes the variant with the redundancies of the nonlinear network layers and the product terms, i.e., NGCF with the embedding concatenation redundancy removed.

RGCF+nc denotes the variant with the redundancies of the nonlinear network layers and the embedding concatenation, i.e., NGCF with the product term redundancy removed.

RGCF+npc denotes the variant containing all three redundancies, which is equivalent to NGCF.
For the experiments without nonlinear network layers (Part B), we similarly derive the following models from RGCF:

RGCF+c denotes the variant in which only the embedding concatenation redundancy is retained.

RGCF+p denotes the variant in which only the product term redundancy is retained.

RGCF+pc denotes the variant with the redundancies of the embedding concatenation and the product terms.

RGCF denotes the model with all three redundancies removed.
Table 3 reports the experimental results. We have the following findings:

RGCF+n slightly outperforms RGCF+np, RGCF+nc slightly outperforms RGCF+npc, and RGCF+c outperforms RGCF+pc, all of which indicate that the element-wise product term is redundant and brings no benefit to model performance.

Compared to RGCF+np and RGCF+n, RGCF+npc and RGCF+nc achieve significant improvements; on the contrary, RGCF+p outperforms RGCF+pc. This result verifies the assumption in Section 2.5: embedding concatenation can partially remedy the damage to embedding quality caused by the nonlinear network layers. Moreover, RGCF+p outperforms RGCF+npc, which means that removing both redundancies at the same time further improves model performance.

Compared to RGCF+npc, RGCF+pc achieves better performance. This demonstrates that the network layers in GCN fail to extract useful features from inputs mapped from IDs and thus limit model performance.

RGCF+c and RGCF+p slightly outperform RGCF+pc across all cases, which verifies that the product terms and the concatenation operation in NGCF easily lead to information redundancy, and that removing them from the embedding function facilitates the recommendation task.

RGCF consistently achieves the best performance. This demonstrates that our refined graph convolution structure, which eliminates the above three redundancy issues, can greatly enhance the learning of high-order connectivities and further improve recommendation performance.
3.3.2. Effect of convolution layer numbers
To illustrate the impact of the number of convolution layers $L$ on RGCF, Figure 2 reports Recall@20 and NDCG@20 on Gowalla and Yelp2018 under different $L$. Jointly analyzing Figure 2, we have the following observations:

In most cases, the Recall@20 and NDCG@20 of both NGCF and RGCF improve significantly as the depth of layers increases. This demonstrates that high-order interactions are essential for modeling user preference.

As the depth of layers increases, the performance of NGCF improves only slightly, while RGCF improves impressively across all cases. This is because our RGCF model benefits much more from the growth of layer depth than NGCF, verifying again that the refined structure in RGCF is capable of capturing the high-order connectivities in the user-item interaction graph.

When the depth of layers increases to four, the performance of both RGCF and NGCF slightly decreases due to overfitting. Such a result shows that three graph convolution layers are sufficient to model expressive embeddings for users and items.
3.4. Study of hyperparameters (RQ3)
In this study, we investigate the impact of different self-loop weights $\alpha$ and regularization coefficients on the performance of our proposed model.
3.4.1. Effect of self-loop weight
To investigate how the self-loop weight affects model performance, we search $\alpha$ in the range {0.0, 0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 1.7, 2.0}. Figure 3 plots the effect of the self-loop weight w.r.t. recall@20 and ndcg@20 on the three datasets. Our RGCF achieves its best performance at a different value of $\alpha$ on each of Gowalla, Yelp2018, and Amazon-Book, which shows that the importance of the self-loop differs across datasets. Therefore, finding an appropriate self-loop weight is an effective strategy to further improve the recommendation task.
3.4.2. Effect of regularization coefficient
Figure 4 shows the test performance (recall@20 and NDCG@20) of RGCF under different regularization coefficient settings on the three datasets. We tune the regularization coefficient over a range of magnitudes, and find that RGCF achieves its best performance at a different coefficient on each of Gowalla, Yelp2018, and Amazon-Book.
4. Related Work
This section introduces factorizationbased CF methods and GCNbased CF methods, which are most related to our work.
4.1. Factorizationbased CF methods
The core idea of factorization-based methods is to parameterize all users and items and use the product of the user matrix and the item matrix to reconstruct the interaction matrix. For example, Matrix Factorization (MF) (Koren Y, 2009) obtains vector representations of users and items by mapping their IDs. In order to improve the expressiveness of user embeddings, SVD++ integrates historically interacted item embeddings into the user embeddings (Koren, 2008). Meanwhile, many works hold that auxiliary properties related to users and items, such as age, gender, occupation, price, and multimedia features (He and McAuley, 2016a)(He and McAuley, 2016b), are relevant to user preferences, and integrate such properties into the embeddings to improve model performance. Despite their effectiveness, these methods ignore the importance of modeling high-order connectivities. Some works can capture such high-order connectivities. For example, HOSLIM (Christakopoulou and Karypis, 2014) encodes high-order interactions into the embeddings, but its time complexity is too high to handle million-scale datasets. DICF (Xue et al., 2019) and NCF (Chua, 2017b) apply nonlinear neural networks as the interaction function to capture high-order interactions. HOP-Rec (Tsai, 2018) is a fusion of graph-based and matrix factorization methods that uses random walks to find higher-order neighboring nodes as positive samples of the target node, achieving convincing results. However, HOP-Rec only uses high-order interactions to enrich the training data; the embedding representations of users and items lack explicit encoding of higher-order connectivities.
4.2. GCNbased CF methods
GCN-based methods (Kipf and Welling, 2017)(Weinberger, 2019)(Leskovec, 2017) are capable of capturing high-order interaction connectivities between graph nodes and integrating them into the node embedding representations. In recent years, many works have applied GCN techniques to recommendation systems. GCMC (Welling, 2017) uses GCN to construct an encoder that aggregates the information of first-order neighbors into the embedding representations of the target nodes. Compared with GCMC, PinSage (Leskovec, 2018) extends the message aggregation function to higher-order cases and achieves better model performance. Section 4.4.1 of NGCF (Chua, 2019b) shows that high-order neighborhood information aggregation can improve the expressiveness of the embeddings. NGCF is a recent work that combines GCN and MF to integrate high-order connectivities into the user and item embedding representations and predicts the preference score with their inner product.
Despite their effectiveness, we find, both theoretically and empirically, that these methods suffer from the information redundancy problems discussed in Section 2.5, which makes their capability of capturing high-order connectivities suboptimal. We design a refined graph convolution structure that avoids these redundancy problems and achieves the significant performance improvements shown in Table 2.
5. Conclusion
In this work, we highlight that refined graph convolution in the embedding-generation process, together with other strategies for reducing information redundancy, is critically important for enhancing a model's capability of capturing high-order connectivities, and thus for improving the expressiveness of the user and item embeddings. We present a new GCN-based CF model, RGCF, which alleviates the negative impact of information redundancy and achieves significant improvements over other state-of-the-art recommendation models. Experimental results and further analysis demonstrate the effectiveness and rationality of the proposed RGCF.
In future work, we wish to further improve the performance of RGCF by using attention mechanisms (Vaswani and Polosukhin, 2017)(Chua, 2017a) to assign weights to neighboring nodes more precisely. Meanwhile, we are interested in integrating causal inference (Bonner and Vasile, 2018) and knowledge graphs (Chua, 2019c)(Chua, 2019a) into RGCF to improve interpretability in recommendation.

References
Modeling user exposure in recommendation. WWW.
Causal embeddings for recommendation. RecSys, pp. 191–198.
HOSLIM: higher-order sparse linear method for top-N recommender systems.
Attentive collaborative filtering: multimedia recommendation with item- and component-level attention. SIGIR.
Neural collaborative filtering. WWW, pp. 173–182.
KGAT: knowledge graph attention network for recommendation. KDD.
Neural graph collaborative filtering. SIGIR, pp. 165–174.
Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. WWW.
Development of a music recommendation system for motivating exercise. pp. 83–86.
Zero-shot recognition via semantic embeddings and knowledge graphs. CVPR, pp. 6857–6866.
Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. WWW.
VBPR: visual Bayesian personalized ranking from implicit feedback. AAAI.
Novel boosting frameworks to improve the performance of collaborative filtering. pp. 87–99.
Adam: a method for stochastic optimization. ICLR.
Semi-supervised classification with graph convolutional networks. ICLR.
Matrix factorization techniques for recommender systems. IEEE Computer 42 (8), pp. 30–37.
Factorization meets the neighborhood: a multifaceted collaborative filtering model. KDD, pp. 426–434.
Inductive representation learning on large graphs. NeurIPS, pp. 1025–1035.
Graph convolutional neural networks for web-scale recommender systems. KDD (Data Science track), pp. 974–983.
Item-based collaborative filtering recommendation algorithms. WWW, pp. 285–295.
BPR: Bayesian personalized ranking from implicit feedback. UAI, pp. 452–461.
A survey of collaborative filtering techniques. Advances in Artificial Intelligence, pp. 1–19.
HOP-Rec: high-order proximity for implicit recommendation. RecSys, pp. 140–144.
Attention is all you need. CoRR.
BiRank: towards ranking on bipartite graphs. TKDE 29 (1), pp. 57–71.
A neural influence diffusion model for social recommendation. SIGIR, pp. 235–244.
Simplifying graph convolutional networks. ICML, pp. 6861–6871.
Graph convolutional matrix completion. KDD.
Representation learning on graphs with jumping knowledge networks. ICML 80, pp. 5449–5458.
Deep item-based collaborative filtering for top-N recommendation. TOIS, pp. 33:1–33:25.