Introduction
Recent years have witnessed the boom of GCNs, which are efficient variants of CNNs for dealing with graph based data [Kipf and Welling2017, Hamilton, Ying, and Leskovec2017]. The key idea of GCNs is to stack multiple layers, each of which performs two steps: node embedding with convolutional neighborhood aggregation, followed by a nonlinear transformation of the node embeddings parameterized by a neural network. In this way, the higher-order similarity of a node can be effectively captured [Xu et al.2019b, Li, Han, and Wu2018]. These models show competitive performance on tasks such as unsupervised node (graph) representation learning [Li et al.2016] and semi-supervised node (graph) learning [Kipf and Welling2017, CampsValls, Marsheva, and Zhou2007]. As much real-world data exhibits graph structure, GCNs have been widely applied to applications such as social network analysis [Xu et al.2019a], transportation networks [Zhao et al.2019], and recommender systems [Berg, Kipf, and Welling2018].
In this paper, we focus on applying GCNs to CF based recommender systems. CF provides personalized item suggestions to users by learning user and item embeddings from their historical behavior data [Rendle et al.2009, He et al.2017]. By treating the user-item historical behavior as a bipartite graph with edges between users and items, CF can be naturally transformed into an edge prediction problem on the graph. Compared with the plain user-item interaction matrix, this graph representation exposes higher-order user and item correlations, and makes it possible to alleviate the data sparsity issue in CF through graph structure modeling [Wang et al.2019, Berg, Kipf, and Welling2018, Ying et al.2018]. Some earlier works applied personalized random walks [Liu and Yang2008] or relied on graph regularization models with auxiliary graph data (e.g., social networks) for recommendation [Gu, Zhou, and Ding2010, Huang, Chung, and Chen2004]. These models suffered from the high time complexity of personalized random walks, and most of them relied on a carefully designed random walk process.
Recently, many researchers have turned to applying GCNs for recommendation [Wu, Liu, and Yang2018, Wang et al.2019, Berg, Kipf, and Welling2018, Ying et al.2018]. For example, PinSage designed sampling techniques for graph convolution aggregation to alleviate the computational burden in the recommendation process [Ying et al.2018]. By feeding the user and item free embeddings as input, NGCF was specially designed for GCN based CF [Wang et al.2019]. NGCF iteratively propagates user and item embeddings in the graph to distill the collaborative signals with graph convolutions. These GCN based recommender models show better performance than traditional models.
Despite the relative success of GCN based recommendation, we argue that two important problems in GCN based CF remain unsolved. On one hand, to learn user and item embeddings, GCNs follow two steps: neighborhood aggregation with graph convolutional operations, and nonlinear transformations. While graph convolutional operations are effective for aggregating neighborhood information and modeling higher-order graph structure, is the additional complexity introduced by the nonlinear feature transformation in GCNs necessary? On the other hand, most current GCN based models can only stack very few layers (e.g., 2 layers). In fact, the graph convolution operation is a special kind of graph Laplacian smoothing [Li, Han, and Wu2018, Klicpera, Bojchevski, and Günnemann2019]. With the $k$-th layer of GCNs, the Laplacian smoothing incorporates neighbors of up to the $k$-th order. Therefore, an over smoothing effect arises with deep layers, as the higher-order neighbors tend to be indistinguishable for each node. With the limited user-item interaction records in recommendation [Wu et al.2017, Wu et al.2016], this problem becomes more severe, since the training records are very sparse. Intuitively, as more layers are stacked, the smoothing effect at first alleviates the data sparsity of CF, but the over smoothing introduced by additional layers neglects each user’s uniqueness and degrades the recommendation performance. How to better model the graph structure while avoiding the over smoothing effect remains largely open.
To tackle the above two issues, we revisit graph based CF models with a linear residual graph convolutional approach. Our main contributions lie in two aspects. First, we empirically analyze how CF differs from most graph based tasks, and show that removing the nonlinearity enhances the recommendation performance with less complexity, which is consistent with recent theories on simplifying GCNs [Wu et al.2019a]. Second, to alleviate the over smoothing problem in the iterative process, we propose to learn the residual user-item preference at each layer. Thus, the user uniqueness is preserved at the lower layers, while the higher layers of the GCNs can focus on learning users’ residual preferences that could not be captured from each user’s limited historical records. Please note that this idea is inspired by the ResNet architecture in CNNs [He et al.2016, Wu, Shen, and Van Den Hengel2019], and our work focuses on how to extend the formulation of the residual part to CF, with interaction prediction between users and items under GCNs. We then show that, with linear residual learning, our proposed model degenerates to a linear model that effectively leverages the user-item graph structure for recommendation. In summary, in contrast to current GCN based recommendation models, our proposed model is easier to train and scales to large datasets. Finally, we perform extensive experiments on two large real-world CF datasets, and the results clearly show the effectiveness and efficiency of our proposed model.
Preliminaries and Related Work
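Consider a graph $\mathcal{G} = \langle \mathcal{V}, \mathbf{A} \rangle$, where $\mathcal{V}$ is the set of nodes and $\mathbf{A}$ is the adjacency matrix, in which $a_{ij}$ denotes the edge between node $i$ and node $j$. If there is a directed edge from node $i$ to node $j$, then $a_{ij} = 1$; otherwise it is 0. For ease of notation, we use $A_i$ to denote the neighbor set of node $i$, i.e., the set of nodes that $i$ connects to. We use $\mathbf{S} = \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}$ to denote the normalized adjacency matrix with added self loops, where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix of the graph with added self-connections, $\mathbf{I}$ is the identity matrix, and $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$.
Graph Convolutional Networks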
For each node $i$, we use $\mathbf{h}_i^0$ to denote the initial node embedding, which is usually the feature vector $\mathbf{x}_i$ of node $i$ (i.e., $\mathbf{h}_i^0 = \mathbf{x}_i$). In a graph $\mathcal{G}$, the key idea of GCNs is to stack $K$ steps of recursive message passing, or feature propagation, to learn node embeddings [Berg, Kipf, and Welling2018, Hamilton, Ying, and Leskovec2017, Gilmer et al.2017]. Specifically, the embedding of each node $i$ at the $(k+1)$-th step is computed recursively with the following two steps: feature propagation and nonlinear feature transformation.
Feature propagation. For each node $i$, the feature propagation step aggregates the embeddings of its graph neighbors and its own embedding at the previous layer $k$. Earlier works focus on how to model the aggregation functions [Hamilton, Ying, and Leskovec2017, Gilmer et al.2017, Velickovic et al.2018, Kipf and Welling2017]. As the focus of this paper is not to design a more sophisticated feature aggregation function, we follow the widely used aggregation function proposed by Kipf and Welling [Kipf and Welling2017], which is empirically effective and has been adopted by many GCN variants [Kipf and Welling2017, Berg, Kipf, and Welling2018, Wu et al.2019a]:
$$\bar{\mathbf{h}}_i^{k+1} = \frac{1}{\tilde{d}_i} \mathbf{h}_i^k + \sum_{j \in A_i} \frac{1}{\sqrt{\tilde{d}_i \tilde{d}_j}} \mathbf{h}_j^k, \tag{1}$$
where $\tilde{d}_i$ is the degree of node $i$ counted with its added self loop. In matrix form, $\bar{\mathbf{H}}^{k+1} = \mathbf{S} \mathbf{H}^k$, with $\mathbf{S}$ the normalized adjacency matrix with added self loops.
In fact, given the features at the $k$-th layer, the output of the feature propagation step can be regarded as Laplacian smoothing on the features of the previous layer [Li, Han, and Wu2018, Zhu, Ghahramani, and Lafferty2003].
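The feature propagation step can be illustrated on a toy graph. The following is a minimal sketch, assuming an undirected graph stored as a dense 0/1 list of lists; the function names `normalized_adjacency` and `propagate` and the toy values are illustrative assumptions, not part of any referenced implementation.

```python
import math


def normalized_adjacency(adj):
    """Return S = D^{-1/2} (A + I) D^{-1/2} for a dense, symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self loops: A_tilde = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A_tilde (row sum); at least 1 because of the self loop.
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(s, h):
    """One feature propagation step in matrix form: H_bar = S H."""
    dim = len(h[0])
    return [[sum(s[i][k] * h[k][d] for k in range(len(h))) for d in range(dim)]
            for i in range(len(s))]


# Toy path graph 0 - 1 - 2 with one-dimensional features (assumed values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
s = normalized_adjacency(adj)
h_bar = propagate(s, [[1.0], [0.0], [0.0]])
```

After one step, the feature mass that started on node 0 has spread to its neighbor node 1 but not yet to node 2, which is the smoothing behavior described above.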
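Nonlinear transformation. The nonlinear transformation layer is a standard Multilayer Perceptron (MLP). Fed with the output $\bar{\mathbf{h}}_i^{k+1}$ of the feature propagation step, the nonlinear transformation produces the $(k+1)$-th layer embedding of each node $i$ as:
$$\mathbf{h}_i^{k+1} = \sigma(\bar{\mathbf{h}}_i^{k+1} \mathbf{W}^k), \tag{2}$$
where $\mathbf{W}^k$ is the transformation matrix of the layer and $\sigma(\cdot)$ is a nonlinear activation function.
After iteratively performing the two steps in each layer up to a defined depth $K$, the final embedding of each node $i$ is $\mathbf{h}_i^K$. For most GCN based applications, there is a prediction function:
$$\hat{\mathbf{y}}_i = \text{softmax}(\mathbf{h}_i^K). \tag{3}$$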
As GCNs derive inspiration primarily from CNNs in the deep learning community, they inherit considerable nonlinearity and complexity from the nonlinear transformations shown in Eq.(2). Researchers have therefore explored the possibility of simplifying GCNs. Recently, Simple Graph Convolution (SGC) was proposed [Wu et al.2019a], which removes the nonlinear transformation in Eq.(2):
$$\hat{\mathbf{Y}} = \text{softmax}(\mathbf{S} \cdots \mathbf{S} \mathbf{S} \mathbf{X} \mathbf{W}^0 \mathbf{W}^1 \cdots \mathbf{W}^{K-1}), \tag{4}$$
where $\mathbf{X}$ is the node feature matrix and $\mathbf{S}$ is applied $K$ times. We can rewrite the product $\mathbf{W}^0 \mathbf{W}^1 \cdots \mathbf{W}^{K-1}$ as a single matrix $\mathbf{W}$, and the above linear matrix multiplication turns to:
$$\hat{\mathbf{Y}} = \text{softmax}(\mathbf{S}^K \mathbf{X} \mathbf{W}). \tag{5}$$
With the formulation of SGC, GCNs reduce to iterative simple feature propagation with very few parameters. Therefore, SGC is easy to tune and scales to large datasets. As verified by researchers, SGC corresponds to a fixed low-pass filter on the graph spectral domain. Besides, the empirical evaluations show that SGC does not negatively impact accuracy in many graph based tasks, while greatly reducing computation time [Wu et al.2019a].
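The collapse of stacked linear layers into a single propagation matrix and a single weight matrix can be checked numerically. The sketch below uses hand-picked toy matrices (all values, and the helper name `matmul`, are assumptions for illustration) and compares the layer-by-layer computation with the collapsed form before the softmax is applied.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Toy inputs (assumed values): a 2-node normalized adjacency, features, two weights.
s = [[0.5, 0.5], [0.5, 0.5]]
x = [[1.0, 2.0], [3.0, 4.0]]
w0 = [[1.0, 0.0], [1.0, 1.0]]
w1 = [[2.0], [1.0]]

# Layer by layer without nonlinearity: S (S X W0) W1.
layered = matmul(matmul(s, matmul(matmul(s, x), w0)), w1)

# Collapsed form: S^2 X W with W = W0 W1.
collapsed = matmul(matmul(matmul(s, s), x), matmul(w0, w1))
```

Both computations give the same logits, so the powers of the propagation matrix can be computed once and only a single weight matrix needs to be learned.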
Graph Convolutional based Recommendation
In a recommender system, there are two sets of entities: a user set $U$ ($|U| = M$) and an item set $V$ ($|V| = N$). As implicit feedback is the most common form of feedback in many recommender systems, we focus on implicit feedback based CF in this paper [Rendle et al.2009]; it is easy to extend the proposed model to rating prediction in CF. Users’ feedback on items is recorded in a rating matrix $\mathbf{R} \in \mathbb{R}^{M \times N}$, where $r_{ui} = 1$ denotes that user $u$ likes item $i$, and $r_{ui} = 0$ otherwise. With the rating matrix, accurately learning the user embedding matrix and the item embedding matrix is key to recommendation performance. Earlier works focus on shallow matrix factorization based models [Koren, Bell, and Volinsky2009, Rendle et al.2009]. Deep learning based models, e.g., NeuMF [He et al.2017] and Wide&Deep [Cheng et al.2016], model the interaction between users and items with a deep neural network structure.
With the huge success of GCNs, researchers attempted to formulate recommendation as a problem on the user-item bipartite graph, and adapted GCNs for recommendation [Wang et al.2019, Monti, Bronstein, and Bresson2017, Ying et al.2018]. Earlier works on GCN based models relied on the spectral theories of graphs, and are computationally costly when applied to real-world recommendation [Monti, Bronstein, and Bresson2017, Zheng et al.2018]. Some recent works on GCN based recommendation models focused on the spatial domain [Wu, Liu, and Yang2018, Wang et al.2019, Berg, Kipf, and Welling2018, Ying et al.2018]. PinSage was designed for similar item recommendation under the content based model, with the item features and the item-item correlation graph as inputs [Ying et al.2018]. GCMC [Berg, Kipf, and Welling2018] and NGCF [Wang et al.2019] are specifically designed under the CF setting. Given the ratings of $M$ users to $N$ items, the user-item bipartite graph is denoted as $\mathcal{G} = \langle U \cup V, \mathbf{A} \rangle$, where $\mathbf{A}$ is constructed from the $M \times N$ rating matrix $\mathbf{R}$ as:
$$\mathbf{A} = \begin{bmatrix} \mathbf{0}^{M \times M} & \mathbf{R} \\ \mathbf{R}^{T} & \mathbf{0}^{N \times N} \end{bmatrix}. \tag{6}$$
Let $\mathbf{E} \in \mathbb{R}^{(M+N) \times D}$ denote the free embedding matrix of users and items. The free embedding matrix is fed into GCNs with the bipartite graph $\mathcal{G}$, i.e., $\mathbf{H}^0 = \mathbf{E}$. Then, GCNs iteratively perform the embedding propagation step in Eq.(1) and the nonlinear transformation in Eq.(2), so that each user’s (item’s) embedding is updated in the iterative process. Therefore, the final embedding $\mathbf{H}^K$ explicitly injects collective connections between users and items of up to the $K$-th order. All the parameters (including the initial free embedding matrix $\mathbf{E}$ and the transformation parameters $[\mathbf{W}^k]_{k=0}^{K-1}$) can be learned in an end-to-end manner. GCMC can be seen as a special case of NGCF with $K = 1$, i.e., only the first-order connectivity of the user-item bipartite graph is modeled [Berg, Kipf, and Welling2018].
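As an illustration of how the bipartite adjacency matrix can be assembled from a rating matrix, the sketch below places users first and items second in the node ordering. The function name `bipartite_adjacency`, the dense list-of-lists representation, and the toy rating matrix are assumptions made for this example; a practical implementation would use a sparse format.

```python
def bipartite_adjacency(r):
    """Build the (M+N) x (M+N) user-item adjacency from an M x N 0/1 rating matrix R.

    Nodes 0..M-1 are users and nodes M..M+N-1 are items (an assumed ordering).
    """
    m, n = len(r), len(r[0])
    size = m + n
    a = [[0] * size for _ in range(size)]
    for u in range(m):
        for i in range(n):
            if r[u][i]:
                a[u][m + i] = 1  # upper-right block: R
                a[m + i][u] = 1  # lower-left block: R transposed
    return a


# Toy rating matrix with 2 users and 3 items (assumed values).
r = [[1, 0, 1],
     [0, 1, 0]]
a = bipartite_adjacency(r)
```

The user-user and item-item blocks stay zero, so every edge connects a user to an item, which is what makes the graph bipartite.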
Deep Network Architecture Design
Theoretically, deep neural networks can approximate complex functions [Goodfellow, Bengio, and Courville2016]. However, many researchers have found that stacking deeper layers in a network usually does not correspondingly increase performance in practice. For example, in the computer vision domain, directly stacking more layers in CNNs complicates the model training process, which leads to degradation of the image classification performance. Many CNN variants have therefore been proposed to address how to stack more layers to improve image classification performance [He et al.2016, Huang et al.2017]. Researchers argued that the degradation of deeper CNNs is not caused by overfitting, but by a harder training process, with higher training error than relatively shallower models. Therefore, a deep residual learning framework, i.e., ResNet, was proposed to reformulate the layers as learning residual functions, which is easier to train than directly learning the original functions [He et al.2016]. In CF based recommender systems, simply relying on deep neural networks also does not perform well, due to the sparseness of user behavior data. Therefore, many deep learning based CF models have two parts: a shallow wide part and a deep neural network part, such as NeuMF [He et al.2017] and Wide&Deep [Cheng et al.2016].
The deep architecture design problem also exists in GCN variants. For example, many GCN based models achieve the best performance with a layer depth of 2 [Hamilton, Ying, and Leskovec2017, Wu et al.2019b]. As the local network structure varies from node to node, researchers proposed to aggregate all layer representations at the last layer [Xu et al.2018], or allowed the root node to teleport to the later layers [Klicpera, Bojchevski, and Günnemann2019]. To overcome the limitations of GCN models with limited labeled data, co-training and self-training approaches were proposed to supplement the sparse labeled data when training GCNs [Li, Han, and Wu2018]. We differ from these works in two aspects. First, our model is based on a GCN with a linear structure, in contrast to these nonlinear GCNs. Moreover, our proposed architecture is concerned with how to better preserve the previous layer information with a residual network structure.
Linear Residual Graph Convolutional Collaborative Filtering
Overall Structure of the Proposed Model
In this part, we propose Linear Residual Graph Convolutional Collaborative Filtering (LRGCCF) which is a general GCN based CF model for recommendation. The overall architecture of LRGCCF is shown in Figure 1. LRGCCF advances current GCN based models with two characteristics: (1) At each layer of the feature propagation step, we use a simple linear embedding propagation without any nonlinear transformations. (2) For predicting users’ preferences of items, we propose a residual based network structure to overcome the limitations of previous works.
Linear Embedding Propagation
Given the user-item bipartite graph as formulated in Eq.(6), with $M$ users and $N$ items, let $\mathbf{E} \in \mathbb{R}^{(M+N) \times D}$ denote the free embeddings of users and items, where the first $M$ rows of the matrix, i.e., $\mathbf{E}_{[1:M]}$, form the user embedding submatrix, and the remaining $N$ rows, $\mathbf{E}_{[M+1:M+N]}$, form the item embedding submatrix. Then, LRGCCF takes the embedding matrix as input:
$$\mathbf{E}^0 = \mathbf{E}, \tag{7}$$
which resembles the embedding based models in CF. Notably, different from GCN based tasks with node features as fixed input data, the embedding matrix $\mathbf{E}$ is unknown and needs to be trained in LRGCCF.
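Following the theoretical elegance of the graph spectral connections and the competitive empirical results of SGC [Wu et al.2019a], at each iteration step $k+1$, we assume the embedding matrix $\mathbf{E}^{k+1}$ is a linear aggregation of the embedding matrix $\mathbf{E}^k$ at the previous layer:
$$\mathbf{E}^{k+1} = \mathbf{S} \mathbf{E}^k \mathbf{W}^k, \tag{8}$$
where $\mathbf{S} = \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}$ denotes the normalized adjacency matrix with added self loops, and $\mathbf{W}^k$ is the linear transformation.
In fact, the matrix form in Eq.(8) is equivalent to modeling each user $u$’s and each item $i$’s updated embedding as:
$$\mathbf{e}_u^{k+1} = \left[ \frac{1}{d_u} \mathbf{e}_u^k + \sum_{j \in R_u} \frac{1}{\sqrt{d_u d_j}} \mathbf{e}_j^k \right] \mathbf{W}^k, \tag{9}$$
$$\mathbf{e}_i^{k+1} = \left[ \frac{1}{d_i} \mathbf{e}_i^k + \sum_{u \in R_i} \frac{1}{\sqrt{d_i d_u}} \mathbf{e}_u^k \right] \mathbf{W}^k, \tag{10}$$
where $d_i$ ($d_u$) is the diagonal degree of item $i$ (user $u$) in the user-item bipartite graph $\mathcal{G}$, counted with the added self loop so that it matches the corresponding diagonal entry of $\tilde{\mathbf{D}}$, and $R_u$ ($R_i$) is the set of neighbors of node $u$ ($i$) in graph $\mathcal{G}$.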
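The node-wise update for users and items can be sketched directly from a rating matrix. This is a minimal illustration, assuming the linear transformation is the identity and that degrees include the added self loop; the function name `propagate_user_item` and the toy values are hypothetical.

```python
import math


def propagate_user_item(r, user_emb, item_emb):
    """One linear propagation step on an M x N 0/1 rating matrix R, with W^k = I (assumed)."""
    m, n, dim = len(r), len(r[0]), len(user_emb[0])
    # Degrees including the self loop, i.e. the diagonal entries of D_tilde.
    d_user = [1 + sum(r[u]) for u in range(m)]
    d_item = [1 + sum(r[u][i] for u in range(m)) for i in range(n)]

    new_user = []
    for u in range(m):
        vec = [user_emb[u][k] / d_user[u] for k in range(dim)]  # self term
        for i in range(n):
            if r[u][i]:
                w = 1.0 / math.sqrt(d_user[u] * d_item[i])
                vec = [vec[k] + w * item_emb[i][k] for k in range(dim)]
        new_user.append(vec)

    new_item = []
    for i in range(n):
        vec = [item_emb[i][k] / d_item[i] for k in range(dim)]  # self term
        for u in range(m):
            if r[u][i]:
                w = 1.0 / math.sqrt(d_user[u] * d_item[i])
                vec = [vec[k] + w * user_emb[u][k] for k in range(dim)]
        new_item.append(vec)
    return new_user, new_item


# Toy data (assumed values): 2 users, 2 items, one-dimensional embeddings.
r = [[1, 1], [0, 1]]
users, items = propagate_user_item(r, [[1.0], [2.0]], [[3.0], [4.0]])
```

Calling the function repeatedly on its own output yields the embeddings of deeper layers; since no nonlinearity is applied, the result after k calls depends linearly on the initial embeddings.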
Residual Preference Prediction
With a predefined depth $K$, the recursive linear embedding propagation stops at the $K$-th layer and outputs the embedding matrix $\mathbf{E}^K$. For each user $u$ (item $i$), $\mathbf{e}_u^K$ ($\mathbf{e}_i^K$) captures the bipartite graph similarity of up to the $K$-th order. Then, many embedding based recommendation models would predict the preference $\hat{r}_{ui}$ as the inner product between the user and item latent vectors:
$$\hat{r}_{ui} = \langle \mathbf{e}_u^K, \mathbf{e}_i^K \rangle, \tag{11}$$
where $\langle \cdot, \cdot \rangle$ denotes the vector inner product operation.
In practice, most GCN based variants, as well as GCN based recommendation models, achieve the best performance with $K = 2$ [Kipf and Welling2017, Hamilton, Ying, and Leskovec2017, Ying et al.2018]. The overall trend for these GCN variants is that the performance increases as $K$ increases from 0 to 1 (2), and drops quickly as $K$ continues to increase. We speculate that a possible reason is that, at the $k$-th layer, the embedding of each node is smoothed by its neighbors of up to the $k$-th order in the bipartite graph. Therefore, as $k$ increases from 0 to $K$, the node embeddings at deeper layers tend to be over smoothed, i.e., they become more similar and carry less distinctive information. This problem not only exists in GCNs, but is much more severe in CF, where only very sparse user behavior data is available for model learning. To validate the above assumption, we show the performance of GCN based recommendation on the user-item bipartite graph using the prediction function in Eq.(11) with different depths $K$. When $K = 0$, the GCN based recommendation model degenerates to BPR [Rendle et al.2009]. To empirically examine the over smoothing hypothesis, for each value of $K$, we calculate the average pairwise user-user (item-item) embedding similarity with cosine similarity at the $K$-th layer output. Specifically, for each pair of users $a$ and $b$, their similarity is calculated as $\text{sim}(a, b) = \frac{\langle \mathbf{e}_a^K, \mathbf{e}_b^K \rangle}{\|\mathbf{e}_a^K\| \, \|\mathbf{e}_b^K\|}$. Then, we plot the mean and variance of the cosine similarity of all pairs in Figure 2, with the recommendation performance listed at the bottom. We have two observations from this figure. First, the variance of the user (item) embedding similarities becomes smaller as $K$ increases, due to the smoothness of up to the $K$-th order imposed by neighborhood regularization. Second, when $K = 0$, the recommendation performance is already rather good. As we increase $K$ from 0 to 2, the performance increases by less than 10%. Therefore, we empirically conclude that BPR ($K = 0$) can already approximate user preferences to a large extent.
Based on the above two observations, we argue that, instead of directly approximating the user preference of each user-item pair at each layer, we perform residual preference learning as:
$$\hat{r}_{ui}^{k+1} = \hat{r}_{ui}^{k} + \langle \mathbf{e}_u^{k+1}, \mathbf{e}_i^{k+1} \rangle. \tag{12}$$
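We hypothesize that it is easier to optimize the residual rating than to optimize the original rating, and that residual learning helps to alleviate the over smoothing effect with deeper layers.
Based on the residual preference prediction in Eq.(12), and starting from $\hat{r}_{ui}^{0} = \langle \mathbf{e}_u^{0}, \mathbf{e}_i^{0} \rangle$, we have:
$$\hat{r}_{ui}^{K} = \hat{r}_{ui}^{K-1} + \langle \mathbf{e}_u^{K}, \mathbf{e}_i^{K} \rangle = \cdots = \sum_{k=0}^{K} \langle \mathbf{e}_u^{k}, \mathbf{e}_i^{k} \rangle = \langle \mathbf{e}_u^{0} \| \mathbf{e}_u^{1} \| \cdots \| \mathbf{e}_u^{K}, \; \mathbf{e}_i^{0} \| \mathbf{e}_i^{1} \| \cdots \| \mathbf{e}_i^{K} \rangle, \tag{13}$$
where $\|$ denotes vector concatenation. The above equation is equivalent to concatenating the embeddings of all layers to form the final embedding of each node. This is quite reasonable, as each node’s subgraph varies, and recording each layer’s representation to form the final embedding of each node is more informative.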
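The equivalence between accumulating residual scores layer by layer and taking one inner product of concatenated embeddings can be verified on a small example. The sketch below uses made-up per-layer embeddings for one user and one item; the function names are illustrative only.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def residual_score(user_layers, item_layers):
    """Accumulate r^{k+1} = r^k + <e_u^{k+1}, e_i^{k+1}>, starting from r^0 = <e_u^0, e_i^0>."""
    score = 0.0
    for e_u, e_i in zip(user_layers, item_layers):
        score += dot(e_u, e_i)
    return score


def concat_score(user_layers, item_layers):
    """Inner product of the concatenation of all layers' embeddings."""
    u = [x for layer in user_layers for x in layer]
    i = [x for layer in item_layers for x in layer]
    return dot(u, i)


# Embeddings of layers 0, 1, 2 for one user and one item (toy values).
user_layers = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]
item_layers = [[3.0, 1.0], [4.0, 2.0], [1.0, 1.0]]
```

Both functions return the same score on this input, because the inner product of two concatenated vectors is the sum of the inner products of their aligned segments.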
Model Learning
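By putting the linear embedding propagation equation (Eq.(8)) into the vector representation of the residual prediction function (Eq.(13)), we have:
$$\hat{r}_{ui} = \langle [\mathbf{S}^0 \mathbf{E}^0 \mathbf{Y}^0]_u \| [\mathbf{S}^1 \mathbf{E}^0 \mathbf{Y}^1]_u \| \cdots \| [\mathbf{S}^K \mathbf{E}^0 \mathbf{Y}^K]_u, \; [\mathbf{S}^0 \mathbf{E}^0 \mathbf{Y}^0]_i \| [\mathbf{S}^1 \mathbf{E}^0 \mathbf{Y}^1]_i \| \cdots \| [\mathbf{S}^K \mathbf{E}^0 \mathbf{Y}^K]_i \rangle, \tag{14}$$
where $\mathbf{W}^0 \mathbf{W}^1 \cdots \mathbf{W}^{k-1}$ is reparameterized as $\mathbf{Y}^k$ with linear multiplication (with $\mathbf{Y}^0 = \mathbf{I}$), $\mathbf{S}^k$ denotes the $k$-th power of $\mathbf{S}$, and $[\cdot]_u$ ($[\cdot]_i$) denotes the row of user $u$ (item $i$).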
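Since we focus on implicit feedback, we adopt the pairwise ranking based loss function in BPR:
$$\min_{\Theta} \mathcal{L}(\mathbf{R}, \hat{\mathbf{R}}) = \sum_{u} \sum_{(i,j) \in D_u} -\ln\big(s(\hat{r}_{ui} - \hat{r}_{uj})\big) + \lambda \|\Theta_1\|_F^2, \tag{15}$$
where $s(x)$ is a sigmoid function. $\Theta = [\Theta_1, \Theta_2]$, with $\Theta_1 = [\mathbf{E}^0]$, and $\Theta_2 = [[\mathbf{Y}^k]_{k=1}^{K}]$. $\lambda$ is a regularization parameter that controls the complexity of the user and item free embedding matrices. $D_u = \{(i, j) \mid i \in R_u \wedge j \notin R_u\}$ denotes the pairwise training data for user $u$, where $R_u$ represents the item set on which $u$ shows positive feedback.
Model Discussion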
Detailed Analysis of the Proposed Model. Based on the prediction function in Eq.(14), we observe that LRGCCF is not a deep neural network but a wide linear model. The linearization has several advantages. First, as LRGCCF is built on the recent progress of SGC [Wu et al.2019a], it is theoretically connected to a low-pass filter on the graph spectral domain. Second, with the linear embedding propagation and residual preference learning, LRGCCF is much easier to train than nonlinear GCN based models. Last but not least, as our model does not have any hidden layers, in contrast to deep learning based models, we do not need back propagation training algorithms. Instead, we can resort to stochastic gradient descent for model learning. Therefore, LRGCCF is much more time efficient than classical GCN based models.
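The pairwise ranking objective of Eq.(15) that this training procedure minimizes can be sketched in a few lines. The following is an illustrative, framework-free version, assuming predicted scores are already available in a dictionary keyed by (user, item) and that only the free embeddings are regularized; the function name `bpr_loss` and its argument layout are hypothetical, and the code is not numerically hardened for extreme score differences.

```python
import math


def bpr_loss(triples, scores, embeddings, lam):
    """Sum of -ln(sigmoid(r_ui - r_uj)) over (u, i, j) triples plus lam * ||E^0||_F^2.

    triples: iterable of (user, positive_item, negative_item)
    scores: dict mapping (user, item) to a predicted preference (assumed precomputed)
    embeddings: free embedding matrix E^0 as a list of rows
    """
    loss = 0.0
    for u, i, j in triples:
        diff = scores[(u, i)] - scores[(u, j)]
        # -ln(sigmoid(x)) equals ln(1 + exp(-x)).
        loss += math.log1p(math.exp(-diff))
    reg = sum(x * x for row in embeddings for x in row)
    return loss + lam * reg


# One training triple with equal scores (toy values).
value = bpr_loss([(0, 1, 2)], {(0, 1): 1.0, (0, 2): 1.0}, [[1.0, 2.0]], lam=0.1)
```

When the positive and negative items receive the same score, the ranking term equals ln 2; it shrinks toward zero as the positive item is scored increasingly higher than the negative one.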
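Table 1: Key characteristics of GCN based recommendation models.
Model  Graph Structure: First order  Graph Structure: Higher order  Model Property: Linear Propagation  Model Property: Residual Prediction
GCMC  ✓  ×  ×  ×
PinSage  ✓  ✓  ×  ×
NGCF  ✓  ✓  ×  ✓
LRGCCF  ✓  ✓  ✓  ✓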
Connections with Previous Works. We compare the key characteristics of our proposed model with three closely related GCN based recommendation models, GCMC [Berg, Kipf, and Welling2018], PinSage [Ying et al.2018], and NGCF [Wang et al.2019], in Table 1. NGCF is one of the first few attempts that also uses a residual prediction function, by taking each user’s (item’s) embedding as a concatenation of all layers’ embeddings. However, the authors simply use this “trick” without any detailed explanation. We empirically show why taking only the output of the last layer embedding fails for CF, and show that using residual prediction is equivalent to concatenating all layers’ embeddings as the final embedding of each node in the user-item bipartite graph. PinSage has lower time complexity than its deep learning based counterparts (e.g., GCMC and NGCF), as it designs a sampling technique for the feature aggregation process.
Experiments
Experimental Setup
Datasets. We conduct experiments on two publicly available datasets: Amazon Books (http://jmcauley.ucsd.edu/data/amazon/index.html) and Gowalla [Liang et al.2016]. We summarize the statistics of the two datasets in Table 2. In the data preprocessing step, we remove users (items) that have fewer than 10 interaction records. After that, we randomly select 80% of the records for training, 10% for validation, and the remaining 10% for testing.
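The preprocessing and splitting procedure can be sketched as follows. This is only one possible reading of the description: it assumes a single filtering pass (rather than repeated filtering until every remaining user and item has at least 10 records) and a fixed random seed, neither of which is specified in the text; the function name `filter_and_split` is hypothetical.

```python
import random
from collections import Counter


def filter_and_split(records, min_count=10, seed=0):
    """Drop sparse users/items in one pass, then split the records 80/10/10.

    records: list of (user, item) interaction pairs
    """
    user_cnt = Counter(u for u, _ in records)
    item_cnt = Counter(i for _, i in records)
    kept = [(u, i) for u, i in records
            if user_cnt[u] >= min_count and item_cnt[i] >= min_count]
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    rng.shuffle(kept)
    n_train = len(kept) * 8 // 10
    n_val = len(kept) // 10
    train = kept[:n_train]
    val = kept[n_train:n_train + n_val]
    test = kept[n_train + n_val:]
    return train, val, test


# Toy data: 10 users x 10 items fully observed, plus one user with a single record.
records = [(u, i) for u in range(10) for i in range(10)] + [(99, 0)]
train, val, test = filter_and_split(records)
```

Integer arithmetic is used for the split sizes so that the three parts always partition the kept records without rounding surprises.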
Table 2: Statistics of the two datasets.
Dataset  Users  Items  Ratings  Rating Density
Amazon Books  52,643  91,599  2,984,108  0.062% 
Gowalla  29,859  40,981  1,027,370  0.084% 
Evaluation Metrics and Baselines. Since we focus on recommending items to users, we use two widely adopted ranking metrics for top-N recommendation evaluation: HR@N and NDCG@N [Chen et al.2017]. For each user, we select all unrated items as negative items and combine them with the positive items the user likes in the ranking process. We compare our proposed LRGCCF model with various state-of-the-art baselines, including the classical model BPR [Rendle et al.2009] and three graph convolutional based recommendation models: GCMC [van den Berg, Kipf, and Welling2017], PinSage [Ying et al.2018], and NGCF [Wang et al.2019]. NGCF differs from PinSage in that it adopts the residual learning process. Besides, to better verify the effectiveness of the linear and the residual learning parts, we design two variants of GCMC: Linear-GCMC (LGCMC) and Residual-GCMC (RGCMC), where L denotes replacing the original nonlinear transformation with linear embedding propagation, and R denotes adding the residual preference prediction. As illustrated in Table 1, NGCF already adopts residual preference learning, and replacing its nonlinear embedding propagation with linear propagation (i.e., LNGCF) yields exactly LRGCCF, so we do not design variants of NGCF. For our proposed LRGCCF model, we design a simplified version, Linear-GCCF (LGCCF), in which the residual learning process is removed.
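For concreteness, the two ranking metrics can be computed per user as in the sketch below. The exact definitions used in the experiments are not spelled out in the text, so this follows one common convention as an assumption: HR@N is the fraction of a user's held-out positive items that appear in the top-N list, and NDCG@N uses binary relevance with a log2 position discount. The function names and toy data are illustrative.

```python
import math


def hr_at_n(ranked, positives, n):
    """Fraction of a user's held-out positives that appear in the top-n list (assumed definition)."""
    hits = sum(1 for item in ranked[:n] if item in positives)
    return hits / len(positives)


def ndcg_at_n(ranked, positives, n):
    """Binary-relevance NDCG: DCG of the top-n list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:n]) if item in positives)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(n, len(positives))))
    return dcg / ideal


# Items sorted by predicted score for one user, and that user's held-out positives (toy data).
ranked = ["i3", "i7", "i1", "i9"]
positives = {"i7", "i9", "i4"}
hr3 = hr_at_n(ranked, positives, 3)
ndcg3 = ndcg_at_n(ranked, positives, 3)
```

Dataset-level numbers are then obtained by averaging the per-user values over all test users.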
Parameter Settings. We implement our LRGCCF model in PyTorch. There are two important parameters in our proposed model: the dimension $D$ of the user and item embedding matrix $\mathbf{E}$, and the regularization parameter $\lambda$ in the objective function (Eq.(15)). The embedding size is fixed to 64 for all models. For our proposed LRGCCF model, we tune the regularization parameter $\lambda$ over a range of candidate values and keep the value that reaches the best performance. We initialize the model parameters with a Gaussian distribution of mean 0 and standard deviation 0.01. There are several parameters in the baselines; for fair comparison, all the parameters in the baselines are also tuned to achieve the best performance. For our proposed model, we empirically find that setting the transformation matrix $\mathbf{W}^k$ to the identity matrix, i.e., not learning its parameters but fixing them directly, reaches the best performance.
Overall Comparison
Table 3: HR@N and NDCG@N comparisons on Amazon Books.
Models  HR@10  NDCG@10  HR@20  NDCG@20  HR@30  NDCG@30  HR@40  NDCG@40  HR@50  NDCG@50
BPR  0.01851  0.01710  0.02853  0.02169  0.03821  0.02564  0.04737  0.02911  0.05556  0.03205 
GCMC  0.02063  0.01898  0.03196  0.02408  0.04242  0.02835  0.05226  0.03206  0.06133  0.03532 
PinSage  0.02043  0.01872  0.03210  0.02404  0.04298  0.02844  0.05239  0.03199  0.06165  0.03529 
NGCF  0.02071  0.01892  0.03244  0.02425  0.04343  0.02872  0.05329  0.03243  0.06263  0.03576 
LGCMC  0.02092  0.01916  0.03248  0.02443  0.04355  0.02894  0.05394  0.03286  0.06335  0.03623 
RGCMC  0.01962  0.01796  0.03084  0.02307  0.04153  0.02742  0.05139  0.03115  0.06032  0.03434 
LGCCF  0.02067  0.01909  0.03200  0.02424  0.04312  0.02876  0.05310  0.03254  0.06218  0.03579 
LRGCCF  0.02209  0.02040  0.03407  0.02583  0.04532  0.03039  0.05532  0.03416  0.06498  0.03761 
Table 4: HR@N and NDCG@N comparisons on Gowalla.
Models  HR@10  NDCG@10  HR@20  NDCG@20  HR@30  NDCG@30  HR@40  NDCG@40  HR@50  NDCG@50
BPR  0.1041  0.1011  0.1378  0.1126  0.1664  0.1221  0.1908  0.1299  0.2122  0.1365 
GCMC  0.1042  0.1010  0.1388  0.1127  0.1701  0.1222  0.1969  0.1307  0.2213  0.1381 
PinSage  0.1057  0.1042  0.1390  0.1153  0.1682  0.1250  0.1935  0.1330  0.2146  0.1395 
NGCF  0.1083  0.1094  0.1403  0.1197  0.1679  0.1288  0.1931  0.1368  0.2142  0.1432 
LGCMC  0.1045  0.1010  0.1399  0.1132  0.1701  0.1234  0.1957  0.1316  0.2184  0.1386 
RGCMC  0.1034  0.1000  0.1391  0.1123  0.1690  0.1224  0.1941  0.1305  0.2163  0.1373 
LGCCF  0.1044  0.1007  0.1412  0.1135  0.1721  0.1240  0.1977  0.1322  0.2196  0.1390 
LRGCCF  0.1148  0.1136  0.1518  0.1259  0.1836  0.1365  0.2113  0.1453  0.2355  0.1527 
Table 3 and Table 4 report the overall performance comparison results on HR@N and NDCG@N. GCMC, PinSage, and NGCF improve over BPR by leveraging the user-item bipartite graph information. In particular, GCMC and PinSage show the effectiveness of modeling the information passing of a graph. NGCF is the baseline that captures higher-order user-item bipartite graph structure, and it performs better than most baselines. Our proposed LRGCCF model consistently outperforms NGCF, thus showing the effectiveness of modeling the user preference with the residual preference prediction and the linear embedding propagation.
In our proposed LRGCCF, the linear embedding propagation and the residual preference learning are the essential parts. To assess the effectiveness of these parts, we study the performance of the variants of the baselines and of our simplified model LGCCF. We first analyze the linear embedding propagation by comparing the linear embedding based models with their counterparts that use nonlinear embeddings, i.e., LGCMC vs. GCMC. We find that LGCMC outperforms GCMC on most metrics, and similar trends exist when comparing LRGCCF and NGCF, empirically showing the effectiveness of linear embedding propagation over nonlinear embedding propagation for GCN based recommendation. Next, we analyze residual learning by comparing RGCMC vs. GCMC, NGCF vs. PinSage, and LRGCCF vs. LGCCF. RGCMC does not reach the performance of GCMC. We guess a possible reason is that GCMC is based on first-order neighborhood aggregation: in the first-order neighborhood, each node has a limited number of neighbors, and the over smoothing effect does not apply. With deep layers, the over smoothing effect becomes more severe. Thus, NGCF outperforms PinSage, and LRGCCF outperforms LGCCF, when modeling higher-order graph structure with residual learning. Last but not least, by combining the linear propagation and the residual learning in LRGCCF, the proposed model outperforms all the remaining models, showing the effectiveness of fusing these two parts for CF.
Unlike the baselines, which apply a nonlinear transformation after feature propagation, our model uses a linear propagation, which also accelerates the training process. In practice, we find that LRGCCF is very easy to train. On the Amazon Books dataset, with the best depth for each graph based recommendation model, the average runtime of each iteration is about 30s for GCMC ($K$=1), 38s for PinSage ($K$=2) and NGCF ($K$=2), and about 20s for our proposed LRGCCF ($K$=4) on an Ubuntu server with a single GTX 1080Ti. Even with a larger propagation depth $K$, LRGCCF costs less time thanks to the linear embedding propagation. The runtime on the Gowalla dataset for each model is about one third of that on Amazon Books, and the overall trend of the time comparison is similar to that analyzed above.
Table 5: HR@20 and NDCG@20 of LRGCCF with different propagation depths K.
Depth K  Amazon Books HR@20  Amazon Books NDCG@20  Gowalla HR@20  Gowalla NDCG@20
K=0  0.0285  0.0217  0.1378  0.1126 
K=1  0.0317  0.0242  0.1504  0.1246 
K=2  0.0327  0.0248  0.1506  0.1248 
K=3  0.0337  0.0255  0.1518  0.1259 
K=4  0.0341  0.0258  0.1494  0.1241 
K=5  0.0340  0.0257  0.1504  0.1247 
Detailed Model Analysis
We analyze the influence of the recursive propagation depth $K$, and provide a detailed analysis of the learned embeddings of the residual preference prediction in LRGCCF.
Table 5 shows the results of LRGCCF with different $K$ values. In particular, the layer-wise propagation part disappears when $K$=0, i.e., our proposed model degenerates to BPR. As can be observed from Table 5, when $K$ increases from 0 to 1, the performance increases quickly on both datasets. For Amazon Books, the best performance is reached with a propagation depth of four. Meanwhile, our model reaches the best performance at $K$=3 on Gowalla.
To better show the effect of residual preference prediction, we use the simplified version of our proposed model that only removes the residual structure, which we call LGCCF. For LGCCF and LRGCCF, with each predefined depth $K$, we calculate the cosine similarity of each pair of users (items) between their $K$-th layer output embeddings, i.e., the $K$-th layer embedding of each node of the graph. The statistics of the mean and variance of the user-user (item-item) embedding similarities are shown in Figure 3. The figure shows that our proposed model has a larger variance of user-user cosine similarity than its counterpart LGCCF, which does not perform residual learning. This empirically validates that residual learning can partially alleviate the over smoothing issue and achieve better performance. Please note that the overall trend on the Gowalla dataset is similar, and we do not show it due to the page limit.
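The similarity statistic used in this analysis can be computed as in the following sketch, which takes a list of embedding vectors and returns the mean and variance of the cosine similarity over all distinct pairs. The use of the population variance and the function names are assumptions; for the dataset sizes reported here, an exhaustive pairwise loop would need to be replaced by sampling or a vectorized implementation.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_stats(embeddings):
    """Mean and population variance of cosine similarity over all distinct pairs."""
    sims = [cosine(a, b) for a, b in combinations(embeddings, 2)]
    mean = sum(sims) / len(sims)
    var = sum((s - mean) ** 2 for s in sims) / len(sims)
    return mean, var


# Three toy user embeddings: two identical, one orthogonal to both.
mean, var = similarity_stats([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

A variance close to zero together with a mean close to one would indicate that the embeddings have collapsed toward each other, which is the over smoothing symptom discussed above.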
Conclusions
In this paper, we revisited current GCN based recommendation models, and proposed the LRGCCF model for CF based recommendation. LRGCCF is mainly composed of two parts. First, building on the recent progress of simple GCNs, we removed the nonlinear transformations in GCNs and replaced them with linear embedding propagation. Second, to reduce the over smoothing effect introduced by higher layers of graph convolutions, we designed a residual preference prediction part with a residual preference learning process at each layer. Extensive experimental results clearly showed the effectiveness and efficiency of our proposed model. In the future, we would like to explore how to better integrate the representations of different layers with well-defined deep neural architectures to further enhance CF based recommendation.
Acknowledgments
This work was supported in part by grants from the National Key Research and Development Program of China (2018YFB0804205), the National Natural Science Foundation of China (Grant No. 61725203, 61972125, 61602147, 61932009, 61732008, 61722204), and Zhejiang Lab (No.2019KE0AB04).
References
 [Berg, Kipf, and Welling2018] Berg, R. v. d.; Kipf, T. N.; and Welling, M. 2018. Graph convolutional matrix completion. In KDD Workshop.
 [CampsValls, Marsheva, and Zhou2007] CampsValls, G.; Marsheva, T. V. B.; and Zhou, D. 2007. Semisupervised graphbased hyperspectral image classification. TGRS 45(10):3044–3054.
 [Chen et al.2017] Chen, J.; Zhang, H.; He, X.; Nie, L.; Liu, W.; and Chua, T.S. 2017. Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention. In SIGIR, 335–344.
 [Cheng et al.2016] Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. 2016. Wide & deep learning for recommender systems. In Recsys workshop, 7–10.
 [Gilmer et al.2017] Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural message passing for quantum chemistry. In ICML, 1263–1272.
 [Goodfellow, Bengio, and Courville2016] Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep learning. MIT press.
 [Gu, Zhou, and Ding2010] Gu, Q.; Zhou, J.; and Ding, C. 2010. Collaborative filtering: Weighted nonnegative matrix factorization incorporating user and item graphs. In SDM, 199–210.
 [Hamilton, Ying, and Leskovec2017] Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In NIPS, 1024–1034.
 [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
 [He et al.2017] He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; and Chua, T.S. 2017. Neural collaborative filtering. In WWW, 173–182.
 [Huang et al.2017] Huang, G.; Liu, Z.; Van Der Maaten, L.; and Weinberger, K. Q. 2017. Densely connected convolutional networks. In CVPR, 4700–4708.
 [Huang, Chung, and Chen2004] Huang, Z.; Chung, W.; and Chen, H. 2004. A graph model for ecommerce recommender systems. JASIST 55(3):259–274.
 [Kipf and Welling2017] Kipf, T. N., and Welling, M. 2017. Semisupervised classification with graph convolutional networks. In ICLR.
 [Klicpera, Bojchevski, and Günnemann2019] Klicpera, J.; Bojchevski, A.; and Günnemann, S. 2019. Predict then propagate: Graph neural networks meet personalized pagerank. ICLR.
 [Koren, Bell, and Volinsky2009] Koren, Y.; Bell, R.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. Computer 42(8):30–37.
 [Li et al.2016] Li, D.; Hung, W.C.; Huang, J.B.; Wang, S.; Ahuja, N.; and Yang, M.H. 2016. Unsupervised visual representation learning by graphbased consistent constraints. In ECCV, 678–694.
 [Li, Han, and Wu2018] Li, Q.; Han, Z.; and Wu, X.M. 2018. Deeper insights into graph convolutional networks for semisupervised learning. In AAAI.
 [Liang et al.2016] Liang, D.; Charlin, L.; McInerney, J.; and Blei, D. M. 2016. Modeling user exposure in recommendation. In WWW, 951–961.
 [Liu and Yang2008] Liu, N. N., and Yang, Q. 2008. Eigenrank: a rankingoriented approach to collaborative filtering. In SIGIR, 83–90.
 [Monti, Bronstein, and Bresson2017] Monti, F.; Bronstein, M.; and Bresson, X. 2017. Geometric matrix completion with recurrent multigraph neural networks. In NIPS, 3697–3707.
 [Rendle et al.2009] Rendle, S.; Freudenthaler, C.; Gantner, Z.; and SchmidtThieme, L. 2009. Bpr: Bayesian personalized ranking from implicit feedback. In UAI, 452–461.
 [van den Berg, Kipf, and Welling2017] van den Berg, R.; Kipf, T. N.; and Welling, M. 2017. Graph convolutional matrix completion. KDD.
 [Velickovic et al.2018] Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2018. Graph attention networks. In ICLR.
 [Wang et al.2019] Wang, X.; He, X.; Wang, M.; Feng, F.; and Chua, T.S. 2019. Neural graph collaborative filtering. In SIGIR.
 [Wu et al.2016] Wu, L.; Liu, Q.; Chen, E.; Yuan, N. J.; Guo, G.; and Xie, X. 2016. Relevance meets coverage: A unified framework to generate diversified recommendations. TIST 7(3):39.
 [Wu et al.2017] Wu, L.; Ge, Y.; Liu, Q.; Chen, E.; Hong, R.; Du, J.; and Wang, M. 2017. Modeling the evolution of users’ preferences and social links in social networking services. TKDE 29(6):1240–1253.
 [Wu et al.2019a] Wu, F.; Zhang, T.; Souza Jr, A. H. d.; Fifty, C.; Yu, T.; and Weinberger, K. Q. 2019a. Simplifying graph convolutional networks. In ICML, 6861–6871.
 [Wu et al.2019b] Wu, L.; Sun, P.; Fu, Y.; Hong, R.; Wang, X.; and Wang, M. 2019b. A neural influence diffusion model for social recommendation. In SIGIR, 235–244.
 [Wu, Liu, and Yang2018] Wu, Y.; Liu, H.; and Yang, Y. 2018. Graph convolutional matrix completion for bipartite edge prediction. In KDIR, 51–60.
 [Wu, Shen, and Van Den Hengel2019] Wu, Z.; Shen, C.; and Van Den Hengel, A. 2019. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition 90:119–133.
 [Xu et al.2018] Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.i.; and Jegelka, S. 2018. Representation learning on graphs with jumping knowledge networks. In ICML.
 [Xu et al.2019a] Xu, F.; Lian, J.; Han, Z.; Li, Y.; Xu, Y.; and Xie, X. 2019a. Relationaware graph convolutional networks for agentinitiated social ecommerce recommendation. In CIKM, 529–538.
 [Xu et al.2019b] Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019b. How powerful are graph neural networks? In ICLR.
 [Ying et al.2018] Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W. L.; and Leskovec, J. 2018. Graph convolutional neural networks for webscale recommender systems. In SIGKDD, 974–983.
 [Zhao et al.2019] Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; and Li, H. 2019. Tgcn: A temporal graph convolutional network for traffic prediction. TITS.
 [Zheng et al.2018] Zheng, L.; Lu, C.T.; Jiang, F.; Zhang, J.; and Yu, P. S. 2018. Spectral collaborative filtering. In RecSys, 311–319.
 [Zhu, Ghahramani, and Lafferty2003] Zhu, X.; Ghahramani, Z.; and Lafferty, J. D. 2003. Semisupervised learning using gaussian fields and harmonic functions. In ICML, 912–919.