1 Introduction
Graph Representation Learning (GRL) is a promising direction for machine learning on graphs [6, 13, 2, 28]. A central idea in GRL is to represent each node in the graph as a vector of floating-point numbers that captures some desired properties of the node with respect to the graph. For node2vec [6], this property is the neighborhood of the node; for GraphSage [7], it is the feature neighborhood of a node based on a selected neighbor set; for the Graph Convolutional Network (GCN) [13], the purpose of a node embedding is to capture both feature and neighborhood similarity. These methods have been profoundly useful in several domains such as bio-inspired machine learning and genomics [5, 15, 30], spam detection [19, 20, 21, 31, 32, 18] and recommendation systems [27, 33, 9, 24, 34].
In recommendation systems, GRL has been applied to advance collaborative filtering algorithms by considering multi-hop relationships between users and items [27]. The authors in [27] further proposed the notions of message dropout and node dropout to reduce overfitting in GCN-like methods. In a follow-up study [9], it was demonstrated that simplifying the GCN network by removing non-linearity can boost the performance of these higher-order methods. Their work corresponds with a similar study [29], where the authors argued that for GCN, even after removing non-linearity and collapsing the weight matrices into a single one, performance does not degrade in downstream tasks. The research in the above papers compares several state-of-the-art methods to the proposed methods and shows that simpler models lead to higher performance, credited to better generalization. Motivated by these studies, we benchmark a simple SVD-based approach on the recommendation systems problem to understand whether further simplifying the modelling approach can improve the performance metrics. In the proposed method, we first generate user and item embeddings using the SVD of the adjacency matrix of the user-item interaction graph, and then employ a two-layer neural network with these embeddings as inputs to estimate the relevance of an item to a user.
Given the success of multi-hop graph neural network models in previous studies, we also augment the simple SVD method to consider a two-hop adjacency matrix for generating the embeddings, and we find that this variant outperforms the simpler one-hop SVD method as well. Empirical results on three public datasets demonstrate that the performance of the proposed methods is comparable to state-of-the-art approaches, and that these methods beat many of them despite their simplicity. For two out of three datasets, the methods outperform all compared approaches and effectively establish new state-of-the-art performance, with a margin of improvement of as much as 10%.
The rest of the paper is divided into three sections: Section 2 describes the proposed methods, Section 3 contains the empirical experiments, and Section 4 provides the conclusion and future work.
2 Proposed Methods
In this section, we elaborate on the proposed methods. We first discuss the SVD-based baseline, followed by an extension that uses two-hop matrices. In the last part, we describe the loss function and model training. Before discussing the proposed methods, we list our notation in Table 1.

Table 1: Notation

Symbol    | Definition
R         | Adjacency matrix between users and items
A         | Symmetric adjacency matrix between users and items
\hat{A}   | Laplacian normalization of the matrix A
D         | Degree matrix of A
u         | User
i         | Item
e_u       | Embedding of user u generated from SVD of \hat{A}
e_i       | Embedding of item i generated from SVD of \hat{A}
e_u^(k)   | Embedding of user u output by the k-th layer of the perceptron model
e_i^(k)   | Embedding of item i output by the k-th layer of the perceptron model
e_u^*     | Concatenation of e_u^(j), j = 0, 1, 2, with e_u^(0) = e_u
e_i^*     | Concatenation of e_i^(j), j = 0, 1, 2, with e_i^(0) = e_i
2.1 Simple SVD Baseline
Matrix factorization is a well-studied problem in linear algebra and has been extensively applied to recommendation systems [26, 16, 4, 3], typically in the form of collaborative filtering. In this paper, we propose a simple approach that generates user and item embeddings using the Singular Value Decomposition (SVD) [14] of the adjacency matrix between users and items. Using a two-layer perceptron model, we then transform these embeddings in a supervised fashion to learn the relevance between user and item pairs. We call this method the Simple SVD Baseline (SSB).
To compute the SVD embeddings, we consider the adjacency matrix of the user-item interaction graph, R. We first convert the asymmetric matrix R to a symmetric matrix A as follows:

$A = \begin{pmatrix} 0 & R \\ R^T & 0 \end{pmatrix}$
We then compute the Laplacian normalization of A as discussed in [13]: $\hat{A} = D^{-1/2} A D^{-1/2}$, where \hat{A} is the Laplacian-normalized adjacency matrix and D is the degree matrix derived from A.
We perform matrix factorization on this normalized matrix \hat{A} using Truncated SVD to generate user embeddings (e_u) and item embeddings (e_i), where the number of components in the Truncated SVD corresponds to the embedding dimension. We use Truncated SVD [8] since it has been shown to scale to large matrices.
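As an illustration, the embedding step above can be sketched as follows. This is a minimal sketch using SciPy; the function and variable names are ours, and it assumes users and items are stacked into a single node set when forming the symmetric adjacency matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def ssb_embeddings(R, dim=8):
    """Build user/item embeddings from the Truncated SVD of the
    Laplacian-normalized symmetric adjacency matrix (sketch)."""
    n_users, n_items = R.shape
    # Symmetric adjacency A: stack users and items into one node set.
    A = sp.bmat([[None, R], [R.T, None]], format="csr")
    # Laplacian normalization D^{-1/2} A D^{-1/2}.
    deg = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    A_hat = sp.diags(d_inv_sqrt) @ A @ sp.diags(d_inv_sqrt)
    # Truncated SVD; the component count is the embedding dimension.
    u, s, _ = svds(A_hat, k=dim)
    emb = u * s  # scale left singular vectors by singular values
    return emb[:n_users], emb[n_users:]
```

The split `emb[:n_users]` / `emb[n_users:]` recovers per-user and per-item rows because the symmetric matrix orders users before items.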
After generating these embeddings, we transform them through a two-layer perceptron model with a non-linear activation, and concatenate the outputs of both perceptron layers with the original SVD embedding to generate the final user embedding (e_u^*) or item embedding (e_i^*). The intuition behind the perceptron model is to allow a supervised transformation of e_u and e_i that learns the relevance between users and items. Fig. 1 shows the model architecture. The dot product between e_u^* and e_i^* acts as the relevance score for the user-item pair and is optimized by tuning the weights of the two-layer perceptron model via backpropagation.

2.2 Two-Hop SVD Approach
Motivated by the success of multi-hop graph neural networks and the performance of the SSB approach on recommendation tasks, we attempt to combine both into a single method that captures higher-order relationships between users and items, similar to graph neural networks such as GCN [13].
The overall model architecture remains the same as in SSB, except for how the e_u and e_i embeddings are computed. To compute an embedding that captures two-hop signals, we compute the second power of the Laplacian-normalized adjacency matrix, \hat{A}^2, and then compute its Truncated SVD. We finally concatenate the embeddings from the SVD of \hat{A} (corresponding to the one-hop neighborhood) and the SVD of \hat{A}^2 (corresponding to the two-hop neighborhood) to generate the e_u and e_i embeddings for this approach. The embedding size of TSA is the size of the vector after this concatenation. Since this approach incorporates two-hop signals from the graph, we denote it the Two-Hop SVD Approach (TSA).
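Putting the pieces together, a sketch of the TSA embedding construction and the scoring head shared with SSB is shown below. All names are ours; we use ReLU as a stand-in activation (the paper's choice is not restated here), and random weights stand in for the trained two-layer perceptron:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def truncated_svd_emb(M, dim):
    # Truncated SVD embedding: left singular vectors scaled by singular values.
    u, s, _ = svds(M, k=dim)
    return u * s

def tsa_embeddings(A_hat, n_users, dim=4):
    """Concatenate SVD embeddings of A_hat (one-hop) and A_hat^2 (two-hop);
    the TSA embedding size is the size after this concatenation."""
    one_hop = truncated_svd_emb(A_hat, dim)
    two_hop = truncated_svd_emb(A_hat @ A_hat, dim)
    emb = np.hstack([one_hop, two_hop])
    return emb[:n_users], emb[n_users:]

def relevance_score(e_u, e_i, Wu, Wi):
    """Scoring head shared by SSB and TSA: each embedding passes through a
    two-layer perceptron; the input and both layer outputs are concatenated,
    and the dot product of the concatenated vectors is the relevance score."""
    relu = lambda x: np.maximum(x, 0.0)
    h1u, h1i = relu(e_u @ Wu[0]), relu(e_i @ Wi[0])
    h2u, h2i = relu(h1u @ Wu[1]), relu(h1i @ Wi[1])
    zu = np.concatenate([e_u, h1u, h2u])
    zi = np.concatenate([e_i, h1i, h2i])
    return float(zu @ zi)
```

In practice the hidden width would be 512 per layer as in the hyperparameter setup; small shapes are used here only to keep the sketch cheap to run.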
2.3 Model Training
The only learnable parameters in the proposed methods are the weights of the multi-layer perceptron model. To optimize user-item relevance, we employ the Bayesian Personalized Ranking (BPR) loss [23], similar to [27]. It is a pairwise loss that encourages higher predicted relevance on observed instances than on unobserved ones. We use the Adam optimizer [12] in a mini-batch setting, where the batch size is a hyperparameter.
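A minimal sketch of the BPR objective (names are ours): for each user, an observed (positive) item is paired with a sampled unobserved (negative) item, and the loss pushes the positive score above the negative one:

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """BPR pairwise loss: -mean(log sigmoid(s_pos - s_neg)).
    Lower loss means positives are ranked above sampled negatives."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # log(sigmoid(x)) computed stably as -logaddexp(0, -x).
    return float(np.mean(np.logaddexp(0.0, -diff)))
```

The `logaddexp` form avoids overflow for large score gaps, which plain `np.log(1/(1+np.exp(-x)))` would not.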
3 Experiments
3.1 Datasets and Performance Metrics
We use the same three datasets (Gowalla, Yelp2018 and AmazonBook) as [27, 9], with the same train and test split, in order to make a fair comparison with the already reported results. Table 2 summarizes the dataset statistics; we refer the reader to [27] for more details. We evaluate performance using mean NDCG@K and mean Recall@K per user for K=20, the same metrics used in previous studies [27, 9]. For the rest of the paper, we denote Recall@20 as Recall and NDCG@20 as NDCG. Note that for both Recall and NDCG, the top-20 items retrieved are drawn solely from the test partition of the dataset.
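For reference, per-user Recall@K and NDCG@K with binary relevance can be computed as follows (a sketch consistent with the description above; names are ours):

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, relevant, k=20):
    """Per-user Recall@K and NDCG@K for a ranked list of test items
    and a set of relevant (held-out) items, with binary relevance."""
    topk = list(ranked_items)[:k]
    hits = [1.0 if item in relevant else 0.0 for item in topk]
    recall = sum(hits) / max(len(relevant), 1)
    # DCG with log2 discount; IDCG assumes all relevant items ranked first.
    dcg = sum(h / np.log2(rank + 2.0) for rank, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(rank + 2.0) for rank in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg
```

Dataset-level metrics are then the means of these per-user values over all test users.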
3.2 Hyperparameter Tuning
For the proposed methods, there are four key hyperparameters: the SVD embedding size, the batch size, the learning rate, and the size of the multi-layer perceptron. For this study, we fix the size of each layer of the multi-layer perceptron to 512 neurons and keep the learning rate fixed, motivated by the experiments in [9]. We tune the SVD embedding size over a fixed set of candidate values. We keep the batch size at 1024 for Gowalla and Yelp2018, and 2048 for AmazonBook, as used in the studies of NGCF [27] and LightGCN [9].

Table 2: Dataset statistics

Dataset     | #Users  | #Items  | #Interactions | Density
Gowalla     | 29,858  | 40,981  | 1,027,370     | 0.00084
Yelp2018    | 31,688  | 38,048  | 1,561,406     | 0.00130
AmazonBook  | 52,643  | 91,599  | 2,984,108     | 0.00062
3.3 Empirical Results
3.3.1 Comparison with state-of-the-art methods
In this section, we report the performance metrics for the proposed methods, the Simple SVD Baseline (SSB) and the Two-Hop SVD Approach (TSA). We benchmark them against NGCF [27], Mult-VAE [17], GRMF [22], LightGCN [9], MF [23], and NeuMF [10]. Although MF [23] and NeuMF [10] are relatively old methods, we report their performance here because they are closely related to matrix factorization in the context of recommendation systems. LightGCN [9] is the state-of-the-art method, showing the best performance among all related approaches in its paper. Table 3 shows the performance metrics for the proposed and compared methods. We replicate the results of the compared approaches from the original papers of NGCF [27] and LightGCN [9], and we follow the same experimental methodology as stated in those papers, and in the code and datasets made available by their authors, to make a fair comparison.
Table 3: Performance comparison (Recall@20 and NDCG@20)

            | Gowalla          | Yelp2018         | AmazonBook
Method      | Recall | NDCG    | Recall | NDCG    | Recall | NDCG
MF          | 0.1291 | 0.1109  | 0.0433 | 0.0354  | 0.0250 | 0.0196
NeuMF       | 0.1399 | 0.1212  | 0.0451 | 0.0363  | 0.0258 | 0.0200
NGCF        | 0.157  | 0.1327  | 0.0579 | 0.0477  | 0.0344 | 0.0263
Mult-VAE    | 0.1641 | 0.1335  | 0.0584 | 0.0450  | 0.0407 | 0.0315
GRMF        | 0.1477 | 0.1205  | 0.0571 | 0.0462  | 0.0354 | 0.0270
LightGCN    | 0.183  | 0.1554  | 0.0649 | 0.0530  | 0.0411 | 0.0315
SSB (ours)  | 0.169  | 0.1401  | 0.0647 | 0.0534  | 0.0408 | 0.0325
TSA (ours)  | 0.1704 | 0.1415  | 0.0657 | 0.0542  | 0.0456 | 0.0364
We observe that, on the AmazonBook dataset, TSA performs considerably better than all the compared state-of-the-art methods, including LightGCN and NGCF. The relative gain of TSA over LightGCN, the best of the compared methods, is approximately 9.86% in Recall@20 and 13.4% in NDCG@20. SSB also outperforms all considered approaches in terms of NDCG and, in terms of Recall, is only 0.7% short of LightGCN while still beating all other compared approaches.
For the Yelp2018 dataset, TSA performs marginally better than LightGCN [9] in terms of both Recall (about 1% relative) and NDCG (about 2% relative). In contrast, SSB performs marginally worse than LightGCN [9] in terms of Recall but has a slightly higher NDCG. However, both SSB and TSA perform significantly better than all other compared approaches, including NGCF [27], which uses multi-hop relations among users to exploit higher-order signals for predicting relevance.
For the Gowalla dataset, the proposed methods underperform LightGCN: there is a 1.3% absolute difference in Recall and a 1.5% absolute difference in NDCG. However, despite being fairly simple, the proposed approaches outperform all other approaches, including inherently more complex ones such as Mult-VAE and NGCF.
We believe the generalization power of the proposed methods led to the improved performance on the Yelp2018 and AmazonBook datasets. In the case of Gowalla, however, the methods appear to be underfitting, because the signals of SSB and TSA are limited to one-hop and two-hop neighbors respectively. As shown in [9], increasing the number of GCN layers to four (equivalent to four hops of neighborhood) increases performance. We leave this line of experimentation to future work.
3.3.2 Comparison of SSB and TSA
In this section, we compare the SSB and TSA approaches. In Table 3, we observe that TSA always performs better than SSB, given the additional signal from two-hop neighbors. The relative improvement is small on Gowalla and Yelp2018 and more significant on the AmazonBook dataset. This is in line with the studies of LightGCN [9] and NGCF [27], which showed that performance increases as the number of GCN layers (equivalently, the number of neighborhood hops covered) increases. We attribute the improvement of TSA to the addition of two-hop signals that are not present in SSB. Interestingly, for the Yelp2018 and AmazonBook datasets, the two-hop TSA is able to outperform the four-hop approaches of LightGCN and NGCF.
Table 4: Training metrics for SSB and TSA

                 | SSB      | TSA
Training Loss    | 0.04477  | 0.03847
Training Recall  | 0.14736  | 0.16262
Training NDCG    | 0.25875  | 0.28243
Training Time    | 21.83s   | 503.04s
Regarding training loss, we observe that SSB converges to a higher training loss than TSA. While TSA performs better than SSB, it does so at the expense of the additional time needed to compute the two-hop adjacency matrix and the additional Truncated SVD on the two-hop matrix. Table 4 summarizes the training metrics for SSB and TSA; test performance is shown in Table 3. We ran our experiments on an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz with 6 cores and 110 GB of RAM.
3.3.3 Impact of Embedding Size on the Performance of TSA
Of the four hyperparameters discussed, we only optimize the SVD-based embedding size for users and items, which becomes the input to the multi-layer perceptron. In this section, we describe how the performance and training loss change with the embedding dimension for TSA, which performs better than SSB across datasets. We vary the embedding dimension over a fixed set of values. Fig. 2(a) shows the training loss for different embedding sizes for the TSA approach. As expected, as the embedding size increases, the loss decreases faster and converges to a lower value. We observe a similar trend in the test performance metrics: Fig. 2(b) shows test Recall@20 for different embedding sizes, and higher embedding sizes lead to better performance.
4 Conclusions and Future Work
In this paper, we set out to benchmark SVD-based methods against state-of-the-art GRL methods. We proposed two such approaches, and experiments on three real-world open datasets demonstrate that these methods are powerful enough to beat many GRL methods and even establish new state-of-the-art results on two out of three datasets. We observed a maximum relative gain of over 10% against the state-of-the-art methods.
This work raises many research questions, and we envision the following future work. We plan to investigate how to generalize the approach from two hops to n hops, since earlier research suggests that a higher order of neighborhood could yield better performance. There is also a need for an inductive version of these methods, since the transductive versions proposed in this paper do not work with new nodes or new edges in the graph and would require frequent retraining in their current form. We also plan to investigate better implementations of SVD in big-data frameworks such as Spark [1, 25], which would help us further understand the training time of the proposed methods. We further want to explore how matrix factorization or SVD can be integrated with GRL to improve empirical performance, and whether it is possible to extract and merge the strengths of both families of methods. We also plan to work on data profiling for the proposed methods and the existing literature, to understand why an approach performs well on some datasets but not equally well on others. On the empirical front, we intend to benchmark these approaches on the Open Graph Benchmark [11] and to test them on tasks beyond recommendation systems, such as graph-based formulations in NLP, social network modelling, and graph applications in biology.
Through our work, we want to highlight that matrix factorization based methods still serve as important baselines and should not be ignored in empirical benchmarking as GRL and recommendation systems continue to advance.
References
[1] (2020) Two-step classification with SVD preprocessing of distributed massive datasets in Apache Spark. Algorithms 13 (3), pp. 71.
[2] (2017) Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263.
[3] (2012) A hybrid online-product recommendation system: combining implicit rating-based collaborative filtering and sequential pattern analysis. Electronic Commerce Research and Applications 11 (4), pp. 309–317.
[4] (2008) Feature weighting in content based recommendation system using social network analysis. In Proceedings of the 17th International Conference on World Wide Web, pp. 1041–1042.
[5] (2021) Predicting anti-cancer hyperfoods with graph convolutional networks. Human Genomics 15 (1), pp. 1–12.
[6] (2016) node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
[7] (2017) Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1025–1035.
[8] (1990) Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank. SIAM Journal on Scientific and Statistical Computing 11 (3), pp. 503–518.
[9] (2020) LightGCN: simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 639–648.
[10] (2017) Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pp. 173–182.
[11] (2020) Open Graph Benchmark: datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687.
[12] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[13] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
[14] (1980) The singular value decomposition: its computation and some applications. IEEE Transactions on Automatic Control 25 (2), pp. 164–176.
[15] (2021) Network machine learning maps phytochemically rich "hyperfoods" to fight COVID-19. Human Genomics 15 (1), pp. 1–11.
[16] (2019) An improved collaborative filtering recommendation algorithm and recommendation strategy. Mobile Information Systems 2019.
[17] (2018) Variational autoencoders for collaborative filtering. In Proceedings of the 2018 World Wide Web Conference, pp. 689–698.
[18] (2019) Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4823–4832.
[19] (2018) Heterogeneous graph neural networks for malicious account detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2077–2085.
[20] (2020) Category-aware graph neural networks for improving e-commerce review helpfulness prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2693–2700.
[21] (2021) HeteGCN: heterogeneous graph convolutional networks for text classification. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 860–868.
[22] (2015) Collaborative filtering with graph information: consistency and scalable methods. In Advances in Neural Information Processing Systems, pp. 2107–2115.
[23] (2012) BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618.
[24] Multi-graph convolutional neural networks for representation learning in recommendation.
[25] (2015) A Spark-based big data platform for massive remote sensing data processing. In International Conference on Data Science, pp. 120–126.
[26] (1998) Clustering methods for collaborative filtering. In AAAI Workshop on Recommendation Systems, Vol. 1, pp. 114–129.
[27] (2019) Neural graph collaborative filtering. In SIGIR.
[28] (2019) Heterogeneous graph attention network. In The World Wide Web Conference, pp. 2022–2032.
[29] (2019) Simplifying graph convolutional networks. In International Conference on Machine Learning, pp. 6861–6871.
[30] (2021) BridgeDPI: a novel graph neural network for predicting drug–protein interactions. arXiv preprint arXiv:2101.12547.
[31] (2019) Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 7370–7377.
[32] (2020) Every document owns its structure: inductive text classification via graph neural networks. arXiv preprint arXiv:2004.13826.
[33] (2017) Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 635–644.
[34] (2019) T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems 21 (9), pp. 3848–3858.