Revisiting SVD to generate powerful Node Embeddings for Recommendation Systems

10/05/2021
by Amar Budhiraja, et al.

Graph Representation Learning (GRL) is an emerging and promising area in recommendation systems. In this paper, we revisit the Singular Value Decomposition (SVD) of the adjacency matrix for generating user and item embeddings, and use a two-layer neural network on top of these embeddings to learn the relevance between user-item pairs. Inspired by the success of higher-order learning in GRL, we further propose an extension of this method that includes two-hop neighbors through the second power of the adjacency matrix, and demonstrate improved performance compared with the simple SVD method that uses only one-hop neighbors. Empirical validation on three publicly available recommendation-system datasets demonstrates that the proposed methods, despite being simple, beat many state-of-the-art methods, and for two of the three datasets they beat all of them by a margin of up to 10%. Through this work, we aim to shed light on the effectiveness of matrix factorization approaches, specifically SVD, in the deep learning era, and to show that these methods still contribute as important baselines in recommendation systems.



1 Introduction

Graph Representation Learning (GRL) presents a very promising direction for machine learning on graphs [6, 13, 2, 28]. A central idea in GRL is to represent each node in the graph as a vector of floating-point numbers that captures some desired properties of the node with respect to the graph. For node2vec [6], this property is the neighborhood of the node; for GraphSage [7], it is the feature neighborhood of a node based on a selected neighbor set; for the Graph Convolutional Network (GCN) [13], the purpose of a node embedding is to capture both feature and neighborhood similarity. These methods have been profoundly useful in several domains such as bio-inspired machine learning and genomics [5, 15, 30], spam detection [19, 20], natural language processing [21, 31, 32, 18], and recommendation systems [27, 33, 9, 24, 34].

In recommendation systems, GRL has been applied to further advance collaborative filtering algorithms by considering multi-hop relationships between users and items [27]. The authors of [27] further proposed the notions of message dropout and node dropout to reduce overfitting in GCN-like methods. A follow-up study [9] demonstrated that simplifying the GCN network by removing non-linearities can boost the performance of these higher-order methods. Their work also corresponds with a similar study [29], where the authors argued that for GCN, even after removing non-linearity and collapsing the weight matrices into a single one, performance does not degrade on downstream tasks. The research carried out in the above papers compares several state-of-the-art methods to the proposed methods and shows that simpler models lead to higher performance, credited to better generalization. Motivated by these studies, we set out to benchmark a simple SVD-based approach on the recommendation systems problem to understand whether further simplifying the modelling approach can improve the performance metrics. In the proposed method, we first generate user and item embeddings using the SVD of the adjacency matrix of the user-item interaction graph, and then employ a two-layer neural network with these embeddings as inputs to estimate the relevance of an item to a user.

Given the success of multi-hop graph neural network models in previous studies, we augment the simple SVD method to consider a two-hop adjacency matrix for generating the embeddings, and find that this method outperforms the simpler one-hop SVD method as well. Empirical results on three public datasets demonstrate that the performance of the proposed methods is indeed comparable to state-of-the-art approaches, and these methods beat many of them despite their simplicity. For two out of three datasets, the methods even outperform all compared approaches and effectively establish new state-of-the-art performance, with a margin of improvement of as much as 10%.

The rest of the paper is divided into three sections: Section 2 describes the proposed methods, Section 3 contains the empirical experiments, and Section 4 provides the conclusion and future work.

2 Proposed Methods

In this section, we elaborate on the proposed methods. We first discuss the SVD-based baseline, followed by an extension of it using two-hop matrices. In the last part, we describe the loss function and model training. Before discussing the proposed methods, we list our notation in Table 1.

Symbol      Definition
R           Adjacency matrix between users and items
A           Symmetric adjacency matrix built from R
Â           Laplacian normalization of A
D           Degree matrix of A
u           A user
i           An item
s_u         Embedding of user u generated from the SVD of Â
s_i         Embedding of item i generated from the SVD of Â
e_u^(j)     Embedding of user u output by the j-th layer of the perceptron model
e_i^(j)     Embedding of item i output by the j-th layer of the perceptron model
e_u         Concatenation of s_u, e_u^(1) and e_u^(2)
e_i         Concatenation of s_i, e_i^(1) and e_i^(2)
Table 1: Symbols and their meanings as used in the paper.

2.1 Simple SVD Baseline

Matrix factorization is a well-studied problem in linear algebra and has been extensively applied to recommendation systems [26, 16, 4, 3], typically in the form of collaborative filtering. In this paper, we propose a simple approach to generate user and item embeddings using the Singular Value Decomposition (SVD) [14] of the adjacency matrix between users and items. Using a two-layer perceptron model, we transform these embeddings in a supervised fashion to learn the relevance between user-item pairs. We call this method the Simple SVD Baseline (SSB).

To compute the SVD embeddings, we consider the adjacency matrix of the user-item interaction graph, R. We first convert the asymmetric matrix R into a symmetric adjacency matrix A as follows:

A = \begin{pmatrix} 0 & R \\ R^\top & 0 \end{pmatrix}

Figure 1: Model architecture for SSB. s_u and s_i are the user embedding and item embedding, respectively, generated from the Truncated SVD [8] of Â. e^(1) and e^(2) are the embedding outputs from the first and second layers of the MLP, respectively. Finally, s_u, e_u^(1) and e_u^(2) are concatenated to form the user embedding e_u. The item embedding e_i is constructed in the same way as e_u. The dot product between e_u and e_i is used as the score for the user-item pair, and it is optimized through backpropagation using the pairwise BPR loss, similar to previous studies [27, 9].

We then compute a Laplacian normalization of A as discussed in [13]: Â = D^{-1/2} A D^{-1/2}, where Â is the Laplacian-normalized adjacency matrix and D is the degree matrix derived from A.

We perform matrix factorization on this normalized matrix Â using Truncated SVD to generate user embeddings (s_u) and item embeddings (s_i), where the number of components in the Truncated SVD corresponds to the embedding dimension. We use Truncated SVD [8] since it has been shown to scale to large matrices.
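As a concrete illustration, a minimal sketch of this embedding pipeline is given below, assuming the interaction matrix R is available as a SciPy sparse matrix and using scikit-learn's TruncatedSVD; the helper name, the default dimension, and the block construction of A are our assumptions for illustration, not the authors' code.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

def ssb_svd_embeddings(R, dim=256, seed=0):
    """Sketch of SSB embedding generation (illustrative, not the authors' code).

    R   : sparse user-item interaction matrix of shape (n_users, n_items).
    dim : number of SVD components = embedding dimension.
    Returns (user_embeddings, item_embeddings).
    """
    n_users, n_items = R.shape

    # Symmetric adjacency of the bipartite user-item graph (assumed construction):
    # A = [[0, R], [R^T, 0]]
    A = sp.bmat([[None, R], [R.T, None]], format="csr")

    # Symmetric Laplacian normalization, A_hat = D^{-1/2} A D^{-1/2}, as in [13].
    deg = np.asarray(A.sum(axis=1)).ravel().astype(np.float64)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    A_hat = sp.diags(d_inv_sqrt) @ A @ sp.diags(d_inv_sqrt)

    # Truncated SVD of the normalized adjacency; each row is a node embedding.
    svd = TruncatedSVD(n_components=dim, random_state=seed)
    node_emb = svd.fit_transform(A_hat)

    # The first n_users rows correspond to users, the remaining rows to items.
    return node_emb[:n_users], node_emb[n_users:]
```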

After generating these embeddings, we transform them through a two-layer perceptron model (with a non-linear activation function) and concatenate the outputs of both layers of the perceptron model along with the original SVD embedding to generate a user embedding (e_u) or an item embedding (e_i). The intuition behind using the perceptron model is to allow a supervised transformation of s_u and s_i to learn the relevance between user and item. Fig. 1 shows the model architecture. The dot product between e_u and e_i acts as the relevance score for the user-item pair and is optimized by the model through tuning the weights of the two-layer perceptron model via back-propagation.

2.2 Two-Hop SVD Approach

Motivated by the success of multi-hop graph neural networks and the performance of the SSB approach on recommendation tasks, we attempt to join both of these into a single method that captures higher-order relationships between users and items, similar to graph neural networks such as GCN [13].

The overall model architecture remains the same as SSB, except for how the s_u and s_i embeddings are computed. To compute an embedding that captures two-hop signals, we compute the second power of the Laplacian-normalized adjacency matrix, Â², and then compute its Truncated SVD. We finally concatenate the embeddings from the SVD of Â (corresponding to the one-hop neighborhood) and the SVD of Â² (corresponding to the two-hop neighborhood) to generate the s_u and s_i embeddings for this approach. Since this approach incorporates two-hop signals from the graph, we denote it as the Two-Hop SVD Approach (TSA). The embedding size of TSA is the size of the vector after this concatenation.
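A sketch of the TSA embedding construction under the same assumptions is given below, reusing the normalized adjacency Â built above; the per-hop dimension and the function name are ours.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def tsa_svd_embeddings(A_hat, n_users, dim_per_hop=128, seed=0):
    """Sketch of TSA: concatenate SVD embeddings of A_hat (one-hop) and A_hat^2 (two-hop).

    A_hat : Laplacian-normalized symmetric adjacency matrix, as built for SSB.
    """
    A_hat2 = A_hat @ A_hat  # second power captures two-hop neighbors (typically denser)

    parts = []
    for M in (A_hat, A_hat2):
        svd = TruncatedSVD(n_components=dim_per_hop, random_state=seed)
        parts.append(svd.fit_transform(M))

    # Final TSA embedding size is the concatenated size, 2 * dim_per_hop.
    node_emb = np.concatenate(parts, axis=1)
    return node_emb[:n_users], node_emb[n_users:]
```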

2.3 Model Training

The learnable parameters in the proposed methods are only the weights of the multi-layer perceptron model. To optimize the user-item relevance, we employ the Bayesian Personalized Ranking (BPR) loss [23], similar to [27]. It is a pair-wise loss that encourages higher predicted scores on observed instances than on unobserved instances. We use the Adam optimizer [12] in a mini-batch setting, where the batch size is a hyperparameter.
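A minimal sketch of one BPR mini-batch step is given below; the negative-sampling convention and the learning rate shown are assumptions consistent with [23, 27], not values taken from this paper.

```python
import torch
import torch.nn.functional as F

def bpr_step(model, optimizer, users, pos_items, neg_items):
    """One mini-batch update with the pairwise BPR loss (sketch).

    users, pos_items, neg_items : LongTensors of equal length; each row pairs an
    observed (user, item) interaction with a sampled unobserved item for that user.
    """
    pos_scores = model(users, pos_items)
    neg_scores = model(users, neg_items)
    # BPR pushes observed items to score higher than unobserved ones.
    loss = -F.logsigmoid(pos_scores - neg_scores).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch (learning rate is illustrative):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```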

3 Experiments

3.1 Datasets and Performance Metrics

We use the same three datasets (Gowalla, Yelp2018 and Amazon-Book) as in [27, 9], with the same train and test split, in order to make a fair comparison with the already reported results. Table 2 summarizes the dataset statistics; we refer the reader to [27] for more details of the datasets. We evaluate performance using mean NDCG@K and mean Recall@K per user for K=20. We keep K=20 to enable a fair comparison with previous studies that use the same metrics [27, 9]. For the rest of the paper, we denote Recall@20 as Recall and NDCG@20 as NDCG. It should be noted that for both Recall and NDCG, the relevant items for the top-20 ranking are drawn solely from the test partition of the dataset.
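For reference, per-user Recall@K and NDCG@K can be computed as in the sketch below; it reflects the common convention of ranking unseen items against the test partition, and is our reading rather than the authors' evaluation code.

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, test_items, k=20):
    """Per-user Recall@K and NDCG@K (sketch).

    ranked_items : item ids sorted by predicted score, training items already removed.
    test_items   : set of held-out relevant items for this user.
    """
    top_k = ranked_items[:k]
    hits = np.array([item in test_items for item in top_k], dtype=float)

    recall = hits.sum() / max(len(test_items), 1)

    # DCG of the predicted ranking divided by the ideal DCG of a perfect ranking.
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = float((hits * discounts).sum())
    idcg = float(discounts[: min(len(test_items), k)].sum())
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return recall, ndcg
```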

3.2 Hyperparameter Tuning

For the proposed methods, there are four key hyperparameters: the SVD embedding size (the dimension of s_u and s_i), the batch size, the learning rate, and the size of the multi-layer perceptron. For this study, we fix the size of the multi-layer perceptron to 512 neurons per layer and set the learning rate following the experiments in [9]. We tune the SVD embedding size over a set of candidate values. We keep the batch size at 1024 for Gowalla and Yelp2018, and 2048 for Amazon-Book, as used in the studies of NGCF [27] and LightGCN [9].

Dataset #Users #Items #Interactions Density
Gowalla 29,858 40,981 1,027,370 0.00084
Yelp2018 31,688 38,048 1,561,406 0.00130
Amazon-Book 52,643 91,599 2,984,108 0.00062
Table 2: Dataset Statistics

3.3 Empirical Results

3.3.1 Comparison with state-of-the-art methods

In this section, we report the performance metrics for the proposed methods, the Simple SVD Baseline (SSB) and the Two-Hop SVD Approach (TSA). We benchmark them against NGCF [27], Mult-VAE [17], GRMF [22], LightGCN [9], MF [23], and NeuMF [10]. Although MF [23] and NeuMF [10] are relatively older methods, we report their performance here as they are closely related to matrix factorization in the context of recommendation systems. LightGCN [9] is the current state-of-the-art method, showing the best performance among all related approaches in its paper. Table 3 shows the performance metrics for the proposed and compared methods. We reproduce the results of the compared approaches from the original papers of NGCF [27] and LightGCN [9]. To make a fair comparison, we follow the same experimental methodology as stated in those papers and as implemented in the code and datasets made available by their authors.

Method       Gowalla             Yelp2018            Amazon-Book
             Recall    NDCG      Recall    NDCG      Recall    NDCG
MF           0.1291    0.1109    0.0433    0.0354    0.0250    0.0196
NeuMF        0.1399    0.1212    0.0451    0.0363    0.0258    0.0200
NGCF         0.1570    0.1327    0.0579    0.0477    0.0344    0.0263
Mult-VAE     0.1641    0.1335    0.0584    0.0450    0.0407    0.0315
GRMF         0.1477    0.1205    0.0571    0.0462    0.0354    0.0270
LightGCN     0.1830    0.1554    0.0649    0.0530    0.0411    0.0315
SSB          0.1690    0.1401    0.0647    0.0534    0.0408    0.0325
TSA          0.1704    0.1415    0.0657    0.0542    0.0456    0.0364
Table 3: Comparison of the proposed methods, SSB and TSA, with related methods. Despite being very simple, the proposed methods beat all the compared methods on the Yelp2018 and Amazon-Book datasets. On the Gowalla dataset, the proposed methods prove to be a strong baseline, beating all but one of the compared state-of-the-art methods. The size of the Truncated SVD embedding is the same for both users and items. Baseline results are reused from [9] and [27].

We can observe that TSA performs considerably better on the Amazon-Book dataset than all the compared state-of-the-art methods, including LightGCN and NGCF. The relative gain of TSA over LightGCN, which performs best among the compared methods, is approximately 9.86% in Recall@20 and 13.4% in NDCG@20. SSB is also higher than all considered approaches in terms of NDCG; in terms of Recall, it is only 0.7% short of LightGCN while still better than all other compared approaches.

We can see that for the Yelp2018 dataset, TSA performs marginally better than LightGCN [9] in terms of both Recall (about 1% relative) and NDCG (about 2% relative). In contrast, SSB performs marginally worse than LightGCN [9] in terms of Recall but has a slightly higher NDCG. However, both SSB and TSA perform significantly better than all other compared approaches, including NGCF [27], which uses multi-hop relations among users to exploit higher-order signals for predicting relevance.

For the Gowalla dataset, the proposed methods perform worse than LightGCN: there is about a 1.3% absolute difference in Recall and about a 1.5% absolute difference in NDCG. However, despite being fairly plain, the proposed approaches outperform all other approaches, including some that are inherently more complex, such as Mult-VAE and NGCF.

We believe that the generalization power of the proposed methods has led to the improvement in performance on the Yelp2018 and Amazon-Book datasets. In the case of Gowalla, however, the methods appear to underfit because the signals of SSB and TSA are limited to one-hop and two-hop neighbors, respectively. As shown in [9], as the number of GCN layers is increased to four (equivalent to four hops of neighborhood), the performance increases. We leave this aspect of experimentation to future work.

3.3.2 Comparison of SSB and TSA

In this section, we compare the SSB and TSA approaches. In Table 3, we can observe that TSA always performs better than SSB, given the additional signal from two-hop neighbors. Comparing the proposed methods, we see only a small relative increase in performance on Gowalla and Yelp2018, and a more significant uplift for TSA on the Amazon-Book dataset. This is in line with the studies of LightGCN [9] and NGCF [27], where it was shown that as the number of layers of the GCN model is increased (equivalent to covering more hops of the neighborhood), the performance increases correspondingly. We attribute the improvement of metrics in TSA to the addition of the two-hop signal, which is not present in SSB. Interestingly, for the Yelp2018 and Amazon-Book datasets, the two-hop proposed approach, TSA, is able to outperform the four-hop configurations of LightGCN and NGCF.

Metric                      SSB         TSA
Training Loss               0.04477     0.03847
Training Recall             0.14736     0.16262
Training NDCG               0.25875     0.28243
Time for SVD Computation    21.83 s     503.04 s
Table 4: Comparison of the proposed methods with respect to training metrics on the Yelp2018 dataset. Note that the SVD is only performed once, at the start of model training.
Figure 2: (a) Training loss per epoch and (b) test Recall@20 per epoch on the Yelp2018 dataset for the TSA method. For higher embedding dimensions, the training loss is lower and the test Recall is higher.

Regarding training loss, we observe that SSB, on convergence, has a higher training loss than TSA. While TSA performs better than SSB, it does so at the expense of the additional time needed to compute the two-hop adjacency matrix and the additional Truncated SVD on that matrix. In Table 4, we summarize the comparison between SSB and TSA on training metrics, while test performance is shown in Table 3. We ran our experiments on an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz with 6 cores and 110 GB of RAM.

3.3.3 Impact of Embedding Size on the Performance of TSA

Out of the four hyperparameters discussed, we only optimize the SVD-based embedding size for users and items, which becomes the input to the multi-layer perceptron. In this section, we describe how the performance and training loss change as we vary the embedding dimension for TSA, which performs better than SSB across datasets. We vary the embedding dimension over a range of values. Fig. 2(a) shows the training loss for different embedding sizes for the TSA approach. As expected, as the embedding size increases, the loss decreases faster and converges to a lower value. We observe a similar trend in the test performance metrics: Fig. 2(b) shows test-set Recall@20 for different embedding sizes, and higher embedding sizes lead to better performance.

4 Conclusions and Future Work

In this paper, we set out to benchmark SVD-based methods against state-of-the-art GRL methods. We propose two such approaches, and experiments on three real-world open datasets demonstrate that these methods are powerful enough to beat many GRL methods and even establish a new state of the art on two out of three datasets. We observed the most significant relative gain of over 10% against the state-of-the-art methods.

This work raises many research questions, and we envision the following future work. We plan to investigate how to generalize the approach from two-hop to n-hop, since earlier research suggests that a higher order of neighborhood can be expected to yield better performance. There is also a need for an inductive version of these methods, since the transductive versions proposed in this paper do not work with new nodes or new edges in the graph and would require frequent retraining in their current form. We also plan to investigate better implementations of SVD in big data frameworks such as Spark [1, 25]; this would help us further understand the training time of the proposed methods. We also want to explore how matrix factorization or SVD can be integrated with GRL to improve empirical performance, and whether it is possible to extract the strengths of both families of methods and merge them. We also plan to profile the datasets used by the proposed methods and the existing literature to understand why one approach performs well on some datasets but not equally well on others. On the empirical front, we intend to benchmark these approaches on the Open Graph Benchmark [11]. Finally, we plan to test these approaches on tasks beyond recommendation systems, such as graph-based formulations in NLP, social network modelling, and graph applications in biology.

Through our work, we want to highlight that matrix factorization based methods still serve as important baselines and should not be ignored in empirical benchmarking while making further advances in GRL or recommendation systems.

References

  • [1] A. Alexopoulos, G. Drakopoulos, A. Kanavos, P. Mylonas, and G. Vonitsanos (2020) Two-step classification with svd preprocessing of distributed massive datasets in apache spark. Algorithms 13 (3), pp. 71. Cited by: §4.
  • [2] R. v. d. Berg, T. N. Kipf, and M. Welling (2017) Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263. Cited by: §1.
  • [3] K. Choi, D. Yoo, G. Kim, and Y. Suh (2012) A hybrid online-product recommendation system: combining implicit rating-based collaborative filtering and sequential pattern analysis. electronic commerce research and applications 11 (4), pp. 309–317. Cited by: §2.1.
  • [4] S. Debnath, N. Ganguly, and P. Mitra (2008) Feature weighting in content based recommendation system using social network analysis. In Proceedings of the 17th international conference on World Wide Web, pp. 1041–1042. Cited by: §2.1.
  • [5] G. Gonzalez, S. Gong, I. Laponogov, M. Bronstein, and K. Veselkov (2021) Predicting anticancer hyperfoods with graph convolutional networks. Human Genomics 15 (1), pp. 1–12. Cited by: §1.
  • [6] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864. Cited by: §1.
  • [7] W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1025–1035. Cited by: §1.
  • [8] P. C. Hansen (1990) Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank. SIAM Journal on Scientific and Statistical Computing 11 (3), pp. 503–518. Cited by: Figure 1, §2.1.
  • [9] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang (2020) Lightgcn: simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pp. 639–648. Cited by: §1, §1, Figure 1, §3.1, §3.2, §3.3.1, §3.3.1, §3.3.1, §3.3.2, Table 3.
  • [10] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web, pp. 173–182. Cited by: §3.3.1.
  • [11] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec (2020) Open graph benchmark: datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687. Cited by: §4.
  • [12] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.3.
  • [13] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §2.1, §2.2.
  • [14] V. Klema and A. Laub (1980) The singular value decomposition: its computation and some applications. IEEE Transactions on automatic control 25 (2), pp. 164–176. Cited by: §2.1.
  • [15] I. Laponogov, G. Gonzalez, M. Shepherd, A. Qureshi, D. Veselkov, G. Charkoftaki, V. Vasiliou, J. Youssef, R. Mirnezami, M. Bronstein, et al. (2021) Network machine learning maps phytochemically rich “hyperfoods” to fight covid-19. Human Genomics 15 (1), pp. 1–11. Cited by: §1.
  • [16] X. Li and D. Li (2019) An improved collaborative filtering recommendation algorithm and recommendation strategy. Mobile Information Systems 2019. Cited by: §2.1.
  • [17] D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara (2018) Variational autoencoders for collaborative filtering. In Proceedings of the 2018 world wide web conference, pp. 689–698. Cited by: §3.3.1.
  • [18] H. Linmei, T. Yang, C. Shi, H. Ji, and X. Li (2019) Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4823–4832. Cited by: §1.
  • [19] Z. Liu, C. Chen, X. Yang, J. Zhou, X. Li, and L. Song (2018) Heterogeneous graph neural networks for malicious account detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2077–2085. Cited by: §1.
  • [20] X. Qu, Z. Li, J. Wang, Z. Zhang, P. Zou, J. Jiang, J. Huang, R. Xiao, J. Zhang, and J. Gao (2020) Category-aware graph neural networks for improving e-commerce review helpfulness prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2693–2700. Cited by: §1.
  • [21] R. Ragesh, S. Sellamanickam, A. Iyer, R. Bairi, and V. Lingam (2021) Hetegcn: heterogeneous graph convolutional networks for text classification. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 860–868. Cited by: §1.
  • [22] N. Rao, H. Yu, P. K. Ravikumar, and I. S. Dhillon (2015) Collaborative filtering with graph information: consistency and scalable methods. In Advances in Neural Information Processing Systems, pp. 2107–2115. Cited by: §3.3.1.
  • [23] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2012) BPR: bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618. Cited by: §2.3, §3.3.1.
  • [24] J. Sun and Y. Zhang. Multi-graph convolutional neural networks for representation learning in recommendation. Cited by: §1.
  • [25] Z. Sun, F. Chen, M. Chi, and Y. Zhu (2015) A spark-based big data platform for massive remote sensing data processing. In International Conference on Data Science, pp. 120–126. Cited by: §4.
  • [26] L. H. Ungar and D. P. Foster (1998) Clustering methods for collaborative filtering. In AAAI workshop on recommendation systems, Vol. 1, pp. 114–129. Cited by: §2.1.
  • [27] X. Wang, X. He, M. Wang, F. Feng, and T. Chua (2019) Neural graph collaborative filtering. In SIGIR, Cited by: §1, §1, Figure 1, §2.3, §3.1, §3.2, §3.3.1, §3.3.1, §3.3.2, Table 3.
  • [28] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu (2019) Heterogeneous graph attention network. In The World Wide Web Conference, pp. 2022–2032. Cited by: §1.
  • [29] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger (2019) Simplifying graph convolutional networks. In International conference on machine learning, pp. 6861–6871. Cited by: §1.
  • [30] Y. Wu, M. Gao, M. Zeng, F. Chen, M. Li, and J. Zhang (2021) BridgeDPI: a novel graph neural network for predicting drug-protein interactions. arXiv preprint arXiv:2101.12547. Cited by: §1.
  • [31] L. Yao, C. Mao, and Y. Luo (2019) Graph convolutional networks for text classification. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33, pp. 7370–7377. Cited by: §1.
  • [32] Y. Zhang, X. Yu, Z. Cui, S. Wu, Z. Wen, and L. Wang (2020) Every document owns its structure: inductive text classification via graph neural networks. arXiv preprint arXiv:2004.13826. Cited by: §1.
  • [33] H. Zhao, Q. Yao, J. Li, Y. Song, and D. L. Lee (2017) Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 635–644. Cited by: §1.
  • [34] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li (2019) T-gcn: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems 21 (9), pp. 3848–3858. Cited by: §1.