Recommender systems (Koren et al., 2009) have been widely deployed in many applications, such as item recommendation on shopping websites and friend recommendation on social networking sites. There are two main kinds of methods for recommender systems: content-based filtering (Pazzani & Billsus, 2007) and collaborative filtering (Breese et al., 1998). Content-based filtering methods recommend new items that are most similar to a user's historical favorite items. Collaborative filtering methods use collective ratings to make new recommendations based on similar rating patterns between users or items.
By regarding rows as users, columns as items, and entries as the ratings on items by users, the task of recommender systems can be formulated as a matrix completion (MC) problem (Candès & Recht, 2009, 2012). MC has attracted considerable attention in recent years. MC models aim to predict the missing entries of a matrix given a small subset of observed entries. Under the low-rank setting, (Candès & Recht, 2009, 2012) proved that a matrix can be exactly recovered given a sufficiently large number of observed entries, although rank minimization itself is an NP-hard problem. One efficient solution to the MC problem is to adopt matrix factorization (MF) techniques (Monti et al., 2017).
In many real applications, besides the rating matrix which contains the ratings on items by users, other side information is also available. Typical side information includes the attributes of users/items and the relationship (link) graphs between users/items. Accordingly, a few works have incorporated the attributes of users/items to boost the performance of matrix completion models (Jain & Dhillon, 2013; Xu et al., 2013). Furthermore, geometric matrix completion (GMC) models (Li & Yeung, 2009; Kalofolias et al., 2014; Rao et al., 2015; Monti et al., 2017) have been proposed for recommendation by integrating the relationship (link) graphs among users/items into matrix completion. For example, the methods in (Li & Yeung, 2009; Agarwal & Chen, 2009; Adams et al., 2010; Porteous et al., 2010; Ma et al., 2011; Cai et al., 2011; Menon et al., 2011; Kalofolias et al., 2014; Rao et al., 2015) encode the structural (geometric) information of graphs via graph Laplacian regularization (Belkin & Niyogi, 2001, 2003), which imposes smoothness priors on the latent factors (embeddings) of users/items. These graph regularization based methods have shown promising performance in real applications.
Recently, geometric deep learning techniques (Bruna et al., 2014; Gori et al., 2005; Li et al., 2016; Henaff et al., 2015; Sukhbaatar et al., 2016; Defferrard et al., 2016; Kipf & Welling, 2017) have been proposed to learn meaningful representations for geometrically structured data, such as graphs and manifolds. In particular, geometric deep learning on graphs (GDLG) (Defferrard et al., 2016; Monti et al., 2017) has been proposed to solve the GMC problem, showing better performance than existing GMC methods, including graph regularization based methods. To the best of our knowledge, there exists only one GDLG method for GMC, which is called recurrent multi-graph convolutional neural network (RMGCNN) (Monti et al., 2017). Based on the spectral graph convolution framework (Defferrard et al., 2016), RMGCNN defines two-dimensional graph convolutional filters to process multi-graphs. The graph embeddings extracted by the two-dimensional graph convolutional filters are fed into a Long Short-Term Memory (LSTM) recurrent neural network (RNN) (Hochreiter & Schmidhuber, 1997) to perform a diffusion process, which is actually a feature transformation. After that, the final embeddings are used to perform the matrix completion task. A factorized (matrix factorization) version, called separable RMGCNN (sRMGCNN), is also proposed in (Monti et al., 2017) for efficiency. RMGCNN combines a graph convolutional network (GCN) and a recurrent neural network (RNN) for GMC. Experimental results in (Monti et al., 2017) show that the GCN part and the RNN part improve the performance of matrix completion simultaneously. However, matrix completion with a pure GCN, named MGCNN in (Monti et al., 2017), is shown to be worse than RMGCNN in their experiments.
In this paper, we propose a new GMC method, called convolutional geometric matrix completion (CGMC), for recommendation with graphs among users/items. CGMC is a pure GCN-based method. The contributions of CGMC are listed as follows:
- In CGMC, a new graph convolutional network is designed by taking only the first two terms of the Chebyshev polynomials in spectral graph convolution (Defferrard et al., 2016) and adopting a weighted policy to control the contributions of self-connections and neighbors to the graph embeddings.
- Because the roles of users in the rating matrix and in the user graph are different, the latent factors (embeddings) representing users for the rating matrix and those for the user graph should also have some difference, although the users are the same. Hence, in CGMC, a fully connected layer is added to the output of the GCN to project the user graph embeddings into a space compatible with the rating matrix. Similar operations are also performed for items.
- CGMC integrates GCN and MC into a unified deep learning framework, in which the two components (GCN and MC) can give feedback to each other.
- Experimental results on real datasets show that CGMC outperforms other state-of-the-art methods including RMGCNN. Hence, with a properly designed network architecture for graph convolution, our work shows that a pure GCN-based method can also achieve the best performance.
2 Related Work
In this section, we introduce the related work of CGMC, including matrix completion (MC), geometric matrix completion (GMC), geometric deep learning on graphs (GDLG), and GDLG based GMC.
2.1 Matrix Completion
Suppose $M \in \mathbb{R}^{n \times m}$ is a rating matrix, with $n$ being the number of users and $m$ being the number of items. Given a subset of the entries $\{M_{ij} \mid (i,j) \in \Omega\}$, where $\Omega$ denotes the set of observed index pairs, the matrix completion problem aims to estimate the missing entries of $M$. It is formulated as follows (Candès & Recht, 2009, 2012; Cai et al., 2010):

$$\min_{X}\ \|X\|_* \quad \text{s.t.}\quad P_\Omega(X) = P_\Omega(M), \qquad (1)$$

where $\|X\|_*$ is the nuclear norm of the matrix $X$. $P_\Omega$ is the projection operator, where $[P_\Omega(X)]_{ij} = X_{ij}$ if $(i,j) \in \Omega$, else $[P_\Omega(X)]_{ij} = 0$.
One solution to the MC problem is to reformulate it as the following matrix factorization (MF) problem:

$$\min_{U,V}\ \frac{1}{2}\left\|P_\Omega\left(M - UV^T\right)\right\|_F^2 + \frac{\lambda}{2}\left(\|U\|_F^2 + \|V\|_F^2\right), \qquad (2)$$

where $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{m \times r}$ are the latent factor representations for users and items, respectively, $r$ is the number of latent factors, and $\lambda$ is a regularization hyper-parameter.
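The MF view can be sketched numerically. The snippet below (illustrative names and toy shapes, not the paper's code) evaluates the objective in (2), where the mask encodes $P_\Omega$, and checks that a small gradient step on $U$ decreases it:

```python
import numpy as np

# Sketch of the MF formulation of matrix completion; only observed entries
# (the mask encodes P_Omega) contribute to the loss. All names are
# illustrative, not the paper's code.

def mf_loss(M, U, V, mask, lam=0.1):
    """0.5*||P_Omega(M - U V^T)||_F^2 + 0.5*lam*(||U||_F^2 + ||V||_F^2)."""
    R = mask * (M - U @ V.T)              # zero out unobserved entries
    return 0.5 * np.sum(R ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(m, r)).T   # rank-2 ground truth
mask = (rng.random((n, m)) < 0.5).astype(float)           # observed entries

# One small gradient step on U (with V fixed, lam = 0.1) decreases the loss.
U, V = rng.normal(size=(n, r)), rng.normal(size=(m, r))
grad_U = -(mask * (M - U @ V.T)) @ V + 0.1 * U
loss_before = mf_loss(M, U, V, mask)
loss_after = mf_loss(M, U - 0.01 * grad_U, V, mask)
```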
2.2 Geometric Matrix Completion
Geometric matrix completion (GMC) (Li & Yeung, 2009; Agarwal & Chen, 2009; Adams et al., 2010; Porteous et al., 2010; Ma et al., 2011; Cai et al., 2011; Menon et al., 2011; Kalofolias et al., 2014; Rao et al., 2015) has been developed to exploit the relationship (link) graphs among users/items to assist the matrix completion process. One kind of GMC method adopts the graph Laplacian for regularization. GRALS (Rao et al., 2015) is one representative of this kind, which is formulated as follows:

$$\min_{U,V}\ \frac{1}{2}\left\|P_\Omega\left(M - UV^T\right)\right\|_F^2 + \frac{\lambda}{2}\left(\mathrm{tr}\left(U^T \tilde{L}_u U\right) + \mathrm{tr}\left(V^T \tilde{L}_v V\right)\right),$$

where $\tilde{L}_u$ and $\tilde{L}_v$ are the normalized graph Laplacians of the user graph $A_u$ and item graph $A_v$, respectively. $\tilde{L}_u = I - D_u^{-1/2} A_u D_u^{-1/2}$, and $D_u$ is a diagonal matrix with diagonal entry $[D_u]_{ii} = \sum_j [A_u]_{ij}$. $I$ is an identity matrix whose dimensionality depends on the context. $\tilde{L}_v$ can be similarly computed based on $A_v$.
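The normalized Laplacian used by such Laplacian-regularized methods can be sketched as follows (helper name and toy graph are illustrative). The penalty $\mathrm{tr}(U^T \tilde{L} U)$ vanishes for degree-scaled constant latent factors, the maximally "smooth" embeddings that the regularizer encourages:

```python
import numpy as np

# Sketch (illustrative names) of the normalized graph Laplacian used by
# Laplacian-regularized models such as GRALS.

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}, with D the diagonal degree matrix of A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

A = np.array([[0., 1., 1., 0.],      # a small undirected graph
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = normalized_laplacian(A)
eigvals = np.linalg.eigvalsh(L)      # all eigenvalues lie in [0, 2]

# The penalty tr(U^T L U) is zero for degree-scaled constant latent factors.
U = np.sqrt(A.sum(axis=1))[:, None] * np.ones((1, 2))
smooth_pen = np.trace(U.T @ L @ U)
```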
2.3 Geometric Deep Learning on Graphs
Recently, a few works have attempted to perform geometric deep learning on graphs (GDLG) (Gori et al., 2005; Li et al., 2016; Sukhbaatar et al., 2016). In particular, inspired by spectral graph theory in graph signal processing (Hammond et al., 2011; Shuman et al., 2013), spectral graph convolution was proposed in (Bruna et al., 2014; Henaff et al., 2015).
Suppose $x \in \mathbb{R}^n$ is a signal on a graph ($n$ is the number of nodes in the graph). Then the spectral graph convolution operator is defined as follows:

$$x \star g = \Phi\left((\Phi^T g) \odot (\Phi^T x)\right) = \Phi\, \hat{g}(\Lambda)\, \Phi^T x,$$

where $\odot$ is the Hadamard product, $\tilde{L} = \Phi \Lambda \Phi^T$ is the normalized graph Laplacian of the graph, $\Phi$ is the matrix of orthonormal eigenvectors of $\tilde{L}$, and $\Lambda$ is the diagonal matrix of eigenvalues of $\tilde{L}$. $\hat{g}(\Lambda)$ contains the convolutional filter coefficients that we need to learn. For convenience, we denote $\hat{g}(\Lambda) = \mathrm{diag}(\theta)$ with $\theta = \Phi^T g$.
The computational complexity of the above convolution operation is high, since it requires the eigendecomposition of $\tilde{L}$. Hence, (Defferrard et al., 2016) proposed to approximate $\hat{g}(\Lambda)$ with Chebyshev polynomials:

$$\hat{g}(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}),$$

where $\lambda_{\max}$ denotes the largest eigenvalue of $\tilde{L}$, $\tilde{\Lambda} = \frac{2\Lambda}{\lambda_{\max}} - I$, $T_0(z) = 1$, $T_1(z) = z$, and $T_k(z) = 2zT_{k-1}(z) - T_{k-2}(z)$. Then, we have:

$$y \approx \sum_{k=0}^{K} \theta_k T_k(\hat{L})\, x, \qquad (4)$$

which only costs $O(K|\mathcal{E}|)$, where $|\mathcal{E}|$ is the number of non-zero values of the link matrix. Here, $\hat{L} = \frac{2\tilde{L}}{\lambda_{\max}} - I$. The above formulation is a linear combination of $x, \hat{L}x, \ldots, \hat{L}^K x$, and $[\hat{L}^K]_{ij} = 0$ if all paths between node $i$ and node $j$ have length larger than $K$ (Hammond et al., 2011). It is $K$-localized: it captures the information of $K$-hop neighbors. (Defferrard et al., 2016) stacked the above $K$-localized convolution to construct a spectral graph convolutional network (GCN).
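A minimal sketch of the $K$-localized Chebyshev filter, with illustrative names: on a path graph, a $K = 2$ filter applied to a delta signal leaves every node farther than two hops untouched:

```python
import numpy as np

# Sketch of the K-localized Chebyshev filter y = sum_k theta_k T_k(L_hat) x.
# Function and variable names are illustrative.

def cheb_filter(L_hat, x, theta):
    """T_0 = I, T_1 = L_hat, T_k = 2 L_hat T_{k-1} - T_{k-2} (needs K >= 1)."""
    T_prev, T_curr = x, L_hat @ x
    y = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_prev, T_curr = T_curr, 2 * (L_hat @ T_curr) - T_prev
        y = y + theta[k] * T_curr
    return y

# Path graph on 6 nodes, rescaled Laplacian L_hat = 2 L / lambda_max - I.
A = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
d = A.sum(axis=1)
L = np.eye(6) - (A / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
lam_max = np.linalg.eigvalsh(L).max()
L_hat = 2 * L / lam_max - np.eye(6)

# Delta signal at node 0: a K = 2 filter only reaches 2-hop neighbors.
x = np.zeros(6)
x[0] = 1.0
y = cheb_filter(L_hat, x, theta=[0.5, 0.3, 0.2])
```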
A variant of the spectral graph convolutional network (GCN) is proposed in (Kipf & Welling, 2017). We call it GCN-kw in this paper. GCN-kw is a simplified version of the GCN in (Defferrard et al., 2016), obtained by assuming $K = 1$ and $\lambda_{\max} \approx 2$:

$$y \approx \theta_0 x + \theta_1(\tilde{L} - I)x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x,$$

where $D$ denotes the diagonal degree matrix of the link matrix $A$. Then, by constraining $\theta = \theta_0 = -\theta_1$,

$$y \approx \theta\left(I + D^{-1/2} A D^{-1/2}\right)x.$$

As the eigenvalues of $I + D^{-1/2} A D^{-1/2}$ are in the range $[0, 2]$, repeated application of such a filter can result in numerical instability. This can be remedied by a re-normalization:

$$I + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2},$$

where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Here, we can see that self-connections and neighbors contribute equally to the graph embeddings, which is not flexible enough.
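The re-normalization trick can be checked numerically; the snippet below (illustrative names) compares the spectra of the raw and re-normalized propagation matrices on a small star graph, where the raw matrix attains the extreme eigenvalue 2:

```python
import numpy as np

# Numerical comparison (illustrative names) of the raw propagation matrix
# I + D^{-1/2} A D^{-1/2} and its re-normalized replacement
# D_t^{-1/2} A_t D_t^{-1/2} with A_t = A + I, on a small star graph.

def sym_norm(A):
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
S = np.eye(3) + sym_norm(A)    # eigenvalues can reach 2 -> instability
A_t = A + np.eye(3)            # add self-connections
S_t = sym_norm(A_t)            # re-normalized; eigenvalues in (-1, 1]

ev_raw = np.linalg.eigvalsh(S)
ev_renorm = np.linalg.eigvalsh(S_t)
```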
2.4 GDLG based GMC
To the best of our knowledge, RMGCNN (Monti et al., 2017) is the only work which has applied geometric deep learning on graphs (GDLG) to GMC. RMGCNN adopts the GCN of (Defferrard et al., 2016) to extract graph embeddings for users and items, and then combines it with a recurrent neural network (RNN) to perform a diffusion process. The factorized version of RMGCNN (Monti et al., 2017) is formulated as follows:

$$\min_{\theta_u, \theta_v}\ \left\|W^{(T)}\right\|^2_{\mathcal{G}_u} + \left\|H^{(T)}\right\|^2_{\mathcal{G}_v} + \frac{\mu}{2}\left\|P_\Omega\left(M - W^{(T)}\left(H^{(T)}\right)^T\right)\right\|_F^2,$$

where $\theta_u$ and $\theta_v$ denote the learnable parameters of the GCN and RNN, $W^{(T)}$ and $H^{(T)}$ are the graph embeddings extracted by the GCN and RNN for users and items respectively, $\mathcal{G}_u$ and $\mathcal{G}_v$ are the graphs on users and items respectively, $T$ denotes the number of iterations for which the graph embeddings are diffused, and $\|W^{(T)}\|^2_{\mathcal{G}_u}$ and $\|H^{(T)}\|^2_{\mathcal{G}_v}$ represent graph Laplacian regularization terms.
3 Convolutional GMC
In this section, we present the details of our new GDLG-based GMC method, called convolutional geometric matrix completion (CGMC). CGMC is a pure GCN-based method, and it shows that GMC with only GCN can outperform the GCN+RNN method RMGCNN and achieve state-of-the-art performance.
CGMC is formulated as follows. Firstly, a new GCN is proposed to extract graph embeddings, called convolutional graph embeddings (CGE) in this paper, for user/item representation. Then, a fully connected layer is added to the output of the GCN to project the user/item graph embeddings into a space compatible with the rating matrix. After that, GCN and MC are integrated into a unified deep learning framework to get CGMC.
3.1 Convolutional Graph Embedding (CGE)
Here, we propose a new GCN to get the convolutional graph embedding (CGE) for graph node representation.
By taking $K = 1$ in the spectral graph convolution of (4), we have:

$$y \approx \theta_0 x + \theta_1\left(\tilde{L} - I\right)x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x,$$

where we let $\lambda_{\max} \approx 2$, and $A$ is the link matrix of the graph with $n$ nodes. Since $\theta_0$ and $\theta_1$ are free parameters, and there are no constraints between the coefficients of $x$ and $D^{-1/2} A D^{-1/2} x$, we can let $\theta_1' = -\theta_1$. Then, we have

$$y \approx \theta_0 x + \theta_1' D^{-1/2} A D^{-1/2} x.$$

Furthermore, $\theta_0$ and $\theta_1'$ are still free parameters. We let $\theta_0 = \alpha\theta$ and $\theta_1' = (1-\alpha)\theta$, and get

$$y \approx \theta\left(\alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}\right)x.$$

Here, $\alpha$ and $1-\alpha$ can be explained as weights controlling the contributions of self-connections and neighbors, respectively. In our GCN, we constrain $\alpha \in [0, 1]$. For convenience, we denote $\hat{A} = \alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}$, and get

$$y \approx \theta\, \hat{A}\, x.$$

The eigenvalues of $\hat{A}$ are in $[2\alpha - 1, 1] \subseteq [-1, 1]$, which can be easily verified according to Lemma 1.7 in (Chung, 1997). Hence, repeated application of the above filter will not result in numerical instability. Due to the flexibility of $\alpha$, we treat it as a hyper-parameter and tune it on a validation set.
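A small numerical check of the claimed spectral bound, with illustrative names: for a star graph, whose normalized adjacency attains the extreme eigenvalue $-1$, the eigenvalues of $\hat{A}$ stay within $[2\alpha - 1, 1]$ for several choices of $\alpha$:

```python
import numpy as np

# Numerical check (illustrative names) that the CGE propagation matrix
# A_hat = alpha*I + (1 - alpha) * D^{-1/2} A D^{-1/2} has eigenvalues in
# [2*alpha - 1, 1], so stacked filtering stays numerically stable.

def cge_propagation(A, alpha):
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    return alpha * np.eye(len(A)) + (1.0 - alpha) * S

# Star graph: its normalized adjacency has eigenvalues {1, 0, -1}.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])

bounds_ok = []
for alpha in (0.1, 0.5, 0.9):
    ev = np.linalg.eigvalsh(cge_propagation(A, alpha))
    bounds_ok.append(ev.min() >= 2 * alpha - 1 - 1e-9 and ev.max() <= 1 + 1e-9)
```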
When the input signal is multi-dimensional, denoted by $X \in \mathbb{R}^{n \times p}$ with $n$ being the number of nodes and $p$ being the dimensionality, the multi-dimensional graph convolution can be formulated as follows. We use $X_{:,i}$ to denote the $i$-th column of $X$, which is the $i$-th input signal:

$$Y_{:,j} = \sum_{i=1}^{p} \theta_{ij}\, \hat{A}\, X_{:,i}, \qquad j = 1, \ldots, q,$$

where $q$ is the dimensionality of the output signal, and $\theta_{ij}$ is the filter parameter of the $j$-th output signal defined on the $i$-th input signal. Then we can get

$$Y = \hat{A}\, X\, \Theta,$$

which transforms the node representation from $X \in \mathbb{R}^{n \times p}$ to $Y \in \mathbb{R}^{n \times q}$ through a one-layer graph convolution with the convolution parameter $\Theta \in \mathbb{R}^{p \times q}$, where $[\Theta]_{ij} = \theta_{ij}$.
By stacking the above formulation into multiple layers, we can get a deep model for CGE, which is formulated as follows:

$$X^{(l+1)} = \xi\left(\hat{A}\, X^{(l)}\, \Theta^{(l)}\right), \qquad (13)$$

where $X^{(l)}$ is the output signal of the $l$-th layer with $X^{(0)} = X$, $\Theta^{(l)}$ is the convolution parameter of the $l$-th layer, and $\xi(\cdot)$ is an activation function.
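A stacked CGE forward pass can be sketched as below. The tanh activation, the 5-node cycle graph, and all shapes are illustrative assumptions, not choices from the paper:

```python
import numpy as np

# Sketch of a stacked CGE forward pass X^{(l+1)} = xi(A_hat @ X^{(l)} @ Theta^{(l)}).
# The tanh activation, the cycle graph, and all shapes are assumptions.

def cge_forward(A_hat, X, thetas, xi=np.tanh):
    for Theta in thetas:
        X = xi(A_hat @ X @ Theta)
    return X

n = 5
A = np.zeros((n, n))
for i in range(n):                       # 5-node cycle graph
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

alpha = 0.5
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = alpha * np.eye(n) + (1 - alpha) * (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

rng = np.random.default_rng(1)
X0 = np.eye(n)                           # featureless input X^{(0)} = I
thetas = [rng.normal(size=(n, 4)), rng.normal(size=(4, 2))]
H = cge_forward(A_hat, X0, thetas)       # final node embeddings, shape (5, 2)
```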
3.2 Model of CGMC
Our CGMC can also be used for the nuclear norm regularization formulation in (1), by adopting similar techniques in RMGCNN (Monti et al., 2017). However, as pointed out by (Monti et al., 2017), the nuclear norm regularization formulation is time-consuming. Hence, in this paper, we adopt the MF formulation in (2) for our CGMC.
Suppose $X_u \in \mathbb{R}^{n \times p_u}$ denotes the input user features and $X_v \in \mathbb{R}^{m \times p_v}$ denotes the input item features, with $n$ and $m$ being the number of users and items respectively, and $p_u$ and $p_v$ being the feature dimensionalities for users and items respectively. If $X_u$ or $X_v$ is not available, we set $X_u = I_n$ or $X_v = I_m$, respectively. $A_u$ and $A_v$ are the user graph and item graph. Then the CGE for users and items can be generated by applying (13) to graph $A_u$ and graph $A_v$:

$$U^{(l+1)} = \xi\left(\hat{A}_u\, U^{(l)}\, \Theta_u^{(l)}\right), \qquad V^{(l+1)} = \xi\left(\hat{A}_v\, V^{(l)}\, \Theta_v^{(l)}\right),$$

where $\hat{A}_u = \alpha I + (1-\alpha) D_u^{-1/2} A_u D_u^{-1/2}$ and $\hat{A}_v = \alpha I + (1-\alpha) D_v^{-1/2} A_v D_v^{-1/2}$, $D_u$ and $D_v$ are the diagonal degree matrices of $A_u$ and $A_v$ respectively, $U^{(l)}$ and $V^{(l)}$ are the output feature representations of the $l$-th layer with $U^{(0)} = X_u$ and $V^{(0)} = X_v$, $\xi(\cdot)$ is an activation function, and $\Theta_u^{(l)}$ and $\Theta_v^{(l)}$ are convolution parameters which play the same role as $\Theta^{(l)}$ in (13).
Fully-Connected Layer after CGE For a specific user, the role played in the user graph is different from the role played in the rating matrix. Intuitively, the latent factors (embeddings) representing these two different roles of the same user should also have some difference. Items have a similar property.
To capture the difference between these two roles, a fully connected layer is added to the output of the GCN to project the CGE into a space compatible with the rating matrix. The formulation is as follows:

$$P = U^{(L)} W_u, \qquad Q = V^{(L)} W_v,$$

where $U^{(L)}$ and $V^{(L)}$ are the output user features and item features of the CGE with $L$ layers, and $W_u$ and $W_v$ are the parameters of the fully connected layers for the user CGE and item CGE, respectively, both projecting into a common $d$-dimensional space.
This is one key difference between our method and other methods like RMGCNN. In our experiments, we will verify that this fully connected layer will improve the performance of CGE.
Objective Function With the projection by the fully connected layer, CGMC is formulated as follows:

$$\min_{\{\Theta_u^{(l)}\}, \{\Theta_v^{(l)}\}, W_u, W_v}\ \frac{1}{2}\left\|P_\Omega\left(M - PQ^T\right)\right\|_F^2 + \frac{\lambda}{2}\,\mathcal{R}, \qquad (16)$$

where $P$ denotes the projected user embeddings $U^{(L)} W_u$ and $Q$ denotes the projected item embeddings $V^{(L)} W_v$, and $\mathcal{R}$ is the $\ell_2$-norm regularization on the parameters in CGMC:

$$\mathcal{R} = \sum_{l}\left(\left\|\Theta_u^{(l)}\right\|_F^2 + \left\|\Theta_v^{(l)}\right\|_F^2\right) + \left\|W_u\right\|_F^2 + \left\|W_v\right\|_F^2.$$
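Putting the pieces together, the objective can be sketched end-to-end on toy data. Every name, shape, graph, and the tanh activation below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# End-to-end sketch of the CGMC objective on toy data: one CGE layer per side,
# a fully connected projection, and the masked squared error plus l2
# regularization. All names, shapes, and the activation are assumptions.

def prop(A, alpha):
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return alpha * np.eye(len(A)) + (1 - alpha) * S

def cgmc_objective(M, mask, A_u, A_v, Theta_u, Theta_v, W_u, W_v,
                   alpha=0.5, lam=0.01):
    U = np.tanh(prop(A_u, alpha) @ Theta_u)   # one CGE layer, X_u = I (featureless)
    V = np.tanh(prop(A_v, alpha) @ Theta_v)
    P, Q = U @ W_u, V @ W_v                   # fully connected projection
    fit = 0.5 * np.sum((mask * (M - P @ Q.T)) ** 2)
    reg = 0.5 * lam * sum(np.sum(W ** 2) for W in (Theta_u, Theta_v, W_u, W_v))
    return fit + reg

rng = np.random.default_rng(2)
n, m, q, d = 4, 3, 5, 2
ring = lambda k: np.roll(np.eye(k), 1, axis=1) + np.roll(np.eye(k), -1, axis=1)
A_u, A_v = ring(n), ring(m)                   # toy user and item graphs
M = rng.integers(1, 6, size=(n, m)).astype(float)
mask = (rng.random((n, m)) < 0.7).astype(float)
loss = cgmc_objective(M, mask, A_u, A_v,
                      rng.normal(size=(n, q)), rng.normal(size=(m, q)),
                      rng.normal(size=(q, d)), rng.normal(size=(q, d)))
```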
From (16), it is easy to find that CGMC seamlessly integrates GCN and MC into a unified deep learning framework, in which GCN and MC can give feedback to each other for performance improvement.
3.3 Learning of CGMC
We adopt an alternating minimization scheme to alternately optimize the user parameters $\{\Theta_u^{(l)}, W_u\}$ and the item parameters $\{\Theta_v^{(l)}, W_v\}$. Adadelta (Zeiler, 2012) is adopted as our optimization algorithm.
In our training process, we have tried two different mini-batch sampling policies: user/item sampling and rating sampling. User/item sampling means that each user/item is randomly sampled with probability $\rho$, and all ratings of the sampled users/items are kept for training. Rating sampling means that each rating is randomly sampled for training with probability $\rho$. In our experiments, these two policies behave similarly. We adopt the user/item sampling policy in the following experiments.
For each training iteration, we first perform the user sampling policy to get a mask $\Omega_u$. $\Omega_u$ is a diagonal matrix, with $[\Omega_u]_{ii}$ being $1$ with probability $\rho$ and being $0$ with probability $1-\rho$. Then the gradients of the user parameters $\{\Theta_u^{(l)}, W_u\}$ can be computed by applying the chain rule to (16), with the projection restricted to the rows selected by $\Omega_u$. The gradients of the item parameters $\{\Theta_v^{(l)}, W_v\}$ can be derived similarly to those of the user parameters, and are omitted here.
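The user-sampling policy can be sketched as follows; $\rho$ and all names are illustrative. Left-multiplying the observation mask by the diagonal matrix $\Omega_u$ keeps all observed ratings of the sampled users and removes the rest:

```python
import numpy as np

# Sketch of the user-sampling mini-batch policy; rho and all names are
# illustrative. omega_u is a diagonal Bernoulli(rho) mask over users.

rng = np.random.default_rng(3)
n, m, rho = 6, 4, 0.5
mask = (rng.random((n, m)) < 0.6).astype(float)          # P_Omega: observed entries

omega_u = np.diag((rng.random(n) < rho).astype(float))   # diagonal user mask
batch_mask = omega_u @ mask         # keeps all observed ratings of sampled users

sampled = np.diag(omega_u).astype(bool)                  # which users were drawn
```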
Based on the derived gradients, we adopt back propagation (BP) to learn the parameters of CGMC. The learning process is summarized in Algorithm 1, where $\eta$ is the learning rate.
3.4 Comparison to Related Work
As mentioned above, sRMGCNN is a factorized (MF) version of RMGCNN. Because we only focus on the factorized version in this paper due to its efficiency, RMGCNN in this paper refers to sRMGCNN unless otherwise stated. CGMC differs from RMGCNN in the following aspects. Firstly, CGMC adopts a different GCN to extract graph embeddings, and the newly designed GCN in CGMC is better than that in RMGCNN, as will be verified in our experiments. Secondly, RMGCNN adopts both GCN and RNN for GMC, while our CGMC adopts only GCN without RNN. Thirdly, a fully connected layer is introduced in CGMC for space compatibility.
CGMC differs from GCN-kw in the following aspects. Firstly, GCN-kw is proposed for semi-supervised learning, and it has not been used for MC. Secondly, the GCN in CGMC adopts a weighted policy to control the contributions of self-connections and neighbors to the graph embeddings, while self-connections and neighbors contribute equally in GCN-kw. Hence, the GCN in CGMC is more flexible than GCN-kw. Thirdly, the GCN in CGMC does not suffer from numerical instability, while GCN-kw (Kipf & Welling, 2017) has a numerical instability problem if no further operation is performed. Although GCN-kw was not proposed for GMC, we adapt it to GMC in this paper and find that CGMC achieves better performance than the GCN-kw based method in our experiments.
4 Experiments
We evaluate the proposed model CGMC and other baselines on collaborative filtering datasets. Our implementation is based on PyTorch, running on a server with an NVIDIA Titan Xp GPU. PyTorch is only used to call GPU interfaces; the gradient computation and the BP learning procedure are implemented by ourselves rather than by calling the auto-gradient interface of PyTorch.
4.1 Datasets
As in RMGCNN (Monti et al., 2017), we evaluate CGMC and other baselines on four real datasets: MovieLens-100K (ML-100K) (https://grouplens.org/datasets/movielens/), Douban, Flixster, and YahooMusic. For fair comparison, the dataset sizes and training/test data partitions are exactly the same as those in RMGCNN (Monti et al., 2017). In particular, the latter three datasets are subsets of Douban, Flixster, and YahooMusic that are preprocessed and provided by (Monti et al., 2017) (https://github.com/fmonti/mgcnn). Statistics of the datasets are presented in Table 1.
4.2 Settings and Baselines
Settings As in RMGCNN (Monti et al., 2017), the graph information is constructed from user/item features. Therefore, we implement a featureless version of CGMC in our experiments, where we set $X_u = I_n$ and $X_v = I_m$. For each dataset, we randomly sample instances from the training set to form a validation set with the same number of instances as the test set. We repeat the experiments 5 times and report the mean of the results. On all datasets, we adopt a single-graph-convolution-layer version of CGMC to compare with the baselines. We learn CGMC according to Algorithm 1. The optimization algorithm we use is Adadelta (Zeiler, 2012), with the maximum number of iterations fixed in advance.
The regularization parameter $\lambda$ and the weight $\alpha$ are selected from candidate sets, using the validation set to tune these two hyper-parameters. The remaining hyper-parameters are set to the same values for all datasets. For ML-100K, Douban, and Flixster, the output dimensionality $d$ of the fully connected layer is set to a common value; $d$ is set larger for YahooMusic because its rating scale is relatively large. We do not tune these hyper-parameters, although fine-tuning them with the validation set might further improve the performance of CGMC. For the baselines, we adopt the hyper-parameters that achieve the best results. As in RMGCNN (Monti et al., 2017), root mean square error (RMSE) is adopted as the evaluation metric: the smaller the RMSE, the better the performance.
Baselines For ML-100K, we compare CGMC with baselines that utilize user/item features. User/item features are constructed in the same way as in (Rao et al., 2015), and the user/item graphs are constructed via $k$-nearest neighbors measured by the Euclidean distance between features. On this dataset, we compare CGMC with MC (Candès & Recht, 2012), IMC (Jain & Dhillon, 2013; Xu et al., 2013), GMC (Kalofolias et al., 2014), GRALS (Rao et al., 2015), and RMGCNN (Monti et al., 2017). MC learns the full matrix with nuclear norm regularization. IMC utilizes the features of users and items to formulate an inductive matrix model for approximating the target. GMC learns a full matrix that approximates the observed rating matrix, constrained by graph Laplacian regularization. GRALS learns the factorized matrices of the target by applying graph Laplacian regularization to the factorized matrices.
For Douban, Flixster, and YahooMusic, we compare CGMC with MC, GRALS, and RMGCNN. For MC, GRALS, and RMGCNN, we present results of both a min-max normalized version and a non-normalized version. The min-max normalized version re-scales the predictions to the range of the rating levels before each training iteration, just as in RMGCNN (Monti et al., 2017). We implement both the min-max normalized and the non-normalized versions of MC by ourselves. Because the results of the non-normalized version of GRALS have been reported in (Monti et al., 2017), we only implement the min-max normalized version of GRALS by ourselves (we adopt Adadelta to optimize it). For RMGCNN, the code of the min-max normalized version is directly from (Monti et al., 2017), and we use the code provided by (Monti et al., 2017) to implement the non-normalized version.
4.3 Results
The results on ML-100K are reported in Table 2, where the results of the baselines are directly copied from (Monti et al., 2017). Because the training/test data partition in this paper is exactly the same as that in (Monti et al., 2017), the comparison is fair. From Table 2, we can find that our CGMC outperforms all the other baselines, including graph regularization based methods and GDLG-based methods, and achieves the best performance.
| Method | RMSE |
| MC (Candès & Recht, 2012) | 0.973 |
| IMC (Jain & Dhillon, 2013; Xu et al., 2013) | 1.653 |
| GMC (Kalofolias et al., 2014) | 0.996 |
| GRALS (Rao et al., 2015) | 0.945 |
| RMGCNN (Monti et al., 2017) | 0.929 |
The results on Douban, Flixster, and YahooMusic are reported in Table 3. Once again, we can find that CGMC outperforms the other state-of-the-art baselines and achieves the best performance. Moreover, we can conclude from Table 3 that min-max normalization before each training iteration boosts the performance of the baselines. However, min-max normalization slows down training and does not scale well. Our CGMC achieves the best results under both settings.
| GRALS (Rao et al., 2015) | 0.8326/(0.7537) | 1.3126/(0.9722) | 1.2447/(0.9751) | 38.0423/(24.0744) |
| RMGCNN (Monti et al., 2017) | (1.1541)/0.8012 | (0.9700)/1.1788 | (2.8095)/0.9258 | (45.6049)/22.4149 |
4.4 Effect of Fully-Connected Layer
To demonstrate the effectiveness of the fully-connected layer proposed by us, we remove the fully-connected layer after the CGE. The CGMC variant without the fully-connected layer is denoted as CGMC-0, while CGMC denotes the full model with the fully-connected layer. We compare CGMC-0 with CGMC as the GCN grows from 1 layer to 4 layers. The results are shown in Table 5.
From Table 5, we can observe that with different numbers of layers for the GCN, the improvements of CGMC over CGMC-0 are significant. These results verify the effectiveness of the fully-connected layer in CGMC.
| CGMC-0/CGMC (1 layer) | 0.997/0.894 | 0.9580/0.7298 | 1.0868/0.8822 | 37.0064/19.3751 |
| CGMC-0/CGMC (2 layers) | 0.905/0.897 | 0.7421/0.7384 | 0.9024/0.8961 | 36.7636/19.5726 |
| CGMC-0/CGMC (3 layers) | 0.913/0.904 | 0.7570/0.7368 | 0.9099/0.8976 | 36.7769/19.7722 |
| CGMC-0/CGMC (4 layers) | 0.917/0.911 | 0.7653/0.7447 | 0.9211/0.9113 | 36.7714/19.7767 |
| GMC-GCN-kw-0 (1 layer) | 1.088 | 1.7088 | 1.5279 | 1.5805 | 34.6415 |
| GMC-GCN-kw-0 (2 layers) | 1.049 | 0.7548 | 0.9254 | 1.1116 | 34.4439 |
| GMC-GCN-kw-0 (3 layers) | 1.076 | 0.7728 | 0.9721 | 1.1581 | 34.4976 |
| GMC-GCN-kw-0 (4 layers) | 1.082 | 0.7785 | 1.0049 | 1.1568 | 34.4765 |
4.5 Effect of Weighted Policy in GCN
To demonstrate the effectiveness of the weighted policy in our newly designed GCN, we replace the GCN in CGMC with GCN-kw (Kipf & Welling, 2017). We consider two variants of the resulting model. GMC-GCN-kw denotes the variant of CGMC obtained by only replacing our GCN with GCN-kw, with all other parts fixed; it therefore still includes the fully connected layer proposed by us. GMC-GCN-kw-0 denotes the variant of GMC-GCN-kw without the fully connected layer.
The results are reported in Table 5. We can observe that CGMC performs better than GMC-GCN-kw on all datasets. The results show the effectiveness of adopting a weighted policy to control the contributions of self-connections and neighbors to the graph embeddings. The performance improvement of GMC-GCN-kw over GMC-GCN-kw-0 once again verifies the effectiveness of the fully connected layer proposed by us.
4.6 Sensitivity to Hyper-parameters
In CGMC, $\lambda$ and $\alpha$ are two important hyper-parameters. Here, we study the sensitivity of CGMC to these two hyper-parameters on Flixster and YahooMusic.
The results are presented in Figure 1. With respect to $\lambda$, we can see that CGMC behaves well over a wide range of values. As for $\alpha$, we observe that CGMC is somewhat sensitive to it, since the best choice depends on the quality of the graphs. By using the validation set, we can always find suitable hyper-parameters for CGMC in our experiments.
5 Conclusion
In this paper, we propose a novel geometric matrix completion method, called convolutional geometric matrix completion (CGMC), for recommender systems with relationship (link) graphs among users/items. To the best of our knowledge, CGMC is the first work to show that pure graph convolutional network (GCN) based methods can achieve state-of-the-art performance for GMC, as long as a proper GCN is designed and a fully connected layer is adopted for space compatibility. Experimental results on four real datasets show that CGMC can outperform other state-of-the-art baselines, including RMGCNN (Monti et al., 2017), which is a combination of GCN and RNN.
We believe that other techniques developed in graph signal processing can also be applied to matrix completion with graph information; this is left for future work.
References
- Adams et al. (2010) Adams, R. P., Dahl, G. E., and Murray, I. Incorporating side information in probabilistic matrix factorization with Gaussian processes. In Conference on Uncertainty in Artificial Intelligence, 2010.
- Agarwal & Chen (2009) Agarwal, D. and Chen, B.-C. Regression-based latent factor models. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
- Belkin & Niyogi (2001) Belkin, M. and Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Annual Conference on Neural Information Processing Systems, 2001.
- Belkin & Niyogi (2003) Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 2003.
- Breese et al. (1998) Breese, J., Heckerman, D., and Kadie, C. Empirical analysis of predictive algorithms for collaborative filtering. In Conference on Uncertainty in Artificial Intelligence, 1998.
- Bruna et al. (2014) Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations, 2014.
- Cai et al. (2011) Cai, D., He, X., Han, J., and Huang, T. S. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 2011.
- Cai et al. (2010) Cai, J.-F., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 2010.
- Candès & Recht (2009) Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
- Candès & Recht (2012) Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Communications of the ACM, 55(6), 2012.
- Chung (1997) Chung, F. R. K. Spectral Graph Theory. Number 92. American Mathematical Society, 1997.
- Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Annual Conference on Neural Information Processing Systems, 2016.
- Gori et al. (2005) Gori, M., Monfardini, G., and Scarselli, F. A new model for learning in graph domains. In IEEE International Joint Conference on Neural Networks, 2005.
- Hammond et al. (2011) Hammond, D. K., Vandergheynst, P., and Gribonval, R. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2), 2011.
- Henaff et al. (2015) Henaff, M., Bruna, J., and LeCun, Y. Deep convolutional networks on graph-structured data. arXiv, 2015.
- Hochreiter & Schmidhuber (1997) Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8), 1997.
- Jain & Dhillon (2013) Jain, P. and Dhillon, I. S. Provable inductive matrix completion. arXiv, 2013.
- Kalofolias et al. (2014) Kalofolias, V., Bresson, X., Bronstein, M. M., and Vandergheynst, P. Matrix completion on graphs. arXiv, 2014.
- Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
- Koren et al. (2009) Koren, Y., Bell, R., and Volinsky, C. Matrix factorization techniques for recommender systems. IEEE Computer, 2009.
- Li & Yeung (2009) Li, W.-J. and Yeung, D.-Y. Relation regularized matrix factorization. In International Joint Conference on Artificial Intelligence, 2009.
- Li et al. (2016) Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. In International Conference on Learning Representations, 2016.
- Ma et al. (2011) Ma, H., Zhou, D., Liu, C., Lyu, M. R., and King, I. Recommender systems with social regularization. In International Conference on Web Search and Web Data Mining, 2011.
- Menon et al. (2011) Menon, A. K., Chitrapura, K. P., Garg, S., Agarwal, D., and Kota, N. Response prediction using collaborative filtering with hierarchies and side-information. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011.
- Monti et al. (2017) Monti, F., Bronstein, M., and Bresson, X. Geometric matrix completion with recurrent multi-graph neural networks. In Annual Conference on Neural Information Processing Systems, 2017.
- Pazzani & Billsus (2007) Pazzani, M. and Billsus, D. Content-based recommendation systems. In The Adaptive Web, 2007.
- Porteous et al. (2010) Porteous, I., Asuncion, A. U., and Welling, M. Bayesian matrix factorization with side information and Dirichlet process mixtures. In Association for the Advancement of Artificial Intelligence, 2010.
- Rao et al. (2015) Rao, N., Yu, H.-F., Ravikumar, P., and Dhillon, I. S. Collaborative filtering with graph information: Consistency and scalable methods. In Annual Conference on Neural Information Processing Systems, 2015.
- Shuman et al. (2013) Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3), 2013.
- Sukhbaatar et al. (2016) Sukhbaatar, S., Szlam, A., and Fergus, R. Learning multiagent communication with backpropagation. In Annual Conference on Neural Information Processing Systems, 2016.
- Xu et al. (2013) Xu, M., Jin, R., and Zhou, Z.-H. Speedup matrix completion with side information: Application to multi-label learning. In Annual Conference on Neural Information Processing Systems, 2013.
- Zeiler (2012) Zeiler, M. D. Adadelta: an adaptive learning rate method. arXiv, 2012.