1 Introduction
Recommender systems (Koren et al., 2009) have been widely deployed in many applications, such as item recommendation on shopping websites and friend recommendation on social websites. There are two main kinds of methods for recommender systems: content-based filtering (Pazzani & Billsus, 2007) and collaborative filtering (Breese et al., 1998). Content-based filtering methods recommend new items that are most similar to users’ historical favorite items. Collaborative filtering methods use collective ratings to make new recommendations by exploiting similar rating patterns between users or items.
By regarding rows as users, columns as items and entries as ratings on items by users, the task of recommender systems can be formulated as a matrix completion (MC) problem (Candès & Recht, 2009, 2012). MC has attracted much attention in recent years. MC models aim to predict the missing entries in a matrix given a small subset of observed entries. Under the low-rank setting, Candès & Recht (2009, 2012) have proved that the matrix can be exactly recovered given a sufficiently large number of observed entries, although the underlying problem is NP-hard. One efficient solution for the MC problem is to adopt matrix factorization (MF) techniques (Monti et al., 2017).
In many real applications, besides the rating matrix which contains the ratings on items by users, other side information is also available. Typical side information includes attributes of users/items and the relationship (link) graphs between users/items. A few works have therefore incorporated the attributes of users/items to boost the performance of matrix completion models (Jain & Dhillon, 2013; Xu et al., 2013). Furthermore, geometric matrix completion (GMC) models (Li & Yeung, 2009; Kalofolias et al., 2014; Rao et al., 2015; Monti et al., 2017) have also been proposed for recommendation by integrating the relationship (link) graphs among users/items into matrix completion. For example, the methods in (Li & Yeung, 2009; Agarwal & Chen, 2009; Adams et al., 2010; Porteous et al., 2010; Ma et al., 2011; Cai et al., 2011; Menon et al., 2011; Kalofolias et al., 2014; Rao et al., 2015) encode the structural (geometric) information of graphs via graph Laplacian regularization (Belkin & Niyogi, 2001, 2003), which imposes smoothness priors on the latent factors (embeddings) of users/items. These graph regularization based methods have shown promising performance in real applications.
Recently, geometric deep learning techniques (Bruna et al., 2014; Gori et al., 2005; Li et al., 2016; Henaff et al., 2015; Sukhbaatar et al., 2016; Defferrard et al., 2016; Kipf & Welling, 2017) have been proposed to learn meaningful representations for geometrically structured data, such as graphs and manifolds. In particular, geometric deep learning on graphs (GDLG) (Defferrard et al., 2016; Monti et al., 2017) has been proposed to solve the GMC problem, showing better performance than existing GMC methods including graph regularization based methods. To the best of our knowledge, there exists only one GDLG method for GMC, which is called recurrent multi-graph convolutional neural network (RMGCNN) (Monti et al., 2017). Based on the spectral graph convolution framework (Defferrard et al., 2016), RMGCNN defines two-dimensional graph convolutional filters to process multi-graphs. The graph embeddings extracted by the two-dimensional graph convolutional filters are fed into a Long Short-Term Memory (LSTM) recurrent neural network (RNN) (Hochreiter & Schmidhuber, 1997) to perform a diffusion process, which is actually feature transformation. After that, the final embeddings are used for the matrix completion task. A factorized (matrix factorization) version, called separable RMGCNN (sRMGCNN), is also proposed in (Monti et al., 2017) for efficiency improvement. RMGCNN combines a graph convolutional network (GCN) and a recurrent neural network (RNN) for GMC. Experimental results in (Monti et al., 2017) show that both the GCN part and the RNN part improve the performance of matrix completion. However, matrix completion with pure GCN, named MGCNN in (Monti et al., 2017), is shown to be worse than RMGCNN in experiments.
In this paper, we propose a new GMC method, called convolutional geometric matrix completion (CGMC), for recommendation with graphs among users/items. CGMC is a pure GCN-based method. The contributions of CGMC are listed as follows:

In CGMC, a new graph convolutional network is designed by taking only the first two terms of the Chebyshev polynomials in spectral graph convolution (Defferrard et al., 2016) and adopting a weighted policy to control the contribution between self-connections and neighbors for graph embeddings.

Because the roles of users in the rating matrix and user graph are different, the latent factors (embeddings) to represent users for rating matrix and those for user graph should also have some difference, although the users are the same. Hence, in CGMC, a fully connected layer is added to the output of GCN to project the user graph embeddings to a compatible space for rating matrix. Similar operations are also performed for items.

CGMC integrates GCN and MC into a unified deep learning framework, in which the two components (GCN and MC) can give feedback to each other.

Experimental results on real datasets show that CGMC can outperform other state-of-the-art methods including RMGCNN. Hence, our work shows that, with a properly designed network architecture for graph convolution, a pure GCN-based method can also achieve the best performance.
2 Related Work
In this section, we introduce the related work of CGMC, including matrix completion (MC), geometric matrix completion (GMC), geometric deep learning on graphs (GDLG), and GDLG based GMC.
2.1 Matrix Completion
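Suppose $M \in \mathbb{R}^{m \times n}$ is a rating matrix, with $m$ being the number of users and $n$ being the number of items. We are given a subset of the entries $\{M_{ij} : (i,j) \in \Omega\}$, where $\Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$. The matrix completion problem aims to estimate the missing entries $\{M_{ij} : (i,j) \notin \Omega\}$. It is formulated as follows (Candès & Recht, 2009, 2012; Cai et al., 2010):
$$\min_{X} \|X\|_{*} \quad \text{s.t.} \quad P_{\Omega}(X) = P_{\Omega}(M), \tag{1}$$
where $\|X\|_{*}$ is the nuclear norm of the matrix $X$. $P_{\Omega}$ is the projection operator, where $[P_{\Omega}(X)]_{ij} = X_{ij}$ if $(i,j) \in \Omega$, else $[P_{\Omega}(X)]_{ij} = 0$.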
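One solution to the MC problem is to reformulate it as the following matrix factorization (MF) problem:
$$\min_{W,H} \frac{1}{2}\left\|P_{\Omega}(M - WH^{\top})\right\|_F^2 + \frac{\gamma}{2}\left(\|W\|_F^2 + \|H\|_F^2\right), \tag{2}$$
where $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{n \times r}$ are the latent factor representations for users and items, respectively.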
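The factorized objective can be illustrated with a minimal NumPy sketch. The helper names `project` and `mf_loss`, the squared-error data term, and the Frobenius penalty with weight `gamma` are assumptions made for illustration; they are not taken from the authors' code.

```python
import numpy as np

def project(X, mask):
    """Projection onto observed entries: keep X where mask is True, zero elsewhere."""
    return np.where(mask, X, 0.0)

def mf_loss(M, mask, W, H, gamma=0.1):
    """Squared error on observed ratings plus a Frobenius penalty on the factors."""
    residual = project(M - W @ H.T, mask)
    data_term = 0.5 * np.sum(residual ** 2)
    penalty = 0.5 * gamma * (np.sum(W ** 2) + np.sum(H ** 2))
    return data_term + penalty

# Toy example: a rank-1 rating matrix (3 users, 2 items) with one hidden entry.
W_true = np.array([[1.0], [2.0], [3.0]])
H_true = np.array([[1.0], [0.5]])
M = W_true @ H_true.T
mask = np.array([[True, True], [True, False], [True, True]])
loss_at_truth = mf_loss(M, mask, W_true, H_true, gamma=0.0)
```

With `gamma=0.0` the loss vanishes at the true factors, and the hidden entry never contributes to the data term.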
2.2 Geometric Matrix Completion
Geometric matrix completion (GMC) (Li & Yeung, 2009; Agarwal & Chen, 2009; Adams et al., 2010; Porteous et al., 2010; Ma et al., 2011; Cai et al., 2011; Menon et al., 2011; Kalofolias et al., 2014; Rao et al., 2015) has been developed to exploit the relationship (link) graph among users/items to assist the matrix completion process. One kind of GMC methods is to adopt graph Laplacian for regularization. GRALS (Rao et al., 2015) is one representative of this kind, which is formulated as follows:
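$$\min_{W,H} \frac{1}{2}\left\|P_{\Omega}(M - WH^{\top})\right\|_F^2 + \frac{\gamma}{2}\left(\mathrm{tr}(W^{\top} L_u W) + \mathrm{tr}(H^{\top} L_v H)\right), \tag{3}$$
where $L_u$ and $L_v$ are the normalized graph Laplacians of the user graph $G_u$ and the item graph $G_v$, respectively. $L_u = I - D_u^{-1/2} G_u D_u^{-1/2}$ and $D_u$ is a diagonal matrix with diagonal entry $[D_u]_{ii} = \sum_j [G_u]_{ij}$. $I$ is an identity matrix whose dimensionality depends on the context. $L_v$ can be similarly computed based on $G_v$.
2.3 Geometric Deep Learning on Graphs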
Recently, there have appeared a few works that attempt to perform geometric deep learning on graphs (GDLG) (Gori et al., 2005; Li et al., 2016; Sukhbaatar et al., 2016). In particular, inspired by spectral graph theory in graph signal processing (Hammond et al., 2011; Shuman et al., 2013), spectral graph convolution is proposed in (Bruna et al., 2014; Henaff et al., 2015).
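Suppose $x \in \mathbb{R}^{N}$ is a signal on a graph $G$ ($N$ is the number of nodes in the graph). Then the spectral graph convolution operator is defined as follows:
$$g_\theta \star x = U\left(\theta \odot (U^{\top} x)\right) = U\,\mathrm{diag}(\theta)\,U^{\top} x,$$
where $\odot$ is the Hadamard product, $L = U \Lambda U^{\top}$ is the normalized graph Laplacian of $G$, $U$ is the matrix of orthogonal eigenvectors of $L$, and $\Lambda$ is the diagonal matrix of eigenvalues of $L$. $\theta \in \mathbb{R}^{N}$ is the convolutional filter that we need to learn. For convenience, we denote $\mathrm{diag}(\theta)$ as $g_\theta(\Lambda)$.
The computational complexity of the above convolutional operation is high. Hence, Defferrard et al. (2016) proposed to approximate $g_\theta(\Lambda)$ with Chebyshev polynomials:
$$g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}),$$
where $\tilde{\Lambda} = \frac{2}{\lambda_{\max}}\Lambda - I$, $\lambda_{\max}$ denotes the largest eigenvalue of $L$, $T_0(z) = 1$, $T_1(z) = z$, and $T_k(z) = 2zT_{k-1}(z) - T_{k-2}(z)$. Then, we have:
$$g_\theta \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\,x, \tag{4}$$
which only costs $O(K|E|)$, where $|E|$ is the number of nonzero values of $L$. Here, $\tilde{L} = \frac{2}{\lambda_{\max}}L - I$. The above formulation is a combination of $L^0, L^1, \dots, L^K$, and $(L^k)_{ij} = 0$ if all paths between node $i$ and node $j$ have length greater than $k$ (Hammond et al., 2011). It is therefore localized and captures the information of $K$-hop neighbors. Defferrard et al. (2016) stacked the above localized convolution to construct a spectral graph convolutional network (GCN).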
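The Chebyshev approximation can be checked numerically. The sketch below is an illustration only: it uses dense NumPy matrices on a 4-node path graph, assumes $\lambda_{\max} = 2$, and the function names are hypothetical. A sparse implementation would be needed to obtain the cost that is linear in the number of edges.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix without isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Apply sum_k theta[k] * T_k(L_tilde) x via the three-term Chebyshev recurrence."""
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])
    t_prev, t_curr = x, L_tilde @ x            # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out

# Path graph 0-1-2-3 with a unit impulse on node 0 and a K = 2 filter.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
x = np.array([1.0, 0.0, 0.0, 0.0])
theta = np.array([0.5, -0.3, 0.2])
y_fast = chebyshev_filter(L, x, theta)

# Reference: the same filter applied in the spectral domain.
lam, U = np.linalg.eigh(L)
g = np.polynomial.chebyshev.chebval(lam - 1.0, theta)   # rescaled eigenvalues when lam_max = 2
y_spectral = U @ (g * (U.T @ x))
```

The two results agree, and the response at node 3 (three hops from node 0) is zero, which illustrates the $K$-hop locality of the filter.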
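A variant of the spectral graph convolutional network (GCN) is proposed in (Kipf & Welling, 2017). We call it GCNkw in this paper. GCNkw is a simplified version of the GCN in (Defferrard et al., 2016), obtained by assuming $K = 1$ and $\lambda_{\max} = 2$:
$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I)\,x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x, \tag{5}$$
where $A$ is the adjacency matrix of the graph and $D$ denotes the diagonal degree matrix of $A$. Then, by constraining $\theta = \theta_0 = -\theta_1$,
$$g_\theta \star x \approx \theta\left(I + D^{-1/2} A D^{-1/2}\right)x. \tag{6}$$
As the eigenvalues of $I + D^{-1/2} A D^{-1/2}$ are in the range $[0, 2]$, repeated application of such a filter can result in numerical instability. This can be remedied by a renormalization:
$$I + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \tag{7}$$
where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Here, we can see that self-connections and neighbors contribute equally to graph embeddings, which is not flexible enough.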
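A small numerical sketch of the GCNkw filter before and after renormalization, on a triangle graph. The helper name `sym_norm` is hypothetical and the dense NumPy computation is for illustration only.

```python
import numpy as np

def sym_norm(A):
    """D^{-1/2} A D^{-1/2} for a symmetric matrix with positive row sums."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)   # triangle graph
I = np.eye(3)

before = I + sym_norm(A)     # filter matrix of (6)
after = sym_norm(A + I)      # renormalized filter matrix of (7)

eig_before = np.linalg.eigvalsh(before)
eig_after = np.linalg.eigvalsh(after)
```

On this graph the largest eigenvalue drops from 2 to 1 after renormalization, and every entry of the renormalized matrix equals 1/3, so a node weights itself exactly as much as each neighbor.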
2.4 GDLG based GMC
To the best of our knowledge, RMGCNN (Monti et al., 2017) is the only work which has applied geometric deep learning on graphs (GDLG) to GMC. RMGCNN adopts the GCN of Defferrard et al. (2016) to extract graph embeddings for users and items, and then combines it with a recurrent neural network (RNN) to perform a diffusion process. The factorized version of RMGCNN (Monti et al., 2017) is shown as follows:
$$\min \; \left\|W^{(T)}\right\|_{G_u}^2 + \left\|H^{(T)}\right\|_{G_v}^2 + \frac{\mu}{2}\left\|P_{\Omega}\!\left(W^{(T)} {H^{(T)}}^{\top} - M\right)\right\|_F^2,$$
where $W^{(T)}$ and $H^{(T)}$ are the graph embeddings extracted by GCN and RNN for users and items respectively, $G_u$ and $G_v$ are graphs on users and items respectively, the superscript $(T)$ denotes the graph embedding iterates after $T$ iterations, and $\|\cdot\|_{G_u}^2$ and $\|\cdot\|_{G_v}^2$ represent graph Laplacian regularization.
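To make the graph Laplacian regularization concrete, here is a minimal sketch that assumes the regularizer is the Dirichlet form $\mathrm{tr}(W^{\top} L W)$ with the normalized Laplacian. That form, the function names, and the toy embeddings are assumptions for illustration rather than details given in the text.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix without isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def dirichlet_norm(W, L):
    """tr(W^T L W): small when the embeddings of linked nodes are similar."""
    return float(np.trace(W.T @ L @ W))

def pairwise_form(W, A):
    """Equivalent edge-wise form: 0.5 * sum_ij A_ij * ||w_i/sqrt(d_i) - w_j/sqrt(d_j)||^2."""
    Wn = W / np.sqrt(A.sum(axis=1))[:, None]
    total = 0.0
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            total += A[i, j] * np.sum((Wn[i] - Wn[j]) ** 2)
    return 0.5 * total

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)     # path graph 0-1-2
L = normalized_laplacian(A)
smooth = np.sqrt(A.sum(axis=1))[:, None] * np.ones((3, 2))        # proportional to sqrt(degree)
rough = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]])           # sign flips along every edge
```

The smooth embedding incurs (numerically) zero penalty, while the embedding that flips sign along every edge is penalized, which is the smoothness prior that the regularizer encodes.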
3 Convolutional GMC
In this section, we present the details of our new GDLG-based GMC method, called convolutional geometric matrix completion (CGMC). CGMC is a pure GCN-based method. CGMC shows that GMC with only GCN can outperform the GCN+RNN method RMGCNN and achieve state-of-the-art performance.
CGMC is formulated as follows. Firstly, a new GCN is proposed to extract graph embedding, which is called convolutional graph embedding (CGE) in this paper, for user/item representation. Then, a fully connected layer is added to the output of GCN to project user/item graph embeddings to a compatible space for rating matrix. After that, GCN and MC are integrated into a unified deep learning framework to get CGMC.
3.1 Convolutional Graph Embedding (CGE)
Here, we propose a new GCN to get the convolutional graph embedding (CGE) for graph node representation.
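By taking $K = 1$ in the spectral graph convolution of (4), we have:
$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I)\,x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x, \tag{8}$$
where we let $\lambda_{\max} = 2$ and $A \in \mathbb{R}^{N \times N}$ is the link matrix of the graph with $N$ nodes. Since $\theta_0, \theta_1$ are free parameters, and there are no constraints between the coefficients of $x$ and $D^{-1/2} A D^{-1/2} x$, we can let $\theta_1' = -\theta_1$. Then, we have
$$g_\theta \star x \approx \theta_0 x + \theta_1' D^{-1/2} A D^{-1/2} x. \tag{9}$$
Furthermore, $\theta_0, \theta_1'$ are still free parameters. We let $\theta_0 = \alpha\theta$, $\theta_1' = \beta\theta$, and get
$$g_\theta \star x \approx \theta\left(\alpha I + \beta D^{-1/2} A D^{-1/2}\right)x. \tag{10}$$
Here, $\alpha$ and $\beta$ can be explained as weights controlling the contribution between self-connections and neighbors.
In our GCN, we constrain $\alpha + \beta = 1$ and $\alpha, \beta \geq 0$. For convenience, we denote $\beta = 1 - \alpha$, and get
$$g_\theta \star x \approx \theta\left(\alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}\right)x. \tag{11}$$
The eigenvalues of $\alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}$ are in $[-1, 1]$, which can be easily verified according to Lemma 1.7 in (Chung, 1997). Hence, repeated application of the above filter will not result in numerical instability. Due to the flexibility of $\alpha$, we treat it as a hyperparameter and tune it on a validation set.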
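The stability claim can be checked numerically under one reading of the weighted policy: a filter matrix of the form $\alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}$ with $\alpha \in [0, 1]$. This form, the name `cge_propagation`, and the toy graph are assumptions made for this sketch.

```python
import numpy as np

def cge_propagation(A, alpha):
    """alpha * I + (1 - alpha) * D^{-1/2} A D^{-1/2}, with alpha in [0, 1]."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return alpha * np.eye(A.shape[0]) + (1.0 - alpha) * S

# Small connected graph: node 1 is linked to nodes 0, 2 and 3; nodes 2 and 3 are linked.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
spectra = {a: np.linalg.eigvalsh(cge_propagation(A, a)) for a in (0.0, 0.3, 0.7, 1.0)}

# Repeated application stays bounded: the spectral norm of the 50th power is at most 1.
power_norm = np.linalg.norm(np.linalg.matrix_power(cge_propagation(A, 0.3), 50), 2)
```

For every tested weight the spectrum stays inside $[-1, 1]$, so stacking many layers neither blows up nor requires the extra renormalization step used by GCNkw.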
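When the input signal is multi-dimensional, denoted by $X \in \mathbb{R}^{N \times C}$ with $N$ being the number of nodes and $C$ being the dimensionality, we can get the formulation of multi-dimensional graph convolution as follows. We use $x_i$ to denote the $i$-th column of $X$, which is the $i$-th input signal.
$$y_j = \sum_{i=1}^{C} \theta_{ij}\,\hat{A}\,x_i, \quad j = 1, \dots, F,$$
where $F$ is the dimensionality of the output signal, $\theta_{ij}$ is the filter parameter of the $j$-th output signal defined on the $i$-th input signal, and $\hat{A} = \alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}$. Then we can get
$$Y = \hat{A} X \Theta, \tag{12}$$
which transforms the node representation from $X \in \mathbb{R}^{N \times C}$ to $Y \in \mathbb{R}^{N \times F}$ through one-layer graph convolution with the convolution parameter $\Theta \in \mathbb{R}^{C \times F}$, whose $(i,j)$-th entry is $\theta_{ij}$.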
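By stacking the above formulation to multiple layers, we can get a deep model for CGE. This is formulated as follows:
$$Z^{(l)} = \sigma\!\left(\hat{A}\, Z^{(l-1)}\, \Theta^{(l)}\right), \quad Z^{(0)} = X, \tag{13}$$
where $Z^{(l)}$ is the output signal of the $l$-th layer, $\Theta^{(l)}$ is the convolution parameter of the $l$-th layer, and $\sigma(\cdot)$ is an activation function.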
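The stacked model can be sketched as a short forward pass. Everything specific here is an assumption for illustration: the weighted filter matrix $\alpha I + (1-\alpha) D^{-1/2} A D^{-1/2}$, `np.tanh` as a placeholder activation, identity (featureless) input, and the layer widths.

```python
import numpy as np

def propagation(A, alpha):
    """Weighted filter matrix alpha * I + (1 - alpha) * D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return alpha * np.eye(A.shape[0]) + (1.0 - alpha) * S

def cge_forward(A_hat, X, thetas, activation=np.tanh):
    """Stack graph convolution layers: Z <- activation(A_hat @ Z @ Theta)."""
    Z = X
    for Theta in thetas:
        Z = activation(A_hat @ Z @ Theta)
    return Z

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A_hat = propagation(A, alpha=0.5)
X = np.eye(4)                                        # featureless input: one indicator per node
thetas = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
Z = cge_forward(A_hat, X, thetas)                    # two-layer embedding, one row per node
```

With identity input, the first layer reduces to a row-wise mixing of the learned parameters by the filter matrix, so each node's embedding is a weighted combination of its own parameters and those of its neighbors.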
3.2 Model of CGMC
Our CGMC can also be used with the nuclear norm formulation in (1), by adopting techniques similar to those in RMGCNN (Monti et al., 2017). However, as pointed out by Monti et al. (2017), the nuclear norm formulation is time-consuming. Hence, in this paper, we adopt the MF formulation in (2) for our CGMC.
Suppose denotes the input user features, denotes the input item features, with and being the number of users and items respectively, and being the feature dimensionality for users and items respectively. If or is not available, we set or . and are user graph and item graph. Then the CGE for users and items can be generated by applying (13) to graph and graph :
(14) 
where and are diagonal degree matrices of and respectively, and are the output feature representation of the th layer, and , is an activation function, here we take , and are convolution parameters which play the same role as in (13).
Fully-Connected Layer after CGE. A specific user plays one role in the user graph and another role in the rating matrix. These two roles are different. Intuitively, the latent factors (embeddings) that represent these two roles of the user should also differ. Items have a similar property.
To capture the difference between these two roles, a fully connected layer is added to the output of GCN to project the CGE to a compatible space for rating matrix. The formulation is as follows:
(15) 
where and are the output user features and item features of the CGE with layers, and are parameters of the fully connected layer for user CGE and item CGE, .
This is one key difference between our method and other methods like RMGCNN. In our experiments, we will verify that this fully connected layer will improve the performance of CGE.
Objective Function. With the projection by the fully connected layer, CGMC is formulated as follows:
(16) 
where denotes and denotes , is norm regularization on parameters in CGMC:
(17) 
From (16), it is easy to see that CGMC seamlessly integrates GCN and MC into a unified deep learning framework, in which GCN and MC can give feedback to each other for performance improvement.
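A hedged sketch of how the projection and the rating loss can fit together. It assumes a bias-free linear fully connected layer, a squared-error loss on observed ratings, and an L2 penalty on the projection parameters only; none of these specifics, nor the names `Q_u`, `Q_v`, `cgmc_loss` and `grad_Q_u`, are confirmed by the text.

```python
import numpy as np

def masked_residual(M, mask, Z_u, Z_v, Q_u, Q_v):
    """Residual on observed ratings after projecting graph embeddings to the rating space."""
    W = Z_u @ Q_u                    # user factors (assumed linear projection)
    H = Z_v @ Q_v                    # item factors (assumed linear projection)
    return np.where(mask, M - W @ H.T, 0.0), H

def cgmc_loss(M, mask, Z_u, Z_v, Q_u, Q_v, gamma):
    R, _ = masked_residual(M, mask, Z_u, Z_v, Q_u, Q_v)
    reg = np.sum(Q_u ** 2) + np.sum(Q_v ** 2)
    return 0.5 * np.sum(R ** 2) + 0.5 * gamma * reg

def grad_Q_u(M, mask, Z_u, Z_v, Q_u, Q_v, gamma):
    """Analytic gradient of cgmc_loss with respect to the user projection Q_u."""
    R, H = masked_residual(M, mask, Z_u, Z_v, Q_u, Q_v)
    return -Z_u.T @ R @ H + gamma * Q_u

rng = np.random.default_rng(1)
m, n, d, r = 4, 3, 5, 2
M = rng.integers(1, 6, size=(m, n)).astype(float)    # toy ratings in 1..5
mask = rng.random((m, n)) < 0.7                       # toy observation pattern
Z_u, Z_v = rng.normal(size=(m, d)), rng.normal(size=(n, d))
Q_u, Q_v = rng.normal(size=(d, r)), rng.normal(size=(d, r))
```

The gradient of the rating loss passes through the projection into the graph embeddings, which is the sense in which the two components can give feedback to each other during joint training.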
3.3 Learning
We adopt alternating minimization scheme to alternately optimize the parameters and . Adadelta (Zeiler, 2012) is adopted as our optimization algorithm.
In our training process, we have tried two different kinds of mini-batch sampling policies: user/item sampling and rating sampling. User/item sampling means that each user/item is randomly sampled with probability $p$, and all ratings of the sampled user/item are kept for training. Rating sampling means that each rating is randomly sampled for training with probability $p$. In our experiments, these two kinds of policies behave similarly. We adopt the user/item sampling policy in our following experiments.
For each training iteration, we first perform the user sampling policy to get a mask $S$. $S$ is a diagonal matrix, with $S_{ii}$ being 1 with probability $p$ and being 0 with probability $1 - p$. Then the gradients of the user-side parameters can be computed as follows:
(18) 
where denotes elementwise square of matrix . Gradients of can also be derived similar to , which are omitted here.
Based on the derived gradients, we adopt back propagation (BP) to learn the parameters of CGMC. The learning process is summarized in Algorithm 1, where is the learning rate.
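The user sampling policy can be sketched as follows. The names `sample_user_mask` and `sampled_residual`, and the use of a dense diagonal matrix, are illustrative assumptions; a practical implementation would more likely index the sampled rows directly.

```python
import numpy as np

def sample_user_mask(m, p, rng):
    """Diagonal 0/1 matrix: each of the m users is kept independently with probability p."""
    keep = rng.random(m) < p
    return np.diag(keep.astype(float))

def sampled_residual(M, mask, W, H, S):
    """Residual on observed ratings, restricted to the rows of the sampled users."""
    return S @ np.where(mask, M - W @ H.T, 0.0)

rng = np.random.default_rng(0)
M = np.array([[5.0, 3.0], [4.0, 1.0], [2.0, 2.0]])
mask = np.array([[True, False], [True, True], [False, True]])
W, H = np.zeros((3, 2)), np.zeros((2, 2))
S = sample_user_mask(3, 0.5, rng)
R = sampled_residual(M, mask, W, H, S)
```

All ratings of a sampled user are kept, while every row belonging to an unsampled user is zeroed out and therefore contributes nothing to that iteration's gradient.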
3.4 Comparison to Related Work
The most related work to our CGMC is RMGCNN (sRMGCNN) (Monti et al., 2017) and GCNkw in (Kipf & Welling, 2017). Here we discuss the difference between them and our CGMC.
As mentioned above, sRMGCNN is a factorized (MF) version of RMGCNN. Because we only focus on the factorized version in this paper due to its efficiency, RMGCNN in this paper refers to sRMGCNN unless otherwise stated. CGMC is different from RMGCNN in the following aspects. Firstly, CGMC adopts a different GCN to extract graph embeddings, and the newly designed GCN in CGMC is better than that in RMGCNN which will be verified in experiments. Secondly, RMGCNN adopts both GCN and RNN for GMC, while our CGMC adopts only GCN without RNN. Thirdly, a fully connected layer is introduced in our CGMC for space compatibility.
CGMC is different from GCNkw in the following aspects. Firstly, GCNkw is proposed for semi-supervised learning, and it has not been used for MC. Secondly, the GCN in CGMC adopts a weighted policy to control the contribution between self-connections and neighbors for graph embedding, while the self-connections and neighbors in GCNkw contribute equally to the graph embedding. Hence, the GCN in CGMC is more flexible than GCNkw. Thirdly, the GCN in CGMC does not suffer from numerical instability, while GCNkw (Kipf & Welling, 2017) has a numerical instability problem if no further operation is performed. Although GCNkw is not proposed for GMC, we adapt it for GMC in this paper and find that our CGMC achieves better performance than the GCNkw-based method in our experiments.
4 Experiment
We evaluate the proposed model CGMC and other baselines on collaborative filtering datasets. Our implementation is based on PyTorch and runs on a server with an NVIDIA Titan Xp GPU. PyTorch is only used to call GPU interfaces. The gradient computation and BP learning procedure are implemented by ourselves rather than by calling the automatic differentiation (autograd) interface in PyTorch.
4.1 Datasets
As in RMGCNN (Monti et al., 2017), we evaluate CGMC and other baselines on four real datasets: MovieLens-100K (ML100K; https://grouplens.org/datasets/movielens/), Douban, Flixster, and YahooMusic. For fair comparison, the dataset size and training/test data partition are exactly the same as those in RMGCNN (Monti et al., 2017). In particular, the latter three datasets are subsets of Douban, Flixster, and YahooMusic that are preprocessed and provided by Monti et al. (2017) at https://github.com/fmonti/mgcnn. Statistics of the datasets are presented in Table 1.
Table 1: Statistics of datasets.
Dataset  #Users  #Items  Graphs  #Ratings  Density  Rating levels
ML100K  943  1682  Users/Items  100,000  0.0630
Douban  3000  3000  Users  136,891  0.0152
Flixster  3000  3000  Users/Items  26,173  0.0029
YahooMusic  3000  3000  Users  5,335  0.0006
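The density column can be reproduced from the other columns as the number of ratings divided by the number of user-item pairs. A short check (the dictionary and function names are only for illustration):

```python
# (users, items, ratings) as listed in the dataset statistics.
datasets = {
    "ML100K": (943, 1682, 100_000),
    "Douban": (3000, 3000, 136_891),
    "Flixster": (3000, 3000, 26_173),
    "YahooMusic": (3000, 3000, 5_335),
}

def density(users, items, ratings):
    """Fraction of the user-item matrix that is observed."""
    return ratings / (users * items)

densities = {name: round(density(*stats), 4) for name, stats in datasets.items()}
```

The rounded values match the table: 0.0630, 0.0152, 0.0029 and 0.0006.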
4.2 Settings and Baselines
Settings As in RMGCNN (Monti et al., 2017), the graph information is constructed from user/item features. Therefore, we implement featureless version of CGMC in our experiments, where we set . For each dataset, we randomly sample instances from training set as validation set that has the same number as test set. We repeat the experiments 5 times and report the mean of results. On all the datasets, we adopt a version of single graph convolution layer for CGMC to compare with baselines. We learn CGMC according to Algorithm 1. The optimization algorithm we use is Adadelta (Zeiler, 2012) and the maximum number of iterations is set to be .
The regularization parameter is selected from . is selected from . We use the validation set to tune these two hyperparameters. For all datasets, we set , , . For ML100K, Douban, and Flixster, the output dimensionality of the fully connected layer . for YahooMusic because its rating level is relatively large. We do not tune these hyperparameters, although finetuning with validation set might further improve the performance of CGMC. For baselines, we adopt the hyperparameters that achieve the best results. As in RMGCNN (Monti et al., 2017), root mean square error (RMSE) is adopted as metric for evaluation. The smaller the RMSE is, the better the performance will be.
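For reference, the evaluation metric can be written in a few lines. This is a generic RMSE over paired predicted and true test ratings; the function name and the list-based interface are illustrative choices.

```python
import math

def rmse(predictions, targets):
    """Root mean square error over paired predicted and true ratings."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared_error / len(targets))

perfect = rmse([3.0, 4.0], [3.0, 4.0])
off_by_two = rmse([1.0, 5.0], [3.0, 3.0])
```

Only held-out test ratings enter the sum, so the metric measures how well the missing entries are predicted rather than how well the observed ones are fitted.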
Baselines. For ML100K, we compare CGMC with baselines that utilize information of user/item features. User/item features are constructed in the same way as in (Rao et al., 2015), and the user/item graphs are constructed via nearest neighbors measured by the Euclidean distance between features. On this dataset, we compare CGMC with MC (Candès & Recht, 2012), IMC (Jain & Dhillon, 2013; Xu et al., 2013), GMC (Kalofolias et al., 2014), GRALS (Rao et al., 2015), and RMGCNN (Monti et al., 2017). MC learns the full matrix with a nuclear norm regularization. IMC utilizes the features of users and items to formulate an inductive matrix model for approximating the target. GMC learns a full matrix that approximates the observed rating matrix and constrains the full matrix by applying graph Laplacian regularization to it. GRALS learns the factorized matrices of the target by applying graph Laplacian regularization to the factorized matrices.
For Douban, Flixster, and YahooMusic, we compare CGMC with MC, GRALS, and RMGCNN. For MC, GRALS and RMGCNN, we present results of both a min-max normalized version and a non-normalized version. The min-max normalized version rescales the predictions to the range of the rating levels before each training iteration, just as in RMGCNN (Monti et al., 2017). We implement both the min-max normalized version and the non-normalized version of MC by ourselves. Because the results of the non-normalized version of GRALS have been reported in (Monti et al., 2017), we only implement the min-max normalized version of GRALS by ourselves, optimizing it with Adadelta. For RMGCNN, the code of the min-max normalized version is taken directly from (Monti et al., 2017), and we use the code provided by Monti et al. (2017) to implement the non-normalized version.
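The min-max normalization can be sketched as an affine rescaling of the current predictions onto the rating range. The exact rescaling used in the RMGCNN code is not given in the text, so the formula, the function name and the handling of constant predictions below are assumptions.

```python
def minmax_rescale(preds, r_min, r_max):
    """Affinely map predictions so that their minimum hits r_min and their maximum hits r_max."""
    lo, hi = min(preds), max(preds)
    if hi == lo:
        # Constant predictions carry no ordering; place them at the middle of the range.
        return [0.5 * (r_min + r_max)] * len(preds)
    scale = (r_max - r_min) / (hi - lo)
    return [r_min + (p - lo) * scale for p in preds]

rescaled = minmax_rescale([-1.0, 0.0, 1.0], 1.0, 5.0)
```

Because the minimum and maximum are taken over all predictions, this step touches the full prediction matrix at every iteration, which is consistent with the observation that it slows training and scales poorly.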
4.3 Result
The results on ML100K are reported in Table 2, where the results of the baselines are directly copied from (Monti et al., 2017). Because the training/test data partition of this paper is exactly the same as that in (Monti et al., 2017), the comparison is fair. From Table 2, we can see that CGMC outperforms all the other baselines, including graph regularization methods and GDLG-based methods, and achieves the best performance.
Method  RMSE 

Global Mean  1.154 
User Mean  1.063 
Movie Mean  1.033 
MC (Candès & Recht, 2012)  0.973 
IMC (Jain & Dhillon, 2013; Xu et al., 2013)  1.653 
GMC (Kalofolias et al., 2014)  0.996 
GRALS (Rao et al., 2015)  0.945 
RMGCNN (Monti et al., 2017)  0.929 
CGMC  0.894 
The results on Douban, Flixster and YahooMusic are reported in Table 3. Once again, CGMC outperforms the other state-of-the-art baselines and achieves the best performance. Moreover, we can conclude from Table 3 that min-max normalization before each training iteration boosts the performance of the baselines. However, min-max normalization slows down training and does not scale well. Our CGMC achieves the best results under both settings.
Method  Douban  Flixster  FlixsterU  YahooMusic 

MC  (0.8476)/(0.7544)  (1.5338)/(0.9709)  (1.5338)/(0.9709)  (52.0102)/(24.1944) 
GRALS (Rao et al., 2015)  0.8326/(0.7537)  1.3126/(0.9722)  1.2447/(0.9751)  38.0423/(24.0744) 
RMGCNN (Monti et al., 2017)  (1.1541)/0.8012  (0.9700)/1.1788  (2.8095)/0.9258  (45.6049)/22.4149 
CGMC  0.7298/0.7308  0.8822/0.8853  0.9006/0.8876  19.3751/20.0032 
4.4 Effect of FullyConnected Layer
To demonstrate the effectiveness of the fully connected layer that we add after the GCN, we remove the fully connected layer after CGE. The CGMC variant without the fully connected layer is denoted as CGMC0, and CGMC is the variant with the fully connected layer. We compare CGMC0 to CGMC as the GCN grows from 1 layer to 4 layers. The results are shown in Table 4.
From Table 4, we can observe that, for every number of GCN layers, CGMC improves over CGMC0. These results verify the effectiveness of the fully connected layer in CGMC.
Table 4: RMSE of CGMC0 (without the fully connected layer) and CGMC (with the fully connected layer) for different numbers of GCN layers.
Method  ML100K  Douban  Flixster  YahooMusic
CGMC0/CGMC (1 layer)  0.997/0.894  0.9580/0.7298  1.0868/0.8822  37.0064/19.3751
CGMC0/CGMC (2 layers)  0.905/0.897  0.7421/0.7384  0.9024/0.8961  36.7636/19.5726
CGMC0/CGMC (3 layers)  0.913/0.904  0.7570/0.7368  0.9099/0.8976  36.7769/19.7722
CGMC0/CGMC (4 layers)  0.917/0.911  0.7653/0.7447  0.9211/0.9113  36.7714/19.7767

Table 5: RMSE of GMCGCNkw0, GMCGCNkw and CGMC.
Method  ML100K  Douban  Flixster  FlixsterU  YahooMusic
GMCGCNkw0 (1 layer)  1.088  1.7088  1.5279  1.5805  34.6415
GMCGCNkw0 (2 layers)  1.049  0.7548  0.9254  1.1116  34.4439
GMCGCNkw0 (3 layers)  1.076  0.7728  0.9721  1.1581  34.4976
GMCGCNkw0 (4 layers)  1.082  0.7785  1.0049  1.1568  34.4765
GMCGCNkw  1.010  0.7372  0.8886  0.9368  19.8627
CGMC  0.894  0.7298  0.8822  0.9006  19.3751
4.5 Effect of Weighted Policy in GCN
To demonstrate the effectiveness of the weighted policy proposed in our newly designed GCN, we replace our GCN in CGMC by the GCNkw (Kipf & Welling, 2017). The resulting model is denoted as GMCGCNkw. We design two variants of GMCGCNkw. GMCGCNkw denotes the variant of our CGMC by only replacing our GCN by GCNkw, with all other parts fixed. It means that GMCGCNkw also includes a fully connected layer which is proposed by us. GMCGCNkw0 denotes a variant of GMCGCNkw without the fully connected layer.
The results are reported in Table 5. We can observe that CGMC performs better than GMCGCNkw on all datasets. The results show the effectiveness of adopting weighted policy to control the contribution between selfconnections and neighbors for graph embeddings. Compared with GMCGCNkw0, the performance improvement of GMCGCNkw once again verifies the effectiveness of the fully connected layer proposed by us.
4.6 Sensitivity to Hyperparameters
In CGMC, and are two important hyperparameters. Here, we study the sensitivity of these two hyperparameters on Flixster and YahooMusic.
The results are presented in Figure 1. With respect to , we can see that CGMC behaves well in a wide range of . As for , we observe that CGMC is a little sensitive to , which depends on the quality of graphs. By using the validation techniques, we can always find a suitable hyperparameter for CGMC in our experiments.
5 Conclusion
In this paper, we propose a novel geometric matrix completion method, called convolutional geometric matrix completion (CGMC), for recommender systems with relationship (link) graphs among users/items. To the best of our knowledge, CGMC is the first work to show that pure graph convolutional network (GCN) based methods can achieve state-of-the-art performance for GMC, as long as a proper GCN is designed and a fully connected layer is adopted for space compatibility. Experimental results on four real datasets show that CGMC can outperform other state-of-the-art baselines, including RMGCNN (Monti et al., 2017), which is a combination of GCN and RNN.
We believe that other techniques developed in graph signal processing can also be applied in matrix completion with graph information and it is left for future work.
References

Adams, R. P., Dahl, G. E., and Murray, I. Incorporating side information in probabilistic matrix factorization with Gaussian processes. In Conference on Uncertainty in Artificial Intelligence, 2010.
Agarwal, D. and Chen, B.-C. Regression-based latent factor models. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
Belkin, M. and Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Annual Conference on Neural Information Processing Systems, 2001.
Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 2003.
Breese, J., Heckerman, D., and Kadie, C. Empirical analysis of predictive algorithms for collaborative filtering. In Conference on Uncertainty in Artificial Intelligence, 1998.
Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations, 2014.
Cai, D., He, X., Han, J., and Huang, T. S. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 2011.
Cai, J.-F., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 2010.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Communications of the ACM, 55(6), 2012.
Chung, F. R. K. Spectral graph theory. Number 92. American Mathematical Society, 1997.
Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Annual Conference on Neural Information Processing Systems, 2016.
Gori, M., Monfardini, G., and Scarselli, F. A new model for learning in graph domains. In IEEE International Joint Conference on Neural Networks, 2005.
Hammond, D. K., Vandergheynst, P., and Gribonval, R. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2), 2011.
Henaff, M., Bruna, J., and LeCun, Y. Deep convolutional networks on graph-structured data. arXiv, 2015.
Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8), 1997.
Jain, P. and Dhillon, I. S. Provable inductive matrix completion. arXiv, 2013.
Kalofolias, V., Bresson, X., Bronstein, M. M., and Vandergheynst, P. Matrix completion on graphs. arXiv, 2014.
Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
Koren, Y., Bell, R., and Volinsky, C. Matrix factorization techniques for recommender systems. IEEE Computer, 2009.
Li, W.-J. and Yeung, D.-Y. Relation regularized matrix factorization. In International Joint Conference on Artificial Intelligence, 2009.
Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. In International Conference on Learning Representations, 2016.
Ma, H., Zhou, D., Liu, C., Lyu, M. R., and King, I. Recommender systems with social regularization. In International Conference on Web Search and Web Data Mining, 2011.
Menon, A. K., Chitrapura, K. P., Garg, S., Agarwal, D., and Kota, N. Response prediction using collaborative filtering with hierarchies and side-information. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011.
Monti, F., Bronstein, M., and Bresson, X. Geometric matrix completion with recurrent multi-graph neural networks. In Annual Conference on Neural Information Processing Systems, 2017.
Pazzani, M. and Billsus, D. Content-based recommendation systems. In The Adaptive Web, 2007.
Porteous, I., Asuncion, A. U., and Welling, M. Bayesian matrix factorization with side information and Dirichlet process mixtures. In Association for the Advancement of Artificial Intelligence, 2010.
Rao, N., Yu, H.-F., Ravikumar, P., and Dhillon, I. S. Collaborative filtering with graph information: Consistency and scalable methods. In Annual Conference on Neural Information Processing Systems, 2015.
Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3), 2013.
Sukhbaatar, S., Szlam, A., and Fergus, R. Learning multiagent communication with backpropagation. In Annual Conference on Neural Information Processing Systems, 2016.
Xu, M., Jin, R., and Zhou, Z.-H. Speedup matrix completion with side information: Application to multi-label learning. In Annual Conference on Neural Information Processing Systems, 2013.
Zeiler, M. D. Adadelta: an adaptive learning rate method. arXiv, 2012.