Introduction
Matrix completion (MC) is a fundamental methodology for addressing many practical machine learning problems
[2]. A typical application is recommender systems, where one tackles a user-item interaction matrix whose entries, which stand for interactions of users with items (ratings or click behaviors), are partially observed. The goal of MC is to predict the missing entries (unobserved or future potential interactions) in the matrix based on the observed ones. Existing methods are generally collaborative-filtering (CF)-based or feature-based. The former assumes no side information (features) other than the interaction matrix and often solves the problem by matrix factorization (MF), which learns latent factors (embeddings) for users and items and uses the interaction of the two factors to predict ratings [10, 23, 27, 38]. By contrast, the latter makes use of informative features (such as user occupation, movie genre, etc.) as input for prediction [33, 30, 3]. Existing works have shown great power in warm-start settings where users have many observed interactions (as training examples) [1, 37]. However, in practical scenarios, recommender systems are supposed to interact with an open world and make decisions for users with a variety of historical interaction patterns (from zero to hundreds of interactions), which requires the model to simultaneously handle warm-start, few-shot (sparsity) and zero-shot (cold-start) recommendation.^1 Indeed, data sparsity and cold-start problems emerge as two major bottlenecks for MC performance in practice, and an effective treatment would bring significant practical impact and economic benefits [21, 1].

^1 Cold-start recommendation is defined differently across the literature. For CF-based methods (resp. feature-based methods), cold-start users often mean users with few (resp. no) historical interactions. In this paper, we use few-shot and zero-shot recommendation to distinguish the two cases.
For users with few or no observed interactions, it is hard to accurately capture the user's preferences from such insufficient historical information, in which case model prediction tends to suffer. Some research works have attempted to tackle the cold-start problem from different perspectives, e.g., by incorporating extra information (like social networks [14, 13], user reviews [4], initial surveys [39], etc.) or learning a transferable model that suits few-shot or zero-shot recommendation [5, 19, 20, 11]. However, most of them rely on high-quality features and fail to work when those are inaccessible. The data sparsity and cold-start issues are particularly challenging for CF-based methods. Some early studies attempted to add regularization constraints to matrix factorization for learning more generalizable preference embeddings [27, 28, 18]. However, those bilinear models have limited expressive power when dealing with complicated user-item interactions. Many recent works extend MF with neural networks
[9, 32] and graph neural networks [17, 29, 34], and they have achieved state-of-the-art results on many real-world datasets. Nonetheless, when users have insufficient historical interactions, the performance of such deep models degrades dramatically [7, 31]. In fact, most CF-based models assume transductive user embeddings (i.e., $d$-dimensional vectors) that need to be learned from observed interactions. Given few interactions, such transductive embeddings would be highly underdetermined. Recent studies
[8, 36] propose inductive models for the MC problem that propagate preferences among neighboring users in the user-item bipartite graph, enabling CF-based methods to tackle unseen users (with historical interactions not used in training). However, since they directly use observed interactions as links in the bipartite graph, the message passing makes little difference for users with few interactions and may, in fact, interfere with the learning for users with many interactions (in which case inductive learning even performs worse than transductive embeddings, as reported in [36]).

[Figure 1: Model overview (model.png).]

To address the limitations discussed above, we leverage a novel set of ideas: 1) we first learn transductive embeddings for users with many interactions and then learn inductive embeddings for the remaining users utilizing the former; 2) we also estimate the underlying relations among users and consider message passing through a hidden dense graph instead of solely relying on the sparse graph dictated by observed interactions. These two ideas constitute our new inductive relational matrix completion method (Fig. 1), which can fundamentally address data sparsity and cold-start issues in matrix completion without side information. To this end, we partition users into two sets: support users with many observed interactions and query users with few observed interactions. We first learn transductive embeddings for support users using their interactions (as in CF-based methods). Then we devise an inductive relation inference model that estimates the underlying relations between support users and query users based on their behavioral patterns in historical interactions. The relational model allows us to inductively generalize the preference embeddings of support users to those of query users, and to flexibly handle users with few or even no interactions via a hidden dense graph. We summarize our main contributions as follows.
We devise a novel matrix completion framework that learns transductive embeddings for a dense submatrix and inductive embeddings for a sparse submatrix, which can address data sparsity and cold-start issues in MC.

We justify our design by rigorously showing that a general version of our model can minimize the reconstruction loss on query users to the same level as matrix factorization under mild conditions, which means that our inductive model does not sacrifice any model capacity. Moreover, we prove a generalization error bound on query users that becomes tighter with fewer support users and more observed interactions of query users.

We compare our model with several state-of-the-art methods (including the GNN-based transductive model [29] and inductive model [36]) and achieve great improvements on MAE and AUC for warm-start/few-shot recommendation. We also test our model with side information, which enables us to consider zero-shot recommendation. The results show that our model significantly outperforms the meta-learning model [11] and the attributed GNN model [20].
Background
In this section, we introduce some background related to our work in order to make the paper self-contained. Matrix completion deals with a user-item interaction matrix $\mathbf{R} \in \mathbb{R}^{M \times N}$, where $M$ and $N$ are the numbers of users and items, respectively. For implicit interaction, $r_{ui}$ is a binary entry which denotes whether user $u$ rated (or clicked on, reviewed, liked, purchased, etc.) item $i$ or not. For explicit interaction, $r_{ui}$ records the rating of user $u$ on item $i$. The entries of $\mathbf{R}$ are partially observed, and the goal is to estimate the missing values in the matrix. Existing methods for MC are generally divided into CF-based methods and feature-based methods. CF-based methods often cast the problem as matrix factorization (MF), where user $u$ (resp. item $i$) corresponds to a $d$-dimensional latent factor (embedding) $\mathbf{p}_u$ (resp. $\mathbf{q}_i$), which can be interpreted as a representation of the user's preference (resp. the item's attributes). MF then considers a prediction model $\hat{r}_{ui} = f(\mathbf{p}_u, \mathbf{q}_i)$, where $f$ can be a simple dot product or a neural network. The CF model does not require any side information other than the interaction matrix, but the preference embedding is transductive: it needs to be learned from training interactions and cannot handle unseen users without retraining the whole model. By contrast, feature-based methods can achieve inductive representation by using extra side information, denoted by $\mathbf{x}_u$ (user $u$'s features) and $\mathbf{y}_i$ (item $i$'s features), and target a prediction model $\hat{r}_{ui} = f(\mathbf{x}_u, \mathbf{y}_i)$. Both methods achieve desirable performance in warm-start settings where users have a number of observed interactions as training examples. However, when handling users with few observed interactions, model performance degrades dramatically and, even worse, some methods fail to work at all. Recently, quite a few works have attempted to address these issues from different perspectives.
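As a minimal illustration of the CF-based formulation above (dot-product $f$; all sizes and names here are hypothetical, not taken from our experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension (illustrative choice)
M, N = 5, 7    # numbers of users and items

# Transductive latent factors: one d-dimensional embedding per user / item.
P = rng.normal(size=(M, d))   # user preference embeddings
Q = rng.normal(size=(N, d))   # item attribute embeddings

def predict(u, i):
    """Dot-product CF predictor: estimated rating of user u on item i."""
    return float(P[u] @ Q[i])

# The full reconstructed matrix is simply P @ Q^T.
R_hat = P @ Q.T
assert np.isclose(R_hat[2, 3], predict(2, 3))
```

This also makes the transductivity concrete: a new user has no row in `P`, so `predict` cannot be evaluated without retraining.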
On the feature-based side, [11] and [12] propose to use meta-learning and zero-shot learning techniques, respectively, to learn a transferable model that can adapt to new cold-start users. However, they rely heavily on high-quality features to obtain domain-invariant confounders and a transferable latent space, respectively. Moreover, [20] harnesses user features to compute user-user similarities and constructs a graph on which GNNs aggregate neighborhood information. In practice, such side information may be inaccessible due to privacy issues, which limits the application of feature-based methods. On the CF-based side, some early studies attempt to add regularization constraints to the original matrix factorization model in order to improve the generalization ability of the latent factors given by MF. Common regularization constraints include low rank [27], low trace norm [28] and non-negativity [10, 18]. Some recent studies [17, 29, 34] extend traditional MF with GNN architectures (or GNN-like operations) and convert the problem into a link prediction problem in the user-item bipartite graph. The GNNs allow message passing among neighboring users and propagate user embeddings through edges. [8, 36] leverage the message passing idea of GNNs and propose inductive matrix completion models that free CF-based methods from transductive embeddings and manage to deal with unseen users during testing. However, in existing GNN-based models, message passing is only conducted through edges in the bipartite graph of user-item interactions. For users with few interactions, the propagated information is inadequate since their neighboring users are rare. Also, for users with sufficient interactions, the message passing from neighbors may not be consistent with the user's inherent behavior patterns contained in the observed interactions. Indeed, [36] observes that for users with sufficient historical information, the inductive MC method performs worse than transductive models.
We believe that the power of transductive learning and inductive learning can be better exploited to simultaneously handle users with distinct quantities of historical information. In this paper, we propose a new framework that unifies transductive and inductive representation learning in matrix completion. There are also interesting works that leverage extra information from other domains (such as social networks [13, 6], item content information [4], cross-domain recommendation [26], etc.) to alleviate data sparsity and cold-start issues in MC. They are orthogonal to our paper. We focus on matrix completion without side information in our model formulation. Our model can be easily extended to cases where side features are available, as discussed in our experiments and the Appendix.
Methodology
We now introduce our inductive relational matrix completion (IRMC). As discussed in Section 2, transductive learning can achieve desirable performance when users have sufficient observed information, while inductive learning can address new users by propagating information from users to their neighbors. Based on this, we take a step further: why not first learn transductive embeddings for users with sufficient interactions and then compute inductive embeddings for the other users based on the former? To this end, we partition users into two sets: support users (denoted by $\mathcal{U}_s$), whose number of observed interactions exceeds a threshold $T$, and query users (denoted by $\mathcal{U}_q$), whose number of observed interactions is less than $T$. Assume $|\mathcal{U}_s| = M_s$ and $|\mathcal{U}_q| = M_q$. The interaction matrix is accordingly divided into two parts: $\mathbf{R}_s$ (given by $\mathcal{U}_s$) and $\mathbf{R}_q$ (given by $\mathcal{U}_q$). (One can also consider selecting support users as an optimization problem, similar to determining landmark points [22] or sample selection [35]; we leave this for future work.) We use $\mathbf{R}_s$ to train a transductive CF-based model $\hat{r}_{ui} = f_\theta(\mathbf{p}_u, \mathbf{q}_i)$, where $\mathbf{p}_u$ denotes the preference embedding of user $u$ in $\mathcal{U}_s$, $\mathbf{q}_i$ denotes the attribute embedding of item $i$, and $f_\theta$ can be a simple dot-product operation or a neural network with parameters $\theta$. Denote $\mathbf{P}_s = \{\mathbf{p}_u\}_{u \in \mathcal{U}_s}$ and $\mathbf{Q} = \{\mathbf{q}_i\}_{i=1}^{N}$; the objective function becomes
$$\min_{\mathbf{P}_s, \mathbf{Q}, \theta} \; \frac{1}{|\Omega_s|} \sum_{(u,i) \in \Omega_s} \mathcal{L}\big(r_{ui}, f_\theta(\mathbf{p}_u, \mathbf{q}_i)\big), \qquad (1)$$

where $\mathcal{L}$ is a loss function and $\Omega_s$ is a set with size $|\Omega_s|$ containing the indices of observed entries in $\mathbf{R}_s$. Here one can use cross-entropy loss for implicit interaction or L2 loss for explicit interaction. Our goal is to compute inductive preference embeddings for users in $\mathcal{U}_q$ based on the learned $\mathbf{P}_s$. One plausible solution is to apply GNNs over the user-item bipartite graph of observed interactions, as is done by previous works, which can presumably propagate embeddings from users in $\mathcal{U}_s$ to users in $\mathcal{U}_q$. However, query users have few historical interactions, leading to very sparse local subgraphs over which GNNs can only propagate inadequate information. To mitigate this issue, we propose an inductive relational inference model that estimates the underlying user relations and paves the way for sufficient message passing through a hidden dense network.
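For concreteness, the first training stage can be sketched as follows: a minimal NumPy implementation of the objective above with a dot-product $f$ and L2 loss, run on a simulated support matrix by plain gradient descent (data, sizes, and the learning rate are illustrative assumptions, not our experimental settings):

```python
import numpy as np

rng = np.random.default_rng(0)
M_s, N, d = 20, 15, 4
# Simulated ground-truth low-rank matrix providing the observed ratings.
R_s = rng.normal(size=(M_s, d)) @ rng.normal(size=(d, N))
# Omega_s: indices of observed entries (here, a random ~50% subset).
mask = rng.random((M_s, N)) < 0.5

P = rng.normal(scale=0.1, size=(M_s, d))   # support-user embeddings
Q = rng.normal(scale=0.1, size=(N, d))     # item embeddings

def l2_loss(P, Q):
    """Mean squared error over the observed entries only."""
    E = (P @ Q.T - R_s) * mask
    return float((E ** 2).sum() / mask.sum())

lr = 0.05
before = l2_loss(P, Q)
for _ in range(300):                       # plain gradient descent on (1)
    E = (P @ Q.T - R_s) * mask
    P -= lr * 2 * E @ Q / mask.sum()
    Q -= lr * 2 * E.T @ P / mask.sum()
assert l2_loss(P, Q) < before              # reconstruction loss decreases
```

In our actual model $f_\theta$ may be a neural network trained by a standard optimizer; the sketch only illustrates the structure of the stage-one objective.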
Inductive Relation Inference Model
Consider an adjacency matrix $\mathbf{W} \in \mathbb{R}^{M_q \times M_s}$, where $w_{uv}$ denotes the weighted edge from query user $u$ to support user $v$, satisfying $\sum_{v \in \mathcal{U}_s} w_{uv} = 1$ for each row $\mathbf{w}_u$ of $\mathbf{W}$. Then we can express the preference embedding of query user $u$ as $\mathbf{p}_u = \sum_{v \in \mathcal{U}_s} w_{uv} \mathbf{p}_v$, a weighted sum of the embeddings of support users. In the following, we first justify this idea by showing its expressive power and then propose a parametrized inductive model that puts it into practice.

Theoretical Justification. If we use the dot product for $f$ in the CF model, the rating of query user $u$ on item $i$ can be predicted by $\hat{r}_{ui} = \mathbf{w}_u \mathbf{P}_s \mathbf{q}_i$. We are interested in the problem
$$\min_{\mathbf{W}} \; \frac{1}{|\Omega_q|} \sum_{(u,i) \in \Omega_q} \mathcal{L}\big(r_{ui}, \mathbf{w}_u \mathbf{P}_s \mathbf{q}_i\big), \qquad (2)$$

where $\mathbf{w}_u$ is the $u$-th row of $\mathbf{W}$ and $\Omega_q$ is a set with size $|\Omega_q|$ containing the indices of observed entries in $\mathbf{R}_q$. Assume $\mathbf{P}_s \in \mathbb{R}^{M_s \times d}$ and use $\mathrm{conv}(\mathbf{P}_s)$ to denote the convex hull of the support embeddings, i.e., the class of vectors $\mathbf{w}_u \mathbf{P}_s$, where $\mathbf{w}_u$ satisfies $w_{uv} \ge 0$ and $\sum_v w_{uv} = 1$ and is the $u$-th row vector in $\mathbf{W}$. We have the following theorem.
Theorem 1.

Assume that the optimization in (1) attains the minimal reconstruction loss on $\Omega_s$. If $\mathbf{P}_s$ is such that the MF-optimal embedding of each query user lies in $\mathrm{conv}(\mathbf{P}_s)$ and $\mathcal{L}$ is convex, then there exists at least one solution for $\mathbf{W}$ in problem (2) that attains the same loss on $\Omega_q$ as matrix factorization.

The theorem shows that, under mild conditions, the proposed model can minimize the reconstruction loss of MC to the same level as matrix factorization. Note that the two conditions in Theorem 1 can be satisfied in most cases. To guarantee the convex-hull condition, one can design a careful construction for $\mathcal{U}_s$, e.g., by diversifying the behavior patterns of support users. Besides, the widely used loss functions for recommendation, including cross-entropy and L2 loss, are convex in $\mathbf{w}_u$.

Parametrization. We showed that using a weighted combination does not sacrifice model capacity under some conditions. However, in practice, directly optimizing over $\mathbf{W}$ is intractable due to its large parameter space. Hence, we parametrize $\mathbf{W}$ with a multi-head attention network, enabling it to inductively compute hidden relations. Concretely, let $\mathbf{h}_u$ denote a representation of query user $u$ aggregated from its observed interactions; the attention weight of the $k$-th head is

$$w_{uv}^{(k)} = \frac{\exp\!\big(\mathbf{a}_k^\top [\mathbf{h}_u \,\|\, \mathbf{p}_v]\big)}{\sum_{v' \in \mathcal{N}_u} \exp\!\big(\mathbf{a}_k^\top [\mathbf{h}_u \,\|\, \mathbf{p}_{v'}]\big)}, \qquad (3)$$
where $\mathbf{a}_k$ is a trainable vector, $\|$ denotes concatenation, and $k = 1, \dots, K$. Here $\mathcal{N}_u$ includes the support users who have rated items in common with user $u$. If $\mathcal{N}_u$ is empty (as in zero-shot recommendation), we can randomly select a group of support users to construct $\mathcal{N}_u$, or use (the embeddings of) user side information if user features are available, as shown in our experiments. The $k$-th attention head independently aggregates preference embeddings, and the final inductive embedding of user $u$ is given as

$$\mathbf{p}_u = \frac{1}{K} \sum_{k=1}^{K} \sum_{v \in \mathcal{N}_u} w_{uv}^{(k)} \mathbf{p}_v, \qquad (4)$$

where $K$ is the number of attention heads. To keep the notation clean, we denote the resulting inductive embeddings by $\mathbf{P}_q$ and the relation model's parameters by $\phi$. Then we can predict the rating of query user $u$ via $\hat{r}_{ui} = f_\theta(\mathbf{p}_u, \mathbf{q}_i)$.
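The multi-head aggregation can be sketched in NumPy as follows. This is an illustrative approximation under stated assumptions: the query summary $\mathbf{h}_u$ is taken as the mean of neighbor embeddings, scores use a simple linear form over the concatenation, and heads are combined by averaging; the actual model may make different choices for these components.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K = 4, 2                      # embedding dim, number of heads (assumed)
P_s = rng.normal(size=(8, d))    # support-user embeddings (frozen)
neigh = np.array([1, 3, 5])      # N_u: support users sharing rated items with u

def softmax(x):
    x = x - x.max()              # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

# Assumed query summary h_u: mean of neighbor embeddings.
h_u = P_s[neigh].mean(axis=0)
A = rng.normal(size=(K, 2 * d))  # one trainable vector a_k per head

heads = []
for k in range(K):
    scores = np.array([A[k] @ np.concatenate([h_u, P_s[v]]) for v in neigh])
    w = softmax(scores)              # attention weights over N_u, sum to 1
    heads.append(w @ P_s[neigh])     # head-wise weighted sum of embeddings
p_u = np.mean(heads, axis=0)         # combine heads (averaged here)
assert p_u.shape == (d,)
```

Because `p_u` depends only on trained vectors `A` and the frozen support embeddings, it can be computed for a user never seen during training, which is the inductive property the model relies on.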
Optimization
The training process is divided into two stages. First, we pretrain a transductive CF model via the transductive objective and obtain the transductive embeddings $\mathbf{P}_s$, $\mathbf{Q}$ and the prediction network $f_\theta$. Second, we train our relation model with $\mathbf{P}_s$, $\mathbf{Q}$ and $\theta$ fixed, via
$$\min_{\phi} \; \frac{1}{|\Omega_q|} \sum_{(u,i) \in \Omega_q} \mathcal{L}\big(r_{ui}, f_\theta(\mathbf{p}_u, \mathbf{q}_i)\big), \qquad (5)$$

where $\mathbf{p}_u$ is the inductive embedding produced by the relation model and $\phi$ denotes its parameters.
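To illustrate the two-stage principle, the following minimal NumPy sketch freezes the first-stage quantities and fits only the relation weights of one query user to its few observed ratings. For brevity it optimizes the raw weights $\mathbf{w}_u$ directly, which Section 3.1 notes is intractable at scale and is replaced by the attention parametrization in practice; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
M_s, d, N = 10, 4, 30
P_s = rng.normal(size=(M_s, d))   # frozen after stage 1
Q = rng.normal(size=(N, d))       # frozen after stage 1

# A query user with only 5 observed ratings (few-shot regime).
true_p = rng.normal(size=d)
obs = rng.choice(N, size=5, replace=False)
r_obs = true_p @ Q[obs].T

# Stage 2: only the relation weights are trained; P_s and Q never change.
A = P_s @ Q[obs].T                # (M_s, 5): support predictions on observed items
w = np.full(M_s, 1.0 / M_s)       # uniform initialization of relation weights
lr = 1e-3

def loss(w):
    return float(((w @ A - r_obs) ** 2).mean())

before = loss(w)
for _ in range(2000):             # gradient descent on the stage-2 loss
    err = w @ A - r_obs
    w -= lr * 2 * (err @ A.T) / len(obs)
assert loss(w) < before           # fits the few observed entries
```

The query user's inductive embedding is then `w @ P_s`, a weighted combination of frozen support embeddings, matching the formulation in Section 3.1.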
Complexity. The complexity bottleneck of our method is calculating the attention weights over all support users in the softmax denominator of the attention model. For a large dataset, we can sample a subset of support users per epoch for each query user and calculate the attention weights over them only. Such approximation keeps both the time and space complexity of the second training stage under control
. Hence, the overall time complexity of the two-stage training remains tractable even for large datasets.

Generalization Error Bound. In this paper, we are especially interested in model performance on query users with few observed interactions. Here we investigate the generalization ability of our inductive relation model. We again assume that $f$ is the dot-product operation to simplify the analysis. In the next theorem, we show that the generalization error on query users is bounded in terms of the number of support users and the number of observed interactions of query users.

Theorem 2.

Assume that $\mathcal{L}$ is Lipschitz and each entry in $\mathbf{P}_s \mathbf{Q}^\top$ is absolutely bounded by a constant $B$. Then with probability at least $1 - \delta$ over the random choice of $\Omega_q$, it holds that for any $\mathbf{W}$,

$$\mathbb{E}_{(u,i)}\big[\mathcal{L}(r_{ui}, \mathbf{w}_u \mathbf{P}_s \mathbf{q}_i)\big] \;\le\; \frac{1}{|\Omega_q|} \sum_{(u,i) \in \Omega_q} \mathcal{L}(r_{ui}, \mathbf{w}_u \mathbf{P}_s \mathbf{q}_i) \;+\; O\!\left(B \sqrt{\frac{M_s \ln(1/\delta)}{|\Omega_q|}}\right).$$
The theorem shows that a smaller size of $\mathcal{U}_s$ makes the generalization error bound tighter. Looking at Theorems 1 and 2 together, we find that the configuration of $\mathcal{U}_s$ has an important effect on both model capacity and generalization ability. Notably, we need to make the support users in $\mathcal{U}_s$ 'representative' of diverse user behavior patterns in item consumption in order to guarantee enough model capacity. Also, we need to control the size of $\mathcal{U}_s$ in order to maintain generalization ability. Based on these insights, how to properly select support users is an interesting direction for future investigation. We further study this issue in our experiments.
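As one illustrative way to make a size-$k$ support set 'representative' (this is not the selection rule used in this paper, which thresholds on interaction counts), a greedy farthest-point selection over candidate user embeddings spreads the chosen users across behavior patterns:

```python
import numpy as np

def select_support(P, k, seed=0):
    """Greedy farthest-point (k-center) selection: pick k users whose
    embeddings are spread out, a proxy for diverse behavior patterns."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(P)))]
    # Distance from every user to the nearest chosen user so far.
    dist = np.linalg.norm(P - P[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(dist.argmax())         # farthest from all chosen users
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(P - P[nxt], axis=1))
    return chosen

rng = np.random.default_rng(1)
P = rng.normal(size=(50, 4))             # candidate user embeddings (simulated)
idx = select_support(P, k=5)
assert len(set(idx)) == 5                # five distinct, well-spread users
```

Such a diversity-driven criterion targets the convex-hull condition of Theorem 1 while the small set size respects the bound of Theorem 2; jointly optimizing this selection with the prediction model is left to future work, as noted above.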
Experiments
Table 1: Test MAEs on MovieLens-1M and AUCs on Amazon-Books for all users, few-shot (query) users and warm-start (support) users. Standard deviations, where available, are given in parentheses.

Dataset (Metric)  MovieLens-1M (MAE)  Amazon-Books (AUC)

Baselines  All  Few-Shot  Warm-Start  All  Few-Shot  Warm-Start
ItemPop  0.8874  0.8873  0.8875  0.6745  0.6620  0.6782
PMF [24]  0.7510  0.7842  0.7334  0.7181 (±.0003)  0.6980  0.7238
NCF [9]  0.7456  0.7685  0.7334  0.7067 (±.0003)  0.6990  0.7087
GCMC [29]  0.7418  0.7741  0.7246  0.7185 (±.0003)  0.7040  0.7241
IGMC [36]  0.7347  0.7527  0.7251  0.4994 (±.0002)  0.4970  0.5006
IRMC (ours)  0.7230  0.7330  0.7176  0.7820 (±.0004)  0.7143  0.8013
In this section, we conduct experiments to verify the proposed model (the experiment codes will be released). We deploy our experiments on MovieLens-1M and Amazon-Books. MovieLens-1M contains movie rating data (https://grouplens.org/datasets/movielens/) with 6,040 users, 3,706 items and 1,000,209 ratings (ranging from 1 to 5). Amazon-Books is selected from the Amazon product review dataset [15] and is a large dataset: after filtering out items with fewer than five interactions, it contains 101,839 users, 91,599 items and 2,931,466 ratings, which we convert to implicit interactions (as positive examples); we then sample 5 items as negative examples for each interaction during training. For MovieLens-1M, [11] collects side information (user gender, age, occupation, movie genre, etc.) for users and items in the original dataset. We use this augmented dataset as MovieLens-1M-feature and further test our model in the feature-based setting. For each user, we hold out ten interactions as the test set and use the remainder as the training set. After that, each user has three to ninety (resp. one to thousands of) training examples for MovieLens-1M (resp. Amazon-Books). We select support users as those with more than $T$ training interactions, with $T$ set separately for MovieLens-1M and Amazon-Books. The partition gives 49,058 support users for Amazon-Books and 2,164 for MovieLens-1M. The remaining users are used as query users. We use Mean Absolute Error (MAE) and Area Under the Curve (AUC) as evaluation metrics for explicit interactions (MovieLens-1M) and implicit interactions (Amazon-Books), respectively. For comparison, we consider ItemPop and PMF as two baseline methods. ItemPop directly uses the number of interacted users for item recommendation. PMF
[24] is a simple matrix factorization method with L2 regularization. For CF-based methods, we also consider Neural Collaborative Filtering (NCF) [9], which extends matrix factorization with a neural network; here we specify it as a three-layer neural network with nonlinear activation. For $f_\theta$ in our transductive CF model, we use the same architecture as NCF. Moreover, we consider Graph Convolutional Matrix Completion (GCMC) [29], a state-of-the-art transductive matrix completion method, and the recently proposed GNN-based inductive matrix completion model IGMC [36], as two strong competitors. For feature-based methods, we use the Wide&Deep network [3] as a baseline. Furthermore, we consider two powerful competitors: the Meta-Learning User Preference Estimator (MeLU) [11] and Attribute Graph Neural Networks (AGNN) [20]. Different from our model, which divides the training process into two stages, the other methods are trained on all users. We tune each comparative model on each dataset and report its best results. In the Appendix, we present detailed information on model specification, hyperparameter settings and training details.

Experiment Results
In Table 1, we report the MAEs (resp. AUCs) for test interactions from all users, support users (warm-start) and query users (few-shot) on MovieLens-1M (resp. Amazon-Books). The results show that our model IRMC outperforms the other competitors in most cases. In particular, IRMC gives the best overall MAE and AUC for all users, which demonstrates that IRMC is a powerful framework for matrix completion, especially when users have distinct amounts of historical information. For warm-start recommendation, IRMC manages to beat the other competitors even using a simple transductive model without a GNN architecture. Compared with NCF, which uses the same architecture as our transductive model, IRMC achieves much better MAEs and AUCs for support users (warm-start). The reason could be that NCF directly uses all users for learning transductive embeddings, so the query users (with sparse interactions) have a negative effect on the learning for support users. This result validates the effectiveness of partitioning the users into two groups, which maintains good performance for transductive learning on support users. Moreover, for few-shot recommendation on query users, our model achieves significant improvement on MAE over the strong competitor IGMC, which shows that IRMC with an inductive relational model is indeed effective for addressing the data sparsity issue. The AUCs of IGMC on Amazon-Books are much worse than those of the other methods. The reason is that IGMC relies on subgraphs around users and items as input for prediction; for a dataset with implicit interactions, the subgraphs contain only one type of edge (positive) and lose efficacy for making the desired (two-class) classification. In Table 2, we present the test MAEs for the feature-based competitors and IRMC on MovieLens-1M-feature, where IRMC achieves the best MAEs for overall/warm-start/few-shot recommendation. Furthermore, the user features enable us to consider zero-shot recommendation.
Specifically, we use only the training interactions of support users to train each model and then directly use the trained models for prediction on the test interactions of query users, who have no historical interaction at all. While no interaction is given for query users, these methods can leverage user features to achieve inductive computation. We can see that our model IRMC gives the best results, achieving clear improvements over MeLU and AGNN, respectively, two state-of-the-art methods for cold-start recommendation. This shows that IRMC is capable of dealing with zero-shot recommendation and is a promising approach for handling new users with no historical behavior in real-world dynamic systems. We also report test performance for users with different numbers of training interactions on MovieLens-1M and MovieLens-1M-feature in Fig. 2. As shown in the figures, as the number of training interactions increases, the MAEs of all transductive models drop sharply (i.e., they perform poorly for users with few interactions), while IGMC and our model exhibit a smoother decrease. In the extreme case with fewer than five training interactions, notably, our model also gives the best MAEs on both MovieLens-1M and MovieLens-1M-feature.
Further Discussions
Impact of Partition Threshold. We study the variation of model performance w.r.t. different partition thresholds $T$. We show the MAEs on MovieLens-1M in Fig. 3, where we change the threshold from 5 to 50. The MAE first drops quickly, then remains at a stable level, and finally goes up again. This indicates that the partition strategy is important for keeping the two sets balanced. If $T$ is too small, the submatrix of interactions w.r.t. support users is sparse, which may affect transductive learning; if it is too large, the small set of support users is not representative enough and limits the expressive power of the inductive relation model.
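The threshold-based partition itself can be sketched as a simple count-based split (variable names are illustrative):

```python
from collections import Counter

def partition_users(observed, threshold):
    """Split users into support (> threshold interactions) and query users.

    observed: iterable of (user, item) pairs from the training set.
    """
    counts = Counter(u for (u, _i) in observed)
    support = {u for u, c in counts.items() if c > threshold}
    return support, set(counts) - support

# Toy example: user 0 has 3 interactions, user 1 has 1, user 2 has 2.
obs = [(0, 1), (0, 2), (0, 3), (1, 1), (2, 2), (2, 5)]
support, query = partition_users(obs, threshold=2)
assert support == {0} and query == {1, 2}
```

Sweeping `threshold` over a range and re-evaluating MAE reproduces the kind of sensitivity study described above.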
Inductive vs. Transductive Representations. We conduct a case study where we visualize the transductive embeddings of support users and the inductive embeddings of query users given by IRMC, together with the transductive embeddings of all users given by PMF (matrix factorization); the details are in the Appendix. One key observation is that as the number of training interactions becomes larger, the inductive embeddings given by IRMC get closer to the transductive embeddings given by PMF. This phenomenon indicates that, given sufficient training interactions, the inductive relation model can capture preference embeddings similar to those of transductive learning, which again justifies the design of IRMC.

Table 3: Training time per epoch on MovieLens-1M-feature. IRMC reports the times of its two training stages.

Method    AGNN  IRMC       MeLU
Time (s)  40.6  20.5+27.9  513.2

Scalability Test. We further study the scalability of IRMC compared with IGMC and GCMC. We measure the training time per epoch on Amazon-Books using a GTX 1080 Ti with 11 GB of memory. Here we truncate the dataset and use different numbers of users for training. For IRMC, we add up the training times of the two stages, with one epoch each. The results (with log-scale axes) show that when the dataset size becomes large, the training times per epoch of all three models exhibit linear growth. IRMC takes approximately twice as long as GCMC, while IGMC is approximately ten times slower than IRMC. In fact, IGMC requires a subgraph for each training interaction, so one may need to transfer millions of subgraphs between GPU memory and CPU memory in one epoch, which leads to high time cost. By contrast, GCMC and our IRMC rely on only one global graph: GCMC deals with a sparse user-item bipartite graph, while IRMC handles a dense user-user graph.
However, GCMC needs to update the transductive embeddings of the users in a local graph for each training interaction, which induces complexity growing with the average number of observed interactions per item, while the second stage of IRMC only updates the inductive model (as discussed in Section 3.2). In Amazon-Books, the average number of interactions per item is small, which is why the time costs of IRMC and GCMC remain at the same level. Nevertheless, GCMC, as a transductive model, cannot deal with new unseen users at test time, while our IRMC can efficiently compute inductive embeddings for new users given the trained model. We also compare the training times of MeLU, AGNN and IRMC on MovieLens-1M-feature using side information and report the results in the table above, where IRMC is nearly as efficient as AGNN and about 11 times faster than MeLU, which uses meta-learning (with five local updates per global update).
Conclusions
In this paper, we propose a new inductive relational matrix completion method that effectively addresses data sparsity and cold-start issues. The model is theoretically sound, with rigorous justification and analysis of its generalization ability. Through extensive experiments, we show that our model outperforms state-of-the-art methods on both warm-start and cold-start users. As a future direction, it would be interesting to consider the selection of support users as a decision problem (which could be jointly optimized with the prediction model). The core idea of IRMC opens a new way for representation learning more broadly: one can consider a pretrained representation model for one set of existing entities and generalize their representations (through simple transformations) to efficiently compute inductive representations for others, enabling the model to flexibly handle newly arriving entities in an open world. We believe that this novel and effective framework can inspire more research in broad areas of AI.
Broader Impact
There always exists a trade-off between information utility and the risk of exposing user privacy. Our model is a promising approach for building a powerful recommender system that can exploit the historical behaviors of users, induce their latent interests and preferences, and recommend items that they are likely to click or purchase. Accurate recommendation can help filter useful information for individuals, improve the efficiency of global society and further alleviate the information explosion problem in the age of information. Also, our methodology improves recommendation performance for cold-start users with few or no historical behaviors, which can help reduce bias in previous recommendation models and facilitate fairness between old and new users on one platform. Admittedly, such a model could also be used by a company for uncovering user habits, personalities and social circles, which may concern user privacy. We encourage that, in the data collection process, one should take care of the privacy issue and sufficiently anonymize the data before it is used for preference learning. Also, on the methodological level, more work needs to be done on how to guarantee good preference estimation in recommender systems under constraints on data privacy. If the algorithm is not aware of the correspondence between data and specific users, user privacy can, to a certain degree, be properly protected.
 [1] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowl. Based Syst., 46:109–132, 2013.
 [2] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
 [3] H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah. Wide & deep learning for recommender systems. In DLRS, pages 7–10, 2016.
 [4] Z. Cheng, Y. Ding, L. Zhu, and M. S. Kankanhalli. Aspect-aware latent factor model: Rating prediction with ratings and reviews. In WWW, pages 639–648, 2018.
 [5] Z. Du, X. Wang, H. Yang, J. Zhou, and J. Tang. Sequential scenariospecific meta learner for online recommendation. In KDD, pages 2895–2904, 2019.
 [6] W. Fan, Y. Ma, Q. Li, Y. He, Y. E. Zhao, J. Tang, and D. Yin. Graph neural networks for social recommendation. In WWW, pages 417–426, 2019.
 [7] G. Guo, J. Zhang, and N. Yorke-Smith. TrustSVD: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. In AAAI, pages 123–129, 2015.
 [8] J. S. Hartford, D. R. Graham, K. Leyton-Brown, and S. Ravanbakhsh. Deep models of interactions across sets. In ICML, pages 1914–1923, 2018.
 [9] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua. Neural collaborative filtering. In WWW, pages 173–182, 2017.
 [10] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
 [11] H. Lee, J. Im, S. Jang, H. Cho, and S. Chung. MeLU: Meta-learned user preference estimator for cold-start recommendation. In KDD, pages 1073–1082, 2019.
 [12] J. Li, M. Jing, K. Lu, L. Zhu, Y. Yang, and Z. Huang. From zero-shot learning to cold-start recommendation. In AAAI, pages 4189–4196, 2019.
 [13] Z. Lu, M. Gao, X. Wang, J. Zhang, H. Ali, and Q. Xiong. SRRL: Select reliable friends for social recommendation with reinforcement learning. In ICONIP, pages 631–642, 2019.
 [14] P. Massa and P. Avesani. Trust-aware recommender systems. In RecSys, pages 17–24, 2007.
 [15] J. J. McAuley, C. Targett, Q. Shi, and A. van den Hengel. Image-based recommendations on styles and substitutes. In SIGIR, pages 43–52, 2015.
 [16] M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. Adaptive computation and machine learning. MIT Press, 2012.
 [17] F. Monti, M. M. Bronstein, and X. Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In NeurIPS, pages 3697–3707, 2017.
 [18] K. Moridomi, K. Hatano, and E. Takimoto. Tighter generalization bounds for matrix completion via factorization into constrained matrices. IEICE Trans. Inf. Syst., 101-D(8):1997–2004, 2018.
 [19] S. Park and W. Chu. Pairwise preference regression for cold-start recommendation. In RecSys, pages 21–28, 2009.
 [20] T. Qian, Y. Liang, and Q. Li. Solving cold start problem in recommendation with attribute graph neural networks. CoRR, abs/1912.12398, 2019.
 [21] A. M. Rashid, G. Karypis, and J. Riedl. Learning preferences of new users in recommender systems: an information theoretic approach. SIGKDD Explor., 10(2):90–100, 2008.
 [22] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, pages 91–99, 2015.
 [23] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI, pages 452–461, 2009.
 [24] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, NeurIPS, pages 1257–1264, 2007.
 [25] M. S. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In ESWC, volume 10843 of Lecture Notes in Computer Science, pages 593–607, 2018.
 [26] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In KDD, pages 650–658, 2008.
 [27] N. Srebro, N. Alon, and T. S. Jaakkola. Generalization error bounds for collaborative prediction with lowrank matrices. In NeurIPS, pages 1321–1328, 2004.
 [28] N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In COLT, pages 545–560, 2005.
 [29] R. van den Berg, T. N. Kipf, and M. Welling. Graph convolutional matrix completion. CoRR, abs/1706.02263, 2017.
 [30] A. van den Oord, S. Dieleman, and B. Schrauwen. Deep content-based music recommendation. In NeurIPS, pages 2643–2651, 2013.
 [31] M. Vartak, A. Thiagarajan, C. Miranda, J. Bratman, and H. Larochelle. A meta-learning perspective on cold-start recommendations for items. In NeurIPS, pages 6904–6914, 2017.
 [32] H. Wang, N. Wang, and D. Yeung. Collaborative deep learning for recommender systems. In KDD, pages 1235–1244, 2015.
 [33] M. Xu, R. Jin, and Z. Zhou. Speedup matrix completion with side information: Application to multi-label learning. In NeurIPS, pages 2301–2309, 2013.
 [34] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec. Graph convolutional neural networks for web-scale recommender systems. In KDD, pages 974–983, 2018.
 [35] K. Yu, J. Bi, and V. Tresp. Active learning via transductive experimental design. In ICML, pages 1081–1088, 2006.
 [36] M. Zhang and Y. Chen. Inductive matrix completion based on graph neural networks. In ICLR, 2020.
 [37] S. Zhang, L. Yao, A. Sun, and Y. Tay. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv., 52(1):5:1–5:38, 2019.
 [38] Y. Zheng, B. Tang, W. Ding, and H. Zhou. A neural autoregressive approach to collaborative filtering. In ICML, pages 764–773, 2016.
 [39] K. Zhou, S. Yang, and H. Zha. Functional matrix factorizations for cold-start recommendation. In SIGIR, pages 315–324, 2011.