Matrix completion (MC) is a fundamental methodology for addressing many practical machine learning problems. A typical application is recommender systems, where one tackles a user-item interaction matrix whose entries, which stand for interactions of users with items (ratings or click behaviors), are partially observed. The goal of MC is to predict missing entries (unobserved or future potential interactions) in the matrix based on the observed ones. Existing methods are generally collaborative-filtering (CF)-based or feature-based. The former assumes no side information (features) other than the interaction matrix and often solves the problem by matrix factorization (MF), which learns latent factors (embeddings) for users and items and uses the interaction of the two factors to predict ratings [10, 23, 27, 38]. In contrast, the latter makes use of informative features (such as user occupation, movie genre, etc.) as input for prediction [33, 30, 3]. Existing works have shown great power in warm-start settings where users have many observed interactions (as training examples) [1, 37]. However, in practical scenarios, recommender systems are supposed to interact with an open world and make decisions for users with a variety of historical interaction patterns (from zero to hundreds of interactions), which requires the model to simultaneously handle warm-start, few-shot (sparsity) and zero-shot (cold-start) recommendation. [Footnote 1: Cold-start recommendation is defined differently across the literature. For CF-based methods (resp. feature-based methods), cold-start users often mean users with few (resp. no) historical interactions. In this paper, we use few-shot and zero-shot recommendation to distinguish the two cases.] Indeed, data sparsity and cold-start problems emerge as two major bottlenecks for MC performance in practice, and an effective treatment would bring significant practical impact and economic benefits [21, 1].
For users with few or no observed interactions, it is hard to accurately capture their preferences from such insufficient historical information, in which case model prediction tends to suffer. Some works have attempted to tackle the cold-start problem from different perspectives, e.g., by incorporating extra information (like social networks [14, 13], user reviews, initial surveys, etc.) or learning a transferable model that suits few-shot or zero-shot recommendation [5, 19, 20, 11]. However, most of them rely on high-quality features and would fail to work when those are inaccessible. The data sparsity and cold-start issues are particularly challenging for CF-based methods. Some early studies attempted to add regularization constraints to matrix factorization for learning more generalizable preference embeddings [27, 28, 18]. However, those bilinear models have limited expressive power when dealing with complicated user-item interactions. Many recent works extend MF with neural networks [9, 32] and graph neural networks [17, 29, 34], and they have achieved state-of-the-art results on many real-world datasets. Nonetheless, when users have insufficient historical interactions, the performance of such deep models degrades dramatically [7, 31]. In fact, most CF-based models assume transductive user embeddings (i.e., $d$-dimensional vectors) that need to be learned from observed interactions. Given few interactions, such transductive embeddings would be highly under-determined. Recent studies [8, 36] propose inductive models for the MC problem that propagate preferences among neighboring users in the user-item bipartite graph, enabling CF-based methods to tackle unseen users (whose historical interactions are not used in training). However, since they directly use observed interactions as links in the bipartite graph, the message passing makes little difference for users with few interactions and may, in fact, interfere with the learning for users with many interactions (in which case inductive learning has even been reported to perform worse than transductive embeddings).
[Figure 1: model.png]
To address the limitations discussed above, we leverage a novel set of ideas: 1) we first learn transductive embeddings for users with many interactions and then learn inductive embeddings for the remaining users utilizing the former; 2) we also estimate the underlying relations among users and consider message passing through a hidden dense graph instead of solely relying on the sparse graph dictated by observed interactions. These two ideas constitute our new inductive relational matrix completion method (Fig. 1), which can address data sparsity and cold-start issues in matrix completion without side information. To this end, we partition users into two sets: support users with many observed interactions and query users with few observed interactions. We first learn transductive embeddings for support users using their interactions (as in CF-based methods). Then we devise an inductive relation inference model that can estimate the underlying relations between support users and query users based on their behavioral patterns in historical interactions. The relational model allows us to inductively generalize the preference embeddings of support users to those of query users, and to flexibly handle users with few or even no interactions via a hidden dense graph. We summarize our main contributions as follows.
- We devise a novel matrix completion framework that considers transductive embeddings for a dense sub-matrix and inductive embeddings for a sparse sub-matrix, which can address data sparsity and cold-start issues in MC.
- We justify our design by rigorously showing that a general version of our model can minimize the reconstruction loss on query users to the same level as matrix factorization under mild conditions, which means that our inductive model does not sacrifice any model capacity. Moreover, we prove that the generalization error on query users is bounded in terms of the number of support users and the number of observed interactions of query users, and that the bound becomes tighter with fewer support users and more observed interactions.
- We compare our model with several state-of-the-art methods (including the GNN-based transductive model GCMC and the inductive model IGMC) and achieve improvements in MAE and AUC for warm-start and few-shot recommendation. We also test our model with side information, which enables us to consider zero-shot recommendation. The results show that our model significantly outperforms the meta-learning model MeLU and the attributed GNN model AGNN.
In this section, we introduce some background related to our work in order to make the paper self-contained. Matrix completion deals with a user-item interaction matrix where and are the numbers of users and items, respectively. For implicit interaction, is a binary entry which denotes whether user rated (or clicked on, reviewed, liked, purchased, etc.) item or not. For explicit interaction, records rating of user on item . The entries of are partially observed and the goal is to estimate the missing values in the matrix. Existing methods for MC are generally divided into CF-based method and feature-based method. CF-based method often considers the problem as matrix factorization (MF) where user (resp. item ) corresponds to a -dimensional latent factor (embedding) (resp. ), which can be interpreted as representation of user preference (resp. item attribute). Then it considers a prediction model where can be specified as simple dot product or a neural network. The CF model does not require any side information other than the interaction matrix, but the preference embedding is transductive, which means that it needs to be learned from training interactions and cannot handle unseen users without retraining the whole model. By contrast, feature-based method can achieve inductive representation by using extra side information denoted by (user ’s feature) and (item ’s feature) and targets a prediction model . Both methods achieve desirable performance in warm-start settings where users have a number of observed interactions as training examples. However, when handling users with few observed interactions, the model performance would degrade dramatically and even worse, some of them would fail to work. Recently, there are quite a few works that attempt to address the issues from different perspectives. 
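The CF-based formulation described above can be illustrated with a minimal sketch of matrix factorization with a dot-product predictor and L2 loss, trained by stochastic gradient descent. This is not the authors' code: the function names (`train_mf`, `mse`, `dot`), the hyper-parameters (`d`, `lr`, `reg`, `epochs`) and the toy ratings are all assumptions chosen for illustration.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mse(P, Q, ratings):
    """Mean squared error of dot-product predictions on the observed entries."""
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings)


def train_mf(ratings, n_users, n_items, d=4, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Learn d-dimensional user factors P and item factors Q by SGD.

    All hyper-parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for k in range(d):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data: (user, item, rating) triples for the observed entries only.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P0, Q0 = train_mf(observed, 3, 3, epochs=0)  # untrained initialization
P, Q = train_mf(observed, 3, 3)              # trained factors
```

Note that the user factors in `P` are transductive in the sense discussed above: a user who is absent from `observed` has no meaningful row in `P` without retraining.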
On the feature-based side, meta-learning and zero-shot learning techniques have been proposed to learn a transferable model that can adapt to new cold-start users. However, they rely heavily on high-quality features to obtain domain-invariant confounders and a transferable latent space, respectively. Moreover, another approach harnesses user features to compute user-user similarities and construct a graph, over which GNNs are used to aggregate neighboring information. In practice, such side information may be inaccessible due to privacy issues, which limits the application of feature-based methods. On the CF-based side, some early studies attempt to add regularization constraints to the original matrix factorization model in order to improve the generalization ability of the latent factors given by MF. Common regularization constraints include low rank, low trace norm, and non-negativity [10, 18]. Some recent studies [17, 29, 34] extend traditional MF with GNN architectures (or GNN-like operations) and convert the problem into a link prediction problem in the user-item bipartite graph. The GNNs allow message passing among neighboring users and propagate user embeddings through edges. The works [8, 36] leverage the message passing idea of GNNs and propose inductive matrix completion models that free CF-based methods from transductive embeddings and manage to deal with unseen users at test time. However, in existing GNN-based models, message passing is only conducted through edges in the bipartite graph of user-item interactions. For users with few interactions, the propagated information would be inadequate since their neighboring users are rare. Also, for users with sufficient interactions, the message passing from neighbors may not be consistent with the user's inherent behavior patterns contained in the observed interactions. Indeed, it has been observed that for users with sufficient historical information, inductive MC methods perform worse than transductive models.
We believe that the power of transductive learning and inductive learning can be better exploited to simultaneously handle users with different amounts of historical information. In this paper, we propose a new framework that unifies transductive and inductive representation learning in matrix completion. There are also interesting works that leverage extra information from other domains (such as social networks [13, 6], item content information, cross-domain recommendation, etc.) to alleviate data sparsity and cold-start issues in MC. They are orthogonal to our paper. We focus on matrix completion without side information in the model formulation. Our model can be easily extended to cases where side features are available, as discussed in our experiments and in the appendix.
We introduce our inductive relational matrix completion. As discussed in Section 2, transductive learning can achieve desirable performance when users have sufficient observed information, while inductive learning can address new users by propagating information from users to their neighbors. Based on this, we take a step further: why not first learn transductive embeddings for users with sufficient interactions and then compute inductive embeddings for the other users based on the former? To this end, we partition users into two sets: support users (denoted by ), whose observed interactions exceed , and query users (denoted by ), whose observed interactions are less than . Assume and . The interaction matrix is divided into two parts: (given by ) and (given by ). [Footnote 2: One can also treat the selection of support users as an optimization problem, similar to determining landmark points or sample selection. We leave it for future work.] We use to train a transductive CF-based model , where denotes preference embeddings for user in , denotes attribute embeddings for item and can be a simple dot-product operation or a neural network with parameter . Denote and and the objective function becomes
where , and is a set with size containing indices of observed entries in . Here one can use cross-entropy loss for implicit interaction or L2 loss for explicit interaction. Our goal is to compute inductive preference embeddings for users in based on learned . One plausible solution is to consider GNNs over user-item bipartite graph of observed interactions as is done by previous works, which can presumably propagate embeddings from users in to users in . However, query users have few historical interactions leading to very sparse local sub-graphs over which GNNs could only propagate inadequate information. To mitigate the issue, we propose an inductive relational inference model that can estimate the underlying user relations and paves the way for sufficient message passing through a hidden dense network.
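The partition of users into support and query sets described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `partition_users`, the `(user, item, value)` triple format and the optional `all_users` argument are hypothetical, and users whose count equals the threshold are assigned to the query set, a boundary case the text does not specify.

```python
from collections import Counter


def partition_users(interactions, threshold, all_users=None):
    """Split users into support users (more than `threshold` observed
    interactions) and query users (all remaining users).

    `interactions` is an iterable of (user, item, value) triples.
    `all_users` optionally lists every user, so that users with zero
    observed interactions also end up in the query set.
    """
    counts = Counter(user for user, _item, _value in interactions)
    users = set(counts) if all_users is None else set(all_users)
    support = sorted(u for u in users if counts[u] > threshold)
    # Assumption: a count exactly equal to the threshold goes to the query set.
    query = sorted(u for u in users if counts[u] <= threshold)
    return support, query


# Toy example: user "a" has 3 interactions, "b" has 1, "c" has none.
toy = [("a", 1, 5.0), ("a", 2, 3.0), ("a", 3, 4.0), ("b", 1, 2.0)]
support_users, query_users = partition_users(toy, threshold=2,
                                             all_users=["a", "b", "c"])
```

The rows of the interaction matrix belonging to `support_users` would then feed the transductive stage, and the rows of `query_users` the inductive stage.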
Inductive Relation Inference Model
Consider an adjacency matrix , where denotes the weighted edge from user to user , satisfying that where is the -th column of . Then we can express the preference embedding of user as , the weighted sum of the embeddings of support users. In the following, we first justify this idea by showing its expressive power and then propose a parametrized inductive model that puts it into practice.
Theoretical Justification. If we use dot-product for in the CF model, the rating of query user can be predicted by . We are interested in the problem
where , and is a set with size containing indices of observed entries in . Assume and use to denote convex hull of , i.e., the class of vectors , where satisfying and is the -th row vector in . We have the following theorem.
Theorem 1. Assume that the optimization of the transductive CF objective above can give for . If satisfies and is convex, then there exists at least one solution for in the problem above such that for .
The theorem shows that under mild conditions, the proposed model can minimize the reconstruction loss of MC to the same level as matrix factorization. Note that the two conditions in Theorem 1 can be satisfied in most cases. To guarantee , one can carefully construct , e.g., by diversifying the behavior patterns of support users. Besides, the widely used loss functions for recommendation are convex for and , including cross-entropy and L2 loss.
Parametrization. We showed that using the weighted combination does not sacrifice model capacity under some conditions. However, in practice, directly optimizing over is intractable due to its large parameter space. Hence, we parametrize with a multi-head attention network, enabling it to inductively compute hidden relations. Concretely, the attention weight of the -th head is
where is a trainable vector, denotes concatenation and . Here includes support users who have common rated items as user . If is empty (in zero-shot recommendation), we can randomly select a group of support users to construct or use (the embeddings of) user side information as if user features are available, as shown in our experiments. The -th attention head independently aggregates preference embeddings and the final inductive embedding of user can be given as
where . To keep the notation clean, we denote and . Then we can predict rating of query user via .
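The idea of computing a query user's embedding as an attention-weighted combination of support-user embeddings can be sketched as below. This is a simplified, single-head illustration and not the paper's exact parametrization: the scoring rule (a trainable vector `c` applied to the concatenation of a summary vector and a support embedding), the use of the mean candidate embedding as that summary, and all names (`inductive_embedding`, `predict_rating`, `support_embs`) are assumptions. What it does preserve is that the softmax weights are positive and sum to one, so the resulting embedding is a convex combination of support embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inductive_embedding(support_embs, candidate_ids, c):
    """Return (attention weights, weighted-sum embedding) for one query user.

    support_embs:  dict mapping support-user id -> d-dimensional embedding.
    candidate_ids: support users attended over (e.g. those sharing rated items).
    c:             scoring vector of length 2*d (trainable in a real model).
    """
    cands = [support_embs[u] for u in candidate_ids]
    d = len(cands[0])
    # Hypothetical query summary: mean embedding of the candidate support users.
    summary = [sum(p[k] for p in cands) / len(cands) for k in range(d)]
    # Score each candidate from the concatenation [summary ; candidate embedding].
    scores = [sum(ck * xk for ck, xk in zip(c, summary + p)) for p in cands]
    weights = softmax(scores)
    emb = [sum(w * p[k] for w, p in zip(weights, cands)) for k in range(d)]
    return weights, emb


def predict_rating(user_emb, item_emb):
    """Dot-product predictor, one possible choice of the prediction function."""
    return sum(a * b for a, b in zip(user_emb, item_emb))


support_embs = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
c = [0.5, -0.5, 1.0, 0.0]  # arbitrary illustrative values
weights, emb = inductive_embedding(support_embs, [0, 1, 2], c)
rating = predict_rating(emb, [0.2, 0.8])
```

A multi-head variant would repeat the scoring with separate parameters per head and combine the per-head embeddings; that step is omitted here.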
The training process is divided into two stages. First, we pre-train a transductive CF model via the transductive objective above and obtain transductive embeddings , and prediction network . Second, we train our relation model with fixed , via
Complexity. The complexity bottleneck of our method is calculating attention weights for all the support users in the denominator of the attention weight. Given a large dataset, we can sample a subset of support users (with size ) per epoch for each query user and calculate attention weights over them. Such an approximation can control both the time and space complexity of the second training stage within . Hence, the time complexity of two-stage training is .
Generalization Error Bound. In this paper, we are interested in model performance on query users with few observed interactions. Here we investigate the generalization ability of our inductive relation model. We also assume as the dot-product operation to simplify our analysis. In the next theorem, we show that the generalization error on query users is bounded by the number of support users and the number of observed interactions of query users.
Theorem 2. Assume is -Lipschitz and each entry in is absolutely bounded by . Then with probability at least over the random choice of , it holds that for any ,
The theorem shows that a smaller size of would make the generalization error bound tighter. Looking at both Theorem 1 and 2, we will find that the configuration of has an important effect on model capacity and generalization ability. Notably, we need to make support users in ‘representative’ of diverse user behavior patterns on item consumption in order to guarantee enough model capacity. Also, we need to control the size of in order to maintain generalization ability. Based on these insights, how to properly select support users can be an interesting direction for future investigation. We will further study this issue in our experiments.
|Dataset (Metric)||Movielens-1M (MAE)||Movielens-1M (MAE)||Movielens-1M (MAE)||Amazon-Books (AUC)||Amazon-Books (AUC)||Amazon-Books (AUC)|
|PMF ||0.7510||0.7842||0.7334||0.7181 (.0003)||0.6980||0.7238|
|NCF ||0.7456||0.7685||0.7334||0.7067 (.0003)||0.6990||0.7087|
|GCMC ||0.7418||0.7741||0.7246||0.7185 (.0003)||0.7040||0.7241|
|IGMC ||0.7347||0.7527||0.7251||0.4994 (.0002)||0.4970||0.5006|
|IRMC (ours)||0.7230||0.7330||0.7176||0.7820 (.0004)||0.7143||0.8013|
In this section, we conduct experiments to verify the proposed model. [Footnote 3: The experiment code will be released.] We run our experiments on Movielens-1M and Amazon-Books. Movielens-1M contains movie rating data [Footnote 4: https://grouplens.org/datasets/movielens/] with 6040 users, 3706 items and 1000209 ratings (ranging from 1 to 5). Amazon-Books is selected from the Amazon product review dataset and is a large dataset. After filtering out items with fewer than five interactions, the dataset contains 101839 users, 91599 items and 2931466 ratings, which we convert to implicit interactions (as positive examples); we then sample 5 items as negative examples for each interaction during training. For Movielens-1M, side information (user gender, age, occupation, movie genre, etc.) has been collected for users and items in the original dataset. We use the augmented dataset as Movielens-1M-feature and further test our model in the feature-based setting. For each user, we hold out ten interactions as the test set and use the remaining ones as the training set. After that, each user has three to ninety (resp. one to thousands of) training examples for Movielens-1M (resp. Amazon-Books). We select support users such that they have more than training interactions. Basically, for Movielens-1M and for Amazon-Books. The partition gives 49058 support users for Amazon-Books and 2164 for Movielens-1M. The remaining users are used as query users. We use Mean Absolute Error (MAE) and Area Under the Curve (AUC) as evaluation metrics for explicit interactions (Movielens-1M) and implicit interactions (Amazon-Books), respectively.
For comparison, we consider ItemPop and PMF as two baseline methods. ItemPop directly uses the number of interacted users for item recommendation. PMF is a simple matrix factorization method with L2 regularization. For CF-based methods, we also consider Neural Collaborative Filtering (NCF), which extends matrix factorization with a neural network; here we specify it as a three-layer neural network with activation. For in our transductive CF model, we use the same architecture as NCF. Moreover, we consider Graph Convolutional Matrix Completion (GCMC), a state-of-the-art transductive matrix completion method, and the recently proposed GNN-based inductive matrix completion model IGMC as two strong competitors. For feature-based methods, we use the Wide&Deep network as a baseline. Furthermore, we consider two powerful competitors, Meta-Learning User Preference Estimator (MeLU) and Attribute Graph Neural Networks (AGNN). Unlike our model, which divides the training process into two stages, the other methods are trained on all the users. We tune each comparative model on each dataset and report its best results. In the appendix, we present detailed information on model specification, hyper-parameter settings and training details.
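The two evaluation metrics named above can be computed as in the following minimal sketch. The function names are hypothetical, and the AUC is computed by the standard pairwise definition (the fraction of positive-negative pairs ranked correctly, with ties counted as one half), which is an assumption about the evaluation protocol rather than something the text specifies.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE between true ratings and predicted ratings (explicit feedback)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(pos_scores, neg_scores):
    """Pairwise AUC: probability that a positive example outscores a negative one."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Toy explicit-feedback example: true ratings vs. predictions.
mae_value = mean_absolute_error([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
# Toy implicit-feedback example: scores of held-out positives vs. sampled negatives.
auc_value = auc([0.9, 0.8], [0.7, 0.85])
```

The pairwise loop is quadratic in the number of scores; for large test sets a rank-based computation would be preferable, but the quadratic form keeps the definition explicit.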
In Table 1, we report the MAEs (resp. AUCs) for test interactions from all the users, support users (warm-start) and query users (few-shot), respectively, on Movielens-1M (resp. Amazon-Books). The results show that our model IRMC outperforms the other competitors in most cases. In particular, IRMC gives the best overall MAE and AUC for all the users, which demonstrates that IRMC is a powerful framework for matrix completion, especially for users with different amounts of historical information. For warm-start recommendation, IRMC manages to beat the other competitors (especially for AUC) even though it uses a simple transductive model without a GNN architecture. Compared with NCF, which uses the same architecture as our transductive model, IRMC achieves much better MAEs and AUCs for support users (warm-start). The reason could be that NCF directly uses all the users for learning transductive embeddings, and the query users (with sparse interactions) would have a negative effect on learning for support users. This result validates the effectiveness of partitioning the users into two groups, which can maintain good performance for transductive learning on support users. Moreover, for few-shot recommendation on query users, our model achieves a significant improvement in MAE over the strong competitor IGMC, which indicates that IRMC with an inductive relational model is indeed effective for addressing the data sparsity issue. The AUCs of IGMC on Amazon-Books are much worse than those of the other methods. The reason is that IGMC relies on sub-graphs for users and items as input for prediction. For datasets with implicit interactions, the sub-graphs only contain edges of one type (positive), which makes them ineffective for two-class classification. We also present the test MAEs for feature-based competitors and IRMC on Movielens-1M-feature in the feature-based results table, where IRMC achieves the best MAEs for overall, warm-start and few-shot recommendation.
Furthermore, the user features enable us to consider zero-shot recommendation. Specifically, we only use training interactions of support users to train each model and directly use the trained model for prediction on test interactions of query users without any historical interaction. Although no interaction is given for query users, these methods can leverage user features to achieve inductive computation. We can see that our model IRMC gives the best results, achieving improvements over MeLU and AGNN, two state-of-the-art methods for cold-start recommendation. This shows that IRMC is capable of dealing with zero-shot recommendation and is a promising approach to handling new users with no historical behavior in real-world dynamic systems. We also report test performance for users with different numbers of training interactions and present the results on Movielens-1M and Movielens-1M-feature in the corresponding figure. As shown in the figure, as the number of training interactions goes up, the MAEs of all transductive models drop markedly, while IGMC and our model exhibit a smoother decrease. In the extreme cases with fewer than five training interactions, notably, our model also gives the best results in MAE on both Movielens-1M and Movielens-1M-feature.
Impact of Partition Threshold. We study how model performance varies with the partition threshold. We show the MAEs on Movielens-1M in the corresponding figure, where we change the threshold from 5 to 50. The MAE first drops sharply, then remains at a stable level, and finally rises again. This indicates that the partition strategy is important for keeping the two sets balanced. If the threshold is too small, the sub-matrix of interactions of support users would be sparse, which may hurt transductive learning; if it is too large, the small set of support users would not be representative enough, which limits the expressive power of the inductive relation model.
Inductive Representations vs. Transductive Representations. We conduct a case study, shown in the embedding visualization figure, where we visualize the transductive embeddings of support users and the inductive embeddings of query users given by IRMC, together with the transductive embeddings of all the users given by PMF (matrix factorization). The details are in the appendix. One key observation is that as the number of training interactions becomes larger, the inductive embeddings given by IRMC get closer to the transductive embeddings given by PMF. This phenomenon indicates that, given sufficient training interactions, the inductive relation model can capture preference embeddings similar to those of transductive learning, which again justifies the design of IRMC.
Table: Training time per epoch on Movielens-1M-feature. The IRMC entry gives the times of the two training stages.
|Method||AGNN||IRMC||MeLU|
|Time (s)||40.6||20.5+27.9||513.2|
Scalability Test. We further study the scalability of IRMC compared with IGMC and GCMC. We measure the training time per epoch on Amazon-Books using a GTX 1080Ti with 11G memory. Here we truncate the dataset and use different numbers of users for training. For IRMC, we add up the training times of the two stages, counting one epoch for each. The results are shown in the scalability figure (with log-scale axes). As the dataset size grows, the training times per epoch of the three models all increase linearly. IRMC takes roughly twice as long as GCMC, while IGMC is approximately ten times slower than IRMC. In fact, IGMC requires a subgraph for each training interaction, so one may need to transmit millions of subgraphs between GPU memory and CPU memory in one epoch, which leads to a high time cost. On the other hand, GCMC and IRMC only rely on one global graph. Specifically, GCMC deals with a sparse user-item bipartite graph and IRMC handles a dense user-user graph. However, GCMC needs to update the transductive embeddings of users in a local graph for each training interaction, which induces complexity at least (where denotes the average number of observed interactions for one item), while the second stage of IRMC only updates the inductive model, which contributes to the total complexity (as discussed in Section 3.2). In Amazon-Books, we have , and this is why the time costs of IRMC and GCMC remain at the same level. Nevertheless, GCMC as a transductive model cannot deal with new unseen users at test time, while IRMC can efficiently compute inductive embeddings for new users given the trained model. We also compare the training times of MeLU, AGNN and IRMC on Movielens-1M-feature using side information and report the results in the training-time table, where IRMC is nearly as efficient as AGNN and about 11 times faster than MeLU, which uses meta-learning (with five steps of local updates per global update).
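As a quick check of the speed comparison above, the ratios can be worked out from the per-epoch times reported for Movielens-1M-feature (40.6 s for AGNN, 20.5 s + 27.9 s for the two stages of IRMC, 513.2 s for MeLU). The symbols $t_{\mathrm{IRMC}}$, $t_{\mathrm{MeLU}}$ and $t_{\mathrm{AGNN}}$ are introduced here only for this calculation.

```latex
\[
t_{\mathrm{IRMC}} = 20.5 + 27.9 = 48.4\ \text{s}, \qquad
\frac{t_{\mathrm{MeLU}}}{t_{\mathrm{IRMC}}} = \frac{513.2}{48.4} \approx 10.6, \qquad
\frac{t_{\mathrm{IRMC}}}{t_{\mathrm{AGNN}}} = \frac{48.4}{40.6} \approx 1.19 .
\]
```

So the "about 11 times faster than MeLU" figure corresponds to a ratio of roughly 10.6, and IRMC's combined two-stage time is about 19% higher than AGNN's.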
In this paper, we propose a new inductive relational matrix completion method that can effectively address data sparsity and cold-start issues. The model is theoretically grounded, with a rigorous justification of its capacity and an analysis of its generalization ability. Through extensive experiments, we show that our model outperforms state-of-the-art methods on both warm-start and cold-start users. As a future direction, it would be interesting to consider the selection of support users as a decision problem (which could be jointly optimized with the prediction model). The core idea of IRMC opens a new direction for representation learning: one can consider a pretrained representation model for one set of existing entities and generalize their representations (through some simple transformations) to efficiently compute inductive representations for others, enabling the model to flexibly handle newly arriving entities in an open world. We believe that this framework can inspire more research in broad areas of AI.
There always exists a trade-off between information utility and the risk of exposing user privacy. Our model is a promising approach for building a powerful recommender system that can exploit the historical behaviors of users, infer their latent interests and preferences, and recommend items that they are very likely to click or purchase. Accurate recommendation can help to filter useful information for individuals, improve the efficiency of society as a whole and alleviate the information explosion problem in the information age. Also, our methodology can improve recommendation performance on cold-start users with few or no historical behaviors, which can help to reduce bias in previous recommendation models and facilitate fairness between old users and new users on one platform. Admittedly, such a model could also be used by a company to uncover user habits, personalities and social circles, which may raise concerns about user privacy. We encourage practitioners to take care of the privacy issue in the data collection process and to sufficiently anonymize the data before it is used for preference learning. Also, on the methodological level, more work needs to be done on how to guarantee good preference estimation in recommender systems under constraints on data privacy. If the algorithm is not aware of the correspondence between data and specific users, user privacy can, to a certain degree, be properly protected.
-  J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowl. Based Syst., 46:109–132, 2013.
-  E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
-  H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah. Wide & deep learning for recommender systems. In DLRS, pages 7–10, 2016.
-  Z. Cheng, Y. Ding, L. Zhu, and M. S. Kankanhalli. Aspect-aware latent factor model: Rating prediction with ratings and reviews. In WWW, pages 639–648, 2018.
-  Z. Du, X. Wang, H. Yang, J. Zhou, and J. Tang. Sequential scenario-specific meta learner for online recommendation. In KDD, pages 2895–2904, 2019.
-  W. Fan, Y. Ma, Q. Li, Y. He, Y. E. Zhao, J. Tang, and D. Yin. Graph neural networks for social recommendation. In WWW, pages 417–426, 2019.
-  G. Guo, J. Zhang, and N. Yorke-Smith. Trustsvd: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. In AAAI, pages 123–129, 2015.
-  J. S. Hartford, D. R. Graham, K. Leyton-Brown, and S. Ravanbakhsh. Deep models of interactions across sets. In ICML, pages 1914–1923, 2018.
-  X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua. Neural collaborative filtering. In WWW, pages 173–182, 2017.
-  Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
-  H. Lee, J. Im, S. Jang, H. Cho, and S. Chung. Melu: Meta-learned user preference estimator for cold-start recommendation. In KDD, pages 1073–1082, 2019.
-  J. Li, M. Jing, K. Lu, L. Zhu, Y. Yang, and Z. Huang. From zero-shot learning to cold-start recommendation. In AAAI, pages 4189–4196, 2019.
-  Z. Lu, M. Gao, X. Wang, J. Zhang, H. Ali, and Q. Xiong. SRRL: select reliable friends for social recommendation with reinforcement learning. In ICONIP, pages 631–642, 2019.
-  P. Massa and P. Avesani. Trust-aware recommender systems. In RecSys, pages 17–24, 2007.
-  J. J. McAuley, C. Targett, Q. Shi, and A. van den Hengel. Image-based recommendations on styles and substitutes. In SIGIR, pages 43–52, 2015.
-  M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. Adaptive computation and machine learning. MIT Press, 2012.
-  F. Monti, M. M. Bronstein, and X. Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In NeurIPS, pages 3697–3707, 2017.
-  K. Moridomi, K. Hatano, and E. Takimoto. Tighter generalization bounds for matrix completion via factorization into constrained matrices. IEICE Trans. Inf. Syst., 101-D(8):1997–2004, 2018.
-  S. Park and W. Chu. Pairwise preference regression for cold-start recommendation. In ACM RecSys, pages 21–28, 2009.
-  T. Qian, Y. Liang, and Q. Li. Solving cold start problem in recommendation with attribute graph neural networks. CoRR, abs/1912.12398, 2019.
-  A. M. Rashid, G. Karypis, and J. Riedl. Learning preferences of new users in recommender systems: an information theoretic approach. KDD, 10(2):90–100, 2008.
-  S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: towards real-time object detection with region proposal networks. In NeurIPS, pages 91–99, 2015.
-  S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: bayesian personalized ranking from implicit feedback. In UAI, pages 452–461, 2009.
-  R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, NeurIPS, pages 1257–1264, 2007.
-  M. S. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In ESWC, volume 10843 of Lecture Notes in Computer Science, pages 593–607, 2018.
-  A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In KDD, pages 650–658, 2008.
-  N. Srebro, N. Alon, and T. S. Jaakkola. Generalization error bounds for collaborative prediction with low-rank matrices. In NeurIPS, pages 1321–1328, 2004.
-  N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In COLT, pages 545–560, 2005.
-  R. van den Berg, T. N. Kipf, and M. Welling. Graph convolutional matrix completion. CoRR, abs/1706.02263, 2017.
-  A. van den Oord, S. Dieleman, and B. Schrauwen. Deep content-based music recommendation. In NeurIPS, pages 2643–2651, 2013.
-  M. Vartak, A. Thiagarajan, C. Miranda, J. Bratman, and H. Larochelle. A meta-learning perspective on cold-start recommendations for items. In NeurIPS, pages 6904–6914, 2017.
-  H. Wang, N. Wang, and D. Yeung. Collaborative deep learning for recommender systems. In KDD, pages 1235–1244, 2015.
-  M. Xu, R. Jin, and Z. Zhou. Speedup matrix completion with side information: Application to multi-label learning. In NeurIPS, pages 2301–2309, 2013.
-  R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec. Graph convolutional neural networks for web-scale recommender systems. In KDD, pages 974–983, 2018.
-  K. Yu, J. Bi, and V. Tresp. Active learning via transductive experimental design. In ICML, pages 1081–1088, 2006.
-  M. Zhang and Y. Chen. Inductive matrix completion based on graph neural networks. In ICLR, 2020.
-  S. Zhang, L. Yao, A. Sun, and Y. Tay. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv., 52(1):5:1–5:38, 2019.
-  Y. Zheng, B. Tang, W. Ding, and H. Zhou. A neural autoregressive approach to collaborative filtering. In ICML, pages 764–773, 2016.
-  K. Zhou, S. Yang, and H. Zha. Functional matrix factorizations for cold-start recommendation. In SIGIR, pages 315–324, 2011.