Rating predictions (RP) have been studied for decades as a branch of research in recommender systems [15, 11, 14]. Unlike ranking predictions, the goal of RP is to predict, as precisely as possible, the rating that a user would give to an item that she has not rated in the past [10, 7]. This is useful not only for recommendation purposes but also when there is a need to estimate users’ opinions about a particular item.
The general idea of matrix factorization (MF) is to optimize latent factors to represent users and items by projecting users and items into a joint dense vector space [10, 6]. Conventional MF methods, such as singular value decomposition (SVD), probabilistic matrix factorization (PMF), and non-negative matrix factorization (NMF), decompose the rating matrix into a user factor matrix and an item factor matrix that is shared by all users. Deep learning-based MF methods further take the linear and non-linear interactions between users and items into consideration by employing a restricted Boltzmann machine (RBM), autoencoder (AE) [20, 22], convolutional neural network (CNN), or multi-layer perceptron (MLP) [6, 28]. Recently, various methods have been proposed to enhance MF by incorporating side information [24, 4, 3, 2, 25, 26].
All MF methods use the same RP model as well as item embeddings for all users to predict personalized ratings. We hypothesize that this is not always optimal. First, different users might have different views and/or angles about the same item, which indicates they should not always share the same item embeddings. For example, each reader has her own unique understanding of “Hamlet.” Second, different users might favor different RP strategies, which means they should not consistently use the same RP model.
Table 1: RP performance of two MF methods (PMF and NMF) for three users from the test set. Columns: Method, User 4, User 90, User 1983.
Consider Table 1, where we select three users from the test set of a benchmark dataset and list the RP performance of two competitive MF methods. PMF is more suitable for user 4, but performs very badly for user 90. In contrast, NMF is good for user 90, but is not suitable for user 4. This is because there are different factors that should be considered for different users for RPs and it is hard for a single RP model to perfectly capture all factors.
We propose to adopt private RPs, where each user has her own item embeddings and RP model. A key challenge is how to build private RP models while still exploiting collaborative filtering (CF) information, because we may not have enough personal data for each user to build her own model. Moreover, it is unrealistic to store and maintain a separate model for each individual user. In this paper, we address this by introducing a novel matrix factorization framework, namely meta matrix factorization (MetaMF). Instead of building a model for each user, we propose to “generate” private item embeddings and RP models with a MetaMF model. Specifically, we assign a so-called indicator vector (i.e., a one-hot vector corresponding to a user id) to each user. For a given user, we first fuse her indicator vector to get a collaborative vector by collecting useful information from other users with a collaborative memory (CM) module. Then, we employ a meta recommender (MR) module to generate private item embeddings and an RP model based on the collaborative vector. It is challenging to directly generate the item embeddings due to the large number of items and the high dimensions. To tackle this, we devise a rise-dimensional generation (RG) strategy that first generates a low-dimensional item embedding matrix and a rise-dimensional matrix, and then multiplies them to obtain high-dimensional embeddings. Finally, we use the generated model to obtain private RPs for this user.
We perform extensive experiments on two benchmark datasets. MetaMF outperforms state-of-the-art MF methods. Both the generated item embeddings and the RP model parameters exhibit clustering phenomena, demonstrating that MetaMF can effectively model CF while generating a private model for each user.
The main contributions of this paper are as follows:
- We introduce a novel MetaMF framework for rating predictions, which is the first to realize private rating predictions, to the best of our knowledge.
- We devise collaborative memory and meta recommender modules as well as a rise-dimensional generation strategy to implement MetaMF.
- We conduct experiments and analyses on two datasets to verify the effectiveness of MetaMF.
2 Related Work
2.1 Matrix Factorization
Matrix factorization (MF) has attracted a lot of attention since it was proposed for recommendation. Early studies focus mainly on how to achieve better rating matrix decomposition. Sarwar et al. (2000) employ SVD to reduce the dimensionality of the rating matrix, so that they can get low-dimensional user and item vectors. Goldberg et al. (2001) apply principal component analysis (PCA) to decompose the rating matrix, and obtain the principal components as user or item vectors. Zhang et al. (2006) propose NMF, which decomposes the rating matrix by modeling each user’s ratings as an additive mixture of rating profiles from user communities or interest groups and constraining the factorization to have non-negative entries. Mnih and Salakhutdinov (2008) propose PMF to model the distributions of user and item vectors from a probabilistic point of view. Koren (2008) proposes SVD++, which enhances SVD by including implicit feedback, whereas SVD only includes explicit feedback.
The matrix decomposition methods mentioned above estimate ratings by simply calculating the inner product between the user and item vectors, which is not sufficient to capture their complex interactions. Deep learning has been introduced to MF for better modeling of the user-item interactions with non-linear transformations. Sedhain et al. (2015) propose AutoRec, which takes ratings as input and reconstructs the ratings with an autoencoder. Later, Strub et al. (2016) enhance AutoRec by incorporating side information into a denoising autoencoder. He et al. (2017) propose neural collaborative filtering (NCF), which employs an MLP to model the user-item interactions. Xue et al. (2017) present deep matrix factorization (DMF), which enhances NCF by considering both explicit and implicit feedback. He et al. (2018) use CNNs to improve NCF and present ConvNCF, which uses the outer product to model user-item interactions. Cheng et al. (2018) introduce an attention mechanism into NCF to differentiate the importance of different user-item interactions. Recently, a number of studies have investigated the use of side information or implicit feedback to enhance these neural models [14, 25, 26, 29].
All these models provide personalized RPs by learning user representations to encode the differences among users, while sharing item embeddings and models. In contrast, MetaMF provides “private” RPs by generating non-shared models as well as item embeddings for different users.
2.2 Meta Learning
Meta learning, also known as “learning to learn,” has shown its effectiveness in reinforcement learning, few-shot learning, and image classification. Below, we survey the most closely related works.
Some meta learning works aim to learn a special network used to generate the parameters of other networks. Jia et al. (2016) propose a network to dynamically generate filters for CNNs. Bertinetto et al. (2016) introduce a model to predict the parameters of a pupil network from a single exemplar for one-shot learning. Ha et al. (2016) propose hypernetworks, which employ a network to generate the weights of another network. Krueger et al. (2017) present a Bayesian variant of hypernetworks that learns the distribution over the parameters of another network. Chen et al. (2018) use a hypernetwork to share function-level information across multiple tasks. However, none of them targets recommendation, which is a more complex task with its own challenges.
Recently, some studies have tried to introduce meta learning into recommendation. Vartak et al. (2017) study the item cold-start problem in recommendation from a meta-learning perspective. They view recommendation as a binary classification problem, where the class labels indicate whether the user engaged with the item. They then devise a classifier by adapting a few-shot learning paradigm. Chen et al. (2018) propose a recommendation framework based on federated meta learning, which maintains a shared model in the cloud. To adapt it for each user, they download the model to the local device and fine-tune the model for personalized recommendations. Different from these publications, we learn a hypernetwork (i.e., MetaMF) to directly generate private MF models for each user for RPs.
3 Meta Matrix Factorization
Given a user $u$ and an item $i$, the goal of rating prediction is to estimate a rating $\hat{r}_{u,i}$ that is as close as possible to the true rating $r_{u,i}$. We denote the user set as $\mathcal{U}$, the item set as $\mathcal{I}$, and the true rating set as $\mathcal{R}$, which is divided into the training set $\mathcal{D}_{train}$, the validation set $\mathcal{D}_{valid}$, and the test set $\mathcal{D}_{test}$.
As shown in Fig. 1, MetaMF contains three modules: a collaborative memory module, a meta recommender module and a prediction module, where the collaborative memory module and the meta recommender module are shared by all users, and the prediction module is non-shared. In the collaborative memory module, we first obtain the user embedding $\mathbf{e}_u$ of $u$ from the user embedding matrix $U$ and take it as the coordinates to obtain the collaborative vector $\mathbf{c}_u$ from a shared memory space that fuses the information from all users. Then we input $\mathbf{c}_u$ to the meta recommender module to generate the parameters of a private RP model for $u$. The RP model can be of any type. In this work, the RP model is a multi-layer perceptron (MLP). We also generate the private item embedding matrix $E^u$ of $u$ with a rise-dimensional generation strategy. Finally, the prediction module takes the item embedding $\mathbf{e}^u_i$ of $i$ from $E^u$ as input and predicts $\hat{r}_{u,i}$ using the generated RP model.
Next we detail each module.
3.2 Collaborative Memory Module
In order to facilitate collaborative filtering, we propose the CM module to learn a collaborative vector for each user, which encodes both the user’s own information and some useful information from the other users.
Specifically, we assign each user and each item an indicator vector, $\mathbf{i}_u \in \mathbb{R}^{m}$ and $\mathbf{i}_i \in \mathbb{R}^{n}$, where $m$ is the number of users and $n$ is the number of items. Note that $\mathbf{i}_u$ and $\mathbf{i}_i$ are one-hot vectors, with each dimension corresponding to a particular user or item. For the given user $u$, we first get the user embedding $\mathbf{e}_u$ by Eq. (1):
$$\mathbf{e}_u = U \mathbf{i}_u, \tag{1}$$
where $\mathbf{e}_u \in \mathbb{R}^{d_u}$, $U \in \mathbb{R}^{d_u \times m}$ is the user embedding matrix, and $d_u$ is the size of user embeddings. Then we obtain the collaborative vector for $u$. Specifically, we use a shared memory matrix $M \in \mathbb{R}^{d_u \times k}$ to store the basis vectors that span the space of all collaborative vectors, where $k$ is the dimension of the basis vectors and collaborative vectors. We treat the user embedding $\mathbf{e}_u$ as the coordinates of $u$ in the shared memory space, so the collaborative vector $\mathbf{c}_u \in \mathbb{R}^{k}$ for $u$ is the linear combination of the basis vectors in $M$ weighted by $\mathbf{e}_u$, as shown in Eq. (2):
$$\mathbf{c}_u = \sum_{j} M(j,:)\, \mathbf{e}_u(j), \tag{2}$$
where $M(j,:)$ is the $j$-th vector of $M$ and $\mathbf{e}_u(j)$ is the $j$-th scalar of $\mathbf{e}_u$. Because the memory matrix $M$ is shared among all users, the shared memory space fuses the information from all users. MetaMF can flexibly exploit collaborative filtering among users by assigning them similar collaborative vectors in the space defined by $M$, which is equivalent to learning similar user embeddings as in existing MF methods.
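The two steps of the CM module can be read as a column lookup followed by a weighted sum of memory rows, i.e. roughly $\mathbf{e}_u = U \mathbf{i}_u$ and $\mathbf{c}_u = \sum_j M(j,:)\,\mathbf{e}_u(j)$. The sketch below illustrates that reading in plain Python; the symbol names, the toy sizes, the matrix values, and the helper functions are all assumptions made for illustration and are not taken from the authors' implementation.

```python
# Toy sizes (assumed for illustration): m users, user-embedding size d_u,
# collaborative-vector size k.
m, d_u, k = 3, 2, 4

# User embedding matrix U with shape d_u x m (one column per user).
U = [[0.5, 1.0, -1.0],
     [2.0, 0.0, 0.5]]

# Shared memory matrix M with shape d_u x k; each row is one basis vector.
M = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 3.0]]


def one_hot(index, size):
    """Indicator vector with a single 1 at position `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def user_embedding(U, indicator):
    """Eq. (1) as assumed here: e_u = U i_u, which selects one column of U."""
    return [sum(row[j] * indicator[j] for j in range(len(indicator))) for row in U]


def collaborative_vector(M, e_u):
    """Eq. (2) as assumed here: c_u = sum_j e_u[j] * M[j, :]."""
    width = len(M[0])
    return [sum(e_u[j] * M[j][col] for j in range(len(e_u))) for col in range(width)]


e_u = user_embedding(U, one_hot(0, m))   # column 0 of U
c_u = collaborative_vector(M, e_u)
print(e_u)  # [0.5, 2.0]
print(c_u)  # [0.5, 2.0, 1.0, 6.0]
```

Two users whose embeddings are close end up with close collaborative vectors, because both are mixtures of the same shared rows of `M`.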
3.3 Meta Recommender Module
We propose the MR module to generate the private item embeddings and RP model based on the collaborative vector from the CM module.
Private Item Embeddings.
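We propose to generate the private item embedding matrix $E^u \in \mathbb{R}^{n \times d_i}$ for each user $u$, where $d_i$ is the size of item embeddings. However, it is unrealistic to directly generate the whole item embedding matrix because there are usually a large number of items (i.e., $n$ is large) and their embeddings are high-dimensional (i.e., $d_i$ is large). Therefore, we propose the rise-dimensional generation (RG) strategy to decompose the generation into two parts: a low-dimensional item embedding matrix $E^u_l \in \mathbb{R}^{n \times s}$ and a rise-dimensional matrix $E^u_r \in \mathbb{R}^{s \times d_i}$, where $s$ is the size of low-dimensional item embeddings and $s < d_i$. Specifically, we first follow Eq. (3) to generate $E^u_l$ and $E^u_r$ (in the form of vectors):
$$\mathbf{h}_l = \mathrm{ReLU}(W^h_l \mathbf{c}_u + \mathbf{b}^h_l), \quad \mathbf{h}_r = \mathrm{ReLU}(W^h_r \mathbf{c}_u + \mathbf{b}^h_r), \quad \mathbf{e}_l = W^o_l \mathbf{h}_l + \mathbf{b}^o_l, \quad \mathbf{e}_r = W^o_r \mathbf{h}_r + \mathbf{b}^o_r, \tag{3}$$
where $\mathbf{e}_l \in \mathbb{R}^{ns}$ and $\mathbf{e}_r \in \mathbb{R}^{s d_i}$; $W^h_l, W^h_r \in \mathbb{R}^{d_h \times k}$, $W^o_l \in \mathbb{R}^{ns \times d_h}$ and $W^o_r \in \mathbb{R}^{s d_i \times d_h}$ are weights; $\mathbf{b}^h_l, \mathbf{b}^h_r$ and $\mathbf{b}^o_l, \mathbf{b}^o_r$ are biases; $\mathbf{h}_l$ and $\mathbf{h}_r$ are hidden states; $d_h$ is the hidden size. Then we reshape $\mathbf{e}_l$ to a matrix $E^u_l$ whose shape is $n \times s$, and reshape $\mathbf{e}_r$ to a matrix $E^u_r$ whose shape is $s \times d_i$. Finally, we multiply $E^u_l$ and $E^u_r$ to get $E^u$:
$$E^u = E^u_l E^u_r. \tag{4}$$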
For different users, the generated item embedding matrices are different.
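The rise-dimensional generation strategy described above can be sketched as two small generator networks whose outputs are reshaped and multiplied. The following plain-Python example is only an illustration under stated assumptions: the toy sizes, the random stand-ins for trained generator parameters, the zero biases, and the ReLU non-linearity are all choices made here, not details confirmed by the text.

```python
import random

random.seed(0)

# Assumed toy sizes: n items, item-embedding size d, low dimension s < d,
# collaborative-vector size k, hidden size h.
n, d, s, k, h = 5, 4, 2, 3, 6


def rand_matrix(rows, cols):
    """Random stand-in for a trained weight matrix."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def generate(c_u, W_hidden, b_hidden, W_out, b_out):
    """Two-layer generator; the ReLU non-linearity is an assumption."""
    hidden = [max(0.0, a + b) for a, b in zip(matvec(W_hidden, c_u), b_hidden)]
    return [a + b for a, b in zip(matvec(W_out, hidden), b_out)]


def reshape(vec, rows, cols):
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


c_u = [0.3, -0.2, 0.5]  # collaborative vector from the CM module (toy values)

# One generator for the low-dimensional item matrix (n x s) ...
low_vec = generate(c_u, rand_matrix(h, k), [0.0] * h, rand_matrix(n * s, h), [0.0] * (n * s))
# ... and one for the rise-dimensional matrix (s x d).
rise_vec = generate(c_u, rand_matrix(h, k), [0.0] * h, rand_matrix(s * d, h), [0.0] * (s * d))

E_low = reshape(low_vec, n, s)
E_rise = reshape(rise_vec, s, d)
E_u = matmul(E_low, E_rise)  # private item embedding matrix, n x d

# Output-layer weights needed: direct generation vs. the RG strategy.
direct_params = h * n * d
rg_params = h * (n * s + s * d)
print(len(E_u), len(E_u[0]), direct_params, rg_params)  # 5 4 120 108
```

With these toy sizes the saving is small, but when the number of items is much larger than the embedding size the output layer shrinks by roughly a factor of `d / s`, which is the motivation the text gives for the strategy.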
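Private RP Model.
We also propose to generate a private RP model for each user $u$. We use an MLP as the RP model, so we need to generate the weights and biases for each layer of the MLP. Specifically, for layer $l$, we denote its weights and biases as $W^u_l \in \mathbb{R}^{d_{out} \times d_{in}}$ and $\mathbf{b}^u_l \in \mathbb{R}^{d_{out}}$, respectively, where $d_{in}$ is the size of its input and $d_{out}$ is the size of its output. Then $W^u_l$ and $\mathbf{b}^u_l$ are calculated as follows:
$$\mathbf{h}_g = \mathrm{ReLU}(W^h_g \mathbf{c}_u + \mathbf{b}^h_g), \quad \mathbf{w}^u_l = W^w_g \mathbf{h}_g + \mathbf{b}^w_g, \quad \mathbf{b}^u_l = W^b_g \mathbf{h}_g + \mathbf{b}^b_g, \tag{5}$$
where $W^h_g$, $W^w_g$ and $W^b_g$ are weights; $\mathbf{b}^h_g$, $\mathbf{b}^w_g$ and $\mathbf{b}^b_g$ are biases; $\mathbf{h}_g$ is the hidden state. Finally, we reshape $\mathbf{w}^u_l$ to a matrix $W^u_l$ whose shape is $d_{out} \times d_{in}$. Note that $W^h_g$, $W^w_g$, $W^b_g$, $\mathbf{b}^h_g$, $\mathbf{b}^w_g$ and $\mathbf{b}^b_g$ are not shared by different layers of the RP model, and $d_{in}$ and $d_{out}$ also vary across layers. Detailed settings can be found in the experimental setup. MetaMF thus returns different MLP parameters for each user.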
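Generating a private RP model amounts to running one small generator per MLP layer, each mapping the collaborative vector to a flat weight vector and a bias vector. The sketch below shows one way this could look; the layer sizes, the random stand-ins for trained generator parameters, the omitted generator biases, and the ReLU non-linearity are assumptions made for illustration.

```python
import random

random.seed(1)

k, h = 3, 4                      # collaborative-vector and hidden sizes (assumed)
layer_sizes = [(4, 3), (3, 1)]   # (input size, output size) per RP layer (assumed)


def rand_matrix(rows, cols):
    """Random stand-in for a trained generator weight matrix."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def generate_layer(c_u, d_in, d_out):
    """Generate one RP layer (weights, bias) from the collaborative vector.

    Fresh generator matrices are drawn on every call, mirroring the statement
    that generator parameters are not shared across RP layers. Generator
    biases are left out for brevity.
    """
    hidden = [max(0.0, v) for v in matvec(rand_matrix(h, k), c_u)]  # ReLU assumed
    flat_weights = matvec(rand_matrix(d_out * d_in, h), hidden)
    bias = matvec(rand_matrix(d_out, h), hidden)
    # Reshape the flat vector into a d_out x d_in weight matrix.
    weights = [flat_weights[r * d_in:(r + 1) * d_in] for r in range(d_out)]
    return weights, bias


c_u = [0.3, -0.2, 0.5]  # toy collaborative vector
rp_model = [generate_layer(c_u, d_in, d_out) for d_in, d_out in layer_sizes]
shapes = [(len(W), len(W[0]), len(b)) for W, b in rp_model]
print(shapes)  # [(3, 4, 3), (1, 3, 1)]
```

A different collaborative vector fed through the same generators yields a different set of layer parameters, which is what makes the RP model private to a user.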
3.4 Prediction Module
The prediction module estimates the user’s rating for a given item using the item embedding matrix and RP model generated by the MR module.
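First, we get the private item embedding $\mathbf{e}^u_i$ of $i$ from $E^u$ by Eq. (6):
$$\mathbf{e}^u_i = (E^u)^{\top} \mathbf{i}_i. \tag{6}$$
Then we follow Eq. (7) to predict $\hat{r}_{u,i}$ based on the RP model:
$$\mathbf{h}_1 = \mathrm{ReLU}(W^u_1 \mathbf{e}^u_i + \mathbf{b}^u_1), \quad \ldots, \quad \mathbf{h}_{L-1} = \mathrm{ReLU}(W^u_{L-1} \mathbf{h}_{L-2} + \mathbf{b}^u_{L-1}), \quad \hat{r}_{u,i} = W^u_L \mathbf{h}_{L-1} + \mathbf{b}^u_L, \tag{7}$$
where $L$ is the number of layers of the RP model. The weights and biases are generated by the MR module. The last layer is the output layer, which returns a scalar as the predicted rating $\hat{r}_{u,i}$.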
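The prediction step can be illustrated as a row lookup in the user's private item embedding matrix followed by a forward pass through the generated MLP. The example below is a minimal sketch with hand-picked toy parameters; the function name `predict`, the parameter values, and the use of ReLU on hidden layers are assumptions for illustration only.

```python
def predict(E_u, item_index, rp_model):
    """Predict a rating from private item embeddings and a generated MLP.

    E_u       : list of item embedding rows (n x d), private to one user.
    rp_model  : list of (weights, bias) pairs, weights shaped d_out x d_in.
    """
    x = E_u[item_index]  # item embedding lookup (one-hot selection of a row)
    last = len(rp_model) - 1
    for layer, (W, b) in enumerate(rp_model):
        x = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        if layer < last:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (assumed)
    return x[0]  # the output layer returns a single scalar


# Toy private parameters for one user (hand-picked, not learned).
E_u = [[1.0, 2.0],
       [0.0, -1.0]]
rp_model = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.5, 0.0]),  # 2 -> 3
    ([[1.0, 2.0, 3.0]], [0.5]),                                  # 3 -> 1
]

print(predict(E_u, 0, rp_model))  # 6.5
print(predict(E_u, 1, rp_model))  # 3.5
```

Because both `E_u` and `rp_model` are outputs of the generator for a specific user, the same item index can produce different ratings for different users even though no per-user model is stored.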
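In order to learn MetaMF, we formulate the RP task as a regression problem and the loss function is defined as:
$$\mathcal{L}_{rp} = \frac{1}{|\mathcal{D}_{train}|} \sum_{r_{u,i} \in \mathcal{D}_{train}} (r_{u,i} - \hat{r}_{u,i})^2.$$
To avoid overfitting, we add the L2 regularization term:
$$\mathcal{L}_{reg} = \frac{1}{2} \lVert \Theta \rVert_2^2,$$
where $\Theta$ represents the trainable parameters of MetaMF. Note that unlike existing MF methods, the item embeddings and the parameters of RP models are not included in $\Theta$, because they are outputs of MetaMF rather than trainable parameters.
The final loss $\mathcal{L}$ is a linear combination of $\mathcal{L}_{rp}$ and $\mathcal{L}_{reg}$:
$$\mathcal{L} = \mathcal{L}_{rp} + \lambda \mathcal{L}_{reg},$$
where $\lambda$ is the weight of $\mathcal{L}_{reg}$. The whole framework of MetaMF can be efficiently trained using back-propagation in an end-to-end paradigm.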
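The training objective combines a regression loss with a weighted L2 penalty. The sketch below shows that combination in plain Python; the use of mean squared error, the factor of one half in the penalty, the function names, and the toy numbers are assumptions for illustration, since only the overall structure (regression loss plus weighted L2 term) is stated in the text.

```python
def rp_loss(predictions, targets):
    """Regression loss over training ratings (mean squared error is assumed)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l2_penalty(parameters):
    """Half the squared L2 norm of the trainable parameters (1/2 factor assumed)."""
    return 0.5 * sum(w * w for w in parameters)


def total_loss(predictions, targets, parameters, reg_weight):
    """Linear combination of the regression loss and the L2 penalty."""
    return rp_loss(predictions, targets) + reg_weight * l2_penalty(parameters)


# Toy values: two predicted ratings, two true ratings, two trainable scalars.
# Only generator-side parameters (CM and MR modules) would appear in
# `parameters`; generated item embeddings and RP weights are outputs.
loss = total_loss([3.5, 4.0], [4.0, 4.0], [1.0, -2.0], reg_weight=0.01)
print(round(loss, 6))  # 0.15
```

In a framework with automatic differentiation, gradients of this scalar flow through the generated item embeddings and RP weights back into the generator parameters, which is what allows end-to-end training.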
4 Experimental Setup
We conduct experiments on two widely used datasets: Douban [7] and Hetrec2011-movielens [1]. We list the statistics of these two datasets in Table 2. For each dataset, we randomly split the ratings into three chunks that serve as the training set, the validation set, and the test set.
We compare MetaMF with the following conventional and deep learning-based MF methods. It is worth noting that in this paper we focus on predicting ratings based on rating matrices, thus for fairness we neglect the MF methods which need side information.
Conventional methods:
URP [15]: It employs a topic model to model user preference.
NMF [30]: It uses non-negative matrix factorization to decompose rating matrices.
SVD++ [11]: It extends SVD by considering implicit feedback for modeling latent factors.
LLORMA [12]: It uses a number of low-rank submatrices to compose rating matrices.
Deep learning-based methods:
RBM [19]: It employs an RBM to model the generation process of ratings.
AutoRec [20]: It uses AEs to model the interactions between users and items. AutoRec has two variants, one taking users’ ratings as input, denoted as AutoRec-U, and the other taking items’ ratings as input, denoted as AutoRec-I.
CFN [22]: It enhances AutoRec by introducing a denoising autoencoder. CFN also has two variants, called CFN-U and CFN-I.
4.3 Implementation Details
The user embedding size and the item embedding size are set to . The size of the collaborative vector is set to . The size of the low-dimensional item embedding is set to . The hidden size is set to . And the RP model in the prediction module is an MLP with two layers whose layer sizes are and . During training, we initialize all trainable parameters randomly with the Xavier method . We choose Adam  to optimize MetaMF, set the learning rate to , and set the regularizer weight to . We use a mini-batch size
by grid search. Our framework is implemented with PyTorch. In our experiments, we implement NCF based on the code released by its authors (https://github.com/hexiangnan/neural_collaborative_filtering). We follow the released code (https://github.com/gtshs2/Autorec) to implement AutoRec and CFN, and we use LibRec (https://www.librec.net/) to implement the other baselines.
4.4 Evaluation Metrics
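To evaluate the performance of rating prediction methods, we employ two evaluation metrics, i.e., Mean Absolute Error (MAE) and Mean Square Error (MSE). Both of them are widely applied for the RP task in recommender systems. Given the predicted rating $\hat{r}_{u,i}$ and the true rating $r_{u,i}$ of user $u$ on item $i$ in the test set $\mathcal{D}_{test}$, MAE is calculated as:
$$\mathrm{MAE} = \frac{1}{|\mathcal{D}_{test}|} \sum_{r_{u,i} \in \mathcal{D}_{test}} |r_{u,i} - \hat{r}_{u,i}|,$$
whereas MSE is defined as:
$$\mathrm{MSE} = \frac{1}{|\mathcal{D}_{test}|} \sum_{r_{u,i} \in \mathcal{D}_{test}} (r_{u,i} - \hat{r}_{u,i})^2.$$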
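Both metrics are averages over the test ratings: MAE averages absolute errors and MSE averages squared errors. A minimal plain-Python sketch follows; the function names and the toy ratings are illustrative and are not results from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average of |true - predicted| over all test ratings."""
    return sum(abs(a - p) for p, a in zip(predicted, actual)) / len(actual)


def mean_square_error(predicted, actual):
    """Average of (true - predicted)^2 over all test ratings."""
    return sum((a - p) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Toy ratings for illustration only.
predicted = [3.5, 4.0, 2.0]
actual = [4.0, 4.0, 4.0]

mae = mean_absolute_error(predicted, actual)
mse = mean_square_error(predicted, actual)
print(round(mae, 4), round(mse, 4))  # 0.8333 1.4167
```

MSE penalizes the single large error (2.0 versus 4.0) far more than MAE does, which is why the two metrics can rank methods differently.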
In our experiments, statistical significance is tested using a two-sided paired t-test for significant differences ().
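A paired t-test compares two methods on the same test cases by testing whether the mean of the per-case differences is zero. The sketch below computes only the t statistic and the degrees of freedom with the standard library; the function name and the toy error values are assumptions for illustration. Turning the statistic into a two-sided p-value requires the Student t distribution, which is available for instance through SciPy's `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Toy per-case errors of two hypothetical methods (not results from the paper).
method_a_errors = [0.9, 0.8, 1.1, 1.0]
method_b_errors = [0.7, 0.7, 0.9, 0.9]

t_stat, dof = paired_t_statistic(method_a_errors, method_b_errors)
print(round(t_stat, 3), dof)  # 5.196 3
```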
5 Experimental Results
5.1 Research Questions
We seek to answer the following research questions in our experiments:
(RQ1) Does the proposed MetaMF method outperform the state-of-the-art MF methods on the rating prediction task?
(RQ2) Does generating private item embeddings improve the performance of rating predictions?
(RQ3) Does generating private RP models improve rating predictions?
(RQ4) Can MetaMF generate different item embeddings and RP models for different users while exploiting collaborative filtering?
5.2 Performance Comparison (RQ1)
We start by addressing RQ1 and test if MetaMF outperforms the state-of-the-art MF methods. Table 3 lists the rating prediction performance of all MF methods. Our main observations are as follows:
MetaMF outperforms other baselines in terms of all metrics on all datasets. For the Douban dataset, MetaMF achieves a significant () decrease over NCF in terms of MAE (MSE); and on the Hetrec2011-movielens dataset, it achieves a () decrease over NCF in terms of MAE (MSE). There are three reasons to explain these results. Firstly, MetaMF generates private item embeddings for different users, which can capture the differences among users’ views and/or angles on the same item. Secondly, MetaMF provides different users with private RP models, which allows MetaMF to better model the user’s profiles. Lastly, MetaMF can take advantage of collaborative filtering through the collaborative memory module, so users can share information as in ordinary MF methods.
The item embedding size used in MetaMF is half that of NCF, and the MLP used in the prediction module is also simpler than the one in NCF (in our experiments, NCF has four layers). However, MetaMF still outperforms NCF. This indicates that since each user has her own item embeddings (and RP model), we can reduce their sizes (scale) while still achieving competitive RP performance. We also tried larger embedding sizes (model scale), but in that case the performance of MetaMF drops slightly due to overfitting.
Although conventional methods cannot model non-linear transformations as well as deep learning-based methods, we see they still show comparable performance. In Table 3, NMF, PMF and LLORMA outperform RBM, AutoRec and CFN-U. There may be two reasons. On one hand, RBM, AutoRec and CFN do not explicitly model user latent factors and item latent factors, which hinders them from learning better user and item representations. On the other hand, we guess that the linear models may be more suitable to some users. Accordingly, we conclude that deep learning-based models are not the best choices for all users, which also supports our argument that we should provide private RP models for users.
SVD++ performs well on both datasets, and outperforms NCF on the Hetrec2011-movielens dataset. This is because SVD++ considers implicit feedback, which reflects interactions between a given item and another item that the user rates. Thus, compared to other baselines, SVD++ can better capture personalized factors in the rating prediction task. However, MetaMF still outperforms SVD++, since MetaMF can better model users’ private behaviors or views.
CFN achieves a better performance than AutoRec. The denoising autoencoder can improve the robustness of models. And AutoRec-I and CFN-I outperform AutoRec-U and CFN-U, respectively. Because the number of items is bigger than the number of users, reconstructing item ratings is easier than reconstructing user ratings.
5.3 Effectiveness of Generating Private Item Embeddings (RQ2)
Next we address RQ2 to analyze the effectiveness of generating private item embeddings for the RP task. We compare MetaMF with MetaMF-SI which only generates private RP models for different users while sharing a common item embedding matrix among all users. As shown in Table 4, MetaMF outperforms MetaMF-SI on the Hetrec2011-movielens dataset. We conclude that generating private item embeddings for each user can improve the performance of RPs. As each user has her own perspective, generating a specific item embedding for each user pays off. Possibly because users in the Douban dataset have similar views or angles, we notice that MetaMF-SI and MetaMF have comparable performance on the Douban dataset.
5.4 Effectiveness of Generating Private RP Models (RQ3)
To answer RQ3, we compare MetaMF with MetaMF-SM, which generates different item embeddings for different users but shares a common RP model among all users. From Table 5, we can see that MetaMF consistently outperforms MetaMF-SM on both datasets. Thus, generating private RP models for users improves the performance of RPs. Because different users interact with items in different ways, a shared RP model is not suitable for all users.
Furthermore, by comparing MetaMF-SI and MetaMF-SM, we see that MetaMF-SI outperforms MetaMF-SM on the Douban dataset, but MetaMF-SM outperforms MetaMF-SI on the Hetrec2011-movielens dataset. Users of the Douban dataset are prone to interact with items in different ways, while users in the Hetrec2011-movielens dataset are likely to view items from different angles.
5.5 Visualization (RQ4)
Lastly, we come to RQ4. In order to verify that MetaMF generates private item embeddings and RP models for users, we visualize the generated weights and item embeddings after reducing their dimension by t-SNE [23] and normalizing them by mean and standard deviation (i.e., $z = (x - \mu)/\sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation). Each point represents a user’s weights or item embeddings. Because there are many items, we randomly select two items from each dataset for visualization. As shown in Fig. 2, MetaMF generates different weights and item embeddings for different users, which indicates that MetaMF has the ability to better capture the private factors for users. We also notice the existence of many non-trivial clusters in each image, which shows that MetaMF is able to share information among users to take advantage of collaborative filtering. Compared to previous MF methods that share common item embeddings and RP models, MetaMF is very flexible.
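The normalization step mentioned above is the usual standardization: subtract the mean and divide by the standard deviation, so that every plotted coordinate has zero mean and unit spread. A minimal sketch is given below; the function name, the use of the population standard deviation, and the toy coordinates are assumptions for illustration.

```python
import statistics


def standardize(values):
    """Map each value x to (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(x - mu) / sigma for x in values]


# Toy one-dimensional coordinates, e.g. one axis of a 2-D projection.
coords = [1.0, 2.0, 3.0]
z = standardize(coords)
print([round(v, 4) for v in z])  # [-1.2247, 0.0, 1.2247]
```

Applying this per axis puts the projections of different items on a comparable scale before plotting.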
In this paper, we have studied matrix factorization methods for the rating prediction task. We have first argued that each user has her own views w.r.t. items and that a single common method/model is unlikely to satisfy all users. We have proposed a novel matrix factorization framework, named MetaMF. MetaMF first employs a collaborative memory module and a meta recommender module with a rise-dimensional generation strategy to generate private item embeddings and a rating prediction model for a user. Then MetaMF predicts the user’s rating for a given item based on the generated item embeddings and rating prediction model. We conduct extensive experiments to validate the performance of MetaMF which can improve the performance of rating predictions by generating private item embeddings and rating prediction models.
The main limitation of MetaMF is that it requires users to have enough data for learning the private item embeddings and RP models. To generate item embeddings and RP models, the meta recommender module needs a large number of parameters and computations. As to our future work, we plan to enhance MetaMF for dealing with the user cold-start problem. We also would like to consider alternative CM and MR modules to reduce the number of parameters and further improve the performance. Additionally, we hope to incorporate side information and implicit feedback into MetaMF.
[1] (2011) Second workshop on information heterogeneity and fusion in recommender systems. In HetRec2011.
[2] (2018) A collective variational autoencoder for top-n recommendation with side information. In 3rd Workshop on Deep Learning for Recommender Systems.
[3] (2018) A^3NCF: an adaptive aspect attention model for rating prediction. In IJCAI, pp. 3748–3754.
[4] (2018) Aspect-aware latent factor model: rating prediction with ratings and reviews. In WWW, pp. 639–648.
[5] (2010) Understanding the difficulty of training deep feedforward neural networks. JMLR 9, pp. 249–256.
[6] (2017) Neural collaborative filtering. In WWW, pp. 173–182.
[7] (2014) Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction. In SIGIR, pp. 345–354.
[8] (2016) Convolutional matrix factorization for document context-aware recommendation. In RecSys, pp. 233–240.
[9] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[10] (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42 (8), pp. 30–37.
[11] (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, pp. 426–434.
[12] (2016) LLORMA: local low-rank matrix approximation. JMLR 17 (1), pp. 442–465.
[13] (2017) Neural rating regression with abstractive tips generation for recommendation. In SIGIR, pp. 345–354.
[14] (2017) Collaborative variational autoencoder for recommender systems. In SIGKDD, pp. 305–314.
[15] (2004) Modeling user rating profiles for collaborative filtering. In NeurIPS, pp. 627–634.
[16] (2008) Probabilistic matrix factorization. In NeurIPS, pp. 1257–1264.
[17] (2018) On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999.
[18] (2017) Optimization as a model for few-shot learning. In ICLR.
[19] (2007) Restricted Boltzmann machines for collaborative filtering. In ICML, pp. 791–798.
[20] (2015) AutoRec: autoencoders meet collaborative filtering. In WWW, pp. 111–112.
[21] (2017) Prototypical networks for few-shot learning. In NeurIPS, pp. 4077–4087.
[22] (2016) Hybrid recommender system based on autoencoders. In DLRS, pp. 11–16.
[23] (2008) Visualizing data using t-SNE. JMLR 9 (Nov), pp. 2579–2605.
[24] (2018) Confidence-aware matrix factorization for recommender systems. In AAAI.
[25] (2019) Neural variational hybrid collaborative filtering. In WWW.
[26] (2019) Bayesian deep collaborative matrix factorization. In AAAI.
[27] (2018) Meta-gradient reinforcement learning. In NeurIPS, pp. 2396–2407.
[28] (2017) Deep matrix factorization models for recommender systems. In IJCAI, pp. 3203–3209.
[29] (2019) Deep matrix factorization with implicit feedback embedding for recommendation system. IEEE TII.
[30] (2006) Learning from incomplete ratings using non-negative matrix factorization. In SDM, pp. 549–553.