In Web 2.0, social tagging systems have been introduced by many websites, allowing users to freely annotate online items with arbitrary tags (a practice commonly known as folksonomy). Since social tags are good summaries of the relevant items and of the users' preferences, and since they contain little sensitive information about their creators, they are valuable for privacy-enhanced personalized recommendation. Consequently, many efforts have been devoted to tag-aware personalized recommendation using content-based filtering [4, 14, 16] or collaborative filtering [3, 12, 13, 15].
However, as users can freely choose their own vocabulary, social tags are largely uncontrolled. This usually results in sparse, redundant, and ambiguous tag information, and significantly weakens the performance of content-based recommendation systems. The common solution is to apply machine learning techniques, e.g., clustering or autoencoders, to learn more abstract and representative features from raw tags. Recently, Xu et al. proposed a deep-semantic model called DSPR, which utilizes deep neural networks to model abstract and recommendation-oriented representations for social tags. DSPR is reported to achieve better performance than the clustering and autoencoder solutions.
Matrix factorization is a collaborative-filtering-based solution that has become dominant for personalized recommendation on the Social Web [3, 12, 13] and has been reported to be superior to memory-based techniques. However, matrix factorization suffers from a cold-start problem: many users give only very few ratings, resulting in a very sparse user-item rating matrix and making it difficult to summarize users' preferences. A widely adopted solution is to incorporate additional sources of information about users, e.g., implicit feedback, social friendship, geographical neighborhood, or textual comments. We call these upgraded models additional-information-based matrix factorization (AMF) models.
Although the existing deep-semantic model, DSPR, and the upgraded matrix factorization models, AMF, have progressively improved tag-aware personalized recommendation, a few drawbacks remain: (i) DSPR does not utilize the idea of collaborative filtering; hence, the valuable correlation information between users and items is not used to help the recommendation. (ii) As a deep neural model, DSPR stacks many layers, which makes it difficult to optimize the model by gradient back-propagation. (iii) The existing AMF models generally incorporate the additional information as a regularization term of matrix factorization; since this term's coefficient, as shown in prior work, has to be very small, the additional information contributes very little to the optimizing gradient, resulting in only "marginal" improvements in recommendation performance. (iv) The recommendation results of the existing AMF models are difficult to interpret, because latent factor matrices are used to represent users and items.
Consequently, to solve the above problems and to further improve the performance of tag-aware personalized recommendation, we propose a hybrid deep-semantic matrix factorization (HDMF) model, which integrates the techniques of deep-semantic modeling, hybrid learning, and matrix factorization. Generally, HDMF uses a tag-based user matrix and a tag-based item matrix as the respective inputs of two deep autoencoders to generate deep-semantic user and item matrices at the code layers, as well as reconstructed user and item matrices at the output layers. The deep model is then trained using a hybrid learning signal to minimize both the reconstruction errors and the deep-semantic matrix factorization errors, i.e., the squared differences between the user-item rating matrix (seeing tags as positive ratings) and the dot product of the deep-semantic user and item matrices (seeing the deep-semantic matrices as the decomposed matrices in matrix factorization). The intuitions behind the hybrid learning signal are: (i) minimizing reconstruction errors learns better representations for both users and items; (ii) deep-semantic matrix factorization offers a learning signal that connects users and items to discover the users' underlying preferences; (iii) the two signals complement each other to provide sufficient gradients for better model optimization and for escaping poor local minima.
HDMF thus has the following advantages. (i) It overcomes the drawback of DSPR by adding collaborative capabilities to the deep-semantic model. (ii) The hybrid learning signal helps HDMF to better optimize the model and to escape poor local minima. (iii) Unlike in AMF models, the additional tag information in HDMF is directly used to model the decomposed user and item matrices in matrix factorization, which maximizes the effect of the additional tag information on the model optimization. (iv) HDMF remedies the non-interpretability problem of matrix factorization: by considering the deep-semantic matrices as the decomposed matrices and finding the most influential input tags for each dimension, the decomposed user and item matrices in HDMF become interpretable.
The main contributions of this paper are briefly as follows:
We briefly analyze the state-of-the-art personalized recommendation models that use content-based filtering or matrix factorization and identify their existing problems.
We innovatively propose a hybrid deep-semantic matrix factorization (HDMF) model to tackle these problems and to further improve the performance of tag-aware personalized recommendation, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization.
Experimental results show that HDMF significantly outperforms the state-of-the-art baselines in tag-aware personalized recommendation, in terms of all evaluation metrics, e.g., its mean reciprocal rank (resp., mean average precision) is (resp., ) times as high as that of the best baseline.
A folksonomy is a tuple F = (U, T, I, A), where U, T, and I are sets of users, tags, and items, respectively, and A ⊆ U × T × I is a set of assignments (u, t, i) of tag t ∈ T to item i ∈ I by user u ∈ U.
A tag-based user profile x_i = (x_{i,1}, ..., x_{i,|T|}) is a feature vector, where |T| is the tag vocabulary's size, and x_{i,k} is the number of times that user u_i annotates items with tag t_k; the tag-based user matrix is thus defined as X = [x_1, ..., x_M], where x_i is the profile vector of the i-th user, and M is the total number of users. Similarly, a tag-based item profile is a vector y_j = (y_{j,1}, ..., y_{j,|T|}), where y_{j,k} is the number of times that item i_j is annotated with tag t_k; while the tag-based item matrix is defined as Y = [y_1, ..., y_N], where y_j is the profile vector of the j-th item, and N is the total number of items.
The user-item rating matrix is R = [r_{i,j}] of size M × N, where r_{i,j} is the number of tags annotated by user u_i to item i_j. Given R, traditional matrix-factorization-based recommender systems aim to approximate R using the decomposed latent matrices of users and items, i.e., P and Q, respectively, which are optimized by minimizing the squared differences between R and P^T Q on the set of observed ratings; formally,

min_{P,Q} Σ_{i=1}^{M} Σ_{j=1}^{N} I_{i,j} (r_{i,j} − p_i^T q_j)²,

where I_{i,j} is 1, if user u_i annotated item i_j, and 0, otherwise. After optimization, the predicted user-item rating matrix R̂ = P^T Q is used for personalized recommendation.
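To make these definitions concrete, the following is a minimal NumPy sketch that builds the tag-based user/item matrices and the rating matrix from a toy set of assignments, and evaluates the masked matrix-factorization objective. All sizes, the assignments, and the latent dimensionality here are made-up illustrations, not data from the paper.

```python
import numpy as np

# Hypothetical toy folksonomy: assignments (user, tag, item) with
# 2 users, 3 tags, and 2 items.
assignments = [(0, 0, 0), (0, 1, 0), (1, 2, 1), (1, 2, 1)]
num_users, num_tags, num_items = 2, 3, 2

X = np.zeros((num_tags, num_users))   # tag-based user matrix
Y = np.zeros((num_tags, num_items))   # tag-based item matrix
R = np.zeros((num_users, num_items))  # user-item rating matrix (tag counts)
for u, t, i in assignments:
    X[t, u] += 1                      # user u used tag t once more
    Y[t, i] += 1                      # item i received tag t once more
    R[u, i] += 1                      # user u annotated item i once more

def mf_loss(R, P, Q):
    """Masked MF objective: squared error on observed ratings only."""
    mask = (R > 0).astype(float)      # plays the role of the indicator I_{ij}
    diff = mask * (R - P.T @ Q)
    return np.sum(diff ** 2)

rng = np.random.default_rng(0)
k = 2                                 # latent dimensionality (illustrative)
P = rng.normal(size=(k, num_users))   # latent user matrix
Q = rng.normal(size=(k, num_items))   # latent item matrix
print(mf_loss(R, P, Q))
```

The mask ensures that unobserved user-item pairs contribute nothing to the loss, matching the indicator in the objective above.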
3 Hybrid Deep-Semantic Matrix Factorization
To alleviate the cold-start problem in traditional matrix factorization, a widely adopted solution is to incorporate additional sources of information about users to achieve additional-information-based matrix factorization (AMF) [10, 11, 12, 13]. However, as analyzed in Section 1 and demonstrated both by our experimental results and by previously reported results, the existing AMF models achieve only "marginal" improvements on the performance of personalized recommendation. Therefore, inspired by the recent development of deep-semantic modeling, we propose a hybrid deep-semantic matrix factorization (HDMF) model to tackle these problems and to further enhance the performance of tag-aware personalized recommendation, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization.
Figure 1 shows an overview of the HDMF model. Generally, HDMF takes the tag-based user and item matrices X and Y (defined in Section 2) as the inputs of two deep autoencoders, each consisting of an encoder and a decoder. These inputs are passed through multiple hidden layers and projected to the deep-semantic user and item matrices P and Q at the code layers, and to the reconstructed user and item matrices X̃ and Ỹ at the output layers. The HDMF model is then trained using a hybrid learning signal to minimize both the deep-semantic matrix factorization errors and the reconstruction errors. Finally, the predicted user-item rating matrix R̂ = P^T Q is used for personalized recommendation.
3.1 Deep-Semantic Matrix Factorization
Deep-semantic matrix factorization is solely based on the encoder parts of the deep autoencoders, which can be seen as multi-layer perceptron networks. Formally, given the tag-based user and item matrices X and Y, a weight matrix W_1, and a bias vector b_1, the intermediate outputs of the first hidden layers in the encoders are defined as follows:

H_1^X = f(W_1 X + b_1),   H_1^Y = f(W_1 Y + b_1),

where f(·) is used as the activation function. Similarly, the intermediate outputs of the l-th hidden layers, 1 < l < L, in the encoders are defined as follows:

H_l^X = f(W_l H_{l−1}^X + b_l),   H_l^Y = f(W_l H_{l−1}^Y + b_l),

where W_l and b_l are the weight matrix and the bias vector for the l-th hidden layers in the encoders, respectively, and L is the total number of hidden layers in each encoder.

Then, the outputs of the L-th hidden layers, i.e., the code layers, are the deep-semantic user and item matrices, denoted P and Q, respectively. Formally,

P = H_L^X = f(W_L H_{L−1}^X + b_L),   Q = H_L^Y = f(W_L H_{L−1}^Y + b_L).
Consequently, by seeing the deep-semantic matrices P and Q (the code-layer outputs) as the decomposed user and item matrices in matrix factorization, the parameters W_l and b_l can be optimized by minimizing the following deep-semantic matrix factorization errors:

E_mf = Σ_{i=1}^{M} Σ_{j=1}^{N} I_{i,j} (r_{i,j} − p_i^T q_j)² + λ Σ_{l=1}^{L} (‖W_l‖² + ‖b_l‖²),

where r_{i,j} is an element in the user-item rating matrix R, indicating the number of tags assigned by user u_i to item i_j; I_{i,j} is 1 if user u_i annotated item i_j, and 0 otherwise; p_i (resp., q_j) is the vector at the i-th (resp., j-th) column of P (resp., Q), which is the deep-semantic representation of the i-th user (resp., j-th item); the second term is a regularization term used to prevent overfitting, and λ is the regularization parameter.
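The encoder forward pass and the deep-semantic factorization error can be sketched as follows. This is an illustrative NumPy implementation under assumptions stated in the comments: toy matrix sizes, tanh standing in for the unspecified activation f, and a single set of weights shared by the two encoders; it is not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (not the paper's): tag vocabulary of 6, 3 users, 4 items,
# and an encoder whose last (code) layer has 2 units.
num_tags, num_users, num_items = 6, 3, 4
sizes = [num_tags, 5, 2]              # input -> hidden -> code layer

X = rng.poisson(1.0, (num_tags, num_users)).astype(float)   # tag-based user matrix
Y = rng.poisson(1.0, (num_tags, num_items)).astype(float)   # tag-based item matrix
R = rng.poisson(0.5, (num_users, num_items)).astype(float)  # user-item rating matrix

# One set of weights/biases shared by both encoders; tanh is an assumed
# activation choice (f is left unspecified in the text).
Ws = [rng.normal(0, 0.1, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
bs = [np.zeros((sizes[l + 1], 1)) for l in range(len(sizes) - 1)]

def encode(M):
    """Forward pass through the encoder; the final output is the code layer."""
    H = M
    for W, b in zip(Ws, bs):
        H = np.tanh(W @ H + b)
    return H

def dsmf_loss(R, P, Q, lam=0.01):
    """Masked squared error on observed ratings plus L2 regularization."""
    mask = (R > 0).astype(float)
    err = np.sum((mask * (R - P.T @ Q)) ** 2)
    reg = lam * sum(np.sum(W ** 2) + np.sum(b ** 2) for W, b in zip(Ws, bs))
    return err + reg

P = encode(X)   # deep-semantic user matrix
Q = encode(Y)   # deep-semantic item matrix
print(dsmf_loss(R, P, Q))
```

The dot product P.T @ Q directly plays the role of the predicted rating matrix, so the gradient of this loss flows back through both encoders.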
3.2 Hybrid Learning Signal
However, it is difficult to train the model using solely the learning signal from deep-semantic matrix factorization. This is because the model stacks many layers of non-linearities, and when the learning signals are back-propagated to the first few layers, they become minuscule and insufficient for learning good representations for the users and items, which in turn results in poor local minima. A common solution is to first pre-train each layer using restricted Boltzmann machines (RBMs) [7, 8] or autoencoders, and then use back-propagation to fine-tune the entire deep neural network.
Therefore, in this work, we directly incorporate autoencoders into the deep-semantic matrix factorization model, and train the deep model using a hybrid learning signal that integrates reconstruction errors of autoencoders with the deep-semantic matrix factorization errors. We thus call this model hybrid deep-semantic matrix factorization (HDMF). The intuition behind it is as follows: (i) the reconstruction-error-based signal can learn better representations for both users and items; (ii) the collaborative learning signal from deep-semantic matrix factorization can connect users and items to discover the underlying users’ preferences; (iii) furthermore, the reconstruction-error-based signal can complement deep-semantic matrix factorization to provide sufficient gradients for better optimizing the model and escaping the local minima.
Specifically, as shown in Figure 1, we adopt autoencoders with tied weights in HDMF, i.e., the weight matrices in the decoder are the transposes of the weight matrices in the encoder. The decoders take the deep-semantic user and item matrices P and Q at the code layers as inputs and generate the reconstructed user and item matrices X̃ and Ỹ at their output layers. Then, the reconstruction errors are computed based on the squared differences between the original tag-based matrices (X and Y) and the reconstructed matrices (X̃ and Ỹ). Finally, the reconstruction-error-based learning signal is first used to update the transposed weight W_1^T at the output layer and is then back-propagated to update the remaining transposed weights, and so on. As updating W_l^T is equivalent to updating W_l, this signal complements deep-semantic matrix factorization and offers sufficient gradients to the first few layers of the deep model.
Formally, the intermediate outputs of the l-th hidden layers, L < l < 2L, in the decoders are defined as:

H_l^X = f(W_{2L+1−l}^T H_{l−1}^X + b'_l),   H_l^Y = f(W_{2L+1−l}^T H_{l−1}^Y + b'_l),

where W_{2L+1−l}^T is the transpose of the encoder weight matrix W_{2L+1−l}, b'_l is the bias vector for the l-th hidden layer, and H_L^X = P and H_L^Y = Q. The outputs of the (2L−1)-th hidden layers are used to generate the reconstructed user and item matrices, denoted X̃ and Ỹ, at the output layers:

X̃ = f(W_1^T H_{2L−1}^X + b'_{2L}),   Ỹ = f(W_1^T H_{2L−1}^Y + b'_{2L}).

Then, the reconstruction errors of the user (resp., item) matrix are computed as the sum of the Euclidean (i.e., L2) norms of the differences between the tag-based user (resp., item) profiles x_i (resp., y_j) in X (resp., Y) and the reconstructed user (resp., item) profiles x̃_i (resp., ỹ_j) in X̃ (resp., Ỹ). By integrating the reconstruction errors with the deep-semantic matrix factorization errors E_mf (as defined in Section 3.1), the HDMF model is thus trained by minimizing the following hybrid learning signal:

E = E_mf + Σ_{i=1}^{M} ‖x_i − x̃_i‖₂ + Σ_{j=1}^{N} ‖y_j − ỹ_j‖₂.
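A minimal sketch of the tied-weight decoder and the resulting hybrid objective follows. The sizes, data, and tanh activation are illustrative assumptions, and only the user-side reconstruction term is shown for brevity; the item side would contribute symmetric terms.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 4 tags, 3 users, encoder layers [4, 3, 2] (code layer of size 2).
sizes = [4, 3, 2]
Ws = [rng.normal(0, 0.1, (sizes[l + 1], sizes[l])) for l in range(2)]
b_enc = [np.zeros((sizes[l + 1], 1)) for l in range(2)]
b_dec = [np.zeros((sizes[l], 1)) for l in reversed(range(2))]  # decoder biases

X = rng.poisson(1.0, (4, 3)).astype(float)   # tag-based user matrix
R = rng.poisson(0.5, (3, 3)).astype(float)   # toy user-item rating matrix

def encode(M):
    H = M
    for W, b in zip(Ws, b_enc):
        H = np.tanh(W @ H + b)
    return H

def decode(P):
    """Tied-weight decoder: transposes of the encoder weights, reverse order."""
    H = P
    for W, b in zip(reversed(Ws), b_dec):
        H = np.tanh(W.T @ H + b)
    return H

P = encode(X)        # deep-semantic (code-layer) user matrix
X_rec = decode(P)    # reconstructed tag-based user matrix

# Reconstruction error: sum of Euclidean norms of per-profile differences.
rec_err = np.sum(np.linalg.norm(X - X_rec, axis=0))

# Hybrid signal = masked deep-semantic MF error + reconstruction error.
Q = encode(rng.poisson(1.0, (4, 3)).astype(float))  # stand-in item codes
mask = (R > 0).astype(float)
hybrid = np.sum((mask * (R - P.T @ Q)) ** 2) + rec_err
print(hybrid)
```

Because the decoder reuses the transposed encoder weights, the gradient of the reconstruction term updates the same parameters as the factorization term, which is what lets the two signals complement each other.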
4 Experiments

We have conducted extensive experimental studies and compared our proposed hybrid deep-semantic matrix factorization (HDMF) model with a number of state-of-the-art baselines, which are grouped into two categories and summarized as follows:
Content-based tag-aware models. Four state-of-the-art models that utilize social tags as content information for tag-aware personalized recommendation are selected as baselines. Similarly to HDMF, they all apply machine learning techniques to model abstract and effective representations for users and/or items: the clustering-based models CCS and CCF, the autoencoder-based model ACF, and the deep-semantic similarity-based model DSPR.
Matrix-factorization-based models. Three matrix-factorization-based recommendation models are also selected as baselines: the traditional matrix factorization model, MF, and two additional-information-based matrix factorization (AMF) models, which incorporate, respectively, the social friendships and the textual comments of users as additional sources of information for matrix factorization.
Table 1: Statistics of the Delicious dataset.

| Users | Tags | Items | Assignments |
|-------|-------|--------|-------------|
| 1 843 | 3 508 | 65 877 | 339 744 |
To ensure a fair comparison, the experiments are performed on the same real-world social-tagging dataset as used in [16, 17], which is gathered from the Delicious bookmarking system and released in HetRec 2011. After applying the same pre-processing to remove the infrequent tags (i.e., those used fewer than a threshold number of times), the resulting dataset is as shown in Table 1. We randomly split the assignments into training, validation, and test sets.
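As an illustration, the pre-processing and random split described above might look like the following sketch. The assignments, the frequency threshold, and the split fractions are all made-up placeholders, not the paper's actual settings.

```python
import random
from collections import Counter

# Hypothetical toy assignments (user, tag, item).
assignments = [(0, "web", 1), (1, "web", 2), (0, "rare", 3),
               (2, "python", 1), (1, "python", 4), (2, "web", 5)]

MIN_TAG_FREQ = 2  # drop tags used fewer than this many times (assumed value)
tag_freq = Counter(t for _, t, _ in assignments)
filtered = [a for a in assignments if tag_freq[a[1]] >= MIN_TAG_FREQ]

# Random split into training/validation/test sets (assumed 70/10/20 fractions).
rng = random.Random(42)
shuffled = filtered[:]
rng.shuffle(shuffled)
n = len(shuffled)
train = shuffled[: int(0.7 * n)]
valid = shuffled[int(0.7 * n): int(0.8 * n)]
test = shuffled[int(0.8 * n):]
print(len(filtered), len(train), len(valid), len(test))
```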
All models are implemented using Python and Theano and run on a GPU server with an NVIDIA Tesla GPU. The parameters of HDMF are selected by grid search, including: (i) the number of hidden layers; (ii) the numbers of neurons from the 1st to the L-th hidden layer; (iii) the regularization parameters; and (iv) the learning rate for model training.
In training, we first initialize the weight matrices using a random normal distribution and initialize the biases to zero vectors; the model is then trained by back-propagation using stochastic gradient descent; finally, the training stops when the model converges or reaches the maximum number of training runs. We also use the validation set to avoid over-fitting via early stopping.
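The training loop with random-normal initialization, SGD, and validation-based early stopping can be sketched on a toy least-squares model; all data, the model, and the hyperparameters here are placeholders for the actual HDMF training, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy regression data standing in for the real training objective.
X = rng.normal(size=(50, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + rng.normal(0, 0.1, 50)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

w = rng.normal(0, 0.1, 4)           # random-normal initialization
lr, patience, max_epochs = 0.01, 5, 500
best_val, best_w, since_best = np.inf, w.copy(), 0

for epoch in range(max_epochs):
    for k in rng.permutation(len(X_tr)):        # stochastic gradient descent
        grad = 2 * (X_tr[k] @ w - y_tr[k]) * X_tr[k]
        w -= lr * grad
    val = np.mean((X_va @ w - y_va) ** 2)       # validation error
    if val < best_val - 1e-6:
        best_val, best_w, since_best = val, w.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:              # early stopping
            break

print(round(best_val, 4))
```

Keeping the best validation-error weights (rather than the final ones) is what makes early stopping act as a regularizer.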
For the evaluation of recommendation systems, the most popular metrics are precision, recall, and F-score. Since users usually browse only the topmost recommended items, we apply these metrics at a given cut-off rank k, i.e., considering only the top-k results on the recommendation list, called precision at k (P@k), recall at k (R@k), and F-score at k (F@k). In addition, since users prefer to have their target items ranked at the front of the recommendation list, we also employ as evaluation metrics the mean average precision (MAP) and the mean reciprocal rank (MRR), which take into account the order of items and give greater importance to those ranked higher.
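These ranking metrics can be computed per user as follows; the recommendation list and relevant-item set in this sketch are made up for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return len([x for x in ranked[:k] if x in relevant]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items found in the top-k recommendations."""
    return len([x for x in ranked[:k] if x in relevant]) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1/position of the first relevant item (0 if none is found)."""
    for pos, x in enumerate(ranked, start=1):
        if x in relevant:
            return 1.0 / pos
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision values at the positions of relevant items."""
    hits, score = 0, 0.0
    for pos, x in enumerate(ranked, start=1):
        if x in relevant:
            hits += 1
            score += hits / pos
    return score / len(relevant)

ranked = ["b", "a", "d", "c"]       # recommendation list (best first)
relevant = {"a", "c"}               # the user's target items
print(precision_at_k(ranked, relevant, 2),  # 0.5
      reciprocal_rank(ranked, relevant))    # 0.5
```

MAP and MRR are then the means of average_precision and reciprocal_rank over all test users.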
Table 2 reports in detail the tag-aware personalized recommendation performance of our proposed HDMF and the seven baselines on the Delicious dataset, in terms of P@k, R@k, F@k, MAP, and MRR, at four selected cut-off ranks k.
In general, the relative performances of the baselines reported in Table 2 are highly consistent with previously reported results; namely, (i) ACF outperforms CCF, (ii) DSPR outperforms CCF, ACF, and CCS, and (iii) the two AMF models "slightly" outperform the traditional MF. More importantly, we note that our proposed model, HDMF, significantly outperforms all seven baselines in all metrics; e.g., the MRR (resp., MAP) of HDMF is (resp., ) times as high as that of the best baseline, DSPR (resp., MF), while the relative performances in P@k, R@k, and F@k are similar. This finding strongly indicates that, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization, HDMF overcomes the existing problems (as presented in Section 1) of the state-of-the-art recommendation models and achieves superior performance in tag-aware personalized recommendation.
Specifically, as shown in Table 2, the MRR and MAP of HDMF are and times, respectively, as high as those of the state-of-the-art deep-semantic model, DSPR. In addition, the relative improvements of HDMF over DSPR, in terms of P@k, R@k, and F@k, all gradually grow as the cut-off rank k rises, reaching more than double at the largest cut-off rank. This observation demonstrates that incorporating collaborative capabilities (i.e., using correlation information between users and items to help the recommendation) can greatly enhance the deep-semantic model's performance in tag-aware recommendation, especially for relatively long recommendation lists.
Furthermore, by comparing the results of the matrix-factorization-based models in Table 2, we find that the two AMF models have close performances; and, more importantly, their relative improvements over MF are "marginal", e.g., their MAP and MRR are only and , respectively, better than those of MF. This finding is consistent with previously reported results, where the improvement rates of the two AMF models over MF are similarly small. The reason for these "marginal" enhancements is as follows: the AMF models incorporate the additional source of information as a regularization term with a small coefficient in matrix factorization, which greatly limits the additional information's contribution to the optimizing gradient and thus limits their capability to improve the recommendation performance. By contrast, as shown in Table 2, HDMF dramatically outperforms MF: the MAP and MRR of HDMF are about and , respectively, better than those of MF. This is mainly because the additional social tag information in HDMF is utilized to model the deep-semantic user and item matrices, which are then directly used as the decomposed user and item matrices in matrix factorization; since the decomposed matrices make the dominant contribution to the optimizing gradient, HDMF maximizes the effect of the additional social tag information on the model optimization, making significant improvements possible.
5 Summary and Outlook
In this paper, we have briefly analyzed the state-of-the-art tag-aware personalized recommendation models that use content-based filtering or matrix factorization, and identified their existing problems. We thus have proposed a hybrid deep-semantic matrix factorization (HDMF) model to tackle these problems and to further enhance the performance of tag-aware personalized recommendation. We have also conducted extensive experimental studies and compared HDMF with seven state-of-the-art baselines; the results show that, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization, HDMF greatly outperforms the state-of-the-art baselines in tag-aware personalized recommendation, in terms of all evaluation metrics.
In the future, further experiments will be conducted to compare the performances of HDMF on different kinds of Social Web datasets, e.g., Last.fm and MovieLens. Moreover, we will also investigate methodologies to add spatial and temporal information into the HDMF model to capture the users’ real-time preferences.
-  Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Proc. NIPS, pp. 153–160, 2007.
-  J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowl.-Based Syst., 46:109–132, 2013.
-  M. R. Bouadjenek, H. Hacid, M. Bouzeghoub, and A. Vakali. Using social annotations to enhance document representation for personalized search. In Proc. SIGIR, pp. 1049–1052, 2013.
-  I. Cantador, A. Bellogín, and D. Vallet. Content-based recommendation in social tagging systems. In Proc. RecSys, pp. 237–240, 2010.
-  I. Cantador, P. Brusilovsky, and T. Kuflik. Second workshop on information heterogeneity and fusion in recommender systems (hetrec2011). In Proc. RecSys, pp. 387–388, 2011.
-  D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? JMLR, 11:625–660, 2010.
-  G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
-  G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
-  A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In Proc. ESWC, 2006.
-  L. Hu, A. Sun, and Y. Liu. Your neighbors affect your ratings: On geographical neighborhood influence to rating prediction. In Proc. SIGIR, pp. 345–354, 2014.
-  Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8), 2009.
-  H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proc. WSDM, pp. 287–296, 2011.
-  J. Manotumruksa, C. Macdonald, and I. Ounis. Regularising factorised models for venue recommendation using friends and their comments. In Proc. CIKM, pp. 1981–1984, 2016.
-  A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke. Personalized recommendation in social tagging systems using hierarchical clustering. In Proc. RecSys, pp. 259–266, 2008.
-  K. H. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proc. SAC, pp. 1995–1999, 2008.
-  Z. Xu, C. Chen, T. Lukasiewicz, Y. Miao, and X. Meng. Tag-aware personalized recommendation using a deep-semantic similarity model with negative sampling. In Proc. CIKM, pp. 1921–1924, 2016.
-  Y. Zuo, J. Zeng, M. Gong, and L. Jiao. Tag-aware recommender systems based on deep neural networks. Neurocomputing, 2016.