1 Introduction
In Web 2.0, social tagging systems have been introduced by many websites, allowing users to freely annotate online items with arbitrary tags (a practice commonly known as folksonomy [9]). Since social tags are good summaries of the relevant items and of the users' preferences, and since they also contain little sensitive information about their creators, they are valuable information for privacy-enhanced personalized recommendation. Consequently, much effort has been put into tag-aware personalized recommendation using content-based filtering [4, 14, 16] or collaborative filtering [3, 12, 13, 15].
However, as users can freely choose their own vocabulary, social tags are largely uncontrolled. This usually results in sparse, redundant, and ambiguous tag information, and significantly weakens the performance of content-based recommendation systems. The common solution is to apply machine learning techniques, e.g., clustering [14] or autoencoders [17], to learn more abstract and representative features from raw tags. Recently, Xu et al. [16] proposed a deep-semantic model called DSPR, which utilizes deep neural networks to model abstract and recommendation-oriented representations for social tags. DSPR is reported to achieve better performance than the clustering and autoencoder solutions.
Matrix factorization is a collaborative-filtering-based solution, which has become dominant for personalized recommendation on the Social Web [3, 12, 13] and has been reported to be superior to memory-based techniques [11]. However, matrix factorization suffers from a cold-start problem: many users give only very few ratings, resulting in a very sparse user-item rating matrix and making it difficult to summarize users' preferences. A widely adopted solution is to incorporate additional sources of information about users, e.g., implicit feedback [11], social friendship [12], geographical neighborhood [10], or textual comments [13]. We call these upgraded models additional-information-based matrix factorization (AMF) models.
Although the existing deep-semantic model, DSPR, and the upgraded matrix factorization models, AMF, have progressively improved tag-aware personalized recommendation, a few drawbacks remain: (i) DSPR does not utilize the idea of collaborative filtering; hence, the valuable correlation information between users and items is not exploited to aid recommendation. (ii) As a deep neural model, DSPR stacks many layers, which makes it difficult to optimize by gradient backpropagation. (iii) The existing AMF models generally incorporate the additional information as a regularization term of matrix factorization; this term's coefficient, as shown in [12], has to be very small; therefore, the additional information has very limited influence on the optimization gradient, resulting in only "marginal" improvements in recommendation performance. (iv) The recommendation results of the existing AMF models are difficult to interpret, because latent factor matrices are used to represent users and items.
Consequently, to solve the above problems and to further improve the performance of tag-aware personalized recommendation, we propose a hybrid deep-semantic matrix factorization (HDMF) model, which integrates the techniques of deep-semantic modeling, hybrid learning, and matrix factorization. Generally, HDMF uses a tag-based user matrix and a tag-based item matrix as the respective inputs of two deep autoencoders, generating deep-semantic user and item matrices at the code layers, and reconstructed user and item matrices at the output layers. The deep model is then trained with a hybrid learning signal that minimizes both the reconstruction errors and the deep-semantic matrix factorization errors, i.e., the squared differences between the user-item rating matrix (seeing tags as positive ratings) and the dot product of the deep-semantic user and item matrices (seeing the deep-semantic matrices as the decomposed matrices in matrix factorization). The intuitions behind the hybrid learning signal are: (i) minimizing reconstruction errors helps to learn better representations for both users and items; (ii) deep-semantic matrix factorization offers a learning signal that connects users and items to discover the users' underlying preferences; (iii) the two signals complement each other to provide sufficient gradients for better model optimization and for escaping poor local minima.
HDMF thus has the following advantages. (i) It overcomes the drawback of DSPR by adding collaborative-filtering capabilities to the deep-semantic model. (ii) The hybrid learning signal helps HDMF to better optimize the model and escape poor local minima. (iii) Unlike in AMF models, the additional tag information in HDMF is directly used to model the decomposed user and item matrices in matrix factorization, which maximizes the effect of the additional tag information on the model optimization. (iv) HDMF remedies the non-interpretability problem of matrix factorization: by considering the deep-semantic matrices as the decomposed matrices and finding the most influential input tags for each dimension, the decomposed user and item matrices in HDMF become interpretable.
The main contributions of this paper are briefly as follows:

– We briefly analyze the state-of-the-art personalized recommendation models that use content-based filtering or matrix factorization, and identify their existing problems.

– We propose a novel hybrid deep-semantic matrix factorization (HDMF) model to tackle these problems and to further improve the performance of tag-aware personalized recommendation, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization.

– Experimental results show that HDMF significantly outperforms the state-of-the-art baselines in tag-aware personalized recommendation in terms of all evaluation metrics; e.g., its mean reciprocal rank and mean average precision are several times as high as those of the best baselines.
2 Preliminaries
A folksonomy is a tuple (U, T, I, A), where U, T, and I are sets of users, tags, and items, respectively, and A ⊆ U × T × I is a set of assignments of a tag to an item by a user [9].
A tag-based user profile is a feature vector u_i ∈ R^{|T|}, where |T| is the tag vocabulary's size, and the t-th entry of u_i is the number of times that user i annotates items with tag t; the tag-based user matrix is thus defined as U = [u_1, …, u_m] ∈ R^{|T|×m}, where u_i is the profile vector of the i-th user, and m is the total number of users. Similarly, a tag-based item profile is a vector v_j ∈ R^{|T|}, whose t-th entry is the number of times that item j is annotated with tag t, while the tag-based item matrix is defined as V = [v_1, …, v_n] ∈ R^{|T|×n}, where v_j is the profile vector of the j-th item, and n is the total number of items. The user-item rating matrix is R ∈ R^{m×n}, where R_{ij} is the number of tags annotated by user i to item j. Given R, traditional matrix-factorization-based recommender systems aim to approximate R using decomposed latent matrices of users and items, P = [p_1, …, p_m] and Q = [q_1, …, q_n], respectively, which are optimized by minimizing the squared differences between R_{ij} and p_i^T q_j on the set of observed ratings; formally,

    min_{P,Q} Σ_{i,j} I_{ij} (R_{ij} − p_i^T q_j)^2,    (1)

where I_{ij} is 1 if user i annotated item j, and 0 otherwise [13]. After optimization, the predicted user-item rating matrix P^T Q is used for personalized recommendation.
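To make these preliminaries concrete, the following sketch builds the user-item rating matrix from a hypothetical toy set of assignment triples and factorizes it by stochastic gradient descent on the observed entries (illustrative NumPy only; the names `mf_sgd`, `k`, `lr`, and `lam` and the toy data are our own, not from the paper):

```python
import numpy as np

# Hypothetical toy folksonomy: (user, tag, item) assignment triples.
assignments = [(0, 1, 2), (0, 1, 2), (1, 3, 0), (2, 0, 4), (1, 3, 4)]
m, n = 3, 5                          # numbers of users and items

R = np.zeros((m, n))                 # user-item rating matrix
for user, _tag, item in assignments:
    R[user, item] += 1               # tags assigned by a user to an item

def mf_sgd(R, k=4, lr=0.05, lam=0.02, epochs=500, seed=0):
    """Minimize sum of I_ij * (R_ij - p_i . q_j)^2 over observed entries,
    with a small L2 penalty to prevent overfitting."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((R.shape[0], k))   # latent user factors
    Q = 0.1 * rng.standard_normal((R.shape[1], k))   # latent item factors
    observed = np.argwhere(R > 0)                    # entries with I_ij = 1
    for _ in range(epochs):
        for i, j in observed:
            e = R[i, j] - P[i] @ Q[j]                # prediction error
            p_old = P[i].copy()
            P[i] = P[i] + lr * (e * Q[j] - lam * P[i])
            Q[j] = Q[j] + lr * (e * p_old - lam * Q[j])
    return P, Q

P, Q = mf_sgd(R)
R_hat = P @ Q.T                      # predicted ratings for recommendation
```

After training, the rows of `R_hat` can be sorted to produce each user's recommendation list.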
3 Hybrid Deep-Semantic Matrix Factorization
To alleviate the cold-start problem in traditional matrix factorization, a widely adopted solution is to incorporate additional sources of information about users, yielding additional-information-based matrix factorization (AMF) [10, 11, 12, 13]. However, as analyzed in Section 1 and demonstrated by both our experimental results and the results reported in [13], the existing AMF models achieve only "marginal" improvements in the performance of personalized recommendation. Therefore, inspired by the recent development of deep-semantic modeling [16], we propose a hybrid deep-semantic matrix factorization (HDMF) model to tackle these problems and to further enhance the performance of tag-aware personalized recommendation, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization.
Figure 1 shows an overview of the HDMF model. Generally, HDMF takes the tag-based user and item matrices (defined in Section 2) as the inputs of two deep autoencoders, consisting of encoders and decoders. These inputs are passed through multiple hidden layers and projected to the deep-semantic user and item matrices at the code layers, and to the reconstructed user and item matrices at the output layers. The HDMF model is then trained by using a hybrid learning signal to minimize both the deep-semantic matrix factorization errors and the reconstruction errors. Finally, a predicted user-item rating matrix is used for personalized recommendation.
3.1 Deep-Semantic Matrix Factorization
Deep-semantic matrix factorization is solely based on the encoder parts of the deep autoencoders, which can be seen as multilayer perceptron networks. Formally, given the tag-based user and item matrices U and V, a weight matrix W_1, and a bias vector b_1, the intermediate outputs of the first hidden layers in the encoders are defined as follows:

    H_1^U = f(W_1 U + b_1),   H_1^V = f(W_1 V + b_1),    (2)

where f(·) is the activation function. Similarly, the intermediate outputs of the k-th hidden layers, 2 ≤ k ≤ L, in the encoders are defined as follows:

    H_k^U = f(W_k H_{k−1}^U + b_k),    (3)
    H_k^V = f(W_k H_{k−1}^V + b_k),    (4)

where W_k and b_k are the weight matrix and the bias vector for the k-th hidden layers in the encoders, respectively, and L is the total number of hidden layers in each encoder.

Then, the outputs of the L-th hidden layers, i.e., the code layers, are the deep-semantic user and item matrices, denoted U* and V*, respectively. Formally,

    U* = H_L^U,   V* = H_L^V.    (5)

Consequently, by seeing the deep-semantic matrices U* and V* as the decomposed user and item matrices in matrix factorization, the parameters W_k and b_k can be optimized by minimizing the following deep-semantic matrix factorization errors:

    E_mf = Σ_{i,j} I_{ij} (R_{ij} − (u*_i)^T v*_j)^2 + λ Σ_k (||W_k||_F^2 + ||b_k||_2^2),    (6)

where R_{ij} is an element of the user-item rating matrix R, indicating the number of tags assigned by user i to item j; I_{ij} indicates whether the rating R_{ij} is observed; u*_i (resp., v*_j) is the vector at the i-th (resp., j-th) column of U* (resp., V*), which is the deep-semantic representation of the i-th user (resp., j-th item); the second term is a regularization term used to prevent overfitting, and λ is the regularization parameter.
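The forward pass and objective of this subsection can be sketched in NumPy as follows (an illustrative sketch only: the `tanh` stand-in activation and all function and variable names are our assumptions, not the paper's settings):

```python
import numpy as np

def encode(X, weights, biases):
    """Encoder: H_k = f(W_k H_{k-1} + b_k), with tanh standing in for the
    activation; the final output is the deep-semantic (code-layer) matrix."""
    H = X
    for W, b in zip(weights, biases):
        H = np.tanh(W @ H + b[:, None])
    return H

def dsmf_error(R, U_star, V_star, params, lam=0.01):
    """Deep-semantic MF error: squared error on observed ratings between R
    and the column dot products of the code matrices, plus L2 regularization."""
    I = (R > 0).astype(float)            # indicator of observed ratings
    pred = U_star.T @ V_star             # dot products of deep-semantic vectors
    err = np.sum(I * (R - pred) ** 2)
    reg = lam * sum(np.sum(p ** 2) for p in params)
    return err + reg
```

Here `params` would collect all weight matrices and bias vectors; minimizing `dsmf_error` over them by backpropagation yields the deep-semantic matrix factorization of this subsection.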
3.2 Hybrid Learning Signal
However, it is difficult to train the model using solely the learning signal from deep-semantic matrix factorization. This is because the model stacks many layers of nonlinearities, and when the learning signals are backpropagated to the first few layers, they become minuscule and insufficient to learn good representations for the users and items, which in turn results in poor local minima. A common solution is to first pre-train each layer using restricted Boltzmann machines (RBMs) [7, 8] or autoencoders [1] and then use backpropagation to fine-tune the entire deep neural network [6]. Therefore, in this work, we directly incorporate autoencoders into the deep-semantic matrix factorization model, and train the deep model using a hybrid learning signal that integrates the reconstruction errors of the autoencoders with the deep-semantic matrix factorization errors. We thus call this model hybrid deep-semantic matrix factorization (HDMF). The intuition behind it is as follows: (i) the reconstruction-error-based signal can learn better representations for both users and items; (ii) the collaborative learning signal from deep-semantic matrix factorization can connect users and items to discover the users' underlying preferences; (iii) furthermore, the reconstruction-error-based signal can complement deep-semantic matrix factorization to provide sufficient gradients for better optimizing the model and escaping poor local minima.
Specifically, as shown in Figure 1, we adopt autoencoders with tied weights in HDMF, i.e., the weight matrices in the decoders are the transposes of the weight matrices in the encoders. The decoders take the deep-semantic user and item matrices at the code layers as inputs and generate the reconstructed user and item matrices at their output layers. Then, the reconstruction errors are computed based on the differences between the original tag-based matrices and the reconstructed matrices. Finally, the reconstruction-error-based learning signal is used to first update the weights of the decoder layer closest to the code layer, and is then backpropagated to update the weights of the preceding layers, and so on. As updating a transposed weight matrix is equivalent to updating the weight matrix itself, this signal complements deep-semantic matrix factorization and offers sufficient gradients to the first few layers of the deep model.
Formally, the intermediate outputs of the k-th hidden layers, L+1 ≤ k ≤ 2L−1, in the decoders are defined as:

    H_k^U = f(W_{2L−k+1}^T H_{k−1}^U + b'_k),    (7)
    H_k^V = f(W_{2L−k+1}^T H_{k−1}^V + b'_k),    (8)

where W_{2L−k+1}^T is the transpose of W_{2L−k+1}, and b'_k is the bias vector for the k-th hidden layers. The outputs of the (2L−1)-th hidden layers are used to generate the reconstructed user and item matrices, denoted Û and V̂, at the output layers:

    Û = f(W_1^T H_{2L−1}^U + b'_{2L}),    (9)
    V̂ = f(W_1^T H_{2L−1}^V + b'_{2L}).    (10)

Then, the reconstruction errors of the user (resp., item) matrix are computed as the sum of the Euclidean (i.e., L2) norms of the differences between the tag-based user (resp., item) profiles u_i (resp., v_j) in U (resp., V) and the reconstructed user (resp., item) profiles û_i (resp., v̂_j) in Û (resp., V̂). By integrating the reconstruction errors with the deep-semantic matrix factorization errors (as defined in Equation (6)), the HDMF model is thus trained by minimizing the following hybrid learning signal:

    E = E_mf + Σ_i ||u_i − û_i||_2 + Σ_j ||v_j − v̂_j||_2.    (11)
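The hybrid objective can be sketched as follows (illustrative NumPy again; tied-weight decoding is shown by reusing the transposed encoder weights in reverse order, and all names are our own):

```python
import numpy as np

def decode(H, enc_weights, dec_biases):
    """Tied-weight decoder: apply the transposed encoder weights in reverse
    order to map a code-layer matrix back to a reconstruction."""
    for W, b in zip(reversed(enc_weights), dec_biases):
        H = np.tanh(W.T @ H + b[:, None])
    return H

def hybrid_loss(R, U, V, U_star, V_star, U_rec, V_rec):
    """Hybrid signal: deep-semantic MF error on the code matrices plus the
    sums of per-column Euclidean norms of the reconstruction differences."""
    I = (R > 0).astype(float)
    mf = np.sum(I * (R - U_star.T @ V_star) ** 2)
    rec_u = np.sum(np.linalg.norm(U - U_rec, axis=0))   # per-user L2 norms
    rec_v = np.sum(np.linalg.norm(V - V_rec, axis=0))   # per-item L2 norms
    return mf + rec_u + rec_v
```

In a full implementation, the gradient of `hybrid_loss` with respect to the shared weights carries both the collaborative and the reconstruction signal, which is the point of the hybrid design.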
4 Experiments
We have conducted extensive experimental studies, comparing our proposed hybrid deep-semantic matrix factorization (HDMF) model with a number of state-of-the-art baselines, which are grouped into two categories and summarized as follows:
Content-based tag-aware models. Four state-of-the-art models that utilize social tags as content information for tag-aware personalized recommendation are selected as baselines. Similarly to HDMF, they all apply machine learning techniques to model abstract and effective representations for users and/or items; i.e., the clustering-based models CCS and CCF [14], the autoencoder-based model ACF [17], and the deep-semantic similarity-based model DSPR [16].
Matrix-factorization-based models. Three matrix-factorization-based recommendation models are also selected as baselines; i.e., the traditional matrix factorization model, MF, and the additional-information-based matrix factorization (AMF) models MF [12] and MF [13], which incorporate, respectively, the social friendships and the textual comments of users as additional sources of information for matrix factorization.
Table 1. Statistics of the preprocessed Delicious dataset.

Users    Tags     Items     Assignments
1,843    3,508    65,877    339,744
Table 2. Recommendation performance (P@k, R@k, and F@k at four cutoff ranks, followed by MAP and MRR).

Model      P@k                          R@k                          F@k                          MAP    MRR
CCF
ACF
CCS
DSPR
MF
MF [12]
MF [13]
HDMF       18.20  15.96  13.61  11.37   5.510  13.05  21.13  28.70   8.458  14.36  16.56  16.29   11.50  3.870
To ensure a fair comparison, the experiments are performed on the same real-world social-tagging dataset as used in [16, 17], which is gathered from the Delicious bookmarking system and released in HetRec 2011 [5]. After applying the same preprocessing to remove infrequent tags, the resulting dataset is as shown in Table 1. We randomly split the assignments into a training set, a validation set, and a test set.
All models are implemented in Python with Theano and run on a GPU server with an NVIDIA Tesla GPU. The hyperparameters of HDMF, namely, the number of hidden layers, the numbers of neurons from the first to the last hidden layer, the regularization parameter λ, and the learning rate for model training, are selected by grid search. In training, we first initialize the weight matrices using a random normal distribution and initialize the biases to zero vectors; the model is then trained by backpropagation using stochastic gradient descent; finally, the training stops when the model converges or reaches the maximum number of training runs. We also use the validation set to avoid overfitting via early stopping.
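The training procedure above can be sketched as a generic early-stopping loop (illustrative code; the `patience` parameter and improvement tolerance are our own assumptions, not reported settings):

```python
def train_with_early_stopping(train_step, validation_loss, max_runs=500, patience=10):
    """Run training steps until the validation loss stops improving
    (early stopping) or the maximum number of training runs is reached."""
    best, stale = float("inf"), 0
    for _ in range(max_runs):
        train_step()                      # one backpropagation/SGD pass
        loss = validation_loss()          # monitor overfitting on the validation set
        if loss < best - 1e-6:
            best, stale = loss, 0         # improvement: reset patience counter
        else:
            stale += 1
            if stale >= patience:
                break                     # no recent improvement: stop training
    return best
```

Here `train_step` would perform one stochastic-gradient pass minimizing the hybrid learning signal, and `validation_loss` would evaluate it on the held-out validation set.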
For the evaluation of recommender systems, the most popular metrics are precision, recall, and F-score [2]. Since users usually only browse the topmost recommended items, we apply these metrics at a given cutoff rank k, i.e., considering only the top-k results on the recommendation list, called precision at k (P@k), recall at k (R@k), and F-score at k (F@k). In addition, since users always prefer to have their target items ranked near the top of the recommendation list, we also employ as evaluation metrics the mean average precision (MAP) and the mean reciprocal rank (MRR), which take into account the order of items and give greater importance to the ones ranked higher.
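For reference, these per-user ranking metrics can be computed as follows (a small self-contained sketch; the function names are our own):

```python
def precision_recall_at_k(ranked, relevant, k):
    """P@k and R@k for one user's ranked recommendation list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k, hits / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item (0 if none); averaged over
    users, this gives the MRR."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision at the positions of the relevant items; averaged
    over users, this gives the MAP."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Averaging `reciprocal_rank` and `average_precision` over all test users yields the MRR and MAP values reported in Table 2.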
4.1 Results
Table 2 shows in detail the tag-aware personalized recommendation performance of our proposed HDMF and the seven baselines on the Delicious dataset, in terms of P@k, R@k, F@k, MAP, and MRR, at four selected cutoff ranks k.
In general, the relative performances of the baselines reported in Table 2 are highly consistent with the results reported in [17], [16], and [13]; namely, (i) ACF outperforms CCF, (ii) DSPR outperforms CCF, ACF, and CCS, and (iii) the AMF models MF [12] and MF [13] "slightly" outperform MF. More importantly, we note that our proposed model, HDMF, significantly outperforms all seven baselines on all metrics; e.g., the MRR and MAP of HDMF are several times as high as those of the respective best baselines, and the relative performances in P@k, R@k, and F@k are similar. This finding strongly suggests that, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization, HDMF overcomes the existing problems (as presented in Section 1) of the state-of-the-art recommendation models and achieves superior performance in tag-aware personalized recommendation.
Specifically, as shown in Table 2, the MRR and MAP of HDMF are several times as high as those of the state-of-the-art deep-semantic model, DSPR. In addition, the relative improvements of HDMF over DSPR in terms of P@k, R@k, and F@k all gradually grow with the rise of the cutoff rank k, increasing from a smaller factor at the lowest cutoff to more than double at the highest. This observation demonstrates that incorporating collaborative-filtering capabilities (i.e., using the correlation information between users and items to aid recommendation) can greatly enhance the deep-semantic model's performance in tag-aware recommendation, especially for relatively long recommendation lists.
Furthermore, by comparing the results of the matrix-factorization-based models in Table 2, we find that the two AMF models have close performances; more importantly, their relative improvements over MF are "marginal", e.g., their MAP and MRR are only slightly better than those of MF. This finding is consistent with the results in [13], where the improvement rates of the two AMF models over MF are similarly small. The reason for these "marginal" enhancements is as follows: the AMF models incorporate the additional source of information as a regularization term with a small coefficient in matrix factorization, which greatly limits the additional information's contribution to the optimization gradient and thus limits their ability to improve the recommendation performance. By contrast, as shown in Table 2, HDMF dramatically outperforms MF in both MAP and MRR. This is mainly because the additional social tag information in HDMF is utilized to model the deep-semantic user and item matrices, which are then used directly as the decomposed user and item matrices in matrix factorization; since the decomposed matrices have a dominant contribution to the optimization gradient, HDMF maximizes the effect of the additional social tag information on the model optimization, making significant improvements possible.
5 Summary and Outlook
In this paper, we have briefly analyzed the state-of-the-art tag-aware personalized recommendation models that use content-based filtering or matrix factorization, and identified their existing problems. We have thus proposed a hybrid deep-semantic matrix factorization (HDMF) model to tackle these problems and to further enhance the performance of tag-aware personalized recommendation. We have also conducted extensive experimental studies comparing HDMF with seven state-of-the-art baselines; the results show that, by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization, HDMF greatly outperforms the state-of-the-art baselines in tag-aware personalized recommendation in terms of all evaluation metrics.
In the future, further experiments will be conducted to compare the performance of HDMF on different kinds of Social Web datasets, e.g., Last.fm and MovieLens. Moreover, we will also investigate methodologies for adding spatial and temporal information into the HDMF model to capture users' real-time preferences.
References
 [1] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Proc. NIPS, pp. 153–160, 2007.
 [2] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowl.-Based Syst., 46:109–132, 2013.
 [3] M. R. Bouadjenek, H. Hacid, M. Bouzeghoub, and A. Vakali. Using social annotations to enhance document representation for personalized search. In Proc. SIGIR, pp. 1049–1052, 2013.
 [4] I. Cantador, A. Bellogín, and D. Vallet. Content-based recommendation in social tagging systems. In Proc. RecSys, pp. 237–240, 2010.
 [5] I. Cantador, P. Brusilovsky, and T. Kuflik. Second workshop on information heterogeneity and fusion in recommender systems (hetrec2011). In Proc. RecSys, pp. 387–388, 2011.

 [6] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? JMLR, 11:625–660, 2010.
 [7] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
 [8] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
 [9] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In Proc. ESWC, 2006.
 [10] L. Hu, A. Sun, and Y. Liu. Your neighbors affect your ratings: On geographical neighborhood influence to rating prediction. In Proc. SIGIR, pp. 345–354, 2014.
 [11] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
 [12] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proc. WSDM, pp. 287–296, 2011.
 [13] J. Manotumruksa, C. Macdonald, and I. Ounis. Regularising factorised models for venue recommendation using friends and their comments. In Proc. CIKM, pp. 1981–1984, 2016.

 [14] A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke. Personalized recommendation in social tagging systems using hierarchical clustering. In Proc. RecSys, pp. 259–266, 2008.
 [15] K. H. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proc. SAC, pp. 1995–1999, 2008.
 [16] Z. Xu, C. Chen, T. Lukasiewicz, Y. Miao, and X. Meng. Tag-aware personalized recommendation using a deep-semantic similarity model with negative sampling. In Proc. CIKM, pp. 1921–1924, 2016.
 [17] Y. Zuo, J. Zeng, M. Gong, and L. Jiao. Tag-aware recommender systems based on deep neural networks. Neurocomputing, 2016.