1 Introduction
The collaborative filtering problem has gained significant attention in the machine learning field since the Netflix Prize. In this challenge, one of the most widely used approaches is the latent factor model, which has proven to work well. To state the problem more formally, we introduce some notation: we have a set of users $U$, a set of items $I$, and the rating scores, which can be viewed as a sparse matrix $R$ whose element $r_{ui}$ is the score given by user $u$ to item $i$. The goal is to accurately predict the missing elements of this sparse matrix.

Many methods have been designed to address this problem. Here, we mainly focus on matrix factorization [9], as it achieves state-of-the-art performance on dyadic prediction problems. However, matrix factorization [5] does not exploit implicit feedback [3], which plays an important role in recommender systems. The SVD++ model [4] was proposed to bring implicit feedback into matrix factorization, but it requires much more training time and memory. Factorization Machines (FM) can be regarded as a classification or regression model combined with feature engineering. With different features, FM can mimic different factorization models, such as matrix factorization, and specialized models such as SVD++. In this work, we propose two latent feature based FM models, both of which can capture the implicit feedback of a user or an item: the Topic-based FM Model and the Vector-based FM Model.
Topic models are statistical models widely used in natural language processing (NLP) and in machine learning. One of the most classic is Latent Dirichlet Allocation (LDA) [1], a generative probabilistic model for collections of discrete data such as text corpora. LDA assumes that each document is a mixture of several topics and that each topic is a distribution over words, so a document can be represented by its latent topic proportions. LDA can also be applied to rating prediction. Suppose we want to predict how a user will rate a movie from the user’s watching history, which to some degree indicates the user’s interest in this movie. If we treat a user’s watching history as a “document” and each movie in it as a “word”, then the user’s interests can likewise be described by a mixture of latent topics inferred from the “words” in that “document”. We therefore use these latent topics as features to train the FM model, which we call the Topic-based FM Model.

The other FM model we propose, the Vector-based FM Model, is built on word2vec [8], which provides an efficient implementation of the continuous bag-of-words (CBOW) and skip-gram architectures for computing vector representations of words [7]. Although it is a simple neural network model, it works well in practice. The main goal of word2vec is to learn high-quality word vectors from huge datasets with billions of words and vocabularies with millions of words. Like LDA, word2vec learns latent vectors from a vocabulary constructed from the training data; unlike LDA, each latent vector represents a word itself rather than a document. In our problem, a user is regarded as a “document” and the user’s watching history as a sequence of “words”, so word2vec can generate a latent vector for each item. In the same fashion as before, we use these latent vectors as features to train the FM model, resulting in the Vector-based FM Model.
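To make the “document” analogy concrete, here is a minimal Python sketch. It assumes the raw data are (user, item, timestamp) triples. The function names and the `i`/`u` token prefixes are hypothetical. The code builds one document per user, with items as words, or one per item, with users as words. Tokens are kept in chronological order. LDA ignores that order, while word2vec’s context window depends on it. The serializer follows our understanding of the GibbsLDA++ input layout: a first line with the number of documents, then one whitespace-separated document per line. Treat that layout as an assumption to check against the tool’s documentation.

```python
from collections import defaultdict

def build_documents(records, by_user=True):
    """Group (user, item, timestamp) records into ordered 'documents'.

    by_user=True : one document per user; its 'words' are item tokens.
    by_user=False: one document per item; its 'words' are user tokens.
    Tokens are sorted by timestamp (ties broken by token string).
    The 'i'/'u' prefixes are an illustrative naming choice.
    """
    groups = defaultdict(list)
    for user, item, ts in records:
        if by_user:
            groups[user].append((ts, f"i{item}"))
        else:
            groups[item].append((ts, f"u{user}"))
    return {key: [tok for _, tok in sorted(events)]
            for key, events in groups.items()}

def to_gibbslda_text(docs):
    """First line: number of documents; then one document per line
    (assumed GibbsLDA++ input layout)."""
    lines = [str(len(docs))] + [" ".join(docs[key]) for key in sorted(docs)]
    return "\n".join(lines) + "\n"

records = [(1, 10, 300), (1, 20, 100), (2, 10, 200), (1, 30, 200)]
user_docs = build_documents(records)                 # {1: ['i20', 'i30', 'i10'], 2: ['i10']}
item_docs = build_documents(records, by_user=False)  # {10: ['u2', 'u1'], 20: ['u1'], 30: ['u1']}
```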
The rest of this paper is organized as follows. In Section 2, we describe the Topic-based FM Model and the Vector-based FM Model in detail. Section 3 presents an experimental evaluation and analysis of our method on three large-scale collaborative filtering datasets, which demonstrates that our method outperforms state-of-the-art latent factor approaches. Finally, we conclude in Section 4.
2 Latent Feature Based FM Model in Rating Prediction
In this section, we apply the two latent feature models mentioned above to the rating prediction problem. These latent features can capture implicit feedback or latent characteristics of a user or an item. In the following, we explain how the latent features are used in the FM model.
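Both models below feed extra features into a standard degree-2 factorization machine [9]. The model is $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$, where the pairwise term can be computed in $O(kn)$ time. The following minimal pure-Python sketch of that prediction rule is for illustration only, not LibFM’s implementation. The toy example also shows the claim from the introduction that FM mimics matrix factorization. With a one-hot user and a one-hot item as the only features, the prediction reduces to biased matrix factorization, $w_0 + w_u + w_i + \langle v_u, v_i \rangle$.

```python
from typing import Sequence

def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Degree-2 factorization machine prediction.

    y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, with the
    pairwise sum computed in O(k n) via the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Toy parameters: two users (features 0-1) and two items (features 2-3).
w0, w = 3.5, [0.1, -0.2, 0.3, -0.4]
V = [[0.5, 1.0], [0.2, 0.1], [0.0, 0.3], [1.0, -0.5]]
x = [1.0, 0.0, 0.0, 1.0]  # user 0 rates item 1 (one-hot encoding)

# With one-hot x, FM equals biased matrix factorization.
mf = w0 + w[0] + w[3] + sum(a * b for a, b in zip(V[0], V[3]))
```

The $O(kn)$ identity is what keeps FM cheap when many real-valued latent features are appended to the one-hot user and item ids.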
2.1 Topic-based FM Model
The Topic-based FM Model is similar to a previous model, M3F [6], which generates latent factors from the history information of users and items. However, in M3F the latent factors must be retrained every time the model is trained, which is time-consuming. In contrast, our method generates the latent factors once and updates them only when necessary, which leads to a simpler algorithm. Next, we explain how the topic model is used in the FM model and present the detailed algorithm.
We first introduce the M3F model to fix the notation used here and below. M3F obtains the user and item parameters by Gibbs sampling in three steps: it first samples the hyperparameters, then the topics, and finally the user and item parameters. For details of these three steps, we refer the reader to [6]. M3F has two variants for predicting the missing elements of the rating matrix. The first is the topic-indexed bias (TIB) model, in which the score predicted for user $u$ on item $i$ is given by

(1)

Here, the user and item latent vectors enter through their dot product. The user’s latent topic is combined with a weight vector for that topic, and, similarly, the item’s latent topic is combined with its own weight vector. The second is the topic-indexed factor (TIF) model, whose prediction formula is
(2) 
where the two topic-indexed vectors belong to the user and the item, respectively, and their dot product provides the weight of the user-item cross term.
In our formulation, we combine the two existing methods described above. First, we train the user’s latent factor on the user’s history information. Second, we train the item’s latent factor on the item’s history information, that is, the set of users who have watched the item. Once we have the user’s topics and the item’s topics, we define our FM-based prediction formula as follows,
(3)  
(4) 
is the global bias, and , are the user’s and item’s biases, respectively; stands for .
Compared with M3F, our method differs in two ways. First, besides the cross terms between users and items, we add user-user cross terms and item-item cross terms, so that the formulation captures more information and yields more accurate results. Second, our method is split into two steps: training the latent factors and feeding them into the FM model. Therefore, we do not need to sample from the history data for each rating, which would take much longer. More details of our method are given in Algorithm 1.
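As an illustration of step two, here is a minimal sketch of how one rating and its topic features could be written as sparse FM input. It uses the libSVM-style text format (`target index:value ...`) that LibFM reads. The feature layout and the function name are hypothetical: one-hot user id, one-hot item id, then the user’s and the item’s topic proportions. FM models interactions between every pair of non-zero features, so this single vector already produces user-item terms, user-topic terms, and the additional user-user and item-item cross terms described above.

```python
def topic_fm_line(rating, user, item, theta_user, theta_item, n_users, n_items):
    """Encode one rating as a libSVM-style line: 'target idx:value ...'.

    Hypothetical feature layout (0-based indices):
      [0, n_users)                      one-hot user id
      [n_users, n_users + n_items)      one-hot item id
      next len(theta_user) indices      user topic proportions
      next len(theta_item) indices      item topic proportions
    Zero-valued features are omitted, as is usual for sparse input.
    """
    feats = [(user, 1.0), (n_users + item, 1.0)]
    offset = n_users + n_items
    for k, p in enumerate(theta_user):
        if p != 0:
            feats.append((offset + k, p))
    offset += len(theta_user)
    for k, p in enumerate(theta_item):
        if p != 0:
            feats.append((offset + k, p))
    return f"{rating:g} " + " ".join(f"{i}:{v:g}" for i, v in feats)


# User 1 of 3 rates item 0 of 2 with 4 stars; 2 topics on each side.
print(topic_fm_line(4, 1, 0, [0.75, 0.25], [0.5, 0.5], n_users=3, n_items=2))
# prints: 4 1:1 3:1 5:0.75 6:0.25 7:0.5 8:0.5
```

Because the topic features are computed once and stored in the input file, they only need to be regenerated when the history data change. This matches the two-step design described above.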
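Table 1: RMSE of the baseline and of the Topic-based FM Model with 8 (topic_8) and 20 (topic_20) latent topics, after 100, 200, and 300 iterations.

| Dataset | iter | baseline | topic_8 | topic_20 |
|---|---|---|---|---|
| Baidu Data | 100 | 0.629178 | 0.626627 | 0.625879 |
| Baidu Data | 200 | 0.628903 | 0.626536 | 0.625958 |
| Baidu Data | 300 | 0.629111 | 0.626536 | 0.625944 |
| 10M MovieLens | 100 | 0.788312 | 0.787557 | 0.787204 |
| 10M MovieLens | 200 | 0.787922 | 0.787020 | 0.786677 |
| 10M MovieLens | 300 | 0.787916 | 0.786987 | 0.786583 |
| Netflix Prize | 100 | 0.882469 | 0.871300 | 0.868443 |
| Netflix Prize | 200 | 0.879894 | 0.869120 | 0.866720 |
| Netflix Prize | 300 | 0.879355 | 0.868806 | 0.866411 |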
2.2 Vector-based FM Model
The Topic-based FM Model treats the items watched by a user as an unordered set, so it cannot exploit the order of the user’s watching history. In practice, items watched consecutively tend to share some similarity, since they reflect the user’s interest over a short period of time. Driven by this observation, we apply word2vec, which exploits word order within a document, to the rating prediction problem. We regard a user’s watching history as a “document” and each item in it as a “word” in that “document”. The item latent vectors learned in this way reflect users’ interests to some degree.
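To show how the watching order turns into training signal, here is a minimal sketch of skip-gram (center, context) pair generation over one user’s chronologically ordered history. The function name is hypothetical, and the window is fixed. The reference word2vec tool additionally shrinks the window at random for each position, which this sketch omits.

```python
def skipgram_pairs(sequence, window=3):
    """(center, context) pairs for skip-gram: each item is paired with up
    to `window` items before it and `window` items after it."""
    pairs = []
    for t, center in enumerate(sequence):
        lo, hi = max(0, t - window), min(len(sequence), t + window + 1)
        for j in range(lo, hi):
            if j != t:
                pairs.append((center, sequence[j]))
    return pairs


history = ["i20", "i30", "i10"]  # one user's items in watching order
print(skipgram_pairs(history, window=1))
# prints: [('i20', 'i30'), ('i30', 'i20'), ('i30', 'i10'), ('i10', 'i30')]
```

Items that are close in time produce pairs, and items that are far apart do not. Shuffling the history would therefore change the learned vectors, unlike the bag-of-items view used by LDA.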
To obtain better results, our approach, the Vector-based FM Model, uses item latent vectors learned with the skip-gram method, whose framework is shown in Figure 1. Given a user’s watching history as a sequence of items $w_1, w_2, \dots, w_T$, we obtain the item latent vectors by maximizing the average log probability

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t),$$

where $c$ is the size of the context window. We then use the item latent vectors as features in the FM model, so our prediction formula is
(5) 
where is the latent item vector for the corresponding item.
In this formulation, we do not train latent vectors for users, because the order of the users who watched the same item carries no strong information. Besides exploiting the order of the watching history, word2vec can handle huge datasets with large vocabularies, so the Vector-based FM Model can be readily applied to large-scale datasets.
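To make the skip-gram objective of this section concrete, here is a minimal sketch that evaluates the average log probability for given input and output item vectors. It defines $p(w_O \mid w_I)$ with the full softmax over the item vocabulary. The function names and the toy vectors are hypothetical. In practice, word2vec replaces the full softmax with hierarchical softmax or negative sampling [8] for efficiency, so this sketch shows what is being optimized, not how the tool trains.

```python
import math

def log_softmax_prob(center, context, in_vecs, out_vecs):
    """log p(context | center) under the full-softmax skip-gram model:
    exp(u_context . v_center) / sum_w exp(u_w . v_center)."""
    v = in_vecs[center]
    scores = {w: sum(a * b for a, b in zip(u, v)) for w, u in out_vecs.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[context] - log_z

def skipgram_objective(sequence, in_vecs, out_vecs, c=3):
    """(1/T) * sum_t sum_{-c<=j<=c, j!=0} log p(w_{t+j} | w_t)."""
    T = len(sequence)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:
                total += log_softmax_prob(sequence[t], sequence[t + j],
                                          in_vecs, out_vecs)
    return total / T


# Toy two-item vocabulary with hand-picked vectors (illustrative only).
vecs_in = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
vecs_out = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
objective = skipgram_objective(["a", "b"], vecs_in, vecs_out, c=1)  # ≈ -log(1 + e)
```

Training adjusts `in_vecs` and `out_vecs` to raise this value. The learned input vectors are the item latent vectors used as FM features.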
3 Experiment
We evaluate our models on several large-scale movie rating collaborative filtering datasets: the Baidu Movie Recommendation Algorithm Contest Data (Baidu Data for short; http://openresearch.baidu.com/topic/40.jspx), the Netflix Prize dataset (http://www.netflixprize.com/), and the 10M MovieLens dataset (http://www.grouplens.org/). The Baidu Data contains 1.26 million ratings on a scale from 1 to 5 from 9,722 users on 7,889 movies. The 10M MovieLens dataset contains 10 million ratings on a scale from 0.5 to 5 in half-star increments from 71,567 users on 10,681 movies. The Netflix Prize dataset contains 100 million ratings in {1,…,5} from 480,189 users on 17,770 items. We use all three datasets to evaluate the Topic-based FM Model. To compare the baseline, Topic-based, and Vector-based models, however, we use only the 10M MovieLens and Netflix Prize datasets. The Baidu Data contains no time information, which the Vector-based model needs to order each user’s watching history. The baseline is an FM model trained only on user_id and item_id, and the FM parameters are fixed across all experiments. We use RMSE to measure performance.
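For reference, RMSE over a test set of $N$ ratings is $\sqrt{\frac{1}{N}\sum (\hat{r}_{ui} - r_{ui})^2}$. Here is a minimal sketch of that computation. The function name is hypothetical.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between predicted and observed ratings."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("expected two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(predictions))


print(rmse([3.0, 4.0], [4.0, 4.0]))  # sqrt(0.5) ≈ 0.7071
```

Because errors are squared before averaging, RMSE penalizes large rating errors more heavily than mean absolute error does.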
3.1 Experiments on Topic-based FM
We first present the experiments on the Topic-based FM Model, for which we use an off-the-shelf implementation, the LibFM tool [10]. We use LibFM’s default parameter settings, in which the dimension of the latent factors is 8 and the learning method is MCMC [2].
For the LDA model, we use the open-source tool GibbsLDA++ (http://sourceforge.net/projects/gibbslda/) to generate the latent factors. We consider two settings in which the dimension of the latent factor (the number of topics) is 8 and 20, respectively; these are denoted topic_8 and topic_20 in Table 1. For the other parameters, we set alpha, beta and iterations.
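To illustrate how the topic proportions are obtained, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA, the estimation technique described in [2]. It is not GibbsLDA++ itself. The function name, the integer token ids, and the alpha, beta, and iteration values in the example are illustrative assumptions, not the settings used in our experiments. Applied to user documents (items as tokens), it yields the user topic proportions. Applied to item documents (users as tokens), it yields the item topic proportions.

```python
import random

def lda_gibbs(docs, K, alpha, beta, iters, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer token ids in [0, V).
    Returns theta, with theta[d][k] = (n_dk + alpha) / (n_d + K * alpha)
    estimating the topic proportions of document d.
    """
    rng = random.Random(seed)
    V = 1 + max(w for doc in docs for w in doc)  # vocabulary size
    n_dk = [[0] * K for _ in docs]               # tokens of topic k in doc d
    n_kw = [[0] * V for _ in range(K)]           # count of token w in topic k
    n_k = [0] * K                                # tokens assigned to topic k
    z = []                                       # topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + V * beta) for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


# Four toy user histories over four items (ids 0-3); values are illustrative.
docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1], [2, 3, 2, 3]]
theta = lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=50)
```

String tokens such as item ids must first be mapped to consecutive integers. Each row of `theta` sums to one and can be written directly as the real-valued topic features of the corresponding user or item.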
Table 1 reports the performance of the Topic-based FM Model, which achieves lower RMSE than the baseline on all three datasets. Among the three methods, the Topic-based FM Model with 20 latent factors performs best on all three datasets, with the largest gains on the Baidu Data and Netflix Prize datasets. As expected, increasing the dimension of the latent factors gives the model more expressive power for users and items, which lowers RMSE, as our results confirm.
3.2 Experiments on Vector-based FM
As mentioned above, the Vector-based FM Model does not use a user latent vector; only the item latent vectors are trained. For a fair comparison, we therefore use only the item topics in the Topic-based FM Model, and both methods are compared with the baseline model. In this setting, we use a publicly available implementation of word2vec (https://code.google.com/p/word2vec/). The dimension of the latent vector is set to and . The context window is set to 3, meaning that the 3 preceding and the 3 following items are considered during training, and we train with the skip-gram method.
Figure 2 shows the convergence curves on the 10M MovieLens and Netflix Prize datasets. Both of our proposed approaches achieve a lower RMSE than the baseline. On MovieLens, our two latent feature based FM models converge more slowly than the baseline but reach a lower RMSE, and the Topic-based and Vector-based FM models obtain comparable results. On Netflix Prize, both of our methods not only perform better than the baseline but also converge more quickly. Overall, our experiments support the conclusion that the Vector-based FM Model performs better because it exploits the order of the watching history.
4 Conclusion
In this work, we introduce topic-based and vector-based latent features into the traditional FM model, resulting in two latent feature based models. The Topic-based FM Model incorporates implicit feedback with less training time, since the latent factors only need to be updated when necessary. The Vector-based FM Model exploits the order of a user’s watching history, resulting in better performance. Empirical results on three datasets show that our methods perform better than the baseline model and indicate that the Vector-based FM Model usually works better because it captures order information. In future work, we plan to tune the parameters and to generate latent features that better represent users and items.
References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, pages 993–1022, 2003.
[2] G. Heinrich. Parameter estimation for text analysis. Technical report, 2005.
[3] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Eighth IEEE International Conference on Data Mining (ICDM’08), pages 263–272. IEEE, December 2008.
[4] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434. ACM, August 2008.
[5] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, pages 30–37, 2009.
[6] L. W. Mackey, D. J. Weiss, and M. I. Jordan. Mixed membership matrix factorization. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 711–718, 2010.
[7] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[8] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[9] S. Rendle. Factorization machines. In 2010 IEEE 10th International Conference on Data Mining (ICDM), pages 995–1000. IEEE, December 2010.
[10] S. Rendle. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology (TIST), 2012.