A Hybrid Approach to Enhance Pure Collaborative Filtering based on Content Feature Relationship

Recommendation systems are gaining importance because of their applications in both academia and industry. As additional data sources and methods for extracting information beyond users' rating histories have emerged, hybrid recommendation algorithms, which combine several methods to improve performance, have become pervasive. In this work, we first introduce a novel method for extracting the implicit relationship between content features using Word2Vec, a well-known family of methods from natural language processing. In contrast to the typical use of Word2Vec, we treat selected features of items as the words of a sentence to produce neural feature embeddings, through which we can calculate the similarity between features. Next, we propose a novel content-based recommendation system that employs this relationship to derive vector representations for items, from which the similarity between items can be computed (RELFsim). Our evaluation results demonstrate that it can predict the preference a user would have for a set of items as well as pure collaborative filtering does. This content-based algorithm is also embedded in a pure item-based collaborative filtering algorithm to deal with the cold-start problem and enhance its accuracy. Our experiments on a benchmark movie dataset corroborate that the proposed approach improves the accuracy of the system.




I Introduction

Nowadays the Internet plays a key role in people's lives, and so much information has become available online that users have difficulty finding and choosing appropriate items among large collections. Companies and system owners have therefore deployed sophisticated algorithms to provide their customers with recommendation systems (RSs), which help them cope with information overload [25, 6]. RSs have been studied for decades, yet they remain important because of their applications in domains such as travel, news, scientific articles, and advertising [7, 26].

Basically, RSs try to infer the taste of users from available data, such as users' ratings of items, purchase histories, or contextual information about users or items, and predict a user's preference for items the user has not yet seen. Based on these predictions, the most relevant items are suggested, so that the user is presented with a small proportion of items well suited to his or her taste. Personalized recommendation requires some kind of user feedback on items, which can be explicit, like ratings, or implicit, like the time a user spends viewing the details of an item. Other information, such as contextual information about items or users, can further improve an RS, although in most cases it is not easy to obtain.

RSs can be classified into three broad groups: content-based, collaborative filtering (CF), and hybrid systems. Content-based approaches focus on the properties of items: the content of the items a user has visited is used to recommend other items with similar content [13]. There are a number of reasons to use content-based approaches, for example when the data of other users is unavailable or inaccessible. CF approaches focus on user-item interactions: the ratings, arranged as a rating matrix, are used to identify which items a user is interested in based on like-minded users [9, 23]. CF and content-based filtering, two of the most prominent approaches, thus work from user-item interactions (ratings on items) and contextual information, respectively [2]. Both have their own advantages and disadvantages [22, 2]. As a result, hybrid approaches, which are generally created by combining other methods, have become popular; they attempt to enjoy the advantages of both approaches and overcome their drawbacks [3, 2]. For instance, CF algorithms have been used successfully in most situations because they need only interaction data and work even when contextual information is unavailable, but they have some issues. One of the most important is the cold-start problem: the number of ratings for new items or users is not adequate to produce appropriate recommendations [24].

Word embedding methods, a set of techniques in natural language processing (NLP), can also be used to map words from the textual information of items to vectors of real numbers. These word representations make it possible to identify items with similar textual information [19]. Word2Vec is a set of related model architectures used to compute vector representations of words using two-layer neural networks [16, 15]. One of these architectures is the skip-gram model, an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships, as introduced by Mikolov et al. [16, 15].

In this paper, we present a new concept of the relationship between content features and propose a novel content-based algorithm based on this relationship. We then use the proposed content-based algorithm to design a hybrid RS, which aims to deal with the cold-start problem in a pure CF RS and thereby improve the accuracy of the system.

II Related Work

This section briefly reviews the literature on dealing with the cold-start problem in CF using content-based methods built on new ways of computing content similarity. Several hybrid methods have been proposed that employ different information sources to enhance CF algorithms or employ word embedding techniques to take advantage of textual information. Melville et al. put forward content-boosted CF, which exploits textual information about movies (e.g. title, cast, etc.) as features to enhance the rating matrix so that CF works better [14]. Mobasher et al. presented SimComb, which brings structured semantic knowledge about items (ontologies) into play to cope with cases in which few or no ratings are available for new items, thereby improving accuracy [17]. Gunawardana and Meek suggested unified Boltzmann machines, probabilistic models that encode collaborative and content information as features to learn weights that reflect the importance of different pairwise interactions [4]. Lin et al. described a method that uses nascent information culled from Twitter to provide relevant recommendations in cold-start situations [12]. Kouki et al. proposed HyPER, a general probabilistic hybrid framework that can combine multiple information types from different sources and modeling techniques into a single unified model to enhance the performance of the RS [10]. Aslanian et al. introduced hybrid RS algorithms based on a content feature relationship that is extracted from the rating matrix using a mathematical formulation [1]. Wei et al. proposed two models that extract content features of items using deep neural networks and feed them into rating prediction for cold-start items [27]. Nilashi et al. developed a new hybrid RS using combinations of dimensionality reduction and ontology techniques to find the most similar items and users in order to solve sparsity and scalability problems [20]. Musto et al. employed word embedding techniques to learn a low-dimensional vector space word representation from the textual information of Wikipedia to represent both items and user profiles in a content-based recommendation [18]. Musto et al. also developed a content-based recommendation framework using semantic vectors for movies that are extracted from Wikipedia using word embedding techniques [19]. Ozsoy applied Word2Vec to the RS domain to capture the correlation between venues and to recommend new venues to users [21].

III Overview of the Techniques

III-A Pure item-based collaborative filtering

Item-based CF focuses on the similarity of the user ratings for two items [11]. Indeed, two items are similar if the users who have rated both have given them similar ratings. To compute the similarity between items, we use cosine similarity, defined below:

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U_{ij}} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U_{ij}} r_{ui}^{2}}\; \sqrt{\sum_{u \in U_{ij}} r_{uj}^{2}}}$$

where $U_{ij}$ is the subset of users who rated both items $i$ and $j$, $R$ is the rating matrix in which columns represent items and rows represent users, and $r_{ui}$ is the rating that user $u$ gave to item $i$. In addition, we define $I$ as the set of items and $U$ as the set of users. To predict $\hat{r}_{ui}$, the similarities between item $i$ and the other items are first measured, and the $k$ items that have the highest similarity with item $i$ are selected as the neighborhood $N_k(i)$. Finally, the value of $\hat{r}_{ui}$ is computed from the weighted average of the neighbors' ratings as follows:

$$\hat{r}_{ui} = \bar{r}_{i} + \frac{\sum_{j \in N_k(i)} \mathrm{sim}(i,j)\,(r_{uj} - \bar{r}_{j})}{\sum_{j \in N_k(i)} \mathrm{sim}(i,j)}$$

where $\bar{r}_{i}$ is the average rating given to item $i$.
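The item-based prediction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the nested-dictionary layout `ratings[user][item]`, the function names, and the toy data are all assumptions, and the mean-centered form of the weighted average is inferred from the text's reference to the average item rating.

```python
from math import sqrt

# Toy stand-in for the rating matrix R: ratings[user][item] = rating (hypothetical data).
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"b": 4, "c": 1},
}

def cosine_sim(ratings, i, j):
    """Cosine similarity between items i and j over the users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def item_mean(ratings, i):
    """Average rating given to item i (0.0 if the item has no ratings)."""
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals) if vals else 0.0

def predict(ratings, u, i, k=2):
    """Mean-centered weighted average over the k items rated by u most similar to i."""
    candidates = [(cosine_sim(ratings, i, j), j) for j in ratings[u] if j != i]
    neighbors = sorted(candidates, reverse=True)[:k]
    base = item_mean(ratings, i)
    denom = sum(s for s, _ in neighbors)
    if denom == 0:
        return base  # no usable neighbor: fall back to the item average
    num = sum(s * (ratings[u][j] - item_mean(ratings, j)) for s, j in neighbors)
    return base + num / denom
```

Calling `predict(ratings, "u3", "a")` estimates how user `u3` would rate item `a` from that user's ratings of `b` and `c`. When an item has no co-rated neighbors the sketch falls back to the item average, which is one simple choice among several.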

III-B Word2Vec

Word2Vec is a set of related model architectures used to compute vector representations of words using two-layer neural networks [16, 15]. Briefly, there are two model architectures for learning distributed representations of words, namely continuous bag-of-words (CBOW) and continuous skip-gram [15]. CBOW uses a continuous distributed representation of the context to predict the current word. The second architecture is similar to CBOW, but tries to predict the context (the words around the current one) from the current word [15] (Figure 1). The input of Word2Vec is a set of sentences, each of which is a sequence of words, and skip-gram tries to capture the sequential nature of the sentences by considering the words that surround each word. After training, the weights of the hidden layer form a set of vectors, which are the learned vector representations of the input words. Because of how the network is trained, two different words that appear in similar contexts are assigned close vectors. Therefore, one can compute the similarity between two words by computing the cosine similarity between their vectors.

Fig. 1: The CBOW architecture predicts the current word based on the context, and the skip-gram architecture predicts the surrounding words given the current word [15].
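To make the skip-gram objective concrete, the following sketch enumerates the (current word, context word) pairs that the model is trained to predict for one sentence. It is a simplified illustration with a fixed window; the function name `skipgram_pairs` is hypothetical, and the neural network, negative sampling, and training loop are omitted.

```python
def skipgram_pairs(sentence, window):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for pos, center in enumerate(sentence):
        lo = max(0, pos - window)
        hi = min(len(sentence), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:  # a word is not its own context
                pairs.append((center, sentence[ctx_pos]))
    return pairs
```

For example, `skipgram_pairs(["a", "b", "c"], 1)` pairs each word only with its immediate neighbors. Words that repeatedly share context words across many sentences end up with similar vectors.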

IV Proposed methods based on content feature relationship

IV-A Extracting the content feature relationship

First, we select the most informative features and list them in order of importance. As the dataset used here is a movie dataset, we make a list of the directors, the screenwriters, and the first twelve members of the cast, in order of appearance, for each movie. We consider this list to be the sentence for that movie (Figure 2). We then use the list of all sentences as the input of the skip-gram model. After the training step, we have vector representations for directors, screenwriters, and actors.

Fig. 2: An example of the sequence of words in the sentence for a movie: 'word 1: the director' + 'word 2: the screenwriter' + 'word 3: the first actor' + 'word 4: the second actor' + … .

Our justification for this approach is that two actors who have played several roles in similar contexts, with the same actors, directors, or screenwriters, are similar. This relationship is what the skip-gram architecture can learn. Moreover, the closer the positions of two actors who play roles in the same movie are in the sentence of that movie, the more similar their vectors will be. Finally, we can compute the cosine similarity between the vector representations of the features to measure the relationship between them. The hyperparameters of the skip-gram architecture affect the accuracy of the RS. We set "Window Size", "Vector dimension", "Negative", "Min-count" and "epoch" to 8, 150, 25, 1 and 20, respectively.
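The construction of movie sentences and the training step can be sketched as follows. The text does not name the Word2Vec implementation that was used, so the gensim library and the mapping of the reported hyperparameters onto gensim 4.x keyword arguments are assumptions. The names `build_sentence`, `W2V_PARAMS`, and `train_feature_embeddings` are hypothetical.

```python
def build_sentence(directors, writers, cast, max_cast=12):
    """One sentence per movie: directors, then screenwriters, then the first cast members."""
    return list(directors) + list(writers) + list(cast[:max_cast])

# Reported hyperparameters, mapped to gensim 4.x argument names (assumed mapping).
# sg=1 selects the skip-gram architecture rather than CBOW.
W2V_PARAMS = dict(vector_size=150, window=8, negative=25, min_count=1, epochs=20, sg=1)

def train_feature_embeddings(sentences, params=W2V_PARAMS):
    """Train skip-gram embeddings over movie sentences; requires gensim to be installed."""
    from gensim.models import Word2Vec  # imported lazily so the helpers above work without it
    model = Word2Vec(sentences=sentences, **params)
    return model.wv  # keyed vectors: one vector per director, screenwriter, or actor
```

Each person's name is treated as a single token, so a sentence is simply a list of names. With the returned keyed vectors, `wv.similarity(x, y)` gives the cosine similarity between two features.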

IV-B A content-based predictor using the relationship

There are several ways to exploit the extracted content feature relationship; we simply take the average of the word vectors of each movie to obtain a vector representation for that movie. Averaging the embeddings of the words in a sentence has proven to be a successful and efficient way of obtaining sentence embeddings [8]. In this way, we can compute the cosine similarity between movies as a similarity that takes the relationship into account, which we call RELFsim. We then implement a content-based predictor on top of a pure CF algorithm: we use RELFsim between movies instead of the cosine similarity between the ratings of movies. Thus, we have a content-based predictor that has no problem with new items. In comparison with pure content-based filtering, it can assign a similarity to two movies that have no actors in common but whose actors have appeared in the same movies in the past.
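A minimal sketch of RELFsim as described above: average the feature vectors of each movie, then take the cosine similarity of the two averages. The function names and the plain-dictionary `embeddings` lookup are illustrative assumptions, and every feature is assumed to have a vector.

```python
from math import sqrt

def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relfsim(features_i, features_j, embeddings):
    """Similarity between two movies from the averaged embeddings of their features."""
    vec_i = average_vector([embeddings[f] for f in features_i])
    vec_j = average_vector([embeddings[f] for f in features_j])
    return cosine(vec_i, vec_j)
```

Because the comparison is made in the embedding space, two movies with disjoint feature lists can still receive a high similarity when their features have close vectors.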

IV-C Combining the content-based predictor with pure collaborative filtering to deal with the cold-start problem

Because our proposed content-based algorithm uses the pure CF algorithm to predict ratings, the two can be combined easily. Whenever new items are added, or only a few ratings have been submitted for some items, the CF algorithm cannot predict ratings for those items, and pure CF cannot recommend them. Therefore, when CF tries to predict the rating of item i for user u, at the step where the similarity between item i and the other items rated by user u is computed, we check whether the items have enough ratings. If an item has no ratings or only a few, RELFsim is used instead. As a result, the RS can predict ratings for items with few or no ratings as well as for the others.
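The switching rule can be sketched as a single similarity function that the neighborhood step calls. The text does not state what counts as "few" ratings, so the `min_ratings` threshold and its default value are hypothetical, as are the function and parameter names.

```python
def hybrid_similarity(i, j, rating_counts, cf_sim, content_sim, min_ratings=5):
    """Rating-based similarity by default; content-based when either item is cold.

    rating_counts maps item -> number of ratings in the training data.
    cf_sim and content_sim are callables taking (i, j) and returning a similarity.
    min_ratings is an assumed threshold, not a value taken from the paper.
    """
    is_cold = (rating_counts.get(i, 0) < min_ratings
               or rating_counts.get(j, 0) < min_ratings)
    return content_sim(i, j) if is_cold else cf_sim(i, j)
```

Since only the similarity changes, the rest of the neighborhood-based prediction stays the same, which is why the two predictors combine so easily.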

V Experiments

V-A Dataset

Our experiments were performed on the MovieLens 1M dataset, a stable benchmark dataset that contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000 [5]. To obtain the cast and crew of the movies, we use TMDb information and merge it with the ratings through the Links file of MovieLens, which contains the IDs of the movies in other sources. A few duplicated rows, invalid values in the ID field, and movies without information were removed. After cleaning the data, 995,138 ratings of 3,746 movies remained. We use only the directors, screenwriters, and first twelve actors of each movie, where they exist. The model uses 22,669 unique features, which means that 22,669 vectors were created.

V-B The Content-based predictor and the Hybrid approach

In this section, we compare the accuracy of the proposed content-based predictor, the proposed hybrid RS, and pure CF. To measure the accuracy of the prediction task, we use two well-known metrics, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). They show how closely the RSs predict the ratings for the users and items in the test set. First, we used five-fold cross-validation, so that all the data served as a test set, with a fixed number of selected neighbors k. Then, to evaluate the effect of the value of k, we divided the data into a training set and a test set containing 80% and 20% of the data, respectively, and ran the algorithms with various values of k. We did not modify the data to create cold-start items, so only a few items are in the cold-start situation. The more cold-start items there are, the more the hybrid CF would be expected to improve on the accuracy of pure CF. As can be seen in Figure 3, the proposed content-based predictor works as well as pure CF, and their combination can enhance the accuracy and be effective in dealing with the cold-start problem.

Fig. 3: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) of three methods: the proposed content-based predictor (RELFsim CB), the proposed hybrid RS (Hybrid RELFsim), and pure CF (Pure Knn CF).
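The two error metrics used in this section follow their standard definitions and can be sketched as below. The function names and the use of two parallel lists of predicted and actual ratings are illustrative choices.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root Mean Square Error: penalizes large errors more heavily."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Lower values are better for both metrics. RMSE is always at least as large as MAE on the same predictions, and the gap grows when a few predictions are far off.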

Figures 4 and 5 show the influence of the parameter k on RMSE and MAE in our experiments. The empirical evidence suggests that pure CF works very well with sufficient data but suffers from a lack of data in the cold-start condition, and that the combination of pure CF and RELFsim CB can alleviate the problem and yield a lower prediction error.

Fig. 4: The effect of parameter k on RMSE for three methods: the proposed content-based predictor (RELFsim CB), the proposed hybrid RS (Hybrid RELFsim), and pure CF (Pure Knn CF).

Fig. 5: The effect of parameter k on MAE for three methods: the proposed content-based predictor (RELFsim CB), the proposed hybrid RS (Hybrid RELFsim), and pure CF (Pure Knn CF).

VI Conclusion

In this paper, we proposed an innovative learning-based approach to learn and extract the content feature relationship according to the history of features appearing in similar contexts. This relationship was learned using the Word2Vec architecture, particularly the skip-gram model [15], which is typically used to produce word embeddings. We used the relationship to build a content-based predictor that works as well as the pure CF method, and combined it with a pure item-based CF method to obtain a hybrid RS that can deal with the cold-start problem to some extent. One benefit of this method is that it needs only a few content features to yield a comparable content-based predictor, and it can capture a relationship between two items that have no feature in common. Our experiments attest to the improvement over the accuracy of pure CF. As future work, we will investigate more complex methods of exploiting the relationship to produce item vectors, because in this work we adopted a simple approach, averaging the feature vectors of each item, to establish that the relationship could be useful. Moreover, we used an uncomplicated mixture of the two algorithms to form the hybrid RS. Various other methods can be used to create hybrid RSs, such as the strategies in content-boosted CF [14] and semantically enhanced CF [17], which we plan to study.


The authors would like to acknowledge the financial support from the University of Tehran for this research.


  • [1] E. Aslanian, M. Radmanesh, and M. Jalili (2016) Hybrid recommender systems based on content feature relationship. IEEE Transactions on Industrial Informatics.
  • [2] M. Balabanović and Y. Shoham (1997) Fab: content-based, collaborative recommendation. Communications of the ACM 40 (3), pp. 66–72.
  • [3] R. Burke (2002) Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12 (4), pp. 331–370.
  • [4] A. Gunawardana, C. Meek, et al. (2009) A unified approach to building hybrid recommender systems. RecSys 9, pp. 117–124.
  • [5] F. M. Harper and J. A. Konstan (2015) The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5 (4), pp. 19:1–19:19. ISSN 2160-6455.
  • [6] H. Hwangbo, Y. S. Kim, and K. J. Cha (2018) Recommendation system development for fashion retail e-commerce. Electronic Commerce Research and Applications 28, pp. 94–101.
  • [7] M. Karimi, D. Jannach, and M. Jugovac (2018) News recommender systems–survey and roads ahead. Information Processing & Management 54 (6), pp. 1203–1227.
  • [8] T. Kenter, A. Borisov, and M. de Rijke (2016) Siamese CBOW: optimizing word embeddings for sentence representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 941–951.
  • [9] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl (1997) GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM 40 (3), pp. 77–87.
  • [10] P. Kouki, S. Fakhraei, J. Foulds, M. Eirinaki, and L. Getoor (2015) HyPER: a flexible and extensible probabilistic framework for hybrid recommender systems. In Proceedings of the 9th ACM Conference on Recommender Systems, pp. 99–106.
  • [11] J. Leskovec, A. Rajaraman, and J. D. Ullman (2014) Mining of massive datasets. Cambridge University Press.
  • [12] J. Lin, K. Sugiyama, M. Kan, and T. Chua (2013) Addressing cold-start in app recommendation: latent user models constructed from Twitter followers. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 283–292.
  • [13] P. Lops, M. De Gemmis, and G. Semeraro (2011) Content-based recommender systems: state of the art and trends. In Recommender Systems Handbook, pp. 73–105.
  • [14] P. Melville, R. J. Mooney, and R. Nagarajan (2002) Content-boosted collaborative filtering for improved recommendations. AAAI/IAAI 23, pp. 187–192.
  • [15] T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • [16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
  • [17] B. Mobasher, X. Jin, and Y. Zhou (2003) Semantically enhanced collaborative filtering on the web. In European Web Mining Forum, pp. 57–76.
  • [18] C. Musto, G. Semeraro, M. De Gemmis, and P. Lops (2015) Word embedding techniques for content-based recommender systems: an empirical evaluation. In RecSys Posters.
  • [19] C. Musto, G. Semeraro, M. de Gemmis, and P. Lops (2016) Learning word embeddings from Wikipedia for content-based recommender systems. In European Conference on Information Retrieval, pp. 729–734.
  • [20] M. Nilashi, O. Ibrahim, and K. Bagherifard (2018) A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Systems with Applications 92, pp. 507–520.
  • [21] M. G. Ozsoy (2016) From word embeddings to item recommendation. arXiv preprint arXiv:1601.01356.
  • [22] M. J. Pazzani and D. Billsus (2007) Content-based recommendation systems. In The Adaptive Web, pp. 325–341.
  • [23] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen (2007) Collaborative filtering recommender systems. In The Adaptive Web, pp. 291–324.
  • [24] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock (2002) Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 253–260.
  • [25] X. Shi, X. Luo, M. Shang, and L. Gu (2017) Long-term performance of collaborative filtering based recommenders in temporally evolving systems. Neurocomputing 267, pp. 635–643.
  • [26] D. Wang, Y. Liang, D. Xu, X. Feng, and R. Guan (2018) A content-based recommender system for computer science publications. Knowledge-Based Systems 157, pp. 1–9.
  • [27] J. Wei, J. He, K. Chen, Y. Zhou, and Z. Tang (2017) Collaborative filtering and deep learning based recommendation system for cold start items. Expert Systems with Applications 69, pp. 29–39.