The performance of collaborative filtering (CF) recommendation models has reached a remarkable level of maturity. These models are now widely adopted in real-world recommendation engines because of their state-of-the-art recommendation quality. In recent years, a number of recommendation scenarios have emerged that have encouraged the research community to consider various additional information sources (also known as side information) beyond the user rating matrix (Shi et al., 2014). A prominent example, and the one we focus on, is item content. In the movie domain, for instance, a variety of content features have been considered, such as metadata or features extracted directly from the core audio-visual signals. Metadata-based movie recommender systems typically use genre (Filho et al., 2017; Hwang et al., 2016; Soares and Viana, 2017) or user-generated tags (Liu et al., 2017; Wei et al., 2016; Zhao et al., 2017) over which user profiles are built, assuming that these aspects represent the semantic content of movies. In contrast, audio-visual signals represent the low-level content (e.g., color, lighting, spoken dialogues, music) (Deldjoo et al., 2018a, 2019, 2017; Filho et al., 2017; Deng et al., 2018). Some approaches try to infer semantic concepts from low-level representations, e.g., via word2vec embeddings (Anwaar et al., 2018), deep neural networks (Zhang et al., 2019; Wei et al., 2017), fuzzy logic (Vashisth et al., 2017), or genetic algorithms (Mueller, 2017). For these reasons, it is evident that item content plays a key role in building hybrid or content-based filtering (CBF) models. Furthermore, it is important to correctly distinguish and weight the item features by their estimated relevance for a target user, to better model his or her tastes.
In Figure 1, we illustrate a simplified diagram that shows our research contributions. Standard recommendation based on content (CBF or hybrid) is structured in three main steps: (i) extraction of item content, which consists of building a feature vector that describes each item; (ii) building the profile of the target user, i.e., a structured representation of the user’s preferences over item content features; (iii) matching the user profile against the feature vector of each item to produce the list of recommended items most similar to the target user’s tastes.
A shortcoming of typical recommender system (RS) evaluation is that the user profiling stage, which is a key part of the RS, is barely evaluated. Usually, only the performance of the entire RS, which is composed of several components, is assessed, and how effectively the user profiling step functions remains an open question. We argue that it is important to investigate the user profiling stage and compare the performance of different profile modelling methods (see upper part of Figure 1).
The goal of this work is therefore to investigate the difference between explicit user ratings on individual movie content features (e.g., genre, actors, or directors) and implicit models inferred via state-of-the-art user modelling techniques from explicit ratings of the whole movies. To this end, we (i) create (and make publicly available) a varied dataset of explicit ratings both on movies and content features and (ii) evaluate different user profiling methods and compare their resulting implicit models against the true feature ratings provided in the collected dataset.
2. Related Work
With respect to previous research, to the best of our knowledge, the only work that evaluates implicit user profiles against true ratings on content features is that of Nasery et al. (2015). Nasery et al. compare actually rated features with the ones implicitly derived from rated movies, but no concrete user profiling methods are investigated. Instead, the number of times each feature is explicitly rated and the number of times it appears in the content of all rated movies are counted, and these counts are compared. The authors create a dataset of movies’ feature ratings (genres, actors/cast, and directors), dubbed PoliMovie (http://bit.ly/polimovie), through a survey web application they built. Their approach, which uses limited survey questions and a fixed, reduced dataset of the most popular movies and features extracted from IMDb (Internet Movie Database, www.imdb.com), tends to push users towards limited and convergent preferences. In contrast, we systematically investigate 4 methods to model implicit user profiles and we compare them with explicit user profiles obtained from feature ratings. Another contribution of the work at hand is the creation of a dataset that includes ratings on movie content features. Other datasets commonly used in movie recommender systems research, but which do not contain such feature ratings, include MovieLens 20M (ML-20M) (Harper and Konstan, 2016), IMDB Movies Dataset (Leka, 2016), The Movies Dataset (Banik, 2017), MMTF-14K and MFVCD-7K (Deldjoo et al., 2018b; Deldjoo and Schedl, 2019), and the Netflix Prize dataset (Netflix, 2009).
3. User Profile Modelling Techniques
To create a user profile, we adopt the vector profile representation, consisting of weighted attributes measuring the user’s taste on each feature (Deldjoo et al., 2017; Kacem, 2017), because it is best suited for our evaluation in terms of similarity functions. Formally, the user profiling methods we investigate build the user profile as a vector whose attributes are the relevance weights of each feature $f$ for the target user $u$, denoted as $w_{u,f}$.
We analyze 3 state-of-the-art methods from the literature to model user profiles; for simplicity, we refer to each of them by the first author of the corresponding publication. We also investigate a 4th method that applies the TF-IDF (term frequency–inverse document frequency) term weighting idea, which is widely used in CBF and, in general, in information retrieval (Lops et al., 2011; Sánchez-Moreno et al., 2018; Wang et al., 2018).
Zhang method. Zhang et al. (Zhang et al., 2015) build the user profile based on item ratings or explicit feature ratings. Let $U$ and $I$ denote the set of users and items, respectively, and $F$ the set of all features of the items. In the case of binary ratings (as in our dataset), this method assigns a relevance weight of 1 to each feature in $F$ that applies to items with which the target user interacted, and 0 otherwise. The obvious limitation of this method is that it assigns only weights 0 or 1 to the features, without distinguishing their relevance for the user.
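As a minimal sketch of the binary weighting described above, the following Python function builds a Zhang-style profile. The data layout (a dict mapping item ids to feature sets), the toy catalogue, and all names are illustrative assumptions, not part of the original method description.

```python
def zhang_profile(rated_items, item_features, all_features):
    """Binary profile: 1 if a feature occurs in any interacted item, else 0.

    rated_items:   iterable of item ids the user interacted with (assumed layout)
    item_features: dict mapping item id -> set of features (assumed layout)
    all_features:  iterable of every feature in the catalogue
    """
    seen = set()
    for item in rated_items:
        seen |= item_features.get(item, set())
    return {f: 1 if f in seen else 0 for f in all_features}


# Hypothetical toy catalogue, used only for illustration.
item_features = {
    "m1": {"Action", "Robert Downey Jr."},
    "m2": {"Action", "Johnny Depp"},
    "m3": {"Drama"},
}
all_features = sorted({f for feats in item_features.values() for f in feats})
profile = zhang_profile(["m1", "m2"], item_features, all_features)
```

In this toy example, every feature of the two selected movies receives weight 1 and "Drama" receives weight 0, which shows why the method cannot rank features by relevance.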
Li method. Li et al. (Li and Kim, 2004), unlike Zhang et al., differentiate the relevance of features contained in an item by assigning scalar weights. Their method furthermore ignores items with low ratings by using a threshold value. In the case of binary ratings, the threshold rating is 0 and the relevance weight of each feature $f$ in the feature set $F$ for the target user $u$ becomes the fraction of the items interacted with in which $f$ occurs: $w_{u,f} = n_{u,f} / n_u$, where $n_{u,f}$ is the number of items rated by user $u$ containing feature $f$ and $n_u$ is the total number of items rated by user $u$.
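The Li weighting for binary ratings can be sketched as follows. This is an illustration under assumptions: each rated item is listed once, items are described by sets of features, and the function and variable names are hypothetical.

```python
from collections import Counter


def li_profile(rated_items, item_features):
    """Weight of a feature = share of the user's rated items that contain it."""
    rated_items = list(rated_items)
    if not rated_items:
        return {}
    counts = Counter()
    for item in rated_items:
        # Each item contributes at most once per feature (features are a set).
        counts.update(item_features.get(item, set()))
    n_items = len(rated_items)
    return {f: c / n_items for f, c in counts.items()}


# Hypothetical toy catalogue, used only for illustration.
item_features = {
    "m1": {"Action", "Robert Downey Jr."},
    "m2": {"Action", "Johnny Depp"},
    "m3": {"Drama"},
}
li = li_profile(["m1", "m2", "m3"], item_features)
```

Here "Action" occurs in two of the three rated movies and therefore gets weight 2/3, while every other feature gets 1/3.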
Symeonidis method. Symeonidis et al. (Symeonidis et al., 2007) adopt an approach similar to TF-IDF to compute feature relevance weights, but define them in the vector space of user profiles. The rationale for using TF-IDF is to increase the relevance of rare features, i.e., those contained in fewer user profiles. Symeonidis et al. also use a fixed rating threshold to consider only the most relevant items. In the case of binary ratings, the threshold rating is set to 0 and the relevance weight of each feature $f$ in the feature set $F$ for the target user $u$ is computed as $w_{u,f} = FF(u,f) \cdot IUF(f)$, where $FF(u,f)$ is the feature frequency, i.e., the number of times feature $f$ occurs in movies rated by $u$, and $IUF(f)$ is the inverse user frequency of feature $f$: $IUF(f) = \log\left(|U| / UF(f)\right)$, where $|U|$ is the total number of users and $UF(f)$ is the user frequency of $f$, i.e., the number of users whose rated movies contain feature $f$ at least once.
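A short sketch of this user-centric weighting is given below. It assumes the usual form of the inverse user frequency (logarithm of the number of users divided by the user frequency), the natural logarithm, and a hypothetical dict-based data layout; none of these details are fixed by the text above.

```python
import math
from collections import Counter


def symeonidis_profiles(user_items, item_features):
    """Return {user: {feature: FF(u, f) * IUF(f)}}.

    Assumes IUF(f) = log(|U| / UF(f)) with the natural logarithm.
    """
    # FF(u, f): number of the user's rated items that contain feature f.
    ff = {}
    for user, items in user_items.items():
        counts = Counter()
        for item in items:
            counts.update(item_features.get(item, set()))
        ff[user] = counts
    # UF(f): number of users whose rated items contain f at least once.
    uf = Counter()
    for counts in ff.values():
        uf.update(counts.keys())
    n_users = len(user_items)
    return {
        user: {f: c * math.log(n_users / uf[f]) for f, c in counts.items()}
        for user, counts in ff.items()
    }


# Hypothetical toy data, used only for illustration.
item_features = {
    "m1": {"Action", "Robert Downey Jr."},
    "m2": {"Action", "Johnny Depp"},
    "m3": {"Drama"},
}
user_items = {"u1": ["m1", "m2"], "u2": ["m2", "m3"]}
profiles = symeonidis_profiles(user_items, item_features)
```

Note that a feature present in every user's rated items (here "Action") gets weight 0 under this assumed formula, regardless of how often it occurs for a given user.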
TF-IDF method. After having reviewed the 3 state-of-the-art methods described above, we decided to investigate another variant of TF-IDF as a user profiling method. The Symeonidis method above is similar to TF-IDF, but it is user-centric because it considers the vector space of user profiles. Instead, our proposed TF-IDF method is item-centric, as it considers the vector space of items (movies). First, we compute the IDF of each feature $f$ as $IDF(f) = \log\left(|I| / IF(f)\right)$, where $|I|$ is the total number of items and $IF(f)$ denotes the number of items in $I$ in which feature $f$ occurs at least once. Then, for each user $u$, we compute the relevance weight of a feature $f$ as $w_{u,f} = TF(u,f) \cdot IDF(f)$, where $TF(u,f)$ is equivalent to $FF(u,f)$ of the Symeonidis method (i.e., the number of times feature $f$ occurs in items rated by user $u$). In contrast to the method by Symeonidis et al., $IDF(f)$ is computed in relation to all the existing items in which feature $f$ appears, not in relation to user profiles. As will be shown in Section 5.2, our TF-IDF method yields better results than that of Symeonidis et al.
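The item-centric variant can be sketched in the same way. The sketch assumes the standard IDF form (logarithm of the catalogue size divided by the number of items containing the feature) with the natural logarithm; the data layout and names are hypothetical.

```python
import math
from collections import Counter


def tfidf_profiles(user_items, item_features):
    """Return {user: {feature: TF(u, f) * IDF(f)}}, with IDF over the item catalogue.

    Assumes IDF(f) = log(|I| / number of items containing f), natural logarithm.
    """
    n_items = len(item_features)
    # Number of catalogue items in which each feature occurs at least once.
    item_freq = Counter()
    for feats in item_features.values():
        item_freq.update(feats)
    idf = {f: math.log(n_items / n) for f, n in item_freq.items()}

    profiles = {}
    for user, items in user_items.items():
        # TF(u, f): number of the user's rated items that contain feature f.
        tf = Counter()
        for item in items:
            tf.update(item_features.get(item, set()))
        profiles[user] = {f: c * idf[f] for f, c in tf.items()}
    return profiles


# Hypothetical toy data, used only for illustration.
item_features = {
    "m1": {"Action", "Robert Downey Jr."},
    "m2": {"Action", "Johnny Depp"},
    "m3": {"Drama"},
}
user_items = {"u1": ["m1", "m2"], "u2": ["m2", "m3"]}
tfidf = tfidf_profiles(user_items, item_features)
```

The only difference from the user-centric sketch is the denominator of the logarithm: rarity is measured over the item catalogue rather than over user profiles, so a feature shared by all users can still receive a non-zero weight.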
4. Data Acquisition
The dataset we use to evaluate user profiling methods has been collected through a web application we implemented, which can be used on a variety of stationary and mobile devices. It provides access to a large catalogue of more than 450K movies and all related content features. This breadth of choice is possible because we retrieve up-to-date information on the fly from TMDb (The Movie Database, www.themoviedb.org) via APIs. We designed the application to offer a completely free user experience rather than a survey-like one, so that users are not constrained in any way in their selections.
To acquire the needed data, we asked users to select a set of “favourites”, which included at least 5 movies, 2 genres, 3 actors, and 1 director. Users were, nevertheless, free to select more than these numbers of elements. We also asked users to provide some demographic information: age range, gender, and country of residence. The collection of data was divided into two phases. The first phase involved volunteer users, i.e., those invited to freely contribute (friends, family, acquaintances, and colleagues of the authors). The second phase involved users recruited through the crowdsourcing platform MTurk (Amazon Mechanical Turk, www.mturk.com), who were paid between 20 and 50 US cents for their contribution. To assess the participants’ reliability, we also asked them to complete a final consistency test, which required them to select again all (and only) the favourites they remembered having added, from a list of movies, genres, and actors that also contained random popular elements. A user’s reliability is then estimated by means of the precision score computed on the re-selection of correct favourites.
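The precision score of the consistency test can be illustrated with a few lines of Python. This is a sketch under assumptions: favourites are represented as plain sets of labels, and the 50% threshold is the one applied to crowdsourced users in the dataset characteristics; the names and toy values are hypothetical.

```python
def reselection_precision(reselected, true_favourites):
    """Precision = correctly re-selected favourites / all re-selected elements."""
    reselected = set(reselected)
    if not reselected:
        return 0.0
    return len(reselected & set(true_favourites)) / len(reselected)


# Hypothetical example: the user originally added three favourites and
# re-selects four elements in the test, two of which are correct.
true_favs = {"Inception", "Action", "Tom Hardy"}
picked = {"Inception", "Action", "Comedy", "Vin Diesel"}
precision = reselection_precision(picked, true_favs)
is_reliable = precision >= 0.5  # assumed threshold for crowdsourced users
```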
Finally, in order to explore a catalogue of existing features needed for user profiling evaluation, we retrieved The Movies Dataset (Banik, 2017), which contains the content of 45.3K movies scraped from TMDb. Then, we extended this dataset by scraping the content of missing movies that were added as favourites by users of our web application.
Dataset characteristics. We have collected the preferences of 194 users, 180 (93%) of whom have added the minimum number of required favourites. Among all users, 81 (42%) are volunteers and 113 (58%) are paid crowdsourced users. We consider users reliable if they are either volunteers who have completed the required favourites or crowdsourced users who achieved a precision of at least 50% in the consistency test (see above). There are 67 reliable volunteers (83% of all volunteers) and 88 reliable crowdsourced users (78% of all crowdsourced users), for a total of 155 reliable users (80% of all users).
Regarding users’ gender, 115 users (59%) are male, 66 (34%) are female, and 13 (7%) did not specify their gender. 53% of the users are between 24 and 30 years old. We received registrations from users in 10 different countries, mainly Italy (40%), India (31%), and the United States (19%).
We collected a total of 4,109 favourites (movies and content features) selected by participants, including 1,212 unique elements, i.e., favourites selected by at least one user. In the following experiments, we include only the favourites of reliable users, which amount to 3,341 (81%), including 1,737 favourite movies, 461 genres, 698 actors, 198 directors, 74 production companies, 92 production countries, 39 producers, 17 screenwriters, 21 release years, and 4 sound crew members. The dataset is available on Kaggle (https://www.kaggle.com/lucacostanzo/mints-dataset-for-recommender-systems).
5. Results and Discussion
5.1. Initial statistical analysis
An initial statistical analysis highlights the main differences between the set of all explicitly rated features and the set of all implicit features extracted from rated movies. In Tables 1 and 2, we compare the explicit and implicit sets of features in terms of the percentage of common attributes (features), focusing on the most frequently selected attributes for genre, actor, and director. These tables generally highlight a low overlap between the explicitly preferred features and the implicitly estimated ones (derived from favourite movies), in particular for actors and directors. The only exception is the genre attribute, which reveals a maximum overlap of 94.74% when considering all 19 genres. These results generally confirm the previous findings of Nasery et al. (2015) regarding existing gaps between explicitly selected features and implicitly estimated ones, here with a different dataset that contains more up-to-date movies and is not limited to the most popular movies, as was the case in (Nasery et al., 2015).
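Table 1 (genres):
|Genres considered||No. common genres||% common genres|
|All genres (19)||18||94.74%|
Table 2 (actors and directors; only the column headers are preserved):
|% of common actors||% of common directors|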
We further provide a finer-grained analysis of the gap between explicit and implicit preferences of users according to their gender. In Tables 3 and 4, we compare the 5 most frequently selected genres, actors, and directors, by male and female users, respectively. We notice a substantial difference between male and female users, with the exception of genre.
Interestingly, Stan Lee appears among the implicitly preferred actors even though he barely acted as a main character in any movie. The most probable reason is that, although he has not been explicitly selected as a favourite actor by study participants, he appeared in all Marvel movies (in small “cameo roles”), so he is included in the implicit profiles. Furthermore, it is surprising that the genre “action” is highly ranked by female users. This could be due to the fact that the genre tastes of young women might be changing nowadays, especially because many popular action movies, like the Marvel ones, are liked by many people (especially those under 30, i.e., the largest age group in our dataset), irrespective of gender. Nonetheless, the other differences between male and female users suggest embedding gender information in a recommender system.
|Feature||Pos.||Explicit selection||No.||Implicit selection||No.|
|Actors||1||Robert Downey Jr.||16||Samuel L. Jackson||64|
|Actors||2||Johnny Depp||15||Stan Lee||56|
|Actors||3||Jason Statham||10||Bradley Cooper||51|
|Actors||4||Leonardo DiCaprio||10||Paul Bettany||47|
|Actors||5||Tom Hardy||8||Vin Diesel||47|
|Directors||1||Quentin Tarantino||11||Hajar Mainl||42|
|Directors||2||Steven Spielberg||9||Chris Castaldi||41|
|Directors||3||Joe Russo||7||Mark Rossini||41|
|Directors||4||M. Night Shyamalan||6||Lori Grabowski||41|
|Directors||5||Christopher Nolan||6||Eli Sasich||41|
5.2. Evaluation of user profiling methods
We study the user profiling step in depth by investigating the 4 user profiling methods described in Section 3. Our aim is to analyze the similarity (i.e., the overlap) between the implicitly modelled user profiles and the real explicit tastes of users. For each target user $u$, we built his or her explicit profile as a vector composed of relevance weights equal to 1 for all the features explicitly rated by $u$, and 0 for the ones not rated. Then we computed the pairwise similarity between the explicit user profiles and the implicit profiles produced by each method, using cosine similarity and Jaccard similarity. The higher this similarity, the more accurate the modelled implicit user profile.
The average pairwise similarity between the implicit user profile and the explicit one is shown in Table 5. As revealed in the table and already anticipated in Section 3, the TF-IDF method yields better results than the Symeonidis method even though the two are intrinsically similar; hence, the item-centric TF-IDF approach outperforms the user profile-based one. In general, the average pairwise similarities are remarkably low, even for the best investigated method, i.e., Li. The overlap between explicit and implicit profiles increases if we consider only genres; the reason is that the catalogue of all possible genres in the dataset is rather limited (19) compared to actors (567K) and directors (58K). The Jaccard measure yields lower similarities because it can be applied only to vectors composed of binary attributes, while our tested profiling methods compute scalar weights (except for Zhang). Hence, we had to cut off some feature weights by considering only the k most relevant features in the implicit profile of each user, where k is the number of explicit features rated by that user.
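The two similarity measures and the top-k cut-off described above can be sketched as follows. This is an illustration under assumptions: profiles are stored as sparse dicts from feature to weight, ties in the top-k selection are broken by the sort order, and the example weights are invented for the demonstration.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse profiles (dict: feature -> weight)."""
    dot = sum(w * q.get(f, 0.0) for f, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def top_k_features(profile, k):
    """Binarize a weighted profile by keeping its k highest-weighted features."""
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    return {f for f, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical profiles: the explicit one is binary, the implicit one weighted.
explicit = {"Action": 1.0, "Tom Hardy": 1.0}
implicit = {"Action": 0.8, "Drama": 0.3, "Vin Diesel": 0.5}

cos = cosine(explicit, implicit)
k = len(explicit)  # k = number of explicitly rated features
jac = jaccard(explicit.keys(), top_k_features(implicit, k))
```

In this toy case the implicit profile is cut to its two strongest features ("Action" and "Vin Diesel") before the Jaccard similarity is computed, so only one of the three distinct features is shared.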
The presented results underline the low effectiveness of the investigated user profiling methods in modelling real user tastes. This finding calls for further research on this important user profiling step when devising recommender systems. If user profiles are not properly modelled before applying any RS technique, the accuracy of the final recommendations is likely to be lowered by an inaccurate representation of the user’s tastes.
6. Conclusion and Future Work
In this paper, we analyzed the user profile modelling step by studying the differences between explicit user preferences and implicit user profiles. We evaluated different user profiling methods and showed that even the best profiling method that we tested provided low pairwise similarities between explicit and implicit profiles. This finding can be explained by the fact that when a user rates a movie, he or she is implicitly rating only some of the characteristics of the item, namely those that had an impact on him or her. Also, a user may select a movie while having loved only some parts of it (e.g., a very good director but bad actors), and this can introduce noise into the learning process. Overall, our study encourages a more in-depth investigation of ways to obtain reliable feedback on features and of how to optimize the user profile modelling step in RS, which will eventually allow more accurate recommendations to be produced. Furthermore, we publicly provide the dataset that we collected and used for evaluation, which includes ratings on movies and on the corresponding content features.
In the future, we plan to investigate whether the findings of this work generalize to other domains in which a wide variety of item content features exists and personalization on these features is paramount, including but not limited to fashion (He and McAuley, 2016), music (Schedl et al., 2018), and tourism (Adamczak et al., 2019; Knees et al., 2019).
References
- Adamczak et al. (2019). Session-based hotel recommendations: challenges and future directions. arXiv preprint arXiv:1908.00071.
- Anwaar et al. (2018). HRS-CE: A hybrid framework to integrate content embeddings in recommender systems for cold start items. J. Comput. Science 29, pp. 9–18.
- Banik (2017). The Movies Dataset. Dataset on Kaggle.
- Y. Deldjoo, M. G. Constantin, H. Eghbal-Zadeh, B. Ionescu, M. Schedl, and P. Cremonesi (2018a). Audio-visual encoding of multimedia content for enhancing movie recommendations. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018, Pera et al. (Eds.), pp. 455–459. ACM.
- Deldjoo et al. (2018b). MMTF-14K: a multifaceted movie trailer feature dataset for recommendation and retrieval. In Proceedings of the 9th ACM Multimedia Systems Conference, pp. 450–455.
- Y. Deldjoo, P. Cremonesi, M. Schedl, and M. Quadrana (2017). The effect of different video summarization models on the quality of video recommendation based on low-level visual features. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, CBMI 2017, Florence, Italy, June 19-21, 2017, pp. 20:1–20:6. ACM.
- Deldjoo et al. (2019). Movie genome: alleviating new item cold start in movie recommendation. User Model. User-Adapt. Interact. 29 (2), pp. 291–343.
- Deldjoo and Schedl (2019). Retrieving relevant and diverse movie clips using the MFVCD-7K multifaceted video clip dataset. In Proceedings of the 17th Int. Workshop on Content-Based Multimedia Indexing.
- Deng et al. (2018). Leveraging image visual features in content-based recommender system. Scientific Programming 2018, pp. 5497070:1–5497070:8.
- R. J. R. Filho, J. Wehrmann, and R. C. Barros (2017). Leveraging deep visual features for content-based movie recommender systems. In 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14-19, 2017, pp. 604–611. IEEE.
- Harper and Konstan (2016). The MovieLens datasets: history and context. TiiS 5 (4), pp. 19:1–19:19.
- He and McAuley (2016). Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, pp. 507–517.
- Hwang et al. (2016). An algorithm for movie classification and recommendation using genre correlation. Multimedia Tools Appl. 75 (20), pp. 12843–12858.
- Kacem (2017). Personalized information retrieval based on time-sensitive user profile (Recherche d’information personalisée basée sur un profil utilisateur sensible au temps). Ph.D. Thesis, Paul Sabatier University, Toulouse, France.
- Knees et al. (2019). RecSys challenge 2019: session-based hotel recommendations. In Proceedings of the Thirteenth ACM Conference on Recommender Systems, RecSys ’19, New York, NY, USA.
- Leka (2016). IMDB Movies Dataset. Dataset on Kaggle.
- Q. Li and B. M. Kim (2004). Constructing user profiles for collaborative recommender system. In Advanced Web Technologies and Applications, 6th Asia-Pacific Web Conference, APWeb 2004, Hangzhou, China, April 14-17, 2004, Proceedings, Yu et al. (Eds.), Lecture Notes in Computer Science, Vol. 3007, pp. 100–110. Springer.
- H. Liu, S. Feng, and G. Yu (2017). An interest propagation based movie recommendation method for social tagging system. In 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017, Ningbo, China, July 9-12, 2017, pp. 130–135. IEEE.
- P. Lops, M. de Gemmis, and G. Semeraro (2011). Content-based recommender systems: state of the art and trends. In Recommender Systems Handbook, Ricci et al. (Eds.), pp. 73–105. Springer.
- J. Mueller (2017). Combining aspects of genetic algorithms with weighted recommender hybridization. In Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services, iiWAS 2017, Salzburg, Austria, December 4-6, 2017, Indrawan-Santiago et al. (Eds.), pp. 13–22. ACM.
- Nasery et al. (2015). PoliMovie: a feature-based dataset for recommender systems.
- Netflix (2009). Netflix Prize Data. Dataset on Kaggle.
- D. Sánchez-Moreno, M. N. M. García, N. Sonboli, B. Mobasher, and R. Burke (2018). Inferring user expertise from social tagging in music recommender systems for streaming services. In Hybrid Artificial Intelligent Systems - 13th International Conference, HAIS 2018, Oviedo, Spain, June 20-22, 2018, Proceedings, de Cos Juez et al. (Eds.), Lecture Notes in Computer Science, Vol. 10870, pp. 39–49. Springer.
- Schedl et al. (2018). Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval 7 (2), pp. 95–116.
- Shi et al. (2014). Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges. ACM Computing Surveys (CSUR) 47 (1), pp. 3.
- M. Soares and P. Viana (2017). The semantics of movie metadata: enhancing user profiling for hybrid recommendation. In Recent Advances in Information Systems and Technologies - Volume 1 [WorldCIST’17, Porto Santo Island, Madeira, Portugal, April 11-13, 2017], Rocha et al. (Eds.), Advances in Intelligent Systems and Computing, Vol. 569, pp. 328–338. Springer.
- P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos (2007). Feature-weighted user model for recommender systems. In User Modeling 2007, 11th International Conference, UM 2007, Corfu, Greece, June 25-29, 2007, Proceedings, Conati et al. (Eds.), Lecture Notes in Computer Science, Vol. 4511, pp. 97–106. Springer.
- Vashisth et al. (2017). A fuzzy hybrid recommender system. Journal of Intelligent and Fuzzy Systems 32 (6), pp. 3945–3960.
- Wang et al. (2018). A content-based recommender system for computer science publications. Knowl.-Based Syst. 157, pp. 1–9.
- Wei et al. (2017). Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 69, pp. 29–39.
- Wei et al. (2016). A hybrid approach for movie recommendation via tags and ratings. Electronic Commerce Research and Applications 18, pp. 83–94.
- C. Zhang, K. Wang, E. Lim, Q. Xu, J. Sun, and H. Yu (2015). Are features equally representative? A feature-centric recommendation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, Bonet and Koenig (Eds.), pp. 389–395. AAAI Press.
- Zhang et al. (2019). Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. 52 (1), pp. 5:1–5:38.
- Zhao et al. (2017). Social-aware movie recommendation via multimodal network learning. IEEE Transactions on Multimedia.