Graph Based Recommendations: From Data Representation to Feature Extraction and Application

07/05/2017 ∙ by Amit Tiroshi, et al. ∙ 0

Modeling users for the purpose of identifying their preferences and then personalizing services on the basis of these models is a complex task, primarily due to the need to take into consideration various explicit and implicit signals, missing or uncertain information, contextual aspects, and more. In this study, a novel generic approach for uncovering latent preference patterns from user data is proposed and evaluated. The approach relies on representing the data using graphs, and then systematically extracting graph-based features and using them to enrich the original user models. The extracted features encapsulate complex relationships between users, items, and metadata. The enhanced user models can then serve as an input to any recommendation algorithm. The proposed approach is domain-independent (demonstrated on data from movies, music, and business recommender systems), and is evaluated using several state-of-the-art machine learning methods, on different recommendation tasks, and using different evaluation metrics. The results show a unanimous improvement in the recommendation accuracy across tasks and domains. In addition, the evaluation provides a deeper analysis regarding the performance of the approach in special scenarios, including high sparsity and variability of ratings.




1 Introduction

Recommender systems aim at helping users find relevant items among a large variety of possibilities, based on their preferences (Adomavicius and Tuzhilin, 2005). In many cases, these personal preferences are inferred from patterns that emerge from data about the users’ past interactions with the system and with other users, as well as additional personal characteristics available from different sources. These patterns are typically user-specific and are based on the metadata of both the users and items, as well as on the interpretation of the observed user interactions (Kobsa, 2001; Zukerman and Albrecht, 2001). Eliciting user preferences is a challenging task because of issues such as changes in user preferences, contextual dependencies, privacy constraints, and practical data collection difficulties (Ricci et al., 2011). Moreover, the collected data may be incomplete, outdated, imprecise, or even completely inapplicable to the recommendation task at hand. In order to address these issues, modern recommender systems attempt to capture as much data as possible and then apply data mining and other inference techniques to elicit the desired preferences (Cantador et al., 2015). Several techniques can be applied for the pattern-mining task, among them techniques that originated in machine learning and statistics, e.g., clustering and regression, and those that evolved in information retrieval and user modeling (Mobasher, 2007).

Regardless of the technique exploited by a recommender system, it is inherently bound by the available user data and the features extracted/elicited from it. One major question that arises in this context is how to engineer [1] meaningful features from often noisy user data. Features may be manually engineered by domain experts. This approach is considered expensive and non-scalable because of the deep domain knowledge that is necessary, the creativity required to conceive new features, and the time needed to populate and evaluate the contribution of the features. A notable example of this challenge is provided by the Netflix Prize winning team, in their recap: “while major breakthroughs in the competition were achieved by uncovering new features underlying the data, those became rare and very hard to get” (Koren, 2009).

[1] Feature engineering is sometimes also referred to in the literature as feature extraction, generation, and discovery, depending on the field of research. In this paper, it broadly refers to the task of adding new features to a dataset, regardless of the manner in which it is done (e.g., manual vs. automatic).

An alternative to manual feature engineering is automatic feature engineering, which is a major area of research in machine learning (Guyon et al., 2006), particularly in the domains of image recognition (Nixon, 2008; Due Trier et al., 1996) and text classification (Scott and Matwin, 1999). So far, automatic feature engineering has mainly focused on three directions: algebraic combinations of existing features, e.g., summation or averaging of existing features (Markovitch and Rosenstein, 2002); domain-specific feature generators, e.g., for character recognition in image processing (Nixon, 2008; Due Trier et al., 1996); and the elicitation of latent features, as in the SVD (Klema and Laub, 1980) and PCA (Wold et al., 1987) methods. The algebraic approaches for automatic feature engineering manage to produce large quantities of features; however, the relationships between the engineered features and the underlying patterns in the data are often not interpretable (Kotsiantis et al., 2006). For example, if averaging the ratings for items with the sum of some other arbitrary feature improves predictions, the reasons for this improvement will not necessarily be clear. Similarly, the latent feature discovery techniques do not provide sufficient insight regarding the representation or meaning of those features (Koren et al., 2009).

In this work, a novel framework is proposed that uses graph-based representation properties to generate additional features from user modeling data of recommender systems, with the objective of improving the accuracy of the generated recommendations. The proposed framework is underpinned by the idea of examining the tabular data of a recommender system from a graph-theoretic perspective, which represents entities and their relationships as a graph and allows the extraction of a suite of new features computed using established graph-based metrics. The extracted features encapsulate information about the relationships between entities in the graph and lead to new patterns uncovered in the data. In most cases, they are also interpretable; for example, a node’s degree (number of edges to other nodes) represents the importance of the node in the graph, while the path length between two nodes communicates their relatedness (the shorter the path, the more related the nodes). The approach is domain-independent and can be applied automatically.

The proposed framework offers several benefits for automatic feature extraction. Given a new dataset, it is usually impossible to determine a priori which graph representations will yield the most informative set of features for recommendation generation. Thus, the proposed framework provides a systematic method for generating and assessing various graph representations, their contribution to the newly extracted features, and, in turn, to the accuracy of the generated recommendations. Additionally, since the number of nodes and relationship types in each graph representation is different, an exhaustive method of distilling the possible graph metrics from each representation is proposed.

Two case studies are conducted to gather extensive empirical evidence and demonstrate how graph features supplement existing feature sets, improve the accuracy of the recommendations, and perform adequately as stand-alone out-of-the-box features. The case studies answer the following questions:

  • How does the use of graph features affect the performance of rating predictions and recommendation generation in different domains and tasks?

  • How are the recommendations affected by the sub-graph and its representation used to generate the graph features?

Multiple datasets, multiple machine learning mechanisms, and multiple evaluation metrics are used across the case studies, in order to demonstrate the effectiveness of the approach. Overall, the results show that graph-based representation and automatic feature extraction allow for the generation of more precise recommendations. A comparison across various graph schemes is conducted and the justification for systematic feature extraction is established. Hence, this work concludes the line of research presented earlier in (Tiroshi et al., 2013, 2014a, 2014b) and provides a complete picture that validates the applicability of the proposed graph-based feature generation approach to recommender systems.

The rest of the paper is structured as follows. Next, the necessary background is provided, and related work is described. Then, the graph representation and graph-based feature extraction process is formalized, and its advantages and disadvantages are discussed. Two case studies demonstrating the contribution of the graph-based features to the recommendation process are then presented. Through these, the overall performance of the framework, as well as the performance of certain graph representations and feature subsets, is evaluated. Finally, the implications of the findings are discussed, together with the suggested future work.

2 Background and Related Work

Graphs have been exploited in recommender systems for many tasks, mainly due to their ability to represent many entities of different types and their relationships in a simple data structure that offers a broad variety of metrics and reasoning techniques. In this section we provide a general background on the use of graphs in recommender systems, followed by specific aspects of graph representation in recommender systems and feature engineering.

2.1 Graph-Based Recommender Systems

In recent years, especially since social networks were identified as a major source of freely available personal information, graph and network data structures have been used as tools for user modeling, because they combine different entities and the links between them into one simple structure. This section aims at giving readers an idea of how graph techniques are used in graph-based user modeling and recommender systems. Given the vast number of studies (a search for “graph-based” and “recommender systems” in Google Scholar yielded 225 results for 2016 alone), this is only a brief presentation of recent studies and not an in-depth survey.

It is clearly noticeable that most of the graph-based representations were defined for a specific problem in a specific domain, and that in many cases variants of random walk were the only graph feature used for recommendations. (Pham et al., 2015) suggested using a simple graph representation for recommending groups to users, tags to groups, and events to users, using a general graph-based model called HeteRS, while treating the recommendation problem as a query-dependent node proximity problem. (Portilla et al., 2015) applied random walk for predicting YouTube video watching, on a graph composed of videos as nodes, with links representing the appearance of videos in the recommendation lists. (Wu et al., 2015) suggested the use of a heterogeneous graph for representing contextual aspects in addition to items and users, and used random walk for context-aware recommendation. (Lee et al., 2015) applied random walk for finding top-k paths from an origin user node to an item node in a heterogeneous graph, as a way of identifying the best items for recommendation. Still other works used the PageRank algorithm for the purpose of generating the recommendations. (Lee et al., 2013) used an enhanced version of the personalised PageRank algorithm to recommend items to target users and proposed reducing the size of the graph by clustering nodes and edges. (Shams and Haratizadeh, 2016) also applied personalised PageRank over the user/item graph, augmented with pairwise ranking, for item recommendation.

In addition to the wide use of random walk based algorithms, there is a variety of task-specific representations and metrics. It is interesting to note that even for a specific task, a variety of approaches was suggested. For instance, for song/playlist recommendations, (Benzi et al., 2016) combined graph-based similarity representation of playlists and songs with classical matrix factorization to improve the recommendations. (Ostuni et al., 2015) took a different approach and suggested using tags and sound descriptions represented as a knowledge graph, from which similarity of nodes was extracted using a specific metric they defined. (Mao et al., 2016) suggested using graph representation for music track recommendations, where graphs represented the relative preferences of users, e.g., pair-wise preference of tracks. They used the graph as a representation of user preferences for tracks and calculated the probability of a user liking a track based on the probability that s/he likes the in-linked tracks.

Some researchers suggested using graph representations as an alternative to the classical collaborative and hybrid recommenders. (Moradi et al., 2015) used clustering of a graph representation of users and items to generate a model for item- and user-based collaborative filtering. (Bae et al., 2015) used graphs for representing co-occurrence of mobile apps, as logged from users’ mobile devices, and the similarity of user graphs was used for finding a neighborhood and generating recommendations. (Cordobés et al., 2015) also addressed the app recommendation problem and explored the potential of graph representation for several variants of recommendation strategies for recommending apps to users through banners on webpages. (Park et al., 2015) proposed a graph representation for linking items based on their similarity, yielding a graph in which the weights on the edges represent item similarity. Users were linked to items they rated, such that items most similar to the items rated by the users could be recommended. (Lee and Lee, 2015) suggested an approach for graph-based representation of the user-item matrix, where links among items represent the positive user ratings, and used entropy to find the items to recommend to users, thus introducing serendipity into the recommendation process. (Hong and Jung, 2016) used affinity between users for creating a user graph, where users are nodes and edges represent affinity, for the purpose of group recommendations of movies (Said et al., 2011).

A highly relevant line of work focuses on enriching recommender system datasets with information extracted from a graph representation of the data, referred to as MetaPaths. A good recent example is the work of (Vahedian et al., 2016). The authors suggested enriching a classical recommender systems dataset (in their case, the DBLP authors/papers dataset) with so-called metapath data: links extracted from the citation network. They added this information to the existing set of features, then applied classical matrix factorization, and showed an improvement over the results obtained using only the original data. Our framework can be considered a generalized variant of (Vahedian et al., 2016), where a specific set of metrics was extracted from the graph representation of the data and matrix factorization was applied for recommendation generation purposes. The studies presented in this work used a variety of metrics, datasets, and recommendation methods.

Additional applications of graphs for recommendations include the domains of cultural heritage, tourism, social networks, and more. (Chianese and Piccialli, 2016) used graphs for representing context evolution in cultural heritage: nodes modeled states, and transitions between the nodes were based on observation of user behavior (Bohnert et al., 2008). (Shen et al., 2016) used graphs for representing tourist attractions and their similarity, where different graphs could represent content-based, collaborative, and social relationships. (Jiang et al., 2016) used graph techniques for trust prediction in social networks. (Godoy and Corbellini, 2016) reviewed the use of folksonomies, which can be naturally seen as user-item-tag graphs, in recommender systems. As we see, graph-based approaches in user modeling and recommender systems have become highly popular and there is a growing number of tools that enable analysis of large graphs. We refer the interested reader to (Batarfi et al., 2015) and (Zoidi et al., 2015) for recent and encompassing reviews of the area.

2.1.1 Similarity Measurement Using Graphs and their Application

Previous research on recommender systems that use graph representations focused on measuring the similarity of two entities in the data (user-to-item, user-to-user, or item-to-item), and tried to associate this with a score or rating (Amatriain et al., 2011). Graph-based similarity measurement is based on metrics extracted from a graph-based representation (Desrosiers and Karypis, 2011). Two key approaches for measuring similarity using graphs are path-based and random walk-based.

In path-based similarity, the distance between two graph nodes can be measured using the shortest path and/or the total number of paths between the two. The definition of the shortest path may include a combination of the number of edges transitively connecting the two nodes in question and the weights of these edges, if they exist, e.g., when a user is connected to an item and the user’s rating for the item serves as the edge label. Shortest paths can then be computed for a user node and an item node in question, in order to quantify the extent to which the user prefers the item. The “number of paths” approach works similarly, by calculating the number of paths between the two nodes as a proxy for their relatedness (the more paths, the more related they are). However, this approach is more computationally intensive.
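The two path-based measures can be illustrated with a minimal sketch in plain Python. The toy user-item graph, the node names, and the helper functions `build_adjacency`, `shortest_path_length`, and `count_simple_paths` are assumptions made for illustration; they are not taken from the surveyed studies, and edge weights are ignored for simplicity.

```python
from collections import deque

# Hypothetical undirected user-item graph: an edge means "user rated item".
EDGES = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"),
         ("u2", "i3"), ("u3", "i1"), ("u3", "i3")]

def build_adjacency(edges):
    """Turn an edge list into a dict mapping each node to its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_path_length(adj, source, target):
    """Breadth-first search; returns the number of edges or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def count_simple_paths(adj, source, target, max_len):
    """Count simple paths (no repeated nodes) with at most max_len edges."""
    def dfs(node, visited, remaining):
        if node == target:
            return 1
        if remaining == 0:
            return 0
        return sum(dfs(nxt, visited | {nxt}, remaining - 1)
                   for nxt in adj.get(node, ()) if nxt not in visited)
    return dfs(source, {source}, max_len)
```

The depth bound `max_len` reflects the remark above that counting paths is the more computationally intensive option: without a bound, the number of simple paths can grow exponentially with the size of the graph.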

Random walks can be used to compute similarity by estimating the probability of one node being reached from another node, given the available graph paths. The more probable it is that the target node can be reached from the source node, the higher is the relatedness of the two nodes. Random walks can be either unweighted (equal probability of edges) or weighted (edges having different probabilities based on their label, e.g., rating) (Desrosiers and Karypis, 2011).
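A weighted random walk of this kind can be sketched as follows. The sketch uses a random walk with restart computed by repeated propagation of probability mass; the function name `random_walk_with_restart`, the restart probability of 0.15, the iteration count, and the toy ratings are all illustrative assumptions rather than settings reported in the cited works.

```python
# Hypothetical weighted graph: weights[node][neighbour] is the edge weight
# (here, the rating a user gave an item, mirrored in both directions).
WEIGHTS = {
    "u1": {"i1": 5, "i2": 1},
    "u2": {"i1": 4, "i3": 5},
    "i1": {"u1": 5, "u2": 4},
    "i2": {"u1": 1},
    "i3": {"u2": 5},
}

def random_walk_with_restart(weights, source, restart=0.15, iterations=100):
    """Approximate the probability of the walker being at each node.

    At every step the walker returns to `source` with probability `restart`;
    otherwise it follows an outgoing edge chosen proportionally to its weight.
    """
    prob = {node: 0.0 for node in weights}
    prob[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in weights}
        for node, mass in prob.items():
            out = weights[node]
            total = sum(out.values())
            if total == 0:
                nxt[source] += mass  # dead end: jump back to the source
                continue
            nxt[source] += restart * mass
            for neighbour, weight in out.items():
                nxt[neighbour] += (1 - restart) * mass * weight / total
        prob = nxt
    return prob
```

Setting every weight to 1 gives the unweighted variant. The resulting probabilities can be read as relatedness scores between the source node and every other node.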

Examples of recommendation studies in which the approaches detailed above were applied can be found in (Li and Chen, 2009; Lösch et al., 2012; Konstas et al., 2009), as well as in Section 2.1. (Li and Chen, 2009) reduced the recommendation problem to a link prediction problem. That is, the problem of finding whether a user would like an item was cast as a problem of finding whether a link exists between the user and item in the graph. A similarity measure between user and item nodes was computed using random walks. Items were then ranked based on their similarity scores, such that top scoring items were recommended to users. Using classification accuracy metrics, this approach was shown to be superior to other non-graph based similarity ranking methods.

A similar walking distance metric was used in (Lösch et al., 2012), complemented by graph structure metrics such as the number of subtrees. These metrics were used for the purpose of link prediction and property value prediction in RDF semantic graphs, using a learning technique based on an SVM. Experimental results showed that the graph features varied in their performance based on the graph structure on which they operated, for example, full versus partial subtrees. It was also noted that the newly defined features were not dataset-specific, but could be applied to any RDF graph. The graph structures used in the context of RDF are less applicable to the approach proposed in this work, because recommendation dataset graphs do not follow the hierarchical model of RDF. In the presented approach, any feature value is connected to other feature values based on co-occurrence in the dataset, without the need for matching a predefined structure or scheme.

Finally, (Konstas et al., 2009) developed a graph-based approach for generating recommendations in social datasets. The work focused on optimizing a single graph algorithm (random walk with restarts) and its parameters, such as the walk restart. The reported results show an improvement in recommendations using the random walk approach, compared to the baseline collaborative filtering. In the presented work, random walks on a graph, although with static parameters, are represented by the PageRank score feature. The above studies are also extended in this work by generalizing the adoption of graph metrics beyond random walks and their use for similarity measurements, and the metrics are not bound to specific graph structures, such as RDF trees.

2.1.2 Representing Social Data and Trust Using Graphs

Other studies involving graph approaches in recommender systems primarily addressed the context of representing social, semantic, and trust data. In some studies, the graph representation was used only as the means to query the data, e.g., neighboring nodes and the weights of edges connecting to them (Ma et al., 2009), while others utilized both the graph representation and graph-based reasoning methods (Massa and Avesani, 2007; Quercia et al., 2014).

A survey of connection-centric approaches in recommender systems (Perugini et al., 2004) exemplifies how the data of an email network (Schwartz and Wood, 1993) and of co-occurrence in Web documents (Kautz et al., 1997) can be represented in graphs. The graph representation of the email interactions between users defines each user as a node, and edges connect users who corresponded via email. In the case of Web documents, people are again represented as nodes, and edges connect people who are mentioned in the same document. When these graphs are established, they can be used to answer recommendation-related queries. In the email graph, a query regarding the closeness of users can be answered using a similarity or distance metric, such as those mentioned in the previous section. In the Web co-occurrence graph, a query regarding people sharing interests can be answered by counting their common neighbors (assuming the co-occurrence in the type of Web documents collected is an indicator of shared interests).
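The co-occurrence graph and the common-neighbor query can be sketched in a few lines of Python. The documents, the people named in them, and the helper names `cooccurrence_graph` and `common_neighbors` are hypothetical and serve only to illustrate the idea.

```python
from itertools import combinations

# Hypothetical input: for each Web document, the people mentioned in it.
DOCUMENTS = [
    ["ann", "bob", "cat"],
    ["bob", "dan"],
    ["ann", "dan"],
    ["cat", "eve"],
]

def cooccurrence_graph(documents):
    """People are nodes; an edge links two people mentioned in the same document."""
    adj = {}
    for people in documents:
        unique = sorted(set(people))
        for person in unique:
            adj.setdefault(person, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def common_neighbors(adj, a, b):
    """Nodes adjacent to both a and b; the count is a simple shared-interest score."""
    return adj.get(a, set()) & adj.get(b, set())
```

An email graph can be built the same way by treating each message as a "document" listing its sender and recipients, which is again an assumption of this sketch rather than the construction used in the surveyed work.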

Other graph representation variants are hypergraphs (Berge and Minieka, 1973). They differ from graphs by allowing an edge, called a hyperedge, to connect multiple nodes. Hypergraphs have been proposed in the context of recommendation generation, for the purpose of representing complex associations, such as social tagging (Jäschke et al., 2007; Berkovsky et al., 2007; Bu et al., 2010; Tan et al., 2011), where a tag is attached to an item by a user. If the tag, user, and item are represented by nodes, at least two edges are required to represent the association between the three entities [2]. This association can be represented by a hyperedge connecting the three nodes. In these studies, similarity metrics, e.g., a modified hypergraph PageRank, are then composed based on this structure and used for the recommendation generation. Results presented in (Jäschke et al., 2007) show that the similarity metrics from hypergraphs led to better recommendations than variations that did not utilize the properties of the hypergraph representation.

[2] A single edge between the user and item can be labeled with the chosen tag, but then the reuse of tags by other users or for other items becomes less comprehensible.
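One simple way to hold such tagging data is to store each hyperedge as a tuple of the nodes it connects and to keep an incidence index from nodes to hyperedges. The sketch below assumes this storage scheme; the tagging log and the helper names `incidence_index`, `node_degree`, and `co_members` are hypothetical and do not reproduce the metrics of the cited studies.

```python
from collections import defaultdict

# Hypothetical tagging log: each hyperedge joins one user, one item and one tag.
HYPEREDGES = [
    ("user:ann", "item:matrix", "tag:scifi"),
    ("user:bob", "item:matrix", "tag:action"),
    ("user:ann", "item:alien", "tag:scifi"),
]

def incidence_index(hyperedges):
    """Map every node to the ids of the hyperedges it participates in."""
    index = defaultdict(set)
    for edge_id, nodes in enumerate(hyperedges):
        for node in nodes:
            index[node].add(edge_id)
    return index

def node_degree(index, node):
    """Number of hyperedges that contain the node."""
    return len(index.get(node, ()))

def co_members(hyperedges, index, node):
    """All nodes that share at least one hyperedge with the given node."""
    result = set()
    for edge_id in index.get(node, ()):
        result.update(hyperedges[edge_id])
    result.discard(node)
    return result
```

Because the tag is a node in its own right, its reuse across users and items is directly visible: here `tag:scifi` participates in two hyperedges.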

Prior works focusing on the means of incorporating trust between users for the sake of improving the recommendations were surveyed in (O’Donovan and Smyth, 2005). For example, (Ma et al., 2009) proposed a graph representation encapsulating trust between users. The representation modeled users as the graph nodes and the trust relationships between them were reflected by the weights on the edges. Data extracted from the graph, e.g., who trusts whom and to what extent, was used in the recommendation process, and it was shown to improve the generated recommendations. However, the graph was used only to represent the data and propagate the trust scores.

Another usage of graphs for recommendation purposes is in the case of geospatial recommendations. Quercia et al. used graphs to find the shortest path between geographical locations, while also maximizing the enjoyment of the path for the user (Quercia et al., 2014). Locations were represented as nodes and connected to each other based on geographical proximity. Nodes were also ranked based on how pleasant (beautiful, quiet, happy) the locations were. Finally, a route that optimizes the shortness and pleasantness was computed based on a graph method and recommended to the user. In this work, both graph-based representation and graph theory methods are used for recommendation generation.

2.2 Feature Engineering for Recommendations

As mentioned at the beginning of the section, another group of related works covers automatic feature engineering. According to Guyon et al., “feature extraction addresses the problem of finding the most compact and informative set of features, to improve the efficiency or data storage and processing” (Guyon et al., 2006). Basic features are a result of quantitative and qualitative measurements, while new features can be engineered by combining these or finding new means to generate additional measurements. In the big data era, the possibilities of engineering additional features, as well as their potential importance, have risen dramatically.

Feature engineering (also referred to in the literature as feature extraction, composition, or discovery) can be performed either manually or automatically. In the manual method, domain experts analyze the task for which the features are required, e.g., online movie recommendation versus customer churn prediction, and conceive features that may potentially inform the task. The engineering process involves aggregating and combining features already present in the data, in order to form new, more informative features. This approach, however, does not scale well because of the need for a human expert, the time it takes to compose features, and the sheer number of possibilities for the new features (Domingos, 2012). Conversely, automatic feature extraction, the process of algorithmically extracting new features from a dataset, does scale up well. Many features can be engineered in a short time using a variety of engineering methods. Coupling automatic feature engineering with automatic feature selection (Kohavi and John, 1997) (the process of separating between useful and not useful features) can lead to faster and more accurate recommendation models.

A basic approach for engineering new features from the existing ones is to combine them using arithmetic functions. In one study that evaluated this approach, arithmetic functions, such as min, max, average, and others, were used (Markovitch and Rosenstein, 2002). The study also presented a specific language for defining features, where the features were described by a set of inputs, their types, construction blocks, and the produced output. A framework for generating a feature space using the feature language as input was evaluated. The evaluation showed that the framework outperformed legacy feature generation algorithms in terms of accuracy. The main difference between the framework presented in (Markovitch and Rosenstein, 2002) and its predecessors was that the framework was generic and applicable to multiple tasks and machine learning approaches.
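The basic idea of combining existing features with arithmetic functions can be sketched as follows. This is only an illustration of the arithmetic-combination step, not the feature language of (Markovitch and Rosenstein, 2002); the column names, the `AGGREGATES` table, and the function `engineer_features` are assumptions.

```python
from statistics import mean

# Arithmetic functions applied to groups of existing numeric features.
AGGREGATES = {"min": min, "max": max, "avg": mean}

def engineer_features(row, columns):
    """Return new features built by aggregating the chosen columns of one row."""
    values = [row[column] for column in columns]
    label = ",".join(columns)
    return {f"{name}({label})": fn(values) for name, fn in AGGREGATES.items()}
```

For a row such as `{"rating_count": 4, "avg_rating": 3.5}`, the call `engineer_features(row, ["rating_count", "avg_rating"])` yields three new features named after the function and its inputs. As noted in the introduction, features produced this way are cheap to generate in bulk but are not necessarily interpretable.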

Additional automatic feature engineering methods that are domain-specific were surveyed in (Nixon, 2008; Due Trier et al., 1996) for image recognition and in (Scott and Matwin, 1999) for text classification purposes. An example of a feature engineering method for image recognition is quantifying the amount of skin color pixels in an image in order to classify whether it contains a human face or not (Garcia and Tziritas, 1999), whereas for text classification a bag-of-words (frequency of occurrence of each word in a document) can be generated for every document and used to describe it.
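A bag-of-words can be computed with the Python standard library alone. The tokenization rule below (lowercasing and keeping runs of letters, digits, and apostrophes) and the function name `bag_of_words` are simplifying assumptions of this sketch.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercased word occurs in the document."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))
```

Each document is then described by its word counts, which can be used directly as features or normalized by document length.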

A different suite of methods for eliciting new features, which is also applicable to recommender systems, is latent features computation. Methods such as SVD (Klema and Laub, 1980) and PCA (Wold et al., 1987) can be used to compute new features and support the generation of recommendations by decomposing the available data into components and matching composing factors, i.e., the latent features. When the data is decomposed and there exists a set of latent features that can recompose it with a certain error rate, missing features and ratings can be estimated (Amatriain et al., 2011). Although it has been shown that this approach successfully improves the accuracy of the recommendations (Bennett and Lanning, 2007), it is limited in the interpretability of the latent features found (Koren et al., 2009).
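As a hedged illustration of this decomposition, a rank-k truncated SVD of a user-item rating matrix can be written as below. The symbols R (an m-by-n rating matrix), U_k, Σ_k, V_k, and the number of latent features k are notation assumed here and are not taken from the cited works. The sketch also assumes that the missing entries of R have been filled in beforehand (e.g., by imputation), since the plain SVD is defined only for a fully specified matrix.

```latex
R \approx \hat{R} = U_k \Sigma_k V_k^{\top},
\qquad
\hat{r}_{ui} = \sum_{f=1}^{k} \sigma_f \, (U_k)_{uf} \, (V_k)_{if},
\qquad
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V_k \in \mathbb{R}^{n \times k}
```

Row u of U_k and row i of V_k hold the latent features of user u and item i, and the reconstructed entry \hat{r}_{ui} serves as the estimate for a rating that was not observed.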

The current work defines an automatic and recommendation-task-agnostic feature engineering process, which is based on a graph-based representation of recommender system data. The details of this process are provided in the following section.

3 Graph Based Data Modeling for Recommendation Systems

In this section, an approach for enhancing recommendations based on representing the data as a graph is presented. This representation allows a set of graph algorithms to be applied and a set of graph-related metrics, which offer a new perspective on the data and allow the extraction of new features, to be deduced. Following a brief overview of the approach, the structure of recommender system datasets is formalized (Section 3.1). Then a detailed description of porting data from a classical tabular representation to a graph-based representation is given (Section 3.2). An elaboration of methods for generating multiple graph representations follows (Section 3.2.2) and finally the process of exhaustively distilling graph features from these representations is outlined (Section 3.3). [3]

[3] An open source package implementing the approach is released at

Figure 1: Graph modeling and feature extraction flow chart

The input to the process (illustrated in Figure 1) is a tabular recommender system dataset and the output is a set of graph-based features capturing the relationships between the dataset entities from the graph perspective. The first step deals with the generation of a complete graph representation of the data: the tabular data is converted into a representation where the dataset entities are nodes, connected based on their co-occurrence in the data. Next, a set of partial representations is derived from the complete graph: first the basic representation containing only user and item nodes, and then additional alternative representations, each with a unique combination of relationships filtered from the complete graph. The partial representations are passed to the next step, where the extraction of the graph features is performed. Finally, the newly generated graph-based features are used to supplement the original features available in the dataset and this extended data is fed into the recommender system for the generation of predictions or recommendations. [4] In the following sub-sections the above steps of the feature extraction process are elaborated.

[4] Note that although the process of selecting features that are more predictive for the task at hand (i.e., feature selection) is outside the scope of the proposed approach, it is addressed indirectly by the features extracted from the partial graph representations.
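The first steps of this flow can be sketched in plain Python under some simplifying assumptions: every (column, value) pair in the table becomes a node, every two values that co-occur in a row are linked, and node degree stands in for the extracted graph features. The toy rows and the function names `build_complete_representation`, `derive_partial_representation`, and `degree` are hypothetical and do not describe the released package.

```python
from itertools import combinations

# Hypothetical tabular dataset: one row per observed user-item interaction.
ROWS = [
    {"user": "u1", "item": "i1", "genre": "drama"},
    {"user": "u1", "item": "i2", "genre": "comedy"},
    {"user": "u2", "item": "i1", "genre": "drama"},
]

def build_complete_representation(rows):
    """Link every pair of (column, value) entities that co-occur in a row."""
    edges = set()
    for row in rows:
        nodes = sorted(row.items())  # canonical order, so duplicate edges collapse
        edges.update(combinations(nodes, 2))
    return edges

def derive_partial_representation(edges, entity_types):
    """Keep only the edges whose two endpoints are of the requested entity types."""
    return {(a, b) for a, b in edges
            if a[0] in entity_types and b[0] in entity_types}

def degree(edges, node):
    """Number of edges touching the node: one simple graph-based feature."""
    return sum(node in edge for edge in edges)
```

The basic representation corresponds to `derive_partial_representation(edges, {"user", "item"})`; other combinations of entity types give the alternative partial representations, and a feature such as `degree` can be computed on each of them and appended to the original rows.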

3.1 The Structure of a Recommender System Dataset

In (Burke, 2007; Ricci et al., 2011), classical recommendation approaches are categorized into several key groups: collaborative filtering, content based filtering, demographic, knowledge-based, community-based, and hybrid approaches. We first consider the representation of the input data used by these approaches, which can be converted into a tabular form as follows:

  • In collaborative filtering, the data is represented as a matrix of user feedback on items (matrix dimensions are users × items), where both the users and the items are denoted by their unique identifiers and the content of the matrix reflects the feedback of the users for the items, e.g., numeric ratings or binary consumption logs.

  • In content based filtering, the items are modeled using a set of features, e.g., terms or domain features. Here, the matrix dimensions include the identifiers of the users, as well as the identifiers of the content features, and the values represent the preferences of the users for the features. The model also contains a second matrix with item identifiers and the same content features. The values in this matrix represent the weights of the features in each item.

  • In demographic recommenders, the demographic features of the users are exploited in order to assign them to a group with a known set of preferences. Hence, in essence, it is analogous to the representation of content-based recommender systems, where a user’s demographic features are used instead of individual user’s characteristics and preferences.

  • Two variants of knowledge-based recommenders – case-based and constraint-based – break the items into weighted features, e.g., the price of a product and the importance of the price for the user. This model can be represented by two matrices, one contains the items’ weighted features and the second contains the users’ ranking of the features’ importance. In the items matrix, each column represents a feature, each row represents an item, and the values are the strength, or how representative the feature is of the item. Similarly, in the users matrix, each column represents a feature, each row represents a user, and the values represent the importance of the feature for the given user.

  • Community-based recommenders combine information regarding users’ social/trust relations with their ratings. Therefore, ratings of a trusted or socially close user are weighted more heavily than those of a less trusted one. The items’ rating information can be represented in a matrix identical to the one described in the collaborative filtering approach. The trust or social relation weights between users can be represented by a second matrix, where the rows and columns represent users and the values quantify the degree of the relationship between them. The values of the matrix diagonal are 1, since users fully trust themselves, while the rest of the matrix can be either symmetric or directional.

  • Finally, hybrid approaches combine some of the above stand-alone recommendation models and, therefore, can be represented using the matrix representation.
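As a minimal illustration of the tabular forms listed above, the following sketch builds a users × items feedback matrix and a user-to-user trust matrix with a unit diagonal. The function names (`build_feedback_matrix`, `build_trust_matrix`) and the toy data are assumptions made for this example only; they are not taken from the accompanying library.

```python
def build_feedback_matrix(users, items, feedback):
    """Rows are users, columns are items; cells without feedback stay None."""
    row_of = {user: r for r, user in enumerate(users)}
    col_of = {item: c for c, item in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for user, item, value in feedback:
        matrix[row_of[user]][col_of[item]] = value
    return matrix


def build_trust_matrix(users, trust):
    """Rows and columns are users; the diagonal is 1 (users fully trust themselves)."""
    index = {user: k for k, user in enumerate(users)}
    size = len(users)
    matrix = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for truster, trustee, weight in trust:
        # Directional by default; add the mirrored entry for a symmetric relation.
        matrix[index[truster]][index[trustee]] = weight
    return matrix


# Hypothetical toy data: two users, three movies, one directional trust link.
users, items = ["u1", "u2"], ["m1", "m2", "m3"]
ratings = build_feedback_matrix(users, items, [("u1", "m1", 4), ("u2", "m3", 5)])
trust = build_trust_matrix(users, [("u1", "u2", 0.7)])
```

Here the trust matrix is directional: `u1` trusts `u2` with weight 0.7, while the reverse entry remains 0.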

The datasets used by the above approaches, which we denote by , contain two key types of entities. The first refers to the entity for which the recommendations are generated, i.e., the user; it is referred to as the source entity and denoted by . The second refers to the entity that is being recommended, e.g., item, content, product, service, or even another user. This entity is referred to as the target entity and denoted by . This notation follows the primary goal of a recommender system: to recommend a target item to the source of the recommendation request. [Footnote 5: This definition will also be useful when moving to a graph representation, where metrics are defined relative to source and target vertices. An alternative definition of “target users” would have led to confusion and would have broken traditional definitions of metrics, e.g., shortest path measured from source to target and not vice versa.] Additional data available in the datasets typically represent the features of the source and/or the target entity, or the relationships between the two. The feature set is denoted by .

For example, in a movie recommendation dataset, refers to the system users and to the recommendable movies. Any available features describing either the users or the movies are denoted by . User features can be the user’s age, gender, and location, while movie features can be genre, director, language, and length. A practical assumption is made that in a tabular recommender dataset, all the features associated with an entity are stored in the same table as the entity itself. That is, the gender of a user is stored in the user table rather than in the movie table. A formal representation of the entities and their features in the above example is , where , , and the features are split into as follows: and .

It should be noted that the source and target entities can have common features (Berkovsky, 2006; Berkovsky et al., 2008). For example, in the case of a restaurant recommendation task, the source entity (user) and target entity (business) can both have the “location” feature. The role of the source/target entities and features can also change according to the recommendation task at hand. In the restaurant recommendation example, when the task is to recommend restaurants to users, the users are the source entity, the restaurants are the target entity, and location is a feature of both. However, if the task was to recommend a location, e.g., tourist destinations, for a user to visit based on the restaurants in that location, then the source entity would still be the users, the target entity would be the locations, and the restaurants would be the features of the locations.

An important aspect that needs to be considered is the relationship observed between the entities, e.g., the fact that a user watched, rated, tagged, or favored a movie. Relationships can be established not only between a source and a target entity, but also between two source/target entities. Examples of relationships between two user entities are the directional followee-follower relationship or the non-directional friendship. Relationships between two movies can be established because they are directed by the same director, are in the same language, and so forth. Relationships between entities are defined using the tuple . For example, the availability of user ratings for a movie is defined by and friendship between two users is defined by . [Footnote 6: Additional friendship features, such as duration or strength, can also be included.] The set of all possible relationships in a dataset is denoted by , such as in the movies example .

Given the above formalization of entities, features, and relationships, a recommendation task implies the prediction of a relationship between entities. For example, the task of a movie recommender can be considered as the prediction of the relationship. This relationship can be numeric (star rating) or binary (interested or not interested), but the recommendations delivered to the users are guided by the predicted values of . If, on the contrary, the system is a social recommender that recommends online friends, then the relationship in question is and its task is to recommend a set of candidate friends.

In addition to the original data that is available to the recommender, more features can be generated and distilled, thus, enriching the dataset. For example, two popular features frequently computed in rating-based recommendation datasets are the average rating of a user and the average rating for an item. These features are associated with the users and items, stored in the relevant tables, and they are used to refine, e.g., normalize, the predicted ratings and improve the quality of the recommendations (Schafer et al., 1999). The question addressed in this work is whether the availability of additional, supposedly more complex, features that encompass more information and stem from graph representation of the data can contribute to the accuracy of the predictions and the quality of the recommendations. In the following sub-sections, the details of extracting and populating features are provided.
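The two popular derived features mentioned above, the average rating of a user and the average rating for an item, can be sketched as follows. The function name and the `(user, item, rating)` row layout are assumptions chosen for illustration.

```python
from collections import defaultdict


def average_rating_features(ratings):
    """Compute per-user and per-item mean ratings from (user, item, rating) rows."""
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for user, item, rating in ratings:
        user_sum[user] += rating
        user_cnt[user] += 1
        item_sum[item] += rating
        item_cnt[item] += 1
    user_avg = {u: user_sum[u] / user_cnt[u] for u in user_sum}
    item_avg = {i: item_sum[i] / item_cnt[i] for i in item_sum}
    return user_avg, item_avg


# Hypothetical toy ratings.
ratings = [("u1", "m1", 4.0), ("u1", "m2", 2.0), ("u2", "m1", 5.0)]
user_avg, item_avg = average_rating_features(ratings)
```

These averages would be stored alongside the user and item records, in the same way that the graph-based features introduced later supplement the original features.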

3.2 Transforming a Tabular Representation into a Graph-based Representation

3.2.1 Basic graph representation for recommender systems data

When moving from the tabular to the graph-based representation of a recommender system dataset, there are multiple graph design considerations. Three key design questions are:

  1. Should the graph encompass all the available data? What parts of the dataset are important and need to be represented by the graph?

  2. Which entities from the selected data should be represented by graph vertices and which entities by graph edges?

  3. How should the edges be defined? Should they be directed or undirected? Should they be labeled? What should the labels be?

Regarding the first question, it is probable that the decision regarding the data to be represented in the graph is data-dependent. For some domains, datasets, and recommendation tasks, certain parts of the data may be more informative than others. Since the space of possible graph-based data representations is too large for determining a-priori the most suitable scheme, a possible alternative is to start with a graph model based on the entire data, and then, to systematically extract all sub-graph representations and their features. This leads to automatic coverage of the entire search space, inherently uncovering the representations that produce the most effective features. Then, the most informative feature set can be selected.

Figure 2: Examples of two types of graph schemes for representing a recommender system dataset: bipartite (a,b) and non-bipartite (c,d). In (b) the red block confines the multi-part bipartite graph component. In (c) and (d) the red edges break the bipartite structure.

To answer the second and third questions, an intuitive modeling approach is used. Namely, the graph model considers all the source, target, and feature entities as vertices, while their links and relationships between features (including user feedback on items) are the edges. If the information about the relationship is binary, e.g., the item is viewed or not, the edges are not labeled. Otherwise, the edges’ labels communicate the information about the relationship, e.g., rating or type of association. In most cases, the edges are not directed, as information about a feature connected to an entity or about an entity connected to a feature is equivalent. Although this work does not consider directed edges, the proposed approach can be extended to support this (outlined in Section 6.2).

Based on the above abstraction of recommender systems datasets, the following basic graph representation emerges. User and item entities are represented by the graph vertices, and edges connect a user and an item vertex when an association between the two is available. This association can be explicit (ratings or likes) or implicit (content or user view). This graph is called a bipartite graph (West et al., 2001), because it can be split into two partitions consisting of the source and target entity vertices, i.e., the users and items, respectively (Figure 2-A).

The basic representation can be extended by adding additional features as new graph vertices and linking them to the existing vertices. For example, if user locations are provided, each location can be represented by a vertex and the users associated with the locations are linked to their vertices. A similar situation may occur in the target partition of the graph, e.g., the target entity of movies and a variety of their content features: genre, actors, keywords, and more (Figure 2-B). Adding the feature vertices still preserves the bipartite nature of the graph, but the partition with the added features gets virtually split into two groups of vertices: the entities themselves and their features.

The situation changes, however, when adding information within the source or target partitions, e.g., user-to-user social links or item-to-item links of the domain taxonomy. This information introduces new links within the partitions, which break the bipartite structure (Figure 2-C). Additional information that may break the bipartite structure is the common features shared between the source and target partitions. For example, in the movie domain, the items may be linked to their genres, while the users may also express their preferences towards the genres. Thus, links to the genre vertices are established from both the user and item partitions (Figure 2-D) and the graph is no longer bipartite. Note that each of the four schemes shown in Figure 2 potentially generates different sets of features and the values of the features also vary.

Following is an outline of a high-level approach for generating the complete graph, which includes all the data and relationships of a recommender dataset. [Footnote 7: For readability purposes, the pseudocode in this paper omits several pre-processing steps and technical optimizations. The exact implementation details can be found in the accompanying library.] The algorithm scans all the tables in the dataset, and for each column that is not a source entity column, target entity column, or feedback column (e.g., ratings) it generates a graph node for every unique value appearing in the column. Thus, every unique and is assigned to a graph vertex, as well as every actor, director, movie genre, keyword, and so forth. Features that are non-categorical, e.g., movie budget, can be discretized using simple binning, e.g., under $10M, $10M-$20M, $20M-$30M, etc. When the range of values is unknown, the discretization can split the values based on their observed distribution, e.g., four equal-sized quarters, each containing 25% of the data. Upon discretizing the values in the columns and creating the nodes, all the nodes matching the values that appear in the same row are connected by edges to the source and target nodes of the same row, if they are available in the table. The result is a graph that contains all the values of the features as the graph nodes, which are connected to the source and target entities based on their co-occurrence in the data.
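The outline above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the accompanying library's implementation: the function names, the `(column, value)` node encoding, the edge-type labels, and the toy tables are all assumptions. Numeric columns are discretized into quartiles by rank, as one possible realization of the distribution-based binning.

```python
def quartile_bin(value, sorted_values):
    """Map a numeric value to one of four rank-based bins (Q1..Q4)."""
    rank = sum(1 for v in sorted_values if v <= value)
    quartile = min(3, (rank - 1) * 4 // len(sorted_values))
    return f"Q{quartile + 1}"


def build_complete_graph(tables, source_col, target_col, feedback_col, numeric_cols=()):
    """Return a set of (node_a, node_b, edge_type) triples; nodes are (column, value)."""
    edges = set()
    for rows in tables:
        # Sorted values of each numeric column in this table, used for binning.
        sorted_vals = {c: sorted(r[c] for r in rows if c in r) for c in numeric_cols}
        for row in rows:
            # Source/target nodes present in this row (one or both).
            anchors = [(c, row[c]) for c in (source_col, target_col) if c in row]
            if len(anchors) == 2:
                # The source-target edge represents the feedback relationship.
                edges.add((anchors[0], anchors[1], feedback_col))
            for col, value in row.items():
                if col in (source_col, target_col, feedback_col):
                    continue
                if col in numeric_cols:
                    value = quartile_bin(value, sorted_vals[col])
                # Connect the feature node to the source/target nodes of the same row.
                for anchor in anchors:
                    edges.add((anchor, (col, value), f"{anchor[0]}-{col}"))
    return edges


# Hypothetical toy tables: a movie table with features and a ratings table.
movies = [
    {"movie": "m1", "genre": "drama", "budget": 5},
    {"movie": "m2", "genre": "comedy", "budget": 25},
]
ratings = [{"user": "u1", "movie": "m1", "rating": 4}]
graph = build_complete_graph(
    [movies, ratings], "user", "movie", "rating", numeric_cols=("budget",)
)
```

In this sketch the feedback value itself is not stored on the edge; a labeled-edge variant would attach the rating to the source-target triple.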

3.2.2 Multiple sub-graph representations

Despite being included in a dataset, not all the features are necessarily informative and contribute to the accuracy of the recommendations. Certain features may be noisy or bear little information, thus, hindering the recommendation process. For example, if a feature is sparsely populated, its values are identical across users, or it is populated only across a certain subset of users, then this feature is unlikely to help the recommender and may not be included in the graph representation. However, it is hard to assess the contribution of the features in advance with a high degree of certainty. This leads to the idea of automatically deriving multiple sub-graph representations from the complete graph and extracting the graph features for each sub-graph first, and selecting the most informative ones in a later stage. Specifically, all the possible sub-graphs are exhaustively generated and their features are extracted. Each sub-graph represents a combination of features influenced by the entities and relationships included in the graph. The process is presented in detail in Algorithm 1.

input :  - complete graph representation of the dataset
- edge type of the relationship being predicted
output :  - set of features extracted from various sub-graph representations
1 GenerateEdgeCombinations({}, ) foreach   do
2        RemoveEdgesFromGraph(, ) ExtractGraphFeatures(, ) ( )
3 end foreach
Algorithm 1 Generate sub-graphs and extract features

The input to the algorithm is the complete graph representation , which was discussed at the end of Section 3.2.1, and the edge representing the relationship being predicted. The function GenerateEdgeCombinations invoked in line 1 returns all the possible combinations of different types of graph edges. Note that this function receives also the type of the predicted edges . This is done in order to preserve the edges in all the sub-graphs. Namely, this type of edges will not be included in the combinations that are removed from the complete graph and, therefore, will be present in all the sub-graphs.

Upon generating all the possible edge type combinations, the set is iterated over and the function RemoveEdgesFromGraph is invoked to create a sub-graph by removing the combination from (line 4). Then, the function ExtractGraphFeatures is invoked to extract from the set of possible graph features referred to as (line 5, to be elaborated in Section 3.3) and append to the set of features (line 6). Finally, in line 8 the algorithm returns – the set of all the possible graph features from all the possible sub-graphs.

Figure 3: Four sub-graph schemes that are generated from the complete schema based on the relationship permutations. Dashed lines represent links removed from the graph.

The execution of Algorithm 1 is illustrated by an example in Figure 3. Consider a graph , where is the set of vertices of the source entities , target entities , and domain feature values . In addition, is the set of graph edges, reflecting three relationship types: is the source-target relationship being predicted; is the relationship between the target entities and domain features; and is the relationship between the source vertices. In graph terminology, the recommendation task is to predict the label (or the existence) of an edge between a source vertex and a target vertex .

For this graph, the set created by GenerateEdgeCombinations includes . These are the combinations of edges that are removed from the graph while creating sub-graphs, whereas the predicted relationship is preserved in all the sub-graphs. Removing these combinations of edges, function RemoveEdgesFromGraph generates four variants of shown in Figure 3: , , , and . Note that is the complete graph, whereas other sub-graphs have either or , or both removed. For each , function ExtractGraphFeatures is invoked to extract the respective feature set and all the extracted feature sets are appended to .
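A minimal Python sketch of Algorithm 1 on a graph with three relationship types, mirroring the Figure 3 example, is given below. The edge-triple encoding, the edge-type names, and the placeholder feature extractor are assumptions for illustration; the empty combination is included so that the complete graph is one of the generated sub-graphs.

```python
from itertools import combinations


def generate_edge_combinations(edge_types, predicted_type):
    """All subsets (including the empty one) of the non-predicted edge types."""
    removable = sorted(t for t in edge_types if t != predicted_type)
    return [
        set(combo)
        for size in range(len(removable) + 1)
        for combo in combinations(removable, size)
    ]


def remove_edges_from_graph(edges, combination):
    """Keep only edges whose type is not in the removed combination."""
    return {edge for edge in edges if edge[2] not in combination}


def extract_all_features(edges, predicted_type, extract_graph_features):
    """Run the feature extractor on every sub-graph and collect the results."""
    edge_types = {edge[2] for edge in edges}
    features = []
    for combo in generate_edge_combinations(edge_types, predicted_type):
        sub_graph = remove_edges_from_graph(edges, combo)
        features.append((frozenset(combo), extract_graph_features(sub_graph, predicted_type)))
    return features


# Hypothetical toy graph: (node_a, node_b, edge_type) triples.
edges = {
    ("u1", "m1", "rating"),       # predicted source-target relationship
    ("m1", "drama", "has_genre"), # target-feature relationship
    ("u1", "u2", "friendship"),   # source-source relationship
}
# Placeholder extractor that just counts edges; a real one computes graph metrics.
results = extract_all_features(edges, "rating", lambda graph, predicted: len(graph))
removed_sets = [removed for removed, _ in results]
```

With two removable relationship types, four sub-graphs are produced, and the predicted `rating` edge is present in each of them.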

3.3 Distilling Graph Features

The function ExtractGraphFeatures in line 5 of Algorithm 1 receives a sub-graph derived from the complete representation and extracts a set of graph-based features. This function is invoked for all the possible sub-graphs, to ensure that all the possible graph features are extracted. The graph-based features are extracted using a number of functions, each calculating a different graph metric. These functions, referred to as generators, are divided into several families according to the number of graph vertices they process.

input :  - sub-graph derived from the complete graph representation
- edge type of the relationship being predicted
output :  - set of features extracted from
1 ExtractPredictedEdges(,) foreach (,) of  do
2        foreach 1-Function in 1-VertexGenerators do
3               1-Function() 1-Function() ( )
4        end foreach
5       foreach 2-Function in 2-VertexGenerators do
6               2-Function(,) ( )
7        end foreach
8        ExtractEntityCombinations({}) foreach   do
9               foreach N-Function in N-VertexGenerators do
10                      N-Function(, , ) ( )
11               end foreach
13        end foreach
14       return
15 end foreach
Algorithm 2 Extract graph features from a sub-graph
Figure 4: Key graph-based feature generator families and their instances

The main steps of ExtractGraphFeatures are detailed in Algorithm 2, which uses three types of generators:

  • 1-VertexGenerators are applied to a single graph vertex, either the source or the target entity, and compute features of this vertex only, e.g., the PageRank score (Figure 4-A).

  • 2-VertexGenerators are applied to a pair of vertices, the source and the target entities, and compute graph-based relationships between the two, e.g., the shortest path (Figure 4-B).

  • N-VertexGenerators are applied to vertices, two of which are the source and target entities and the rest are not. An example function from this family is “number of vertices of type X, which are common neighbors of the source and target vertices” (Figure 4-C).

Section 3.3.1 lists the functions from each generator family that were used. Note that these are executed iteratively, in order to generate all the possible graph features. This list of functions is by no means exhaustive; it only exemplifies a number of popular functions that were used, and many more functions can be conceived and added.

At the initial stage of Algorithm 2, edges belonging to the predicted relationship are copied to the set (line 2). For each in this set, the generators are invoked as follows. The 1-VertexGenerators functions are invoked in lines 5 and 6, respectively, on the and vertices of . Applying these functions to other vertices is unlikely to produce features that can contribute to the prediction of the desired relationship, while leading to significant computational overheads. Hence, 1-VertexGenerators are restricted to these two vertices only. The 2-VertexGenerators are applied in line 10 to the pairs of vertices and . Then, the ExtractEntityCombinations function is invoked in line 13, in order to create a set of all the possible entity combinations of vertices, . These combinations necessarily involve and , and in addition any other type of graph vertices. For each combination of size (line 15), the relevant N-VertexGenerators generators are invoked in line 17. Features extracted by 1-VertexGenerators, 2-VertexGenerators, and N-VertexGenerators are all appended to .

Note that the value of determines the N-VertexGenerators functions that are invoked and the relationships they uncover. Again, two of the vertices are necessarily and , whereas the third vertex can be of any other entity linked to either of them. For instance, for in the movie recommendation task and entities of user, item, and location, the relationship can be “the number of cinema locations that the user has visited and where the movie is screened”. The generator considers the user and movie vertices, and then, scans all the location vertices and identifies those, with edges connected to both. It should be noted that more complex relationships with a higher value of can be considered. Since a broad range of combinations is possible, the N-VertexGenerators extract a large number of features that surpasses by far the set of features that can be engineered manually.
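The control flow of Algorithm 2 can be sketched as follows, restricted to three-vertex combinations for brevity. The generator functions, the `(entity_type, id)` node encoding, and the cinema example data are hypothetical and serve only to show how the three generator families are applied to each predicted edge.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (node_a, node_b, edge_type) triples."""
    adjacency = defaultdict(set)
    for a, b, _edge_type in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree(adjacency, vertex):
    return len(adjacency[vertex])


def common_neighbors(adjacency, source, target):
    return len(adjacency[source] & adjacency[target])


def shared_neighbors_of_type(adjacency, source, target, entity_type):
    shared = adjacency[source] & adjacency[target]
    return sum(1 for node in shared if node[0] == entity_type)


def extract_graph_features(edges, predicted_type, one_vertex, two_vertex, n_vertex):
    """Apply every generator family to each edge of the predicted relationship."""
    adjacency = build_adjacency(edges)
    entity_types = {node[0] for node in adjacency}
    features = {}
    for source, target, edge_type in edges:
        if edge_type != predicted_type:
            continue
        row = {}
        for name, generator in one_vertex.items():
            row[f"{name}_source"] = generator(adjacency, source)
            row[f"{name}_target"] = generator(adjacency, target)
        for name, generator in two_vertex.items():
            row[name] = generator(adjacency, source, target)
        # Three-vertex case: source, target, and one other entity type.
        for other_type in sorted(entity_types - {source[0], target[0]}):
            for name, generator in n_vertex.items():
                row[f"{name}_{other_type}"] = generator(adjacency, source, target, other_type)
        features[(source, target)] = row
    return features


# Hypothetical toy graph with user, movie, cinema, and genre entities.
edges = [
    (("user", "u1"), ("movie", "m1"), "rating"),
    (("user", "u1"), ("cinema", "c1"), "visited"),
    (("movie", "m1"), ("cinema", "c1"), "screened_at"),
    (("movie", "m1"), ("genre", "drama"), "has_genre"),
]
features = extract_graph_features(
    edges,
    "rating",
    one_vertex={"degree": degree},
    two_vertex={"common_neighbors": common_neighbors},
    n_vertex={"shared": shared_neighbors_of_type},
)
row = features[(("user", "u1"), ("movie", "m1"))]
```

The `shared_cinema` entry in `row` corresponds to the example above: the number of cinema locations that the user has visited and where the movie is screened.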

3.3.1 Distilled graph features

The set of metrics selected for implementation in this work and used for the evaluation of the approach is now given in detail. The metrics are those that are commonly implemented in widely used graph analysis libraries – NetworkX (Hagberg et al., 2008), igraph (Csardi and Nepusz, 2006), and Gephi (Bastian et al., 2009) – and used in social network analysis and measurement works (Wilson et al., 2009; Lewis et al., 2008). It is important to stress that this set of metrics is only a portion of those that could be used and serves only as an example. The space of all graph metrics is large, as can be seen in (Costa et al., 2007; Wasserman, 1994; Coffman et al., 2004), and, thus, could not be exhaustively evaluated within the scope of this work.

The 1-VertexGenerators functions that were implemented and used for evaluation are degree centrality (Borgatti and Halgin, 2011), average neighbor degree (Barrat et al., 2004), PageRank score (Page et al., 1999), clustering coefficient (Latapy et al., 2008), and node redundancy (Latapy et al., 2008). These metrics are referred to as the basic graph features. Following is a brief description of the 1-VertexGenerators functions.

  • Degree Centrality (Borgatti and Halgin, 2011) (or, simply, node degree) quantifies the importance of a vertex through the number of other vertices to which it is connected. Hence, in the bipartite graph, the degree centrality of a user vertex is the activity of , i.e., the number of items with which is associated, and, vice versa, for an item vertex it is the popularity of , i.e., the number of users who are associated with . In a graph that includes metadata, the number of metadata vertices associated with either the user or the item vertex are added to the degree centrality score. The degree of centrality of a vertex is denoted by .

  • Average Neighbor Degree (Barrat et al., 2004) measures the average degree of vertices to which a vertex is connected. In the bipartite graph, this metric conveys for – the average popularity of items with which is associated, and for – the average activity of users who are associated with . Formally, if denotes the set of neighbors of a vertex , then the average neighbor degree is


    In a graph with metadata, the average neighbor degree of a user/item vertex also incorporates the popularity of the metadata features with which it is associated.

  • PageRank (Page et al., 1999) is a widely-used recursive metric that quantifies the importance of graph vertices. For a user vertex , the PageRank score is computed through PageRank scores of a set of item vertices {} with which is associated, and vice versa. Thus, the PageRank score of a user vertex can be expressed as


    i.e., the PageRank score of depends on the PageRanks of each item vertex connected to , divided by the degree of . In a graph with metadata, the PageRank scores of user/item vertices are also affected by the PageRank of the metadata vertices to which they are connected.

  • Clustering Coefficient (Latapy et al., 2008) measures the density of the immediate subgraph of a vertex as the ratio between the observed and possible number of cliques of which the vertex may be a part. Since cliques of a size greater than two are impossible in the bipartite graph, measures the density of shared neighbors with respect to the total number of neighbors of the vertex. The formal definition for the bipartite graphs is

  • Node Redundancy (Latapy et al., 2008) is applicable only to bipartite graphs and shows the fraction of pairs of neighbors of a vertex that is linked to the same other vertices. This metric quantifies for user vertex - the portion of pairs of items with which is associated that are also both associated with another user . Likewise, for item vertex , it quantifies the portion of pairs of users associated with and also both associated with another item . If a vertex, node redundancy of which is computed, is removed from the graph, the metric reflects the fraction of its neighbors that will still be connected to each other through other vertices. Intuitively, in a bipartite graph can be seen as the portion of connected ‘squares’ of which is a part, among all the potential ‘squares’.
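The five single-vertex metrics described above can be sketched in plain Python on a small bipartite graph. These are textbook-style formulations written for illustration and are assumptions rather than the paper's exact definitions: the degree is the raw neighbor count, PageRank uses a damping factor of 0.85 with a fixed number of power iterations, and the clustering coefficient averages the neighborhood overlap ratio over second-order neighbors, in the spirit of Latapy et al. (2008).

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree(adj, v):
    """Number of vertices to which v is connected."""
    return len(adj[v])


def average_neighbor_degree(adj, v):
    """Mean degree of the neighbors of v."""
    return sum(len(adj[u]) for u in adj[v]) / len(adj[v])


def pagerank(adj, damping=0.85, iterations=50):
    """Power iteration; each vertex passes rank / degree to every neighbor."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        rank = {
            v: (1 - damping) / n + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    return rank


def bipartite_clustering(adj, v):
    """Average neighborhood overlap between v and its second-order neighbors."""
    second_order = {w for u in adj[v] for w in adj[u]} - {v}
    if not second_order:
        return 0.0
    overlap = sum(len(adj[v] & adj[w]) / len(adj[v] | adj[w]) for w in second_order)
    return overlap / len(second_order)


def node_redundancy(adj, v):
    """Fraction of neighbor pairs of v that share a neighbor other than v."""
    pairs = list(combinations(adj[v], 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if (adj[a] & adj[b]) - {v})
    return linked / len(pairs)


# Hypothetical bipartite toy graph: users u1, u2 and items i1, i2, i3.
adj = build_adjacency(
    [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u2", "i3")]
)
rank = pagerank(adj)
```

In this toy graph, `u2` is associated with more items than `u1`, including one that no other user touched, so it receives the higher PageRank score.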

Next, multiple-vertex generator functions are detailed. Specifically, the following functions from the 2-VertexGenerators and N-VertexGenerators families were implemented:

  • Shortest Path (Floyd, 1962). Unlike the above feature generators that operate on a single vertex, shortest path receives a pair of graph vertices: a source entity and a target entity. It evaluates the distance, i.e., the lowest number of edges, between the two vertices. The distance communicates the proximity of the vertices in the graph, as a proxy for their similarity or relatedness. A short distance indicates high relatedness, e.g., more items shared between users or more features for items, while a longer distance indicates low relatedness.

  • Shared Neighbors of Type . This is one of the N-VertexGenerators functions, which receives three parameters: source entity vertex, target entity vertex, and entity type . It returns the fraction of neighbors shared between the source and target vertices that are of the desired type . The fraction is computed relative to the union of the source vertex neighbors with the target vertex neighbors. Note that this feature cannot be populated for graphs that do not have a sufficient variety of entities connected to the source and target vertices. For example, the generator is inapplicable for a graph having only the source and target entities.

  • Complex relationships across entities. Apart from the above mentioned generators, system designers may define other N-VertexGenerators functions, which could extract valuable features. For example, it may be beneficial for a movie recommender to extract the portion of users, who watched movies from genres , directed by person , and released between years and . It is clear that it is impossible to exhaustively list all the combinations of such features: this is domain- and application-dependent. Hence, the task of defining these complex generators is left open-ended and invites system designers to use the provided library and develop their own feature generators.
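The two multiple-vertex generators described above can be sketched as follows. The shortest path is computed with a breadth-first search on the unweighted graph, and the shared-neighbors fraction divides the shared neighbors of the requested type by the union of both neighborhoods. The `(entity_type, id)` node encoding and the toy graph are assumptions for illustration.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def shortest_path_length(adj, source, target):
    """Lowest number of edges between source and target, or None if disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def shared_neighbors_fraction(adj, source, target, entity_type):
    """Shared neighbors of the given type, relative to the union of neighborhoods."""
    union = adj[source] | adj[target]
    if not union:
        return 0.0
    shared = {n for n in adj[source] & adj[target] if n[0] == entity_type}
    return len(shared) / len(union)


# Hypothetical toy graph; nodes are (entity_type, id) pairs.
u1, u2 = ("user", "u1"), ("user", "u2")
m1, m2 = ("movie", "m1"), ("movie", "m2")
drama = ("genre", "drama")
adj = build_adjacency([(u1, m2), (u1, drama), (m1, drama), (m1, u2)])

distance = shortest_path_length(adj, u1, m1)
fraction = shared_neighbors_fraction(adj, u1, m1, "genre")
```

Here `u1` and `m1` are two edges apart (through the shared genre vertex), and one of the three vertices in the union of their neighborhoods is a shared genre.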

To recap, each of the above 1-VertexGenerators and 2-VertexGenerators is applied to every source and target vertex and generates features associated with the vertex or a pair of vertices. In addition, N-VertexGenerators is applied to the source and target vertices and all the possible combinations of other entity types. Recall that this is done for every sub-graph extracted from the complete graph, and the complexity of the feature generation task becomes clear. [Footnote 8: Some generators should be applied in a different manner to certain sub-graphs, e.g., and generators in bipartite and non-bipartite graphs.]

3.3.2 Quantifying the number of graph features

Here, the number of graph features that can be extracted from a recommender system dataset using the proposed approach is quantified. The quantification illustrates the coverage and computational complexity of the extraction process. Considering Algorithms 1 and 2, it becomes evident that the number of extracted features primarily depends on two key elements: the number of sub-graphs that are extracted from the complete graph and the number of entities in each sub-graph. Both of these are derived from the number of entities and relationships in the dataset.

Building on the recommender dataset analysis given in Section 3.1, each dataset contains two entities, and , and relationships. The features are extracted from the complete graph and also from the sub-graphs. The latter are generated from any non-ordered combination of relationships from , but necessarily contain the predicted relationship , such that the overall number of relationships is . Hence, the number of possible sub-graph combinations is


the sum of the numbers of combinations of size that can be produced, where .

Figure 5: Number of extracted graph features versus the number of relationships in the dataset

For each of these sub-graphs, let us assume that features can be generated by 1-VertexGenerators function for and individually, and features can be generated by 2-VertexGenerators for the pair ). On top of these, at least more complex by N-VertexGenerators functions can be applied to every sub-graph, as per the number of relationships available in the data. This brings the overall number of generated graph-based features to the order of


The library that accompanies this work defines single node generators and dual node generator. For illustrative purposes only, Figure 5 plots the number of extracted features, which is exponential with the number of relationships . For example, the number of features extracted from a dataset having relationships is smaller than 100. However, for a dataset with relationships, the number of extracted features exceeds 5,000. Clearly, engineering all these features manually would require considerable resources, whereas the proposed approach is fully automated.
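The exponential growth in the number of sub-graphs can be checked with a short sketch. It assumes, as in the Figure 3 example, that the predicted relationship is always kept and that the empty combination (the complete graph) is counted; under these assumptions the sum of the combination counts equals a power of two. The function name is hypothetical, and the sketch counts sub-graphs only, not the total number of features.

```python
from math import comb


def count_sub_graphs(num_relationships):
    """Number of sub-graphs when any subset of non-predicted relationships is removed."""
    removable = num_relationships - 1  # the predicted relationship is never removed
    return sum(comb(removable, size) for size in range(removable + 1))


# Three relationship types, as in the Figure 3 example, yield four sub-graphs.
figure_3_count = count_sub_graphs(3)
counts = {n: count_sub_graphs(n) for n in range(1, 8)}
```

Each additional relationship type doubles the number of sub-graphs from which features are extracted.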

4 Experimental Setting and Datasets

It is important to highlight that the product of the presented approach is graph-based features that help to generate recommendations using existing recommendation methods. These features can either be used as stand-alone features, i.e., the only source of information for the recommendation generation, or be combined with other features. Hence, the baseline for comparison in the evaluation part is the performance of common recommendation methods when applied without the newly generated features.

To present solid empirical evidence, the contribution of the graph feature extraction to the accuracy of the recommendations was evaluated using three machine learning methods: Random Forest (Breiman, 2001), Gradient Boosting (Friedman, 2000), and Support Vector Machine (SVM) (Gunn et al., 1998). Both Random Forest and Gradient Boosting are popular ensemble methods that have been shown to be accurate and have won recommendation (Koren, 2009) and general prediction (Yu et al., 2010) competitions. The methods are also implemented in widely used machine-learning libraries (Pedregosa et al., 2011; Hall et al., 2009), and were shown to perform well in prior recommender systems works (Jahrer et al., 2010; Bellogín et al., 2013; Töscher et al., 2009).

In the next section, two case studies showing the contribution of graph-based features are presented. These case studies demonstrate the value of the proposed graph-based approach when applied to a range of recommendation tasks and application domains. Case study I evaluates the contribution of the graph-based approach in different domains and tasks. Case study II focuses on the impact of representing data using different graph schemes on the recommendations. Altogether, five datasets were used across the case studies and the mapping between the datasets and case studies is laid out in Table 1. In the following sub-sections, a brief characterization of the datasets, as well as an overview of the recommendation tasks and evaluation metrics, is provided.

Columns: Case Study I – Overall contribution of graph-based features | Case Study II – Performance of different graph schemes
Rows: Dataset I – Last.fm | Dataset II – Yelp | Dataset III – Yelp II | Dataset IV – OSN | Dataset V – Movielens
Table 1: Mapping of datasets to case studies

4.1 Dataset I – Last.fm

The first dataset is of users’ relevance feedback provided for music performers via the Last.fm online service. The dataset is publicly available and was obtained by Cantador et al. (2011). The dataset consists of 1,892 users and 17,632 artists whom the users tagged and/or listened to. More than 95% of users in the dataset have 50 artists listed in their profiles as a result of the method used to collect the data. There are 11,946 unique tags in the dataset, which were assigned by users to artists 186,479 times. Each user assigned on average 98.56 tags, 18.93 of which are distinct. Each artist was assigned 14.89 tags on average, of which 8.76 are distinct. The dataset also contains social information regarding 12,717 bidirectional friendship links established between users, based on common music interests or real life friendship.

(a) Distribution of the number of friends per user
(b) Distribution of the average number of listens per user/artist and overall
Figure 8: Last.fm data characteristics

A brief characterisation of the dataset is shown in Figure 8. Figure 8(a) illustrates the distribution of the number of friends per user. The average number of user-to-user edges is low, which is illustrated by the vast majority of users having fewer than 10 friends and about half of the users having fewer than four friends. Intuitively, a friendship edge between two users can be an indicator of similar tastes, and as such, friendship-based features are expected to affect the recommendations. Figure 8(b) shows the distributions of the number of listens per artist, per user, and in total. It can be observed that the overall and per-artist distributions are highly similar. The user-based distribution exhibits similar behaviour, but drops faster. This aligns with the intuition that the number of users who listen to several hundreds of artists is smaller than the number of artists who are listened to by several hundreds of users (Haupt, 2009).

There are four relationships in the dataset: [user, listens, artist], [user, uses, tag], [tag, used, artist], and [user, friend, user]. The task defined for this dataset was to predict the artists to whom a user will listen the most, i.e., the predicted relationship was [user, listens, artist]. This task requires first predicting the number of times each user will listen to each artist, then ranking the artists, and choosing the top K artists. Based on the sub-graph generation process detailed in Algorithm 1 and the relationship being predicted, the data can be represented via eight graph schemes in general. Four graph schemes that incorporate the source and target entities were evaluated:

  • A bipartite graph that includes users and artists only, denoted as the baseline (BL)

  • A non-bipartite graph that includes users, social links, and artists (BL+F)

  • A non-bipartite graph that includes users, artists, and tags assigned by users to artists (BL+T)

  • A graph that includes all the entities and relationships: users, tags, artists and social links (BL+T+F).

Figure 9: Graph representations for dataset I (Last.fm)

The four graphs are illustrated in Figure 9. For each of the graphs, two sets of features were generated: basic features, as well as a set of extended features associated with the auxiliary data being included. The generated features are used as the input for a Gradient Boosting Decision Tree regressor (Friedman, 2000), trained to predict the number of listens for a given user-artist pair.

A 5-fold cross validation was performed. Users with fewer than five ratings were pruned, to ensure that every user has at least one rating in the test set and four in the training set. For each training fold, a graph was created for each graph model shown in Figure 9. For each user in the test set, a candidate set of artists was created by selecting artists from the set of artists listened to by the user and complementing these with randomly selected artists. For example, a candidate set of 100 artists included 10 artists listened to by the user and 90 random artists. Three different candidate set sizes were evaluated: 50, 100, and 150.
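
The candidate set construction described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the artist identifiers, and the fixed seed are hypothetical, and the sketch assumes that the random artists are drawn from artists the user has not listened to, which the text does not state explicitly.

```python
import random

def build_candidate_set(listened, all_artists, n_listened, size, rng):
    """Mix artists the user listened to with randomly drawn other artists."""
    positives = rng.sample(sorted(listened), n_listened)
    # Assumption: random artists come from artists the user did not listen to.
    pool = sorted(set(all_artists) - set(listened))
    negatives = rng.sample(pool, size - n_listened)
    return positives, negatives

rng = random.Random(0)  # fixed seed, used only to make the sketch repeatable
all_artists = [f"artist_{i}" for i in range(500)]  # hypothetical catalogue
listened = set(all_artists[:40])                    # hypothetical user profile
positives, negatives = build_candidate_set(listened, all_artists, 10, 100, rng)
candidates = positives + negatives
```

Here a candidate set of 100 artists holds 10 listened-to artists and 90 random ones, matching the example in the text; the sizes 50 and 150 would be obtained by changing the `size` argument.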

Then, a regressor was used to predict the number of listens for each artist in the candidate set, the set was ranked, and precision at 10 (P@10) was computed as the performance metric (Shani and Gunawardana, 2011). If the candidate set $C$ consists of the artists selected from a user’s artist set, denoted by $A_u$, and the randomly selected artists set $A_r$, then P@10 is computed by $P@K = |top\_K\_artists \cap A_u| / K$ with $K=10$, where top_K_artists is the list of top-K artists in $C$ ranked according to the predicted number of listens. Finally, an average of the P@10 scores across all the users in the test set is computed. In order to evaluate the significance of the performance differences between the feature sets of the various graph schemes, a two-sided t-test was applied on the results.
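
As an illustration of the metric, the sketch below ranks a candidate set by predicted listens and computes the fraction of the top K that the user actually listened to. It assumes the standard definition of precision at K; the function name and the toy values are hypothetical.

```python
def precision_at_k(predicted_listens, listened, k=10):
    """Fraction of the k highest-ranked candidates that the user listened to."""
    # Rank candidate artists by the predicted number of listens, highest first.
    ranked = sorted(predicted_listens, key=predicted_listens.get, reverse=True)
    top_k = ranked[:k]
    hits = sum(1 for artist in top_k if artist in listened)
    return hits / k

# Toy candidate set: predicted listens per artist (hypothetical values).
predicted = {"a": 5.0, "b": 3.0, "c": 9.0, "d": 1.0}
listened = {"a", "c"}
p_at_2 = precision_at_k(predicted, listened, k=2)
p_at_4 = precision_at_k(predicted, listened, k=4)
```

The per-user scores would then be averaged over all test users, as described in the text.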

4.2 Dataset II – Yelp (from RecSys-2013)

The second dataset is of users’ relevance feedback given for businesses, such as restaurants, shops, and services. The dataset was released by Yelp for the RecSys-2013 Challenge (Blomo et al., 2013), and is publicly available.

For the analysis, users with fewer than five reviews were filtered out, which resulted in 9,464 users providing 171,003 reviews and the corresponding ratings for 11,197 businesses. The average number of reviews per user is 18.07 and the average number of reviews per business is 15.27. A key observation regarding this dataset is the distribution of ratings, which were predominantly positive (more than 60% of ratings were at least 4 stars on a 5-star scale), and the low variance of ratings across businesses and users. This phenomenon is common in star rating datasets, where users tend to review fewer items that they did not like.

(a) Distribution of the number of reviews per user
(b) Distribution of the number of reviews per business
Figure 12: Yelp dataset characteristics
Figure 13: Graph representations for dataset II (Yelp - RecSys-2013 Challenge)

Figure 12 summarizes the basic statistics of users and businesses in the Yelp dataset. Figure 12(a) illustrates the distribution of the number of reviews and ratings per user. A long tail distribution of the number of businesses a user reviewed can be observed, with more than 75% of the users providing fewer than 10 reviews. Likewise, Figure 12(b) shows the distribution of the number of reviews a business received. Only 24% of businesses attract more than 10 reviews, while only a few businesses (less than 2%) have a relatively high number of reviews (more than 100). Despite the high number of categories in the data, the average number of categories with which a business is associated is only 2.68. Every business is also associated with a single location.

The task defined for this dataset is the one originally defined for the RecSys-2013 challenge, i.e., to predict the ratings a user will assign to businesses. Two graph models were implemented and evaluated based on this dataset: a bipartite model with sets of vertices U and B representing users and businesses, and a tripartite model with sets of vertices U, B, and M representing users, businesses, and metadata items, respectively. (The use of ‘tripartite’ is slightly inconsistent with the canonical definition, such that the “bipartite graph with metadata nodes” notation would be more appropriate. For the sake of brevity, the bipartite and tripartite terminology is used.) The high-level graph representation models are illustrated in Figure 13, while the detailed presentation of the sub-graphs will be given in Section 4.3, in which the follow-up dataset is presented.

The features generated for this dataset were aggregated into three groups:

  • Basic features that include only the unique identifiers of users $u$ and businesses $b$.

  • Manual features that include the number of reviews by $u$, average rating of $u$, number of reviews for $b$, number of categories $\{c\}$ with which $b$ is associated, average number of businesses in $\{c\}$, average rating of businesses in $\{c\}$, the main category of $b$, average degree of businesses associated with the main category of $b$, average degree of businesses in $\{c\}$, and the location of $b$. (Each business in the Yelp dataset is associated with multiple categories, some having an internal hierarchy. The main category is the most frequent root category a business was associated with.)

  • Graph features that include the degree centrality, average neighbor degree, PageRank score, clustering coefficient, and node redundancy. These features were generated for both user nodes $u$ and business nodes $b$, whereas an additional shortest path feature was computed for the pairs $(u, b)$.
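
To make the graph features more concrete, the following sketch computes three of them (degree centrality, average neighbor degree, and shortest path length) on a toy user-business graph. It is a plain-Python illustration, not the accompanying library: the helper names and the toy edges are hypothetical, and degree centrality is assumed to be normalized by the number of other nodes. PageRank, clustering coefficient, and node redundancy are omitted for brevity and would normally be taken from a graph library.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Degree of each node divided by the number of other nodes."""
    scale = 1.0 / (len(adj) - 1) if len(adj) > 1 else 0.0
    return {v: len(nbrs) * scale for v, nbrs in adj.items()}

def average_neighbor_degree(adj):
    """Mean degree of the neighbors of each node."""
    return {v: sum(len(adj[w]) for w in nbrs) / len(nbrs) if nbrs else 0.0
            for v, nbrs in adj.items()}

def shortest_path_length(adj, source, target):
    """Breadth-first search distance; None if the nodes are not connected."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

# Toy bipartite graph: users u1, u2 and businesses b1, b2, b3 (hypothetical).
adj = build_adjacency([("u1", "b1"), ("u1", "b2"), ("u2", "b2"), ("u2", "b3")])
dc = degree_centrality(adj)
nd = average_neighbor_degree(adj)
dist = shortest_path_length(adj, "u1", "b3")
```

In this toy graph, u1 has not reviewed b3, so the shortest path runs through the shared business b2 and the other user u2; this is the kind of indirect relation between a user and a business that the pairwise feature is meant to capture.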

In this case, a Random Forest regression model (Breiman, 2001) was applied for the generation of the predictions of users’ ratings for businesses. At the prediction stage, the test data items were run through all the trees in the trained forest. The value of the predicted rating was computed as a linear combination of the scores of the terminal nodes reached when traversing the trees. It should be noted that the ensemble of trees in Random Forest and the selection of the best performing feature in each node inherently eliminate the need for feature selection. Since every node uses a single top performing feature for decision making, the most predictive features are naturally selected in many nodes and the ensemble of multiple trees virtually replaces the feature selection process.

A 5-fold cross validation was performed. For each fold, the predictive model was trained using both the original features encapsulated in the dataset and the new graph features. The basic and manual groups of features were populated directly from the reviews, whereas the graph features were populated from the bipartite and tripartite graph representations and augmented the former groups of features. Predictive accuracy of various combinations of features was measured using the widely-used RMSE metric (Shani and Gunawardana, 2011), computed as $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$, where $n$ is the number of predictions, $\hat{y}_i$ are the predicted values, and $y_i$ are the actual user ratings. A two-sided t-test was applied to validate the statistical significance of the results.
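
For reference, a minimal sketch of the RMSE computation is given below. It assumes the standard definition (square root of the mean squared difference between predicted and actual ratings); the function name and the toy ratings are illustrative.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Toy example on a 5-star scale (hypothetical ratings).
score = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0])
```

Because the errors are squared before averaging, large individual errors are penalized more heavily than by a mean absolute error.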

Figure 14: Yelp II Dataset characteristics – distribution of social links
Figure 15: Graph representations for dataset III (Yelp II)

4.3 Dataset III – Yelp II (with social links)

The third dataset is an extension that was released by Yelp to the previous dataset. The new version contains more users, businesses and reviews (although their distribution still resembles the one shown in Figure 12), and, more importantly, new information regarding users’ social links. The distribution of the social links among users is illustrated in Figure 14. It can be seen that the social links follow a long tail distribution, where most users have a small number of links: 29% with no links, 57% with fewer than 20 links, and only a few users with more than 20 links. The social links also break the bipartite structure of the first Yelp dataset, which influences the generated graph features.

The task for this dataset is identical to that of the first Yelp dataset, i.e., predicting users’ ratings for businesses. Eight graph models were generated and evaluated based on this dataset. The models are illustrated in Figure 15 and, depending on the availability of the user-to-user friendship edges, categorized as bipartite or non-bipartite. The complete graph is shown in the top-left scheme. In the following three schemes one type of edges is missing: either social links, user names, or categories. In the next three, two types of edges are missing: social and categories, social and names, and names and categories. Finally, in the bottom-right graph all three are missing.

The generated features presented in Section 3.3.1 are referred to in the evaluation of this dataset as the basic features. These features are aggregated into groups, based on the graph scheme from which they were extracted. For example, all the features extracted from the graph named “without category links” in Figure 15 were grouped into a combination having the same name. Another evaluated combination includes the union of all the features generated from all the graph schemes, and this is named “all graph features”. Finally, the union of “all graph features” with the “basic features” is referred to as “all features”.

A 5-fold cross validation was performed. For each fold, the predictive models were trained using graph features extracted from each of the above feature sets. The evaluation was conducted three times, each time training the models using a different method (Random Forests, Gradient Boosting, and SVM), in order to evaluate how the choice of method impacts the results. Predictive accuracy of various feature combinations was measured using the RMSE metric (Shani and Gunawardana, 2011), and a two-sided t-test was applied to validate statistical significance.

4.4 Dataset IV – OSN

The fourth dataset is an Online Social Network (OSN) profile dataset that was collected from six large networks: Facebook, LinkedIn, Last.fm, Blogger, YouTube, and LiveJournal. The profiles were manually linked and matched to each other by the users themselves, as they mentioned their user names on other OSNs. The lists of user interests were then extracted from the OSNs and categorized into five domains: movies, music, books, TV, and general. The categorization was explicitly made by the users on Blogger, Facebook, and YouTube; all Last.fm interests were categorized as music; no categorization was available on LinkedIn and LiveJournal, so their interests were treated as general. Users having one interest only and interests mentioned by one user only were filtered out, such that the resultant dataset contained 21,880 users with an average of 1.49 OSNs and 19.46 interests per user.

It can be observed in Table 2 that the most common interests in user profiles are from the music and general domains. Table 3 shows that Facebook is, by far, the OSN with the most listed interests. Table 4 shows the number of users who have at least one interest for a domain and OSN combination, with the right-most column indicating the total number of users. The OSN with the largest number of profiles is Facebook, followed by LinkedIn and Last.fm. Music interests are the most common across the Facebook and YouTube profiles, while general interests are the most common in Blogger profiles. Finally, Table 5 shows the average number of interests a user has in each domain and OSN.

The task defined for this dataset was to predict the interests of the users, based on their partial profiles. The data were represented using a single graph model because of the availability of only two entities: users and interests. The model is a bipartite graph G = {U, I, E}, where users $U = \{u_1, \ldots, u_n\}$ and interests $I = \{i_1, \ldots, i_m\}$ are the vertices. User vertices are connected to interest vertices with an edge if the interest is mentioned in one of the available OSN profiles, i.e., $E = \{(u_j, i_k) : u_j \text{ listed } i_k\}$. The edges are labeled by the OSN(s) in which the interest was listed.
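
A minimal sketch of this labeled bipartite representation is shown below. It assumes the profile data is available as (user, OSN, interest) triples; the function name, the triple layout, and the toy records are hypothetical.

```python
from collections import defaultdict

def build_interest_graph(profile_records):
    """Build user and interest vertex sets plus OSN-labeled edges."""
    users, interests = set(), set()
    edge_labels = defaultdict(set)  # (user, interest) -> set of OSN names
    for user, osn, interest in profile_records:
        users.add(user)
        interests.add(interest)
        # The same interest listed on several OSNs yields one edge, many labels.
        edge_labels[(user, interest)].add(osn)
    return users, interests, dict(edge_labels)

# Hypothetical profile records.
records = [
    ("u1", "Facebook", "jazz"),
    ("u1", "Last.fm", "jazz"),
    ("u2", "Blogger", "hiking"),
    ("u2", "Facebook", "jazz"),
]
U, I, E = build_interest_graph(records)
```

Keeping the OSN names as edge labels, rather than creating one edge per OSN, keeps the graph simple while still allowing OSN-specific features to be derived later.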

Domain | Total | Unique
General interests | 154,245 | 13,053
Movies | 54,382 | 5,190
Music | 139,307 | 21,255
Books | 24,404 | 3,789
TV | 53,508 | 4,027
All | 425,846 | 47,314
Table 2: Total number of interests and unique interests in each domain
Network | Total | Unique
Blogger | 27,045 | 6,587
Facebook | 253,217 | 31,511
Last.fm | 63,952 | 16,483
LinkedIn | 47,955 | 6,325
LiveJournal | 30,924 | 5,198
YouTube | 2,753 | 1,561
All | 425,846 | 47,314
Table 3: Total number of interests and unique interests in each OSN
OSN | General interests | Movies | Music | Books | TV | Total
Blogger | 2,716 | 1,090 | 1,370 | 518 | - | 3,136
Facebook | 8,391 | 8,922 | 10,453 | 6,565 | 9,619 | 11,619
Last.fm | - | - | 7,042 | - | - | 7,042
LinkedIn | 7,755 | - | - | - | - | 7,755
LiveJournal | 1,494 | - | - | - | - | 1,494
YouTube | 448 | 484 | 650 | 552 | - | 1,548
Table 4: Number of users who have at least one interest in each domain and OSN
OSN | General interests | Movies | Music | Books | TV
Blogger | 5.913 | 2.976 | 4.676 | 2.577 | -
Facebook | 6.999 | 5.662 | 6.509 | 3.414 | 5.562
Last.fm | - | - | 9.082 | - | -
LinkedIn | 6.183 | - | - | - | -
LiveJournal | 20.69 | - | - | - | -
YouTube | 1.288 | 1.278 | 1.395 | 1.177 | -
Table 5: Average number of interests available per user in each domain and OSN

From the graph, a set of graph and manually engineered features was extracted. They can be categorized into two groups: user features and interest features. Each of these groups can be split into two sub-groups: basic manual features and graph-based features. The basic manual user features include the number of OSNs of which the user is a member, the number of user interests in each domain (books, TV, movies, music, general), and the total number of interests. The basic manual interest features include the number of users who liked the interest, the number of OSNs where the interest appears, Boolean features signifying whether the interest is listed on each OSN (Blogger, LinkedIn, Last.fm, LiveJournal, Facebook, YouTube), and the domain to which the interest belongs.

The graph-based features are identical for users and interests, are computed once for the user node and once for the interest node, and contain: Degree centrality, Node redundancy, Clustering coefficient, Average neighborhood degree, PageRank, and the Shortest path feature computing the distance between a user-interest pair. Additional features defined for this dataset are , , , and . Finally, and .

The experiments using the OSN dataset evaluated the effect of the features on the predictions of the likelihood of a user to list an interest. A Random Forest classifier was used and trained on the user-interest pairs augmented with their features. Each pair was classified into the ‘like’ or ‘dislike’ classes. 10-fold cross validation was applied for evaluation. For each fold, a graph was built, and then, the above features were extracted and fed into the classifier. Since no real disliked interests were in the data, random interests were selected from the interests not listed by the user. The number of disliked interests was equal to the number of liked interests for each user. The synthetic disliked interests were used only to train the classifier and not used in the evaluation. Precision was the metric chosen to evaluate the quality of predictions for a user: $Precision = TP / (TP + FP)$, where $TP$ is the number of correctly and $FP$ is the number of incorrectly predicted interests (Shani and Gunawardana, 2011).
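
The synthetic negative sampling and the precision metric can be sketched as follows. The sketch assumes that the disliked interests are drawn uniformly at random from interests the user did not list; the function names, the seed, and the toy data are hypothetical.

```python
import random

def sample_disliked(liked, all_interests, rng):
    """Draw as many synthetic 'dislike' interests as the user has liked ones."""
    pool = sorted(set(all_interests) - set(liked))
    # Guard for users whose unlisted pool is smaller than their profile.
    return rng.sample(pool, min(len(liked), len(pool)))

def precision(true_positives, false_positives):
    """Share of predicted interests that were actually listed by the user."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0

rng = random.Random(42)  # fixed seed, used only to make the sketch repeatable
all_interests = [f"interest_{i}" for i in range(50)]  # hypothetical vocabulary
liked = set(all_interests[:6])                         # hypothetical profile
disliked = sample_disliked(liked, all_interests, rng)
```

Balancing the two classes per user keeps the classifier from being dominated by the very large number of unlisted interests; as stated in the text, these synthetic negatives are used for training only.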

4.5 Dataset V – Movielens

Movielens (Lam and Herlocker, 2012) is a classical recommender systems dataset studied in numerous prior works. In this work it is used to show that the graph-based approach is as effective on legacy datasets as on more recent datasets including social data. The 1M Ratings Movielens dataset consists of 1,000,209 ratings assigned by 6,040 users for 3,883 movies, on a discrete scale of 1 to 5 stars. Each user in the dataset rated at least 20 movies. The distribution of ratings across users and movies is illustrated in Figures 18(a) and 18(b), respectively. The dataset contains metadata of both users and movies. The user metadata includes the gender, occupation, zip code area, and age group, while the movie metadata contains the genre(s) of the movies.

The task defined for this dataset was to predict which ratings users would assign to movies. Based on the above description of the dataset, 32 graph schemes were generated and evaluated (see Figure 19). The schemes are categorized based on the number of relationships that were removed from the complete graph that contains all the entities and relationships. As can be seen, there are four categories: schemes with a single node type removed, containing 5 sub-graphs; schemes with 2 node types removed, containing 10 sub-graphs; schemes with 3 node types removed, containing 10 more sub-graphs; and finally, schemes with 4 node types removed, containing 5 sub-graphs. Together with the complete graph and the minimal graph, these add up to the 32 schemes. The minimal graph scheme is the one from which all the entities and relationships were removed, except for the source and target entities and the predicted ‘rating’ relationships.
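
The category sizes quoted above follow from binomial coefficients. The short check below assumes five removable metadata node types (gender, occupation, zip code area, age group, and genre, as listed in the dataset description); the variable names are illustrative.

```python
import math

metadata_types = ["gender", "occupation", "zip_code", "age_group", "genre"]
n = len(metadata_types)

# Number of schemes obtained by removing exactly k metadata node types.
schemes_by_removed = {k: math.comb(n, k) for k in range(n + 1)}
total_schemes = sum(schemes_by_removed.values())
print(schemes_by_removed, total_schemes)
```

Removing zero types gives the complete graph and removing all five gives the minimal graph, so these two cases account for the difference between the 30 schemes in the four listed categories and the 32 schemes evaluated.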

(a) Distribution of ratings across users
(b) Distribution of ratings across movies
Figure 18: Movielens dataset characteristics
Figure 19: Graph representations for dataset V (Movielens). Each graph is an example of the sub-graphs in the group.

A 5-fold cross validation was performed. For each fold, the predictive models were trained using graph features extracted from each of the above graph schemes. The evaluations were conducted twice, training the models using the Random Forest and Gradient Boosting approaches, in order to evaluate how the choice of the learning method impacts the results. The predictive accuracy of various combinations of the above feature sets was measured again using the RMSE and MAE predictive accuracy metrics (Shani and Gunawardana, 2011), and a two-sided t-test was applied to validate statistical significance.

4.6 Summary of the datasets, features, and metrics

Table 6 summarizes this section and presents the experimental datasets, number of source and target entities, various sub-graph schemes investigated, number of extracted feature sets, groups of features, and evaluation metrics exploited. The five datasets contain large numbers of users and items and cover a broad range of data types, application domains, and recommendation tasks. The collection also includes both legacy and recently collected datasets, such that the evaluation presented in the following section offers solid empirical validity.

Dataset | Source and Target Entities | Graph Schemes | Graphs | Feature Sets | Extracted Features | Learning Method | Evaluation Metric
Last.fm | 1,892 (users), 17,632 (artists) | Bipartite + Non-bipartite (w/ social links, w/ tags, w/ social links+tags) | 4 | 7 | Basic graph features, extended graph features | Gradient Boosting | P@K
Yelp | 9,464 (users), 11,197 (businesses) | Bipartite + Bipartite with metadata | 2 | 13 | Basic graph features, manually engineered | Random Forest | RMSE
Yelp II | 13,366 (users), 14,853 (businesses) | Bipartite + Non-bipartite (w/ social links), with and without metadata | 8 | 13 | Basic graph features | Random Forest, Gradient Boosting, Support Vector Machine | RMSE
OSN | 21,880 (users), 47,314 (interests) | Bipartite | 1 | 8 | Basic graph features, manually engineered | Random Forest | Precision
Movielens | 6,040 (users), 3,883 (movies) | Bipartite + Bipartite with metadata | 32 | 36 | Basic graph features | Random Forest, Gradient Boosting | RMSE, MAE
Table 6: Summary of datasets characteristics

5 Results and Analysis

5.1 Case Study I: Overall Contribution of the Graph-based Approach

This case study answers the broad question: How does the use of graph features affect the performance of rating predictions and recommendation generation in different domains and tasks? Each of the above datasets was represented by graphs and graph-based features were extracted from the graphs using the approach detailed in Section 3. For each dataset, a matching recommendation task was defined as follows: for the Last.fm dataset, the task was to predict the artists to which users will listen; for the two Yelp datasets, to predict user ratings for businesses; for the OSN dataset, to predict the interests in user profiles; and, for the Movielens dataset, to predict user ratings for movies. The tasks were performed and evaluated under three conditions:

  • Prediction with versus without the newly extracted graph features

  • Prediction with user-related versus item-related graph features

  • Prediction using features of a bipartite graph versus extended graph schemes, e.g., containing metadata.

All the evaluations were conducted using the N-fold cross validation methodology (Kohavi et al., 1995), with N=5 folds in the Last.fm, both Yelp, and Movielens datasets, and N=10 folds in the OSN dataset. For each fold, the complete graph representation was generated based on the entities from both the training and test sets, except for the relationships being predicted in the test set. A two-sided t-test was conducted with the null hypothesis of having identical expected values across the compared prediction sets. The tests assumed that the predicted ratings using feature set A and predicted ratings using feature set B were taken from the same population. The threshold used for a statistically significant difference was p=0.05.
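
The per-fold graph construction can be sketched as follows: all entities are kept as nodes, while the edges of the predicted relationship that fall into the test fold are withheld from the graph. The sketch assumes a simple interleaved split and (user, item, value) edge triples; the function name and the toy data are hypothetical, and in practice the edges would be shuffled before splitting.

```python
def fold_graphs(edges, k):
    """Yield (nodes, train_edges, test_edges) for each of k folds."""
    # Nodes come from ALL edges, so test entities remain in the graph.
    nodes = {n for user, item, _ in edges for n in (user, item)}
    for fold in range(k):
        test = [e for idx, e in enumerate(edges) if idx % k == fold]
        # Only training edges of the predicted relationship enter the graph.
        train = [e for idx, e in enumerate(edges) if idx % k != fold]
        yield nodes, train, test

# Toy predicted-relationship edges: (user, item, rating), hypothetical values.
edges = [(f"u{i % 4}", f"b{i % 5}", 1 + i % 5) for i in range(10)]
folds = list(fold_graphs(edges, k=5))
```

Features such as the shortest path between a test user-item pair are then computed on a graph that does not contain the edge being predicted, which avoids leaking the target into the features.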

Figure 20: Precision of feature combinations using the four graphs - Last.fm dataset.

5.1.1 Dataset I – Last.fm Results

Four graph schemes were generated for the Last.fm dataset, as per the structure in Figure 9. For each graph scheme the set of basic graph features listed in Section 3.3.1 was extracted and populated. The basic features encapsulate only the user-artist listening data. In addition, when the social and tagging data is available, namely, in the BL+T, BL+F, and BL+T+F schemes, a set of extended features can be extracted, e.g., the set of extended features extracted from the graph with the tagging data. Note that for the BL scheme in Figure 9, having neither social nor tagging data, the basic and extended feature sets are identical.

The P@10 results obtained for the extended feature sets extracted from the four schemes are summarized in Figure 20. The boundaries of the boxes represent the 25th and 75th percentile of the obtained P@10, and the average P@10 is marked by the dot inside the boxes. The values of the average P@10 are also given. The baseline for comparison in this case is the performance of the graph features extracted from the bipartite scheme (BL), which scored P@10=0.336. A notable improvement, between 63% and 70%, was observed when the extended feature sets were extracted. For instance, one of the extended feature sets scored P@10=0.555, which is an improvement of more than 65%. The set of extended features extracted from the graph that includes both social tags and friendships (BL+T+F) is the best performing feature set. It scored the highest P@10=0.571 and improved the baseline P@10 by almost 70%.

In order to evaluate the significance of the results, a paired t-test was performed with each group of features, using the P@10 values obtained for each of the four graphs. The results show that among the extended feature sets, all the differences were significant, p<0.05. Thus, the inclusion of auxiliary tagging and friendship data improved the accuracy of the prediction, while their combination including both components led to the most accurate predictions. More importantly, the extraction of graph-based features was shown to consistently and significantly boost the performance of the recommender, in comparison to the variant not using the extracted features.

Row | Features combination | Features | RMSE | Improvement
1 | All_Features | Basic ∪ Manual ∪ Graph | 1.0766 | 8.82%
2 | AllExcept_Tripartite | Basic ∪ Manual ∪ Bipartite | 1.0775 | 8.75%
3 | AllExcept_Basic | Manual ∪ Graph | 1.0822 | 8.35%
4 | Manual_and_Bipartite | Manual ∪ Bipartite | 1.0850 | 8.11%
5 | AllExcept_Bipartite | Basic ∪ Manual ∪ Tripartite | 1.0896 | 7.72%
6 | Manual_and_tripartite | Manual ∪ Tripartite | 1.1073 | 6.22%
7 | AllExcept_Manual | Basic ∪ Graph | 1.1095 | 6.04%
8 | All_Graph | Bipartite ∪ Tripartite | 1.1148 | 5.59%
9 | AllExcept_Graph | Basic ∪ Manual | 1.1175 | 5.36%
10 | Bipartite | Bipartite | 1.1188 | 5.25%
11 | Tripartite | Tripartite | 1.1326 | 4.09%
12 | Basic | Basic | 1.1809 | N/A
13 | Manual | Manual | 1.1853 | -0.37%
Table 7: RMSE of selected feature combinations - Yelp dataset (baseline combination: row 12, Basic).

5.1.2 Dataset II – Yelp Results

Improvements due to the use of the graph-based approach were also evident in experiments using the second dataset (Yelp). As per the description in Section 4.2, basic (user and business identifiers), manual (number of reviews, average rating, business category and location), and graph-based features were extracted and populated. The latter were further broken down into the bipartite and tripartite features. In this dataset, the performance of the basic features related to the user-to-business associations serves as the baseline. Table 7 presents the full results for all the feature combinations.

The largest improvement in the RMSE of business ratings prediction was an 8.82% decrease obtained for the combination of graph features with basic and manually engineered ones (row 1). The similarity of the RMSE scores obtained by the various combinations is explained primarily by the low variance of user ratings in the dataset. Since most ratings are similar, they are highly predictable using simple methods and there is only limited room for improvement. A combination containing only the graph features (row 8) outperformed the baseline performance by 5.59%. In contrast, the use of manual features (row 13) slightly deteriorated the accuracy of the predictions. This demonstrates the full benefit of the graph-based approach: extracting the graph features took less time than crafting the manual ones, and the graph features also outperformed the manual ones.
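
The improvement figures can be read as the relative reduction in RMSE with respect to the baseline. The sketch below assumes this definition, which is not spelled out in the text; applied to the rounded RMSE values in Table 7, it reproduces the reported percentages to within rounding.

```python
def improvement(baseline_rmse, rmse):
    """Relative RMSE reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse

baseline = 1.1809  # Basic features (row 12 of Table 7)
all_features = improvement(baseline, 1.0766)   # row 1, reported as 8.82%
graph_only = improvement(baseline, 1.1148)     # row 8, reported as 5.59%
manual_only = improvement(baseline, 1.1853)    # row 13, reported as -0.37%
```

A negative value, as for the manual features, means that the combination performed worse than the baseline.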

Figure 21: Significance of the differences between feature combinations in the Yelp dataset. White cells - significant, dark cells - not significant, p-value given.

An examination of the differences in the accuracy of the results obtained when combining various groups of features revealed a number of findings. An analysis of the performance of each group of features shows that the bipartite and tripartite feature sets performed noticeably better than the basic and manual feature sets (rows 10 and 11 versus rows 12 and 13). A combination of graph features (row 8) still outperforms slightly, although significantly, the combination of the basic and manually engineered features (row 9). To analyze the impact of the feature groups, each group was excluded from the overall set of features and the change with respect to the All_Features combination (row 1) was measured. When the graph features were excluded (row 9), the predictions were less accurate than when the basic (row 3) or manual features (row 7) were excluded. This indicates that the graph features provide the most valuable information, which is not covered by the basic and manual features.

The paired t-test performed using the RMSE values revealed that the majority of differences were significant, p<0.001. The insignificant differences are highlighted in Figure 21. Three conclusions can be drawn from the insignificant pairs: (1) the bipartite features are comparable to all the graph features, which indicates the low contribution of the tripartite features; (2) all graph features are comparable to all features except for manual (i.e., graph and basic features), which indicates the low contribution of the basic features; and, (3) the combination of manual and bipartite features is comparable to the combination of all the features, which indicates that the former two are the most informative features in this case.

5.1.3 Dataset III – Yelp II (with social links) Results

The results of the evaluation using the extended Yelp II dataset, which includes social links between users, are in line with the results of the original Yelp dataset. Table 8 lists the results of this evaluation for a selected set of feature combinations: basic user and business features, features of the complete graph, features of all the sub-graphs, and the union of all the available features.

Features Subset RMSE Improvement
1 All Features 1.1416 1.73%
2 All Graph Features 1.1417 1.73%
3 Complete Graph 1.1450 1.45%
4 Business Features 1.1465 1.32%
5 User Features 1.1580 0.33%
6 Basic Features 1.1619 N/A
Table 8: RMSE of selected feature combinations - Yelp II dataset (baseline combination in light gray).

The results show that the combinations including graph features generally outperform the basic feature sets. The best performing combination of graph-based features only, using all the features from all the sub-graph schemes (row 2, RMSE=1.1417), achieves a 1.73% improvement over the baseline. When adding the basic features to all the graph-based features, a slightly lower RMSE=1.1416 (row 1) is obtained. Another noticeable difference is between the business-related features, which achieve RMSE=1.1465, and the user-related features, which achieve RMSE=1.158 (rows 4 and 5, respectively). This indicates that the ratings assigned to the business being predicted are more informative than the ratings of the target user. Again, the achieved improvements are generally modest, primarily due to the low variance of ratings in the Yelp II dataset.

The performance differences between the evaluated combinations are mostly significant, p<0.01, except for two pairs of feature sets. The difference between business-related features and complete graph features is borderline, with p=0.07. The difference between ‘All Features’ and ‘All Graph Features’ is, as expected, also insignificant. This shows that the most important contribution to the predictive accuracy comes from the graph features, while the addition of the basic features improves the prediction only a little.

5.1.4 Dataset IV – OSN Results

In the fourth dataset of online social networks, user mentions of interests in their profiles were predicted. Table 9 shows the precision scores achieved by the individual features listed in Section 4.4, as well as by a number of their combinations. These include the individual interest- and user-focused graph features IG and UG; the basic interest and user features IB and UB; their unions IG_All, UG_All, IB_All, and UB_All; as well as I_All = IG_All ∪ IB_All and U_All = UG_All ∪ UB_All. The baseline here is the prediction using the available user-interest features only.

Feature/Group Precision Feature/Group Precision Feature/Group Precision
All 0.6455 IB4 0.5287 UB5 0.4529
IG_All 0.5821 IB5 0.5230 UG1 0.4514
IG1 0.5745 IB6 0.5221 UB6 0.4507
IG2 0.5734 IB7 0.5215 UG_All 0.4465
IG3 0.5691 IB8 0.5206 UG3 0.4440
IB1 0.5687 IB9 0.5205 UG2 0.4430
I_All 0.5642 Baseline 0.5128 UB7 0.4405
IB_All 0.5599 SP 0.5107 UG4 0.4401
IG4 0.5589 UB1 0.4771 UG5 0.4400
IG5 0.5560 UB2 0.4736 UB_All 0.4391
IB2 0.5482 UB3 0.4646 U_All 0.4376
IB3 0.5307 UB4 0.4632
Table 9: Average precision for individual features and feature combinations in the OSN dataset. IG - Interests Graph features, UG - Users Graph features, IB - Interests Basic (manual) features, UB - Users Basic (manual) features.
Figure 22: Precision CDF of the various feature combinations - OSN dataset.

Overall, item-related features were again seen to improve the precision of the predictions more than user-related features. In fact, the precision of all the item features is above the baseline precision of 0.51, while the precision of all the user features is below the baseline. This can also be seen through the comparison of the union of all item features I_All with the union of user features U_All, which shows the superiority of the former. Zooming in on the item features, it can be observed that the graph-based item features IG outperform most of the basic item features IB, except for IB1 (the number of users who mentioned the interest), which turns out to be a reliable predictor. As a result, the union of item-related graph features IG_All obtains a higher precision than its basic feature counterpart IB_All, 0.58 vs 0.56. Combining the graph features with the basic ones produced the highest precision, 0.65. Hence, graph-based features resulted in an improvement of the interest predictions.

The cumulative distribution functions of the results obtained by selected feature combinations are illustrated in Figure 22. The combined feature set ‘All’ performs best, having the highest precision over large portions of the data. The item-related graph-based feature combination is third best, and it is very close to the combined graph-based feature set that comes second. The performance of all user-related feature sets is visibly lower, which provides another argument in favor of the extraction of the graph-based features.
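For readers unfamiliar with this type of plot, the sketch below shows how an empirical cumulative distribution function of precision scores can be computed. It assumes one precision value per user (or per test case); the function name and the input values are hypothetical and are not the data behind Figure 22.

```python
def empirical_cdf(values):
    """Return sorted (value, fraction of observations <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for rank, value in enumerate(ordered, start=1):
        # For duplicated values the last (largest) rank is the one that is kept.
        cdf[value] = rank / n
    return sorted(cdf.items())


# Hypothetical per-user precision scores, NOT results from the paper.
precision_all = [0.2, 0.6, 0.6, 0.8, 1.0]
cdf_all = empirical_cdf(precision_all)
for value, fraction in cdf_all:
    print(f"precision <= {value:.1f}: {fraction:.0%} of users")
```

Under this reading, a feature set whose curve stays lower for longer (that is, fewer users at low precision values) is the better one.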

5.1.5 Dataset V – Movielens Results

Features Subset RMSE Improvement MAE Improvement
1 All Features 1.0272 4.20% 0.8303 6.06%
2 All Graph Features 1.0362 3.36% 0.8349 5.53%
3 Movie Features 1.0400 3.01% 0.8380 5.18%
4 Basic Features 1.0722 N/A 0.8838 N/A
5 User Features 1.0895 -1.61% 0.8967 -1.46%
Table 10: Performance of selected features combinations - Movielens dataset (baseline combination in light gray, rows are sorted by RMSE).

Finally, the experimentation with the Movielens dataset re-affirms the contribution of graph-based feature extraction to the recommendation generation. The task in this dataset was to predict movie ratings, and the predictions were evaluated using the MAE and RMSE predictive accuracy metrics. Table 10 summarizes the performance of a selected group of features. The basic user-item pairs are compared here with the user and item features used individually, all the extracted graph-based features, and the union of all of them, denoted by ‘All Features’ (row 1).

The already discussed superiority of item features over user features (row 3 versus row 5) can be clearly seen again. In this case, the former improve the accuracy of the predictions by 3-5%, while the latter only deteriorate it. The extraction of the graph-based features (row 2) also leads to an improvement of 3.36% and 5.53% relative to the baseline, using the RMSE and MAE metrics, respectively. When combined with other features, the graph features achieve the best result, which is RMSE=1.0272, or a 4.20% improvement over the baseline. Those performance differences were statistically evaluated and found significant.
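The two error metrics and the "Improvement" column can be written down compactly. The sketch below uses the standard definitions of RMSE and MAE; the relative-improvement formula is inferred from Table 10 (it reproduces the reported RMSE percentages from the rounded table values) and is therefore an assumption rather than a definition stated in the text.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long rating lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, error):
    """Percentage decrease of an error metric relative to the baseline (assumed formula)."""
    return 100.0 * (baseline_error - error) / baseline_error


# RMSE values copied from Table 10 (baseline: Basic Features, row 4).
graph_gain = relative_improvement(1.0722, 1.0362)  # All Graph Features, row 2
user_gain = relative_improvement(1.0722, 1.0895)   # User Features, row 5
print(f"{graph_gain:.2f}% {user_gain:.2f}%")
```

A negative value, as for the user features, means the error increased relative to the baseline.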

5.1.6 Performance across Learning Methods, Datasets, and Metrics

This case study investigated the impact of the graph-based features on the accuracy of the recommendations. Although all the evaluations reported so far show that using the graph-based features improves the accuracy of the recommendations, the results cannot be fully corroborated yet, as the conducted experiments use different learning methods, datasets, and evaluation metrics (see Table 6). To confidently address the research question, the evaluation was designed with overlaps in these factors, so that the contribution of the graph features can be singled out.

The analysis below aims to establish whether the observed improvements should be attributed to the information contributed by the graph features or to the differences in the experimental settings, i.e., learning method, dataset, and metric. The results of the experiments used in the analysis are summarized in Table 11. In all the cases, the performance of the baseline approaches not using the graph features, which were highlighted in light gray in all the tables, is compared to the performance of all the graph-based features, i.e., row 8 in Table 7, row 2 in Table 8, and row 2 in Table 10.

Dataset Metric Method Baseline Graph Features Improvement
1 Yelp I RMSE Random Forest 1.1809 1.1148 5.59%
2 Yelp II RMSE Random Forest 1.1619 1.1417 1.74%
3 Yelp II RMSE Gradient Boosting 1.2480 1.1715 6.13%
4 Yelp II RMSE SVM 1.1818 1.1783 0.30%
5 Movielens RMSE Random Forest 1.1667 1.0268 11.90%
6 Movielens RMSE Gradient Boosting 1.0722 1.0362 3.36%
7 Movielens MAE Random Forest 0.9144 0.8157 10.79%
8 Movielens MAE Gradient Boosting 0.8838 0.8349 5.53%
Table 11: Summary of experiments and results for Case Study I.

Included are the results of experiments using the Yelp, Yelp II, and Movielens datasets, which were discussed in Sections 5.1.2, 5.1.3, and 5.1.5, respectively. That said, the results in rows 3, 4, 5, and 7 of Table 11 are presented here for the first time. This is because the previously reported Yelp experiments (both datasets) used Random Forest as their learning method, while the Movielens experiments used Gradient Boosting. Here, new Yelp II results with Gradient Boosting and SVM, and new Movielens results with Random Forest are also presented. It should also be highlighted that the experiments using the first dataset (Dataset I) and the OSN dataset are excluded from the analysis, since, unlike the other three experiments, they use classification accuracy metrics. As such, they differ in two factors, dataset and evaluation metric, and are not comparable with the other experiments.

In order to demonstrate that the improvement is not due to the selected dataset, the metric and learning method were fixed, while the approaches using different datasets were compared. Two evaluation sets are applicable to this scenario: (1) Random Forest predictions evaluated with the RMSE metric, using the Yelp, Yelp II, and Movielens datasets (rows 1, 2, and 5), and (2) Gradient Boosting predictions also evaluated with RMSE, but using the Yelp II and Movielens datasets (rows 3 and 6). The results of these experiments show an improvement of 1.74% to 11.90%, which allows the selected dataset to be eliminated as a possible reason for the improvement.

To demonstrate that the improvement is also not due to the selected machine learning method, the dataset and metric were fixed, while the approaches using different learning methods were compared. Three evaluation sets are applicable to this scenario: (1) RMSE of business predictions using the Yelp II dataset, where the learning methods are Random Forest, Gradient Boosting, and SVM (rows 2, 3, and 4), (2) RMSE of movie rating predictions using the Movielens dataset, where the methods are Random Forest and Gradient Boosting (rows 5 and 6), and (3) MAE of movie rating predictions using the Movielens dataset, where the methods are Random Forest and Gradient Boosting (rows 7 and 8). The results of these experiments show an improvement across all experiments, ranging from 0.30% to 6.13% for the Yelp II dataset, and from 3.36% to 11.90% for the Movielens dataset. The low variance of ratings in the Yelp datasets, which was discussed earlier, is the main reason for the low improvement observed. This is particularly noticeable with the SVM method, which struggles to linearly separate businesses with moderate ratings. Thus, the learning method cannot be the reason for the accuracy improvement.

Finally, to demonstrate that the improvement is not due to the selected evaluation metric, the dataset and method were fixed, while the performance of approaches using different metrics was compared. Two evaluation sets are applicable to this scenario: (1) Random Forest movie rating predictions using the Movielens dataset, evaluated using RMSE and MAE (rows 5 and 7), and (2) Gradient Boosting movie rating predictions also using the Movielens dataset, and also evaluated using the RMSE and MAE metrics (rows 6 and 8). The results of these experiments show a clear improvement across both metrics, ranging from 3.36% to 11.90%, which allows the selected evaluation metric to be eliminated as a possible reason for the improvement.

Summing up this causal analysis, all three hypotheses that the improved performance is driven by the differences in the experimental settings (dataset, learning method, and evaluation metric) were rejected. Thus, it can be concluded that the reason for the observed improvement lies in the inclusion of graph-based features, contributing new information to the recommendation process.
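The fix-two-factors, vary-one design of this analysis can be replayed directly on the improvement values of Table 11. The sketch below is only a bookkeeping aid under that reading: the helper name `improvement_ranges` is hypothetical, and the numbers are copied from the table rather than recomputed.

```python
from collections import defaultdict

# (dataset, metric, method, improvement in %) copied from Table 11.
TABLE_11 = [
    ("Yelp I", "RMSE", "Random Forest", 5.59),
    ("Yelp II", "RMSE", "Random Forest", 1.74),
    ("Yelp II", "RMSE", "Gradient Boosting", 6.13),
    ("Yelp II", "RMSE", "SVM", 0.30),
    ("Movielens", "RMSE", "Random Forest", 11.90),
    ("Movielens", "RMSE", "Gradient Boosting", 3.36),
    ("Movielens", "MAE", "Random Forest", 10.79),
    ("Movielens", "MAE", "Gradient Boosting", 5.53),
]
FACTORS = ("dataset", "metric", "method")


def improvement_ranges(rows, varied):
    """Group rows by the two fixed factors and return the (min, max) improvement
    over the varied factor. Groups with a single row allow no comparison and are dropped."""
    varied_idx = FACTORS.index(varied)
    groups = defaultdict(list)
    for row in rows:
        fixed = tuple(v for i, v in enumerate(row[:3]) if i != varied_idx)
        groups[fixed].append(row[3])
    return {key: (min(vals), max(vals)) for key, vals in groups.items() if len(vals) > 1}


by_dataset = improvement_ranges(TABLE_11, "dataset")  # fixed: metric and method
by_method = improvement_ranges(TABLE_11, "method")    # fixed: dataset and metric
by_metric = improvement_ranges(TABLE_11, "metric")    # fixed: dataset and method
for name, result in (("dataset", by_dataset), ("method", by_method), ("metric", by_metric)):
    print(name, result)
```

The number of groups returned for each varied factor (two, three, and two) matches the evaluation sets listed in the text, and every group has a positive minimum improvement.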

5.2 Case Study II: Different Graph Schemes and their Impact on Recommendations

As mentioned in Section 3, various sub-graphs and graph schemes can be generated for each dataset. The feature extraction process will, thus, yield a number of graph schemes, corresponding feature sets, and even different values of the same graph features. This leads to the second research question: How are the recommendations affected by the sub-graph and its representation used to generate the graph features? In order to answer this question, another set of experiments was conducted.

In these experiments, the accuracy of recommendations when using various graph schemes was evaluated using four datasets: Dataset I, both Yelp datasets, and Movielens. The OSN dataset was not used here because it was represented as a simple bipartite graph, lacking the desired number of entities and relationships. The recommendation tasks were identical to those of the previous experiments, i.e., to predict listened artists in Dataset I, user ratings for businesses in the two Yelp datasets, and user ratings for movies in the Movielens dataset. An N-fold cross validation methodology similar to the one reported in Section 5.1 was followed. Also, the same two-sided t-test statistical significance testing was carried out.

Figure 23: Dataset I results: Precision of the extended (solid) versus the basic (dashed) feature sets.

5.2.1 Dataset I Results

The evaluations using Dataset I focused on the influence of the social elements, i.e., friendship links and tags, on the obtained recommendation accuracy. In this dataset, the results of recommendations based on the bipartite user-artist graph representation (BL in Figure 9) were compared with those of three non-bipartite schemes, BL+T, BL+F, and BL+T+F, including, respectively, the tags assigned by the users to the artists, social friendship links between the users, and tags and friendship links alike. As mentioned in Section 5.1.1, two sets of graph features were extracted for each schema: a set of basic features and a set of extended features. Although the basic feature set is shared across all the schemes, the values of its features may change due to the presence of additional graph nodes. The extended feature set is composed of the basic features along with new features that were extracted from the social links and tags available in each schema. A detailed discussion of the extended feature set can be found in (Tiroshi et al., 2014b).

Figure 23 shows the obtained P@10 scores averaged over all the users in the test set, when using both the basic and extended feature sets. For each representation, the solid boxes on the left denote the results obtained with the extended features, whereas the dashed boxes on the right present the results obtained with the basic features. First, it can clearly be observed that the inclusion of the social auxiliary data, either the assigned tags or the friendship links, substantially improves P@10. When both the tags and friendship links are included in the BL+T+F model, the highest average P@10 is observed. Both in the basic and the extended feature sets, the BL+T and BL+F models obtain comparable P@10 scores, showing the effect of the inclusion of auxiliary data in the graph schemes. However, as noted in Section 4.1, the tag data includes more than 186K tag assignments, whereas the friendship data consists of only 12K user-to-user links. Since the obtained precision scores are comparable, a single friendship link is more influential than a single artist tag and yields a greater improvement in the recommendation accuracy. Looking at the significance tests conducted within the basic and extended feature sets, significant differences, p<0.05, were observed between all the pairs of extended features and all the pairs of basic features except for a single pair.
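The P@10 metric reported above can be sketched as follows. The code assumes the common definition of precision at k (the share of the top-k recommended artists that the user actually listened to), averaged over users; the function names and the toy data are hypothetical.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def mean_precision_at_k(recommendations, ground_truth, k=10):
    """Average P@k over all users that received recommendations."""
    scores = [
        precision_at_k(recs, ground_truth.get(user, set()), k)
        for user, recs in recommendations.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ranked artist lists and listened-artist sets, NOT data from the paper.
recommendations = {
    "u1": [f"a{i}" for i in range(10)],
    "u2": [f"a{i}" for i in range(10)],
}
ground_truth = {
    "u1": {"a0", "a1", "a2", "a3", "a4"},  # 5 of the top 10 are hits
    "u2": {"a9", "a20"},                   # 1 of the top 10 is a hit
}
average_p10 = mean_precision_at_k(recommendations, ground_truth, k=10)
print(f"P@10 = {average_p10:.3f}")
```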

Features combination Features RMSE Improvement
8 All_Graph Bipartite ∪ Tripartite 1.1148 5.59%
10 Bipartite 1.1188 5.25%
11 Tripartite 1.1326 4.09%
12 Basic 1.1809 N/A
Table 12: Yelp results: RMSE of the bipartite versus the tripartite feature sets. Full results are given in Table 7.

When comparing the performance of the extended graph features to the performance of the corresponding basic features (solid boxes versus dashed boxes in Figure 23), it can be seen that the extended sets consistently outperformed the basic sets across all the four graph schemes, and the difference within the pairs was statistically significant, p<0.05. In the BL+T scheme, the extended graph features improved on the basic features extracted from it by 10%, P@10=0.548 versus P@10=0.498, while in the BL+F scheme the improvement was by 11.6%, P@10=0.555 versus P@10=0.497. The largest improvement was noted in the BL+T+F scheme, where the extended graph features outperformed the basic features by as much as 28.6%, P@10=0.571 versus P@10=0.444. Surprisingly, the basic feature set of the BL+T+F scheme was found to achieve a lower P@10 than the basic feature sets of BL+T and BL+F. A possible explanation for this can be that including both types of social data but not extracting and populating the extended features leads to redundancy in the graph and degrades the performance of the recommender.

5.2.2 Dataset II – Yelp Results

For the Yelp dataset and the task of business rating prediction, two graph schemes were compared: a pure bipartite graph that contained only the users and businesses, and a tripartite graph that, on top of user and business nodes, also contained metadata nodes describing the businesses. The two graph schemes are illustrated in Figure 13. The reason these were the only graph schemes created is that sparse features, i.e., those having only a small number of unique values, were filtered from the dataset. These features would have resulted in most of the nodes of a group, e.g., users, being connected to a single node, which would render it meaningless. For example, adding three “gender” nodes, male, female, and unspecified, would have resulted in all users being connected to one of the three, essentially creating three large clusters in the graph.
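The difference between the two schemes can be illustrated with a small adjacency-map sketch. The node labels, the `build_graph` helper, and the toy records are hypothetical; the only point shown is that adding metadata nodes turns the bipartite graph into a tripartite one and changes the value of a simple graph feature such as the degree.

```python
from collections import defaultdict


def build_graph(ratings, business_categories=None):
    """Build an undirected adjacency map from (user, business) pairs.
    When `business_categories` is given, category metadata nodes are added,
    which makes the graph tripartite."""
    adjacency = defaultdict(set)
    for user, business in ratings:
        adjacency[("user", user)].add(("business", business))
        adjacency[("business", business)].add(("user", user))
    for business, categories in (business_categories or {}).items():
        for category in categories:
            adjacency[("business", business)].add(("category", category))
            adjacency[("category", category)].add(("business", business))
    return adjacency


def degree(adjacency, node):
    """Number of neighbours of a node (0 if the node is absent)."""
    return len(adjacency.get(node, ()))


# Hypothetical toy records, NOT taken from the Yelp dataset.
ratings = [("u1", "b1"), ("u1", "b2"), ("u2", "b2")]
categories = {"b1": ["cafe"], "b2": ["cafe", "bar"]}

bipartite = build_graph(ratings)
tripartite = build_graph(ratings, categories)
print(degree(bipartite, ("business", "b2")), degree(tripartite, ("business", "b2")))
```

The same business node has a different degree in the two schemes, which is one way the same feature can take different values depending on the sub-graph it is extracted from.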

Features Subset RMSE Improvement
2 All Graph Features 1.1417 1.73%
7 Without Name Links 1.1450 1.45%
3 Complete Graph 1.1450 1.45%
8 Without Social Links 1.1463 1.33%
9 Without Social and Name Links 1.1465 1.32%
10 Without Category Links 1.1508 0.95%
11 Without Metadata 1.1508 0.94%
12 Without Social and Category Links 1.1519 0.85%
13 Without Metadata and Social Links 1.1523 0.82%
6 Basic Features 1.1619 N/A
Table 13: Yelp II results: RMSE of various sub-graph feature sets.

The complete set of graph features was generated for both the bipartite and tripartite representations. The results in Table 12 show the RMSE scores obtained for these feature sets. Note that these results are extracted from the results presented in Table 7 and their original row numbers are preserved. The experiments showed that the bipartite schema, not including the metadata nodes, performed slightly but significantly better than the tripartite schema with metadata, RMSE=1.1188 versus RMSE=1.1326. Its relative improvement with respect to the baseline recommendations was 1.16 percentage points higher (5.25% versus 4.09%). Because the two schemas performed differently, their unified feature set, All_Graph, outperformed each of the two feature sets individually. However, the superiority of All_Graph was statistically significant only when compared to the tripartite schema, as can be seen in Figure 21.

5.2.3 Dataset III – Yelp II (with social links) Results

The richer information provided by the Yelp II dataset allowed for the creation of a larger set of sub-graphs. These are illustrated in Figure 15, where various combinations of entities are removed from the complete graph. Thus, in addition to the complete graph, seven sub-graph representations can be created and the performance of the feature sets extracted from these can be compared. The results of this experiment are presented in Table 13. The complete graph and the seven sub-graphs are compared to the basic feature set and the union of all the graph features, which were, respectively, the baseline and best performing combination in Table 8. The numbering of rows already presented in Table 8 is preserved (rows 2, 3, and 6), while the rows of all the sub-graphs from Figure 15 are numbered 7 to 13. The significance of the differences between the sub-graphs is shown in Figure 24.

Figure 24: Significance of the differences between feature combinations in the Yelp II dataset. White cells - significant, dark cells - not significant, p-value given.

As can be clearly seen, the results of the various sub-graphs fell into two groups, based on the significance tests. The groups were: sub-graphs containing the ‘category’ relationship (“Without Name Links”, “Complete Graph”, “Without Social Links”, and “Without Social and Name Links”) and sub-graphs not containing the ‘category’ relationship (“Without Category Links”, “Without Metadata”, “Without Social and Category Links”, and “Without Metadata and Social Links”). The former group of sub-graphs (rows 7, 3, 8, and 9 in Table 13) performed significantly better than the latter (rows 10, 11, 12, and 13), which highlights the importance of business categories in predicting the business ratings. This is also in line with the dominance of business features over the user features that was already observed in Table 8. The union of all the graph-based features extracted from all the sub-graphs (“All Graph Features” in row 2) expectedly outperformed all other sub-graphs and feature sets. This highlights the strength of the proposed approach in producing all the possible features from all the possible sub-graph representations of the data rather than identifying the optimal sub-graph and dealing with feature selection.

5.2.4 Dataset V – Movielens Results

Features Set RMSE Improvement MAE Improvement
2 All Graph Features 1.0362 3.36% 0.8349 5.53%
6 graph w/[Age, Gender, Genre, Zip] 1.0369 3.29% 0.8353 5.48%
7 graph w/[Age, Gender, Occupation, Zip] 1.0373 3.25% 0.8357 5.44%
8 graph w/[Gender, Genre, Occupation, Zip] 1.0384 3.16% 0.8365 5.35%
9 graph w/[Genre, Occupation] 1.0410 2.91% 0.8391 5.06%
10 graph w/[Age, Genre, Zip] 1.0411 2.90% 0.8386 5.12%
11 graph w/[Genre, Occupation, Zip] 1.0411 2.90% 0.8385 5.12%
12 graph w/[Age, Genre, Occupation] 1.0412 2.90% 0.8390 5.07%
13 graph w/[Age, Gender, Genre] 1.0412 2.89% 0.8392 5.05%
14 graph w/[Age, Gender, Genre, Occupation] 1.0413 2.89% 0.8390 5.07%
15 graph w/[Age, Genre, Occupation, Zip] 1.0413 2.89% 0.8388 5.10%
16 graph w/[Genre] 1.0413 2.89% 0.8393 5.04%
17 graph w/[Gender, Genre, Occupation] 1.0413 2.88% 0.8392 5.04%
18 graph w/[Age, Genre] 1.0414 2.88% 0.8395 5.02%
19 graph w/[Age, Gender, Genre, Occupation, Zip] 1.0414 2.88% 0.8390 5.08%
20 graph w/[Genre, Zip] 1.0415 2.87% 0.8388 5.09%
21 graph w/[Gender, Genre] 1.0416 2.86% 0.8396 5.00%
22 graph w/[Gender, Genre, Zip] 1.0416 2.85% 0.8391 5.06%
23 graph w/[Age] 1.0425 2.77% 0.8413 4.81%
24 graph w/[Zip] 1.0426 2.77% 0.8407 4.88%
25 graph w/[Age, Zip] 1.0426 2.76% 0.8409 4.86%
26 graph w/[Age, Occupation] 1.0426 2.76% 0.8412 4.82%
27 graph w/[Age, Gender] 1.0427 2.76% 0.8413 4.81%
28 graph w/[Age, Gender, Zip] 1.0427 2.75% 0.8408 4.87%
29 graph w/[Age, Occupation, Zip] 1.0427 2.75% 0.8408 4.86%
30 graph w/[Occupation, Zip] 1.0427 2.75% 0.8410 4.84%
31 graph w/[Occupation] 1.0428 2.75% 0.8414 4.80%
32 graph w/[Gender, Occupation, Zip] 1.0428 2.74% 0.8409 4.85%
33 graph w/[Gender, Zip] 1.0431 2.72% 0.8411 4.83%
34 graph w/[Gender] 1.0431 2.71% 0.8418 4.75%
35 graph w/[Gender, Occupation] 1.0432 2.70% 0.8417 4.77%
36 graph w/[Age, Gender, Occupation] 1.0433 2.70% 0.8417 4.77%
4 Basic Features 1.0722 N/A 0.8838 N/A
Table 14: Performance of selected features combinations - Movielens dataset (baseline combination in light gray, rows are sorted by RMSE).

The Movielens dataset offered even richer information about users and items and allowed for the extraction of 32 sub-graph schemes. Only a small sample of these is illustrated in Figure 19. The MAE and RMSE scores obtained for the 32 sub-graphs are listed in Table 14 and the significance test results are given in Figure 25. The sub-graphs are compared to the basic feature set and the union of all the graph features, which were presented in Table 10 (rows numbered 2 and 4). The rows corresponding to the various sub-graph representations are numbered 6 to 36. For the sake of clarity, the sub-graphs are denoted by the entity types included rather than excluded. For example, “graph w/[Age, Genre, Zip]” denotes the sub-graph with the ‘Age’, ‘Genre’, and ‘Zip’ entities, which is identical to the complete graph with the ‘Occupation’ and ‘Gender’ entities excluded. In Figure 25, the names of the included entities are further abbreviated, as detailed in the caption.

Figure 25: Significance of the differences between feature combinations in the Movielens dataset. White cells - significant, dark cells - not significant, p-value given. “G w/[]” denotes sub-graphs that contain the listed entity types, where: A=Age, Gndr=Gender, Gnr=Genre, O=Occupation, and Z=Zip.

The significance test shows that the ‘genre’ relationship in Movielens sub-graphs plays a similar role to the ‘category’ relationship in Yelp. Sub-graphs containing this relationship (rows 6 to 22) outperformed those where it was excluded (rows 23 to 36), and the differences between the groups are significant. A common link between the ‘category’ relationship in Yelp II and the ‘genre’ relationship in Movielens is that they both divide the item space – be it businesses or movies – into connected groups, which affects the values of the item features. In agreement with previous results, the feature set that unifies all the graph features from all the sub-graph schemes (“All Graph Features”, row 2) achieves the highest accuracy and outperforms any other feature set. Again, this is attributed to the broad coverage of the proposed feature extraction mechanism, which produces and aggregates promising feature combinations.

5.2.5 Summary

The purpose of this analysis was to examine the differences driven by the sub-graphs that are used for the feature extraction. To recap the results obtained using the four datasets, the following was established.

  • Features extracted from different graph schemes performed differently, not following a certain pattern tied to the entities or relationships included in the sub-graph. This means that it was not possible to conclude which relationships lead to better results if included in the graph. We posit that this is dataset-specific and may be affected by additional factors, such as density of a specific feature, distribution of its values, domain-specific considerations, and so forth. This finding comes through in the ‘category’ and ‘genre’ relationships in the Yelp II and Movielens datasets, but not in the Yelp I dataset. Notably, the social links had a major contribution in Dataset I, but not in the Yelp II dataset, possibly due to the sparsity of the latter.

  • Features extracted from the complete graph representations, i.e., those containing all the relationships and entities in the dataset, were not necessarily the best performing feature sets. A negative example can be seen in the basic features of the BL+F+T schema in Figure 23, which are dominated by the basic features of BL+F and BL+T alike. Having said that, the feature set that aggregated (that is, unified) all the graph features from all the sub-graph schemes performed the best in the other three scenarios in which it was evaluated: Yelp, Yelp II, and Movielens. We consider this to be a strong argument in favor of using the proposed approach, as its exhaustive nature allows it to cover a range of features and necessarily uncover the most informative ones, as well as their best combination.

The differences across the obtained results do not allow generalizing and determining a priori the best performing sub-graph and feature set. Due to this, the suggested approach of generating sub-graphs, populating features from each of them, and then aggregating the features in the feature sets is more likely to uncover the best performing feature combination. Note that this comes at the cost of computational overheads and potential scalability issues in large-scale datasets (discussed in detail in Section 6.1.2). We believe that future research may unveil rating patterns or characteristics of datasets, which may predict the contribution of certain sub-graphs, data entities, or even types of features.

6 Discussion and Conclusions

6.1 Discussion

The effectiveness of the graph-based approach for improving recommendations was demonstrated in the previous sections. It has been shown that precision and accuracy gains can be achieved by representing tabular data by graphs and extracting new features from them. This contrasts and complements prior approaches that improved recommendations by enhancing the recommendation techniques themselves. Also established are the benefits of the graph-based approach across recommendation domains, tasks, and metrics. These findings show that the graph representation exploits indirect latent links in the data, which lead to an improved recommendation accuracy. Finally, the approach is generic and it can be applied to many recommender system datasets.

The suggested process is automatic and can be run end-to-end, from data representation to feature extraction, without human intervention, unlike manual feature extraction methods, which are often time consuming and require domain expertise. Using the proposed graph-based approach, rich features, based on intricate relationships between various data entities and sub-graph scheme variations, can be systematically extracted from a dataset. This allows for a better coverage of the feature space with a considerably lower effort, as discussed in detail in Section 3. In the following sub-sections, the key limitations and challenges encountered in the experiments and case studies are discussed.

6.1.1 Overfitting

A possible concern is that the newly generated features may lead to overfitting. As long as the volume of available data greatly exceeds the number of extracted features, there is little risk that the features will be the cause of overfitting. The high diversity of unique data characteristics can hardly be captured in full by a smaller subset of features. Recommender system datasets tend to be in the medium to large scale (tens of thousands to millions of data points), while the number of features generated by the proposed approach is still in the scale of tens to hundreds.

Additionally, machine learning methods such as Random Forests have internal mechanisms for feature selection and can filter out features that overfit. They do so by training on a sample of the dataset and evaluating the performance of the features on the rest of the data. A feature that performs well on the sample but underperforms on the test data is ranked low. In the evaluations, cross validation was used with at least N=5 folds, showing that the models and features on which they are built are in fact generalizable. Moreover, it was shown that in cases of sparse data, which require a higher degree of generalization, the graph features still outperformed other features.
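The N-fold protocol mentioned above can be sketched as a simple index split. This is an assumption-laden illustration: it uses contiguous, unshuffled folds and a hypothetical helper name, whereas the exact splitting procedure of the experiments is not specified in this section.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) pairs for contiguous, unshuffled folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(12, n_folds=5))
for train, test in folds:
    print(len(train), test)
```

Each data point is held out exactly once, so the reported error is always measured on data that the corresponding model has not seen during training.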

6.1.2 Scalability

A possible disadvantage of the proposed approach is that some graph-based computations, e.g., PageRank, are iterative and may take a long time to converge. In the age of Big Data, recommender system datasets are getting large and this limitation may become a hurdle. The representation of such datasets results in large graphs, which makes the computation issue a bigger problem. A general approach for handling this issue in a deployed system would be to extract the graph-based features offline, say, on a nightly basis, and use the pre-computed values for real-time predictions. This may resolve the problem under the reasonable assumption that the values do not change substantially too frequently. Another means to overcome the computational latency is through using a distributed graph feature computation library. Such a library, e.g., Okapi, can use distributed tools in order to extract the graph features.
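To make the iterative nature of such computations concrete, the following is a minimal power-iteration sketch of PageRank in plain Python. It is not the implementation used in the experiments; the damping factor, tolerance, and toy graph are assumptions chosen for illustration. The returned iteration count shows why convergence time grows into a practical concern on large graphs, and why pre-computing the scores offline is attractive.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed adjacency map {node: [successors]}.
    Returns (scores, iterations_used)."""
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in adjacency.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank, iteration


# Hypothetical undirected user-item graph, stored with both edge directions.
toy_graph = {
    "u1": ["i1", "i2"],
    "u2": ["i2"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
}
scores, iterations = pagerank(toy_graph)
print(iterations, {node: round(score, 4) for node, score in sorted(scores.items())})
```

In a deployed system following the offline approach described above, `scores` would be stored after the nightly run and looked up at prediction time instead of being recomputed.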

Another factor that adds to the computational complexity of the approach is the exhaustive search for new features. It should be noted that the complexity of the process of generating every possible sub-graph and populating the matching feature combinations is exponential. The number of relationships in current recommendation datasets (as surveyed in Section 3.1) is still manageable, and can be accommodated by the proposed approach. However, in the future, with additional data sources being integrated for recommendation purposes, this might become unsustainable and will require a long-term solution. Two possible approaches for handling this issue are parallelization, e.g., each sub-graph being processed by a different machine, and heuristics for pruning less relevant sub-graph representations.
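To make the exponential growth concrete, the sketch below enumerates candidate sub-graph schemes under the simplifying assumption that each scheme corresponds to a non-empty subset of the relationship types in the dataset. The relationship names and the optional `max_size` cut-off (a crude stand-in for a pruning heuristic) are hypothetical and introduced only for this example.

```python
from itertools import combinations

def subgraph_schemes(relationships, max_size=None):
    """Yield every non-empty subset of relationship types.

    Without `max_size` there are 2**n - 1 subsets for n relationships.
    `max_size` caps the subset size and so prunes the search space.
    """
    limit = len(relationships) if max_size is None else max_size
    for size in range(1, min(limit, len(relationships)) + 1):
        for subset in combinations(relationships, size):
            yield subset

# Hypothetical relationship types of a music dataset.
relationships = ["user-item", "user-user", "item-genre", "item-artist"]
all_schemes = list(subgraph_schemes(relationships))
pruned_schemes = list(subgraph_schemes(relationships, max_size=2))
```

Each yielded scheme is independent of the others, so the schemes could be distributed across machines as suggested above.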

6.1.3 Initial Transition to the Graph Model

Another possible disadvantage of graph-based features is that human intervention may be needed when generating the initial complete graph. Non-categorical feature values, e.g., income or price, may generate a large number of vertices, which would lead to a low connectivity of the graph, since not many users or items would share the exact value of the feature. This would lead to a very sparse graph and will need to be addressed by a manual intervention by a domain expert, who can determine how the non-categorical values can be grouped and categorized, e.g., by creating appropriate income or price buckets. A naive solution for this might be to attempt to auto-categorize such features based on the observed distribution of their values, e.g., first quartile, second quartile, and so on. This may, however, mask the differences between fine-grained groups and cause information loss.
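The naive distribution-based categorization can be sketched as follows. The example relies only on the Python standard library; the function name, the bucket labels, and the income values are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_buckets(values, n_buckets=4):
    """Return a function that maps a numeric value to a bucket label.

    Cut points are taken from the observed distribution of `values`,
    so each bucket receives roughly the same number of observations.
    """
    cut_points = quantiles(values, n=n_buckets)  # n_buckets - 1 cut points

    def bucket(value):
        return f"q{bisect_right(cut_points, value) + 1}"

    return bucket

# Hypothetical income values: eight distinct values would otherwise
# become eight vertices that are each shared by a single user.
incomes = [10, 20, 30, 40, 50, 60, 70, 80]
to_bucket = quantile_buckets(incomes)
bucket_vertices = {to_bucket(x) for x in incomes}
```

With four buckets, the eight income values collapse into at most four vertices, which raises connectivity at the price of treating all values within a bucket as identical.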

Also to be acknowledged in this context is the historic human contribution that was required in order to conceive the graph methods exploited in this work for the generation of the various basic graph features: shortest path, degree, PageRank, etc. Indeed, these methods took a considerable amount of time and effort to evolve; however, they are reusable for generations and the overheads related to their development have been shared across many subsequent applications, while manually engineered features would usually not be highly reusable. Overall, when weighing the ease of generation, the quantity, and the possible contribution of the graph-based features to the accuracy of the generated recommendations against the above-mentioned disadvantages, it can be concluded that it is worthwhile to generate and populate such features when designing a recommendation engine.

6.2 Conclusions and Future Work

In this work, a new approach for improving recommendations was presented and evaluated. Unlike many previous works, which focused on addressing the recommendation problem by making improvements to the recommendation algorithms, the presented approach does so by suggesting a different way of looking at the dataset used for recommendation. It proposes representing the datasets using graphs, then systematically extracting and populating new features from those graphs, and feeding the new features into existing recommendation algorithms. New features and relationships that were not visible in the original tabular form can thus be uncovered. In this manner, applying this approach may complement classical recommendation approaches and further enhance them.

The methodology, implementation, and analysis of the approach were described in detail and the approach was evaluated from two main perspectives: overall contribution to recommendations and impact of various graph representations. The evaluation encompassed a number of datasets, recommendation tasks, and evaluation metrics. Furthermore, the datasets belonged to four application domains (movies, music, businesses, and personal interests) that in part included metadata and in part included social links. The recommendation tasks varied from binary link predictions to star rating predictions. A number of state-of-the-art classifiers and regressors were used for the generation of the predictions. All in all, the presented evaluations examined the impact of the graph representations and showed that the approach had a profound effect on the accuracy of the recommendations.

The graph-based representation and features were shown to lead to the generation of more accurate recommendations. Performance was shown to vary across the graph schemes, which justifies extracting them systematically. The approach presented was implemented in a library and is being provided as open source software for the community to use and build upon. Given such a library, the cost of generating additional features that can improve recommendations becomes substantially lower, in terms of computation time and effort. It can be adopted as a natural first resort, when given a dataset and recommendation task, or as a complementary aid to enhance the standard manual feature engineering.

The conducted evaluations demonstrated the potential of the proposed approach in improving the recommendations by exploiting the benefits of links between entities and characteristics of entities extracted from the graph representations. Therefore, this work lays the foundations for further exploring how graph-based features can enhance recommender systems and automatic feature engineering in the more general context. Several variables were investigated in this work but many more require additional attention. The following paragraphs identify several possible directions for future research.

  • Temporal Aspects. Given a dataset that includes dated actions that are not sparse, the time aspect can be used to build a different type of graph. Each graph will represent a snapshot in time and will either contain or exclude a link between vertices based on whether it was available in the dataset at that time. A combination of two temporally adjacent graphs will reflect the evolution of the data over that period of time. The main question in this setting is how such temporal graphs will affect the values of features extracted from them and how recommender systems that use these features will perform in their respective recommendation tasks.

  • Weighted and Labeled Graphs. Several features in a dataset can be used to populate the edge labels when constructing the graph-based representation. The labels, once set, can be taken into consideration in some of the graph features being extracted. One example would be to calculate a weighted PageRank score, in which the probability of jumping from a vertex to each of its neighbors is skewed according to the weight of the edge linking to that neighbor. This could lead to further improvement in the recommendations; however, it requires fine-tuning of initial weights on edges that do not naturally have them, e.g., social relationship edges in the dataset.

  • Directed Graphs. Similarly, in cases where the direction of the edges can be important, the process can be extended to include this aspect by generating additional graph representations, with various combinations of the edge directions. For example, in one variant, edges will be directed from the source vertex to the target vertex, in another, in the opposite direction, and in a third one there will be no direction. This will guarantee coverage in terms of expressing the direction of the edges, and the performance of the features in the various scenarios can be evaluated.
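As an illustration of the temporal direction listed above, the sketch below builds two snapshot graphs from dated edges and compares a simple feature (vertex degree) between them. It assumes cumulative snapshots, i.e., a link is present once its timestamp has passed, and integer timestamps; both choices, as well as the function names and the toy data, are assumptions made for this example rather than part of the evaluated approach.

```python
from collections import Counter

def snapshot_edges(dated_edges, until):
    """Edges available in the dataset up to and including time `until`."""
    return {(a, b) for a, b, t in dated_edges if t <= until}

def degree_feature(edges):
    """Degree of every vertex in an undirected edge set."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def degree_change(dated_edges, t_prev, t_next):
    """Change in degree between two temporally adjacent snapshots."""
    before = degree_feature(snapshot_edges(dated_edges, t_prev))
    after = degree_feature(snapshot_edges(dated_edges, t_next))
    # Vertices absent from the earlier snapshot count as degree 0.
    return {v: after[v] - before[v] for v in after}

# Toy data: (source vertex, target vertex, timestamp).
dated_edges = [("u1", "i1", 1), ("u1", "i2", 2), ("u2", "i2", 3)]
change = degree_change(dated_edges, t_prev=1, t_next=3)
```

The resulting per-vertex differences could themselves be used as additional features that describe how the data evolved over the period.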

The effects of these modifications on the scalability of the approach can be handled using the previously suggested methods, either by scaling the computations (e.g., computing the features of each subgraph in a separate process), or using distributed graph computation libraries, or identifying heuristics for pruning the feature and subgraph space.


  • Adomavicius and Tuzhilin (2005) Adomavicius, G. and Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749.
  • Amatriain et al. (2011) Amatriain, X., Jaimes, A., Oliver, N., and Pujol, J. M. (2011). Data mining methods for recommender systems. In Recommender Systems Handbook, pages 39–71. Springer.
  • Bae et al. (2015) Bae, D., Han, K., Park, J., and Yi, M. Y. (2015). Apptrends: A graph-based mobile app recommendation system using usage history. In 2015 International Conference on Big Data and Smart Computing, BIGCOMP 2015, Jeju, South Korea, February 9-11, 2015, pages 210–216.
  • Barrat et al. (2004) Barrat, A., Barthelemy, M., Pastor-Satorras, R., and Vespignani, A. (2004). The Architecture of Complex Weighted Networks. PNAS.
  • Bastian et al. (2009) Bastian, M., Heymann, S., Jacomy, M., et al. (2009). Gephi: an open source software for exploring and manipulating networks. ICWSM, 8:361–362.
  • Batarfi et al. (2015) Batarfi, O., Shawi, R. E., Fayoumi, A. G., Nouri, R., Beheshti, S., Barnawi, A., and Sakr, S. (2015). Large scale graph processing systems: survey and an experimental evaluation. Cluster Computing, 18(3):1189–1213.
  • Bellogín et al. (2013) Bellogín, A., Cantador, I., Díez, F., Castells, P., and Chavarriaga, E. (2013). An empirical comparison of social, collaborative filtering, and hybrid recommenders. ACM Transactions on Intelligent Systems and Technology (TIST), 4(1):14.
  • Bennett and Lanning (2007) Bennett, J. and Lanning, S. (2007). The netflix prize. In Proceedings of KDD cup and workshop, volume 2007, page 35.
  • Benzi et al. (2016) Benzi, K., Kalofolias, V., Bresson, X., and Vandergheynst, P. (2016). Song recommendation with non-negative matrix factorization and graph total variation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pages 2439–2443.
  • Berge and Minieka (1973) Berge, C. and Minieka, E. (1973). Graphs and hypergraphs, volume 7. North-Holland publishing company Amsterdam.
  • Berkovsky (2006) Berkovsky, S. (2006). Decentralized mediation of user models for a better personalization. In Adaptive Hypermedia and Adaptive Web-Based Systems, pages 404–408. Springer Berlin/Heidelberg.
  • Berkovsky et al. (2007) Berkovsky, S., Aroyo, L., Heckmann, D., Houben, G.-J., Kröner, A., Kuflik, T., and Ricci, F. (2007). Providing context-aware personalization through cross-context reasoning of user modeling data. UBIDEUM 2007.
  • Berkovsky et al. (2008) Berkovsky, S., Kuflik, T., and Ricci, F. (2008). Mediation of user models for enhanced personalization in recommender systems. User Modeling and User-Adapted Interaction, 18(3):245–286.
  • Blomo et al. (2013) Blomo, J., Ester, M., and Field, M. (2013). Recsys challenge 2013. In Proceedings of the 7th ACM conference on Recommender systems, pages 489–490. ACM.
  • Bohnert et al. (2008) Bohnert, F., Zukerman, I., Berkovsky, S., Baldwin, T., and Sonenberg, L. (2008). Using interest and transition models to predict visitor locations in museums. AI Communications, 21(2-3):195–202.
  • Borgatti and Halgin (2011) Borgatti, S. P. and Halgin, D. S. (2011). Analyzing Affiliation Networks. The Sage handbook of social network analysis.
  • Breiman (2001) Breiman, L. (2001). Random forests. Machine learning, 45(1):5–32.
  • Bu et al. (2010) Bu, J., Tan, S., Chen, C., Wang, C., Wu, H., Zhang, L., and He, X. (2010). Music recommendation by unified hypergraph: combining social media information and music content. In Proceedings of the international conference on Multimedia, pages 391–400. ACM.
  • Burke (2007) Burke, R. (2007). Hybrid web recommender systems. In The adaptive web, pages 377–408. Springer.
  • Cantador et al. (2011) Cantador, I., Brusilovsky, P., and Kuflik, T. (2011). Second workshop on information heterogeneity and fusion in recommender systems (hetrec 2011). In RecSys.
  • Cantador et al. (2015) Cantador, I., Fernández-Tobías, I., Berkovsky, S., and Cremonesi, P. (2015). Cross-domain recommender systems. In Recommender Systems Handbook.
  • Chianese and Piccialli (2016) Chianese, A. and Piccialli, F. (2016). A smart system to manage the context evolution in the cultural heritage domain. Computers & Electrical Engineering, pages –.
  • Coffman et al. (2004) Coffman, T., Greenblatt, S., and Marcus, S. (2004). Graph-based technologies for intelligence analysis. Communications of the ACM, 47(3):45–47.
  • Cordobés et al. (2015) Cordobés, H., Chiroque, L. F., Anta, A. F., Leiva, R. A. G., Morere, P., Ornella, L., Pérez, F., and Santos, A. (2015). Empirical comparison of graph-based recommendation engines for an apps ecosystem. IJIMAI, 3(2):33–39.
  • Costa et al. (2007) Costa, L. d. F., Rodrigues, F. A., Travieso, G., and Villas Boas, P. R. (2007). Characterization of complex networks: A survey of measurements. Advances in Physics, 56(1):167–242.
  • Csardi and Nepusz (2006) Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5).
  • Desrosiers and Karypis (2011) Desrosiers, C. and Karypis, G. (2011). A comprehensive survey of neighborhood-based recommendation methods. In Recommender systems handbook, pages 107–144. Springer.
  • Domingos (2012) Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87.
  • Due Trier et al. (1996) Due Trier, Ø., Jain, A. K., and Taxt, T. (1996). Feature extraction methods for character recognition-a survey. Pattern recognition, 29(4):641–662.
  • Floyd (1962) Floyd, R. W. (1962). Algorithm 97: shortest path. Communications of the ACM, 5(6):345.
  • Friedman (2000) Friedman, J. H. (2000). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189–1232.
  • Garcia and Tziritas (1999) Garcia, C. and Tziritas, G. (1999). Face detection using quantized skin color regions merging and wavelet packet analysis. Multimedia, IEEE Transactions on, 1(3):264–277.
  • Godoy and Corbellini (2016) Godoy, D. and Corbellini, A. (2016). Folksonomy-based recommender systems: A state-of-the-art review. Int. J. Intell. Syst., 31(4):314–346.
  • Gunn et al. (1998) Gunn, S. R. et al. (1998). Support vector machines for classification and regression. ISIS technical report, 14.
  • Guyon et al. (2006) Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. (2006). Feature extraction. Foundations and applications.
  • Hagberg et al. (2008) Hagberg, A., Swart, P., and S Chult, D. (2008). Exploring network structure, dynamics, and function using networkx. Technical report, LANL.
  • Hall et al. (2009) Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The weka data mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18.
  • Haupt (2009) Haupt, J. (2009). People-powered online radio. Music Reference Services Quarterly, 12(1-2):23–24.
  • Hong and Jung (2016) Hong, M. and Jung, J. J. (2016). Mymoviehistory: Social recommender system by discovering social affinities among users. Cybernetics and Systems, 47(1-2):88–110.
  • Jahrer et al. (2010) Jahrer, M., Töscher, A., and Legenstein, R. (2010). Combining predictions for accurate recommender systems. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 693–702. ACM.
  • Jäschke et al. (2007) Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., and Stumme, G. (2007). Tag recommendations in folksonomies. In Knowledge Discovery in Databases: PKDD 2007, pages 506–514. Springer.
  • Jiang et al. (2016) Jiang, W., Wang, G., Bhuiyan, M. Z. A., and Wu, J. (2016). Understanding graph-based trust evaluation in online social networks: Methodologies and challenges. ACM Comput. Surv., 49(1):1–35.
  • Kautz et al. (1997) Kautz, H., Selman, B., and Shah, M. (1997). Referral web: combining social networks and collaborative filtering. Communications of the ACM, 40(3):63–65.
  • Klema and Laub (1980) Klema, V. and Laub, A. J. (1980). The singular value decomposition: Its computation and some applications. Automatic Control, IEEE Transactions on, 25(2):164–176.
  • Kobsa (2001) Kobsa, A. (2001). Generic user modeling systems. User modeling and user-adapted interaction, 11(1-2):49–63.
  • Kohavi et al. (1995) Kohavi, R. et al. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, volume 14, pages 1137–1145.
  • Kohavi and John (1997) Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial intelligence, 97(1):273–324.
  • Konstas et al. (2009) Konstas, I., Stathopoulos, V., and Jose, J. M. (2009). On social networks and collaborative recommendation. In SIGIR, pages 195–202.
  • Koren (2009) Koren, Y. (2009). The bellkor solution to the netflix grand prize. Netflix prize documentation, 81.
  • Koren et al. (2009) Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8):30–37.
  • Kotsiantis et al. (2006) Kotsiantis, S., Kanellopoulos, D., and Pintelas, P. (2006). Data preprocessing for supervised leaning. International Journal of Computer Science, 1(2):111–117.
  • Lam and Herlocker (2012) Lam, S. and Herlocker, J. (2012). Movielens 1m dataset.
  • Latapy et al. (2008) Latapy, M., Magnien, C., and Vecchio, N. D. (2008). Basic notions for the analysis of large two-mode networks. Social Networks.
  • Lee and Lee (2015) Lee, K. and Lee, K. (2015). Escaping your comfort zone: A graph-based recommender system for finding novel recommendations among relevant items. Expert Syst. Appl., 42(10):4851–4858.
  • Lee et al. (2015) Lee, S., Kim, S., and Park, S. (2015). A graph-based recommendation framework for price-comparison services. In Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015 - Companion Volume, pages 59–60.
  • Lee et al. (2013) Lee, S., Park, S., Kahng, M., and Lee, S. (2013). Pathrank: Ranking nodes on a heterogeneous graph for flexible hybrid recommender systems. Expert Syst. Appl., 40(2):684–697.
  • Lewis et al. (2008) Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., and Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using facebook. com. Social networks, 30(4):330–342.
  • Li and Chen (2009) Li, X. and Chen, H. (2009). Recommendation as link prediction: a graph kernel-based machine learning approach. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pages 213–216. ACM.
  • Lösch et al. (2012) Lösch, U., Bloehdorn, S., and Rettinger, A. (2012). Graph kernels for rdf data. In The Semantic Web: Research and Applications, pages 134–148. Springer.
  • Ma et al. (2009) Ma, H., King, I., and Lyu, M. R. (2009). Learning to recommend with social trust ensemble. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 203–210. ACM.
  • Mao et al. (2016) Mao, K., Chen, G., Hu, Y., and Zhang, L. (2016). Music recommendation using graph based quality model. Signal Processing, 120:806–813.
  • Markovitch and Rosenstein (2002) Markovitch, S. and Rosenstein, D. (2002). Feature generation using general constructor functions. Machine Learning, 49(1):59–98.
  • Massa and Avesani (2007) Massa, P. and Avesani, P. (2007). Trust-aware recommender systems. In Proceedings of the 2007 ACM conference on Recommender systems, pages 17–24. ACM.
  • Mobasher (2007) Mobasher, B. (2007). Data mining for web personalization. In The Adaptive Web, Methods and Strategies of Web Personalization, pages 90–135.
  • Moradi et al. (2015) Moradi, P., Ahmadian, S., and Akhlaghian, F. (2015). An effective trust-based recommendation method using a novel graph clustering algorithm. Physica A: Statistical Mechanics and its Applications, 436:462 – 481.
  • Nixon (2008) Nixon, M. (2008). Feature extraction & image processing. Academic Press.
  • O’Donovan and Smyth (2005) O’Donovan, J. and Smyth, B. (2005). Trust in recommender systems. In Proceedings of the 10th international conference on Intelligent user interfaces, pages 167–174. ACM.
  • Ostuni et al. (2015) Ostuni, V. C., Noia, T. D., Sciascio, E. D., Oramas, S., and Serra, X. (2015). A semantic hybrid approach for sound recommendation. In Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015 - Companion Volume, pages 85–86.
  • Page et al. (1999) Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab.
  • Park et al. (2015) Park, Y., Park, S., Jung, W., and Lee, S. (2015). Reversed CF: A fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Syst. Appl., 42(8):4022–4028.
  • Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. The Journal of Machine Learning Research, 12:2825–2830.
  • Perugini et al. (2004) Perugini, S., Gonçalves, M. A., and Fox, E. A. (2004). Recommender systems research: A connection-centric survey. Journal of Intelligent Information Systems, 23(2):107–143.
  • Pham et al. (2015) Pham, T. N., Li, X., Cong, G., and Zhang, Z. (2015). A general graph-based model for recommendation in event-based social networks. In 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, pages 567–578.
  • Portilla et al. (2015) Portilla, Y., Reiffers, A., Altman, E., and Azouzi, R. E. (2015). A study of youtube recommendation graph based on measurements and stochastic tools. In 8th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2015, Limassol, Cyprus, December 7-10, 2015, pages 430–435.
  • Quercia et al. (2014) Quercia, D., Schifanella, R., and Aiello, L. M. (2014). The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT ’14, pages 116–125. ACM.
  • Ricci et al. (2011) Ricci, F., Rokach, L., and Shapira, B. (2011). Introduction to recommender systems handbook. Springer.
  • Said et al. (2011) Said, A., Berkovsky, S., and De Luca, E. W. (2011). Group recommendation in context. In Proceedings of the 2nd Challenge on Context-Aware Movie Recommendation, pages 2–4. ACM.
  • Schafer et al. (1999) Schafer, J. B., Konstan, J., and Riedl, J. (1999). Recommender systems in e-commerce. In Proceedings of the 1st ACM conference on Electronic commerce, pages 158–166. ACM.
  • Schwartz and Wood (1993) Schwartz, M. F. and Wood, D. (1993). Discovering shared interests using graph analysis. Communications of the ACM, 36(8):78–89.
  • Scott and Matwin (1999) Scott, S. and Matwin, S. (1999). Feature engineering for text classification. In ICML, volume 99, pages 379–388.
  • Shams and Haratizadeh (2016) Shams, B. and Haratizadeh, S. (2016). Graph-based collaborative ranking. CoRR, abs/1604.03147.
  • Shani and Gunawardana (2011) Shani, G. and Gunawardana, A. (2011). Evaluating recommendation systems. In Recommender Systems Handbook, pages 257–297. Springer.
  • Shen et al. (2016) Shen, J., Deng, C., and Gao, X. (2016). Attraction recommendation: Towards personalized tourism via collective intelligence. Neurocomputing, 173:789–798.
  • Tan et al. (2011) Tan, S., Bu, J., Chen, C., and He, X. (2011). Using rich social media information for music recommendation via hypergraph model. In Social media modeling and computing, pages 213–237. Springer.
  • Tiroshi et al. (2013) Tiroshi, A., Berkovsky, S., Kaafar, M. A., Chen, T., and Kuflik, T. (2013). Cross social networks interests predictions based on graph features. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, pages 319–322. ACM.
  • Tiroshi et al. (2014a) Tiroshi, A., Berkovsky, S., Kaafar, M. A., Vallet, D., Chen, T., and Kuflik, T. (2014a). Improving business rating predictions using graph based features. In Proceedings of the 19th international conference on Intelligent User Interfaces, pages 17–26. ACM.
  • Tiroshi et al. (2014b) Tiroshi, A., Berkovsky, S., Kaafar, M. A., Vallet, D., and Kuflik, T. (2014b). Graph-based recommendations: Make the most out of social data. In User Modeling, Adaptation, and Personalization, pages 447–458. Springer.
  • Töscher et al. (2009) Töscher, A., Jahrer, M., and Bell, R. M. (2009). The bigchaos solution to the netflix grand prize. Netflix prize documentation.
  • Vahedian et al. (2016) Vahedian, F., Burke, R. D., and Mobasher, B. (2016). Meta-path selection for extended multi-relational matrix factorization. In Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2016, Key Largo, Florida, May 16-18, 2016., pages 566–571.
  • Wasserman (1994) Wasserman, S. (1994). Social network analysis: Methods and applications, volume 8. Cambridge university press.
  • West et al. (2001) West, D. B. et al. (2001). Introduction to graph theory, volume 2. Prentice hall Upper Saddle River.
  • Wilson et al. (2009) Wilson, C., Boe, B., Sala, A., Puttaswamy, K. P., and Zhao, B. Y. (2009). User interactions in social networks and their implications. In Proceedings of the 4th ACM European conference on Computer systems, pages 205–218. Acm.
  • Wold et al. (1987) Wold, S., Esbensen, K., and Geladi, P. (1987). Principal component analysis. Chemometrics and intelligent laboratory systems, 2(1):37–52.
  • Wu et al. (2015) Wu, H., Yue, K., Liu, X., Pei, Y., and Li, B. (2015). Context-aware recommendation via graph-based contextual modeling and postfiltering. Int. J. Distrib. Sen. Netw., 2015:16:16–16:16.
  • Yu et al. (2010) Yu, H.-F., Lo, H.-Y., Hsieh, H.-P., Lou, J.-K., McKenzie, T. G., Chou, J.-W., Chung, P.-H., Ho, C.-H., Chang, C.-F., Wei, Y.-H., et al. (2010). Feature engineering and classifier ensemble for kdd cup 2010.
  • Zoidi et al. (2015) Zoidi, O., Fotiadou, E., Nikolaidis, N., and Pitas, I. (2015). Graph-based label propagation in digital media: A review. ACM Comput. Surv., 47(3):48.
  • Zukerman and Albrecht (2001) Zukerman, I. and Albrecht, D. W. (2001). Predictive statistical models for user modeling. User Modeling and User-Adapted Interaction, 11(1-2):5–18.