Algorithm Selection for Collaborative Filtering: the influence of graph metafeatures and multicriteria metatargets

07/23/2018 ∙ by Tiago Cunha, et al. ∙ Universidade de São Paulo ∙ Universidade do Porto

Selecting the best algorithm for a new problem is an expensive and difficult task. However, there are automatic solutions to address it: using Metalearning, which takes advantage of problem characteristics (i.e. metafeatures), one is able to predict the relative performance of algorithms. In the Collaborative Filtering scope, recent works have proposed diverse metafeatures describing several dimensions of this problem. Despite interesting and effective findings, it is still unknown whether these are the most effective metafeatures. Hence, this work proposes a new set of graph metafeatures, which approach the Collaborative Filtering problem from a Graph Theory perspective. Furthermore, in order to understand whether metafeatures from multiple dimensions are a better fit, we investigate the effects of comprehensive metafeatures, i.e. a selection of the best metafeatures from all existing Collaborative Filtering metafeatures. The impact of the most representative metafeatures is investigated in a controlled experimental setup. Another contribution is the use of a Pareto-Efficient ranking procedure to create multicriteria metatargets. These new rankings of algorithms, which take into account multiple evaluation measures, allow us to explore the algorithm selection problem in a fairer and more detailed way. According to the experimental results, the graph metafeatures are a good alternative to related-work metafeatures. However, the results also show that the feature selection procedure used to create the comprehensive metafeatures is not effective, since it yields no gain in predictive performance. Finally, an extensive metaknowledge analysis was conducted to identify the most influential metafeatures.




1 Introduction

The algorithm selection problem has been frequently addressed with Metalearning approaches (MtL) (Vilalta1999; Hilario2000; Brazdil2003; Prudencio2004; Smith-Miles2008a; Gomes2012; Lemke2013; Rossi2014). This technique finds the mapping between problem-specific characteristics (i.e. metafeatures) and the relative performance of learning algorithms (i.e. metatarget) (Brazdil:2008:MAD:1507541). This mapping, provided as a Machine Learning model (i.e. metamodel), can then be used to predict the best algorithms for a new problem. This task is organised into a baselevel and a metalevel. The baselevel refers to the learning task for which recommendations of algorithms are made, which, in this case, is Collaborative Filtering (CF). The metalevel refers to the learning task which studies the mapping between metafeatures and algorithm performance. In this work, the metalevel is addressed as a Label Ranking task.

Several algorithm selection approaches have been recently proposed for CF (Adomavicius2012; Ekstrand2012; Griffith2012; Matuszyk2014; Cunha2016; Cunha2017; Cunha2018128). In spite of their contribution to important advances in the area, there are still limitations that need to be addressed. These limitations are mainly related to the metafeatures and the metatarget, which are the focus of this work.

The main limitation regarding metafeatures is that most approaches only describe the recommendation problem using descriptive characteristics of the rating matrix and estimates of performance on samples (i.e. landmarkers), overlooking a wide spectrum of other possibilities. Furthermore, existing papers typically perform a limited comparison between the proposed metafeatures and the ones proposed in other studies. Additionally, there is a lack of studies combining metafeatures from multiple domains in a single collection and validating their individual and combined merits in the same experimental setup.

Regarding the metatarget, the limitation lies mainly in the fact that the best algorithms per dataset are selected using only one evaluation measure at a time. Hence, to perform algorithm selection according to additional measures, it is necessary to replicate the experimental procedure for each measure, represented as a different metatarget. Beyond the efficiency issues, this process is not ideal, since it leads to limited and measure-dependent metaknowledge. Hence, an alternative must be found, ideally one which allows a multitude of evaluation measures to be used simultaneously.

This work proposes solutions for these limitations. These solutions are evaluated in a comprehensive experimental study and provide the following novel contributions to the problem of CF algorithm selection:

  • Graph metafeatures: By modelling the CF problem as a bipartite graph, one is able to use an alternative way to describe the relationships between users and items. For such, this work proposes metafeatures based on Graph Theory (west2001introduction; godsil2013algebraic) and adopts aspects from systematic and hierarchical metafeature extraction processes (Cunha2016; Cunha:2017:MCF:3109859.3109899).

  • Comprehensive metafeatures: A set of metafeatures obtained by taking advantage of metafeatures from multiple domains: Rating Matrix (Cunha2016), Landmarkers (Cunha2017) and Graph metafeatures.

  • Multicriteria metatarget: The metatarget is obtained by aggregating the rankings of algorithms produced by multiple evaluation measures. We adapt Pareto-Efficient rankings (Ribeiro2013) to CF algorithm selection.

This document is organised as follows: Section 2 introduces the related work on CF, MtL and algorithm selection for CF, while Sections 3 and 4 present the main contributions, respectively. In Section 5, the empirical setup is presented, and Sections 6 and 7 discuss the preliminary analysis and the empirical results, respectively. Section 8 presents the conclusions and directions for future work.

2 Related Work

2.1 Collaborative Filtering

CF recommendations are based on the premise that a user will probably like the items favoured by a similar user. Thus, CF employs the feedback from each individual user to recommend items to similar users (Yang2014). The feedback is a numeric value, proportional to the user’s interest in an item. Most feedback is based on a rating scale, although variants, such as like/dislike binary responses, are also employed. The data structure is a rating matrix $R$, usually described as $R^{|U| \times |I|}$, representing a set of users $U$ and a set of items $I$. Each element of this matrix is the feedback provided by each user for each item.

CF algorithms can be organised into memory- and model-based (Bobadilla2013a). Memory-based algorithms apply heuristics to a rating matrix to extract recommendations, whereas model-based algorithms induce a predictive model from this matrix, which can later be used for future recommendations. Most memory-based algorithms adopt Nearest Neighbour strategies, while model-based algorithms are mostly based on Matrix Factorization methods (Yang2014). Further discussion regarding CF algorithms is available elsewhere (Yang2014).

The evaluation of Recommender Systems (RSs) is usually performed by procedures that split the dataset into training and test subsets (using sampling strategies, such as k-fold cross-validation (Herlocker2004)) and assess the performance of the induced model on the test dataset. Different evaluation metrics can be used (Lu2012). The evaluation measures used depend on the type of prediction: for ratings of the items, error measures like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) are used; for binary relevance, Precision/Recall or Area Under the Curve (AUC) are used; finally, a common measure for rankings of items is the Normalised Discounted Cumulative Gain (NDCG).
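As a rough illustration of two of these measures, the following sketch implements RMSE (rating prediction) and NDCG@k (item ranking) in plain NumPy; the function names and toy inputs are ours, not from the paper:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error for rating prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def ndcg_at_k(relevances, k):
    """NDCG@k: `relevances` holds the true relevance of each item,
    ordered by the predicted ranking."""
    rel = np.asarray(relevances, float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    # Ideal DCG: same relevances, sorted from most to least relevant.
    ideal = np.sort(np.asarray(relevances, float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[:ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list yields NDCG of 1.0, while an all-zero relevance list yields 0.0 by convention.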

2.2 Algorithm selection using Metalearning

The algorithm selection problem was conceptualised in 1976 by Rice (DBLP:journals/ac/Rice76). It involves the following search spaces: the problem space $P$, the feature space $F$, the algorithm space $A$ and the performance space $Y$. These refer, respectively, to problem instances, features, algorithms and evaluation measures. The problem is formulated as: for a given instance $x \in P$, with features $f(x) \in F$, find the selection mapping $S(f(x))$ into space $A$, such that the selected algorithm $\alpha \in A$ maximises the performance mapping $y(\alpha(x)) \in Y$ (DBLP:journals/ac/Rice76).

One of the main challenges in MtL is to define which metafeatures effectively describe how a problem matches the bias of an algorithm (Brazdil:2008:MAD:1507541). The MtL literature often divides the metafeatures into three main groups (Serban2013; Vanschoren2010):

  • Statistical and/or information-theoretical metafeatures describe the dataset using a set of measures from statistics and information theory. Examples include simple measures, like the number of examples and features, as well as more advanced measures, like entropy, skewness and kurtosis of features, and even mutual information and correlation between features;

  • Model-based metafeatures are made of properties extracted from models induced from a dataset. As an example, if a decision tree induction algorithm is applied to a dataset, one model-based metafeature could be the number of leaf nodes in the decision tree. The rationale is that there is a relationship between model characteristics and algorithm performance that cannot be directly captured from the dataset;

  • Landmarkers are fast estimates of the performance of an algorithm on a given dataset. Since these estimates are used as metafeatures, it is important that they are computationally much faster than applying the algorithm to the dataset (e.g. using hold-out to estimate performance). Two different types of landmarkers can be obtained by: 1) applying fast and/or simplified algorithms to complete datasets (e.g. a decision stump can be regarded as a simplified version of a decision tree); 2) applying conventional algorithms to a sample extracted from a dataset, also known as subsampling landmarkers (Brazdil:2008:MAD:1507541) (e.g. applying a decision tree induction algorithm to a sample extracted from a dataset).

Recently, a systematic metafeature framework (Pinto2016) has been proposed to simplify the process of designing metafeatures for a MtL task. The framework requires three main elements: objects (e.g. numeric variables), functions (e.g. correlation) and post-functions (e.g. average). In order to generate a single metafeature value, the metafeature extraction procedure applies each function to all possible sets of compatible objects (e.g., correlation between every pair of numeric variables). This yields multiple values, and a post-function is applied to those values to obtain a metafeature. The metafeatures created using this framework are represented as a triple $\{o, f, pf\}$ of object, function and post-function. One important property of this framework is recursiveness: the outcome of an inner level (IL) application of the framework can be used as the result of an outer level (OL) function. Formally: $\{o_{OL},\; f_{OL} = \{o_{IL}, f_{IL}, pf_{IL}\},\; pf_{OL}\}$.
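A minimal sketch of how such a triple could be evaluated, assuming numeric variables as objects, correlation as the function and the average as the post-function; all names here are illustrative, not the framework's actual API:

```python
import numpy as np
from itertools import combinations

def systematic_metafeature(objects, function, post_function):
    """Apply `function` to every compatible pair of objects, then
    aggregate the resulting values with `post_function`."""
    values = [function(a, b) for a, b in combinations(objects, 2)]
    return post_function(values)

# Example: average pairwise correlation between four numeric variables.
rng = np.random.default_rng(0)
data = [rng.normal(size=50) for _ in range(4)]
corr = lambda a, b: float(np.corrcoef(a, b)[0, 1])
mf = systematic_metafeature(data, corr, np.mean)
```

The recursive property corresponds to passing the output of one `systematic_metafeature` call as an object (or function result) of an outer call.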

2.3 Metalearning for Collaborative Filtering

Recently, a few MtL approaches were proposed for the problem of selecting CF algorithms. Two types of metafeatures have been used for that purpose: statistical and/or information-theoretical metafeatures and subsampling landmarkers. In the rest of this document, we will use the terms metafeature and meta-approach to refer to the descriptors and MtL approaches, respectively.

Statistical and/or information-theoretical

Existing studies have made arbitrary choices in the development of metafeatures (Adomavicius2012; Ekstrand2012; Griffith2012; Matuszyk2014). A systematic approach to the design of metafeatures for CF was proposed recently (Cunha2016). These metafeatures describe the rating matrix using the systematic framework summarised earlier. The approach analyses an extensive combination of a set of objects (the rating matrix $R$ and its rows $U$ and columns $I$), a set of functions (original ratings (ratings), number of ratings (count), mean rating value (mean) and sum of ratings (sum)) and a set of post-functions (maximum, minimum, mean, standard deviation, median, mode, entropy, Gini index, skewness and kurtosis). This class of metafeatures will be identified as RM from this point onward.

Subsampling Landmarkers

A single approach uses this type of metafeatures in the CF scope (Cunha2017). These data characteristics are obtained by assessing the performance of the CF algorithms on random samples of the datasets. These estimates are combined to create different metafeatures. Performance is estimated using different evaluation measures, which leads to a set of metafeatures for each measure. Although that work studied different landmarking perspectives (i.e. relative landmarkers (Furnkranz2002)), which manipulate the values in different ways in order to properly explore the problem, no significant gain in performance was obtained. Therefore, this work simply considers the performance values as metafeatures. The format used to describe these metafeatures is algorithm.evaluation measure. This class of metafeatures is referred to as SL in the remainder of this document.

3 Graph metafeatures

Since CF’s rating matrix can be regarded as a (weighted) adjacency matrix, a CF problem can be represented as a graph. It is our belief that the extraction of new metafeatures using this graph representation can provide new information not captured by other meta-approaches. Among other benefits, it allows us not only to model but also to describe the problem in more detail. Thus, the main motivations for this new approach are two-fold:

  • Data structure compatibility: The rating matrix data can be correctly described using a bipartite graph. For such, it can be assumed that rows and columns refer to independent sets of nodes and that the feedback values stored within the matrix are represented as edges between nodes.

  • Neighbourhood characterisation: Metafeatures that characterise users in terms of their neighbourhood have been used before in algorithm selection for CF (Griffith2012). That approach is capable of creating user-specific metafeatures, which describe a user by statistics of its neighbours. However, it is not able to generate metafeatures which represent all neighbourhoods in a dataset. If the problem is represented by a graph, extracting complex neighbourhood statistics becomes easy.

As a result, this study models the problem as a bipartite graph $G = (U, I, E)$, whose nodes $U$ and $I$ represent users and items, respectively. The set of edges $E$ connects elements of the two groups and represents the feedback of users regarding items. The edges can be weighted, hence representing preference values (i.e. ratings). Figure 1 presents an example with the two representations for the same CF problem.

(a) Rating Matrix. (b) Bipartite Graph.

Figure 1: Toy example for two different CF representations.
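A toy version of this transformation, using NetworkX to turn a small rating matrix into a weighted bipartite graph; the matrix values and node labels are invented for illustration:

```python
import networkx as nx

# Toy rating matrix: rows = users, columns = items; 0 means no rating.
ratings = [
    [5, 0, 3],
    [0, 4, 0],
    [1, 2, 0],
]

G = nx.Graph()
users = [f"u{r}" for r in range(len(ratings))]
items = [f"i{c}" for c in range(len(ratings[0]))]
G.add_nodes_from(users, bipartite=0)   # user partition
G.add_nodes_from(items, bipartite=1)   # item partition
for r, row in enumerate(ratings):
    for c, value in enumerate(row):
        if value:  # each rating becomes a weighted user-item edge
            G.add_edge(f"u{r}", f"i{c}", weight=value)
```

Every non-zero matrix entry becomes one weighted edge, so the graph carries exactly the same information as the rating matrix.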

The proposed graph meta-approach is based on Graph Theory (west2001introduction; godsil2013algebraic). Although the literature provides several functions for graph characterisation that could be used for this purpose, they have a major limitation: they describe the graph at a high level, which limits the information that can be extracted. For instance, the amount of information available in measures such as the number of nodes or edges is limited for our purpose.

To deal with this limitation, we use the systematic metafeature extraction (Pinto2016) and hierarchical decomposition of complex data structures (Cunha:2017:MCF:3109859.3109899) approaches for metafeature design. It is important to notice that, since we were unable to find any graph-based metafeatures in the literature, we adopted an exploratory approach: we use as many graph characterisation features as possible and then try to identify which ones are informative. We propose metafeatures extracted from graphs at different levels:

  • Graph-level properties that describe the graph in a high level perspective;

  • Node-level characteristics relating nodes through their edges relationships;

  • Pairwise-level properties obtained by node pairwise comparisons;

  • Subgraph-level characteristics summarising relevant subgraphs.

3.1 Graph-level

When trying to propose metafeatures for a complex structure, it is common to consider high-level characteristics first. Although in the context of algorithm selection this is typically not effective (Cunha:2017:MCF:3109859.3109899), it is nevertheless important to verify it. Hence, at this level, only one object is considered for metafeature extraction: the whole bipartite graph $G$, which can be directly characterised through several Graph Theory measures (west2001introduction; godsil2013algebraic). This work selects a subset of potentially useful characteristics to be used as metafeatures: density, girth, number of nodes, number of edges and radius.

The functions refer, respectively, to the ratio of the number of existing edges over the number of possible edges, the length of the shortest cycle, the number of nodes, the number of edges and the smallest maximum distance between the farthest nodes of the graph. The formalisation of these functions lies outside the scope of this work; the interested reader may find more information in the graph theory literature (west2001introduction; godsil2013algebraic). Since these functions return a single value, no metafeature at this level requires post-processing, which is represented by the symbol $\varnothing$.
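These graph-level measures can be sketched with NetworkX as follows. Note that the girth is approximated here via the shortest cycle in a cycle basis, which matches the true girth only in simple cases; the function name is ours:

```python
import networkx as nx

def graph_level_metafeatures(G):
    """Graph-level descriptors named in the text, assuming a connected
    undirected graph. Girth is approximated via the cycle basis."""
    cycles = nx.cycle_basis(G)
    girth = min((len(c) for c in cycles), default=float("inf"))
    return {
        "density": nx.density(G),      # existing edges / possible edges
        "girth": girth,                # length of the shortest cycle found
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "radius": nx.radius(G),        # smallest eccentricity over all nodes
    }
```

Since each function returns a single scalar, no post-processing is needed, matching the $\varnothing$ post-function above.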

3.2 Node-level

In this level, we argue that, since nodes represent the main entities in the graph, it is potentially beneficial to extract characteristics which represent them and their edges from a global perspective. Specifically in this case, where two clearly defined sets of nodes exist (i.e. users and items), it is important to find suitable characteristics for each one. If one is able to properly characterise the users through their relationships to items (and vice-versa), then hopefully we will be able to find metafeatures that represent the way CF operates: new items are recommended based on the preferences of users with similar tastes.

Hence, Node-level metafeatures use three different objects: the graph $G$, the set of users $U$ and the set of items $I$. These consider the entire graph and each subset independently. This separation of concepts allows a more extensive analysis, helping to understand whether the different subsets of nodes hold different degrees of importance for the MtL problem. For instance, finding that metafeatures related to the users are not informative would provide interesting insights into the algorithm selection problem; if we considered all nodes together, we would be unable to perform such an analysis. The functions used at this level describe the nodes through their edge relationships. We select a wide variety of functions suitable to describe bipartite graphs:

  • Alpha centrality: Bonacich’s alpha centrality (BONACICH2001191);

  • Authority score: Kleinberg’s authority score (Kleinberg:1999:ASH:324133.324140);

  • Closeness centrality: the inverse of the average length of the shortest paths to/from all the other nodes in the graph;

  • Constraint: Burt’s constraint score (doi:10.1086/421787);

  • Coreness: the coreness of a node is $k$ if it belongs to the $k$-core (the maximal subgraph in which each node has degree at least $k$) but not to the $(k+1)$-core;

  • Degree: the number of adjacent edges;

  • Diversity: the Shannon entropy of the weights of a node’s incident edges;

  • Eccentricity: the length of the shortest path to the node farthest from the given node;

  • Eigenvector Centrality score: the values of the first eigenvector of the adjacency matrix;

  • Hub score: Kleinberg’s hub centrality score (Kleinberg:1999:ASH:324133.324140);

  • KNN: average nearest neighbour degree;

  • Neighbours: the number of adjacent nodes in the graph;

  • Local Scan: average edge weights;

  • PageRank: Google’s PageRank score per node;

  • Strength: sum of adjacent edges weights.

Since the application of these functions to the nodes of a graph returns a set of values, these values must be aggregated into a single value per metafeature. To do so, this work employs post-processing functions which return the following single values: mean, variance, skewness and entropy. These functions, based on statistical univariate analysis (central tendency, dispersion and shape) and Information Theory, have performed well in other recommendation metafeatures (Cunha:2017:MCF:3109859.3109899). These metafeatures can be formally described as: $\{o \in \{G, U, I\},\; f \in \{\text{alpha centrality}, \dots, \text{strength}\},\; pf \in \{\text{mean, variance, skewness, entropy}\}\}$.
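The node-level extraction could be sketched as follows, using the node degree as a stand-in for the richer functions listed above and plain NumPy for the four post-functions; function and variable names are ours:

```python
import networkx as nx
import numpy as np

def post_process(values):
    """The four post-functions from the text: mean, variance, skewness
    and Shannon entropy, computed with plain NumPy."""
    v = np.asarray(values, dtype=float)
    mu, var = v.mean(), v.var()
    std = np.sqrt(var)
    skew = float(np.mean(((v - mu) / std) ** 3)) if std > 0 else 0.0
    counts = np.unique(v, return_counts=True)[1]
    p = counts / counts.sum()
    ent = float(-(p * np.log2(p)).sum())
    return {"mean": float(mu), "variance": float(var),
            "skewness": skew, "entropy": ent}

def node_level_metafeatures(G, nodes, func=None):
    """Apply a node function (degree by default) to a node subset
    (G, U or I) and aggregate with the post-functions."""
    func = func or (lambda n: G.degree(n))
    return post_process([func(n) for n in nodes])
```

Running this once per object ($G$, $U$ and $I$) and per node function yields the full node-level metafeature set.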
3.3 Pairwise-level

Having exhausted the ability to characterise nodes by their explicit edge relationships, one must find alternative ways to explore implicit patterns. A methodology which proved successful in other algorithm selection domains (Cunha:2017:MCF:3109859.3109899) performs pairwise comparisons of simpler elements in the complex data structure and aggregates their values to create a global score characterising the entire structure. These comparisons allow us to understand whether there are important relationships among said elements which represent overall patterns.

Hence, the pairwise metafeatures designed in this level are based on the comparison of all pairs of nodes. Due to the complexity of the data structure, the pairwise-level defines two layers, inner (IL) and outer (OL), which we present next.

3.3.1 Inner Layer (IL)

The IL, responsible for node comparison, applies pairwise comparison functions to all pairs of nodes $(n_i, n_j)$. The output is stored in the corresponding row and column of an IL matrix, used to keep records of pairwise comparisons. Figure 2 presents this data structure, with rows and columns referring to the same set of nodes.

Figure 2: IL matrix for all pairs of nodes.

The functions used to perform pairwise comparisons are:

  • Similarity: the number of common neighbours divided by the number of nodes that are neighbours of at least one of the two nodes being considered;

  • Distance: length of the shortest paths between nodes.

The post-processing functions used in this layer are matrix post-processing functions: sum, mean, count and variance, applied to each matrix row (alternatively, given the symmetry of the IL matrix, they could be applied to each column). The output is a set of summarised comparison values for each function. These values are submitted to the OL to obtain the final metafeatures.
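A small sketch of the IL matrix construction and its row-wise post-processing, using the similarity function described above (a Jaccard-style neighbour overlap); names are illustrative:

```python
import networkx as nx
import numpy as np

def jaccard(G, a, b):
    """Similarity from the text: common neighbours divided by the
    nodes that are neighbours of at least one of the two nodes."""
    na, nb = set(G[a]), set(G[b])
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

def il_matrix(G, nodes, compare):
    """Build the IL matrix of pairwise comparison scores."""
    n = len(nodes)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = compare(G, nodes[i], nodes[j])
    return M

def row_postprocess(M):
    """Matrix post-processing: sum, mean, count and variance per row."""
    return {"sum": M.sum(axis=1), "mean": M.mean(axis=1),
            "count": np.count_nonzero(M, axis=1), "variance": M.var(axis=1)}
```

The vectors returned by `row_postprocess` are what the OL then aggregates into final metafeature values.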

3.3.2 Outer Layer (OL)

The OL takes advantage of the recursiveness in the systematic metafeature framework. It does so by using the same objects as the Node-level: $G$, $U$ and $I$. Each of these sets of nodes is separately submitted to the IL to obtain the actual node comparison scores; effectively, we perform three IL operations. Finally, the values returned for each set of nodes are aggregated to create the final metafeatures using the same post-processing functions as before: mean, variance, skewness and entropy. The formalisation of the metafeatures in this level is (refer to Section 2.2 for the recursive notation used): $\{o \in \{G, U, I\},\; f = \{(n_i, n_j),\; \{\text{similarity, distance}\},\; \{\text{sum, mean, count, variance}\}\},\; pf \in \{\text{mean, variance, skewness, entropy}\}\}$.
3.4 Subgraph-level

So far, we have described measures that characterise the whole graph or very small parts of it (nodes and pairs of nodes). However, a graph may contain parts with very specific structures, different from the rest (e.g. the most popular items will define a very dense subgraph). Therefore, it is important to include metafeatures that provide information about those subgraphs. Hence, the metafeatures at this level split the graph into relevant subgraphs, describe each one with specific functions and aggregate the outcomes to produce the metafeatures. Once again, due to complexity, we define one IL and one OL. We start by describing how a subgraph is characterised in the IL and move to the OL afterwards.

3.4.1 Inner Layer (IL)

The IL assumes the existence of a subgraph. Our proposal is to use the Node-level metafeatures to describe it. We could also include the Pairwise-level metafeatures in this scope; however, due to the high computational resources required, we have discarded them at this stage. Since the outcome is a metafeature value for each node in the subgraph, these values must be aggregated to describe the overall subgraph. To deal with this issue, the mean, variance, skewness and entropy functions are used.

3.4.2 Outer Layer (OL)

The OL is responsible for creating the subgraphs to be provided to the IL. The subgraphs characterised here are:

  • Communities: obtained using Louvain’s community detection algorithm (DBLP:journals/corr/abs-0803-0476), which operates by multilevel optimisation of modularity;

  • Components: subgraphs of maximal strongly connected nodes of a graph.

After providing each community and component to the IL, one must once again aggregate the results. This is necessary to obtain a fixed-size description from a varying number of subgraphs. These metafeatures can be formally defined as: $\{o \in \{\text{communities, components}\},\; f = \{o_{IL}, f_{IL}, pf_{IL}\} \text{ (Node-level)},\; pf \in \{\text{mean, variance, skewness, entropy}\}\}$.
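A simplified sketch of the subgraph-level pipeline, using connected components as the subgraph type (for an undirected bipartite graph, strongly connected components reduce to connected components) and mean degree as the IL statistic; the function name and aggregation choice are ours:

```python
import networkx as nx
import numpy as np

def subgraph_level_metafeature(G):
    """OL: split the graph into connected components; IL: describe each
    component with a node-level statistic (mean degree); then aggregate
    across components with mean and variance."""
    per_subgraph = []
    for comp in nx.connected_components(G):
        degs = [G.degree(n) for n in comp]
        per_subgraph.append(float(np.mean(degs)))  # IL aggregation
    vals = np.array(per_subgraph)
    # OL aggregation: fixed-size output for a varying number of subgraphs
    return {"mean": float(vals.mean()), "variance": float(vals.var())}
```

Swapping `nx.connected_components` for a community detection routine (e.g. Louvain) yields the community variant of the same pipeline.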
4 Multicriteria Metatarget

MtL focuses mainly on identifying the most informative metafeatures to predict the best algorithms (Adomavicius2012; Ekstrand2012; Griffith2012; Matuszyk2014; Cunha2016; Cunha2017; Cunha2018128). However, the way the best algorithms are selected to build the metatarget is usually simplified: a specific evaluation measure is selected and used to assess the performance of a set of algorithms on a specific dataset. Then, the best algorithm for that dataset is used as its metatarget.

The main problem with this approach is that a single evaluation measure is usually not enough to properly and completely characterise the performance of an algorithm. In fact, this has been identified as a particularly important issue in the RS scope (Herlocker2004; Gunawardana2009a; Ciordas2010), as multiple, sometimes conflicting, measures are equally important (e.g. precision and recall). Hence, any MtL approach for RS methods must analyse the algorithm selection problem while taking into account the inputs of multiple evaluation measures to create a multicriteria metatarget.

This section describes our proposal to tackle this issue: the multicriteria metatarget. It is important to notice that unlike earlier works which considered only the best algorithm per dataset to build the metatarget (Cunha2016; Cunha2017; Cunha2018128), this work builds upon a recent work which has shown the importance of using rankings of algorithms (Cunha2018). Hence, our multicriteria metatarget procedure takes into account algorithm rankings provided by various evaluation measures to create a multicriteria ranking of algorithms.

Before delving into the inner workings of the procedure, let us define some notation. Consider $\mathcal{D}$ as the group of CF datasets, $\mathcal{A}$ as an ordered collection of CF algorithms and $\mathcal{M}$ as the set of evaluation measures. To create the metatargets, first every dataset $d \in \mathcal{D}$ is subjected to all algorithms to create recommendation models. Afterwards, every model is evaluated using a specific evaluation measure $m \in \mathcal{M}$ in order to obtain a performance value $p_{d,a,m}$, which characterises how good the model of algorithm $a$ is for that problem according to the scope the evaluation measure assesses. Then, for every dataset $d$ and measure $m$, the performance values are sorted in decreasing order of importance to create an array of algorithm rankings $r_{d,m}$. This ranking refers to positions in the ordered collection of algorithms $\mathcal{A}$; it can also be regarded as a sequence of pairs of algorithms and ranking positions. Formally, the ranking metatarget is: $r_{d,m} = \operatorname{argsort}_{a \in \mathcal{A}}\, p_{d,a,m}$.
The problem addressed here lies in the cases where more than one evaluation measure must be used to create the multicriteria metatarget. To do so, we adapt Pareto-Efficient Rankings (Ribeiro2013), originally proposed to create a single ranked list of items using rankings predicted by different recommendation algorithms. First, let us inspect the original rationale: consider a User-Interest space, which represents the preferences each algorithm defines over multiple Items for a specific User. This space is used to define Pareto frontiers, which in turn allow the creation of multicriteria rankings of Items considering the inputs of multiple Algorithms. Notice the obvious parallelism to our problem: if the User, Algorithm and Item concepts are now represented as Dataset, Evaluation Measure and Algorithm, respectively, then the task can be similarly expressed:

Claim 1

For every {User/Dataset}, create a ranking of {Items/Algorithms} which considers the preferences of multiple {Algorithms/Evaluation Measures}.

To adapt Pareto-Efficient Rankings to the multicriteria metatarget, we must first build the Dataset-Interest space. This space is shown in Figure 3, with two evaluation measures as the axes and fifteen algorithms as the points of the problem.

Figure 3: Dataset-Interest space.

The Figure also shows the Pareto frontiers, which delimit the areas of Pareto dominance for the investigated algorithms, allowing one to state when an algorithm is superior to another. The frontiers capture two different relationships: algorithms within the same frontier can be considered similar, while those in different frontiers are effectively different. Similarly to the original work, the frontiers are calculated using the skyline operator algorithm (Lin2007).

Formally, the skyline operator creates a set of frontiers $F_1, \dots, F_k$. The output is then a sequence of pairs $(a, f)$, which assign each algorithm $a$ to a specific frontier $f$. Our proposal is to use these frontier values as algorithm rankings, instead of using sorting mechanisms as before. Considering how this procedure takes Pareto dominance into account, the advantages are two-fold: (1) since we are not forced to assign a different ranking position to every algorithm, the assignment of ranking positions is more representative and fair; and (2) since the process is defined over a multidimensional Dataset-Interest space, any number of evaluation measures can be used simultaneously.
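A naive sketch of frontier-based ranking: non-dominated algorithms are peeled off iteratively and each receives its frontier index as rank, so mutually non-dominated algorithms share a position. This is a simple stand-in for the skyline operator, not the authors' implementation:

```python
import numpy as np

def pareto_frontier_ranks(scores):
    """Assign each algorithm the index of its Pareto frontier (1 = best).
    `scores` is an (n_algorithms, n_measures) array, higher = better."""
    scores = np.asarray(scores, dtype=float)
    remaining = list(range(len(scores)))
    ranks = np.zeros(len(scores), dtype=int)
    frontier = 1
    while remaining:
        # An algorithm is on the current frontier if no remaining
        # algorithm dominates it (>= on all measures, > on at least one).
        front = [i for i in remaining
                 if not any((scores[j] >= scores[i]).all() and
                            (scores[j] > scores[i]).any() for j in remaining)]
        for i in front:
            ranks[i] = frontier
        remaining = [i for i in remaining if i not in front]
        frontier += 1
    return ranks
```

For instance, with two measures, three mutually non-dominated algorithms all receive rank 1, while a fully dominated one falls to rank 2, directly yielding the multicriteria metatarget for one dataset.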

5 Empirical setup

This section presents the experimental setup. In order to ensure a fair comparison of meta-approaches, based only on predictive performance, the following constraints are adopted: (1) the baselevel datasets, algorithms and evaluation measures are exactly the same for all experiments; (2) all metalevel characteristics (multicriteria metatargets, algorithms and evaluation measures) are fixed; only the metafeatures change. Since this work considers both baselevel and metalevel algorithms, we will refer to them as baselearners and metalearners, respectively.

5.1 Collaborative Filtering

The baselevel setup is concerned with the CF datasets, baselearners and measures used to evaluate the performance of CF baselearners when applied to these datasets. The 38 datasets used in the experiments are described in Table 1, alongside a summary of their statistics, namely the number of users, items and ratings.

Dataset #users #items #ratings Reference
Amazon Apps 132391 24366 264233 (McAuley2013)
Amazon Automotive 85142 73135 138039
Amazon Baby 53188 23092 91468
Amazon Beauty 121027 76253 202719
Amazon CD 157862 151198 371275
Amazon Clothes 311726 267503 574029
Amazon Digital Music 47824 47313 83863
Amazon Food 76844 51139 130235
Amazon Games 82676 24600 133726
Amazon Garden 71480 34004 99111
Amazon Health 185112 84108 298802
Amazon Home 251162 123878 425764
Amazon Instant Video 42692 8882 58437
Amazon Instruments 33922 22964 50394
Amazon Kindle 137107 131122 308158
Amazon Movies 7278 1847 11215
Amazon Office 90932 39229 124095
Amazon Pet Supplies 74099 33852 123236
Amazon Phones 226105 91289 345285
Amazon Sports 199052 127620 326941
Amazon Tools 121248 73742 192015
Amazon Toys 134291 94594 225670
Bookcrossing 7780 29533 39944 (Ziegler2005)
Flixter 14761 22040 812930 (Zafarani+Liu:2009)
Jester1 2498 100 181560 (Goldberg2001)
Jester2 2350 100 169783
Jester3 2493 96 61770
Movielens 100k 94 1202 9759 (GroupLens2016)
Movielens 10m 6987 9814 1017159
Movielens 1m 604 3421 106926
Movielens 20m 13849 16680 2036552
Movielens Latest 22906 17133 2111176
MovieTweetings latest 3702 7358 39097 (Dooms13crowdrec)
MovieTweetings RecSys 2491 4754 20913
Tripadvisor 77851 10590 151030 (Wang2011)
Yahoo! Movies 764 4078 22135 (Yahoo)
Yahoo! Music 613 4620 30852
Yelp 55233 46045 211627 (Yelp2016)
Table 1: Summary description about the datasets used in the experimental study.

The experiments were carried out with MyMediaLite, a RS library (Gantner2011). Two CF tasks were addressed: Rating Prediction and Item Recommendation. While the first aims to predict the rating a user would assign to a new instance, the second aims to recommend a ranked list of items. Since the tasks are different, so are the baselearners and evaluation measures required.

The following CF baselearners were used for Rating Prediction: Matrix Factorization (MF), Biased MF (BMF) (Salakhutdinov2008), Latent Feature Log Linear Model (LFLLM) (Menon2010), SVD++ (Koren2008), 3 variants of Sigmoid Asymmetric Factor Model (SIAFM, SUAFM and SCAFM) (Paterek2007), User Item Baseline (UIB) (Koren2010) and Global Average (GA). Regarding Item Recommendation, the baselearners used are BPRMF (Rendle2009), Weighted BPRMF (WBPRMF) (Rendle2009), Soft Margin Ranking MF (SMRMF) (Weimer2008), WRMF (Hu2008a) and Most Popular (MP). All baselearners were selected since they represent different Matrix Factorization techniques, which are widely used in CF both in academia and industry due to their predictive power and computational efficiency.

In the experiments carried out, for Item Recommendation, the baselearners are evaluated using NDCG and AUC, while for Rating Prediction the evaluation measures NMAE and RMSE are used. All experiments are performed using 10-fold cross-validation. In order to prevent bias in favour of any baselearner, the hyperparameters were not optimised.
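As an illustration, the two Rating Prediction measures can be computed as below. This is a minimal sketch, not MyMediaLite's implementation; NMAE is assumed to be MAE normalised by the rating scale range, so datasets with different scales (e.g. 1-5 stars versus Jester's wider range) become comparable. The `r_min`/`r_max` defaults are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def nmae(y_true, y_pred, r_min=1.0, r_max=5.0):
    """Mean absolute error normalised by the rating scale range."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)) / (r_max - r_min))
```

For example, predictions `[2, 4]` against true ratings `[1, 5]` on a 1-5 scale give an MAE of 1.0 and hence an NMAE of 0.25.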

5.2 Label Ranking as the Metalearning approach

This work studies the performance of 4 meta-approaches: Rating Matrix metafeatures (RM) (Cunha2016), Subsampling Landmarkers (SL) (Cunha2017), the proposed Graph metafeatures (GR) and the Comprehensive metafeatures (CM). The latter are obtained by aggregating the metafeatures from all existing meta-approaches and performing Correlation Feature Selection. Empirical validation has shown that setting the cutoff threshold at 70% yields the best results.

The multicriteria metatarget procedure is used to create the metatargets. Hence, for each CF problem studied (Rating Prediction and Item Recommendation), all task-specific evaluation measures are considered to create the Dataset-Interest spaces: NDCG and AUC for Item Recommendation, and NMAE and RMSE for Rating Prediction. Next, the Pareto-Efficient ranking procedure is employed on each dataset, generating one ranking of baselearners for Item Recommendation and another for Rating Prediction; repeating the process over all datasets yields the complete metatargets.

The MtL problem is addressed as a Label Ranking task. The following metalearners are used to induce metamodels: KNN (Soares2015), Ranking Tree (RT), Ranking Random Forest (RF) (EXSY:EXSY12166), and the baseline Average Ranking (AVG). The results are evaluated in terms of Kendall's Tau using leave-one-out cross-validation, due to the small number of meta-examples. Also, since we aim to obtain the best possible performance from the metalearners, we employ grid search optimisation.
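The leave-one-out evaluation loop can be sketched as follows. The `fit` and `predict` callables are hypothetical placeholders standing in for any of the metalearners above; the sketch only shows how Kendall's Tau is averaged over held-out meta-examples.

```python
import numpy as np
from scipy.stats import kendalltau

def loocv_kendall(metafeatures, metatargets, fit, predict):
    """Leave-one-out evaluation of a label-ranking metalearner.

    metafeatures: (n_datasets, n_metafeatures) array of metafeature values.
    metatargets:  (n_datasets, n_baselearners) array of ranking positions.
    fit/predict:  placeholder training and prediction callables.
    Returns the mean Kendall's Tau over all held-out datasets.
    """
    metafeatures = np.asarray(metafeatures, dtype=float)
    metatargets = np.asarray(metatargets, dtype=float)
    n = len(metafeatures)
    taus = []
    for i in range(n):
        train = [j for j in range(n) if j != i]  # hold out dataset i
        model = fit(metafeatures[train], metatargets[train])
        predicted = predict(model, metafeatures[i])
        taus.append(kendalltau(predicted, metatargets[i]).correlation)
    return float(np.mean(taus))
```

For example, plugging in the Average Ranking baseline amounts to `fit = lambda X, Y: Y.mean(axis=0)` and `predict = lambda model, x: model`, i.e. the metafeatures of the held-out dataset are ignored and the mean training ranking is always predicted.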

6 Preliminary Analysis

6.1 Graph Metafeatures Analysis

This analysis applies Correlation Feature Selection (CFS) to all proposed Graph metafeatures. It has two goals: (1) to remove unnecessary metafeatures and (2) to understand which levels of the proposed meta-approach are relevant to the investigated problem. Table 2 presents the metafeatures selected (65 out of 761), organised by level, together with the number of metafeatures kept per level. Each metafeature is presented using the notation introduced in Section 3.

Table 2: Graph metafeatures used in the experiments after CFS.