1 Introduction
The algorithm selection problem has frequently been addressed with Metalearning (MtL) approaches (Vilalta1999; Hilario2000; Brazdil2003; Prudencio2004; SmithMiles2008a; Gomes2012; Lemke2013; Rossi2014). This technique finds the mapping between problem-specific characteristics (i.e. metafeatures) and the relative performance of learning algorithms (i.e. metatarget) (Brazdil:2008:MAD:1507541). This mapping, provided as a Machine Learning model (i.e. metamodel), can then be used to predict the best algorithms for a new problem. The task is organised into a baselevel and a metalevel. The baselevel refers to the learning task for which recommendations of algorithms are made, which, in this case, is Collaborative Filtering (CF). The metalevel refers to the learning task which studies the mapping between metafeatures and algorithm performance. In this work, the metalevel is addressed as a Label Ranking task.
Several algorithm selection approaches have been recently proposed for CF (Adomavicius2012; Ekstrand2012; Griffith2012; Matuszyk2014; Cunha2016; Cunha2017; Cunha2018128). In spite of their contribution to important advances in the area, there are still limitations that need to be addressed. These limitations are mainly related to the metafeatures and the metatarget, which are the focus of this work.
The main limitation regarding metafeatures is that most approaches only describe the recommendation problem using descriptive characteristics of the rating matrix and estimates of performance on samples (i.e. landmarkers), overlooking a wide spectrum of other possibilities. Furthermore, existing papers typically perform a limited comparison between the proposed metafeatures and the ones proposed in other studies. Additionally, there is a lack of studies combining metafeatures from multiple domains in a single collection and validating their individual and combined merits in the same experimental setup.
Regarding the metatarget, the limitation lies mainly in the fact that the best algorithms per dataset are determined using only one evaluation measure at a time. Hence, to perform algorithm selection according to additional measures, it is necessary to replicate the experimental procedure for each measure, represented as a different metatarget. Beyond the efficiency issues, this process is not ideal since it leads to limited and measure-dependent metaknowledge. An alternative must therefore be found, ideally one which allows a multitude of evaluation measures to be used simultaneously.
This work proposes solutions for these limitations, which are evaluated in a comprehensive experimental study, and provides the following novel contributions to the problem of CF algorithm selection:

Graph metafeatures: By modelling the CF problem as a bipartite graph, one is able to use an alternative way to describe the relationships between users and items. For such, this work proposes metafeatures based on Graph Theory (west2001introduction; godsil2013algebraic) and adopts aspects from systematic and hierarchical metafeature extraction processes (Cunha2016; Cunha:2017:MCF:3109859.3109899).

Comprehensive metafeatures: A set of metafeatures obtained by taking advantage of metafeatures from multiple domains: Rating Matrix (Cunha2016), Landmarkers (Cunha2017) and Graph metafeatures.

Multicriteria metatarget: The metatarget is obtained by aggregating the rankings of algorithms produced by multiple evaluation measures. We adapt Pareto-Efficient rankings (Ribeiro2013) to CF algorithm selection.
This document is organised as follows: Section 2 introduces the related work on CF, MtL and algorithm selection for CF, while Sections 3 and 4 present the main contributions: the graph metafeatures and the multicriteria metatarget, respectively. Section 5 presents the empirical setup. Section 6 discusses the preliminary analysis and is followed by the empirical results. The final section presents the conclusions and directions for future work.
2 Related Work
2.1 Collaborative Filtering
CF recommendations are based on the premise that a user will probably like the items favoured by a similar user. Thus, CF employs the feedback from each individual user to recommend items to similar users (Yang2014). The feedback is a numeric value, proportional to the user’s interest in an item. Most feedback is based on a rating scale, although variants, such as like/dislike binary responses, are also employed. The data structure is a rating matrix R. It is usually described as R^{U×I}, representing a set of users U and a set of items I. Each element of this matrix is the feedback provided by each user for each item.
CF algorithms can be organised into memory-based and model-based (Bobadilla2013a). Memory-based algorithms apply heuristics to a rating matrix to extract recommendations, whereas model-based algorithms induce a predictive model from this matrix, which can later be used for future recommendations. Most memory-based algorithms adopt Nearest Neighbour strategies, while model-based algorithms are mostly based on Matrix Factorization methods (Yang2014). Further discussion regarding CF algorithms is available elsewhere (Yang2014).
The evaluation of Recommender Systems (RSs) is usually performed by procedures that split the dataset into training and test subsets (using sampling strategies, such as k-fold cross-validation (Herlocker2004)) and assess the performance of the induced model on the test dataset. Different evaluation metrics can be used (Lu2012). The evaluation measures used depend on the type of prediction: for ratings of the items, error measures like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) are used; for binary relevance, Precision/Recall or Area Under the Curve (AUC) are used; finally, a common measure for rankings of items is the Normalised Discounted Cumulative Gain (NDCG).
2.2 Algorithm selection using Metalearning
The algorithm selection problem was conceptualised in 1976 by Rice (DBLP:journals/ac/Rice76). It involves the following search spaces: the problem space P, the feature space F, the algorithm space A and the performance space Y. These refer respectively to problem instances, features, algorithms and evaluation measures. The problem is formulated as: for a given instance x ∈ P, with features f(x) ∈ F, find the selection mapping S(f(x)) into space A, such that the selected algorithm α ∈ A maximises the performance mapping y(α(x)) ∈ Y (DBLP:journals/ac/Rice76).
One of the main challenges in MtL is to define which metafeatures effectively describe how a problem matches the bias of an algorithm (Brazdil:2008:MAD:1507541). The MtL literature often divides the metafeatures into three main groups (Serban2013; Vanschoren2010):

Statistical and/or information-theoretical metafeatures describe the dataset using a set of measures from statistics and information theory. Examples include simple measures, like the number of examples and features, as well as more advanced measures, like entropy, skewness and kurtosis of features and even mutual information and correlation between features;

Model-based metafeatures are made of properties extracted from models induced from a dataset. As an example, if a decision tree induction algorithm is applied to a dataset, one model-based metafeature could be the number of leaf nodes in the decision tree. The rationale is that there is a relationship between model characteristics and algorithm performance that cannot be directly captured from the dataset;

Landmarkers are fast estimates of the performance of an algorithm on a given dataset. Since these estimates are used as metafeatures, it is important that they are computationally much faster than applying the algorithm to the dataset (e.g. using holdout to estimate performance). Two different types of landmarkers can be obtained by: (1) applying fast and/or simplified algorithms to complete datasets (e.g. a decision stump can be regarded as a simplified version of a decision tree); (2) applying conventional algorithms to a sample extracted from a dataset, also known as subsampling landmarkers (Brazdil:2008:MAD:1507541) (e.g. applying a decision tree induction algorithm to a sample extracted from a dataset).
Recently, a systematic metafeature framework (Pinto2016) has been proposed to simplify the process of designing metafeatures for a MtL task. The framework requires three main elements: objects (e.g. numeric variables), functions (e.g. correlation) and postfunctions (e.g. average). In order to generate a single metafeature value, the metafeature extraction procedure applies each function to all possible sets of compatible objects (e.g., correlation between every pair of numeric variables). This yields multiple values, and a postfunction is applied to those values to obtain a metafeature. The metafeatures created using this framework are represented as: {object.function.postfunction}. One important property of this framework is recursiveness. As a result, the outcome of an inner level (IL) application of the framework can be used as the result of an outer level (OL) function. Formally:
2.3 Metalearning for Collaborative Filtering
Recently, a few MtL approaches were proposed for the problem of selecting CF algorithms. Two types of metafeatures have been used for that purpose: statistical and/or information-theoretical metafeatures and subsampling landmarkers. In the rest of this document, we will use the terms metafeature and metaapproach to refer to the descriptors and MtL approaches, respectively.
Statistical and/or information-theoretical
Existing studies have made arbitrary choices in the development of metafeatures (Adomavicius2012; Ekstrand2012; Griffith2012; Matuszyk2014). A systematic approach to the design of metafeatures for CF was proposed recently (Cunha2016). These metafeatures describe the rating matrix using the systematic framework summarised earlier. The approach analyses an extensive combination of a set of objects (rating matrix R, and its rows U and columns I), a set of functions (original ratings (ratings), number of ratings (count), mean rating value (mean) and sum of ratings (sum)), and a set of postfunctions (maximum, minimum, mean, standard deviation, median, mode, entropy, Gini index, skewness and kurtosis). This class of metafeatures will be identified as RM from this point onward.
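The object/function/postfunction combination described above can be illustrated with a minimal sketch. It assumes a toy rating matrix stored as a dictionary of dictionaries; the data, the helper names (`rows`, `columns`, `rm_metafeatures`) and the reduced sets of functions and postfunctions are illustrative assumptions, not the original implementation.

```python
import statistics

# Toy rating matrix (hypothetical data): user -> {item: rating}.
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i3": 2},
    "u3": {"i1": 3, "i2": 1, "i3": 5},
}

def rows(r):
    """Ratings grouped by user (one list per matrix row)."""
    return [list(rated.values()) for rated in r.values()]

def columns(r):
    """Ratings grouped by item (one list per matrix column)."""
    cols = {}
    for rated in r.values():
        for item, value in rated.items():
            cols.setdefault(item, []).append(value)
    return list(cols.values())

# Reduced, illustrative subsets of the functions and postfunctions.
FUNCTIONS = {"count": len, "mean": statistics.mean, "sum": sum}
POSTFUNCTIONS = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "sd": statistics.pstdev,
}

def rm_metafeatures(r):
    """Apply each function to every row/column, then summarise with each postfunction."""
    objects = {"row": rows(r), "column": columns(r)}
    out = {}
    for o_name, groups in objects.items():
        for f_name, f in FUNCTIONS.items():
            values = [f(group) for group in groups]
            for p_name, p in POSTFUNCTIONS.items():
                out[f"{o_name}.{f_name}.{p_name}"] = p(values)
    return out

metafeatures = rm_metafeatures(ratings)
```

Each key follows the object.function.postfunction pattern, so `metafeatures["row.count.max"]` is the largest number of ratings given by any single user.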
Subsampling Landmarkers
A single approach uses this type of metafeatures in the CF scope (Cunha2017). These data characteristics are obtained by assessing the performance of the CF algorithms on random samples of the datasets. These estimates are combined to create different metafeatures. Performance is estimated using different evaluation measures, which leads to a set of metafeatures for each measure. Although the work studied different landmarking perspectives (i.e. relative landmarkers (Furnkranz2002)), which manipulate the values in different ways in order to properly explore the problem, no significant gain of performance was obtained. Therefore, this work considers simply the performance values as metafeatures. The format used to describe these metafeatures is: algorithm.evaluation measure. This class of metafeatures is referred to as SL in the remainder of this document.
3 Graph metafeatures
Given that CF’s rating matrix can be regarded as a (weighted) adjacency matrix, a CF problem can be represented as a graph. It is our belief that the extraction of new metafeatures using this graph representation can provide new information not captured by other metaapproaches. Among other benefits, it allows the problem not only to be modelled but also to be described in more detail. Thus, the main motivations for this new approach are twofold:

Data structure compatibility: The rating matrix data can be correctly described using a bipartite graph. To this end, it can be assumed that rows and columns refer to independent sets of nodes and that the feedback values stored within the matrix are represented as edges between nodes.

Neighbourhood characterisation: Metafeatures that characterize users in terms of their neighbourhood have been used before in algorithm selection for CF (Griffith2012). The approach is capable of creating user-specific metafeatures, which describe a user by statistics of its neighbours. However, it is not able to generate metafeatures which represent all neighbourhoods in a dataset. If the problem is represented by a graph, however, extracting complex neighbourhood statistics becomes easy.
As a result, this study models the problem as a bipartite graph G, whose nodes U and I represent users and items, respectively. The set of edges E connects elements of the two groups and represents the feedback of users regarding items. The edges can be weighted, hence representing preference values (i.e. ratings). Figure 1 presents an example with the two representations for the same CF problem.
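The conversion from a rating matrix to a weighted bipartite graph can be sketched as follows. The toy ratings and the function names are assumptions made for illustration; nodes are tagged as `("user", id)` or `("item", id)` so that the two node sets remain disjoint.

```python
# Toy rating matrix (hypothetical data): user -> {item: rating}.
ratings = {"u1": {"i1": 5, "i2": 3}, "u2": {"i1": 4}}

def build_bipartite_graph(ratings):
    """Return the user nodes, the item nodes and the weighted edges between them."""
    users = {("user", u) for u in ratings}
    items = {("item", i) for rated in ratings.values() for i in rated}
    edges = {
        (("user", u), ("item", i)): weight
        for u, rated in ratings.items()
        for i, weight in rated.items()
    }
    return users, items, edges

def adjacency(edges):
    """Undirected weighted adjacency lists built from the edge dictionary."""
    adj = {}
    for (u, i), weight in edges.items():
        adj.setdefault(u, {})[i] = weight
        adj.setdefault(i, {})[u] = weight
    return adj

users, items, edges = build_bipartite_graph(ratings)
adj = adjacency(edges)
```

Every edge links one user node to one item node, which is what makes the graph bipartite; the rating is kept as the edge weight.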
The proposed graph metaapproach is based on Graph Theory (west2001introduction; godsil2013algebraic). Although the literature provides several functions for graph characterisation that can be used for this purpose, they have a major limitation: the characteristics describe the graph at a high level, which limits the information that can be extracted. For instance, the amount of information available in measures such as the number of nodes or edges is limited for our purpose.
To deal with this limitation, we use the systematic metafeature extraction (Pinto2016) and hierarchical decomposition of complex data structures (Cunha:2017:MCF:3109859.3109899) approaches for metafeature design. Since we were unable to find any graph-based metafeatures in the literature, we have adopted an exploratory approach: we use as many graph characterisation features as possible and then try to identify which ones are informative. We therefore propose metafeatures extracted from graphs at different levels:

Graphlevel properties that describe the graph in a high level perspective;

Nodelevel characteristics relating nodes through their edges relationships;

Pairwiselevel properties obtained by node pairwise comparisons;

Subgraphlevel characteristics summarising relevant subgraphs.
3.1 Graphlevel
When trying to propose metafeatures for a complex structure, it is common to consider high-level characteristics first. Although in the context of algorithm selection this is not typically effective (Cunha:2017:MCF:3109859.3109899), it is nevertheless important to verify it. Hence, at this level, only one object is considered for metafeature extraction: the whole bipartite graph G, which can be directly characterised through several Graph Theory measures (west2001introduction; godsil2013algebraic). This work selects a subset of potentially useful characteristics to be used as metafeatures. These are:
(1) graph.{density, girth, order, size, radius}.∅
The functions refer, respectively, to the ratio of the number of existing edges over the number of possible edges, the length of the shortest cycle, the number of nodes, the number of edges and the smallest maximum distance between the farthest nodes of the graph. The formalisation of these functions lies outside the scope of this work; the interested reader may find more information in the graph theory literature (west2001introduction; godsil2013algebraic). Since these functions return a single value, no metafeature used at this level requires postprocessing. This is represented by the symbol ∅.
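Some of these graph-level functions can be sketched in a few lines. The example below assumes an unweighted, connected toy graph stored as adjacency sets and takes the number of possible edges to be n(n-1)/2, the simple-graph convention; a bipartite convention (number of users times number of items) would give a different density. The girth is omitted for brevity and all names are illustrative.

```python
from collections import deque

# Toy bipartite graph (hypothetical data): node -> set of neighbours.
adj = {
    "u1": {"i1", "i2"},
    "u2": {"i1"},
    "i1": {"u1", "u2"},
    "i2": {"u1"},
}

def eccentricity(adj, source):
    """Largest hop distance from source to any reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())

def graph_level(adj):
    """Order, size, density and radius of an undirected graph (assumed connected)."""
    order = len(adj)
    size = sum(len(neighbours) for neighbours in adj.values()) // 2
    # Assumption: possible edges counted as n(n-1)/2.
    density = 2 * size / (order * (order - 1)) if order > 1 else 0.0
    radius = min(eccentricity(adj, node) for node in adj)
    return {"order": order, "size": size, "density": density, "radius": radius}

graph_metafeatures = graph_level(adj)
```

Since each function yields a single number for the whole graph, the values are used directly as metafeatures without any aggregation step.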
3.2 Nodelevel
At this level we argue that, since nodes represent the main entities in the graph, it is potentially beneficial to extract characteristics which represent them and their edges from a global perspective. Specifically in this case, where two clearly defined sets of nodes exist (i.e. users and items), it is important to find suitable characteristics for each one. If one is able to properly characterize the users through their relationships to items (and vice-versa), then hopefully we will be able to find metafeatures able to represent the way CF operates: new items are recommended based on the preferences of users with similar tastes.
Hence, Nodelevel metafeatures use three different objects: the graph G, the set of users U and the set of items I. These consider the entire graph and each subset independently. This separation of concepts allows a more extensive analysis and makes it possible to understand whether the different subsets of nodes hold different degrees of importance for the MtL problem. For instance, if we find that metafeatures related to the users are not informative, then this presents interesting insights into the algorithm selection problem. If we considered all nodes together, we would be unable to make such an analysis. The functions used at this level describe the nodes through their edge relationships. We select a wide variety of functions which are suitable to describe bipartite graphs:

Alpha centrality: Bonacich’s alpha centrality (BONACICH2001191);

Authority score: Kleinberg’s authority score (Kleinberg:1999:ASH:324133.324140);

Closeness centrality: the inverse of the average length of the shortest paths to/from all the other nodes in the graph;

Constraint: Burt’s constraint score (doi:10.1086/421787);

Coreness: the coreness of a node is k if it belongs to the k-core (maximal subgraph in which each node has at least degree k) but not to the (k+1)-core;

Degree: the number of adjacent edges;

Diversity: the Shannon entropy of the weights of a node’s incident edges;

Eccentricity: length of the shortest path to the farthest node in the graph;

Eigenvector Centrality score: the values of the first eigenvector of the adjacency matrix;

Hub score: Kleinberg’s hub centrality score (Kleinberg:1999:ASH:324133.324140);

KNN: average nearest neighbour degree;

Neighbours: number of adjacent nodes in a graph;

Local Scan: average edge weights;

PageRank: Google’s PageRank score per node.

Strength: sum of the weights of adjacent edges.
Since the application of these functions to the nodes of a graph returns a set of values, these values must be aggregated to obtain a single value for the metafeature. To do so, this work employs postprocessing functions, which return the following single values: mean, variance, skewness and entropy. These functions, based on statistical univariate analysis (central tendency, dispersion and shape) and Information Theory, have performed well in other recommendation metafeatures (Cunha:2017:MCF:3109859.3109899). These metafeatures can be formally described as:
(2) {graph, users, items}.{alpha centrality, authority score, closeness centrality, constraint, coreness, degree, diversity, eccentricity, eigenvector centrality, hub score, KNN, neighbours, local scan, PageRank, strength}.{mean, variance, skewness, entropy}
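As an illustration of this level, the sketch below computes two of the listed node functions (degree and strength) for each object and aggregates them with the four postprocessing functions. The toy graph and the function names are assumptions, as are the concrete definitions of skewness (population skewness) and entropy (Shannon entropy of the empirical distribution of values), which are not formalised in the text.

```python
import math
from collections import Counter

# Toy weighted bipartite graph (hypothetical data): node -> {neighbour: rating}.
adj = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4},
    "i1": {"u1": 5, "u2": 4},
    "i2": {"u1": 3},
}
users, items = ["u1", "u2"], ["i1", "i2"]

def degree(adj, node):
    return len(adj[node])

def strength(adj, node):
    return sum(adj[node].values())

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def skewness(xs):
    # Assumption: population skewness, defined as 0 for constant samples.
    m, v = mean(xs), variance(xs)
    if v == 0:
        return 0.0
    return (sum((x - m) ** 3 for x in xs) / len(xs)) / v ** 1.5

def entropy(xs):
    # Assumption: Shannon entropy (base 2) of the empirical value distribution.
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

FUNCTIONS = {"degree": degree, "strength": strength}
POSTFUNCTIONS = {"mean": mean, "variance": variance,
                 "skewness": skewness, "entropy": entropy}

def node_level(adj, objects):
    """One metafeature per (object, function, postfunction) combination."""
    out = {}
    for o_name, nodes in objects.items():
        for f_name, f in FUNCTIONS.items():
            values = [f(adj, node) for node in nodes]
            for p_name, p in POSTFUNCTIONS.items():
                out[f"{o_name}.{f_name}.{p_name}"] = p(values)
    return out

node_metafeatures = node_level(
    adj, {"graph": users + items, "users": users, "items": items}
)
```

Keeping the three objects separate means that user-related and item-related metafeatures can later be assessed independently.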
3.3 Pairwiselevel
Having exhausted the ability to characterize nodes by their explicit edge relationships, one must find alternative ways to explore implicit patterns. A methodology which proved to be successful in other algorithm selection domains (Cunha:2017:MCF:3109859.3109899) performs pairwise comparisons of simpler elements in the complex data structure and aggregates their values to create a global score to characterize the entire structure. These comparisons make it possible to understand whether there are important relationships among said elements which represent overall patterns.
Hence, the pairwise metafeatures designed at this level are based on the comparison among all pairs of nodes. Due to the complexity of the data structure, the pairwiselevel defines two layers, inner (IL) and outer (OL), which we present next.
3.3.1 Inner Layer (IL)
The IL, responsible for node comparison, applies pairwise comparison functions to all pairs of nodes. The output is stored in the corresponding row and column of an IL matrix, used to keep records of pairwise comparisons. Figure 2 presents such a data structure, with rows and columns referring to the same set of nodes.
The functions used to perform pairwise comparisons are:

Similarity: the number of common neighbours divided by the number of nodes that are neighbours of at least one of the two nodes being considered;

Distance: length of the shortest paths between nodes.
The postprocessing functions used in this layer are the matrix postprocessing functions. The sum, mean, count and variance functions are applied to each matrix row (alternatively, given the symmetry of the IL matrix, they could be applied to each column). The output is a set of summarised comparison values for each function. Such values are submitted to the OL to obtain the final metafeatures.
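The inner layer can be sketched for the similarity function as follows. The toy neighbourhoods are hypothetical, self-comparisons are excluded from each row, and `count` is interpreted as the number of non-zero comparisons in a row; these choices are assumptions, since they are not specified in the text.

```python
# Toy neighbourhoods (hypothetical data): user -> set of rated items.
adj = {"u1": {"i1", "i2"}, "u2": {"i1"}, "u3": {"i2", "i3"}}

def similarity(adj, a, b):
    """Common neighbours divided by nodes adjacent to at least one of a and b."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def inner_layer(adj, nodes, compare):
    """Summarise each row of the pairwise comparison matrix (needs >= 2 nodes)."""
    summaries = {"sum": [], "mean": [], "count": [], "variance": []}
    for a in nodes:
        # Assumption: the diagonal (a compared with itself) is left out.
        row = [compare(adj, a, b) for b in nodes if b != a]
        summaries["sum"].append(sum(row))
        summaries["mean"].append(sum(row) / len(row))
        # Assumption: count = number of non-zero comparisons.
        summaries["count"].append(sum(1 for value in row if value > 0))
        summaries["variance"].append(variance(row))
    return summaries

il_values = inner_layer(adj, ["u1", "u2", "u3"], similarity)
```

Each entry of `il_values` holds one value per node, which is the kind of per-node output the outer layer then aggregates.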
3.3.2 Outer Layer (OL)
The OL takes advantage of the recursiveness in the systematic metafeature framework. It does so by using the same objects as used in the Nodelevel: the graph, the set of users and the set of items. Each of these sets of nodes is separately submitted to the IL to obtain the actual node comparison scores. This means that, effectively, we perform 3 IL operations. Finally, the values returned by each set of nodes are aggregated to create the final metafeatures using the same postprocessing functions as before: mean, variance, skewness and entropy. The formalization of the metafeatures at this level is (refer to Section 2.2 for interpretation of the recursive notation used):
(3) 
3.4 Subgraphlevel
So far, we have described measures that characterize the whole graph or very small parts of it (nodes and pairs of nodes). However, a graph may contain parts that have very specific structures, which are different from the rest (e.g. the most popular items will define a very dense subgraph). Therefore, it is important to include metafeatures that provide information about those subgraphs. Hence, the metafeatures at this level split the graph into relevant subgraphs, describe each one with specific functions and aggregate the final outcome to produce the metafeature. Once again, due to complexity, we define one IL and one OL. We start by describing how a subgraph is characterized in the IL and move to the OL afterwards.
3.4.1 Inner Layer (IL)
The IL assumes the existence of a subgraph. Our proposal is to use Nodelevel metafeatures to describe it. We could also include the Pairwiselevel metafeatures in this scope. However, due to the high computational resources required, we have discarded them at this stage. Since the outcome is a metafeature value for each node in the subgraph, the values necessary to describe the overall subgraph must be aggregated. In order to deal with this issue, the mean, variance, skewness and entropy functions are used.
3.4.2 Outer Layer (OL)
The OL is responsible for creating the subgraphs to be provided to the IL. The subgraphs characterised here are:

Communities: obtained using the Louvain community detection algorithm (DBLP:journals/corr/abs08030476), which operates by multilevel optimisation of modularity;

Components: maximal strongly connected subgraphs of a graph.
After providing each community and component to the IL, one must once again aggregate the results. This is necessary to obtain a fixedsize description of the communities and components that characterizes a varying number of its subgraphs. These metafeatures can be formally defined as:
(4) 
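The two layers can be illustrated for the components case with a minimal sketch. It assumes an undirected toy graph (so strongly connected components reduce to connected components), uses only the degree function with the mean postfunction in the inner layer, and aggregates across components with mean and variance; the outer aggregation functions and all names are assumptions.

```python
from collections import deque

# Toy undirected graph (hypothetical data) with two connected components.
adj = {
    "u1": {"i1", "i2"}, "i1": {"u1"}, "i2": {"u1"},
    "u2": {"i3"}, "i3": {"u2"},
}

def components(adj):
    """Connected components found by breadth-first search."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        result.append(component)
    return result

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def subgraph_level(adj):
    # Inner layer: describe each component by the mean degree of its nodes.
    inner = [mean([len(adj[node]) for node in comp]) for comp in components(adj)]
    # Outer layer: fixed-size summary over a varying number of components.
    return {
        "components.degree.mean.mean": mean(inner),
        "components.degree.mean.variance": variance(inner),
    }

subgraph_metafeatures = subgraph_level(adj)
```

The outer aggregation is what yields a description of fixed size regardless of how many subgraphs a dataset produces.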
4 Multicriteria Metatarget
MtL focuses mainly on identifying the most informative metafeatures to predict the best algorithms (Adomavicius2012; Ekstrand2012; Griffith2012; Matuszyk2014; Cunha2016; Cunha2017; Cunha2018128). However, the way the best algorithms are selected to build the metatarget is usually simplified: a specific evaluation metric is selected and used to assess the performance of a set of algorithms on a specific dataset. Then, the best algorithm for that specific dataset is used as its metatarget.
The main problem with this approach is that a single evaluation measure is usually not enough to properly and completely characterize the performance of an algorithm. In fact, this has been identified as a particularly important issue in the RS scope (Herlocker2004; Gunawardana2009a; Ciordas2010), as multiple, sometimes conflicting, measures are equally important (e.g. precision and recall). Hence, any MtL approach for RS methods should analyse the algorithm selection problem while taking into account the inputs of multiple evaluation measures to create a multicriteria metatarget.
This section describes our proposal to tackle this issue: the multicriteria metatarget. It is important to notice that unlike earlier works which considered only the best algorithm per dataset to build the metatarget (Cunha2016; Cunha2017; Cunha2018128), this work builds upon a recent work which has shown the importance of using rankings of algorithms (Cunha2018). Hence, our multicriteria metatarget procedure takes into account algorithm rankings provided by various evaluation measures to create a multicriteria ranking of algorithms.
Before delving into the inner workings of the procedure, let us establish the concepts involved. Consider a group of CF datasets, an ordered collection of CF algorithms and a set of evaluation measures. To create the metatargets, first every dataset is subjected to all algorithms to create recommendation models. Afterwards, every model is evaluated using a specific evaluation measure in order to obtain a performance value, which characterizes how good the model is for that problem according to the aspect the evaluation measure assesses. Then, for every dataset and measure, the performance values are sorted by decreasing degree of importance to create an array of algorithm rankings. This ranking refers to positions in the ordered collection of algorithms. Notice that it can also be regarded as a sequence of pairs of algorithms and ranking positions. Formally, the ranking metatarget is:
(5) 
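For a single dataset and a single evaluation measure, the ranking procedure can be sketched as below. The performance values are hypothetical, the `higher_is_better` flag is an assumption needed to handle error measures such as RMSE, and ties are broken by the order of the algorithm collection.

```python
def ranking(algorithms, performance, higher_is_better=True):
    """Rank positions aligned with the ordered collection of algorithms."""
    ordered = sorted(
        algorithms, key=lambda a: performance[a], reverse=higher_is_better
    )
    position = {a: rank for rank, a in enumerate(ordered, start=1)}
    return [position[a] for a in algorithms]

# Hypothetical RMSE values for one dataset (lower is better).
algorithms = ["MF", "BMF", "GA"]
rmse = {"MF": 0.95, "BMF": 0.90, "GA": 1.10}

ranks = ranking(algorithms, rmse, higher_is_better=False)
pairs = list(zip(algorithms, ranks))
```

The list `ranks` is the array view of the ranking, while `pairs` is the equivalent sequence of algorithm and ranking position pairs.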
The problem addressed here lies in the cases where more than one evaluation measure must be used to create the multicriteria metatarget. To do so, we adapt Pareto-Efficient Rankings (Ribeiro2013), originally proposed to create a single ranked list of items using rankings predicted by different recommendation algorithms. First, let us inspect the original rationale: consider a User-Interest space, which is used to represent the preferences each algorithm defines for multiple Items for a specific User. This space is used to define Pareto frontiers, which in turn make it possible to create multicriteria rankings of Items considering the inputs of multiple Algorithms. Notice the obvious parallelism to our problem: if we consider that User, Algorithm and Item concepts are now represented as Dataset, Evaluation Measure and Algorithm, then the task can be similarly expressed:
Claim 1
For every {User/Dataset}, create a ranking of {Items/Algorithms} which considers the preferences of multiple {Algorithms/Evaluation Measures}.
To adapt the Pareto-Efficient Rankings to the multicriteria metatarget, we must first build the Dataset-Interest space. This space is shown in Figure 3, which presents two evaluation measures and fifteen algorithms as the axes and points of the problem, respectively.
The Figure also shows the Pareto frontiers, which delimit the areas of Pareto dominance for the investigated algorithms, making it possible to state when an algorithm is superior to another. The frontiers reveal two different relationships: algorithms within the same frontier can be considered similar, while those in different frontiers are effectively different. Similarly to the original work, the frontiers are calculated using the skyline operator algorithm (Lin2007).
Formally, consider that the skyline operator creates a set of frontiers. The output is then a sequence of pairs, each of which assigns an algorithm to a specific frontier. Our proposal at this point is to use such frontier values as algorithm rankings instead of using sorting mechanisms as before. Since this procedure takes Pareto dominance into account, the advantages are twofold: (1) since we are not forced to assign a different ranking to every algorithm, the result is a more representative and fair assignment of algorithm ranking positions and (2) since the process is defined using a multidimensional Dataset-Interest space, any number of evaluation measures can be used simultaneously.
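The idea of using frontier indices as ranking positions can be illustrated with a naive frontier-peeling sketch. This is not the skyline operator algorithm of (Lin2007), only a simple quadratic procedure that produces the same kind of output; it assumes that all measures are oriented so that higher values are better (error measures would need to be negated first), and the performance values are hypothetical.

```python
def dominates(p, q):
    """p dominates q: at least as good on every measure, strictly better on one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_ranking(points):
    """Assign each algorithm the index of the Pareto frontier it belongs to."""
    remaining = dict(points)
    ranks = {}
    frontier = 1
    while remaining:
        # Algorithms not dominated by any other remaining algorithm.
        front = [
            name for name, p in remaining.items()
            if not any(dominates(q, p)
                       for other, q in remaining.items() if other != name)
        ]
        for name in front:
            ranks[name] = frontier
            del remaining[name]
        frontier += 1
    return ranks

# Hypothetical (NDCG, AUC) values for five algorithms on one dataset.
performance = {
    "A": (0.90, 0.80),
    "B": (0.80, 0.90),
    "C": (0.70, 0.70),
    "D": (0.85, 0.60),
    "E": (0.50, 0.50),
}

multicriteria_ranks = pareto_ranking(performance)
```

Algorithms A and B share the first position because neither dominates the other, which shows how ties arise naturally; adding a third measure only requires longer tuples.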
5 Empirical setup
This section presents the experimental setup. In order to ensure fair comparison of metaapproaches, based only on the predictive performance, the following constraints are adopted: (1) the baselevel datasets, algorithms and evaluation measures are exactly the same for all experiments; (2) all metalevel characteristics (multicriteria metatargets, algorithms and evaluation measures) are fixed, thus only the metafeatures change. Since the work considers baselevel and metalevel algorithms, we will refer to them as baselearners and metalearners, respectively.
5.1 Collaborative Filtering
The baselevel setup is concerned with the CF datasets, baselearners and measures used to evaluate the performance of CF baselearners when applied to these datasets. The 38 datasets used in the experiments are described in Table 1, alongside a summary of their statistics, namely the number of users, items and ratings.
Dataset  #users  #items  #ratings  Reference 
Amazon Apps  132391  24366  264233  (McAuley2013) 
Amazon Automotive  85142  73135  138039  
Amazon Baby  53188  23092  91468  
Amazon Beauty  121027  76253  202719  
Amazon CD  157862  151198  371275  
Amazon Clothes  311726  267503  574029  
Amazon Digital Music  47824  47313  83863  
Amazon Food  76844  51139  130235  
Amazon Games  82676  24600  133726  
Amazon Garden  71480  34004  99111  
Amazon Health  185112  84108  298802  
Amazon Home  251162  123878  425764  
Amazon Instant Video  42692  8882  58437  
Amazon Instruments  33922  22964  50394  
Amazon Kindle  137107  131122  308158  
Amazon Movies  7278  1847  11215  
Amazon Office  90932  39229  124095  
Amazon Pet Supplies  74099  33852  123236  
Amazon Phones  226105  91289  345285  
Amazon Sports  199052  127620  326941  
Amazon Tools  121248  73742  192015  
Amazon Toys  134291  94594  225670  
Bookcrossing  7780  29533  39944  (Ziegler2005) 
Flixter  14761  22040  812930  (Zafarani+Liu:2009) 
Jester1  2498  100  181560  (Goldberg2001) 
Jester2  2350  100  169783  
Jester3  2493  96  61770  
Movielens 100k  94  1202  9759  (GroupLens2016) 
Movielens 10m  6987  9814  1017159  
Movielens 1m  604  3421  106926  
Movielens 20m  13849  16680  2036552  
Movielens Latest  22906  17133  2111176  
MovieTweetings latest  3702  7358  39097  (Dooms13crowdrec) 
MovieTweetings RecSys  2491  4754  20913  
Tripadvisor  77851  10590  151030  (Wang2011) 
Yahoo! Movies  764  4078  22135  (Yahoo) 
Yahoo! Music  613  4620  30852  
Yelp  55233  46045  211627  (Yelp2016) 
The experiments were carried out with MyMediaLite, a RS library (Gantner2011). Two CF tasks were addressed: Rating Prediction and Item Recommendation. While the first aims to predict the rating a user would assign to a new instance, the second aims to recommend a ranked list of items. Since the tasks are different, so are the baselearners and evaluation measures required.
The following CF baselearners were used for Rating Prediction: Matrix Factorization (MF), Biased MF (BMF) (Salakhutdinov2008), Latent Feature Log Linear Model (LFLLM) (Menon2010), SVD++ (Koren2008), 3 variants of Sigmoid Asymmetric Factor Model (SIAFM, SUAFM and SCAFM) (Paterek2007), User Item Baseline (UIB) (Koren2010) and Global Average (GA). Regarding Item Recommendation, the baselearners used are BPRMF (Rendle2009), Weighted BPRMF (WBPRMF) (Rendle2009), Soft Margin Ranking MF (SMRMF) (Weimer2008), WRMF (Hu2008a) and Most Popular (MP). All baselearners were selected since they represent different Matrix Factorization techniques, which are widely used in CF both in academia and industry due to their predictive power and computational efficiency.
In the experiments carried out, for Item Recommendation, the baselearners are evaluated using NDCG and AUC, while for Rating Prediction the evaluation measures NMAE and RMSE are used. All experiments are performed using 10-fold cross-validation. In order to prevent bias in favour of any baselearner, the hyperparameters were not optimised.
5.2 Label Ranking as the Metalearning approach
This work studies the performance of 4 metaapproaches: Rating Matrix metafeatures (RM) (Cunha2016), Subsampling Landmarkers (SL) (Cunha2017), the proposed Graph metafeatures (GR) and the Comprehensive metafeatures (CM). The last set is obtained by aggregating the metafeatures from all existing metaapproaches and performing Correlation Feature Selection. Empirical validation has shown that setting the cutoff threshold at 70% yields the best results.
The multicriteria metatarget procedure is used to create the metatargets. Hence, for each CF problem studied (Rating Prediction and Item Recommendation), all specific evaluation measures are considered to create the DatasetInterest spaces. This means that while NDCG and AUC are used for the Item Recommendation problem, NMAE and RMSE are used for Rating Prediction. Next, the ParetoEfficient ranking procedure is employed for each dataset to generate a ranking of baselearners for Item Recommendation and another for Rating Prediction. The process is repeated for all remaining datasets in order to generate the complete metatargets.
The MtL problem is addressed as a Label Ranking task. The following metalearners are used to induce metamodels: KNN (Soares2015), Ranking Tree (RT), Ranking Random Forest (RF) (EXSY:EXSY12166), and the baseline Average Ranking (AVG). The results are evaluated in terms of Kendall’s Tau using leave-one-out cross-validation, due to the small number of metaexamples. Also, since we aim to obtain the best possible performance from the metalearners, we employ grid search optimisation.
6 Preliminary Analysis
6.1 Graph Metafeatures Analysis
This analysis applies Correlation Feature Selection (CFS) to all proposed Graph metafeatures. It has two goals: (1) to remove unnecessary metafeatures and (2) to understand which levels of the proposed metaapproach are relevant to the investigated problem. Table 6.1 presents the metafeatures selected (65 out of 761), organised by level and number of metafeatures kept in the level. Each metafeature is presented using the notations introduced in Section 3.