I Introduction
Machine learning (ML) approaches have provided significant advances in understanding and modeling problems from a wide range of knowledge fields. A considerable part of ML solutions relies on supervised learning algorithms, which explore the problem data, i.e., inputs and prediction target, to learn a pattern. However, data from many real problems present more than one target. When a dataset presents multiple continuous targets, we call it a multitarget regression (MTR) problem.
Currently, there are several methods in the literature addressing this type of problem. The most straightforward approach, referred to as single-target (ST) regression, is to create a single model for each target, disregarding possible intertarget correlation. MTR methods are an alternative that, besides using the original input features, exploit the statistical correlation among the outputs. MTR methods have been applied to many problems [1, 2, 3, 4, 5], leading to improvements in predictive performance over ST methods. However, each method has specific characteristics and has been effective for different problems.
Selecting the most suitable algorithm for a given problem requires extensive experimental evaluation, which demands massive computational resources (particularly processing time) and specialists [6, 7]. Alternatively, an MTR method can be selected automatically by treating the choice as an algorithm selection (or recommendation) problem solved with metalearning (MTL) [8].
The core concept of MTL is to use the knowledge acquired from previous similar problems to recommend the most suitable algorithm for a new, unseen dataset. In recent years, MTL has been employed in different contexts, such as selecting [9], ranking [10] and predicting the performance [11] of ML algorithms before employing them on a new dataset.
Our hypothesis is that MTL can be applied to MTR problems to recommend the most suitable method for new, unseen problems. Thus, in this study, we propose a recommendation system able to predict the best MTR method for a new dataset. To this end, experiments were carried out with a metadataset generated from a total of 648 synthetic regression problems, which were themselves generated to explore different intertarget characteristics. In the experiments, the ST approach and three MTR methods were evaluated: Stacked Single-Target (SST) [12], Multi-output Tree Chaining (MOTC) [4] and Ensemble of Regressor Chains (ERC) [12]. Thus, the metaknowledge was generated from different datasets, with different biases, of a kind often used for multitarget benchmarking [13]. In the experiments, we compared Naive Bayes (NB), Random Forest (RF), XGBoost (XGB) and Support Vector Machine (SVM) as metalearners, using their default hyperparameter values.
II Background
Many ML algorithms have been proposed for different prediction tasks. However, the 'No free lunch' theorem [14] states that no single algorithm is suitable for every dataset. A possible solution is to recommend the best algorithm for each problem.
The notion of algorithm recommendation problems was introduced in [15], grounded on selecting one algorithm from a portfolio of options. Given a set of datasets $P$ composed of instances from a distribution $D$; a set of algorithms $A$; and a performance measure $m: P \times A \rightarrow \mathbb{R}$; the algorithm recommendation problem is to find a mapping $f: P \rightarrow A$ that optimizes the expected performance measure for the instance problems described in $D$.
In practice, there are several alternatives to induce this mapping between algorithms and datasets/problems; one of them is MTL [8]. The core concept of MTL is to exploit past learning experiences in a particular type of task by adapting learning algorithms and data mining processes. This is done by extracting features from a dataset, named metafeatures, to describe it, together with the performance of the ML algorithms when applied to it. The relation between the metafeatures and the performance of the ML algorithms provides information to select the most suitable algorithm for new datasets. Thus, ML algorithms are applied to a metadataset, whose examples are described in terms of metafeatures, to induce a metamodel.
In recent years, MTL has been used for algorithm selection [16], segmentation algorithm recommendation [17], and hyperparameter tuning [18].
II-A Multitarget regression
MTR deals with problems that have multiple continuous outputs. To solve these problems, a function or a collection of functions that models the relationship from input ($X$) to output ($Y$) is created. If $X$ is composed of $m$ input variables and $Y$ has $d$ targets, the prediction problem can be stated as:
$$h: \mathbf{x} = (x_1, \dots, x_m) \in X \;\mapsto\; \hat{\mathbf{y}} = (\hat{y}_1, \dots, \hat{y}_d) \in Y \qquad (1)$$
Then, for each vector $\mathbf{x}$ that belongs to $X$, $h$ is capable of predicting an output vector $\hat{\mathbf{y}}$ that is the best approximation of the true output vector $\mathbf{y}$ [12].
MTR methods follow one of two main procedures: Algorithm Adaptation or Problem Transformation [19]. The first adapts well-known algorithms, such as artificial neural networks (ANN), RF and SVM, to deal with multiple outputs, modeling the problem at once. Problem transformation methods, on the other hand, modify the original input task aiming at exploring the correlation among the targets. Spyromitros-Xioufis et al. [12] proposed two problem transformation methods that contributed notably to the area: SST and ERC. The SST method first builds one model for each target; the predictions of these models are then stacked to the input as additional features, and new models are induced over the augmented input. The predictions of these last models are the final predictions.
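The two stages of SST can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the k-nearest-neighbour base regressor (`knn_predict`), the toy data and the use of training-set predictions to augment the input are assumptions made for brevity, whereas the experiments rely on SVM as base regressor.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Hypothetical base regressor: mean target of the k nearest training rows."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def sst_predict(train_x, train_y, query, k=3):
    """Stacked single-target prediction for one query row.

    train_x: list of feature rows; train_y: list of target rows (d targets each).
    """
    d = len(train_y[0])
    columns = [[row[j] for row in train_y] for j in range(d)]

    # Stage 1: one independent model per target, using only the original inputs.
    def stage1(x):
        return [knn_predict(train_x, columns[j], x, k) for j in range(d)]

    # Stage 2: augment every input row with the stage-1 predictions of all targets.
    augmented_train = [list(x) + stage1(x) for x in train_x]
    augmented_query = list(query) + stage1(query)

    # The stage-2 outputs are the final predictions.
    return [knn_predict(augmented_train, columns[j], augmented_query, k) for j in range(d)]


# Toy data: two inputs and two correlated targets (the second is twice the first).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
Y = [[x[0] + x[1], 2 * (x[0] + x[1])] for x in X]
prediction = sst_predict(X, Y, [1.0, 2.0])
```

The second stage is where intertarget information enters: each final model sees the first-stage estimates of every target, not only its own.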
In contrast, the ERC method creates several chains of regressors, each based on a different order of the targets. For each order, models are trained sequentially: the model trained for the second response considers the model trained for the first one. Both models are used in the induction of the third regressor, and so forth. In the end, for each target, the prediction is the average of the predictions of the trained regressors.
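The chaining and averaging steps of ERC can be illustrated with the sketch below. It is only a simplified reading of the description above: the k-nearest-neighbour base regressor, the number of chains (`n_chains`), the random seed and the use of true target values as extra inputs at training time are assumptions, not details taken from the experiments.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Hypothetical base regressor: mean target of the k nearest training rows."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def chain_predict(train_x, train_y, query, order, k=3):
    """Predict all targets with one regressor chain that follows `order`."""
    aug_train = [list(x) for x in train_x]
    aug_query = list(query)
    preds = {}
    for j in order:
        column = [row[j] for row in train_y]
        preds[j] = knn_predict(aug_train, column, aug_query, k)
        # Later links see the true value of target j at training time
        # and its predicted value at prediction time.
        for row, y in zip(aug_train, column):
            row.append(y)
        aug_query.append(preds[j])
    return [preds[j] for j in range(len(train_y[0]))]


def erc_predict(train_x, train_y, query, n_chains=5, seed=0, k=3):
    """Average the predictions of several chains built on random target orders."""
    rng = random.Random(seed)
    d = len(train_y[0])
    total = [0.0] * d
    for _ in range(n_chains):
        order = list(range(d))
        rng.shuffle(order)
        chain = chain_predict(train_x, train_y, query, order, k)
        total = [t + c for t, c in zip(total, chain)]
    return [t / n_chains for t in total]


# Toy data: two inputs and two correlated targets.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
Y = [[x[0] + x[1], 2 * (x[0] + x[1])] for x in X]
prediction = erc_predict(X, Y, [1.0, 2.0])
```

Because every chain stores one model per target, memory and training time grow with the number of chains.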
Both methods inspired the development of new MTR methods [20, 21, 22]. One of them, MOTC [4], requires less memory and training time than ERC, besides generating an interpretation of the targets' dependencies. It creates regressors from a tree built from a correlation assessment of the targets. The models are trained from the leaves to the root, stacking the models' predictions as new inputs.
II-B Metalearning for multitarget regression
During the literature review, we did not find any papers employing MTL for MTR. However, some studies investigated similar problems, such as multilabel classification (MLC) problems.
Unlike single-label classification, in which there is just one label to predict for each example of the dataset, in MLC tasks the examples are associated with more than one label; i.e., given a set of labels $L$, it is necessary to learn how to associate each example with a subset of $L$.
As with the problem investigated in this paper, many MLC methods [23] have been proposed, but there is little research on when each method is more effective.
To select an MLC method and configure its hyperparameters for a given dataset, de Sá et al. [24] applied Evolutionary Algorithms (EA). The study was carried out with a set of MLC methods on different datasets. The EA selection outperformed or at least tied with the baselines in most cases. Also in this direction, the pioneering research based on MTL was done by Chekina et al. [25]. They evaluated different multilabel methods, grouping them into Single-Classifier Algorithms and Ensemble-Classifier Algorithms, and performed experiments on MLC datasets from the literature. The results showed that employing MTL to select one method in MLC tasks is promising, since in most of the evaluated cases, applying the MTL recommendation was better than selecting one method for all tasks or selecting it randomly.
MLC tasks are similar to MTR tasks, since both deal with the prediction of multiple targets using a common set of features. The main difference is the type of the predicted variable: while in MLC the targets are binary, in MTR the outputs are continuous. Indeed, both tasks can be seen as instances of a more general learning task, multitarget prediction, with different types of variables to predict [12]. Therefore, given that MTL was successfully applied to select MLC methods, it is worthwhile to investigate MTL for selecting MTR methods.
III Material and Methods
Fig. 1 provides an overview of the adopted experimental methodology. First, we performed exhaustive experiments evaluating all the MTR methods on all available datasets. We then identified the best method for each dataset, selecting the one with the smallest relative root mean squared error (RRMSE). This information is used to define the metalabel. At the same time, a set of measures, named metafeatures, is extracted to describe each dataset. We then join the metafeature values with the metalabels to compose our metadataset. Finally, we employ ML algorithms to predict the best MTR method for a new, unseen dataset. The next subsections describe each of these processes in detail.
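The assembly of the metadataset described above can be sketched in a few lines. The dataset names, metafeature values and RRMSE scores below are hypothetical placeholders, and the helper names (`best_method`, `build_metadataset`) are illustrative only.

```python
def best_method(rrmse_by_method):
    """Return the name of the method with the smallest RRMSE (the metalabel)."""
    return min(rrmse_by_method, key=rrmse_by_method.get)


def build_metadataset(metafeatures, performances):
    """Join metafeature vectors and metalabels by dataset name."""
    rows = []
    for name, features in metafeatures.items():
        row = dict(features)
        row["metalabel"] = best_method(performances[name])
        rows.append(row)
    return rows


# Hypothetical values for two base-level datasets.
metafeatures = {
    "dataset_a": {"n.samples": 500, "n.targets": 3},
    "dataset_b": {"n.samples": 1000, "n.targets": 6},
}
performances = {
    "dataset_a": {"ST": 0.52, "SST": 0.47, "ERC": 0.49, "MOTC": 0.50},
    "dataset_b": {"ST": 0.31, "SST": 0.35, "ERC": 0.33, "MOTC": 0.34},
}
metadataset = build_metadataset(metafeatures, performances)
```

Each resulting row is one meta-example: the metafeatures are the predictors and `metalabel` is the class the metalearner must predict.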
III-A Datasets
In the experiments, the metadataset was composed of 648 benchmarking synthetic datasets (available for download at http://www.uel.br/grupopesquisa/remid/?page_id=145), generated by following the procedure described in [13].
We used synthetic datasets to overcome the lack of real datasets that meet specific scenarios of intertarget dependency and of complexity in the input-to-output relations, and that cover different numbers of input features and targets.
To create a wide variety of datasets, the parameters of the dataset generator assumed the values presented in Table I. The numeric targets were built from mathematical expressions based on identity, quadratic, and cubic functions, or combinations of them.
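Table I: Hyperparameter values used by the dataset generator.
| Symbol | Hyperparameter | Values |
|---|---|---|
| N | Number of instances | {500, 1000} |
| m | Number of features | {15, 30, 45, 60, 75, 90} |
| d | Number of targets | {3, 6} |
| g | Generating groups | {1, 2} |
| | % Instances affected by noise | {1, 5, 10} |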
III-B Metafeatures
Each base-level dataset is represented by a vector of characteristics, the metafeatures. In [8], the authors list requirements that metafeatures must meet: they need good discriminative power, their extraction should not be computationally complex, and their number should not be large, to avoid overfitting.
In our meta-level experiments, a set of metafeatures was explored. They included measures from different categories: statistical information about the dataset (STAT), correlation between attributes and targets (COR), performance metrics related to a linear regression (LIN), dimensionality of the dataset (DIM) and smoothness of the data (SMO) [26, 18].
It is important to mention that some of these metafeatures were designed for problems with a single output. Since we are dealing with multitarget problems, such a metafeature yields one value per target. To handle this, the metafeature was extracted for each target, and then the average, standard deviation, maximum and minimum of these values were added to the set of metafeatures [27]. Most of the metafeature values were extracted using the R package ECoL [26]. A complete list of the metafeatures used in the experiments is presented in Table II.
Table II: Metafeatures used in the experiments.
| Type | Acronym | Aggregation functions | Description |
|---|---|---|---|
| STAT | n.samples | - | Number of samples |
| STAT | n.attributes | - | Number of attributes |
| STAT | n.targets | - | Number of targets |
| STAT | target.ratio | - | Ratio between targets and attributes |
| STAT | pc[1-3] | - | |
| DIM | T2 | - | Average number of samples per dimension |
| DIM | T3 | - | |
| DIM | T4 | - | Intrinsic dimensionality proportion |
| COR | cor.targets | {avg, max, min, sd} | Correlation between targets |
| COR | C1 | {avg, max, min, sd} | Maximum feature correlation to the output |
| COR | C2 | {avg, max, min, sd} | Average feature correlation to the output |
| COR | C3 | {avg, max, min, sd} | Individual feature efficiency |
| COR | C4 | {avg, max, min, sd} | Collective feature efficiency |
| LIN | regr.L1 | {avg, max, min, sd} | Distance of erroneous instances to a linear classifier |
| LIN | regr.L2 | {avg, max, min, sd} | Training error of a linear classifier |
| LIN | regr.L3 | {avg, max, min, sd} | Nonlinearity of a linear classifier |
| SMO | S1 | {avg, max, min, sd} | Smoothness of the output distribution |
| SMO | S2 | {avg, max, min, sd} | Smoothness of the input distribution |
| SMO | S3 | {avg, max, min, sd} | Error of a k-nearest neighbor regressor |
| SMO | S4 | {avg, max, min, sd} | Nonlinearity of nearest neighbor regressor |
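The per-target aggregation can be illustrated with a measure in the spirit of C1 (maximum feature correlation to the output). This is a minimal sketch under stated assumptions: Pearson correlation is used as a stand-in, the standard deviation is the sample standard deviation, and the toy data are invented; the exact definitions implemented in ECoL may differ.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation between two equally long numeric sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0


def max_feature_correlation(features, target):
    """Single-target measure: largest absolute feature-target correlation."""
    return max(abs(pearson(column, target)) for column in features)


def aggregate(values):
    """Summarise the per-target values with avg, max, min and sd."""
    return {
        "avg": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Toy data stored column-wise: two features and two targets.
features = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
targets = [[2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 2.0, 4.0]]
per_target = [max_feature_correlation(features, t) for t in targets]
c1 = aggregate(per_target)
```

A single-output measure therefore contributes four columns to the metadataset, one per aggregation function.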
III-C Metalabels
The ST approach and three MTR methods were explored in the experiments: SST, ERC [12] and MOTC [4]. Despite being the simplest, the ST approach was included in the experimental setup because it can perform better than MTR methods in problems with limited intertarget dependency. The three MTR methods, in turn, were selected because they offer a proper tradeoff between performance and time complexity, as concluded in [13].
These four methods were executed for every base-level dataset. Their induced models were assessed in terms of the RRMSE evaluation measure defined in Equation 2, where $N$ represents the number of instances, and $y_i$, $\hat{y}_i$ and $\bar{y}$ represent, respectively, the true, predicted and mean values of the target.
$$\mathrm{RRMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}} \qquad (2)$$
SVM was used as base regressor, with a k-fold cross-validation (CV) resampling strategy. SVM was chosen as base regressor because it is used in most of the MTR problem transformation literature [1, 21, 28, 3, 4]. The method with the smallest RRMSE [19] was chosen as the best multitarget method for each dataset. The experiments were performed using the mtrtoolkit (https://github.com/smastelini/mtrtoolkit), implemented in R. Thus, our metadataset has a multiclass metalabel with four levels, indicating the best MTR method or ST regression. The class distribution (%) in the metadataset is presented in Table III.
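Table III: Class distribution in the metadataset.
| | ERC | MOTC | SST | ST | Total |
|---|---|---|---|---|---|
| examples | 166 | 89 | 362 | 31 | 648 |
| % | 25.6 | 13.7 | 55.8 | 4.9 | 100 |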
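The RRMSE that ranks the methods compares the squared error of a model with that of a predictor that always outputs the target mean. The sketch below computes it for one target; averaging over the targets of a dataset (`average_rrmse`) is an assumption about how a single score per method is obtained, and the numbers are toy values.

```python
import math


def rrmse(y_true, y_pred):
    """Relative root mean squared error of one target."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)


def average_rrmse(true_columns, pred_columns):
    """Assumed aggregation: mean RRMSE over the targets of a dataset."""
    scores = [rrmse(t, p) for t, p in zip(true_columns, pred_columns)]
    return sum(scores) / len(scores)


y = [1.0, 2.0, 3.0, 4.0]
perfect = rrmse(y, y)                        # every prediction is exact
mean_only = rrmse(y, [2.5, 2.5, 2.5, 2.5])   # always predicts the target mean
both = average_rrmse([y, y], [y, [2.5, 2.5, 2.5, 2.5]])
```

A value below 1 means the model beats the mean predictor, which makes scores comparable across targets with different scales.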
III-D Metalearners
Four ML algorithms, with different learning biases, were used as metalearners: NB [29], RF [30], SVM [31] and XGB [32]. These algorithms were selected due to their widespread use and their capacity to induce high-performance models. The k-fold CV resampling methodology was also adopted at the meta-level of the experiments to assess the predictive performance of the metalearners. All the ML algorithms were implemented in R, using the mlr package with their corresponding default hyperparameters.
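The k-fold resampling at the meta-level can be sketched as an index generator. The value `k=10`, the shuffling seed and the absence of stratification are assumptions made for illustration; the experiments rely on the resampling facilities of the mlr package.

```python
import random


def k_fold_indices(n_examples, k, seed=0):
    """Yield (train, test) index lists for k folds of nearly equal size."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 648 meta-examples, one per base-level dataset (Table III).
splits = list(k_fold_indices(n_examples=648, k=10))
```

Every meta-example appears in exactly one test fold, so each dataset receives one recommendation from a metamodel that never saw it during training.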
III-E Evaluation measures and baselines
Seven evaluation metrics were used to assess the predictive performance of the induced models: Accuracy, Balanced per class accuracy, Precision, Recall, F-score (F1), Sensitivity and Specificity.
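For a multiclass metalabel, most of these metrics are computed per class in a one-vs-rest fashion and then averaged. The sketch below shows one possible reading: macro averaging, the definition of balanced accuracy as the mean of sensitivity and specificity, and the toy labels are all assumptions, since the averaging scheme used in the experiments is not stated.

```python
def per_class_counts(y_true, y_pred, label):
    """One-vs-rest confusion counts (tp, fp, fn, tn) for a single class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged one-vs-rest metrics."""
    sens, spec, prec, f1 = [], [], [], []
    for label in sorted(set(y_true)):
        tp, fp, fn, tn = per_class_counts(y_true, y_pred, label)
        s = tp / (tp + fn)
        p = tp / (tp + fp) if tp + fp else 0.0
        sens.append(s)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        prec.append(p)
        f1.append(2 * p * s / (p + s) if p + s else 0.0)
    n = len(sens)
    result = {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "sensitivity": sum(sens) / n,
        "specificity": sum(spec) / n,
        "precision": sum(prec) / n,
        "f1": sum(f1) / n,
    }
    # Assumed definition: mean of macro sensitivity and macro specificity.
    result["balanced_accuracy"] = (result["sensitivity"] + result["specificity"]) / 2
    return result


y_true = ["SST", "SST", "ERC", "ERC", "ST", "ST"]
y_pred = ["SST", "ERC", "ERC", "ERC", "ST", "SST"]
metrics = macro_metrics(y_true, y_pred)
```

Per-class metrics matter here because the metalabel is imbalanced: a recommender can reach a reasonable accuracy while rarely predicting the minority methods.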
In addition, we used two baselines from the MTL literature for comparison: a model that always recommends the majority class of the metadataset (Majority) and a model that provides random recommendations (Random). These baselines are widely used to justify the need for a recommendation system [8]. We also used the ground truth as an upper bound (Truth).
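Both baselines are easy to reproduce from the class counts in Table III. In the sketch below, drawing the Random recommendation uniformly over the four methods and the fixed seed are assumptions; the function names are illustrative.

```python
import random
from collections import Counter

# Class counts taken from Table III.
counts = {"ERC": 166, "MOTC": 89, "SST": 362, "ST": 31}
metalabels = [label for label, n in counts.items() for _ in range(n)]


def majority_baseline(labels):
    """Always recommend the most frequent metalabel."""
    winner, _ = Counter(labels).most_common(1)[0]
    return [winner] * len(labels)


def random_baseline(labels, seed=0):
    """Recommend a method drawn uniformly at random for each dataset."""
    rng = random.Random(seed)
    options = sorted(set(labels))
    return [rng.choice(options) for _ in labels]


def accuracy(y_true, y_pred):
    """Fraction of datasets for which the recommendation is the best method."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


majority_accuracy = accuracy(metalabels, majority_baseline(metalabels))
random_accuracy = accuracy(metalabels, random_baseline(metalabels))
```

With four classes, a uniform random recommender has an expected accuracy of 0.25, while Majority reaches the share of the most frequent class (362 of 648 examples).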
IV Results and Discussion
We first present the predictive performance of the metamodels induced by the different ML algorithms. Afterward, based on the RF metamodel, the metafeatures are compared and discussed. Finally, some contributions and open issues related to MTL and MTR are presented.
IV-A Predictive Performance
The predictive performance obtained by the four metalearners and the baselines is presented as a radar chart in Fig. 2. In this figure, each line represents a metamodel and each vertex is related to a different performance measure. The larger the area in the radar chart, the better the metamodel.
The radar chart shows that all metamodels outperformed the Random baseline on every metric. The same holds for the Majority baseline, with one exception: the accuracy of the NB metamodel was lower than that of Majority. For this metric, RF obtained the best result, with 70.83% of accuracy, followed by SVM and then XGB. Sensitivity was the only metric for which the RF metamodel did not obtain the highest value; NB was the best on it. On the remaining metrics (Specificity, Precision, Recall, F1 and Balanced per class accuracy), RF achieved the best results.
Although three of the four metamodels overcame the baselines on all metrics, the predictive performance did not reach high values, which might be related to the imbalance of the metadataset. However, the superiority of the MTL recommender system over the baselines was confirmed by statistical tests. We used the Friedman test at a predefined significance level $\alpha$. The null hypothesis is that the recommendations made by the metamodels and by the baselines are similar. Whenever the null hypothesis is rejected, the Nemenyi post hoc test can be applied, stating that the performances of two approaches are significantly different if their corresponding average ranks differ by at least a critical difference (CD) value. When multiple algorithms are compared in this way, the results can be represented graphically with the CD diagram, as proposed by Demšar [33].
The metamodels (RF, SVM, XGB, NB) were compared with Truth (the expected method), Majority (which always predicts SST) and Random (the random selection of a method for each dataset), using the RRMSE obtained by the recommended method as the performance metric. This analysis is shown in Fig. 3, using the results from the Nemenyi test.
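The quantities behind a CD diagram are the average rank of each approach and the critical difference itself, which Demšar [33] defines as $CD = q_\alpha \sqrt{k(k+1)/(6N)}$ for $k$ approaches and $N$ datasets. The sketch below computes both. The value `q_alpha = 2.949` is the tabulated critical value for seven approaches at $\alpha = 0.05$; that significance level, and taking $N = 648$ from Table III, are assumptions for illustration.

```python
import math


def average_ranks(scores):
    """Rank the scores of one dataset (1 = lowest RRMSE); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mean_ranks(score_table):
    """Average rank of every approach (column) over all datasets (rows)."""
    per_dataset = [average_ranks(row) for row in score_table]
    return [sum(col) / len(per_dataset) for col in zip(*per_dataset)]


def critical_difference(q_alpha, n_approaches, n_datasets):
    """Nemenyi critical difference for the given number of approaches and datasets."""
    return q_alpha * math.sqrt(n_approaches * (n_approaches + 1) / (6 * n_datasets))


# Seven approaches: four metamodels, Truth, Majority and Random.
cd = critical_difference(q_alpha=2.949, n_approaches=7, n_datasets=648)
example_ranks = average_ranks([0.1, 0.2, 0.2, 0.4])
example_means = mean_ranks([[0.3, 0.1], [0.2, 0.4]])
```

Two approaches whose average ranks differ by less than `cd` are drawn as connected in the diagram.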
As shown in Fig. 3, no approach was statistically equivalent to Truth, which was expected given the predictive performance reported above. However, RF, SVM and XGB are connected, which means they performed similarly to each other and were superior to the Majority and Random baselines. This supports the benefit of using an MTL recommender system in comparison to always selecting the same algorithm for every dataset or selecting one randomly.
IV-B Relative importance of the metafeatures
The RF metamodel was used to assess the importance of each metafeature by means of the RF feature importance metric. This metric is calculated by permuting the values of a feature in the out-of-bag (OOB) samples and recalculating the OOB error over the whole ensemble. In other words, if replacing the values of a metafeature with random values increases the error, this metafeature is considered important. Otherwise, if the error decreases, the resulting importance is negative; the metafeature is then considered unimportant and should be removed from the modeling. This procedure can be performed for each metafeature to explain its impact [30]. Fig. 4 shows the metafeature importance for the metadataset.
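The idea of permutation importance can be shown with a small sketch. It departs from the procedure described above in two labeled ways: it uses a fixed, hypothetical metamodel (`model`) instead of a random forest, and it measures the error on a plain evaluation set rather than on out-of-bag samples.

```python
import random


def error_rate(model, rows, labels):
    """Fraction of rows for which the model's output differs from the label."""
    return sum(model(r) != y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, seed=0):
    """Error increase observed after shuffling one feature column."""
    baseline = error_rate(model, rows, labels)
    column = [r[feature] for r in rows]
    random.Random(seed).shuffle(column)
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return error_rate(model, permuted, labels) - baseline


def model(row):
    """Hypothetical metamodel that only looks at the first metafeature."""
    return "SST" if row[0] > 0.5 else "ERC"


rng = random.Random(1)
rows = [[i / 100, rng.random()] for i in range(100)]
labels = [model(r) for r in rows]
importance = [permutation_importance(model, rows, labels, f) for f in range(2)]
```

Shuffling the first column breaks its link with the labels and raises the error, while shuffling the ignored second column leaves the error unchanged.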
The COR and LIN metafeatures achieved the highest importance values, especially the minimum value of the distance of erroneous instances to a linear classifier (regr.L1, min), the minimum value of the nonlinearity of a linear classifier (regr.L3, min) and the standard deviation of the maximum feature correlation to the output (C1, sd). Since the MTR methods try to explore the correlation between the features and the targets in different ways, the selection of these metafeatures makes sense. The numbers of targets, attributes and samples had low importance. This might have occurred because these metafeatures did not influence the predictive performance, suggesting that the MTR methods used in the experiments deal with different numbers of targets, attributes and samples in the same way.
IV-C Insights and open issues
It is important to highlight that the metalabel was assigned directly from the ranking of methods by predictive performance (lowest RRMSE). Differences between the predictive performance of the MTR methods, regardless of their magnitude, were not considered while building the metadataset.
Alternatively, the metalabel could indicate two or more methods as suitable for a given problem when there is no statistical difference between their performances. However, this scenario poses the additional challenge of dealing with a multilabel problem at the meta-level of the recommender system.
Another important issue is that the metalabel assignment considered only the predictive error of the MTR methods. In some cases, e.g., Online Multitarget Regression [34], the most appropriate method must balance predictive performance, memory, and time cost when predicting the output. This scenario demands additional information, and adds complexity, to identify the best MTR method to be learned by the recommender system.
V Conclusions and Future Work
In this study, a framework for recommending MTR methods using metalearning was presented. A metadataset, composed of datasets used for benchmarking MTR methods, was created for the induction of metamodels that predict the best method for a given dataset. Experiments performed with the metadataset and four metalearners led to 70.83% of accuracy with RF, the best recommender. Moreover, it overcame the baselines, and statistical tests showed that the recommendation system was better than always selecting the same method for every task or selecting a method randomly. The analysis of metafeature importance revealed that the correlation between targets and the error of a linear classifier were the most useful features to predict the performance of an MTR method for a given unseen dataset.
As future work, besides implementing more metafeatures, we intend to use more MTR benchmarking datasets in order to improve the generalization capability of the metamodels. We also expect to apply MLC to predict both the MTR method and its base regressor. Further information related to memory and time cost will be used to match the requirements of different scenarios, e.g., Online MTR.
Acknowledgements
The authors would like to thank the financial support of the Coordination for the Improvement of Higher Education Personnel (CAPES), Finance Code 001; the National Council for Scientific and Technological Development (CNPq) of Brazil, Grant of Project 420562/20184; and the São Paulo Research Foundation (FAPESP), grant #2018/073196.
References

[1] G. Tsoumakas, E. Spyromitros-Xioufis, A. Vrekou, and I. Vlahavas, “Multi-target regression via random linear target combinations,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2014, pp. 225–240.
[2] J. Levatić, M. Ceci, D. Kocev, and S. Džeroski, “Semi-supervised learning for multi-target regression,” in International Workshop on New Frontiers in Mining Complex Patterns. Springer, 2014, pp. 3–18.
[3] E. J. Santana, B. C. Geronimo, S. M. Mastelini, R. H. Carvalho, D. F. Barbin, E. I. Ida, and S. Barbon, “Predicting poultry meat characteristics using an enhanced multi-target regression method,” Biosystems Engineering, vol. 171, pp. 193–204, 2018.
[4] S. M. Mastelini, V. G. T. da Costa, E. J. Santana, F. K. Nakano, R. C. Guido, R. Cerri, and S. Barbon, “Multi-output tree chaining: An interpretative modelling and lightweight multi-target approach,” Journal of Signal Processing Systems, pp. 1–25, 2018.
[5] E. J. Santana, J. A. P. R. d. Silva, S. M. Mastelini, and S. Barbon Jr., “Stock portfolio prediction by multi-target decision support,” iSys-Revista Brasileira de Sistemas de Informação, vol. 12, no. 1, 2019.
[6] B. Bilalli, A. Abelló, and T. Aluja-Banet, “On the predictive power of meta-features in OpenML,” International Journal of Applied Mathematics and Computer Science, vol. 27, no. 4, pp. 697–712, 2017.
[7] T. Cunha, C. Soares, and A. C. de Carvalho, “Metalearning and recommender systems: A literature review and empirical study on the algorithm selection problem for collaborative filtering,” Information Sciences, vol. 423, pp. 128–144, 2018.
[8] P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta, Metalearning: Applications to Data Mining, 2nd ed. Springer Verlag, 2009.
[9] S. Ali and K. A. Smith-Miles, “A meta-learning approach to automatic kernel selection for support vector machines,” Neurocomputing, vol. 70, no. 1–3, pp. 173–186, 2006.
[10] A. L. D. Rossi, A. C. P. de Leon Ferreira, C. Soares, B. F. De Souza et al., “MetaStream: A meta-learning based method for periodic algorithm selection in time-changing data,” Neurocomputing, vol. 127, pp. 52–64, 2014.
[11] M. Reif, F. Shafait, and A. Dengel, “Meta-learning for evolutionary parameter optimization of classifiers,” Machine Learning, vol. 87, no. 3, pp. 357–380, 2012.
[12] E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, and I. Vlahavas, “Multi-target regression via input space expansion: treating targets as inputs,” Machine Learning, vol. 104, no. 1, pp. 55–98, 2016.
[13] S. M. Mastelini, E. J. Santana, V. G. T. da Costa, and S. Barbon, “Benchmarking multi-target regression methods,” in 2018 7th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2018, pp. 396–401.
[14] D. H. Wolpert, “The lack of a priori distinctions between learning algorithms,” Neural Computation, vol. 8, no. 7, pp. 1341–1390, 1996.
[15] J. R. Rice, “The algorithm selection problem,” Advances in Computers, vol. 15, pp. 65–118, 1976.
[16] D. G. Ferrari and L. N. De Castro, “Clustering algorithm selection by meta-learning systems: A new distance-based problem characterization and ranking combination methods,” Information Sciences, vol. 301, pp. 181–194, 2015.
[17] G. F. Campos, S. Barbon, and R. G. Mantovani, “A meta-learning approach for recommendation of image segmentation algorithms,” in 2016 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 2016, pp. 370–377.
[18] R. G. Mantovani, A. L. Rossi, E. Alcobaça, J. Vanschoren, and A. C. de Carvalho, “A meta-learning recommender system for hyperparameter tuning: Predicting when tuning improves SVM classifiers,” Information Sciences, vol. 501, pp. 193–221, 2019.
[19] H. Borchani, G. Varando, C. Bielza, and P. Larrañaga, “A survey on multi-output regression,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 5, no. 5, pp. 216–233, 2015.
[20] G. Melki, A. Cano, V. Kecman, and S. Ventura, “Multi-target support vector regression via correlation regressor chains,” Information Sciences, vol. 415, pp. 53–69, 2017.
[21] S. M. Mastelini, E. J. Santana, R. Cerri, and S. Barbon, “DSTARS: A multi-target deep structure for tracking asynchronous regressor stack,” in 2017 Brazilian Conference on Intelligent Systems (BRACIS). IEEE, Oct. 2017.
[22] J. M. Moyano, E. L. Gibaja, and S. Ventura, “An evolutionary algorithm for optimizing the target ordering in ensemble of regressor chains,” in 2017 IEEE Congress on Evolutionary Computation (CEC). IEEE, Jun. 2017.
[23] G. Tsoumakas and I. Katakis, “Multi-label classification: An overview,” International Journal of Data Warehousing and Mining (IJDWM), vol. 3, no. 3, pp. 1–13, 2007.
[24] A. G. de Sá, G. L. Pappa, and A. A. Freitas, “Towards a method for automatically selecting and configuring multi-label classification algorithms,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion. ACM, 2017, pp. 1125–1132.
[25] L. Chekina, L. Rokach, and B. Shapira, “Meta-learning for selecting a multi-label classification algorithm,” in 2011 IEEE 11th International Conference on Data Mining Workshops. IEEE, 2011, pp. 220–227.
[26] A. C. Lorena, A. I. Maciel, P. B. de Miranda, I. G. Costa, and R. B. Prudêncio, “Data complexity meta-features for regression problems,” Machine Learning, vol. 107, no. 1, pp. 209–246, 2018.
[27] A. Rivolli, L. P. F. Garcia, C. Soares, J. Vanschoren, and A. C. P. L. F. de Carvalho, “Towards reproducible empirical research in meta-learning,” CoRR, vol. abs/1808.10406, 2018. [Online]. Available: http://arxiv.org/abs/1808.10406
[28] E. J. Santana, S. M. Mastelini, and S. Barbon Jr., “Deep Regressor Stacking for Air Ticket Prices Prediction,” in XIII Brazilian Symposium on Information Systems: Information Systems for Participatory Digital Governance. Brazilian Computer Society (SBC), 2017, pp. 25–31.
[29] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Malaysia: Pearson Education Limited, 2016.
[30] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001.
[31] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[32] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794.
[33] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[34] A. Osojnik, P. Panov, and S. Džeroski, “Tree-based methods for online multi-target regression,” Journal of Intelligent Information Systems, vol. 50, no. 2, pp. 315–339, 2018.