Towards meta-learning for multi-target regression problems

07/25/2019 ∙ Gabriel Jonas Aguiar et al. ∙ State University of Londrina, Universidade de São Paulo, UTFPR

Several multi-target regression methods were developed in the last years aiming at improving predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others for all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regression method. In this paper, we propose a meta-learning system to recommend the best predictive method for a given multi-target regression problem. We performed experiments with a meta-dataset generated from a total of 648 synthetic datasets. These datasets were created to explore distinct inter-target characteristics toward recommending the most promising method. In the experiments, we evaluated four algorithms with different biases as meta-learners. Our meta-dataset is composed of 58 meta-features based on statistical information, correlation characteristics, linear landmarking, and the distribution and smoothness of the data, and its meta-label has four different classes. Results showed that the induced meta-models were able to recommend the best method for different base-level datasets with a balanced accuracy superior to 70% using the Random Forest meta-model, which statistically outperformed the meta-learning baselines.


I Introduction

Machine Learning (ML) approaches have been providing significant advances in understanding and modeling problems from the broadest knowledge fields. A considerable part of ML solutions takes advantage of supervised learning algorithms, which explore the information, i.e., input and prediction target, from the problem data to learn a pattern. However, data from several real problems present more than one target. When a dataset presents multiple continuous targets, we call it a multi-target regression (MTR) problem.

Currently, there are several methods in the literature addressing this type of problem. The most straightforward approach, referred to as Single-Target (ST) regression, is to create a single model for each target, disregarding any possible inter-target correlation. MTR is an alternative approach that, besides using the original input features, exploits the statistical correlation among the outputs. MTR methods have been applied to solve many problems [1, 2, 3, 4, 5], leading to improvements in predictive performance over ST methods. However, each method has specific characteristics and has been effective for different problems.

Selecting the most suitable algorithm for a given problem requires extensive experimental evaluation, which demands massive computational resources (particularly processing time) and specialists [6, 7]. On the other hand, an MTR method can be automatically selected by Meta-Learning (MtL) when the choice is addressed as an algorithm selection (or recommendation) problem [8].

The MtL core concept is to use the knowledge acquired from previous, similar problems to recommend the most suitable algorithm for a new, unseen dataset. In the last years, MtL has been employed in different contexts, such as selecting [9], ranking [10] and predicting the performance [11] of ML algorithms to be employed on a new dataset.

Our hypothesis is that MtL can be applied to MTR problems to recommend the most suitable method for new, unseen problems. Thus, in this study, we propose a recommendation system able to predict the best MTR method for a new dataset. For such, experiments were carried out with a meta-dataset generated from a total of 648 synthetic regression problems, created to explore different inter-target characteristics. In the experiments, the ST approach and three MTR methods were evaluated: Stacked Single-Target (SST) [12], Multi-output Tree Chaining (MOTC) [4] and Ensemble of Regressor Chains (ERC) [12]. Thus, the meta-knowledge was generated from datasets with different biases, often used for multi-target benchmarking [13]. In the experiments, we compared Naive Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGB) and Support Vector Machine (SVM) as meta-learners, using their default hyperparameter values.

This paper is structured as follows. Section II presents the background on using MtL for MTR; Section III describes the experimental methodology; the results are discussed in Section IV; finally, the conclusions and future work are presented in Section V.

II Background

Many ML algorithms have been proposed for different prediction tasks. However, the 'No Free Lunch' theorem [14] states that there is no single algorithm suitable for every dataset. A possible solution is to recommend the best algorithm for each problem.

The notion of algorithm recommendation was introduced in [15], grounded on selecting one algorithm from a portfolio of options. Given a set of problems $P$ composed of instances from a distribution $D$, a set of algorithms $A$, and a performance measure $m: P \times A \rightarrow \mathbb{R}$, the algorithm recommendation problem is to find a mapping $f: P \rightarrow A$ that optimizes the expected performance measure for the problem instances described in $P$.

In practice, there are some alternatives to induce this mapping between algorithms and datasets/problems; one of them is MtL [8]. The core concept of MtL is to exploit past learning experiences on a particular type of task and its solutions, adapting learning algorithms and data mining processes accordingly. This is done by extracting characteristics from a dataset, named meta-features, to represent it, together with the performance of the ML algorithms when applied to it. The relation between meta-features and ML performance provides information to select the most suitable algorithm for new datasets. Thus, ML algorithms are applied to a meta-dataset, whose examples are described in terms of meta-features, to induce a meta-model.

In the last years, MtL has been used for algorithm selection [16], segmentation algorithm recommendation [17], and hyperparameter tuning [18].

II-A Multi-target regression

MTR is related to problems with multiple continuous outputs. To solve these problems, a function (or a collection of functions) that models the relationship from the input space $\mathcal{X}$ to the output space $\mathcal{Y}$ is created. If $\mathcal{X}$ is composed of $m$ input variables and $\mathcal{Y}$ has $d$ targets, the prediction problem can be stated as learning a function

$$h : \mathcal{X} \subseteq \mathbb{R}^{m} \rightarrow \mathcal{Y} \subseteq \mathbb{R}^{d} \qquad (1)$$

Then, for each vector $\mathbf{x}$ that belongs to $\mathcal{X}$, $h$ is capable of predicting an output vector $\hat{\mathbf{y}} = h(\mathbf{x})$ that is the best approximation of the true output vector $\mathbf{y}$ [12].

MTR methods might follow one of two main procedures: Algorithm Adaptation or Problem Transformation [19]. The first adapts well-known algorithms, such as Artificial Neural Networks (ANN), RF and SVM, to deal with multiple outputs, modeling the problem at once. On the other hand, problem transformation methods modify the original task aiming at exploring the correlation among the targets. Spyromitros-Xioufis et al. [12] proposed two problem transformation methods that contributed notably to the area: SST and ERC. The SST method first builds one single-target model per target; the predictions of these models are then stacked onto the original input, and new models are induced over the augmented input. The predictions of these second-stage models are the final predictions.
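As an illustration, the following is a minimal sketch of SST in plain R, using lm as a stand-in base regressor. All function and variable names are ours; the paper's experiments use SVM as base regressor through the mtr-toolkit, and the original SST also supports out-of-sample first-stage estimates, which this sketch omits.

```r
# Minimal SST sketch: X is a data.frame of input features, Y a data.frame of d targets.
stack_preds <- function(models, X) {
  preds <- as.data.frame(lapply(models, predict, newdata = X))
  names(preds) <- paste0("pred_", names(models))   # avoid name clashes with X
  preds
}

sst_train <- function(X, Y) {
  # First stage: one independent single-target model per target.
  stage1 <- setNames(lapply(names(Y), function(t) lm(Y[[t]] ~ ., data = X)), names(Y))
  # Second stage: input augmented with the first-stage predictions of all targets.
  X_aug  <- cbind(X, stack_preds(stage1, X))
  stage2 <- setNames(lapply(names(Y), function(t) lm(Y[[t]] ~ ., data = X_aug)), names(Y))
  list(stage1 = stage1, stage2 = stage2)
}

sst_predict <- function(model, X_new) {
  X_aug <- cbind(X_new, stack_preds(model$stage1, X_new))
  as.data.frame(lapply(model$stage2, predict, newdata = X_aug))
}
```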

Differently, the ERC method creates regressor chains based on different orderings of the targets. Within each chain, the models are trained sequentially: the model trained for the second target receives the prediction of the model trained for the first one; both predictions are then used in the induction of the third regressor, and so forth. In the end, for each target, the final prediction is the average of the predictions produced by the trained chains.
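A corresponding minimal sketch of ERC, under the same assumptions as the SST sketch above (lm as a stand-in base regressor, in-sample chained predictions, names of our choosing):

```r
# ERC sketch: for each of n_chains random target orderings, models are trained
# sequentially, each one seeing the predictions of the targets earlier in the chain.
erc_train <- function(X, Y, n_chains = 10) {
  lapply(seq_len(n_chains), function(i) {
    order_t <- sample(names(Y))                        # random permutation of the targets
    X_cur   <- X
    chain   <- list()
    for (t in order_t) {
      chain[[t]] <- lm(Y[[t]] ~ ., data = X_cur)
      X_cur[[paste0("pred_", t)]] <- predict(chain[[t]], newdata = X_cur)
    }
    list(order = order_t, chain = chain)
  })
}

erc_predict <- function(ensemble, X_new, targets) {
  per_chain <- lapply(ensemble, function(ch) {
    X_cur <- X_new
    preds <- list()
    for (t in ch$order) {
      preds[[t]] <- predict(ch$chain[[t]], newdata = X_cur)
      X_cur[[paste0("pred_", t)]] <- preds[[t]]
    }
    as.data.frame(preds)[targets]                      # same column order in every chain
  })
  Reduce(`+`, per_chain) / length(per_chain)           # average prediction per target
}
```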

Both of these methods inspired the development of new MTR methods [20, 21, 22]. One of them, MOTC [4], requires less memory and training time than ERC, besides producing an interpretation of the targets' dependencies. It creates regressors from a tree built based on a correlation assessment of the targets. The training of the models is performed from the leaves to the root, stacking the models' predictions as new inputs.

II-B Meta-learning for Multi-target regression

In our literature review, we did not find any papers employing MtL for MTR. However, some studies investigated similar problems, such as Multi-Label Classification (MLC).

Differently from the single-label classification task, in which there is just one label to predict for each example, in MLC tasks each example is associated with more than one label, i.e., it is necessary to learn how to associate each example with a subset of the set of labels.

Similarly to the problem investigated in this paper, many MLC methods have been proposed [23], but there is little research on when each method is more efficient.

To select an MLC method and configure its hyperparameters for a given dataset, de Sá et al. [24] applied Evolutionary Algorithms (EA), evaluating several MLC methods on different datasets. The EA selection outperformed, or at least drew with, the baselines in most cases. Also in this direction, the pioneering MtL-based research was done by Chekina et al. [25]. They evaluated different multi-label methods, grouping them into Single-Classifier Algorithms and Ensemble-Classifier Algorithms, and performed experiments on MLC datasets from the literature. The results showed that employing MtL to select a method for MLC tasks is promising, since in most of the experimented cases, applying the recommendation through MtL was better than selecting one method for all tasks or selecting it randomly.

MLC tasks are similar to MTR tasks, since both deal with the prediction of multiple targets using a common set of features. The main difference is the type of the predicted variable: while in MLC the targets are binary, in MTR the outputs are continuous. Indeed, both tasks can be seen as instances of a more general multi-target prediction task with different types of variables to predict [12]. Therefore, given that MtL was successfully applied to select MLC methods, it is worthwhile to experiment with MtL to select MTR methods.

III Material and Methods

Fig. 1 provides an overview of the adopted experimental methodology. First, we performed exhaustive experiments evaluating all the MTR methods on all available datasets. We identified the best method for each dataset, selecting the one with the smallest RRMSE; this information is used to define the meta-label. At the same time, a set of measures, named meta-features, is extracted to describe each dataset. We then unify the meta-feature values with the meta-labels to compose our meta-dataset. Finally, ML algorithms can be employed on the meta-dataset to predict the best MTR method for a new, unseen dataset. The next subsections describe each one of these steps in detail.

Fig. 1: Overview of the procedure to select a Multi-target Regression method through Meta-learning.
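To make the pipeline of Fig. 1 concrete, a conceptual sketch in R of how one meta-example is assembled is given below; extract_meta_features and evaluate_rrmse are hypothetical placeholders, not functions from the mtr-toolkit or from any package used in the paper.

```r
# One meta-example = meta-features of a dataset + the best MTR method (meta-label).
build_meta_example <- function(dataset, methods, extract_meta_features, evaluate_rrmse) {
  mf     <- extract_meta_features(dataset)                       # named numeric vector (58 values)
  errors <- sapply(methods, function(m) evaluate_rrmse(m, dataset))
  data.frame(as.list(mf), best_method = names(which.min(errors)))
}

# meta_dataset <- do.call(rbind, lapply(datasets, build_meta_example,
#                                       c("ST", "SST", "ERC", "MOTC"),
#                                       extract_meta_features, evaluate_rrmse))
```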

III-A Datasets

In the experiments, the meta-dataset was composed of 648 synthetic benchmarking datasets (available for download at http://www.uel.br/grupo-pesquisa/remid/?page_id=145), generated by following the procedure described in [13]. We used synthetic datasets to overcome the lack of real datasets that meet specific scenarios of inter-target dependencies and complexity levels of the input-output relations, and that cover different numbers of input features and targets. To create a wide variety of datasets, the parameters of the dataset generator assumed the values presented in Table I. The numeric targets were built upon mathematical expressions of identity, quadratic, and cubic functions, or their combinations.

Symbol | Hyperparameter               | Values
N      | Number of instances          | {500, 1000}
m      | Number of features           | {15, 30, 45, 60, 75, 90}
d      | Number of targets            | {3, 6}
g      | Generating groups            | {1, 2}
%      | Instances affected by noise  | {1, 5, 10}
TABLE I: Parameters used to generate the synthetic base-level datasets.
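For illustration, the parameter grid of Table I can be enumerated with expand.grid. Note that this grid alone yields 144 combinations and does not account for the different target-function compositions, so it does not by itself reproduce the 648 generated datasets.

```r
# Hedged sketch: enumerating the generator parameters of Table I.
grid <- expand.grid(
  N     = c(500, 1000),                 # number of instances
  m     = c(15, 30, 45, 60, 75, 90),    # number of features
  d     = c(3, 6),                      # number of targets
  g     = c(1, 2),                      # generating groups
  noise = c(1, 5, 10)                   # % of instances affected by noise
)
nrow(grid)  # 144 combinations before varying the target-function compositions
```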

III-B Meta-features

Each base-level dataset is represented by a vector of characteristics, the meta-features. In [8], the authors list some requirements that meta-features must satisfy: they need to have good discriminative power, their extraction should not be computationally complex, and their number should not be large, to avoid overfitting.

In our meta-level experiments, a set of 58 meta-features was explored. They included measures from different categories: statistical information about the dataset (STAT), correlation between attributes and targets (COR), performance metrics related to a linear regression (LIN), distribution of the dataset (DIM) and smoothness of the data (SMO) [26, 18].

It is important to mention that some of these meta-features were designed for problems with a single output. Since we are dealing with multi-target problems, such a meta-feature produces one value per target, and these values must be aggregated. To overcome this, the meta-feature was extracted for each target, and then the average, standard deviation, maximum and minimum of these values were added to the set of meta-features [27]. Most of the meta-feature values were extracted using the R package ECoL [26]. A complete list of the meta-features used in the experiments is presented in Table II.
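A minimal sketch of this aggregation step, assuming the per-target values of a single-output measure are already available:

```r
# A single-output measure is computed once per target and summarized by
# average, standard deviation, maximum and minimum.
aggregate_meta_feature <- function(values_per_target, name) {
  stats <- c(avg = mean(values_per_target),
             sd  = sd(values_per_target),
             max = max(values_per_target),
             min = min(values_per_target))
  setNames(stats, paste(name, names(stats), sep = "."))
}

# e.g., hypothetical C1 values for a 3-target dataset:
aggregate_meta_feature(c(0.61, 0.48, 0.73), "C1")
# C1.avg  C1.sd  C1.max  C1.min
```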

Type | Acronym      | Aggregation functions | Description
STAT | n.samples    | -                     | Number of samples
STAT | n.attributes | -                     | Number of attributes
STAT | n.targets    | -                     | Number of targets
STAT | target.ratio | -                     | Ratio between targets and attributes
STAT | pc[1-3]      | -                     | First three components of the Principal Components Analysis
DIM  | T2           | -                     | Average number of samples per dimension
DIM  | T3           | -                     | Average intrinsic dimensionality per number of examples
DIM  | T4           | -                     | Intrinsic dimensionality proportion
COR  | cor.targets  | {avg,max,min,sd}      | Correlation between targets
COR  | C1           | {avg,max,min,sd}      | Maximum feature correlation to the output
COR  | C2           | {avg,max,min,sd}      | Average feature correlation to the output
COR  | C3           | {avg,max,min,sd}      | Individual feature efficiency
COR  | C4           | {avg,max,min,sd}      | Collective feature efficiency
LIN  | regr.L1      | {avg,max,min,sd}      | Distance of erroneous instances to a linear classifier
LIN  | regr.L2      | {avg,max,min,sd}      | Training error of a linear classifier
LIN  | regr.L3      | {avg,max,min,sd}      | Nonlinearity of a linear classifier
SMO  | S1           | {avg,max,min,sd}      | Smoothness of the output distribution
SMO  | S2           | {avg,max,min,sd}      | Smoothness of the input distribution
SMO  | S3           | {avg,max,min,sd}      | Error of a k-nearest neighbor regressor
SMO  | S4           | {avg,max,min,sd}      | Non-linearity of a nearest neighbor regressor
TABLE II: Type, acronym, aggregation function (when applied) and description of the meta-features used in the experiments.

III-C Meta-labels

The ST approach and three MTR methods were explored in the experiments: SST, ERC [12] and MOTC [4]. Despite being the simplest, the ST approach was included in the experimental setup because it can perform better than MTR methods in problems with limited inter-target dependency. The other three MTR methods were selected because they offer a proper trade-off between predictive performance and time complexity, as concluded in [13].

These four methods were executed on every single base-level dataset. Their induced models were assessed in terms of the Relative Root Mean Squared Error (RRMSE), defined in Equation 2, where $N$ represents the number of instances, and $y_i$, $\hat{y}_i$ and $\bar{y}$ represent, respectively, the true, predicted and mean values of the target:

$$\text{RRMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}} \qquad (2)$$

An SVM was used as base regressor, following a k-fold cross-validation (CV) resampling strategy. SVM was chosen as base regressor due to its usage in most of the MTR problem transformation literature [1, 21, 28, 3, 4]. The method with the smallest RRMSE [19] was chosen as the best multi-target method for each dataset. The experiments were performed using the mtr-toolkit (https://github.com/smastelini/mtr-toolkit), implemented in R. Thus, our meta-dataset has a multi-class meta-label with four different levels, indicating the best MTR method or ST regression. The class distribution (%) in the meta-dataset is presented in Table III.
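For reference, a direct R implementation of Equation 2 and of its average over the targets (aRRMSE) could look as follows (a sketch of ours; the paper uses the mtr-toolkit implementation):

```r
# RRMSE per target, following Equation 2.
rrmse <- function(y_true, y_pred) {
  sqrt(sum((y_true - y_pred)^2) / sum((y_true - mean(y_true))^2))
}

# aRRMSE: average of the per-target RRMSE values.
# Y_true and Y_pred are data.frames with one column per target.
arrmse <- function(Y_true, Y_pred) {
  mean(mapply(rrmse, Y_true, Y_pred))
}
```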
         | ERC  | MOTC | SST  | ST  | Total
examples | 166  | 89   | 362  | 31  | 648
%        | 25.6 | 13.7 | 55.8 | 4.9 | 100
TABLE III: Specification of the meta-dataset used in the experiments.

III-D Meta-learners

Four ML algorithms, with different learning biases, were used as meta-learners: NB [29], RF [30], SVM [31] and XGB [32]. These algorithms were selected due to their widespread use and their capacity to induce high-performance models. The k-fold CV resampling methodology was also adopted at the meta-level of the experiments to assess the predictive performance of the meta-learners. All the ML algorithms were implemented in R, using the mlr package and their corresponding default hyperparameters.
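A hedged sketch of this meta-level evaluation with the mlr package is given below. The learner names and measures are our choices, meta_dataset is assumed to be a data.frame containing the 58 numeric meta-features plus a factor column best_method, and the number of CV folds (not stated in the text) is assumed to be 10.

```r
library(mlr)

# Meta-level classification task: predict the best MTR method from the meta-features.
task  <- makeClassifTask(data = meta_dataset, target = "best_method")

# The four meta-learners, with mlr's default hyperparameters.
lrns  <- list(makeLearner("classif.naiveBayes"),
              makeLearner("classif.randomForest"),
              makeLearner("classif.svm"),
              makeLearner("classif.xgboost"))

# k-fold CV at the meta-level (10 folds assumed here).
rdesc <- makeResampleDesc("CV", iters = 10, stratify = TRUE)

# Accuracy and balanced accuracy for each meta-learner.
results <- lapply(lrns, resample, task = task, resampling = rdesc,
                  measures = list(acc, bac))
```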

III-E Evaluation measures and baselines

Seven evaluation metrics were used to assess the predictive performance of the induced models: Accuracy, Balanced per-class accuracy, Precision, Recall, F-score (F1), Sensitivity and Specificity.

Besides, we used two baselines from the MtL literature for comparison: a model that always recommends the majority class for the whole dataset (Majority) and a model that provides random recommendations (Random). These baselines are widely used to endorse the need for a recommendation system [8]. We also used the ground-truth as an upper bound (Truth).
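The two baselines and the balanced per-class accuracy can be expressed as simple functions (a sketch; the names are ours, and train_labels/truth are assumed to be factors over the four meta-label classes):

```r
# Majority baseline: always recommends the most frequent class (SST in our meta-dataset).
majority_baseline <- function(train_labels, n_new) {
  rep(names(which.max(table(train_labels))), n_new)
}

# Random baseline: recommends a method uniformly at random.
random_baseline <- function(train_labels, n_new) {
  sample(levels(train_labels), n_new, replace = TRUE)
}

# Balanced per-class accuracy: mean of the per-class recalls.
balanced_accuracy <- function(truth, pred) {
  mean(sapply(levels(truth), function(cl) mean(pred[truth == cl] == cl)))
}
```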

IV Results and Discussion

The results are organized as follows: first, we present the predictive performance of the meta-models induced by the different ML algorithms; afterward, based on the RF meta-model, the importance of the meta-features is compared and discussed; finally, some contributions and open issues related to MtL and MTR are presented.

IV-A Predictive Performance

The predictive performance obtained by the four meta-learners and the baselines is presented as a radar chart in Fig. 2. In this figure, each line represents a meta-model and each vertex is related to a different performance measure. The larger the area in the radar chart, the better the meta-model.

Fig. 2: Performance of the meta-models

Looking at the radar chart, it is possible to observe that all meta-models had a performance superior to the Random baseline for all metrics. The same occurs in relation to Majority, except for the accuracy of NB, which was below the accuracy of the Majority baseline. Still for this metric, RF obtained the best result, with 70.83% of accuracy, followed by SVM and then XGB. The only metric for which the RF meta-model did not obtain the highest value was Sensitivity, for which NB was the best. Regarding the other evaluation metrics (Specificity, Precision, Recall, F1 and Balanced per-class accuracy), RF achieved the best results.

Although three of the four meta-models overcame the baselines for all metrics, the predictive performance did not achieve high values, which might be related to the class imbalance in the meta-dataset. However, the superiority of the MtL recommendation system over the baselines was confirmed by statistical tests. We used the Friedman test, with a significance level of 0.05. The null hypothesis is that the recommendations by the meta-models and by the baselines are similar. Whenever the null hypothesis is rejected, the Nemenyi post-hoc test can be applied, stating that the performance of two approaches is significantly different if their corresponding average ranks differ by at least a critical difference (CD) value. When multiple algorithms are compared in this way, the results can be represented graphically with a CD diagram, as proposed by Demšar [33].
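As a sketch of this procedure in R, assuming perf is a matrix with one row per base-level dataset and one column per compared approach (Truth, RF, SVM, XGB, NB, Majority, Random), holding aRRMSE values:

```r
# Friedman test over the per-dataset blocks (H0: all approaches perform similarly).
friedman.test(perf)

# Nemenyi critical difference (Demsar, 2006): CD = q_alpha * sqrt(k(k+1) / (6N)).
k <- ncol(perf)          # 7 compared approaches
N <- nrow(perf)          # 648 base-level datasets
q_alpha <- 2.949         # Studentized-range-based critical value for k = 7, alpha = 0.05
cd <- q_alpha * sqrt(k * (k + 1) / (6 * N))   # ~0.35 for k = 7 and N = 648, as in Fig. 3

# Average rank per approach (smaller aRRMSE -> better, i.e., lower rank).
avg_ranks <- colMeans(t(apply(perf, 1, rank)))
# Two approaches differ significantly when their average ranks differ by more than cd.
```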

The meta-models (RF, SVM, XGB, NB) were compared with Truth (the expected best method), Majority (which always predicts SST) and Random (the random selection of a method for each dataset), using the aRRMSE obtained by the recommended method as performance metric. This analysis is shown in Fig. 3, using the results from the Nemenyi test.

Fig. 3: Comparison of the aRRMSE values obtained by the meta-models when recommending MTR methods, according to the Nemenyi test. Groups that are not significantly different (α = 0.05 and CD = 0.35) are connected.

As shown in Fig. 3, no solution was similar to Truth, which was expected given the meta-models' predictive performance. However, RF, SVM and XGB are connected, which means they were similar to each other and superior to the Majority and Random baselines. This supports the benefit of using the MtL recommendation system in comparison to selecting one specific algorithm for every dataset or selecting one randomly.

IV-B Relative importance of the meta-features

The RF meta-model was used to assess the importance of each meta-feature through the RF feature importance metric. This metric is calculated by permuting the values of a feature in the out-of-bag (OOB) samples and recalculating the OOB error over the whole ensemble. In other words, if substituting the values of a meta-feature by random values increases the error, this meta-feature is considered important. Otherwise, if the error decreases, the resulting importance is negative; the meta-feature is then considered not important and should be removed from modeling. This procedure can be performed for each meta-feature to explain its impact [30]. Fig. 4 shows the meta-feature importance for the meta-dataset.

Fig. 4: Average relative importance of the meta-features obtained from RF importance. The names of the meta-features in the x-axis follow the acronyms presented in Table II.
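A sketch of how such a permutation-based importance can be obtained with the randomForest package; the paper induced the meta-models through mlr with default hyperparameters, so this direct call (with the package's default ntree = 500) is only an illustration.

```r
library(randomForest)

# RF meta-model over the meta-dataset; importance = TRUE enables OOB permutation importance.
rf_meta <- randomForest(best_method ~ ., data = meta_dataset,
                        importance = TRUE, ntree = 500)

# type = 1: mean decrease in OOB accuracy when each meta-feature is permuted.
imp <- importance(rf_meta, type = 1)
head(imp[order(imp, decreasing = TRUE), , drop = FALSE])   # most important meta-features
```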

COR and LIN meta-features achieved the highest importance values, especially the minimum distance of erroneous instances to a linear classifier (regr.L1, min), the minimum nonlinearity of a linear classifier (regr.L3, min) and the standard deviation of the maximum feature correlation to the output (C1, sd). Since the MTR methods try to explore the correlation between the features and the targets in different ways, the selection of these meta-features makes sense. The number of targets, attributes and samples had low importance. This might have occurred because these meta-features did not influence the predictive performance, showing that the MTR methods used in the experiments can deal with different numbers of targets, attributes and samples in the same way.

IV-C Insights and open issues

It is important to highlight that the meta-label assignment was directly related to the highest predictive performance (lowest RRMSE), based on the ranking of the methods. Differences between the predictive performances of the MTR methods, independently of their magnitude, were not considered while building the meta-dataset.

Alternatively, the meta-label assignment could indicate two or more methods as suitable to solve a given problem when there is no statistical difference between their performances. However, this scenario poses the additional challenge of dealing with a multi-label problem at the meta-level of the recommendation system.

Another important issue is that the meta-label assignment considered only the predictive error of the MTR methods. In some cases, e.g., online multi-target regression [34], the most suitable method must address a trade-off among predictive performance, memory, and time cost when predicting the output. This scenario demands additional information, and adds complexity, toward identifying the best MTR method to be learned by the recommendation system.

V Conclusions and Future Work

In this study, a framework for recommending MTR methods using meta-learning was presented. A meta-dataset, composed of datasets used for MTR benchmarking, was created for the induction of meta-models toward predicting the best method for a given dataset. Experiments performed with the meta-dataset and four meta-learners led to 70.83% of accuracy with RF, the best recommender. Besides, it overcame the baselines, and statistical tests showed that the recommendation system was better than selecting one method for every task or selecting a method randomly. The analysis of meta-feature importance revealed that the correlation between targets and the error of a linear classifier were the most useful features to predict the best-performing MTR method for a given unseen dataset.

As future work, besides implementing more meta-features, we intend to use more MTR benchmarking datasets, in order to improve the generalization capability of the meta-models. We also expect to apply MLC at the meta-level to jointly predict the MTR method and its base regressor. Further information related to memory and time cost will be used to match the requirements of different scenarios, e.g., online MTR.

Acknowledgements

The authors would like to thank the financial support of Coordination for the Improvement of Higher Education Personnel (CAPES) - Finance Code 001 -, the National Council for Scientific and Technological Development (CNPq) of Brazil - Grant of Project 420562/2018-4 - and São Paulo Research Foundation (FAPESP) - grant #2018/07319-6.

References

  • [1] G. Tsoumakas, E. Spyromitros-Xioufis, A. Vrekou, and I. Vlahavas, “Multi-target regression via random linear target combinations,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases.   Springer, 2014, pp. 225–240.
  • [2] J. Levatić, M. Ceci, D. Kocev, and S. Džeroski, “Semi-supervised learning for multi-target regression,” in International Workshop on New Frontiers in Mining Complex Patterns.   Springer, 2014, pp. 3–18.
  • [3] E. J. Santana, B. C. Geronimo, S. M. Mastelini, R. H. Carvalho, D. F. Barbin, E. I. Ida, and S. Barbon, “Predicting poultry meat characteristics using an enhanced multi-target regression method,” Biosystems Engineering, vol. 171, pp. 193 – 204, 2018.
  • [4] S. M. Mastelini, V. G. T. da Costa, E. J. Santana, F. K. Nakano, R. C. Guido, R. Cerri, and S. Barbon, “Multi-output tree chaining: An interpretative modelling and lightweight multi-target approach,” Journal of Signal Processing Systems, pp. 1–25, 2018.
  • [5] E. J. Santana, J. A. P. R. d. Silva, S. M. Mastelini, and S. Barbon Jr., “Stock portfolio prediction by multi-target decision support,” iSys-Revista Brasileira de Sistemas de Informação, vol. 12, no. 1, 2019.
  • [6] B. Bilalli, A. Abelló, and T. Aluja-Banet, “On the predictive power of meta-features in openml,” International Journal of Applied Mathematics and Computer Science, vol. 27, no. 4, pp. 697–712, 2017.
  • [7] T. Cunha, C. Soares, and A. C. de Carvalho, “Metalearning and recommender systems: A literature review and empirical study on the algorithm selection problem for collaborative filtering,” Information Sciences, vol. 423, pp. 128–144, 2018.
  • [8] P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta, Metalearning: Applications to Data Mining, 2nd ed.   Springer Verlag, 2009.
  • [9] S. Ali and K. A. Smith-Miles, “A meta-learning approach to automatic kernel selection for support vector machines,” Neurocomputing, vol. 70, no. 1-3, pp. 173–186, 2006.
  • [10] A. L. D. Rossi, A. C. P. de Leon Ferreira, C. Soares, B. F. De Souza et al., “Metastream: A meta-learning based method for periodic algorithm selection in time-changing data,” Neurocomputing, vol. 127, pp. 52–64, 2014.
  • [11] M. Reif, F. Shafait, and A. Dengel, “Meta-learning for evolutionary parameter optimization of classifiers,” Machine learning, vol. 87, no. 3, pp. 357–380, 2012.
  • [12] E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, and I. Vlahavas, “Multi-target regression via input space expansion: treating targets as inputs,” Machine Learning, vol. 104, no. 1, pp. 55–98, 2016.
  • [13] S. M. Mastelini, E. J. Santana, V. G. T. da Costa, and S. Barbon, “Benchmarking multi-target regression methods,” in 2018 7th Brazilian Conference on Intelligent Systems (BRACIS).   IEEE, 2018, pp. 396–401.
  • [14] D. H. Wolpert, “The lack of a priori distinctions between learning algorithms,” Neural computation, vol. 8, no. 7, pp. 1341–1390, 1996.
  • [15] J. R. Rice, “The algorithm selection problem,” Advances in Computers, vol. 15, pp. 65–118, 1976.
  • [16] D. G. Ferrari and L. N. De Castro, “Clustering algorithm selection by meta-learning systems: A new distance-based problem characterization and ranking combination methods,” Information Sciences, vol. 301, pp. 181–194, 2015.
  • [17] G. F. Campos, S. Barbon, and R. G. Mantovani, “A meta-learning approach for recommendation of image segmentation algorithms,” in 2016 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI).   IEEE, 2016, pp. 370–377.
  • [18] R. G. Mantovani, A. L. Rossi, E. Alcobaça, J. Vanschoren, and A. C. de Carvalho, “A meta-learning recommender system for hyperparameter tuning: Predicting when tuning improves svm classifiers,” Information Sciences, vol. 501, pp. 193 – 221, 2019.
  • [19] H. Borchani, G. Varando, C. Bielza, and P. Larrañaga, “A survey on multi-output regression,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 5, no. 5, pp. 216–233, 2015.
  • [20] G. Melki, A. Cano, V. Kecman, and S. Ventura, “Multi-target support vector regression via correlation regressor chains,” Information Sciences, vol. 415, pp. 53–69, 2017.
  • [21] S. M. Mastelini, E. J. Santana, R. Cerri, and S. Barbon, “DSTARS: A multi-target deep structure for tracking asynchronous regressor stack,” in 2017 Brazilian Conference on Intelligent Systems (BRACIS).   IEEE, oct 2017.
  • [22] J. M. Moyano, E. L. Gibaja, and S. Ventura, “An evolutionary algorithm for optimizing the target ordering in ensemble of regressor chains,” in 2017 IEEE Congress on Evolutionary Computation (CEC).   IEEE, Jun. 2017.
  • [23] G. Tsoumakas and I. Katakis, “Multi-label classification: An overview,” International Journal of Data Warehousing and Mining (IJDWM), vol. 3, no. 3, pp. 1–13, 2007.
  • [24] A. G. de Sá, G. L. Pappa, and A. A. Freitas, “Towards a method for automatically selecting and configuring multi-label classification algorithms,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion.   ACM, 2017, pp. 1125–1132.
  • [25] L. Chekina, L. Rokach, and B. Shapira, “Meta-learning for selecting a multi-label classification algorithm,” in 2011 IEEE 11th International Conference on Data Mining Workshops.   IEEE, 2011, pp. 220–227.
  • [26] A. C. Lorena, A. I. Maciel, P. B. de Miranda, I. G. Costa, and R. B. Prudêncio, “Data complexity meta-features for regression problems,” Machine Learning, vol. 107, no. 1, pp. 209–246, 2018.
  • [27] A. Rivolli, L. P. F. Garcia, C. Soares, J. Vanschoren, and A. C. P. L. F. de Carvalho, “Towards reproducible empirical research in meta-learning,” CoRR, vol. abs/1808.10406, 2018. [Online]. Available: http://arxiv.org/abs/1808.10406
  • [28] E. J. Santana, S. M. Mastelini, and S. Barbon Jr., “Deep Regressor Stacking for Air Ticket Prices Prediction,” in XIII Brazilian Symposium on Information Systems: Information Systems for Participatory Digital Governance.   Brazilian Computer Society (SBC), 2017, pp. 25–31.
  • [29] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach.   Pearson Education Limited, 2016.
  • [30] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.
  • [31] V. Vapnik, The Nature of Statistical Learning Theory.   New York: Springer-Verlag, 1995.
  • [32] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2016, pp. 785–794.
  • [33] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
  • [34] A. Osojnik, P. Panov, and S. Džeroski, “Tree-based methods for online multi-target regression,” Journal of Intelligent Information Systems, vol. 50, no. 2, pp. 315–339, 2018.