1 Introduction and Related Work
In traditional supervised learning, each instance is associated with a single outcome. Multi-output (or multi-target) prediction is a supervised learning task where multiple targets can be assigned to each observation. In this learning problem, the target variables can be of any kind (real-valued, discrete, categorical).
When all target variables are binary, this problem is known as multi-label classification [19, 24, 27, 34]. Multi-label classification originated from text classification [20] and is increasingly being used in many different applications, such as music categorization [13] or semantic scene classification [4].
If, on the other hand, all target variables are real-valued, the multi-output prediction problem is known as multivariate regression; a broad overview of this topic can be found in [3]. Applications appear in many different fields, such as ecological modeling of multiple real-valued target variables describing the quality of vegetation, predicting wind noise (represented by several variables), or the estimation of multiple gas tank levels of a gas converter system.
Multi-output prediction can be seen as the most general and flexible form of learning to predict multiple targets, as it also allows the target variables to be of mixed kind. Important use cases for mixed target variables can be found in psychological research. For instance, much work in the field of personality psychology focuses on the prediction of personality and demographic traits based on behavioral data [12, 22]. As traits like gender and age [8, 9] have been found to be related to personality, it would be very useful to simultaneously predict personality via regression, gender via classification, and age via ordinal regression, instead of predicting them independently.
Currently, there are few available methods that can handle learning tasks with objectives of different kinds (for one available method, see, e.g., [11]). Instead of adapting existing methods to handle more than one target, we use the problem transformation method for predicting multiple targets. For this, we analyze the similarity-enforcing method [30] of using predicted targets as a feature representation, which has been studied extensively in the multi-label community [16, 18, 17, 23] and has been adapted to multivariate regression [3, 25]. We define this method for the more general multi-output prediction problem and introduce a component-wise boosting approach for learning and visualizing the target dependencies. Since the interpretability of black-box models has become an important topic in the machine learning community [15], we aimed for a method that not only uses target dependencies for predictions, but also makes them easy to understand. For general discussions of multi-output prediction in a broader context we refer to [30, 31]. Another method for multi-label classification, where label dependencies are learned in the form of rules, can be found in [14]. The problem transformation method for multi-label learning is extensively discussed in many papers [34, 16, 17, 18]. It has also been used for multivariate regression in [26, 3].
Main contributions of this paper:

- A formal definition of the problem transformation method for multi-output prediction problems.

- A novel method similar to the two-step stacking method, which allows interpretation and visualization of target dependencies.
2 Definition: Multi-Output Prediction
A multi-output prediction problem can be characterized by instances $x \in \mathcal{X}$ and targets $t_1, \dots, t_m$. The relationship between an instance $x$ and a target $t_j$ can be characterized by a one-dimensional score $y_j$, which can be nominal, ordinal, or real-valued. A multi-output prediction problem can thus be written as a dataset $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, where the target variable is a vector $y^{(i)} = (y_1^{(i)}, \dots, y_m^{(i)})$. This dataset can be portrayed in matrix form:

$$Y = \begin{pmatrix} y_1^{(1)} & \cdots & y_m^{(1)} \\ \vdots & \ddots & \vdots \\ y_1^{(n)} & \cdots & y_m^{(n)} \end{pmatrix} \quad (1)$$

We can derive the formal definition of multivariate regression by only allowing real values for the $y_j^{(i)}$. By limiting the $y_j^{(i)}$ to the binary values 0 or 1, we get the formal definition of multi-label classification. However, since we do not need this limitation and want to deal with prediction problems with heterogeneous output spaces as well, we allow each $y_j^{(i)}$ to be of any one-dimensional kind. We call this problem multi-output prediction, which can be seen as a generalization of multi-label classification and multivariate regression. We use the term multi-output prediction to refer to the general prediction task and only use the terms multi-label classification, multivariate regression, or mixed-type prediction when we specifically refer to them.
3 Measuring Performance in Multi-Output Prediction Problems
For traditional single-target machine learning problems, performance measurement is intuitive, and there are many metrics such as accuracy, the F-measure, and the AUC for classification, or the mean squared error (MSE) and mean absolute error (MAE) for regression. Once there are multiple target variables, measuring performance becomes non-trivial.
There are many ways of handling this problem. One option is to compare the actual target vector with the predicted target vector and then calculate a single performance metric. Many performance measures have been constructed this way for multi-label classification and multivariate regression problems [34, 3].
For multi-label learning, an example is the so-called Hamming loss, which compares the predicted labels with the actual labels:

$$\mathrm{HL}\left(y^{(i)}, \hat{y}^{(i)}\right) = \frac{1}{m} \sum_{j=1}^{m} \mathbb{1}\left(y_j^{(i)} \neq \hat{y}_j^{(i)}\right) \quad (2)$$

This value is calculated instance-wise, and the performance on a test set is the mean Hamming loss over all instances.
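As an illustration, the instance-wise Hamming loss averaged over a test set can be computed in a few lines (a generic sketch with our own function name, not the benchmark implementation):

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of wrongly predicted labels, averaged over instances
    and labels (equation (2), then averaged over the test set)."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return float(np.mean(Y_true != Y_pred))

# Two instances with three labels each; one of the six predictions is wrong.
hl = hamming_loss([[1, 0, 1], [0, 1, 0]],
                  [[1, 1, 1], [0, 1, 0]])  # 1/6
```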
There are many more multi-label performance measures, such as accuracy, precision, or the ranking loss (see [34]), which can be defined intuitively because of the binary structure of multi-label learning problems.
For multivariate regression, an example is the multivariate mean squared error (MMSE), which is the mean MSE over all targets:

$$\mathrm{MMSE}\left(Y, \hat{Y}\right) = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \sum_{i=1}^{n} \left(y_j^{(i)} - \hat{y}_j^{(i)}\right)^2 \quad (3)$$
Since every target is a regression task, many multivariate performance metrics can be defined (see, e.g., [3]). When using such metrics for multivariate regression problems, one should pay attention to the value ranges of the target variables: targets with larger value ranges have more influence on the metric than targets with smaller value ranges. One possible way of handling this problem is to standardize the target values.
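A minimal sketch of the MMSE together with the suggested target standardization (function names are our own, for illustration):

```python
import numpy as np

def mmse(Y_true, Y_pred):
    """Multivariate MSE (equation (3)): mean of the per-target MSEs."""
    err = (np.asarray(Y_true, float) - np.asarray(Y_pred, float)) ** 2
    return float(err.mean(axis=0).mean())

def standardize_targets(Y_train, Y_test):
    """Scale targets by the training mean/sd so that targets with large
    value ranges do not dominate the metric."""
    Y_train, Y_test = np.asarray(Y_train, float), np.asarray(Y_test, float)
    mu, sd = Y_train.mean(axis=0), Y_train.std(axis=0)
    return (Y_test - mu) / sd
```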
However, in the more general multi-output prediction problem, calculating one single performance value over possibly mixed target spaces is not trivial. Note that many multi-label and multivariate regression performance metrics are a weighted sum of performance metrics for each target. We could write such a general performance metric as

$$P\left(Y, \hat{Y}\right) = \sum_{j=1}^{m} w_j \, p_j\left(y_j, \hat{y}_j\right), \quad (4)$$

where $p_j$ is a single-target performance metric for target $j$ and $w_j \geq 0$ its weight. The Hamming loss and the MMSE are just special cases of this more general performance metric (with $w_j = 1/m$).
Since datasets with mixed target spaces can differ considerably, and classification performance metrics would have to be combined with regression performance metrics during evaluation, a general definition of a performance metric is infeasible; the choice should thus be left to the user. One could also handle multi-output prediction problems as multi-objective optimization problems, where trade-offs between multiple (possibly conflicting) objectives (such as minimizing the MSE for a regression target and maximizing the AUC for a classification target) need to be considered. For multi-label classification this was discussed in [24].
Nevertheless, a further motivation for considering multi-output prediction methods instead of modeling each target independently is that improvements can be made for each target separately: every target can still be evaluated on its own, and we can analyze whether more complex methods are worthwhile for it.
For problems with mixed target variables, we focus on target-wise comparisons and use the mean classification error for classification targets and the mean squared error for regression targets. For multi-label classification and multivariate regression we also report the Hamming loss and the MMSE.
4 Learning Target Dependencies
There are two main ways to model problems with more than one target that are extensively studied in the multi-label community [34, 17] and can also be applied to multi-output prediction. One of them is the algorithm adaptation method, which aims at adapting existing algorithms to handle multiple outputs [33]. The other is called the problem transformation method and aims to transform the multi-label learning problem into more established one-target prediction problems [34, 18, 16]. The problem transformation method has the advantage that any established one-target machine learning model can be used.
In this paper, we focus on the problem transformation method and how to use it for multi-output prediction problems. Originally used in the multi-label community, these methods were adapted to multivariate regression in [25]. The idea of modeling target dependencies by using other target information as features is not restricted by the type of the outputs and can thus be used for multi-output prediction problems as well.
4.1 Independent Models (IM)
The simplest problem transformation method (called the binary relevance method in the multi-label community) is to train one model for each target independently and to combine the predictions afterwards. Target dependencies are thus not considered when using independent models.
Given a dataset $D$ with $n$ observations and $m$ (possibly mixed-type) targets, we train one model per target independently:

For $j = 1, \dots, m$ train model $\hat{f}_j$ on $D_j = \left\{\left(x^{(i)}, y_j^{(i)}\right)\right\}_{i=1}^{n} \quad (5)$

A new observation $x$ receives the prediction $\hat{y} = \left(\hat{f}_1(x), \dots, \hat{f}_m(x)\right)$.
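The independent-models baseline can be sketched as follows; the `MeanModel` stand-in is hypothetical, and any one-target learner with `fit`/`predict` methods (e.g., a random forest) could take its place:

```python
import numpy as np

class MeanModel:
    """Stand-in one-target learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mu = float(np.mean(y))
        return self
    def predict(self, X):
        return np.full(len(X), self.mu)

def fit_independent(X, Y, make_model=MeanModel):
    """Equation (5): one model per target column, ignoring dependencies."""
    return [make_model().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_independent(models, X):
    """Combine the single-target predictions into one target vector."""
    return np.column_stack([m.predict(X) for m in models])
```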
4.2 Stacking (STA): Using Targets as Features
One way to model target variable dependencies is to use target variables as features. A distinction can be made between the different ways in which these target variables are modeled. On the one hand, the true target values can be used as features, since they are available at training time; examples are classifier chains [18] or dependent binary relevance [16]. The alternative is to create predicted target values by using an inner cross-validation loop (e.g., nested stacking [23] or stacking [30, 16]). A comparison of these methods is given in [17]. In this paper, we discuss the stacking method in more detail. After fitting the same independent models (5) that are needed at prediction time, we obtain predicted targets through an inner cross-validation strategy:
For $j = 1, \dots, m$ use inner cross-validation on $D_j$ to obtain predicted targets $\tilde{y}_j^{(i)} \quad (6)$
The inner cross-validation strategy can become resource-intensive, as many models have to be fit. Hence, a trade-off between a sufficient cross-validation strategy and the available computing resources needs to be made.
In a next step, these predicted target variables are used to extend the feature space, and a second set of models is fit for each target:

For $j = 1, \dots, m$ train model $\hat{g}_j$ on $\tilde{D}_j = \left\{\left(\left(x^{(i)}, \tilde{y}^{(i)}\right), y_j^{(i)}\right)\right\}_{i=1}^{n} \quad (7)$
At prediction time, we first obtain predicted targets $\tilde{y} = \left(\hat{f}_1(x), \dots, \hat{f}_m(x)\right)$ with the independent models; these are then appended to the new observation, giving $\left(x, \tilde{y}\right)$. The final prediction is $\hat{y}_j = \hat{g}_j\left(x, \tilde{y}\right)$.
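The two-stage stacking procedure can be sketched as follows, with a minimal least-squares model standing in for the one-target learners (all names are illustrative; this is not the authors' R implementation, which uses random forests):

```python
import numpy as np

class LinModel:
    """Minimal least-squares stand-in for any one-target learner."""
    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self
    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.w

def inner_cv_targets(X, Y, make_model, k=5):
    """Equation (6): out-of-fold predictions for every target."""
    n, m = Y.shape
    Z = np.zeros((n, m))
    fold = np.arange(n) % k
    for f in range(k):
        tr, te = fold != f, fold == f
        for j in range(m):
            Z[te, j] = make_model().fit(X[tr], Y[tr, j]).predict(X[te])
    return Z

def fit_stacking(X, Y, make_model, k=5):
    """Stage 1 (eq. (5)) plus stage 2 on the extended features (eq. (7))."""
    stage1 = [make_model().fit(X, Y[:, j]) for j in range(Y.shape[1])]
    Z = inner_cv_targets(X, Y, make_model, k)
    XZ = np.hstack([X, Z])
    stage2 = [make_model().fit(XZ, Y[:, j]) for j in range(Y.shape[1])]
    return stage1, stage2

def predict_stacking(stage1, stage2, X):
    """Append stage-1 predictions to the features, then apply stage 2."""
    Z = np.column_stack([m.predict(X) for m in stage1])
    XZ = np.hstack([X, Z])
    return np.column_stack([g.predict(XZ) for g in stage2])
```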
4.3 Component-Wise Multi-Output Boosting (CMOB)
For our novel method, we propose to use component-wise boosting to learn the target dependency structure. As for most machine learning models, the aim of component-wise boosting is to minimize the empirical risk:

$$\mathcal{R}_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L\left(y^{(i)}, f\left(x^{(i)}\right)\right)$$
Component-wise boosting, also called model-based boosting, generalizes the boosting framework to multiple base learners [6]. In each boosting iteration $t$, the algorithm selects one base learner out of a space of base learners $\mathcal{B}$ by fitting all of them to the pseudo-residuals $r^{[t]}$ and choosing the one with the smallest sum of squared errors. This improves the empirical risk of the current model, which is updated via stagewise additive modeling with a learning rate $\nu$:

$$\hat{b}^{[t]} = \operatorname*{arg\,min}_{b \in \mathcal{B}} \sum_{i=1}^{n} \left(r^{[t](i)} - b\left(x^{(i)}\right)\right)^2 \quad (8)$$

$$\hat{f}^{[t]}(x) = \hat{f}^{[t-1]}(x) + \nu \, \hat{b}^{[t]}(x) \quad (9)$$
For our purpose, numerical features are included as linear effects $b(x) = \theta x_{\sigma(t)}$, where $\sigma$ is a mapping from the iteration to the selected feature. For categorical features, each group is added as a single one-hot-coded base learner that includes just an intercept for that group. Boosting these kinds of base learners maintains interpretability because of the additive structure of the model and the repeated selection of equal base learners.
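The boosting loop above can be sketched as follows; the sketch also records each feature's summed risk improvement and stops early once relative improvements stay small for several consecutive iterations (a simplified illustration under L2 loss with linear base learners, not a reference implementation):

```python
import numpy as np

def cwb_fit(X, y, nu=0.1, max_iter=1000, eps=1e-4, patience=5):
    """Component-wise L2 boosting: in each iteration, fit one linear base
    learner per feature to the residuals, keep the one with the smallest
    SSE (eq. (8)), and take a damped additive step (eq. (9))."""
    n, p = X.shape
    beta = np.zeros(p)                    # accumulated linear coefficients
    intercept = float(y.mean())           # start from the mean model
    f = np.full(n, intercept)
    importance = np.zeros(p)              # summed risk improvements
    risk = float(np.mean((y - f) ** 2))
    col_ss = np.sum(X ** 2, axis=0)       # assumes non-constant columns
    bad = 0
    for _ in range(max_iter):
        r = y - f                         # pseudo-residuals for L2 loss
        theta = X.T @ r / col_ss          # least-squares fit per feature
        j = int(np.argmax(theta ** 2 * col_ss))   # smallest SSE wins
        beta[j] += nu * theta[j]
        f = f + nu * theta[j] * X[:, j]
        new_risk = float(np.mean((y - f) ** 2))
        importance[j] += risk - new_risk
        # early stopping on consecutive tiny relative improvements
        bad = bad + 1 if (risk - new_risk) / max(risk, 1e-12) < eps else 0
        risk = new_risk
        if bad >= patience:
            break
    return intercept, beta, importance
```

Because only one coefficient changes per iteration, the final model stays a sparse additive function whose per-feature importances are directly readable from `importance`.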
An important property of component-wise boosting is its intrinsic feature selection, which is achieved by selecting just one base learner per iteration. After training for $T$ iterations, we get a subset of all features that are required to predict the target. This provides information about the importance of each feature. In our multi-output prediction case, we use this internal feature selection to learn which predicted target variables are required to explain the target $y_j$. To go one step further, we would also like to know which of the selected features are more important than others. For this, we can again use the additive structure of component-wise boosting to calculate a feature importance for all selected features. After $T$ boosting iterations, we calculate the importance $I_k$ of the $k$-th feature as the sum of the empirical risk improvements achieved by selecting it:

$$I_k = \sum_{t:\, \sigma(t) = k} \left( \mathcal{R}_{\mathrm{emp}}^{[t-1]} - \mathcal{R}_{\mathrm{emp}}^{[t]} \right)$$
One requirement for calculating meaningful feature importance scores is to choose an adequate number of boosting iterations $T$, which can be done by using early stopping: the procedure stops if the relative improvement of the empirical risk consecutively falls below a predefined value. We chose component-wise boosting over other methods that produce sparse and interpretable models (like the lasso) because of the flexibility in the choice of the base learners; non-linear effects, for instance, can easily be modeled using splines as base learners [21].

We now introduce Component-Wise Multi-Output Boosting (CMOB) (see Algorithm 1). The idea is to use component-wise boosting to learn the target dependencies in a sparse and interpretable manner. Like the stacking algorithm (see Section 4.2), CMOB models target dependencies through a dataset of predicted target variables $\tilde{y}$.
One difference is that in our algorithm the original features are omitted, because we are only interested in the interactions between the target variables. Interactions between predicted target variables and features are thus not modeled.
Given the dataset of predicted target variables (obtained by (6)), we train a component-wise boosting model for each target:

For $j = 1, \dots, m$ train a component-wise boosting model $\hat{h}_j$ on $\left\{\left(\tilde{y}^{(i)}, y_j^{(i)}\right)\right\}_{i=1}^{n} \quad (10)$
A new observation $x$ is predicted in a two-step procedure:

1. Use the independent models (5) to create predicted targets: $\tilde{y} = \left(\hat{f}_1(x), \dots, \hat{f}_m(x)\right)$.

2. Use the boosting models to create the final predictions: $\hat{y}_j = \hat{h}_j\left(\tilde{y}\right)$.
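The two-step prediction can be sketched as follows; `Const` and `SumModel` are toy stand-ins just to exercise the assumed `predict` interface:

```python
import numpy as np

def cmob_predict(X, independent_models, boosting_models):
    """Two-step CMOB prediction: step 1 produces preliminary targets with
    the independent models (eq. (5)); step 2 feeds only these predicted
    targets into the per-target boosting models (eq. (10))."""
    Z = np.column_stack([m.predict(X) for m in independent_models])  # step 1
    return np.column_stack([b.predict(Z) for b in boosting_models])  # step 2

# Tiny stand-in models, just to exercise the interface:
class Const:
    def __init__(self, c): self.c = c
    def predict(self, X): return np.full(len(X), self.c)

class SumModel:
    def predict(self, Z): return Z.sum(axis=1)

out = cmob_predict(np.zeros((2, 3)), [Const(1.0), Const(2.0)],
                   [SumModel(), SumModel()])  # each entry: 1.0 + 2.0
```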
5 Benchmark
5.1 Datasets
We use openly available datasets that can be downloaded from OpenML [29, 7]. Since datasets for mixed-type prediction are quite uncommon, we mainly used multi-label and multivariate regression datasets. We limited the number of targets to a maximum of 7 in order to keep the computing time reasonable and the visualizations understandable. The multi-label classification and multivariate regression datasets are described in detail in [28, 26, 17]. The mixed-type datasets are both personality prediction datasets [2, 22]. See Table 1 for more details on the datasets.
Dataset  n  nfeats  ntargets  Type  Reference  Data ID 

emotions  593  72  6  multilabel  link  41545 
image  2000  135  5  multilabel  link  41546 
reuters  2000  243  7  multilabel  link  41547 
scene  2407  294  6  multilabel  link  41548 
andro  49  30  6  multiv. regr.  link  41549 
atp1d  337  411  6  multiv. regr.  link  41550 
atp7d  296  411  6  multiv. regr.  link  41551 
edm  154  16  2  multiv. regr.  link  41552 
enb  768  8  2  multiv. regr.  link  41553 
jura  359  15  3  multiv. regr.  link  41554 
scpf  1137  23  3  multiv. regr.  link  41555 
sf1  323  10  3  multiv. regr.  link  41556 
sf2  1066  10  3  multiv. regr.  link  41557 
slump  103  7  3  multiv. regr.  link  41558 
youtube  404  25  6  mixedtype  link  41559 
sens  257  222  4  mixedtype  link  41560 
5.2 Benchmark
To analyze the potential of learning the target dependency structure, we compare the performance of the proposed CMOB algorithm with a stacking model (STA), which uses all other predicted labels as features. We compare both algorithms against independent models (IM) as a baseline. See Table 2 for an overview of the benchmark settings.
For CMOB we use linear base learners for the underlying component-wise boosting algorithm with a maximum of 10000 iterations. Since we strive for sparse models, we apply an early-stopping strategy: the boosting process stops when no relative improvement of at least 0.01% has been achieved for 5 consecutive iterations.
As one-target algorithms for classification and regression we use random forests [5], as they typically perform well in many different scenarios without the need for hyperparameter tuning.
Performance is evaluated with an outer 10-fold cross-validation strategy. For classification tasks we use the mean misclassification error (mmce) as performance metric; for regression tasks we use the mean squared error (MSE) of the standardized target values (test-set target values are standardized using the mean and standard deviation of the respective training sets). On the inner training sets, the predicted targets are created with an inner 10-fold cross-validation strategy. The outer test sets are only used for prediction and performance evaluation, and finally the models are trained on the whole datasets. For full reproducibility, the benchmark code is available online [1].

Multi-output algorithms  IM, STA, CMOB

Outer resampling strategy  10 fold cv 
Resampling strategy for creating predictions  10 fold cv 
One target regression learner  Random Forest 
One target classification learner  Random Forest 
Classification measure  mmce 
Regression measure  mse (of normalized target values) 
Baselearners  Linear 
Maximum iterations for boosting  10000 
Early stopping strategy  No 0.01% improvement for 5 iterations. 
Results
We summarize the results of the benchmark in Table 3. Reported values are means (over the outer test sets) of the MSE or mmce (depending on the task) for each dataset and each target. For multi-label classification tasks we also include the Hamming loss (HL), and for multivariate regression tasks the multivariate mean squared error (MMSE).
CMOB could not improve the overall Hamming loss on the multi-label datasets used in this benchmark. Looking at the performance values for each target individually, we can see that for some targets (e.g., in the dataset emotions) CMOB improved the mmce, but for others (e.g., in the dataset image) our algorithm did worse. However, the stacking algorithm (STA) did not perform very well on the multi-label datasets either, and only improved the Hamming loss on the image dataset by a small margin. Modeling each target independently seems to be a strong baseline here.
More interesting is the performance of CMOB on the multivariate regression datasets. For 4 of the 10 multivariate regression datasets (andro, jura, sf1, slump), CMOB improved the MMSE over independent models; on 2 of these tasks, CMOB even beat the stacking algorithm. Looking at each target variable individually, we can see that CMOB performs comparably to the stacking algorithm, showing improvements in the MSE whenever stacking also improves over independent models (with some exceptions, e.g., in the dataset sf2).
For the mixed-type datasets youtube and sens we see improvements for some targets when using CMOB. This suggests that multi-output methods can be useful for personality prediction.
Based on datasets for which considerable improvements have been achieved, we show the interpretations of the target dependencies in the following section.
Dataset  Algorithm  MMSE  HL  

andro  IM  0.12  0.26  0.18  0.18  0.39  0.4  0.26  
andro  STA  
andro  CMOB  0.35  0.36  0.2  
atp1d  IM  
atp1d  STA  0.18  
atp1d  CMOB  0.19  
atp7d  IM  0.33  0.28  0.14  0.36  
atp7d  STA  0.3  
atp7d  CMOB  0.33  0.34  0.28  0.37  0.26  
edm  IM  0.41  0.45  0.43  
edm  STA  
edm  CMOB  0.4  0.45  0.43  
enb  IM  0.01  
enb  STA  
enb  CMOB  0.01  
jura  IM  0.45  0.27  0.38  0.37  
jura  STA  0.46  
jura  CMOB  0.26  0.35  
scpf  IM  
scpf  STA  1.31  0.62  1.6  1.18  
scpf  CMOB  0.61  1.58  1.16  
sf1  IM  1.05  1.16  1.29  1.17  
sf1  STA  1.02  1.08  1.18  1.09  
sf1  CMOB  
sf2  IM  0.98  1.12  1.34  1.14  
sf2  STA  1.1  
sf2  CMOB  0.98  1.39  1.15  
slump  IM  0.31  0.55  
slump  STA  0.76  0.64  0.26  0.55  
slump  CMOB  0.75  0.63  
emotions  IM  0.21  0.22  0.21  0.09  0.17  0.16  0.18  
emotions  STA  0.2  0.22  0.2  0.09  0.17  0.17  0.18  
emotions  CMOB  0.2  0.23  0.22  0.11  0.17  0.17  0.18  
image  IM  0.13  0.18  0.26  0.12  0.2  0.18  
image  STA  0.13  0.18  0.25  0.12  0.19  0.17  
image  CMOB  0.13  0.21  0.29  0.12  0.19  0.19  
reuters  IM  0.07  0.09  0.08  0.07  0.06  0.06  0.06  0.07  
reuters  STA  0.07  0.09  0.08  0.07  0.05  0.06  0.06  0.07  
reuters  CMOB  0.07  0.09  0.07  0.07  0.06  0.08  0.06  0.07  
scene  IM  0.09  0.03  0.05  0.05  0.15  0.12  0.08  
scene  STA  0.1  0.03  0.05  0.05  0.15  0.12  0.08  
scene  CMOB  0.09  0.03  0.05  0.05  0.15  0.12  0.08  
youtube  IM  0.13  0.73  1.02  1.05  
youtube  STA  0.13  0.73  1.02  1.06  
youtube  CMOB  0.14  
sens  IM  0.28  0.73  1.02  
sens  STA  0.28  0.73  1.02  
sens  CMOB  0.93  0.3 
5.3 Interpretation of Target Dependencies
Example: Andromeda Dataset
The Andromeda dataset (andro) [10] deals with the prediction of water quality variables (temperature, pH, conductivity, salinity, oxygen, turbidity). CMOB performed well on this dataset, improving on every target variable and performing almost as well as the stacking algorithm:
Dataset  Algorithm  temp.  pH  conductivity  salinity  oxygen  turbidity  MMSE 

andro  IM  0.12  0.26  0.18  0.18  0.39  0.4  0.26 
andro  STA  
andro  CMOB  0.35  0.36  0.2 
To further inspect the target dependencies, we plot the base learner coefficients for each target (showing effect size and direction) together with the corresponding relative risk reduction of the underlying boosting algorithm (showing feature importance) in Figure 1. The relative risk reduction of a base learner is the proportion of that base learner's risk reduction relative to the total risk reduction:

$$\mathrm{RRR}_k = \frac{I_k}{\sum_{l} I_l},$$

where $I_k$ denotes the summed empirical risk reduction achieved by the $k$-th base learner. The numbers in the plots are the base learners' coefficients, and the background color displays the relative risk reduction.
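The relative risk reduction can be computed directly from the per-feature risk improvements (a one-line illustration with our own function name):

```python
import numpy as np

def relative_risk_reduction(importance):
    """Share of the total empirical-risk reduction attributed to each
    selected base learner, as used for the background color in Figure 1."""
    imp = np.asarray(importance, dtype=float)
    return imp / imp.sum()

rrr = relative_risk_reduction([2.0, 1.0, 1.0])  # -> [0.5, 0.25, 0.25]
```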
Figure 1 needs to be read row-wise: e.g., for the target variable salinity, the predicted targets conductivity and salinity have been selected by the boosting algorithm, and both have a positive effect on the value of salinity. It is quite clear that the predicted value of the target itself should normally be the most important feature and should have a coefficient of around 1. However, we can see some anomalies: e.g., for the target turbidity, the predicted value of oxygen seems to be more important. A possible reason could be that the first-stage prediction of turbidity was not accurate in the first place. Nevertheless, we can also see that the resulting boosting models are quite sparse, since only a few base learners were chosen for most target variables.
Example: Slump Dataset
The Slump dataset [32] deals with the prediction of three properties of concrete (slump, flow, and compressive strength). CMOB and STA made considerable improvements in the prediction of the target compressive strength:
Dataset  Algorithm  slump  flow  compressive strength  MMSE 

slump  IM  0.31  0.55  
slump  STA  0.76  0.64  0.26  0.55 
slump  CMOB  0.75  0.63 
One might argue that the improvements are due to exploiting target dependencies. But if we look more closely at the selected base learners of CMOB (see Figure 2), we can see that the only base learner chosen for the target compressive strength is the target itself. This was also often the case for the models in the cross-validation iterations. Other targets were rarely chosen and had small coefficients. Interestingly, we achieved a performance improvement merely by linearly transforming the predictions of the target compressive strength.

6 Conclusion and Outlook
In this paper we defined the problem transformation method for multi-output prediction problems with possibly mixed target spaces. We introduced a novel algorithm, CMOB (component-wise multi-output boosting), which learns dependencies between target variables in a sparse and interpretable manner. Through a benchmark experiment with real-world datasets, we showed that, at least for some datasets, the performance of CMOB is comparable to that of the stacking method (STA). In contrast to STA, which trains (possibly black-box) machine learning models in a second step, CMOB learns the target dependencies with an inherently interpretable model. With the help of CMOB, we were able to find an example where improvements in predictive performance were made for one target without using information from other targets; this would otherwise have been attributed to the exploitation of target dependencies.
We limited the choice of datasets to a rather small number of targets (at most 7). Future work should investigate the performance of CMOB on datasets with many targets. Since CMOB models target dependencies in a sparse manner, this could be an advantage over STA, which, depending on the choice of the underlying machine learning models, cannot handle noisy variables very well.
Acknowledgements
This work has been partially supported by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibilities for its content.
References
 [1] Au, Q.: Benchmark Code for: ComponentWise Boosting of Targets for MultiOutput Prediction (4 2019). https://doi.org/10.6084/m9.figshare.7957292.v2
 [2] Biel, J.I., GaticaPerez, D.: The youtube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs. Multimedia, IEEE Transactions on 15(1), 41–55 (2013)
 [3] Borchani, H., Varando, G., Bielza, C., Larrañaga, P.: A survey on multioutput regression. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(5), 216–233 (2015)

 [4] Boutell, M., Shen, X., Luo, J., Brown, C.: Learning multi-label scene classification. Pattern Recognition 37(9), 1757–1771 (2004)
 [5] Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
 [6] Bühlmann, P., Yu, B.: Boosting with the l 2 loss: regression and classification. Journal of the American Statistical Association 98(462), 324–339 (2003)
 [7] Casalicchio, G., Bossek, J., Lang, M., Kirchhoff, D., Kerschke, P., Hofner, B., Seibold, H., Vanschoren, J., Bischl, B.: Openml: An r package to connect to the machine learning platform openml. Computational Statistics 32(3), 1–15 (2017)
 [8] Chapman, B.P., Duberstein, P.R., Sörensen, S., Lyness, J.M.: Gender differences in five factor model personality traits in an elderly cohort. Personality and individual differences 43(6), 1594–1603 (2007)
 [9] Donnellan, M.B., Lucas, R.E.: Age differences in the big five across the life span: evidence from two national samples. Psychology and aging 23(3), 558 (2008)
 [10] Hatzikos, E.V., Tsoumakas, G., Tzanis, G., Bassiliades, N., Vlahavas, I.: An empirical study on sea water quality prediction. KnowledgeBased Systems 21(6), 471–478 (2008)
 [11] Ishwaran, H., Kogalur, U.: Random Forests for Survival, Regression, and Classification (RFSRC) (2019), https://cran.rproject.org/package=randomForestSRC, r package version 2.8.0
 [12] Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110(15), 5802–5805 (2013)
 [13] Li, T.: Detecting Emotion in Text (November 2003) (2012)
 [14] Loza Mencía, E., Janssen, F.: Learning rules for multilabel classification: a stacking and a separateandconquer approach. Machine Learning 105(1), 77–126 (oct 2016)
 [15] Molnar, C.: Interpretable Machine Learning (2019)
 [16] Montañes, E., Senge, R., Barranquero, J., Ramón Quevedo, J., José Del Coz, J., Hüllermeier, E.: Dependent binary relevance models for multilabel classification. Pattern Recognition 47(3), 1494–1508 (2014)
 [17] Probst, P., Au, Q., Casalicchio, G., Stachl, C., Bischl, B.: Multilabel Classification with R Package mlr. The R Journal 9(1), 352–369 (2017)
 [18] Read, J., Pfahringer, B., Holmes, G., Frank, E., Brodley Read, C.J., Pfahringer, B., Holmes, G., Frank, E., Read, J.: Classifier chains for multilabel classification. Mach Learn 85, 333–359 (2011)
 [19] Read, J., Reutemann, P., Pfahringer, B., Holmes, G.: MEKA: A Multilabel/Multitarget extension to WEKA. Journal of Machine Learning Reasearch. Available at: http://jmlr.org/papers/volume17/12164/12164.pdf 17(21), 1–5 (2016)
 [20] Schapire, R.E., Singer, Y.: BoosTexter: A Boostingbased System for Text Categorization. Machine Learning 39, 135–168 (2000)
 [21] Schmid, M., Hothorn, T.: Boosting additive models using componentwise psplines. Computational Statistics & Data Analysis 53(2), 298–311 (2008)
 [22] Schoedel, R., Au, Q., Völkel, S.T., Lehmann, F., Becker, D., Bühner, M., Bischl, B., Hussmann, H., Stachl, C.: Digital Footprints of Sensation Seeking. Zeitschrift für Psychologie 226(4), 232–245 (2018)
 [23] Senge, R., José Del Coz, J., Hüllermeier, E.: Rectifying Classifier Chains for MultiLabel Classification. Tech. rep.
 [24] Shi, C., Kong, X., Yu, P.S., Wang, B.: MultiObjective MultiLabel Classification. Proceedings of the 2012 SIAM International Conference on Data Mining pp. 355–366 (2012)
 [25] SpyromitrosXioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: MultiTarget Regression via Input Space Expansion: Treating Targets as Inputs. Machine Learning 104(1), 55–98 (nov 2012)
 [26] SpyromitrosXioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multitarget regression via input space expansion: treating targets as inputs. Machine Learning 104(1), 55–98 (2016)
 [27] Tsoumakas, G., Katakis, I.: Multilabel classification: An overview. International Journal of Data Warehousing and Mining 3, 1–13 (2007)
 [28] Tsoumakas, G., SpyromitrosXioufis, E., Vilcek, J., Vlahavas, I.: Mulan: A java library for multilabel learning. Journal of Machine Learning Research 12, 2411–2414 (2011)
 [29] Vanschoren, J., van Rijn, J.N., Bischl, B., Torgo, L.: Openml: Networked science in machine learning. SIGKDD Explorations 15(2), 49–60 (2013)
 [30] Waegeman, W., Dembczynski, K., Huellermeier, E.: MultiTarget Prediction: A Unifying View on Problems and Methods (sep 2018)
 [31] Xu, D., Shi, Y., Tsang, I.W., Ong, Y.S., Gong, C., Shen, X.: A survey on multioutput learning. CoRR abs/1901.00248 (2019)

 [32] Yeh, I.C.: Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29(6), 474–480 (2007)
 [33] Zhang, M.L., Zhou, Z.H.: ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 40(7), 2038–2048 (2007)
 [34] Zhang, M.L., Zhou, Z.H.: A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering 26(8), 1819–1837 (2014)