In a multi-label dataset a single datapoint is associated with more than one relevant label. This type of data is obtained naturally from real-world domains like text [14, 10], bioinformatics [1], video [25], images [2, 19, 16] and music [15]. We denote a multi-label dataset as a set of pairs, each consisting of an input datapoint and its label assignment. Here, each input datapoint is a feature vector of fixed dimension, and the corresponding label assignment records, for each of the possible labels, whether that label is applicable (or relevant) to the datapoint or is not applicable (irrelevant). The target of multi-label learning is to build a model that can correctly predict all of the relevant labels for a datapoint.
Multi-label datasets are often found to possess an imbalance in the representation of the different labels—some labels are relevant to a very large number of datapoints while other labels are only relevant to a few. We can consider this an example of the class imbalance problem common in binary classification problems if we consider the relevance of each label to be analogous to a binary classification. Often, the labels in a multi-label dataset have widely varying degrees of imbalance and this is a challenging aspect of building multi-label classification models.
Addressing label imbalance to improve multi-label classification is an active field of research, and several methods have been proposed to address this problem [38, 7, 17]. There is, however, room for significant improvement. Label-specific oversampling can be a solution to address the issue of varying label imbalances in multi-label datasets. In this light, we propose UCLSO, which integrates Unsupervised Clustering and Label Specific data Oversampling. The essence of the UCLSO approach is to integrate information about the proximity of points and their label-specific class memberships to solve the issue of class imbalance in multi-label datasets. In this work, i) synthetic minority points are generated from local data clusters which are obtained from unsupervised clustering of the feature space, and ii) the cardinality of the label-specific oversampled minority set obtained from a local cluster depends on the cluster's share of minority datapoints for that label. In effect, the method oversamples the minority class by focusing on per-cluster local distributions of the minority datapoints to maintain the local minority distribution ratio. The key highlights of our work are:
We propose UCLSO, a new minority class oversampling method for multi-label datasets, which generates synthetic minority datapoints specifically in the minority regions of the input space.
UCLSO preserves the intrinsic class distributions of the local clusters in order to avoid generating synthetic minority datapoints in the majority region, or as outliers in the input space.
UCLSO ensures that the number of synthetic minority points added to a region is proportionate to the original minority density in that region.
In UCLSO, datapoints belonging to individual clusters (which are consistent across the labels) have distinct label relevance, which varies across the different labels. We integrate this label-specific information along with the information from the previous step to obtain sets of label-specific synthetic minority points.
An empirical study involving 12 well-known real-world multi-label datasets and nine competing methods indicates that UCLSO shows promising results and is able to perform better, in general, than the competing methods.
The remainder of the paper is structured as follows. Section 2 discusses the relevant existing work in the multi-label domain. In Section 3 we first describe the motivations of our approach and then present the steps of the proposed UCLSO algorithm. The experiment design is described in Section 4 and the results of the experiments are discussed in Section 5. Finally, Section 6 concludes the paper and discusses some directions for future work.
2 Related Work
Existing multi-label classification methods are principally classified into two types: i) problem transformation methods that modify the multi-label dataset in different ways such that it can be used with existing multi-class classification algorithms [32, 8, 27, 41], and ii) algorithm adaptation approaches that modify existing machine learning algorithms to directly handle multi-label datasets [40, 18, 36, 41].
Multi-label algorithms can also be categorised based on if and how they take label associations into account, which allows algorithms to be categorised as: i) first-order, ii) second-order or iii) higher-order approaches based on the number of labels that are considered together to train the models. First-order approaches do not consider any label association and learn a classifier for each label independently of all other labels [40, 31, 37]. In second-order methods, pair-wise label associations are explored to achieve enhanced learning of multi-label data [21, 8]. Higher-order approaches consider associations between more than two labels [2]. A number of diversified techniques have facilitated higher-order label associations through interesting schemes including classifier chains [5, 26], RAkEL [32], random graph ensembles [29], DMLkNN [35], IBLR-ML+ [6], and Stacked-MLkNN [20].
In recent years, data transformation has been a popular choice for handling multi-label datasets. The two principal ways of data transformation in the multi-label domain are: i) feature extraction or selection, and ii) data oversampling or undersampling. One of the earliest applications of feature extraction in multi-label learning was through LIFT [39], which brought significant performance improvements. Most feature selection or extraction methods select a label-specific feature set for each label to improve the discerning capability of the label-specific classifiers. Subsequently, a number of different feature selection and extraction approaches have been proposed [13, 34, 33, 11]. Recently, the class imbalance problem in multi-label learning has received more interest from researchers. One common approach to handling imbalance is to balance the cardinalities of the relevant and irrelevant classes for each label. One way of achieving this is through the removal of points from the majority class of each label, for example using random undersampling [30] or Tomek-link based undersampling [22]. Another way to achieve this is by adding synthetic minority points to the minority class [17, 28, 3]. Although these approaches have been shown to be effective, there is still a lot of room for improvement.
3 Unsupervised Clustering and Label Specific data Oversampling (UCLSO)
In this section we discuss the motivation and then present the proposed approach: Unsupervised Clustering and Label-Specific data Oversampling (UCLSO).
Figure 1 (b) shows 5 clusters in this dataset found using k-means. In Figures 1 (c) and (d), we mark the points with respect to their label-specific class memberships. The colours red and blue indicate majority and minority class points respectively. Data pre-processing via minority class oversampling is a popular choice to tackle the issue of imbalance in imbalanced datasets [12]. In a multi-label dataset, due to spatial and quantitative variation of class memberships across the labels, we need label-specific oversampling. Figures 1 (e) and (f) show the label-specific SMOTE-based [4] oversampling (synthetic points in yellow) for label 1 and label 2 respectively. It can be seen that SMOTE places synthetic minority points in majority regions on a number of occasions for both labels 1 and 2 (highlighted by black circles in Figures 1 (e) and (f)).
In order to achieve effective learning of a dataset, we need to prevent the majority space encroachment during oversampling. We tackle this issue by clustering (using k-means) the feature space. Clustering the dataset will give us k localized subspaces. Oversampling only within each cluster can prevent the majority class encroachment.
This work is motivated by an effort to balance the cardinalities of the minority and majority classes of the labels without encroaching on the majority class spaces, as well as an effort to preserve the underlying distribution of the datapoints.
As indicated in Figures 1 (e) and (f), a generic oversampling for all labels will not be fruitful, as different labels have different quantitative and spatial distributions of the minority points. There are two aspects we need to keep in mind. i) Where should we perform the oversampling? To answer this, we cluster the feature space in an unsupervised manner (only the feature attributes of the points are taken into account). ii) If there is more than one subspace in which to perform oversampling, how much should we oversample in each subspace? We look into the label-specific distribution of the minority points in the clusters to decide this. The degree of label-specific oversampling in a cluster should be proportional to its original minority class distribution for that label. Figures 1 (g) and (h) show the oversampling on labels 1 and 2 through the proposed method UCLSO. The degree of encroachment in the majority class region is much less for UCLSO compared to SMOTE. The next section will present the proposed UCLSO method in detail.
Following the motivation in the previous section, in this work we propose a first-order oversampling method for multi-label classification datasets, UCLSO, that handles class-imbalance for each label independently.
The main idea of this oversampling method is to keep the synthetic minority points concentrated within the minority region of the input space. This introduces more synthetic minority points in the minority regions of the input space in a guided fashion, while avoiding the introduction of synthetic minority points in non-minority regions. Ideally, this should improve the representation of the minority region for a label, which should in turn help the classifier learn a better decision boundary for the specific imbalanced label.
A common approach to generating synthetic minority datapoints is to select two points within a neighbourhood and then generate a synthetic point by interpolation at a random location between them. For a label with a high imbalance ratio and sparsely distributed minority points, the neighbours from the same class for this label can lie far apart. Consequently, the neighbourhood can encompass a large volume of feature space. Therefore, oversampling in such a manner may lead to the generation of synthetic minority points which end up in the majority region of the input space.
To tackle this issue, we partition the original points into $k$ clusters, based only on the input space inter-point Euclidean distances. We use the k-means algorithm to perform this clustering. After clustering the datapoints, for each cluster $c$, we randomly select $\mathbf{x}$, a minority point from the cluster, and $\mathbf{x}'$, which is a randomly chosen nearest neighbour of $\mathbf{x}$ in the same cluster. We compute the synthetic minority point by interpolation at a random location on the direction vector connecting $\mathbf{x}$ and $\mathbf{x}'$. The synthetic point is computed as follows

$$\mathbf{s}_{c,j}^{(l)} = \mathbf{x} + r\,(\mathbf{x}' - \mathbf{x}),$$

where $\mathbf{s}_{c,j}^{(l)}$ is the $j$th synthetic datapoint generated in cluster $c$ for the label $l$, and $r$ is a random number sampled from the uniform distribution, which decides the location of the synthetic point between $\mathbf{x}$ and $\mathbf{x}'$.
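The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `interpolate` is hypothetical, and the random fraction is assumed to be drawn uniformly from the unit interval so that the synthetic point lies on the segment between the two points.

```python
import random

def interpolate(x, neighbour, rng=random):
    """Return a synthetic point on the segment between x and neighbour.

    Assumption: the random fraction r is drawn uniformly from [0, 1).
    """
    r = rng.random()  # position along the segment from x to neighbour
    return [a + r * (b - a) for a, b in zip(x, neighbour)]

rng = random.Random(0)
x = [0.0, 0.0]          # a minority point in some cluster
neighbour = [2.0, 4.0]  # one of its nearest neighbours in the same cluster
synthetic = interpolate(x, neighbour, rng)
print(synthetic)
```

Because the fraction never exceeds one, the synthetic point cannot fall outside the segment joining the two original points, which is what keeps it inside the local cluster when both endpoints belong to that cluster.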
The number of synthetic minority points generated from a cluster is directly proportional to the share of original minority points in that cluster. Therefore, more synthetic minority points will be introduced in the clusters with more original minority points. This is because we are more confident about adding minority points in a region which originally had comparatively more minority points. For each label, the number of synthetic minority points to be added to a cluster is computed from the sizes of the sets of minority and majority datapoints for that label, together with the number of original minority datapoints for that label in the cluster. This way, the clusters which have more original minority points will be populated with more synthetic minority points.
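One way to realise this allocation is sketched below. The exact formula is an assumption: the total number of synthetic points is taken to be the gap between the majority and minority set sizes (so that the two classes end up balanced), and this total is split across clusters in proportion to each cluster's minority count. The function name `synthetic_counts` and the use of rounding are likewise hypothetical.

```python
def synthetic_counts(n_majority, minority_per_cluster):
    """Split the majority-minority gap across clusters.

    Assumption: each cluster receives a share of the gap proportional to
    its number of original minority points, rounded to the nearest integer.
    """
    n_minority = sum(minority_per_cluster)
    if n_minority == 0:
        return [0] * len(minority_per_cluster)
    deficit = max(n_majority - n_minority, 0)
    return [round(deficit * m / n_minority) for m in minority_per_cluster]

# One label, five clusters: 90 majority points and 10 minority points,
# of which 6, 3 and 1 fall in the first three clusters.
counts = synthetic_counts(n_majority=90, minority_per_cluster=[6, 3, 1, 0, 0])
print(counts)  # [48, 24, 8, 0, 0]
```

Clusters without any minority points for the label receive no synthetic points, which is the property that keeps oversampling away from purely majority regions.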
Following the above steps, once we obtain the synthetic minority set for a label, it is appended to the original training dataset to get an augmented dataset for that label. This augmented training set is used to train a binary classifier model for the corresponding label. The above process is summarised in Algorithm 1.
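The per-label procedure can be put together as in the following sketch. It is not the authors' Algorithm 1: the function name `oversample_label` is hypothetical, cluster assignments are taken as given (for example from a k-means run on the features), the relevant class is assumed to be the minority, neighbours are assumed to be minority points of the same cluster, and clusters with fewer than two minority points are skipped.

```python
import math
import random

def oversample_label(X, y, clusters, n_neighbours=5, seed=0):
    """Generate synthetic minority points for a single label.

    X        : list of feature vectors
    y        : 0/1 relevance of this label (1 is assumed to be the minority)
    clusters : cluster index of every point, e.g. from k-means on X
    """
    rng = random.Random(seed)
    minority = [i for i, v in enumerate(y) if v == 1]
    deficit = max(len(y) - 2 * len(minority), 0)  # majority size minus minority size
    synthetic = []
    for c in sorted(set(clusters)):
        members = [i for i in minority if clusters[i] == c]
        if len(members) < 2:
            continue  # assumption: no in-cluster minority neighbour, so skip
        n_new = round(deficit * len(members) / len(minority))
        for _ in range(n_new):
            i = rng.choice(members)
            nearest = sorted((j for j in members if j != i),
                             key=lambda j: math.dist(X[i], X[j]))[:n_neighbours]
            j = rng.choice(nearest)
            r = rng.random()  # interpolation fraction in [0, 1)
            synthetic.append([a + r * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic

X = [[0, 0], [1, 0], [0, 1],
     [10, 10], [11, 10], [10, 11], [11, 11], [12, 10], [10, 12], [12, 12]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
clusters = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

synthetic = oversample_label(X, y, clusters)
X_aug = X + synthetic                 # augmented training set for this label
y_aug = y + [1] * len(synthetic)
print(len(synthetic), sum(y_aug), len(y_aug) - sum(y_aug))  # 4 7 7
```

In this toy example all synthetic points land inside the first cluster, and the augmented set has equal numbers of relevant and irrelevant points; a binary classifier for the label would then be trained on `X_aug` and `y_aug`.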
We performed a set of experiments to evaluate the effectiveness of the proposed UCLSO method. This section describes the datasets, algorithms, experimental setup, and evaluation processes used for the experiments.
|Dataset|Instances|Inputs|Labels|Type|Cardinality|Density|Distinct labelsets|Proportion of|Imbalance Ratio|
Instances, Inputs and Labels indicate the total number of datapoints, the number of predictor variables, and the number of potential labels respectively in each dataset. Type indicates if the input space is numeric or nominal. Distinct labelsets indicates the number of unique combinations of labels. Cardinality is the average number of labels per datapoint, and Density is obtained by dividing Cardinality by Labels.
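As a small illustration of these definitions, the sketch below computes Cardinality, Density and the number of distinct labelsets for a toy 0/1 label matrix. The function name `label_statistics` and the toy data are hypothetical and are not taken from the datasets used in the paper.

```python
def label_statistics(Y):
    """Return (cardinality, density, distinct labelsets) for a 0/1 label matrix."""
    n_labels = len(Y[0])
    cardinality = sum(sum(row) for row in Y) / len(Y)  # average labels per datapoint
    density = cardinality / n_labels                   # cardinality divided by Labels
    distinct = len({tuple(row) for row in Y})          # unique label combinations
    return cardinality, density, distinct

Y = [[1, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 1, 1, 0]]
cardinality, density, distinct = label_statistics(Y)
print(cardinality, density, distinct)  # 2.0 0.5 3
```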
The datasets are modified as recommended in [38, 12]. Labels having a very high degree of imbalance (50 or greater) or having too few positive samples (20 in this case) are removed. For text datasets (medical, enron, rcv1, bibtex), only the input space features with a high document frequency are retained.
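The label filtering rule can be sketched as follows. Several details are assumptions: the imbalance ratio of a label is taken to be the majority count divided by the minority count, "too few" is read as fewer than 20 positive samples, and the function name `keep_label` is hypothetical.

```python
def keep_label(column, max_ratio=50.0, min_positives=20):
    """Decide whether a label (a 0/1 column) is retained.

    Assumptions: imbalance ratio = majority count / minority count, and a
    label is dropped when it has fewer than `min_positives` positives.
    """
    positives = sum(column)
    negatives = len(column) - positives
    if positives < min_positives or negatives == 0:
        return False
    minority, majority = sorted((positives, negatives))
    return majority / minority < max_ratio

common = [1] * 25 + [0] * 975    # ratio 39, enough positives: kept
rare = [1] * 10 + [0] * 990      # too few positives: removed
skewed = [1] * 20 + [0] * 1980   # ratio 99: removed
print(keep_label(common), keep_label(rare), keep_label(skewed))  # True False False
```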
To compare the performance of different approaches, we have selected the label-based macro-averaged F-Score and label-based macro-averaged AUC scores recommended in. For the experiments evaluating the proposed algorithm we have performed a fold cross-validation experiment. The experiment setup and environment were kept identical to Zhang et al. [38]. For clustering, the number of clusters was set to for the k-means step of UCLSO. In the classification phase, a set of linear SVM classifiers is used, one for each label.
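For reference, the label-based macro-averaged F-Score computes an F-Score for each label separately and then averages over the labels, so that rare labels count as much as frequent ones. The sketch below is a plain-Python illustration; the function name `macro_f_score` is hypothetical, and scoring a label with no true or predicted positives as zero is an assumed convention.

```python
def macro_f_score(Y_true, Y_pred):
    """Average of the per-label F-Scores over all labels (0/1 matrices)."""
    n_labels = len(Y_true[0])
    scores = []
    for l in range(n_labels):
        pairs = [(t[l], p[l]) for t, p in zip(Y_true, Y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no true or predicted positives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels

Y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
score = macro_f_score(Y_true, Y_pred)
print(round(score, 4))  # 0.7333
```

Here the first label has an F-Score of 2/3 and the second of 0.8, giving a macro average of about 0.733.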
We compare the performance of UCLSO against several state-of-the-art multi-label classification algorithms: COCOA [38], THRSEL [24], IRUS [30], SMOTE-EN [4], RML [23], binary relevance (BR), calibrated label ranking (CLR) [8], ensemble classifier chains (ECC) [27] and RAkEL [32]. We base our experiments on the experiment presented in Zhang et al. [38], and extend the results of that paper by adding the performance of UCLSO.
Tables 2 and 3 show the label-based macro-averaged F-Score and label-based macro-averaged AUC results respectively, along with the relative ranks in brackets (lower ranks are better) of the algorithms compared for each dataset. The last row of both tables indicates the average rank for the algorithms. The best values are highlighted in boldface. Note that Table 3 does not include results for RML [23], as the implementation does not provide prediction scores.
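The relative ranks in brackets can be reproduced from the scores by ranking in descending order and giving tied scores the average of the positions they occupy. The sketch below is an illustration (the function name `rank_scores` is hypothetical) applied to the F-Score values of the medical row of Table 2, where two algorithms tie at 0.733.

```python
def rank_scores(scores):
    """Rank scores in descending order; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

medical = [0.783, 0.759, 0.733, 0.537, 0.700, 0.707, 0.718, 0.724, 0.733, 0.672]
ranks = rank_scores(medical)
print(ranks)  # [1.0, 2.0, 3.5, 10.0, 8.0, 7.0, 6.0, 5.0, 3.5, 9.0]
```

Averaging these per-dataset ranks over all datasets gives the average rank of each algorithm.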
Also, to further analyse the differences between the algorithms, we performed a non-parametric statistical test for multiple classifier comparison. Following [9], we have performed a Friedman test with Finner p-value adjustments, and the critical difference plots from the test results are shown in Figure 2. The full result tables are in the supplementary material: https://github.com/phoxis/uclso/blob/main/UCLSO_ICONIP2021_Supplementary_Material.pdf.
Table 2: Label-based macro-averaged F-Score, with relative ranks in brackets (lower ranks are better).
|Dataset|UCLSO|COCOA|THRSEL|IRUS|SMOTE-EN|RML|BR|CLR|ECC|RAkEL|
|---|---|---|---|---|---|---|---|---|---|---|
|yeast|0.505 (1)|0.461 (3)|0.427 (5)|0.426 (6)|0.436 (4)|0.471 (2)|0.409 (9)|0.413 (8)|0.389 (10)|0.420 (7)|
|emotions|0.658 (2)|0.666 (1)|0.560 (9)|0.622 (5)|0.575 (8)|0.645 (3)|0.550 (10)|0.595 (7)|0.638 (4)|0.613 (6)|
|medical|0.783 (1)|0.759 (2)|0.733 (3.5)|0.537 (10)|0.700 (8)|0.707 (7)|0.718 (6)|0.724 (5)|0.733 (3.5)|0.672 (9)|
|cal500|0.273 (2)|0.210 (5)|0.252 (3)|0.277 (1)|0.235 (4)|0.209 (6)|0.169 (8)|0.081 (10)|0.092 (9)|0.193 (7)|
|rcv1-s1|0.443 (1)|0.364 (3)|0.292 (5)|0.252 (8)|0.313 (4)|0.387 (2)|0.285 (6)|0.227 (9)|0.192 (10)|0.272 (7)|
|rcv1-s2|0.432 (1)|0.342 (3)|0.275 (5)|0.234 (8)|0.305 (4)|0.363 (2)|0.272 (6)|0.226 (9)|0.173 (10)|0.263 (7)|
|rcv1-s3|0.480 (1)|0.339 (3)|0.275 (5)|0.225 (8)|0.302 (4)|0.371 (2)|0.271 (6)|0.211 (9)|0.163 (10)|0.257 (7)|
|enron|0.352 (1)|0.342 (2)|0.291 (5)|0.293 (4)|0.266 (8)|0.307 (3)|0.246 (9)|0.244 (10)|0.268 (6)|0.267 (7)|
|bibtex|0.442 (1)|0.318 (3)|0.303 (4)|0.253 (8)|0.283 (5)|0.326 (2)|0.263 (7)|0.265 (6)|0.212 (10)|0.252 (9)|
|llog|0.181 (1)|0.082 (6)|0.096 (3)|0.124 (2)|0.095 (4.5)|0.095 (4.5)|0.031 (7)|0.024 (8)|0.022 (10)|0.023 (9)|
|corel5k|0.209 (2)|0.196 (3)|0.146 (4)|0.105 (6)|0.125 (5)|0.215 (1)|0.089 (7)|0.049 (10)|0.054 (9)|0.084 (8)|
|slashdot|0.443 (1)|0.374 (2)|0.355 (4)|0.257 (10)|0.366 (3)|0.343 (5)|0.291 (8)|0.290 (9)|0.304 (6)|0.296 (7)|
|Avg. rank|1.25|3|4.62|6.33|5.12|3.29|7.42|8.33|8.12|7.5|
Table 3: Label-based macro-averaged AUC, with relative ranks in brackets (lower ranks are better).
|Dataset|UCLSO|COCOA|THRSEL|IRUS|SMOTE-EN|BR|CLR|ECC|RAkEL|
|---|---|---|---|---|---|---|---|---|---|
|yeast|0.666 (3)|0.711 (1)|0.576 (8.5)|0.658 (4)|0.582 (7)|0.576 (8.5)|0.650 (5)|0.705 (2)|0.641 (6)|
|emotions|0.819 (3)|0.844 (2)|0.687 (8.5)|0.802 (4)|0.698 (7)|0.687 (8.5)|0.796 (6)|0.850 (1)|0.797 (5)|
|medical|0.967 (1)|0.964 (2)|0.869 (7.5)|0.955 (3.5)|0.873 (6)|0.869 (7.5)|0.955 (3.5)|0.952 (5)|0.856 (9)|
|cal500|0.550 (4)|0.558 (2)|0.509 (8.5)|0.545 (5)|0.512 (7)|0.509 (8.5)|0.561 (1)|0.557 (3)|0.528 (6)|
|rcv1-s1|0.919 (1)|0.889 (3)|0.643 (7.5)|0.882 (4)|0.626 (9)|0.643 (7.5)|0.891 (2)|0.881 (5)|0.728 (6)|
|rcv1-s2|0.912 (1)|0.882 (2.5)|0.640 (7.5)|0.880 (4)|0.622 (9)|0.640 (7.5)|0.882 (2.5)|0.874 (5)|0.721 (6)|
|rcv1-s3|0.956 (1)|0.880 (2)|0.633 (7.5)|0.872 (4.5)|0.628 (9)|0.633 (7.5)|0.877 (3)|0.872 (4.5)|0.718 (6)|
|enron|0.719 (5)|0.752 (1)|0.597 (8.5)|0.738 (3)|0.619 (7)|0.597 (8.5)|0.720 (4)|0.750 (2)|0.650 (6)|
|bibtex|0.844 (4)|0.877 (2)|0.673 (8.5)|0.894 (1)|0.706 (6)|0.673 (8.5)|0.811 (5)|0.873 (3)|0.696 (7)|
|llog|0.721 (1)|0.663 (4)|0.518 (7.5)|0.676 (2)|0.561 (6)|0.518 (7.5)|0.612 (5)|0.673 (3)|0.514 (9)|
|corel5k|0.695 (4)|0.718 (3)|0.559 (7.5)|0.687 (5)|0.596 (6)|0.559 (7.5)|0.740 (1)|0.723 (2)|0.552 (9)|
|slashdot|0.806 (1)|0.774 (2)|0.632 (8.5)|0.753 (4)|0.714 (6)|0.632 (8.5)|0.742 (5)|0.765 (3)|0.638 (7)|
|Avg. rank|2.42|2.21|8.00|3.67|7.08|8.00|3.58|3.21|6.83|
Table 2 clearly shows that the overall performance of the proposed UCLSO algorithm is better than all the other algorithms, attaining the best average rank of 1.25. The second best average rank is attained by COCOA (avg. rank 3). UCLSO attained the top rank for nine of the datasets and the second rank on the remaining three, and achieved much better performance than the other approaches on many datasets. These results also show that methods that attempt to explicitly consider the label imbalance issue perform better than those that do not. The other algorithms which specifically address label imbalance attained the following order: RML (avg. rank 3.29), THRSEL (avg. rank 4.62), SMOTE-EN (avg. rank 5.12) and IRUS (avg. rank 6.33). The algorithms which do not consider label imbalance, namely BR (avg. rank 7.42), RAkEL (avg. rank 7.5), ECC (avg. rank 8.12), and CLR (avg. rank 8.33), all performed poorly.
Multiple classifier comparison results in Figure 2 (a) show that when UCLSO is compared with other algorithms, except for COCOA and RML, the null hypothesis can be rejected with a significance level of. Therefore, based on the statistical test, UCLSO is significantly better than the other algorithms, except COCOA and RML.
Table 3 shows the label-based macro-averaged AUC scores. The proposed method UCLSO attained the second best average rank of 2.42, very close to COCOA, which attained the best average rank of 2.21. Interestingly, UCLSO attained more rank ones (six) than COCOA (two). ECC performed better than UCLSO on six of the datasets, whereas COCOA performed better than ECC on nine. It is also interesting to notice that ECC and CLR had higher rankings for the label-based macro-averaged AUC metric than for the macro-averaged F-Score, while a simple BR still performed poorly. As ECC and CLR take label associations into consideration, in a binary relevance and a ranking fashion respectively, this helped improve their comparative performance. RAkEL, on the other hand, although it takes label associations into account, is sensitive to the label subset size (value of k) and the specific combination of labels, which can lead to an even higher degree of imbalance. The difference between the label-based macro-averaged AUC and F-Score results also indicates the importance of thresholding the predictions when deciding the relevance of a certain label.
Multiple classifier comparison results in Figure 2 (b) show that when UCLSO is compared with the others, the null hypothesis could not be rejected for COCOA, ECC, CLR and IRUS in this case with a significance level of . However, UCLSO performed significantly better than RAkEL, SMOTE-EN, THRSEL and BR. Overall, the experiments demonstrate the effectiveness of the proposed method UCLSO, as it outperforms the compared state-of-the-art algorithms in almost all cases.
6 Conclusion and Future Work
In this work we have proposed an algorithm to address the class imbalance of labels in multi-label classification problems. The proposed algorithm, Unsupervised Clustering and Label-Specific data Oversampling (UCLSO), oversamples label-specific minority datapoints in a multi-label problem to balance the sizes of the majority and the minority classes of each label. The oversampling of the minority classes for each label is done in a way such that more minority class samples are generated in regions (or clusters) where the density of minority points is high. This avoids the introduction of minority datapoints in majority regions in the input space. The number of samples introduced per cluster also depends on the share of the minority class for that cluster.
An experiment with 12 well-known multi-label datasets and other state of the art algorithms demonstrates the efficacy of UCLSO with respect to label-based macro-averaged F-Score. UCLSO attained the best average rank and the degree of its improvement over existing approaches was significant. This shows that UCLSO has successfully improved the classification of imbalanced multi-label data. In future, we would specifically like to incorporate some imbalance informed clustering to extend our scheme. Moreover, it would be interesting to amalgamate the oversampling technique with label associated learning, another key component of multi-label data.
References
[1] (2006) Hierarchical multi-label prediction of gene function. Bioinformatics 22 (7), pp. 830–836.
[2] Learning multi-label scene classification. Pattern Recognition 37 (9), pp. 1757–1771.
[3] (2015) MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation. Knowledge-Based Systems 89, pp. 385–397.
[4] (2002-06) SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16 (1), pp. 321–357.
[5] (2010) Bayes optimal multilabel classification via probabilistic classifier chains. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 279–286.
[6] Combining instance-based learning and logistic regression for multilabel classification. Machine Learning 76 (2-3), pp. 211–225.
[7] Addressing imbalance in multi-label classification using structured Hellinger forests. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31.
[8] (2008-11) Multilabel classification via calibrated label ranking. Mach. Learn. 73 (2), pp. 133–153.
[9] (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Information Sciences 180 (10), pp. 2044–2064.
[10] (2004) Discriminative methods for multi-labeled classification. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 22–30.
[11] (2017) Granular multi-label feature selection based on mutual information. Pattern Recognition 67, pp. 410–423.
[12] (2009-09) Learning from imbalanced data. IEEE Trans. on Knowl. and Data Eng. 21 (9), pp. 1263–1284.
[13] (2018-03) Joint feature selection and classification for multilabel learning. IEEE Transactions on Cybernetics 48 (3), pp. 876–889.
[14] Text categorization with support vector machines: learning with many relevant features. In European Conference on Machine Learning, pp. 137–142.
[15] (2006-06) Toward intelligent music information retrieval. Multimedia, IEEE Transactions on 8 (3), pp. 564–574.
[16] (2014) Multi-label image classification with a probabilistic label enhancement model. In Uncertainty in Artificial Intelligence.
[17] (2019) Synthetic oversampling of multi-label data based on local label distribution. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 180–193.
[18] Large-scale multi-label text classification – revisiting neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 437–452.
[19] (2009-10) Clustering based multi-label classification for image annotation and retrieval. In Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on, pp. 4514–4519.
[20] (2017) Stacked-MLkNN: a stacking based improvement to multi-label k-nearest neighbours. In LIDTA@PKDD/ECML.
[21] (2007) Efficient pairwise classification. In ECML 2007, LNCS (LNAI), pp. 658–665.
[22] (2020) MLTL: a multi-label approach for the Tomek link undersampling algorithm. Neurocomputing 383, pp. 95–105.
[23] (2010) Reverse multi-label learning. In Advances in Neural Information Processing Systems 23, pp. 1912–1920.
[24] (2013-07) Threshold optimisation for multi-label classifiers. Pattern Recogn. 46 (7), pp. 2055–2065.
[25] (2007) Correlative multi-label video annotation. In Proceedings of the 15th ACM International Conference on Multimedia, MM '07, New York, NY, USA, pp. 17–26.
[26] (2013) Efficient Monte Carlo optimization for multi-label classifier chains. pp. 3457–3461.
[27] (2011) Classifier chains for multi-label classification. Machine Learning 85 (3), pp. 333.
[28] (2019) Reverse-nearest neighborhood based oversampling for imbalanced, multi-label datasets. Pattern Recognition Letters 125, pp. 813–820.
[29] (2015-05-01) Multilabel classification through random graph ensembles. Machine Learning 99 (2).
[30] (2012) Inverse random under sampling for class imbalance problem and its application to multi-label classification. 45 (10), pp. 3738–3750.
[31] A multi-label approach using binary relevance and decision trees applied to functional genomics. Journal of Biomedical Informatics 54, pp. 85–95.
[32] (2011-07) Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering 23 (7), pp. 1079–1089.
[33] A multi-label feature extraction algorithm via maximizing feature variance and feature-label dependence simultaneously. Knowledge-Based Systems 98, pp. 172–184.
[34] (2018) A weighted linear discriminant analysis framework for multi-label feature extraction. Neurocomputing 275, pp. 107–120.
[35] (2008) Multi-label classification algorithm derived from k-nearest neighbor rule with label dependencies. In 2008 16th European Signal Processing Conference, pp. 1–5.
[36] (2006) Multi-label neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering 18, pp. 1338–1351.
[37] (2018) Binary relevance for multi-label learning: an overview. Frontiers of Computer Science 12 (2), pp. 191–202.
[38] (2020) Towards class-imbalance aware multi-label learning. IEEE Transactions on Cybernetics.
[39] (2015-01) Lift: multi-label learning with label-specific features. Pattern Analysis and Machine Intelligence, IEEE Transactions on 37 (1), pp. 107–120.
[40] ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40 (7), pp. 2038–2048.
[41] (2013) A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering 26 (8), pp. 1819–1837.