Multi-Label Learning with Label Enhancement

06/26/2017 ∙ by Ruifeng Shao, et al.

Multi-label learning deals with training instances associated with multiple labels. Many common multi-label algorithms treat each label in a crisp manner, as either relevant or irrelevant to an instance; such a label is called a logical label. In contrast, we assume that behind each multi-label instance there is a vector of numerical labels, which indicate not only whether the corresponding labels are relevant or irrelevant but also how important they are to the instance. The proposed approach transforms the multi-label problem into a regression problem over the numerical labels, which reflect the hidden label importance. To explore the numerical labels, we extend the label space to a Euclidean space by mining the hidden label importance from the training instances. This process of transforming logical labels into numerical labels is called Label Enhancement (LE). We further give three assumptions about the numerical labels of a multi-label instance. Based on these, we propose an effective multi-label learning framework, LEMLL (Label Enhanced Multi-Label Learning), which incorporates the regression loss and the three assumptions into a unified framework. Extensive experiments validate the effectiveness of LEMLL.


I Introduction

In multi-label learning, each training instance is associated with multiple class labels, and the task is to predict a set of relevant labels for unseen instances. Over the past years, multi-label learning techniques have been widely applied to various fields such as document classification [1], video concept detection [2], image classification [3], audio tag annotation [4], etc.

Fig. 1: An exemplar natural scene image which has been annotated with multiple labels sky, desert and tree.

Formally speaking, let $\mathcal{X} = \mathbb{R}^d$ be the $d$-dimensional feature space and $\mathcal{Y} = \{y_1, y_2, \ldots, y_l\}$ be the label set with $l$ possible labels. Given a multi-label training set $S = \{(\mathbf{x}_i, \mathbf{y}_i) \mid 1 \le i \le n\}$, where $\mathbf{x}_i \in \mathcal{X}$ is the $d$-dimensional feature vector and $\mathbf{y}_i \in \{-1, +1\}^l$ is the label vector, the task of multi-label learning is to learn a multi-label predictor mapping from the space of feature vectors to the space of label vectors [5]. Traditional multi-label learning approaches treat each class label as a logical indicator of whether the corresponding label is relevant or irrelevant to the instance, i.e., $+1$ represents relevant and $-1$ represents irrelevant. Such a label represented by $+1$ or $-1$ is called a logical label. Furthermore, traditional approaches take the common assumption of equal label importance, i.e., the relative importance between relevant labels is not differentiated [6].

For real-world multi-label learning problems, the importance of each possible label is generally different. In detail, the difference in label importance can be two-fold: 1) relevant label variance, i.e., different labels relevant to the same instance have different relevance levels; 2) irrelevant label variance, i.e., different labels irrelevant to the same instance have different irrelevance levels. For example, Fig. 1 shows an image with five possible labels sky, desert, tree, camel and fish, for which the annotator provides the logical label vector $(+1, +1, +1, -1, -1)^\top$. For the relevant label variance, the label importance of desert should be greater than that of tree and sky, because desert describes the image more prominently. For the irrelevant label variance, the label importance of camel should be greater than that of fish: although neither appears in the image, fish is clearly more irrelevant to this picture than camel.

As mentioned above, a logical label uses $+1$ or $-1$ to describe each instance, which cannot reflect different label importance; the logical label can thus be viewed as a simplification of the instance's essential class description. However, in real-world applications it is difficult to obtain label importance information directly, so we need a method to reconstruct the latent label importance from the logical multi-label data. To reconstruct the essential class description of each instance, we assume that each multi-label instance is described by a vector of latent real-valued labels, which reflect the importance of the corresponding labels. Such labels are called numerical labels. The process of reconstructing the numerical labels from the logical multi-label data, by utilizing the logical label information and the topological structure of the feature space, is called Label Enhancement (LE).

In this paper, we propose an effective multi-label learning approach based on LE named Label Enhanced Multi-Label Learning (LEMLL). In our approach, we formulate the problem by incorporating regression of the numerical labels and label enhancement into a unified framework, where numerical labels and predictive model are jointly learned.

II Related Work

Multi-label learning approaches can be roughly grouped into three types based on the order of label correlations they exploit [6]. The simplest are the first-order approaches, which decompose the problem into a series of binary classification problems, one per label [7, 8]; they neglect the fact that the information of one label may be helpful for learning another. The second-order approaches consider correlations between pairs of class labels [9, 10], but approaches such as CLR [10] and RankSVM [9] only focus on the difference between relevant and irrelevant labels. The high-order approaches consider correlations among label subsets or all the class labels [11, 12]. All of these approaches take the equal label importance assumption. In contrast, our approach assumes that each instance is described by a vector of latent real-valued labels and that the importance of the possible labels differs.

There have been some supervised learning tasks using label importance information (e.g. label distributions) as supervision. In Label Distribution Learning (LDL) [13], the label distribution covers a number of labels, representing the degree to which each label describes the instance; the value of each label is thus numerical. The aim of LDL is to learn a model mapping from the feature space to the label distribution space. In Label Ranking (LR) [14, 15, 16], the label ranking of each instance describes different importance levels between labels, and the goal is to learn a function mapping from the instance space to rankings (total strict orders) over a predefined set of labels. However, training LDL or LR requires the availability of the label distributions or label rankings in the training set, and in real applications it is difficult to obtain such label importance information directly. On the contrary, LEMLL does not assume the availability of such explicit label importance information: it reconstructs the label importance automatically from the logical multi-label data, while LR and LDL cannot explicitly convert logical labels into numerical labels. Therefore, LEMLL differs from these two existing lines of work.

There have been some existing works which learn from multi-label data with auxiliary label importance information. According to [17], Multi-Label Ranking (MLR) can be understood as learning a model that associates with a query input both a ranking and a bipartition of the label set into relevant and irrelevant labels; the label ranking and the bipartition are given explicitly and are accessible to the MLR algorithm. In [18], graded multi-label classification allows for graded membership of an instance in a class label: an ordinal scale is assumed to characterize the membership degree, and an ordinal grade is assigned for each label of the training example. In [19], a full ordering is assumed to be known to rank the relevant labels of the training example. In all these cases, the auxiliary label importance information is explicitly given and accessible to the learning algorithm. LEMLL clearly differs from these works in that it does not assume the availability of such explicit information.

Though there is no explicit definition of LE in the existing literature, some methods with a similar function have been proposed in past years. In [20] and [21], membership degrees to the labels are constructed via fuzzy clustering [22] and kernel methods; however, these two methods have not been applied to multi-label learning. There have also been multi-label learning algorithms based on LE. In [23], a label propagation procedure over the training instances constitutes label distributions from the logical multi-label data. In [24], the label manifold is explored to transfer the logical labels into real-valued labels. In [25], numerical labels are reconstructed by exploiting the structure of the feature space via sparse reconstruction. These related works are all two-stage approaches: the numerical labels are first reconstructed, and the predictive model is then trained on the reconstructed labels, so the results of model training cannot influence label enhancement. In contrast, LEMLL is a single-stage learning algorithm in which the numerical labels and the predictive model are jointly learned, and the training of the predictive model and the label enhancement are interrelated.

The contribution of this paper is a single-stage learning strategy that jointly reconstructs the numerical labels and trains the predictive model. Compared with the two-stage approaches, LEMLL has several advantages: 1) it reconstructs better latent label importance information; 2) its learning process is single-stage, using label enhancement regularizers; 3) it achieves better predictive performance.

III The LEMLL Approach

III-A The LEMLL Framework

Let $\mathcal{X} = \mathbb{R}^d$ be the input space and $\mathcal{Y} = \{-1, +1\}^l$ be the label space with $l$ logical labels. The training set of multi-label learning can be described as $S = \{(\mathbf{x}_1, \mathbf{y}_1), \ldots, (\mathbf{x}_n, \mathbf{y}_n)\}$. According to the above sections, we assume that the class description of each instance is a vector of numerical labels. We use $\mathbf{u}_i = (u_i^1, \ldots, u_i^l)^\top$ to denote the latent numerical label vector of the instance $\mathbf{x}_i$. To learn a model mapping from the input space to the numerical label space, i.e., $f: \mathcal{X} \to \mathbb{R}^l$, we assume that $f$ is a linear model:

$$f(\mathbf{x}) = \mathbf{W}^\top \varphi(\mathbf{x}) + \mathbf{b} \qquad (1)$$

where $\varphi(\mathbf{x})$ is a nonlinear transformation of $\mathbf{x}$ to a higher-dimensional feature space, $\mathbf{W}$ and $\mathbf{b}$ are the parameter matrices of the regression model, and $f(\mathbf{x})$ is the predicted numerical label vector.
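As a concrete illustration, the model of Eq. (1) can be sketched with the identity feature map; the shapes and the identity choice of the transformation are simplifying assumptions for illustration, not fixed by the paper.

```python
import numpy as np

def predict(X, W, b):
    """Linear multi-output model in the spirit of Eq. (1), with the
    identity feature map phi(x) = x (an illustrative simplification).
    X: (n, d) instances, W: (d, l) weights, b: (l,) bias; returns the
    (n, l) matrix of predicted numerical label vectors."""
    return X @ W + b
```

A kernelized variant would replace `X` by the transformed features; the linear case is what the experiments use with a linear kernel.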

Aiming at learning a model mapping from the input space to the numerical label space, a regression model can be trained by solving the following problem:

$$\min_{\mathbf{W}, \mathbf{b}, \mathbf{U}} \; \sum_{i=1}^{n} L(r_i) + R \qquad (2)$$

where $L(\cdot)$ is a loss function, $R$ denotes the regularizers, $\mathbf{U} = (\mathbf{u}_1, \ldots, \mathbf{u}_n)^\top$ is the numerical label matrix, $\mathbf{e}_i = \mathbf{u}_i - f(\mathbf{x}_i)$, and $r_i = \sqrt{\mathbf{e}_i^\top \mathbf{e}_i}$.

To consider all dimensions in a unique restriction and yield a single support vector for all dimensions, the Vapnik $\varepsilon$-insensitive loss based on the $L_2$-norm is used for $L(\cdot)$, i.e.,

$$L(r_i) = \begin{cases} 0 & r_i < \varepsilon \\ (r_i - \varepsilon)^2 & r_i \ge \varepsilon \end{cases} \qquad (3)$$

which creates an insensitive zone determined by $\varepsilon$ around the estimate, i.e., losses with $r_i$ less than $\varepsilon$ are ignored. Because of the nonzero value of $\varepsilon$, the solution takes all outputs into account to construct each individual regressor; in this way, the cross-output relations are exploited. Furthermore, the regression model can return a sparse solution.
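The loss above can be sketched as follows; the quadratic penalty outside the $\varepsilon$-tube follows the MSVR form, and all names are illustrative.

```python
import numpy as np

def eps_insensitive_loss(U, F, eps=0.1):
    """Vapnik eps-insensitive loss based on the L2-norm (Eq. 3 sketch).

    U : (n, l) numerical label matrix, F : (n, l) model predictions.
    The per-instance error norm r_i = ||u_i - f(x_i)||_2 is ignored
    inside the eps-tube and penalised quadratically outside it."""
    r = np.linalg.norm(U - F, axis=1)               # r_i per instance
    return float(np.where(r < eps, 0.0, (r - eps) ** 2).sum())
```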

To control the complexity of the model, we define the following regularizer:

$$R_1 = \|\mathbf{W}\|_F^2 \qquad (4)$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

III-A1 Label Enhancement Regularizers

The information of both the feature space and the logical label space should be used to reconstruct the numerical labels of each instance. Based on this, we make the following assumptions about label enhancement: 1) each numerical label should be close enough to the original logical label; 2) the numerical label space and the feature space should share a similar local topological structure.

As mentioned above, the logical label can be viewed as a simplification of the numerical label. Intuitively, the original label contains some information about the numerical label, so the two cannot differ too much. This yields the first assumption, for which we define the following regularizer:

$$R_2 = \|\mathbf{U} - \mathbf{Y}\|_F^2 \qquad (5)$$

where $\mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_n)^\top$ is the logical label matrix.

According to the smoothness assumption [26], points close to each other are more likely to share a label. We can thus infer that points close to each other in the feature space are likely to have similar numerical label vectors, which leads to the second assumption. The topological structure of the feature space can be expressed by a fully connected graph $G = (V, E, \mathbf{M})$, where $V = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is the vertex set of the training instances, $E$ is the edge set in which $e_{ij}$ represents the relationship between $\mathbf{x}_i$ and $\mathbf{x}_j$, and $\mathbf{M}$ is the weight matrix in which each element $m_{ij}$ represents the weight of the edge $e_{ij}$. To estimate the local topological structure of the feature space, the local neighborhood information of each instance is used to construct the graph $G$. According to Locally Linear Embedding (LLE) [27], each point can be reconstructed by a linear combination of its neighbors. The approximation of the topological structure of the feature space can be obtained by solving the following problem:

$$\min_{\mathbf{M}} \; \sum_{i=1}^{n} \Big\| \mathbf{x}_i - \sum_{j \ne i} m_{ij} \mathbf{x}_j \Big\|^2 \quad \text{s.t.} \; \sum_{j} m_{ij} = 1 \qquad (6)$$

where $m_{ij} = 0$ if $\mathbf{x}_j$ is not one of $\mathbf{x}_i$'s $K$-nearest neighbors, and $\sum_j m_{ij} = 1$ is constrained because of translation invariance. Eq. (6) can be transformed into $n$ quadratic programming problems:

$$\min_{\mathbf{m}_i} \; \mathbf{m}_i^\top \mathbf{G}_i \mathbf{m}_i \quad \text{s.t.} \; \sum_{j} m_{ij} = 1 \qquad (7)$$

where $(\mathbf{G}_i)_{jk} = (\mathbf{x}_i - \mathbf{x}_j)^\top (\mathbf{x}_i - \mathbf{x}_k)$ is the local Gram matrix. Because the feature space and the numerical label space should share a similar local topological structure, we define the following regularizer:

$$R_3 = \sum_{i=1}^{n} \Big\| \mathbf{u}_i - \sum_{j \ne i} m_{ij} \mathbf{u}_j \Big\|^2 = \operatorname{tr}\!\big(\mathbf{U}^\top (\mathbf{I} - \mathbf{M})^\top (\mathbf{I} - \mathbf{M}) \mathbf{U}\big) \qquad (8)$$

where $\mathbf{I}$ is the $n \times n$ identity matrix and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix.
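The per-instance weight problems of Eq. (7) admit a standard closed-form solution (solve the local Gram system and normalize the weights to sum to one); the sketch below follows the usual LLE recipe, with a small ridge term `reg` added for numerical stability as an implementation choice, not from the paper.

```python
import numpy as np

def lle_weights(X, K=10, reg=1e-3):
    """Reconstruction weights of each instance from its K nearest
    neighbours, in the spirit of Eq. (7)."""
    n = X.shape[0]
    M = np.zeros((n, n))
    for i in range(n):
        # indices of the K nearest neighbours of x_i (self excluded)
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:K + 1]
        # local Gram matrix: G_jk = (x_i - x_j)^T (x_i - x_k)
        Z = X[i] - X[nbrs]
        G = Z @ Z.T + reg * np.eye(len(nbrs))
        # solve G w = 1, then normalise so the weights sum to one
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        M[i, nbrs] = w / w.sum()
    return M
```

Each row of the returned matrix sums to one and has zeros outside the K-neighborhood, matching the constraints stated for Eq. (6).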

By replacing $R$ in Eq. (2) with Eqs. (4), (5) and (8), the framework can be rewritten as:

$$\min_{\mathbf{W}, \mathbf{b}, \mathbf{U}} \; \sum_{i=1}^{n} L(r_i) + \lambda_1 \|\mathbf{W}\|_F^2 + \lambda_2 \|\mathbf{U} - \mathbf{Y}\|_F^2 + \lambda_3 \operatorname{tr}\!\big(\mathbf{U}^\top (\mathbf{I} - \mathbf{M})^\top (\mathbf{I} - \mathbf{M}) \mathbf{U}\big) \qquad (9)$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are tradeoff parameters.

III-B The Alternating Solution for the Optimization

When we fix $\mathbf{U}$ to solve for $\mathbf{W}$ and $\mathbf{b}$, Eq. (9) can be rewritten as:

$$\min_{\mathbf{W}, \mathbf{b}} \; \sum_{i=1}^{n} L(r_i) + \lambda_1 \|\mathbf{W}\|_F^2 \qquad (10)$$

Notice that Eq. (10) is an MSVR with the Vapnik $\varepsilon$-insensitive loss based on the $L_2$-norm [28], so $\mathbf{W}$ and $\mathbf{b}$ can be optimized by training an MSVR model.

When we fix $\mathbf{W}$ and $\mathbf{b}$ to solve for $\mathbf{U}$, the objective function becomes:

$$T(\mathbf{U}) = \sum_{i=1}^{n} L(r_i) + \lambda_2 \|\mathbf{U} - \mathbf{Y}\|_F^2 + \lambda_3 \operatorname{tr}\!\big(\mathbf{U}^\top (\mathbf{I} - \mathbf{M})^\top (\mathbf{I} - \mathbf{M}) \mathbf{U}\big) \qquad (11)$$

We use an iterative quasi-Newton method called Iterative Re-Weighted Least Squares (IRWLS) [29, 28] to minimize $T(\mathbf{U})$. Firstly, $L(r_i)$ is approximated by its first-order Taylor expansion at the solution of the current $k$-th iteration, denoted by $\mathbf{U}^{(k)}$:

$$L(r_i) \approx L\big(r_i^{(k)}\big) + \frac{\mathrm{d}L}{\mathrm{d}r}\bigg|_{r_i^{(k)}} \frac{\big(\mathbf{e}_i^{(k)}\big)^\top}{r_i^{(k)}} \big(\mathbf{e}_i - \mathbf{e}_i^{(k)}\big) \qquad (12)$$

where $r_i^{(k)}$ and $\mathbf{e}_i^{(k)}$ are calculated from $\mathbf{U}^{(k)}$. Then a quadratic approximation is further constructed:

$$L(r_i) \approx \frac{1}{2} a_i r_i^2 + \tau_i \qquad (13)$$

where

$$a_i = \frac{1}{r_i^{(k)}} \frac{\mathrm{d}L}{\mathrm{d}r}\bigg|_{r_i^{(k)}} = \begin{cases} 0 & r_i^{(k)} < \varepsilon \\ \dfrac{2\big(r_i^{(k)} - \varepsilon\big)}{r_i^{(k)}} & r_i^{(k)} \ge \varepsilon \end{cases} \qquad (14)$$

and $\tau_i$ is a constant term that does not depend on $\mathbf{U}$. By substituting Eqs. (13) and (14) into Eq. (11), the objective function becomes:

$$T'(\mathbf{U}) = \frac{1}{2} \sum_{i=1}^{n} a_i \|\mathbf{u}_i - f(\mathbf{x}_i)\|^2 + \lambda_2 \|\mathbf{U} - \mathbf{Y}\|_F^2 + \lambda_3 \operatorname{tr}\!\big(\mathbf{U}^\top \mathbf{P} \mathbf{U}\big) + \tau \qquad (15)$$

where $\mathbf{P} = (\mathbf{I} - \mathbf{M})^\top (\mathbf{I} - \mathbf{M})$, $(\mathbf{A})_{ij} = \delta_{ij} a_i$ ($\delta_{ij}$ is the Kronecker delta function) and $\tau$ is a constant term. Furthermore, Eq. (15) can be rewritten as:

$$T'(\mathbf{U}) = \frac{1}{2} \operatorname{tr}\!\big((\mathbf{U} - \mathbf{F})^\top \mathbf{A} (\mathbf{U} - \mathbf{F})\big) + \lambda_2 \operatorname{tr}\!\big((\mathbf{U} - \mathbf{Y})^\top (\mathbf{U} - \mathbf{Y})\big) + \lambda_3 \operatorname{tr}\!\big(\mathbf{U}^\top \mathbf{P} \mathbf{U}\big) + \tau \qquad (16)$$

where $\mathbf{F} = \big(f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)\big)^\top$. The minimization of Eq. (16) can be solved by setting the derivative of the target function with respect to $\mathbf{U}$ to zero:

$$\frac{\partial T'(\mathbf{U})}{\partial \mathbf{U}} = \mathbf{A} (\mathbf{U} - \mathbf{F}) + 2\lambda_2 (\mathbf{U} - \mathbf{Y}) + 2\lambda_3 \mathbf{P} \mathbf{U} = \mathbf{0} \qquad (17)$$

Solving Eq. (17), we get

$$\mathbf{U}^{*} = \big(\mathbf{A} + 2\lambda_2 \mathbf{I} + 2\lambda_3 \mathbf{P}\big)^{-1} \big(\mathbf{A} \mathbf{F} + 2\lambda_2 \mathbf{Y}\big) \qquad (18)$$

The direction $\mathbf{U}^{*} - \mathbf{U}^{(k)}$ is used as the descending direction for the minimization of Eq. (11), and the solution for the next iteration is obtained via a line search along this direction.
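A single closed-form update of the numerical label matrix can be sketched as below, assuming the linear system $\mathbf{A}(\mathbf{U} - \mathbf{F}) + 2\lambda_2(\mathbf{U} - \mathbf{Y}) + 2\lambda_3 \mathbf{P}\mathbf{U} = \mathbf{0}$ with $\mathbf{P} = (\mathbf{I} - \mathbf{M})^\top(\mathbf{I} - \mathbf{M})$; all names are illustrative.

```python
import numpy as np

def update_U(F, Y, a, M, lam2, lam3):
    """Closed-form IRWLS step for the numerical label matrix U
    (Eq. 18 sketch, under the assumed linear system above).

    F : (n, l) current model outputs, Y : (n, l) logical labels,
    a : (n,) IRWLS weights a_i, M : (n, n) LLE weight matrix,
    lam2/lam3 : tradeoff parameters."""
    n = F.shape[0]
    I = np.eye(n)
    P = (I - M).T @ (I - M)
    A = np.diag(a)
    return np.linalg.solve(A + 2 * lam2 * I + 2 * lam3 * P,
                           A @ F + 2 * lam2 * Y)
```

In the full algorithm this solution gives only the search direction; a line search from the current iterate completes the step.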

Input: the training feature matrix $\mathbf{X}$ and the training label matrix $\mathbf{Y}$
Output: the numerical label matrix $\mathbf{U}$ and the parameter matrices $\mathbf{W}$ and $\mathbf{b}$
1:  Initialize $\mathbf{U}^{(0)} \leftarrow \mathbf{Y}$; $k \leftarrow 0$;
2:  Construct $\mathbf{M}$ according to Eq. (7);
3:  repeat
4:     Optimize $\mathbf{W}$ and $\mathbf{b}$ with $\mathbf{U}^{(k)}$ according to Eq. (10);
5:     Update $\mathbf{F}$ according to Eq. (1);
6:     Update $\mathbf{U}^{(k+1)}$ via the IRWLS procedure;
7:     $k \leftarrow k + 1$;
8:  until convergence reached
9:  Return $\mathbf{U}$, $\mathbf{W}$ and $\mathbf{b}$.
Algorithm 1 The LEMLL Algorithm

The pseudo code of the LEMLL algorithm is presented in Algorithm 1. To distinguish the relevant and irrelevant labels, the numerical labels should be divided into two sets, i.e., the relevant set and the irrelevant set. Following [10] and [23], an extra virtual label $y_0$ is added into the original label set, yielding the extended original label set; in this paper, the logical value of $y_0$ is set to 0. Training on the extended original label set yields the optimal parameter matrices $\mathbf{W}$ and $\mathbf{b}$. Given a test instance, the model predicts an extended numerical label vector: a label whose predicted numerical value is greater than that of the virtual label is relevant to the example, and a label whose value is smaller is irrelevant.
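The virtual-label thresholding described above can be sketched as follows; the array layout (virtual label stored last) is an assumption for illustration.

```python
import numpy as np

def predict_relevant(u_ext):
    """Split labels by the virtual label, as a sketch.

    u_ext : (l + 1,) predicted extended numerical label vector whose
    last entry is the virtual label y_0.  Labels scoring above the
    virtual label are predicted relevant; the rest are irrelevant."""
    virtual = u_ext[-1]
    return np.flatnonzero(u_ext[:-1] > virtual)
```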

Data set | $|S|$ | $dim(S)$ | $L(S)$ | $F(S)$ | $LCard(S)$ | $LDen(S)$ | $DL(S)$ | $PDL(S)$ | Domain
cal500 502 68 174 numeric 26.044 0.150 502 1.000 audio
medical 978 1,449 45 nominal 1.245 0.028 94 0.096 text
llog 1,460 1,004 75 nominal 1.180 0.016 304 0.208 text
enron 1,702 1,001 53 nominal 3.378 0.064 753 0.442 text
msra 1,868 898 19 numeric 6.315 0.332 947 0.507 images
scene 2,407 294 5 numeric 1.074 0.179 15 0.006 images
yeast 2,417 103 14 numeric 4.237 0.303 198 0.082 biology
slashdot 3,782 1,079 22 nominal 1.181 0.054 156 0.041 text
corel5k 5,000 499 374 nominal 3.522 0.009 3,175 0.635 images
rcv1-s1 6,000 944 101 numeric 2.880 0.029 1,028 0.171 text
rcv1-s2 6,000 944 101 numeric 2.634 0.026 954 0.159 text
bibtex 7,395 1,836 159 nominal 2.402 0.015 2,856 0.386 text
corel16k-s1 13,766 500 153 nominal 2.859 0.019 4,803 0.349 images
corel16k-s2 13,761 500 164 nominal 2.882 0.018 4,868 0.354 images
tmc2007 28,696 981 22 nominal 2.158 0.098 1341 0.047 text
TABLE I: Characteristics of the 15 benchmark multi-label data sets used in the experiments.

IV Experiments

This section is divided into two parts. In the first part, we evaluate the predictive performance of our method on multi-label data sets. In the second part, we reconstruct the label importance information from the logical labels via the LE methods, and then compare the recovered label importance with the ground-truth label importance.

IV-A Predictive Performance Evaluation

IV-A1 Experimental Settings

For comprehensive performance evaluation, a total of fifteen benchmark multi-label data sets from Mulan [30] and Meka [31] are collected for experimental studies. For a data set $S$, we use $|S|$, $dim(S)$, $L(S)$, $F(S)$, $LCard(S)$, $LDen(S)$, $DL(S)$ and $PDL(S)$ to represent its number of examples, number of features, number of class labels, feature type, label cardinality, label density, number of distinct label sets and proportion of distinct label sets, respectively. Table I summarizes the characteristics of the fifteen data sets.

To examine the effectiveness of label enhancement, LEMLL is first compared with MSVR [28], which can be considered a degenerate version of LEMLL without label enhancement. Besides, three well-established two-stage approaches are employed for comparative studies, each implemented with the parameter setup suggested in its literature: 1) Multi-label Learning with Feature-induced labeling information Enrichment (MLFE) [25], with parameters chosen among {1, 2, …, 10}, {1, 10, 15} and {1, 10}, respectively; 2) Multi-Label Manifold Learning (ML) [24], with parameters chosen among {1, 2, …, 10}; 3) RElative Labeling-Importance Aware multi-laBel learning (RELIAB) [23], with parameters chosen among {0.001, 0.01, …, 10} and {0.1, 0.15, …, 0.5}. In addition, we compare LEMLL against three state-of-the-art algorithms: the first-order approach ML-kNN [8], the second-order approach Calibrated Label Ranking (CLR) [10], and the high-order approach Ensemble of Classifier Chains (ECC) [11]. For these three algorithms, the parameter configurations suggested in the literature are used: the number of neighbors in ML-kNN is set to 10, and the ensemble size of ECC is set to 30. The three state-of-the-art comparing algorithms are implemented with the Mulan multi-label learning package [30], instantiating the base learners of CLR and ECC with logistic regression. For LEMLL, $K$ is set to 10 and $\varepsilon$ is set to 0.1; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are chosen by cross-validation on the training set. For the sake of fairness, the linear kernel is used in MSVR, ML and LEMLL.

Five widely-used evaluation metrics are adopted in the comparative studies: Hamming loss (HL), Ranking loss (RL), One-error (OE), Coverage (CO) and Average precision (AP). For all five metrics, the values vary within $[0, 1]$. For average precision, the larger the value the better the performance, while for the other four metrics, the smaller the better. These metrics serve as good indicators for comparative studies, as they evaluate the models from various aspects; concrete metric definitions can be found in [6].
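As an example of one of these metrics, a minimal ranking loss implementation might look like the sketch below; the tie-handling convention (half an error per tie) is an implementation choice, not from [6].

```python
import numpy as np

def ranking_loss(scores, Y):
    """Ranking loss sketch: the fraction of (relevant, irrelevant)
    label pairs that the predicted scores order incorrectly,
    averaged over instances with at least one pair.

    scores : (n, l) real-valued predictions, Y : (n, l) in {-1, +1}."""
    losses = []
    for s, y in zip(scores, Y):
        rel, irr = s[y > 0], s[y <= 0]
        if len(rel) == 0 or len(irr) == 0:
            continue  # no (relevant, irrelevant) pair to compare
        diff = rel[:, None] - irr[None, :]
        losses.append(((diff < 0).sum() + 0.5 * (diff == 0).sum())
                      / (len(rel) * len(irr)))
    return float(np.mean(losses))
```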

Data set Hamming loss
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
CAL500 0.137±0.002 0.141±0.002 0.162±0.013 0.167±0.004 0.137±0.002 0.139±0.001 0.165±0.005 0.147±0.002
medical 0.011±0.001 0.014±0.001 0.019±0.003 0.017±0.001 0.012±0.001 0.017±0.001 0.023±0.002 0.013±0.001
llog 0.015±0.000 0.021±0.001 0.028±0.001 0.016±0.000 0.029±0.001 0.015±0.000 0.021±0.003 0.016±0.000
enron 0.050±0.001 0.116±0.005 0.160±0.015 0.062±0.003 0.070±0.001 0.055±0.001 0.072±0.002 0.064±0.001
msra 0.187±0.009 0.211±0.006 0.221±0.005 0.279±0.018 0.193±0.008 0.213±0.008 0.342±0.035 0.353±0.039
scene 0.109±0.003 0.125±0.002 0.140±0.012 0.127±0.005 0.115±0.003 0.092±0.003 0.181±0.004 0.133±0.002
yeast 0.201±0.002 0.227±0.003 0.230±0.003 0.214±0.004 0.204±0.003 0.206±0.001 0.222±0.003 0.216±0.002
slashdot 0.041±0.001 0.074±0.002 0.050±0.001 0.060±0.002 0.043±0.001 0.052±0.000 0.058±0.001 0.049±0.001
Data set Ranking loss
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
CAL500 0.182±0.003 0.185±0.003 0.209±0.019 0.181±0.003 0.182±0.003 0.189±0.002 0.239±0.028 0.205±0.004
medical 0.027±0.005 0.035±0.004 0.059±0.008 0.034±0.006 0.040±0.008 0.055±0.007 0.123±0.028 0.032±0.007
llog 0.146±0.008 0.266±0.013 0.310±0.010 0.124±0.005 0.307±0.009 0.168±0.007 0.197±0.018 0.154±0.009
enron 0.084±0.003 0.200±0.007 0.271±0.019 0.091±0.003 0.189±0.006 0.100±0.002 0.089±0.002 0.120±0.004
msra 0.134±0.011 0.161±0.007 0.167±0.007 0.141±0.015 0.147±0.008 0.167±0.011 0.288±0.019 0.332±0.050
scene 0.086±0.003 0.117±0.005 0.131±0.018 0.086±0.006 0.112±0.005 0.084±0.004 0.127±0.003 0.151±0.005
yeast 0.173±0.003 0.193±0.004 0.195±0.004 0.174±0.004 0.181±0.004 0.182±0.003 0.198±0.003 0.190±0.003
slashdot 0.118±0.003 0.180±0.005 0.153±0.005 0.132±0.005 0.141±0.005 0.178±0.005 0.258±0.005 0.123±0.005
Data set One-error
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
CAL500 0.122±0.017 0.140±0.031 0.258±0.146 0.120±0.016 0.122±0.017 0.136±0.017 0.331±0.117 0.191±0.022
medical 0.140±0.010 0.173±0.014 0.245±0.043 0.213±0.022 0.163±0.012 0.297±0.020 0.688±0.151 0.182±0.019
llog 0.782±0.021 0.792±0.013 0.800±0.014 0.748±0.011 0.808±0.012 0.802±0.013 0.883±0.024 0.785±0.009
enron 0.241±0.013 0.392±0.016 0.662±0.049 0.311±0.013 0.384±0.013 0.328±0.013 0.376±0.017 0.424±0.013
msra 0.051±0.019 0.087±0.014 0.088±0.020 0.097±0.029 0.080±0.017 0.081±0.020 0.312±0.089 0.420±0.110
scene 0.253±0.010 0.316±0.015 0.337±0.034 0.270±0.017 0.296±0.013 0.246±0.009 0.371±0.008 0.373±0.009
yeast 0.233±0.009 0.286±0.013 0.308±0.011 0.241±0.011 0.241±0.009 0.247±0.010 0.270±0.007 0.256±0.008
slashdot 0.411±0.008 0.515±0.014 0.463±0.011 0.557±0.010 0.428±0.010 0.670±0.017 0.978±0.003 0.481±0.014
Data set Coverage
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
CAL500 0.748±0.008 0.748±0.008 0.768±0.016 0.747±0.008 0.748±0.008 0.755±0.006 0.794±0.010 0.788±0.008
medical 0.040±0.007 0.051±0.006 0.077±0.009 0.052±0.009 0.056±0.009 0.076±0.010 0.143±0.032 0.048±0.010
llog 0.150±0.009 0.261±0.015 0.299±0.013 0.159±0.006 0.298±0.010 0.169±0.009 0.234±0.020 0.192±0.011
enron 0.245±0.007 0.452±0.013 0.526±0.031 0.241±0.005 0.447±0.011 0.265±0.006 0.238±0.006 0.300±0.010
msra 0.544±0.017 0.581±0.015 0.585±0.016 0.543±0.021 0.566±0.015 0.590±0.013 0.720±0.024 0.743±0.034
scene 0.085±0.002 0.111±0.004 0.124±0.015 0.103±0.006 0.108±0.004 0.084±0.003 0.144±0.003 0.169±0.004
yeast 0.455±0.005 0.479±0.006 0.480±0.006 0.451±0.006 0.471±0.006 0.465±0.005 0.492±0.006 0.476±0.004
slashdot 0.137±0.004 0.200±0.006 0.174±0.005 0.148±0.005 0.161±0.005 0.191±0.005 0.271±0.005 0.139±0.005
Data set Average precision
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
CAL500 0.497±0.005 0.488±0.005 0.435±0.032 0.496±0.005 0.497±0.005 0.479±0.006 0.395±0.045 0.462±0.007
medical 0.891±0.011 0.865±0.013 0.795±0.033 0.837±0.019 0.869±0.011 0.770±0.017 0.400±0.065 0.860±0.016
llog 0.337±0.012 0.300±0.010 0.274±0.009 0.390±0.010 0.274±0.008 0.304±0.009 0.209±0.020 0.342±0.009
enron 0.678±0.006 0.539±0.011 0.368±0.017 0.661±0.008 0.555±0.008 0.609±0.010 0.610±0.008 0.559±0.008
msra 0.816±0.014 0.783±0.011 0.774±0.010 0.800±0.021 0.800±0.012 0.775±0.016 0.624±0.023 0.567±0.050
scene 0.850±0.005 0.807±0.008 0.792±0.022 0.842±0.010 0.818±0.008 0.853±0.005 0.778±0.004 0.766±0.006
yeast 0.754±0.005 0.727±0.005 0.720±0.005 0.753±0.006 0.752±0.005 0.744±0.005 0.730±0.003 0.741±0.004
slashdot 0.683±0.007 0.594±0.010 0.636±0.007 0.579±0.009 0.663±0.006 0.480±0.012 0.251±0.007 0.628±0.009
TABLE II: Predictive performance (mean ± std. deviation). Differences from LEMLL are assessed with a paired t-test at the 95% significance level. For Hamming loss, Ranking loss, One-error and Coverage, the smaller the better; for Average precision, the larger the better.
Data set Hamming loss
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
corel5k 0.009±0.000 0.013±0.000 0.019±0.000 0.024±0.000 0.010±0.000 0.009±0.000 0.011±0.000 0.010±0.000
rcv1-s1 0.027±0.000 0.035±0.003 0.045±0.003 0.031±0.000 0.027±0.000 0.027±0.000 0.035±0.001 0.029±0.000
rcv1-s2 0.024±0.000 0.033±0.003 0.051±0.004 0.028±0.000 0.024±0.000 0.025±0.000 0.034±0.001 0.028±0.001
bibtex 0.013±0.000 0.015±0.000 0.026±0.002 0.014±0.000 0.013±0.000 0.014±0.000 0.050±0.002 0.021±0.001
corel16k-s1 0.019±0.000 0.020±0.000 0.019±0.000 0.022±0.000 0.019±0.000 0.019±0.000 0.020±0.000 0.019±0.000
corel16k-s2 0.017±0.000 0.019±0.000 0.018±0.000 0.021±0.000 0.018±0.000 0.018±0.000 0.019±0.000 0.018±0.000
tmc2007 0.063±0.000 0.066±0.000 0.063±0.000 0.066±0.001 0.063±0.000 0.075±0.000 0.072±0.001 0.067±0.000
Data set Ranking loss
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
corel5k 0.134±0.002 0.248±0.004 0.462±0.015 0.145±0.002 0.236±0.003 0.137±0.002 0.151±0.008 0.181±0.002
rcv1-s1 0.044±0.001 0.109±0.002 0.129±0.008 0.065±0.001 0.090±0.002 0.088±0.002 0.046±0.001 0.083±0.002
rcv1-s2 0.045±0.001 0.110±0.002 0.145±0.011 0.062±0.003 0.090±0.003 0.098±0.003 0.050±0.001 0.087±0.002
bibtex 0.074±0.002 0.126±0.002 0.138±0.005 0.060±0.002 0.132±0.002 0.226±0.006 0.079±0.001 0.145±0.002
corel16k-s1 0.150±0.001 0.195±0.001 0.198±0.003 0.166±0.002 0.196±0.002 0.175±0.001 0.163±0.001 0.227±0.002
corel16k-s2 0.166±0.001 0.194±0.001 0.197±0.001 0.161±0.001 0.194±0.001 0.170±0.002 0.155±0.001 0.220±0.002
tmc2007 0.054±0.001 0.054±0.001 0.055±0.001 0.048±0.001 0.055±0.001 0.098±0.002 0.063±0.001 0.067±0.001
Data set One-error
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
corel5k 0.662±0.009 0.734±0.003 0.828±0.011 0.755±0.005 0.672±0.005 0.744±0.012 0.749±0.007 0.747±0.006
rcv1-s1 0.438±0.004 0.474±0.007 0.571±0.026 0.504±0.007 0.453±0.007 0.510±0.007 0.484±0.008 0.527±0.010
rcv1-s2 0.436±0.011 0.471±0.009 0.588±0.026 0.476±0.009 0.449±0.008 0.524±0.012 0.463±0.007 0.520±0.009
bibtex 0.395±0.005 0.407±0.005 0.581±0.022 0.411±0.008 0.396±0.005 0.610±0.005 0.510±0.006 0.519±0.007
corel16k-s1 0.655±0.004 0.689±0.004 0.659±0.004 0.715±0.006 0.656±0.005 0.747±0.006 0.762±0.006 0.787±0.005
corel16k-s2 0.644±0.003 0.680±0.004 0.652±0.005 0.714±0.006 0.650±0.004 0.751±0.004 0.758±0.005 0.778±0.007
tmc2007 0.238±0.005 0.235±0.004 0.238±0.005 0.238±0.007 0.238±0.005 0.320±0.004 0.271±0.005 0.239±0.003
Data set Coverage
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
corel5k 0.322±0.004 0.523±0.006 0.765±0.014 0.328±0.005 0.518±0.006 0.312±0.004 0.314±0.013 0.416±0.005
rcv1-s1 0.110±0.001 0.229±0.005 0.256±0.012 0.147±0.003 0.201±0.003 0.188±0.003 0.112±0.001 0.179±0.003
rcv1-s2 0.108±0.003 0.223±0.002 0.271±0.015 0.134±0.005 0.194±0.004 0.201±0.004 0.115±0.002 0.181±0.004
bibtex 0.140±0.003 0.233±0.004 0.233±0.007 0.110±0.002 0.243±0.004 0.365±0.009 0.135±0.001 0.253±0.003
corel16k-s1 0.298±0.002 0.378±0.003 0.386±0.005 0.324±0.003 0.382±0.004 0.339±0.002 0.303±0.002 0.416±0.004
corel16k-s2 0.331±0.002 0.377±0.003 0.386±0.002 0.317±0.003 0.382±0.002 0.333±0.003 0.288±0.002 0.406±0.003
tmc2007 0.135±0.001 0.134±0.001 0.137±0.001 0.122±0.001 0.137±0.001 0.195±0.002 0.145±0.001 0.157±0.001
Data set Average precision
LEMLL MLFE ML RELIAB MSVR ML-kNN CLR ECC
corel5k 0.293±0.003 0.224±0.003 0.144±0.003 0.241±0.002 0.268±0.002 0.240±0.005 0.221±0.006 0.234±0.004
rcv1-s1 0.593±0.002 0.526±0.006 0.440±0.017 0.536±0.006 0.559±0.005 0.513±0.003 0.580±0.004 0.498±0.006
rcv1-s2 0.605±0.005 0.549±0.005 0.445±0.020 0.563±0.008 0.580±0.006 0.515±0.008 0.596±0.003 0.516±0.005
bibtex 0.568±0.004 0.524±0.003 0.396±0.013 0.564±0.004 0.529±0.004 0.327±0.006 0.473±0.003 0.412±0.005
corel16k-s1 0.337±0.002 0.310±0.003 0.324±0.003 0.299±0.003 0.325±0.003 0.279±0.002 0.260±0.003 0.231±0.003
corel16k-s2 0.332±0.002 0.307±0.003 0.321±0.003 0.296±0.002 0.322±0.002 0.272±0.002 0.260±0.003 0.232±0.003
tmc2007 0.800±0.003 0.801±0.002 0.800±0.003 0.800±0.004 0.800±0.003 0.712±0.003 0.768±0.002 0.783±0.002
TABLE III: Predictive performance (mean ± std. deviation). Differences from LEMLL are assessed with a paired t-test at the 95% significance level. For Hamming loss, Ranking loss, One-error and Coverage, the smaller the better; for Average precision, the larger the better.
Algorithm HL RL OE CO AP
LEMLL 1.500 1.567 1.400 2.000 1.400
MLFE 5.567 5.400 4.067 5.233 4.600
ML 6.033 6.867 5.900 6.600 6.067
RELIAB 5.567 2.233 4.067 2.667 3.400
MSVR 2.833 4.833 2.833 5.033 3.067
ML-kNN 3.300 4.833 5.333 4.667 5.467
CLR 6.667 4.800 6.333 4.600 6.133
ECC 4.533 5.467 6.067 5.600 5.867
TABLE IV: The average ranks of the 8 algorithms on the 5 measures.
Data set (abbr.) | # Examples | # Features | # Labels
SJAFFE (SJA) 213 243 6
Natural Scene (NS) 2,000 294 9
Yeast-spoem (spoem) 2,465 24 2
Yeast-spo5 (spo5) 2,465 24 3
Yeast-dtt (dtt) 2,465 24 4
Yeast-cold (cold) 2,465 24 4
Yeast-spo (spo) 2,465 24 6
Yeast-heat (heat) 2,465 24 6
Yeast-diau (diau) 2,465 24 7
Yeast-elu (elu) 2,465 24 14
Yeast-cdc (cdc) 2,465 24 15
Yeast-alpha (alpha) 2,465 24 18
SBU_3DFE (3DFE) 2,500 243 6
Movie (Mov) 7,755 1,869 5
Human Gene (HG) 30,542 36 68
TABLE V: Characteristics of the 15 data sets used in the experiments.

IV-A2 Experimental Results

The detailed experimental results of each comparing algorithm on the 15 data sets are presented in Table II and Table III, and the average ranks of the eight algorithms on the five measures are given in Table IV. On each data set, 50% of the examples are randomly sampled without replacement to form the training set, and the remaining 50% form the test set. The sampling process is repeated ten times, and the mean metric value and standard deviation across the ten training/testing trials are recorded for the comparative studies.

Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.077(1) 0.328(3) 0.063(1) 0.084(1) 0.069(1) 0.069(1) 0.054(1) 0.065(1) 0.052(1) 0.026(1) 0.026(1) 0.023(1) 0.087(1) 0.117(1) 0.048(2) 1.200
MLFE 0.093(2) 0.313(1) 0.149(4) 0.176(3) 0.203(3) 0.189(2) 0.161(2) 0.148(2) 0.155(2) 0.081(3) 0.078(2) 0.071(2) 0.088(2) 0.143(2) 0.044(1) 2.200
ML 0.122(3) 0.316(2) 0.165(5) 0.220(4) 0.265(5) 0.265(4) 0.273(5) 0.269(3) 0.312(5) 0.191(5) 0.195(5) 0.194(5) 0.123(3) 0.236(3) 0.055(3) 4.000
RELIAB 0.159(4) 0.427(6) 0.081(2) 0.173(2) 0.208(4) 0.210(3) 0.234(4) 0.278(4) 0.285(4) 0.135(4) 0.148(4) 0.146(4) 0.164(4) 0.379(5) 0.195(5) 3.933
FCM 0.163(5) 0.393(5) 0.089(3) 0.252(5) 0.173(2) 0.295(5) 0.164(3) 0.350(5) 0.175(3) 0.080(2) 0.084(3) 0.086(3) 0.203(5) 0.364(4) 0.061(4) 3.800
KM 0.718(6) 0.348(4) 0.411(6) 0.586(6) 0.719(6) 0.703(6) 0.796(6) 0.779(6) 0.824(6) 0.461(6) 0.460(6) 0.469(6) 0.695(6) 0.624(6) 0.365(6) 5.867
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.077(1) 0.328(3) 0.063(1) 0.084(1) 0.069(1) 0.069(1) 0.049(1) 0.061(1) 0.052(1) 0.025(1) 0.026(1) 0.022(1) 0.087(2) 0.117(1) 0.048(3) 1.333
MLFE 0.092(2) 0.312(2) 0.149(4) 0.176(3) 0.203(3) 0.189(2) 0.127(3) 0.123(2) 0.106(2) 0.067(3) 0.065(2) 0.053(3) 0.085(1) 0.143(2) 0.042(1) 2.333
ML 0.117(3) 0.308(1) 0.165(5) 0.220(4) 0.265(5) 0.265(4) 0.198(5) 0.210(4) 0.173(5) 0.133(5) 0.138(5) 0.116(5) 0.116(3) 0.236(3) 0.046(2) 3.933
RELIAB 0.153(4) 0.418(6) 0.081(2) 0.173(2) 0.208(4) 0.210(3) 0.138(4) 0.200(3) 0.155(4) 0.082(4) 0.095(4) 0.072(4) 0.155(4) 0.379(5) 0.088(5) 3.867
FCM 0.161(5) 0.394(5) 0.089(3) 0.252(5) 0.173(2) 0.295(5) 0.115(2) 0.272(5) 0.135(3) 0.050(2) 0.07(3) 0.050(2) 0.199(5) 0.364(4) 0.056(4) 3.667
KM 0.706(6) 0.348(4) 0.411(6) 0.586(6) 0.719(6) 0.703(6) 0.565(6) 0.621(6) 0.392(6) 0.278(6) 0.281(6) 0.223(6) 0.666(6) 0.624(6) 0.179(6) 5.867
Threshold 0.3
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.073(1) 0.323(4) 0.063(1) 0.084(1) 0.055(1) 0.061(1) 0.046(1) 0.052(1) 0.052(1) 0.024(1) 0.023(1) 0.020(1) 0.088(2) 0.116(1) 0.048(3) 1.400
MLFE 0.080(2) 0.308(3) 0.149(4) 0.176(3) 0.122(4) 0.130(2.5) 0.108(4) 0.101(2) 0.104(2) 0.055(3) 0.050(2) 0.044(2) 0.078(1) 0.140(2) 0.042(1) 2.500
ML 0.089(3) 0.298(2) 0.165(5) 0.220(4) 0.151(5) 0.168(4) 0.151(5) 0.147(4) 0.166(5) 0.103(5) 0.098(5) 0.089(5) 0.092(3) 0.223(3) 0.043(2) 4.000
RELIAB 0.096(4) 0.284(1) 0.081(2) 0.173(2) 0.089(2) 0.130(2.5) 0.093(2) 0.111(3) 0.149(4) 0.058(4) 0.058(4) 0.052(4) 0.108(4) 0.357(5) 0.055(5) 3.233
FCM 0.183(5) 0.382(6) 0.089(3) 0.252(5) 0.102(3) 0.223(5) 0.106(3) 0.167(5) 0.133(3) 0.046(2) 0.057(3) 0.047(3) 0.182(5) 0.351(4) 0.054(4) 3.933
KM 0.408(6) 0.346(5) 0.411(6) 0.586(6) 0.316(6) 0.414(6) 0.342(6) 0.339(6) 0.352(6) 0.187(6) 0.165(6) 0.148(6) 0.455(6) 0.585(6) 0.105(6) 5.933
Threshold 0.4
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.078(1) 0.320(4) 0.063(1) 0.068(1) 0.053(1) 0.056(1) 0.041(1) 0.049(1) 0.051(1) 0.022(1) 0.021(1) 0.019(1) 0.095(1) 0.101(1) 0.049(4) 1.400
MLFE 0.093(2.5) 0.306(3) 0.149(4) 0.138(3) 0.114(4) 0.111(3) 0.084(4) 0.087(3) 0.077(2) 0.042(3) 0.040(2.5) 0.034(2) 0.098(2) 0.106(2) 0.043(2.5) 2.833
ML 0.099(4) 0.294(2) 0.165(5) 0.162(5) 0.139(5) 0.136(4) 0.117(5) 0.121(5) 0.119(4) 0.077(5) 0.075(5) 0.067(5) 0.104(4) 0.142(3) 0.043(2.5) 4.233
RELIAB 0.093(2.5) 0.265(1) 0.081(2) 0.089(2) 0.084(2) 0.102(2) 0.062(2) 0.082(2) 0.116(3) 0.040(2) 0.040(2.5) 0.041(3) 0.103(3) 0.203(4) 0.042(1) 2.267
FCM 0.164(5) 0.373(6) 0.089(3) 0.143(4) 0.100(3) 0.191(5) 0.079(3) 0.117(4) 0.134(5) 0.046(4) 0.046(4) 0.045(4) 0.151(5) 0.226(5) 0.054(5) 4.333
KM 0.308(6) 0.346(5) 0.411(6) 0.420(6) 0.257(6) 0.254(6) 0.217(6) 0.239(6) 0.195(6) 0.113(6) 0.109(6) 0.095(6) 0.337(6) 0.321(6) 0.067(6) 5.933
Threshold 0.5
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.083(1) 0.318(4) 0.063(1) 0.072(1) 0.053(1) 0.056(1) 0.041(1) 0.047(1) 0.044(1) 0.020(1) 0.019(1) 0.018(1) 0.107(3) 0.105(2) 0.047(3) 1.533
MLFE 0.087(3) 0.305(3) 0.148(4) 0.115(3) 0.114(4) 0.112(3) 0.078(4) 0.078(3) 0.070(2) 0.035(3) 0.034(3) 0.028(2) 0.100(1) 0.102(1) 0.047(3) 2.800
ML 0.093(4) 0.291(2) 0.163(5) 0.130(4) 0.139(5) 0.136(4) 0.106(5) 0.103(5) 0.100(4) 0.063(5) 0.062(5) 0.052(5) 0.105(2) 0.129(3) 0.047(3) 4.067
RELIAB 0.086(2) 0.257(1) 0.079(2) 0.084(2) 0.084(2) 0.101(2) 0.056(2) 0.061(2) 0.086(3) 0.034(2) 0.031(2) 0.035(3) 0.111(4) 0.168(4) 0.039(1) 2.267
FCM 0.157(5) 0.369(6) 0.089(3) 0.154(5) 0.100(3) 0.189(5) 0.070(3) 0.088(4) 0.124(5) 0.045(4) 0.042(4) 0.042(4) 0.158(5) 0.207(5) 0.055(5) 4.400
KM 0.213(6) 0.346(5) 0.408(6) 0.276(6) 0.257(6) 0.252(6) 0.175(6) 0.175(6) 0.152(6) 0.078(6) 0.076(6) 0.063(6) 0.238(6) 0.234(6) 0.058(6) 5.933
TABLE VI: Reconstruction performance (value(rank)) measured by Chebyshev with the threshold varying from 0.1 to 0.5 with a step size of 0.1
Threshold 0.1
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.038(1) 3.223(4) 0.019(1) 0.028(1) 0.020(1) 0.022(1) 0.020(1) 0.030(1) 0.023(1) 0.014(1) 0.016(1) 0.013(1) 0.037(2) 0.145(1) 0.213(2) 1.333
MLFE 0.043(2) 2.844(3) 0.059(4) 0.076(2) 0.100(2) 0.090(2) 0.084(2) 0.078(2) 0.089(2) 0.084(2) 0.084(2) 0.077(2) 0.033(1) 0.149(2) 0.200(1) 2.067
ML 0.073(3) 2.588(2) 0.078(5) 0.136(4) 0.211(5) 0.224(4) 0.317(5) 0.328(4) 0.467(5) 0.620(5) 0.677(5) 0.772(5) 0.081(3) 0.279(3) 0.430(3) 4.067
RELIAB 0.163(4) 1.361(1) 0.023(2) 0.085(3) 0.121(4) 0.138(3) 0.203(3) 0.314(3) 0.421(4) 0.206(4) 0.293(4) 0.305(4) 0.152(4) 0.446(4) 1.063(5) 3.467
FCM 0.218(5) 3.693(6) 0.035(3) 0.187(5) 0.112(3) 0.226(5) 0.226(4) 0.392(5) 0.344(3) 0.123(3) 0.222(3) 0.138(3) 0.284(5) 0.578(5) 0.443(4) 4.133
KM 1.287(6) 3.518(5) 0.536(6) 0.891(6) 1.271(6) 1.218(6) 1.595(6) 1.523(6) 1.742(6) 1.825(6) 1.880(6) 2.054(6) 1.218(6) 0.990(6) 2.099(6) 5.933
Threshold 0.2
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.038(1) 3.220(4) 0.019(1) 0.028(1) 0.020(1) 0.022(1) 0.019(1) 0.028(1) 0.027(1) 0.015(1) 0.017(1) 0.016(1) 0.037(2) 0.145(1) 0.205(2) 1.333
MLFE 0.043(2) 2.840(3) 0.059(4) 0.076(2) 0.100(2) 0.090(2) 0.084(2) 0.077(2) 0.092(2) 0.095(3) 0.093(2) 0.096(3) 0.033(1) 0.149(2) 0.188(1) 2.200
ML 0.068(3) 2.551(2) 0.078(5) 0.136(4) 0.211(5) 0.224(4) 0.250(5) 0.263(4) 0.296(5) 0.464(5) 0.525(5) 0.526(5) 0.078(3) 0.279(3) 0.340(3) 4.067
RELIAB 0.154(4) 1.352(1) 0.023(2) 0.085(3) 0.121(4) 0.138(3) 0.105(3) 0.191(3) 0.244(3) 0.131(4) 0.189(4) 0.164(4) 0.145(4) 0.446(4) 0.710(5) 3.400
FCM 0.208(5) 3.694(6) 0.035(3) 0.187(5) 0.112(3) 0.226(5) 0.110(4) 0.269(5) 0.249(4) 0.088(2) 0.157(3) 0.079(2) 0.278(5) 0.578(5) 0.351(4) 4.067
KM 1.271(6) 3.517(5) 0.536(6) 0.891(6) 1.271(6) 1.218(6) 1.261(6) 1.301(6) 1.139(6) 1.421(6) 1.480(6) 1.436(6) 1.178(6) 0.990(6) 1.542(6) 5.933
Threshold 0.3
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.030(1) 3.118(4) 0.019(1) 0.028(1) 0.017(1) 0.019(1) 0.017(1) 0.023(1) 0.027(1) 0.016(1) 0.017(1) 0.017(1) 0.036(2) 0.143(1) 0.196(2) 1.333
MLFE 0.041(2) 2.674(3) 0.059(4) 0.076(2) 0.086(4) 0.077(3) 0.087(4) 0.075(2) 0.092(2) 0.098(4) 0.097(2) 0.099(3) 0.035(1) 0.146(2) 0.178(1) 2.600
ML 0.053(3) 2.323(2) 0.078(5) 0.136(4) 0.134(5) 0.140(5) 0.203(5) 0.192(5) 0.287(5) 0.374(5) 0.377(5) 0.413(5) 0.059(3) 0.267(3) 0.279(3) 4.200
RELIAB 0.061(4) 1.196(1) 0.023(2) 0.085(3) 0.041(2) 0.067(2) 0.065(2) 0.088(3) 0.237(3) 0.092(3) 0.106(4) 0.113(4) 0.082(4) 0.415(4) 0.509(5) 3.067
FCM 0.233(5) 3.441(5) 0.035(3) 0.187(5) 0.042(3) 0.137(4) 0.072(3) 0.150(4) 0.243(4) 0.070(2) 0.099(3) 0.065(2) 0.226(5) 0.566(5) 0.308(4) 3.800
KM 0.883(6) 3.475(6) 0.536(6) 0.891(6) 0.701(6) 0.806(6) 0.959(6) 0.925(6) 1.086(6) 1.141(6) 1.093(6) 1.141(6) 0.907(6) 0.936(6) 1.189(6) 6.000
Threshold 0.4
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.032(1) 3.056(4) 0.019(1) 0.023(1) 0.016(1) 0.017(1) 0.015(1) 0.021(1) 0.027(1) 0.015(1) 0.016(1) 0.016(1) 0.038(1) 0.116(2) 0.189(2) 1.333
MLFE 0.047(2) 2.578(3) 0.059(4) 0.055(3) 0.084(4) 0.073(3) 0.075(4) 0.068(3) 0.081(2) 0.092(4) 0.093(4) 0.096(4) 0.046(2) 0.103(1) 0.168(1) 2.933
ML 0.057(3.5) 2.219(2) 0.078(5) 0.083(5) 0.127(5) 0.116(5) 0.145(5) 0.143(5) 0.195(4) 0.265(5) 0.285(5) 0.310(5) 0.059(4) 0.163(3) 0.235(3) 4.300
RELIAB 0.057(3.5) 1.099(1) 0.023(2) 0.031(2) 0.037(2) 0.046(2) 0.034(2) 0.052(2) 0.142(3) 0.058(2.5) 0.065(2) 0.076(3) 0.057(3) 0.251(4) 0.377(5) 2.600
FCM 0.186(5) 3.284(5) 0.035(3) 0.074(4) 0.041(3) 0.101(4) 0.036(3) 0.104(4) 0.200(5) 0.058(2.5) 0.067(3) 0.054(2) 0.142(5) 0.375(5) 0.280(4) 3.833
KM 0.760(6) 3.449(6) 0.536(6) 0.573(6) 0.619(6) 0.591(6) 0.682(6) 0.715(6) 0.724(6) 0.816(6) 0.836(6) 0.863(6) 0.764(6) 0.574(6) 0.931(6) 6.000
Threshold 0.5
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.032(1) 3.019(4) 0.018(1) 0.021(1) 0.016(1) 0.016(1) 0.014(1) 0.018(1) 0.021(1) 0.013(1) 0.014(1) 0.014(1) 0.047(1) 0.108(2) 0.183(2) 1.333
MLFE 0.050(3) 2.530(3) 0.059(4) 0.045(3) 0.084(4) 0.073(3) 0.073(4) 0.063(3) 0.068(2) 0.082(4) 0.082(4) 0.085(4) 0.062(2) 0.088(1) 0.158(1) 3.000
ML 0.058(4) 2.181(2) 0.078(5) 0.061(4) 0.127(5) 0.115(5) 0.131(5) 0.117(5) 0.140(4) 0.204(5) 0.217(5) 0.228(5) 0.072(4) 0.137(3) 0.201(3) 4.267
RELIAB 0.041(2) 1.083(1) 0.023(2) 0.024(2) 0.037(2) 0.045(2) 0.029(2) 0.033(2) 0.077(3) 0.041(2) 0.043(2) 0.052(3) 0.067(3) 0.210(4) 0.287(5) 2.467
FCM 0.156(5) 3.194(5) 0.035(3) 0.067(5) 0.041(3) 0.100(4) 0.031(3) 0.069(4) 0.160(5) 0.052(3) 0.053(3) 0.046(2) 0.148(5) 0.339(5) 0.264(4) 3.933
KM 0.561(6) 3.432(6) 0.532(6) 0.334(6) 0.619(6) 0.588(6) 0.589(6) 0.564(6) 0.540(6) 0.621(6) 0.635(6) 0.636(6) 0.606(6) 0.454(6) 0.729(6) 6.000
TABLE VII: Reconstruction performance (value(rank)) measured by K-L with the threshold varying from 0.1 to 0.5 with a step size of 0.1
Threshold 0.1
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.969(1) 0.694(4) 0.989(1) 0.980(1) 0.983(1) 0.983(1) 0.982(1) 0.975(1) 0.980(1) 0.986(1) 0.985(1) 0.987(1) 0.965(2) 0.941(1) 0.857(2) 1.333
MLFE 0.958(2) 0.754(1) 0.957(4) 0.937(2) 0.907(2) 0.917(2) 0.915(2) 0.923(2) 0.909(2) 0.910(2) 0.910(2) 0.914(2) 0.967(1) 0.928(2) 0.876(1) 1.933
ML 0.936(3) 0.749(2) 0.949(5) 0.908(4) 0.852(5) 0.854(4) 0.803(5) 0.808(3) 0.743(5) 0.689(5) 0.672(5) 0.642(5) 0.942(3) 0.879(3) 0.775(3) 4.000
RELIAB 0.892(4) 0.718(3) 0.984(2) 0.936(3) 0.896(4) 0.890(3) 0.836(4) 0.782(4) 0.752(4) 0.814(4) 0.773(4) 0.751(4) 0.901(4) 0.812(4) 0.563(5) 3.733
FCM 0.856(5) 0.527(6) 0.977(3) 0.861(5) 0.905(3) 0.820(5) 0.842(3) 0.723(5) 0.803(3) 0.895(3) 0.843(3) 0.884(3) 0.804(5) 0.766(5) 0.718(4) 4.067
KM 0.637(6) 0.621(5) 0.811(6) 0.695(6) 0.559(6) 0.586(6) 0.492(6) 0.525(6) 0.458(6) 0.429(6) 0.420(6) 0.384(6) 0.675(6) 0.716(6) 0.500(6) 5.933
Threshold 0.2
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.969(1) 0.695(4) 0.989(1) 0.980(1) 0.983(1) 0.983(1) 0.984(1) 0.977(1) 0.977(1) 0.986(1) 0.983(1) 0.985(1) 0.964(2) 0.941(1) 0.866(2) 1.333
MLFE 0.959(2) 0.755(2) 0.957(4) 0.937(2) 0.907(2) 0.917(2) 0.922(2) 0.928(2) 0.918(2) 0.907(3) 0.908(2) 0.906(3) 0.968(1) 0.928(2) 0.886(1) 2.133
ML 0.940(3) 0.765(1) 0.949(5) 0.908(4) 0.852(5) 0.854(4) 0.844(5) 0.842(4) 0.829(5) 0.752(5) 0.731(5) 0.732(5) 0.944(3) 0.879(3) 0.824(3) 4.000
RELIAB 0.897(4) 0.722(3) 0.984(2) 0.936(3) 0.896(4) 0.890(3) 0.909(4) 0.852(3) 0.850(3.5) 0.882(4) 0.847(4) 0.865(4) 0.906(4) 0.812(4) 0.684(5) 3.633
FCM 0.860(5) 0.526(6) 0.977(3) 0.861(5) 0.905(3) 0.820(5) 0.912(3) 0.790(5) 0.850(3.5) 0.927(2) 0.881(3) 0.940(2) 0.808(5) 0.766(5) 0.766(4) 3.967
KM 0.641(6) 0.621(5) 0.811(6) 0.695(6) 0.559(6) 0.586(6) 0.579(6) 0.583(6) 0.609(6) 0.520(6) 0.507(6) 0.516(6) 0.686(6) 0.716(6) 0.608(6) 5.933
Threshold 0.3
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.973(1) 0.713(4) 0.989(1) 0.980(1) 0.987(1) 0.985(1) 0.986(1) 0.982(1) 0.977(1) 0.985(1) 0.984(1) 0.984(1) 0.964(2) 0.942(1) 0.871(2) 1.333
MLFE 0.965(2) 0.784(3) 0.957(4) 0.937(2) 0.931(4) 0.937(3) 0.925(4) 0.936(2) 0.918(2) 0.911(4) 0.913(3) 0.910(3) 0.970(1) 0.931(2) 0.892(1) 2.667
ML 0.957(3) 0.810(1) 0.949(5) 0.908(4) 0.908(5) 0.907(4) 0.871(5) 0.884(4) 0.834(5) 0.794(5) 0.795(5) 0.78(5) 0.959(3) 0.887(3) 0.854(3) 4.000
RELIAB 0.953(4) 0.802(2) 0.984(2) 0.936(3) 0.966(2) 0.946(2) 0.943(2) 0.930(3) 0.855(3) 0.918(3) 0.911(4) 0.907(4) 0.943(4) 0.824(4) 0.761(5) 3.133
FCM 0.835(5) 0.563(6) 0.977(3) 0.861(5) 0.963(3) 0.880(5) 0.935(3) 0.876(5) 0.854(4) 0.942(2) 0.919(2) 0.952(2) 0.836(5) 0.771(5) 0.788(4) 3.933
KM 0.749(6) 0.630(5) 0.811(6) 0.695(6) 0.734(6) 0.713(6) 0.659(6) 0.686(6) 0.623(6) 0.594(6) 0.609(6) 0.593(6) 0.759(6) 0.733(6) 0.682(6) 5.933
Threshold 0.4
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.971(1) 0.725(4) 0.989(1) 0.986(1) 0.988(1) 0.988(1) 0.988(1) 0.984(1) 0.978(1) 0.987(1) 0.986(1) 0.985(1) 0.960(1) 0.955(2) 0.874(2) 1.333
MLFE 0.959(2) 0.801(3) 0.957(4) 0.959(3) 0.934(4) 0.944(3) 0.940(4) 0.946(3) 0.935(2) 0.923(4) 0.922(4) 0.919(4) 0.959(2) 0.959(1) 0.896(1) 2.933
ML 0.953(4) 0.831(2) 0.949(5) 0.946(4) 0.914(5) 0.924(4) 0.907(5) 0.912(5) 0.884(4) 0.847(5) 0.839(5) 0.828(5) 0.954(3) 0.939(3) 0.872(3) 4.133
RELIAB 0.954(3) 0.844(1) 0.984(2) 0.979(2) 0.970(2) 0.964(2) 0.971(2) 0.959(2) 0.904(3) 0.950(3) 0.944(2) 0.938(3) 0.952(4) 0.898(4) 0.812(4) 2.600
FCM 0.857(5) 0.600(6) 0.977(3) 0.943(5) 0.964(3) 0.908(5) 0.967(3) 0.913(4) 0.867(5) 0.951(2) 0.942(3) 0.960(2) 0.877(5) 0.832(6) 0.801(5) 4.133
KM 0.774(6) 0.635(5) 0.811(6) 0.805(6) 0.759(6) 0.779(6) 0.749(6) 0.752(6) 0.735(6) 0.693(6) 0.686(6) 0.675(6) 0.784(6) 0.850(5) 0.737(6) 5.867
Threshold 0.5
Algorithm SJA NS spoem spo5 dtt cold heat spo diau elu cdc alpha 3DFE Mov HG Avg. Rank
LEMLL 0.967(1) 0.731(4) 0.989(1) 0.985(1) 0.988(1) 0.988(1) 0.989(1) 0.985(1) 0.983(1) 0.988(1) 0.988(1) 0.987(1) 0.949(1) 0.955(2) 0.875(2) 1.333
MLFE 0.957(3) 0.809(3) 0.957(4) 0.969(3) 0.934(4) 0.944(3) 0.943(4) 0.952(3) 0.948(2) 0.935(4) 0.934(4) 0.932(4) 0.946(2) 0.962(1) 0.898(1) 3.000
ML 0.954(4) 0.840(2) 0.950(5) 0.961(4) 0.914(5) 0.924(4) 0.917(5) 0.927(5) 0.915(4) 0.880(5) 0.874(5) 0.869(5) 0.942(3) 0.947(3) 0.873(3) 4.133
RELIAB 0.960(2) 0.866(1) 0.985(2) 0.982(2) 0.970(2) 0.964(2) 0.975(2) 0.974(2) 0.941(3) 0.965(2) 0.964(2) 0.958(3) 0.937(4) 0.914(4) 0.846(4) 2.467
FCM 0.863(5) 0.621(6) 0.978(3) 0.941(5) 0.964(3) 0.909(5) 0.972(3) 0.941(4) 0.882(5) 0.954(3) 0.953(3) 0.965(2) 0.868(5) 0.847(6) 0.806(5) 4.200
KM 0.827(6) 0.638(5) 0.812(6) 0.882(6) 0.759(6) 0.779(6) 0.779(6) 0.800(6) 0.798(6) 0.758(6) 0.754(6) 0.751(6) 0.811(6) 0.880(5) 0.780(6) 5.867
TABLE VIII: Reconstruction performance (value(rank)) measured by Cosine with the threshold varying from 0.1 to 0.5 with a step size of 0.1
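Tables VI-VIII report the Chebyshev distance, the Kullback-Leibler divergence, and the cosine similarity between a recovered label distribution and the ground truth (lower is better for the first two; higher is better for cosine). Minimal sketches of the three measures, assuming both inputs are nonnegative vectors summing to one:

```python
import numpy as np

def chebyshev(d, d_hat):
    """Chebyshev distance: maximum absolute difference (lower is better)."""
    return float(np.max(np.abs(d - d_hat)))

def kl_divergence(d, d_hat, eps=1e-12):
    """K-L divergence KL(d || d_hat); eps guards against log(0) (lower is better)."""
    d, d_hat = d + eps, d_hat + eps
    return float(np.sum(d * np.log(d / d_hat)))

def cosine(d, d_hat):
    """Cosine similarity between the two distributions (higher is better)."""
    return float(d @ d_hat / (np.linalg.norm(d) * np.linalg.norm(d_hat)))
```

A perfect reconstruction gives Chebyshev 0, K-L 0, and cosine 1, which matches the directions of the rankings in the three tables.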

Based on the experimental results, the following observations can be made:

  • LEMLL achieves the optimal (lowest) average rank in terms of each evaluation metric (Table IV). On the 15 benchmark data sets, across all the evaluation metrics, LEMLL ranks 1st in 69.3% of the cases and 2nd in 21.3% of the cases.

  • When compared with the three well-established two-stage approaches on the 15 data sets (Tables II and III), across all the evaluation metrics, LEMLL is significantly superior to MLFE in 89.3% of the cases, to ML in 90.7%, and to RELIAB in 80%. Thus LEMLL achieves superior performance over these two-stage approaches.

  • When compared with the three state-of-the-art algorithms on the 15 data sets (Tables II and III), across all the evaluation metrics, LEMLL is significantly superior to ML-kNN in 82.7% of the cases, to CLR in 93.3%, and to ECC in 88%.

  • Another interesting observation is that on all the data sets (Tables II and III), across all the evaluation metrics, the performance of LEMLL is superior or equal to that of MSVR, and LEMLL is significantly superior to MSVR in 76% of the cases, which verifies the superiority of the reconstructed numerical labels over the logical labels.

To summarize, LEMLL achieves superior performance over the well-established two-stage algorithms and the three state-of-the-art algorithms across extensive benchmark data sets. LEMLL significantly outperforms MSVR in most cases, which validates the effectiveness of label enhancement for boosting multi-label learning performance.
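The value(rank) entries and "Avg. Rank" columns in the tables use fractional ranks for ties: for example, two algorithms sharing positions 2 and 3 on a data set each receive rank 2.5, and the average rank is the mean of an algorithm's ranks over the 15 data sets. A minimal sketch of this ranking scheme, with illustrative values:

```python
import numpy as np

def fractional_ranks(values, higher_is_better=False):
    """Rank algorithms on one data set; tied values share the average rank."""
    v = np.asarray(values, dtype=float)
    if higher_is_better:
        v = -v                              # so smaller always means better
    order = np.argsort(v, kind="stable")    # indices from best to worst
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of 1-based positions
        i = j + 1
    return ranks
```

Averaging these per-data-set ranks column-wise reproduces the final "Avg. Rank" figures used to compare the algorithms.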

IV-B Reconstruction Performance Evaluation

IV-B1 Experimental Settings

To further evaluate the numerical labels reconstructed by LEMLL, experimental studies on 15 real-world label distribution data sets [13] with ground-truth label importance are conducted. Table V summarizes the detailed characteristics of the 15 real-world data sets.

Note that the problem of reconstructing label importance from logical labels is relatively new, and logical multi-label data with ground-truth label importance is not yet available. Thus we consider the following setting for the reconstruction tasks. In a label distribution data set, each instance is associated with a label distribution. The data sets used in our experiments, however, contain for each instance not the real distribution but a set of labels. This set consists of the labels with the highest weights in the distribution, and it is the smallest set such that the sum of these weights exceeds a given threshold. This setting models, for instance, the way in which annotators label images or add keywords to texts: it assumes that annotators add labels starting with the most relevant ones, until they feel the labeling is sufficiently complete. Therefore, the logical labels in the data sets can be binarized from the real label distributions as follows. For each instance

$x$, whose label distribution is $d_x = (d_x^{y_1}, \ldots, d_x^{y_c})$, the label with the greatest description degree is found and set as a relevant label. Then we calculate the sum $s = \sum_{y \in Y_r} d_x^{y}$ of the description degrees of all the current relevant labels, where $Y_r$ is the set of the current relevant labels. If $s$ is less than a predefined threshold $t$, we continue finding the greatest description degree among the labels not yet in $Y_r$ and select the corresponding label into $Y_r$. This process continues until $s \ge t$. Finally, the logical labels of the labels in $Y_r$ are set to 1, and the remaining logical labels are set to the irrelevant value. In the experiments, $t$ varies from 0.1 to 0.5 with a step size of 0.1. Thus each label distribution data set is used to form five logical multi-label data sets.
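The binarization procedure just described can be sketched as follows. Encoding the irrelevant value as 0 is an assumption made here for illustration; the paper's exact encoding of irrelevant logical labels is not shown in this excerpt:

```python
import numpy as np

def binarize_distribution(d, t):
    """Greedily mark the highest-degree labels as relevant until their
    cumulative description degree reaches threshold t; return logical labels."""
    d = np.asarray(d, dtype=float)
    logical = np.zeros(len(d), dtype=int)  # assumption: 0 marks irrelevant
    order = np.argsort(-d)                 # labels by descending degree
    s = 0.0
    for j in order:
        logical[j] = 1                     # add next most important label
        s += d[j]
        if s >= t:                         # stop once the threshold is met
            break
    return logical
```

Running this for each $t$ in {0.1, ..., 0.5} over one label distribution data set yields the five logical multi-label data sets used in the reconstruction experiments.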

After binarizing the logical labels from the ground-truth label distributions, we recover the numerical labels from the logical labels via the LE algorithms, and then the numerical labels are transformed into a label distribution via normalization, where