1. Introduction
Recently, a growing number of studies have focused on challenging label-ambiguity learning problems. Since the single-label learning paradigm, in which an instance is simply mapped to a single logical label, has limitations in practice (Zhang and Zhou, 2013), multi-label learning (MLL), which simultaneously assigns multiple logical labels to each instance, has been highlighted to address this issue. Over the past years, MLL has been applied in a wide range of scenarios (Tsoumakas and Katakis, 2007; Huang and Zhou, 2012; Zhang et al., 2017; Chen et al., 2019; Zhang and Zhang, 2010). For example, in supervised MLL, each sample is described by a label vector whose elements are either 1 or 0, indicating whether the instance belongs to the corresponding label. Since multiple logical labels with the same value contribute equally in MLL, the relative importance among the associated labels, which differs under most circumstances, is ignored and cannot be well investigated.
Therefore, despite MLL's success, its performance is hindered in some sophisticated scenarios, such as facial age estimation (Geng et al., 2013; Yang et al., 2018) and facial expression recognition (Jia et al., 2019; Kim and Provost, 2015; Zhou et al., 2015), since these tasks require a model that precisely maps each instance to a real-valued label vector of quantitative description degrees, i.e., a label distribution. To meet this demand, the learning process for such a model, called "label distribution learning" (LDL) (Geng, 2016), has attracted significant attention. In LDL, an instance is annotated by a label vector, i.e., the label distribution, in which each element, ranging from 0 to 1, is the description degree of the corresponding label, and all elements sum to 1. As much of the literature has demonstrated (Geng, 2016; Gao et al., 2017; Zheng et al., 2018), label distributions can describe the attributes of samples more precisely, because the relative importance of multiple labels differs considerably in many real-world applications, and the implicit cues within label distributions can be effectively leveraged through LDL to reinforce supervised training. Nevertheless, since manually annotating each instance with a label distribution is labor-intensive and time-consuming, label distributions are unavailable in most practical training sets (Xu et al., 2019b). The demand for label distributions across different datasets has spurred progress in label enhancement (LE), which is proposed by (Xu et al., 2018, 2019a). Specifically, LE can serve as a preprocessing step for LDL: label distributions can be exactly recovered from the off-the-shelf logical labels and the implicit information of the given features by LE, as shown in Fig. 1.
Obviously, according to the definition (Xu et al., 2018, 2019a), the essence of the recovery process is to utilize information from two aspects: 1) the underlying topological structure of the feature space, and 2) the existing logical labels. Accordingly, several approaches have been proposed in recent years. To leverage the knowledge in the feature space, some prior efforts assign the membership degree of each instance to different labels via the fuzzy clustering method (FCM) (Melin and Castillo, 2005; El Gayar et al., 2006). Besides, some approaches construct graph structures in the feature space to improve the recovery process (Zhu and Goldberg, 2009; Xu et al., 2018). However, most existing LE methods do not fully investigate and utilize both the underlying structure of the feature space and the implicit information of the existing logical labels. For example, the graphs and similarity matrices used in the aforementioned LE methods cannot fully explore the intrinsic information of data samples, since edges in graphs or elements of similarity matrices are calculated by a pairwise method (Li et al., 2015) or a k-nearest neighbors (KNN) strategy (Hou et al., 2016; Xu et al., 2018). The downside of these partially based graph construction processes is that only local topological features can be utilized, while the holistic information of the feature space is largely untapped. In addition, these approaches always require some prior knowledge for graph construction; that is to say, if the parameters of KNN are tuned slightly, the recovery performance of these algorithms may vary considerably, which is not desirable in practice.

Here, we aim to employ the intrinsic global sample correlations to obtain an exact label distribution recovery. Since low-rank representation (LRR) (Liu et al., 2012) can unearth the global structure of the whole feature space, a promising LE performance can be expected by employing LRR to supervise the label distribution recovery process. To this end, a novel Label Enhancement with Sample Correlations, termed LESC, is proposed in this paper. The proposed method imposes a low-rank constraint on the data subspace representation to capture the global relationship among all instances. Specifically, LRR is employed to benefit LE by exploiting the intrinsic sample correlations in the feature space from a global perspective (Liu and Yan, 2011; Yin et al., 2015; Zheng et al., 2020a, b). Since labels are also semantic features of data samples, it is natural and intuitive to smoothly transfer the low-rank structure constructed in the feature space to the label space. More importantly, by extending the investigation of sample correlations employed in our previous work (Tang et al., 2020), this paper also proposes a generalized Label Enhancement with Sample Correlations, dubbed gLESC for short. This method jointly explores the implicit information in both the feature space and the label space by employing a tensor-Singular Value Decomposition (t-SVD) based low-rank tensor constraint (Kilmer et al., 2013). In fact, the sample correlations obtained solely from the feature space are not the optimal choice for label distribution recovery, since the feature space also contains excessive information that is useless for LE. For example, with respect to facial emotion labels, the gender and identity information entangled in the feature-space sample correlations may hinder the recovery process of LE. To address this problem, the existing logical labels are also leveraged to attain the desired intrinsic sample correlations, which are more suitable for LE. It is clear that samples with similar label distributions have similar logical labels, but not vice versa. Figuratively speaking, by jointly imposing a t-SVD based low-rank tensor constraint on both the feature space and the label space, the logical labels play a role in removing unwanted information. Once the desired sample correlations are attained, they are leveraged to supervise the recovery process of LE, and optimal recovered label distributions can be achieved. Extensive experiments conducted on 14 benchmark datasets illustrate that our proposed methods stably obtain remarkable performance.

Our contributions can be summarized as follows:

By incorporating sample correlations into the recovery process of LE, a novel Label Enhancement with Sample Correlations, named LESC, is proposed in this paper. It uses the low-rank representation of the feature space to explore the global relationships among instances for LE improvement.

To further investigate the intrinsic sample correlations for LE, a generalized LESC (gLESC) is also proposed. By imposing a t-SVD based low-rank tensor constraint on both the feature space and the label space, the proper sample correlations for LE can be achieved effectively.

Comprehensive experiments conducted on 14 datasets, including an artificial dataset and 13 real-world datasets, demonstrate the effectiveness and generalization ability of our methods compared with several state-of-the-art methods.
The remainder of this paper is organized as follows. The next section reviews related work on LE. Section 3 elaborates our proposed approaches, including LESC and gLESC. Comprehensive experimental results and corresponding discussions are provided in Section 4. Finally, conclusions are drawn in Section 5.
2. Related Work
For convenience of description, we first declare the fundamental notations. The complete set of labels is $\mathcal{Y}=\{y_1, y_2, \dots, y_c\}$, where $c$ is the size of the label set. For an instance $x_i$, the logical label is denoted as $\bm{l}_i=(l_{x_i}^{y_1}, l_{x_i}^{y_2}, \dots, l_{x_i}^{y_c})^{\top}$ with $l_{x_i}^{y_j}\in\{0,1\}$, while the corresponding label distribution is denoted as:

(1) $\bm{d}_i=(d_{x_i}^{y_1}, d_{x_i}^{y_2}, \dots, d_{x_i}^{y_c})^{\top}$,

where $d_{x_i}^{y_j}\in[0,1]$ depicts the degree to which $x_i$ belongs to label $y_j$, subject to $\sum_{j=1}^{c} d_{x_i}^{y_j}=1$. The goal of the LE process is to recover the label distribution of every instance from the existing logical labels in a given training set.
This definition is formally presented by (Xu et al., 2018, 2019a), in which an LE method, termed GLLE, is also proposed. It is worth noting that some earlier studies have concentrated on the same issue. For instance, the fuzzy clustering method (Melin and Castillo, 2005) is applied in (El Gayar et al., 2006), which intends to allocate description values to each instance over diverse clusters. Specifically, the features are clustered into $p$ clusters via fuzzy $C$-means clustering, where $c_k$ denotes the $k$th cluster center. The cluster membership of each instance over the centers is obtained as follows:

(2) $\mu_{ik} = 1 \Big/ \sum_{j=1}^{p} \big( \|x_i - c_k\| / \|x_i - c_j\| \big)^{2/(\beta-1)}$,

where the fuzzifier $\beta$ is larger than 1. Afterward, a zero matrix $\bm{A}$ is initialized and continuously updated by:

(3) $\bm{A}_j \leftarrow \bm{A}_j + \bm{\mu}_i$ for every instance $x_i$ with $l_{x_i}^{y_j}=1$,

where $\bm{A}_j$ denotes the $j$th row of $\bm{A}$. In this way, a prototype label matrix $\bm{A}$ is constructed, through which classes and clusters are softly associated. After normalizing the columns and rows of $\bm{A}$ to sum to 1, the label distribution of each instance is computed via the fuzzy composition of $\bm{A}$ and its cluster memberships.
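As a concrete sketch, the membership computation of (2) can be written in a few lines of NumPy. This is an illustrative implementation of the standard fuzzy C-means membership formula, not the authors' code; the fuzzifier default `beta=2.0` and the tiny `1e-12` guard against zero distances are our assumptions.

```python
import numpy as np

def fcm_membership(X, centers, beta=2.0):
    """Fuzzy C-means membership of each row of X over the given cluster centers.

    Implements u_ik = 1 / sum_j (||x_i - c_k|| / ||x_i - c_j||)^(2/(beta-1)).
    """
    # Pairwise distances from every instance to every center, guarded against 0
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    # ratio[i, k, j] = (d_ik / d_ij)^(2/(beta-1)); summing over j gives 1/u_ik
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (beta - 1.0))
    return 1.0 / ratio.sum(axis=2)
```

Each row of the returned matrix sums to one, so it can directly serve as the per-cluster description values used above.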
In addition, other recent studies have focused on graph-based approaches to tackle the LE problem, which construct a similarity matrix over the feature space via various strategies. (Hou et al., 2016) recovers the label distribution via manifold learning (ML), which gradually transfers the local structure of the feature space into the label space. In particular, to represent this structure, the similarity matrix is established under the assumption that each feature vector can be represented by a linear combination of its KNNs, which amounts to minimizing:

(4) $\min_{\bm{W}} \sum_{i} \big\| x_i - \sum_{j} w_{ij}\, x_j \big\|^2$,

where $w_{ij}=0$ if $x_j$ does not belong to the KNNs of $x_i$. They further constrain $\sum_j w_{ij} = 1$ for translation invariance. The constructed graph is transferred into the label space to minimize the distance between the target label distribution and the identical linear combination of its KNN label distributions (Roweis and Saul, 2000), which leads to the optimization of:

(5) $\min_{\bm{d}} \sum_{i} \big\| \bm{d}_i - \sum_{j} w_{ij}\, \bm{d}_j \big\|^2$,

with the constraint $\bm{l}_i^{\top}\bm{d}_i > \lambda$, where $\lambda > 0$. This formula is minimized with respect to the target label distributions through constrained quadratic programming.
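The weight construction of (4) can be sketched as follows. This mirrors the classic locally linear embedding weight solver (Roweis and Saul, 2000) rather than the exact implementation of (Hou et al., 2016); the ridge term `reg` is an assumption added for numerical stability when the local Gram matrix is singular.

```python
import numpy as np

def lle_weights(X, k=4, reg=1e-3):
    """Sum-to-one reconstruction weights over each point's k nearest neighbours."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # k nearest neighbours, skipping x_i itself
        Z = X[nbrs] - X[i]                     # neighbours centred on x_i
        G = Z @ Z.T                            # local Gram matrix
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)  # regularise near-singular G
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()               # enforce the sum-to-one constraint
    return W
```

The same weight matrix is then reused in the label space, as in (5).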
(Li et al., 2015) regards LE as a label propagation (LP) process (Zhu and Goldberg, 2009). The pairwise similarity is calculated over the complete feature space, and a fully connected graph is established as:

(6) $a_{ij} = \exp\!\big( -\|x_i - x_j\|^2 / (2\sigma^2) \big)$,

where $i \ne j$ and $\sigma$ is fixed to be 1. The required LP matrix is built as $\bm{P} = \hat{\bm{A}}^{-1/2}\bm{A}\hat{\bm{A}}^{-1/2}$, with $\hat{\bm{A}}$ denoting a diagonal matrix whose $i$th diagonal element equals the sum of the $i$th row of $\bm{A}$. The LP is then implemented iteratively, and it is proved that the recovered label distribution matrix converges to:

(7) $\bm{D} = (1-\alpha)\,(\bm{I} - \alpha\bm{P})^{-1}\bm{L}$,

with $\alpha$ denoting the trade-off parameter that controls the contribution between the label propagation and the initial logical label matrix $\bm{L}$.
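A minimal sketch of this closed-form propagation, assuming the Gaussian similarity graph with $\sigma = 1$ and the symmetric normalization described above (not the authors' implementation):

```python
import numpy as np

def label_propagation(X, Y, alpha=0.5, sigma=1.0):
    """Closed-form limit of iterative label propagation: (1-a)(I - aP)^{-1} Y."""
    # Fully connected Gaussian similarity graph (Eq. 6), zero diagonal
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalisation P = A_hat^{-1/2} A A_hat^{-1/2}
    inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    P = inv_sqrt @ A @ inv_sqrt
    # Converged propagation result (Eq. 7)
    n = X.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * P, Y)
```

Since the spectral radius of the normalized $\bm{P}$ is at most 1, the matrix $\bm{I} - \alpha\bm{P}$ is invertible for $\alpha < 1$.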
For the GLLE algorithm, the similarity matrix is also constructed from the partial topological structure of the feature space. Different from LP, which calculates pairwise distances over the whole feature space, GLLE computes the distance between a specific instance and its KNNs to define the relevant elements of the similarity matrix as follows:

(8) $a_{ij} = \exp\!\big( -\|x_i - x_j\|^2 / (2\sigma^2) \big)$ if $x_j \in N(i)$, and $a_{ij} = 0$ otherwise,

where $N(i)$ is the set of $x_i$'s KNNs. With the same intuition that these relationships can be transferred into the label distribution space, this constructed graph is incorporated into the label space to learn a matrix that linearly transforms the logical labels into label distributions, obtaining the previous state-of-the-art results. Since we normalize each recovered $\bm{d}_i$ by the softmax normalization for all the above-mentioned algorithms, the condition $\sum_{j} d_{x_i}^{y_j} = 1$ is satisfied.

Because it is well recognized that building the similarity matrix from pairwise or local feature structure can hinder the performance of these approaches, the LRR and the t-SVD based low-rank tensor constraint are introduced in this paper to excavate the global information and to leverage the attained sample correlations, overcoming the aforementioned drawbacks in the label distribution recovery process.
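The softmax normalization mentioned above is a one-liner; the max-shift is a standard numerical-stability trick and an implementation detail of ours, not part of the compared papers.

```python
import numpy as np

def softmax_normalize(scores, axis=-1):
    """Map raw recovered scores to a valid label distribution (non-negative, sums to 1)."""
    z = scores - np.max(scores, axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)
```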
3. Our Proposed Approaches
In this section, our methods, i.e., LESC and gLESC, are introduced in detail. In a training set with $n$ instances, all feature vectors are concatenated along the columns to form the feature matrix $\bm{X} = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{m \times n}$, where $m$ denotes the feature dimension. After the LE process, a new LDL training set can be constructed to implement the LDL process. Here we use $\bm{L} = [\bm{l}_1, \dots, \bm{l}_n]$ and $\bm{D} = [\bm{d}_1, \dots, \bm{d}_n]$ to denote the logical label matrix and the objective label distribution matrix, respectively.
3.1. LESC Approach
For a given instance $x_i$, it is necessary to find an effective model to recover the best label distribution, and the mapping model employed in this paper can be written as follows:

(9) $\bm{d}_i = \bm{W}\varphi(x_i)$,

where $\bm{W}$ parameterizes a linear transformation, and $\varphi(\cdot)$ embeds $x_i$ into a higher-dimensional space, for which the Gaussian kernel function is employed.
To obtain an optimal $\bm{W}$, the following objective function is formulated:

(10) $\min_{\bm{W}} \; \mathcal{L}(\bm{W}) + \lambda\,\Omega(\bm{W})$,

where $\mathcal{L}(\cdot)$ denotes a loss function, $\Omega(\cdot)$ is used to excavate the underlying sample correlations, and $\lambda$ is a trade-off parameter. We elaborate $\mathcal{L}(\cdot)$ and $\Omega(\cdot)$ in the following.

Since the ground-truth label distributions are unavailable, we establish the loss between the recovered label distributions and the logical labels. The least-squares (LS) loss is adopted as the first term of (10):

(11) $\mathcal{L}(\bm{W}) = \sum_{i=1}^{n} \big\| \bm{W}\varphi(x_i) - \bm{l}_i \big\|^2 = \|\bm{W}\bm{\Phi} - \bm{L}\|_F^2$,

where $\bm{\Phi} = [\varphi(x_1), \dots, \varphi(x_n)]$. As for $\Omega(\cdot)$, the sample correlations are employed. It is noteworthy that LRR is imposed on the feature space in the proposed LESC. Global sample correlations in the feature space can be captured by LRR, since every sample is expressed as a linear combination of the other related samples. This property can be transferred to the label space under general conditions. Therefore, a low-rank recovery of the label distributions is expected, which means discovering a proper $\bm{D}$ that minimizes the distance between $\bm{D}$ and $\bm{D}\bm{Z}$, where $\bm{Z}$ is the optimal LRR of the feature space. This leads to the second term of (10):

(12) $\Omega(\bm{W}) = \|\bm{D} - \bm{D}\bm{Z}\|_F^2 = \|\bm{W}\bm{\Phi} - \bm{W}\bm{\Phi}\bm{Z}\|_F^2$.
To be clear, the flowchart of LESC is presented in Fig. 2. As can be observed, the sample correlations are obtained by applying the low-rank representation to the feature space. In other words, LESC seeks the LRR of the feature matrix to excavate the global structure of the feature space. Consequently, by assuming $\bm{X} = \bm{X}\bm{C} + \bm{E}$, it is natural to solve the following rank minimization problem:

(13) $\min_{\bm{C},\bm{E}} \; \operatorname{rank}(\bm{C}) + \lambda_1 \|\bm{E}\|_{2,1}, \quad \text{s.t.} \;\; \bm{X} = \bm{X}\bm{C} + \bm{E}$,

where $\bm{E}$ indicates the sample-specific corruptions, $\bm{C}$ is the low-rank coefficient matrix, and $\lambda_1$ balances the effects of the two parts. $\bm{X}\bm{C}$ denotes the desired low-rank representation of the features $\bm{X}$ with respect to the variable $\bm{C}$. In practice, the rank function can be replaced by the nuclear norm to turn (13) into a convex optimization problem. As a result, we have the following problem:

(14) $\min_{\bm{C},\bm{E}} \; \|\bm{C}\|_* + \lambda_1 \|\bm{E}\|_{2,1}, \quad \text{s.t.} \;\; \bm{X} = \bm{X}\bm{C} + \bm{E}$.
To obtain the optimal solution, the augmented Lagrange multiplier (ALM) method with an alternating direction minimization strategy (Lin et al., 2011) is employed in this paper. Specifically, an auxiliary variable $\bm{J}$ is introduced to make the objective function separable and convenient for optimization. Therefore, (14) can be rewritten as follows:

(15) $\min_{\bm{C},\bm{E},\bm{J}} \; \|\bm{J}\|_* + \lambda_1 \|\bm{E}\|_{2,1}, \quad \text{s.t.} \;\; \bm{X} = \bm{X}\bm{C} + \bm{E}, \;\; \bm{C} = \bm{J}$,

and the corresponding ALM problem can be solved by minimizing the following function:

(16) $\|\bm{J}\|_* + \lambda_1 \|\bm{E}\|_{2,1} + \langle \bm{Y}_1, \bm{X}-\bm{X}\bm{C}-\bm{E} \rangle + \langle \bm{Y}_2, \bm{C}-\bm{J} \rangle + \frac{\mu}{2}\big( \|\bm{X}-\bm{X}\bm{C}-\bm{E}\|_F^2 + \|\bm{C}-\bm{J}\|_F^2 \big)$,

where $\bm{Y}_1$ and $\bm{Y}_2$ are Lagrange multipliers and $\mu > 0$ is a penalty parameter. This can be decomposed into the following subproblems:
3.1.1. J-subproblem
The subproblem of updating $\bm{J}$ can be written as follows:

(17) $\bm{J}^{*} = \arg\min_{\bm{J}} \; \frac{1}{\mu}\|\bm{J}\|_* + \frac{1}{2}\big\| \bm{J} - (\bm{C} + \bm{Y}_2/\mu) \big\|_F^2$.

By leveraging the singular value thresholding (SVT) method (Lin et al., 2011), the optimal $\bm{J}^{*}$ can be achieved. To be specific, we impose the singular value decomposition (SVD) on $\bm{C} + \bm{Y}_2/\mu$, i.e., $\bm{C} + \bm{Y}_2/\mu = \bm{U}\bm{\Sigma}\bm{V}^{\top}$, and the following solution can be achieved:

(18) $\bm{J}^{*} = \bm{U}\,\mathcal{S}_{1/\mu}(\bm{\Sigma})\,\bm{V}^{\top}$,

in which $\mathcal{S}_{\tau}(\cdot)$ is a soft-thresholding operator with the following formulation:

(19) $\mathcal{S}_{\tau}(x) = \operatorname{sign}(x)\max(|x| - \tau,\, 0)$.
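The SVT operator of (18)-(19) maps directly onto NumPy's SVD; this is a generic sketch of the operator, not the authors' code.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```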
3.1.2. C-subproblem
By fixing the other variables, the subproblem with respect to $\bm{C}$ can be formulated as follows:

(20) $\bm{C}^{*} = \arg\min_{\bm{C}} \; \langle \bm{Y}_1, \bm{X}-\bm{X}\bm{C}-\bm{E} \rangle + \langle \bm{Y}_2, \bm{C}-\bm{J} \rangle + \frac{\mu}{2}\big( \|\bm{X}-\bm{X}\bm{C}-\bm{E}\|_F^2 + \|\bm{C}-\bm{J}\|_F^2 \big)$,

and the optimal solution is $\bm{C}^{*} = \bm{M}^{-1}\bm{N}$, with

(21) $\bm{M} = \bm{I} + \bm{X}^{\top}\bm{X}, \qquad \bm{N} = \bm{X}^{\top}(\bm{X}-\bm{E}) + \bm{J} + \big( \bm{X}^{\top}\bm{Y}_1 - \bm{Y}_2 \big)/\mu$,

where $\bm{I}$ denotes an identity matrix of the proper size.
3.1.3. E-subproblem
For updating $\bm{E}$, we solve the following problem:

(22) $\bm{E}^{*} = \arg\min_{\bm{E}} \; \frac{\lambda_1}{\mu}\|\bm{E}\|_{2,1} + \frac{1}{2}\|\bm{E} - \bm{Q}\|_F^2, \qquad \bm{Q} = \bm{X} - \bm{X}\bm{C} + \bm{Y}_1/\mu$,

the closed-form solution of which can be written column-wise as follows:

(23) $\bm{e}_i^{*} = \max\!\Big( 1 - \frac{\lambda_1/\mu}{\|\bm{q}_i\|},\; 0 \Big)\,\bm{q}_i$,

where $\bm{e}_i$ and $\bm{q}_i$ denote the $i$th columns of $\bm{E}$ and $\bm{Q}$, respectively.
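The column-wise shrinkage solving the E-subproblem can be sketched as follows; the `1e-12` guard against zero-norm columns is our assumption.

```python
import numpy as np

def l21_shrink(Q, tau):
    """Solve min_E tau*||E||_{2,1} + 0.5*||E - Q||_F^2 by shrinking each column of Q."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return Q * scale[None, :]  # columns with norm below tau are zeroed out
```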
3.1.4. Updating the Lagrange multipliers and $\mu$
When the other variables are fixed, the Lagrange multipliers $\bm{Y}_1$, $\bm{Y}_2$ and the penalty $\mu$ can be updated as follows:

(24) $\bm{Y}_1 \leftarrow \bm{Y}_1 + \mu(\bm{X}-\bm{X}\bm{C}-\bm{E}), \qquad \bm{Y}_2 \leftarrow \bm{Y}_2 + \mu(\bm{C}-\bm{J}), \qquad \mu \leftarrow \min(\rho\mu,\, \mu_{\max})$,

in which $\rho > 1$. Obviously, $\mu$ is increased monotonically by the factor $\rho$ until reaching its maximum $\mu_{\max}$.
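Putting the three subproblems and the multiplier updates together gives the following ALM sketch for problem (14). The defaults for `rho`, `mu`, the cap `mu_max`, the iteration budget, and the stopping tolerance are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def lrr(X, lam=0.1, rho=1.1, mu=1e-2, mu_max=1e6, tol=1e-6, max_iter=300):
    """Low-rank representation of X (columns = samples) via inexact ALM.

    Solves min ||C||_* + lam*||E||_{2,1}  s.t.  X = X C + E, returning C.
    """
    d, n = X.shape
    C = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))
    XtX = X.T @ X
    I = np.eye(n)
    for _ in range(max_iter):
        # J-subproblem: singular value thresholding (Eqs. 17-18)
        U, s, Vt = np.linalg.svd(C + Y2 / mu, full_matrices=False)
        J = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # C-subproblem: closed-form least squares (Eqs. 20-21)
        C = np.linalg.solve(I + XtX, X.T @ (X - E) + J + (X.T @ Y1 - Y2) / mu)
        # E-subproblem: column-wise l2,1 shrinkage (Eqs. 22-23)
        Q = X - X @ C + Y1 / mu
        norms = np.maximum(np.linalg.norm(Q, axis=0), 1e-12)
        E = Q * np.maximum(1.0 - (lam / mu) / norms, 0.0)[None, :]
        # Multiplier and penalty updates (Eq. 24)
        R1 = X - X @ C - E
        R2 = C - J
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return C
```

On clean data lying in a low-dimensional subspace, the returned `C` reconstructs the samples from one another, which is exactly the global sample-correlation structure LESC transfers to the label space.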
When the optimization converges, the desired sample correlations are obtained, i.e., $\bm{Z} = \bm{C}^{*}$. Consequently, (10) can be rewritten as follows:

(25) $\min_{\bm{W}} \; \|\bm{W}\bm{\Phi} - \bm{L}\|_F^2 + \lambda\,\|\bm{W}\bm{\Phi} - \bm{W}\bm{\Phi}\bm{Z}\|_F^2$,

where $\bm{\Phi} = [\varphi(x_1), \dots, \varphi(x_n)]$.

To achieve the optimal solution $\bm{W}^{*}$, this objective function is minimized by an effective quasi-Newton method, the limited-memory BFGS (L-BFGS) (Nocedal and Wright, 2006), whose optimization process relies on the first-order gradient. Once the objective converges, the optimal $\bm{W}^{*}$ is fed into (9) to form the label distributions $\bm{d}_i$. Furthermore, since a label distribution must satisfy $\sum_{j} d_{x_i}^{y_j} = 1$, each $\bm{d}_i$ is normalized by the softmax normalization.
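A minimal sketch of minimizing (25) with L-BFGS via SciPy, assuming for simplicity that the kernel embedding $\bm{\Phi}$ is precomputed and passed in as a matrix; the function name, the `lam` default, and the zero initialization are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def lesc_recover(Phi, L, Z, lam=0.01):
    """Recover label distributions by minimizing Eq. (25) with L-BFGS.

    Phi: (m, n) embedded features, L: (c, n) logical labels, Z: (n, n) LRR coefficients.
    Returns a (c, n) matrix whose columns are softmax-normalized label distributions.
    """
    c, n = L.shape
    m = Phi.shape[0]

    def obj(w):
        W = w.reshape(c, m)
        D = W @ Phi
        R = D - L        # least-squares term, Eq. (11)
        S = D - D @ Z    # low-rank transfer term, Eq. (12)
        return 0.5 * np.sum(R * R) + 0.5 * lam * np.sum(S * S)

    def grad(w):
        W = w.reshape(c, m)
        D = W @ Phi
        G = (D - L) @ Phi.T + lam * ((D - D @ Z) @ (np.eye(n) - Z).T) @ Phi.T
        return G.ravel()

    res = minimize(obj, np.zeros(c * m), jac=grad, method="L-BFGS-B")
    D = res.x.reshape(c, m) @ Phi
    # Softmax normalization so each column is a valid distribution
    e = np.exp(D - D.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)
```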
3.2. gLESC Approach
LESC employs a low-rank subspace representation of the feature space to obtain the sample correlations, which are utilized to supervise the recovery of label distributions. Under the assumption that both the features and the labels are semantic information of the samples, it is natural and reasonable to impose the aforementioned constraint on the desired label distributions. However, only the sample correlations in the feature space are investigated in LESC, and the corresponding information hidden in the existing logical labels is ignored. In fact, the sample correlations obtained by LRR in the feature space are influenced by interference information. For example, in a facial emotion dataset, gender and identity information may be contained in the sample correlations attained by LRR in the feature space, which obviously obstructs the exact recovery of label distributions.

Since the existing logical labels do not contain such unwanted information, it is a good choice to incorporate the underlying information of these logical labels into the formation of the desired sample correlations. To this end, a generalized label enhancement with sample correlations (gLESC) is also proposed in this paper, and the corresponding flowchart is shown in Fig. 3. As can be observed in Fig. 3, the underlying sample correlations of both the sample features and the existing logical labels are obtained to supervise the whole recovery process of label distributions, so the refinement of LE can be attained and the implicit information of the data samples can be fully leveraged. To achieve this goal, the tensor-Singular Value Decomposition (t-SVD) based low-rank tensor constraint (Kilmer et al., 2013) is introduced in this section. It should be noted that the difference between LESC and gLESC lies in the construction of the sample correlations. In the proposed gLESC, we have the following formulation:
(26) $\min_{\bm{\mathcal{C}},\,\bm{\mathcal{E}}} \; \|\bm{\mathcal{C}}\|_{\circledast} + \lambda_1 \|\bm{\mathcal{E}}\|_{2,1}, \quad \text{s.t.} \;\; \bm{X} = \bm{X}\bm{C}_X + \bm{E}_X, \;\; \bm{L} = \bm{L}\bm{C}_L + \bm{E}_L$,

where $\bm{\mathcal{C}}$ and $\bm{\mathcal{E}}$ are 3-order tensors constructed by stacking $\{\bm{C}_X, \bm{C}_L\}$ and $\{\bm{E}_X, \bm{E}_L\}$, respectively. $\|\cdot\|_{\circledast}$ denotes the t-SVD based tensor nuclear norm (Kilmer et al., 2013), which can be calculated as follows:

(27) $\|\bm{\mathcal{C}}\|_{\circledast} = \sum_{k=1}^{n_3}\sum_{i=1}^{\min(n_1,n_2)} \big| \bm{\mathcal{S}}_f(i,i,k) \big|$,

in which the subscript $f$ denotes the fast Fourier transform (FFT) along the 3rd dimension of the tensor, i.e., over its frontal slices (Zhang et al., 2014), and $\bm{\mathcal{S}}_f(i,i,k)$ indicates the $i$th diagonal element of the $k$th frontal slice of $\bm{\mathcal{S}}_f$, which is obtained from the slice-wise decomposition:

(28) $\bm{\mathcal{C}}_f^{(k)} = \bm{\mathcal{U}}_f^{(k)}\,\bm{\mathcal{S}}_f^{(k)}\,\big(\bm{\mathcal{V}}_f^{(k)}\big)^{\top}$.
To solve the minimization problem of (26), we construct the following ALM function:

(29) $\|\bm{\mathcal{J}}\|_{\circledast} + \lambda_1\|\bm{\mathcal{E}}\|_{2,1} + \langle \bm{Y}_1, \bm{X}-\bm{X}\bm{C}_X-\bm{E}_X \rangle + \langle \bm{Y}_2, \bm{L}-\bm{L}\bm{C}_L-\bm{E}_L \rangle + \langle \bm{\mathcal{Y}}_3, \bm{\mathcal{C}}-\bm{\mathcal{J}} \rangle + \frac{\mu}{2}\big( \|\bm{X}-\bm{X}\bm{C}_X-\bm{E}_X\|_F^2 + \|\bm{L}-\bm{L}\bm{C}_L-\bm{E}_L\|_F^2 + \|\bm{\mathcal{C}}-\bm{\mathcal{J}}\|_F^2 \big)$,

where $\bm{\mathcal{J}}$ is an auxiliary tensor variable, and $\bm{Y}_1$, $\bm{Y}_2$, and $\bm{\mathcal{Y}}_3$ are the Lagrange multipliers. Consequently, we have the following subproblems:
3.2.1. $\bm{\mathcal{J}}$-subproblem
By fixing the other variables, the subproblem can be written as follows:

(30) $\bm{\mathcal{J}}^{*} = \arg\min_{\bm{\mathcal{J}}} \; \frac{1}{\mu}\|\bm{\mathcal{J}}\|_{\circledast} + \frac{1}{2}\big\| \bm{\mathcal{J}} - (\bm{\mathcal{C}} + \bm{\mathcal{Y}}_3/\mu) \big\|_F^2$,

which is a standard t-SVD based tensor nuclear norm minimization problem with the following solution (Hu et al., 2016):

(31) $\bm{\mathcal{J}}^{*} = \bm{\mathcal{U}} * \mathcal{T}_{n_3/\mu}(\bm{\mathcal{S}}) * \bm{\mathcal{V}}^{\top}$,

in which $\bm{\mathcal{U}}$, $\bm{\mathcal{S}}$, and $\bm{\mathcal{V}}$ are given by the t-SVD:

(32) $\bm{\mathcal{C}} + \bm{\mathcal{Y}}_3/\mu = \bm{\mathcal{U}} * \bm{\mathcal{S}} * \bm{\mathcal{V}}^{\top}$,

and $\mathcal{T}_{n_3/\mu}(\cdot)$ is a tensor tubal-shrinkage operator with the following definition:

(33) $\mathcal{T}_{n_3/\mu}(\bm{\mathcal{S}}) = \bm{\mathcal{S}} * \bm{\mathcal{Q}}$,

where $\bm{\mathcal{Q}}$ is an f-diagonal tensor. Specifically, the elements of $\bm{\mathcal{Q}}_f$ can be formulated as follows:

(34) $\bm{\mathcal{Q}}_f(i,i,k) = \max\!\Big( 1 - \frac{n_3/\mu}{\bm{\mathcal{S}}_f(i,i,k)},\; 0 \Big)$.
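The tubal-shrinkage step can be sketched by applying matrix SVT slice-wise in the Fourier domain; note that the exact shrinkage threshold convention (e.g., whether it is scaled by $n_3$) varies across papers, so the `tau` passed in is left to the caller. For a real input tensor the result is real up to floating-point residue, which `np.real` removes.

```python
import numpy as np

def t_svt(A, tau):
    """Tensor singular value thresholding: slice-wise SVT in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    out = np.empty_like(Af)
    for k in range(A.shape[2]):
        U, s, Vt = np.linalg.svd(Af[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vt  # scale columns of U
    return np.real(np.fft.ifft(out, axis=2))
```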
3.2.2. $\bm{C}$-subproblem
With the other variables fixed, the subproblem of updating $\bm{C}_X$ and $\bm{C}_L$ can be written as follows:

(35) $\min_{\bm{C}_X,\bm{C}_L} \; \langle \bm{Y}_1, \bm{X}-\bm{X}\bm{C}_X-\bm{E}_X \rangle + \langle \bm{Y}_2, \bm{L}-\bm{L}\bm{C}_L-\bm{E}_L \rangle + \langle \bm{\mathcal{Y}}_3, \bm{\mathcal{C}}-\bm{\mathcal{J}} \rangle + \frac{\mu}{2}\big( \|\bm{X}-\bm{X}\bm{C}_X-\bm{E}_X\|_F^2 + \|\bm{L}-\bm{L}\bm{C}_L-\bm{E}_L\|_F^2 + \|\bm{\mathcal{C}}-\bm{\mathcal{J}}\|_F^2 \big)$,

which has closed-form solutions. To be specific, we take the derivative of the above function with respect to $\bm{C}_X$ and $\bm{C}_L$ respectively and set the corresponding derivative to 0; the optimal solutions can then be attained as follows:

(36) $\bm{C}_X^{*} = \bm{M}_X^{-1}\bm{N}_X, \qquad \bm{C}_L^{*} = \bm{M}_L^{-1}\bm{N}_L$,

where $\bm{M}_X$, $\bm{N}_X$, $\bm{M}_L$, and $\bm{N}_L$ have the following formulations:

(37) $\bm{M}_X = \bm{I} + \bm{X}^{\top}\bm{X}, \quad \bm{N}_X = \bm{X}^{\top}(\bm{X}-\bm{E}_X) + \bm{J}_X + \big( \bm{X}^{\top}\bm{Y}_1 - \bm{Y}_3^{(X)} \big)/\mu, \quad \bm{M}_L = \bm{I} + \bm{L}^{\top}\bm{L}, \quad \bm{N}_L = \bm{L}^{\top}(\bm{L}-\bm{E}_L) + \bm{J}_L + \big( \bm{L}^{\top}\bm{Y}_2 - \bm{Y}_3^{(L)} \big)/\mu$,

where $\bm{I}$ indicates an identity matrix of the proper size, and $\bm{J}_X$, $\bm{J}_L$ ($\bm{Y}_3^{(X)}$, $\bm{Y}_3^{(L)}$) denote the frontal slices of $\bm{\mathcal{J}}$ ($\bm{\mathcal{Y}}_3$) corresponding to the feature and label spaces.
3.2.3. $\bm{\mathcal{E}}$-subproblem
When the other variables are fixed, the subproblem can be formulated as follows:

(38) $\bm{\mathcal{E}}^{*} = \arg\min_{\bm{\mathcal{E}}} \; \frac{\lambda_1}{\mu}\|\bm{\mathcal{E}}\|_{2,1} + \frac{1}{2}\|\bm{\mathcal{E}} - \bm{\mathcal{Q}}\|_F^2$.

Since the $\ell_{2,1}$ norm of a tensor is defined as the total sum of the $\ell_2$ norms of its fibers along the 3rd dimension, it is obvious that $\|\bm{\mathcal{E}}\|_{2,1} = \|\bm{E}\|_{2,1}$, where $\bm{E}$ denotes the matricization of $\bm{\mathcal{E}}$ along the 3rd direction. Hence (38) can be reformulated as follows:

(39) $\bm{E}^{*} = \arg\min_{\bm{E}} \; \frac{\lambda_1}{\mu}\|\bm{E}\|_{2,1} + \frac{1}{2}\|\bm{E} - \bm{Q}\|_F^2$,

with

(40) $\bm{Q} = \begin{bmatrix} \bm{X}-\bm{X}\bm{C}_X+\bm{Y}_1/\mu \\ \bm{L}-\bm{L}\bm{C}_L+\bm{Y}_2/\mu \end{bmatrix}$.

Accordingly, the closed-form solution of (38) can be obtained column-wise as follows:

(41) $\bm{e}_i^{*} = \max\!\Big( 1 - \frac{\lambda_1/\mu}{\|\bm{q}_i\|},\; 0 \Big)\,\bm{q}_i$,

in which $\bm{e}_i$ and $\bm{q}_i$ indicate the $i$th columns of $\bm{E}$ and $\bm{Q}$, respectively.
3.2.4. Updating the Lagrange multipliers
We update the Lagrange multipliers as follows:

(42) $\bm{Y}_1 \leftarrow \bm{Y}_1 + \mu(\bm{X}-\bm{X}\bm{C}_X-\bm{E}_X), \quad \bm{Y}_2 \leftarrow \bm{Y}_2 + \mu(\bm{L}-\bm{L}\bm{C}_L-\bm{E}_L), \quad \bm{\mathcal{Y}}_3 \leftarrow \bm{\mathcal{Y}}_3 + \mu(\bm{\mathcal{C}}-\bm{\mathcal{J}}), \quad \mu \leftarrow \min(\rho\mu,\, \mu_{\max})$.

Once the problem of (26) is optimized, we can obtain the desired sample correlations as follows, so as to achieve an exact recovery of the label distributions:

(43) $\bm{Z} = \tfrac{1}{2}\big( \bm{C}_X^{*} + \bm{C}_L^{*} \big)$.

The subsequent procedure of gLESC is identical to (25); for the compactness of this paper, we omit it here.
4. Experiments
To validate the effectiveness and superiority of our methods, extensive experiments are conducted, and the experimental results together with the corresponding analyses are reported in this section. Label recovery experiments are performed on 14 benchmark datasets (available at http://palm.seu.edu.cn/xgeng/LDL/index.htm), and the corresponding flowchart is shown in Fig. 4.
4.1. Datasets
The fundamental statistics of the 14 datasets employed for evaluation, including 13 real-world datasets and one artificial (toy) dataset, are shown in Table 1. Specifically, the first three real-world datasets are created from movies and facial expression images, and the remaining ten real-world datasets, from Yeast-alpha to Yeast-spoem, are collected from records of biological experiments on the budding yeast genes (Eisen et al., 1998). The artificial dataset, which is also adopted in (Xu et al., 2018) to intuitively exhibit a model's label enhancement ability, is generated as follows: the first two dimensions $x_1$ and $x_2$ of each instance form a grid with an interval of 0.04 in the range [-1, 1], while the third dimension $x_3$ is computed by:
(44) 
The corresponding label distribution is collected through the following equations:
(45) 
(46) 
and label distributions can be obtained as follows:
(47) 
where the involved constants take the fixed values specified in (Xu et al., 2018).
Table 1. Statistics of the employed datasets.

Dataset  # Instances  # Features  # Labels
It is noteworthy that, due to the lack of datasets with both logical labels and label distributions, the logical labels had to be binarized from the ground-truth label distributions of the original datasets so as to implement the LE algorithms and measure the similarity between the recovered label distributions and the ground truths. To ensure consistency of evaluation, we binarize the logical labels following the strategy in (Xu et al., 2018).
4.2. Experimental Settings
To fully investigate the performance of our algorithms, i.e., LESC and gLESC, five state-of-the-art algorithms are employed for comparison: FCM (El Gayar et al., 2006), KM (Jiang et al., 2006), LP (Li et al., 2015), ML (Hou et al., 2016), and GLLE (Xu et al., 2019a). The parameter settings are as follows. The trade-off parameters of our LESC and gLESC are selected from a fixed candidate set. Consistent with the parameters used in (Xu et al., 2019a), the parameter $\alpha$ in LP is fixed to 0.5, the Gaussian kernel is employed in KM, the number of neighbors for ML is set as in (Xu et al., 2019a), and the fuzzifier $\beta$ in FCM is fixed to 2. Regarding GLLE, the number of neighbors and the optimal trade-off parameter are chosen as in (Xu et al., 2019a).
Since both the recovered and ground-truth label distributions are vectors, the average distance or similarity between them is calculated to evaluate the LE algorithms thoroughly. For a fair comparison, six measures are selected, where the first four are distance-based and the last two are similarity-based, reflecting the performance of LE algorithms from different semantic aspects. These metrics, i.e., Chebyshev distance (Cheb), Canberra metric (Canber), Clark distance (Clark), Kullback-Leibler divergence (KL), cosine coefficient (Cosine), and intersection similarity (Intersec), are listed in Table 2, where $\bm{d}$ denotes the real label distribution and $\hat{\bm{d}}$ the recovered one; ↓ indicates "the smaller the better", and ↑ indicates "the larger the better".

Measure  Formula
Cheb ↓  $\max_j |d_j - \hat{d}_j|$
Canber ↓  $\sum_j |d_j - \hat{d}_j| / (d_j + \hat{d}_j)$
Clark ↓  $\sqrt{\sum_j (d_j - \hat{d}_j)^2 / (d_j + \hat{d}_j)^2}$
KL ↓  $\sum_j d_j \ln (d_j / \hat{d}_j)$
Cosine ↑  $\sum_j d_j \hat{d}_j \big/ \big( \sqrt{\sum_j d_j^2}\,\sqrt{\sum_j \hat{d}_j^2} \big)$
Intersec ↑  $\sum_j \min(d_j, \hat{d}_j)$
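The six measures of Table 2 are straightforward to compute; the small `eps` guards against division by zero and the logarithm of zero are our assumption, not part of the standard definitions.

```python
import numpy as np

def ldl_metrics(d, dh, eps=1e-12):
    """Six standard LDL evaluation measures between a ground-truth distribution d
    and a recovered distribution dh (both 1-D arrays summing to 1)."""
    return {
        "Cheb": np.max(np.abs(d - dh)),
        "Canber": np.sum(np.abs(d - dh) / (d + dh + eps)),
        "Clark": np.sqrt(np.sum((d - dh) ** 2 / ((d + dh) ** 2 + eps))),
        "KL": np.sum(d * np.log((d + eps) / (dh + eps))),
        "Cosine": np.dot(d, dh) / (np.linalg.norm(d) * np.linalg.norm(dh) + eps),
        "Intersec": np.sum(np.minimum(d, dh)),
    }
```

Averaging each measure over all test instances yields the per-dataset numbers summarized in Table 3.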
4.3. Analysis of Recovery Performance
First, we evaluate the recovery performance on the artificial dataset. To illustrate the recovery performance visually, the three-dimensional label distributions are converted into the RGB color channels, which are enhanced by a decorrelation stretch process for easier observation. In other words, the label distribution of each point in the feature space is represented by its color, so the color patterns can be directly compared between the ground-truth and the recovered label distributions. As shown in Fig. 5, our algorithms, both LESC and gLESC, recover the ground-truth color patterns almost identically, and GLLE obtains similar results. In contrast, the color patterns of the other four algorithms, i.e., FCM, KM, LP, and ML, are barely satisfactory, which reveals the limits of only locally excavating the structure of the feature space. Clearly, our methods achieve good recovery performance on the artificial dataset.
Table 3. Recovery results on the 14 datasets: average rank of each algorithm under the six measures.

Measure  FCM  KM  LP  ML  GLLE  LESC  gLESC
Avg. Rank (Cheb)  5.07  7.00  3.93  5.86  3.07  1.86  1.14
Avg. Rank (Canber)  4.27  7.00  4.71  6.00  3.00  1.86  1.07
Avg. Rank (Clark)  4.43  7.00  4.57  6.00  3.00  2.00  1.00
Avg. Rank (KL)  4.64  7.00  4.36  5.79  2.86  1.86  1.00
Avg. Rank (Cosine)  4.86  6.93  4.29  5.93  2.93  1.79  1.14
Avg. Rank (Intersec)  4.50  7.00  4.64  5.86  3.00  2.13  1.07
To further investigate the recovery performance, we present the quantitative results of the aforementioned algorithms in the metrics of Cheb, Canber, Clark, KL, Cosine, and Intersec (as shown in Table 3). To exhibit the mean accuracy of the recovered label distributions, the average rank of each method over all datasets is also listed, and the optimal results for each dataset are highlighted in boldface. Overall, the proposed LESC achieves the second-best recovery performance, and the proposed gLESC obtains the best results. For example, the average ranks of LESC and gLESC in the metric of Clark are 2.00 and 1.00, respectively. Regarding the artificial dataset, the quantitative results are consistent with the recovered color patterns in Fig. 5. For the 13 real-world datasets, the results in Table 3 also demonstrate the superiority of LESC and gLESC. For example, from the Yeast-alpha dataset to the Yeast-spoem dataset, LESC and gLESC attain the best recovery performance, and gLESC always ranks first. Additionally, we also report the critical difference (CD) of the average ranks. As can be seen in Fig. 6, the CD diagrams show that gLESC achieves the optimal recovery results on all metrics, and the proposed LESC attains suboptimal performance. In general, the recovery performance can be ranked as gLESC > LESC > GLLE > LP > FCM > ML > KM.
Are sample correlations obtained by the low-rank representation suitable for LE? As can be observed from Table 3 and Fig. 6, LESC and gLESC, which leverage the low-rank representation to attain global sample correlations for LE, outperform GLLE, which uses a distance-based similarity for label recovery, by a large margin. Consequently, it is clear that the sample correlations obtained by the low-rank representation are suitable for LE.
Are sample correlations captured from both the feature space and the label space better for LE? Compared with LESC, gLESC leverages a tensor multi-rank minimization to obtain the sample correlations from both the feature space and the label space. Since the sample correlations investigated in gLESC are more suitable than those in LESC, gLESC is expected to attain better recovery performance. From the quantitative results in Table 3 and Fig. 6, we can conclude that sample correlations captured from both the feature space and the label space are indeed better for LE.
4.4. Parameters Sensitivity
Two hyper-parameters, i.e., the trade-off parameter $\lambda$ and the low-rank coefficient $\lambda_1$, are involved in our proposed methods. Their influence is analyzed separately by fixing one parameter and tuning the other over a wide range. In this section, we take the experimental results on SBU_3DFE, Yeast-alpha, and Yeast-cold in the metrics of Cheb and Cosine as examples, as shown in Fig. 7 and Fig. 8. Although only three datasets are illustrated here, the same observations can be made on the other datasets.

For LESC, when the low-rank coefficient $\lambda_1$ varies with the trade-off parameter $\lambda$ fixed, the two reported measures of recovery performance fluctuate within a range so tiny that it can hardly be distinguished. As we increase $\lambda$ from 0.0001 to 0.1, the recovery performance also changes within a small scope, and when $\lambda$ is set to 1 or 10, the results even rise to a higher level. In particular, taking the Yeast-alpha dataset for reference, whatever value $\lambda$ takes in the candidate set, our worst measure result still far exceeds that of the previous state-of-the-art baseline, i.e., 0.987 versus 0.973 (the best result attained by GLLE) in the metric of Cosine. Regarding gLESC, similar observations can be made, and we omit them for the compactness of this paper. As discussed above, these phenomena indicate that both LESC and gLESC are robust when the values of $\lambda$ and $\lambda_1$ vary over a large range, which allows us to generalize our algorithms to different datasets without much effort in tuning the hyper-parameters.
5. Conclusion
In this paper, two novel LE methods, i.e., LESC and gLESC, are proposed to boost LE performance by exploiting the underlying sample correlations. LESC explores the low-rank representation of the feature space, and gLESC further investigates the sample correlations by utilizing a tensor multi-rank minimization to obtain more suitable sample correlations from both the feature space and the label space during label distribution recovery. Extensive experimental results on 14 datasets show that LE can indeed benefit from sample correlations, and they demonstrate the remarkable superiority of the proposed LESC and gLESC over several state-of-the-art algorithms in recovering label distributions. Further analysis of the influence of the hyper-parameters verifies the robustness of our methods.
Acknowledgements.
This work is supported in part by the Fundamental Research Funds for the Central Universities under Grant No. xzy012019045.

References

Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5177–5186.
Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences 95(25), pp. 14863–14868.
A study of the robustness of KNN classifiers trained using soft labels. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pp. 67–80.
Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing 26(6), pp. 2825–2838.
Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(10), pp. 2401–2412.
Label distribution learning. IEEE Transactions on Knowledge and Data Engineering 28(7), pp. 1734–1748.
Multi-label manifold learning. In Thirtieth AAAI Conference on Artificial Intelligence.
The twist tensor nuclear norm for video completion. IEEE Transactions on Neural Networks and Learning Systems 28(12), pp. 2961–2973.
Multi-label learning by exploiting label correlations locally. In Twenty-Sixth AAAI Conference on Artificial Intelligence.
Deep learning for time series classification: a review. Data Mining and Knowledge Discovery 33(4), pp. 917–963.
Facial emotion distribution learning by exploiting low-rank label correlations locally. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9841–9850.
Fuzzy SVM with a new fuzzy membership function. Neural Computing & Applications 15(3–4), pp. 268–276.
Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM Journal on Matrix Analysis and Applications 34(1), pp. 148–172.
Emotion recognition during speech using dynamics of multiple regions of the face. ACM Transactions on Multimedia Computing, Communications, and Applications 12(1s).
Leveraging implicit relative labeling-importance information for effective multi-label learning. In 2015 IEEE International Conference on Data Mining, pp. 251–260.
Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in Neural Information Processing Systems, pp. 612–620.
Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1), pp. 171–184.
Latent low-rank representation for subspace segmentation and feature extraction. In 2011 International Conference on Computer Vision, pp. 1615–1622.
Hybrid intelligent systems for pattern recognition using soft computing: an evolutionary approach for neural networks and fuzzy systems. Vol. 172, Springer Science & Business Media.
Numerical Optimization. Springer Science & Business Media.
Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), pp. 2323–2326.
Label enhancement with sample correlations via low-rank representation. In Proceedings of the AAAI Conference on Artificial Intelligence.
Multi-label classification: an overview. International Journal of Data Warehousing and Mining (IJDWM) 3(3), pp. 1–13.
Label enhancement for label distribution learning. IEEE Transactions on Knowledge and Data Engineering.
Partial label learning via label enhancement. In AAAI Conference on Artificial Intelligence.
Label enhancement for label distribution learning. In IJCAI, pp. 2926–2932.
Joint estimation of age and expression by combining scattering and convolutional networks. ACM Transactions on Multimedia Computing, Communications, and Applications 14(1).
Laplacian regularized low-rank representation and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(3), pp. 504–517.
Sparse representation-based semi-supervised regression for people counting. ACM Transactions on Multimedia Computing, Communications, and Applications 13(4).
Multi-label learning by exploiting label dependency. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 999–1008.
A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering 26(8), pp. 1819–1837.
Novel methods for multilinear data completion and denoising based on tensor-SVD. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3842–3849.
Feature concatenation multi-view subspace clustering. Neurocomputing 379C, pp. 89–102.
Constrained bilinear factorization multi-view subspace clustering. Knowledge-Based Systems, pp. 105514.
Label distribution learning by exploiting sample correlations locally. In Thirty-Second AAAI Conference on Artificial Intelligence.
Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM International Conference on Multimedia (MM '15), pp. 1247–1250.
Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3(1), pp. 1–130.