I Introduction
As a weakly-supervised learning framework, partial-label learning (PLL), which in some literature is also named superset label learning [1], ambiguous label learning [2], or soft label learning [3], learns from ambiguous labeling information where each training example is associated with a candidate label set instead of a uniquely explicit label [4] [5] [6]. It aims at disambiguating the ground-truth label from the candidate label set, among which the other labels are incorrect. In recent years, such a learning mechanism has been widely used in many real-world scenarios. For example, in a crowdsourcing online annotation system (Figure 1(A)) [7], users with different knowledge backgrounds may annotate the same image with different labels; it is therefore necessary to find the correspondence between each image and its ground-truth label, which resides among the candidate annotations. Another representative application is naming faces in images using text captions (Figure 1(B)) [8] [9]. In this setting, since the names of the depicted people typically appear in the caption, the resulting set of images is ambiguously labeled whenever more than one name appears in the caption; in other words, the specific correspondence between the faces and their names is unknown. Partial-label learning provides an effective solution to such weak supervision problems by disambiguating the correct label from a number of ambiguous labels. In addition to the applications mentioned above, PLL has also been widely used in many other scenarios, including web mining [7], facial age estimation
[10], multimedia content analysis [11] [12], eco-informatics [13], etc.

I-A Related Work
Existing PLL algorithms can be roughly grouped into three categories: the Average Disambiguation Strategy, the Identification Disambiguation Strategy, and the Disambiguation-Free Strategy.
I-A1 Average Disambiguation Strategy (ADS)
ADS-based PLL methods assume that each candidate label contributes equally to the modeling process, and make predictions for unseen instances by averaging the outputs of all candidate labels. Following this strategy, [14] and [15] adopt instance-based models that predict the label of an unseen instance by aggregating the candidate labels of its neighbors. [5] disambiguates the ground-truth label by averaging the outputs of all candidate labels, i.e. $\frac{1}{|S_i|}\sum_{y\in S_i}F(x_i,y)$. [16] and [17] also adopt instance-based models, making predictions via k-nearest-neighbors weighted voting and a minimum-error reconstruction criterion, respectively. [10] proposes the PL-LEAF algorithm, which facilitates the disambiguation process by taking local topological information from the feature space into consideration. The ADS-based PLL methods mentioned above are intuitive and easy to implement. However, they share a critical shortcoming: the output of the ground-truth label can be overwhelmed by the outputs of the other false positive labels, which degrades the effectiveness of the final model.
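To make the averaging mechanism concrete, the following is a minimal sketch of an ADS-style predictor in the spirit of the k-nearest-neighbors methods above (the function name and the plain uniform-vote rule are illustrative assumptions, not the exact procedures of [14] or [16]):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ads_knn_predict(X_train, cand_sets, X_test, n_classes, k=10):
    """Average-disambiguation prediction: each neighbor votes
    equally for every label in its candidate set."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)           # (n_test, k) neighbor indices
    preds = np.empty(len(X_test), dtype=int)
    for t, neighbors in enumerate(idx):
        votes = np.zeros(n_classes)
        for i in neighbors:
            for y in cand_sets[i]:           # every candidate label gets 1/|S_i|
                votes[y] += 1.0 / len(cand_sets[i])
        preds[t] = votes.argmax()
    return preds
```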
I-A2 Identification Disambiguation Strategy (IDS)
IDS-based PLL methods are proposed to alleviate the shortcoming of the ADS-based methods discussed in Section I-A1. This strategy aims at directly identifying the ground-truth label from the corresponding candidate label set instead of averaging the outputs of all candidate labels. Existing PLL algorithms following this strategy often first regard the ground-truth label $y_i\in S_i$ as a latent variable, and then refine the model parameters iteratively according to some specific criterion. For example, considering that treating each candidate label equally is inappropriate, Jin et al. [18] utilize an Expectation-Maximization (EM) procedure to optimize the latent variable based on the maximum-likelihood criterion $\sum_{i=1}^{n}\log\big(\sum_{y\in S_i}p(y\,|\,x_i;\theta)\big)$. Similarly, [2], [13], [19], [20] and [21] also adopt the maximum-likelihood criterion to refine the latent variable. Moreover, the maximum-margin technique is also widely employed as the objective in PLL. For example, [22] maximizes the margin between the outputs of candidate labels and non-candidate labels to train a multi-class classifier, i.e. $\max_{y\in S_i}F(x_i,y)-\max_{y\notin S_i}F(x_i,y)$, while [23] directly maximizes the margin between the ground-truth label and the other labels, i.e. $F(x_i,y_i)-\max_{y\neq y_i}F(x_i,y)$. Although these IDS-based PLL methods have achieved satisfactory performance in many scenarios, they suffer from a common shortcoming: training instances may be assigned incorrect labels during each iterative optimization, especially for PL data whose instances or candidate labels are difficult to disambiguate, which affects the optimization of the classifier parameters in the next iteration.
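As a rough illustration of such iterative identification, here is a sketch of a generic EM-style disambiguation loop (not the exact algorithm of [18]; `fit_clf` is a hypothetical routine that fits a probabilistic classifier to soft label targets and exposes a `predict_proba` method):

```python
import numpy as np

def ids_em(X, cand_sets, n_classes, fit_clf, n_iter=10):
    """IDS-style EM sketch: alternately estimate label posteriors
    restricted to candidate sets (E-step) and refit the model (M-step)."""
    n = len(X)
    Q = np.zeros((n, n_classes))              # soft labels over classes
    for i, S in enumerate(cand_sets):
        Q[i, list(S)] = 1.0 / len(S)          # uniform init inside S_i
    for _ in range(n_iter):
        clf = fit_clf(X, Q)                   # M-step: weighted multi-class fit
        P = clf.predict_proba(X)              # class posteriors p(y | x; theta)
        Q = np.zeros_like(P)
        for i, S in enumerate(cand_sets):     # E-step: renormalize inside S_i
            S = list(S)
            Q[i, S] = P[i, S] / (P[i, S].sum() + 1e-12)
    return clf
```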
I-A3 Disambiguation-Free Strategy (DFS)
More recently, different from the two PLL strategies above, some attempts have been made to learn from PL data by fitting it to existing learning techniques instead of disambiguation. [4] proposes a disambiguation-free algorithm named PL-ECOC, which utilizes an Error-Correcting Output Codes (ECOC) coding matrix [24] and transforms the PLL problem into a binary learning problem. [25] proposes another disambiguation-free algorithm called PALOC, which enables binary decomposition of PL data in a more concise manner, without relying on extra manipulations such as a coding matrix. However, the performance of these two algorithms is inferior to IDS-based methods in some scenarios.
I-B Our Motivation
Although the algorithms mentioned above have obtained desirable performance in many real-world scenarios, they still suffer from a common drawback: they treat all training examples, together with their candidate labels, equally, and none of them takes the complexity of the training examples or of the labels into consideration. However, in real-world scenarios, examples with different backgrounds and labels in different candidate label sets often exhibit varying difficulty. For example, as shown in Figure 2, images (B, D) are clearly harder than images (A, C), both in the complexity of the instances and in the number of candidate labels. In particular, when an iterative optimization method is used to refine the model parameters, the label 'cat' may be assigned to image B in one iteration; such an assignment obviously injects substantial noise into the subsequent iterations, which harms the classifier. Thus, to improve the effectiveness of the model, the complexity of the training instances together with their candidate labels should be taken into consideration.
In recent years, inspired by the cognitive process of humans, Self-Paced Learning (SPL) has been proposed to deal with the above problem by automatically leading the learning process from easy to hard [26] [27]. Concretely, during the optimization process of SPL, the 'easy' samples are selected and learned first in the earlier iterations, and the 'hard' samples are gradually brought in during the subsequent iterations. This learning mechanism smoothly guides the learning process to pay more attention to reliable, discriminative data rather than confusing data [28]. So far, SPL has achieved empirical success in many research fields, such as multi-instance learning [29] [30], multi-label learning [31], multi-task learning [32], multi-view learning [33], matrix factorization [34], face identification [35], and so on.
In light of this observation, in this paper we build a connection between PLL and SPL, and propose a novel unified framework, Self-Paced Partial-Label Learning (SP-PLL). With an adaptive pace from 'easy' to 'hard', SP-PLL can dynamically adjust the learning order of the training data (i.e. examples together with their candidate labels) and guide the learning to focus more on the data with high-confidence labels. Benefiting from this self-controlled sample-selection scheme, SP-PLL can effectively capture the valuable label information of the true label and minimize the negative impact of the other candidate labels. Experimental results on UCI data sets and real-world data sets demonstrate the effectiveness of the proposed approach.
II Background
In the following two subsections, we separately give a brief introduction to partial-label learning [23] and self-paced learning [34], from which our approach originates.
II-A Partial-Label Learning (PLL)
Formally speaking, we denote by $\mathcal{X}=\mathbb{R}^{d}$ the $d$-dimensional input space, and by $\mathcal{Y}=\{1,2,\ldots,q\}$ the output space with $q$ class labels. PLL aims to learn a classifier $f:\mathcal{X}\rightarrow\mathcal{Y}$ from the PL training data $\mathcal{D}=\{(x_i,S_i)\mid 1\le i\le n\}$, where each instance $x_i\in\mathcal{X}$ is described as a $d$-dimensional feature vector and the candidate label set $S_i\subseteq\mathcal{Y}$ is associated with the instance $x_i$. Furthermore, let $\mathbf{y}=[y_1,y_2,\ldots,y_n]$ be the ground-truth label assignments for the training instances, where each $y_i\in S_i$ is not directly accessible during the training phase.

In our algorithm, we adopt Maximum-Margin Partial-Label learning (M3PL) [23] to design the SP-PLL framework. Given the parametric model $\Theta=\{(w_p,b_p)\mid p\in\mathcal{Y}\}$ and the modeling output $F(x,p)=w_p^{\top}x+b_p$ of $x$ on label $p$, different from other maximum-margin PLL algorithms, M3PL focuses on differentiating the output of the ground-truth label from the maximum output of all other labels (i.e. $F(x_i,y_i)-\max_{y\neq y_i}F(x_i,y)$), instead of the maximum output of candidate labels from that of non-candidate labels (i.e. $\max_{y\in S_i}F(x_i,y)-\max_{y\notin S_i}F(x_i,y)$), which avoids the negative effect produced by the noisy labels in the candidate label set. M3PL deals with the task of PLL by solving the following optimization problem (OP1):

$$\min_{\mathbf{y},\,\Theta,\,\boldsymbol{\xi}}\;\frac{1}{2}\sum_{p=1}^{q}\lVert w_p\rVert^{2}+C\sum_{i=1}^{n}\xi_i$$
$$\text{s.t.}\quad F(x_i,y_i)-\max_{y\neq y_i}F(x_i,y)\ge 1-\xi_i,\;\;\xi_i\ge 0,\;\;\forall\,1\le i\le n$$
$$\sum_{i=1}^{n}\mathbb{1}(y_i=p)=n_p,\;\forall p\in\mathcal{Y};\qquad \mathbf{y}\in\mathcal{S}$$

where $C$ is the regularization parameter, $\boldsymbol{\xi}=\{\xi_1,\ldots,\xi_n\}$ is the set of slack variables, $n_p$ is the prior number of examples for the $p$-th class label in $\mathcal{Y}$, and $\mathcal{S}=\{\mathbf{y}\mid y_i\in S_i,\,\forall i\}$ is the feasible solution space. $\mathbb{1}(\cdot)$ is an indicator function, where $\mathbb{1}(A)=1$ if and only if $A$ is true, and $\mathbb{1}(A)=0$ otherwise.
Note that (OP1) is a mixed-type variable optimization problem, which needs to optimize the integer variables $\mathbf{y}$ and the real-valued variables $\Theta$ simultaneously, and it can be solved with an alternating optimization procedure in an iterative manner. However, during each iterative optimization step of M3PL, the (unknown) assigned label $y_i$ is not always the true label of each instance, and training instances assigned such unreliable labels have a negative effect on the optimization of $\Theta$. In such cases, the effectiveness and robustness of the model cannot be guaranteed.
II-B Self-Paced Learning (SPL)
In this subsection, we first define the notation and then introduce the self-paced function.
We denote by $\mathbf{v}=[v_1,v_2,\ldots,v_n]\in[0,1]^{n}$ the weight vector for the $n$ training examples, by $\ell_i$ the empirical loss of the $i$-th training example, and by $\lambda$ the self-paced parameter controlling the learning pace. The general self-paced framework can be designed as [26] [36]:

$$\min_{\mathbf{w},\,\mathbf{v}\in[0,1]^{n}}\;\sum_{i=1}^{n}v_i\ell_i+g(\mathbf{v};\lambda)\qquad(1)$$
Here, $g(\mathbf{v};\lambda)$ is the self-paced regularizer, and it satisfies the following three constraints [34]:

1) $g(\mathbf{v};\lambda)$ is convex with respect to $\mathbf{v}\in[0,1]^{n}$;

2) the optimal weight $v^{*}(\lambda,\ell)$ is monotonically increasing with respect to $\lambda$, and it holds that $\lim_{\lambda\to 0}v^{*}(\lambda,\ell)=0$ and $\lim_{\lambda\to\infty}v^{*}(\lambda,\ell)\le 1$;

3) $v^{*}(\lambda,\ell)$ is monotonically decreasing with respect to $\ell$, and it holds that $\lim_{\ell\to 0}v^{*}(\lambda,\ell)=1$ and $\lim_{\ell\to\infty}v^{*}(\lambda,\ell)=0$.
From the three constraints above, we can easily see how the self-paced function works: by controlling the self-growth variable $\lambda$, SPL tends to select easy examples (with smaller losses) to learn first, and then gradually admits more, possibly harder examples (with larger losses) into the learning process. After all instances have been fed into the training model, SPL obtains a more 'mature' model than an algorithm without the self-paced learning scheme.
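For intuition, the alternation can be sketched as follows (hard weighting shown for simplicity; `fit_fn` and `losses_fn` are hypothetical callables that refit the model on weighted data and return per-example losses, and the pace-growth factor `mu` is an illustrative choice):

```python
import numpy as np

def spl_loop(losses_fn, fit_fn, n, lam=0.5, mu=1.3, n_iter=20):
    """Generic SPL alternation: fit, re-weight by loss, grow the pace."""
    v = np.ones(n)                      # consider everyone to get initial losses
    model = None
    for _ in range(n_iter):
        model = fit_fn(v)               # update model on weighted data
        ell = losses_fn(model)          # per-example losses, shape (n,)
        v = (ell < lam).astype(float)   # hard weights: v_i = 1 iff ell_i < lam
        lam *= mu                       # anneal the pace: admit harder data
    return model
```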
Drawing on the two paradigms above, we incorporate the SPL strategy into the PLL framework and propose the SP-PLL algorithm, which is introduced in detail in the following section.
III The SP-PLL Approach
Here, we first introduce how we integrate the self-paced scheme into the task of PLL and present the formulation of SP-PLL. After that, we give an efficient algorithm to solve the resulting optimization problem.
III-A Formulation
As shown in Section II-A, compared with other margin-based methods, M3PL effectively alleviates the noise produced by the false labels in candidate label sets. Nonetheless, during the optimization iterations, M3PL ignores the fact that each training instance and its candidate labels, which have varying complexity, often contribute differently to the learning result, and the instances assigned false candidate labels damage the effectiveness and robustness of the learning model.
To overcome this potential shortcoming, the self-paced scheme is incorporated into our framework, which has the following advantages: 1) it avoids the negative effect produced by instances assigned unreliable labels, and 2) it makes the instances assigned high-confidence labels contribute more to the learning model. Specifically, during each learning iteration, we fix the assigned labels to update the classifier $\Theta$. The instances assigned high-confidence labels (i.e. with smaller losses) are learned first, and the instances assigned low-confidence labels (i.e. with larger losses) are admitted into the learning process only later, when the model has already become mature and the unreliable labels associated with the as-yet-untrained instances have become more reliable.
Following the proposed scheme, we design SP-PLL according to the following steps. First, we define the loss by deforming the hinge loss:

$$\ell_i=\max\!\big(0,\,1-\gamma_i\big)\qquad(2)$$

where $\gamma_i=F(x_i,y_i)-\max_{\bar{y}\neq y_i}F(x_i,\bar{y})$ is the margin between the modeling output of the ground-truth label and that of all other labels.
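A small sketch of this loss computation, assuming the model outputs are collected in an $n\times q$ score matrix:

```python
import numpy as np

def hinge_losses(F, y):
    """Per-example hinge losses from Eq.(2).
    F: (n, q) matrix of modeling outputs F(x_i, p); y: (n,) assigned labels."""
    n = len(y)
    top = F[np.arange(n), y]                  # output of the assigned label
    F_masked = F.copy()
    F_masked[np.arange(n), y] = -np.inf       # exclude y_i from the max
    runner_up = F_masked.max(axis=1)          # max output over all other labels
    margin = top - runner_up                  # gamma_i in Eq.(2)
    return np.maximum(0.0, 1.0 - margin)      # ell_i = max(0, 1 - gamma_i)
```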
Next, we choose a suitable self-paced regularizer for the SP-PLL framework, which determines the weight values of the training examples. Following [34], soft weighting assigns real-valued weights, which tends to reflect the latent importance of training samples more faithfully, and it has been demonstrated to be more effective than hard weighting in many real applications. Thus, we choose the following soft SP-regularizer:

$$g(\mathbf{v};\lambda)=\frac{\lambda}{2}\sum_{i=1}^{n}\big(v_i^{2}-2v_i\big)\qquad(3)$$
Finally, we integrate the loss of Eq. (2) and the SP-regularizer of Eq. (3) into the partial-label learning framework, which yields the following optimization problem OP(2):

$$\min_{\mathbf{y}\in\mathcal{S},\,\Theta,\,\mathbf{v}\in[0,1]^{n}}\;\frac{1}{2}\sum_{p=1}^{q}\lVert w_p\rVert^{2}+C\sum_{i=1}^{n}v_i\xi_i+\frac{\lambda}{2}\sum_{i=1}^{n}\big(v_i^{2}-2v_i\big)$$
$$\text{s.t.}\quad F(x_i,y_i)-\max_{y\neq y_i}F(x_i,y)\ge 1-\xi_i,\;\;\xi_i\ge 0,\;\;\forall\,i$$
$$\sum_{i=1}^{n}\mathbb{1}(y_i=p)=n_p,\;\forall p\in\mathcal{Y}$$

where $n_p$ is the prior number of training instances belonging to the $p$-th class label in $\mathcal{Y}$, with $\sum_{p=1}^{q}n_p=n$. The prior $n_p$ is defined as:

$$n_p=\lfloor\tilde{n}_p\rfloor+\epsilon_p\qquad(4)$$

here, $\lfloor\tilde{n}_p\rfloor$ is the integer part of the real-valued estimate $\tilde{n}_p$, and $\epsilon_p\in\{0,1\}$ is the residual indicator after the rounding operation, chosen such that $\sum_{p=1}^{q}n_p=n$.
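The following sketch illustrates one way to realize Eq. (4); the real-valued estimate $\tilde{n}_p$ is assumed here to be proportional to candidate-label frequencies, which the text does not state explicitly:

```python
import numpy as np

def class_priors(cand_sets, n_classes):
    """Integer class priors n_p that sum to n: round a real-valued
    estimate down, then hand the residual to the largest remainders."""
    n = len(cand_sets)
    freq = np.zeros(n_classes)
    for S in cand_sets:                      # assumed estimate: normalized
        for y in S:                          # candidate-label frequencies
            freq[y] += 1.0 / len(S)
    est = freq / freq.sum() * n              # real-valued estimate of n_p
    n_p = np.floor(est).astype(int)
    residual = n - n_p.sum()                 # how many +1's remain
    order = np.argsort(est - np.floor(est))[::-1]
    n_p[order[:residual]] += 1               # largest fractional parts win
    return n_p
```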
Since OP(2) is an optimization problem with mixed-type variables, an alternating optimization procedure is a natural choice for solving it. We give the optimization details in the following subsection.
III-B Optimization
During the alternating optimization, we first adapt the optimization algorithm of [23] to update the variables $\mathbf{y}$ and $\Theta$, as briefly introduced in Sections III-B1 and III-B2, respectively. Then, we give the self-paced update of $\mathbf{v}$, which controls the weight of each instance in every iteration, in Section III-B3. Finally, we summarize the whole procedure of SP-PLL at the end of this section.
III-B1 Update w, b with the other variables fixed
After initializing the weight vector $\mathbf{v}$ and the ground-truth labels $\mathbf{y}$ of the training examples, OP(2) reduces to the following optimization problem OP(3):

$$\min_{\Theta,\,\boldsymbol{\xi}}\;\frac{1}{2}\sum_{p=1}^{q}\lVert w_p\rVert^{2}+C\sum_{i=1}^{n}v_i\xi_i$$
$$\text{s.t.}\quad F(x_i,y_i)-\max_{y\neq y_i}F(x_i,y)\ge 1-\xi_i,\;\;\xi_i\ge 0,\;\;\forall\,i$$

which is a weighted variant of the multi-class maximum-margin problem in [23] and can be solved in the same manner.
III-B2 Update y with the other variables fixed
By fixing the classification model $\Theta$ and the weight vector $\mathbf{v}$, OP(2) reduces to the following optimization problem OP(4):

$$\min_{\mathbf{y}\in\mathcal{S}}\;\sum_{i=1}^{n}v_i\ell_i(y_i)\qquad\text{s.t.}\;\;\sum_{i=1}^{n}\mathbb{1}(y_i=p)=n_p,\;\forall p\in\mathcal{Y}$$
To simplify OP(4), inspired by [23], we first replace the slack variables $\xi_i$ with the losses $\ell_i$ of Eq. (2) according to the first two constraints. Then, we define a labeling matrix $\mathbf{Y}\in\{0,1\}^{n\times q}$ and a coefficient matrix $\mathbf{H}\in\mathbb{R}^{n\times q}$, where $Y_{ip}=1$ indicates that the ground-truth label of $x_i$ belongs to the $p$-th class, and $H_{ip}$ represents the loss incurred when the $p$-th class label is assigned to the candidate example $x_i$. Here, $H_{ip}=v_i\ell_i(p)$ if $p\in S_i$; otherwise $H_{ip}$ takes a sufficiently large value. Based on the steps above, OP(4) can be formulated as the following optimization problem OP(5):

$$\min_{\mathbf{Y}}\;\sum_{i=1}^{n}\sum_{p=1}^{q}H_{ip}Y_{ip}\qquad\text{s.t.}\;\;\sum_{p=1}^{q}Y_{ip}=1\;\forall i,\quad \sum_{i=1}^{n}Y_{ip}=n_p\;\forall p$$

OP(5) is an easy linear programming problem, which can be solved with a standard LP solver.
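A hedged sketch of solving the relaxed OP(5) with a standard LP solver (the relaxation is a transportation problem, so its optimal vertices are integral; feasibility assumes $\sum_p n_p=n$):

```python
import numpy as np
from scipy.optimize import linprog

def update_labels(H, n_p):
    """Solve the LP relaxation of OP(5): minimize sum H_ip * Y_ip with
    row sums equal to 1 and column sums equal to the class priors n_p."""
    n, q = H.shape
    A_eq, b_eq = [], []
    for i in range(n):                       # sum_p Y[i, p] = 1
        row = np.zeros(n * q); row[i*q:(i+1)*q] = 1
        A_eq.append(row); b_eq.append(1)
    for p in range(q):                       # sum_i Y[i, p] = n_p
        col = np.zeros(n * q); col[p::q] = 1
        A_eq.append(col); b_eq.append(n_p[p])
    res = linprog(H.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, 1), method="highs")
    return res.x.reshape(n, q).argmax(axis=1)   # recover y_i per instance
```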
III-B3 Update v with the other variables fixed
By fixing the classification model $\Theta$ and the ground-truth labels $\mathbf{y}$, we update the weight vector $\mathbf{v}$ by solving the following optimization problem OP(6):

$$\min_{\mathbf{v}\in[0,1]^{n}}\;C\sum_{i=1}^{n}v_i\ell_i+\frac{\lambda}{2}\sum_{i=1}^{n}\big(v_i^{2}-2v_i\big)$$

According to OP(6), each $v_i$ in the SP-PLL model can be computed in closed form as:

$$v_i^{*}=\begin{cases}1-\dfrac{C\ell_i}{\lambda}, & C\ell_i<\lambda\\[4pt] 0, & C\ell_i\ge\lambda\end{cases}\qquad(5)$$
Here, it is easy to see that examples assigned higher-confidence labels (i.e. with smaller $\ell_i$) receive higher weights than examples assigned lower-confidence labels (i.e. with larger $\ell_i$), while examples assigned extremely unreliable labels (i.e. with $C\ell_i\ge\lambda$) are not selected at all in the earlier iterations; this is exactly the 'learning from easy to hard' self-paced scheme.
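The closed-form update of Eq. (5) is a one-liner:

```python
import numpy as np

def update_weights(losses, lam, C=1.0):
    """Soft weights of Eq.(5): decay linearly with the loss,
    reaching zero once C * loss reaches the pace parameter lam."""
    return np.clip(1.0 - C * np.asarray(losses) / lam, 0.0, 1.0)
```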
During the entire alternating optimization, we first initialize the required variables and then repeat the above three updates until the algorithm converges. Finally, we predict the labels of unseen instances with the trained classifier. The detailed process of SP-PLL is summarized in Algorithm 1.
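Putting the three updates together, a high-level sketch of Algorithm 1 (reusing the helper sketches above; `fit_weighted_m3pl`, `decision_scores` and `build_loss_matrix` are assumed stand-ins for the weighted M3PL solver of OP(3) and the loss-matrix construction of OP(5), and the label initialization and pace-growth factor `mu` are illustrative choices):

```python
import numpy as np

def sp_pll(X, cand_sets, n_classes, lam0, mu=1.1, C=1.0, n_iter=30):
    """High-level SP-PLL alternation (Algorithm 1 sketch)."""
    n_p = class_priors(cand_sets, n_classes)            # Eq.(4)
    v = np.ones(len(X))                                 # initial weights
    y = np.array([min(S) for S in cand_sets])           # crude label init
    lam = lam0
    for _ in range(n_iter):
        model = fit_weighted_m3pl(X, y, v, C)           # OP(3): update w, b
        F = model.decision_scores(X)                    # (n, q) outputs F(x_i, p)
        H = build_loss_matrix(F, v, cand_sets)          # weighted loss if p in S_i,
        y = update_labels(H, n_p)                       #   big value otherwise; OP(5)
        v = update_weights(hinge_losses(F, y), lam, C)  # Eq.(5): update v
        lam *= mu                                       # grow pace: admit harder data
    return model
```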
TABLE I: Characteristics of the real-world (RW) data sets.

Data set | EXP* | FEA* | CL* | AVG-CL* | Task domain
Lost | 1122 | 108 | 16 | 2.33 | Automatic Face Naming [5]
BirdSong | 4998 | 38 | 13 | 2.18 | Bird Sound Classification [13]
MSRCv2 | 1758 | 48 | 23 | 3.16 | Image Classification [40]
Soccer Player | 17472 | 279 | 171 | 2.09 | Automatic Face Naming [41]
FG-NET | 1002 | 262 | 99 | 7.48 | Facial Age Estimation [42]
IV Experiments
IV-A Experimental Setup
To evaluate the performance of the proposed SP-PLL algorithm, we conduct experiments on four controlled UCI data sets and five real-world data sets. (1) UCI data sets: under different configurations of two controlling parameters (i.e. p and r), the four UCI data sets generate 84 partially-labeled data sets [5] [2]. Here, p is the proportion of partially-labeled examples and r is the number of false candidate labels in addition to the correct one. (2) Real-World (RW) data sets: these data sets are collected from the following task domains: (A) Facial Age Estimation; (B) Automatic Face Naming; (C) Image Classification; (D) Bird Sound Classification.
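A sketch of the controlled corruption protocol for generating PL versions of the UCI data sets (the sampling details are our assumptions; the source only specifies the roles of p and r):

```python
import numpy as np

def make_partial_labels(y_true, n_classes, p=0.5, r=1, seed=0):
    """With probability p an example becomes partially labeled,
    receiving r random false candidates besides its true label.
    Assumes r <= n_classes - 1."""
    rng = np.random.default_rng(seed)
    cand_sets = []
    for y in y_true:
        S = {int(y)}
        if rng.random() < p:                 # this example is ambiguous
            others = [c for c in range(n_classes) if c != y]
            S.update(rng.choice(others, size=r, replace=False).tolist())
        cand_sets.append(S)
    return cand_sets
```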
TABLE II: Characteristics of the controlled UCI data sets.

Data set | EXP* | FEA* | CL* | Configurations
glass | 214 | 10 | 7 | 21 (p, r) settings
segment | 2310 | 18 | 7 | 21 (p, r) settings
vehicle | 846 | 18 | 4 | 21 (p, r) settings
letter | 5000 | 16 | 26 | 21 (p, r) settings
Tables II and I summarize the characteristics of the above UCI data sets and real-world data sets, respectively, including the number of examples (EXP*), the number of features (FEA*), the number of class labels (CL*) and, for the real-world data sets, the average number of candidate labels per example (AVG-CL*).
Meanwhile, we employ eight state-of-the-art partial-label learning algorithms for comparative studies (we partially use the open-source codes from Zhang Min-Ling's homepage: http://cse.seu.edu.cn/PersonalPage/zhangml/), where the parameters of each method are configured as suggested in the respective literature:

• PL-SVM [22]: based on the maximum-margin strategy, it predicts the label with the maximum model output;

• CLPL [5]: a convex optimization partial-label learning method via averaging-based disambiguation [suggested configuration: SVM with hinge loss];

• PL-KNN [14]: an averaging-based method that makes predictions via k-nearest-neighbors weighted voting;

• LSB-CMM [13]: based on the maximum-likelihood strategy, it predicts the label with the maximum likelihood for the unseen instance [suggested configuration: q mixture components];

• M3PL [23]: originating from PL-SVM, it is also based on the maximum-margin strategy and predicts the label with the maximum model output;

• PL-LEAF [10]: a partial-label learning method via feature-aware disambiguation [suggested configuration: k=10];

• IPAL [16]: it disambiguates the candidate label set by utilizing instance-based techniques [suggested configuration: k=10];

• PL-ECOC [4]: based on a coding-decoding procedure, it learns from partial-label training examples in a disambiguation-free manner.
Inspired by [22] and [23], we set the regularization parameter C via cross-validation. The initial value of λ is empirically set large enough to guarantee that at least half of the training instances can be learned during the first iterative optimization, and the remaining variables are initialized empirically. After initializing the above variables, we adopt ten-fold cross-validation on each data set and report the average classification accuracy.
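One simple way to realize this initialization of λ (a hedged sketch; the source does not give the exact rule) is to set it just above the median of the initial weighted losses, so that at least half of the examples receive a nonzero weight in Eq. (5):

```python
import numpy as np

def init_pace(losses, C=1.0):
    """Pick lam so that at least half of the examples satisfy
    C * ell_i < lam and hence get a nonzero weight in Eq.(5)."""
    return float(np.median(C * np.asarray(losses))) + 1e-12
```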
IV-B Experimental Results
In our paper, the experimental results of the comparing algorithms come from two sources: results we obtain by running the source codes provided by the authors, and results reported in the respective literature.
IV-B1 UCI data sets
We compare SP-PLL with PL-SVM and M3PL, from which SP-PLL originates, to evaluate the effect of the SP-regularizer, and we also compare SP-PLL with baseline methods that are not based on the maximum-margin strategy. The classification accuracies on the four UCI data sets (glass, segment, vehicle and letter, each with 21 configurations) are shown in Figures 3–5:

• A) SP-PLL achieves superior performance against M3PL in 95.24% of cases and against PL-SVM in 97.62% of cases (total cases: 3 × 7 × 4 = 84);

• B) SP-PLL outperforms PL-KNN in 60.72% of cases and is inferior to it in 39.28% of cases;

• C) SP-PLL is outperformed by CLPL in only 3 cases and by LSB-CMM in only 4 cases, and it outperforms them in all remaining cases.
As described in Figures 3–5, SP-PLL achieves superior performance against the two algorithms from which it originates (PL-SVM and M3PL) and obtains competitive performance against the other comparing methods, which is embodied in the following aspects:

• Average Classification Accuracy: as p (the proportion of partial-label examples) and r (the number of extra labels in the candidate label set) increase, more noisy labels are added to the training data. As shown in Figure 3, M3PL and PL-SVM are strongly affected by this noise and their classification accuracy decreases significantly. In contrast, SP-PLL still disambiguates candidate labels well, and its average classification accuracy is higher than that of M3PL on each of the glass, segment, vehicle and letter data sets.

• Max-Min and Standard Deviation of Classification Accuracy: as more noisy candidate labels are gradually fed into the training data, the classification accuracy of M3PL declines dramatically. On the glass data set, the standard deviation of M3PL's classification accuracy reaches 0.055 (with a correspondingly large Max-Min gap), while that of SP-PLL is only 0.027. On the segment data set, the two methods have similar Max-Min values, but the standard deviation of SP-PLL's accuracy is 0.002 smaller than M3PL's. On the vehicle data set, the two have similar standard deviations, while the Max-Min of SP-PLL's accuracy is 0.024 smaller than M3PL's. These results demonstrate that the proposed SP-PLL is more robust than M3PL.

• Data Sets with Varying Complexity: according to the statistical comparison of classification accuracy on the two data sets (accuracy on glass is lower than on segment), examples in glass are much more difficult to disambiguate than those in segment. Nevertheless, SP-PLL exhibits stronger disambiguation ability on the more difficult data set: its improvement on glass is larger than that on segment, which again demonstrates the disambiguation ability of the proposed SP-PLL.
TABLE III: Classification accuracy (mean ± std) of each algorithm on the real-world data sets.

Algorithm | Lost | MSRCv2 | BirdSong | SoccerPlayer | FG-NET
SP-PLL | 0.749±0.033 | 0.581±0.010 | 0.710±0.008 | 0.470±0.010 | 0.078±0.022
PL-SVM | 0.639±0.056 | 0.417±0.027 | 0.671±0.018 | 0.430±0.004 | 0.058±0.010
M3PL | 0.732±0.035 | 0.546±0.030 | 0.709±0.010 | 0.446±0.013 | 0.037±0.025
CLPL | 0.670±0.024 | 0.375±0.020 | 0.624±0.009 | 0.347±0.004 | 0.047±0.017
PL-KNN | 0.332±0.030 | 0.417±0.012 | 0.637±0.009 | 0.494±0.004 | 0.037±0.008
LSB-CMM | 0.591±0.019 | 0.431±0.008 | 0.692±0.015 | 0.506±0.006 | 0.056±0.008
PL-LEAF | 0.664±0.020 | 0.459±0.013 | 0.706±0.012 | 0.515±0.004 | 0.072±0.010
IPAL | 0.726±0.041 | 0.523±0.025 | 0.708±0.014 | 0.547±0.014 | 0.057±0.023
PL-ECOC | 0.703±0.052 | 0.505±0.027 | 0.740±0.016 | 0.537±0.020 | 0.040±0.018
Besides, we note that the performance of SP-PLL is lower than that of PL-KNN on a few UCI data sets (glass and letter). We attribute this to the difference in the underlying strategies: the former is based on the maximum-margin strategy while the latter is based on k-NN, and different learning strategies suit different data sets. Nevertheless, according to the above comparisons, SP-PLL not only outperforms the existing maximum-margin PLL methods but also obtains competitive performance compared with most methods based on other strategies.
IV-B2 Real-world (RW) data sets
We compare SP-PLL with all the above comparing algorithms on the real-world data sets, and the comparison results are reported in Table III, where the recorded results are based on ten-fold cross-validation.
It is easy to conclude that SP-PLL performs better than most comparing partial-label learning algorithms on these RW data sets. The superiority of SP-PLL is embodied in the following two aspects:

• Compared with PL-SVM and M3PL, which are also based on the maximum-margin strategy, SP-PLL outperforms both of them on all the RW data sets. In particular, the classification accuracy of the proposed method is 7% higher than M3PL's and 20% higher than PL-SVM's on the MSRCv2 data set, and on the FG-NET data set SP-PLL achieves around a 100% relative improvement over M3PL (0.078 vs. 0.037).

• SP-PLL also shows clear superiority over the other baseline algorithms on several data sets. Specifically, on data sets such as Lost and SoccerPlayer, where M3PL performs worse than the other algorithms, the proposed SP-PLL usually performs well, which again demonstrates the advantage of incorporating the SPL regime into our method.
The two series of experiments above demonstrate the effectiveness of SP-PLL, and we attribute this success to the easy-to-hard self-paced scheme: learning the instances assigned high-confidence labels first makes the reliable label information contribute more to the model. Specifically, during each optimization iteration, we first optimize the assigned labels $\mathbf{y}$ and then the classifier parameters $\Theta$. By the time SP-PLL finishes learning from the instances assigned high-confidence labels in the earlier iterations, the unreliable labels associated with the not-yet-trained instances have been optimized to become relatively reliable. Thus, the SP scheme makes most data more reliable before it enters the learning process, which reduces noise and increases the reliability of the training data to a certain extent. As expected, the experimental results support the motivation behind our proposed method.
IV-C Sensitivity Analysis
The proposed method involves two key parameters: C (the regularization parameter) and λ (the self-growth variable). Figures 6 and 7 illustrate how SP-PLL performs under different configurations of these parameters, and we analyze the sensitivity to each of them in the following subsections.
IV-C1 Maximum-Margin Regularization Parameter C
The proposed method is based on the maximum-margin strategy, where C is the regularization parameter that measures the influence of each sample's loss on the learning model. Since the learning model is usually sensitive to C [39], we empirically set the optimal value of C on each data set via cross-validation, as shown in Table IV.
TABLE IV: The value of C selected for each data set.

Value of C | Data sets
0.01 | Lost, FG-NET, SoccerPlayer, BirdSong
0.1 | segment
10 | glass, letter
100 | vehicle, MSRCv2
IV-C2 Self-Growth Variable λ
As mentioned above, the main contribution of our method is incorporating the Self-Paced Learning (SPL) scheme into Partial-Label Learning (PLL), and the SP parameter λ plays an important role in controlling the learning process from easy to hard. The larger the initial value of λ, the more training instances are learned during the first iterative optimization. As described in Figure 6, SP-PLL with too large an initial λ tends to achieve poor performance; we attribute this phenomenon to the fact that incorporating more training instances during the first iteration brings more noise into the learning process, which has a negative effect on the final model. Meanwhile, SP-PLL with too small an initial λ also shows poor performance, since it leads to overfitting and weakens the generalization ability of the learning model. Thus, we empirically guarantee that half of the training instances can be learned in the first iterative optimization. According to Figure 6, the proposed method achieves desirable performance when the SP parameter is set to 0.6.
V Conclusion
In this paper, we have proposed a novel self-paced partial-label learning method, SP-PLL. To the best of our knowledge, this is the first work to deal with the PLL problem by integrating the SPL technique into the PLL framework. By simulating the human cognitive process of learning both instances and labels from easy to hard, the proposed SP-PLL algorithm effectively alleviates the noise produced by false assignments in the PLL setting. Extensive experiments demonstrate the effectiveness of our proposed method. In the future, we will integrate the SPL technique into the PLL framework in a more sophisticated manner to further improve the effectiveness and robustness of the model.
References
 [1] L. Liu and T. Dietterich, “Learnability of the superset label learning problem,” in International Conference on Machine Learning, 2014, pp. 1629–1637.
 [2] Y. Chen, V. Patel, R. Chellappa, and P. Phillips, “Ambiguously labeled learning using dictionaries,” IEEE Transactions on Information Forensics and Security, pp. 2076–2088, 2014.
 [3] L. Oukhellou, T. Denœux, and P. Aknin, “Learning from partially supervised data using mixture models and belief functions,” Pattern Recognition, pp. 334–348, 2009.
 [4] M. Zhang, F. Yu, and C. Tang, “Disambiguation-free partial label learning,” IEEE Transactions on Knowledge and Data Engineering, pp. 2155–2167, 2017.
 [5] T. Cour, B. Sapp, and B. Taskar, “Learning from partial labels,” Journal of Machine Learning Research, pp. 1501–1536, 2011.
 [6] Z.H. Zhou, “A brief introduction to weakly supervised learning,” National Science Review, pp. 1–1, 2017.
 [7] J. Luo and F. Orabona, “Learning from candidate labeling sets,” in Advances in Neural Information Processing Systems, 2010, pp. 1504–1512.
 [8] C. Chen, V. Patel, and R. Chellappa, “Learning from ambiguously labeled face images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2017.
 [9] S. Xiao, D. Xu, and J. Wu, “Automatic face naming by learning discriminative affinity matrices from weakly labeled images,” IEEE Transactions on Neural Networks and Learning Systems, pp. 2440–2452, 2015.
 [10] M. Zhang, B. Zhou, and X. Liu, “Partial label learning via feature-aware disambiguation,” in International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1335–1344.
 [11] Z. Zeng, S. Xiao, K. Jia, T. Chan, S. Gao, D. Xu, and Y. Ma, “Learning by associating ambiguously labeled images,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 708–715.
 [12] M. Xie and S. Huang, “Partial multi-label learning,” in AAAI Conference on Artificial Intelligence, 2018, pp. 1–1.
 [13] L. Liu and T. G. Dietterich, “A conditional multinomial mixture model for superset label learning,” in Advances in Neural Information Processing Systems, 2012, pp. 548–556.
 [14] E. Hullermeier and J. Beringer, “Learning from ambiguously labeled examples,” in International Symposium on Intelligent Data Analysis, 2005, pp. 168–179.
 [15] C. Tang and M. Zhang, “Confidence-rated discriminative partial label learning,” in AAAI Conference on Artificial Intelligence, 2017, pp. 2611–2617.
 [16] M. Zhang and F. Yu, “Solving the partial label learning problem: an instance-based approach,” in International Joint Conference on Artificial Intelligence, 2015, pp. 4048–4054.
 [17] G. Chen, T. Liu, Y. Tang, Y. Jian, Y. Jie, and D. Tao, “A regularization approach for instance-based superset label learning,” IEEE Transactions on Cybernetics, pp. 1–12, 2017.
 [18] R. Jin and Z. Ghahramani, “Learning with multiple labels,” in Advances in Neural Information Processing Systems, 2003, pp. 921–928.
 [19] Y. Grandvalet and Y. Bengio, “Learning from partial labels with minimum entropy,” Cirano Working Papers, pp. 512–517, 2004.
 [20] Y. Zhou, J. He, and H. Gu, “Partial label learning via Gaussian processes,” IEEE Transactions on Cybernetics, pp. 4443–4450, 2016.
 [21] P. Vannoorenberghe and P. Smets, “Partially supervised learning by a credal EM approach,” in European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty, 2005, pp. 956–967.
 [22] N. Nguyen and R. Caruana, “Classification with partial labels,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 551–559.
 [23] F. Yu and M. Zhang, “Maximum margin partial label learning,” Machine Learning, pp. 573–593, 2017.
 [24] T. G. Dietterich and G. Bakiri, “Solving multiclass learning problems via error-correcting output codes,” Journal of Artificial Intelligence Research, pp. 263–286, 1994.
 [25] X. Wu and M.-L. Zhang, “Towards enabling binary decomposition for partial label learning,” in International Joint Conference on Artificial Intelligence, 2018, pp. 1–1.
 [26] M. Kumar, B. Packer, and D. Koller, “Self-paced learning for latent variable models,” in Advances in Neural Information Processing Systems, 2010, pp. 1189–1197.
 [27] D. Meng and Q. Zhao, “What objective does self-paced learning indeed optimize?” arXiv preprint arXiv:1511.06049, pp. 1–9, 2015.
 [28] T. Pi, X. Li, Z. Zhang, D. Meng, F. Wu, J. Xiao, and Y. Zhuang, “Self-paced boost learning for classification,” in International Joint Conference on Artificial Intelligence, 2016, pp. 1932–1938.
 [29] D. Zhang, D. Meng, and J. Han, “Co-saliency detection via a self-paced multiple-instance learning framework,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 865–878, 2017.
 [30] E. Sangineto, M. Nabi, D. Culibrk, and N. Sebe, “Self paced deep learning for weakly supervised object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2018.
 [31] C. Li, F. Wei, J. Yan, X. Zhang, Q. Liu, and H. Zha, “A self-paced regularization framework for multi-label learning,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–7, 2016.
 [32] C. Li, J. Yan, F. Wei, W. Dong, Q. Liu, and H. Zha, “Self-paced multi-task learning,” in AAAI Conference on Artificial Intelligence, 2017, pp. 2175–2181.
 [33] C. Xu, D. Tao, and C. Xu, “Multi-view self-paced learning for clustering,” in International Joint Conference on Artificial Intelligence, 2015, pp. 3974–3980.
 [34] Q. Zhao, D. Meng, L. Jiang, Q. Xie, Z. Xu, and A. Hauptmann, “Self-paced learning for matrix factorization,” in AAAI Conference on Artificial Intelligence, 2015, pp. 3196–3202.
 [35] L. Lin, K. Wang, D. Meng, W. Zuo, and L. Zhang, “Active self-paced learning for cost-effective and progressive face identification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 7–19, 2017.
 [36] D. Meng, Q. Zhao, and L. Jiang, “A theoretical understanding of self-paced learning,” Information Sciences, pp. 319–328, 2017.
 [37] K. Crammer and Y. Singer, “On the algorithmic implementation of multiclass kernel-based vector machines,” Journal of Machine Learning Research, pp. 265–292, 2001.
 [38] C. Hsu and C. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Transactions on Neural Networks, pp. 415–425, 2002.
 [39] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, pp. 1871–1874, 2008.
 [40] F. Briggs, X. Fern, and R. Raich, “Rank-loss support instance machines for MIML instance annotation,” in International Conference on Knowledge Discovery and Data Mining, 2012, pp. 534–542.
 [41] M. Guillaumin, J. Verbeek, and C. Schmid, “Multiple instance metric learning from automatically labeled bags of faces,” in European Conference on Computer Vision, 2010, pp. 634–647.
 [42] G. Panis and A. Lanitis, “An overview of research activities in facial age estimation using the FG-NET aging database,” in European Conference on Computer Vision Workshops, 2015, pp. 455–462.