Learning with Privileged Information for Multi-Label Classification

03/29/2017 · by Shiyu Chen et al. · USTC

In this paper, we propose a novel approach for learning multi-label classifiers with the help of privileged information. Specifically, we use similarity constraints to capture the relationship between available information and privileged information, and use ranking constraints to capture the dependencies among multiple labels. By integrating similarity constraints and ranking constraints into the learning process of classifiers, the privileged information and the dependencies among multiple labels are exploited to construct better classifiers during training. A maximum margin classifier is adopted, and an efficient learning algorithm of the proposed method is also developed. We evaluate the proposed method on two applications: multiple object recognition from images with the help of implicit information about object importance conveyed by the list of manually annotated image tags; and multiple facial action unit detection from low-resolution images augmented by high-resolution images. Experimental results demonstrate that the proposed method can effectively take full advantage of privileged information and dependencies among multiple labels for better object recognition and better facial action unit detection.


1 Introduction

Figure 1: Examples of privileged information and multiple labels in different applications.

Traditional supervised learning utilizes the same information during both training and testing. However, in many learning applications, training data carry additional information that may be difficult to collect during testing. For example, the order of image tags conveys useful cues about what is most notable to the human viewer. The image shown in Figure 1 has the following tag list: {horse, person, barrier, hoist}. Here, ‘horse’ and ‘person’ occupy the first and second places in the list, and thus have higher absolute rank than ‘barrier’ and ‘hoist’, which appear third and fourth respectively. The implied importance cues conveyed by the order of manually annotated image tags can be leveraged for object recognition during training, but are not available during testing [Hwang and Grauman2012]. As another example, most works use high-resolution facial images for facial action unit detection, yet in practice action units may have to be detected from images captured by low-resolution surveillance cameras; the high-resolution images are then only available during training.

To address this, Vapnik and Vashist [Vapnik and Vashist2009] proposed a new learning paradigm, learning using privileged information (LUPI), in which, in addition to the available information (training examples), some privileged information is available during training but not during testing. When encoded into the structure or parameters of a classifier, the privileged information can be exploited to construct better classifiers during training. To implement the LUPI paradigm, Vapnik and Vashist [Vapnik and Vashist2009] introduced a discriminative model called Support Vector Machine (SVM)+, which uses privileged information to predict the slack variables in the SVM.

Since the training of SVM+ is computationally expensive, several follow-up studies have focused on fast optimization algorithms. For example, instead of using the L2-norm as in SVM+, Niu and Wu [Niu and Wu2012] proposed an extended L1-norm SVM for LUPI with nonlinear feature mapping supported by kernel tricks, and formulated the optimization problem as a linear program, which is cheaper to solve than an L2-norm problem of the same scale. In addition to using the privileged data to identify easy- and hard-to-classify examples through slacks, Sharmanska et al. [Sharmanska et al.2013] adopted a ranking SVM to identify easy- and hard-to-separate example pairs from privileged information, then transferred the learned ranking prior to the classifier built from available information using another ranking SVM. Both ranking SVMs, from privileged information and from available information, are convex optimization problems and can be solved by standard SVM packages at modest computational cost. Li et al. [Li et al.2016] proposed two efficient algorithms for solving the linear and kernel SVM+s. For linear SVM+, they absorbed the bias term into the weight vector and formulated a new optimization problem with simpler constraints in the dual form. For kernel SVM+, they applied the L2-loss and thus obtained a simpler dual problem with only half of the dual variables of the original SVM+ method.

SVM+ is formulated for binary classification; therefore, several studies extend SVM+ to the multi-class problem. For example, Liu et al. [Liu et al.2013] extended the v-K-SVCR method [Zhong and Fukushima2006], which uses a one-against-rest structure during decomposition, with privileged information to v-K-SVCR+ for multi-class recognition problems. Ji et al. [Ji et al.2012] extended multi-task multi-class support vector machines (SVMs) to multi-task multi-class privileged information SVMs (PiSVMs) to incorporate the advantages of multi-task learning as well as privileged information. Cai [Cai2011] introduced SVM+-based multi-task learning methods (SVM+MTL) for classification and regression, and also extended generalized Sequential Minimal Optimization (gSMO) to SVM+MTL.

There are several other approaches in addition to SVM+ and its variants. For example, Chen et al. [Chen et al.2013] developed the Adaboost+ algorithm, which constructs extra weak classifiers from the available information to simulate the performance of better weak classifiers obtained from privileged information. Yang and Patras [Yang and Patras2013] proposed the privileged information-based conditional regression forest (PI-CRF), which selects the split function based on the information gain from privileged information at some random internal nodes. Wang et al. [Wang et al.2015] proposed three-node Bayesian Networks (BNs) to incorporate privileged information. Wang and Ji [Wang and Ji2015] proposed two general approaches that exploit privileged information as additional features and as extra labels to improve different classifiers. Motiian et al. [Motiian et al.2016] extended the information bottleneck principle to LUPI, and expanded it further for learning any type of classifier based on risk minimization. Pasunuri et al. [Pasunuri et al.] utilized privileged features to create additional labels for training data, and extended decision trees and boosting for learning with privileged information. Sharmanska et al. [Sharmanska et al.2014] employed privileged information to distinguish between easy and hard examples in the training set, and transferred the information embedded in the privileged information to the available information. Yan et al. [Yan et al.2016] proposed an active learning method that uses the slack function learned from privileged information as one of the measures of uncertainty. Vrigkas et al. [Vrigkas et al.2016] introduced a probabilistic classification approach, t-CRF+, to indirectly propagate knowledge from the privileged to the available feature space through a penalty term, which encourages the model to assign larger weights to samples that have good evidence to distinguish between classes in both the privileged and available feature spaces. Zhang et al. [Zhang et al.2015] extended the extreme learning machine (ELM) to ELM+ by relating privileged information to the slack variables. Yang [Yang2016] proposed a metric learning method that jointly learns two distance metrics by minimizing the empirical loss while penalizing the difference between the distance in the original space and that in the privileged space. Niu et al. [Niu et al.2016] proposed a new framework called multi-instance learning with privileged information (MIL-PI) to simultaneously take advantage of privileged information and cope with noise in the loose labels.

All of the above works demonstrate that privileged information can be successfully exploited during training to construct a better binary or multi-class classifier. However, to the best of our knowledge, little work examines the use of privileged information for multi-label classification.

Multi-label classification has attracted increasing attention in recent years due to its potentially wide applications. For example, as shown in Figure 1, an image may include multiple objects, and multiple facial action units may appear on a face.

Due to the large number of possible label sets, multi-label classification is rather challenging. The idea of exploiting the dependencies inherent in multiple labels has therefore been widely employed in multi-label classification work. Based on how label dependencies are treated, current multi-label learning strategies can be categorized into three groups [Zhang and Zhou2014]. The first group decomposes the multi-label learning problem into a number of independent binary classification problems, ignoring the dependencies among multiple labels; this is referred to as the first-order strategy. It is simple and effective, but may not generalize well because it ignores label correlations. The second group considers pairwise relations between labels, such as the ranking between a relevant and an irrelevant label, or the interaction between any pair of labels; we refer to this as the second-order strategy. Since it exploits label correlations to some extent, it generalizes somewhat better; however, in certain real-world applications label correlations go beyond the second-order assumption. The third group considers high-order relations among labels, such as those over random subsets of labels. A comprehensive survey of multi-label classification can be found in [Zhang and Zhou2014].

Although current works on multi-label classification successfully leverage label dependencies to recognize multiple labels of an object, almost all multi-label work assumes the same information during both training and testing; to our knowledge, there are no existing methods for multi-label classification with the help of privileged information. We incorporate privileged information and the dependencies among multiple labels to improve multi-label classification performance. Specifically, we leverage two sources of extra information: the relationship between available information and privileged information, captured by similarity constraints, and the dependencies among multiple labels, captured by ranking constraints. During testing, only available information is used. The proposed method is evaluated on two applications: multiple object recognition from images with the help of implied importance cues embedded in the list of manually annotated image tags, and multiple facial action unit detection from low-resolution images augmented by high-resolution images. For multiple object recognition, we conduct experiments on the Pascal VOC 2007 database and the LabelMe database. For multiple facial action unit detection, we conduct experiments on the Extended Cohn-Kanade (CK+) database. The experimental results demonstrate that both the relationship between available and privileged information and the dependencies among multiple labels are beneficial for constructing the multi-label classifier, which indicates the effectiveness of the proposed method.

2 Problem Statement

Our goal is to develop a method for multi-label classification with the help of privileged information. During training, both available information and privileged information are utilized to construct a multi-label classifier. During testing, only available information is used.

Given the training data D = {(x_i, x_i^*, Y_i)}_{i=1}^N, x_i represents the available information, x_i^* represents the privileged information, Y_i represents the target labels, and N represents the number of training instances. Y_i ⊆ {1, 2, ..., q} indicates the multiple labels, where q represents the number of labels. The objective of LUPI for multi-label classification is to map the available information x of an instance to its multiple labels Y with the help of the privileged information x^* and the label dependencies embedded in Y. Therefore, the objective function of LUPI for multi-label classification is defined as Eq. (1).

min_{f, f^*}  L(f) + L^*(f^*) + α Ω(f) + β Ω^*(f^*) + γ Ψ(f, f^*)    (1)

where L and L^* are the loss functions of the available information classifier f and the privileged information classifier f^*, Ω and Ω^* reflect the dependencies among multiple labels, and Ψ reflects the constraints from privileged information. α, β, and γ are the weighting parameters. Eq. (1) is a general framework of LUPI for multi-label classification; therefore, any loss function, constraints from privileged information, and constraints from multi-label dependencies can be used in Eq. (1).

In this paper, we adopt the maximum margin classifier as the loss function, the similarity between the classifier from available information and that from privileged information as the constraints from privileged information, and the ranking order of the predicted labels as constraints for reflecting multi-label dependencies.

3 Proposed LUPI for Multi-Label Classification

The privileged information and dependencies among multiple labels are encoded to refine model estimation during training.

Since we use similarity constraints between the classifier from available information and that from privileged information as the constraints from privileged information, we train the two classifiers simultaneously during training. Thus, we view the training data as two parts:

D_a = {(x_i, Y_i)}_{i=1}^N,   D_p = {(x_i^*, Y_i)}_{i=1}^N,    (2)

which are used to construct the classifiers from available information and privileged information, respectively.

For a certain label l, the two mappings from available information and privileged information can be represented as follows:

f_l(x) = ⟨w_l, φ(x)⟩ + b_l,   f_l^*(x^*) = ⟨w_l^*, φ^*(x^*)⟩ + b_l^*,    (3)

where φ and φ^* project x and x^* into the kernel space. Then, the similarity constraint for the i-th instance is given by Eq. (4):

|f_l(x_i) − f_l^*(x_i^*)| ≤ ζ_{il},   ζ_{il} ≥ 0,    (4)

where ζ_{il} is the slack variable that measures the failure of the two classifiers, from available information and privileged information, to agree.

To exploit the dependencies among multiple labels, we consider the ranking order between present labels and absent labels according to Eq. (5):

f_p(x_i) − f_q(x_i) ≥ 1 − ξ_{ipq},   f_p^*(x_i^*) − f_q^*(x_i^*) ≥ 1 − ξ_{ipq}^*,   p ∈ Y_i, q ∈ Ȳ_i,    (5)

where ξ_{ipq} and ξ_{ipq}^* are the slack variables that allow some disorders in which a present label is ranked below an absent label, and p and q represent the indexes of a present label and an absent label, respectively.
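To make the ranking constraints concrete, the following sketch computes the slack ξ_{ipq} = max(0, 1 − (f_p(x_i) − f_q(x_i))) for every (present, absent) label pair of one instance. The scores and label set are hypothetical, with plain numbers standing in for the learned mapping values:

```python
# Sketch of the ranking constraints in Eq. (5) for a single instance.
# scores[l] plays the role of f_l(x_i); values below are made up.

def ranking_slacks(scores, present):
    """Return slack xi_pq = max(0, 1 - (f_p - f_q)) for each pair
    (p in present, q absent). Zero slack means the present label is
    ranked at least one margin unit above the absent label."""
    absent = [l for l in range(len(scores)) if l not in present]
    return {(p, q): max(0.0, 1.0 - (scores[p] - scores[q]))
            for p in present for q in absent}

scores = [2.3, 1.0, 1.4, -0.5]   # hypothetical f_l(x_i) for labels 0..3
present = {0, 2}                 # Y_i: labels present in the instance
slacks = ranking_slacks(scores, present)
# pair (0, 1): margin 1.3 >= 1, so slack 0
# pair (2, 1): margin 0.4 <  1, so slack 0.6 (constraint violated)
```

A pair with positive slack contributes to the objective, pushing the learner to rank present labels above absent ones.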

The similarity constraints from privileged information and the ranking constraints from multiple labels are integrated into the learning process of the classifier from available information. Thus, the objective function of the proposed method is as follows:

min  ½ Σ_l ‖w_l‖² + ½ Σ_l ‖w_l^*‖² + C Σ_i 1/(|Y_i||Ȳ_i|) Σ_{p∈Y_i, q∈Ȳ_i} ξ_{ipq} + C^* Σ_i 1/(|Y_i||Ȳ_i|) Σ_{p∈Y_i, q∈Ȳ_i} ξ_{ipq}^* + β Σ_i Σ_l ζ_{il}
s.t. the constraints in Eq. (4) and Eq. (5), with ξ, ξ^*, ζ ≥ 0,    (6)

where {w_l, b_l, w_l^*, b_l^*} and the slack variables are the parameters to be optimized, C, C^*, and β are the weighting coefficients, and |Y_i| and |Ȳ_i| are the numbers of present labels and absent labels for the i-th instance. The mappings f_l and f_l^* are represented as in Eq. (3). The first two terms of Eq. (6) are the loss functions of the available information classifier and the privileged information classifier, respectively. The next two terms are the ranking constraints existing among the multiple labels. The last term collects the slack variables that measure the failure of the two classifiers, from available information and privileged information, to agree.

This optimization problem can be translated into its dual by applying the Lagrangian multiplier method. The variable in Eq. (7) represents whether or not a label belongs to the label set of the i-th instance.

(7)

where p and q respectively represent the indexes of the present label and the absent label for the i-th instance.

Thus, the dual problem is shown as follows:

(8)

where the dual variables are the Lagrangian multipliers; among them, the multiplier associated with the similarity constraint serves as a bridge between the available information and the privileged information. Since Eq. (8) is a quadratic program, we adopt the conditional gradient method to solve it.
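The paper does not spell out the solver details. As an illustrative sketch only, the following pure-Python loop applies the conditional gradient (Frank-Wolfe) method to a generic box-constrained quadratic program of the kind such duals reduce to; the toy Q, c, and box bound C are made-up stand-ins, not the actual dual of Eq. (8):

```python
def conditional_gradient(Q, c, C, iters=200):
    """Minimize 0.5*a^T Q a - c^T a over the box 0 <= a_j <= C.
    Each step solves the linear subproblem over the box (whose
    minimizer is a box vertex) and uses the classic 2/(k+2) step."""
    n = len(c)
    a = [0.0] * n
    for k in range(iters):
        # gradient g = Q a - c
        g = [sum(Q[i][j] * a[j] for j in range(n)) - c[i] for i in range(n)]
        # linear minimization oracle over the box: 0 or C per coordinate
        s = [C if g_i < 0 else 0.0 for g_i in g]
        step = 2.0 / (k + 2.0)
        a = [(1 - step) * a_i + step * s_i for a_i, s_i in zip(a, s)]
    return a

# toy problem: Q = I, c = (1, 3), box C = 2  ->  true optimum a = (1, 2)
a = conditional_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, 3.0], 2.0)
```

Each iteration costs only a gradient evaluation and a vertex lookup, which is why the method suits large quadratic programs with simple constraint sets.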

In the training phase, the similarity constraints from the privileged information and the ranking constraints from the multiple labels are used to construct a better classifier. Since we adopt ranking constraints, the proposed method is a kind of ranking SVM and outputs only an ordering of the labels. Therefore, we additionally train a model to predict the present label set size from the training samples, and then use the trained label size predictor to determine the number of present labels for the testing samples.

In the testing phase, only available information is used. We use the mapping from Eq. (3) to map the features of the available information to a real value for each label. Then we assign labels according to the order of the mapping values and the predicted number of present labels. Although there is only one feature space during testing, the information from the privileged information and from the multiple labels has already been translated and encoded into the classifier parameters.
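The test-time decision rule can be sketched as follows; the mapping values and the predicted label size below are hypothetical:

```python
def assign_labels(mapping_values, k):
    """Order labels by their mapping value f_l(x) and keep the top k,
    where k comes from the label size predictor."""
    ranked = sorted(range(len(mapping_values)),
                    key=lambda l: mapping_values[l], reverse=True)
    return sorted(ranked[:k])

# hypothetical mapping values for 5 labels, predicted label size k = 2
labels = assign_labels([0.2, 1.7, -0.3, 0.9, 0.1], 2)  # -> [1, 3]
```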

The detailed algorithm of the proposed method is shown in Algorithm 1.

  Input: Available information, privileged information, labels, and weighting coefficients
  Output: Predicted labels
  Stage 1: Predict the mapping values
  1) Choose the kernel function and project x and x^* into the kernel space;
  2) Solve the dual problem in Eq. (8) using the conditional gradient method;
  3) Obtain the Lagrangian multipliers;
  4) Estimate the mapping values with the Lagrangian multipliers and the projected features;
  Stage 2: Predict the label size
  5) Learn a model from the original features and the label counts of the training instances;
  6) Predict the label set size k of each testing instance;
  Stage 3: Obtain the predicted labels
  7) Output the predicted labels: a label is in the label set of a testing instance if its corresponding mapping value is among the top k elements.
Algorithm 1 Algorithm of the proposed model

4 Experiments

4.1 Experimental Conditions

We evaluate the proposed method on two applications: multiple object recognition from images with the help of implicit importance cues from the order of image tags supplied by annotators, and multiple action unit detection from low-resolution facial images enhanced by high-resolution facial images. Two image benchmark databases (the Pascal VOC 2007 database [Everingham et al.2010] and the LabelMe database [Russell et al.2008]), and one expression database (the CK+ database [Lucey et al.2010]) are adopted.

Database         SVM                SVM+ML             SVM+PI             Ours
Pascal VOC 2007  0.252/0.287/0.155  0.315/0.368/0.186  0.300/0.348/0.169  0.346/0.400/0.211
LabelMe          0.548/0.655/0.154  0.602/0.699/0.220  0.561/0.670/0.158  0.606/0.704/0.223
CK+              0.555/0.646/0.268  0.609/0.694/0.353  0.575/0.668/0.280  0.621/0.707/0.358

Each cell lists example-based accuracy / example-based F-measure / example-based subset-accuracy.

Table 1: Experimental results on the Pascal VOC 2007 database, the LabelMe database, and the CK+ database.

The Pascal VOC 2007 database consists of 9963 image samples and 20 target labels. For image representation, 200-dimensional bag-of-visual-words features, 512-dimensional gist features [Oliva and Torralba2006], and 64-dimensional color histogram features provided by [Hwang and Grauman2010] are used. For the implicit importance cues from the order of annotator-supplied image tags, 339-dimensional absolute tag rank features provided by [Hwang and Grauman2010] [Hwang and Grauman2012] are used. The train-test split provided by the database creators is adopted (5011 training and 4952 testing samples).

The LabelMe database consists of 3852 image samples and 209 target labels. As for the Pascal VOC 2007 database, 200-dimensional bag-of-visual-words features, 512-dimensional gist features [Oliva and Torralba2006], and 64-dimensional color histogram features provided by [Hwang and Grauman2010] are employed for image representation. For the implicit importance cues from the order of annotator-supplied image tags, 209-dimensional absolute tag rank features provided by [Hwang and Grauman2010] [Hwang and Grauman2012] are used. Most of the target labels are sparse; therefore, labels present in less than 15% of all samples are discarded. A random 50-50 split is performed on the database to construct training and testing sets, yielding 1889 training instances with 14 labels and 1878 testing instances.

The CK+ database includes 593 facial expression image sequences starting from the neutral frame and ending with the peak frame [Lucey et al.2010]. The original images, with a resolution of 640×490, are viewed as high-resolution images. We generate low-resolution images by normalizing the face region to a 256×256 image. For high-resolution images, the similarity-normalized shape differences between the neutral frame and the peak frame are used as features [Lucey et al.2010]. The similarity-normalized shape differences are difficult to obtain for low-resolution images, so local binary pattern features [Ahonen et al.2006] are extracted from the difference images instead. PCA is used for dimension reduction. Facial action units present in more than 15% of all samples are chosen; thus, we obtain 575 samples with 9 labels. We perform 10-fold subject-independent cross validation.

For multiple object recognition from images with the help of implicit importance cues from the order of image tags, we use image representation as available information, and implicit importance cues from the order of image tags as privileged information. For multiple action unit detection from low-resolution facial images enhanced by high-resolution facial images, we use low-resolution images as the available information, and high-resolution images as the privileged information.

To evaluate the performance of the proposed method, we conduct four experiments: one using only available information (SVM), one using available information with the help of the dependencies among multiple labels (SVM+ML), one using available information with the help of the privileged information (SVM+PI), and the proposed method (Ours). Example-based accuracy, example-based F-measure, and example-based subset-accuracy are used as performance metrics [Sorower2010].
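Based on their standard definitions (see [Sorower2010]), the three example-based metrics can be sketched as follows; the two toy instances below are hypothetical:

```python
def example_based_metrics(true_sets, pred_sets):
    """Example-based accuracy  = mean |Y ∩ Z| / |Y ∪ Z|,
       example-based F-measure = mean 2|Y ∩ Z| / (|Y| + |Z|),
       subset-accuracy         = fraction of exact matches Y == Z."""
    acc = f1 = subset = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter, union = len(y & z), len(y | z)
        acc += inter / union if union else 1.0
        f1 += 2 * inter / (len(y) + len(z)) if (y or z) else 1.0
        subset += 1.0 if y == z else 0.0
    n = len(true_sets)
    return acc / n, f1 / n, subset / n

# two toy instances: one partially correct, one exactly correct
acc, f1, subset = example_based_metrics([{0, 1}, {2}], [{1}, {2}])
# instance 1: |∩|=1, |∪|=2 -> accuracy 0.5, F-measure 2/3; instance 2: perfect
```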

4.2 Experimental Results and Analysis

Table 1 shows the experimental results on the Pascal VOC 2007 database, the LabelMe database, and the CK+ database. From Table 1, we observe the following:

First, SVM+ML achieves higher accuracy, F-measure, and subset-accuracy than SVM on all three databases. Compared to SVM, SVM+ML takes advantage of the dependencies among multiple labels. The improvement is especially obvious on the LabelMe database and the CK+ database. This may be caused by the more balanced label distribution of these databases, which may make label dependencies easier to capture. Specifically, for the Pascal VOC 2007 database, just 10% of the samples have more than two present labels, while for the LabelMe database and the CK+ database, 70% and 56% of the samples have at least three present labels, respectively. This indicates the importance of label dependencies for multi-label classification.

Second, compared to SVM, SVM+PI achieves better performance in terms of accuracy, F-measure, and subset-accuracy on all three databases. SVM is a traditional supervised learning method which utilizes the same information during both training and testing. SVM+PI captures and exploits the relationships between the available information and privileged information during training. This demonstrates that the privileged information can be exploited during training to build better classifiers.

Third, SVM+ML achieves better performance than SVM+PI on all metrics on all three databases. SVM+ML and SVM+PI exploit label dependencies and privileged information, respectively, both of which are crucial for multi-label classification. Unlike the privileged information, which is only available during training and unavailable during testing, label dependencies not only exist in the ground-truth labels during training, but also inhere in the predicted labels during testing, since multiple objects coexist in a scene image and multiple AUs coexist on a facial image. This indicates that exploiting the dependencies among multiple labels yields a better classifier than exploiting privileged information alone.

Finally, the proposed method performs best on all three databases. The proposed method considers not only the relationship between available information and privileged information but also the dependencies among multiple labels. This demonstrates that the proposed method successfully captures the privileged information and the label dependencies for multi-label classification.

To further demonstrate the importance of privileged information and label dependencies, we analyze the recognition performance of every label on the LabelMe database and the CK+ database. Table 2 and Table 3 show the F-measure for each label on the LabelMe database and the CK+ database, respectively. From Table 2 and Table 3, we find that:

Object SVM SVM+ML SVM+PI Ours
‘unmatched’ 0.533 0.585 0.517 0.586
‘person’ 0.655 0.662 0.634 0.637
‘window’ 0.492 0.602 0.528 0.634
‘car side’ 0.530 0.574 0.599 0.613
‘tree’ 0.625 0.698 0.641 0.697
‘building’ 0.800 0.873 0.800 0.875
‘sky’ 0.725 0.722 0.776 0.750
‘road’ 0.753 0.872 0.784 0.862
‘sidewalk’ 0.606 0.770 0.671 0.765
‘sign’ 0.305 0.463 0.551 0.551
‘table’ 0.831 0.809 0.821 0.798
‘screen’ 0.901 0.857 0.900 0.867
‘keyboard’ 0.721 0.729 0.7291 0.777
‘car’ 0.746 0.787 0.767 0.782
Avg. 0.659 0.715 0.694 0.728
Table 2: F-measure for each object on the LabelMe database.
Action unit SVM SVM+ML SVM+PI Ours
AU1 0.603 0.644 0.605 0.661
AU2 0.611 0.688 0.661 0.711
AU4 0.596 0.624 0.624 0.655
AU5 0.604 0.677 0.598 0.667
AU6 0.447 0.506 0.487 0.508
AU7 0.474 0.508 0.494 0.547
AU12 0.603 0.619 0.657 0.674
AU17 0.742 0.746 0.754 0.742
AU25 0.833 0.825 0.846 0.828
Avg. 0.613 0.648 0.636 0.666
Table 3: F-measure for each action unit on the CK+ database.

Compared to SVM, SVM+ML achieves a better average F-measure and significantly improves on several labels, such as ‘building’ and ‘road’ on the LabelMe database and AU1 and AU2 on the CK+ database. The improvements are about 7% and 12% for ‘building’ and ‘road’, and 4% and 7% for AU1 and AU2, respectively. Analyzing the ground-truth labels, we find high positive correlations between ‘building’ and ‘road’, and between AU1 (inner brow raiser) and AU2 (outer brow raiser). Such coexistence is successfully captured by the proposed ranking constraints and results in better performance.
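A co-occurrence analysis of this kind can be reproduced with the phi coefficient between two binary label columns; the sketch below uses made-up label vectors, not the actual LabelMe annotations:

```python
import math

def phi_coefficient(a, b):
    """Phi (Matthews) correlation between two binary label vectors:
    phi = (n11*n00 - n10*n01) / sqrt(n1. * n0. * n.1 * n.0)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# toy ground-truth columns for two labels over 8 instances
building = [1, 1, 1, 0, 1, 0, 0, 1]
road     = [1, 1, 0, 0, 1, 0, 0, 1]
r = phi_coefficient(building, road)   # strongly positive co-occurrence
```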

Compared to SVM, SVM+PI has a better average F-measure on both databases. The most significant improvement is about 24% for ‘sign’ on the LabelMe database. The image features capture the overall scene structure, color, and the appearance of component objects; however, due to the visual diversity of ‘sign’, it is difficult to recognize from image features alone. The implicit importance cues from the order of annotator-supplied image tags may capture the importance of ‘sign’, making it easier to recognize. Similarly, the most significant improvement on the CK+ database is about 5% for AU12. Since AU12 (lip corner puller) involves subtle appearance changes, it is more difficult to detect from low-resolution facial images, and high-resolution images are more helpful for its detection. This demonstrates that the proposed method successfully leverages privileged information to construct better classifiers during training, especially for those objects that are difficult to recognize from available information alone.

The proposed method achieves the best results on the average F-measure on the two databases. This indicates that both privileged information and label dependencies contribute to better multi-label classification.

4.3 Comparison to Related Work

Since current works on LUPI do not consider multi-label classification, a direct comparison cannot be made. We compare our work to a related method, SVM+ [Vapnik and Vashist2009][Li et al.2016]. SVM+ is used to recognize objects from images with the help of implicit importance cues from the order of image tags supplied by annotators on the Pascal VOC 2007 database and the LabelMe database, and to detect action units from low-resolution images enhanced by high-resolution images on the CK+ database.

Database  Example-based accuracy  Example-based F-measure  Example-based subset-accuracy
VOC       0.321                   0.366                    0.197
LabelMe   0.555                   0.648                    0.249
CK+       0.566                   0.662                    0.249
Table 4: Results of SVM+ [Li et al.2016] on the three databases.

The experimental results of SVM+ on the three databases are illustrated in Table 4. Table 1 and Table 4 show that:

First, since SVM+ considers the relationship between available information and privileged information, it achieves better results than SVM on all performance metrics on all three databases. This indicates the effectiveness of privileged information.

Second, SVM+PI and SVM+ have similar results. Specifically, SVM+PI has higher accuracies and F-measures on the three databases, while SVM+ has higher subset-accuracies on two databases. SVM+ assumes that privileged information and available information share the same slack variables, while SVM+PI adopts similarity constraints on the classifiers from privileged and available information. The two methods have similar abilities to capture privileged information.

Finally, the proposed method outperforms SVM+ in all performance metrics on the three databases. Compared to SVM+, the proposed method not only considers privileged information, but also utilizes the multi-label dependencies to refine the parameters of the classifier. The experimental results demonstrate the usefulness of the dependencies among multiple labels for multi-label classification.

5 Conclusions

In this paper, we propose a new method for multi-label classification from available information with the help of privileged information. Specifically, we adopt similarity constraints between the available information and the privileged information to refine the classifier, and we utilize ranking constraints derived from the dependencies among multiple labels to improve classification performance. The proposed method is evaluated on different applications. Experimental results on three benchmark databases show that both privileged information and dependencies among multiple labels are beneficial for building a better classifier. This further demonstrates the effectiveness and superiority of the proposed method as compared to state-of-the-art methods.

References

  • [Ahonen et al.2006] Timo Ahonen, Abdenour Hadid, and Matti Pietikainen. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2037–2041, 2006.
  • [Cai2011] F. Cai. Advanced Learning Approaches Based on SVM+ Methodology. PhD thesis, University of Minnesota, 2011.
  • [Chen et al.2013] Jixu Chen, Xiaoming Liu, and Siwei Lyu. Boosting with side information. In Asian Conference on Computer Vision, pages 563–577. Springer Berlin Heidelberg, 2013.
  • [Everingham et al.2010] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
  • [Hwang and Grauman2010] Sung Ju Hwang and Kristen Grauman. Accounting for the relative importance of objects in image retrieval. In BMVC, volume 1, page 5, 2010.
  • [Hwang and Grauman2012] Sung Ju Hwang and Kristen Grauman. Learning the relative importance of objects from tagged images for retrieval and cross-modal search. International Journal of Computer Vision, 100(2):134–153, 2012.
  • [Ji et al.2012] You Ji, Shiliang Sun, and Yue Lu. Multitask multiclass privileged information support vector machines. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 2323–2326. IEEE, 2012.
  • [Li et al.2016] Wen Li, Dengxin Dai, Mingkui Tan, Dong Xu, and Luc Van Gool. Fast algorithms for linear and kernel svm+. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2258–2266, 2016.
  • [Liu et al.2013] Jun Liu, Wenxin Zhu, and Ping Zhong. A new multi-class support vector algorithm based on privileged information. Journal of Information & Computational Science, 10:443–450, 2013.
  • [Lucey et al.2010] Patrick Lucey, Jeffrey F Cohn, Takeo Kanade, Jason Saragih, Zara Ambadar, and Iain Matthews. The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 94–101, 2010.
  • [Motiian et al.2016] Saeid Motiian, Marco Piccirilli, Donald A Adjeroh, and Gianfranco Doretto. Information bottleneck learning using privileged information for visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1496–1505, 2016.
  • [Niu and Wu2012] Lingfeng Niu and Jianmin Wu. Nonlinear l-1 support vector machines for learning using privileged information. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining Workshops, pages 495–499, Washington, DC, USA, 2012. IEEE Computer Society.
  • [Niu et al.2016] Li Niu, Wen Li, and Dong Xu. Exploiting privileged information from web data for action and event recognition. International Journal of Computer Vision, pages 1–21, 2016.
  • [Oliva and Torralba2006] Aude Oliva and Antonio Torralba. Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155:23–36, 2006.
  • [Pasunuri et al.] Rahul Pasunuri, Phillip Odom, Tushar Khot, Kristian Kersting, and Sriraam Natarajan. Learning with privileged information: Decision-trees and boosting.
  • [Russell et al.2008] Bryan C Russell, Antonio Torralba, Kevin P Murphy, and William T Freeman. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1-3):157–173, 2008.
  • [Sharmanska et al.2013] Viktoriia Sharmanska, Novi Quadrianto, and Christoph H Lampert. Learning to rank using privileged information. In Proceedings of the IEEE International Conference on Computer Vision, pages 825–832, 2013.
  • [Sharmanska et al.2014] Viktoriia Sharmanska, Novi Quadrianto, and Christoph H Lampert. Learning to transfer privileged information. arXiv preprint arXiv:1410.0389, 2014.
  • [Sorower2010] Mohammad S Sorower. A literature survey on algorithms for multi-label learning. Oregon State University, Corvallis, 2010.
  • [Vapnik and Vashist2009] Vladimir Vapnik and Akshay Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6):544–557, 2009.
  • [Vrigkas et al.2016] Michalis Vrigkas, Christophoros Nikou, and Ioannis A Kakadiaris. Exploiting privileged information for facial expression recognition. In Biometrics (ICB), 2016 International Conference on, pages 1–8. IEEE, 2016.
  • [Wang and Ji2015] Ziheng Wang and Qiang Ji. Classifier learning with hidden information. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4969–4977, 2015.
  • [Wang et al.2015] Shangfei Wang, Menghua He, Yachen Zhu, Shan He, Yue Liu, and Qiang Ji. Learning with privileged information using bayesian networks. Frontiers of Computer Science, 9(2):185–199, 2015.
  • [Yan et al.2016] Yan Yan, Feiping Nie, Wen Li, Chenqiang Gao, Yi Yang, and Dong Xu. Image classification by cross-media active learning with privileged information. IEEE Transactions on Multimedia, 2016.
  • [Yang and Patras2013] Heng Yang and Ioannis Patras. Privileged information-based conditional regression forest for facial feature detection. In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1–6. IEEE, 2013.
  • [Yang2016] Xun Yang. Empirical risk minimization for metric learning using privileged information. In International Joint Conference on Artificial Intelligence, 2016.
  • [Zhang and Zhou2014] Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819–1837, 2014.
  • [Zhang et al.2015] Wenbo Zhang, Hongbing Ji, Guisheng Liao, and Yongquan Zhang. A novel extreme learning machine using privileged information. Neurocomputing, 168:823–828, 2015.
  • [Zhong and Fukushima2006] Ping Zhong and Masao Fukushima. A new multi-class support vector algorithm. Optimisation Methods and Software, 21(3):359–372, 2006.