
CEMENT: Incomplete Multi-View Weak-Label Learning with Long-Tailed Labels

by   Zhiwei Li, et al.

A variety of modern applications exhibit multi-view multi-label learning, where each sample has multi-view features, and multiple labels are correlated via common views. In recent years, several methods have been proposed to cope with this setting and have achieved much success, but they still suffer from two key problems: 1) they lack the ability to deal with incomplete multi-view weak-label data, in which only a subset of features and labels are provided for each sample; 2) they ignore the presence of noisy views and tail labels that usually occur in real-world problems. In this paper, we propose a novel method, named CEMENT, to overcome these limitations. For 1), CEMENT jointly embeds incomplete views and weak labels into distinct low-dimensional subspaces, and then correlates them via the Hilbert-Schmidt Independence Criterion (HSIC). For 2), CEMENT adaptively learns the weights of the embeddings to capture noisy views, and explores an additional sparse component to model tail labels, making the low-rank assumption valid in the multi-label setting. We develop an alternating algorithm to solve the proposed optimization problem. Experimental results on seven real-world datasets demonstrate the effectiveness of the proposed method.




1 Introduction

In many real-world applications, samples are often represented by several feature subsets, and meanwhile associated with multiple labels [Xu et al.2013a]. For example, a natural scene image can be annotated with multiple tags, and described by various visual features, such as histograms of oriented gradients, color features, and the scale-invariant feature transform. As an effective way to deal with such data, multi-view multi-label learning has attracted a lot of attention in various real-world applications [Wu et al.2019, Zhang et al.2020]. Though these approaches have achieved much success, two problems remain. The first is that it is difficult to collect all the relevant labels of every sample. For example, in image annotation, an annotator may tag an image with only a partial label set drawn from the large number of ground-truth labels. To address this problem, weak-label learning methods [Yu et al.2014, Dong et al.2018, Wu et al.2018, Tan et al.2018b] have been proposed, based on the assumption that similar instances have similar labels. Though these methods have shown promising results in real applications, they do not consider the second problem, i.e., samples may be missing their representations in some views, which possibly leads to performance degradation [Xu et al.2015]. Many incomplete multi-view learning methods [Zhang et al.2013, Liu et al.2015, Xu et al.2015, Yin et al.2017] have since been proposed to improve performance by exploiting the complementary information of multiple incomplete views.

The co-existence of incomplete views and weak labels poses a severe challenge. To the best of our knowledge, only a few studies [Tan et al.2018a, Zhu et al.2019, Li and Chen2021] take both issues into consideration. However, iMVWL [Tan et al.2018a] and IMVL-IV [Zhu et al.2019] impose a low-rank constraint on the label matrix, which is usually violated in practice due to the presence of tail labels in multi-label learning [Li and Chen2021]. NAIML [Li and Chen2021] assumes that the label matrix is high-rank, but treats all views equally just as iMVWL and IMVL-IV do, and therefore probably suffers from the problem of noisy views.

Figure 1: The framework of CEMENT. CEMENT first maps the incomplete views and the weak labels into low-dimensional representations with adaptive embedding weights. It then projects the embedded representations into RKHSs and correlates them via HSIC with adaptive correlation weights. To make the low-rank assumption valid, the weak-label component is obtained by separating a sparse component from the original label matrix.

To cope with the aforementioned challenges, we propose a novel method for inCompletE Multi-view wEak-label learNing with long-Tailed labels (CEMENT). Specifically, CEMENT first embeds both incomplete views and weak labels into low-dimensional subspaces with adaptive weights, which automatically detects noisy views by assigning relatively lower weights to them. It then adaptively correlates the embedded views and labels via the Hilbert-Schmidt Independence Criterion (HSIC) in Reproducing Kernel Hilbert Spaces (RKHSs). To capture tail labels, it separates an additional sparse component from the weak labels, which makes the low-rank assumption valid in the multi-label setting. The framework of CEMENT is shown in Fig. 1. An alternating algorithm is developed to optimize the proposed problem, and its effectiveness is demonstrated on seven real-world datasets. The contributions of this work are three-fold:

  • A novel method CEMENT is proposed to handle the incomplete multi-view weak-label issue. It jointly embeds incomplete views and weak labels into low-dimensional subspaces with adaptive weights, and adaptively correlates the embeddings via HSIC in RKHSs.

  • CEMENT enables to capture noisy views and tail labels in real-world datasets by learning adaptive embedding weights and exploring an additional sparse component from weak labels, respectively.

  • Experimental results on seven widely used real-world datasets show the effectiveness of CEMENT.

The rest of this paper is organized as follows: Section 2 reviews some related works. Section 3 introduces the proposed model of CEMENT. Section 4 introduces the optimization algorithm of CEMENT. Experimental results are reported in Section 5, followed by the conclusion in Section 6.

2 Related Work

In this section, we discuss the works related to this paper, focusing on three research fields: incomplete multi-view learning, weak-label learning, and incomplete multi-view weak-label learning.

2.1 Incomplete Multi-View Learning

Multi-view learning handles data represented by multiple views and aims to improve learning performance by discovering view correlations [Yin et al.2017]. Under the incomplete multi-view setting, many algorithms have been proposed in recent years to handle the problem of missing views. Previous approaches have shown promising results in conjunction with semi-supervised learning [Xu et al.2015, Yin et al.2017] or with contrastive learning [Lin et al.2021], while others seek shared information by projecting the original multi-view data into a single low-dimensional subspace [Zhang et al.2013, Liu et al.2015].

2.2 Weak-Label Learning

Previous weak-label learning studies focus mainly on the single-view setting. MAXIDE [Xu et al.2013b] uses the input feature data as side information to recover the label matrix, based on the assumption that the label matrix is low-rank. COCO [Xu et al.2018] leverages a latent possibility matrix to generate the label matrix, and can recover the feature matrix and the label matrix simultaneously without the low-rank assumption. lrMMC [Liu et al.2013] and McWL [Tan et al.2018b] are multi-view weak-label learning methods, but both require all views to be complete.

2.3 Incomplete Multi-View Weak-Label Learning

As far as we know, only a few studies have focused on incomplete multi-view weak-label learning. iMVWL [Tan et al.2018a] learns a shared subspace from incomplete views with weak labels, and leverages both cross-view relationships and local label correlations. IMVL-IV [Zhu et al.2019] designs a multi-view multi-label learning method with incomplete views and weak labels by learning label-specific features, label correlations, and the complementary information of multiple views. These methods assume that the label matrix is low-rank, which is typically unsuitable in practice. NAIML [Li and Chen2021] explicitly exploits the high-rank structure of the multi-label matrix, and jointly takes the incompleteness of views and the missingness of labels into account. However, all three existing methods treat every view equally, limiting their applicability in the presence of noisy views.

3 Methodology

3.1 Preliminaries

For the $i$-th instance, we denote its feature vector in the $v$-th view by $\mathbf{x}_i^{(v)} \in \mathbb{R}^{d_v}$, and its corresponding label vector by $\mathbf{y}_i \in \{0,1\}^{c}$, where $d_v$ is the feature dimension of the $v$-th view and $c$ is the number of distinct labels. Let $\{X^{(v)}\}_{v=1}^{m}$ denote the input data with $n$ samples and $m$ views, where $X^{(v)} \in \mathbb{R}^{n \times d_v}$ is the feature matrix of the $v$-th view. Let $Y \in \{0,1\}^{n \times c}$ denote the label matrix, where $Y_{ij} = 1$ means that the $j$-th label is assigned to the $i$-th instance, and $Y_{ij} = 0$ otherwise. In the incomplete multi-view weak-label scenario, partial views and labels of some samples may be missing. Thus, we introduce indicator matrices $W^{(v)} \in \{0,1\}^{n \times d_v}$ and $M \in \{0,1\}^{n \times c}$ to mark the missing entries of the feature matrices and the label matrix, respectively: $W^{(v)}_{ij} = 1$ or $M_{ij} = 1$ if the corresponding entry is observed in $X^{(v)}$ or $Y$, and $W^{(v)}_{ij} = 0$ or $M_{ij} = 0$ otherwise.

3.2 Formulation

Given a multi-view dataset, we can optimize the following problem to find a shared latent subspace that integrates complementary information from different views [Gao et al.2015]:


where $\|\cdot\|_F$ represents the Frobenius norm, and $U^{(v)}$ is the coefficient matrix of the $v$-th view. Eq. (1) treats each view equally; its objective is equivalent to a weighted sum of per-view reconstruction errors with all view weights fixed to one. Therefore, it might deviate from the true latent subspace due to the existence of noisy views. Moreover, structured missing views in many applications also make Eq. (1) unreliable. A naive way to solve this problem is to fill the missing entries with average feature values, but this may introduce errors. To overcome these limitations, we propose the following incomplete multi-view model:


where $\alpha_v$ weights the embedding importance of the $v$-th view, and $\odot$ is the Hadamard product. According to Eq. (2), the $v$-th view is mapped to a view-specific latent representation $V^{(v)}$ with the view-specific adaptive weight $\alpha_v$. In addition, Eq. (2) minimizes the reconstruction error between $X^{(v)}$ and its factorization based only on the observed entries, which are indexed by $W^{(v)}$. In this way, we overcome the two limitations of Eq. (1).
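As a hedged sketch (the symbol names are our assumptions, since the rendered equations are not reproduced here), the unweighted objective of Eq. (1) and the masked, weighted objective of Eq. (2) can be written as:

```latex
% Eq. (1)-style shared-subspace factorization: one latent matrix V for all views
\min_{V,\,\{U^{(v)}\}}\; \sum_{v=1}^{m} \bigl\| X^{(v)} - V U^{(v)} \bigr\|_F^2

% Eq. (2)-style model: view-specific representations V^{(v)}, adaptive
% weights \alpha_v, and a binary mask W^{(v)} selecting observed entries
\min_{\{V^{(v)}\},\,\{U^{(v)}\},\,\boldsymbol{\alpha}}\;
\sum_{v=1}^{m} \alpha_v \bigl\| W^{(v)} \odot \bigl( X^{(v)} - V^{(v)} U^{(v)} \bigr) \bigr\|_F^2
```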

Similarly, we map the label matrix to a low-dimensional latent representation with a corresponding coefficient matrix. However, the presence of long-tailed labels makes the low-rank assumption invalid in practice [Li and Chen2021]. Thus, it is desirable to separate the tail labels from the full label set. To this end, we treat tail labels as outliers and decompose the label matrix accordingly.

In Eq. (3), the low-rank component models non-tail labels, and a sparse component captures tail labels. Besides, in the weak-label setting, the label matrix is often incomplete and contains many missing entries. Thus, we propose to solve the following problem:



where $\lambda$ is a trade-off hyperparameter. In this way, we capture tail labels in the weak-label setting, and thus make the low-rank assumption valid.
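A hedged sketch of the decomposition and the resulting masked problem (the symbols $Z$, $E$, $M$, $V^{(0)}$, $U^{(0)}$, and $\lambda$ are our assumptions, not necessarily the paper's notation):

```latex
% Eq. (3)-style split: low-rank part Z for head labels, sparse part E for tail labels
Y = Z + E

% Eq. (4)-style masked objective: factorize the low-rank part, penalize the
% sparse part with an l1 norm; M masks the observed label entries
\min_{V^{(0)},\,U^{(0)},\,E}\;
\bigl\| M \odot \bigl( Y - V^{(0)} U^{(0)} - E \bigr) \bigr\|_F^2 + \lambda \|E\|_1
```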

Next, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al.2005] to build the correlations among the embedded views and the embedded labels in an adaptive manner. HSIC computes the squared norm of the cross-covariance operator between two sets of variables in Reproducing Kernel Hilbert Spaces (RKHSs) to estimate their dependency; it is empirically defined by:

where the two Gram matrices measure the kernel-induced similarity between the row vectors of the two embeddings, and the centering matrix is $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, where $I$ is an identity matrix and $\mathbf{1}$ is an all-one vector. In theory, the larger the value of HSIC, the higher the dependence between the two embeddings. Thus, we promote the dependence between each embedded view and the embedded labels by maximizing the value of HSIC:


where $\beta_v$ weights the correlation between the $v$-th view embedding and the label embedding.
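The empirical HSIC of [Gretton et al.2005] has the standard closed form $\mathrm{HSIC}(A,B) = (n-1)^{-2}\,\mathrm{tr}(K_A H K_B H)$. Below is a minimal sketch with linear kernels; the function name and matrix shapes are illustrative, not from the paper:

```python
import numpy as np

def hsic(A, B):
    """Empirical HSIC with linear kernels: tr(K_A H K_B H) / (n - 1)^2."""
    n = A.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K_A = A @ A.T                        # linear-kernel Gram matrix of A
    K_B = B @ B.T                        # linear-kernel Gram matrix of B
    return np.trace(K_A @ H @ K_B @ H) / (n - 1) ** 2
```

A linearly dependent pair of embeddings yields a much larger HSIC value than an independent pair, which is exactly the signal the adaptive correlation weights exploit.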

By incorporating Eq. (2), Eq. (4) and Eq. (6), we now have the optimization problem for the proposed CEMENT method:


In fact, Eq. (7) treats the label matrix as an additional view, and uses an additional non-negative parameter to weight its embedding. It is worth noting that $\alpha_v$ weights the reconstruction of the $v$-th view, while $\beta_v$ balances the correlation between the $v$-th view embedding and the label embedding. In other words, $\alpha_v$ is assigned a large value once the $v$-th view is well recovered by its factorization, and $\beta_v$ takes a large value if the $v$-th view embedding is highly correlated with the label embedding. In this way, CEMENT adaptively embeds incomplete views and weak labels into low-dimensional subspaces and correlates them with adaptive weights, enabling it to handle real problems in the presence of both noisy views and tail labels.

4 Optimization

The objective function in Eq. (7) is convex w.r.t. each block of variables with the others fixed, which motivates us to develop an alternating optimization algorithm (we provide the algorithm and the MATLAB code of CEMENT in the supplementary materials). For simplicity, the linear kernel is used in HSIC; the extension to other kernels is straightforward. The algorithm repeats the following steps until convergence.

Update the view-specific representations with the others fixed.

When the other variables are fixed, each view-specific representation can be updated individually, and the objective function becomes


We then optimize Eq. (8) with the Projected Gradient Descent (PGD) algorithm [Calamai and Moré1987]:


where $\eta$ is a learning rate, and the gradient term is the partial derivative of Eq. (8) w.r.t. the variable being updated. The projection function keeps feasible entries unchanged and maps infeasible entries back to the feasible set (e.g., clipping negative entries to zero under a non-negativity constraint).
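As an illustration of the PGD scheme (this sketch solves a plain non-negative least-squares subproblem of the same shape; it is not the paper's full masked, weighted gradient):

```python
import numpy as np

def pgd_nonneg(X, U, steps=3000, lr=None):
    """Minimize ||X - V U||_F^2 over V >= 0 with projected gradient descent."""
    V = np.zeros((X.shape[0], U.shape[0]))
    if lr is None:
        # 1/L step size, where L = 2 * sigma_max(U)^2 is the Lipschitz
        # constant of the gradient for this quadratic objective
        lr = 1.0 / (2.0 * np.linalg.norm(U, 2) ** 2)
    for _ in range(steps):
        grad = -2.0 * (X - V @ U) @ U.T     # gradient w.r.t. V
        V = np.maximum(V - lr * grad, 0.0)  # project onto the nonnegative orthant
    return V
```

Each iteration takes an unconstrained gradient step and then projects back onto the feasible set, which is the pattern the two PGD updates above follow.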

Update the label representation with the others fixed.

When the other variables are fixed, the objective function becomes:


As with the previous step, we use PGD:


where the gradient term is the partial derivative of the objective above w.r.t. the variable being updated.

Update the view coefficient matrices with the others fixed.

With the others fixed, the computation of each view's coefficient matrix is independent. The corresponding objective function is


Under the Karush-Kuhn-Tucker (KKT) condition [Boyd and Vandenberghe2004], we can derive the following updating rule:


Update the label coefficient matrix with the others fixed.

With the others fixed, the objective function for the label coefficient matrix becomes


By using the KKT condition, we can derive the following updating rule:


Update the tail-label matrix with the others fixed.

We solve the following problem to update the sparse tail-label matrix:


Eq. (16) can be easily optimized by soft-thresholding [Donoho1995], and the updating rule is


where $\mathcal{S}_\tau(\cdot)$ denotes the shrinkage (soft-thresholding) operator, defined element-wise as $\mathcal{S}_\tau(x) = \operatorname{sign}(x)\max(|x| - \tau, 0)$.
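The soft-thresholding operator is standard [Donoho1995]; a one-line sketch (the name `shrink` is ours):

```python
import numpy as np

def shrink(X, tau):
    """Element-wise soft-thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```

Entries with magnitude below the threshold are zeroed out, which is what makes the tail-label component sparse.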

Update the embedding weights with the others fixed.

When the other variables are fixed, updating the embedding weights amounts to solving the following problem


Based on [Li et al.2021], each weight is updated independently according to the following equation


Eq. (19) is essentially an inverse-distance weighting: the larger a view's reconstruction error, the smaller its weight.
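A minimal sketch of inverse-distance weighting (the exact normalization of Eq. (19) may differ; this assumes weights proportional to the reciprocal of each view's reconstruction error, normalized to sum to one):

```python
import numpy as np

def inverse_distance_weights(dists, eps=1e-12):
    """alpha_v proportional to 1 / d_v, normalized so the weights sum to one."""
    inv = 1.0 / (np.asarray(dists, dtype=float) + eps)  # eps guards d_v == 0
    return inv / inv.sum()
```

A noisy view with a large reconstruction error automatically receives a small weight, which is how the model down-weights unreliable views.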

Update the correlation weights with the others fixed.

When the other variables are fixed, the objective function w.r.t. the correlation weights becomes


According to the Cauchy-Schwarz inequality [Steele2004], we have the following derivation:


The inequality in Eq. (21) holds with equality when


which gives the closed-form solution of Eq. (20).
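One standard way such a closed form arises is the following sketch, assuming (as our own simplification) that the subproblem maximizes a weighted sum of non-negative per-view HSIC scores $h_v$ under a unit $\ell_2$-norm constraint on $\boldsymbol{\beta}$:

```latex
\sum_{v} \beta_v h_v
\;\le\; \Bigl(\sum_{v} \beta_v^2\Bigr)^{\!1/2} \Bigl(\sum_{v} h_v^2\Bigr)^{\!1/2}
\;=\; \Bigl(\sum_{v} h_v^2\Bigr)^{\!1/2},
\qquad \text{with equality iff}\quad
\beta_v = \frac{h_v}{\bigl(\sum_{u} h_u^2\bigr)^{1/2}} .
```

Under this reading, a view whose embedding correlates strongly with the label embedding receives a large $\beta_v$.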

4.1 Complexity Analysis

In terms of computational complexity, the cost of each update is dominated by matrix multiplications, and scales with the number of samples, the largest subspace dimensionality, and the largest feature dimensionality across all views; the total per-iteration complexity of the algorithm is the sum of these update costs.

5 Experiments

5.1 Experimental Settings

Datasets #Samples #Views #Features #Labels #Average Domain
Corel5k 4999 6 100/512/1000/4096/4096/4096 260 3.397 image
ESPGame 20770 6 100/512/1000/4096/4096/4096 268 4.686 image
IAPRTC-12 19627 6 100/512/1000/4096/4096/4096 291 5.719 image
Mirflickr 25000 6 100/512/1000/4096/4096/457 38 4.716 image
Pascal07 9963 6 1000/1000/512/4096/4096/804 20 1.465 image
Yeast 2417 2 79/24 14 4.237 biology
Emotions 593 2 64/8 6 1.869 music
Table 1: Statistics of seven multi-view multi-label datasets. #Samples is the number of samples; #Views is the number of views; #Features are the dimensions of all views; #Labels is the number of distinct labels; #Average is the average number of labels per sample.


We conduct a comprehensive experimental study to evaluate the performance of the proposed CEMENT on seven widely used multi-view multi-label datasets, whose statistics are summarized in Table 1. The first five datasets (Corel5k, ESPGame, IAPRTC-12, Mirflickr, and Pascal07) are image datasets obtained from [Guillaumin et al.2010]; each sample is represented by six feature views. In the Yeast dataset [Bu et al.2003], each gene is represented by a genetic expression profile and a phylogenetic profile. In the Emotions dataset [Tsoumakas et al.2008], each piece of music is represented by rhythmic and timbre feature views, and is classified into the emotions it evokes.

Comparing Methods.

We compare the proposed CEMENT with four state-of-the-art methods: lrMMC [Liu et al.2013], McWL [Tan et al.2018b], iMVWL [Tan et al.2018a], and NAIML [Li and Chen2021]. lrMMC and McWL are multi-view weak-label learning methods, but both assume that the feature views are complete; we therefore adapt them by filling missing features with zeros. iMVWL and NAIML are incomplete multi-view weak-label learning methods, which serve as the baselines. The implementations of all of the above algorithms are publicly available from the corresponding papers.


On the five image datasets, the hyperparameters of McWL, iMVWL, and NAIML are selected as recommended in the original papers. We tune the hyperparameters of lrMMC and CEMENT on all datasets, and those of the other three methods on the Yeast and Emotions datasets, by grid search to produce the best possible results. For our method, we select the trade-off hyperparameter from a grid of candidate values, and the subspace-dimensionality ratio from {0.2, 0.5, 0.8}. The hyperparameters of the other methods are set within the ranges recommended in the original papers. The prediction performance of all algorithms is evaluated by three widely used metrics: Hamming Score (HS), Ranking Score (RS) [Zhang and Zhou2013], and Area Under the ROC Curve (AUC) [Bucak et al.2011]. We randomly sample 2000 samples from each image dataset, and use all samples of the Yeast and Emotions datasets. Furthermore, we follow the protocol of [Tan et al.2018a] to create incomplete multi-view weak-label scenarios: we randomly remove a fraction of the positive and negative samples for each label, and a fraction of the samples from each view, ensuring that each sample appears in at least one view. For all comparing algorithms, we repeat the experiment ten times and report the average values and standard deviations.

5.2 Experimental Results

Table 2: Experimental results on seven real-world datasets under the default missing-ratio setting. The best result for each dataset is highlighted in boldface.
Figure 2: Ablation study of CEMENT on the Yeast dataset with a fixed missing ratio and different values of the trade-off hyperparameter.

Evaluations of Comparing Methods.

Table 2 shows the experimental results of all comparing methods on the seven real-world datasets. We can see that CEMENT outperforms the compared methods in most cases. This superiority probably comes from CEMENT's ability to capture noisy views and tail labels. The incompleteness of the multi-view data degrades the results of lrMMC and McWL. iMVWL and NAIML are able to handle incomplete multi-view weak-label datasets, but perform worse than CEMENT. There are two possible reasons: iMVWL assumes that the label matrix is low-rank, and both iMVWL and NAIML treat every view equally. In contrast, CEMENT measures the importance of each view by adaptively choosing appropriate embedding and correlation weights.

Ablation Study.

We first introduce three variants of CEMENT, namely CEMENT-1, CEMENT-2, and CEMENT-3, to investigate the effects of its components (the formulations of the three variants and more results of the study are provided in the supplementary materials). CEMENT-1 only learns shared information from all feature views and ignores view-specific information. CEMENT-2 assumes that the label matrix is low-rank by dropping the sparse tail-label component. CEMENT-3 only learns a single shared subspace among all views and labels, which does not need HSIC. Fig. 2 shows the ablation results on the Yeast dataset under different hyperparameter values. As shown in Fig. 2, CEMENT-2 performs the worst, while CEMENT achieves the best performance on almost all metrics. This demonstrates that capturing tail labels is beneficial for recovering missing labels.

Parameter Analysis.

Figure 3: Hyperparameter sensitivity analysis of CEMENT under different combinations of and on the Yeast dataset.

In this section, we analyze the sensitivity of CEMENT w.r.t. its two trade-off hyperparameters, each selected from a wide grid of values. The results in terms of HS and AUC on the Yeast dataset are reported in Fig. 3; similar results are obtained on the other datasets. From Fig. 3, we can see that CEMENT achieves relatively stable and good performance over a broad range of hyperparameter values. We also observe that HS and AUC decrease sharply when the sparsity penalty is too large. The possible reason is that CEMENT then fails to capture long-tailed labels, since a large penalty suppresses the sparse component. This again confirms the contribution of capturing long-tailed labels to the performance of CEMENT.

Convergence Analysis.

We plot the convergence curves of the optimization algorithm on the Yeast and Emotions datasets in Fig. 4. We terminate the algorithm once the relative change of the objective value falls below a small tolerance. To show the convergence curves clearly, we omit the objective value of the first iteration in Fig. 4. We observe that the objective value monotonically decreases as the number of iterations increases, and it usually converges within 200 iterations. Similar results are obtained on the other datasets.

Figure 4: Convergence analysis of CEMENT on the Yeast and Emotions datasets.

6 Conclusion

In this paper, we propose a novel model named CEMENT to deal with incomplete multi-view weak-label data. CEMENT jointly embeds incomplete views and weak labels into low-dimensional subspaces with adaptive weights, and adaptively correlates them via HSIC. Moreover, CEMENT explores an additional sparse component to model tail labels, making the low-rank assumption valid in the multi-label setting. An alternating algorithm is developed to solve the proposed optimization problem. Empirical evidence verifies that CEMENT is flexible enough to handle incomplete multi-view weak-label learning problems in the presence of missing views and tail labels, leading to improved performance.


  • [Boyd and Vandenberghe2004] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [Bu et al.2003] Dongbo Bu, Yi Zhao, Lun Cai, Hong Xue, Xiaopeng Zhu, Hongchao Lu, Jingfen Zhang, Shiwei Sun, Lunjiang Ling, Nan Zhang, et al. Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic acids research, 31(9):2443–2450, 2003.
  • [Bucak et al.2011] Serhat Selcuk Bucak, Rong Jin, and Anil K Jain. Multi-label learning with incomplete class assignments. In CVPR, pages 2801–2808. IEEE, 2011.
  • [Calamai and Moré1987] Paul H Calamai and Jorge J Moré. Projected gradient methods for linearly constrained problems. Mathematical programming, 39(1):93–116, 1987.
  • [Dong et al.2018] Hao-Chen Dong, Yu-Feng Li, and Zhi-Hua Zhou. Learning from semi-supervised weak-label data. In AAAI, volume 32, 2018.
  • [Donoho1995] David L Donoho. De-noising by soft-thresholding. TIT, 41(3):613–627, 1995.
  • [Gao et al.2015] Hongchang Gao, Feiping Nie, Xuelong Li, and Heng Huang. Multi-view subspace clustering. In ICCV, pages 4238–4246, 2015.
  • [Gretton et al.2005] Arthur Gretton, Olivier Bousquet, Alex Smola, and Bernhard Schölkopf. Measuring statistical dependence with hilbert-schmidt norms. In ALT, pages 63–77. Springer, 2005.
  • [Guillaumin et al.2010] Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. Multimodal semi-supervised learning for image classification. In CVPR, pages 902–909. IEEE, 2010.
  • [Li and Chen2021] Xiang Li and Songcan Chen. A concise yet effective model for non-aligned incomplete multi-view and missing multi-label learning. TPAMI, 2021.
  • [Li et al.2021] Lusi Li, Zhiqiang Wan, and Haibo He. Incomplete multi-view clustering with joint partition and graph learning. TKDE, 2021.
  • [Lin et al.2021] Yijie Lin, Yuanbiao Gou, Zitao Liu, Boyun Li, Jiancheng Lv, and Xi Peng. Completer: Incomplete multi-view clustering via contrastive prediction. In CVPR, pages 11174–11183, 2021.
  • [Liu et al.2013] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In SDM, pages 252–260. SIAM, 2013.
  • [Liu et al.2015] Meng Liu, Yong Luo, Dacheng Tao, Chao Xu, and Yonggang Wen. Low-rank multi-view learning in matrix completion for multi-label image classification. In AAAI, 2015.
  • [Steele2004] J Michael Steele. The Cauchy-Schwarz master class: an introduction to the art of mathematical inequalities. Cambridge University Press, 2004.
  • [Tan et al.2018a] Qiaoyu Tan, Guoxian Yu, Carlotta Domeniconi, Jun Wang, and Zili Zhang. Incomplete multi-view weak-label learning. In IJCAI, pages 2703–2709, 2018.
  • [Tan et al.2018b] Qiaoyu Tan, Guoxian Yu, Carlotta Domeniconi, Jun Wang, and Zili Zhang. Multi-view weak-label learning based on matrix completion. In SDM, pages 450–458. SIAM, 2018.
  • [Tsoumakas et al.2008] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In MMD’08, volume 21, pages 53–59, 2008.
  • [Wu et al.2018] Baoyuan Wu, Fan Jia, Wei Liu, Bernard Ghanem, and Siwei Lyu. Multi-label learning with missing labels using mixed dependency graphs. IJCV, 126(8):875–896, 2018.
  • [Wu et al.2019] Xuan Wu, Qing-Guo Chen, Yao Hu, Dengbao Wang, Xiaodong Chang, Xiaobo Wang, and Min-Ling Zhang. Multi-view multi-label learning with view-specific information extraction. In IJCAI, pages 3884–3890, 2019.
  • [Xu et al.2013a] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. arXiv preprint arXiv:1304.5634, 2013.
  • [Xu et al.2013b] Miao Xu, Rong Jin, and Zhi-Hua Zhou. Speedup matrix completion with side information: Application to multi-label learning. In NIPS, pages 2301–2309, 2013.
  • [Xu et al.2015] Chang Xu, Dacheng Tao, and Chao Xu. Multi-view learning with incomplete views. TIP, 24(12):5812–5825, 2015.
  • [Xu et al.2018] Miao Xu, Gang Niu, Bo Han, Ivor W Tsang, Zhi-Hua Zhou, and Masashi Sugiyama. Matrix co-completion for multi-label classification with missing features and labels. arXiv preprint arXiv:1805.09156, 2018.
  • [Yin et al.2017] Qiyue Yin, Shu Wu, and Liang Wang. Unified subspace learning for incomplete and unlabeled multi-view data. Pattern Recognition, 67:313–327, 2017.
  • [Yu et al.2014] Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit Dhillon. Large-scale multi-label learning with missing labels. In ICML, pages 593–601. PMLR, 2014.
  • [Zhang and Zhou2013] Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. TKDE, 26(8):1819–1837, 2013.
  • [Zhang et al.2013] Wei Zhang, Ke Zhang, Pan Gu, and Xiangyang Xue. Multi-view embedding learning for incompletely labeled data. In IJCAI, 2013.
  • [Zhang et al.2020] Yongshan Zhang, Jia Wu, Zhihua Cai, and Philip S. Yu. Multi-view multi-label learning with sparse feature selection for image annotation. TMM, 22(11):2844–2857, 2020.
  • [Zhu et al.2019] Changming Zhu, Duoqian Miao, Rigui Zhou, and Lai Wei. Improved multi-view multi-label learning with incomplete views and labels. In ICDMW, pages 689–696. IEEE, 2019.