1 Introduction
Clustering is an unsupervised learning problem that groups samples into different clusters: samples in the same cluster are similar to each other, while samples from different clusters do not overlap jain1999data ; Ren2018Big . Clustering can not only be used to discover the inner structure of data but can also serve as a precursor of other machine learning tasks xu2005survey . Over the past decades, many clustering approaches have been developed, such as partitional algorithms (e.g., k-means-type clustering MacQueen:some ; HUANG2018Anew ; jain1999data ), density-based clustering ester1996density ; Ren2018SSDC , distribution-based clustering (e.g., the Gaussian mixture model Bishop2006Pattern_SR ), clustering based on non-negative matrix factorization (NMF) Lee01algorithmsfor ; Huang2018neucom ; HUANG2018Self , mean shift clustering Dorin2002mean ; Ren2014weighted ; Ren2014Boosted , ensemble clustering Strehl:cluster ; Ren2013WOEC ; Ren2017WOEC , etc. All of the above-mentioned methods can only tackle a single task and are thus considered single-task clustering. Considering the relationship among similar tasks, multi-task learning has been proposed to learn the information shared among tasks liu2009multi , and it has been shown to achieve strong performance in many applications, such as medical diagnosis bickel2008multi ; qi2010semi , web search ranking chapelle2010multi ; li2014heterogeneous , and so on. Tasks being closely related to each other is common in real clustering problems. Consequently, multi-task clustering (MTC), which exploits the information shared by multiple related tasks, was proposed gu2009learning . In particular, the resulting method, named learning the shared subspace for multi-task clustering (LSSMTC), simultaneously learns an individual k-means clustering model for each task and a subspace shared by all tasks. Since then, many multi-task clustering methods have been proposed gu2011learning ; zhang2012multi ; zhang2016multi ; zhang2017multi . This body of work has shown that MTC generally outperforms traditional single-task clustering algorithms. However, existing MTC methods typically solve a non-convex problem and thus easily converge to suboptimal solutions gu2009learning .
In this paper, we adopt the self-paced learning (SPL) paradigm to alleviate this problem and propose self-paced multi-task clustering (SPMTC). Concretely, self-paced learning is an example-sampling strategy inspired by the human learning process kumar2010self . Instead of treating all examples equally, SPL starts with "easy" examples and then gradually adds "hard" examples to train the model. Unlike curriculum learning bengio2009curriculum , SPL does not need prior knowledge to determine the training order of the examples; the easiness of examples is defined by the model itself Jiang2015SPCL . It has been shown that SPL can avoid bad local optima and achieve better generalization ability Jiang2015SPCL . The traditional SPL model treats all selected samples equally. Recently, variations of SPL have been designed to not only choose examples but also assign weights to them Jiang2014easy ; Pi2016Self ; Ren2017Robust .
Furthermore, outliers and noisy data, which can negatively affect clustering performance, are common in multi-task clustering. To address this issue, a soft weighting strategy is designed in SPMTC. By assigning relatively small weights to noisy data and outliers, their negative influence is significantly reduced. Overall, the main contributions of this paper are as follows:

We make use of SPL to address the non-convexity issue of multi-task learning in the unsupervised setting. To the best of our knowledge, this is the first work to apply self-paced learning to the multi-task clustering model.

The reconstruction error is used to assess how difficult it is to cluster a set of examples, based on which a self-paced multi-task clustering (SPMTC) model is proposed. SPMTC helps obtain a better optimum and thus achieves better multi-task clustering performance.

A soft weighting strategy of SPL is employed to estimate the weights of data samples, according to which samples participate in training the MTC model. In this way, the negative influence of noisy data and outliers is reduced and the clustering performance is further enhanced. An alternating optimization strategy is developed to solve the proposed model, and a convergence analysis is also given.
2 Related Work
2.1 Multi-Task Clustering
It has been shown that learning multiple related tasks simultaneously can be advantageous relative to learning these tasks independently evgeniou2004regularized ; Liu2017Learning . Multi-task learning (MTL) methods were first developed for classification problems and can be divided into two types: regularization-based learning and joint feature learning. Regularization-based MTL minimizes a regularized objective function evgeniou2004regularized , while joint feature learning methods capture task relatedness by constraining all tasks to a shared common feature set jalali2010dirty . To introduce multi-task learning into clustering, li2004document proposed adaptive subspace iteration (ASI), which explicitly identifies the subspace structure of multiple clusters; that is, it projects all examples into a subspace defined by a linear combination of the original feature space. Subsequently, gu2009learning proposed a multi-task clustering method that combines traditional k-means clustering and the ASI method via a balancing parameter, where k-means clustering learns each individual clustering task and the ASI model learns the subspace shared by the multiple tasks. gu2011learning proposed learning a spectral kernel for multi-task clustering (LSKMTC), which learns a reproducing kernel Hilbert space (RKHS) through a unified kernel learning framework. Since then, a number of multi-task clustering methods have been proposed zhang2012multi ; zhang2017multi ; xie2012multi ; al2014multi ; zhang2015convex ; Zhang2016SAMTC .
2.2 Self-Paced Learning
Curriculum learning (CL) is an instructor-driven model that organizes examples in a meaningful order based on prior knowledge bengio2009curriculum . Self-paced learning is a student-driven model that learns the easiness of examples automatically kumar2010self . Both CL and SPL order the examples by a certain rule rather than feeding them to the model randomly. The difference between CL and SPL lies in how they define this order: CL defines the order in advance using prior knowledge, while SPL defines the order by the loss computed by the model and updates the easiness as the model is updated Jiang2015SPCL . Jiang2014easy designed a soft weighting strategy to weaken the negative impact of noisy data. Due to its effectiveness, SPL has gained increasing attention in various fields, e.g., computer vision Jiang2014easy ; Tang2012Shifting ; XU2018MSPL , learning with feature corruption Ren2017Balanced , boosting Pi2016Self , diagnosis of Alzheimer's disease Que2017Regularized , multi-class classification Ren2017Robust , and so on.
Recently, self-paced multi-task learning (SPMTL) has been proposed for supervised problems. For instance, Murugesan2017self proposed a self-paced task selection method for multi-task learning, and li2017self proposed a multi-task learning framework that simultaneously considers the order of both tasks and instances. However, to the best of our knowledge, no previous work has used self-paced learning to enhance multi-task clustering performance. This paper fills this gap and proposes self-paced multi-task clustering in the unsupervised setting. A soft weighting scheme for SPMTC is developed to further improve the clustering performance.
3 Preliminaries
Suppose $m$ clustering tasks are given, each associated with a set of data examples $X^{(t)} = \{x_1^{(t)}, \dots, x_{n_t}^{(t)}\} \subset \mathbb{R}^{d}$, $t = 1, \dots, m$, where $d$ is the dimension of the feature vectors for all tasks and $n_t$ is the number of examples in the $t$-th task. We also let $X^{(t)} \in \mathbb{R}^{d \times n_t}$ denote the data matrix of the $t$-th task. Multi-task clustering seeks to group the data examples of each task into $c$ disjoint clusters. We assume the number of clusters $c$ is the same for all tasks, which is generally assumed in the multi-task learning literature gu2009learning . Furthermore, we let the feature dimensionality $d$ be the same for all tasks. In practice, feature extraction or feature augmentation can make the dimensionality of all tasks equal; for example, the bag-of-words representation used in document analysis achieves exactly this.
3.1 Multi-Task Clustering
Let us consider partitioning the $t$-th data set $X^{(t)}$ into $c$ clusters by k-means; the following objective should be minimized:
(1)  $\min_{\{m_k^{(t)}\}} \sum_{k=1}^{c} \sum_{x_i^{(t)} \in C_k^{(t)}} \| x_i^{(t)} - m_k^{(t)} \|_2^2$
where $m_k^{(t)}$ is the mean of cluster $C_k^{(t)}$, $C_k^{(t)}$ is the $k$-th cluster in the $t$-th task, and $\|\cdot\|_2$ denotes the $\ell_2$ norm of a vector. By letting $M^{(t)} = [m_1^{(t)}, \dots, m_c^{(t)}]$, equation (1) can be rewritten as:
(2)  $\min_{M^{(t)}, P^{(t)}} \| X^{(t)} - M^{(t)} (P^{(t)})^{\top} \|_F^2$
s.t. $P^{(t)} \in \{0,1\}^{n_t \times c}$
where $\|\cdot\|_F$ denotes the Frobenius norm and $P^{(t)}$ is a binary matrix that assigns each example to a cluster, with the following assignment rule:
(3)  $P_{ik}^{(t)} = 1$ if $x_i^{(t)} \in C_k^{(t)}$, and $P_{ik}^{(t)} = 0$ otherwise.
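The equivalence between the pointwise k-means objective and its matrix form with a binary indicator can be checked numerically. Below is a small sketch; the helper names are hypothetical, not from the paper:

```python
import numpy as np

def kmeans_objective_sum(X, M, labels):
    # Pointwise form: sum of squared distances of each column of X
    # to its assigned center (a column of M).
    return sum(np.sum((X[:, i] - M[:, labels[i]]) ** 2) for i in range(X.shape[1]))

def kmeans_objective_frobenius(X, M, labels, c):
    # Matrix form: ||X - M P^T||_F^2 with a binary indicator matrix P.
    n = X.shape[1]
    P = np.zeros((n, c))
    P[np.arange(n), labels] = 1.0
    return np.linalg.norm(X - M @ P.T, 'fro') ** 2
```

Both functions return the same value for any assignment, which is exactly the rewriting from equation (1) to equation (2).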
Then, LSSMTC (learning the shared subspace for multi-task clustering) defines the objective of MTC as gu2009learning :
(4)  $\min_{\{M^{(t)}, P^{(t)}\}, \tilde{M}, W} \; \sum_{t=1}^{m} \left[ \lambda \, \| X^{(t)} - M^{(t)} (P^{(t)})^{\top} \|_F^2 + (1-\lambda) \, \| W^{\top} X^{(t)} - \tilde{M} (P^{(t)})^{\top} \|_F^2 \right]$
s.t. $W^{\top} W = I, \; P^{(t)} \in \{0,1\}^{n_t \times c}$
where $W \in \mathbb{R}^{d \times \ell}$ is an orthonormal projection that learns an $\ell$-dimensional subspace shared across all the related tasks, and $\tilde{M} = [\tilde{m}_1, \dots, \tilde{m}_c]$ with $\tilde{m}_k \in \mathbb{R}^{\ell}$ collects the cluster centers of all tasks in the shared subspace. $I$ is the identity matrix. The parameter $\lambda$ controls the balance between the two parts of equation (4). The constraint on $P^{(t)}$ is relaxed to the non-negative continuous domain for optimization convenience.
3.2 Self-Paced Learning
The self-paced learning strategy kumar2010self iteratively learns the model parameter $w$ and the self-paced weight vector $v$:
(5)  $\min_{w, \, v \in [0,1]^{n}} \; \sum_{i=1}^{n} v_i \, L(y_i, f(x_i; w)) + f(v; \lambda)$
where $v = [v_1, \dots, v_n]^{\top}$ is the latent weighting variable, $n$ is the number of examples, and $L(y_i, f(x_i; w))$ is the loss of example $x_i$ in a traditional classification or regression problem. $f(v; \lambda)$ denotes the SPL regularizer; kumar2010self defines $f(v; \lambda) = -\lambda \sum_{i=1}^{n} v_i$. When $w$ is fixed, the optimal value of each $v_i$ is calculated by:
(6)  $v_i^{*} = 1$ if $L(y_i, f(x_i; w)) < \lambda$, and $v_i^{*} = 0$ otherwise.
When $\lambda$ is small, only a few examples are selected. As $\lambda$ grows, more and more examples are chosen to train the model until all samples are included.
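The alternating scheme above can be illustrated on a toy problem. The sketch below (hypothetical helper, not from the paper) estimates a mean with the hard-threshold rule of equation (6), growing the pace parameter each round so harder samples are gradually admitted:

```python
import numpy as np

def spl_mean(data, lam0=1.0, growth=1.5, n_rounds=5):
    """Self-paced estimation of a mean: start with easy (low-loss) samples
    and grow lambda so that harder samples are gradually admitted."""
    mu = np.median(data)                  # crude initialization of the model parameter
    lam = lam0
    for _ in range(n_rounds):
        loss = (data - mu) ** 2           # per-sample loss under the current model
        v = (loss < lam).astype(float)    # hard weighting: threshold rule of eq. (6)
        if v.sum() > 0:
            mu = (v * data).sum() / v.sum()   # update the model on selected samples
        lam *= growth                     # admit more samples next round
    return mu
```

On data containing one gross outlier, the outlier's loss stays above the threshold, so the self-paced estimate stays close to the clean mean while the plain average is pulled away.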
4 Self-Paced Multi-Task Clustering
4.1 The Objective
In this work, we propose self-paced multi-task clustering (SPMTC) to address the non-convexity of multi-task clustering. By exploiting the self-paced learning paradigm, which trains a model with examples ordered from simple to complex, SPMTC is able to avoid bad local optima and find better solutions. When designing the SPL regularizer, SPMTC considers the easiness of examples not only at the example level but also at the task level. Moreover, we develop a soft weighting technique of SPL to reduce the negative influence of noisy examples and outliers and to further enhance multi-task clustering performance. The optimization model of SPMTC is defined as:
(7)  $\min \; \sum_{t=1}^{m} \sum_{i=1}^{n_t} v_i^{(t)} \left[ \lambda \, \| x_i^{(t)} - M^{(t)} p_i^{(t)} \|_2^2 + (1-\lambda) \, \| W^{\top} x_i^{(t)} - \tilde{M} p_i^{(t)} \|_2^2 \right] + \sum_{t=1}^{m} f(v^{(t)}; \gamma_t)$
s.t. $W^{\top} W = I, \; P^{(t)} \ge 0, \; v^{(t)} \in [0,1]^{n_t}$
where $(p_i^{(t)})^{\top}$ denotes the $i$-th row of $P^{(t)}$. The first part of equation (7) contains $m$ independent k-means clustering tasks and is called within-task clustering, while the second part clusters the data of all tasks in the shared subspace and is referred to as cross-task clustering; $\lambda \in (0,1)$ is the trade-off parameter between them. The third part is the SPL regularizer: $v^{(t)} = [v_1^{(t)}, \dots, v_{n_t}^{(t)}]^{\top}$ denotes the weights of the examples in the $t$-th task, $\gamma_t$ is the SPL controlling parameter for the $t$-th task, and $f(v^{(t)}; \gamma_t)$ is the corresponding SPL regularization term.
The traditional SPL regularizer assigns weights and selects examples from the entire data set, which can be problematic in the multi-task setting. If the data of some tasks inherently have small loss values and thus receive large weights, these tasks will contribute more to the training process. By contrast, tasks whose data generally receive small weights (corresponding to large loss values) participate little in learning the model. When this happens, the underlying relation among the multiple tasks is not adequately exploited; in extreme cases, few or no examples from such tasks are selected at the beginning of SPL. Different from the traditional regularizer, the SPL regularizer developed in this work computes weights and chooses examples from each task independently. This ensures that all tasks have an equal opportunity to participate in the multi-task clustering process, so the relation among tasks can be sufficiently exploited.
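The contrast between a global threshold and per-task thresholds can be seen in a few lines. The helper below is a hypothetical sketch (not the paper's exact rule): it sets each task's threshold at a quantile of that task's own losses, so every task admits the same fraction of examples:

```python
import numpy as np

def per_task_thresholds(losses_per_task, keep_frac=0.5):
    """Pick one SPL threshold per task so that every task admits roughly the
    same fraction of its own examples, instead of a single global threshold
    that may starve tasks with uniformly larger losses."""
    return [np.quantile(losses, keep_frac) for losses in losses_per_task]
```

With one easy task and one hard task, a single global threshold selects nothing from the hard task, while per-task thresholds select half of each.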
4.2 Optimization
We first run LSSMTC for a small number of iterations (set to 20 in our experiments) to initialize the model parameters. This yields an initial estimate of the reconstruction error of each example. Then, an alternating optimization method is designed to solve equation (7); the optimization process mainly consists of the following two steps.
Step 1: Fix the model parameters, update $V$.
For fixed MTC model parameter values, minimizing equation (7) with respect to the weights is equivalent to solving
(8)  $\min_{V} \sum_{t=1}^{m} \sum_{i=1}^{n_t} v_i^{(t)} \left[ \lambda \, \| x_i^{(t)} - M^{(t)} p_i^{(t)} \|_2^2 + (1-\lambda) \, \| W^{\top} x_i^{(t)} - \tilde{M} p_i^{(t)} \|_2^2 \right] + \sum_{t=1}^{m} f(v^{(t)}; \gamma_t)$
where $x_i^{(t)}$ denotes the $i$-th column of the matrix $X^{(t)}$. Let $\ell_i^{(t)} = \lambda \| x_i^{(t)} - M^{(t)} p_i^{(t)} \|_2^2 + (1-\lambda) \| W^{\top} x_i^{(t)} - \tilde{M} p_i^{(t)} \|_2^2$ denote the reconstruction error of example $x_i^{(t)}$; then equation (8) becomes:
(9)  $\min_{V} \sum_{t=1}^{m} \sum_{i=1}^{n_t} v_i^{(t)} \ell_i^{(t)} + \sum_{t=1}^{m} f(v^{(t)}; \gamma_t)$
In classification tasks, the labels of examples can be used to evaluate loss values and assign weights, but no supervised information is available in clustering problems. Thus, we use the reconstruction error of an example as its loss value. Intuitively, examples that are close to the centers in both the original feature space and the shared subspace obtain small reconstruction errors, while examples far from the centers obtain large loss values and are considered "hard" examples. To solve equation (9), we design two forms of $f(v^{(t)}; \gamma_t)$, corresponding to hard and soft weighting strategies, respectively.
Hard weighting:
(10)  $f(v^{(t)}; \gamma_t) = -\gamma_t \sum_{i=1}^{n_t} v_i^{(t)} = -\gamma_t \, \| v^{(t)} \|_1$
where $\|\cdot\|_1$ denotes the $\ell_1$ norm of a vector (which equals the plain sum here since $v^{(t)} \ge 0$). By substituting equation (10) into equation (9), the optimal value of each $v_i^{(t)}$ is easily found to be:
(11)  $v_i^{(t)*} = 1$ if $\ell_i^{(t)} < \gamma_t$, and $v_i^{(t)*} = 0$ otherwise.
Soft weighting:
(12)  $f(v^{(t)}; \gamma_t, \eta_t) = -\zeta_t \sum_{i=1}^{n_t} \log \left( v_i^{(t)} + \frac{\zeta_t}{\eta_t} \right)$, with $\zeta_t = \frac{\gamma_t \eta_t}{\eta_t - \gamma_t}$ and $\eta_t > \gamma_t > 0$.
Equation (12) is referred to as mixture soft weighting in Jiang2014easy ; Ren2017Robust . The SPL regularizer designed in this work is a variation of mixture soft weighting adapted to the multi-task setting. By substituting equation (12) into equation (9), setting the derivative w.r.t. $v_i^{(t)}$ to zero, and noting that $v_i^{(t)} \in [0,1]$, the optimal $v_i^{(t)}$ is obtained as:
(13)  $v_i^{(t)*} = 1$ if $\ell_i^{(t)} \le \gamma_t$; $\; v_i^{(t)*} = 0$ if $\ell_i^{(t)} \ge \eta_t$; $\; v_i^{(t)*} = \zeta_t / \ell_i^{(t)} - \zeta_t / \eta_t$ otherwise.
For simplicity, the parameter $\eta_t$ is fixed proportionally to $\gamma_t$ in all experiments.
Both hard and soft weighting assign weights and select examples from each task independently. The difference is that hard weighting assigns weight 1 to all chosen examples, while soft weighting not only selects examples but also assigns them weights in $[0,1]$. Thus, under soft weighting, "hard" examples (typically noisy data, outliers, or overlapped data) generally obtain small weights and their negative influence is reduced.
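The mixture soft-weighting rule can be sketched as follows, assuming the closed form of the mixture scheme of Jiang et al. (2014): weight 1 below a lower threshold, weight 0 above an upper threshold, and a smooth decay in between (parameter names are ours, not the paper's):

```python
def mixture_weight(loss, gamma, lam):
    """Soft (mixture) weight of one example: 1 for easy examples
    (loss <= gamma), 0 for very hard ones (loss >= lam), and a smooth,
    monotonically decreasing value in between; requires lam > gamma > 0."""
    zeta = gamma * lam / (lam - gamma)
    if loss <= gamma:
        return 1.0
    if loss >= lam:
        return 0.0
    return zeta / loss - zeta / lam   # continuous at both thresholds
```

Note the function is continuous: at `loss == gamma` the middle branch evaluates to exactly 1, and at `loss == lam` to exactly 0, which is why hard examples are down-weighted gradually rather than dropped abruptly.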
Step 2: Fix $V$, update the model parameters.
For fixed $V$, the last term of equation (7) is constant. We then update each of the four MTC model parameters (i.e., $M^{(t)}$, $\tilde{M}$, $P^{(t)}$, and $W$) while the others are fixed.

Update rule of $M^{(t)}$: Optimizing (7) w.r.t. $M^{(t)}$ while keeping the other parameters fixed is equivalent to minimizing:
(14)  $J(M^{(t)}) = \lambda \sum_{i=1}^{n_t} v_i^{(t)} \| x_i^{(t)} - M^{(t)} p_i^{(t)} \|_2^2$
Representing $V^{(t)} = \mathrm{diag}(v^{(t)})$, $J(M^{(t)})$ can be written as:
(15)  $J(M^{(t)}) = \lambda \, \mathrm{tr}\!\left[ \left( X^{(t)} - M^{(t)} (P^{(t)})^{\top} \right) V^{(t)} \left( X^{(t)} - M^{(t)} (P^{(t)})^{\top} \right)^{\top} \right]$
Setting $\partial J / \partial M^{(t)} = 0$, we have
(16)  $M^{(t)} = X^{(t)} V^{(t)} P^{(t)} \left( (P^{(t)})^{\top} V^{(t)} P^{(t)} \right)^{-1}$
Update rule of $\tilde{M}$: When the other parameters are fixed, we optimize equation (7) w.r.t. $\tilde{M}$ by minimizing:
(17)  $J(\tilde{M}) = (1-\lambda) \sum_{t=1}^{m} \mathrm{tr}\!\left[ \left( W^{\top} X^{(t)} - \tilde{M} (P^{(t)})^{\top} \right) V^{(t)} \left( W^{\top} X^{(t)} - \tilde{M} (P^{(t)})^{\top} \right)^{\top} \right]$
Setting $\partial J / \partial \tilde{M} = 0$, we obtain
(18)  $\tilde{M} = W^{\top} \left( \sum_{t=1}^{m} X^{(t)} V^{(t)} P^{(t)} \right) \left( \sum_{t=1}^{m} (P^{(t)})^{\top} V^{(t)} P^{(t)} \right)^{-1}$
Update rule of $P^{(t)}$: Solving (7) w.r.t. $P^{(t)}$ while keeping the other parameters fixed is equivalent to optimizing:
(19)  $\min_{P^{(t)}} \; \lambda \, \| ( X^{(t)} - M^{(t)} (P^{(t)})^{\top} ) (V^{(t)})^{1/2} \|_F^2 + (1-\lambda) \, \| ( W^{\top} X^{(t)} - \tilde{M} (P^{(t)})^{\top} ) (V^{(t)})^{1/2} \|_F^2$ s.t. $P^{(t)} \ge 0$. By introducing the Lagrangian multiplier $\Theta$ for the non-negativity constraint, we obtain the Lagrangian function:
(20)  $\mathcal{L}(P^{(t)}) = J(P^{(t)}) - \mathrm{tr}\!\left( \Theta (P^{(t)})^{\top} \right)$
Setting $\partial \mathcal{L} / \partial P^{(t)} = 0$ and applying the Karush-Kuhn-Tucker condition $\Theta_{ik} P_{ik}^{(t)} = 0$, we observe
(21)  $\left[ -V^{(t)} A + V^{(t)} P^{(t)} B \right]_{ik} P_{ik}^{(t)} = 0$
where $A = \lambda (X^{(t)})^{\top} M^{(t)} + (1-\lambda) (X^{(t)})^{\top} W \tilde{M}$ and $B = \lambda (M^{(t)})^{\top} M^{(t)} + (1-\lambda) \tilde{M}^{\top} \tilde{M}$. According to ding2010convex , we can obtain the following update rule:
(22)  $P_{ik}^{(t)} \leftarrow P_{ik}^{(t)} \sqrt{ \dfrac{ [ V^{(t)} A^{+} + V^{(t)} P^{(t)} B^{-} ]_{ik} }{ [ V^{(t)} A^{-} + V^{(t)} P^{(t)} B^{+} ]_{ik} } }$
Here, $A^{\pm}$ and $B^{\pm}$ denote the positive and negative parts of $A$ and $B$, where $A^{+} = (|A| + A)/2$ and $A^{-} = (|A| - A)/2$.
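The positive/negative-part multiplicative rule of ding2010convex can be sketched generically as follows (this is an illustrative sketch of the update pattern, not the paper's exact rule; `A` stands for the "attractive" gradient term and `B` for the quadratic term):

```python
import numpy as np

def multiplicative_update(P, A, B, eps=1e-12):
    """One multiplicative step in the style of Ding et al. (2010): split the
    gradient terms A (n x c) and B (c x c) into positive and negative parts so
    that the update keeps every entry of P nonnegative."""
    Ap, An = np.maximum(A, 0), np.maximum(-A, 0)   # A = Ap - An, both >= 0
    Bp, Bn = np.maximum(B, 0), np.maximum(-B, 0)   # B = Bp - Bn, both >= 0
    # Numerator and denominator are elementwise nonnegative, so P stays >= 0.
    return P * np.sqrt((Ap + P @ Bn) / (An + P @ Bp + eps))
```

Because numerator and denominator collect only nonnegative quantities, non-negativity of `P` is preserved without any projection step, which is the point of the splitting.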

Update rule of $W$: Optimizing (7) w.r.t. $W$ while keeping the other parameters fixed is equivalent to minimizing:
(23)  $\min_{W} \; (1-\lambda) \sum_{t=1}^{m} \mathrm{tr}\!\left[ \left( W^{\top} X^{(t)} - \tilde{M} (P^{(t)})^{\top} \right) V^{(t)} \left( W^{\top} X^{(t)} - \tilde{M} (P^{(t)})^{\top} \right)^{\top} \right]$ s.t. $W^{\top} W = I$. Substituting the optimal $\tilde{M}$ of equation (18) into (23), we have
(24)  $\min_{W} \; \mathrm{tr}\!\left( W^{\top} D W \right)$
s.t. $W^{\top} W = I$. Then, the optimal $W$ is composed of the eigenvectors of the following matrix corresponding to the $\ell$ smallest eigenvalues:
(25)  $D = \sum_{t=1}^{m} X^{(t)} V^{(t)} (X^{(t)})^{\top} - \left( \sum_{t=1}^{m} X^{(t)} V^{(t)} P^{(t)} \right) \left( \sum_{t=1}^{m} (P^{(t)})^{\top} V^{(t)} P^{(t)} \right)^{-1} \left( \sum_{t=1}^{m} (P^{(t)})^{\top} V^{(t)} (X^{(t)})^{\top} \right)$
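Extracting an orthonormal projection from the eigenvectors associated with the smallest eigenvalues of a symmetric matrix is a one-liner with `numpy.linalg.eigh`, which returns eigenvalues in ascending order (helper name is ours):

```python
import numpy as np

def smallest_eigvecs(D, ell):
    """Orthonormal W whose columns are the eigenvectors of the symmetric
    matrix D corresponding to its ell smallest eigenvalues."""
    vals, vecs = np.linalg.eigh(D)   # eigenvalues sorted ascending
    return vecs[:, :ell]             # first ell columns => smallest eigenvalues
```

The returned columns automatically satisfy the orthonormality constraint $W^{\top} W = I$.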
With fixed $V$, step 2 actually solves a weighted version of the LSSMTC model gu2009learning , whose convergence has been guaranteed. Step 2 stops when the objective converges or a maximum number of iterations (set to 50 in our experiments) is reached.
SPMTC runs steps 1 and 2 iteratively. At the beginning, we initialize each $\gamma_t$ so that half of the data examples of each task are chosen to train the model. We then increase $\gamma_t$ to admit 10% more examples from the $t$-th task in each of the following iterations. SPMTC stops when all the data of all tasks are included; the learned parameters are then the final values. In the $t$-th task, example $x_i^{(t)}$ is assigned to the $k$-th cluster if $k = \arg\max_{k'} P_{ik'}^{(t)}$. In summary, the pseudo code of SPMTC is described in Algorithm 1.
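The pacing schedule described above (start with half of each task's data, admit 10% more per outer round) can be written down directly; the helper below is a small sketch with names of our choosing:

```python
def pace_schedule(start=0.5, step=0.1):
    """Fraction of each task's examples admitted at every outer SPL round:
    half the data first, then `step` more per round until everything is in.
    Integer stepping avoids floating-point drift in the endpoints."""
    n = int(round((1.0 - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]
```

In each round, the per-task pace parameter would then be set so that exactly this fraction of the task's examples falls below the threshold (e.g., via a quantile of the current losses).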
4.3 Algorithm Analysis
The proposed SPMTC has the following properties. First, SPMTC inherits SPL's advantage of avoiding bad local optima and can thus find better clustering solutions. Second, SPMTC with soft weighting assigns extremely small weights to noise and outliers, so these samples have very little influence on the training process. This property is consistent with the loss-prior-embedding property of SPL described in Meng2017Theo : outliers and noise typically show large loss values and are therefore associated with small weights, so their negative effect can be significantly reduced. Third, the convergence of the proposed SPMTC model is theoretically guaranteed; the corresponding proof is given in Theorem 1.
Theorem 1.
SPMTC is guaranteed to converge.
Proof.
The proposed SPMTC iteratively updates the model parameters ($M^{(t)}$, $\tilde{M}$, $P^{(t)}$, $W$) and the self-paced learning weight $V$. In each iteration, once the model parameters are updated, the self-paced weight $V$ is obtained in closed form, i.e., by equation (11) or equation (13). When $V$ is fixed, solving for the model parameters of SPMTC is equivalent to solving a weighted version of the LSSMTC model, which has been theoretically proven to converge gu2009learning . Specifically, solving for the model parameters can be divided into four subproblems, corresponding to $M^{(t)}$, $\tilde{M}$, $P^{(t)}$, and $W$, respectively. By alternately updating each parameter with the other three fixed, the value of the objective function (7) decreases monotonically; since the objective is also lower bounded when its third term is constant, SPMTC converges to a local minimum for fixed $V$.
As the pace parameters grow and more data instances are selected for training, the model parameters and the self-paced weight $V$ are updated accordingly. In the last iteration of SPMTC, when all data instances have been chosen, $V$ is fixed and SPMTC finally converges. ∎
5 Experiments
5.1 Experimental Setup
Data Sets
We evaluate on three popular text data sets, from which several clustering tasks are constructed, since our focus is to verify the effectiveness of introducing self-paced learning into multi-task clustering. The 20 Newsgroups data set (http://people.csail.mit.edu/jrennie/20Newsgroups/) consists of roughly 20,000 newsgroup documents, partitioned into 20 different newsgroups, each corresponding to a different topic. Some of the newsgroups are closely related to each other, while others are highly unrelated. In this paper, we use a subset of 20 Newsgroups that was also used in gu2009learning ; it includes two groupings: Comp vs Sci and Rec vs Talk. Reuters-21578 (http://www.cse.ust.hk/TL/) is a well-known database for text analysis. Among its five categories, orgs, people, and places are the three largest. We use the three data sets orgs vs people, orgs vs places, and people vs places generated by ling2008spectral . The WebKB data set (http://www.cs.cmu.edu/afs/cs/project/theo20/www/data/) collects web pages of computer science departments from different universities, classified into 7 categories: student, faculty, staff, department, course, project, and other. Detailed information on the data sets can be found in Table 1.
Comparing Methods
We compare the proposed SPMTC with the following single-task clustering methods: k-means (KM) MacQueen:some , spectral clustering (SC) Luxburg:spectral , and adaptive subspace iteration (ASI) li2004document . We also apply these three methods to the pooled data of all tasks, denoted All KM, All SC, and All ASI, respectively. The multi-task clustering method LSSMTC gu2009learning is also tested in the experiments. Our methods SPMTCh and SPMTCs denote the SPMTC model with hard weighting and soft weighting, respectively.
Parameter Setting
We always set the number of clusters equal to the true number of class labels. For LSSMTC, SPMTCh, and SPMTCs, we tune $\lambda$ and the shared-subspace dimension $\ell$ over the same candidate ranges and report the best results.
Table 1: Summary of the data sets.
Data Set  TaskID  #Sample  #Feature  #Class
Comp vs Sci  Task 1  1875  2000  2
Comp vs Sci  Task 2  1827  2000  2
Rec vs Talk  Task 1  1844  2000  2
Rec vs Talk  Task 2  1545  2000  2
orgs vs people  Task 1  1237  4771  2
orgs vs people  Task 2  1208  4771  2
orgs vs places  Task 1  1016  4405  2
orgs vs places  Task 2  1043  4405  2
people vs places  Task 1  1077  4562  2
people vs places  Task 2  1077  4562  2
WebKB  Task 1  226  2500  6
WebKB  Task 2  252  2500  6
WebKB  Task 3  255  2500  6
WebKB  Task 4  307  2500  6
Table 2: Clustering results (%) on Comp vs Sci.
Methods  Task1  Task2  
ACC  NMI  ACC  NMI  
KM  69.50  26.21  58.22  7.30 
SC  50.88  6.66  51.94  4.12 
ASI  87.86  52.33  69.44  18.93 
All KM  66.89  18.75  53.72  2.04 
All SC  51.57  0.00  51.94  4.12 
All ASI  87.76  52.09  84.21  37.48 
LSSMTC  89.92  56.91  75.40  29.64 
SPMTCh  90.87  59.44  81.72  35.05 
SPMTCs  92.02  61.34  76.31  29.20 
Table 3: Clustering results (%) on Rec vs Talk.
Methods  Task1  Task2  
ACC  NMI  ACC  NMI  
KM  63.72  12.57  58.33  3.92 
SC  57.66  4.87  58.87  4.20 
ASI  67.02  10.80  63.14  5.02 
All KM  64.96  12.44  59.20  3.34 
All SC  56.18  3.33  55.47  0.61 
All ASI  67.29  12.29  64.23  8.69 
LSSMTC  91.85  61.23  84.16  44.52 
SPMTCh  92.00  61.47  84.76  45.47 
SPMTCs  92.56  64.05  88.69  52.61 
Table 4: Clustering results (%) on orgs vs people.
Methods  Task1  Task2  
ACC  NMI  ACC  NMI  
KM  54.62  0.62  52.08  0.12 
SC  63.86  11.34  52.81  1.48 
ASI  61.35  4.52  59.67  3.56 
All KM  54.66  0.49  52.67  0.19 
All SC  52.06  0.73  52.15  0.33 
All ASI  62.37  5.15  62.30  5.01 
LSSMTC  71.45  14.05  60.26  4.08 
SPMTCh  73.27  16.18  62.12  5.14 
SPMTCs  73.67  17.31  68.62  12.33 
Table 5: Clustering results (%) on orgs vs places.
Methods  Task1  Task2  
ACC  NMI  ACC  NMI  
KM  55.17  0.73  58.39  1.79 
SC  65.19  9.63  61.12  3.66 
ASI  67.26  9.79  62.91  5.43 
All KM  53.25  0.71  59.18  1.89 
All SC  57.57  0.02  60.59  3.01 
All ASI  66.71  9.08  64.56  6.91 
LSSMTC  67.11  10.63  62.40  5.65 
SPMTCh  68.09  11.10  68.73  9.69 
SPMTCs  67.24  10.03  71.14  12.45 
Evaluation Measures
The accuracy (ACC) and the normalized mutual information (NMI) are adopted in this paper to evaluate the performance of the compared methods; larger values of ACC and NMI indicate better clustering performance. We report the average results of 20 independent runs of each method, and a t-test at the 5% significance level is used to assess the statistical significance of the results.
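For reference, both measures can be implemented with the standard library alone; this is a common formulation, not necessarily the exact one used in the paper (in particular, NMI normalizations vary in the literature; this sketch uses the arithmetic-mean variant, and ACC uses brute-force matching over cluster-to-class mappings, which is fine for the small cluster counts here):

```python
import math
from itertools import permutations
from collections import Counter

def clustering_accuracy(y_true, y_pred, n_clusters):
    """ACC: best accuracy over all one-to-one mappings of cluster ids
    to class ids (brute force over permutations)."""
    best = 0
    for perm in permutations(range(n_clusters)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == perm[p])
        best = max(best, hits)
    return best / len(y_true)

def nmi(y_true, y_pred):
    """Normalized mutual information, 2*I(T;P) / (H(T) + H(P))."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum((c / n) * math.log((c / n) / ((ct[a] / n) * (cp[b] / n)))
             for (a, b), c in joint.items())
    ht = -sum((c / n) * math.log(c / n) for c in ct.values())
    hp = -sum((c / n) * math.log(c / n) for c in cp.values())
    return 0.0 if ht == 0 or hp == 0 else 2 * mi / (ht + hp)
```

Both measures are invariant to a relabeling of the clusters: a perfect partition with permuted cluster ids scores 1.0 on each.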
5.2 Results on Real Data Sets
The clustering results are shown in Tables 2-7, where the best and comparable results in each column are highlighted. First, we observe that the multi-task clustering methods (i.e., ASI, LSSMTC, SPMTCh, and SPMTCs) generally outperform the single-task clustering methods. Second, compared with LSSMTC, SPMTCh achieves higher ACC and NMI values most of the time, which demonstrates the advantage of applying SPL in the MTC model. Third, All ASI performs best on task 2 of Comp vs Sci and task 1 of WebKB, and SC outperforms the other methods on task 2 of people vs places. Nevertheless, the proposed SPMTCs always obtains the best or comparable performance, indicating the usefulness of weighting data instances in the model according to their loss values.
Table 6: Clustering results (%) on people vs places.
Methods  Task1  Task2  
ACC  NMI  ACC  NMI  
KM  58.18  0.17  61.69  3.09 
SC  61.16  0.91  63.53  6.31 
ASI  62.49  2.99  56.19  0.46 
All KM  58.25  0.11  61.61  3.09 
All SC  59.91  0.62  60.74  3.83 
All ASI  62.76  3.28  61.10  2.67 
LSSMTC  62.49  4.77  55.51  0.87 
SPMTCh  63.41  5.55  55.98  0.80 
SPMTCs  67.43  10.92  57.31  1.03 
Table 7: Clustering results (%) on WebKB.
Methods  Task1  Task2  Task3  Task4  
ACC  NMI  ACC  NMI  ACC  NMI  ACC  NMI  
KM  59.86  13.71  55.71  12.40  54.19  10.93  55.60  14.21 
SC  44.95  20.34  50.11  16.17  46.43  24.53  54.03  29.23 
ASI  63.53  26.27  63.65  26.75  58.43  23.88  62.86  30.55 
All KM  61.15  14.75  61.42  12.40  51.29  5.51  57.85  12.48 
All SC  61.76  23.92  53.01  19.50  60.31  18.81  66.12  37.97 
All ASI  66.72  35.09  64.92  23.74  60.62  24.31  62.93  33.02 
LSSMTC  63.09  24.82  60.00  25.61  61.09  21.31  58.89  26.47 
SPMTCh  64.34  23.80  64.29  26.00  57.18  17.61  65.60  29.24 
SPMTCs  64.07  28.84  66.19  27.79  62.20  26.99  69.51  39.64 
5.3 Sensitivity and Time Complexity Analysis
In real multi-task clustering applications, if the tasks are more closely related, the cross-task clustering term should receive more weight and $\lambda$ should be set to a smaller value, and vice versa. In this section, we fix $\lambda$ and test the sensitivity of SPMTC w.r.t. the feature dimension $\ell$ of the shared subspace. Fig. 1 shows the results of LSSMTC, SPMTCh, and SPMTCs on people vs places. We can see that different choices of $\ell$ do not significantly affect the performance of SPMTC.
SPMTC needs to solve a weighted version of LSSMTC several times (fewer than 10 in our experiments). However, SPL generally speeds up convergence because each SPL iteration takes the model parameters trained in the previous iteration as its initialization. Thus, the time complexity of the proposed SPMTC is similar to that of LSSMTC. For instance, the average running times of LSSMTC and SPMTCh on people vs places are 3.1 and 4.2 seconds, respectively.
6 Conclusion and Future Work
We have proposed self-paced multi-task clustering (SPMTC) to alleviate the non-convexity issue of traditional multi-task clustering. Furthermore, we developed a soft weighting scheme of self-paced learning for the SPMTC model to further enhance its clustering performance. A convergence analysis of SPMTC was given, and its effectiveness was demonstrated by experimental results on real data sets. In future work, we are interested in extending the proposed framework to more general situations, e.g., where the number of clusters or the feature dimensionality differs across tasks.
Acknowledgments
This paper was in part supported by Grants from the Natural Science Foundation of China (Nos. 61806043, 61572111, and 61806043), a Project funded by China Postdoctoral Science Foundation (No. 2016M602674), a 985 Project of UESTC (No. A1098531023601041), and two Fundamental Research Funds for the Central Universities of China (Nos. ZYGX2016J078 and ZYGX2016Z003).
References
 (1) A. K. Jain, M. N. Murty, P. J. Flynn, Data clustering: a review, ACM computing surveys (CSUR) 31 (3) (1999) 264–323.
 (2) Y. Ren, Big data clustering and its applications in regional science, in: L. A. Schintler, Z. Chen (Eds.), Big Data for Regional Science, Routledge, 2018, Ch. 21, pp. 257–264.
 (3) R. Xu, D. Wunsch, Survey of clustering algorithms, IEEE Transactions on Neural Networks 16 (3) (2005) 645–678.
 (4) J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1967, pp. 281–297.
 (5) X. Huang, X. Yang, J. Zhao, L. Xiong, Y. Ye, A new weighting k-means type clustering framework with an l2-norm regularization, Knowledge-Based Systems 151 (2018) 165–179.
 (6) M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
 (7) Y. Ren, X. Hu, K. Shi, G. Yu, D. Yao, Z. Xu, Semi-supervised denpeak clustering with pairwise constraints, in: Proceedings of the 15th Pacific Rim International Conference on Artificial Intelligence, 2018, pp. 837–850.
 (8) C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, pp. 206–210.
 (9) D. D. Lee, H. S. Seung, Algorithms for non-negative matrix factorization, in: NIPS, MIT Press, 2001, pp. 556–562.
 (10) S. Huang, Y. Ren, Z. Xu, Robust multi-view data clustering with multi-view capped-norm k-means, Neurocomputing 311 (2018) 197–208.
 (11) S. Huang, Z. Kang, Z. Xu, Self-weighted multi-view clustering with soft capped norm, Knowledge-Based Systems 158 (2018) 1–8.
 (12) D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5) (2002) 603–619.
 (13) Y. Ren, C. Domeniconi, G. Zhang, G. Yu, A weighted adaptive mean shift clustering algorithm, in: Proceedings of the 2014 SIAM International Conference on Data Mining, 2014, pp. 794–802.
 (14) Y. Ren, U. Kamath, C. Domeniconi, G. Zhang, Boosted mean shift clustering, in: Proceedings of the The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014, pp. 646–661.
 (15) A. Strehl, J. Ghosh, Cluster ensembles  a knowledge reuse framework for combining multiple partitions, JMLR 3 (2002) 583–617.
 (16) Y. Ren, C. Domeniconi, G. Zhang, G. Yu, Weighted-object ensemble clustering, in: Proceedings of the IEEE 13th International Conference on Data Mining, IEEE, 2013, pp. 627–636.
 (17) Y. Ren, C. Domeniconi, G. Zhang, G. Yu, Weighted-object ensemble clustering: methods and analysis, Knowledge and Information Systems 51 (2) (2017) 661–689.
 (18) J. Liu, S. Ji, J. Ye, Multi-task feature learning via efficient l2,1-norm minimization, in: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2009, pp. 339–348.
 (19) S. Bickel, J. Bogojeska, T. Lengauer, T. Scheffer, Multi-task learning for HIV therapy screening, in: Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 56–63.
 (20) Y. Qi, O. Tastan, J. G. Carbonell, J. Klein-Seetharaman, J. Weston, Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins, Bioinformatics 26 (18) (2010) i645–i652.
 (21) O. Chapelle, P. Shivaswamy, S. Vadrevu, K. Weinberger, Y. Zhang, B. Tseng, Multi-task learning for boosting with application to web search ranking, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010, pp. 1189–1198.
 (22) S. Li, Z.-Q. Liu, A. B. Chan, Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 482–489.
 (23) Q. Gu, J. Zhou, Learning the shared subspace for multi-task clustering and transductive transfer classification, in: Proceedings of the 9th IEEE International Conference on Data Mining, IEEE, 2009, pp. 159–168.
 (24) Q. Gu, Z. Li, J. Han, Learning a kernel for multi-task clustering, in: Proceedings of the 25th AAAI Conference on Artificial Intelligence, 2011, pp. 368–373.
 (25) Z. Zhang, J. Zhou, Multi-task clustering via domain adaptation, Pattern Recognition 45 (1) (2012) 465–473.
 (26) X. Zhang, X. Zhang, H. Liu, X. Liu, Multi-task multi-view clustering, IEEE Transactions on Knowledge and Data Engineering 28 (12) (2016) 3324–3338.
 (27) X. Zhang, X. Zhang, H. Liu, X. Liu, Multi-task clustering through instances transfer, Neurocomputing 251 (2017) 145–155.
 (28) M. P. Kumar, B. Packer, D. Koller, Self-paced learning for latent variable models, in: Advances in Neural Information Processing Systems, 2010, pp. 1189–1197.
 (29) Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum learning, in: Proceedings of the 26th International Conference on Machine Learning, 2009, pp. 41–48.
 (30) L. Jiang, D. Meng, Q. Zhao, S. Shan, A. G. Hauptmann, Self-paced curriculum learning, in: Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015, pp. 2694–2700.
 (31) L. Jiang, D. Meng, T. Mitamura, A. G. Hauptmann, Easy samples first: Self-paced reranking for zero-example multimedia search, in: Proceedings of the 22nd ACM International Conference on Multimedia, ACM, 2014, pp. 547–556.
 (32) T. Pi, X. Li, Z. Zhang, D. Meng, F. Wu, J. Xiao, Y. Zhuang, Self-paced boost learning for classification, in: Proceedings of the 25th International Joint Conference on Artificial Intelligence, 2016, pp. 1932–1938.
 (33) Y. Ren, P. Zhao, Y. Sheng, D. Yao, Z. Xu, Robust softmax regression for multi-class classification with self-paced learning, in: Proceedings of the International Joint Conference on Artificial Intelligence, 2017, pp. 2641–2647.
 (34) T. Evgeniou, M. Pontil, Regularized multi-task learning, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 109–117.
 (35) B. Liu, Z. Xu, B. Dai, H. Bai, X. Fang, Y. Ren, S. Zhe, Learning from semantically dependent multi-tasks, in: Proceedings of the International Joint Conference on Neural Networks, 2017, pp. 3498–3505.
 (36) A. Jalali, S. Sanghavi, C. Ruan, P. K. Ravikumar, A dirty model for multi-task learning, in: Advances in Neural Information Processing Systems, 2010, pp. 964–972.
 (37) T. Li, S. Ma, M. Ogihara, Document clustering via adaptive subspace iteration, in: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2004, pp. 218–225.
 (38) S. Xie, H. Lu, Y. He, Multi-task co-clustering via nonnegative matrix factorization, in: Proceedings of the 21st International Conference on Pattern Recognition, IEEE, 2012, pp. 2954–2958.
 (39) S. Al-Stouhi, C. K. Reddy, Multi-task clustering using constrained symmetric non-negative matrix factorization, in: Proceedings of the 2014 SIAM International Conference on Data Mining, SIAM, 2014, pp. 785–793.
 (40) X.-L. Zhang, Convex discriminative multi-task clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (1) (2015) 28–40.
 (41) X. Zhang, X. Zhang, H. Liu, Self-adapted multi-task clustering, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 2357–2363.
 (42) K. Tang, V. Ramanathan, F.-F. Li, D. Koller, Shifting weights: Adapting object detectors from image to video, in: Advances in Neural Information Processing Systems, 2012, pp. 647–655.
 (43) W. Xu, W. Liu, X. Huang, J. Yang, S. Qiu, Multi-modal self-paced learning for image classification, Neurocomputing 309 (2018) 134–144.
 (44) Y. Ren, P. Zhao, Z. Xu, D. Yao, Balanced self-paced learning with feature corruption, in: Proceedings of the International Joint Conference on Neural Networks, 2017, pp. 2064–2071.
 (45) X. Que, Y. Ren, J. Zhou, Z. Xu, Regularized multi-source matrix factorization for diagnosis of Alzheimer's disease, in: Proceedings of the International Conference on Neural Information Processing, 2017, pp. 463–473.
 (46) K. Murugesan, J. Carbonell, Self-paced multitask learning with shared knowledge, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, pp. 2522–2528.
 (47) C. Li, J. Yan, F. Wei, W. Dong, Q. Liu, H. Zha, Self-paced multi-task learning, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017, pp. 2175–2181.
 (48) C. H. Ding, T. Li, M. I. Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (1) (2010) 45–55.
 (49) D. Meng, Q. Zhao, L. Jiang, A theoretical understanding of self-paced learning, Information Sciences 414 (2017) 319–328.
 (50) X. Ling, W. Dai, G.-R. Xue, Q. Yang, Y. Yu, Spectral domain-transfer learning, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008, pp. 488–496.
 (51) U. V. Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (4) (2007) 395–416.