Introduction
As one of the major topics in the machine learning and data mining communities, clustering algorithms aim to group a set of samples into several clusters such that samples within the same cluster are more similar to each other than samples from different clusters
[Hartigan1975]. The most commonly used clustering methods in practice are k-means and its soft version, i.e., Gaussian mixture models. In particular, after initialization of the cluster centers,
k-means clustering alternates between two steps, membership assignment of samples and update of cluster centers, until convergence is reached. Due to its simplicity, efficiency, and interpretability, k-means clustering has been greatly developed in recent years in both computational and theoretical aspects [Ding et al.2015, Newling and Fleuret2016, Georgogiannis2016]. As with most machine learning algorithms, k-means clustering has been extended to a kernel version by mapping data into a high-dimensional feature space with the kernel trick [Girolami2002]. In this way, kernel k-means can handle data that is not linearly separable in the original feature space. The cluster structure obtained with
k-means and its kernel version is closely related to the initialization, and inappropriate initial cluster centers may trap the sum-of-squares minimization in a poor local minimum. Fortunately, the original optimization problem can be formulated as a constrained trace minimization problem and optimized with the eigenvalue decomposition of the associated matrix
[Schölkopf, Smola, and Müller1998, Ding and He2004]. On the other hand, as with other kernel methods, the performance of kernel k-means clustering largely depends on the choice of the kernel function. However, the most suitable kernel for a particular task is unknown in advance. In most real-world applications, samples are characterized by features from multiple groups. For example, flowers can be classified based on three different features: shape, color, and texture
[Nilsback and Zisserman2006]. Web pages can be represented with their content and the texts of inbound links [Bickel and Scheffer2004]. These features differ in attributes, scales, etc., and provide complementary views for the representation of datasets. Therefore, rather than concatenating different views into one or simply using one of them, it is preferable to integrate the distinctive views optimally with learning algorithms, which is known as multi-view learning or multiple kernel learning. In the clustering literature, the existing work on combination strategies for data integration falls into two categories: multi-view clustering and multiple kernel clustering.
Related Work
Multi-view clustering attempts to obtain consistent cluster structures from different views [Bickel and Scheffer2004, Chaudhuri et al.2009, Kumar and Daumé2011, Wang, Nie, and Huang2013, Chao, Sun, and Bi2017]. In [Bickel and Scheffer2004], multi-view versions of clustering approaches, including
k-means, expectation maximization, and hierarchical agglomerative methods, are studied for document clustering to demonstrate their advantages over single-view counterparts. The work in
[Kumar and Daumé2011] proposes to constrain the similarity graph from one view with the spectral embedding from the other view in the framework of spectral clustering, following the idea of
co-training. Based on canonical correlation analysis, [Chaudhuri et al.2009] presents a simple subspace learning method for multi-view clustering under the natural assumption that different views are uncorrelated given the cluster label. In view of the limitation that most existing work on data fusion assumes the same weight for all features from one source, [Wang, Nie, and Huang2013] provides a novel framework for multi-view clustering that learns a weight for each individual feature via structured sparsity regularization.
Following the central idea of multiple kernel learning, namely that multiple kernels with different similarity measurements are combined with coefficients to obtain an optimal linear or nonlinear kernel combination [Gönen and Alpaydın2011], multiple kernel clustering utilizes the combined kernel in clustering tasks associated with multi-view data, since different kernels correspond naturally to different views [Zhao, Kwok, and Zhang2009, Huang, Chuang, and Chen2012, Lu et al.2014, Liu et al.2016, Wang et al.2017, Zhu et al.2018]. For example, [Zhao, Kwok, and Zhang2009]
proposes a multiple kernel version of maximum margin clustering, which searches for the cluster labeling, the maximum margin hyperplane, and the optimal combined kernel simultaneously. The resulting nonconvex optimization problem is solved with a variant of the cutting plane algorithm. Based on a kernel evaluation measure, centered kernel alignment,
[Lu et al.2014] integrates the clustering task into the framework of multiple kernel learning. Considering the correlation between different kernels, the work in [Liu et al.2016] adds a matrix-induced regularization term to the objective of multiple kernel clustering to reduce the redundancy of kernels. In [Wang et al.2017], a deep neural network is utilized to approximate the generation of multiple kernels and the optimization process, which makes multiple kernel clustering applicable to large-scale problems.
We focus on multiple kernel clustering in this paper. Although many efforts have been made in the past years to improve the efficiency and robustness of multiple kernel clustering, there are still two major problems with the existing work. First, few methods consider the dissimilarity between kernels. In other words, the combination coefficients of the kernels are updated independently, so the selected kernels may contain high redundancy. Second, none of them models the sparsity of the combination coefficients based on the diversity of the kernels. Due to the norm constraint imposed on the combination weights, the coefficients of kernels with low dissimilarity would be reduced undesirably, which could overstate the importance of inappropriate kernels. Selecting a diverse subset of the pre-specified kernels mitigates these two problems and enhances the quality of the combined kernel.
Our Contributions
Motivated by the representatives used in dissimilarity-based sparse subset selection [Zhou and Zhao2016, Elhamifar, Sapiro, and Sastry2016], we propose a new approach for multiple kernel clustering with representative kernels. A subset of the base kernels, termed representative kernels, is selected and integrated to construct the optimal kernel combination. The key insight of the proposed approach is that all pre-specified kernels can be characterized by the representative kernels. In particular, if one kernel is selected by another kernel as its representative, the similarity measurements in these two kernels are relevant to each other. By imposing the constraint that only some of the kernels are selected, together with a diversity regularization, we obtain a subset of kernels whose size is smaller than the number of pre-specified kernels. In addition, the number of representative kernels is determined automatically by the training data. In contrast to the previous work in [Liu et al.2016], which imposes a matrix-induced regularization to reduce the risk of simultaneously assigning large weights to pairwise kernels with high correlation, our approach introduces a new strategy in which each base kernel can be encoded (represented) by the other kernels, and minimizes the total encoding cost. As a result, the obtained representative kernels form a sparse and diverse subset of the pre-specified kernels due to the implicit sparsity constraint on the combination coefficients. In summary, the contributions of our work are:

A representative kernel selection method is introduced to construct a diverse subset of the pre-specified kernels for multiple kernel clustering.

The strategy of representative kernel selection is incorporated seamlessly into the objective function of multiple kernel k-means clustering.

An alternating minimization method is developed to optimize the cluster membership and the combination coefficients alternately.

Experimental results on several benchmark and real-world datasets of multiple kernel learning demonstrate the effectiveness of the proposed approach.
The rest of this paper is organized as follows. We first introduce the proposed approach, including preliminaries on multiple kernel k-means clustering, representative kernel selection, multiple kernel clustering with representative kernels, and alternating optimization. We then evaluate our approach on several datasets in comparison with state-of-the-art methods. Finally, we conclude the paper and give some directions for future work.
The Proposed Approach
This section presents multiple kernel clustering by selecting representative kernels. We first present preliminaries on multiple kernel k-means clustering, and then introduce the strategy for representative kernel selection. Next, we incorporate this strategy into the objective function of multiple kernel k-means clustering. Finally, an alternating minimization method is developed to optimize the combination coefficients and the cluster membership alternately.
Multiple Kernel k-Means Clustering
Given a set of samples $\{x_i\}_{i=1}^{n}$, kernel k-means clustering aims to minimize the sum-of-squares loss function over the cluster indicator matrix $Z \in \{0,1\}^{n \times k}$, which is formulated as the following optimization problem,
\[
\min_{Z,\{\mu_c\}} \ \sum_{i=1}^{n} \sum_{c=1}^{k} Z_{ic} \,\|\phi(x_i) - \mu_c\|^2 \qquad (1)
\]
s.t. $\sum_{c=1}^{k} Z_{ic} = 1$, $Z_{ic} \in \{0,1\}$,
where $\phi(\cdot)$ is a function that maps the original features onto a reproducing kernel Hilbert space $\mathcal{H}$, and $\mu_c = \frac{1}{n_c}\sum_{i=1}^{n} Z_{ic}\,\phi(x_i)$ and $n_c = \sum_{i=1}^{n} Z_{ic}$ are the centroid and the size of the $c$-th cluster, respectively.
The optimization problem in Eq. (1) can be rewritten in the following matrix-vector form,
\[
\min_{Z} \ \mathrm{Tr}\!\left(K\left(I_n - Z(Z^{\top}Z)^{-1}Z^{\top}\right)\right) \qquad (2)
\]
s.t. $Z \in \{0,1\}^{n \times k}$, $Z\mathbf{1}_k = \mathbf{1}_n$,
where $K$ is the kernel matrix with the $(i,j)$-th element $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$, and $\mathbf{1}_k$ is a column vector with all elements equal to 1. It is difficult to solve the above optimization problem due to the discrete variable $Z$ in Eq. (2). Fortunately, the problem can be approximated by relaxing $Z$ with $H = ZL^{1/2}$ (where $L^{1/2}$ is obtained by taking the square root of the diagonal elements in $L = (Z^{\top}Z)^{-1}$). In this way, we obtain a relaxed version of the optimization problem,
\[
\min_{H} \ \mathrm{Tr}\!\left(K\left(I_n - HH^{\top}\right)\right) \qquad (3)
\]
s.t. $H \in \mathbb{R}^{n \times k}$, $H^{\top}H = I_k$,
where $I_k$ is an identity matrix of size $k \times k$. In the framework of multiple kernel learning, each sample has several feature representations associated with a group of feature mappings $\{\phi_p(\cdot)\}_{p=1}^{m}$. In particular, each sample is represented as $\phi_{\gamma}(x) = [\gamma_1\phi_1(x)^{\top}, \ldots, \gamma_m\phi_m(x)^{\top}]^{\top}$, where $\gamma = [\gamma_1, \ldots, \gamma_m]^{\top}$ denotes the weights of the base kernels and needs to be learned during optimization. Therefore, the $(i,j)$-th element of the combined kernel over the above mapping function can be formulated as,
\[
K_{\gamma}(x_i, x_j) = \sum_{p=1}^{m} \gamma_p^2\, K_p(x_i, x_j) \qquad (4)
\]
By replacing the single kernel $K$ in Eq. (3) with this combined kernel $K_{\gamma}$, we obtain the optimization objective of multiple kernel k-means clustering as follows,
\[
\min_{H, \gamma} \ \mathrm{Tr}\!\left(K_{\gamma}\left(I_n - HH^{\top}\right)\right) \qquad (5)
\]
s.t. $H^{\top}H = I_k$, $\gamma^{\top}\mathbf{1}_m = 1$, $\gamma_p \geq 0\ \forall p$.
As will be detailed hereinafter, this optimization problem can be solved by alternately updating $H$ and $\gamma$.
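As a concrete illustration of Eq. (4), the combined kernel can be assembled from the base kernel matrices as a weighted sum with squared coefficients. The sketch below uses random positive semidefinite base kernels and uniform weights purely for illustration; all names are our own.

```python
import numpy as np

def combined_kernel(kernels, gamma):
    """Combine base kernel matrices K_p with squared weights gamma_p^2 (Eq. (4))."""
    return sum(g ** 2 * K for g, K in zip(gamma, kernels))

rng = np.random.default_rng(0)
n, m = 8, 3
# Build m random PSD base kernel matrices (K = X X^T is always PSD).
kernels = []
for _ in range(m):
    X = rng.standard_normal((n, 4))
    kernels.append(X @ X.T)

gamma = np.full(m, 1.0 / m)  # uniform weights summing to one
K = combined_kernel(kernels, gamma)

# The combined kernel is again symmetric and PSD.
print(np.allclose(K, K.T), np.all(np.linalg.eigvalsh(K) > -1e-8))
```

A nonnegative combination of PSD matrices is PSD, so the combined kernel is always a valid kernel.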
Representative Kernel Selection
Given a collection of base kernels $\mathcal{K} = \{K_p\}_{p=1}^{m}$, our goal is to find a diverse subset of $\mathcal{K}$, dubbed representative kernels, that can represent the whole collection [Elhamifar, Sapiro, and Vidal2012, Elhamifar, Sapiro, and Sastry2016].
Dissimilarity between Kernels
Assume that the pairwise dissimilarity between base kernels $K_p$ and $K_q$ is given by $d_{pq}$, which indicates how well $K_p$ represents $K_q$. Specifically, the smaller the value of the dissimilarity, the better the $p$-th base kernel $K_p$ represents the $q$-th base kernel $K_q$. To reduce the redundancy and select a subset of base kernels as the representatives, we first define a measurement that characterizes the dissimilarity between pairwise kernels. Such a dissimilarity can be computed directly using the Euclidean distance or the inner products between base kernel matrices. Here we utilize the measurement adopted in [Liu et al.2016] as follows,
(6) 
A larger $d_{pq}$ means high dissimilarity between $K_p$ and $K_q$, while a smaller value implies that their dissimilarity is low. More advanced dissimilarity measurements, such as the Bregman matrix divergence [Kulis, Sustik, and Dhillon2009], will be discussed in future work. The dissimilarities can be arranged into a matrix of the following form,
\[
D = \begin{bmatrix} d_1^{\top} \\ \vdots \\ d_m^{\top} \end{bmatrix}
  = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mm} \end{bmatrix},
\]
where $d_p^{\top}$ denotes the $p$-th row of $D$.
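To make the construction of $D$ concrete, the sketch below computes a pairwise kernel dissimilarity matrix. The exact measure of Eq. (6) follows [Liu et al.2016]; here the Frobenius distance between normalized kernel matrices is used as an illustrative stand-in, and all function names are our own.

```python
import numpy as np

def dissimilarity_matrix(kernels):
    """Pairwise kernel dissimilarities; a larger value means more dissimilar.
    The Frobenius distance between Frobenius-normalized kernel matrices is
    an illustrative stand-in for the measure in Eq. (6)."""
    m = len(kernels)
    normed = [K / np.linalg.norm(K) for K in kernels]  # Frobenius normalization
    D = np.zeros((m, m))
    for p in range(m):
        for q in range(m):
            D[p, q] = np.linalg.norm(normed[p] - normed[q])
    return D

rng = np.random.default_rng(1)
kernels = [(lambda X: X @ X.T)(rng.standard_normal((6, 3))) for _ in range(4)]
D = dissimilarity_matrix(kernels)
# Any sensible dissimilarity has a zero diagonal and is symmetric.
print(np.allclose(np.diag(D), 0), np.allclose(D, D.T))
```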
Constrained Linear Optimization
We consider an optimization program on unknown variables $z_{pq}$ associated with the dissimilarities $d_{pq}$. All variables can be arranged into a matrix of the following form,
\[
Z = \begin{bmatrix} z_1^{\top} \\ \vdots \\ z_m^{\top} \end{bmatrix}
  = \begin{bmatrix} z_{11} & \cdots & z_{1m} \\ \vdots & \ddots & \vdots \\ z_{m1} & \cdots & z_{mm} \end{bmatrix},
\]
where $z_p^{\top}$ is the $p$-th row of $Z$. The $(p,q)$-th element $z_{pq}$ is interpreted as the indicator of $K_p$ representing $K_q$. In particular, $z_{pq} = 1$ if the $p$-th base kernel is the representative of the $q$-th base kernel, and $z_{pq} = 0$ otherwise. To ensure that each base kernel is represented by exactly one representative kernel, we constrain $\sum_{p=1}^{m} z_{pq} = 1$.
Define the cost of encoding $K_q$ with $K_p$ as $d_{pq} z_{pq}$; then the cost of encoding $K_q$ with $\mathcal{K}$ and the cost of encoding the whole collection $\mathcal{K}$ are $\sum_{p=1}^{m} d_{pq} z_{pq}$ and $\sum_{q=1}^{m}\sum_{p=1}^{m} d_{pq} z_{pq}$, respectively. The goal of selecting a representative subset of $\mathcal{K}$ is that the selected representative kernels should encode $\mathcal{K}$ well according to the dissimilarities, i.e., the encoding cost should be as small as possible. Therefore, we have the following equality constrained minimization program,
\[
\min_{Z} \ \sum_{q=1}^{m}\sum_{p=1}^{m} d_{pq} z_{pq} \qquad (7)
\]
s.t. $\sum_{p=1}^{m} z_{pq} = 1\ \forall q$, $z_{pq} \in \{0,1\}$,
where the objective function corresponds to the total cost of encoding $\mathcal{K}$ via the representatives. Due to the constraints, there would be zero rows in $Z$, which means that the corresponding base kernels are not the representative of any kernel in $\mathcal{K}$. Therefore, the nonzero rows of $Z$ correspond to the representative kernels.
Convex Relaxation
The constraints in Eq. (7) contain binary variables $z_{pq} \in \{0,1\}$, which makes the optimization nonconvex and NP-hard in general. To make the optimization convex, a relaxation of the program is needed. In particular, we relax the binary constraints to $z_{pq} \geq 0$, so that $z_{pq}$ can be viewed as the probability that $K_p$ is the representative of $K_q$. Thus, we have the following convex minimization program,
\[
\min_{Z} \ \sum_{q=1}^{m}\sum_{p=1}^{m} d_{pq} z_{pq} \qquad (8)
\]
s.t. $\sum_{p=1}^{m} z_{pq} = 1\ \forall q$, $z_{pq} \geq 0$.
In this way, we obtain a soft assignment of representatives, i.e., $z_{pq} \in [0,1]$.
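On its own (before the diversity term of Eq. (13) is added), the relaxed program in Eq. (8) decouples into one small linear program per column of $Z$. The sketch below solves it with `scipy.optimize.linprog` on a toy dissimilarity matrix; names and data are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog

def solve_relaxed_assignment(D):
    """Solve Eq. (8): min <D, Z> s.t. each column of Z sums to 1, Z >= 0.
    The LP decouples over columns; each column's mass concentrates on the
    kernel with the smallest dissimilarity to it."""
    m = D.shape[0]
    Z = np.zeros((m, m))
    for q in range(m):
        res = linprog(c=D[:, q], A_eq=np.ones((1, m)), b_eq=[1.0],
                      bounds=[(0, None)] * m)
        Z[:, q] = res.x
    return Z

D = np.array([[0.0, 2.0, 3.0],
              [2.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
Z = solve_relaxed_assignment(D)
print(Z.argmax(axis=0))  # each kernel picks its least-dissimilar representative
```

This also makes explicit why the extra regularization in Eq. (13) matters: without it, each kernel trivially represents itself whenever self-dissimilarity is smallest.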
Multiple Kernel k-Means Clustering by Selecting Representative Kernels
To reduce the redundancy of kernels by selecting representative kernels in the process of multiple kernel clustering, we integrate the strategy of representative kernel selection into the objective function of multiple kernel k-means, and associate $z_{pq}$, the probability of $K_p$ representing $K_q$, with the weight of each base kernel. In particular, we define the weight of the base kernel $K_p$ as the average probability of $K_p$ representing all the base kernels as follows,
\[
\gamma_p = \frac{1}{m}\sum_{q=1}^{m} z_{pq}. \qquad (9)
\]
Since $\sum_{p=1}^{m} z_{pq} = 1$, we have
\[
\sum_{p=1}^{m} \gamma_p = \frac{1}{m}\sum_{q=1}^{m}\sum_{p=1}^{m} z_{pq} = 1, \qquad (10)
\]
which indicates that the weights are valid coefficients of base kernels.
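The bookkeeping of Eqs. (9) and (10) can be checked numerically: when the columns of $Z$ each sum to one, the row means automatically form a valid weight vector. A minimal sketch with a random column-stochastic matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
# A column-stochastic assignment matrix Z: each column sums to one.
Z = rng.random((m, m))
Z /= Z.sum(axis=0, keepdims=True)

# Eq. (9): the weight of base kernel p is the mean of row p of Z.
gamma = Z.mean(axis=1)

# Eq. (10): the weights sum to one automatically.
print(np.isclose(gamma.sum(), 1.0))
```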
Therefore, the optimization objective of multiple kernel k-means clustering in Eq. (5) can be written in the following form,
\[
\min_{H, Z} \ \mathrm{Tr}\!\left(K_{\gamma}\left(I_n - HH^{\top}\right)\right) \qquad (11)
\]
s.t. $H^{\top}H = I_k$, $Z^{\top}\mathbf{1}_m = \mathbf{1}_m$, $Z \geq \mathbf{0}$,
where $\mathbf{1}_m$ denotes a column vector whose elements are all equal to one, $\mathbf{0}$ is the zero matrix of size $m \times m$, and $\gamma$ is defined as follows,
\[
\gamma = \frac{1}{m} Z \mathbf{1}_m. \qquad (12)
\]
Rewriting the representative kernel selection problem in Eq. (8) in matrix form and integrating it into Eq. (11), we obtain the final optimization problem of the proposed algorithm,
\[
\min_{H, Z} \ \mathrm{Tr}\!\left(K_{\gamma}\left(I_n - HH^{\top}\right)\right) + \lambda\,\mathrm{Tr}\!\left(D^{\top} Z\right) \qquad (13)
\]
s.t. $H^{\top}H = I_k$, $Z^{\top}\mathbf{1}_m = \mathbf{1}_m$, $Z \geq \mathbf{0}$, $\gamma = \frac{1}{m} Z \mathbf{1}_m$,
where the parameter $\lambda$ controls the diversity of the representative kernels.
Alternating Optimization
Finally, we solve the optimization problem in Eq. (13). There are two groups of variables, $H$ and $Z$, in Eq. (13), which can be optimized alternately.
Given $Z$, the optimization problem with respect to $H$ is a standard kernel k-means clustering problem, i.e., Eq. (3), and the optimal $H$ can be obtained by taking the eigenvectors that correspond to the $k$ largest eigenvalues of $K_{\gamma}$. Specifically, Eq. (3) can be written as
\[
\max_{H} \ \mathrm{Tr}\!\left(H^{\top} K_{\gamma} H\right) \qquad (14)
\]
s.t. $H^{\top}H = I_k$.
By interpreting the columns of $H$ as a collection of mutually orthonormal basis vectors $h_1, \ldots, h_k$, the objective can then be written as
\[
\sum_{c=1}^{k} h_c^{\top} K_{\gamma} h_c. \qquad (15)
\]
Choosing $h_1, \ldots, h_k$ as the eigenvectors corresponding to the $k$ largest eigenvalues of $K_{\gamma}$ maximizes this objective [Welling2013].
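The eigenvector update for $H$ is a few lines in practice. The sketch below (illustrative names, random PSD input) verifies that the resulting columns are orthonormal and that the trace objective of Eq. (14) attains the sum of the $k$ largest eigenvalues:

```python
import numpy as np

def update_H(K, k):
    """Relaxed indicator H: eigenvectors of K for the k largest eigenvalues
    (Eq. (14)). numpy's eigh returns eigenvalues in ascending order, so the
    last k columns are taken."""
    _, vecs = np.linalg.eigh(K)
    return vecs[:, -k:]

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 4))
K = X @ X.T  # random PSD "combined kernel"
H = update_H(K, k=3)

# H has orthonormal columns, and Tr(H^T K H) equals the sum of the
# three largest eigenvalues of K.
print(np.allclose(H.T @ H, np.eye(3)),
      np.isclose(np.trace(H.T @ K @ H), np.linalg.eigvalsh(K)[-3:].sum()))
```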
Given $H$, the optimization problem with respect to $Z$ can be written in the following form,
\[
\min_{Z} \ \sum_{p=1}^{m} \gamma_p^2\,\mathrm{Tr}\!\left(K_p\left(I_n - HH^{\top}\right)\right) + \lambda\,\mathrm{Tr}\!\left(D^{\top} Z\right) \qquad (16)
\]
s.t. $Z^{\top}\mathbf{1}_m = \mathbf{1}_m$, $Z \geq \mathbf{0}$,
where $\gamma_p = \frac{1}{m} z_p^{\top}\mathbf{1}_m$ and $z_p^{\top}$ is the $p$-th row of $Z$. This optimization problem can be rewritten as
\[
\min_{z} \ \frac{1}{2}\, z^{\top} A z + \lambda\, d^{\top} z \qquad (17)
\]
s.t. $Bz = \mathbf{1}_m$, $z \geq \mathbf{0}$,
where $z = \mathrm{vec}(Z)$, $d = \mathrm{vec}(D)$, $A$ is a positive semidefinite matrix assembled from the terms $\mathrm{Tr}(K_p(I_n - HH^{\top}))$, and $B$ encodes the column-sum constraints. It is obvious that Eq. (17) is a convex quadratic programming (QP) problem with $m^2$ decision variables, $m$ equality constraints, and $m^2$ inequality constraints. Therefore, we can solve it with a standard QP solver [Grant and Boyd2014], and the weights of the base kernels can then be computed with Eq. (9).
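As one possible sketch of this $Z$-subproblem, the snippet below minimizes the objective of Eq. (16) directly over a column-stochastic $Z$, using SciPy's general-purpose SLSQP solver in place of a dedicated QP solver such as CVX; the per-kernel costs, the toy dissimilarity matrix, and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def update_Z(alphas, D, lam):
    """Minimize sum_p gamma_p^2 * alpha_p + lam * <D, Z> over column-stochastic
    Z >= 0, with gamma_p the mean of row p of Z (Eq. (9)) and
    alpha_p = Tr(K_p (I - H H^T)) precomputed per base kernel."""
    m = D.shape[0]

    def objective(z):
        Z = z.reshape(m, m)
        gamma = Z.mean(axis=1)
        return float(gamma ** 2 @ alphas + lam * np.sum(D * Z))

    # One equality constraint per column: the column must sum to one.
    cons = [{"type": "eq",
             "fun": lambda z, q=q: z.reshape(m, m)[:, q].sum() - 1.0}
            for q in range(m)]
    z0 = np.full(m * m, 1.0 / m)  # feasible uniform start
    res = minimize(objective, z0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * (m * m), constraints=cons)
    return res.x.reshape(m, m)

alphas = np.array([1.0, 5.0, 1.0])  # toy per-kernel clustering costs
D = 1.0 - np.eye(3)                 # toy dissimilarities
Z = update_Z(alphas, D, lam=0.1)
print(np.allclose(Z.sum(axis=0), 1.0, atol=1e-4))  # columns remain stochastic
```

A dedicated QP solver exploits the structure of Eq. (17) and scales better; SLSQP is used here only to keep the sketch self-contained.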
The main algorithm of the proposed approach is summarized in Algorithm 1. We analyze the computational complexity of the proposed approach, which is composed of four main parts:

In the beginning, the $m$ base kernel matrices need to be computed, whose cost is $O(mn^2)$ (omitting the feature dimension).

Then the computational complexity of the dissimilarity matrix $D$ with Eq. (6) is $O(m^2 n^2)$.

Next, after obtaining the combined kernel $K_{\gamma}$, the complexity of the eigendecomposition to update $H$ with Eq. (14) is $O(n^3)$ in each iteration.

Finally, the standard QP solver to update $Z$ with Eq. (17) typically needs $O(m^6)$ operations in each iteration.
Assuming that $t$ is the number of iterations, the total complexity of the proposed approach is $O(mn^2 + m^2n^2 + t(n^3 + m^6))$. Since the number of kernels is much smaller than the number of samples in general (for example, $m = 12$ and $n = 414$ for TR11 in our experiments), we have $m^6 < n^3$. Therefore, the final computational complexity is approximated by $O(m^2n^2 + tn^3)$, which is on par with the complexity of the vanilla MKKM.
Experimental Studies
Datasets and Experimental Setup
The clustering algorithms are evaluated on seven benchmark datasets and two Flowers datasets that are frequently used for the performance evaluation of clustering methods. Three of the benchmark datasets are collected from text corpora, and the remaining four are image datasets. The Flowers datasets are collected from http://www.robots.ox.ac.uk/~vgg/data/flowers/. Detailed descriptions of these datasets are presented in Table 1.
Name  # Samples  # Features  # Classes 

TR11  414  6429  9 
TR41  878  7454  10 
TR45  690  8261  10 
JAFFE  213  676  10 
ORL  400  1024  40 
AR  840  768  20 
COIL20  1440  768  20 
Flowers17  1360  7 (# Kernel)  17 
Flowers102  8189  4 (# Kernel)  102 
Dataset  Metric  SBKKM  AMKKM  MKKM  LMKKM  RMKKM  MKKM-MR  Proposed
(Table 2: Acc, NMI, and Purity of each method on the nine datasets; the last row lists the computational complexity of each method.)
Following the strategy adopted by most multiple kernel learning methods, twelve different kernel functions are employed to construct the base kernels for the seven benchmark datasets. Specifically, these kernel functions include one cosine kernel, four polynomial kernels, and seven radial basis function (RBF) kernels whose bandwidths are scaled by the maximum distance between pairwise samples. The kernel matrices for the Flowers datasets are precomputed and downloaded directly from the above website. All of the constructed kernels are normalized. For all clustering methods and datasets, the number of clusters is set to the true number of classes, i.e., we assume the true number of clusters is known in advance. In addition, the parameters of the clustering methods are selected by grid search. In particular, the parameter search scopes of the comparative methods follow the suggestions in their original papers, and the diversity parameter $\lambda$ of the proposed approach is tuned by grid search as well. Besides, three metrics are employed to evaluate the clustering results: clustering accuracy (Acc), normalized mutual information (NMI), and purity. Moreover, to reduce the influence of the random initialization in k-means, all experiments on the different clustering algorithms are repeated and the best results are reported.
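The base kernel construction described above can be sketched as follows. The specific polynomial degrees and RBF bandwidth multipliers below are illustrative assumptions (the exact values are given in the text), and the final normalization rescales each kernel to unit diagonal:

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_base_kernels(X):
    """Construct a pool of base kernels: one cosine kernel, polynomial
    kernels, and RBF kernels. Degrees and bandwidth multipliers are
    illustrative stand-ins for the values used in the experiments."""
    kernels = []
    # Cosine kernel: inner products of row-normalized samples.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    kernels.append(Xn @ Xn.T)
    # Polynomial kernels (illustrative degrees).
    for deg in (2, 3):
        kernels.append((X @ X.T + 1.0) ** deg)
    # RBF kernels with bandwidths scaled by the maximum pairwise distance.
    sq = cdist(X, X, "sqeuclidean")
    d_max = np.sqrt(sq.max())
    for t in (0.1, 1.0, 10.0):
        kernels.append(np.exp(-sq / (2 * (t * d_max) ** 2)))
    # Normalize each kernel to unit diagonal: K_ij / sqrt(K_ii * K_jj).
    out = []
    for K in kernels:
        d = np.sqrt(np.diag(K))
        out.append(K / np.outer(d, d))
    return out

rng = np.random.default_rng(4)
kernels = build_base_kernels(rng.standard_normal((12, 5)))
print(len(kernels), all(np.allclose(np.diag(K), 1.0) for K in kernels))
```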
Comparative Approaches
To demonstrate the competitiveness of our approach, we compare it with the following recently proposed methods for multiple kernel k-means clustering:

Single Best Kernel k-Means (SBKKM): This approach performs kernel k-means on every single kernel and reports the best result among them.

Average Multiple Kernel k-Means (AMKKM): In this case, the final kernel is constructed as a linear combination of the single kernels with equal weights.

Multiple Kernel k-Means (MKKM): As introduced above, MKKM conducts kernel k-means clustering and updates the kernel coefficients alternately [Yu et al.2012].

Localized Multiple Kernel k-Means (LMKKM): LMKKM assigns each single kernel a sample-specific weight such that the final kernel takes the form of a localized combination [Gönen and Margolin2014].

Robust Multiple Kernel k-Means (RMKKM): To improve the robustness of MKKM, RMKKM replaces the squared Euclidean distance between the data point and the cluster center with the $\ell_{2,1}$-norm [Du et al.2015].

Multiple Kernel k-Means with Matrix-induced Regularization (MKKM-MR): MKKM-MR constrains the objective function of MKKM with a matrix-induced regularization term to reduce the redundancy between base kernels [Liu et al.2016].
Results and Discussion
The experimental results with respect to Acc, NMI, and Purity are reported in Table 2, in which the best results are boldfaced and the last row gives the computational complexity. From these results, we draw the following conclusions:

In comparison with the six competitive approaches, the proposed approach obtains the best results on eight out of nine datasets with respect to Acc and NMI, and is only slightly inferior to MKKM-MR and RMKKM on datasets TR11 and TR45, respectively. As for Purity, our approach beats the competitive approaches on all nine datasets. Therefore, the proposed approach is superior to the comparative approaches.

The single best kernel k-means method performs better than multiple kernel k-means with equal weights on several datasets, which indicates that inappropriate kernel functions can degrade the performance of the kernel k-means algorithm, and highlights the importance of kernel selection in multiple kernel k-means methods.

The performance of vanilla MKKM is slightly inferior to the single best kernel k-means in most cases. However, appropriate strategies for learning the kernel weights, such as LMKKM and RMKKM, improve multiple kernel k-means and usually obtain better performance than the single best kernel k-means.

The superior results obtained by MKKM-MR and our approach reveal that enhancing the diversity between pairwise base kernels has a beneficial effect on the performance of multiple kernel k-means. In addition, by characterizing the pre-specified kernels with the representative kernels, the proposed approach improves upon MKKM-MR in terms of effectiveness.
In a nutshell, these observations demonstrate the advantages and effectiveness of the proposed approach.
Parameter Sensitivity and Convergence
The parameter $\lambda$ in the objective function of the proposed approach controls the diversity of the base kernels. To analyze the effect of $\lambda$ on the clustering performance, we illustrate the results on one image dataset, ORL, and one document dataset, TR11, in Figure 1(a) and Figure 1(b), respectively. As we can see, the performance on the image dataset ORL is stable with respect to $\lambda$. For dataset TR11, the clustering performance drops to a minimum as $\lambda$ increases and keeps stable afterward.
In addition, we illustrate the obtained matrix $Z$ on dataset ORL with different values of the diversity parameter $\lambda$ in Figure 2. It can be observed that when $\lambda$ is small, many base kernels select more than one kernel as their representatives with moderate probabilities (indicated by gray and black colors). However, as the value of $\lambda$ becomes large, more base kernels select just one kernel as their representative. In particular, for the largest $\lambda$, only a few base kernels are selected as representatives (nonzero rows), and many of the probabilities are close to 1.
Moreover, the effects of the regularization parameter on the number of selected kernels in the proposed approach and in MKKM-MR are shown in Figure 3(a) and Figure 3(b), respectively. In contrast to the trend of MKKM-MR, where the number of selected kernels first increases and then decreases as the regularization parameter grows, the number of selected kernels obtained by our approach fluctuates but tends to decrease in the long run on datasets ORL and TR11. These results indicate that the proposed approach is more explainable, since an algorithm with a larger regularization parameter is expected to select fewer base kernels.
Finally, the objective value of the proposed approach at each iteration is plotted in Figure 4, from which we can observe that our approach converges within a small number of iterations in most cases.
Conclusion
This paper presents a new approach for multiple kernel clustering that selects representative kernels to improve the quality of the combined kernel. More concretely, we first devise a strategy to select a diverse subset of the pre-specified kernels, and then incorporate this representative kernel selection strategy into the objective function of the multiple kernel k-means method. Finally, an alternating optimization method is developed to optimize the cluster membership and the kernel weights alternately. Experimental results on several benchmark and real-world datasets validate the advantages and effectiveness of the proposed approach. In future work, we plan to develop a customized optimization method for the proposed approach by resorting to the alternating direction method of multipliers framework, in order to reduce the computational complexity.
References
 [Bickel and Scheffer2004] Bickel, S., and Scheffer, T. 2004. Multi-view clustering. In Proceedings of the 4th IEEE International Conference on Data Mining, 19–26. IEEE.
 [Chao, Sun, and Bi2017] Chao, G.; Sun, S.; and Bi, J. 2017. A survey on multi-view clustering. arXiv preprint arXiv:1712.06246.
 [Chaudhuri et al.2009] Chaudhuri, K.; Kakade, S. M.; Livescu, K.; and Sridharan, K. 2009. Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th International Conference on Machine Learning, 129–136. ACM.

 [Ding and He2004] Ding, C., and He, X. 2004. K-means clustering via principal component analysis. In Proceedings of the 21st International Conference on Machine Learning, 29. ACM.
 [Ding et al.2015] Ding, Y.; Zhao, Y.; Shen, X.; Musuvathi, M.; and Mytkowicz, T. 2015. Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In Proceedings of the 32nd International Conference on Machine Learning, 579–587.

 [Du et al.2015] Du, L.; Zhou, P.; Shi, L.; Wang, H.; Fan, M.; Wang, W.; and Shen, Y.-D. 2015. Robust multiple kernel k-means using l21-norm. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, 3476–3482. AAAI Press.
 [Elhamifar, Sapiro, and Sastry2016] Elhamifar, E.; Sapiro, G.; and Sastry, S. S. 2016. Dissimilarity-based sparse subset selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(11):2182–2197.
 [Elhamifar, Sapiro, and Vidal2012] Elhamifar, E.; Sapiro, G.; and Vidal, R. 2012. Finding exemplars from pairwise dissimilarities via simultaneous sparse recovery. In Advances in Neural Information Processing Systems, 19–27.
 [Georgogiannis2016] Georgogiannis, A. 2016. Robust k-means: A theoretical revisit. In Advances in Neural Information Processing Systems, 2891–2899.
 [Girolami2002] Girolami, M. 2002. Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks 13(3):780–784.
 [Gönen and Alpaydın2011] Gönen, M., and Alpaydın, E. 2011. Multiple kernel learning algorithms. Journal of Machine Learning Research 12(Jul):2211–2268.
 [Gönen and Margolin2014] Gönen, M., and Margolin, A. A. 2014. Localized data fusion for kernel k-means clustering with application to cancer biology. In Advances in Neural Information Processing Systems, 1305–1313.
 [Grant and Boyd2014] Grant, M., and Boyd, S. 2014. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx.
 [Hartigan1975] Hartigan, J. A. 1975. Clustering Algorithms. Wiley.
 [Huang, Chuang, and Chen2012] Huang, H.-C.; Chuang, Y.-Y.; and Chen, C.-S. 2012. Multiple kernel fuzzy clustering. IEEE Transactions on Fuzzy Systems 20(1):120–134.
 [Kulis, Sustik, and Dhillon2009] Kulis, B.; Sustik, M. A.; and Dhillon, I. S. 2009. Low-rank kernel learning with Bregman matrix divergences. Journal of Machine Learning Research 10(Feb):341–376.
 [Kumar and Daumé2011] Kumar, A., and Daumé, H. 2011. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on Machine Learning, 393–400.
 [Liu et al.2016] Liu, X.; Dou, Y.; Yin, J.; Wang, L.; and Zhu, E. 2016. Multiple kernel k-means clustering with matrix-induced regularization. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, 1888–1894.
 [Lu et al.2014] Lu, Y.; Wang, L.; Lu, J.; Yang, J.; and Shen, C. 2014. Multiple kernel clustering based on centered kernel alignment. Pattern Recognition 47(11):3656–3664.
 [Newling and Fleuret2016] Newling, J., and Fleuret, F. 2016. Nested mini-batch k-means. In Advances in Neural Information Processing Systems, 1352–1360.

 [Nilsback and Zisserman2006] Nilsback, M.-E., and Zisserman, A. 2006. A visual vocabulary for flower classification. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, 1447–1454. IEEE.
 [Schölkopf, Smola, and Müller1998] Schölkopf, B.; Smola, A.; and Müller, K.-R. 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10(5):1299–1319.
 [Wang et al.2017] Wang, Y.; Liu, X.; Dou, Y.; and Li, R. 2017. Approximate large-scale multiple kernel k-means using deep neural network. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, 3006–3012. AAAI Press.
 [Wang, Nie, and Huang2013] Wang, H.; Nie, F.; and Huang, H. 2013. Multi-view clustering and feature learning via structured sparsity. In Proceedings of the 30th International Conference on Machine Learning, 352–360.
 [Welling2013] Welling, M. 2013. Kernel k-means and spectral clustering.
 [Yu et al.2012] Yu, S.; Tranchevent, L.; Liu, X.; Glanzel, W.; Suykens, J. A.; De Moor, B.; and Moreau, Y. 2012. Optimized data fusion for kernel k-means clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(5):1031–1039.
 [Zhao, Kwok, and Zhang2009] Zhao, B.; Kwok, J. T.; and Zhang, C. 2009. Multiple kernel clustering. In Proceedings of the 2009 SIAM International Conference on Data Mining, 638–649. SIAM.
 [Zhou and Zhao2016] Zhou, Q., and Zhao, Q. 2016. Flexible clustered multi-task learning by learning representative tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2):266–278.
 [Zhu et al.2018] Zhu, X.; Liu, X.; Li, M.; Zhu, E.; Liu, L.; Cai, Z.; Yin, J.; and Gao, W. 2018. Localized incomplete multiple kernel k-means. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 3271–3277. AAAI Press.