I Introduction
TABLE I: Comparison of KMSA with typical methods. The table contrasts MDcR [zhang2017flexible], MSE [Xia2010], GMA [sharma2012generalized], CCA [michaeli2016nonparametric], MvDA [kan2016multi], CoRegu [kumar2011co] and KMSA (Ours) on three properties: being data-driven, supporting self-weighted learning, and providing a general framework. Each compared method lacks at least one of these properties; only KMSA possesses all three.
With the development of information technology, we have witnessed a surge of techniques that describe the same sample from multiple views [Xia2010, guo2018partial, LiG2012, wang2017unsupervised, Yang2017TCYB]. Multiview data generated from various descriptors [wei2018glad] or sensors are common in real-world applications [XuW2018, cao2015diversity, zhang2017latent], which has accelerated research on multiview learning [zhang2016deep]. For example, an image can be represented by different descriptors, such as Local Binary Patterns (LBP) [Ojala2002], Scale-Invariant Feature Transform (SIFT) [rublee2011orb], Histogram of Oriented Gradients (HOG) [dalal2005histograms] and Locality-constrained Linear Coding (LLC) [wang2010locality]. For text analysis [dong2018predicting], documents can be written in different languages [bisson2012co]. Notably, multiview data often share consistent correlation information [wang2015robust, wang2016iterative, wang2018multiview], which is crucial for promoting the performance of related tasks [dhillon2011multi, li2002statistical, zhang2018generalized, liu2018late].
Nowadays, multiview dimension reduction (DR) methods have been well studied in many applications [wu2018and, nie2018auto, nie2018multiview]. In particular, Kumar et al. [kumar2011co] proposed a multiview spectral embedding approach that introduces a co-regularized framework to narrow the divergence between the graphs of multiple views. Xia et al. [Xia2010] introduced an auto-weighted method to construct common low-dimensional representations for multiple views, which achieves good performance in image retrieval and clustering. Wang et al. [wang2018multiview] exploited the consensus of multiview structures beyond low-rankness to construct low-dimensional representations for multiview data and boost clustering performance. Kan et al. [kan2016multi] extended Linear Discriminant Analysis (LDA) [izenman2013linear] to Multiview Discriminant Analysis (MvDA), which updates the projection matrices of all views through an iterative procedure. Luo et al. [luo2015tensor] proposed a tensor CCA to deal with multiview data in general tensor form; tensor CCA is an extension of CCA [michaeli2016nonparametric], which has achieved ideal performance in many applications. Zhang et al. [zhang2017flexible] proposed a novel method that flexibly exploits the complementary information between multiple views during dimension reduction while preserving the similarity of data points across different views.
Up to now, most multiview DR methods [kumar2011co, Xia2010, nie2018auto] are graph-based approaches [cui2017general] that focus on data correlations while overlooking the information within the multiview data representations themselves. Such limitations hold for abundant research [kumar2011co, nie2018auto]. To name a few typical works: Multiview Spectral Embedding (MSE) [Xia2010] is an extension of Laplacian Eigenmaps (LE) [belkin2002laplacian] and considers the Laplacian graphs between multiview data rather than the information within the data representation. Kumar et al. [kumar2011co] likewise exploited only the information within the Laplacian graphs and utilized a co-regularized term to minimize the divergence between different views; this method also fails to exploit the information within the multiview data representation. Although some approaches, such as MvDA [kan2016multi] and CCA [michaeli2016nonparametric], fully consider the original multiview data and extend traditional DR [mika1999fisher] to the multiview setting, they fail to provide a general framework that covers most DR approaches. Therefore, our goal is to construct a general framework that integrates features from multiple views into low-dimensional representations while achieving ideal performance.
In this paper, we aim to develop a unified framework to project multiview data into a low-dimensional subspace. The proposed KMSA is equipped with a self-weighted learning method that assigns different weights to multiple views according to their contributions. We also discuss how the parameter in KMSA influences the learned weights of multiple views. Furthermore, KMSA adopts a co-regularized term to minimize the divergence between each pair of views, which encourages all views to learn from each other. The construction process of KMSA is shown in Fig. 1. We compare the proposed KMSA with some typical methods in TABLE I.
We remark that Yan et al. [yan2007graph] proposed a framework of dimension reduction techniques. Different from that work, our proposed KMSA extends the framework into kernel space for multiple views, which addresses the problems caused by the differing feature dimensions of multiple views. KMSA then adopts a self-weighted learning trick to assign different weights to these views according to their contributions. Finally, KMSA is equipped with a co-regularized term to minimize the divergence between different views, so as to achieve multiview consensus.


Notation  Description 
set of all features in the th view  
set of all lowdimensional representations in the th view  
the th feature in the th view  
the lowdimensional representation for  
the dimension of features in the th view  
the sparse relationships for the th feature in the th view  
the projection direction for the th view  
the sparse reconstructive matrix for features in the th view  
kernel matrix for features in the th view  
coefficients matrix for the th view  
coefficients vector for the th view 

the weighting factor for the th view  
the power exponent for the weight  
the constraint matrix for the th view  

The major contributions of this paper are summarized as follows:

We developed a novel framework named KMSA for the task of multiview dimension reduction. We show that most eigendecomposition-based DR methods [jolliffe2011principal, he2004locality] can be extended to corresponding multiview versions through KMSA.

KMSA fully considers both the graph within each single view and the correlations between multiple views to calculate the importance of all views; it combines self-weighted learning with a co-regularized term so as to deeply exploit the information in multiview data.

We discuss the details of the optimization process for KMSA. Experimental results show that our proposed method achieves state-of-the-art performance.
II Kernel-based Multiview Embedding with Self-weighted Learning
In this section, we discuss the intuition of our proposed method named KMSA.
Assume we are given a multiview dataset which consists of training samples observed under multiple views, where each view contains all features from that view and has its own feature dimension, while the number of training samples is shared across views. The goal of KMSA is to construct an appropriate architecture to obtain low-dimensional representations for the original multiview data. Notations utilized in this paper are summarized in TABLE II.
II-A Kernelization for Multiview Data
The proposed KMSA extends single-view DR methods into kernel spaces, which provides a feasible way to manipulate the multiview data directly rather than only their similarity graphs. Before taking the kernel space into consideration, KMSA exploits the heterogeneous information of each view as follows:
(1) 
where the objective involves the projection vector of the view and the correlation between each pair of samples in that view; the constraint matrix is chosen according to the respective constraints of the various dimension reduction algorithms. Most algorithms can be generated automatically by using different constructions of the affinity and constraint matrices, as illustrated in [yan2007graph]. The objective can be further expressed in terms of the graph Laplacian according to the mathematical transformation in [yan2007graph], where the degree matrix is diagonal and each of its entries equals the corresponding row sum of the affinity matrix. In order to enable KMSA to handle multiview data, we project all feature representations into a kernel space via a nonlinear mapping function, which maps the features of each view into the kernel space.
Then, we extend Eq. 1 into the kernel representation as follows:
(2) 
where the projection direction lies in the space spanned by the mapped features and can consequently be replaced with a linear combination of them. Then, Eq. 2 can be further modified as follows:
(3) 
where the kernel matrix is symmetric, and the constraint matrix is set in correspondence with the single-view formulation. Therefore, if we want to obtain an optimal subspace of a given dimension, the eigenvectors corresponding to the largest positive eigenvalues can be utilized to construct the subspace, which is equivalent to finding the coefficient matrix as follows:
(4) 
The low-dimensional representations of the original features are then obtained from the kernel matrix and the coefficient matrix. Even though extending DR methods into the kernel space avoids the problem that the feature dimensions of multiple views differ from each other, the construction procedures for the views are still independent and waste much information from the other views.
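To make the per-view construction concrete, the following is a minimal numpy sketch of the kernelized embedding step described above, under illustrative assumptions: an RBF kernel stands in for the (unspecified) kernel choice, a generic affinity matrix `W` with its row-sum degree matrix plays the role of the graph, and the function names are ours, not from the paper's implementation.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # K_ij = exp(-gamma * ||x_i - x_j||^2): a common kernel choice
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def kernel_embedding(K, W, d, eps=1e-8):
    """Per-view kernelized graph embedding: find coefficients A minimizing
    tr(A^T K L K A) subject to A^T K D K A = I, with L = D - W the graph
    Laplacian. Low-dimensional representations are then Y = K @ A."""
    Dg = np.diag(W.sum(axis=1))
    L = Dg - W
    S = K @ L @ K                           # objective matrix
    C = K @ Dg @ K + eps * np.eye(len(K))   # regularized constraint matrix
    # reduce the generalized problem S a = lam C a to a standard symmetric one
    ec, Uc = np.linalg.eigh(C)
    C_ih = Uc @ np.diag(ec ** -0.5) @ Uc.T  # C^{-1/2}
    ev, U = np.linalg.eigh(C_ih @ S @ C_ih) # eigenvalues in ascending order
    A = C_ih @ U[:, :d]                     # keep the d smallest eigenvalues
    return A
```

The low-dimensional representations of a view are then `Y = K @ A`. Note the sketch keeps the smallest eigenvalues of the Laplacian form, which corresponds, up to the sign of the objective, to the largest-eigenvalue formulation mentioned above.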
II-B Self-weighted Learning of the Weights for Multiple Views
In order to integrate information from multiple views, the most straightforward way is to minimize the sum of Eq. 4 over all views. We then obtain the following objective function:
(5) 
However, different views make different contributions to the objective value in Eq. 5, and some adversarial views may even contribute negatively to the final low-dimensional representations. Therefore, it is rational to treat these views discriminatively. We assign different weighting factors to these views while learning the low-dimensional representations. The self-weighted learning strategy is proposed below:
(6) 
where the weights are nonnegative and sum to one, and a trade-off parameter balances the two terms above. The simplex constraint ensures that all views make particular contributions [Xia2010] to the final low-dimensional representations; otherwise, only one entry of the weight vector would be one while the other entries would be zero. The second term in Eq. 6 minimizes a power of the norm of the weight vector, which makes the weights as non-sparse as possible: this norm term achieves its minimum when all weights are equal. Therefore, the second term in Eq. 6 further promotes the participation of all views. Together, these two tricks equip the views with different weights according to their contributions.
According to Eq. 6, we can obtain the low-dimensional representations of all views simultaneously. However, the construction process of each view cannot exploit information from the other views: even though different views receive different weights, the learned representations are equal to those obtained by Eq. 4. We therefore propose a co-regularized term to help all views learn from each other.
II-C Minimizing the Divergence between Different Views via a Co-regularized Term
Since multiview learning aims to enable all views to learn from each other to improve the overall performance, it is essential for KMSA to integrate compatible and complementary information from all views. Some researchers [kumar2011co] have attempted to minimize the divergence between low-dimensional representations via various co-regularized terms, which facilitates information transfer across views. However, the low-dimensional representations cannot be obtained directly through Eq. 6, which prevents us from utilizing those methods without modification.
Because the coefficient matrix reconstructs the low-dimensional representations, each of its columns can be regarded as a coding of the original samples. Therefore, KMSA attempts to minimize the divergence between the coefficient matrices of each pair of views as follows:
(7) 
We define for each view a graph that contains the relationships between all its features, whose entries are given by the inner products of the corresponding codings. Minimizing Eq. 7 urges each pair of views to learn from each other and bridges the gap between them. Furthermore, the divergence can be rewritten in trace form through mathematical deductions [kumar2011co], and we utilize Eq. 7 as a regularized term in the proposed KMSA in the following.
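As an illustration of such a co-regularized term, here is a small numpy sketch in the spirit of the pairwise disagreement of Kumar et al. [kumar2011co]: each view's coefficient matrix induces a similarity graph, and mismatched graphs are penalized. The function name and the Frobenius normalization are our illustrative choices.

```python
import numpy as np

def coreg_divergence(A_v, A_w):
    """Disagreement between the coefficient matrices of two views: each
    matrix induces a similarity graph S = A A^T; mismatched graphs are
    penalized. Returns a value in [-1, 0]; -1 means the graphs coincide."""
    S_v = A_v @ A_v.T
    S_w = A_w @ A_w.T
    S_v = S_v / np.linalg.norm(S_v)   # Frobenius normalization
    S_w = S_w / np.linalg.norm(S_w)
    return -np.trace(S_v @ S_w)       # more negative = more agreement
```

Minimizing this quantity over pairs of views pulls the induced similarity graphs toward each other, which is the effect the co-regularized term above is meant to achieve.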
II-D Overall Objective Function
Based on the above, we propose the overall objective function as follows:
(8) 
where the co-regularization coefficient is a negative constant. It is notable that the weights and coefficient matrices are learned automatically by considering both the graph of each view and the correlations among multiple views, which yields better solutions. This has 2 advantages as follows:

The weights can better reflect the influence of the regularized term between each pair of views. Compared with our proposed KMSA, some multiview learning methods [kumar2011co] have many parameters to set, and this matter gets even worse as the number of views increases. Fortunately, only one parameter needs to be set for KMSA, which better balances the influence of the co-regularized term.

The learning process of the weights fully considers the correlations between different views. Minimizing Eq. 8 means that similar views obtain larger weights, so the obtained low-dimensional representations incline toward the consistent views while avoiding the disturbance of adversarial views, as illustrated in Fig. 2.
We can obtain the low-dimensional representations of these views from the kernel matrices and the coefficient matrices, which can be calculated from Eq. 8 by eigenvalue decomposition.
II-E Optimization Process for KMSA
In this section, we provide the optimization process for KMSA. We develop an alternating optimization strategy that separates the problem into several tractable subproblems: we alternately update each variable while fixing the others. The optimization process is summarized in Algorithm 1.
Updating the coefficient matrix of one view: By fixing all variables but that coefficient matrix, Eq. 8 reduces to the following equation when constant additive and scaling terms are ignored:
(9) 
which has a feasible solution and can be transformed according to the operational rules of the matrix trace as follows:
(10) 
With the orthogonality constraint, the optimal coefficient matrix can be solved by generalized eigendecomposition: it consists of the eigenvectors corresponding to the smallest eigenvalues. Each view's coefficient matrix can be updated in turn by the above procedure.
Updating the weights: After the coefficient matrices are fixed as above, we update the weights. By using a Lagrange multiplier to take the simplex constraint into consideration, we obtain the Lagrange function as
(11) 
Setting the derivatives of the Lagrange function with respect to the weights and the multiplier to zero, we get
(12) 
where
(13) 
Because the weights sum to one, the expression can be further transformed as
(14) 
where the normalization follows from the simplex constraint. Therefore, we obtain the closed-form weights as
(15) 
It is notable that the value of the power exponent directly influences the weighting factors. We analyze this influence as follows:

If the exponent infinitely approaches one, there is only one nonzero weight, corresponding to the view whose cost is the smallest among all views.

Conversely, if the exponent is infinite, all weights tend to be equal.
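The two limiting cases above can be checked numerically with a small sketch of the closed-form weight update in the style of Eq. (15); the per-view costs below are made-up numbers and the function name is ours.

```python
import numpy as np

def view_weights(costs, gamma):
    """Closed-form self-weighting: view v with per-view cost d_v gets a
    weight proportional to (1/d_v)^(1/(gamma-1)), normalized to sum to one
    (gamma > 1 assumed)."""
    d = np.asarray(costs, dtype=float)
    w = d ** (1.0 / (1.0 - gamma))
    return w / w.sum()
```

For example, with costs `[0.5, 1.0, 2.0]`, a `gamma` close to 1 concentrates almost all weight on the cheapest view, while a large `gamma` makes the weights nearly uniform, matching the two limiting cases above.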
After the weights and coefficient matrices are obtained, the low-dimensional representations for each view can be calculated by Eq. 16:
(16) 
II-F Convergence Analysis of KMSA
Because our proposed KMSA is solved by an alternating optimization strategy, it is essential to analyze its convergence.
Theorem 1. The objective function in Eq. 8 is bounded, and the proposed optimization algorithm monotonically decreases its value at each step.
Lower Bound: It is easy to see that there must exist one view whose per-view objective value is the smallest among all views, and there must exist a pair of views whose co-regularized term is the largest among all pairs of views. Because the weights lie on the simplex, it is provable that the overall objective is no smaller than a combination of these extremal values. Therefore, the objective has a lower bound.
Monotone Decreasing: During the optimization process, eigenvalue decomposition is adopted to solve for the coefficient matrices. Consider the coefficient matrix of one view calculated in a given main iteration. Because the solving method is based on eigenvalue decomposition, only the eigenvectors corresponding to the smallest eigenvalues are retained. Therefore, when this matrix is updated in the next main iteration, it always holds that
(17) 
where the constant collects all terms that remain unchanged, and the retained values are the smallest eigenvalues of the corresponding matrix. Furthermore, the update of the weights adopts gradient descent, which always changes them so as to make the objective smaller.
Convergence Explanation: Denote the objective value after each main iteration of the proposed optimization; by the theorem above, these values form a bounded-below, monotonically decreasing sequence. Therefore, according to the bounded monotone convergence theorem [rudin1976principles], which asserts the convergence of every bounded monotone sequence, the proposed optimization algorithm converges.
Meanwhile, in order to further show the convergence of the proposed KMSA, we plot the objective function values over the iterations. We extended LDA and PCA into multiview mode using KMSA, naming them KMSALDA and KMSAPCA, and recorded their objective function values against the number of iterations on the Corel1K, Caltech101 and ORL datasets, as shown in Fig. 6.
It can be seen that the objective function values of both KMSALDA and KMSAPCA decrease as the iterations increase and become stable after 10 to 12 iterations, which verifies that the proposed KMSA converges once enough iterations are finished.
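The alternating scheme and its monotone-decrease argument can be illustrated with a generic driver on a toy quadratic. This is a sketch of the optimization pattern only, not the paper's actual KMSA solver; the toy objective and step functions are our own.

```python
def alternating_minimize(steps, state, objective, max_iter=100, tol=1e-9):
    """Generic alternating-optimization driver of the kind analyzed above:
    each step exactly minimizes the objective over one block of variables,
    so the objective values form a bounded, monotonically decreasing
    sequence and the iteration converges."""
    history = [objective(state)]
    for _ in range(max_iter):
        for step in steps:
            state = step(state)
        history.append(objective(state))
        if history[-2] - history[-1] < tol:   # objective has stalled
            break
    return state, history

# toy problem: f(x, y) = (x - 1)^2 + (y + 2)^2 + x * y
f = lambda s: (s[0] - 1) ** 2 + (s[1] + 2) ** 2 + s[0] * s[1]
step_x = lambda s: (1.0 - s[1] / 2.0, s[1])   # argmin over x with y fixed
step_y = lambda s: (s[0], -2.0 - s[0] / 2.0)  # argmin over y with x fixed
state, history = alternating_minimize([step_x, step_y], (0.0, 0.0), f)
```

Each sweep can only decrease `f`, and `f` is bounded below, so the recorded history is a bounded monotone sequence, mirroring the convergence argument for KMSA.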
II-G Extending Various DR Algorithms via KMSA
To facilitate related research, we provide the typical constructions of the affinity and constraint matrices for several DR algorithms as follows:
1. PCA: the affinity assigns equal weight to every pair of samples, with an identity constraint matrix.
2. LPP: the affinity is nonzero when two samples are k-nearest neighbours of each other in the given view; the constraint matrix is diagonal, with each entry equal to the sum of the corresponding row of the affinity matrix.
3. LDA: the affinity connects samples that share a class label, weighted by the inverse of the class size, and is zero otherwise.
4. SPP: the affinity matrix is constructed by sparse representation [qiao2010sparsity].
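As a concrete illustration of these constructions, the following numpy sketch builds the LPP-style heat-kernel kNN affinity and the LDA-style class-based affinity; the function names, parameter names and defaults are our illustrative choices.

```python
import numpy as np

def lpp_graph(X, k=5, t=1.0):
    """kNN heat-kernel affinity used by LPP (one instance of the affinity
    in the framework above); D is the diagonal row-sum matrix."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[1:k + 1]    # k nearest neighbours, skip self
        W[i, nn] = np.exp(-d2[i, nn] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)                 # symmetrize
    D = np.diag(W.sum(axis=1))
    return W, D

def lda_graph(labels):
    """LDA-style affinity: 1/n_c when samples i and j share class c, else 0."""
    labels = np.asarray(labels)
    n = len(labels)
    W = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    return W
```

Plugging either affinity (with its constraint matrix) into the per-view kernelized objective yields the corresponding multiview extension, which is how KMSA instantiates different DR algorithms.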
III Experiment
In order to verify the performance of our proposed framework, we conduct several experiments on image retrieval (including Corel1K^1, Corel5K and Holidays^2) and image classification (including the Caltech101^3 and ORL^4 datasets) as well as 3Sources^5. In this section, we first introduce the details of the utilized datasets and comparison methods in III-A. Then, we report the experiments in III-B and III-C.
^1 https://sites.google.com/site/dctresearch/Home/contentbasedimageretrieval
^2 http://lear.inrialpes.fr/ jegou/data.php
^3 http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
^4 https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
^5 http://erdos.ucd.ie/datasets/3sources.html
III-A Datasets and Comparison Methods
This section introduces the utilized datasets and comparison methods. We conducted our experiments on image retrieval and multiview data classification: Corel1K, Corel5K and Holidays are utilized for image retrieval, while Caltech101, ORL and 3Sources are utilized for multiview data classification. The details of the utilized datasets are as follows:
Corel1K is an image dataset for image retrieval. It contains 1000 images from 10 categories, including bus, dinosaur, beach, flower, etc., with 100 images in each category.
Corel5K is an extended version of Corel1K for image retrieval. It contains 5000 images from 50 categories, comprising the images of Corel1K and others, with 100 images per category.
Holidays contains 1491 images from 500 categories, mainly captured from various sceneries. The Holidays dataset is utilized for the image retrieval experiment.
Caltech101 consists of 9145 images corresponding to 101 object categories plus one background category. It is a benchmark image dataset for image classification.
ORL is a face dataset for classification. It consists of 400 face images of 40 people; each person has 10 face images captured under different conditions.
3Sources was collected from 3 well-known online news sources: BBC, Reuters and the Guardian. Each source is treated as one view, and 3Sources consists of 169 news stories in total.
We summarize the view information of these datasets in TABLE III. In our experiments, we utilize several well-known multiview subspace learning algorithms as comparison methods, including MDcR [zhang2017flexible], MSE [Xia2010], PCAFC [jolliffe2011principal], GMA [sharma2012generalized], CCA [michaeli2016nonparametric] and MvDA [kan2016multi]. Note that GMA can also extend some DR methods into multiview mode; in this paper, GMA denotes the multiview extension of PCA. PCAFC concatenates the multiview data into one vector and utilizes PCA to obtain the low-dimensional representation. The parameters of the proposed KMSA are fixed across our experiments.


Dataset  View 1  View 2  View 3 
Corel1K  MSD  Gist  HOG 
Corel5K  MSD  Gist  HOG 
Holidays  MSD  Gist  HOG 
Caltech101  MSD  Gist  HOG 
ORL  GSI  LBP  EDH 
3Sources  BBC  Reuters  Guardian 

III-B Image Retrieval
In this section, we conducted experiments on Corel1K, Corel5K and Holidays datasets for image retrieval.
For the Corel1K dataset, we randomly selected 100 images as queries (10 images per class), with the remaining images as the gallery. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are utilized to extract the features of the multiple views. All methods project the multiview features into a 50-dimensional subspace, and a distance-based ranking is adopted for image retrieval. All experiments were conducted on the low-dimensional representations from the best view. We repeated the experiment 20 times and calculated the mean values of Precision (P), Recall (R) and F1-Measure (F1). The results are shown in Fig. 15.
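The retrieval protocol can be sketched as follows; this is an illustrative re-implementation of precision, recall and F1 at rank k with Euclidean distance on the low-dimensional representations, not the authors' evaluation code.

```python
import numpy as np

def retrieval_metrics(query, gallery, labels, query_label, k=10):
    """Precision, recall and F1 at rank k for a single query, ranking the
    gallery by Euclidean distance in the learned low-dimensional space."""
    dist = np.linalg.norm(gallery - query, axis=1)
    topk = np.argsort(dist)[:k]                       # k nearest gallery items
    hits = int(np.sum(labels[topk] == query_label))   # relevant among top k
    relevant = int(np.sum(labels == query_label))     # all relevant items
    p = hits / k
    r = hits / relevant if relevant else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```

Averaging these per-query scores over all queries and repetitions yields the reported mean P, R and F1 values.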
Methods / Criteria  MDcR  MSE  PCAFC  GMA  CCA  MvDA  CoRegu  KMSAPCA  KMSALDA 
Precision  77.69  77.48  62.84  77.91  77.07  80.24  78.04  78.84  80.73 
Recall  60.05  59.81  48.49  60.14  59.36  61.91  60.06  60.58  62.21 
mAP  89.08  88.74  77.22  89.22  88.43  90.02  88.92  89.64  90.77 
F1Measure  33.87  33.75  27.37  33.94  33.53  34.95  33.94  34.26  35.14 
It is clear that KMSAPCA achieves better performance than the other unsupervised multiview algorithms, while KMSALDA outperforms MvDA. This shows that KMSA is an effective framework for extending DR algorithms to the multiview case. Furthermore, even though PCAFC concatenates all views into one single vector, it cannot achieve good performance because PCA is essentially a single-view method.
For the Corel5K dataset, we randomly selected 500 images as queries (10 images per class), with the remaining images as the gallery. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are again utilized as the descriptors for the multiple views. All methods project the multiview features into a 50-dimensional subspace for image retrieval. The experimental settings are the same as for Corel1K, and the results are shown in Fig. 20.
As can be seen in Fig. 20, KMSALDA outperforms all the other methods in most situations, while KMSAPCA performs best among the unsupervised methods. MDcR and CoRegu [kumar2011co] are two further strong methods. PCAFC performs worst because it cannot fully exploit the information in multiview data.
For the Holidays dataset, each class contains 3 images; for each class, we randomly selected 1 image as the query and the other 2 as the gallery. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are exploited to extract the multiview features, and all methods project them into a 50-dimensional subspace. The experiments were conducted 20 times, and the mean values of the indices are reported in TABLE IV:
From TABLE IV, we find that KMSAPCA and KMSALDA achieve the best performance in most situations, and CoRegu and MvDA also obtain good results. Since PCAFC is essentially a single-view method, it achieves the worst performance.
III-C Classification for Multiview Data
In this section, we conducted classification experiments on 3 datasets (Caltech101, ORL and 3Sources) to verify the effectiveness of our proposed method.
For the Caltech101 dataset, we randomly selected 30% and 50% of the samples as training ones, with the remaining samples as testing ones. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are utilized to extract the features of the multiple views. All methods project the multiview features into subspaces of different dimensions, and 1NN is utilized to classify the testing samples. This experiment was conducted 20 times, and the mean results of all methods are shown in Fig. 23.
Percentage  Dim  MDcR  MSE  PCAFC  GMA  CCA  MvDA  CoRegu  KMSAPCA  KMSALDA 
30  10  58.10  63.25  60.23  56.19  62.50  64.52  60.48  64.16  67.42 
20  68.45  73.86  67.19  65.83  72.26  77.26  67.86  74.56  77.03  
30  71.19  78.31  74.33  70.83  77.26  84.20  74.52  79.44  84.55  
50  10  68.69  70.22  72.50  72.50  72.83  76.50  64.67  74.23  78.64 
20  79.44  81.58  79.83  79.50  82.33  87.17  76.67  83.73  87.28  
30  83.33  87.27  84.00  83.67  85.83  90.17  80.50  87.50  92.49 
For the ORL dataset, we likewise randomly selected 30% and 50% of the samples for training. Gray-scale intensity, LBP [Ojala2002] and EDH [gao2008image] are utilized as the 3 views. The procedure is the same as for Caltech101, with 1NN as the classifier. We conducted this experiment 20 times; the mean classification results for different dimensions can be found in TABLE V.
It can be seen in Fig. 23 and TABLE V that the performance of all methods improves as the dimension increases. KMSALDA is better than MvDA, while KMSAPCA is the best unsupervised multiview method in our experiments. This is because our proposed KMSA framework better exploits the information in the multiview data to learn ideal subspaces.
For the 3Sources dataset, a benchmark multiview dataset consisting of 3 views, we likewise randomly selected 30% and 50% of the samples for training. All methods construct 30-dimensional representations, and 1NN is adopted to classify the testing samples; the boxplots are shown in Fig. 26. All the experiments above verify the superior performance of the proposed KMSA: it extends different DR methods into multiview mode, KMSALDA outperforms MvDA, and KMSAPCA outperforms the other unsupervised methods in most situations.
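For completeness, the 1NN classifier used throughout the classification experiments can be sketched in a few lines of numpy; this is an illustrative re-implementation, not the authors' code.

```python
import numpy as np

def one_nn(train_X, train_y, test_X):
    """1-nearest-neighbour classification on the learned low-dimensional
    representations: each test sample takes the label of its closest
    training sample under Euclidean distance."""
    dist = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    return train_y[np.argmin(dist, axis=1)]
```

Applying this classifier to the low-dimensional representations produced by each compared method gives the accuracies reported in the tables and figures.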
IV Conclusion
In this paper, we proposed a generalized multiview graph embedding framework named kernelized multiview subspace analysis (KMSA). KMSA handles multiview data in kernel space to fully exploit the data representations of all views. Meanwhile, it adopts a co-regularized term to minimize the divergence among views while utilizing a self-weighted strategy to learn the weights of all views, combining self-weighted learning with co-regularization to deeply exploit the information in multiview data. We conducted various experiments on 6 datasets for multiview data classification and image retrieval, which verify that the proposed KMSA is superior to other multiview-based methods.
Acknowledgment
We would like to thank the anonymous reviewers for their valuable comments and suggestions, which significantly improved the quality of this paper. Yang Wang is supported by the National Natural Science Foundation of China under Grant No. 61806035. This work is also supported by the National Natural Science Foundation of China under Grants 61370142 and 61272368, by the Postdoctoral Science Foundation No. 3620080307, by the Fundamental Research Funds for the Central Universities under Grant 3132016352, and by the Fundamental Research of the Ministry of Transport of P. R. China under Grant 2015329225300.