With the development of information technology, we have witnessed a surge of techniques that describe the same sample from multiple views [Xia2010, guo2018partial, LiG2012, wang2017unsupervised, Yang2017TCYB]. Multiview data generated from various descriptors [wei2018glad] or sensors are commonly seen in real-world applications [XuW2018, cao2015diversity, zhang2017latent], which hastens the related research on multiview learning [zhang2016deep]. For example, one image can be represented by different descriptors, such as Local Binary Patterns (LBP) [Ojala2002], Scale-Invariant Feature Transform (SIFT) [rublee2011orb], Histogram of Oriented Gradients (HOG) [dalal2005histograms] and Locality-constrained Linear Coding (LLC) [wang2010locality]. For text analysis [dong2018predicting], documents can be written in different languages [bisson2012co]. Notably, multiview data may share consistent correlation information [wang2015robust, wang2016iterative, wang2018multiview], which is crucial to greatly promoting the performance of related tasks [dhillon2011multi, li2002statistical, zhang2018generalized, liu2018late].
Nowadays, multiview dimension reduction (DR) methods have been well studied in many applications [wu2018and, nie2018auto, nie2018multiview]. In particular, Kumar et al. [kumar2011co] proposed a multiview spectral embedding approach by introducing a co-regularized framework which narrows the divergence between graphs from multiple views. Xia et al. [Xia2010] introduced an auto-weighted method to construct common low-dimensional representations for multiple views, which has achieved good performance in image retrieval and clustering. Wang et al. [wang2018multiview] exploited the consensus of multiview structures beyond low-rankness to construct low-dimensional representations for multiview data and boost clustering performance. Kan et al. [kan2016multi] extended Linear Discriminant Analysis (LDA) [izenman2013linear] to Multiview Discriminant Analysis (MvDA), which updates the projection matrices for all views through an iterative procedure. Luo et al. [luo2015tensor]
proposed a tensor CCA to deal with multiview data in general tensor form. Tensor CCA is an extension of CCA [michaeli2016nonparametric] which has achieved ideal performance in many applications. Zhang et al. [zhang2017flexible] proposed a novel method to flexibly exploit the complementary information between multiple views at the dimension reduction stage, while preserving the similarity of the data points across different views.
Up to now, most multiview DR methods [kumar2011co, Xia2010, nie2018auto] are graph-based approaches [cui2017general] which care more about data correlations while overlooking the information within the multiview data representations themselves. Such limitations hold for abundant research [kumar2011co, nie2018auto]; we name a few typical works below. Multiview Spectral Embedding (MSE) [Xia2010] is an extension of Laplacian Eigenmaps (LE) [belkin2002laplacian] and considers the Laplacian graphs between multiview data rather than the information within the data representation. Kumar et al. [kumar2011co] also exploited only the information within the Laplacian graphs and utilized a co-regularized term to minimize the divergence between different views; this method likewise failed to exploit the information within the multiview data representation. Even though some approaches, such as MvDA [izenman2013linear] and CCA [michaeli2016nonparametric], can fully consider the original multiview data and extend traditional DR [mika1999fisher] to the multiview case, they fail to provide a general framework covering most DR approaches. Therefore, the goal is to construct a general framework that integrates features from multiple views into low-dimensional representations while achieving ideal performance.
In this paper, we aim to develop a unified framework to project multiview data into a low-dimensional subspace. Our proposed KMSA is equipped with a self-weighted learning method that assigns different weights to multiple views according to their contributions. We also discuss how the power-exponent parameter in KMSA influences the learned weights of the multiple views. Furthermore, KMSA adopts a co-regularized term to minimize the divergence between each pair of views, which encourages all views to learn from each other. The construction process of KMSA is shown in Fig. 1. We compare the proposed KMSA with some typical methods in TABLE I.
We remark that Yan et al. [yan2007graph] proposed a general framework of dimension reduction techniques. Different from that, our proposed KMSA extends it into kernel spaces with multiple views to address the problems caused by the differing feature dimensions across views. Then, KMSA adopts a self-weighted learning trick to assign different weights to these views according to their contributions. Finally, KMSA is equipped with a co-regularized term to minimize the divergence between different views, so as to achieve multiview consensus.
X^(i): the set of all features in the i-th view
Y^(i): the set of all low-dimensional representations in the i-th view
x_j^(i): the j-th feature in the i-th view
y_j^(i): the low-dimensional representation for x_j^(i)
d_i: the dimension of features in the i-th view
s_j^(i): the sparse relationships for the j-th feature in the i-th view
v^(i): the projection direction for the i-th view
S^(i): the sparse reconstructive matrix for features in the i-th view
K^(i): the kernel matrix for features in the i-th view
A^(i): the coefficient matrix for the i-th view
α^(i): the coefficient vector for the i-th view
w_i: the weighting factor for the i-th view
γ: the power exponent for the weights
B^(i): the constraint matrix for the i-th view
The major contributions of this paper are summarized as follows:
We developed a novel framework named KMSA for the task of multiview dimension reduction. We discuss how most of the eigen-decomposition-based DR methods [jolliffe2011principal, he2004locality] can be extended to their corresponding multiview versions through KMSA.
KMSA fully considers both the single-view graphs and the correlations between multiple views to calculate the importance of all views. It combines self-weighted learning with a co-regularized term, so as to deeply exploit the information from multiview data.
We discuss the details of the optimization process for KMSA. Experimental results show that our proposed method achieves state-of-the-art performance.
II Kernel-based Multiview Embedding with Self-weighted Learning
In this section, we discuss the intuition of our proposed method named KMSA.
Assume we are given a multiview dataset which consists of samples observed in multiple views, where each view contains all features of the samples in that view and has its own feature dimension; the number of training samples is shared across views. The goal of KMSA is to construct an appropriate architecture to obtain low-dimensional representations for the original multiview data. Notations utilized in this paper are summarized in TABLE II.
II-1 Kernelization for Multiview Data
The proposed KMSA extends single-view DR methods into kernel spaces, which provides a feasible way to manipulate the multiview data directly rather than via similarity graphs. Before taking kernel spaces into consideration, KMSA exploits the heterogeneous information for each view as follows:
where v^(i) is the projection vector and W^(i)_jk is the correlation between x_j^(i) and x_k^(i) in the i-th view. The constraint matrix B^(i) is set to the identity matrix or to D^(i) according to the respective constraints of the various dimension reduction algorithms. Most algorithms can be generated automatically by different constructions of W^(i) and B^(i), as illustrated in [yan2007graph]. The objective can be further expressed through the graph Laplacian L^(i) = D^(i) − W^(i) according to the mathematical transformation in [yan2007graph], where D^(i) is the diagonal matrix with D^(i)_jj equal to the sum of the j-th row of W^(i). In order to facilitate KMSA in handling multiview data, we project all feature representations into a kernel space via a nonlinear mapping function φ, so that φ(X^(i)) contains the features mapped into the kernel space.
Then, we extend Eq. 1 into the kernel representation as follows:
where the projection direction of the i-th view lies in the space spanned by the mapped features φ(X^(i)). Consequently, it can be replaced with the linear combination φ(X^(i))α^(i), where α^(i) is the coefficient vector. Then, Eq. 2 can be further modified as follows:
Here K^(i) is the kernel matrix of the i-th view, which is symmetric. The constraint term in the kernel formulation is set in correspondence with the choice of B^(i). Therefore, if we want to obtain an optimal subspace of a given dimension, the coefficient vectors corresponding to the largest positive eigenvalues can be utilized to construct the subspace, which is equivalent to finding the coefficient matrix A^(i) as follows:
The low-dimensional representations of the original features are then Y^(i) = K^(i)A^(i). Even though we can extend DR methods into the kernel space to avoid the problem that the feature dimensions differ across views, the construction procedures for the individual views are still independent and waste a lot of information from the other views.
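As a concrete illustration of the kernelized formulation above, the following is a minimal sketch of a single-view kernel graph embedding in Python. The RBF kernel, the generic similarity matrix W, the constraint matrix B, and the small ridge term added for numerical stability are all our assumptions rather than the exact KMSA recipe:

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(X, sigma=1.0):
    # X: (n, d) features of one view; returns the (n, n) Gram matrix
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_graph_embedding(K, W, B, dim):
    """Minimize tr(A^T K L K A) s.t. A^T K B K A = I, where
    L = D - W is the graph Laplacian of the similarity matrix W.
    Returns the coefficient matrix A and the embedding Y = K A."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    M = K @ L @ K
    C = K @ B @ K + 1e-8 * np.eye(K.shape[0])  # ridge for stability
    _, vecs = eigh(M, C)        # generalized eigenproblem, ascending order
    A = vecs[:, :dim]           # eigenvectors of the smallest eigenvalues
    return A, K @ A
```

For a minimization objective the smallest eigenvalues are kept; a maximization variant would instead keep the largest positive ones, as described for Eq. 4.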
II-2 Self-weighted Learning of the Weights for Multiple Views
In order to integrate information from multiple views, the most straightforward way is to minimize the sum of Eq. 4 over all views, which yields the following objective function:
However, different views make different contributions to the objective value in Eq. 5, and some adversarial views may even make negative contributions to the final low-dimensional representations. Therefore, it is rational to treat these views discriminatively. We assign different weighting factors to these views while learning the low-dimensional representations. The self-weighted learning strategy is formulated below:
where the weights w_i are nonnegative and sum to one, and the exponent γ trades off the two terms above. The sum-to-one constraint ensures that all views make particular contributions [Xia2010] to the final low-dimensional representations; otherwise, only one entry of the weight vector would be one while the other entries would be zero. The second term in Eq. 6 minimizes the γ-th power of the ℓ_γ-norm of the weight vector, which makes the weights as non-sparse as possible; the rationale is that this term achieves its minimum when all weights are equal. Therefore, the second term in Eq. 6 further promotes the participation of all views. Together, these two tricks equip the views with different weights according to their contributions.
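The non-sparsity rationale can be checked numerically: on the probability simplex, the penalty Σ_i w_i^γ with γ > 1 is smaller for uniform weights than for one-hot weights, so adding it discourages degenerate one-view solutions. A tiny sketch (the symbols w and γ follow the notation of Eq. 6):

```python
import numpy as np

def gamma_penalty(w, gamma):
    # the gamma-th power of the l_gamma-norm: sum_i w_i^gamma
    return float(np.sum(np.asarray(w, dtype=float) ** gamma))

gamma = 2.0
uniform = np.ones(4) / 4.0                 # all four views participate equally
one_hot = np.array([1.0, 0.0, 0.0, 0.0])   # only one view is active
# For gamma > 1 the uniform weights give the smaller penalty:
# 4 * (1/4)^2 = 0.25 versus 1.0 for the one-hot weights.
assert gamma_penalty(uniform, gamma) < gamma_penalty(one_hot, gamma)
```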
According to Eq. 6, we can obtain the low-dimensional representations of all views simultaneously. However, the construction process of each view still cannot exploit information from the other views: even though we have assigned different weights to different views, the learned representations are equal to those obtained from Eq. 4. Therefore, we propose a co-regularized term to help all views learn from each other.
II-3 Minimizing Divergence between Different Views by a Co-regularized Term
Since multiview learning aims to enable all views to learn from each other to improve the overall performance, it is essential for KMSA to integrate compatible and complementary information from all views. Some researchers [kumar2011co] have attempted to minimize the divergence between low-dimensional representations via various co-regularized terms, which facilitates information transfer across views. However, we cannot obtain the low-dimensional representations directly through Eq. 6, which prevents us from utilizing those methods without modifications.
Because A^(i) is the coefficient matrix that reconstructs the low-dimensional representations, each column of A^(i) can be regarded as a coding of the original samples. Therefore, KMSA attempts to minimize the divergence between the coefficient matrices from each pair of views as follows:
Here, a graph is defined for each view that contains the relationships between all features in that view, whose entries measure the similarity between pairs of features. Minimizing Eq. 7 urges each pair of views to learn from each other and bridges the gap between them. Furthermore, the divergence term can be rewritten through the mathematical deductions in [kumar2011co], and we utilize Eq. 7 as a regularized term in our proposed KMSA in the following.
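One way to picture such a co-regularized term is to compare the similarity graphs that each view's coefficient matrix induces, in the spirit of co-regularized spectral methods [kumar2011co]. The sketch below is an illustrative choice of divergence, not necessarily the exact KMSA term; the function name and the Frobenius normalization are our assumptions:

```python
import numpy as np

def coreg_disagreement(K_i, A_i, K_j, A_j):
    """Disagreement between two views: build the similarity graph
    G_v = K_v A_v A_v^T K_v induced by each view's coefficients,
    normalize to unit Frobenius norm, and return -tr(G_i G_j).
    The value is exactly -1 when the two normalized graphs coincide."""
    G_i = K_i @ A_i @ A_i.T @ K_i
    G_j = K_j @ A_j @ A_j.T @ K_j
    G_i = G_i / np.linalg.norm(G_i)
    G_j = G_j / np.linalg.norm(G_j)
    return float(-np.trace(G_i @ G_j))
```

By the Cauchy-Schwarz inequality the value is always at least −1, so minimizing it drives the two coefficient matrices toward inducing the same sample-level similarities.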
II-4 Overall Objective Function
Based on the above, we propose our objective function as follows:
where the coefficient of the co-regularized term is a negative constant. It is notable that the weights and the coefficient matrices are learned automatically by considering both the graph of each view and the correlations between multiple views, which leads to better solutions. This has two advantages:
The learned weights better reflect the influence of the regularized term between each pair of views. Some multiview learning methods [kumar2011co] require additional parameters to be set, and this matter gets even worse as the number of views increases. Fortunately, only one parameter needs to be set for KMSA, which better balances the influence of the co-regularized term.
The learning process of the weights fully considers the correlations between different views. Minimizing Eq. 8 means that similar views will receive larger weights, so the obtained low-dimensional representations will incline toward the consistent views while avoiding the disturbance of adversarial views, as shown in Fig. 2.
We can obtain the low-dimensional representations for these views as Y^(i) = K^(i)A^(i), where A^(i) can be calculated from Eq. 8 with eigenvalue decomposition.
II-A Optimization Process for KMSA
In this section, we provide the optimization process for KMSA. We develop an alternating optimization strategy, which separates the problem into several tractable subproblems; that is, we alternately update each variable while fixing the others. The optimization process is summarized in Algorithm 1.
Updating A^(i): By fixing all variables but A^(i), Eq. 8 reduces to the following equation once constant additive and scaling terms are ignored:
which has a feasible solution and can be transformed according to the operational rules of matrix trace as follows:
We collect the fixed terms into a single combined matrix. Therefore, with the constraint in Eq. 8, the optimal A^(i) can be solved by generalized eigen-decomposition.
A^(i) consists of the eigenvectors corresponding to the smallest eigenvalues, and the coefficient matrices of all views can be updated in turn by the above procedure.
Updating the weights: After the coefficient matrices are fixed as above, we update the weights w_i. By using a Lagrange multiplier to take the constraint Σ_i w_i = 1 into consideration, we get the Lagrange function as
Setting the derivative of the Lagrange function with respect to each w_i and the multiplier to zero, we can get
Because the weights sum to one, we can further transform this as
Therefore, we obtain each weight w_i as
It is notable that the value of γ directly influences the weighting factors w_i. We analyze this influence as follows:
If γ infinitely approaches 1, there is only one non-zero weight, corresponding to the view whose objective value is the smallest among all views.
Conversely, if γ tends to infinity, all the weights tend to be equal.
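Under the standard self-weighted scheme that the derivation above follows, the stationarity condition gives a closed-form update w_i ∝ f_i^(−1/(γ−1)), where f_i denotes the i-th view's objective value with the other variables fixed. A hypothetical sketch (the symbol f and the exact form are our assumptions) that also exhibits the two limiting cases:

```python
import numpy as np

def update_weights(f, gamma):
    """Closed-form minimizer of sum_i w_i^gamma * f_i subject to
    sum_i w_i = 1, w_i >= 0, for gamma > 1: w_i is proportional
    to f_i^(-1/(gamma-1)), so smaller-cost views get larger weights."""
    f = np.asarray(f, dtype=float)
    w = f ** (-1.0 / (gamma - 1.0))
    return w / w.sum()

f = np.array([0.2, 0.5, 1.0])        # hypothetical per-view objective values
w_sharp = update_weights(f, 1.01)    # gamma near 1: winner-take-all
w_flat = update_weights(f, 100.0)    # large gamma: nearly uniform weights
```

With γ near 1, almost all weight concentrates on the smallest-cost view; with a large γ the weights spread almost uniformly, matching the two limits discussed above.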
After the coefficient matrices are obtained, the low-dimensional representations for each view can be calculated as:
II-B Convergence Analysis of KMSA
Because our proposed KMSA is solved by an alternating optimization strategy, it is essential to analyze its convergence.
Theorem 1. The objective function in Eq. 8 is bounded below, and the proposed optimization algorithm monotonically decreases its value in each step.
Lower Bound: There must exist one view (say the i-th view) whose single-view objective value is the smallest among all views, and there must exist a pair of views (the i-th and j-th views) whose co-regularized term is the largest among all pairs of views. Because the weights are bounded, it is provable that the objective in Eq. 8 is bounded below by a combination of these extreme values. Therefore, the objective has a lower bound.
Monotone Decreasing: During the optimization process, eigenvalue decomposition is adopted to solve for the coefficient matrices. Assume a coefficient matrix is calculated in the t-th main iteration. Because the solving method is based on eigenvalue decomposition, only the eigenvectors corresponding to the smallest eigenvalues are retained. Therefore, in the process of updating this matrix during the t-th main iteration, it always holds that
where the remaining term is a constant because all the other variables remain unchanged, and the retained eigenvalues are the smallest ones of the corresponding eigenproblem. Furthermore, the update of the weights always decreases the objective as well.
Convergence Explanation: Denote the objective value after the t-th main iteration of the proposed optimization as a sequence; by the above theorem it is a monotonically decreasing sequence that is bounded below. Therefore, according to the bounded monotone convergence theorem [rudin1976principles], which asserts the convergence of every bounded monotone sequence, the proposed optimization algorithm converges.
Meanwhile, in order to further show the convergence of the proposed KMSA, we provide a figure of the objective function values over the iterations. We extended LDA and PCA into multiview mode using KMSA, naming them KMSA-LDA and KMSA-PCA, and recorded their objective function values versus the number of iterations on the Corel1K, Caltech101 and ORL datasets in Fig. 6.
It can be seen that the objective function values of both KMSA-LDA and KMSA-PCA decrease as the iterations increase, and they become stable after 10-12 iterations. This verifies that our proposed KMSA converges once enough iterations are finished.
II-C Extending Various DR Algorithms via KMSA
To facilitate related research, we provide the typical constructions of the similarity matrix W^(i) and the constraint matrix B^(i) for several DR algorithms as follows:
1. PCA: W^(i)_jk = 1/n for all pairs of samples, and B^(i) is the identity matrix.
2. LPP: W^(i)_jk takes a heat-kernel value if x_j^(i) is among the nearest neighbors of x_k^(i) or vice versa in the i-th view, and B^(i) = D^(i), where D^(i) is a diagonal matrix whose j-th diagonal entry is the sum of all elements in the j-th row of W^(i).
3. LDA: W^(i)_jk = 1/n_c if x_j^(i) and x_k^(i) share the same class label c, and W^(i)_jk = 0 otherwise, where n_c is the number of samples in the c-th class.
4. SPP: the weight matrix W^(i) is constructed by sparse representation [qiao2010sparsity].
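The constructions above can be sketched for a few of the listed algorithms, following the usual graph-embedding conventions [yan2007graph]; the heat-kernel bandwidth, the neighborhood size, and the helper names are illustrative assumptions:

```python
import numpy as np

def pca_graph(n):
    # PCA in graph-embedding form: uniform similarities W_jk = 1/n,
    # with the identity as the constraint matrix
    return np.full((n, n), 1.0 / n), np.eye(n)

def lpp_graph(X, k=5, sigma=1.0):
    # LPP: heat-kernel weights on k-nearest-neighbor pairs, and
    # the diagonal degree matrix D as the constraint matrix
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]   # skip the point itself
        W[i, nn] = np.exp(-d2[i, nn] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                # symmetrize the kNN graph
    return W, np.diag(W.sum(axis=1))

def lda_graph(y):
    # LDA: W_jk = 1/n_c when samples j and k share class c, else 0
    y = np.asarray(y)
    W = np.zeros((len(y), len(y)))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    return W
```

Plugging any such (W, B) pair into the kernelized objective yields the corresponding multiview extension.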
III Experiments
In order to verify the performance of our proposed framework, we conduct several experiments on image retrieval (including Corel1K (https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval), Corel5K, and Holidays (http://lear.inrialpes.fr/ jegou/data.php)) and image classification (including the Caltech101 (http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html), ORL (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html) and 3Sources (http://erdos.ucd.ie/datasets/3sources.html) datasets). In this section, we first introduce the details of the utilized datasets and the compared methods in III-A. Then, we present the experiments in III-B and III-C. The experiments demonstrate the excellent performance of our proposed methods.
III-A Datasets and Compared Methods
We introduce the utilized datasets and compared methods in this section. We conducted our experiments on image retrieval and multiview data classification: Corel1K, Corel5K and Holidays are utilized for image retrieval, while Caltech101, ORL and 3Sources are utilized for multiview data classification. The detailed information of the utilized datasets is listed as follows:
Corel1K is an image dataset for image retrieval. It contains 1000 images from 10 categories, including bus, dinosaur, beach, flower, etc., with 100 images in each category.
Corel5K is an extended version of Corel1K for image retrieval. It contains 5000 images from 50 categories, comprising the images from Corel1K plus additional ones, with 100 images in each category.
Holidays contains 1491 images corresponding to 500 categories, mainly captured from various sceneries. The Holidays dataset is utilized for the image retrieval experiment.
Caltech101 consists of 9145 images corresponding to 101 object categories plus one background category. It is a benchmark image dataset for image classification.
ORL is a face dataset for classification. It consists of 400 face images corresponding to 40 people; each person has 10 face images captured under different situations.
3Sources was collected from 3 well-known online news sources: BBC, Reuters and the Guardian. Each source is treated as one view. 3Sources consists of 169 news stories in total.
We summarize the view information for these datasets in TABLE III. In our experiments, we utilized several well-known multiview subspace learning algorithms as compared methods, including MDcR [zhang2017flexible], MSE [Xia2010], PCAFC [jolliffe2011principal], GMA [sharma2012generalized], CCA [michaeli2016nonparametric] and MvDA [kan2016multi]. It should be noted that GMA can also extend some DR methods into multiview mode; in this paper, we utilize GMA to represent the multiview extension of PCA. Meanwhile, PCAFC is a method which concatenates multiview data into one vector and utilizes PCA to obtain the low-dimensional representation. For the proposed KMSA, the parameters were fixed across all experiments.
III-B Image Retrieval
In this section, we conducted experiments on Corel1K, Corel5K and Holidays datasets for image retrieval.
For the Corel1K dataset, we randomly selected 100 images as queries (10 images per class) while the other images were assigned as galleries. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are utilized to extract different features for the multiple views. We utilized all methods to project the multiview features into a 50-dimensional subspace and ranked images by distance for retrieval. All experiments were conducted on the low-dimensional representations from the best view. We repeated the experiment 20 times and calculated the mean values of Precision (P), Recall (R) and F1-Measure (F1). The results are shown in Fig. 15.
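For reference, the per-query retrieval metrics reported above can be computed as in the sketch below. The cutoff `top_n` and the distance-based ranking are illustrative assumptions; the paper's exact evaluation protocol may differ:

```python
import numpy as np

def retrieval_prf(query_label, gallery_labels, distances, top_n=10):
    """Precision, recall and F1 for one query: rank the gallery by
    distance to the query and check label agreement in the top-n."""
    gallery_labels = np.asarray(gallery_labels)
    order = np.argsort(distances)
    retrieved = gallery_labels[order][:top_n]
    relevant_total = int(np.sum(gallery_labels == query_label))
    hits = int(np.sum(retrieved == query_label))
    p = hits / top_n
    r = hits / relevant_total if relevant_total else 0.0
    f1 = 2.0 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```

Averaging these values over all queries and repetitions gives the mean P, R and F1 scores.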
It is clear that KMSA-PCA achieves better performance than the other unsupervised multiview algorithms, while KMSA-LDA outperforms MvDA. This shows that KMSA is an ideal framework to extend DR algorithms to the multiview case with better performance. Furthermore, even though PCAFC concatenates all the views into one single vector, it cannot achieve good performance because PCA is essentially a singleview method.
For the Corel5K dataset, we randomly selected 500 images as queries (10 images per class) while the other images were assigned as galleries. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are also utilized as the descriptors to extract features for the multiple views. We utilized all methods to project the multiview features into a 50-dimensional subspace and ranked images by distance for retrieval. The experimental settings are the same as those for Corel1K. The results are shown in Fig. 20.
As can be seen in Fig. 20, KMSA-LDA outperforms all the other methods in most situations. Meanwhile, among the unsupervised methods, KMSA-PCA performs best. MDcR and Co-Regu [kumar2011co] also perform well. PCAFC performs worst because it cannot fully exploit information from multiview data.
For the Holidays dataset, there are 3 images in each class. For each class, we randomly selected 1 image as the query, with the other 2 images as galleries. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are exploited to extract different features for the multiple views. All methods were used to project the multiview features into a 50-dimensional subspace. The experiments were conducted 20 times and we report the mean values of the indices in TABLE IV:
From TABLE IV, we can also find that KMSA-PCA and KMSA-LDA achieve the best performance in most situations. Co-Regu and MvDA also obtain good results. Since PCAFC is a singleview method, it achieves the worst performance.
III-C Classification for Multiview Data
In this section, we conducted classification experiments on 3 datasets (Caltech101, ORL and 3Sources) to verify the effectiveness of our proposed method.
For the Caltech101 dataset, we randomly selected 30 and 50 samples as training samples while the other samples were assigned as testing samples. MSD [liu2011image], Gist [oliva2001modeling] and HOG [dalal2005histograms] are utilized to extract different features for the multiple views. All the methods are utilized to project the multiview features into subspaces of varying dimensions, and 1NN is utilized to classify the testing samples. This experiment was conducted 20 times and the mean results of all methods are shown in Fig. 23.
For the ORL dataset, we also randomly selected 30 and 50 samples as training samples. Gray-scale intensity, LBP [Ojala2002] and EDH [gao2008image] are utilized as the 3 views. The experimental settings are the same as those for Caltech101, and 1NN is utilized as the classifier. We conducted this experiment 20 times; the mean classification results with different dimensions can be found in TABLE V.
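The 1NN step used in these experiments can be sketched as a plain Euclidean nearest-neighbor rule over the learned embeddings (the names are illustrative):

```python
import numpy as np

def one_nn_classify(train_X, train_y, test_X):
    """1-nearest-neighbor: each test embedding takes the label of
    its closest training embedding under Euclidean distance."""
    train_y = np.asarray(train_y)
    preds = []
    for x in np.asarray(test_X):
        d = np.linalg.norm(train_X - x, axis=1)
        preds.append(train_y[int(np.argmin(d))])
    return np.array(preds)
```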
It can be seen in Fig. 23 and TABLE V that the performance of all methods improves as the dimension increases. KMSA-LDA is better than MvDA, while KMSA-PCA is the best unsupervised multiview method in our experiments. This is because our proposed framework KMSA can better exploit the information from the multiview data to learn ideal subspaces.
For the 3Sources dataset, we also randomly selected 30 and 50 samples as training samples. It is a benchmark multiview dataset consisting of 3 views. We utilized all the methods to construct 30-dimensional representations and adopted 1NN to classify the testing samples. The boxplot figures are shown in Fig. 26. All the experiments above verify the superior performance of our proposed KMSA: it can extend different DR methods into multiview mode, KMSA-LDA is better than MvDA, and KMSA-PCA outperforms the other unsupervised methods in most situations.
In this paper, we proposed a generalized multiview graph embedding framework named Kernelized Multiview Subspace Analysis (KMSA). KMSA deals with multiview data in kernel space to fully exploit the data representations within multiple views. Meanwhile, it adopts a co-regularized term to minimize the divergence among views while utilizing a self-weighted strategy to learn the weights of all views; combining self-weighted learning with the co-regularized term deeply exploits the information from multiview data. We conducted various experiments on 6 datasets for multiview data classification and image retrieval, which verified that our proposed KMSA achieves superior performance compared with other multiview methods.
We would like to thank the anonymous reviewers for their valuable comments and suggestions, which significantly improved the quality of this paper. Yang Wang is supported by the National Natural Science Foundation of China under Grant No. 61806035. This work is also supported by the National Natural Science Foundation of China under Grants 61370142 and 61272368, by the Postdoctoral Science Foundation under Grant No. 3620080307, by the Fundamental Research Funds for the Central Universities under Grant 3132016352, and by the Fundamental Research of the Ministry of Transport of P. R. China under Grant 2015329225300.