The problem of identifying significant outliers is an essential and challenging issue in statistical machine learning for multi-source data analysis. Atypical objects or outliers, that is, data that cause surprise in relation to the majority of the data, often occur in real data. Outliers may be correct observations, but they should be examined for transcription errors. They can play havoc with classical statistical methods (Gogoi et al., 2011). When a statistical approach is applied to imaging genetics data containing outliers, the results can be deceptive with high probability. To overcome this problem, many robust methods that are less sensitive to outliers have been developed. The goals of robust statistics are to use methods that fit the bulk of the data and to identify the points deviating from the original patterns for further investigation (Huber and Ronchetti, 2009, Hampel et al., 2011, Naser and N. A. Hamzah, 2012). However, it is well known that most robust methods are computationally intensive and suffer from the curse of dimensionality. The outliers therefore need to be removed or downweighted prior to fitting non-robust statistical or machine learning approaches (Filzmoser et al., 2008, Oh and Gao, 2009, Roth, 2006).
The incorporation of various unsupervised learning methods into genomic analysis is a rather recent topic. Using the dual representations, the task of learning from multiple data sources is related to kernel-based data integration, which has been actively studied in the last decade (Hofmann et al., 2008, Alam, 2014). Kernel fusion in unsupervised learning has a close connection with unsupervised kernel methods. Unsupervised kernel methods such as kernel principal component analysis (kernel PCA) (Schölkopf et al., 1998, Alam and Fukumizu, 2014), kernel canonical correlation analysis (kernel CCA) (Akaho, 2001, Alam and Fukumizu, 2015, 2013), and weighted multiple kernel CCA have been extensively studied for decades (S. Yu and Moreau, 2011). But these methods are not robust; they are sensitive to contaminated data. To apply these non-robust methods, for instance in genomics, outlier identification and/or robust approaches are essential.
Due to the properties of eigendecomposition, kernel CCA is still a widely applied method for multi-source data analysis and integration. An empirical comparison and sensitivity analysis for robust linear CCA and kernel CCA have also been discussed, which give an interpretation similar to that of kernel PCA but without any theoretical results (Alam et al., 2010, 2008). In addition, Romanazzi (1992) and Alam et al. (2016) have proposed the influence function (IF) of canonical correlation and kernel CCA, but the IF of multiple kernel CCA has not been studied. All of these considerations motivate us to study the IF of multiple kernel CCA in order to identify outliers in imaging genetics data sets: SNP, fMRI, and DNA methylation.
The contribution of this paper is fourfold. We address the IF of the kernel mean element (kernel ME), kernel covariance operator (kernel CO), kernel cross-covariance operator (kernel CCO), kernel canonical correlation analysis (kernel CCA), and multiple kernel CCA. After that, we propose the IF of multiple kernel CCA, which can be applied to more than two data sets. Based on these results, we propose a visualization method to detect influential observations in multi-source data. The proposed method is capable of analyzing the outliers usually found in biomedical applications, in which the number of dimensions is large. To confirm the outliers, we use the stem-and-leaf display. The results imply that the proposed method is able to identify outliers in synthesized and imaging genetics data (e.g., SNP, fMRI, and DNA methylation).
The remainder of the paper is organized as follows. In the next section, we provide a brief review of kernel ME, kernel CO, and kernel CCO. In Section 3, we briefly discuss the IF, the IF of kernel ME, and the IF of kernel CO. After a brief review of kernel CCA in Section 4.1, we propose the IF of classical multiple kernel CCA in Section 4.2. In Section 5, we describe experiments conducted on both synthesized data and real data from an imaging genetics study, together with a visualization method.
2 Kernel mean element and kernel covariance operator
Kernel ME, kernel CO, and kernel CCO with positive definite kernels have been extensively applied to nonparametric statistical inference by representing distributions in the form of means and covariances in a reproducing kernel Hilbert space (RKHS) (Gretton et al., 2008, Fukumizu et al., 2008, Song et al., 2008, Kim and Scott, 2012, Gretton et al., 2012). The basic notions of kernel ME, kernel CO, and kernel CCO, along with their robustness through the IF, are briefly discussed below.
2.1 Kernel mean element
Let , and be the probability measures on , and , respectively. Also let ,; and be the random samples from the respective distributions. A symmetric kernel defined on a space is called a positive definite kernel if the Gram matrix is positive semi-definite (Aronszajn, 1950). By the reproducing property and the kernel trick, the kernel can evaluate the inner product of any two feature vectors efficiently without knowing an explicit form of either the feature map () or the feature space (). In addition, once the Gram matrices have been computed, the computational cost does not depend on the dimension of the original space (Fukumizu and Leng, 2014, Alam and Fukumizu, 2014). A mapping with is an element of the RKHS . By the reproducing property with , the kernel mean element is defined as
for all . Given an independent and identically distributed sample, the mapping is an empirical element of the RKHS, ,
2.2 Kernel covariance operator
By the reproducing property, kernel CCO, with , and is defined as
3 Influence function of kernel operators
To define the notion of robustness in statistics, different approaches have been proposed, for example, the minimax approach (Huber, 1964), the sensitivity curve (Tukey, 1977), the influence function (Hampel, 1974, Hampel et al., 1986), and the finite-sample breakdown point (Donoho and Huber, 1983). Due to its simplicity, the IF is the most useful approach in statistical supervised learning (Christmann and Steinwart, 2007, 2004). In this section, we briefly discuss the notion of the IF, the IF of kernel ME, and the IF of kernel CO and kernel CCO.
Let (, ) be a probability space and a measure space. We want to estimate the parameter of a distribution in . We assume that there exists a functional , where is the set of all probability distributions in . Let be a distribution in . If the data do not follow the model exactly but deviate slightly toward , the Gâteaux derivative at is called the influence function (Kim and Scott, 2012). The IF of a complicated statistic, which is a function of simple statistics, can be calculated with the chain rule, say . Specifically,
It can also be used to find the IF of a transformed statistic, given the influence function for the statistic itself.
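For reference, a standard textbook form of the influence function and a short chain-rule example are sketched below. The symbols $T$, $F$, $\delta_z$, $\epsilon$, $\mu$, and $\sigma^2$ are notation assumed here for illustration and may differ from the notation used elsewhere in this paper.

```latex
% Influence function of a functional T at distribution F, evaluated at the point z
\mathrm{IF}(z; T, F) = \lim_{\epsilon \to 0}
  \frac{T\big((1-\epsilon)F + \epsilon\,\delta_z\big) - T(F)}{\epsilon}

% Chain-rule example: the variance as a function of two simple moments,
% T(F) = E_F[X^2] - (E_F[X])^2, using
% IF(z; E[X], F) = z - \mu  and  IF(z; E[X^2], F) = z^2 - E_F[X^2]
\mathrm{IF}(z; \sigma^2, F)
  = \big(z^2 - E_F[X^2]\big) - 2\mu\,(z - \mu)
  = (z - \mu)^2 - \sigma^2
```

The same pattern, differentiating the outer function and plugging in the IFs of the inner statistics, is what allows the IF of operators built from means and second moments to be assembled from simpler pieces.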
The IF of kernel CCO, , with joint distribution, , using complicated statistics at is denoted as and given by
which is estimated with the data points , for every as
For bounded kernels, the above IFs have three properties: gross error sensitivity, local shift sensitivity, and rejection point. These properties do not hold for unbounded kernels, for example, linear and polynomial kernels. Similar conclusions can be drawn for the kernel CO and kernel CCO. Most unsupervised methods depend explicitly or implicitly on the kernel CO or kernel CCO. They are sensitive to contaminated data, even when bounded positive definite kernels are used. To overcome this problem, the outliers need to be removed from the data.
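The contrast between bounded and unbounded kernels can be illustrated numerically. The sketch below assumes the standard form of the IF of the kernel mean element, $k(\cdot, z) - m_X$, and computes its RKHS norm at increasingly distant points $z$ for a Gaussian and a linear kernel. The function names, the bandwidth, and the simulated data are illustrative assumptions, not part of the experiments in this paper.

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def linear_gram(A, B):
    """Linear kernel matrix with entries <a, b> (an unbounded kernel)."""
    return A @ B.T

def mean_element_if_norm(z, X, gram):
    """RKHS norm of k(., z) - m_hat, where m_hat is the empirical kernel mean of X.

    Uses ||k(., z) - m_hat||^2 = k(z, z) - 2 mean_i k(z, x_i) + mean_ij k(x_i, x_j).
    """
    z = np.atleast_2d(z)
    kzz = gram(z, z)[0, 0]
    kzx = gram(z, X).mean()
    kxx = gram(X, X).mean()
    return float(np.sqrt(max(kzz - 2.0 * kzx + kxx, 0.0)))

# Hypothetical data: 100 standard normal points in two dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
gauss = lambda A, B: gaussian_gram(A, B, sigma=1.0)

for scale in (1.0, 10.0, 100.0):
    z = np.array([scale, scale])
    print(scale, mean_element_if_norm(z, X, gauss), mean_element_if_norm(z, X, linear_gram))
```

With a Gaussian kernel the norm cannot exceed $\sqrt{2}$, whereas with a linear kernel it equals the Euclidean distance from $z$ to the sample mean and therefore grows without bound as $z$ moves away from the data.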
4 Kernel CCA and multiple kernel CCA
In this section, we review kernel CCA and the IF and empirical IF (EIF) of kernel CCA. We then address multiple kernel CCA and propose the IF and EIF of multiple kernel CCA based on the IF of kernel CO and kernel CCO.
4.1 Kernel CCA
The aim of kernel CCA is to seek two sets of functions in the RKHS for which the correlation (Corr) of random variables is maximized. Given two sets of random variables and with two functions in the RKHS, and , the optimization problem of the random variables and is
The optimizing functions and are determined up to scale.
Using a finite sample, we are able to estimate the desired functions. Given an i.i.d. sample, from a joint distribution , by taking the inner products with elements or “parameters” in the RKHS, we have features and , where and are the associated kernel functions for and , respectively. The kernel Gram matrices are defined as and . We need the centered kernel Gram matrices and , where with and is the vector of ones. The empirical estimate of Eq. (2) is then given by
where and are the directions of and , respectively. After some simple algebra, we can write
Unfortunately, the naive kernelization (3) of CCA is trivial, and the non-zero solutions of the generalized eigenvalue problem are (Alam et al., 2010, Bach and Jordan, 2002). To overcome this problem, we introduce small regularization terms in the denominator of the right-hand side of (3) as
where the small regularization coefficient is .
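The centering and regularization steps above can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the implementation used in this paper: it assumes a Bach and Jordan (2002) style constraint in which each Gram matrix $K$ is replaced by $K + \mathrm{reg}\, I$ in the denominator, and it solves the resulting problem through a singular value decomposition. The names `gaussian_gram`, `center_gram`, `kernel_cca`, and the parameter `reg` are hypothetical.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gaussian kernel Gram matrix of the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def center_gram(K):
    """Centered Gram matrix H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_cca(Kx, Ky, reg):
    """First regularized kernel canonical correlation and its directions.

    Assumed problem: maximize a^T Kx Ky b subject to
    a^T (Kx + reg I)^2 a = b^T (Ky + reg I)^2 b = 1.
    Substituting u = (Kx + reg I) a and v = (Ky + reg I) b turns this
    into finding the top singular triplet of Rx^{-1} Kx Ky Ry^{-1}.
    """
    n = Kx.shape[0]
    Rx = Kx + reg * np.eye(n)
    Ry = Ky + reg * np.eye(n)
    A = np.linalg.solve(Rx, Kx @ Ky)   # Rx^{-1} Kx Ky
    A = np.linalg.solve(Ry, A.T).T     # right-multiply by Ry^{-1} (Ry is symmetric)
    U, s, Vt = np.linalg.svd(A)
    a = np.linalg.solve(Rx, U[:, 0])   # back-transform u -> a
    b = np.linalg.solve(Ry, Vt[0])     # back-transform v -> b
    return s[0], a, b

# Hypothetical data: y is a noisy nonlinear function of x.
rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(60, 1))
y = np.sin(x) + 0.1 * rng.normal(size=(60, 1))
Kx = center_gram(gaussian_gram(x, sigma=1.0))
Ky = center_gram(gaussian_gram(y, sigma=0.5))
rho, a, b = kernel_cca(Kx, Ky, reg=0.1)
print(f"first regularized kernel canonical correlation: {rho:.3f}")
```

The empirical canonical variates at the sample points are then `Kx @ a` and `Ky @ b`. A small `reg` drives the first correlation toward one regardless of the data, which is the degeneracy that the regularization is meant to control.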
Using the IF of the kernel mean element and covariance operator in the eigenvalue problem in Eq. (4), as shown in Alam et al. (2016), the influence functions of the kernel canonical correlation (kernel CC) and the kernel canonical variate (kernel CV) at are given by
where and similarly for the kernel CV of . It is known that the inverse of an operator may not exist, or may exist but not be continuous in general (Fukumizu et al., 2007). While we can derive the kernel canonical correlation using the correlation operator , even when and are not proper operators, the IF of the covariance operator is valid only for finite-dimensional RKHSs. For infinite-dimensional RKHSs, we can find the IF of by introducing a regularization term as follows
where is a regularization coefficient, which gives an empirical estimator. Let be a sample from the distribution . The EIFs of Eq. (5) at for all points are
For bounded kernels, the IFs or EIFs stated in Eq. (5) have the three properties: gross error sensitivity, local shift sensitivity, and rejection point. But for unbounded kernels, say a linear or polynomial kernel, the IFs are not bounded. As a consequence, the results of classical kernel CCA using bounded kernels are less sensitive to outliers than those of classical kernel CCA using unbounded kernels (Alam et al., 2016, Huang et al., 2009).
4.2 Multiple kernel CCA
Multiple kernel CCA seeks more than two sets of functions in the RKHSs for which the correlation (Corr) of random variables is maximized. Given sets of random variables and functions in the RKHS, ,, , the optimization problem of the random variables , , is
Given an i.i.d. sample, from a joint distribution , by taking the inner products with elements or “parameters” in the RKHS, we have features
where are the associated kernel functions for , respectively. The kernel Gram matrices are defined as , , . Similar to Section 4.1, using these kernel Gram matrices, the centered kernel Gram matrices are defined as , , , where with and is the vector of ones. As in the case of two data sets, the empirical estimate of Eq. (8) is obtained from the following generalized eigenvalue problem:
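A generalized eigenvalue problem of this kind can be sketched numerically as follows. The block assumes a Bach and Jordan (2002) style formulation, with off-diagonal blocks $K_i K_j$ on the left-hand side and diagonal blocks $(K_i + \mathrm{reg}\, I)^2$ on the right-hand side; this is an assumption made for illustration and may differ from the exact form used in this paper. The names `centered_gaussian_gram`, `multiple_kernel_cca`, and `reg` are hypothetical.

```python
import numpy as np

def centered_gaussian_gram(X, sigma):
    """Centered Gaussian Gram matrix H K H of the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma**2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def multiple_kernel_cca(grams, reg):
    """Largest generalized eigenvalue and per-set directions for m Gram matrices.

    Assumed problem: O a = rho D a, where O has blocks K_i K_j (i != j) and
    D is block diagonal with blocks (K_i + reg I)^2. Substituting
    u_i = (K_i + reg I) a_i gives an ordinary symmetric eigenproblem.
    """
    m, n = len(grams), grams[0].shape[0]
    I = np.eye(n)
    # W_i = (K_i + reg I)^{-1} K_i
    W = [np.linalg.solve(K + reg * I, K) for K in grams]
    C = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            if i != j:
                # Block (i, j): (K_i + reg I)^{-1} K_i K_j (K_j + reg I)^{-1}
                C[i * n:(i + 1) * n, j * n:(j + 1) * n] = W[i] @ W[j].T
    C = (C + C.T) / 2.0  # remove round-off asymmetry before eigh
    vals, vecs = np.linalg.eigh(C)
    u = vecs[:, -1]
    directions = [np.linalg.solve(grams[i] + reg * I, u[i * n:(i + 1) * n])
                  for i in range(m)]
    return vals[-1], directions

# Hypothetical data: three views driven by one shared latent variable t.
rng = np.random.default_rng(0)
t = rng.uniform(-2.0, 2.0, size=(50, 1))
views = [np.sin(t), t**2, np.tanh(t)]
grams = [centered_gaussian_gram(v + 0.1 * rng.normal(size=v.shape), sigma=1.0)
         for v in views]
top, directions = multiple_kernel_cca(grams, reg=0.1)
print(f"largest generalized eigenvalue: {top:.3f}")
```

For two data sets this reduces to the pairwise regularized problem, with the largest eigenvalue equal to the first regularized kernel canonical correlation.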
5 Experiments
We demonstrate the experiments on synthesized and real imaging genetics data including SNP, fMRI, and DNA methylation. For the synthesized experiments, we generate two types of data: original data and those with of contamination, which are called ideal data (ID) and contaminated data (CD), respectively. In all experiments, for the bandwidth of the Gaussian kernel we use the median of the pairwise distances (Gretton et al., 2008, Sun and Chen, 2007). Since the goal is to find the outliers, the regularization parameter of kernel CCA is set as . The real data sets are described in Section 5.2, and the synthetic data sets are described as follows:
Multivariate Gaussian structural data (MGSD): Given multivariate normal data, () where is the same as in Alam et al. (2008). We divide into two sets of variables (,), and use the first 6 variables of as and perform a transformation of the absolute value of the remaining variables () as . For the CD, ().
Sine and cosine function structural data (SCSD): We use a uniform marginal distribution, and transform the data by two periodic and functions to make two sets and , respectively, with additive Gaussian noise: For the CD, .
SNP and fMRI structural data (SMSD): Two data sets, SNP data X with SNPs and fMRI data Y with 1000 voxels, were simulated. To correlate the SNPs with the voxels, a latent model is used as in Parkhomenko et al. (2009). For data contamination, we set the signal level, , and noise level, , to and , respectively.
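The bandwidth rule mentioned before the data set descriptions, the median of the pairwise distances, can be sketched as follows. The function name `median_heuristic` is a hypothetical helper, and the use of Euclidean distances between distinct pairs is an assumption about the convention.

```python
import numpy as np

def median_heuristic(X):
    """Median of the Euclidean distances between all distinct pairs of rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    upper = np.triu_indices(X.shape[0], k=1)  # each unordered pair once, no diagonal
    return float(np.median(dist[upper]))

# Hypothetical data: 200 points in 5 dimensions.
rng = np.random.default_rng(0)
sigma = median_heuristic(rng.normal(size=(200, 5)))
print(f"Gaussian kernel bandwidth: {sigma:.3f}")
```

The pairwise difference array uses memory proportional to $n^2 d$, so for large samples the median is often computed on a random subsample instead.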
In the experiments, to assess the effect of contamination on kernel CCA, we first compared ID with CD. To measure the influence, we calculated the ratio between ID and CD of the IF of kernel CC and kernel CV. Based on this ratio, we define two measures for kernel CC and kernel CV
respectively. For a method that does not depend on the contaminated data, the above measures, and , should be approximately zero. In other words, the best methods should give the smallest values. For comparison, we consider simulated data sets, MGSD, SCSD, and SMSD, with 3 sample sizes, . For each sample size, we repeat the experiment for samples. Table 1 presents the results (mean ± standard deviation) of kernel CCA. From this table, we observe that kernel CCA is affected by the contaminated data in all cases.
[Tables: column headers SNP & fMRI, SNP & Methylation, fMRI & Methylation, 3 data sets; row labels SNP & fMRI, SNP & Methylation, SNP, fMRI & Methylation; table values not recoverable]
5.1 Visualizing influential observations using kernel CCA and multiple kernel CCA
Now, we propose a simple graphical display based on the EIF of kernel CCA: index plots (the observations on the -axis and the influence of each observation, as shown in Eq. (4.1), on the -axis) to assess the influential data points in data integration with respect to the EIF of kernel CCA. To do this, we first consider the simulated SMSD and then a real imaging genomics data set (see Section 5.2). The index plots of observations using the SMSD (ID and CD) based on the EIF of kernel CCA are presented in Figure 1. The plots show that the influence of the observations differs significantly between ID and CD. The observations of the ID have small influence, whereas the observations of the CD have large influence. It is clear that kernel CCA is affected significantly by the CD. In addition, using the visualization of the EIF of kernel CCA, we can easily identify the influential observations.
5.2 Real data analysis: Mind Clinical Imaging Consortium
The Mind Clinical Imaging Consortium (MCIC) has collected three types of data (SNPs, fMRI, and DNA methylation) from 208 subjects, including schizophrenic patients (age: , females) and (age: , females) healthy controls. Without missing data, the number of subjects is ( schizophrenia (SZ) patients and healthy controls) (Lin et al., 2014).
SNPs: For each subject (SZ patients and healthy controls), a blood sample was taken and DNA was extracted. Genotyping for all subjects was performed at the Mind Research Network using the Illumina Infinium HumanOmni1-Quad assay covering SNP loci. The BeadStudio and PLINK software packages were applied to form the final genotype calls and to perform a series of standard quality control procedures, respectively. The final data set spans loci having genes based on subjects (those without missing data). Genotypes “aa” (non-minor allele), “Aa” (one minor allele), and “AA” (two minor alleles) were coded as , and for each SNP, respectively (Lin et al., 2014, Chen and Liu, 2012).
fMRI: Participants’ fMRI data were collected during a block-design motor response to auditory stimulation. State-of-the-art approaches use mainly participants’ feedback and experts’ observations for this purpose. The aim was to continuously monitor the patients, acquiring images with parameters (TR = 2000 ms, TE = 30 ms, field of view = 22 cm, slice thickness = 4 mm, 1 mm skip, 27 slices, acquisition matrix , flip angle = ) on a Siemens 3T Trio scanner and a 1.5 T Sonata with echo-planar imaging (EPI). Data were pre-processed with SPM5 software and were realigned, spatially normalized, and resliced to mm. They were smoothed with a Gaussian kernel and analyzed by multiple regression considering the stimulus and their temporal derivatives plus an intercept term as regressors. Finally, the stimulus-on versus stimulus-off contrast images were extracted, excluding voxels with missing measurements. voxels were extracted from ROIs based on the AAL brain atlas for analysis (Lin et al., 2014).
DNA methylation: DNA methylation is one of the main epigenetic mechanisms regulating gene expression. It appears to be involved in the development of SZ. In this paper, we investigated DNA methylation markers in blood from SZ patients and healthy controls. Participants come from the MCIC, a collaborative effort of 4 research sites. For more details, site information and enrollment for SZ patients and healthy controls are given in Liu et al. (2014). All participants’ symptoms were evaluated by the Scale for the Assessment of Positive Symptoms and the Scale for the Assessment of Negative Symptoms (Andreasen, 1984). DNA from blood samples was measured by the Illumina Infinium Methylation27 Assay. The methylation value is calculated as the ratio of the methylated probe intensity to the total probe intensity.
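The methylation value described in the last item is a simple per-probe ratio. A minimal sketch, with a hypothetical function name and made-up intensities used only for illustration:

```python
import numpy as np

def methylation_value(methylated, total):
    """Per-probe ratio of methylated probe intensity to total probe intensity."""
    methylated = np.asarray(methylated, dtype=float)
    total = np.asarray(total, dtype=float)
    if np.any(total <= 0):
        raise ValueError("total probe intensity must be positive")
    return methylated / total

# Hypothetical intensities for three probes.
print(methylation_value([50.0, 0.0, 300.0], [100.0, 200.0, 300.0]))
```

The resulting values lie between 0 (unmethylated) and 1 (fully methylated) whenever the methylated intensity does not exceed the total.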
To detect influential subjects (in SZ patients and healthy controls), as discussed in Section 5.1, we use the EIF of the kernel CC of kernel CCA and multiple kernel CCA. Figure 2 shows the influence of participants from the MCIC data: SNPs, fMRI, and DNA methylation. The SZ patients and healthy controls are in the 1st and 2nd rows, respectively. The results for the pairwise data sets (i.e., SNP & fMRI, SNP & Methylation, and fMRI & Methylation) using kernel CCA and for all data sets (SNP, fMRI, & Methylation) using multiple kernel CCA are in the 1st to 4th columns, respectively. These plots show that in all scenarios the healthy controls have less influence than the SZ patients.
To extract the outlying subjects among the participants of the MCIC data, we consider the stem-and-leaf display of the influence for the pairwise data sets (i.e., SNP & fMRI, SNP & Methylation, and fMRI & Methylation) using kernel CCA and for all data sets (SNP, fMRI, & Methylation) using multiple kernel CCA. Table 2 shows the results for the pairwise data sets and for the data sets together. Based on the stem-and-leaf display, the outlying subject sets of SNP & fMRI, SNP & Methylation, fMRI & Methylation, and SNP, fMRI, & Methylation are , , , . It is noted that multiple kernel CCA is able to extract the common SZ patient , which is also an outlier for all pairwise results using kernel CCA. Finally, we investigated the difference between the training correlation and the test correlation using -fold cross-validation, with all subjects and with the outliers removed. Table 4 shows the outliers along with the results for all subjects with and without outliers using kernel CCA and multiple kernel CCA. We see that after removing the outliers identified by the proposed methods, both kernel CCA and multiple kernel CCA performed better than when using all subjects.
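A stem-and-leaf display of influence values can be produced with a few lines of code. The sketch below is a generic text version for non-negative values; the function name `stem_and_leaf`, the leaf unit, and the influence values are hypothetical and are not taken from the MCIC results.

```python
def stem_and_leaf(values, leaf_unit=0.1):
    """Return the lines of a stem-and-leaf display for non-negative values.

    Each value is rounded to a multiple of leaf_unit; the last digit is the
    leaf and the remaining leading digits form the stem.
    """
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative values only")
    scaled = sorted(int(round(v / leaf_unit)) for v in values)
    stems = {}
    for s in scaled:
        stem, leaf = divmod(s, 10)
        stems.setdefault(stem, []).append(leaf)
    lines = []
    # Empty stems are printed too, so that gaps before outliers stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical influence values; the last one is far from the rest.
influence = [0.4, 0.7, 1.1, 1.3, 1.2, 0.9, 5.8]
for line in stem_and_leaf(influence):
    print(line)
```

In such a display, candidate outliers appear as leaves on stems that are separated from the bulk of the data by one or more empty stems.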
6 Concluding remarks and future research
The methods for identifying outliers in imaging genetics data presented in this paper are applicable not only to single data sets but also to integrated data sets, which is an essential and challenging issue for multi-source data analysis. The proposed methods are based on the IF of kernel CCA and multiple kernel CCA, which can detect and isolate outliers effectively in both synthesized and real data sets. After applying kernel CCA to pairwise data (e.g., SNP & fMRI, SNP & Methylation, and fMRI & Methylation) and multiple kernel CCA to all data sets (e.g., SNP, fMRI, & Methylation), we found that in all scenarios the healthy controls have less influence than the SZ patients. In addition, multiple kernel CCA is able to extract the common SZ patient , which is also an outlier for all pairwise data analyses using kernel CCA. After removing the significant outliers indicated by both kernel CCA and multiple kernel CCA, the stem-and-leaf display shows that both methods performed much better than when using all subjects.
Although we have argued that the kernel CCA and multiple kernel CCA procedures for detecting outliers worked effectively, there is also room for further improvement. The Gaussian kernel function is a reasonable choice; however, other classes of kernel functions may be more suitable for a specific data set. In future work, it would also be interesting to develop robust kernel PCA and robust multiple kernel CCA and apply them to imaging genomics analysis.
The authors wish to thank the NIH (R01 GM109068, R01 MH104680) and NSF (1539067) for support.
- Akaho  S. Akaho. A kernel method for canonical correlation analysis. International Meeting of the Psychometric Society, 35:321–377, 2001.
- Alam  M. A. Alam. Kernel Choice for Unsupervised Kernel Methods. PhD. Dissertation, The Graduate University for Advanced Studies, Japan, 2014.
- Alam and Fukumizu  M. A. Alam and K. Fukumizu. Higher-order regularized kernel CCA. 12th International Conference on Machine Learning and Applications, pages 374–377, 2013.
- Alam and Fukumizu  M. A. Alam and K. Fukumizu. Hyperparameter selection in kernel principal component analysis. Journal of Computer Science, 10(7):1139–1150, 2014.
- Alam and Fukumizu  M. A. Alam and K. Fukumizu. Higher-order regularized kernel canonical correlation analysis.
- Alam et al.  M. A. Alam, M. Nasser, and K. Fukumizu. Sensitivity analysis in robust and kernel canonical correlation analysis. 11th International Conference on Computer and Information Technology, Bangladesh., IEEE:399–404, 2008.
- Alam et al.  M. A. Alam, M. Nasser, and K. Fukumizu. A comparative study of kernel and robust canonical correlation analysis. Journal of Multimedia., 5:3–11, 2010.
- Alam et al.  M. A. Alam, K. Fukumizu, and Y.-P. Wang. Robust Kernel (Cross-) Covariance Operators in Reproducing Kernel Hilbert Space toward Kernel Methods. ArXiv e-prints, February 2016.
- Andreasen  NC. Andreasen. Scale for the assessment of positive symptoms (SAPS). Springer, Iowa City, University of Iowa, 1984.
- Aronszajn  N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.
- Bach and Jordan  F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
- Chen and Liu  X. Chen and H. Liu. An efficient optimization algorithm for structured sparse CCA, with applications to eQTL mapping. Statistics in Biosciences, 2012.
- Christmann and Steinwart  A. Christmann and I. Steinwart. On robustness properties of convex risk minimization methods for pattern recognition. Journal of Machine Learning Research, 5:1007–1034, 2004.
- Christmann and Steinwart  A. Christmann and I. Steinwart. Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799–819, 2007.
- Donoho and Huber  D. L. Donoho and P. J. Huber. The notion of breakdown point. In P. J. Bickel, K. A. Doksum, and J. L. Hodges Jr, editors, A Festschrift for Erich L. Lehmann, Belmont, California, Wadsworth, 12(3):157–184, 1983.
- Filzmoser et al.  P. Filzmoser, R. Maronna, and M. Werner. Outlier identification in high dimensions. Computational Statistics & Data Analysis, 52:1694–1711, 2008.
- Fukumizu and Leng  K. Fukumizu and C. Leng. Gradient-based kernel dimension reduction for regression. Journal of the American Statistical Association, 109(550):359–370, 2014.
- Fukumizu et al.  K. Fukumizu, F. R. Bach, and A. Gretton. Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research, 8:361–383, 2007.
- Fukumizu et al.  K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems, Cambridge, MA, MIT Press, 20:489–496, 2008.
- Gogoi et al.  P. Gogoi, D. K. Bhattacharyya, B. Borah, and J. K. Kalita. A survey of outlier detection methods in network anomaly identification. The Computer Journal, 54(4):570–588, 2011.
- Gretton et al.  A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems, 20:585–592, 2008.
- Gretton et al.  A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723 – 773, 2012.
- Hampel  F. R. Hampel. The influence curve and its role in robust estimations. Journal of the American Statistical Association, 69:386–393, 1974.
- Hampel et al.  F. R. Hampel, E. M. Ronchetti, and W. A. Stahel. Robust Statistics. John Wiley & Sons, New York, 1986.
- Hampel et al.  F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, New York, 2011.
- Hofmann et al.  T. Hofmann, B. Schölkopf, and A. J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36:1171–1220, 2008.
- Huang et al.  S. Y. Huang, Y. R. Yeh, and S. Eguchi. Robust kernel principal component analysis. Neural Computation, 21(11):3179–3213, 2009.
- Huber  P. J. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35:73–101, 1964.
- Huber and Ronchetti  P. J. Huber and E. M. Ronchetti. Robust Statistics. John Wiley & Sons, England, 2009.
- Kim and Scott  J. Kim and C. D. Scott. Robust kernel density estimation. Journal of Machine Learning Research, 13:2529–2565, 2012.
- Lin et al.  D. Lin, V. D. Calhoun, and Y. P. Wang. Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Medical Image Analysis, 18:891–902, 2014.
- Liu et al.  J. Liu, J. Chen, S. Ehrlich, E. Walton, T. White, N. P. Bizzozero, J. Bustillo, J. A. Turner, and V. D. Calhoun. Methylation patterns in whole blood correlate with symptoms in schizophrenia patients. Schizophrenia Bulletin, 40(4):769–776, 2014.
- Naser and N. A. Hamzah  M. Naser, N. A. Hamzah, and M. A. Alam. Qualitative robustness in estimation. Pakistan Journal of Statistics and Operation Research, 8(3):619–634, 2012.
- Oh and Gao  J. H. Oh and J. Gao. A kernel-based approach for detecting outliers of high-dimensional biological data. IEEE International conference on Bioinformatics and Biomedicine (BIBM), 10:Suppl 4:S7, 2009.
- Parkhomenko et al.  E. Parkhomenko, D. Tritchler, and J. Beyene. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1):1–34, 2009.
- Romanazzi  M. Romanazzi. Influence in canonical correlation analysis. Psychometrika, 57(2):237–259, 1992.
- Roth  V. Roth. Kernel fisher discriminants for outlier detection. Neural Computation, 18(4):942–960, 2006.
- S. Yu and Moreau  S. Yu, L.-C. Tranchevent, B. De Moor, and Y. Moreau. Kernel-based Data Fusion for Machine Learning. Springer-Verlag, Berlin Heidelberg, 2011.
- Schölkopf et al.  B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation., 10:1299–1319, 1998.
- Song et al.  L. Song, A. Smola, K. Borgwardt, and A. Gretton. Colored maximum variance unfolding. Advances in Neural Information Processing Systems, 20:1385–1392, 2008.
- Sun and Chen  T. Sun and S. Chen. Locality preserving CCA with applications to data visualization and pose estimation. Image and Vision Computing, 25(5):531–543, 2007.
- Tukey  J. W. Tukey. Exploratory Data Analysis. Addison-Wesley, Reading, Massachusetts, 1977.