Owing to the large number of human single nucleotide polymorphisms (SNPs), kernel methods, that is, methods built on positive definite kernels, have become a popular and effective tool for conducting genome-wide association studies (GWAS), especially for identifying disease-associated genes. They offer practical and principled algorithms for learning how a large number of genetic variants are associated with complex phenotypes, helping to expose the complexity of the relationship between genetic markers and the outcome of interest. Human complex diseases are usually caused by the combined effect of multiple genes, without any standard pattern of inheritance. Indeed, to gain a better understanding of the genetic mechanisms and to explain the pathogenesis of human complex diseases, detecting interactions between genes (GGIs) is more important than detecting SNP-SNP interactions (SSIs). SSI methods, which examine one SNP per gene, cannot completely explain GGIs. Conversely, GGI methods, which consider genes containing many SNPs, take into account not only the interactions between SNPs but also the interactions between genes (Wang et al., 1978, Li et al., 2015).
In the last decade, a number of statistical methods have been used to detect GGIs; logistic regression, multifactor dimensionality reduction, linkage disequilibrium and entropy-based statistics are examples of such methods. Whole-genome association analysis toolsets such as minPtest (Hieke et al., 2014), BOOST (Wan et al., 2010), BEAM (Zhang and Liu, 2007), Random Jungle (Schwarz et al.), Tuning ReliefF (Moore and White, 2007), and PLINK (Purcell et al., 2007) have also been developed by the genomics, bioinformatics and biomedical communities. Most of these methods are based on single-SNP association; however, testing associations between the phenotype and individual SNPs has limitations and is not sufficient for the interpretation of GGIs (Yuan et al., 2012). MGAS, a powerful tool for multivariate gene-based genome-wide association analysis, has been proposed (van der Sluis et al., 2015). In case-control studies, a linear canonical correlation based U statistic (CCU) has been utilized to identify gene-gene interactions (Peng et al., 2010). In recent years, this method has been extended to nonlinear statistics using kernel canonical correlation analysis (classical kernel CCA), proposed in (Akaho, 2001). The extension of linear CCU to the reproducing kernel Hilbert space (RKHS) is known as kernel CCU (or KCCU) (Larson et al., Li and Cui, 2012). To estimate the variance in KCCU, researchers have used resampling-based methods, despite their high computational burden.
Bootstrapping, a resampling method that draws repeated samples from the data, can be prohibitively expensive for large data sets or complicated estimators, and it may also have poor finite-sample performance. Fortunately, the influence function (IF), which measures the effect of a change in a single observation on an estimator, relates directly to the asymptotic distribution of the estimator. As such, the IF is a convenient way to find the variances and covariances of a variety of estimators (Huber and Ronchetti, 2009, Hampel et al., 2011).
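As a concrete toy illustration (ours, not part of the paper's method), consider the sample mean, whose influence function is $IF(z) = z - \mu$; the IF-based asymptotic variance $E[IF^2]/n$ agrees with a bootstrap estimate at a fraction of the cost:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=5000)

# Influence function of the sample mean at a point z: IF(z) = z - mean.
if_vals = x - x.mean()

# IF-based asymptotic variance of the estimator: E[IF^2] / n.
var_if = np.mean(if_vals ** 2) / len(x)

# Bootstrap estimate of the same variance (far more expensive: 500 refits).
boot_means = np.array([rng.choice(x, size=len(x), replace=True).mean()
                       for _ in range(500)])
var_boot = boot_means.var()

print(var_if, var_boot)  # both close to 3^2 / 5000 = 0.0018
```

The IF route needs a single pass over the data, while the bootstrap repeats the whole estimation for every resample; the same trade-off motivates the IF-based variance for KCCU below.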
Classical kernel CCA, weighted multiple kernel CCA and other kernel methods have been extensively studied in unsupervised kernel fusion for decades (S. Yu and Moreau, 2011, Ge et al., 2005). But these methods are not robust; they are sensitive to contaminated data (Alam et al., 2008, 2010). These unsupervised methods, including classical kernel CCA, explicitly or implicitly depend on kernel mean elements (kernel ME), the kernel covariance operator (kernel CO) and/or the kernel cross-covariance operator (kernel CCO). These quantities can be formulated as empirical optimization problems, and robustness can be achieved by combining the empirical optimization problem with the ideas of Huber's or Hampel's M-estimation. The robust kernel CO and CCO can be computed efficiently via a kernelized iteratively re-weighted least squares (KIRWLS) algorithm (Alam et al., 2016). Although robustness has been studied for supervised kernel methods (Christmann and Steinwart, 2004, 2007, Debruyne et al., 2008), there are no general well-founded robust methods for unsupervised learning.
Robustness is an essential and challenging issue in statistical machine learning for multiple-source data analysis, because outliers, that is, data that cause surprise in relation to the majority of the data, often occur in real data. Outliers may be correct, but they need to be examined for transcription errors; they can cause havoc with classical statistical and statistical machine learning methods. To overcome this problem, many robust methods that are less sensitive to outliers have been developed since the 1960s. The goals of robust statistics are to apply statistical methods to the whole data set and to identify points deviating from the original pattern for further investigation (Huber and Ronchetti, 2009, Hampel et al., 2011). In recent years, a robust kernel density estimation (robust kernel DE) has been proposed (Kim and Scott, 2012), which is less sensitive to outliers than standard kernel density estimation. Through an empirical comparison and sensitivity analysis, it has been shown that classical kernel CCA is as sensitive to outliers as kernel PCA (Alam et al., 2010, Alam, 2014).
The contribution of this paper is fourfold. First, we address the robust kernel ME, the robust kernel CO and the robust kernel CCO. Second, we propose a method based on IFs to estimate the variance of the CCU. Third, we propose a nonparametric robust CCU method based on robust kernel CCA, which is designed for contaminated data and is less sensitive to noise than classical kernel CCA. Finally, we apply the proposed methods to synthesized data and to imaging genetics data sets. Experiments on synthesized data (both ideal data (ID) and contaminated data (CD)) and a genetics analysis demonstrate that the proposed robust method performs markedly better than state-of-the-art methods.
This paper is organized as follows. In the next section, we provide a brief review of the robust kernel ME, the robust kernel CO, the robust kernel CCO and the robust Gram matrices. In Section 3, we briefly discuss classical kernel CCA, robust kernel CCA and KCCU. After a brief review of classical kernel CCA in Section 3.1, we propose the robust kernel CCA in Section 3.2, and the IF-based test statistic to estimate the variance of the CCU is proposed in Section 3.3. In Section 4, we describe experiments conducted on both synthesized data and the imaging genetics data sets. We conclude with a summary of findings and areas of future research in Section 5.
2 Robust kernel (cross-) covariance operator
As shown in Eq. (1), we can define the kernel CCO as the solution of an empirical risk optimization problem. Given an independent and identically distributed sample $(X_i, Y_i)_{i=1}^n$, the kernel CCO is an operator on the RKHS,

$\widehat{\Sigma}_{XY} = \frac{1}{n}\sum_{i=1}^{n} \big(k_X(\cdot, X_i) - \widehat{\mu}_X\big) \otimes \big(k_Y(\cdot, Y_i) - \widehat{\mu}_Y\big),$

where $\widehat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} k_X(\cdot, X_i)$ and $\widehat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n} k_Y(\cdot, Y_i)$ are the empirical kernel MEs. In the special case that $Y$ is equal to $X$, we get the kernel CO.
This type of estimator is sensitive to the presence of outliers in the features. In recent years, the robust kernel ME has been proposed for density estimation (Kim and Scott, 2012). Our goal is to extend this notion to the kernel CO and kernel CCO. To do so, we estimate the kernel CO and kernel CCO based on robust loss functions (M-estimators). The estimates are called the robust kernel CO and robust kernel CCO, respectively. The most common examples of robust loss functions $\rho$ on $[0, \infty)$ are Huber's and Hampel's loss functions. Unlike the quadratic loss function, the derivatives of these loss functions are bounded (Huber and Ronchetti, 2009, Hampel et al., 1986). Huber's loss function is defined as

$\rho(t) = \begin{cases} t^2/2, & 0 \le t \le c, \\ c t - c^2/2, & t > c, \end{cases}$

and Hampel's loss function is defined as

$\rho(t) = \begin{cases} t^2/2, & 0 \le t < a, \\ a t - a^2/2, & a \le t < b, \\ \dfrac{a (t - c)^2}{2(b - c)} + \dfrac{a(b + c - a)}{2}, & b \le t < c, \\ \dfrac{a(b + c - a)}{2}, & t \ge c, \end{cases}$

where $0 < a \le b < c$ are tuning constants.
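The two loss functions can be sketched numerically as follows (a minimal sketch; the tuning constants $c$ and $a < b < c$ below are common illustrative defaults, not values prescribed by the paper):

```python
import numpy as np

def huber_loss(t, c=1.345):
    """Huber's loss: quadratic near zero, linear beyond c (bounded derivative)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= c, 0.5 * t ** 2, c * t - 0.5 * c ** 2)

def hampel_loss(t, a=1.5, b=3.0, c=6.0):
    """Hampel's three-part redescending loss; its derivative is 0 beyond c."""
    t = np.abs(np.asarray(t, dtype=float))
    out = np.empty_like(t)
    r1 = t <= a
    r2 = (t > a) & (t <= b)
    r3 = (t > b) & (t <= c)
    r4 = t > c
    out[r1] = 0.5 * t[r1] ** 2
    out[r2] = a * t[r2] - 0.5 * a ** 2
    out[r3] = a * (t[r3] - c) ** 2 / (2 * (b - c)) + a * (b + c - a) / 2
    out[r4] = a * (b + c - a) / 2          # constant: outliers get zero gradient
    return out
```

Both pieces join continuously at the breakpoints, so the weights derived from $\rho'(t)/t$ in KIRWLS vary smoothly with the distance to the mean element.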
Given the weights $w = (w_1, \ldots, w_n)^T$ of the robust kernel ME of a set of observations $X_1, \ldots, X_n$, the points are centered as $\tilde{k}(\cdot, X_i) = k(\cdot, X_i) - \sum_{j=1}^{n} w_j k(\cdot, X_j)$, and the robust centered Gram matrix is $\tilde{K} = (I_n - \mathbf{1}_n w^T) K (I_n - w \mathbf{1}_n^T)$, where $K_{ij} = k(X_i, X_j)$, $I_n$ is the identity matrix and $\mathbf{1}_n$ is the vector of $n$ ones. Eq. (2) can be written as

$\widehat{\mu} = \arg\min_{\mu \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \rho\big(\| k(\cdot, X_i) - \mu \|_{\mathcal{H}}\big).$
As shown in (Kim and Scott, 2012), Eq. (3) does not have a closed-form solution, but using the kernel trick the classical iteratively re-weighted least squares (IRWLS) can be extended to the RKHS. The solution is then

$\widehat{\mu}^{(h+1)} = \sum_{i=1}^{n} w_i^{(h)} k(\cdot, X_i), \qquad w_i^{(h)} = \frac{\varphi\big(\| k(\cdot, X_i) - \widehat{\mu}^{(h)} \|\big)}{\sum_{j=1}^{n} \varphi\big(\| k(\cdot, X_j) - \widehat{\mu}^{(h)} \|\big)},$

where $\varphi(t) = \rho'(t)/t$.
Given the weights $w$ of the robust kernel ME of a set of observations $X_1, \ldots, X_n$, the points are centered as above. For a set of test points $X^t_1, \ldots, X^t_m$, we define the matrix $K^t$ of order $m \times n$ as $K^t_{ij} = k(X^t_i, X_j)$. As in Eq. (4), the robust centered Gram matrix of the test points, $\tilde{K}^t$, in terms of the robust Gram matrix is defined as

$\tilde{K}^t = (K^t - \mathbf{1}_m w^T K)(I_n - w \mathbf{1}_n^T).$
The algorithms for estimating the robust kernel CO and CCO are discussed in (Alam et al., 2016).
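A compact sketch of KIRWLS for the robust kernel ME, the building block of the robust operators above (Huber's psi with a Gaussian kernel; the function names and the cutoff value are our own illustrative choices, not the paper's implementation):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel k(x, y) = exp(-||x-y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def robust_kernel_mean_weights(K, c=1.0, n_iter=100, tol=1e-10):
    """KIRWLS weights w for the robust kernel ME mu = sum_i w_i k(., x_i).

    Distances ||Phi(x_i) - mu|| are evaluated purely through the Gram
    matrix K, and the weights are re-computed from Huber's
    psi(t) = min(t, c) until convergence.
    """
    n = K.shape[0]
    w = np.full(n, 1.0 / n)                  # start at the ordinary kernel mean
    for _ in range(n_iter):
        Kw = K @ w
        d2 = np.diag(K) - 2.0 * Kw + w @ Kw  # ||Phi(x_i) - mu||^2 via K only
        d = np.sqrt(np.maximum(d2, 1e-12))
        w_new = np.minimum(d, c) / d         # varphi(d) = psi(d) / d
        w_new /= w_new.sum()
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w
```

An outlying observation ends up with a weight below the uniform $1/n$; plugging these weights into the centering formulas above is what down-weights outliers in the robust kernel CO and CCO.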
3 Classical kernel CCA and Robust kernel CCA
Classical kernel CCA has been proposed as a nonlinear extension of linear CCA (Akaho, 2001, Lai and Fyfe, 2000). This method, along with its variants, has been applied for various purposes, including genomics, computer graphics, computer-aided drug discovery and computational biology (Alzate and Suykens, 2008, Alam, 2014, Alam and Fukumizu, 2013). Theoretical results on the convergence of kernel CCA have also been obtained (Fukumizu et al., 2007, Hardoon and Shawe-Taylor, 2009).
3.1 Classical kernel CCA
The aim of classical kernel CCA is to seek two functions in the RKHSs for which the correlation (Corr) of the transformed random variables is maximized. In the simplest case, given two random variables $X$ and $Y$ and two functions in the RKHSs, $f_X \in \mathcal{H}_X$ and $f_Y \in \mathcal{H}_Y$, the optimization problem is

$\rho = \max_{f_X \in \mathcal{H}_X,\, f_Y \in \mathcal{H}_Y} \mathrm{Corr}\big(f_X(X), f_Y(Y)\big), \quad (5)$

where the functions $f_X$ and $f_Y$ are obtained up to scale.
We can extract the desired functions from a finite sample. Given an i.i.d. sample $(X_i, Y_i)_{i=1}^n$ from the joint distribution, by taking inner products with elements or "parameters" in the RKHS, we have features $f_X(X) = \sum_{i=1}^{n} a_i k_X(X, X_i)$ and $f_Y(Y) = \sum_{i=1}^{n} b_i k_Y(Y, Y_i)$, where $k_X$ and $k_Y$ are the associated kernel functions for $\mathcal{H}_X$ and $\mathcal{H}_Y$, respectively. The kernel Gram matrices are defined as $(K_X)_{ij} = k_X(X_i, X_j)$ and $(K_Y)_{ij} = k_Y(Y_i, Y_j)$. We need the centered kernel Gram matrices $M_X = C K_X C$ and $M_Y = C K_Y C$, where $C = I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^T$ with $I_n$ the identity matrix and $\mathbf{1}_n$ the vector with $n$ ones. The empirical estimate of Eq. (5) is then

$\hat{\rho} = \max_{a, b} \frac{a^T M_X M_Y b}{\sqrt{a^T (M_X^2 + \kappa M_X)\, a}\ \sqrt{b^T (M_Y^2 + \kappa M_Y)\, b}},$

where $a$ and $b$ are the directions of $f_X$ and $f_Y$, respectively, and $\kappa > 0$ is the regularization coefficient.
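The empirical problem reduces to a regularized eigenproblem; a minimal sketch of the first kernel canonical correlation (Gaussian kernel, a standard regularized reduction; the function names and default regularization are our own choices):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center_gram(K):
    """M = C K C with C = I - (1/n) 1 1^T."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ K @ C

def kernel_cca_first_cc(Kx, Ky, reg=1e-2):
    """First kernel canonical correlation from two Gram matrices.

    Uses the regularized eigenproblem
        (Mx + kappa I)^{-1} My (My + kappa I)^{-1} Mx a = rho^2 a,
    a standard reduction of the empirical kernel CCA objective.
    """
    n = Kx.shape[0]
    Mx, My = center_gram(Kx), center_gram(Ky)
    kappa = n * reg
    R = np.linalg.solve(Mx + kappa * np.eye(n),
                        My @ np.linalg.solve(My + kappa * np.eye(n), Mx))
    rho2 = np.real(np.linalg.eigvals(R)).max()
    return float(np.sqrt(min(max(rho2, 0.0), 1.0)))
```

Strongly dependent views yield a correlation near one, while independent views give a clearly smaller value; without the $\kappa$ term the estimate degenerates to one for any data.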
3.2 Robust kernel CCA
In this section, we propose a robust kernel CCA method based on the robust kernel CO and the robust kernel CCO. While many robust linear CCA methods have been proposed that fit the bulk of the data well and flag points deviating from the original pattern for further investigation (Adrover and Donato, 2015, Alam et al., 2010), there are no general well-founded robust methods of kernel CCA. Classical kernel CCA assigns the same weight to each data point when estimating the kernel CO and kernel CCO, which is the solution of an empirical risk optimization problem with the quadratic loss function. The least squares loss function is not robust. Instead, we solve the empirical risk optimization problem with a robust loss function, and the weights are determined from the data via KIRWLS. Using the resulting robust kernel CO and kernel CCO in classical kernel CCA yields what we call the robust kernel CCA method. Figure 1 presents a detailed algorithm of the proposed method (except for the first two steps, all steps are similar to classical kernel CCA). This method is designed for contaminated data as well, and the principles we describe apply also to other kernel methods that must deal with the estimation of the kernel CO and kernel CCO.
Input: $(X_i, Y_i)_{i=1}^n$ in $\mathcal{X} \times \mathcal{Y}$.
Calculate the robust cross-covariance operator $\widehat{\Sigma}_{XY}$ using the algorithm in (Alam et al., 2016).
Calculate the robust covariance operators $\widehat{\Sigma}_{XX}$ and $\widehat{\Sigma}_{YY}$ using the same weights as for the cross-covariance operator (for simplicity).
For $j = 1, 2, \ldots$, compute $\hat{\rho}_j$, the $j$th largest eigenvalue of the regularized correlation problem for the robust operators.
The unit eigenfunctions corresponding to the $j$th eigenvalue are $\hat{f}^j_X$ and $\hat{f}^j_Y$.
The $j$th kernel canonical variates are given by $\hat{f}^j_X(X_i)$ and $\hat{f}^j_Y(Y_i)$.
Output: the robust kernel CCs $\hat{\rho}_j$ and the robust kernel CVs.
3.3 Test Statistic
Given the case data matrix with gene set $\{g_1, g_2, \ldots, g_p\}$, where $m_u$ is the number of SNPs in gene $g_u$, and a control data matrix of the same form, we apply kernel CCA to each pair of genes in the case data and in the control data; the resulting first kernel canonical correlations are denoted $\hat{\rho}_{\mathrm{case}}$ and $\hat{\rho}_{\mathrm{control}}$, respectively. The same procedure can be used for other data sets, for example DNA methylation and fMRI.
For a correlation test statistic, we use the Fisher variance stabilizing transformation of the kernel CC, defined as

$h(\hat{\rho}) = \frac{1}{2} \log \frac{1 + \hat{\rho}}{1 - \hat{\rho}},$

whose standardized values are approximately distributed as standard normal. To assess the statistical significance of each pair of genes $g_u$ and $g_v$, we determine the co-association between cases and controls. The nonlinear test statistic is defined as

$T = \frac{h(\hat{\rho}_{\mathrm{case}}) - h(\hat{\rho}_{\mathrm{control}})}{\sqrt{\widehat{\mathrm{Var}}\big(h(\hat{\rho}_{\mathrm{case}})\big) + \widehat{\mathrm{Var}}\big(h(\hat{\rho}_{\mathrm{control}})\big)}}, \quad (6)$

which is asymptotically distributed as standard normal under the null hypothesis of no difference in gene-gene co-association, with independent case and control groups.
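A sketch of this test in code (the variance inputs are placeholders; the paper obtains them from the IF of kernel CCA, but bootstrap estimates would plug in the same way):

```python
import math

def fisher_z(r):
    """Fisher variance stabilizing transform of a correlation coefficient."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def gene_gene_test(r_case, r_control, var_case, var_control):
    """Co-association test between case and control first kernel CCs.

    var_case / var_control are variance estimates of the transformed
    correlations. Returns the z-type statistic and its two-sided
    p-value under the standard normal null.
    """
    t = (fisher_z(r_case) - fisher_z(r_control)) / math.sqrt(var_case + var_control)
    p = math.erfc(abs(t) / math.sqrt(2.0))  # = 2 * P(N(0,1) > |t|)
    return t, p
```

Equal case and control correlations give a statistic of zero (p-value 1), while a large gap relative to the variances yields a large statistic and a tiny p-value.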
As discussed in Section 1, bootstrapping can be prohibitively expensive for large data sets or complicated estimators and may have poor finite-sample performance. Fortunately, IFs are directly related to the asymptotic distribution of the estimator, so using IFs is a convenient way to find the variances and covariances of a variety of estimators.
In this paper, we apply a method based on the IF of kernel CCA, proposed in (Alam et al., 2016), to estimate the variance of the test statistic in Eq. (6). To do this, we relate the IF of the kernel CC to the IF of the Fisher variance stabilizing transformation. Fortunately, the IF of Fisher's transform of the correlation coefficient is independent of the true correlation $\rho$ (Devlin et al., 1975), and the IF of Fisher's transform has the distribution of a product of two independent standard normal variables.
Given a sample $(X_i, Y_i)_{i=1}^n$ from the joint distribution, the empirical IF (EIF) of the first kernel canonical correlation (kernel CC) at a point $(X', Y')$ can be expressed in terms of the standardized sum of, and difference between, the centered kernel canonical vectors (kernel CV). Evaluating this expression at the observed pairs gives the EIF used below (Eq. (7)).
According to (Hampel et al., 2011, Huber and Ronchetti, 2009, Mark and Katki, 2001), the variance of Fisher's transform is $1/(n-3)$. As shown in Section 4, the time complexity of this estimator is lower than that of resampling-based estimators such as the bootstrap.
We can define a similar test statistic for the robust kernel CCA using the robust kernel CC and the robust kernel CV.
4 Experiments
We demonstrate the proposed methods on synthesized data and on an imaging genetics analysis. For the synthesized experiments, we generate two types of data, original data and contaminated data, called ideal data (ID) and contaminated data (CD), respectively. In all experiments, the bandwidth of the Gaussian kernel is set to the median of the pairwise distances (Gretton et al., 2008, Sun and Chen, 2007). Since the goal is to detect outlying observations, the regularization parameter of kernel CCA is set to a small value. The synthetic data sets and the real data sets are described in Section 4.1 and Section 4.2, respectively.
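The median heuristic for the Gaussian kernel bandwidth is straightforward to implement (a sketch; the function name is ours):

```python
import numpy as np

def median_bandwidth(X):
    """Median heuristic: bandwidth = median of the pairwise Euclidean distances.

    X is an (n, d) array of observations; only the upper triangle of the
    distance matrix is used so each pair is counted once.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices_from(d2, k=1)   # each pair once, diagonal excluded
    return float(np.median(np.sqrt(d2[iu])))
```

The heuristic is scale-equivariant: rescaling the data by a factor rescales the bandwidth by the same factor, which is why it is a convenient default across data sets of different scales.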
4.1 Synthetic data
We conduct simulation studies to evaluate the performance of the proposed methods with the following synthetic data. Sine and cosine function structural (SCS) data: we use a uniform marginal distribution and transform the data with periodic sine and cosine functions to make two sets, with additive Gaussian noise; the CD is obtained by contaminating a fraction of the observations.
Multivariate Gaussian structural (MGS) data: given multivariate normal data with the same covariance structure as in (Alam and Fukumizu, 2015), we divide the variables into two sets, use the first six variables of the first set as one view, and take a transformation of the absolute values of the remaining variables as the other view. The CD is generated by contaminating these data.
SNP and fMRI structural (SMS) data: two data sets, SNP data X and fMRI data Y with 1000 voxels, are simulated. To correlate the SNPs with the voxels, a latent model is used as in (Parkhomenko et al., 2009). For contamination, we change the signal and noise levels.
In the synthetic experiments, we first investigate the asymptotic relative efficiency (ARE) of the bootstrap-based variance and the IF-based variance for linear CCA and classical kernel CCA using SCS data with different sample sizes. We repeat the experiment 100 times for each sample size. To illustrate the computational cost, we also report the CPU time (in seconds) of each estimator. The configuration of the computer is an Intel(R) Core(TM) i7 CPU 920 @ 2.67 GHz, 12.00 GB of memory and a 64-bit operating system. Table 1 shows the ARE values and times. This table clearly indicates that the IF-based variance is highly efficient for both the kernel methods and linear CCA, and that the bootstrap-based variance estimates have a very high time cost.
We evaluate the performance of the proposed robust kernel CCA in three different settings. The robust kernel CCA is compared with classical kernel CCA using a Gaussian kernel (same bandwidth and regularization). We consider the same EIF as in Eq. (7) for both methods. To measure the influence, we calculate the ratio between the ID and CD values of the IF of the kernel CC and kernel CV. Based on this ratio, we define two measures, one on the kernel CC and one on the kernel CV. For a method that does not depend on the contaminated data, these measures should be approximately zero; in other words, the best method should give small values. For comparison, we consider the three simulated data sets MGSD, SCSD and SMSD with several sample sizes, repeating the experiment for each sample size. Table 2 presents the results (mean ± standard deviation) of classical kernel CCA and robust kernel CCA. From this table, we observe that robust kernel CCA outperforms classical kernel CCA in all cases.
As a simple graphical display, index plots (observations on the x-axis and influence on the y-axis) assess the influence of data points in data fusion with respect to the EIF of kernel CCA and robust kernel CCA (Alam and Wang, 2016). To do this, we consider the SMS data. Figure 2 shows the index plots of classical kernel CCA and robust kernel CCA: the first and second rows correspond to ID and CD, and the columns to classical kernel CCA (Classical KCCA) and robust kernel CCA (Robust KCCA), respectively. These plots show that both methods give almost similar results on the ID. However, it is clear that classical kernel CCA is significantly affected by the CD, and influential observations are easily identified in this visualization. The robust kernel CCA, on the other hand, gives almost similar results on both data sets, ID and CD.
4.2 Mind Clinical Imaging Consortium (MCIC) Data analysis
Schizophrenia (SZ) is a complex human disorder caused by the interplay of a number of genetic and environmental factors. The Mind Clinical Imaging Consortium (MCIC) has collected two types of data (SNPs and fMRI) from 208 subjects, including SZ patients and healthy controls. After removing subjects with missing data, the remaining SZ patients and healthy controls are used (Lin et al., 2014). For pairwise gene-gene interactions we consider top SZ genes listed in the SZGene database (http://www.szgene.org). One gene does not contain any SNPs. Finally, we run the experiment on the remaining genes using linear CCA, kernel CCA and robust kernel CCA with the two loss functions described in Section 2.
We examine linear CCA (LCCA), kernel CCA (KCCA) and robust kernel CCA with the two loss functions, Hampel's and Huber's, denoted RKCCA(Ha) and RKCCA(Hu), respectively. Comparing linear CCA and kernel CCA, each extracts a number of significant pairs together with isolated genes at the chosen level of significance. Table 3 presents the first gene-gene co-associations based on KCCA along with the values of the test statistic and p-values. The robust methods RKCCA(Ha) and RKCCA(Hu) also extract significant pairs with isolated genes. Table 4 shows pairs of gene-gene co-associations based on RKCCA(Ha) along with the values of the test statistic and p-values. Table 5 lists all significant genes of linear CCA, kernel CCA and robust kernel CCA. To see the overlap structure of the selected genes, we use a Venn diagram of the four methods. Figure 3 presents the Venn diagram of LCCA, KCCA, RKCCA(Ha) and RKCCA(Hu), from which we can read off the genes selected by only one method, the genes common to pairs of methods, and the genes common to all methods.
Finally, we conduct gene ontology and pathway analysis using the online software Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 (Huang et al., 2009). The goal is to find the genes related to SZ disease. To do this, we consider the functional annotation chart of DAVID. Table 6 contains part of the results of the gene ontology and pathway analysis: counts, percentages, adjusted p-values and Benjamini values for all methods. Note that the p-value is corrected for multiple hypothesis testing using the Benjamini-Hochberg method. In this table, GABDD and KEGG stand for genetic association BD diseases and Kyoto Encyclopedia of Genes and Genomes, respectively. The table indicates that a large proportion of the genes selected by RKCCA(Ha) are directly related to SZ disease, whereas the genes selected by linear CCA and kernel CCA are related to a smaller extent. For the term SZ and bipolar disorder, RKCCA(Ha) performs better than all other methods.
5 Concluding remarks and future research
In this paper, we have proposed kernel CCU and its robust variants to detect gene-gene interactions in SZ disease. The variance of the kernel CC used in KCCU is estimated based on the IF. In terms of ARE and computational time, this estimator not only performs better in ARE but also has a much lower computational cost than bootstrap-based methods. The sensitivity analysis shows that the proposed robust kernel CCA is less sensitive to contamination than classical kernel CCA. We applied the proposed methods to the MCIC data set. Although linear CCA and classical kernel CCA select a large set of genes, these genes are less related to SZ disease; the robust methods, on the other hand, select a small set of genes that are highly related to SZ disease. Based on the gene ontology and pathway analysis, we conclude that the selected genes have a significant influence on the manifestation of SZ disease.
Although we illustrated the proposed methods only for detecting gene-gene interactions in the SNP data of MCIC, they can also be extended to identify gene-gene interactions and ROI-ROI interactions in DNA methylation data and fMRI data, respectively. The development of multiple kernel CCA based U statistics for use with more than two data sources is a promising direction for future research.
The authors wish to thank the NIH (R01 GM109068, R01 MH104680) and NSF (1539067) for support.
- Adrover and Donato  J. G. Adrover and S. M. Donato. A robust predictive approach for canonical correlation analysis. Journal of Multivariate Analysis, 133:356–376, 2015.
- Akaho  S. Akaho. A kernel method for canonical correlation analysis. International Meeting of the Psychometric Society, 35:321–377, 2001.
- Alam  M. A. Alam. Kernel Choice for Unsupervised Kernel Methods. PhD. Dissertation, The Graduate University for Advanced Studies, Japan, 2014.
- Alam and Fukumizu  M. A. Alam and K. Fukumizu. Higher-order regularized kernel CCA. 12th International Conference on Machine Learning and Applications, pages 374–377, 2013.
- Alam and Fukumizu  M. A. Alam and K. Fukumizu. Higher-order regularized kernel canonical correlation analysis. International Journal of Pattern Recognition and Artificial Intelligence, 29(4), 2015.
- Alam and Wang  M. A. Alam and Y.-P. Wang. Identifying Outliers using Influence Function of Multiple Kernel Canonical Correlation Analysis. ArXiv e-prints, 2016.
- Alam et al.  M. A. Alam, M. Nasser, and K. Fukumizu. Sensitivity analysis in robust and kernel canonical correlation analysis. 11th International Conference on Computer and Information Technology, Bangladesh., IEEE:399–404, 2008.
- Alam et al.  M. A. Alam, M. Nasser, and K. Fukumizu. A comparative study of kernel and robust canonical correlation analysis. Journal of Multimedia., 5:3–11, 2010.
- Alam et al.  M. A. Alam, K. Fukumizu, and Y.-P. Wang. Robust Kernel (Cross-) Covariance Operators in Reproducing Kernel Hilbert Space toward Kernel Methods. ArXiv e-prints, February 2016.
- Alzate and Suykens  C. Alzate and J. A. K. Suykens. A regularized kernel CCA contrast function for ICA. Neural Networks, 21:170–181, 2008.
- Christmann and Steinwart  A. Christmann and I. Steinwart. On robustness properties of convex risk minimization methods for pattern recognition. Journal of Machine Learning Research, 5:1007–1034, 2004.
- Christmann and Steinwart  A. Christmann and I. Steinwart. Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799–819, 2007.
- Debruyne et al.  M. Debruyne, M. Hubert, and J.V. Horebeek. Model selection in kernel based regression using the influence function. Journal of Machine Learning Research, 9:2377–2400, 2008.
- Devlin et al.  S. J. Devlin, R. Gnanadesikan, and J. R. Kettenring. Robust estimation and outlier detection with correlation coefficients. Biometrika, 62(3):531–545, 1975.
- Fukumizu et al.  K. Fukumizu, F. R. Bach, and A. Gretton. Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research, 8:361–383, 2007.
- Ge et al.  T. Ge, T. E. Nichols, D. Ghosh, E. C. Mormino, J. W. Smoller, M. R. Sabuncu, and the Alzheimer's Disease Neuroimaging Initiative. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005.
- Gretton et al.  A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems, 20:585–592, 2008.
- Hampel et al.  F. R. Hampel, E. M. Ronchetti, and W. A. Stahel. Robust Statistics. John Wiley & Sons, New York, 1986.
- Hampel et al.  F. R. Hampel, P. J. Rousseeuw E. M. Ronchetti, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, New York, 2011.
- Hardoon and Shawe-Taylor  D. R. Hardoon and J. Shawe-Taylor. Convergence analysis of kernel canonical correlation analysis: theory and practice. Machine Learning, 74:23–38, 2009.
- Hieke et al.  S. Hieke, H. Binder, A. Nieters, and M. Schumacher. minPtest: a resampling based gene region-level testing procedure for genetic case-control studies. Computational Statistics, 29(1-2):51–63, 2014.
- Huang et al.  D. W. Huang, B. R. Sherman, and R. A. Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1):44–57, 2009.
- Huber and Ronchetti  P. J. Huber and E. M. Ronchetti. Robust Statistics. John Wiley & Sons, England, 2009.
- Kim and Scott  J. Kim and C. D. Scott. Robust kernel density estimation. Journal of Machine Learning Research, 13:2529–2565, 2012.
- Lai and Fyfe  P. Lai and C. Fyfe. Kernel and nonlinear canonical correlation analysis. Computing and Information Systems, 7:43–49, 2000.
- Larson et al.  N. B. Larson, G. D. Jenkins, M. C. Larson, R. A. Vierkant, T. A. Sellers, C. M. Phelan, J. M. Schildkraut, R. Sutphen, P. P. D. Pharoah, S. A. Gayther, N. Wentzensen, Ovarian Cancer Association Consortium, E. L. Goode, and B. L. Fridley. Kernel canonical correlation analysis for assessing gene-gene interactions and application to ovarian cancer. European Journal of Human Genetics, 22:126–131, 2014.
- Li et al.  J. Li, D. Huang, M. Guo, X. Liu, C. Wang, Z. Teng, R. Zhang, Y. Jiang, H. Lv, and L. Wang. A gene-based information gain method for detecting gene gene interactions in case control studies. European Journal of Human Genetics, 23:1566–1572, 2015.
- Li and Cui  S. Li and Y. Cui. Gene-centric gene-gene interaction: a model-based kernel machine method. The Annals of Applied Statistics, 6(3):1134–1161, 2012.
- Lin et al.  D. Lin, V. D. Calhoun, and Y. P. Wang. Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Medical Image Analysis, 18:891–902, 2014.
- Mark and Katki  S. D. Mark and H. Katki. Influence function based variance estimation and missing data issues in case-cohort studies. Lifetime Data Analysis, 7(4):331–334, 2001.
- Moore and White  J. H. Moore and B. C. White. Tuning ReliefF for genome-wide genetic analysis. In E. Marchiori, J. H. Moore, and J. C. Rajapakse (Eds.): EvoBIO 2007, LNCS 4447, pages 166–175, 2007.
- Parkhomenko et al.  E. Parkhomenko, D. Tritchler, and J. Beyene. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biolog, 8(1):1–34, 2009.
- Peng et al.  Q. Peng, J. Zhao, and F. Xue. A gene-based method for detecting gene-gene co-association in a case-control association study. European Journal of Human Genetics, 18:582–587, 2010.
- Purcell et al.  S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M.A. Ferreira, D. Bender, J. Maller, P. Sklar, P.I. de Bakker, M.J. Daly, and P. C. Sham. Plink: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81:559–575, 2007.
- S. Yu and Moreau  S. Yu, L.-C. Tranchevent, B. De Moor, and Y. Moreau. Kernel-based Data Fusion for Machine Learning. Springer, Berlin Heidelberg, 2011.
- Schwarz et al.  D. F. Schwarz, I. R. König, and A. Ziegler. On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics, 26(14):1752–1758, 2010.
- Sun and Chen  T. Sun and S. Chen. Locality preserving CCA with applications to data visualization and pose estimation. Image and Vision Computing, 25(5):531–543, 2007.
- van der Sluis et al.  S. van der Sluis, C. V. Dolan, J. Li, Y. Song, P. Sham, D. Posthuma, and M. X. Li. MGAS: a powerful tool for multivariate gene-based genome-wide association analysis. Bioinformatics, 31:1007–1015, 2015.
- Wan et al.  X. Wan, C. Yang, Q. Yang, H. Xue, X. Fan, N. L. S. Tang, and W. Yu. BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies. The American Journal of Human Genetics, 87:325–340, 2010.
- Wang et al.  X. Wang, E. P. Xing, and D. J. Schaid. Kernel methods for large-scale genomic data analysis. Technometrics, 20:397–405, 1978.
- Yuan et al.  Z. Yuan, Q. Gao, Y. He, X. Zhang, F. Li, J. Zhao, and F. Xue. Detection for gene-gene co-association via kernel canonical correlation analysis. BMC Genetics, 13:83, 2012.
- Zhang and Liu  Yu. Zhang and J. S. Liu. Bayesian inference of epistatic interactions in case-control studies. Nature Genetics, 39:1167–1173, 2007.