Genes and their products carry out their functions in concerted effort via intricate networks. Network analysis of genomic data has consequently been an inescapable success by elucidating gene functions and interactions through many different approaches. For that reason, we wished to apply network reconstruction to gene expression data from diffuse large B-cell lymphoma samples from our own lab (Dybkær et al., 2015) as others (Agnelli et al., 2011; Clarke et al., 2013) have successfully used network analysis in other malignancies. However, the methods have one or more of several shortcomings, particularly, as in our case, when the sample size is moderate.
, and Bayesian networks—may be categorized according to their use of marginal, partial, or semi-partial correlations. The underlying assumption of them all is that correlation hints at causation. The different measures of correlation separate the methods by different attempts to remove the influence of remaining other genes(Markowetz and Spang, 2007). As the covariance matrix and its inverse holds this information, its estimation is a central topic to network analysis. However, this fundamental problem of accurately estimating the covariance matrix (or its inverse, the precision matrix) is notoriously difficult. The usual bias-corrected maximum likelihood estimator (MLE), the sample covariance matrix, has long been known to perform poorly in general due to high variability (Dempster, 1972). The sample covariance becomes increasingly ill-conditioned as the number of variables approaches the sample size and singular when exceeds
. Because of its central statistical role, the list of statistical methods and applications utilizing the estimated covariance matrix is exceedingly long. Beside the many standard statistical methods such as principal component analysis (PCA) and discriminant analysis, examples of direct applications include gene and protein network analysis(Butte et al., 2000; Margolin et al., 2006), spectroscopic imaging (Lin et al., 2007), functional magnetic resonance imaging (fMRI), financial forecasting, and many more. Among this expanding list of applications is also an increasing number of high-dimensional applications and datasets publicly available in online repositories.
In genomic datasets, the number of genes often far exceeds the number of samples . Since the number of parameters increases quadratically in and the sample covariance matrix becomes singular when
many shrinkage and regularization estimators have been proposed to combat the accompanying problems by effectively increasing the degrees of freedom. These examples include the graphical lasso and ridge estimation of the precision matrix(Meinshausen and Bühlmann, 2006; Friedman, Hastie and Tibshirani, 2008; van Wieringen and Peeters, 2016). Subsequently, focus on the selection of the regularization parameter has received much attention (Fan and Tang, 2013).
Instead of attempting to derive still more sophisticated estimators, we attempt to alleviate the problem from a different angle by using more available data and thus effectively increasing . We do so by utilizing a large number of genomic DLBCL datasets which, as with all major groups of cancer, are now publicly available online. We use these studies in combination with data from our own laboratory as a network meta-analysis to arrive at a better estimate of the covariance matrix whilst accounting for and assessing inter-study variation.
We note, that while the high-dimensional extension to is important it is out of scope in this paper. We restrict ourselves to the case where the total number of samples exceeds . Hence, if classes or datasets are available with sample sizes , we consider the case where while allowing to exceed for each individual class .
The remainder of this paper is structured as follows. In Section 2, we propose the model for a common covariance matrix across multiple studies, derive estimators thereof, and propose an inter-study homogeneity measure to aid in assessing the variation between studies. Next, we apply the model in Section 3 to DLBCL samples across 11 datasets. We then assess the proposed model estimators via synthetic data in Section 4 before concluding the manuscript in Section 5.
Although this work was motivated by gene-gene interaction networks in DLBCL, where the covariance matrix is assumed to contain all information about the conditional dependencies of the genes, the methods are general and not limited to such genomic data.
2 A hierarchical model for the covariance matrix
The model below was inspired by ordinary meta-analysis. Meta-analysis comes in various flavors corresponding to the assumption on the nature of the inter-study treatment effect. Random-effects models (REM) in meta-analysis model the inter-study effects as random variables(DerSimonian and Laird, 1986; Choi et al., 2003). In a vein similar to the ordinary meta-analysis approach, we think of the different studies as related but perturbed experiments and propose the following simple random covariance model (RCM) of the observations. Let be the number of features and the number of classes. We model an observation from the ’th study as a
-dimensional zero-mean multivariate Gaussian vector with covariance matrix realized from an inverse Wishart distribution, i.e.follows the hierarchical model
where denotes a
-dimensional multivariate Gaussian distribution with meanand positive definite (p.d.) covariance matrix
, and probability density function (pdf) shown in (A.1), and denotes a -dimensional inverse Wishart distribution with degrees of freedom, a p.d. scale matrix , and pdf shown in (A.2). While the inverse Wishart distribution is defined for all
, the first order moment exists only whenand is given by
Hence, in the RCM of (2.1), can be interpreted as a location-like parameter as it is the expected covariance matrix in each study. The parameter inversely controls the inter-class variation and can as such be considered an inter-class homogeneity parameter of the covariance structure. A large corresponds to high study homogeneity and vice versa for small . This can further be seen as concentrates around for which corresponds to an vanishing inter-study variation for increasing
. This fact is seen directly from variance and covariance expressions for the inverse Wishart (see (2.9) and (2.10)) where the 4th order denominator grows much faster than the 1st order nominator as polynomials in and causing the variance to vanish for . Thus, the true underlying covariance matrix and the homogeneity parameter are the effects of interest to be estimated in this paper. These basic properties of the RCM motivates the construction.
2.1 The likelihood function
Suppose are i.i.d. observations from independent studies from the model given in (2.1). Let be the matrix of observations for the ’th study where rows correspond to samples and columns to variables. By the independence assumptions, the log-likelihood for and is given by
Throughout this paper, we use the generic notation and for the conditional and unconditional pdf of random variables, respectively. Since the inverse Wishart distribution is conjugate to the multivariate Gaussian distribution, the integral—of which the integrand forms a Gaussian-inverse-Wishart distribution—can be evaluated. Hence can be marginalized out, cf. (A.4) in Appendix A, and we arrive at the following expression for the log-likelihood function,
up to an additive constant where is the multivariate generalization of the gamma function , see (A.3). As should be expected, the scatter matrix and study sample size are sufficient statistics for each study. Note that is conditionally Wishart distributed, , by construction.
As stated in the following two propositions, the likelihood is not log-concave in general. However, it is log-concave as a function of .
Proposition 1 (Non-concavity in ).
For a fixed , the log-likelihood function (2.3) is not concave in .
All proofs have been deferred to Appendix B.
Proposition 2 (Concavity in ).
For a fixed positive definite , the log-likelihood function (2.3) is concave in .
While the likelihood is not concave in we are able to show the existence and uniqueness of a global maximum in .
Proposition 3 (Existence and uniqueness).
The log-likelihood (2.3) has a unique maximum in for fixed and .
This result is proven in Appendix B and follows from two lemmas stated therein.
In the following section estimators of the parameters are derived using moments and the EM algorithm assuming fixed.
2.2 Moment estimator
The pooled empirical covariance matrix can be viewed as a moment estimator of . By the model assumptions, the first and second moment of the ’th observation in the ’th study, , is given by and
for all and . This suggests the estimators
where the latter is obtained by plugging into (2.2). This is the well-known pooled empirical covariance matrix.
2.3 Maximization using the EM algorithm
Here the updating scheme of the expectation-maximization (EM) algorithm(Dempster, Laird and Rubin, 1977) for fixed is derived. We now compute the expectation step of the EM-algorithm.
From (2.1) we have that,
Let be the precision matrix and let , then we equivalently have that
From the conjugacy of the inverse Wishart and the Wishart distribution, the posterior distribution of the precision matrix is
Hence, by the expectation of the Wishart distribution,
The maximization step, in which the log-likelihood is maximized, yields the estimate which is the mean of the scaled precision matrices . The derivation of this estimate can be seen in Appendix C. Let be the current estimate of . This yields the updating scheme
for . We denote the inverse of the estimate obtained by repeated iteration of (2.6) by .
An approximate maximum likelihood estimator using a first order approximation is also possible. This derivation has been deferred to Appendix D.
2.4 Estimation procedure
We propose a procedure alternating between estimating and while keeping the other fixed. Given parameters and at iteration , we estimate using fixed . Subsequently, we find by a standard one-dimensional numerical optimization procedure using the fixed . This coordinate ascent approach is repeated until convergence as described in Algorithm 1.
The procedure using the EM step utilizes the results about the RCM log-likelihood and thus provides a guarantee of convergence along with the advantage of a very simple implementation. Both the EM step and the update will always yield an increase in the likelihood. The disadvantage is that the identified stationary point might be a local maximum or saddle-point when considering the log-likelihood function jointly in . Intuitively, the latter possibility happens with zero probability, but it cannot be excluded that the maximum found is not global.
Variations on the convergence criterion can also be considered. One alternative is using the difference in successive parameter estimates. More variants can be created by considering relative rather than absolute differences.
2.5 Interpretation and inference
Test for no class heterogeneity
By the RCM construction parameterizes an inter-class variance where the size of corresponds to the homogeneity between the classes. A large yields high study homogeneity while a small yields low homogeneity. Thus, it might be of interest to test if the estimated homogeneity
is extreme under the null-hypothesis of no heterogeneity (i.e. infinite homogeneity). I.e. a test for the hypothesiswhich is equivalent to
The two are equivalent since sampling the covariance matrix from the inverse Wishart distribution becomes deterministic for . Therefore, testing this hypothesis can also be interpreted as testing whether the data is adequately explained when leaving out the hierarchical structure.
The distribution of under the null hypothesis is not tractable. However, in practice under or when is extremely large the estimated will be finite as the intra-study variance dominates the total variance. We note that the null distribution of does not depend on . We propose approximating the distribution of under by resampling. To do this, the model is simply fitted a large number of times on datasets re-sampled under mimicked by permuted class labels to get . As small values of are critical for approximate acceptance regions can be constructed from . Likewise, an approximation of the value testing can be obtained by
where is the indicator function. The addition of one in the nominator and denominator adds a positive bias to the approximate p-value minimally needed according to Phipson and Smyth (2010). This is approximately the fraction of ’s smaller than .
Intra-class correlation coefficient
We now introduce a descriptive statistic analogous to the intra-class correlation coefficient (ICC)(Shrout and Fleiss, 1979) well known from ordinary meta-analysis to better determine what constitute large values of . For the RCM, the ICC is given by
This follows from the definition of the ICC which is the ratio of the between-study variation and the total variation of any single pair of variables. The ICC might in this sense be utilized in quantifying the reproducibility of the covariance across studies. Consider observations from (2.1). We temporarily abuse our notation and let
and consider only a single observation . Furthermore, let , , and . To compute the ICC, we are thus interested in the ratio of the quantities and corresponding to the between-study and total variation of the covariance between variables and , respectively. That is, the ICC is the proportion of the total variance between studies,
where the second equality is obtained by and the law of total variation. This equality agrees with the usual ICC as can be interpreted as the (expected) within-study variation. Using the conditional variance given by the needed quantities can be found. To compute an expression for (2.8) we need to consider the fourth-order moments of the observations. From the model, known results of the inverse Wishart distribution, cf. (Cook and Forzani, 2011; von Rosen, 1988), leads to
Continuing with the expected conditional variance of in the denominator of (2.8),
An expression of in terms of the elements of can then found by substituting (2.9) and (2.10) into (2.11) and by extension an expression for the ICC (2.8) can be obtained. We omit this tedious calculation which can be verified to yield as above. Naturally, the ICC depends only on . A straight-forward plug-in estimator of the ICC of some gene-gene interaction is then .
Though is required for the variances to exist, it is clear that for and for as should be expected.
2.6 Implementation and availability
Algorithm 1 and the different estimators are implemented in the statistical programming language R (R Core Team, 2012) with core functions in C++ using packages Rcpp and RcppArmadillo (Eddelbuettel and François, 2011; François, Eddelbuettel and Bates, 2012). They are incorporated in the open-source R-package correlateR freely available for forking and editing (Bilgrau, 2014a). We refer to the information here for further details and installation instructions. This document was prepared with knitr (Xie, 2013) and LaTeX. To reproduce this document see Acknowledgments or http://github.com/AEBilgrau/RCM.
3 DLBCL meta-analysis
Diffuse large B-cell lymphoma (DLBCL) is an aggressive cancer subtype accounting for of all non-Hodgkin’s lymphomas (NHL) which itself constitutes about of all lymphomas (International Lymphoma Study Group, 1997).
3.1 Data and preprocessing
A large amount of DLBCL gene expression datasets are now available online at the NCBI (National Center for Biotechnology Information) Gene Expression Omnibus (GEO) website. Ten large-scale DLBCL gene expression studies were downloaded and preprocessed using custom brainarray chip definition files (CDF) (Dai et al., 2005) and RMA-normalized using the R-package affy (Gautier et al., 2004). The corresponding GEO-accession numbers are GSE12195, GSE22895, GSE31312, GSE10846, GSE34171, GSE22470, GSE4475, and GSE19246 based on various microarray platforms. The downloaded data together with a dataset from our own laboratory (GSE56315, Dybkær et al. (2015)) yields a total of 2046 samples with study sizes in the range 78-469. The summarization using brainarray CDFs to Ensembl gene identifiers facilitates cross-platform integration.
A coexpression network (or weighted correlation network) analysis integrating all datasets was carried out. For each dataset the scatter matrix of the top 300 most variable genes (as measured by the pooled variance across all studies) was computed as the sufficient statistics along with the number of samples. Hence, we investigate 45,150 pairwise interactions.
The parameters of the RCM were estimated using the EM algorithm and yielded the matrix and . The RCM was fitted using three different initial sets of parameters which all converged to the same parameter estimates. Log-likelihood traces, iterations used, and computation times are seen in Figure 1. From the parameter estimate, the common expected covariance was computed and subsequently scaled to the corresponding correlation matrix .
The estimated yields a surprisingly low estimated ICC of . Hence by the RCM, only of the variability of the gene-gene covariances is between-studies on average. This low ICC might suggest a high inter-study homogeneity of covariances meaning that the covariances reproduce across studies. However, the selection of only the most varying (across studies) genes is an obvious contribution to the low ICC. By selection, we have high inter-study variability which drives the ICC down. In any case, the low ICC might still suggest high reproducibility of the covariances between studies of the selected genes.
Next, we outline one of many possible downstream analyses of the estimated covariance. For simplicity we employed a standard correlation network analysis to the estimated common correlation matrix across all studies. To identify clusters with high internal correlation, we used agglomerative hierarchical clustering with Ward-linkage and distance measure defined as 1 minus the absolute value of the correlation. The tree was arbitrarily pruned at a height which produced 5 modules named by colors. Figure 2 shows these results. A heat-map, the hierarchical tree, and the identified modules are seen at the top left. The top-right shows a graphical representation of the matrix which better illustrates the clusters and their relations. The bottom plot shows the graph radially laid out using hierarchical edge bundling (Holten, 2006) where the edges are guided by the hierarchical tree. Table 1 shows the top genes within each module. As seen e.g. in the olivegreen module, genes from the same gene family cluster together.
To further investigate the low ICC, we refitted the RCM on the subset of the 50 genes in the orchid module. This yielded an
. Subsequently, the model was repeatedly fitted on 50 randomly chosen genes 500 times to gauge the size of the ICC without the selection bias. This resulted in a mean ICC (lower quartile, upper quartile) ofsuggesting that there is a high study homogeneity under randomly selected genes.
The test for the null-hypothesis of no study heterogeneity, , is clearly rejected with a value of 0.002. The mean (sd) of the fitted on 500 permuted datasets was compared to in the observed dataset.
Next, the modules were screened for biological relevance using GO (Gene Ontology) enrichment analysis. The upper right of Figure 2 shows suggested functions of the modules primarily based on the GO analysis. Acknowledgments shows the significant GO-terms at significance level 0.01 for each module in which the most GO-terms appear highly relevant to the pathology of DLBCL.
|n = 159||n = 50||n = 50||n = 31||n = 10|
We checked if the identified modules were prognostic for overall survival (OS) in the CHOP and R-CHOP-treated cohort datasets of GSE10846. To do this, the eigengene (Horvath, 2011) for each module was computed and a multiple Cox proportional hazards model for OS was fitted with the module eigengenes as covariates. The module eigengene is the first principal component of the expression matrix of the module which thus can be represented by a linear combination of the module genes. For the prognostic interesting orchid module, the Kaplan-Meier estimates were computed for groups arising when dichotomizing the values of the corresponding eigengene as above or below the median value. These results are seen in Figure 3.
From the survival analysis, the olivegreen module appeared particularly interesting since it identifies a group of patients with high value of the corresponding eigengene and poor survival, thus making them candidates for experimental therapies. The gene with highest connectivity in this module is S100A8 (MRP8; calgranulin A) and the gene S100A9 (MRP14; calgranulin B) appears as number 16 further down the connectivity list. This is interesting as compelling research has shown that the S100 family of calcium-binding proteins maintain immunosuppressive MDS cells at the tumor site (Fulmer, 2008). Notably, in mice injected with lymphoma cells, knockout of S100A9 resulted in greater tumor infiltration of T-cells and less accumulation of MDS cells than that seen in wild-type mice (Cheng et al., 2008). The knockout mice had higher rates of tumor rejection and lower tumor size than their wild-type littermates. This result indicates that knockdown of these proteins may improve the outcome of immunotherapy strategies in patients with values of the eigengene of the olivegreen module.
The orchid module also appeared interesting as it marks a gene cluster identifying DLBCL patients with significantly improved outcome. Therefore, a manual screening of the 50 genes within the orchid module was performed. The genes CHI3L1, CHIT1, and LYZ are related to chitin degradation and suggests activated immune system response and inflammation. Enzymes related to chitin degradation can possibly originate from macrophages as CHIT1 is expressed by activated macrophages. The inflammation and modulated activity of the immune system are further suggested by the genes ORM1, PLA2G2D, PLA2G7, and IL18. CHIT3L1 (also known as YKL40) has been linked to the AKT anti-apoptotic signaling pathway in glioblastoma (Francescone et al., 2011) and thus high CHIT3L1 is associated with poor outcome. Some of the remaining, MMP9, PTGDS, ADAMDEC1, HSD11B1, APOC1, and CYP27B1 are involved in metalloproteinase degradation and lipid metabolism. MMP9 in particular is known to have a central role in proliferation, migration, differentiation, angiogenesis, apoptosis, and host defenses. Numerous studies have linked altered MMP expression in different human cancers with poor disease prognosis where up-regulation of MMPs are associated with enhanced cancer cell invasion. ADAMDEC1 is thought to have a central role in dendrite cell functions and their interactions with germinal center T-cells.
The manual screening and GO-analysis results further corroborate that the identified modules are biologically meaningful and that the RCM provides a useful estimate of the covariance.
We finally investigated the choice of using the top 300 most variable genes and fitted the RCM with the top 1000 most variable gene. The approximate MLE algorithm followed by the EM algorithm estimated an matrix and in hours. This implies that ; a somewhat smaller value than the above. The common correlation matrix was computed from and and hierarchical clustering was applied as above. This showed that the same genes in the top 300 based analysis also largely cluster together in this clustering. Particularly, all but one gene from the olivegreen module cluster together.
4 Assessment of the estimation procedures
To assess the precision and stability of the estimation procedure we generated data from the hierarchical model (2.1) under two scenarios with studies with an equal number of observations, , and with different choices of the number of genes . In the first scenario, we set with the number of observations in each study varying in the range . In the second scenario, we set with in the range . In both scenarios, we chose the parameters and for all and for all .
We measured the precision of the estimated values against the expected covariance matrix given by (2.2). Let and be the estimates obtained using the moment, EM, or approximate MLE (defined in Appendix D) approaches as described. We benchmark the proposed estimators against the known truth. The benchmarking measure used is the following weighted sum of squared errors,
In both scenarios, the weighted sum of squared errors for each and estimator was computed for datasets. The mean SSEs (and 99% CI) of these values are seen in panels A and B of Figure 4 as function of .
In scenario 1, the EM approach used a total of 47.7 minutes of computation time for fits compared to 9.3 and 4.6 minutes for the pool and approximate MLE approach, respectively. In scenario 2, the EM, pooled, and approximate MLE approach used 939.9, 8.1, and 149.9 minutes respectively. Panel C of Figure 4 shows the computational time in a third scenario for more extreme values of .
From both panels A and B of Figure 4, we see that the EM estimation is superior with less variation to that of the approximate MLE and moment estimators. As seen from the panel C and above computational times, this superiority comes at a computational cost rapidly increasing in .
5 Concluding remarks
This article provides a basic framework for modeling a common covariance structure across multiple classes or datasets applied to identify biologically meaningful gene-gene interaction networks in DLBCL. The straight-forward approaches of using the mean or pooled covariance matrix are seen as moment estimators of the model and the estimate using the EM algorithm is shown to be superior to these simple alternatives. While the improvements are modest, the article demonstrates a potentially advantageous way of modelling the inter-study variability by a hierarchical random covariance model. However, the virtue of such a model is not from improvement in accuracy alone. The model-based approach allows investigating and testing both pairwise and global covariance structures correlations of genes, which we consider for future research. Another feature is the invertible estimate provided by the RCM, which allows users to explore the corresponding Gaussian graphical model. Also desirable is the explicit and interpretable quantification of the inter-study variance. If is estimated to be large, the studies exhibit a largely common covariance structure, and vice-versa when is estimated to be small. We have provided the ICC for the RCM as an attempt to aid in the interpretation of . Also provided is the basic framework for testing if study heterogeneity is present. However, the proposed testing is computationally demanding and only feasible when is sufficiently small. This could e.g. be overcome by an improved and faster fitting procedures or by deriving the distribution of under the null hypothesis. Yet the latter is seemingly intractable as is a very complex function of the data. The fact that the null-hypothesis lies on the edge of parameter space also seems to constrain the feasibility of deriving such a distribution.
One might question whether the added utility of the parameter is an obvious relaxation of covariance homogeneity. For example, it is unclear how large a proportion a single extra parameter can explain of the inter-study variance. Hence, the present work should be considered a first step in the direction of explicitly modeling the inter-study variation of covariances.
As demonstrated, combining multiple studies can yield a sufficiently large total sample size that allows for estimation of large covariance matrices without the use of regularization. However, we believe this work could be further enriched by combining the method with regularized estimation. Such a generalization of the model to is extremely interesting though out of scope for this article.
The recent advances in such regularized techniques which allows for analysis of large covariance matrices has unfortunately diminished the focus on collecting an adequate number of samples. The technically possible estimates for extreme
ratios does not necessarily imply that a good estimate is achieved. For example, while non-zero entries can often be accurately recalled in graphical lasso, actual estimates of the covariances (or precisions) can still be heavily biased. Large sample-sizes are still needed to achieve unbiased estimates of the covariance due to the bias-variance trade-off. An increased focus should therefore also appointed to efficiently aggregating datasets and achieving sufficiently large sample sizes to allow for stable and unbiased estimation of covariance matrices. Likewise, results should be validated in independent experiments.
We thank Martin Raussen, Jon Johnsen, as well as Niels Richard Hansen for their assistance on some of the mathematical proofs. The helpful comments from Steffen Falgreen, Andreas S. Pedersen, and reviewers were also much appreciated. The technical assistance from Alexander Schmitz, Julie S. Bødker, Ann-Maria Jensen, Louise H. Madsen, and Helle Høholt is also greatly appreciated.
Supplement A Documents for reproducibility [url]http://people.math.aau.dk/~abilgrau/RCM/SuppA/ The documents and other needed files to perform the analyses to reproduce this article. See the README file herein.
Supplement B Identified modules and GO analysis [url]http://people.math.aau.dk/~abilgrau/RCM/SuppB/ Tables of gene module memberships, auxiliary information, and the significant GO-terms for each identified module.
- Agnelli et al. (2011) [author] Agnelli, LucaL., Forcato, MattiaM., Ferrari, FrancescoF., Tuana, GiacomoG., Todoerti, KatiaK., Walker, Brian aB. a., Morgan, Gareth JG. J., Lombardi, LuigiaL., Bicciato, SilvioS. Neri, AntoninoA. (2011). The reconstruction of transcriptional networks reveals critical genes with implications for clinical outcome of multiple myeloma. Clinical Cancer Research 17 7402–12.
- Bilgrau (2014a) [author] Bilgrau, Anders EllernA. E. (2014a). correlateR: Fast, efficient, and robust partial correlations R package version 0.1.
- Bilgrau (2014b) [author] Bilgrau, Anders EllernA. E. (2014b). HierarchicalEdgeBundles: Plotting via Hierarchical Edge Bundles R package version 0.1.
- Butte et al. (2000) [author] Butte, A JA. J., Tamayo, PP., Slonim, DD., Golub, T RT. R. Kohane, I SI. S. (2000). Discovering Functional Relationships Between RNA Expression and Chemotherapeutic Susceptibility using Relevance Networks. PNAS 97 12182–6. 10.1073/pnas.220392197
- Cheng et al. (2008) [author] Cheng, PingyanP., Corzo, Cesar AC. A., Luetteke, NoreenN., Yu, BinB., Nagaraj, SrinivasS., Bui, Marylin MM. M., Ortiz, MyrnaM., Nacken, WolfgangW., Sorg, ClemensC., Vogl, ThomasT. et al. (2008). Inhibition of dendritic cell differentiation and accumulation of myeloid-derived suppressor cells in cancer is regulated by S100A9 protein. The Journal of experimental medicine 205 2235–2249.
- Choi et al. (2003) [author] Choi, J. K.J. K., Yu, U.U., Kim, S.S. Yoo, O. J.O. J. (2003). Combining Multiple Microarray Studies and Modeling Interstudy Variation. Bioinformatics 19 i84–i90. 10.1093/bioinformatics/btg1010
- Clarke et al. (2013) [author] Clarke, ColinC., Madden, Stephen FS. F., Doolan, PadraigP., Aherne, Sinead TS. T., Joyce, HelenaH., O’Driscoll, LorraineL., Gallagher, William MW. M., Hennessy, Bryan TB. T., Moriarty, MichaelM., Crown, JohnJ., Kennedy, SusanS. Clynes, MartinM. (2013). Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis 34 2300–2308.
- Cook and Forzani (2011) [author] Cook, R. DennisR. D. Forzani, LilianaL. (2011). On the Mean and Variance of the Generalized Inverse of a Singular Wishart Matrix. Electronic Journal of Statistics 5 146–158. 10.1214/11-EJS602
- Dai et al. (2005) [author] Dai, ManhongM., Wang, PinglangP., Boyd, Andrew DA. D., Kostov, GeorgiG., Athey, BrianB., Jones, Edward GE. G., Bunney, William EW. E., Myers, Richard MR. M., Speed, Terry PT. P., Akil, HudaH., Watson, Stanley JS. J. Meng, FanF. (2005). Evolving Gene/Transcript Definitions Significantly Alter the Interpretation of GeneChip Data. Nucleic Acids Research 33 e175. 10.1093/nar/gni179
- Dempster (1972) [author] Dempster, A. P.A. P. (1972). Covariance Selection. Biometrics 28 157–175.
- Dempster, Laird and Rubin (1977) [author] Dempster, A. P.A. P., Laird, N. M.N. M. Rubin, D. B.D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 39 1–38.
- DerSimonian and Laird (1986) [author] DerSimonian, RR. Laird, NN. (1986). Meta-analysis in Clinical Trials. Controlled Clinical Trials 7 177–88.
- Dybkær et al. (2015) [author] Dybkær, KarenK., Bøgsted, MartinM., Falgreen, SteffenS., Bødker, Julie S. J. S., Kjeldsen, Malene K. M. K., Schmitz, AlexanderA., Bilgrau, Anders E. A. E., Xu-Monette, Zijun Y. Z. Y., Li, LingL., Bergkvist, Kim S. K. S., Laursen, Maria B. M. B., Rodrigo-Domingo, MariaM., Marques, Sara C. S. C., Rasmussen, Sophie B. S. B., Nyegaard, MetteM., Gaihede, MichaelM., Møller, Michael B. M. B., Samworth, Richard J. R. J., Shah, Rajen D. R. D., Johansen, PrebenP., El-Galaly, Tarec C. T. C., Young, Ken H. K. H. Johnsen, Hans E. H. E. (2015). A Diffuse Large B-Cell Lymphoma Classification System That Associates Normal B-cell Subset Phenotypes with Prognosis. Journal Of Clinical Oncology, In press. 10.1200/JCO.2014.57.7080
- Eddelbuettel and François (2011) [author] Eddelbuettel, DirkD. François, RomainR. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software 40.
- Fan and Tang (2013) [author] Fan, YingyingY. Tang, Cheng YongC. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75 531–552.
- François, Eddelbuettel and Bates (2012) [author] François, RomainR., Eddelbuettel, DirkD. Bates, DougD. (2012). RcppArmadillo: Rcpp Integration for Armadillo Templated Linear Algebra Library R package version 0.3.6.1.
- Francescone et al. (2011) [author] Francescone, Ralph AR. A., Scully, SteveS., Faibish, MichaelM., Taylor, Sherry LS. L., Oh, DennisD., Moral, LuisL., Yan, WeiW., Bentley, BrookeB. Shao, RongR. (2011). Role of YKL-40 in the Angiogenesis, Radioresistance, and Progression of Glioblastoma. Journal of Biological Chemistry 286 15332–15343.
- Friedman, Hastie and Tibshirani (2008) [author] Friedman, JeromeJ., Hastie, TrevorT. Tibshirani, RobertR. (2008). Sparse Inverse Covariance Estimation with the Graphical Lasso. Biostatistics 9 432–41. 10.1093/biostatistics/kxm045
- Fruchterman and Reingold (1991) [author] Fruchterman, Thomas MJT. M. Reingold, Edward ME. M. (1991). Graph Drawing by Force-Directed Placement. Software: Practice and experience 21 1129–1164.
- Fulmer (2008) [author] Fulmer, TimT. (2008). Suppressing the suppressors. 38. 10.1038/scibx.2008.914
- Gautier et al. (2004) [author] Gautier, LaurentL., Cope, LeslieL., Bolstad, Benjamin M.B. M. Irizarry, Rafael A.R. A. (2004). affy—Analysis of Affymetrix GeneChip Data at the Probe Level. Bioinformatics 20 307–315. http://dx.doi.org/10.1093/bioinformatics/btg405
- International Lymphoma Study Group (1997) [author] International Lymphoma Study Group (1997). A Clinical Evaluation of the International Lymphoma Study Group Classification of Non-Hodgkin’s Lymphoma. Blood 89 3909–3918.
[author] Holten, DannyD. (2006). Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data. Visualization and Computer Graphics, IEEE Transactions on 12 741–748.
- Horvath (2011) [author] Horvath, SteveS. (2011). Weighted Network Analysis: Applications in Genomics and Systems Biology. Springer.
- Khalil (2002) [author] Khalil, Hassan K.H. K. (2002). Nonlinear Systems. Prentice Hall.
- Lauritzen (1996) [author] Lauritzen, Steffen LS. L. (1996). Graphical models. Clarendon Press.
- Lin et al. (2007) [author] Lin, Fa-HsuanF.-H., Tsai, Shang-YuehS.-Y., Otazo, RicardoR., Caprihan, ArvindA., Wald, Lawrence LL. L., Belliveau, John WJ. W. Posse, StefanS. (2007). Sensitivity-Encoded (SENSE) Proton Echo-Planar Spectroscopic Imaging (PEPSI) in the Human Brain. Magnetic Resonance in Medicine 57 249–57. 10.1002/mrm.21119
- Margolin et al. (2006) [author] Margolin, Adam aA. a., Nemenman, IlyaI., Basso, KatiaK., Wiggins, ChrisC., Stolovitzky, GustavoG., Dalla Favera, RiccardoR. Califano, AndreaA. (2006). ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7 Suppl 1 S7.
- Markowetz and Spang (2007) [author] Markowetz, FlorianF. Spang, RainerR. (2007). Inferring Cellular Networks - A Review. BMC Bioinformatics 8 S5. 10.1186/1471-2105-8-S6-S5
- Meinshausen and Bühlmann (2006) [author] Meinshausen, NN. Bühlmann, PP. (2006). High dimensional graphs and variable selection with the lasso. The Annals of Statistics.
- Petersen and Pedersen (2008) [author] Petersen, KBK. Pedersen, MSM. (2008). The Matrix Cookbook Technical University of Denmark, Technical Manual.
- Phipson and Smyth (2010) [author] Phipson, BelindaB. Smyth, Gordon KG. K. (2010). Permutation P-values Should Never be Zero: Calculating Exact P-values when Permutations are Randomly Drawn. Statistical Applications in Genetics and Molecular Biology 9.
- Shrout and Fleiss (1979) [author] Shrout, Patrick EP. E. Fleiss, Joseph LJ. L. (1979). Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin 86 420.
- R Core Team (2012) [author] R Core Team (2012). R: A Language and Environment for Statistical Computing R Foundation for Statistical Computing, Vienna, Austria ISBN 3-900051-07-0.
van Wieringen and
[author] van Wieringen, Wessel N.W. N. Peeters, Carel F. W.C. F. W. (2016). Ridge Estimation of Inverse Covariance Matrices from High-Dimensional Data. Computational Statistics and Data Analysis 103 284–303. 10.1016/j.csda.2016.05.012
- von Rosen (1988) [author] von Rosen, DD. (1988). Moments for the Inverted Wishart Distribution. Scandinavian Journal of Statistics 15 97–109.
- Xie (2013) [author] Xie, YihuiY. (2013). Dynamic Documents with R and knitr. CRC Press.
Appendix A Marginalization of the covariance
This section shows the marginalization over in (2.3). Recall the model (2.1) where denotes a -dimensional multivariate Gaussian distribution with mean and positive definite (p.d.) covariance matrix with probability density function (pdf)
and where denotes a -dimensional inverse Wishart distribution with degrees of freedom, a p.d. scale matrix , and pdf
where is p.d. and is the multivariate generalization of the gamma function given by
For ease of notation we drop the subscript on , , , and . By the model assumptions,
The integrand can be recognized as a unnormalized inverse Wishart pdf of the distribution , and so the integral evaluates to the reciprocal value of the normalizing constant in that density. Thus,
Using the matrix determinant lemma and , this can be further simplified to
which can help to speed-up computations.
Appendix B Proofs
b.1 Non-concavity of the log-likelihood
The likelihood function is not log-concave in general. This section analyses the (non)-concavity of the log-likelihood function given in (2.3). More precisely, the following two propositions are proved.
Proof of Proposition 1.
Assume is fixed and consider only the terms involving in (2.3). We reduce to the one-dimensional case where
It is straightforward to show there exists a value for , and for which . Since the second derivative is not always negative the log-likelihood is not log-concave. ∎
Proof of Proposition 2.
Consider the terms involving . Clearly, the mixed terms involving both and are log-linear in and hence log-concave. We thus restrict our attention to the remaining terms not dependent on . The sum of these terms are concave in , since
which can be seen to be concave since for all and is concave for all and . The concavity of is easily seen by the fact that where is the tri-gamma function. The tri-gamma function is a well-known monotonically decreasing function. Hence, the likelihood is log-concave in . ∎
b.2 Existence and uniqueness of likelihood maxima
Before we state the lemmas, the proposition, and their proofs, we see that the reparameterisation of the RCM is irrelevant. Consider the log-likelihood in (2.3) assuming fixed. The log-likelihood obey
Notice, that this equation also holds in the reparameterization. Here we have
Since is only dependent on data (when is fixed) we can set . Without loss of generality we can therefore consider (B.1) in the following.
Proof of Proposition 3.
We first prove existence of the maximum. Note, that we may consider as a function on a vector space by letting where is a symmetric matrix. By Lemma 1 and the continuity of , the set is bounded and closed and thus compact for any . The existence of a maximum follows from the extreme value theorem by the continuity of . A stationary point exists due to Rolle’s theorem and the differentiability of .
Next, we show the uniqueness of the maximum. Let denote the set of stationary points, which is nonempty. By Lemma 1, has a finite upper bound given by the maximum of the log-likelihood in those points. All gradient curves (that is, solution curves to ) must then converge toward exactly one of the stationary points where monotonically increases along each curve. Define for the basin of attraction
The basin of attraction is open if is a maximum (Khalil, 2002, Lemma 4.1). By Lemma 2, is always a maximum and hence all are open sets in the set of all positive definite matrices . This partitions the space into disjoint, non-empty, open sets. Since is connected, this is only possible if and thus there is only a single basin of attraction and maximum of . ∎
If there exists an eigenvalue
If there exists an eigenvalueof such that or , then for fixed and .
Proof of Lemma 1.
Assume the hypothesis of the lemma and consider the expression given in (B.1) up to the addition of a constant. The likelihood obey the following two upper bounds. First,
Secondly, let whereby (B.1) can be expressed as
Since is concave and the above sum is a convex combination, we have
where and . Three cases now exists: 1) If , then
2) If , then
as the matrix in the second term is almost surely positive definite when and the log determinant is some constant. 3) If is bounded and the largest eigenvalue (and hence , then and