1 Introduction
Genes and their products carry out their functions in concerted effort via intricate networks. Network analysis of genomic data has consequently been an inescapable success by elucidating gene functions and interactions through many different approaches. For that reason, we wished to apply network reconstruction to gene expression data from diffuse large Bcell lymphoma samples from our own lab (Dybkær et al., 2015) as others (Agnelli et al., 2011; Clarke et al., 2013) have successfully used network analysis in other malignancies. However, the methods have one or more of several shortcomings, particularly, as in our case, when the sample size is moderate.
Crudely, many network analysis methods—such as relevance or correlation networks (Horvath, 2011), Gaussian graphical models (Lauritzen, 1996)
, and Bayesian networks—may be categorized according to their use of marginal, partial, or semipartial correlations. The underlying assumption of them all is that correlation hints at causation. The different measures of correlation separate the methods by different attempts to remove the influence of remaining other genes
(Markowetz and Spang, 2007). As the covariance matrix and its inverse holds this information, its estimation is a central topic to network analysis. However, this fundamental problem of accurately estimating the covariance matrix (or its inverse, the precision matrix) is notoriously difficult. The usual biascorrected maximum likelihood estimator (MLE), the sample covariance matrix, has long been known to perform poorly in general due to high variability (Dempster, 1972). The sample covariance becomes increasingly illconditioned as the number of variables approaches the sample size and singular when exceeds. Because of its central statistical role, the list of statistical methods and applications utilizing the estimated covariance matrix is exceedingly long. Beside the many standard statistical methods such as principal component analysis (PCA) and discriminant analysis, examples of direct applications include gene and protein network analysis
(Butte et al., 2000; Margolin et al., 2006), spectroscopic imaging (Lin et al., 2007), functional magnetic resonance imaging (fMRI), financial forecasting, and many more. Among this expanding list of applications is also an increasing number of highdimensional applications and datasets publicly available in online repositories.In genomic datasets, the number of genes often far exceeds the number of samples . Since the number of parameters increases quadratically in and the sample covariance matrix becomes singular when
many shrinkage and regularization estimators have been proposed to combat the accompanying problems by effectively increasing the degrees of freedom. These examples include the graphical lasso and ridge estimation of the precision matrix
(Meinshausen and Bühlmann, 2006; Friedman, Hastie and Tibshirani, 2008; van Wieringen and Peeters, 2016). Subsequently, focus on the selection of the regularization parameter has received much attention (Fan and Tang, 2013).Instead of attempting to derive still more sophisticated estimators, we attempt to alleviate the problem from a different angle by using more available data and thus effectively increasing . We do so by utilizing a large number of genomic DLBCL datasets which, as with all major groups of cancer, are now publicly available online. We use these studies in combination with data from our own laboratory as a network metaanalysis to arrive at a better estimate of the covariance matrix whilst accounting for and assessing interstudy variation.
We note, that while the highdimensional extension to is important it is out of scope in this paper. We restrict ourselves to the case where the total number of samples exceeds . Hence, if classes or datasets are available with sample sizes , we consider the case where while allowing to exceed for each individual class .
The remainder of this paper is structured as follows. In Section 2, we propose the model for a common covariance matrix across multiple studies, derive estimators thereof, and propose an interstudy homogeneity measure to aid in assessing the variation between studies. Next, we apply the model in Section 3 to DLBCL samples across 11 datasets. We then assess the proposed model estimators via synthetic data in Section 4 before concluding the manuscript in Section 5.
Although this work was motivated by genegene interaction networks in DLBCL, where the covariance matrix is assumed to contain all information about the conditional dependencies of the genes, the methods are general and not limited to such genomic data.
2 A hierarchical model for the covariance matrix
The model below was inspired by ordinary metaanalysis. Metaanalysis comes in various flavors corresponding to the assumption on the nature of the interstudy treatment effect. Randomeffects models (REM) in metaanalysis model the interstudy effects as random variables
(DerSimonian and Laird, 1986; Choi et al., 2003). In a vein similar to the ordinary metaanalysis approach, we think of the different studies as related but perturbed experiments and propose the following simple random covariance model (RCM) of the observations. Let be the number of features and the number of classes. We model an observation from the ’th study as adimensional zeromean multivariate Gaussian vector with covariance matrix realized from an inverse Wishart distribution, i.e.
follows the hierarchical model(2.1) 
where denotes a
dimensional multivariate Gaussian distribution with mean
and positive definite (p.d.) covariance matrix, and probability density function (pdf) shown in (
A.1), and denotes a dimensional inverse Wishart distribution with degrees of freedom, a p.d. scale matrix , and pdf shown in (A.2). While the inverse Wishart distribution is defined for all, the first order moment exists only when
and is given by(2.2) 
Hence, in the RCM of (2.1), can be interpreted as a locationlike parameter as it is the expected covariance matrix in each study. The parameter inversely controls the interclass variation and can as such be considered an interclass homogeneity parameter of the covariance structure. A large corresponds to high study homogeneity and vice versa for small . This can further be seen as concentrates around for which corresponds to an vanishing interstudy variation for increasing
. This fact is seen directly from variance and covariance expressions for the inverse Wishart (see (
2.9) and (2.10)) where the 4th order denominator grows much faster than the 1st order nominator as polynomials in and causing the variance to vanish for . Thus, the true underlying covariance matrix and the homogeneity parameter are the effects of interest to be estimated in this paper. These basic properties of the RCM motivates the construction.2.1 The likelihood function
Suppose are i.i.d. observations from independent studies from the model given in (2.1). Let be the matrix of observations for the ’th study where rows correspond to samples and columns to variables. By the independence assumptions, the loglikelihood for and is given by
Throughout this paper, we use the generic notation and for the conditional and unconditional pdf of random variables, respectively. Since the inverse Wishart distribution is conjugate to the multivariate Gaussian distribution, the integral—of which the integrand forms a GaussianinverseWishart distribution—can be evaluated. Hence can be marginalized out, cf. (A.4) in Appendix A, and we arrive at the following expression for the loglikelihood function,
(2.3) 
up to an additive constant where is the multivariate generalization of the gamma function , see (A.3). As should be expected, the scatter matrix and study sample size are sufficient statistics for each study. Note that is conditionally Wishart distributed, , by construction.
As stated in the following two propositions, the likelihood is not logconcave in general. However, it is logconcave as a function of .
Proposition 1 (Nonconcavity in ).
For a fixed , the loglikelihood function (2.3) is not concave in .
All proofs have been deferred to Appendix B.
Proposition 2 (Concavity in ).
For a fixed positive definite , the loglikelihood function (2.3) is concave in .
While the likelihood is not concave in we are able to show the existence and uniqueness of a global maximum in .
Proposition 3 (Existence and uniqueness).
The loglikelihood (2.3) has a unique maximum in for fixed and .
This result is proven in Appendix B and follows from two lemmas stated therein.
In the following section estimators of the parameters are derived using moments and the EM algorithm assuming fixed.
2.2 Moment estimator
The pooled empirical covariance matrix can be viewed as a moment estimator of . By the model assumptions, the first and second moment of the ’th observation in the ’th study, , is given by and
for all and . This suggests the estimators
(2.4) 
where the latter is obtained by plugging into (2.2). This is the wellknown pooled empirical covariance matrix.
2.3 Maximization using the EM algorithm
Here the updating scheme of the expectationmaximization (EM) algorithm
(Dempster, Laird and Rubin, 1977) for fixed is derived. We now compute the expectation step of the EMalgorithm.From (2.1) we have that,
Let be the precision matrix and let , then we equivalently have that
(2.5) 
From the conjugacy of the inverse Wishart and the Wishart distribution, the posterior distribution of the precision matrix is
Hence, by the expectation of the Wishart distribution,
The maximization step, in which the loglikelihood is maximized, yields the estimate which is the mean of the scaled precision matrices . The derivation of this estimate can be seen in Appendix C. Let be the current estimate of . This yields the updating scheme
(2.6) 
for . We denote the inverse of the estimate obtained by repeated iteration of (2.6) by .
An approximate maximum likelihood estimator using a first order approximation is also possible. This derivation has been deferred to Appendix D.
2.4 Estimation procedure
We propose a procedure alternating between estimating and while keeping the other fixed. Given parameters and at iteration , we estimate using fixed . Subsequently, we find by a standard onedimensional numerical optimization procedure using the fixed . This coordinate ascent approach is repeated until convergence as described in Algorithm 1.
The update function in the algorithm is defined by the derived estimators. That is, equations (2.4), (2.6), or (D.2) define as the pooled, EM, or approximate MLE estimates, respectively.
The procedure using the EM step utilizes the results about the RCM loglikelihood and thus provides a guarantee of convergence along with the advantage of a very simple implementation. Both the EM step and the update will always yield an increase in the likelihood. The disadvantage is that the identified stationary point might be a local maximum or saddlepoint when considering the loglikelihood function jointly in . Intuitively, the latter possibility happens with zero probability, but it cannot be excluded that the maximum found is not global.
Variations on the convergence criterion can also be considered. One alternative is using the difference in successive parameter estimates. More variants can be created by considering relative rather than absolute differences.
2.5 Interpretation and inference
Test for no class heterogeneity
By the RCM construction parameterizes an interclass variance where the size of corresponds to the homogeneity between the classes. A large yields high study homogeneity while a small yields low homogeneity. Thus, it might be of interest to test if the estimated homogeneity
is extreme under the nullhypothesis of no heterogeneity (i.e. infinite homogeneity). I.e. a test for the hypothesis
which is equivalent toThe two are equivalent since sampling the covariance matrix from the inverse Wishart distribution becomes deterministic for . Therefore, testing this hypothesis can also be interpreted as testing whether the data is adequately explained when leaving out the hierarchical structure.
The distribution of under the null hypothesis is not tractable. However, in practice under or when is extremely large the estimated will be finite as the intrastudy variance dominates the total variance. We note that the null distribution of does not depend on . We propose approximating the distribution of under by resampling. To do this, the model is simply fitted a large number of times on datasets resampled under mimicked by permuted class labels to get . As small values of are critical for approximate acceptance regions can be constructed from . Likewise, an approximation of the value testing can be obtained by
where is the indicator function. The addition of one in the nominator and denominator adds a positive bias to the approximate pvalue minimally needed according to Phipson and Smyth (2010). This is approximately the fraction of ’s smaller than .
Intraclass correlation coefficient
We now introduce a descriptive statistic analogous to the intraclass correlation coefficient (ICC)
(Shrout and Fleiss, 1979) well known from ordinary metaanalysis to better determine what constitute large values of . For the RCM, the ICC is given by(2.7) 
This follows from the definition of the ICC which is the ratio of the betweenstudy variation and the total variation of any single pair of variables. The ICC might in this sense be utilized in quantifying the reproducibility of the covariance across studies. Consider observations from (2.1). We temporarily abuse our notation and let
and consider only a single observation . Furthermore, let , , and . To compute the ICC, we are thus interested in the ratio of the quantities and corresponding to the betweenstudy and total variation of the covariance between variables and , respectively. That is, the ICC is the proportion of the total variance between studies,
(2.8) 
where the second equality is obtained by and the law of total variation. This equality agrees with the usual ICC as can be interpreted as the (expected) withinstudy variation. Using the conditional variance given by the needed quantities can be found. To compute an expression for (2.8) we need to consider the fourthorder moments of the observations. From the model, known results of the inverse Wishart distribution, cf. (Cook and Forzani, 2011; von Rosen, 1988), leads to
(2.9) 
implying that
(2.10) 
Continuing with the expected conditional variance of in the denominator of (2.8),
(2.11) 
An expression of in terms of the elements of can then found by substituting (2.9) and (2.10) into (2.11) and by extension an expression for the ICC (2.8) can be obtained. We omit this tedious calculation which can be verified to yield as above. Naturally, the ICC depends only on . A straightforward plugin estimator of the ICC of some genegene interaction is then .
Though is required for the variances to exist, it is clear that for and for as should be expected.
2.6 Implementation and availability
Algorithm 1 and the different estimators are implemented in the statistical programming language R (R Core Team, 2012) with core functions in C++ using packages Rcpp and RcppArmadillo (Eddelbuettel and François, 2011; François, Eddelbuettel and Bates, 2012). They are incorporated in the opensource Rpackage correlateR freely available for forking and editing (Bilgrau, 2014a). We refer to the information here for further details and installation instructions. This document was prepared with knitr (Xie, 2013) and LaTeX. To reproduce this document see Acknowledgments or http://github.com/AEBilgrau/RCM.
3 DLBCL metaanalysis
Diffuse large Bcell lymphoma (DLBCL) is an aggressive cancer subtype accounting for of all nonHodgkin’s lymphomas (NHL) which itself constitutes about of all lymphomas (International Lymphoma Study Group, 1997).
3.1 Data and preprocessing
A large amount of DLBCL gene expression datasets are now available online at the NCBI (National Center for Biotechnology Information) Gene Expression Omnibus (GEO) website. Ten largescale DLBCL gene expression studies were downloaded and preprocessed using custom brainarray chip definition files (CDF) (Dai et al., 2005) and RMAnormalized using the Rpackage affy (Gautier et al., 2004). The corresponding GEOaccession numbers are GSE12195, GSE22895, GSE31312, GSE10846, GSE34171, GSE22470, GSE4475, and GSE19246 based on various microarray platforms. The downloaded data together with a dataset from our own laboratory (GSE56315, Dybkær et al. (2015)) yields a total of 2046 samples with study sizes in the range 78469. The summarization using brainarray CDFs to Ensembl gene identifiers facilitates crossplatform integration.
After RMA normalization and summarization, the data was brought to a common scale by quantile normalizing all data to the common cumulative distribution function of all arrays. Lastly, the datasets were reduced to 11573 common genes represented in all studies and array platforms.
3.2 Analysis
A coexpression network (or weighted correlation network) analysis integrating all datasets was carried out. For each dataset the scatter matrix of the top 300 most variable genes (as measured by the pooled variance across all studies) was computed as the sufficient statistics along with the number of samples. Hence, we investigate 45,150 pairwise interactions.
The parameters of the RCM were estimated using the EM algorithm and yielded the matrix and . The RCM was fitted using three different initial sets of parameters which all converged to the same parameter estimates. Loglikelihood traces, iterations used, and computation times are seen in Figure 1. From the parameter estimate, the common expected covariance was computed and subsequently scaled to the corresponding correlation matrix .
The estimated yields a surprisingly low estimated ICC of . Hence by the RCM, only of the variability of the genegene covariances is betweenstudies on average. This low ICC might suggest a high interstudy homogeneity of covariances meaning that the covariances reproduce across studies. However, the selection of only the most varying (across studies) genes is an obvious contribution to the low ICC. By selection, we have high interstudy variability which drives the ICC down. In any case, the low ICC might still suggest high reproducibility of the covariances between studies of the selected genes.
Next, we outline one of many possible downstream analyses of the estimated covariance. For simplicity we employed a standard correlation network analysis to the estimated common correlation matrix across all studies. To identify clusters with high internal correlation, we used agglomerative hierarchical clustering with Wardlinkage and distance measure defined as 1 minus the absolute value of the correlation. The tree was arbitrarily pruned at a height which produced 5 modules named by colors. Figure 2 shows these results. A heatmap, the hierarchical tree, and the identified modules are seen at the top left. The topright shows a graphical representation of the matrix which better illustrates the clusters and their relations. The bottom plot shows the graph radially laid out using hierarchical edge bundling (Holten, 2006) where the edges are guided by the hierarchical tree. Table 1 shows the top genes within each module. As seen e.g. in the olivegreen module, genes from the same gene family cluster together.
To further investigate the low ICC, we refitted the RCM on the subset of the 50 genes in the orchid module. This yielded an
. Subsequently, the model was repeatedly fitted on 50 randomly chosen genes 500 times to gauge the size of the ICC without the selection bias. This resulted in a mean ICC (lower quartile, upper quartile) of
suggesting that there is a high study homogeneity under randomly selected genes.The test for the nullhypothesis of no study heterogeneity, , is clearly rejected with a value of 0.002. The mean (sd) of the fitted on 500 permuted datasets was compared to in the observed dataset.
Next, the modules were screened for biological relevance using GO (Gene Ontology) enrichment analysis. The upper right of Figure 2 shows suggested functions of the modules primarily based on the GO analysis. Acknowledgments shows the significant GOterms at significance level 0.01 for each module in which the most GOterms appear highly relevant to the pathology of DLBCL.
Gray  Olivegreen  Orchid  Skyblue  Coral 

n = 159  n = 50  n = 50  n = 31  n = 10 
RGS13  S100A8  CXCL13  COL3A1  KRT6A 
BCL2A1  C1QB  CCL19  COL1A2  KRT14 
MMP1  CCL8  CHI3L1  COL5A2  KRT13 
GMDS  VSIG4  CLU  MXRA5  SPRR3 
FEZ1  CXCL9  CCL21  SULF1  SPRR1B 
HLADQA1  C1QA  PTGDS  POSTN  SPRR1A 
IGHM  CXCL10  CD3D  THBS2  S100A2 
CR2  GBP1  PLAC8  MMP2  KRT5 
CD83  CXCL11  CD2  COL6A3  DSP 
AICDA  GZMB  CXCL14  VCAN  SFN 
HLADQB1  IDO1  TRAT1  LUM  
UGT2B17  MT1G  TRBC2  SPP1  
BIK  GZMA  ADAMDEC1  COL5A1  
MS4A1  GZMK  CSTA  PLOD2  
GRHPR  ALDH1A1  ITK  COL15A1  
CYB5R2  S100A9  IL7R  DCN  
RPS4Y1  FCER1G  CHIT1  CTSK  
ADA  PSTPIP2  GIMAP4  COL11A1  
DMD  LILRB2  ENPP2  COL1A1  
ACTG2  GZMH  LGALS2  FAP 
We checked if the identified modules were prognostic for overall survival (OS) in the CHOP and RCHOPtreated cohort datasets of GSE10846. To do this, the eigengene (Horvath, 2011) for each module was computed and a multiple Cox proportional hazards model for OS was fitted with the module eigengenes as covariates. The module eigengene is the first principal component of the expression matrix of the module which thus can be represented by a linear combination of the module genes. For the prognostic interesting orchid module, the KaplanMeier estimates were computed for groups arising when dichotomizing the values of the corresponding eigengene as above or below the median value. These results are seen in Figure 3.
From the survival analysis, the olivegreen module appeared particularly interesting since it identifies a group of patients with high value of the corresponding eigengene and poor survival, thus making them candidates for experimental therapies. The gene with highest connectivity in this module is S100A8 (MRP8; calgranulin A) and the gene S100A9 (MRP14; calgranulin B) appears as number 16 further down the connectivity list. This is interesting as compelling research has shown that the S100 family of calciumbinding proteins maintain immunosuppressive MDS cells at the tumor site (Fulmer, 2008). Notably, in mice injected with lymphoma cells, knockout of S100A9 resulted in greater tumor infiltration of Tcells and less accumulation of MDS cells than that seen in wildtype mice (Cheng et al., 2008). The knockout mice had higher rates of tumor rejection and lower tumor size than their wildtype littermates. This result indicates that knockdown of these proteins may improve the outcome of immunotherapy strategies in patients with values of the eigengene of the olivegreen module.
The orchid module also appeared interesting as it marks a gene cluster identifying DLBCL patients with significantly improved outcome. Therefore, a manual screening of the 50 genes within the orchid module was performed. The genes CHI3L1, CHIT1, and LYZ are related to chitin degradation and suggests activated immune system response and inflammation. Enzymes related to chitin degradation can possibly originate from macrophages as CHIT1 is expressed by activated macrophages. The inflammation and modulated activity of the immune system are further suggested by the genes ORM1, PLA2G2D, PLA2G7, and IL18. CHIT3L1 (also known as YKL40) has been linked to the AKT antiapoptotic signaling pathway in glioblastoma (Francescone et al., 2011) and thus high CHIT3L1 is associated with poor outcome. Some of the remaining, MMP9, PTGDS, ADAMDEC1, HSD11B1, APOC1, and CYP27B1 are involved in metalloproteinase degradation and lipid metabolism. MMP9 in particular is known to have a central role in proliferation, migration, differentiation, angiogenesis, apoptosis, and host defenses. Numerous studies have linked altered MMP expression in different human cancers with poor disease prognosis where upregulation of MMPs are associated with enhanced cancer cell invasion. ADAMDEC1 is thought to have a central role in dendrite cell functions and their interactions with germinal center Tcells.
The manual screening and GOanalysis results further corroborate that the identified modules are biologically meaningful and that the RCM provides a useful estimate of the covariance.
We finally investigated the choice of using the top 300 most variable genes and fitted the RCM with the top 1000 most variable gene. The approximate MLE algorithm followed by the EM algorithm estimated an matrix and in hours. This implies that ; a somewhat smaller value than the above. The common correlation matrix was computed from and and hierarchical clustering was applied as above. This showed that the same genes in the top 300 based analysis also largely cluster together in this clustering. Particularly, all but one gene from the olivegreen module cluster together.
4 Assessment of the estimation procedures
To assess the precision and stability of the estimation procedure we generated data from the hierarchical model (2.1) under two scenarios with studies with an equal number of observations, , and with different choices of the number of genes . In the first scenario, we set with the number of observations in each study varying in the range . In the second scenario, we set with in the range . In both scenarios, we chose the parameters and for all and for all .
We measured the precision of the estimated values against the expected covariance matrix given by (2.2). Let and be the estimates obtained using the moment, EM, or approximate MLE (defined in Appendix D) approaches as described. We benchmark the proposed estimators against the known truth. The benchmarking measure used is the following weighted sum of squared errors,
In both scenarios, the weighted sum of squared errors for each and estimator was computed for datasets. The mean SSEs (and 99% CI) of these values are seen in panels A and B of Figure 4 as function of .
In scenario 1, the EM approach used a total of 47.7 minutes of computation time for fits compared to 9.3 and 4.6 minutes for the pool and approximate MLE approach, respectively. In scenario 2, the EM, pooled, and approximate MLE approach used 939.9, 8.1, and 149.9 minutes respectively. Panel C of Figure 4 shows the computational time in a third scenario for more extreme values of .
From both panels A and B of Figure 4, we see that the EM estimation is superior with less variation to that of the approximate MLE and moment estimators. As seen from the panel C and above computational times, this superiority comes at a computational cost rapidly increasing in .
5 Concluding remarks
This article provides a basic framework for modeling a common covariance structure across multiple classes or datasets applied to identify biologically meaningful genegene interaction networks in DLBCL. The straightforward approaches of using the mean or pooled covariance matrix are seen as moment estimators of the model and the estimate using the EM algorithm is shown to be superior to these simple alternatives. While the improvements are modest, the article demonstrates a potentially advantageous way of modelling the interstudy variability by a hierarchical random covariance model. However, the virtue of such a model is not from improvement in accuracy alone. The modelbased approach allows investigating and testing both pairwise and global covariance structures correlations of genes, which we consider for future research. Another feature is the invertible estimate provided by the RCM, which allows users to explore the corresponding Gaussian graphical model. Also desirable is the explicit and interpretable quantification of the interstudy variance. If is estimated to be large, the studies exhibit a largely common covariance structure, and viceversa when is estimated to be small. We have provided the ICC for the RCM as an attempt to aid in the interpretation of . Also provided is the basic framework for testing if study heterogeneity is present. However, the proposed testing is computationally demanding and only feasible when is sufficiently small. This could e.g. be overcome by an improved and faster fitting procedures or by deriving the distribution of under the null hypothesis. Yet the latter is seemingly intractable as is a very complex function of the data. The fact that the nullhypothesis lies on the edge of parameter space also seems to constrain the feasibility of deriving such a distribution.
One might question whether the added utility of the parameter is an obvious relaxation of covariance homogeneity. For example, it is unclear how large a proportion a single extra parameter can explain of the interstudy variance. Hence, the present work should be considered a first step in the direction of explicitly modeling the interstudy variation of covariances.
As demonstrated, combining multiple studies can yield a sufficiently large total sample size that allows for estimation of large covariance matrices without the use of regularization. However, we believe this work could be further enriched by combining the method with regularized estimation. Such a generalization of the model to is extremely interesting though out of scope for this article.
The recent advances in such regularized techniques which allows for analysis of large covariance matrices has unfortunately diminished the focus on collecting an adequate number of samples. The technically possible estimates for extreme
ratios does not necessarily imply that a good estimate is achieved. For example, while nonzero entries can often be accurately recalled in graphical lasso, actual estimates of the covariances (or precisions) can still be heavily biased. Large samplesizes are still needed to achieve unbiased estimates of the covariance due to the biasvariance tradeoff. An increased focus should therefore also appointed to efficiently aggregating datasets and achieving sufficiently large sample sizes to allow for stable and unbiased estimation of covariance matrices. Likewise, results should be validated in independent experiments.
Acknowledgments
We thank Martin Raussen, Jon Johnsen, as well as Niels Richard Hansen for their assistance on some of the mathematical proofs. The helpful comments from Steffen Falgreen, Andreas S. Pedersen, and reviewers were also much appreciated. The technical assistance from Alexander Schmitz, Julie S. Bødker, AnnMaria Jensen, Louise H. Madsen, and Helle Høholt is also greatly appreciated.
Supplement A Documents for reproducibility [url]http://people.math.aau.dk/~abilgrau/RCM/SuppA/ The documents and other needed files to perform the analyses to reproduce this article. See the README file herein.
Supplement B Identified modules and GO analysis [url]http://people.math.aau.dk/~abilgrau/RCM/SuppB/ Tables of gene module memberships, auxiliary information, and the significant GOterms for each identified module.
References
 Agnelli et al. (2011) [author] Agnelli, LucaL., Forcato, MattiaM., Ferrari, FrancescoF., Tuana, GiacomoG., Todoerti, KatiaK., Walker, Brian aB. a., Morgan, Gareth JG. J., Lombardi, LuigiaL., Bicciato, SilvioS. Neri, AntoninoA. (2011). The reconstruction of transcriptional networks reveals critical genes with implications for clinical outcome of multiple myeloma. Clinical Cancer Research 17 7402–12.
 Bilgrau (2014a) [author] Bilgrau, Anders EllernA. E. (2014a). correlateR: Fast, efficient, and robust partial correlations R package version 0.1.
 Bilgrau (2014b) [author] Bilgrau, Anders EllernA. E. (2014b). HierarchicalEdgeBundles: Plotting via Hierarchical Edge Bundles R package version 0.1.
 Butte et al. (2000) [author] Butte, A JA. J., Tamayo, PP., Slonim, DD., Golub, T RT. R. Kohane, I SI. S. (2000). Discovering Functional Relationships Between RNA Expression and Chemotherapeutic Susceptibility using Relevance Networks. PNAS 97 12182–6. 10.1073/pnas.220392197
 Cheng et al. (2008) [author] Cheng, PingyanP., Corzo, Cesar AC. A., Luetteke, NoreenN., Yu, BinB., Nagaraj, SrinivasS., Bui, Marylin MM. M., Ortiz, MyrnaM., Nacken, WolfgangW., Sorg, ClemensC., Vogl, ThomasT. et al. (2008). Inhibition of dendritic cell differentiation and accumulation of myeloidderived suppressor cells in cancer is regulated by S100A9 protein. The Journal of experimental medicine 205 2235–2249.
 Choi et al. (2003) [author] Choi, J. K.J. K., Yu, U.U., Kim, S.S. Yoo, O. J.O. J. (2003). Combining Multiple Microarray Studies and Modeling Interstudy Variation. Bioinformatics 19 i84–i90. 10.1093/bioinformatics/btg1010
 Clarke et al. (2013) [author] Clarke, ColinC., Madden, Stephen FS. F., Doolan, PadraigP., Aherne, Sinead TS. T., Joyce, HelenaH., O’Driscoll, LorraineL., Gallagher, William MW. M., Hennessy, Bryan TB. T., Moriarty, MichaelM., Crown, JohnJ., Kennedy, SusanS. Clynes, MartinM. (2013). Correlating transcriptional networks to breast cancer survival: a largescale coexpression analysis. Carcinogenesis 34 2300–2308.
 Cook and Forzani (2011) [author] Cook, R. DennisR. D. Forzani, LilianaL. (2011). On the Mean and Variance of the Generalized Inverse of a Singular Wishart Matrix. Electronic Journal of Statistics 5 146–158. 10.1214/11EJS602
 Dai et al. (2005) [author] Dai, ManhongM., Wang, PinglangP., Boyd, Andrew DA. D., Kostov, GeorgiG., Athey, BrianB., Jones, Edward GE. G., Bunney, William EW. E., Myers, Richard MR. M., Speed, Terry PT. P., Akil, HudaH., Watson, Stanley JS. J. Meng, FanF. (2005). Evolving Gene/Transcript Definitions Significantly Alter the Interpretation of GeneChip Data. Nucleic Acids Research 33 e175. 10.1093/nar/gni179
 Dempster (1972) [author] Dempster, A. P.A. P. (1972). Covariance Selection. Biometrics 28 157–175.
 Dempster, Laird and Rubin (1977) [author] Dempster, A. P.A. P., Laird, N. M.N. M. Rubin, D. B.D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 39 1–38.
 DerSimonian and Laird (1986) [author] DerSimonian, RR. Laird, NN. (1986). Metaanalysis in Clinical Trials. Controlled Clinical Trials 7 177–88.
 Dybkær et al. (2015) [author] Dybkær, KarenK., Bøgsted, MartinM., Falgreen, SteffenS., Bødker, Julie S. J. S., Kjeldsen, Malene K. M. K., Schmitz, AlexanderA., Bilgrau, Anders E. A. E., XuMonette, Zijun Y. Z. Y., Li, LingL., Bergkvist, Kim S. K. S., Laursen, Maria B. M. B., RodrigoDomingo, MariaM., Marques, Sara C. S. C., Rasmussen, Sophie B. S. B., Nyegaard, MetteM., Gaihede, MichaelM., Møller, Michael B. M. B., Samworth, Richard J. R. J., Shah, Rajen D. R. D., Johansen, PrebenP., ElGalaly, Tarec C. T. C., Young, Ken H. K. H. Johnsen, Hans E. H. E. (2015). A Diffuse Large BCell Lymphoma Classification System That Associates Normal Bcell Subset Phenotypes with Prognosis. Journal Of Clinical Oncology, In press. 10.1200/JCO.2014.57.7080
 Eddelbuettel and François (2011) [author] Eddelbuettel, DirkD. François, RomainR. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software 40.
 Fan and Tang (2013) [author] Fan, YingyingY. Tang, Cheng YongC. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75 531–552.
 François, Eddelbuettel and Bates (2012) [author] François, RomainR., Eddelbuettel, DirkD. Bates, DougD. (2012). RcppArmadillo: Rcpp Integration for Armadillo Templated Linear Algebra Library R package version 0.3.6.1.
 Francescone et al. (2011) [author] Francescone, Ralph AR. A., Scully, SteveS., Faibish, MichaelM., Taylor, Sherry LS. L., Oh, DennisD., Moral, LuisL., Yan, WeiW., Bentley, BrookeB. Shao, RongR. (2011). Role of YKL40 in the Angiogenesis, Radioresistance, and Progression of Glioblastoma. Journal of Biological Chemistry 286 15332–15343.
 Friedman, Hastie and Tibshirani (2008) [author] Friedman, JeromeJ., Hastie, TrevorT. Tibshirani, RobertR. (2008). Sparse Inverse Covariance Estimation with the Graphical Lasso. Biostatistics 9 432–41. 10.1093/biostatistics/kxm045
 Fruchterman and Reingold (1991) [author] Fruchterman, Thomas MJT. M. Reingold, Edward ME. M. (1991). Graph Drawing by ForceDirected Placement. Software: Practice and experience 21 1129–1164.
 Fulmer (2008) [author] Fulmer, TimT. (2008). Suppressing the suppressors. 38. 10.1038/scibx.2008.914
 Gautier et al. (2004) [author] Gautier, LaurentL., Cope, LeslieL., Bolstad, Benjamin M.B. M. Irizarry, Rafael A.R. A. (2004). affy—Analysis of Affymetrix GeneChip Data at the Probe Level. Bioinformatics 20 307–315. http://dx.doi.org/10.1093/bioinformatics/btg405
 International Lymphoma Study Group (1997) [author] International Lymphoma Study Group (1997). A Clinical Evaluation of the International Lymphoma Study Group Classification of NonHodgkin’s Lymphoma. Blood 89 3909–3918.

Holten (2006)
[author] Holten, DannyD. (2006). Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data. Visualization and Computer Graphics, IEEE Transactions on 12 741–748.
 Horvath (2011) [author] Horvath, SteveS. (2011). Weighted Network Analysis: Applications in Genomics and Systems Biology. Springer.
 Khalil (2002) [author] Khalil, Hassan K.H. K. (2002). Nonlinear Systems. Prentice Hall.
 Lauritzen (1996) [author] Lauritzen, Steffen LS. L. (1996). Graphical models. Clarendon Press.
 Lin et al. (2007) [author] Lin, FaHsuanF.H., Tsai, ShangYuehS.Y., Otazo, RicardoR., Caprihan, ArvindA., Wald, Lawrence LL. L., Belliveau, John WJ. W. Posse, StefanS. (2007). SensitivityEncoded (SENSE) Proton EchoPlanar Spectroscopic Imaging (PEPSI) in the Human Brain. Magnetic Resonance in Medicine 57 249–57. 10.1002/mrm.21119
 Margolin et al. (2006) [author] Margolin, Adam aA. a., Nemenman, IlyaI., Basso, KatiaK., Wiggins, ChrisC., Stolovitzky, GustavoG., Dalla Favera, RiccardoR. Califano, AndreaA. (2006). ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7 Suppl 1 S7.
 Markowetz and Spang (2007) [author] Markowetz, FlorianF. Spang, RainerR. (2007). Inferring Cellular Networks  A Review. BMC Bioinformatics 8 S5. 10.1186/147121058S6S5
 Meinshausen and Bühlmann (2006) [author] Meinshausen, NN. Bühlmann, PP. (2006). High dimensional graphs and variable selection with the lasso. The Annals of Statistics.
 Petersen and Pedersen (2008) [author] Petersen, KBK. Pedersen, MSM. (2008). The Matrix Cookbook Technical University of Denmark, Technical Manual.
 Phipson and Smyth (2010) [author] Phipson, BelindaB. Smyth, Gordon KG. K. (2010). Permutation Pvalues Should Never be Zero: Calculating Exact Pvalues when Permutations are Randomly Drawn. Statistical Applications in Genetics and Molecular Biology 9.
 Shrout and Fleiss (1979) [author] Shrout, Patrick EP. E. Fleiss, Joseph LJ. L. (1979). Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin 86 420.
 R Core Team (2012) [author] R Core Team (2012). R: A Language and Environment for Statistical Computing R Foundation for Statistical Computing, Vienna, Austria ISBN 3900051070.

van Wieringen and
Peeters (2016)
[author] van Wieringen, Wessel N.W. N. Peeters, Carel F. W.C. F. W. (2016). Ridge Estimation of Inverse Covariance Matrices from HighDimensional Data. Computational Statistics and Data Analysis 103 284–303. 10.1016/j.csda.2016.05.012
 von Rosen (1988) [author] von Rosen, DD. (1988). Moments for the Inverted Wishart Distribution. Scandinavian Journal of Statistics 15 97–109.
 Xie (2013) [author] Xie, YihuiY. (2013). Dynamic Documents with R and knitr. CRC Press.
Appendix A Marginalization of the covariance
This section shows the marginalization over in (2.3). Recall the model (2.1) where denotes a dimensional multivariate Gaussian distribution with mean and positive definite (p.d.) covariance matrix with probability density function (pdf)
(A.1) 
and where denotes a dimensional inverse Wishart distribution with degrees of freedom, a p.d. scale matrix , and pdf
(A.2) 
where is p.d. and is the multivariate generalization of the gamma function given by
(A.3) 
For ease of notation we drop the subscript on , , , and . By the model assumptions,
The integrand can be recognized as a unnormalized inverse Wishart pdf of the distribution , and so the integral evaluates to the reciprocal value of the normalizing constant in that density. Thus,
(A.4) 
Using the matrix determinant lemma and , this can be further simplified to
which can help to speedup computations.
Appendix B Proofs
b.1 Nonconcavity of the loglikelihood
The likelihood function is not logconcave in general. This section analyses the (non)concavity of the loglikelihood function given in (2.3). More precisely, the following two propositions are proved.
See 1
See 2
Proof of Proposition 1.
Assume is fixed and consider only the terms involving in (2.3). We reduce to the onedimensional case where
which implies
It is straightforward to show there exists a value for , and for which . Since the second derivative is not always negative the loglikelihood is not logconcave. ∎
Proof of Proposition 2.
Consider the terms involving . Clearly, the mixed terms involving both and are loglinear in and hence logconcave. We thus restrict our attention to the remaining terms not dependent on . The sum of these terms are concave in , since
which can be seen to be concave since for all and is concave for all and . The concavity of is easily seen by the fact that where is the trigamma function. The trigamma function is a wellknown monotonically decreasing function. Hence, the likelihood is logconcave in . ∎
b.2 Existence and uniqueness of likelihood maxima
Before we state the lemmas, the proposition, and their proofs, we see that the reparameterisation of the RCM is irrelevant. Consider the loglikelihood in (2.3) assuming fixed. The loglikelihood obey
(B.1) 
Notice, that this equation also holds in the reparameterization. Here we have
Since is only dependent on data (when is fixed) we can set . Without loss of generality we can therefore consider (B.1) in the following.
See 3
Proof of Proposition 3.
We first prove existence of the maximum. Note, that we may consider as a function on a vector space by letting where is a symmetric matrix. By Lemma 1 and the continuity of , the set is bounded and closed and thus compact for any . The existence of a maximum follows from the extreme value theorem by the continuity of . A stationary point exists due to Rolle’s theorem and the differentiability of .
Next, we show the uniqueness of the maximum. Let denote the set of stationary points, which is nonempty. By Lemma 1, has a finite upper bound given by the maximum of the loglikelihood in those points. All gradient curves (that is, solution curves to ) must then converge toward exactly one of the stationary points where monotonically increases along each curve. Define for the basin of attraction
The basin of attraction is open if is a maximum (Khalil, 2002, Lemma 4.1). By Lemma 2, is always a maximum and hence all are open sets in the set of all positive definite matrices . This partitions the space into disjoint, nonempty, open sets. Since is connected, this is only possible if and thus there is only a single basin of attraction and maximum of . ∎
Lemma 1.
Proof of Lemma 1.
Assume the hypothesis of the lemma and consider the expression given in (B.1) up to the addition of a constant. The likelihood obey the following two upper bounds. First,
Secondly, let whereby (B.1) can be expressed as
Since is concave and the above sum is a convex combination, we have
Hence,
where and . Three cases now exists: 1) If , then
2) If , then
as the matrix in the second term is almost surely positive definite when and the log determinant is some constant. 3) If is bounded and the largest eigenvalue (and hence , then and is bounded away from zero. Therefore,