 # A Stepwise Approach for High-Dimensional Gaussian Graphical Models

We present a stepwise approach to estimate high dimensional Gaussian graphical models. We exploit the relation between the partial correlation coefficients and the distribution of the prediction errors, and parametrize the model in terms of the Pearson correlation coefficients between the prediction errors of the nodes' best linear predictors. We propose a novel stepwise algorithm for detecting pairs of conditionally dependent variables. We show that the proposed algorithm outperforms existing methods such as the graphical lasso and CLIME in simulation studies and real life applications. In our comparison we report different performance measures that look at different desirable features of the recovered graph and consider several model settings.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

High-dimensional Gaussian graphical models (GGM) are widely used in practice to represent the linear dependency between variables. The underlying idea in GGM is to measure linear dependencies by estimating partial correlations to infer whether there is an association between a given pair of variables, conditionally on the remaining ones. Moreover, there is a close relation between the nonzero partial correlation coefficients and the nonzero entries in the inverse of the covariance matrix. Covariance selection procedures take advantage of this fact to estimate the GGM conditional dependence structure given a sample (Dempster, 1972; Lauritzen, 1996; Edwards, 2000).

When the dimension is larger than the number of observations, the sample covariance matrix is not invertible and the maximum likehood estimate (MLE) of does not exist. When , but close to , is invertible but ill-conditioned, increasing the estimation error (Ledoit and Wolf, 2004). To deal with this problem, several covariance selection procedures have been proposed based on the assumption that the inverse of the covariance matrix, , called precision matrix, is sparse.

We present an approach to perform covariance selection in a high dimensional GGM based on a forward-backward algorithm called graphical stepwise (GS). Our procedure takes advantage of the relation between the partial correlation and the Pearson correlation coefficient of the residuals.

Existing methods to estimate the GGM can be classified in three classes: nodewise regression methods, maximum likelihood methods and limited order partial correlations methods. The nodewise regression method was proposed by

Meinshausen and Bühlmann (2006)

. This method estimates a lasso regression for each node in the graph. See for example

Peng et al. (2009), Yuan (2010), Liu and Wang (2012), Zhou et al. (2011) and Ren et al. (2015). Penalized likelihood methods include Yuan and Lin (2007), Banerjee et al. (2008), Friedman et al. (2008), Johnson et al. (2011) and Ravikumar et al. (2011) among others. Cai et al. (2011) propose an estimator called CLIME that estimates precision matrices by solving the dual of an penalized maximum likelihood problem. Limited order partial correlation procedures use lower order partial correlations to test for conditional independence relations. See Spirtes et al. (2000), Kalisch and Bühlmann (2007), Rütimann et al. (2009), Liang et al. (2015) and Huang et al. (2016).

The rest of the article is organized as follows. Section 2 introduces the stepwise approach along with some notation. Section 3 gives simulations results and a real data example. Section 4 presents some concluding remarks. The Appendix shows a detailed description of the crossvalidation procedure used to determine the required parameters in our stepwise algorithm and gives some additional results from our simulation study.

## 2 Stepwise Approach to Covariance Selection

### 2.1 Definitions and Notation

In this section we review some definitions and technical concepts needed later on. Let be a graph where is the set of nodes or vertices and is the set of edges. For simplicity we assume that . We assume that the graph is undirected, that is, if and only if . Two nodes and are called connected, adjacent or neighbors if .

A graphical model (GM) is a graph such that indexes a set of variables and is defined by:

 (i,j)∉E if and only if Xi {\rotatebox[origin={c}]{90.0}{\resizebox{9.675pt}{7.095pt}{⊨}}} Xj∣XV∖{i,j}. (2.1)

Here denotes conditional independence.

Given a node , its neighborhood is defined as

 Ai={l∈V∖{i}:(i,l)∈E}. (2.2)

Notice that gives the nodes directly connected with and therefore a GM can be effectively described by giving the system of neighborhoods .

We further assume that , where is a positive-definite covariance matrix. In this case the graph is called a Gaussian graphical model (GGM). The matrix is called precision matrix.

There exists an extensive literature on GM and GGM. For a detailed treatment of the theory see for instance Lauritzen (1996), Edwards (2000), and Bühlmann and Van De Geer (2011).

### 2.2 Conditional dependence in a GGM

In a GGM the set of edges

represents the conditional dependence structure of the vector

. To represent this dependence structure as a statistical model it is convenient to find a parametrization for .

In this subsection we introduce a convenient parametrization of

using well known results from classical multivariate analysis. For an exhaustive treatment of these results see, for instance,

Anderson (2003), Cramér (1999), Lauritzen (1996) and Eaton (2007).

Given a subset of , denotes the vector of variables with subscripts in in increasing order. For a given pair of nodes , set , and . Note that

has multivariate normal distribution with mean

and covariance matrix

 (Σ11Σ12Σ21Σ22) (2.3)

such that has dimension , has dimension and so on. The matrix in (2.3) is a partition of a permutation of the original covariance matrix , and will be also denoted by , after a small abuse of notation.

Moreover, we set

 Ω=(Σ11Σ12Σ21Σ22)−1=(Ω11Ω12Ω21Ω22).

Then, by (B.2) of Lauritzen (1996), the blocks can be written explicitly in terms of and . In particular

where

 Ω11=(ωiiωilωliωll)

is the submatrix of (with rows and and columns and ). Hence,

 \sc cov(X1|X2) = Σ11−Σ12Σ−122Σ21 = Ω−111 =

and, in consequence, the partial correlation between and can be expressed as

 \sc corr(Xi,Xl|XV∖{i,l})=−ωil√ωiiωll. (2.5)

This gives the standard parametrization of in terms of the support of the precision matrix

 \sc supp(Ω)={(i,l)∈V2:i≠l,ωi,l≠0}. (2.6)

We now introduce another parametrization of , which we need to define and implement our proposed method. We consider the regression error for the regression of on ,

 ε=X1−ˆX1=X1−β⊤X2

and let and denote the entries of (i.e. ). The regression error is independent of and has normal distribution with mean and covariance matrix with elements denoted by

 Ψ11 =(ψiiψilψliψll). (2.7)

A straightforward calculation shows that

 Ψ11 =\sc cov(X1)+\sc cov(ˆX1)−2\sc cov(X1,ˆX1) =Σ11+Σ12Σ−122Σ22Σ−122Σ21−2Σ12Σ−122Σ21 =Σ11−Σ12Σ−122Σ21=Ω−111.

See Cramér (1999, Section 23.4).

Therefore, by this equality, (2.2) and (2.5), the partial correlation coefficient and the conditional correlation are equal

Summarizing, the problem of determining the conditional dependence structure in a GGM (represented by ) is equivalent to finding the pairs of nodes of that belong to the set

 {(i,l)∈V2:i≠l,ψi,l≠0} (2.8)

which is equal to the support of the precision matrix, , defined by (2.6).

###### Remark 1.

As noticed above, under normality, partial and conditional correlation are the same. However, in general they are different concepts (Lawrance, 1976).

###### Remark 2.

Let be the regression coefficient of in the regression of versus and, similarly let be the regression coefficient of in the regression of versus . Then it follows that . This allows for another popular parametrization for . Moreover, let be the error term in the regression of the variable on the remaining ones. Then by Lemma 1 in Peng et al. (2009) we have that and .

### 2.3 The Stepwise Algorithm

Conditionally on its neighbors, is independent of all the other variables. Formally, for all ,

 if l∉Ai and l≠i then Xi {\rotatebox[origin={c}]{90.0}{\resizebox{9.675pt}{7.095pt}{⊨}}} Xl|XAi. (2.9)

Therefore, given a system of neighborhoods and (and so ), the partial correlation between and can be obtained by the following procedure: (i) regress on and compute the regression residual ; regress on and compute the regression residual ; (ii) calculate the Pearson correlation between and

This reasoning motivates the graphical stepwise algorithm (GSA). It begins with the family of empty neighborhoods, for each . There are two basic steps, the forward and the backward steps. In the forward step, the algorithm adds a new edge if the largest absolute empirical partial correlation between the variables is above the given threshold . In the backward step the algorithm deletes an edge if the smallestt absolute empirical partial correlation between the variables is below the given threshold . A step by step description of GSA is as follows:

Graphical Stepwise Algorithm

• Input: the (centered) data and the forward and backward thresholds  and

• Initialization. : set .

• Iteration Step. Given  we compute  as follows.

1. Forward. For each  do the following.

For each  calculate the partial correlations as follows.

1. Regress the variable on the variables with subscript in the set  and compute the regression residuals

2. Regress the variables on the variables with subscript in the set  and compute the regression residuals

3. Obtain the partial correlation by calculating the Pearson correlation between and

If

 maxl∉ˆAkj,j∈V∣∣fkjl∣∣=∣∣fkj0l0∣∣≥αf

set  for

If

 max∣∣fkjl∣∣=∣∣fkj0l0∣∣<αf, % stop.
2. Backward. For each  do the following.

For each  calculate the partial correlation as follows.

1. Regress the variables on the variables with subscript in the set  and compute the regression residuals

2. Regress the variable on the variables with subscript in the set  and compute the regression residuals

3. Compute the partial correlation by calculating the Pearson correlation between and

If

 minl∈ˆAkj,j∈V∣∣bkjl∣∣=∣∣bkj0l0∣∣≤αb

set .

• Output

1. A collection of estimated neighborhoods , .

2. The set of estimated edges .

3. An estimate of , with defined as follow: in the case for where is the vector of the prediction errors in the regression of the variable on In the case  we must distinguish two cases, if then otherwise (see Remark 2).

### 2.4 Thresholds selection by cross-validation

Let be the matrix with rows , , corresponding to observations. We randomly partition the dataset into disjoint subsets of approximately equal sizes, the subset being of size and . For every , let be the validation subset, and its complement , the training subset. For every and for every pair of threshold parameters let be the estimated neighborhoods given by GSA using the training subset. For every let be the estimated coefficient of the regression of the variable on the neighborhood .

Consider now the validation subset. So, for every , using , we obtain the vector of predicted values . If we predict each observation of by the sample mean of the observations in the dataset of this variable.

Then, we define the –fold cross–validation function as

 CV(αf,αb)=1nK∑t=1pj∑j=1∥∥∥X(t)j−ˆX(t)j(αf,αb)∥∥∥2

where the L2-norm or euclidean distance in . Hence the –fold cross–validation forward–backward thresholds , is

 (ˆαf,ˆαb)=:argmin(αf,αb)∈HCV(αf,αb)

where

is a grid of ordered pairs

in over which we perform the search. For a detail description see the Appendix.

### 2.5 Example

To illustrate the algorithm we consider the GGM with 16 edges given in the first panel of Figure 1. We draw independent observations from this model (see the next section for details). The values for the threshold parameters and are determined by -fold cross-validation. The figure also displays the selected pairs of edges at each step in a sequence of successive updates of , for and the final step , showing that the estimated graph is identical to the true graph.

## 3 Numerical results and real data example

We conducted extensive Monte Carlo simulations to investigate the performance of GS. In this section we report some results from this study and a numerical experiment using real data.

### 3.1 Monte Carlo simulation study

Simulated Models

We consider three dimension values and three different models for :

• Model 1.Autoregressive model of orden , denoted . In this case for .

• Model 2. Nearest neighbors model of order 2, denoted . For each node we randomly select two neighbors and choose a pair of symmetric entries of using the NeighborOmega function of the R package Tlasso.

• Model 3. Block diagonal matrix model with blocks of size , denoted BG. For and , we use and blocks, respectively. Each block, of size , has diagonal elements equal to and off-diagonal elements equal to .

For each and each model we generate random samples of size . These graph models are widely used in the genetic literature to model gene expression data. See for example Lee and Liu (2015) and Lee and Ghi (2006). Figure 2 displays graphs from Models 1-3 with nodes.

Methods

We compare the performance of GS with Graphical lasso (Glasso) and Constrained -minimization for inverse matrix estimation (CLIME) proposed by Friedman et al. (2008) and Cai et al. (2011) respectively. Therefore, the methods compared in our simulation study are:

• The proposed method GS with the forward and backward thresholds, , estimated by -fold crossvalidation on a grid of values in , as described in Subsection 2.4. The computing algorithm is available by request.

• The Glasso estimate obtained by solving the penalized-likelihood problem:

 minΩ≻0 (−log{det[Ω]}+tr{ΩX⊤X}+λ∥Ω∥1). (3.1)

In our simulations and examples we use the R-package CVglasso with the tuning parameter selected by fold crossvalidation (the package default).

• The CLIME estimate obtained by symmetrization of the solution of

 min{∥Ω∥1 subject to |SΩ−I|∞≤λ}, (3.2)

where is the sample covariance,

is the identity matrix,

is the elementwise norm, and is a tuning parameter. For computations, we use the R-package clime with the tuning parameter selected by fold crossvalidation (the package default).

To evaluate the ability of the methods for finding the pairs of edges, for each replicate, we compute the Matthews correlation coefficient (Matthews, 1975)

 MCC=TP×TN−FP×FN√(TP+FP)(TP+FN)(TN+FP)(TN+%FN), (3.3)

the and the , where TP, TN, FP and FN are, in this order, the number of true positives, true negatives, false positives and false negatives, regarding the identification of the nonzero off-diagonal elements of . Larger values of MCC, Sensitivity and Specificity indicate a better performance (Fan et al., 2009; Baldi et al., 2000).

For every replicate, the performance of as an estimate for is measured by (where

denotes the Frobenius norm) and by the normalized Kullback-Leibler divergence defined by

where

 DKL=12(tr{ˆΩΩ−1}−log{det[ˆΩΩ−1]}−p)

is the the Kullback-Leibler divergence between and .

Results

Table 1 shows the MCC performance for the three methods under Models 1-3. GS clearly outperforms the other two methods while CLIME just slightly outperforms Glasso. Cai et al. (2011) pointed out that a procedure yielding a more sparse is preferable because this facilitates interpretation of the data. The sensitivity and specificity results, reported in Table 4 in Appendix, show that in general GS is more sparse than the CLIME and Glasso, yielding fewer false positives (more specificity) but a few more false negatives (less sensitivity). Table 2 shows that under models and the three methods achieve fairly similar performances for estimating . However, under model BG, GS clearly outperforms the other two.

Figure 3 display the heat-maps of the number of non-zero links identified in the replications under model . Notice that among the three compared methods, the GS sparsity patterns best match those of the true model. Figures 4 and 5 in the Appendix lead to similar conclusions for models and BG.

### 3.2 Analysis of Breast Cancer Data

In preoperative chemoterapy, the complete eradication of all invasive cancer cells is referred to as pathological complete response, abbreviated as pCR. It is known in medicine that pCR is associated with the long-term cancer-free survival of a patient. Gene expression profiling (GEP) – the measurement of the activity (expression level) of genes in a patient – could in principle be a useful predictor for the patient’s pCR.

Using normalized gene expression data of patients in stages I-III of breast cancer, Hess et al. (2006) aim to identify patients that may achieve pCR under sequential anthracycline paclitaxel preoperative chemotherapy. When a patient does not achieve pCR state, he is classified in the group of residual disease (RD), indicating that cancer still remains. Their data consist of 22283 gene expression levels for 133 patients, with 34 pCR and 99 RD. Following Fan et al. (2009) and Cai et al. (2011) we randomly split the data into a training set and a testing set. The testing set is formed by randomly selecting 5 pCR patients and 16 RD patients (roughly

of the subjects) and the remaining patients form the training set. From the training set, a two sample t-test is performed to select the 50 most significant genes. The data is then standardized using the standard deviation estimated from the training set.

We apply a linear discriminant analysis (LDA) to predict whether a patient may achieve pathological complete response (pCR), based on the estimated inverse covariance matrix of the gene expression levels. We label with the pCR group and the RD group and assume that data are normally distributed, with common covariance matrix and different means . From the training set, we obtain , and for the test data compute the linear discriminant score as follows

 δr(x)=x⊤ˆΩˆμr−12μ⊤rˆΩμr+logˆπr%fori=1,…,n, (3.4)

where is the proportion of group subjects in the training set. The classification rule is

 ˆr(x)=argmaxδr(x)forr=1,2. (3.5)

For every method we use 5-fold cross validation on the training data to select the tuning constants. We repeat this scheme 100 times.

Table 3

displays the means and standard errors (in brackets) of Sensitivity, Specificity, MCC and Number of selected Edges using

over the 100 replications. Considering the MCC, GS is slightly better than CLIME and CLIME than Glasso. While the three methods give similar performance considering the Specificity, GS and CLIME improve over Glasso in terms of Sensitivity.

## 4 Concluding remarks

This paper introduces a stepwise procedure, called GS, to perform covariance selection in high dimensional Gaussian graphical models. Our method uses a different parametrization of the Gaussian graphical model based on Pearson correlations between the best-linear-predictors prediction errors. The GS algorithm begins with a family of empty neighborhoods and using basic steps, forward and backward, adds or delete edges until appropriate thresholds for each step are reached. These thresholds are automatically determined by cross–validation.

GS is compared with Glasso and CLIME under different Gaussian graphical models (, and BG) and using different performance measures regarding network recovery and sparse estimation of the precision matrix . GS is shown to have good support recovery performance and to produce simpler models than the other two methods (i.e. GS is a parsimonious estimation procedure).

We use GS for the analysis of breast cancer data and show that this method may be a useful tool for applications in medicine and other fields.

## Acknowledgements

The authors thanks the generous support of NSERC, Canada, the Institute of Financial Big Data, University Carlos III of Madrid and the CSIC, Spain.

## Appendix A Appendix

### a.1 Selection of the thresholds parameters by cross-validation

Let be the matrix with rows , , corresponding to observations. For each , let denote the jth–column of the matrix .

We randomly partition the dataset into disjoint subsets of approximately equal size, the subset being of size and . For every , let be the validation subset, and its complement , the training subset.

For every and threshold parameters let be the estimated neighborhoods given by GSA using the training subset with . Consider for every node the estimated neighborhood and let be the estimated coefficient of the regression of on , represented in (A.2) (red colour).

Consider the validation subset with , and for every let and define the vector of predicted values

 ˆX(t)j(αf,αb)=XˆA(t)jˆβ(t)A(t)j,

where is the matrix with rows , represented in (A.2) (in blue colour). If the neighborhood we define

 ˆX(t)j(αf,αb)=(¯x(t)j,…,¯x(t)j)⊤

where is the mean of the sample of observations .

We define the –fold cross–validation function as

 CV(αf,αb)=1nK∑t=1p∑j=1∥∥∥X(t)j−ˆX(t)j(αf,αb)∥∥∥2

where the L2-norm or euclidean distance in . Hence the –fold cross–validation forward–backward thresholds , is

 (ˆαf,ˆαb)=:argmin(αf,αb)∈HCV(αf,αb) (A.1)

where is a grid of ordered pairs in over which we perform the search.

 (A.2)
###### Remark 3.

Matrix (A.2) represents, for every node the comparison between estimated and predicted values for cross-validation. is computed using the observations and the matrix with rows , in the training subset (red colour). Based on the validation set is computed using and compared with (in blue color).

## References

• Anderson (2003) Anderson, T. (2003). An Introduction to Multivariate Statistical Analysis. John Wiley.
• Baldi et al. (2000) Baldi, P., S. Brunak, Y. Chauvin, C. Andersen, and H. Nielsen (2000). Assessing the accuracy of prediction algorithms for classification: An overview. Bioinformatics 16(5), 412–424.
• Banerjee et al. (2008) Banerjee, O., L. El Ghaoui, and A. d’Aspremont (2008). Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data.

The Journal of Machine Learning Research

9, 485–516.
• Bühlmann and Van De Geer (2011) Bühlmann, P. and S. Van De Geer (2011).

Statistics for high-dimensional data: methods, theory and applications

.
• Cai et al. (2011) Cai, T., W. Liu, and X. Luo (2011). A constrained minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association 106(494), 594–607.
• Cramér (1999) Cramér, H. (1999). Mathematical Methods of Statistics. Princeton University Press.
• Dempster (1972) Dempster, A. P. (1972). Covariance selection. Biometrics, 157–175.
• Eaton (2007) Eaton, M. L. (2007). Multivariate Statistics : A Vector Space Approach. Institute of Mathematical Statistics.
• Edwards (2000) Edwards, D. (2000). Introduction to Graphical Modelling. Springer Science & Business Media.
• Fan et al. (2009) Fan, J., Y. Feng, and Y. Wu (2009). Network exploration via the adaptive lasso and scad penalties. The Annals of Applied Statistics 3(2), 521–541.
• Friedman et al. (2008) Friedman, J., T. Hastie, and R. Tibshirani (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441.
• Hess et al. (2006) Hess, K. R., K. Anderson, W. F. Symmans, V. Valero, N. Ibrahim, J. A. Mejia, D. Booser, R. L. Theriault, A. U. Buzdar, P. J. Dempsey, et al. (2006). Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. Journal of Clinical Oncology 24(26), 4236–4244.
• Huang et al. (2016) Huang, S., J. Jin, and Z. Yao (2016). Partial correlation screening for estimating large precision matrices, with applications to classification. The Annals of Statistics 44(5), 2018–2057.
• Johnson et al. (2011) Johnson, C. C., A. Jalali, and P. Ravikumar (2011). High-dimensional sparse inverse covariance estimation using greedy methods. arXiv preprint arXiv:1112.6411.
• Kalisch and Bühlmann (2007) Kalisch, M. and P. Bühlmann (2007). Estimating high-dimensional directed acyclic graphs with the pc-algorithm. The Journal of Machine Learning Research 8, 613–636.
• Lauritzen (1996) Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.
• Lawrance (1976) Lawrance, A. J. (1976). On conditional and partial correlation. The American Statistician 30(3), 146–149.
• Ledoit and Wolf (2004) Ledoit, O. and M. Wolf (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88(2), 365–411.
• Lee and Ghi (2006) Lee, H. and J. Ghi (2006). Gradient directed regularization for sparse gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics 7(2), 302–317.
• Lee and Liu (2015) Lee, W. and Y. Liu (2015). Joint estimation of multiple precision matrices with common structures. Journal of Machine Learning Research 16(1), 1035−1062.
• Liang et al. (2015) Liang, F., Q. Song, and P. Qiu (2015). An equivalent measure of partial correlation coefficients for high-dimensional gaussian graphical models. Journal of the American Statistical Association 110(511), 1248–1265.
• Liu and Wang (2012) Liu, H. and L. Wang (2012). Tiger: A tuning-insensitive approach for optimally estimating gaussian graphical models. arXiv preprint arXiv:1209.2437.
• Matthews (1975) Matthews, B. (1975). Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta 405(2), 442−451.
• Meinshausen and Bühlmann (2006) Meinshausen, N. and P. Bühlmann (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34(3), 1436–1462.
• Peng et al. (2009) Peng, J., P. Wang, N. Zhou, and J. Zhu (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association 104(486), 735–746.
• Ravikumar et al. (2011) Ravikumar, P., M. J. Wainwright, G. Raskutti, B. Yu, et al. (2011). High-dimensional covariance estimation by minimizing -penalized log-determinant divergence. Electronic Journal of Statistics 5, 935–980.
• Ren et al. (2015) Ren, Z., T. Sun, C.-H. Zhang, H. H. Zhou, et al. (2015). Asymptotic normality and optimalities in estimation of large gaussian graphical models. The Annals of Statistics 43(3), 991–1026.
• Rütimann et al. (2009) Rütimann, P., P. Bühlmann, et al. (2009). High dimensional sparse covariance estimation via directed acyclic graphs. Electronic Journal of Statistics 3, 1133–1160.
• Spirtes et al. (2000) Spirtes, P., C. N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search. MIT press.
• Yuan (2010) Yuan, M. (2010).

High dimensional inverse covariance matrix estimation via linear programming.

The Journal of Machine Learning Research 11, 2261–2286.
• Yuan and Lin (2007) Yuan, M. and Y. Lin (2007). Model selection and estimation in the gaussian graphical model. Biometrika 94(1), 19–35.
• Zhou et al. (2011) Zhou, S., P. Rütimann, M. Xu, and P. Bühlmann (2011). High-dimensional covariance estimation based on gaussian graphical models. The Journal of Machine Learning Research 12, 2975–3026.