 # High-dimensional Precision Matrix Estimation with a Known Graphical Structure

A precision matrix is the inverse of a covariance matrix. In this paper, we study the problem of estimating the precision matrix with a known graphical structure under high-dimensional settings. We propose a simple estimator of the precision matrix based on the connection between the known graphical structure and the precision matrix. We obtain the rates of convergence of the proposed estimators and derive the asymptotic normality of the proposed estimator in the high-dimensional setting when the data dimension grows with the sample size. Numerical simulations are conducted to demonstrate the performance of the proposed method. We also show that the proposed method outperforms some existing methods that do not utilize the graphical structure information.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The precision matrix is the inverse of the covariance matrix, widely used in many statistical procedures such as linear discriminant analysis, hypothesis testing for multivariate mean vectors, and confidence regions for mean vectors. In classical asymptotic settings, the number of observations

is assumed to grow to infinity and much bigger than the number of variables , which is assumed to be fixed. The inverse of sample covariance is a biased but consistent estimator of the precision matrix (Bai and Shi, 2011). However, in high-dimensional settings, where the number of observations is often smaller than the number of variables

, the sample covariance matrix is not a consistent estimator of the population covariance (Bai and Silverstein, 2010; Johnstone, 2001), and the precision matrix can not be estimated using the inverse of the sample covariance matrix due to the singularity of the sample covariance matrix. To avoid the singularity issue, the generalized inverse of the sample covariance may be used to estimate the precision matrix. However, its application in high-dimensional settings is very limited because the estimator is not unique, singular, and not sparse. These challenges make the problem of estimation of a precision matrix one of the most fundamental issues in high-dimensional statistics. Various estimators have been proposed in the literature. One often-used technique is based on regularization. The shrinkage method is one type of regularization methods. Ledoit and Wolf (2003; 2004) proposed a shrinkage estimator of covariance matrix via minimizing Frobenius norm of the distance between an underlying population covariance and a weighted average of sample covariance and identity matrix. A direct and optimal shrinkage estimator of the precision matrix is proposed by Bodnar, Gupta, and Parolya (2016). No additional assumption is required on covariance or precision matrix in shrinkage-type estimators. Another popular regularization approach assumes some sparse structure in the precision matrix. Some regularization approaches estimate the entire precision matrix simultaneously by employing the

-penalized maximum likelihood (see Banerjees et al. (2008), Yuan and Lin (2007), Friedman et al. (2008) and Lam and Fan (2009)). Other methods apply the -penalized regularization methods in a column by column fashion, such as CLIME (Constrained L1-minimization for Inverse Matrix Estimation) by Cai et al. (2011), SCIO (Sparse Column-wise Inverse Operator) by Liu and Lou (2015), or TIGER (Tuning-Insensitive Graph Estimation and Regression) by Liu and Wang (2017). However, studying statistical inference problems based on these regularized estimators is challenging due to the non-smoothness of the -penalized regularization methods and the use of tuning parameters.

The methods mentioned above are developed for a general precision matrix and do not take advantage of prior knowledge or information available. For instance, in many genetics and genomics research, some well-known pathway databases, such as KEGG, Reactome, and BioCarta, provide prior knowledge about the connections among genes or SNPs, which forms graphical structures that could be considered as prior information for any type of statistical analysis. Recently, there is a growing interest in using the known graphical structure to improve the accuracy of statistical estimation and inference. For example, Li and Li (2008) proposed a variable selection procedure incorporating the known network structure. Some other methods incorporating prior knowledge of network structure in the statistical procedure can be found in Li and Li (2010), Zhou and Song (2016), or Gao et al. (2019). The results obtained in these papers are encouraging and demonstrate that utilizing prior knowledge about the network structure does help improve the accuracy in estimation and prediction.

To incorporate the prior knowledge about network structure, we develop an estimator of the precision matrix with a known graphical structure. This type of problem did not gain much attention in the past. Under the known network structure assumption, Hastie et al. (2008) (Section 17.3.1) proposed an iterative algorithm to estimate the precision matrix and its inverse by maximizing the Gaussian likelihood function for the precision matrix with constraints. Zhou et al. (2011) also proposed another approach to solve the problem where the precision matrix is estimated by minimizing the likelihood function with a constraint on the zero patterns of the precision matrix so that it is consistent with the known graphical structure. Both approaches mentioned above are free of tuning parameters. However, they are based on the likelihood function of multivariate Gaussian distribution and need some iterative algorithms to obtain the solutions, and an explicit form of the estimator is not available. Due to the nature of these estimators, asymptotic distributions and inference are not easy to establish.

This paper proposes a new and simple method to estimate the precision matrix when its graphical structure is known. Our approach is straightforward, and an explicit form of the estimator is given. The estimator is constructed without using any likelihood function information. The proposed method can be adapted well in a non-parametric framework whenever a connection between the graphical structure and its precision matrix is available. With our simple estimator, we establish the rate of convergence and asymptotic normality of the proposed estimator. Simulation studies demonstrate that the proposed method outperforms some existing methods without utilizing the prior knowledge of network structure.

The paper is organized as follows. In Section 2, we introduce the basic notation and propose the new estimating procedure. Sections 3 studies the rate of convergence and asymptotic normality of the proposed estimator. Simulation studies are investigated in Section 4. Section 5 concludes our work. Complete details of the proofs are in the Appendix Section.

## 2 Estimators of precision matrix with a known structure

We first define some notations. For any vector , we denote the norm of the vector as , where and are positive integers, . For convenience, we denote as . The number of non-zero elements of a -dimensional vector is denoted as where . For any matrix , denotes the transpose of , , , , and where

. For two sequences of random variables

and , we denote if and if . For sequences of numbers , we denote if there exist positive constants such that . For notation simplicity, a generic constant is used, whose value may vary from line to line throughout the paper.

Define for and . Let be observed independent and identically distributed (IID) realizations of

from a multivariate normal distribution with mean vector

and covariance matrix , and is the corresponding precision matrix. The goal is to estimate with a network structure information being given. Here the network information is given by an adjacent matrix which contains zeros and ones. The adjacent matrix is a matrix which describes the connectivities among nodes such that if nodes and are connected; otherwise . For instance, the edge between nodes and of the Gaussian graphical model is absent if and only if the -th position of precision matrix is 0 (Lauritzen, 1996). Thus, the specification of network structure is equivalent to the specification of the structure of a precision matrix. We will make use of the structure of the precision matrix to estimate . Assume that is a sparse precision matrix with nonzero positions specified, and for , the total of non zero elements in the -th column is where is at the same order as , . Here represents the order of the numbers of non-zeros in every column of precision matrix. Let denote the -th column of so that . We will estimate column by column. Because the estimators of , for , are obtained in the same manor, we will study the estimator for the -th column and assume the number of non-zeros is exactly for simplicity.

Let be the vector of non-zero elements of , and then define as a known matrix with either 0’s or 1’s such that . In addition, we let be the identity matrix, is the -th column of . Let be the identity matrix, is the -th column of . Our estimator is defined through the sample covariance matrix. Let be the sample mean and . Let be the sample covariance matrix.

Let denote the -th sub-matrix of corresponding to the non-zero elements of the -th column of such that = . The inverse of is then given by . We will define the corresponding sample analogue of as follows. For , let denote the sub-vector of . Let be IID copies of , and denote them by , respectively. Note that follows a multivariate normal distribution with mean vector and covariance matrix . Let be the sample mean of , and be the sample covariance matrix corresponding the non-zero components of the -th column of .

To obtain an estimator for , we first represent using the population covariance matrix . Using the definition of , , we have, for ,

 Σwi=ΣBiwi1=ei. (1)
###### Lemma 1.

The unique solution of equation (1) is .

Proof: Multiplying on the both sides of the equation (1) yields Because is an invertible matrix,

Based on Lemma 1, a natural plug-in estimator of is , where Accordingly, is estimated by

 ^wi=Bi(BTiSX,nBi)−1BTiei=BiS−1X,nifi.

where and is a vector with one element being 1 and all the other elements being 0, and we use the fact that = . Because the proposed estimator is invariant to , without loss of generality, assume that in the rest of the paper.

Beyond the class of Gaussian random variables, the proposed estimator can also be applied to other classes of distributions whenever a connection between the precision matrix and its graphical structure is available, for instance, the class of nonparanormal variables defined in Liu et al. (2009).

## 3 Main results

In this section, we will study the rates of convergence and the asymptotic distribution of the proposed estimator. To establish consistency of the proposed estimators, we assume the following regularity conditions:

• Let

be eigenvalues of

. There exists some constant such that .

Remark 1. Condition (C1) is a mild condition commonly used in the literature. For example, Zhou et al. (2011) and Liu and Luo (2015).

The following theorem gives us the proposed estimator’s rate of convergence when we estimate nonzero positions of the precision matrix of one column. Theorem 1 shows that the estimator for each column of precision matrix is consistent whence .

###### Theorem 1.

Under condition (C1), , for .

Proof: For each submatrix of , suppose its eigenvalues are . Under condition (C1), we have , because of the eigenvalues interlacing theorem. So the conditions in Lemma 2-4 in the Appendix are satisfied and they can applied for every column estimator . Define . Because and , applying Lemma 4 in the Appendix, we have

 ∥S−1X,nifi−Σ−1ifi∥≤∥fi∥∥S−1X,ni−Σ−1i∥≤∥Σ−1i∥∥Σ−1iWX,ni∥1−∥Σ−1iWX,ni∥=OP(√s0/n),

where we applied Lemma 3 in the Appendix, which implies that , as because ). This implies that the condition in Lemma 4 always holds asymptotically. Theorem 1 is proved

The entire precision matrix estimator of is where can be obtained by for . Theorem 2 provides the rate of convergence of in norm and the maximum of the norm. The proof of Theorem 2 can be found in the Appendix.

###### Theorem 2.

Under condition (C1), we have the uniform rate of convergence of the proposed estimators is and if we have .

The above rate of convergence is the same as that established in Theorem 4.1 of Liu and Wang (2017). It is a minimax optimal rate (Liu and Luo, 2015) over precision matrices satisfying condition (C1). In the following theorem, we establish the rate of convergence of in elementwise sup-norm rate norm, which is also minimax rate optimal (Liu and Luo, 2015).

###### Theorem 3.

Under conditions (C1) and if , then .

We now establishes the asymptotic normality of the linear combinations of the proposed estimators for . The following additional condition is needed.

• There exists some constants such that .

Remark 2. Condition (C2) is commonly used in the literature. For example, conditions (C1) and (C2) are met by the class of covariance matrices as defined in page 6 of Bickle and Levina (2008).

###### Theorem 4.

Let and . Under conditions (C1) and (C2) and assume for , then

 √n√hmT(^wi1−wi1)d→N(0,1),

where for .

Proof: From Lemma 5 in the Appendix, we have

 √n√hmT(^wi1−wi1)=√n√hmT(S−1X,ni−Ωi)fi =−√n√hmTΩiWX,niΩifi−√n√hmTΩiWX,ni(S−1X,ni−Ωi)fi =I1−I2.

Suppose that . We have for . In addition, So for any . Applying Lemma 8 in the Appendix to , we have

 −√n√hmTΩiWX,niΩifi=−(√n√haTSX,nib−√n√haTΣib)d→N(0,1).

The theorem is proved if we can show that is ignorable.

Lemma 6 in the Appendix implies that . Lemma 7 in the Appendix gives us . Combining these two together gives us

 √n√h|mTΩiWX,ni(S−1X,ni−Ωi)fi|=OP(s0∥m∥√n√mTΩim).

Suppose are eigenvalues of , by eigenvalues interlacing theorem, we have for some constant , and . Notice that and So . This gives us In other words, is ignorable. The theorem is proved

## 4 Simulation Studies

In this section, we perform some numerical analysis to illustrate the proposed estimator’s finite sample performance. We conduct two sets of simulation studies. In the first set of simulations, the goal is to demonstrate the proposed estimator’s accuracy. In the second set of simulation studies, we compare the proposed method to another method proposed by Liu and Wang (2017), which does not utilize the network structure. Because the proposed method assumes some prior knowledge of the graphical structure, the method of Liu and Wang (2017) is used as a benchmark to check the performance of the proposed estimator.

We generate independent and identically distributed multivariate normal distributed -dimensional random vectors with mean vector 0 and covariance matrix , where such that for and other wise. Because we estimate the precision matrix in column by column fashion, for simplicity, we only demonstrate the accuracy of our estimator for the first column of the precision matrix. Note that the first column of has non-zero components

### 4.1 Accuracy of the Proposed Estimator

To evaluate the proposed estimator’s accuracy, we report the bias of the proposed estimators for the non-zero components (

) and standard deviations of the bias. Both relative bias and absolute bias are evaluated in the simulation. To summarize the overall performance of the estimation of the entire non-zero vector

, we define the relative bias and the absolute bias, respectively, as follows:

 RelBias=1s0s0∑k=1|^ω11,k−ω11,k||ω11,k|,andAbsBias=1s0s0∑k=1|^ω11,k−ω11,k|.

In addition to the overall performance, we also report the bias and its standard deviation of the component-wise estimators. All the simulation results reported in this section are based on 300 simulation replications. To understand the effect of sample size and data dimension, we consider three different sample sizes and 500. For each sample size , we change the data dimension according to the ratios at five different values and 10.

Table 1 reports the average and standard deviations of the RelBias and AbsBias, and the average and standard deviations of the non-zero components in with sparsity level at . Note that in this simulation, The standard deviations are included in the parentheses. We observe that the data dimension has no significant impact on the proposed estimator. As the sample size increases from 100 to 500, the relative bias and absolute bias of the proposed estimator decrease significantly. This illustrates the consistency of the proposed estimator. Similar phenomena can be observed from the component-wise estimators for , and

To illustrate the impact of the sparsity level on the proposed estimator, we increased the sparsity level from 4 to 6 in the following Table 2. In this scenario, we need to estimate 6 coefficients in the first column of the precision matrix, which is . Table 2 shows that the increasing of sparsity level has some effect on the accuracy of the proposed estimator. As the number of non-zero components increases, the accuracy of the proposed estimator drops. This is understandable due to the increasing in the number of unknown parameters. The other patterns observed in Table 2 are very similar to those in Table 1.

### 4.2 Comparison with Some Existing Estimators

This subsection compares the proposed estimator with the TIGER estimator proposed by Liu and Wang (2017). The proposed estimator assumes that the graphical structure is known and incorporates the information in formulating the estimator, but the TIGER estimator does not use any knowledge of the graphical structure. Our goal is to check the improvement of utilizing the known graphical structure to construct the precision matrix estimator.

We briefly introduce the TIGER approach to estimate the precision matrix. TIGER stands for Tuning-Insensitive Graph Estimation and Regression, proposed by Liu and Wang (2017). This estimating procedure for the precision matrix uses a column by column approach. TIGER uses the similar idea as Meinshausen and Buhlman (2006), which estimates the unknown precision matrix by regressing -th variable () on the remaining variables. However, unlike Meinshausen and Buhlman (2006) approach, each subproblem is solved by using the square root-LASSO (SQRT-LASSO). To be more explicit, let be the normalized data matrix where is the sample size,

is the dimension, and the sample variance of each variable is one. Let

is the -th row of , is the -th column of , corresponds to the -th column of with the -th row removed, is the submatrix of with the -th column removed, and is the submatrix of with the -th row and the -th column removed. Then for , -th column of the precision matrix is estimated by and , where , , and is the diagonal matrix with elements are the main diagonal elements of its sample covariance matrix. As we can see here, when estimating parameter

, TIGER uses SQRT-LASSO instead of LASSO penalty as in Meinshausen and Buhlman (2006). When the data matrix is not normalized, we can refer to Liu and Wang(2017) for more details. The TIGER method’s main advantage compared to the other methods is that it is asymptotic free of tuning and does not depend on any unknown parameters due to the SQRT-LASSO property (Belloni et al., 2012). Nevertheless, in reality, the sample size is always finite. So an optimal estimation for precision matrix for a finite sample is still chosen by comparing loss functions at different tuning values via some selection tuning values approaches, such as K fold cross validation or StARS (Stability Approach to Regularization Selection). In the below simulation, we used the “flare” R package created by the authors to obtain the optimal precision matrix. We applied the K fold cross-validation approach to obtain the optimal precision matrix under the default setting in the package. In this default setting tuning parameters is a sequence of 5 values uniformly distributed in the interval

. The process of choosing the optimal precision matrix is processed as follows. At each tuning value , the precision matrix is estimated from the training sets, the likelihood loss function is calculated based on the leave out data fold. Then the optimal precision matrix is corresponding to the which yields the smallest value for average absolute loss. Now we are ready to compare the performance of our proposed method and the benchmark method TIGER.

Data are generated according to the same setting described in Subsection 4.1. Similar to the previous subsection, we consider two sample sizes and 200, five dimension and sample size ratios: and , and two sparsity levels and . The comparison is made through comparing the relative bias and absolute bias defined in Subsection 4.1.

Table 3 summarizes the performance of the proposed estimator and the TIGER estimator. To make an informative comparison, the relative bias and absolute bias are only compared for the non-zero components in . Based on Table 3, we observe that the relative bias and absolute bias of the TIGER approach are much higher than the proposed method. This observation implies that the proposed estimator, using the graphical structure information, outperforms the TIGER approach. This result is something expected but confirmed with simulation studies. The results in Table 3 confirm that utilizing the graphical structure in estimating the precision matrix improves the estimator’s accuracy.

## 5 Conclusion

We introduce a simple estimator for precision matrix when its graphical structure is known under the high-dimensional framework. The proposed estimator comes in an explicit form and requires no additional iterative algorithm. Also, the estimating procedure is constructed by a simple argument without using any likelihood function, which makes the method adapt to a non-parametric framework. The rates of convergence are obtained under the norm and element-wise maximum norm where both rates are mini-max optimal. We also derive the asymptotic normality for a linear combination of the estimators for precision matrix columns. Asymptotic normality property makes statistical inference available with the proposed estimator. Simulation studies demonstrate that incorporating network structure in estimating precision matrix significantly improve the accuracy of the estimator.

## Acknowledgements

This research was partially supported by the NSF grant DMS-1462156.

## References

•  Adamczak, R., Litvak, A. E., Pajor, A., and Tomczak-Jaegermann, N. (2011). Sharp Bounds on Rate of Convergence of the Empirical Covariance Matrix. ScienceDirect.
•  Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices. Springer.
•  Bai, J. and Shi, S. (2011). Estimating High Dimensional Covariance Matrices and its Applications. Annals of Economics and Finance.
•  Banerjee, O., Ghaoui, E. L., and d’Aspremont, A. (2008). Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary data.

The Journal of Machine Learning Research

.
•  Bickel, P. J. and Levina, E. (2008). Regularized Estimation of Large Covariance Matrices. The Annals of Statistics.
•  Cai, T., Liu, W., and Lou, X. (2011). A Constrained Minimization Approach to Sparse Precision Matrix Estimation. Journal of American Statistics Association.
•  Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical LASSO. Biostatistics.
•  Gao, B., Liu, X., Li, H., and Cui, Y. (2019). Integrative Analysis of Genetical Genomics Data Incorporating Network Structures. Biometric Methodology.
•  Golub, G. H. and Van Loan, C. F. (2015). Matrix Computations Fourth Edition. Hindustan Book Agency.
•  Greif, V. (2006). Minimizing the Condition Number for Small Rank Modification. SIAM J. Matrix Analysis Applied.
•  Hastie, T., Tibshirani, R., and Friedman, J. (2008). The elements of Statistical Learning (2nd ed). Springer.
•  Juárez-Ruiz, E., Cortés-Maldonado, R., and Pérez-Rodríguez, F. (2016). Relationship between the Inverses of a Matrix and a Submatrix. Computación y Sistemas.
• 

Jonhstone, I. M. (2001). On The Distribution of the Largest Eigenvalue in Principal Components Analysis.

The Annals of Statistics.
•  Lam, C. and Fan, J. (2009). Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation. The Annals of Statistics.
•  Lauritzen, S. L. (1996). Graphical Model. Oxford University Press.
•  Ledoit, O. and Wolf, M. (2003). Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection. Journal of Empirical Finance.
•  Ledoit, O. and Wolf, M. (2004). A Well-conditioned Estimator for Large-dimensional Covariance Matrices.

Journal of Multivariate Analysis

.
•  Li, C. and Li, H. (2008). Network-constrained Regularization and Variable Selection for Analysis of Genomic Data. Bioinformatics.
•  Li, X., Zhao, T., Wang, L., Yuan, X., and Liu, H. (2018). Package “flare”. CRAN-Package flare - The R Project for Statistical Computing.
• 

Li, C. and Li, H. (2010). Variable Selection and Regression Analysis for graph structured covariates with an application to genomics.

The Annals of Applied of Statistics.
•  Liu, W. and Luo, X. (2015). Fast and Adaptive Sparse Precision Matrix Estimation in High Dimension. Journal of Multivariate Analysis.
•  Liu, H., Laffery, J., and Wasserman, L. (2009). The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research.
•  Liu, H. and Wang, L. (2017). TIGER: a Tuning-insensitive Approach for Optimally Estimating Gaussian Graphical Models. Electronic Journal of Statistics.
•  Meinshausen, N. and Bühlmann, P. (2006). High Dimensional Graphs and Variables Selection with the LASSO. The Annals of Statistics.
•  Ullah, A. (2004). Finite Sample Econometrics. Oxford University Press.
• 

Vershynyn, R. (2011). Introduction of Non-asymptotic Analysis of Random Matrices.

Arxiv.
•  Yuan, M. and Lin, Y. (2017). Model Selection and Estimation Gaussian Graphical Model. Biometrika.
•  Zhou, S., Rütimann, P., Xu, M., and Bühlmann, P. (2011). High-dimensional Covariance Estimation Based On Gaussian Graphical Models. Journal of Machine Learning Research.
•  Zhou, Y. and Song, P.X.K. (2016). Regression Analysis of Networked Data. Biometrika.