Precision matrix plays an important role in statistical analysis. Under Gaussian assumptions, the precision matrix is often used to study the conditional independence among the random variables. Contemporary applications usually require fast methods for estimating a very high dimensional precision matrix(Meinshausen and Bühlmann, 2006). Despite recent advances, estimation of the precision matrix remains challenging when the dimension is very large, owing to the fact that the number of parameters to be estimated is of order . For example, in the Prostate dataset we are studying in this paper, 6033 genetic activity measurements are recorded for 102 subjects. The precision matrix to be estimated is of dimension , resulting in more than 18 million parameters.
One of the well-known and popular method in precision matrix estimation is referred to as the graphical lasso (Yuan and Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008). Without loss of generality, assume that
are i.i.d. observations from a p-dimensional Gaussian distribution with meanand covariance matrix . To estimation the precision matrix , the graphical lasso solves the following regularized negative log-likelihood:
where is the set of non-negative definite matrices in , is the sample covariance matrix, denotes the sum of the absolute values of and is the tuning parameter. Although (1) is constructed based on Gaussian likelihood, it is known that glasso also works for non-Gaussian data (Ravikumar et al., 2011). Many algorithms have been developed to solve the graphical lasso. Friedman et al. (2008) proposed a coordinate descent procedure and Boyd et al. (2011) provided an alternating direction method of multipliers (ADMM) algorithm for solving (1). In order to obtain faster convergence for the iterations, second order methods and proximal gradient algorithms on the dual problem are also well developed; see for example Hsieh et al. (2014), Dalal and Rajaratnam (2017), and the references therein. However, since the graphical lasso involves matrix determinant, eigen-decomposition or calculation of the determinant of a matrix is inevitable in these algorithms. Note that the computation complexity of eigen-decomposition or matrix determinant is of order . Thus, the computation time for these algorithms will scale up cubically in .
by some trace based quadratic loss functions. Using Kronecker product and matrix vectorization, our interest is to estimate,
denotes the identity matrix andis Kronecker product. Motivated by (2) and LASSO (Tibshirani, 1996), a natural way to estimate is
In matrix notation and writing , (4) can be written as
Symmetrization can then be applied to obtain a final estimator. The loss function
The loss function
is equivalent to the D-trace loss proposed by Zhang and Zou (2014), owing to the fact that naturally force the solution to be symmetric.
In the original papers by Zhang and Zou (2014) and Liu and Luo (2015), the authors have established consistency results for the estimation (6) and (8) and show that their performance are comparable to graphical lasso. As can be seen in the vectorized formulation (2) and (3), the loss functions and are quadratic in . In this note, we show that under such quadratic loss functions, explicit solutions can be obtained in each step of the ADMM algorithm. In particular, we derived explicit formulations for the inverses of and for any given , from which we are able to solve (6) and (8), or equivalently (4) and (5), with computation complexity of order . Such a rate is in some sense optimal, as the complexity for computing is also of order .
2 Main Results
For any real matrix , we shall use to denote its Frobenius norm, and
to denote its Spectral norm, i.e., the square root of the largest eigenvalue of. For the quadratic loss function or , the augmented Lagrangian is
where is the step size of ADMM. By Boyd et al. (2011), the alternating iterations are
where is an element-wise soft thresholding operator. Clearly the computation complexity will be mainly dominated by the update of , which amount to solving the following problem:
By looking at the vectorized formulations (4) and (5), we immediately have that in order to solve (9) and (10) we need to compute the inverses of and . The following proposition provides explicit expressions for the inverses.
Write the decomposition of as where and . For any , we have
where , and .
The main technique used for deriving the above proposition is the well-known Woodbury matrix identity. The derivation involves lengthy and tedious calculations and is hence omitted. Nevertheless, one can simply verify the proposition by showing for example . By Proposition 1, and the following equations:
where denotes the Hadamard product, we can obtain the following theorem:
For a given ,
the solution to the equation is unique and is given as ;
the solution to the equation is unique and is given as
Note that when is the the sample covariance, and
can be obtained from the singular value decomposition (SVD) ofwhose complexity is of order . On the other hand, the solutions obtained in Theorem 1 involves only elementary matrix operations of and matrices and thus the complexity for solving (9) and (10) can be seen to be of order .
Generally, we can specify different weights for each element and consider the estimation
where . For example,
Setting and where the diagonal elements are left out of penalization;
Using the local linear approximation (Zou and Li, 2008), we can set where is a LASSO solution and is a general penalized function such as SCAD or MCP.
The ADMM algorithm will be the same as the penalized case except the related update is replaced by a element-wise soft thresholding with different thresholding parameters.
Compared with the ADMM algorithm given in Zhang and Zou (2014), our update of only involves matrix operations of some and matrices, while matrix operations on some matrices are required in Zhang and Zou (2014); see for example Theorem 1 in Zhang and Zou (2014). Consequently, we are able to obtain the order in these updates while Zhang and Zou (2014) requires . Our algorithm thus scales much better when .
Similarly, for the graphical lasso, we can also use ADMM (Boyd et al., 2011) to implement the minimization where the loss function is The update for is obtained by solving Denote the eigenvalue decomposition of as where , we can obtain a closed form solution,
Compared with the algorithm based on quadratic loss functions, the computational complexity is dominated by the eigenvalue decomposition of matrices.
In this section, we conduct several simulations to illustrate the efficiency and estimation accuracy of our proposed methods. We consider the following two precision matrices:
where is an asymptotic sparse matrix with many small elements and is a sparse matrix where each column/row only has at most three non-zero elements. For all of our simulations, we set the sample size and generate the data from with or .
For notation convenience, we shall use the term “EQUAL” to denote our proposed Efficient admm algorithm via the QUAdratic Loss (4), and similarly, use “EQUALs” to denote the estimation based on the symmetric quadratic loss (5). An R package “EQUAL” has been developed to implement our methods. For comparison, we consider the following competitors:
CLIME (Cai et al., 2011) which is implemented by the R package “fastclime”;
glasso (Friedman et al., 2008) which is implemented by the R package “glasso”;
BigQuic (Hsieh et al., 2013) which is implemented by the R package “BigQuic”;
glasso-ADMM which solves the glasso by ADMM (Boyd et al., 2011);
SCIO (Liu and Luo, 2015) which is implemented by the R package “scio”;
D-trace (Zhang and Zou, 2014) which is implemented using the ADMM algorithm provided in the paper.
Table 1 summaries the computation time in seconds based on 10 replications where all methods are implemented in R with a PC with 3.3 GHz Intel Core i7 CPU and 16GB memory. For all the methods, we solve a solution path corresponding to 50 lambda values ranging from to . Here is the maximum absolute elements of the sample covariance matrix. Although the stopping criteria is different for each method, we can see from Table 1 the computation advantage of our methods. In particularly, our proposed algorithms are much faster than the original quadratic loss based methods “SCIO” or “D-trace” for large . In addition, we can roughly observe that the required time increases quadratically in in our proposed algorithms.
|Sparse case where|
|Asymptotic sparse case where|
The average computation time (standard deviation) of solving a solution path for the high dimensional precision matrix estimation.
The second simulation is designed to evaluate the performance of estimation accuracy. Given the true precision matrix and an estimator , we report the following four loss functions:
where loss1 is the scaled Frobenius loss, loss2 is the spectral loss, loss3 is the Stein’s loss which is related to the Gaussian likelihood and loss4 is related to the quadratic loss.
Table 2 reports the simulation results where the tuning parameter is chosen by five-fold cross-validations. We can see that the performance of all three estimators are comparable, indicating that the penalized quadratic loss estimator is also reliable for high dimensional precision matrix estimation. We also observe that the estimator based on (5) has slightly smaller estimation error than those based on (4). As shown in Table 1 , the computation for quadratic loss estimator are much faster than glasso. Moreover, to check the singularity of the estimation, we also report the minimum eigenvalue for each estimation in the final column of Table 2. We observe that all estimators are well estimated to be positive definite.
|Sparse case where|
|Asymptotic sparse case where|
Finally, we apply our proposal to the Prostate dataset which is publicly available at https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html. The data records 6033 genetic activity measurements for the control group (50 subjects) and the prostate cancer group (52 subjects). Here, the data dimension is 6033 and the sample size is 50 or 52. We estimate the precision for each group. Since our EQUAL and EQUALs give similar results, we only report the estimation for EQUALs. It took less than 20 minutes for EQUALs to obtain the solution paths while “glasso” can not produce the solution due to out of memory in R. The sparsity level of the solution paths are plotted in the upper panel of Figure 1. To compare the precision matrices between the two groups, the network graphs of the EQUALs estimators with tuning are provided in the lower panel of Figure 1.
Banerjee et al. 
O. Banerjee, L. E. Ghaoui, and A. d’Aspremont.
Model selection through sparse maximum likelihood estimation for
multivariate gaussian or binary data.
The Journal of Machine Learning Research, 9(Mar):485–516, 2008.
- Boyd et al.  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning, 3(1):1–122, 2011.
- Cai et al.  T. Cai, W. Liu, and X. Luo. A constrained minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607, 2011.
- Dalal and Rajaratnam  O. Dalal and B. Rajaratnam. Sparse gaussian graphical model estimation via alternating minimization. Biometrika, 104(2):379–395, 2017.
- Friedman et al.  J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
- Hsieh et al.  C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, P. K. Ravikumar, and R. Poldrack. BIG & QUIC: Sparse inverse covariance estimation for a million variables. In Advances in neural information processing systems, pages 3165–3173, 2013.
- Hsieh et al.  C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. Ravikumar. QUIC: quadratic approximation for sparse inverse covariance estimation. The Journal of Machine Learning Research, 15(1):2911–2947, 2014.
Liu and Luo 
W. Liu and X. Luo.
Fast and adaptive sparse precision matrix estimation in high
Journal of Multivariate Analysis, 135:153–162, 2015.
- Meinshausen and Bühlmann  N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436–1462, 2006.
- Ravikumar et al.  P. Ravikumar, M. J. Wainwright, G. Raskutti, B. Yu, et al. High-dimensional covariance estimation by minimizing -penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980, 2011.
- Tibshirani  R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, pages 267–288, 1996.
- Yuan and Lin  M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007.
- Zhang and Zou  T. Zhang and H. Zou. Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika, 101(1):103–120, 2014.
- Zou and Li  H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics, 36(4):1509, 2008.