The precision matrix plays a fundamental role in many statistical inference problems. For example, in discriminant analysis, the precision matrix must be estimated to compute the classification rules. In graphical models, exploring the structure of a Gaussian graphical model is equivalent to recovering the support of the precision matrix. Moreover, the precision matrix is useful in a wide range of applications including portfolio optimization, genomics and signal processing, among many others. It is therefore of great importance to estimate the precision matrix.
Given a data matrix $X \in \mathbb{R}^{n \times p}$, a sample of $n$ realizations from a $p$-dimensional Gaussian distribution with zero mean and covariance matrix $\Sigma$, a natural way to estimate the precision matrix $\Omega = \Sigma^{-1}$ is via the maximum likelihood approach. Under the Gaussian setting, the negative log-likelihood takes the form
$$\operatorname{tr}(S\Omega) - \log\det(\Omega), \qquad (1.1)$$
where $S = X^T X / n$ is the sample covariance matrix and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. Minimizing (1.1) with respect to $\Omega$ yields the maximum likelihood estimate of the precision matrix. Because the maximum likelihood estimate performs poorly, and may not even be computable, in the high-dimensional setting, penalized log-likelihood functions and other constrained optimization techniques are used to obtain better estimates. These alternative methods can be roughly divided into two types. One seeks sparsity in the precision matrix in order to explore the structure of the Gaussian graphical model. The graphical lasso [3, 4, 5] is a popular technique among sparse approaches; it penalizes (1.1) with the $\ell_1$ norm to induce sparsity and minimizes the penalized log-likelihood
$$\operatorname{tr}(S\Omega) - \log\det(\Omega) + \lambda\|\Omega\|_1 \qquad (1.2)$$
over all positive definite matrices $\Omega$. Here $\|\Omega\|_1$ denotes the sum of the absolute values of the entries, and $\lambda$ is a nonnegative tuning parameter. Other sparse approaches include Cai et al. [6], Yuan and Wang [7] and Liu and Luo [8], among others. The second type constructs shrinkage estimates of the matrix, which can be useful when the true matrix is non-sparse. Many results on this topic were developed for estimating the covariance matrix, including Ledoit and Wolf [9], Warton [10] and Deng and Tsui [11]. Recently, van Wieringen and Peeters [12] and Kuismin et al. [13] proposed a ridge-type estimator for the precision matrix, which penalizes (1.1) with the squared Frobenius norm and minimizes the penalized log-likelihood
$$\operatorname{tr}(S\Omega) - \log\det(\Omega) + \frac{\lambda}{2}\|\Omega\|_F^2 \qquad (1.3)$$
over all positive definite matrices $\Omega$. Here $\lambda$ is a nonnegative tuning parameter, and $\|\Omega\|_F$ denotes the square root of the sum of the squares of the entries. The advantage of (1.3) is that its minimizer admits an analytical expression.
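To make the closed form concrete, here is a minimal numerical sketch (our own notation; we adopt the $(\lambda/2)\|\Omega\|_F^2$ penalty convention, under which stationarity gives $\Omega^{-1} = S + \lambda\Omega$):

```python
import numpy as np

def ridge_precision(S, lam):
    """Closed-form minimizer of tr(S @ Omega) - log det(Omega)
    + (lam / 2) * ||Omega||_F^2 over positive definite Omega.
    Stationarity gives Omega^{-1} = S + lam * Omega, which is solved
    eigenvalue-by-eigenvalue in the eigenbasis of S."""
    d, V = np.linalg.eigh(S)                             # S = V diag(d) V^T
    e = (np.sqrt(d ** 2 + 4.0 * lam) - d) / (2.0 * lam)  # eigenvalues of the estimate
    return (V * e) @ V.T

# usage: ridge estimate from a small synthetic sample
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
S = X.T @ X / 50
Omega = ridge_precision(S, lam=0.5)
# the estimate satisfies the stationarity condition inv(Omega) = S + lam * Omega
```

Solving the per-eigenvalue quadratic $\lambda e_i^2 + d_i e_i - 1 = 0$ yields strictly positive $e_i$, so the estimate is always positive definite, even when $S$ is singular.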
None of the aforementioned approaches distinguishes between sensitive and non-sensitive data. However, many real-world datasets contain sensitive personal information. For example, datasets related to portfolio optimization may contain personal financial information, and datasets related to gene expression may contain personal health information. The availability of large datasets containing sensitive information from individuals has motivated the study of learning algorithms that guarantee the privacy of the individuals who contribute to the database. A strict and widely used privacy guarantee is the concept of differential privacy [14]. It requires that datasets differing in only one entry induce similar distributions on the output of an algorithm, which provides strong, provable protection to individuals and prevents them from being identified by an arbitrarily powerful adversary. It is therefore important to study the estimation of the precision matrix under the differential privacy framework.
Recently, Wang et al. [15] studied the differentially private sparse precision matrix estimation problem from the model perspective. In this paper, we study the differentially private precision matrix estimation problem from the algorithm perspective. Specifically, we design differentially private algorithms for the ridge-type estimation model (1.3) and the graphical lasso problem (1.2). We first develop a differentially private algorithm for the ridge-type estimation of the precision matrix. Note that when the tuning parameter $\lambda$ is fixed, the solution of model (1.3) is determined by the sample covariance matrix $S$; thus we provide differential privacy by perturbing $S$. We then develop a differentially private algorithm for the graphical lasso. The post-processing property of differential privacy implies that if an algorithm can be decomposed so that only some of its steps access the raw data, then making those steps differentially private makes the whole algorithm differentially private. Note that the ADMM algorithm for the graphical lasso [16] consists of three steps, but only one of them accesses the raw dataset. Moreover, this step corresponds to a ridge-type penalized log-likelihood estimation problem similar to model (1.3). Thus, we can run a differentially private algorithm in this step and treat the other steps as post-processing, which does not increase the privacy risk. Theoretical guarantees on the performance of our privacy-preserving precision matrix estimation algorithms are also established. Numerical studies show the utility of our privacy-preserving algorithms.
The remainder of this paper is organized as follows. We introduce differential privacy and the ADMM algorithm in Section 2. In Section 3, we develop differentially private algorithms in the context of precision matrix estimation. Theoretical guarantees on the performance of our differentially private algorithms are provided in Section 4. We then test our algorithms on simulated and real data in Section 5, and conclude in Section 6.
In this section, we introduce the background of differential privacy, some related privacy notions, and the ADMM algorithm for the graphical lasso.
2.1 Differential privacy
Differential privacy (DP) is a rigorous and common definition of privacy protection for data analysis algorithms [14, 17]. Let $\mathcal{X}$ be the space of data points and let $D, D' \in \mathcal{X}^n$ be datasets. We use the Hamming distance, denoted by $d_H(D, D')$, to measure the distance between the datasets $D$ and $D'$, that is, the number of data points on which they differ. If $d_H(D, D') = 1$, we call them neighboring, meaning that $D'$ can be obtained by changing one of the data points in $D$.
Definition. A randomized algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if for all neighboring datasets $D$ and $D'$ and for all measurable sets $\mathcal{S}$ the following holds:
$$\Pr[\mathcal{A}(D) \in \mathcal{S}] \le e^{\epsilon} \Pr[\mathcal{A}(D') \in \mathcal{S}] + \delta.$$
If $\delta = 0$, we say that $\mathcal{A}$ is $\epsilon$-differentially private.
Observe that an algorithm providing $\epsilon$-differential privacy also provides $(\epsilon, \delta)$-differential privacy for any $\delta > 0$; as a result, $\epsilon$-differential privacy is the stronger notion. Here $\epsilon$ and $\delta$ are privacy budget parameters: the smaller they are, the stronger the privacy guarantee. From this definition, it is clear that if we arbitrarily change any single data point in a dataset, the output distribution of the algorithm does not shift much.
Dwork et al. [14] also provide a framework for developing privacy-preserving algorithms by adding noise with a suitably designed distribution to the output of an algorithm, called the sensitivity method. The $\ell_2$-sensitivity of a function is defined as follows.

Definition. The $\ell_2$-sensitivity of a function $f: \mathcal{X}^n \to \mathbb{R}^d$ is
$$\Delta_2 f = \max_{d_H(D, D') = 1} \|f(D) - f(D')\|_2.$$
The Gaussian mechanism is a typical sensitivity method; it preserves $(\epsilon, \delta)$-differential privacy.
Definition. Given any function $f: \mathcal{X}^n \to \mathbb{R}^d$, the Gaussian mechanism is defined as
$$\mathcal{M}(D) = f(D) + (Y_1, \dots, Y_d),$$
where the $Y_i$ are i.i.d. random variables drawn from the Gaussian distribution $N(0, \sigma^2)$ with $\sigma = \sqrt{2\ln(1.25/\delta)}\,\Delta_2 f / \epsilon$.
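For illustration, a minimal sketch of the Gaussian mechanism (the function name and interface are ours; the classical calibration below assumes $\epsilon < 1$):

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, eps, delta, rng=None):
    """Release a numeric array with (eps, delta)-differential privacy by
    adding i.i.d. N(0, sigma^2) noise, where
    sigma = sqrt(2 * ln(1.25 / delta)) * l2_sensitivity / eps."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / eps
    return value + rng.normal(0.0, sigma, size=np.shape(value))

# usage: privatize a bounded summary statistic
x = np.array([0.2, 0.4, 0.6])
private_x = gaussian_mechanism(x, l2_sensitivity=0.1, eps=0.5, delta=1e-5)
```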
The sensitivity of an algorithm measures the maximum difference in its output under neighboring datasets, which is closely related to the stability of the algorithm. An algorithm being stable means that its output varies little when it is given two very similar datasets, which also means that its sensitivity is low. When the privacy parameters are fixed, low sensitivity allows the differentially private algorithm to add small noise with high probability, which means high utility. Thus, a stable non-private algorithm is a good candidate for development into a differentially private algorithm.
Moreover, differential privacy is closed under post-processing.
Lemma. Let $\mathcal{A}$ be an $(\epsilon, \delta)$-differentially private algorithm and let $g$ be an arbitrary data-independent mapping. Then $g \circ \mathcal{A}$ is $(\epsilon, \delta)$-differentially private. That is, a data analyst without additional knowledge about the private dataset cannot compute a function of the output of a differentially private algorithm and make it less differentially private.
2.2 Alternating direction method of multipliers algorithm for the graphical lasso
The ADMM is an algorithm well suited to distributed convex optimization [18, 16]. It solves optimization problems by decomposing them into smaller parts, each of which is easier to handle. We can implement differential privacy by handling only some of these parts. We now apply the ADMM algorithm to the graphical lasso.
The graphical lasso problem (1.2) can be rewritten as
$$\min_{\Omega, Z} \; \operatorname{tr}(S\Omega) - \log\det(\Omega) + \lambda\|Z\|_1 \quad \text{subject to} \quad \Omega = Z,$$
where $\Omega$ and $Z$ are positive definite matrices.
We then form the augmented Lagrangian function
$$L_\rho(\Omega, Z, Y) = \operatorname{tr}(S\Omega) - \log\det(\Omega) + \lambda\|Z\|_1 + \operatorname{tr}\big(Y^T(\Omega - Z)\big) + \frac{\rho}{2}\|\Omega - Z\|_F^2,$$
where $Y$ is the dual variable (Lagrange multiplier) and $\rho > 0$ is a preselected penalty coefficient. The ADMM algorithm then alternately updates the variables by minimizing the augmented Lagrangian function.
Let $U = Y/\rho$ be the scaled dual variable. Using the scaled dual variable, the ADMM iterations for the graphical lasso are
$$\Omega^{k+1} = \arg\min_{\Omega} \Big\{ \operatorname{tr}(S\Omega) - \log\det(\Omega) + \frac{\rho}{2}\|\Omega - Z^k + U^k\|_F^2 \Big\},$$
$$Z^{k+1} = \arg\min_{Z} \Big\{ \lambda\|Z\|_1 + \frac{\rho}{2}\|\Omega^{k+1} - Z + U^k\|_F^2 \Big\},$$
$$U^{k+1} = U^k + \Omega^{k+1} - Z^{k+1}.$$
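The three updates can be sketched in a few lines (a minimal illustration with our own names and defaults; the $\Omega$-step reduces, per eigenvalue $d_j$ of $\rho(Z^k - U^k) - S$, to the quadratic $\rho\gamma^2 - d_j\gamma - 1 = 0$, and the $Z$-step is entrywise soft-thresholding):

```python
import numpy as np

def soft_threshold(A, kappa):
    """Entrywise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def graphical_lasso_admm(S, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min tr(S Omega) - log det(Omega) + lam ||Z||_1
    subject to Omega = Z.  The Omega-step has a closed form via an
    eigendecomposition; the Z-step is soft-thresholding; U is the scaled dual."""
    p = S.shape[0]
    Z, U = np.eye(p), np.zeros((p, p))
    for _ in range(n_iter):
        # Omega-step: solve rho*Omega - inv(Omega) = rho*(Z - U) - S per eigenvalue
        d, Q = np.linalg.eigh(rho * (Z - U) - S)
        gamma = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)
        Omega = (Q * gamma) @ Q.T
        # Z-step and scaled dual update
        Z = soft_threshold(Omega + U, lam / rho)
        U = U + Omega - Z
    return Z

# usage on a small sample covariance
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 4))
S = X.T @ X / 80
Z_hat = graphical_lasso_admm(S, lam=0.2)
```

Note that the quadratic always has a positive root $\gamma$, so the $\Omega$-iterate stays positive definite throughout.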
In this section, we first propose a differentially private ridge estimator for the precision matrix. Then, noting that the ADMM algorithm for the graphical lasso contains a ridge-type estimation step, we propose a differentially private algorithm for the graphical lasso problem.
3.1 Differentially private ridge estimation for the precision matrix
When the tuning parameter $\lambda$ is fixed, model (1.3) admits the analytical solution
$$\hat{\Omega} = V \operatorname{diag}(e_1, \dots, e_p)\, V^T, \qquad e_i = \frac{\sqrt{d_i^2 + 4\lambda} - d_i}{2\lambda}, \qquad (3.1)$$
where $D = \operatorname{diag}(d_1, \dots, d_p)$ is a diagonal matrix whose diagonal elements are sorted in ascending order beginning from the upper left corner, that is, $d_1 \le d_2 \le \cdots \le d_p$. Here the $d_i$'s are the eigenvalues of the sample covariance matrix $S$, and $V$ is an orthogonal matrix whose columns are the eigenvectors of $S$, ordered to match the eigenvalues in $D$. Similarly, van Wieringen and Peeters [12] gave a different form of the analytical solution to model (1.3). Both results show that the ridge estimator of the precision matrix has high stability, in other words, low sensitivity. It is clear that the estimator (3.1) is determined by the eigenvalues and eigenvectors of $S$ when $\lambda$ is fixed. Thus, the estimator preserves differential privacy as long as the orthogonal eigenvalue decomposition of $S$ preserves differential privacy.
Without loss of generality, the columns of the data matrix $X$ are assumed to be normalized to have norm at most one. We are interested in the sensitivity of the function $f(X) = X^T X$, whose symmetric output may be viewed as a $p(p+1)/2$-dimensional real vector. We use the Analyze Gauss algorithm [19] to perturb $X^T X$, which provides $(\epsilon, \delta)$-differential privacy. We then apply equation (3.1) to the perturbed matrix, which yields a ridge estimate of the precision matrix that is $(\epsilon, \delta)$-differentially private. See Algorithm 1 for details.
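A sketch of the idea behind Algorithm 1 (our illustration, not the paper's exact pseudocode: we assume each row of $X$ has $\ell_2$ norm at most one and use the conservative sensitivity bound $2/n$ for $S = X^T X / n$; the exact constant in Algorithm 1 may differ):

```python
import numpy as np

def private_ridge_precision(X, lam, eps, delta, rng=None):
    """Perturb the sample covariance with a symmetric Gaussian noise matrix
    (Analyze Gauss style), then apply the closed-form ridge solution to the
    perturbed matrix.  Assumes each row of X (one data point) has l2 norm
    at most 1, so Delta = 2/n is a conservative L2-sensitivity bound for
    S = X^T X / n (changing one row changes S by (x x^T - x' x'^T)/n)."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    S = X.T @ X / n
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * (2.0 / n) / eps
    noise = rng.normal(0.0, sigma, size=(p, p))
    S_tilde = S + np.triu(noise) + np.triu(noise, 1).T   # symmetric perturbation
    # closed-form ridge step, same formula as the non-private estimator
    d, V = np.linalg.eigh(S_tilde)
    e = (np.sqrt(d ** 2 + 4.0 * lam) - d) / (2.0 * lam)
    return (V * e) @ V.T

# usage: rows scaled to unit l2 norm before calling
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
X = X / np.linalg.norm(X, axis=1, keepdims=True)
Omega_priv = private_ridge_precision(X, lam=0.5, eps=1.0, delta=1e-5)
```

Because the ridge map produces strictly positive eigenvalues for any symmetric input, the private estimate remains positive definite even though the perturbed covariance may not be.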
3.2 Differentially private algorithm for the graphical lasso
One algorithm for solving the graphical lasso problem is the ADMM algorithm of Section 2. The ADMM algorithm for the graphical lasso consists of an $\Omega$-minimization step, a $Z$-minimization step and a scaled dual variable update. Obviously, the $\Omega$-minimization step accesses the original dataset directly (through $S$), while the other steps only use the previous update results. Moreover, the $\Omega$-minimization step is a ridge-type penalized log-likelihood estimation similar to model (1.3).
Since differential privacy is closed under post-processing, solving the $\Omega$-minimization step in each iteration by Algorithm 1 makes each iteration provide $(\epsilon, \delta)$-differential privacy. Note that when Algorithm 1 is used here, the analytical solution used in step 4 may differ from (3.1) (see [12, 13, 16]). If the algorithm converges after $K$ iterations, then it provides $(K\epsilon, K\delta)$-differential privacy by simple composition.
However, if the same privacy parameters of Algorithm 1 are used in each iteration, this is equivalent to perturbing $S$ with the same noise level in each iteration. Therefore, we can perturb $S$ directly, once, before running the ADMM algorithm, which avoids the accumulation of privacy loss, that is, reduces the privacy cost. We obtain the following differentially private ADMM algorithm for the graphical lasso.
Proof. By the definition of the Gaussian mechanism, step 3 releases the perturbed matrix in an $(\epsilon, \delta)$-differentially private manner. Thus, by the post-processing property of differential privacy, Algorithm 1 is $(\epsilon, \delta)$-differentially private, and so is Algorithm 2.
It should be noted that the noise-addition mechanism in Algorithm 2 is the same as that of Wang et al. [15], but we derived it from the perspective of the algorithm. The differential privacy of Algorithm 2 is achieved through the differential privacy of the $\Omega$-minimization step, so a different privacy mechanism for this step would yield a different noise-addition mechanism in Algorithm 2. Moreover, this method of developing differentially private algorithms can be extended to regularized empirical risk minimization problems.
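The one-shot perturb-then-iterate scheme of Algorithm 2 can be sketched as follows (again our illustration under the same row-normalization and sensitivity assumptions as before, not the paper's exact pseudocode):

```python
import numpy as np

def private_graphical_lasso(X, lam, eps, delta, rho=1.0, n_iter=200, rng=None):
    """Add symmetric Gaussian noise to the sample covariance once, then run
    the ADMM iterations on the noisy matrix.  Every iteration is
    post-processing of the single private release, so the privacy cost is
    paid only once.  Assumes rows of X have l2 norm at most 1, with the
    conservative sensitivity bound Delta = 2/n for S = X^T X / n."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * (2.0 / n) / eps
    noise = rng.normal(0.0, sigma, size=(p, p))
    S = X.T @ X / n + np.triu(noise) + np.triu(noise, 1).T  # private release
    Z, U = np.eye(p), np.zeros((p, p))
    for _ in range(n_iter):                                  # post-processing
        d, Q = np.linalg.eigh(rho * (Z - U) - S)             # Omega-step
        Omega = (Q * ((d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho))) @ Q.T
        A = Omega + U
        Z = np.sign(A) * np.maximum(np.abs(A) - lam / rho, 0.0)  # Z-step
        U = U + Omega - Z                                    # dual update
    return Z

# usage: rows scaled to unit l2 norm before calling
rng = np.random.default_rng(2)
X = rng.standard_normal((120, 5))
X = X / np.linalg.norm(X, axis=1, keepdims=True)
Z_priv = private_graphical_lasso(X, lam=0.2, eps=1.0, delta=1e-5)
```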
4 Theoretical analysis
In this section, we provide theoretical guarantees on the performance of the privacy-preserving precision matrix estimation algorithms of Section 3.
4.1 Differentially private ridge estimation for the precision matrix
We first provide a performance theorem for Algorithm 1. We will show that the differentially private ridge estimator of the precision matrix is consistent under fixed-dimension asymptotics. Let $\tilde{\Omega}$ be the output of Algorithm 1 and let $\Omega_0$ be the true precision matrix. Then, if the tuning parameter $\lambda$ converges almost surely to zero as $n \to \infty$, with high probability we have $\|\tilde{\Omega} - \Omega_0\|_F \to 0$.
The proof requires the following lemmas, the first of which has been proved in the literature.
[] Let , be positive definite matrices. For any , we have
where , . Here and represent the maximum eigenvalue and the minimum eigenvalue, respectively.
We now prove Theorem 4.1.
Proof. By the triangle inequality,
First, we bound the first term on the right-hand side of (4.4). Let and denote the proximity operator of . Then for any ,
Note that the entries in are drawn from . Then, with high probability, we have
Taking expectations on both sides of (4.7),
Note that and are always achieved, and thus is finite.
4.2 Differentially private algorithm for the graphical lasso
We now establish performance guarantees for Algorithm 2 by providing a rate of decay of its error in Frobenius norm. Generally, let $\hat{\Omega}$ be the optimal solution of problem (1.2) based on the perturbed sample covariance matrix. Let the set of non-zero off-diagonal entries of the precision matrix be denoted by $T$, with cardinality $s$, and let $T^c$ be its complement. We state the result as the following theorem.
Let be the optimal solution of problem (1.2) based on with tuning parameter . Let be the true precision matrix. Then, if the smallest and largest eigenvalues of the covariance matrix satisfy , with high probability, we have
For this proof, we will need a lemma of Bickel and Levina [22].
[] Let be and . Then if
where , and depend on only.
We now prove Theorem 4.2.
The estimate $\hat{\Omega}$ minimizes $Q(\Omega) = \operatorname{tr}(\Omega \tilde{S}) - \log\det(\Omega) + \lambda\|\Omega\|_1$, or equivalently $\tilde{\Delta} = \hat{\Omega} - \Omega_0$ minimizes $G(\Delta) = Q(\Omega_0 + \Delta) - Q(\Omega_0)$.
Consider the set
Note that is a convex function, and . Then, if we can show that
the minimizer must be inside the sphere defined by , and hence
By the Taylor expansion of , we rewrite (4.11) as,
where is the Kronecker product and is vectorized to match the dimensions of the Kronecker product.
Rothman et al. [23] have shown a bound for term I that holds with high probability.
To bound term II, we write
Note that the union sum inequality and Lemma 4.2 imply that, with high probability,
and hence term II is bounded by
Here the constants are positive.
Note that and . Then by the triangle inequality, the third part
Therefore, we have
where belongs to . Then
Since the second term on the right-hand side of (4.22) is always negative and , we then have
Consequently, the desired bound holds for a sufficiently large constant. This establishes the theorem.
5.1 Simulation studies

In this subsection, we conduct a simulation study to evaluate the performance of our proposed differentially private algorithms. We consider the following four models:
Model 1. , where is a matrix with and each is drawn from . This precision matrix is unstructured and non-sparse.
Model 2. A full model with if and otherwise. This precision matrix is structured and non-sparse.
Model 3. An AR(2) model with and . This precision matrix is structured and sparse.
Model 4. The last model comes from Cai et al. [6]. Let , where each off-diagonal entry in is generated independently and equals 0.5 with probability 0.1 or 0 with probability 0.9. is chosen such that the condition number (the ratio of the largest and smallest eigenvalues of a matrix) equals . Finally, the matrix is standardized to have unit diagonal. This precision matrix is unstructured and sparse.
Firstly, we measure the difference between the differentially private estimates and their non-private counterparts, where the private estimates are the outputs of our differentially private Algorithm 1 and Algorithm 2, and the corresponding non-private estimates are the solution of problem (1.3) given by equation (3.1) and the solution of problem (1.2) given by the ADMM algorithm. For each model, we generate an independent sample from a multivariate Gaussian distribution with mean zero and the corresponding covariance matrix, and preprocess the data matrix by normalizing it with the maximum norm to enforce the norm condition of Section 3. The five-fold cross-validation described by Bien and Tibshirani [24] is used to choose the tuning parameter, so that the optimal value minimizes the negative log-likelihood (1.1), and we compute the differentially private estimates with the same tuning parameter as the non-private algorithm. We consider different values of the privacy budget with the sample size fixed, and different sample sizes with the privacy budget fixed. Each experiment is replicated 50 times, and the preselected penalty coefficient $\rho$ is held fixed. We measure the estimation quality by the following matrix norms: the matrix $\ell_1$ norm, the Frobenius norm, and the spectral norm. Table 1 (Models 1-3) and Table 2 report the average (standard error) matrix losses over the 50 replications.
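For reference, the three losses can be computed directly with numpy (a small sketch; we interpret the matrix $\ell_1$ norm as the maximum absolute column sum, a common convention, and the helper name is ours):

```python
import numpy as np

def matrix_losses(A, B):
    """The three losses reported in the tables, applied to D = A - B:
    matrix l1 norm (max absolute column sum), Frobenius norm, and
    spectral norm (largest singular value)."""
    D = A - B
    return {
        "l1": np.max(np.sum(np.abs(D), axis=0)),
        "frobenius": np.linalg.norm(D, "fro"),
        "spectral": np.linalg.norm(D, 2),
    }

# usage on a small example difference
losses = matrix_losses(np.array([[1.0, -2.0], [0.0, 3.0]]), np.zeros((2, 2)))
```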
As shown in Table 1 and Table 2, larger sample sizes and larger privacy budgets yield smaller matrix norm losses. First, we study how the loss of our differentially private algorithms is affected by the privacy budget. As the privacy budget increases, the losses are reduced in structured and unstructured, sparse and non-sparse models alike. This is in line with our additive-noise algorithm, where the noise level is inversely proportional to the privacy parameter; that is, a larger budget means less noise and more utility. Next, we investigate the performance of our differentially private algorithms as the sample size varies. As the sample size increases with the privacy budget fixed, all three losses show improving estimation performance.
Secondly, focusing on the structural learning of the Gaussian graphical model, we test the ability of our differentially private Algorithm 2 to recover the support of the precision matrix via ROC curves. ROC curves reflect the overall selection performance of each method as the tuning parameter varies, with the true positive rate (TPR) plotted against the false positive rate (FPR). For Model 3 and Model 4, we generate a sample and preprocess the data matrix, fixing the sample size and letting the privacy budget vary. For each setting, we replicate 50 times. Smoothed average ROC curves are shown in Figure 1, from which we see that the model selection ability of our differentially private Algorithm 2 is almost comparable to that of the non-private algorithm under proper parameter settings.
5.2 Application to real data
5.2.1 Classification of Ionosphere Data
The first example deals with the Ionosphere data from the UCI repository [25]. This radar data was collected by a system in Goose Bay, Labrador, consisting of a phased array of 16 high-frequency antennas. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere; "Bad" returns are those that do not, their signals passing through the ionosphere. The data contains 351 observations with 34 variables, and the response labels each return as "Good" or "Bad". As a classification problem, we apply linear discriminant analysis (LDA) with a differentially private precision matrix estimate. The purpose here is to illustrate how the ridge estimation of the precision matrix under different privacy parameters affects the classification performance of LDA. We randomly divide the data into a training set, a validation set and a test set. The training set is used to estimate the precision matrix, the validation set is used to choose the tuning parameter, and the misclassification error is computed on the test set. The sizes of the training set and the validation set are chosen to be n = 40. We repeat this procedure 50 times. The misclassification errors are presented in Figure 2, which shows that there is a large gap between the results of the differentially private algorithm and the non-private algorithm when the privacy budget is small, but the gap narrows as the budget increases.
5.2.2 Cell signalling data
In this subsection, we apply our differentially private Algorithm 2 to the cell signalling data [26] to evaluate its performance. The dataset contains 7466 cells with flow cytometry measurements of 11 phosphorylated proteins and phospholipids, i.e., $n = 7466$ and $p = 11$. The authors also provided a causal protein-signaling network, shown in Figure 3a. Friedman et al. [5] also applied the graphical lasso to this data.
We compare the performance of our differentially private Algorithm 2 and the non-private algorithm in reconstructing network edges via the ROC curve. To average out the randomness of the additive noise, we replicate 50 times. We fix the preselected penalty coefficient and the privacy parameters. The resulting ROC curve is shown in Figure 3b, where the network in Figure 3a is used as a benchmark (regarded here as an undirected graph). The ROC curve shows that our differentially private Algorithm 2 has high utility. Reconstructed networks with different tuning parameters are shown in Figure 4.
In this paper, we have focused on developing differentially private algorithms for the ridge estimation of the precision matrix and for the graphical lasso problem. We first presented a differentially private ridge estimator for the precision matrix. Then, based on this estimator and the ADMM algorithm, we proposed a differentially private algorithm for the graphical lasso problem. We compared our differentially private algorithms with their non-private counterparts across different privacy budgets and sample sizes on simulated and real data; the results show that our differentially private algorithms provide good utility.
[1] Martin, N., Maes, H.: Multivariate Analysis. Academic Press, London, 1979
[2] Lauritzen, S. L.: Graphical Models. Clarendon Press, 1996
[3] Banerjee, O., El Ghaoui, L., d'Aspremont, A.: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9, 485–516 (2008)
[4] Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35 (2007)
[5] Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441 (2008)
[6] Cai, T., Liu, W. D., Luo, X.: A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494), 594–607 (2011)
[7] Yuan, T., Wang, J. H.: A coordinate descent algorithm for sparse positive definite matrix estimation. Statistical Analysis and Data Mining, 6(5), 431–442 (2013)
[8] Liu, W. D., Luo, X.: Fast and adaptive sparse precision matrix estimation in high dimensions. Journal of Multivariate Analysis, 135, 153–162 (2015)
[9] Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411 (2004)
[10] Warton, D. I.: Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481), 340–349 (2008)
[11] Deng, X. W., Tsui, K.: Penalized covariance matrix estimation using a matrix-logarithm transformation. Journal of Computational and Graphical Statistics, 22(2), 494–512 (2013)
[12] van Wieringen, W. N., Peeters, C. F. W.: Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis, 103, 284–303 (2016)
[13] Kuismin, M. O., Kemppainen, J. T., Sillanpää, M. J.: Precision matrix estimation with ROPE. Journal of Computational and Graphical Statistics, 26(3), 682–694 (2017)
[14] Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography Conference, Springer, Berlin, Heidelberg, 2006
[15] Wang, D., Huai, M. D., Xu, J. H.: Differentially private sparse inverse covariance estimation. In: 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2018
[16] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122 (2011)
[17] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–407 (2014)
[18] Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2, 17–40 (1976)
[19] Dwork, C., Talwar, K., Thakurta, A., Zhang, L.: Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In: Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, ACM, 2014
[20] Rolfs, B., Rajaratnam, B., Guillot, D., Wong, I., Maleki, A.: Iterative thresholding algorithm for sparse inverse covariance estimation. In: Advances in Neural Information Processing Systems, 2012
[21] Combettes, P. L., Wajs, V. R.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4), 1168–1200 (2005)
[22] Bickel, P. J., Levina, E.: Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1), 199–227 (2008)
[23] Rothman, A. J., Bickel, P. J., Levina, E., Zhu, J.: Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494–515 (2008)
[24] Bien, J., Tibshirani, R.: Sparse estimation of a covariance matrix. Biometrika, 98(4), 807–820 (2011)
[25] Dua, D., Graff, C.: UCI Machine Learning Repository, 2017
[26] Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D. A., Nolan, G. P.: Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721), 523–529 (2005)