Machine learning and statistical estimation have had a profound impact in recent years on many applied domains, such as social sciences, genomics, and medicine. In these applications, a frequently encountered challenge is how to deal with the high dimensionality of the datasets, especially in genomics and in educational and psychological research. A commonly adopted strategy is to assume that the underlying structure of the parameters is sparse.
Another frequently encountered challenge is how to handle sensitive data, such as those in social science, biomedicine, and genomics. A promising approach is to use differentially private mechanisms for the statistical inference and learning tasks. Differential Privacy (DP) dwork2006calibrating is a widely accepted criterion that provides provable protection against identification and is resilient to arbitrary auxiliary information that might be available to attackers. Since its introduction over a decade ago, a rich line of work has become available, making differential privacy a compelling privacy-enhancing technology for many organizations, such as Uber uber , Google google , and Apple apple .
Estimating or studying high dimensional datasets while keeping them (locally) differentially private can be quite challenging for many problems, such as sparse linear regression dwangppml18 , sparse mean estimation duchi2018right , and the selection problem ullman2018tight . However, there is also evidence that, for some problems, the loss under privacy constraints can be quite small compared with their non-private counterparts. Examples of this kind include high dimensional sparse PCA ge2018minimax , sparse inverse covariance estimation dwangglobalsip18 , and high dimensional distribution estimation kamath2018privately . Thus, it is desirable to determine which high dimensional problems can be learned or estimated efficiently in a private manner.
In this paper, we try to answer this question for a simple but fundamental problem in machine learning and statistics: estimating the underlying sparse covariance matrix of a bounded sub-Gaussian distribution. For this problem, we propose a simple but nontrivial $(\epsilon, \delta)$-DP method, DP-Thresholding, and show that the squared -norm error for any is bounded by , where is the sparsity of each row in the underlying covariance matrix. Moreover, our method can be easily extended to the local differential privacy model. Experiments on synthetic datasets confirm the theoretical claims. To the best of our knowledge, this is the first paper studying the problem of estimating a high dimensional sparse covariance matrix under (local) differential privacy.
2 Related Work
Recently, several papers have studied private distribution estimation, such as kamath2018privately ; joseph2018locally ; karwa2017finite ; gaboardi2018locally ; kareemppml18 . For distribution estimation under the central differential privacy model, see karwa2017finite ; kamath2018privately ; the latter studies the problem of privately learning multivariate Gaussian and product distributions. The main differences from our work are as follows. Firstly, our goal is to estimate the covariance of a sub-Gaussian distribution. Even though the class of distributions considered in our paper is larger than the one in kamath2018privately , it has an additional assumption which requires the norm of a sample of the distribution to be bounded by . This means that it does not include the general Gaussian distribution. Secondly, although kamath2018privately also considers the high dimensional case, it does not assume sparsity of the underlying covariance matrix. Thus, its error bound depends polynomially on the dimensionality, which is large in the high dimensional case (), while the dependence in our paper is only logarithmic (i.e., ). Thirdly, the error in kamath2018privately is measured by the total variation distance, while it is measured by the -norm in our paper. Thus, the two results are not comparable. Fourthly, the methods in kamath2018privately seem difficult to extend to the local model. Recently, kareemppml18 also studies covariance matrix estimation via iterative eigenvector sampling. However, their method applies only to the low dimensional case and uses the Frobenius norm as the error measure.
Distribution estimation under local differential privacy has been studied in gaboardi2018locally ; joseph2018locally . However, both of them study only the 1-dimensional Gaussian distribution, which is quite different from the class of distributions in our paper.
In this paper, we mainly apply the Gaussian mechanism to the covariance matrix, which has been studied in dwork2014analyze ; ge2018minimax ; dwangglobalsip18 . However, as will be shown later, simply outputting the perturbed covariance can incur a large error and is thus insufficient for our problem. Compared to these problems, ours is clearly more complicated.
3.1 Differential Privacy
Differential privacy dwork2006calibrating is by now a de facto standard for statistical data privacy, providing strong privacy guarantees for algorithms on aggregate databases. One likely reason for its popularity is its guarantee that the outcome distribution does not change significantly when one entry of the dataset changes. We say that two datasets $D$ and $D'$ are neighbors if they differ by only one entry, denoted as $D \sim D'$.
Definition 1 (Differential Privacy dwork2006calibrating ).
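A randomized algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private (DP) if for all neighboring datasets $D, D'$ and for all events $S$ in the output space of $\mathcal{A}$, the following holds:
$$P(\mathcal{A}(D) \in S) \le e^{\epsilon} P(\mathcal{A}(D') \in S) + \delta.$$
When $\delta = 0$, $\mathcal{A}$ is $\epsilon$-differentially private.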
We will use the Gaussian mechanism dwork2006calibrating to guarantee $(\epsilon, \delta)$-DP.
Definition 2 (Gaussian Mechanism).
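Given any function $q: \mathcal{X}^n \to \mathbb{R}^p$, the Gaussian mechanism is defined as:
$$\mathcal{M}_G(D, q, \epsilon) = q(D) + Y,$$
where $Y$ is drawn from the Gaussian distribution $\mathcal{N}(0, \sigma^2 I_p)$ with $\sigma \ge \frac{\sqrt{2 \ln(1.25/\delta)}\, \Delta_2(q)}{\epsilon}$. Here $\Delta_2(q)$ is the $\ell_2$-sensitivity of the function $q$, i.e.
$$\Delta_2(q) = \sup_{D \sim D'} \|q(D) - q(D')\|_2.$$
The Gaussian mechanism preserves $(\epsilon, \delta)$-differential privacy.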
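As an illustration of Definition 2, the following is a minimal Python sketch of the Gaussian mechanism with the classical noise calibration $\sigma = \sqrt{2\ln(1.25/\delta)}\,\Delta_2/\epsilon$. The function name `gaussian_mechanism`, the mean-release example, and the clipping step are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Add N(0, sigma^2 I) noise to `value` (hypothetical helper, illustration only).

    Uses the classical calibration sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon,
    whose standard analysis assumes 0 < epsilon < 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    value = np.asarray(value, dtype=float)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    noisy = value + rng.normal(0.0, sigma, size=value.shape)
    return noisy, sigma

# Illustrative use: release the mean of n points lying in the unit l2 ball.
rng = np.random.default_rng(0)
n, p = 1000, 5
data = rng.normal(size=(n, p))
# Rescale rows with norm > 1 so that every row has l2 norm at most 1.
data /= np.maximum(1.0, np.linalg.norm(data, axis=1, keepdims=True))

# Replacing one row moves the mean by at most 2/n in l2 norm.
sensitivity = 2.0 / n
private_mean, sigma = gaussian_mechanism(
    data.mean(axis=0), sensitivity, epsilon=0.5, delta=1e-5, rng=rng
)
print(private_mean.shape, sigma)
```

The sensitivity must be supplied by the caller, since it depends on the query and on how neighboring datasets are defined.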
3.2 Private Sparse Covariance Estimation
Let be random samples from a -variate distribution with covariance matrix , where the dimensionality is assumed to be high, i.e., .
We define the parameter space of -sparse covariance matrices as the following:
where means the -th column of with the entry removed. That is, a matrix in has at most non-zero off-diagonal elements in each column.
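The column-sparsity condition described above can be checked numerically. The sketch below counts the non-zero off-diagonal entries in each column; the helper names and the sparsity level `s` are hypothetical names introduced for illustration.

```python
import numpy as np

def max_offdiag_nonzeros_per_column(matrix, tol=0.0):
    """Largest number of off-diagonal entries with |entry| > tol in any column."""
    m = np.asarray(matrix, dtype=float)
    nonzero = np.abs(m) > tol
    np.fill_diagonal(nonzero, False)  # ignore the diagonal entries
    return int(nonzero.sum(axis=0).max())

def is_column_sparse(matrix, s, tol=0.0):
    """True if every column has at most s non-zero off-diagonal entries."""
    return max_offdiag_nonzeros_per_column(matrix, tol) <= s

# A tridiagonal matrix: interior columns have two off-diagonal non-zeros.
tridiag = np.eye(5) + 0.3 * (np.eye(5, k=1) + np.eye(5, k=-1))
print(max_offdiag_nonzeros_per_column(tridiag))  # 2
print(is_column_sparse(tridiag, 2), is_column_sparse(tridiag, 1))  # True False
```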
We assume that each is sampled from a -mean and sub-Gaussian distribution with parameter , that is,
This means that all the one-dimensional marginals of have sub-Gaussian tails. We also assume that with probability 1, . We note that such assumptions are quite common in the differential privacy literature, such as ge2018minimax .
Let denote the set of distributions of satisfying all the above conditions (i.e., (2) and ) and with the covariance matrix . The goal of private covariance estimation is to obtain an estimator of the underlying covariance matrix based on while keeping it differentially private. In this paper, we will focus on -differential privacy. We use the norm to measure the difference between and , i.e., .
Let be random variables sampled from Gaussian distribution . Then
Particularly, if , we have .
4.1 A First Approach
A direct way to obtain a private estimator is to perturb the empirical covariance matrix by a symmetric Gaussian matrix, which has been used in previous work on private PCA, such as dwork2014analyze ; ge2018minimax . However, as we will see below, this method introduces a large error.
By dwork2014analyze , for any given and , the following perturbing procedure is -differentially private:
where is a symmetric matrix with its upper triangle (including the diagonal) being i.i.d. samples from ; here , and each lower triangle entry is copied from its upper triangle counterpart. By tao2012topics , we know that . We can easily get that
Another issue of the private estimator in (7) is that it is not clear whether it is positive-semidefinite, a property that is normally expected from an estimator.
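The perturbation step described in this subsection can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the noise scale `sigma` is left as a caller-supplied parameter because its exact value is not recoverable from the text here, and the empirical covariance is assumed to be the zero-mean form $\frac{1}{n}\sum_i x_i x_i^T$. All function names are hypothetical.

```python
import numpy as np

def symmetric_gaussian_noise(p, sigma, rng):
    """Symmetric p x p matrix whose upper triangle (incl. diagonal) is i.i.d. N(0, sigma^2)."""
    upper = np.triu(rng.normal(0.0, sigma, size=(p, p)))
    # Copy the strict upper triangle into the lower triangle.
    return upper + np.triu(upper, k=1).T

def perturbed_covariance(samples, sigma, rng=None):
    """Empirical covariance (zero-mean assumption) plus symmetric Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(samples, dtype=float)
    n, p = x.shape
    empirical = x.T @ x / n
    return empirical + symmetric_gaussian_noise(p, sigma, rng)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 8))
noisy = perturbed_covariance(x, sigma=0.1, rng=rng)

# The output is symmetric but need not be positive semi-definite.
print(np.allclose(noisy, noisy.T), np.linalg.eigvalsh(noisy).min())
```

Printing the smallest eigenvalue makes it easy to check, for a given draw, whether the perturbed matrix has lost positive semi-definiteness.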
4.2 Post-processing via Thresholding
We note that one reason the private estimator in (7) fails is that some entries are quite large, which makes large for some . To see this more precisely, by (4) and (5) we get that, with probability at least , for all ,
Thus, to reduce the error, it is natural to proceed as follows. For those with larger values, we keep the corresponding in order to make their difference less than some threshold. For those with smaller values compared with (9), since the corresponding may still be large, thresholding to 0 lowers the error on .
Following this reasoning and the thresholding methods in cai2012optimal and bickel2008covariance , we propose the following DP-Thresholding method, which post-processes the perturbed covariance matrix in (7) with the threshold . After thresholding, we further threshold the eigenvalues of in order to make it positive semi-definite. See Algorithm 1 for details.
: are privacy parameters and .
For any , Algorithm 1 is -differentially private.
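The two post-processing steps described above (entrywise thresholding followed by eigenvalue thresholding) can be sketched as follows. This is a hedged illustration under stated assumptions: the threshold `gamma` is a caller-supplied placeholder because the paper's threshold value is not recoverable from the text here, hard thresholding is assumed for the entrywise step, and eigenvalue thresholding is assumed to mean setting negative eigenvalues to zero. Function names are hypothetical. Because both steps only transform an already private matrix, they do not affect the privacy guarantee (post-processing property of DP).

```python
import numpy as np

def hard_threshold(matrix, gamma):
    """Keep entries with |entry| > gamma and set the rest to zero."""
    m = np.asarray(matrix, dtype=float)
    return np.where(np.abs(m) > gamma, m, 0.0)

def clip_negative_eigenvalues(sym_matrix):
    """Replace negative eigenvalues of a symmetric matrix by zero."""
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)
    # Rebuild V diag(max(lambda, 0)) V^T.
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

def threshold_postprocess(noisy_cov, gamma):
    """Entrywise hard thresholding followed by eigenvalue clipping."""
    return clip_negative_eigenvalues(hard_threshold(noisy_cov, gamma))

noisy_cov = np.array([[1.0, 0.05],
                      [0.05, -0.2]])
result = threshold_postprocess(noisy_cov, gamma=0.1)
print(result)
```

In this toy example the off-diagonal entries fall below the threshold and are zeroed, and the negative eigenvalue is then clipped, leaving a positive semi-definite matrix.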
For the matrix in (10) after the first step of thresholding, we have the following key lemma.
For every fixed , there exists a constant such that with probability at least , the following holds:
Proof of Lemma 3.
Let and . Define the event . We have:
By the triangle inequality, it is easy to see that
Depending on the value of , we have the following three cases.
. For this case, we have
This is due to the following:
For this case, we have
When , we can see from (9) that with probability at least ,
Thus, also holds.
Otherwise when , also holds. Thus, Lemma 3 is true. ∎
By Lemma 3, we have the following upper bound on the -norm error of .
The output of Algorithm 1 satisfies:
where the expectation is taken over the coins of the Algorithm and the randomness of .
Proof of Theorem 2.
We first show that . This is due to the following
where the third inequality is due to the fact that is positive semi-definite.
We define event as
Then, by Lemma 3, we have .
Let , where . Then, we have
We first bound the first term of (24). By the definition of and Lemma 3, we can upper bound it by
where the second inequality is due to the assumption that at most elements of are non-zero.
For the second term in (24), we have
For the first term in (26), we have
where the first inequality is due to Hölder's inequality and the second inequality is due to the fact that . Since follows a Gaussian distribution, we have papoulis1965probability . For the first term , since is sampled from a sub-Gaussian distribution (2), by Whittle's inequality (Theorem 2 in whittle1960bounds or cai2012optimal ), the quadratic form satisfies for some positive constant .
For any , the matrix in (10) after the first step of thresholding satisfies
where the -norm of any matrix is defined as . Specifically, for a matrix , is the maximum absolute column sum, and is the maximum absolute row sum.
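The special cases mentioned above can be checked numerically. The sketch below assumes the norm in question is the matrix operator norm induced by the corresponding vector norm, which is consistent with the stated column-sum and row-sum characterizations; the example matrix is arbitrary.

```python
import numpy as np

a = np.array([[1.0, -2.0],
              [3.0, 4.0]])

norm_1 = np.linalg.norm(a, 1)         # maximum absolute column sum: max(1+3, 2+4)
norm_inf = np.linalg.norm(a, np.inf)  # maximum absolute row sum: max(1+2, 3+4)
norm_2 = np.linalg.norm(a, 2)         # spectral norm (largest singular value)

print(norm_1, norm_inf)  # 6.0 7.0
# Standard interpolation bound between the three norms.
print(norm_2 <= np.sqrt(norm_1 * norm_inf))  # True
```

For a symmetric matrix the column-sum and row-sum norms coincide, so the spectral norm is bounded by either of them.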
Comparing the bound in the above corollary with the optimal minimax rate in cai2012optimal for the non-private case, we can see that the impact of differential privacy is to make the number of efficient sample from