Differentially Private High Dimensional Sparse Covariance Matrix Estimation

01/18/2019 · Di Wang et al., University at Buffalo

In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and high-dimensional. We propose a new method, called DP-Thresholding, to achieve a non-trivial ℓ_2-norm based error bound, which is significantly better than the existing ones obtained by adding noise directly to the empirical covariance matrix. We also extend the ℓ_2-norm based error bound to a general ℓ_w-norm based one for any 1 ≤ w ≤ ∞, and show that they share the same upper bound asymptotically. Our approach can be easily extended to local differential privacy. Experiments on synthetic datasets show results consistent with our theoretical claims.


1 Introduction

Machine Learning and Statistical Estimation have made a profound impact in recent years on many applied domains such as social sciences, genomics, and medicine. A frequently encountered challenge in these applications is how to deal with the high dimensionality of the datasets, especially those arising in genomics and in educational and psychological research. A commonly adopted strategy for dealing with this issue is to assume that the underlying parameter structures are sparse.

Another frequently encountered challenge is how to handle sensitive data, such as those in social science, biomedicine, and genomics. A promising approach is to use differentially private mechanisms for the statistical inference and learning tasks. Differential Privacy (DP) dwork2006calibrating is a widely accepted criterion that provides provable protection against identification and is resilient to arbitrary auxiliary information that might be available to attackers. Since its introduction over a decade ago, a rich line of work has emerged, making differential privacy a compelling privacy-enhancing technology for many organizations, such as Uber uber, Google google, and Apple apple.

Estimating or studying high dimensional datasets while keeping them (locally) differentially private can be quite challenging for many problems, such as sparse linear regression dwangppml18, sparse mean estimation duchi2018right, and the selection problem ullman2018tight. However, there is also evidence showing that for some problems the loss incurred by the privacy constraint can be quite small compared with their non-private counterparts. Examples of this nature include high dimensional sparse PCA ge2018minimax, sparse inverse covariance estimation dwangglobalsip18, and high-dimensional distribution estimation kamath2018privately. Thus, it is desirable to determine which high dimensional problems can be learned or estimated efficiently in a private manner.

In this paper, we try to give an answer to this question for a simple but fundamental problem in machine learning and statistics: estimating the underlying sparse covariance matrix of a bounded sub-Gaussian distribution. For this problem, we propose a simple but nontrivial (ε, δ)-DP method, DP-Thresholding, and show that the squared ℓ_w-norm error, for any 1 ≤ w ≤ ∞, is bounded in terms of the sparsity s of each row of the underlying covariance matrix, with only a logarithmic dependence on the dimensionality p. Moreover, our method can be easily extended to the local differential privacy model. Experiments on synthetic datasets confirm the theoretical claims. To the best of our knowledge, this is the first paper studying the problem of estimating a high dimensional sparse covariance matrix under (local) differential privacy.

2 Related Work

Recently, several papers have studied private distribution estimation, such as kamath2018privately; joseph2018locally; karwa2017finite; gaboardi2018locally; kareemppml18. For distribution estimation under the central differential privacy model, karwa2017finite considers the 1-dimensional private mean estimation of a Gaussian distribution with (un)known variance. The work that is probably most related to ours is kamath2018privately, which studies the problem of privately learning a multivariate Gaussian and product distributions. The main differences from our work are the following. Firstly, our goal is to estimate the covariance of a sub-Gaussian distribution. Even though the class of distributions considered in our paper is larger than the one in kamath2018privately, it carries an additional assumption requiring the ℓ_2-norm of a sample of the distribution to be bounded, which means that it does not include the general Gaussian distribution. Secondly, although kamath2018privately also considers the high dimensional case, it does not assume the sparsity of the underlying covariance matrix. Thus, its error bound depends on the dimensionality p polynomially, which is large in the high dimensional case (p ≫ n), while the dependence in our paper is only logarithmic (i.e., log p). Thirdly, the error in kamath2018privately is measured by the total variation distance, while it is measured by the ℓ_w-norm in our paper; thus the two results are not comparable. Fourthly, the methods in kamath2018privately seem difficult to extend to the local model. kareemppml18 recently also studies covariance matrix estimation via iterative eigenvector sampling; however, their method applies only to the low dimensional case and uses the Frobenius norm as the error measure.

Distribution estimation under local differential privacy has been studied in gaboardi2018locally; joseph2018locally. However, both of them study only the 1-dimensional Gaussian distribution, which is quite different from the class of distributions considered in our paper.

In this paper, we mainly apply the Gaussian mechanism to the covariance matrix, which has been studied in dwork2014analyze; ge2018minimax; dwangglobalsip18. However, as will be shown later, simply outputting the perturbed covariance incurs a large error and is thus insufficient for our problem. Compared to these problems, ours is clearly more complicated.

3 Preliminaries

3.1 Differential Privacy

Differential privacy dwork2006calibrating is by now a de facto standard for statistical data privacy, constituting a strong privacy guarantee for algorithms on aggregate databases. One likely reason that it has gained so much popularity is its guarantee that the output distribution does not change significantly when a single entry of the dataset changes. We say that two datasets D and D′ are neighbors if they differ by only one entry, denoted as D ∼ D′.

Definition 1 (Differential Privacy dwork2006calibrating).

A randomized algorithm A is (ε, δ)-differentially private (DP) if for all neighboring datasets D ∼ D′ and for all events S in the output space of A, the following holds:

Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S] + δ.

When δ = 0, A is ε-differentially private.

We will use the Gaussian mechanism dwork2006calibrating to guarantee (ε, δ)-DP.

Definition 2 (Gaussian Mechanism).

Given any function q, the Gaussian mechanism is defined as:

M(D) = q(D) + Y,

where Y is drawn from a Gaussian distribution N(0, σ²I) with σ ≥ √(2 ln(1.25/δ)) · Δ_2(q) / ε. Here Δ_2(q) is the ℓ_2-sensitivity of the function q, i.e., Δ_2(q) = sup_{D ∼ D′} ‖q(D) − q(D′)‖_2.

The Gaussian mechanism preserves (ε, δ)-differential privacy.
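
As a concrete illustration (a minimal sketch, not part of the original text), the following numpy snippet applies the Gaussian mechanism to an arbitrary numeric query output; the function name and interface are our own, and it assumes the standard calibration σ = √(2 ln(1.25/δ)) · Δ_2(q) / ε, which requires ε ∈ (0, 1).

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, eps, delta, rng=None):
    """Release `value` (a scalar or numpy array) under (eps, delta)-DP by
    adding i.i.d. Gaussian noise calibrated to the query's l2-sensitivity.
    Assumes the standard calibration, valid for eps in (0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / eps
    return value + rng.normal(scale=sigma, size=np.shape(value))
```

For instance, gaussian_mechanism(X.mean(axis=0), 2.0 / len(X), 0.5, 1e-5) would privately release the sample mean of vectors whose ℓ_2-norm is at most 1 (sensitivity 2/n under the replace-one-entry neighboring relation).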

3.2 Private Sparse Covariance Estimation

Let X_1, …, X_n be random samples from a p-variate distribution with covariance matrix Σ, where the dimensionality p is assumed to be high, i.e., p ≫ n.

We define the parameter space of s-sparse covariance matrices as the following:

{ Σ ∈ ℝ^{p×p} : max_{1≤j≤p} ‖σ_{−j,j}‖_0 ≤ s },   (1)

where σ_{−j,j} denotes the j-th column of Σ with the entry σ_{jj} removed. That is, a matrix in this class has at most s non-zero off-diagonal elements in each column.

We assume that each X_i is sampled from a zero-mean sub-Gaussian distribution with parameter τ > 0, that is,

Pr{ |v⊤X_i| > t } ≤ e^{−t²/(2τ)}  for all t > 0 and all ‖v‖_2 = 1.   (2)

This means that all the one-dimensional marginals of X_i have sub-Gaussian tails. We also assume that, with probability 1, the ℓ_2-norm of X_i is bounded. We note that such assumptions are quite common in the differential privacy literature, such as ge2018minimax.

We consider the set of distributions of X satisfying all of the above conditions (i.e., (2) and the norm bound) and having a covariance matrix in the class defined in (1). The goal of private covariance estimation is to obtain an estimator of the underlying covariance matrix Σ based on X_1, …, X_n while keeping it differentially private. In this paper, we will focus on (ε, δ)-differential privacy. We use the ℓ_2-norm (spectral norm) to measure the difference between the estimator and Σ.
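
To make these assumptions concrete (and to mirror the synthetic experiments mentioned in the introduction), the sketch below generates illustrative data: a banded, hence row-sparse, covariance matrix and zero-mean Gaussian samples that are then shrunk onto the unit ℓ_2-ball. The band weights, the scaling, and the choice of norm bound 1 are our own illustrative assumptions, and the final rescaling only approximately preserves the target covariance.

```python
import numpy as np

def synthetic_sparse_data(n, p, s, rng=None):
    """Draw n zero-mean samples in R^p whose covariance is banded, with at
    most s non-zero off-diagonal entries per row.  Samples are shrunk onto
    the unit l2-ball, which keeps them bounded but only approximately
    preserves the target covariance."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.eye(p)
    for k in range(1, s // 2 + 1):                 # s/2 bands on each side of the diagonal
        cov += (0.3 ** k) * (np.eye(p, k=k) + np.eye(p, k=-k))
    cov /= 4.0 * p                                 # scale so samples typically have norm < 1
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1.0)                 # enforce ||x||_2 <= 1 with probability 1
    return X, cov
```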

Lemma 1.

Let Y_1, …, Y_m be random variables sampled from a Gaussian distribution N(0, σ²). Then

(3)
(4)

Particularly, if , we have .

Lemma 2 (cai2012optimal ).

If X_1, …, X_n are sampled from a sub-Gaussian distribution satisfying (2) and the empirical covariance matrix is computed from them, then there exist constants such that

(5)

for all t in the admissible range, where the constants depend only on the sub-Gaussian parameter τ. Specifically,

(6)

4 Method

4.1 A First Approach

A direct way to obtain a private estimator is to perturb the empirical covariance matrix by a symmetric Gaussian matrix, which has been used in previous work on private PCA, such as dwork2014analyze; ge2018minimax. However, as we show below, this method introduces a large error.

By dwork2014analyze, for any given ε and δ, the following perturbation procedure is (ε, δ)-differentially private:

(7)

where the noise is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from a zero-mean Gaussian with standard deviation calibrated to ε and δ, and each lower-triangle entry is copied from its upper-triangle counterpart. By tao2012topics, the spectral norm of such a random symmetric matrix is on the order of √p times the noise standard deviation. We can then easily get that

(8)

where the second inequality is due to tropp2015introduction . However, we can see that the upper bound of the error in (8) is quite large in the high dimensional case.

Another issue with the private estimator in (7) is that it is not clear whether it is positive semi-definite, a property normally expected of a covariance estimator.
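
To make the above concrete, here is a minimal numpy sketch of this direct perturbation; it is our own illustration rather than the paper's code. The noise scale is a placeholder that assumes an ℓ_2-sensitivity of 2/n for the empirical covariance of zero-mean samples with ℓ_2-norm at most 1; the exact calibration used in (7) should be substituted in practice.

```python
import numpy as np

def perturbed_covariance(X, eps, delta, rng=None):
    """Empirical covariance of the rows of X (an n-by-p array of zero-mean
    samples), perturbed by a symmetric Gaussian matrix as in (7).
    The noise scale assumes l2-sensitivity 2/n, which holds when every
    sample has l2-norm at most 1; it is a placeholder, not the exact
    calibration from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    emp_cov = X.T @ X / n                                     # empirical covariance
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * (2.0 / n) / eps
    upper = np.triu(rng.normal(scale=sigma, size=(p, p)))     # i.i.d. noise on the upper triangle
    noise = upper + np.triu(upper, k=1).T                     # mirror onto the lower triangle
    return emp_cov + noise
```

Since the added symmetric Gaussian matrix has spectral norm on the order of σ√p, the resulting error grows polynomially with the dimension, which is exactly the issue discussed above.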

4.2 Post-processing via Thresholding

We note that one of the reasons that the private estimator in (7) fails is that some entries of the noise matrix are quite large, which makes the corresponding entrywise errors large. To see this more precisely, by (4) and (5), with high probability the following holds for all entries:

(9)

Thus, to reduce the error, it is natural to proceed as follows. For entries with larger magnitudes, we keep the corresponding perturbed values so that their difference from the true entries stays below some threshold. For entries whose magnitudes are small compared with (9), since the corresponding noise may still be large, thresholding them to 0 lowers the error on those entries.

Following the above reasoning and the thresholding methods in cai2012optimal and bickel2008covariance, we propose the following DP-Thresholding method, which post-processes the perturbed covariance matrix in (7) with a threshold calibrated to the noise level. After thresholding, we further threshold the eigenvalues of the resulting matrix in order to make it positive semi-definite. See Algorithm 1 for details.

Input: privacy parameters ε, δ and the dataset X_1, …, X_n.

1:  Compute the perturbed covariance matrix as in (7),
where the noise is a symmetric matrix with its upper triangle (including the diagonal) being i.i.d. samples from a zero-mean Gaussian with standard deviation calibrated to ε and δ; each lower-triangle entry is copied from its upper-triangle counterpart.
2:  Define the thresholding estimator as
(10)
3:  Compute the eigen-decomposition of the thresholded matrix, take the positive part of each eigenvalue, and recombine the eigenvectors with these clipped eigenvalues to form the final estimator.
4:  return  the final estimator.
Algorithm 1 DP-Thresholding
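
For concreteness, the following numpy sketch summarizes the three steps of Algorithm 1. It is an illustration under stated assumptions rather than the authors' implementation: it reuses the hypothetical perturbed_covariance helper from Section 4.1, takes the threshold as an explicit argument (the paper specifies its value in terms of n, p, ε, and δ), and hard-thresholds every entry, which may differ slightly from the exact rule in (10).

```python
import numpy as np

def dp_thresholding(X, eps, delta, threshold, rng=None):
    """Sketch of DP-Thresholding (Algorithm 1).

    Step 1: perturb the empirical covariance with a symmetric Gaussian matrix.
    Step 2: zero out every entry whose magnitude is below `threshold`.
    Step 3: clip negative eigenvalues to zero so the estimator is
            positive semi-definite.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy_cov = perturbed_covariance(X, eps, delta, rng)                    # Step 1
    thresholded = np.where(np.abs(noisy_cov) >= threshold, noisy_cov, 0.0)  # Step 2
    eigvals, eigvecs = np.linalg.eigh(thresholded)                          # Step 3
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
```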
Theorem 1.

For any privacy parameters ε and δ, Algorithm 1 is (ε, δ)-differentially private.

Proof.

By ge2018minimax and dwork2014analyze, we know that Step 1 keeps the perturbed matrix (ε, δ)-differentially private. Thus, Algorithm 1 is (ε, δ)-differentially private due to the post-processing property of differential privacy dwork2006calibrating. ∎

For the matrix in (10) after the first step of thresholding, we have the following key lemma.

Lemma 3.

For every fixed , there exists a constant such that with probability at least , the following holds:

(11)
Proof of Lemma 3.

Let and . Define the event . We have:

(12)

By the triangle inequality, it is easy to see that

and

Depending on the value of , we have the following three cases.

Case 1

. For this case, we have

(13)

This is due to the following:

(14)
(15)
(16)
(17)
(18)
(19)

where event denotes , and the last inequality is due to (4) and (5).

Thus by (12), with probability at least , we have

which satisfies (11).

Case 2

. For this case, we have

where the proof is the same as (13-17). Thus, with probability at least , we have

(20)

By (9), (11) also holds in this case.

Case 3

Otherwise,

For this case, we have

(21)

When , we can see from (9) that with probability at least ,

Thus, also holds.

Otherwise when , also holds. Thus, Lemma 3 is true. ∎

By Lemma 3, we have the following upper bound on the ℓ_2-norm error of the output of Algorithm 1.

Theorem 2.

The output of Algorithm 1 satisfies:

(22)

where the expectation is taken over the coins of the algorithm and the randomness of the data.

Proof of Theorem 2.

We first show that . This is due to the following

where the third inequality is due to the fact that is positive semi-definite.

This means that we only need to bound . Since is symmetric, we know that golub2012matrix . Thus, it suffices to prove that the bound in (22) holds for .

We define event as

(23)

Then, by Lemma 3, we have .

Let , where . Then, we have

(24)

We first bound the first term of (24). By the definition above and Lemma 3, we can upper bound it by

(25)

where the second inequality is due to the assumption that at most s elements of each column of Σ are non-zero.

For the second term in (24), we have

(26)

For the first term in (26), we have

(27)

where the first inequality is due to Hölder's inequality and the second inequality is due to the fact that . Since is a Gaussian distribution, we have papoulis1965probability . For the first term , since is sampled from a sub-Gaussian distribution (2), by Whittle's inequality (Theorem 2 in whittle1960bounds or cai2012optimal ), the quadratic form satisfies for some positive constant .

For the second term of (26), we have

(28)
(29)

For the second term of (29), by Lemmas 1 and 2 we have

(30)
(31)

For the first term of (29), by Lemma 2 we have

(32)

Thus in total, we have . This means that , which completes the proof. ∎

Corollary 1.

For any 1 ≤ w ≤ ∞, the matrix in (10) after the first step of thresholding satisfies

(33)

where the ℓ_w-norm of any matrix A is defined as ‖A‖_w = sup_{‖x‖_w = 1} ‖Ax‖_w. Specifically, ‖A‖_1 is the maximum absolute column sum, and ‖A‖_∞ is the maximum absolute row sum.

Comparing the bound in the above corollary with the optimal minimax rate