The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy

02/12/2019
by   T. Tony Cai, et al.
University of Pennsylvania

Privacy-preserving data analysis is a rising challenge in contemporary statistics, as the privacy guarantees of statistical methods are often achieved at the expense of accuracy. In this paper, we investigate the tradeoff between statistical accuracy and privacy in mean estimation and linear regression, under both the classical low-dimensional and modern high-dimensional settings. A primary focus is to establish minimax optimality for statistical estimation with the (ε,δ)-differential privacy constraint. To this end, we find that classical lower bound arguments fail to yield sharp results, and new technical tools are called for. We first develop a general lower bound argument for estimation problems with differential privacy constraints, and then apply the lower bound argument to mean estimation and linear regression. For these statistical problems, we also design computationally efficient algorithms that match the minimax lower bound up to a logarithmic factor. In particular, for high-dimensional linear regression, a novel private iterative hard thresholding pursuit algorithm is proposed, based on a privately truncated version of stochastic gradient descent. The numerical performance of these algorithms is demonstrated by simulation studies and applications to real data containing sensitive information, for which privacy-preserving statistical methods are necessary.


1 Introduction

With the unprecedented availability of datasets containing sensitive personal information, there are increasing concerns that statistical analysis of such datasets may compromise individual privacy. These concerns give rise to statistical methods that provide privacy guarantees at the cost of statistical accuracy, but there has been very limited understanding of the optimal tradeoff between statistical accuracy and privacy cost.

A rigorous definition of privacy is a prerequisite for such an understanding. Differential privacy, introduced in Dwork et al. [15], is arguably the most widely adopted definition of privacy in statistical data analysis. The promise of a differentially private algorithm is protection of any individual’s privacy from an adversary who has access to the algorithm output and, in some cases, even the rest of the data.

Differential privacy has gained significant attention in the machine learning community over the past few years [16, 1, 19, 13] and has found its way into real-world applications developed by Google [21], Apple [9], Microsoft [10], and the U.S. Census Bureau [2].

A common approach to developing differentially private algorithms is to perturb the output of a non-private algorithm with random noise. When the observations are continuous, differential privacy can be guaranteed by adding Laplace/Gaussian noise to the non-private output [16]. For discrete data, differential privacy can be achieved by adding Gumbel noise to utility score functions (also known as the exponential mechanism). Naturally, the processed output suffers from some loss of accuracy, which has been observed and studied in the literature; see, for example, Wasserman and Zhou [39], Smith [32], Lei [25], Bassily et al. [5], Dwork et al. [17]. However, given a certain privacy constraint, it is still unclear what the best achievable statistical accuracy is, or in other words, what the optimal tradeoff between privacy cost and statistical accuracy is.

The goal of this paper is to provide a quantitative characterization of the tradeoff between privacy cost and statistical accuracy, under the statistical minimax framework. Specifically, we consider this problem for mean estimation and linear regression models in both classical and high-dimensional settings with the (ε,δ)-differential privacy constraint, which is formally defined as follows.

Definition 1 (Differential Privacy [15]).

A randomized algorithm M is (ε,δ)-differentially private if and only if for every pair of adjacent datasets X and X′, and for any measurable set S,

  P(M(X) ∈ S) ≤ e^ε · P(M(X′) ∈ S) + δ,

where we say two datasets X and X′ are adjacent if and only if they differ by exactly one individual datum.

According to the definition, the two parameters ε and δ control the level of privacy against an adversary who attempts to detect the presence of a certain subject in the sample. Roughly speaking, ε is an upper bound on the amount of influence an individual’s record has on the information released, and δ is the probability that this bound fails to hold; the privacy constraint therefore becomes more stringent as ε and δ tend to 0.

We establish the necessary cost of privacy by first providing minimax lower bounds for the estimation accuracy under the (ε,δ)-differential privacy constraint. The results show that estimators with privacy guarantees generally exhibit very different rates of convergence compared to their non-private counterparts. As a first example, consider d-dimensional mean estimation under the squared ℓ₂ loss: Theorem 2.2 in Section 2 shows that, given a sample of size n, any (ε,δ)-differentially private algorithm must incur, in addition to the standard statistical error of order d/n, an extra error of at least the order of d²/(n²ε²). This lower bound is established by a general technique presented in Theorem 2.1, which reduces establishing minimax risk lower bounds to designing and analyzing a tracing adversary that aims to detect the presence of an individual in a dataset via the output of a differentially private procedure applied to that dataset. The design and analysis of the tracing adversary make use of a novel generalization of the fingerprinting lemma, a concept originating in cryptography [6]. The connections between tracing adversaries, the fingerprinting lemma, and differential privacy have been observed in [35], [7], and [18], but those discussions are primarily concerned with discrete distributions. In this paper, we provide a continuous version of the fingerprinting lemma that enables us to establish minimax lower bounds for a greater variety of statistical problems; more discussion is given in Section 2 as well as the Supplementary Material [8].

Further, we show that these necessary costs of privacy, as given by the minimax lower bounds, are in fact sharp for both mean estimation and linear regression: we construct computationally efficient algorithms and establish matching upper bounds up to logarithmic factors. These algorithms are based on several differentially private subroutines, such as the Gaussian mechanism, reporting the noisy top-s entries of a vector, and their modifications. In particular, for high-dimensional linear regression, we propose a novel private iterative hard thresholding pursuit algorithm, based on a privately truncated version of stochastic gradient descent. The private truncation step effectively enforces the sparsity of the resulting estimator and leads to optimal control of the privacy cost (see Section 4.2 for details). To the best of our knowledge, these algorithms are the first to achieve the minimax optimal rates of convergence in high-dimensional statistical estimation problems under the (ε,δ)-differential privacy guarantee. Our Theorems 3.1, 3.2, 4.1, and 4.3 together provide matching upper and lower bounds, up to logarithmic factors, for both mean estimation and linear regression in the high-dimensional and classical settings.

Related literature

There are previous works studying how privacy constraints compromise estimation accuracy. In theoretical computer science, Smith [32] showed that, under strong conditions on the privacy parameters, some point estimators attain the classical statistical convergence rates, so that privacy can be gained essentially for free. [5, 17, 34] proposed differentially private algorithms for convex empirical risk minimization, principal component analysis, and high-dimensional regression, and investigated the convergence rates of the excess risk. In addition, [7, 36, 3] considered the optimal estimation of sample quantities, such as k-way marginals and top-k selection, under privacy constraints. Unlike most prior works, which focused on excess risks or the release of sample quantities, our focus is population parameter estimation. Theoretical properties of excess risks or sample quantities can be very different from those of population parameters; see [11] for more discussion.

More recent works have studied differential privacy in the context of statistical estimation. [39] observed that ε-local differentially private schemes seem to yield slower convergence rates than the optimal minimax rates in general; [12] developed a framework for statistical minimax rates under the ε-local privacy constraint; in addition, Rohde and Steinberger [31] established minimax optimal rates of convergence under ε-local differential privacy and exhibited a mechanism, based on randomized response, that is minimax optimal for “nearly” linear functionals. However, ε-local privacy is a much stronger notion of privacy than (ε,δ)-differential privacy and is hardly compatible with high-dimensional problems [12]. As we shall see in this paper, the cost of (ε,δ)-differential privacy in statistical estimation behaves quite differently from that of ε-local privacy.

Organization of the paper

The rest of the paper is organized as follows. Section 2 introduces a general technical tool for deriving lower bounds of the minimax risk with differential privacy constraint. The new technical tool is then applied in Section 3 to the high-dimensional mean estimation problem. Both minimax lower bound results and algorithms with matching upper bounds are obtained. Section 4 further applies the general lower bound technique to investigate the minimax lower bounds of the linear regression problem with differential privacy constraint, in both low-dimensional and high-dimensional settings. The upper bounds are also obtained by providing novel differentially private algorithms and analyzing their risks. The results together show that our bounds are rate-optimal up to logarithmic factors. Simulation studies are carried out in Section 5 to show the advantages of our proposed algorithms. Section 6 applies our algorithms to real data sets with potentially sensitive information that warrants privacy-preserving methods. Section 7 discusses extensions to other statistical estimation problems with privacy constraints. The proofs are given in Section 8.

Definitions and notation

We conclude this section by introducing the notation used in the rest of the paper. For a positive integer n, [n] denotes the set {1, 2, …, n}. For a vector v ∈ ℝᵈ, we use ‖v‖_p and ‖v‖₀ to denote the usual vector ℓ_p norm and the ℓ₀ norm, respectively, where the ℓ₀ norm counts the number of nonzero entries of a vector. For any set S ⊆ [d] and v ∈ ℝᵈ, let v_S denote the |S|-dimensional vector consisting of the entries v_j with j ∈ S. The Frobenius norm of a matrix A is denoted by ‖A‖_F, and the spectral norm of A is denoted by ‖A‖₂. In addition, we use λ_min(A) and λ_max(A) to denote the smallest and the largest eigenvalues of A. The matrix ℓ₀ norm is defined similarly to the vector ℓ₀ norm, i.e., ‖A‖₀ counts the number of nonzero entries of A. In addition, det(A) denotes the determinant of A. The empirical measure of a sample x₁, …, x_n is denoted by ℙ_n. For a set E, we use Eᶜ to denote its complement, and 𝟙(E) denotes the indicator function of E. We use c, C, c₀, c₁, … to denote generic constants whose values may vary from line to line.

2 A General Lower Bound for Minimax Risk with Differential Privacy

This section presents a general minimax lower bound technique for statistical estimation problems with differential privacy constraint. As an application, we use this technique to establish a tight lower bound for differentially private mean estimation in this section.

Our lower bound technique is based on a tracing adversary that attempts to detect the presence of an individual data entry in a data set, given an estimator computed from that data set. If one can construct a tracing adversary that is effective at this task whenever the estimator is accurate, an argument by contradiction yields a lower bound on the accuracy of differentially private estimators: if a differentially private estimator were sufficiently accurate, the tracing adversary would be able to determine the presence of an individual data entry in the data set, contradicting the differential privacy guarantee. In other words, the privacy guarantee and the tracing adversary together ensure that a differentially private estimator cannot be “too accurate”.

2.1 Background and problem formulation

Let P denote a family of distributions supported on a set X ⊆ ℝᵈ, and let θ = θ(P) denote a population quantity of interest. The statistician has access to a data set X = {x₁, x₂, …, x_n} of i.i.d. samples drawn from a distribution P ∈ P.

With the data, our goal is to estimate the population parameter θ(P) by an estimator M(X) that belongs to M_{ε,δ}, the collection of all (ε,δ)-differentially private procedures. The performance of M(X) is measured by its distance to the truth θ(P): formally, let ρ be a metric induced by a norm ‖·‖, namely ρ(u, v) = ‖u − v‖, and let l be a loss function that is monotonically increasing on [0, ∞). This paper studies the minimax risk for differentially private estimation of the population parameter θ(P):

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E[ l( ρ(M(X), θ(P)) ) ].

In this paper, we set the privacy parameters to ε = O(1) and δ = o(1/n). This is essentially the most permissive setting under which (ε,δ)-differential privacy remains a nontrivial guarantee: [33] shows that δ = o(1/n) is essentially the weakest privacy requirement that is still meaningful.

2.2 Lower bound by tracing

Consider a tracing adversary A that, after seeing the output M(X), outputs IN if it determines that a given sample is in the data set X, and outputs OUT otherwise. We write A(M(X), x_i) for the adversary’s decision about the i-th sample, and define I = {i ∈ [n] : A(M(X), x_i) = IN}, the index set of samples that the adversary A declares IN. A survey of tracing adversaries and their relationship with differential privacy can be found in [19] and the references therein.

Our general lower bound technique requires some regularity conditions on P and θ(P): for every θ in the parameter space, we assume that there exists a distribution P ∈ P with θ(P) = θ, and that suitably perturbed versions of θ remain valid parameters, so that the adversary’s target distribution can be constructed within P. The two statistical problems investigated in this paper, mean estimation and linear regression, satisfy this property.

The following theorem shows that minimax lower bounds for statistical estimation problems with privacy constraint can be constructed if there exist effective tracing adversaries:

Theorem 2.1.

Suppose X = {x₁, …, x_n} is an i.i.d. sample from a distribution P ∈ P, and assume that P and θ(P) satisfy the regularity conditions described above. Given a tracing adversary A that satisfies the following two properties whenever the estimator M(X) is sufficiently accurate,

  1. completeness: with non-trivial probability, A declares at least one index i ∈ [n] IN;

  2. soundness: for each i ∈ [n], the probability that A(M(X′_i), x_i) = IN is vanishingly small, where X′_i is an adjacent dataset of X with x_i replaced by an independent copy,

then, if ε ≤ c₀ for some constant c₀ > 0 and δ = o(1/n), the minimax risk inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E[l(ρ(M(X), θ(P)))] is bounded below at a rate determined by the accuracy threshold at which the adversary succeeds.

Completeness and soundness roughly correspond to “true positives” and “false positives” in classification: completeness requires the adversary to return a nontrivial result when its input is accurate, while soundness guarantees that an individual is unlikely to be declared IN when the estimator the adversary sees is independent of that individual. When a tracing adversary satisfies these properties, Theorem 2.1 conveniently yields a minimax risk lower bound; that is, Theorem 2.1 reduces the construction of minimax risk lower bounds to finding complete and sound tracing adversaries.

In the next section, we illustrate this technique by designing a complete and sound tracing adversary for the classical mean estimation problem.

2.3 A first application: private mean estimation in the classical setting

Consider the d-dimensional sub-Gaussian distribution family P, consisting of the distributions P on ℝᵈ such that ⟨x − μ(P), e_j⟩ is sub-Gaussian for every j ∈ [d] and the mean is bounded, where μ(P) = E_{x∼P}[x] is the mean of P, and e_j denotes the j-th standard basis vector of ℝᵈ.

Following the notation introduced in Section 2.1, θ(P) = μ(P) and ρ is the metric induced by the ℓ₂ norm. Further, we take l(t) = t², so that our risk function is simply the squared ℓ₂ error. The minimax risk is then denoted by

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E‖M(X) − μ(P)‖₂².

We propose a tracing adversary that declares x_i IN when the correlation statistic ⟨x_i − x′, M(X)⟩ exceeds a suitable threshold, where x′ is a fresh independent draw from P. The adversary is indeed complete and sound, as desired:

Lemma 2.3.1.

If the sample size n satisfies an appropriate upper bound, there is a distribution P ∈ P such that

  1. completeness: with non-trivial probability, A declares at least one index IN;

  2. soundness: for each i, the probability that A(M(X′_i), x_i) = IN is vanishingly small, where X′_i is an adjacent dataset of X with x_i replaced by an independent copy.

Intuitively, this adversary is constructed as follows. Without privacy constraints, a natural estimator for μ(P) is the sample mean x̄ = (1/n) ∑_{i∈[n]} x_i. On the one hand, when x does not belong to X, ⟨x − x′, x̄⟩ is a sum of independent zero-mean random variables and concentrates around 0. On the other hand, when x belongs to X, ⟨x − x′, x̄⟩ has a strictly positive mean, and A is more likely to output IN than OUT.

In view of Theorem 2.1, the completeness and soundness of A imply that

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E‖M(X) − μ(P)‖₂² ≳ d²/(n²ε²).

Combining this with the well-known statistical minimax lower bound (see, for example, Lehmann and Casella [24]),

  inf_M sup_{P ∈ P} E‖M(X) − μ(P)‖₂² ≳ d/n,

we arrive at the minimax lower bound for differentially private mean estimation.

Theorem 2.2.

Let M_{ε,δ} denote the collection of all (ε,δ)-differentially private algorithms, and let X = {x₁, …, x_n} be an i.i.d. sample drawn from P ∈ P. Suppose that ε ≤ c₀ for some constant c₀ > 0 and δ ≲ n^{−(1+ω)} for some ω > 0; then

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E‖M(X) − μ(P)‖₂² ≳ d/n + d²/(n²ε²).

Remark 1.

In comparison, applying Barber and Duchi [4]’s lower bound argument to our current model yields a lower bound whose privacy term falls short of ours by a factor of d, which is why the tracing argument is needed to obtain the sharp rate.

Remark 2.

The minimax lower bound characterizes the cost of privacy in the mean estimation problem: the cost of privacy dominates the statistical risk when d²/(n²ε²) ≳ d/n, that is, when n ≲ d/ε².

3 Privacy Cost of High-dimensional Mean Estimation

In this section and the subsequent Section 4, we consider the high-dimensional setting, where d ≫ n and the population parameters of interest, such as the mean vector μ or the regression coefficient β, are sparse. For each statistical problem investigated, we present a minimax risk lower bound with differential privacy constraint, as well as a procedure with differential privacy guarantee that attains the lower bound up to logarithmic factors.

3.1 Private high-dimensional mean estimation

We first consider the problem of estimating the sparse mean vector μ(P) of a d-dimensional sub-Gaussian distribution, where d can possibly be much larger than the sample size n. We denote the parameter space of interest by the set of s-sparse mean vectors, {μ ∈ ℝᵈ : ‖μ‖₀ ≤ s}, where the sparsity level is controlled by the parameter s.

The tracing adversary for this problem declares x_i IN when the statistic ⟨x_i − x′, M(X)⟩, restricted to the large coordinates of M(X), exceeds a suitable threshold, where x′ is an independent draw from P.

Given M(X) computed from a data set X, the tracing adversary thus attempts to identify whether an individual x belongs to X by comparing the correlations of x and of x′ with M(X) over those coordinates where M(X) takes large values. If x belongs to X, the former should be correlated with M(X) and is likely to be larger than the latter.

Formally, the tracing adversary is complete and sound under appropriate sample size constraint:

Lemma 3.1.1.

If the sample size n satisfies an appropriate upper bound, there is a distribution P ∈ P such that

  1. completeness: with non-trivial probability, A declares at least one index IN;

  2. soundness: for each i, the probability that A(M(X′_i), x_i) = IN is vanishingly small, where X′_i is an adjacent data set of X with x_i replaced by an independent copy.

In conjunction with our general lower bound result Theorem 2.1, we have

Theorem 3.1.

Let M_{ε,δ} denote the collection of all (ε,δ)-differentially private algorithms, and let X = {x₁, …, x_n} be an i.i.d. sample drawn from P ∈ P. Suppose that ε ≤ c₀ for some constant c₀ > 0 and δ ≲ n^{−(1+ω)} for some ω > 0; then

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E‖M(X) − μ(P)‖₂² ≳ (s log d)/n + (s² log² d)/(n²ε²).

The first term is the statistical minimax lower bound for sparse mean estimation (see, for example, [23]), and the second term is due to the privacy constraint. Comparing the two terms shows that, in high-dimensional sparse mean estimation, the cost of differential privacy is significant when n ≲ (s log d)/ε².

In the next section, we present a differentially private procedure that attains this convergence rate up to a logarithmic factor.

3.2 Rate-optimal procedures

The rate-optimal algorithms in this paper utilize some classical subroutines in the differential privacy literature, such as the Laplace and Gaussian mechanisms and reporting the noisy maximum of a vector. Before describing our rate-optimal algorithms in detail, it is helpful to review some relevant results, which will also serve as the building blocks of the differentially private linear regression methods in Section 4.

3.2.1 Basic differentially private procedures

It is frequently the case that differential privacy can be attained by adding properly scaled noise to the output of a non-private algorithm. Among the most prominent examples are the Laplace and Gaussian mechanisms.

The Laplace and Gaussian mechanisms

As the names suggest, the Laplace and Gaussian mechanisms achieve differential privacy by perturbing an algorithm’s output with Laplace and Gaussian noise, respectively. The scale of the noise is determined by the sensitivity of the algorithm:

Definition 2.

For any algorithm f mapping a dataset X to f(X) ∈ ℝᵈ, the ℓ_p-sensitivity of f is

  Δ_p(f) = sup_{adjacent X, X′} ‖f(X) − f(X′)‖_p.

For algorithms with finite ℓ₁-sensitivity, the differential privacy guarantee can be attained by adding noise sampled from a Laplace distribution.

Lemma 3.2.1 (The Laplace mechanism [16]).

For any algorithm f mapping a dataset to f(X) ∈ ℝᵈ such that Δ₁(f) < ∞, the Laplace mechanism, given by

  f(X) + (w₁, …, w_d),

where w₁, …, w_d is an i.i.d. sample drawn from the Laplace distribution with scale Δ₁(f)/ε, achieves (ε, 0)-differential privacy.
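For concreteness, the following minimal Python sketch implements the Laplace mechanism; the function name and interface are ours, not the paper’s. For instance, releasing the mean of n scalars lying in [0, 1] has ℓ₁-sensitivity 1/n, so `laplace_mechanism(x.mean(), 1.0 / n, eps)` would be (ε, 0)-differentially private.

```python
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng=None):
    """Add i.i.d. Laplace(l1_sensitivity / epsilon) noise to each coordinate
    of `value`; this yields (epsilon, 0)-differential privacy whenever
    `l1_sensitivity` bounds the L1-sensitivity of the map producing `value`."""
    rng = np.random.default_rng() if rng is None else rng
    value = np.asarray(value, dtype=float)
    scale = l1_sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=value.shape)
```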

Similarly, adding Gaussian noise to algorithms with finite ℓ₂-sensitivity guarantees (ε, δ)-differential privacy.

Lemma 3.2.2 (The Gaussian mechanism [16]).

For any algorithm f mapping a dataset to f(X) ∈ ℝᵈ such that Δ₂(f) < ∞, the Gaussian mechanism, given by

  f(X) + (w₁, …, w_d),

where w₁, …, w_d is an i.i.d. sample drawn from N(0, σ²) with σ = √(2 log(1.25/δ)) · Δ₂(f)/ε, achieves (ε, δ)-differential privacy.
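A matching sketch of the Gaussian mechanism, using the standard calibration σ = √(2 log(1.25/δ)) · Δ₂(f)/ε from the lemma; again, the naming is ours, and the classical analysis of this calibration assumes ε ≤ 1.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Add i.i.d. N(0, sigma^2) noise with the standard calibration
    sigma = sqrt(2 * log(1.25 / delta)) * l2_sensitivity / epsilon, which
    yields (epsilon, delta)-differential privacy (for epsilon <= 1 in the
    classical analysis)."""
    rng = np.random.default_rng() if rng is None else rng
    value = np.asarray(value, dtype=float)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + rng.normal(loc=0.0, scale=sigma, size=value.shape)
```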

An important application of these mechanisms is differentially private selection of the maximum/minimum, which also plays a crucial role in our high-dimensional mean estimation algorithm. Next we review some algorithms for differentially private selection, to provide some concrete examples and prepare us for stating the main algorithms.

Differentially private selection

Selecting the maximum (in absolute value) coordinate of a vector v ∈ ℝᵈ is a straightforward application of the Laplace mechanism, as follows:

Algorithm 1: PrivateMax
1: Sample w₁, …, w_d i.i.d. from a Laplace distribution whose scale is calibrated to the sensitivity of v and the privacy budget.
2: For j ∈ [d], compute the noisy magnitude ṽ_j = |v_j| + w_j.
3: Return j* = argmax_{j∈[d]} ṽ_j and the noisy value v_{j*} + w̃, where w̃ is an independent draw from the same Laplace distribution.

Lemma 3.2.3 ([20]).

If the Laplace noise scale is appropriately calibrated to the sensitivity of v and to the privacy budget, then PrivateMax is (ε, 0)-differentially private.

In applications, we are often interested in finding the top-s coordinates for some s > 1. This can be done by an iterative “Peeling” algorithm that runs the PrivateMax algorithm s times, with appropriately chosen privacy parameters in each iteration.

Algorithm 2: Peeling
1: Set S = ∅.
2: for t = 1 to s do
3:  Run PrivateMax on the coordinates in [d] \ S to obtain the pair (j_t, ṽ_{j_t}).
4:  Add j_t to S.
5: end for
6: Report the selected pairs (j_t, ṽ_{j_t}), t ∈ [s].

Lemma 3.2.4 ([20]).

If the privacy budget is appropriately divided across the s calls to PrivateMax, then Peeling is (ε, δ)-differentially private.
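To make the selection subroutines concrete, here is a Python sketch of Algorithms 1 and 2 together. The function names and the `scale` parameter, which abstracts the per-iteration Laplace calibration of Lemmas 3.2.3 and 3.2.4, are ours.

```python
import numpy as np

def private_max(v, candidates, scale, rng):
    """Sketch of Algorithm 1 (PrivateMax): add Laplace noise to the magnitude
    of each candidate coordinate, return the noisy argmax together with a
    noisy value computed with a fresh draw."""
    noisy = {j: abs(v[j]) + rng.laplace(0.0, scale) for j in candidates}
    j_star = max(noisy, key=noisy.get)
    return j_star, v[j_star] + rng.laplace(0.0, scale)

def peeling(v, s, scale, rng=None):
    """Sketch of Algorithm 2 (Peeling): run PrivateMax s times, removing the
    selected coordinate each round. Calibrating `scale` to the sensitivity of
    v and the overall (epsilon, delta) budget is left to the caller."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = set(range(len(v)))
    selected = []
    for _ in range(s):
        j, noisy_value = private_max(v, candidates, scale, rng)
        selected.append((j, noisy_value))
        candidates.remove(j)
    return selected
```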

With these differentially private selection subroutines in hand, we are ready to present the high-dimensional mean estimation algorithm in the next section.

3.2.2 Differentially-private mean estimation in high dimensions

Let Π_R denote the projection onto a ball of radius R in ℝᵈ, where R is a tuning parameter for the truncation level. With suitably chosen R, the following algorithm attains the minimax lower bound in Theorem 3.1, up to at most a logarithmic factor.
Algorithm 3: Private High-dimensional Mean Estimation
1: Compute the truncated sample mean x̄_R = (1/n) ∑_{i∈[n]} Π_R(x_i).
2: Find the top-s components of x̄_R by running Peeling and set the remaining components to 0. Denote the resulting vector by μ̃.
3: Return μ̃.
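The following Python sketch illustrates Algorithm 3 under the assumption that Π_R acts as a coordinatewise truncation to [−R, R]; that reading of the projection step, the function name, and the `scale` parameter (which abstracts the Laplace calibration implied by the privacy budget and the sensitivity of the truncated mean) are ours.

```python
import numpy as np

def private_sparse_mean(X, s, R, scale, rng=None):
    """Sketch of Algorithm 3: truncate each sample coordinatewise to [-R, R]
    (an assumed form of the projection), average, then keep a noisy top-s
    support via peeling with fresh Laplace noise each round."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    xbar = np.clip(X, -R, R).mean(axis=0)  # truncated sample mean
    estimate = np.zeros(d)
    remaining = list(range(d))
    for _ in range(s):  # peeling: noisy top-s selection
        noisy = np.abs(xbar[remaining]) + rng.laplace(0.0, scale, size=len(remaining))
        j = remaining[int(np.argmax(noisy))]
        estimate[j] = xbar[j] + rng.laplace(0.0, scale)
        remaining.remove(j)
    return estimate
```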
In view of Theorem 3.1, the theorem below shows that the high-dimensional mean estimation algorithm is rate-optimal up to a logarithmic factor.

Theorem 3.2.

For P ∈ P, if the truncation level R and the noise scales are chosen to match the privacy budget, then Algorithm 3 is (ε,δ)-differentially private, and

  1. if there exists a constant c such that the coordinates of the samples are bounded by c, then, with R = c, the estimator μ̃ attains the lower bound of Theorem 3.1 up to a logarithmic factor;

  2. otherwise, with the choice of R ≍ √(log n) for a sufficiently large constant, the same guarantee holds.

Remark 3.

[12] introduced the notion of ε-local privacy and showed that high-dimensional estimation is effectively impossible under the ε-local privacy constraint. In contrast, Theorem 3.2 shows that sparse mean estimation remains possible under the (ε,δ)-differential privacy constraint.

Remark 4.

The role of the truncation parameter R is to control the sensitivity of the sample mean so that the Laplace/Gaussian mechanisms are applicable. The fixed choice of R can be replaced by a differentially private estimator that consistently estimates the range of the sample; examples of such estimators can be found in [25]. This remark applies to all truncation tuning parameters in the algorithms of Sections 3 and 4.

3.2.3 Differentially private algorithms in the classical setting

In the classical setting, where d grows slowly relative to n, the optimal rate of convergence for mean estimation can be achieved simply by a noisy, truncated sample mean: given an i.i.d. sample X = {x₁, …, x_n}, the estimator is defined as

  μ̃ = (1/n) ∑_{i∈[n]} Π_R(x_i) + w,

where Π_R denotes the projection onto the ℓ₂ ball of radius R in ℝᵈ, and w is an independent draw from an appropriately scaled noise distribution (Laplace or Gaussian). The theoretical guarantees for this estimator are summarized in the theorem below.
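Before stating the guarantees, here is a minimal Python sketch of this estimator, instantiated with the Gaussian mechanism and our own naming. After projecting every sample onto the ℓ₂ ball of radius R, replacing one sample moves the mean by at most 2R/n in ℓ₂ norm, which gives the sensitivity used below.

```python
import numpy as np

def private_mean_classical(X, R, epsilon, delta, rng=None):
    """Noisy truncated sample mean: project every sample onto the L2 ball of
    radius R, average, and perturb with the Gaussian mechanism. After the
    projection, the L2-sensitivity of the mean is at most 2R/n."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X_proj = X * np.minimum(1.0, R / norms)   # projection onto the L2 ball
    sensitivity = 2.0 * R / n
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return X_proj.mean(axis=0) + rng.normal(0.0, sigma, size=d)
```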

Theorem 3.3.

For an i.i.d. sample X = {x₁, …, x_n} drawn from P ∈ P, with suitably chosen R and noise scale, μ̃ is an (ε,δ)-differentially private procedure, and:

  1. if there exists a constant c such that the samples are bounded in norm by c, then, with R = c, the estimator attains the lower bound of Theorem 2.2 up to a logarithmic factor;

  2. otherwise, with the choice of R ≍ √(log n) for a sufficiently large constant, the same guarantee holds.

Comparing with Theorem 2.2, we see that the noisy truncated sample mean achieves the optimal rate of convergence up to a logarithmic factor.

4 Privacy Cost of Linear Regression

In this section, we investigate the cost of differential privacy in linear regression, with a primary focus on the high-dimensional setting, where d ≫ n and the regression coefficient β is assumed to be sparse; the classical low-dimensional case, where d grows slowly relative to n, is also covered. Using the general lower bound technique described in Section 2, we establish minimax lower bounds that match the convergence rates of our differentially private procedures up to logarithmic factors.

4.1 Lower bound of high-dimensional linear regression

For high-dimensional sparse linear regression, we consider a distribution space P of joint distributions of (x, y) with sub-Gaussian covariates and an s-sparse regression coefficient, where the parameter of interest β(P) is defined such that x⊤β(P) is the best linear approximation of y, and the relevant constants are generic. For brevity, we use β to denote β(P).

Let Z = {(x₁, y₁), …, (x_n, y_n)} denote an i.i.d. sample drawn from some P ∈ P. We propose a tracing adversary that declares (x_i, y_i) IN when a residual-correlation statistic, involving (x_i, y_i), the output M(Z), and a fresh independent sample (x′, y′) with the same covariate distribution, exceeds a suitable threshold.

This adversary satisfies the following properties:

Lemma 4.1.1.

Suppose that the sample size n satisfies an appropriate upper bound; then there is a distribution P ∈ P such that

  1. completeness: with non-trivial probability, A declares at least one index IN;

  2. soundness: for each i, the probability that A(M(Z′_i), (x_i, y_i)) = IN is vanishingly small, where Z′_i is an adjacent dataset of Z with (x_i, y_i) replaced by an independent copy.

The proof of this lemma, which appears in the supplementary material, includes a novel generalization of the fingerprinting lemma (see [35], [7], and [18]) to Gaussian random variables, which may be of independent interest.

We note that the extra assumption in Lemma 4.1.1 can be gained “for free”: when it fails to hold, an automatic lower bound of the same order already follows. On the other hand, when the assumption holds, the general lower bound result in Theorem 2.1 is applicable, and we obtain the following lower bound result.

Theorem 4.1.

Let M_{ε,δ} denote the collection of all (ε,δ)-differentially private algorithms, and suppose the dataset Z consists of n i.i.d. entries drawn from P ∈ P. Suppose that ε ≤ c₀ for some constant c₀ > 0 and δ ≲ n^{−(1+ω)} for some ω > 0; then

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E‖M(Z) − β(P)‖₂² ≳ (s log d)/n + (s² log² d)/(n²ε²).

Specifically, the second term in the lower bound is a consequence of Lemma 4.1.1 and Theorem 2.1. The first term is due to the statistical minimax lower bound for high-dimensional linear regression (see, for instance, [30] and [40]).

4.2 Upper bound of high-dimensional linear regression

For high-dimensional sparse linear regression, we propose the following differentially private LASSO algorithm, which splits the sample of size n into T subsamples of size n/T and iterates through the subsamples using a truncated gradient descent step with random perturbation.

Algorithm 4: Differentially Private LASSO
1: Inputs: privacy parameters (ε, δ); design matrix X; response vector y; step size η; sparsity tuning parameter s; truncation tuning parameter R; number of iterations T.
2: Randomly split [n] into T subsets S₁, …, S_T of size n/T each.
3: Initialize the algorithm with an s-sparse vector β⁰.
4: for t = 0 to T − 1 do
5:  Compute the gradient step β^{t+1/2} = β^t − η · ∇̂_t, where the gradient estimate ∇̂_t is computed from the subsample S_{t+1} with each observation truncated by Π_R, the projection onto the ball of radius R.
6:  Set β^{t+1} = Peeling(β^{t+1/2}), retaining a noisy top-s support.
7: end for
8: Output β^T.
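The following Python sketch illustrates the structure of Algorithm 4: one truncated gradient step per disjoint subsample, followed by private hard thresholding via peeling. The exact form of the truncation (clipping predictions and responses to [−R, R]) and the `noise_scale` parameter (which abstracts the per-iteration Laplace calibration implied by (ε, δ)) are our own simplified reading, not the paper’s verbatim construction. Splitting the sample into disjoint batches means each observation is touched only once, which keeps the overall privacy accounting simple.

```python
import numpy as np

def dp_sparse_regression(X, y, s, eta, R, T, noise_scale, rng=None):
    """Sketch of Algorithm 4: iterate a truncated gradient step over T
    disjoint subsamples, then privately keep the top-s coordinates each
    round (peeling with fresh Laplace noise)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    batches = np.array_split(rng.permutation(n), T)
    beta = np.zeros(d)
    for t in range(T):
        idx = batches[t]
        # truncated residuals: an assumed form of the projection step
        residual = np.clip(X[idx] @ beta, -R, R) - np.clip(y[idx], -R, R)
        grad = X[idx].T @ residual / len(idx)
        beta_half = beta - eta * grad
        # private hard thresholding: noisy top-s selection (peeling)
        beta_new = np.zeros(d)
        remaining = list(range(d))
        for _ in range(s):
            noisy = np.abs(beta_half[remaining]) + rng.laplace(0.0, noise_scale, size=len(remaining))
            j = remaining[int(np.argmax(noisy))]
            beta_new[j] = beta_half[j] + rng.laplace(0.0, noise_scale)
            remaining.remove(j)
        beta = beta_new
    return beta
```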

Theorem 4.2.

Let Z = {(x₁, y₁), …, (x_n, y_n)} be an i.i.d. sample drawn from P ∈ P. If we have that

  • Σ, the covariance matrix of x, satisfies 1/L ≤ λ_min(Σ) ≤ λ_max(Σ) ≤ L for some constant L ≥ 1,

  • the sample size n is sufficiently large relative to s log d and the privacy parameters, and

  • the tuning parameters satisfy R ≍ √(log n) for a sufficiently large constant, a suitably small constant step size η, and a number of iterations T ≍ log n, so that the iterates contract at a geometric rate,

then β^T is (ε,δ)-differentially private, and it holds with high probability that the squared ℓ₂ error of β^T matches the lower bound of Theorem 4.1 up to logarithmic factors.

To the best of our knowledge, this is the first differentially private LASSO algorithm with parameter estimation consistency guarantees. In addition, in view of Theorem 4.1, the proposed algorithm achieves the optimal rate of convergence up to logarithmic factors.

4.3 Linear regression in the classical setting

In the classical linear regression problem, we have i.i.d. observations (x₁, y₁), …, (x_n, y_n) drawn from some P that belongs to a distribution space with sub-Gaussian covariates and bounded parameters, where the parameter of interest β(P) is defined such that x⊤β(P) is the best linear approximation of y, and the relevant constants are generic.

To apply Theorem 2.1 to derive the lower bound for the linear regression model, we consider a tracing adversary of the same residual-correlation form as in the high-dimensional case,

where (x′, y′) is a fresh independent draw with the same covariate distribution as the original sample.

The next lemma summarizes the soundness and completeness properties of the tracing adversary.

Lemma 4.3.1.

If ε = O(1) and the sample size n satisfies an appropriate upper bound, there is a distribution P ∈ P such that

  1. completeness: with non-trivial probability, A declares at least one index IN;

  2. soundness: for each i, the probability that A(M(Z′_i), (x_i, y_i)) = IN is vanishingly small, where Z′_i is an adjacent dataset of Z with (x_i, y_i) replaced by an independent copy.

As in the high-dimensional setting, the extra assumption in this lemma can be gained “for free”.

Our minimax lower bound for private linear regression in the classical setting is presented in the theorem below:

Theorem 4.3.

Let M_{ε,δ} denote the collection of all (ε,δ)-differentially private algorithms, and suppose that ε ≤ c₀ for some constant c₀ > 0 and δ ≲ n^{−(1+ω)} for some ω > 0; then

  inf_{M ∈ M_{ε,δ}} sup_{P ∈ P} E‖M(Z) − β(P)‖₂² ≳ d/n + d²/(n²ε²).

Similar to the other lower bound results, the two terms in this minimax lower bound correspond to the statistical risk and the risk due to the privacy constraint, respectively.

4.3.1 Differentially private algorithms in the classical setting

In the classical setting, where d grows slowly relative to n, the optimal rate of convergence for differentially private linear regression can be achieved directly by perturbing the OLS estimator with suitably chosen noise.

Let β̂_OLS = (X⊤X)⁻¹X⊤y denote the OLS estimator; we consider the noisy estimator obtained by projecting β̂_OLS onto a ball of suitable radius R and perturbing the result with noise calibrated to its sensitivity.
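A minimal Python sketch of this construction follows. The function name is ours, and the `sensitivity` argument abstracts the sensitivity analysis that the paper carries out; the trivial bound after projecting onto the ℓ₂ ball of radius R is 2R, and sharper calibration requires the paper’s assumptions on the design.

```python
import numpy as np

def private_ols(X, y, R, epsilon, delta, sensitivity, rng=None):
    """Perturbed OLS sketch: compute the least-squares solution, project it
    onto the L2 ball of radius R to bound its range, then add Gaussian noise
    via the Gaussian mechanism. `sensitivity` must upper-bound the
    L2-sensitivity of the projected estimator (2R is the trivial bound; the
    paper derives the precise calibration)."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    norm = max(np.linalg.norm(beta_ols), 1e-12)
    beta_proj = beta_ols * min(1.0, R / norm)  # projection onto the L2 ball
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return beta_proj + rng.normal(0.0, sigma, size=d)
```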