With the explosion of machine learning algorithms and their applications in many areas of science, technology, and governance, data is becoming an extremely valuable asset. However, with the growing power of machine learning algorithms in learning individual behavioral patterns from diverse data sources, privacy is becoming a major concern, calling for strict regulations on data ownership and distribution. On the other hand, many recent examples of de-anonymization attacks on publicly available anonymized data show that regulation on its own will not be sufficient to limit access to private data. An alternative approach, also considered in this paper, is to process the data at the time of release such that no private information is leaked, a requirement called perfect privacy. Assuming that the joint distribution of the observed data and the latent variables that should be kept private is known, an information-theoretic study is carried out in this paper to characterize the fundamental limits of perfect privacy.
Consider a situation in which Alice wants to release some useful information about herself to Bob, represented by the random variable $Y$, and she receives some utility from this disclosure of information. This may represent some data measured by a health monitoring system, her smart meter measurements, or the sequence of a portion of her DNA to detect potential illnesses. At the same time, she wishes to conceal from Bob some private information which depends on $Y$, represented by $X$. To this end, instead of letting Bob have direct access to $Y$, a privacy-preserving mapping is applied, whereby a distorted version of $Y$, denoted by $U$, is revealed to Bob. In this context, privacy and utility are competing goals: the more distorted the version of $Y$ revealed by the privacy mapping, the less information Bob can infer about $X$, but also the less utility can be obtained. This trade-off is the very result of the dependence between $X$ and $Y$. An extreme point of this trade-off is the scenario termed perfect privacy, which refers to the situation in which nothing is allowed to be inferred about $X$ by Bob through the disclosure of $U$. This condition is modelled by the statistical independence of $X$ and $U$.
The concern of privacy and the design of privacy-preserving mappings has been the focus of a broad area of research, e.g., [6, 7, 8, 9], while the information-theoretic view of privacy has gained increasing attention more recently. In , a general statistical inference framework is proposed to capture the loss of privacy in legitimate transactions of data. In , the privacy-utility trade-off under the self-information cost function (log-loss) is considered and called the privacy funnel, which is closely related to the information bottleneck introduced in . In , sharp bounds on the optimal privacy-utility trade-off for the privacy funnel are derived, and an alternative characterization of the perfect privacy condition (see ) is proposed. Measuring both the privacy and the utility in terms of mutual information, perfect privacy is fully characterized in  for the binary case. Furthermore, a new quantity is introduced to capture the amount of private information about the latent variable carried by the observable data .
We study information-theoretic perfect privacy in this paper, and our main contributions can be summarized as follows:
Adopting mutual information as the utility measure, i.e., $I(Y;U)$, we show that the maximum utility under perfect privacy is the solution to a standard linear program (LP). We obtain similar results when other measures of utility, e.g., the minimum mean squared error (MMSE) or the probability of error, are considered.
We show that when $(X,Y)$ is a jointly Gaussian pair with a non-zero correlation coefficient, perfect privacy is not feasible for privacy mappings of the form $p_{U|Y}$. In other words, $U$ is independent of $X$ if and only if it is also independent of $Y$, i.e., maximum privacy is obtained at the expense of zero utility. This, however, is not the case when the mapping is of the form $p_{U|X,Y}$; that is, when the encoder has access to the private latent variable as well as the data.
Denoting the maximum $I(Y;U)$ under perfect privacy by $g_0(X,Y)$, we characterize the relationship between the non-private information about $X$ carried by $Y$, as defined in , and $g_0(X,Y)$.
Considering mutual information as both the privacy and the utility measure, characterizing the optimal utility-privacy trade-off curve, given by the supremum of $I(Y;U)$ over all mappings with $I(X;U) \leq \epsilon$, is not straightforward. Instead, we investigate the slope of this curve as $\epsilon \to 0$. This linear approximation of the trade-off curve provides the maximum utility rate when a small amount of private data leakage is allowed. We obtain this slope when perfect privacy is not feasible, i.e., $g_0(X,Y) = 0$, and propose a lower bound on it when perfect privacy is feasible, i.e., $g_0(X,Y) > 0$.
Random variables are denoted by capital letters, their realizations by lower case letters, and their alphabets by capital letters in calligraphic font. Matrices and vectors are denoted by bold capital and bold lower case letters, respectively. For integers $m \leq n$, we have the discrete interval $[m:n] \triangleq \{m, m+1, \ldots, n\}$, with $[n] \triangleq [1:n]$, and the tuple $(X_m, X_{m+1}, \ldots, X_n)$ is written in short as $X_m^n$. For an integer $K \geq 1$, $\mathbf{1}_K$ denotes a $K$-dimensional all-one column vector. For a random variable $X$ with finite alphabet $\mathcal{X}$, the probability simplex $\mathbb{P}(\mathcal{X})$ is the standard $(|\mathcal{X}|-1)$-simplex given by $\mathbb{P}(\mathcal{X}) = \{\mathbf{v} \in \mathbb{R}^{|\mathcal{X}|} : \mathbf{1}^T\mathbf{v} = 1,\ \mathbf{v} \succeq \mathbf{0}\}$.
Furthermore, to each probability mass function (pmf) on $\mathcal{X}$, denoted by $p_X$, corresponds a probability vector $\mathbf{p}_X \in \mathbb{P}(\mathcal{X})$, whose $i$-th element is $p_X(x_i)$ ($i \in [|\mathcal{X}|]$). For a pair of random variables $(X,Y)$ with joint pmf $p_{X,Y}$, $\mathbf{P}_{X,Y}$ is an $|\mathcal{X}| \times |\mathcal{Y}|$ matrix with $(i,j)$-th entry equal to $p_{X,Y}(x_i, y_j)$. Likewise, $\mathbf{P}_{X|Y}$ is an $|\mathcal{X}| \times |\mathcal{Y}|$ matrix with $(i,j)$-th entry equal to $p_{X|Y}(x_i|y_j)$.
$F_X(\cdot)$ denotes the cumulative distribution function (CDF) of random variable $X$, and if it admits a density, its probability density function (pdf) is denoted by $f_X(\cdot)$. For $p \in [0,1]$, $h_b(p) \triangleq -p\log_2 p - (1-p)\log_2(1-p)$ denotes the binary entropy function. We also make use of the unit-step function. Throughout the paper, for a random variable $X$ with the corresponding probability vector $\mathbf{p}_X$, $H(X)$ and $H(\mathbf{p}_X)$ are written interchangeably, and so are the quantities $H(Y|U=u)$ and $H(\mathbf{p}_{Y|U=u})$.
II. System Model and Preliminaries
Consider a pair of random variables $(X,Y)$ distributed according to the joint distribution $p_{X,Y}$. We assume that $p_X(x) > 0$ for all $x \in \mathcal{X}$ and $p_Y(y) > 0$ for all $y \in \mathcal{Y}$, since otherwise the supports $\mathcal{X}$ and/or $\mathcal{Y}$ could have been modified accordingly. This equivalently means that the probability vectors $\mathbf{p}_X$ and $\mathbf{p}_Y$ are in the interior of their corresponding probability simplices. Let $X$ denote the private/sensitive data that the user wants to conceal and $Y$ denote the useful data the user wishes to disclose. Assume that the privacy mapping/data release mechanism takes $Y$ as input and maps it to the released data, denoted by $U$. In this scenario,
$X - Y - U$ form a Markov chain, and the privacy mapping is captured by the conditional distribution $p_{U|Y}$.
Let $g_\epsilon(X,Y)$ be defined (with an abuse of notation, as it should be written as $g_\epsilon(p_{X,Y})$) as
$g_\epsilon(X,Y) \triangleq \sup_{p_{U|Y}:\, X - Y - U,\ I(X;U) \leq \epsilon} I(Y;U). \quad (1)$
In other words, when mutual information is adopted as a measure of both utility and privacy, (1) gives the best utility that can be obtained by privacy mappings which keep the leakage of the sensitive data ($X$) within a certain level $\epsilon$.
Proposition 1. It is sufficient to have $|\mathcal{U}| \leq |\mathcal{Y}| + 1$. Also, we can write
The proof is provided in Appendix A. ∎
Later, we show that it is sufficient to restrict our attention to $|\mathcal{U}| \leq |\mathcal{Y}|$ when $\epsilon = 0$.
III. Perfect Privacy
Definition. For a pair of random variables $(X,Y)$, we say that perfect privacy is feasible if there exists a random variable $U$ that satisfies the following conditions:
$X - Y - U$ forms a Markov chain,
$I(X;U) = 0$, i.e., $X$ and $U$ are independent,
$I(Y;U) > 0$, i.e., $Y$ and $U$ are not independent.
From the above definition, perfect privacy being feasible is equivalent to having $g_0(X,Y) > 0$.
Proposition 2. Perfect privacy is feasible if and only if the columns of $\mathbf{P}_{X|Y}$ are linearly dependent, i.e.,
$\mathrm{rank}(\mathbf{P}_{X|Y}) < |\mathcal{Y}|. \quad (3)$
In [14, Theorem 4], the authors showed that for a given pair of random variables $(X,Y)$, there exists a random variable $U$ satisfying the conditions of perfect privacy if and only if the columns of $\mathbf{P}_{X|Y}$ are linearly dependent. Equivalently, there must exist a non-zero vector $\mathbf{w} \in \mathbb{R}^{|\mathcal{Y}|}$, such that $\mathbf{P}_{X|Y}\mathbf{w} = \mathbf{0}$, which is equivalent to (3). ∎
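Proposition 2 reduces the feasibility of perfect privacy to a linear-dependence check on the columns of $\mathbf{P}_{X|Y}$. As a minimal numerical sketch (the joint pmf below is hypothetical, not taken from the paper):

```python
import numpy as np

# Hypothetical joint pmf p_{X,Y}: rows indexed by x, columns by y.
P_XY = np.array([[0.15, 0.05, 0.20, 0.10],
                 [0.05, 0.15, 0.10, 0.20]])

p_Y = P_XY.sum(axis=0)            # marginal p_Y (all entries assumed positive)
P_X_given_Y = P_XY / p_Y          # column j holds p_{X|Y}(. | y_j)

# Perfect privacy is feasible iff the columns of P_{X|Y} are linearly
# dependent, i.e. rank(P_{X|Y}) < |Y|.
rank = np.linalg.matrix_rank(P_X_given_Y)
feasible = rank < P_X_given_Y.shape[1]
print(rank, feasible)
```

Here $|\mathcal{X}| = 2 < |\mathcal{Y}| = 4$ forces the columns to be linearly dependent, so feasibility is immediate.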
Proposition 3. For the null space of $\mathbf{P}_{X|Y}$, we have
$\mathrm{Null}(\mathbf{P}_{X|Y}) \subseteq \{\mathbf{w} \in \mathbb{R}^{|\mathcal{Y}|} : \mathbf{1}^T\mathbf{w} = 0\},$
since each column of $\mathbf{P}_{X|Y}$ sums to one. Therefore, as $\mathbf{p}_Y$ lies in the interior of $\mathbb{P}(\mathcal{Y})$, for any $\mathbf{w} \in \mathrm{Null}(\mathbf{P}_{X|Y})$, there exists a positive real number $\alpha$, such that $\mathbf{p}_Y + \alpha\mathbf{w} \in \mathbb{P}(\mathcal{Y})$.
Theorem 1. For a pair of random variables $(X,Y)$, $g_0(X,Y)$ is the solution to a standard linear program (LP), as given in (13).
Consider the matrix $\mathbf{P}_{X|Y}$, i.e., the $|\mathcal{X}| \times |\mathcal{Y}|$ matrix with $(i,j)$-th entry equal to $p_{X|Y}(x_i|y_j)$.
From the singular value decomposition of $\mathbf{P}_{X|Y}$ (we assume, without loss of generality, that the singular values are arranged in descending order), we have
where the matrix of right singular vectors is
Condition (3) is equivalent to having the null space of $\mathbf{P}_{X|Y}$ written as
The random variables $X$ and $U$ are independent if and only if $\mathbf{p}_{X|U=u} = \mathbf{p}_X$ for every $u$ with $p_U(u) > 0$.
Furthermore, if $X - Y - U$ form a Markov chain, (8) is equivalent to $\mathbf{P}_{X|Y}\,\mathbf{p}_{Y|U=u} = \mathbf{p}_X$ for every such $u$.
From the column vectors in (6) and the definition of the index afterwards, construct the matrix $\mathbf{A}$ as
From (7), we can write
Therefore, for the triplet $(X,Y,U)$, if $X - Y - U$ forms a Markov chain and $I(X;U) = 0$, we must have $\mathbf{p}_{Y|U=u} \in \mathbb{S}$ for every $u$, where $\mathbb{S}$ is a convex polytope defined as
$\mathbb{S} \triangleq \{\mathbf{y} \in \mathbb{R}^{|\mathcal{Y}|} : \mathbf{A}\mathbf{y} = \mathbf{A}\mathbf{p}_Y,\ \mathbf{y} \succeq \mathbf{0}\}.$
Note that any element of $\mathbb{S}$ is a probability vector according to Proposition 3.
On the other hand, for any pair $(p_U, p_{Y|U})$ for which $\mathbf{p}_{Y|U=u} \in \mathbb{S}$ for every $u$, we can simply have $X - Y - U$ and $I(X;U) = 0$. Therefore, we can write
This leads us to
where in (12), since the minimization is over $p_{Y|U}$ and $p_U$ rather than $p_{U|Y}$, a constraint was added to preserve the marginal distribution $\mathbf{p}_Y$.
Proposition 4. In the minimization in (12), it is sufficient to consider only the extreme points of $\mathbb{S}$.
The proof is provided in Appendix B. ∎
From Proposition 4, the problem in (12) can be divided into two phases: in phase one, the extreme points of the set $\mathbb{S}$ are identified, while in phase two, proper weights over these extreme points are obtained to minimize the objective function.
The procedure for finding the extreme points of $\mathbb{S}$ is as follows. Let $r$ denote the number of rows of the matrix $\mathbf{A}$ in (9), and pick a set of indices $\mathcal{J} \subseteq [|\mathcal{Y}|]$ that correspond to linearly independent columns of $\mathbf{A}$. Since $\mathbf{A}$ is full rank (note that its rows are mutually orthonormal), there are at most $\binom{|\mathcal{Y}|}{r}$ ways of choosing $r$ linearly independent columns of $\mathbf{A}$. Let $\mathbf{A}_{\mathcal{J}}$ be an $r \times r$ matrix whose columns are the columns of $\mathbf{A}$ indexed by the indices in $\mathcal{J}$. Also, for any $\mathbf{y} \in \mathbb{R}^{|\mathcal{Y}|}$, let $\mathbf{y}_{\mathcal{J}}$ and $\mathbf{y}_{\mathcal{J}^c}$ be $r$-dimensional and $(|\mathcal{Y}|-r)$-dimensional vectors whose elements are the elements of $\mathbf{y}$ indexed by the indices in $\mathcal{J}$ and $\mathcal{J}^c$, respectively.
For any basic feasible solution, there exists a set of indices $\mathcal{J}$ that correspond to a set of linearly independent columns of $\mathbf{A}$, such that the corresponding vector satisfies $\mathbf{y}_{\mathcal{J}} = \mathbf{A}_{\mathcal{J}}^{-1}\mathbf{A}\mathbf{p}_Y \succeq \mathbf{0}$ and $\mathbf{y}_{\mathcal{J}^c} = \mathbf{0}$.
On the other hand, for any set of indices $\mathcal{J}$ that correspond to a set of linearly independent columns of $\mathbf{A}$, if $\mathbf{A}_{\mathcal{J}}^{-1}\mathbf{A}\mathbf{p}_Y \succeq \mathbf{0}$, then the vector obtained by setting $\mathbf{y}_{\mathcal{J}} = \mathbf{A}_{\mathcal{J}}^{-1}\mathbf{A}\mathbf{p}_Y$ and $\mathbf{y}_{\mathcal{J}^c} = \mathbf{0}$ is a basic feasible solution. Hence, the extreme points of $\mathbb{S}$ are obtained as mentioned above, and their number is at most $\binom{|\mathcal{Y}|}{r}$.
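The enumeration of basic feasible solutions described above can be sketched by brute force over column subsets. In the following, the constraint matrix `M` and right-hand side `b` are hypothetical stand-ins for the system defining the polytope:

```python
import itertools
import numpy as np

def extreme_points(M, b, tol=1e-9):
    """Enumerate extreme points of {y >= 0 : M y = b} as basic feasible
    solutions: pick linearly independent columns, solve, keep if y >= 0."""
    m, n = M.shape
    r = np.linalg.matrix_rank(M)
    points = []
    for idx in itertools.combinations(range(n), r):
        B = M[:, idx]
        if np.linalg.matrix_rank(B) < r:
            continue                                  # columns dependent
        y_B, *_ = np.linalg.lstsq(B, b, rcond=None)
        if np.linalg.norm(B @ y_B - b) > tol or np.any(y_B < -tol):
            continue                                  # infeasible basis
        y = np.zeros(n)
        y[list(idx)] = np.clip(y_B, 0.0, None)
        if not any(np.allclose(y, p, atol=tol) for p in points):
            points.append(y)
    return points

# Toy check: {y >= 0 : y_1 + y_2 + y_3 = 1} is the 2-simplex,
# whose extreme points are the three standard basis vectors.
pts = extreme_points(np.ones((1, 3)), np.array([1.0]))
print(len(pts))
```

This exhaustive search is exponential in the worst case, which matches the $\binom{|\mathcal{Y}|}{r}$ bound above; it is only practical for small alphabets.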
For the second phase, we proceed as follows. Assume that the extreme points of $\mathbb{S}$, found in the previous phase, are denoted by $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_K$. Then (12) is equivalent to
where $\mathbf{w}$ is a $K$-dimensional weight vector, and it can be verified that the constraint $\mathbf{1}^T\mathbf{w} = 1$ is satisfied if the constraint in (13) is met. The problem in (13) is a standard linear program (LP), which can be solved efficiently. ∎
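The second-phase weight LP can be sketched with `scipy.optimize.linprog`. The extreme points and marginal below are assumed illustrative inputs (here the standard basis vectors, i.e., the case where each extreme point has zero entropy), not values from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Assumed inputs for illustration: extreme points of S as columns of Y_ext,
# and the marginal p_Y they must mix back into.
Y_ext = np.eye(2)                 # column k is the extreme point y_k
p_Y = np.array([0.4, 0.6])

c = np.array([entropy_bits(Y_ext[:, k]) for k in range(Y_ext.shape[1])])
res = linprog(c, A_eq=Y_ext, b_eq=p_Y, bounds=(0, None))

min_H_Y_given_U = res.fun                  # minimized H(Y|U)
g0 = entropy_bits(p_Y) - min_H_Y_given_U   # maximum I(Y;U) under perfect privacy
print(round(g0, 4))
```

Since both assumed extreme points are deterministic, the minimized $H(Y|U)$ is zero and the maximum utility equals $H(Y) = h_b(0.4)$.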
The following example clarifies the optimization procedure in the proof of Theorem 1.
Example 1. Consider the pair $(X,Y)$ whose joint distribution is specified by the following matrix:
This results in
Since $\mathrm{rank}(\mathbf{P}_{X|Y}) = 2 < 4 = |\mathcal{Y}|$, perfect privacy is feasible; and therefore, $g_0(X,Y) > 0$. The singular value decomposition of $\mathbf{P}_{X|Y}$ is
where it is obvious that columns 3 and 4 of the matrix of right singular vectors span the null space of $\mathbf{P}_{X|Y}$. Hence, the matrix $\mathbf{A}$ in (9) is given by
For the first phase, i.e., finding the extreme points of $\mathbb{S}$, it is clear that there are $\binom{4}{2} = 6$ possible ways of choosing 2 linearly independent columns of $\mathbf{A}$. Hence, the index set $\mathcal{J}$ can be $\{1,2\}$, $\{1,3\}$, $\{1,4\}$, $\{2,3\}$, $\{2,4\}$, or $\{3,4\}$. From $\mathbf{A}_{\mathcal{J}}^{-1}\mathbf{A}\mathbf{p}_Y$, we get
It is obvious that two of the resulting candidates are not feasible, since they do not satisfy $\mathbf{y} \succeq \mathbf{0}$. Therefore, the extreme points of $\mathbb{S}$ are obtained as
Now, for the second phase, the standard LP in (13) is written as
where the minimum value of the objective function (in bits) is achieved by
Therefore, the maximum utility under perfect privacy, $g_0(X,Y)$, is obtained. Finally, the optimal privacy mapping corresponds to the matrix $\mathbf{P}_{U|Y}$ given as
Remark 1. It can be verified that in the degenerate case of $X$ being independent of $Y$, we have $\mathrm{rank}(\mathbf{P}_{X|Y}) = 1$, or equivalently, $\mathbb{S} = \mathbb{P}(\mathcal{Y})$. In this case, the extreme points of $\mathbb{S}$ are the standard basis vectors, which have zero entropy. Therefore, the minimum value of $H(Y|U)$ is zero, achieved with $w_i = p_Y(y_i)$ and $\mathbf{p}_{Y|U=i} = \mathbf{e}_i$, which is the $i$-th extreme point of $\mathbb{S}$. As a result, $g_0(X,Y) = H(Y)$, which is also consistent with the fact that $U = Y$ is independent of $X$ and maximizes $I(Y;U)$.
III-A. MMSE under Perfect Privacy
Assume that instead of $I(Y;U)$, the goal is to minimize the mean squared error in estimating $Y$ from $U$ under the perfect privacy constraint. This can be formulated as follows:
where the expectation is according to the joint distribution of $(Y,U)$. Obviously, an upper bound for (14) is $\mathrm{Var}(Y)$, as one could choose $U$ to be constant. In what follows, we show that (14) has a similar solution to that of $g_0(X,Y)$. The only difference is that the realizations of $Y$, i.e., the particular values of the elements of $\mathcal{Y}$, are irrelevant for the solution of $g_0(X,Y)$, since only their probability masses play a role when evaluating $I(Y;U)$, while the objective function in (14) takes into account both the pmf and the realizations of $Y$. We can write
where the first step is a classical result from MMSE estimation, and in (16), we have used the fact that the minimum is achieved by the conditional expectation $\mathbb{E}[Y|U]$. Therefore, from (11) and (16), we have
where the equality holds if and only if $\mathbf{p}_{Y|U=u} \in \mathbb{S}$ for every $u$.
Proposition 5. $\mathrm{Var}(Y)$ is a strictly concave function of $\mathbf{p}_Y$.
The proof is provided in Appendix C. ∎
From the concavity of $\mathrm{Var}(Y)$ in Proposition 5, we can apply the reasoning in the proof of Proposition 4 and conclude that in (17), it is sufficient to consider only the extreme points of $\mathbb{S}$. Hence, the problem again has two phases, where in phase one, the extreme points of $\mathbb{S}$ are found. For the second phase, denoting the extreme points of $\mathbb{S}$ by $\mathbf{y}_1, \ldots, \mathbf{y}_K$, (17) boils down to a standard linear program as follows.
where $V_k$ ($k \in [K]$) denotes $\mathrm{Var}(Y)$ under $\mathbf{y}_k$, i.e., when $\mathbf{p}_{Y|U=k} = \mathbf{y}_k$. Finally, once the LP in (18) is solved, the realizations of the random variable $U$ are set equal to the expectations of $Y$ under the corresponding distributions of those extreme points of $\mathbb{S}$ with non-zero mass probability. For example, the problem in (14) for the pair $(X,Y)$ given in Example 1 is
where the extreme points of $\mathbb{S}$, i.e., $\mathbf{y}_k$, are already known from Example 1. The minimum value of this standard LP is achieved by
Therefore, the minimum MMSE under perfect privacy is obtained, and by setting the realizations of $U$ to the corresponding conditional expectations of $Y$, we obtain the optimal privacy mapping.
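The MMSE variant follows the same two-phase pattern as before; the per-extreme-point cost is now a conditional variance instead of an entropy. A minimal sketch, in which the alphabet values and extreme points are hypothetical illustrative inputs:

```python
import numpy as np
from scipy.optimize import linprog

y_vals = np.array([0.0, 1.0])              # hypothetical realizations of Y
p_Y = np.array([0.5, 0.5])                 # marginal to be preserved

# Hypothetical extreme points of S (columns), each a valid pmf on y_vals.
S_ext = np.array([[0.8, 0.2],
                  [0.2, 0.8]])

def variance(p):
    mean = p @ y_vals
    return float(p @ (y_vals - mean) ** 2)

# Cost of extreme point y_k is Var(Y) under y_k; mix back into p_Y.
c = np.array([variance(S_ext[:, k]) for k in range(S_ext.shape[1])])
res = linprog(c, A_eq=S_ext, b_eq=p_Y, bounds=(0, None))
print(round(res.fun, 4))     # minimized MMSE; compare with Var(Y) = 0.25
```

The minimized value is below the trivial upper bound $\mathrm{Var}(Y)$, reflecting the residual utility available under perfect privacy.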
III-B. Minimum Probability of Error under Perfect Privacy
The objective of the optimization can alternatively be the probability of error in estimating $Y$ from $U$, written as
Obviously, an upper bound for (20) is $1 - \max_{y} p_Y(y)$, as one could choose $U$ to be constant. For an arbitrary joint distribution on $(Y,U)$, we can write
where (21) holds with equality when the estimator is the maximum a posteriori (MAP) rule. Then,
It can be verified that $\max_j \mathbf{y}(j)$ is convex in $\mathbf{y}$. Hence, following the same reasoning as in the proof of Proposition 4, it is sufficient to consider only the extreme points of $\mathbb{S}$ in the optimization in (22). Therefore, the problem has two phases: in phase one, the extreme points of $\mathbb{S}$ are identified. For the second phase, denoting the extreme points of $\mathbb{S}$ by $\mathbf{y}_1, \ldots, \mathbf{y}_K$, the problem boils down to a standard linear program as follows:
where $m_k$ is the maximum element of the vector $\mathbf{y}_k$. Once the LP is solved, the optimal conditionals and the optimal mass probabilities of $U$ are obtained. Finally, the realizations of the random variable $U$ are set to the corresponding MAP estimates of $Y$. For example, the problem in (20) for the pair $(X,Y)$ given in Example 1 is
where the extreme points of $\mathbb{S}$, i.e., $\mathbf{y}_k$, are already known from Example 1. The minimum probability of error is achieved by
Hence, the minimum error probability under perfect privacy is obtained, together with the corresponding optimal mapping.
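The error-probability objective admits the same LP treatment, with the cost of each extreme point being one minus its largest entry (the MAP error under that conditional). A sketch with hypothetical extreme points and marginal:

```python
import numpy as np
from scipy.optimize import linprog

p_Y = np.array([0.5, 0.5])                 # marginal to be preserved

# Hypothetical extreme points of S (columns), each a valid pmf.
S_ext = np.array([[0.8, 0.2],
                  [0.2, 0.8]])

# Cost of extreme point y_k: 1 - max element (MAP error under p_{Y|U=k}).
c = 1.0 - S_ext.max(axis=0)
res = linprog(c, A_eq=S_ext, b_eq=p_Y, bounds=(0, None))
print(round(res.fun, 4))     # minimum error probability; baseline is 0.5
```

Here the baseline $1 - \max_y p_Y(y) = 0.5$ is improved upon, since each extreme point concentrates more mass on a single symbol than the marginal does.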
Thus far, we have investigated the perfect privacy constraint when the alphabets $\mathcal{X}$ and $\mathcal{Y}$ are finite. The next theorem and its succeeding example consider two cases in which at least one of the alphabets is infinite. The following theorem shows that perfect privacy is not feasible for a (correlated) jointly Gaussian pair.
Theorem 2. Let $(X,Y)$ be a pair of jointly Gaussian random variables with correlation coefficient $\rho$, where
in which $\rho \neq 0$, since otherwise $X$ and $Y$ are independent. We have $g_0(X,Y) = 0$ for the above pair.
If there exists a random variable $U$ such that $X - Y - U$ form a Markov chain and $I(X;U) = 0$, we must have $p_{X|U}(\cdot|u) = p_X(\cdot)$ for all $u$, and hence, $f_{X|U}(\cdot|u) = f_X(\cdot)$, since $X$ has a density. Equivalently, we must have
Also, in order to have $I(Y;U) > 0$, there must exist at least one $u$, such that
It is known that $X$ conditioned on $Y$ is also Gaussian, given by
Multiplying both sides of (29) by $f_{Y|U}(y|u)$, and taking the integral with respect to $y$, we obtain
By Fubini’s theorem, we can write
After some manipulations, we get
The resulting expression is a Fourier transform. Due to the invertibility of the Fourier transform, we must have $f_{Y|U}(\cdot|u) = f_Y(\cdot)$ for all $u$. Therefore, (27) does not hold, and perfect privacy is not feasible for the (correlated) jointly Gaussian pair $(X,Y)$. ∎
In the following example, we consider a pair in which $X$ is discrete and $Y$ is continuous. We observe that we can have bounded $H(X)$ and $I(X;Y)$, while $I(Y;U)$ is unbounded, without even revealing $Y$ undistorted. This renders the usage of mutual information as a measure of dependence counterintuitive for continuous alphabets. This is related to the fact that differential entropy cannot be interpreted as a measure of the information content of a random variable, as it can take negative values.
Example 2. Let with . Let and . It can be verified that the probability density function (pdf) of is
and the conditional pmf of $X$ conditioned on $Y$ is given by
Since the support of $Y$ is a bounded interval, the support of $Y$ conditioned on any $U = u$ must be a subset of it. Also, the independence of $X$ and $U$ implies
Finally, in order to preserve the pdf of $Y$ in (31), the conditional CDF must satisfy the following
where $F_Y(\cdot)$ is the CDF corresponding to the pdf in (31).
Let $\mathbb{F}$ be the set of all CDFs defined on the support of $Y$, and define
We can write