On Perfect Privacy and Maximal Correlation
The problem of private data disclosure is studied from an information-theoretic perspective. Consider a pair of correlated random variables (X,Y), where Y denotes the observed data and X denotes the private latent variables. The following question is addressed: What is the maximum information that can be revealed about Y, while disclosing no information about X? Assuming that a Markov kernel maps Y to the revealed information U, it is shown that the maximum mutual information between Y and U, i.e., I(Y;U), can be obtained as the solution of a standard linear program when X and U are required to be independent, a condition referred to as perfect privacy. This solution is shown to be greater than or equal to the non-private information about X carried by Y. Maximal information disclosure under perfect privacy is shown to be the solution of a linear program also when the utility is measured by the reduction in the mean square error, E[(Y-U)^2], or in the probability of error, Pr{Y ≠ U}. For jointly Gaussian (X,Y), it is shown that perfect privacy is not possible if the kernel is applied only to Y, whereas it can be achieved if the mapping takes both X and Y as inputs, that is, if the private latent variables can also be observed at the encoder. Next, measuring the utility and privacy by I(Y;U) and I(X;U), respectively, the slope of the optimal utility-privacy trade-off curve is studied at I(X;U)=0. Finally, through a similar but independent analysis, an alternative characterization of the maximal correlation between two random variables is provided.
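The abstract states that, for finite alphabets, the maximum of I(Y;U) under the independence constraint I(X;U)=0 is the solution of a standard linear program. The sketch below illustrates one standard route by which such an LP can arise; it is an illustrative assumption, not necessarily the paper's exact formulation. Independence of X and U forces every posterior p_{Y|U=u} to lie in the polytope A = {q in the simplex : P_{X|Y} q = p_X}, and since entropy is concave, the optimal posteriors can be taken at the extreme points of A, leaving a linear program in the mixture weights. All function names and the toy channel are hypothetical.

```python
# Sketch: max I(Y;U) subject to X independent of U, for finite alphabets.
# Assumed construction: enumerate extreme points of the perfect-privacy
# polytope A = {q >= 0 : P_{X|Y} q = p_X}, then solve an LP in the weights.
import itertools
import numpy as np
from scipy.optimize import linprog
from scipy.stats import entropy  # Shannon entropy in nats

def vertices_of_privacy_polytope(P, pX, tol=1e-9):
    """Extreme points of {q >= 0 : P q = pX}, via basic feasible solutions."""
    m, n = P.shape
    r = np.linalg.matrix_rank(P)
    verts = []
    for support in itertools.combinations(range(n), r):
        B = P[:, support]
        if np.linalg.matrix_rank(B) < r:
            continue
        q_s, *_ = np.linalg.lstsq(B, pX, rcond=None)
        q = np.zeros(n)
        q[list(support)] = q_s
        if np.all(q >= -tol) and np.allclose(P @ q, pX, atol=1e-8):
            q = np.clip(q, 0.0, None)
            if not any(np.allclose(q, v) for v in verts):
                verts.append(q)
    return verts

def max_info_under_perfect_privacy(P_XgY, pY):
    """Max I(Y;U) (in nats) over kernels p(U|Y) with I(X;U) = 0."""
    pX = P_XgY @ pY
    verts = vertices_of_privacy_polytope(P_XgY, pX)
    V = np.column_stack(verts)                  # columns: candidate posteriors p_{Y|u}
    c = np.array([entropy(v) for v in verts])   # H(Y|U=u) at each extreme point
    # LP: minimize sum_u w_u H(q_u)  s.t.  sum_u w_u q_u = pY,  w >= 0.
    res = linprog(c, A_eq=V, b_eq=pY, bounds=(0, None), method="highs")
    return entropy(pY) - res.fun

# Toy example: |X| = 2, |Y| = 3, chosen so the polytope A is nontrivial
# and positive utility is achievable under perfect privacy.
P_XgY = np.array([[1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0]])
pY = np.array([1/3, 1/3, 1/3])
print(f"max I(Y;U) with I(X;U)=0: {max_info_under_perfect_privacy(P_XgY, pY):.4f} nats")
```

In this toy channel the polytope A has two extreme points, [0, 1, 0] and [1/2, 0, 1/2], and the LP decomposes p_Y as (1/3)[0,1,0] + (2/3)[1/2,0,1/2], giving max I(Y;U) = ln 3 - (2/3) ln 2 ≈ 0.6365 nats even though U reveals nothing about X.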