Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering
We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information S from publicly accessible data X without using statistics. We study how to generate and release a quantization X̂ of X that minimizes the privacy leakage of S to X̂ while maintaining a certain level of utility (or, inversely, a bounded quantization loss). The variables S and X are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction L_0(S → X̂) and the refined information I_*(S; X̂) (also called the maximin information) of S. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization X̂ by iteratively merging elements in the alphabet of X. To instantiate the solution, we consider two specific utility measures: the worst-case resolution of X when observing X̂, and the maximal distortion of the released data X̂. We show that the value of the maximin information I_*(S; X̂) can be determined by dividing the confusability graph into connected subgraphs. Hence, I_*(S; X̂) can be reduced by merging nodes that connect subgraphs. The relation to probabilistic information-theoretic privacy is also studied by noting that the Gács-Körner common information is the stochastic version of I_* and indicates the attainability of statistical indistinguishability.
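The connected-subgraph characterization of I_*(S; X̂) can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes the joint range is given as a finite set of (s, x) pairs that can occur together, that the confusability graph is the bipartite graph linking each s to the released symbols it can produce, and that I_* is the base-2 logarithm of the number of connected subgraphs. The function name `maximin_information` and the toy data are hypothetical.

```python
from math import log2


def maximin_information(joint_range, quantizer):
    """Sketch: log2 of the number of connected subgraphs of the
    bipartite graph linking s-values to released symbols x_hat.

    joint_range: iterable of (s, x) pairs that can occur together (assumed input).
    quantizer:   dict mapping each x to its released symbol x_hat (assumed input).
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Each feasible pair (s, x) adds an edge between s and the symbol x_hat = quantizer[x].
    for s, x in joint_range:
        union(("s", s), ("xhat", quantizer[x]))

    components = {find(node) for node in list(parent)}
    return log2(len(components))


# Toy joint range: s = 0 can produce x in {a, b}; s = 1 can produce x in {c, d}.
joint_range = {(0, "a"), (0, "b"), (1, "c"), (1, "d")}

# Releasing X unchanged leaves two disconnected subgraphs.
identity = {x: x for x in "abcd"}
leak_before = maximin_information(joint_range, identity)

# Merging b and c into one released symbol connects the two subgraphs.
merged = {"a": "a", "b": "bc", "c": "bc", "d": "d"}
leak_after = maximin_information(joint_range, merged)
```

In this toy case, merging the two symbols that bridge the subgraphs lowers the computed value, which mirrors the abstract's remark that I_*(S; X̂) can be reduced by merging nodes that connect subgraphs. Under the assumptions above, an agglomerative procedure would repeat such merges while a utility constraint is still satisfied.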