Averaging Attacks on Bounded Perturbation Algorithms
We describe and evaluate an attack that reconstructs the histogram of any target attribute of a sensitive dataset that can only be queried through a class of privacy-preserving algorithms which we call bounded perturbation algorithms. A defining property of such an algorithm is that it perturbs answers to queries by adding noise distributed within a bounded (possibly undisclosed) range. We evaluate the attack by querying a synthetic dataset via the bounded perturbation algorithm [15] used in the Australian Bureau of Statistics (ABS) online TableBuilder tool. While the attack is also applicable to the actual Australian census data available through TableBuilder, for ethical reasons we only demonstrate its success on synthetic data. We note, however, that the perturbation method used in the online ABS TableBuilder tool is vulnerable to this attack. Our results show that a small value of the perturbation parameter (desirable from a utility point of view), e.g., perturbing answers by uniformly sampling (integral) noise within the range ±5, can be retrieved with fewer than 200 queries with a probability of more than 0.95. This probability approaches 1 exponentially fast with only a linear increase in the number of queries. Furthermore, we show that the true count behind any target attribute value can be retrieved with only 400 queries with a probability of more than 0.96, and an entire column of more than 100 different attribute values can be retrieved with a corresponding linear increase in the number of queries. We argue that the best mitigation strategy is to carefully upscale noise as a function of the number of queries allowed. Our attacks are a practical illustration of the (informal) fundamental law of information recovery, which states that "overly accurate estimates of too many statistics completely destroys privacy" [2, 6].
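To illustrate the intuition behind an averaging attack, the following is a minimal Python sketch, not the authors' exact method. It assumes a hypothetical oracle perturbed_query() that returns the true count plus fresh, independently and uniformly sampled integer noise in [-5, 5] on each call; the real TableBuilder mechanism may fix the perturbation per cell, in which case an attacker would instead average over equivalent reformulations of the query. Because the bounded noise has zero mean, averaging repeated answers drives the estimate toward the true count as the number of queries grows.

import random

TRUE_COUNT = 1234   # hidden count the attacker wants to recover (illustrative value)
NOISE_BOUND = 5     # assumed bound: noise is uniform over the integers -5..5

def perturbed_query():
    """Hypothetical bounded-perturbation oracle: true count plus fresh uniform noise."""
    return TRUE_COUNT + random.randint(-NOISE_BOUND, NOISE_BOUND)

def averaging_attack(num_queries):
    """Estimate the true count by averaging independently perturbed answers."""
    total = sum(perturbed_query() for _ in range(num_queries))
    return round(total / num_queries)

if __name__ == "__main__":
    for n in (10, 100, 400):
        print(f"{n:4d} queries -> estimate {averaging_attack(n)} (true count {TRUE_COUNT})")

Under these assumptions the estimation error shrinks roughly as the square root of the number of queries, which is consistent with the abstract's observation that a few hundred queries suffice to recover a true count with high probability.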