Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy

09/15/2021
by Héber H. Arcolezi, et al.

With local differential privacy (LDP), users privatize their data before transmitting it to the server (a.k.a. the aggregator), so the privacy guarantee holds before the data ever leaves the user. One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users holding each possible value. In practice, a study with rich content on a population is interested in multiple attributes of that population, that is, in multidimensional data (d ≥ 2). However, unlike frequency estimation of a single attribute (the focus of most prior work), the multidimensional setting demands particular attention to the privacy budget, which can grow extremely quickly under the composition theorem. To the authors' knowledge, two solutions stand out for this task: 1) splitting the privacy budget across attributes, i.e., sending each value with ϵ/d-LDP (Spl), and 2) randomly sampling a single attribute and spending the whole privacy budget to send it with ϵ-LDP (Smp). Although Smp introduces sampling error, it has been shown to provide higher data utility than Spl. However, we argue that with Smp the aggregator (who is also seen as an attacker) knows which attribute was sampled and sees its LDP value, which is protected only by the "less strict" probability bound e^ϵ (rather than e^(ϵ/d)). We therefore propose a solution named Random Sampling plus Fake Data (RS+FD), which creates uncertainty about the sampled attribute by generating fake data for each non-sampled attribute; RS+FD further benefits from amplification by sampling. We validate the proposed solution theoretically and experimentally on both synthetic and real-world datasets, showing that RS+FD achieves nearly the same or better utility than the state-of-the-art Smp solution.
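The three client-side strategies named in the abstract can be contrasted in a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: generalized randomized response (GRR) as the underlying LDP protocol, uniformly random fake values, the standard amplification-by-sampling bound ϵ' = ln(d(e^ϵ − 1) + 1) for a sampling rate of 1/d, and all function names (`grr`, `amplified_eps`, `spl_report`, `smp_report`, `rsfd_report`), which are hypothetical.

```python
import math
import random


def grr(value, k, eps, rng):
    """Generalized randomized response over the domain {0, ..., k-1}.

    Reports the true value with probability e^eps / (e^eps + k - 1),
    otherwise one of the k-1 other values uniformly at random.
    """
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    # Skip over the true value so it is never chosen in this branch.
    return other if other < value else other + 1


def amplified_eps(eps, d):
    """Assumed amplification-by-sampling bound for sampling rate 1/d."""
    return math.log(d * (math.exp(eps) - 1) + 1)


def spl_report(record, sizes, eps, rng):
    """Spl: report every attribute, each with budget eps/d."""
    d = len(record)
    return [grr(v, k, eps / d, rng) for v, k in zip(record, sizes)]


def smp_report(record, sizes, eps, rng):
    """Smp: report one sampled attribute with the full budget.

    The sampled index j is disclosed to the aggregator.
    """
    j = rng.randrange(len(record))
    return j, grr(record[j], sizes[j], eps, rng)


def rsfd_report(record, sizes, eps, rng):
    """RS+FD sketch: one privatized value hidden among fake values.

    The sampled index j is NOT part of the report.
    """
    d = len(record)
    j = rng.randrange(d)
    # Assumption: fake data is drawn uniformly from each attribute's domain.
    report = [rng.randrange(k) for k in sizes]
    report[j] = grr(record[j], sizes[j], amplified_eps(eps, d), rng)
    return report


rng = random.Random(0)
record, sizes = [2, 0, 4], [3, 2, 5]  # d = 3 attributes and their domain sizes
print(spl_report(record, sizes, 1.0, rng))
print(smp_report(record, sizes, 1.0, rng))
print(rsfd_report(record, sizes, 1.0, rng))
```

Under these assumptions, the Smp report is a pair (attribute index, value), whereas the RS+FD report is a full d-tuple in which the aggregator cannot tell which entry is the privatized one. The sketch covers only the client side; the aggregator would also need an estimator that corrects for both the randomization and the fake values, which is not shown.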

