Generating Poisson-Distributed Differentially Private Synthetic Data

06/02/2019
by   Harrison Quick, et al.
0

The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available while reducing the risk of disclosure associated with releasing the sensitive data directly. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, the utility of the synthetic data is often an afterthought. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially-referenced synthetic data with high utility, albeit without formal privacy guarantees. The objective for this paper is to help bridge the gap between the disease mapping and the formal privacy literatures. In particular, we extend an existing approach for generating formally private synthetic data to the case of Poisson-distributed count data in a way that allows for the infusion of prior information. To evaluate the utility of the synthetic data, we conducted a simulation study inspired by publicly available, county-level heart disease-related death counts. The results of this study demonstrate that the proposed approach for generating differentially private synthetic data outperforms a popular technique when the counts correspond to events arising from subgroups with unequal population sizes or unequal event rates.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/16/2020

Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data

Recent advances in generating synthetic data that allow to add principle...
research
01/25/2022

A Latent Class Modeling Approach for Generating Synthetic Data and Making Posterior Inferences from Differentially Private Counts

Several algorithms exist for creating differentially private counts from...
research
02/11/2023

Algorithmically Effective Differentially Private Synthetic Data

We present a highly effective algorithmic approach for generating ε-diff...
research
08/26/2022

Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

Differential privacy mechanisms are increasingly used to enable public r...
research
09/03/2021

Privacy of synthetic data: a statistical framework

Privacy-preserving data analysis is emerging as a challenging problem wi...

Please sign up or login with your details

Forgot password? Click here to reset