pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity

05/23/2018
by Joshua Snoke, et al.

We propose a method for releasing differentially private synthetic datasets. In many contexts, data contain sensitive values that cannot be released in their original form without compromising individuals' privacy. Synthetic data is a protection method that releases alternative values in place of the original ones, and differential privacy (DP) is a formal guarantee that quantifies the privacy loss. Our method maximizes the distributional similarity of the synthetic data to the original data, as measured by the propensity score mean-squared error (pMSE), while guaranteeing ε-differential privacy. We also relax common DP assumptions concerning the distribution and boundedness of the original data. We prove theoretical results for the privacy guarantee and provide simulations of the empirical failure rate of these results under typical computational limitations. We further simulate the accuracy of linear regression coefficients estimated from the synthetic data, comparing it with the accuracy obtained from non-differentially private synthetic data and from other differentially private methods. Finally, our theoretical results extend a prior result on the sensitivity of the Gini Index to include continuous predictors.
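
The pMSE is computed by stacking the original and synthetic rows, fitting a model that predicts whether each row is synthetic, and averaging the squared deviations of the predicted propensities p̂_i from c = n_synthetic / N, i.e. pMSE = (1/N) Σ (p̂_i − c)². The Python sketch below illustrates that definition together with a generic exponential mechanism that favors candidates with low pMSE. It is not the authors' implementation, and several pieces are assumptions: the propensity model is a fixed one-variable histogram (a tree-style partition in which each bin's score is its share of synthetic rows), the mechanism chooses among a finite list of candidate synthetic datasets, and the `sensitivity` argument is a hypothetical placeholder rather than the bound derived in the paper. The helper names `pmse` and `exponential_mechanism` are illustrative only.

```python
import bisect
import math
import random
from collections import Counter
from typing import Optional, Sequence


def pmse(original: Sequence[float], synthetic: Sequence[float],
         bin_edges: Sequence[float]) -> float:
    """pMSE using a one-variable histogram as the propensity model (assumption)."""
    n_total = len(original) + len(synthetic)
    c = len(synthetic) / n_total  # share of rows labeled synthetic (T = 1)
    # bisect_right maps each value to a bin index in 0..len(bin_edges).
    orig_counts = Counter(bisect.bisect_right(bin_edges, x) for x in original)
    syn_counts = Counter(bisect.bisect_right(bin_edges, x) for x in synthetic)
    total = 0.0
    for b in set(orig_counts) | set(syn_counts):
        n_b = orig_counts[b] + syn_counts[b]
        p_hat = syn_counts[b] / n_b  # propensity score shared by every row in bin b
        total += n_b * (p_hat - c) ** 2
    return total / n_total


def exponential_mechanism(original: Sequence[float],
                          candidates: Sequence[Sequence[float]],
                          bin_edges: Sequence[float],
                          epsilon: float,
                          sensitivity: float,  # hypothetical placeholder value
                          seed: Optional[int] = None) -> int:
    """Sample a candidate index with probability proportional to
    exp(-epsilon * pMSE / (2 * sensitivity)); lower pMSE is more likely."""
    rng = random.Random(seed)
    scores = [pmse(original, cand, bin_edges) for cand in candidates]
    best = min(scores)
    # Shifting by the best score leaves the probabilities unchanged and keeps every weight <= 1.
    weights = [math.exp(-epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]


# Usage: a candidate that matches the original bin counts has pMSE 0.
original = [0.1, 0.2, 0.8, 0.9]
candidates = [[0.6, 0.7, 0.8, 0.9], [0.15, 0.25, 0.75, 0.85]]
print(pmse(original, candidates[1], [0.5]))  # 0.0
chosen = exponential_mechanism(original, candidates, [0.5], epsilon=1.0, sensitivity=0.1, seed=0)
```

When the synthetic sample reproduces the original bin proportions, every bin has p̂ = c and the pMSE is 0. When the two samples are perfectly separated, the pMSE reaches its maximum c(1 − c), which is 0.25 for equal sample sizes. Larger values of `epsilon` concentrate the selection on the lowest-pMSE candidate, at the cost of a weaker privacy guarantee.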

research
01/27/2020

DP-CGAN: Differentially Private Synthetic Data and Label Generation

Generative Adversarial Networks (GANs) are one of the well-known models ...
research
01/01/2021

Disclosure Risk from Homogeneity Attack in Differentially Private Frequency Distribution

Homogeneity attack allows adversaries to obtain the exact values on the ...
research
09/05/2023

Differentially Private Synthetic Heavy-tailed Data

The U.S. Census Longitudinal Business Database (LBD) product contains em...
research
06/03/2020

One Step to Efficient Synthetic Data

We propose a general method of producing synthetic data, which is widely...
research
03/31/2023

On Rényi Differential Privacy in Statistics-Based Synthetic Data Generation

Privacy protection with synthetic data generation often uses differentia...
research
12/30/2020

PrivSyn: Differentially Private Data Synthesis

In differential privacy (DP), a challenging problem is to generate synth...