Private sampling: a noiseless approach for generating differentially private synthetic data

09/30/2021
by March Boedihardjo, et al.

In a world where artificial intelligence and data science are becoming omnipresent, data sharing is increasingly locking horns with data-privacy concerns. Differential privacy has emerged as a rigorous framework for protecting individual privacy in a statistical database while still releasing useful statistical information about the database. The standard way to implement differential privacy is to inject a sufficient amount of noise into the data. However, in addition to other limitations of differential privacy, this process of adding noise affects data accuracy and utility. Another approach to enabling privacy in data sharing is based on the concept of synthetic data. The goal of synthetic data is to create a dataset that is as realistic as possible, one that not only maintains the nuances of the original data, but does so without the risk of exposing sensitive information. The combination of differential privacy with synthetic data has been suggested as a best-of-both-worlds solution. In this work, we propose the first noise-free method to construct differentially private synthetic data; we do this through a mechanism called "private sampling". Using the Boolean cube as a benchmark data model, we derive explicit bounds on the accuracy and privacy of the constructed synthetic data. The key mathematical tools are hypercontractivity, duality, and empirical processes. A core ingredient of our private sampling mechanism is a rigorous "marginal correction" method, which has the remarkable property that importance reweighting can be utilized to exactly match the marginals of the sample to the marginals of the population.
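
For context, the noise-injection baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the classical Laplace mechanism for a counting query on Boolean-cube records, not the paper's private sampling method; the function names `laplace_noise` and `private_count`, the toy records, and the choice of query are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy for this single query.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: points of the Boolean cube {0, 1}^3.
records = [(1, 0, 1), (0, 0, 1), (1, 1, 1), (0, 1, 0)]

# Noisy answer to "how many records have first coordinate equal to 1?"
noisy = private_count(records, lambda r: r[0] == 1, epsilon=0.5,
                      rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is the accuracy cost the abstract refers to.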

research
04/16/2020

Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data

Recent advances in generating synthetic data that allow to add principle...
research
03/31/2023

On Rényi Differential Privacy in Statistics-Based Synthetic Data Generation

Privacy protection with synthetic data generation often uses differentia...
research
04/20/2022

Private measures, random walks, and synthetic data

Differential privacy is a mathematical concept that provides an informat...
research
10/30/2019

Chasing Accuracy and Privacy, and Catching Both: A Literature Survey on Differentially Private Histogram Publication

Histograms and synthetic data are of key importance in data analysis. Ho...
research
05/28/2022

MC-GEN: Multi-level Clustering for Private Synthetic Data Generation

Nowadays, machine learning is one of the most common technology to turn ...
research
08/04/2021

Privacy-Preserving Synthetic Location Data in the Real World

Sharing sensitive data is vital in enabling many modern data analysis an...
research
08/26/2017

Plausible Deniability for Privacy-Preserving Data Synthesis

Releasing full data records is one of the most challenging problems in d...
