Privacy of synthetic data: a statistical framework

09/03/2021
by March Boedihardjo, et al.

Privacy-preserving data analysis is emerging as a challenging problem with far-reaching impact. In particular, synthetic data are a promising concept toward solving the aporetic conflict between data privacy and data sharing. Yet, it is known that accurately generating private, synthetic data of certain kinds is NP-hard. We develop a statistical framework for differentially private synthetic data, which enables us to circumvent the computational hardness of the problem. We consider the true data as a random sample drawn from a population Omega according to some unknown density. We then replace Omega by a much smaller random subset Omega^*, which we sample according to some known density. We generate synthetic data on the reduced space Omega^* by fitting the specified linear statistics obtained from the true data. To ensure privacy, we use the common Laplacian mechanism. Employing the concept of the Rényi condition number, which measures how well the sampling distribution is correlated with the population distribution, we derive explicit bounds on the privacy and accuracy provided by the proposed method.
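
The pipeline described in the abstract (privatize linear statistics with Laplace noise, sample a reduced space from a known density, fit weights on that space to the noisy statistics) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the data live in [0, 1], the test functions take values in [0, 1], the known sampling density is uniform, and the fitting step uses a simple exponentiated-gradient loop chosen here for brevity. All function names (`laplace_noise`, `private_statistics`, `fit_weights`) are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_statistics(data, test_functions, epsilon, rng):
    """Noisy empirical means of test functions with values in [0, 1]."""
    n, m = len(data), len(test_functions)
    # Replacing one record moves each mean by at most 1/n, so the L1
    # sensitivity of the vector of m means is at most m/n.
    scale = m / (n * epsilon)
    return [sum(f(x) for x in data) / n + laplace_noise(scale, rng)
            for f in test_functions]


def fit_weights(points, test_functions, targets, steps=500, eta=0.5):
    """Fit a probability vector on `points` to the target statistics.

    Assumption: exponentiated-gradient descent on the squared error,
    used only as a stand-in for the fitting step. It touches the noisy
    statistics only, so it is post-processing of a private release.
    """
    k = len(points)
    feats = [[f(z) for f in test_functions] for z in points]
    w = [1.0 / k] * k
    for _ in range(steps):
        residual = [sum(w[i] * feats[i][j] for i in range(k)) - t
                    for j, t in enumerate(targets)]
        grad = [2.0 * sum(r * fj for r, fj in zip(residual, feats[i]))
                for i in range(k)]
        w = [wi * math.exp(-eta * g) for wi, g in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w


rng = random.Random(0)

# Stand-in for the true data: a sample from a density unknown to the method.
true_data = [rng.betavariate(2, 5) for _ in range(500)]

# Linear statistics: means of bounded test functions.
test_functions = [lambda x: x, lambda x: x * x, lambda x: float(x < 0.25)]

noisy = private_statistics(true_data, test_functions, epsilon=1.0, rng=rng)

# Reduced space Omega^*: a small sample from a known (here uniform) density.
reduced_space = [rng.random() for _ in range(50)]

weights = fit_weights(reduced_space, test_functions, noisy)

# Synthetic data: resample points of the reduced space by fitted weight.
synthetic = rng.choices(reduced_space, weights=weights, k=200)
```

Only `private_statistics` reads the true data, so in this sketch the privacy guarantee rests entirely on the Laplace noise scale; the reduced-space sampling and the fitting never see individual records.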


research
04/16/2020

Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data

Recent advances in generating synthetic data that allow to add principle...
research
12/10/2019

Privacy-preserving data sharing via probabilistic modelling

Differential privacy allows quantifying privacy loss from computations o...
research
07/13/2021

Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

The protection of private information is of vital importance in data-dri...
research
03/03/2023

Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation

Maximum mean discrepancy (MMD) is a particularly useful distance metric ...
research
06/02/2019

Generating Poisson-Distributed Differentially Private Synthetic Data

The dissemination of synthetic data can be an effective means of making ...
research
04/22/2023

Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders

Synthetic data has been hailed as the silver bullet for privacy preservi...
research
10/13/2022

Secure Multiparty Computation for Synthetic Data Generation from Distributed Data

Legal and ethical restrictions on accessing relevant data inhibit data s...