Privacy-preserving data sharing via probabilistic modelling

12/10/2019
by   Joonas Jälkö, et al.
29

Differential privacy allows quantifying privacy loss from computations on sensitive personal data. This loss grows with the number of accesses to the data, making it hard to open the use of such data while respecting privacy. To avoid this limitation, we propose privacy-preserving release of a synthetic version of a data set, which can be used for an unlimited number of analyses with any methods, without affecting the privacy guarantees. The synthetic data generation is based on differentially private learning of a generative probabilistic model which can capture the probability distribution of the original data. We demonstrate empirically that we can reliably reproduce statistical discoveries from the synthetic data. We expect the method to have broad use in sharing anonymized versions of key data sets for research.

READ FULL TEXT

page 2

page 6

research
04/16/2020

Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data

Recent advances in generating synthetic data that allow to add principle...
research
08/23/2018

Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains

Techniques to deliver privacy-preserving synthetic datasets take a sensi...
research
07/04/2023

Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning

The availability of large amounts of informative data is crucial for suc...
research
07/13/2021

Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

The protection of private information is of vital importance in data-dri...
research
02/27/2012

Marginality: a numerical mapping for enhanced treatment of nominal and hierarchical attributes

The purpose of statistical disclosure control (SDC) of microdata, a.k.a....
research
05/28/2022

MC-GEN:Multi-level Clustering for Private Synthetic Data Generation

Nowadays, machine learning is one of the most common technology to turn ...
research
09/03/2021

Privacy of synthetic data: a statistical framework

Privacy-preserving data analysis is emerging as a challenging problem wi...

Please sign up or login with your details

Forgot password? Click here to reset