PrivSyn: Differentially Private Data Synthesis

by Zhikun Zhang, et al.

In differential privacy (DP), a challenging problem is to generate synthetic datasets that efficiently capture the useful information in the private data. A synthetic dataset allows any task to be performed without privacy concerns and without modifying existing algorithms. In this paper, we present PrivSyn, the first automatic synthetic data generation method that can handle general tabular datasets (with 100 attributes and domain size >2^500). PrivSyn is composed of a new method to automatically and privately identify correlations in the data, and a novel method to generate sample data from a dense graphical model. We extensively evaluate different methods on multiple datasets to demonstrate the performance of our method.
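The abstract does not spell out how correlations are measured privately. As a generic illustration only, and not PrivSyn's actual algorithm, the sketch below shows one common building block in marginal-based DP synthesis: a two-way contingency table perturbed with Gaussian noise. The function name `noisy_two_way_marginal`, the noise scale `sigma`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def noisy_two_way_marginal(data, i, j, dom_i, dom_j, sigma, rng):
    """Count co-occurrences of columns i and j, then add Gaussian noise.

    Hypothetical helper for illustration; `sigma` is chosen by the caller
    and is not calibrated to any particular privacy budget here.
    """
    counts = np.zeros((dom_i, dom_j), dtype=float)
    # np.add.at accumulates correctly when the same cell is indexed repeatedly.
    np.add.at(counts, (data[:, i], data[:, j]), 1.0)
    # Each record falls in exactly one cell, so adding or removing one record
    # changes the table by at most 1 in L2 norm (sensitivity 1).
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return counts, noisy

rng = np.random.default_rng(0)
# Toy dataset: 1000 records, 4 categorical attributes with 3 values each.
data = rng.integers(0, 3, size=(1000, 4))
exact, noisy = noisy_two_way_marginal(data, 0, 1, 3, 3, sigma=2.0, rng=rng)
print(np.round(noisy, 1))
```

With L2 sensitivity 1, Gaussian noise of standard deviation `sigma` satisfies zero-concentrated DP with parameter 1/(2·sigma²); how a total budget would be split across many marginals is left out of this sketch.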





DP-CGAN: Differentially Private Synthetic Data and Label Generation

Generative Adversarial Networks (GANs) are one of the well-known models ...

pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity

We propose a method for the release of differentially private synthetic ...

Kamino: Constraint-Aware Differentially Private Data Synthesis

Organizations are increasingly relying on data to support decisions. Whe...

Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

We propose a general approach for differentially private synthetic data ...

Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods

We study private synthetic data generation for query release, where the ...

Privacy of synthetic data: a statistical framework

Privacy-preserving data analysis is emerging as a challenging problem wi...

Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Deep learning models have demonstrated superior performance in several a...