Differentially Private Synthetic Data Using KD-Trees

06/19/2023
by Eleonora Kreačić, et al.

Creating a synthetic dataset that faithfully represents the data distribution while preserving privacy is a major research challenge. Many space-partitioning-based approaches have emerged in recent years for answering statistical queries in a differentially private manner. For the synthetic data generation problem, however, recent research has focused mainly on deep generative models. In contrast, we combine space partitioning techniques with noise perturbation, which yields intuitive and transparent algorithms. We propose both data-independent and data-dependent algorithms for ϵ-differentially private synthetic data generation whose kernel density resembles that of the real dataset. Additionally, we provide theoretical results on the utility-privacy trade-offs and show how our data-dependent approach overcomes the curse of dimensionality and leads to a scalable algorithm. We show empirical utility improvements over prior work and discuss the performance of our algorithm on a downstream classification task on a real dataset.
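To make the combination of space partitioning and noise perturbation concrete, here is a minimal Python sketch of a data-independent variant. It is a simplified illustration under stated assumptions, not the authors' algorithm: the function name `dp_kdtree_synthetic`, midpoint splits that cycle through the dimensions, a fixed tree depth, Laplace noise on the leaf counts, and uniform sampling within each leaf are all hypothetical choices, and the records are assumed to lie inside public, known bounds. Because the splits never look at the data, only the leaf counts consume privacy budget. The leaves partition the domain, so adding or removing one record changes exactly one count by 1, and Laplace noise with scale 1/ϵ then gives ϵ-differential privacy. Clipping, normalizing, and sampling are post-processing and do not weaken that guarantee.

```python
import numpy as np


def dp_kdtree_synthetic(data, epsilon, depth, bounds, n_synth, rng=None):
    """Hypothetical sketch of data-independent DP synthetic data via a KD-tree.

    Assumptions (illustrative only): midpoint splits cycling through the
    dimensions, fixed depth, Laplace noise on leaf counts, and uniform
    sampling inside each leaf. `bounds` is a public (lower, upper) box
    that must contain every record.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    d = data.shape[1]

    # Each leaf is (lower corner, upper corner, points falling inside it).
    leaves = [(np.array(bounds[0], float), np.array(bounds[1], float), data)]
    for level in range(depth):
        axis = level % d  # cycle through dimensions, as in a KD-tree
        new_leaves = []
        for lo, hi, pts in leaves:
            mid = (lo[axis] + hi[axis]) / 2.0  # data-independent split point
            left_hi = hi.copy()
            left_hi[axis] = mid
            right_lo = lo.copy()
            right_lo[axis] = mid
            mask = pts[:, axis] < mid
            new_leaves.append((lo, left_hi, pts[mask]))
            new_leaves.append((right_lo, hi, pts[~mask]))
        leaves = new_leaves

    # Leaves partition the domain: one record affects one count by 1, so the
    # L1 sensitivity is 1 and Laplace scale 1/epsilon gives epsilon-DP.
    noisy = np.array([len(p) + rng.laplace(scale=1.0 / epsilon) for _, _, p in leaves])

    # Post-processing: clip negative counts and normalize to a distribution.
    weights = np.clip(noisy, 0.0, None)
    if weights.sum() == 0:
        weights = np.ones(len(leaves))
    probs = weights / weights.sum()

    # Pick a leaf per synthetic record, then sample uniformly inside its box.
    idx = rng.choice(len(leaves), size=n_synth, p=probs)
    lows = np.array([leaves[i][0] for i in idx])
    highs = np.array([leaves[i][1] for i in idx])
    return rng.uniform(lows, highs)
```

For example, `dp_kdtree_synthetic(X, epsilon=1.0, depth=4, bounds=([0, 0], [1, 1]), n_synth=300)` on two-dimensional data in the unit square returns 300 synthetic points drawn from 16 noisy leaf cells. A data-dependent variant would instead choose split points from the data (for example, noisy medians) and would have to spend part of ϵ on those choices; this sketch does not attempt that.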
