DP-PQD: Privately Detecting Per-Query Gaps In Synthetic Data Generated By Black-Box Mechanisms

09/15/2023
by   Shweta Patwa, et al.
0

Synthetic data generation methods, and in particular, private synthetic data generation methods, are gaining popularity as a means to make copies of sensitive databases that can be shared widely for research and data analysis. Some of the fundamental operations in data analysis include analyzing aggregated statistics, e.g., count, sum, or median, on a subset of data satisfying some conditions. When synthetic data is generated, users may be interested in knowing if their aggregated queries generating such statistics can be reliably answered on the synthetic data, for instance, to decide if the synthetic data is suitable for specific tasks. However, the standard data generation systems do not provide "per-query" quality guarantees on the synthetic data, and the users have no way of knowing how much the aggregated statistics on the synthetic data can be trusted. To address this problem, we present a novel framework named DP-PQD (differentially-private per-query decider) to detect if the query answers on the private and synthetic datasets are within a user-specified threshold of each other while guaranteeing differential privacy. We give a suite of private algorithms for per-query deciders for count, sum, and median queries, analyze their properties, and evaluate them experimentally.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/20/2022

PreFair: Privately Generating Justifiably Fair Synthetic Data

When a database is protected by Differential Privacy (DP), its usability...
research
06/13/2022

Private Synthetic Data with Hierarchical Structure

We study the problem of differentially private synthetic data generation...
research
05/28/2022

Noise-Aware Statistical Inference with Differentially Private Synthetic Data

While generation of synthetic data under differential privacy (DP) has r...
research
12/30/2020

PrivSyn: Differentially Private Data Synthesis

In differential privacy (DP), a challenging problem is to generate synth...
research
09/28/2020

Oblivious Sampling Algorithms for Private Data Analysis

We study secure and privacy-preserving data analysis based on queries ex...
research
06/14/2021

Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods

We study private synthetic data generation for query release, where the ...
research
06/05/2023

Generating Private Synthetic Data with Genetic Algorithms

We study the problem of efficiently generating differentially private sy...

Please sign up or login with your details

Forgot password? Click here to reset