Noise-Aware Statistical Inference with Differentially Private Synthetic Data

05/28/2022
by Ossi Räisä, et al.

While the generation of synthetic data under differential privacy (DP) has received a lot of attention in the data privacy community, the analysis of synthetic data has received much less. Existing work has shown that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities. For example, confidence intervals become too narrow, which we demonstrate with a simple experiment. We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation with synthetic data generation using noise-aware Bayesian modeling into a pipeline, NA+MI, that allows computing accurate uncertainty estimates for population-level quantities from DP synthetic data. To implement NA+MI for discrete data generation from marginal queries, we develop a novel noise-aware synthetic data generation algorithm, NAPSU-MQ, using the principle of maximum entropy. Our experiments demonstrate that the pipeline is able to produce accurate confidence intervals from DP synthetic data. The intervals become wider with tighter privacy to accurately capture the additional uncertainty stemming from DP noise.
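To make the multiple-imputation step more concrete, the following is a minimal sketch of one widely used set of combining rules for fully synthetic data (Raghunathan et al., 2003). The abstract does not say which rules NA+MI uses, so treating these as the relevant ones is an assumption. The function name `combine_synthetic_estimates`, its parameters, the normal-approximation interval (instead of a t-distribution), and the choice to raise an error on a non-positive variance estimate are all hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist, mean, variance


def combine_synthetic_estimates(estimates, variances, level=0.95):
    """Combine per-dataset results from m synthetic datasets.

    estimates: point estimates q_1..q_m, one per synthetic dataset.
    variances: the corresponding variance estimates v_1..v_m.
    Returns (point estimate, total variance, (lower, upper)).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two datasets and matching lengths")

    q_bar = mean(estimates)      # combined point estimate
    b = variance(estimates)      # between-dataset (sample) variance
    v_bar = mean(variances)      # average within-dataset variance

    # Combining rule for fully synthetic data: T = (1 + 1/m) * b - v_bar.
    total_var = (1 + 1 / m) * b - v_bar
    if total_var <= 0:
        # The estimate can be non-positive for small m. Raising here is a
        # hypothetical choice for this sketch, not a recommended remedy.
        raise ValueError("non-positive variance estimate; try more datasets")

    # Simplification: normal quantile instead of a t-distribution quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(total_var)
    return q_bar, total_var, (q_bar - half_width, q_bar + half_width)


if __name__ == "__main__":
    # Made-up numbers, purely to show the call shape.
    q, t, (lo, hi) = combine_synthetic_estimates([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
    print(q, t, lo, hi)
```

In this sketch, the between-dataset variance `b` is what lets the interval widen when the synthetic datasets disagree with each other, which is the behaviour the abstract describes under tighter privacy.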

research
12/20/2022

PreFair: Privately Generating Justifiably Fair Synthetic Data

When a database is protected by Differential Privacy (DP), its usability...
research
10/12/2022

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

Differential private (DP) mechanisms protect individual-level informatio...
research
09/15/2023

DP-PQD: Privately Detecting Per-Query Gaps In Synthetic Data Generated By Black-Box Mechanisms

Synthetic data generation methods, and in particular, private synthetic ...
research
01/21/2023

Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms

Marginal-based methods achieve promising performance in the synthetic da...
research
10/28/2021

Privacy Preserving Inference on the Ratio of Two Gaussians Using (Weighted) Sums

The ratio of two Gaussians is useful in many contexts of statistical inf...
research
04/27/2022

Spending Privacy Budget Fairly and Wisely

Differentially private (DP) synthetic data generation is a practical met...
