Comparative Study of Differentially Private Synthetic Data Algorithms and Evaluation Standards

11/28/2019
by Claire McKay Bowen et al.

Differentially private synthetic data generation is becoming a popular way to release analytically useful data while preserving the privacy of the individuals in the data. To use these algorithms for public policy decisions, policymakers need an accurate understanding of the algorithms' comparative performance. Likewise, data practitioners need standard metrics for evaluating the analytic quality of the synthetic data. In this paper, we present an in-depth evaluation of several differentially private synthetic data algorithms, using the actual differentially private synthetic data sets created by contestants in the National Institute of Standards and Technology's (NIST) recent "Differentially Private Synthetic Data Challenge." We offer both theoretical and practical analyses of these algorithms, and we frame the NIST data challenge methods within the broader differentially private synthetic data literature. In addition, we apply two of our own utility metric algorithms to the differentially private synthetic data and compare the results of these metrics with the NIST data challenge outcome. Our comparative assessment of the differentially private data synthesis methods and the quality metrics shows their relative usefulness, their general strengths and weaknesses, and the preferred choices of algorithms and metrics. Finally, we discuss the implications of our evaluation for policymakers seeking to implement differentially private synthetic data algorithms on future data products.

research
07/01/2023

When Synthetic Data Met Regulation

In this paper, we argue that synthetic data produced by Differentially P...
research
11/11/2020

Differentially Private Synthetic Data: Applied Evaluations and Enhancements

Machine learning practitioners frequently seek to leverage the most info...
research
06/03/2022

Utility and Disclosure Risk for Differentially Private Synthetic Categorical Data

This paper introduces two methods of creating differentially private (DP...
research
03/03/2023

Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation

Maximum mean discrepancy (MMD) is a particularly useful distance metric ...
research
01/25/2022

A Latent Class Modeling Approach for Generating Synthetic Data and Making Posterior Inferences from Differentially Private Counts

Several algorithms exist for creating differentially private counts from...
research
03/24/2023

Differentially Private Synthetic Control

Synthetic control is a causal inference tool used to estimate the treatm...
research
06/19/2023

Differentially Private Synthetic Data Using KD-Trees

Creation of a synthetic dataset that faithfully represents the data dist...
