dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation

07/12/2022
by Sofiane Mahiou, et al.

We propose dpart, a general, flexible, and scalable framework and open-source Python library for differentially private synthetic data generation. Central to the approach is autoregressive modelling: the joint data distribution is broken into a sequence of lower-dimensional conditional distributions, each captured by one of various methods, such as machine learning models (logistic/linear regression, decision trees, etc.), simple histogram counts, or custom techniques. The library was created to serve as a quick and accessible baseline and to accommodate a wide audience of users, from those taking their first steps in synthetic data generation to more experienced users with domain expertise, who can configure different aspects of the modelling and contribute new methods and mechanisms. Specific instances of dpart include Independent, an optimized version of PrivBayes, and a newly proposed model, dp-synthpop. Code: https://github.com/hazy/dpart
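
The autoregressive idea in the abstract rests on the chain-rule factorization P(x_1, ..., x_d) = P(x_1) · ∏_{i=2}^{d} P(x_i | x_1, ..., x_{i-1}). The sketch below illustrates it in plain Python under stated assumptions; it is not dpart's API and does not reproduce any of its engines. All function and column names (laplace, noisy_counts, normalise, fit, draw, sample, age_band, ...) are hypothetical. It assumes categorical columns with public domains, conditions each column only on the immediately preceding one (a simplifying first-order assumption; dpart's conditionals can use richer context), and captures each conditional with Laplace-noised histogram counts, one of the method families the abstract lists.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(rows, key_fn, keys, scale, rng):
    # Histogram over a public domain `keys`, Laplace noise per cell, negatives clipped to 0.
    counts = Counter(key_fn(r) for r in rows)
    return {k: max(0.0, counts[k] + laplace(scale, rng)) for k in keys}


def normalise(weights):
    # Turn non-negative weights into probabilities; fall back to uniform if all are 0.
    total = sum(weights.values())
    if total == 0:
        return {k: 1.0 / len(weights) for k in weights}
    return {k: w / total for k, w in weights.items()}


def fit(rows, columns, domains, epsilon, rng):
    # Hypothetical helper: noisy P(x_1) and P(x_i | x_{i-1}) for i = 2..d.
    # Each record falls in exactly one cell of each histogram (L1 sensitivity 1),
    # so d histograms at epsilon / d each compose sequentially to epsilon overall.
    scale = len(columns) / epsilon
    first = columns[0]
    marginal = normalise(
        noisy_counts(rows, lambda r: r[first], domains[first], scale, rng))
    conditionals = []
    for prev, col in zip(columns, columns[1:]):
        cells = list(product(domains[prev], domains[col]))
        noisy = noisy_counts(rows, lambda r: (r[prev], r[col]), cells, scale, rng)
        # Condition on each parent value by normalising its row of the noisy table.
        conditionals.append(
            {p: normalise({c: noisy[(p, c)] for c in domains[col]}) for p in domains[prev]})
    return marginal, conditionals


def draw(dist, rng):
    # Sample one category from a {category: probability} mapping.
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def sample(model, columns, n, rng):
    # Autoregressive sampling: draw x_1, then each x_i given the sampled x_{i-1}.
    marginal, conditionals = model
    out = []
    for _ in range(n):
        row = {columns[0]: draw(marginal, rng)}
        for prev, col, table in zip(columns, columns[1:], conditionals):
            row[col] = draw(table[row[prev]], rng)
        out.append(row)
    return out


# Toy usage with made-up data and public domains.
rng = random.Random(0)
columns = ["age_band", "smoker", "risk"]
domains = {"age_band": ["young", "old"], "smoker": ["no", "yes"], "risk": ["low", "high"]}
rows = ([{"age_band": "young", "smoker": "no", "risk": "low"}] * 60
        + [{"age_band": "old", "smoker": "yes", "risk": "high"}] * 40)
model = fit(rows, columns, domains, epsilon=1.0, rng=rng)
synthetic = sample(model, columns, n=5, rng=rng)
```

Splitting epsilon evenly across the d histograms is the simplest budget allocation and is chosen here only for clarity; clipping and normalising are post-processing and cost no extra budget. Python's `random` module and naive floating-point Laplace sampling are not suitable for production differential privacy.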
