Differentially Private Mixed-Type Data Generation For Unsupervised Learning

12/06/2019
by Uthaipon Tantipongpipat, et al.

In this work, we introduce the DP-auto-GAN framework for synthetic data generation, which combines the low-dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). The framework takes in raw sensitive data and privately trains a model for generating synthetic data that satisfies the same statistical properties as the original data. The learned model can then generate arbitrary amounts of synthetic data, which can be made publicly available and freely shared thanks to the post-processing guarantee of differential privacy. Our framework applies to unlabeled mixed-type data, which may include binary, categorical, and real-valued features. We implement it on both unlabeled binary data (MIMIC-III) and unlabeled mixed-type data (ADULT). We also introduce new metrics for evaluating the quality of synthetic mixed-type data, particularly in unsupervised settings.
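The abstract does not say which mechanism is used to train the model privately. As a rough illustration only, the sketch below assumes a clipped, noisy gradient average in the style of DP-SGD, a common way to train neural networks under differential privacy. The function name `private_mean_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and are not taken from the paper.

```python
import math
import random


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Average per-example gradients after clipping and adding Gaussian noise.

    Assumption: this mirrors the generic DP-SGD recipe, not necessarily the
    exact procedure used by DP-auto-GAN.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        # Rescale each example's gradient so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(x * x for x in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(grad):
            total[i] += x * scale
    # Gaussian noise calibrated to the clipping bound hides any single example.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Illustrative call with made-up gradients for three examples.
rng = random.Random(0)
grads = [[30.0, 40.0], [0.3, 0.4], [0.0, 0.0]]
update = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any one record can change the update, and the noise masks what remains. Once a generator has been trained this way, sampling from it touches no sensitive data, which is why the post-processing guarantee lets the synthetic samples be shared.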

research
01/27/2020

DP-CGAN: Differentially Private Synthetic Data and Label Generation

Generative Adversarial Networks (GANs) are one of the well-known models ...
research
12/22/2020

Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Deep learning models have demonstrated superior performance in several a...
research
08/13/2020

Synthesizing Property Casualty Ratemaking Datasets using Generative Adversarial Networks

Due to confidentiality issues, it can be difficult to access or share in...
research
08/28/2023

Generating tabular datasets under differential privacy

Machine Learning (ML) is accelerating progress across fields and industr...
research
04/01/2021

Holdout-Based Fidelity and Privacy Assessment of Mixed-Type Synthetic Data

AI-based data synthesis has seen rapid progress over the last several ye...
research
07/07/2023

Programmable Synthetic Tabular Data Generation

Large amounts of tabular data remain underutilized due to privacy, data ...
research
02/16/2021

A Bayesian Framework for Generation of Fully Synthetic Mixed Datasets

Much of the micro data used for epidemiological studies contain sensitiv...
