Improving Correlation Capture in Generating Imbalanced Data using Differentially Private Conditional GANs

06/28/2022
by Chang Sun, et al.

Despite the remarkable success of Generative Adversarial Networks (GANs) on text, images, and videos, generating high-quality tabular data remains an open problem owing to unique challenges such as capturing dependencies in imbalanced data and optimizing the quality of synthetic patient data while preserving privacy. In this paper, we propose DP-CGANS, a differentially private conditional GAN framework consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving tabular data. DP-CGANS distinguishes categorical and continuous variables and transforms them to latent space separately. We then construct a conditional vector as an additional input that not only represents the minority class in the imbalanced data but also captures the dependency between variables. We inject statistical noise into the gradients during the network training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on three public datasets and two real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing dependency between variables. Finally, we present the balance between data utility and privacy in synthetic data generation, considering the different data structures and characteristics of real-world datasets such as imbalanced variables, abnormal distributions, and sparsity of data.
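The abstract does not say exactly how noise is injected into the gradients. As an illustration only, the sketch below follows the common DP-SGD recipe: clip each per-example gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping norm, and average. The function names and the parameters `max_norm` and `noise_multiplier` are assumptions made for this example and are not taken from DP-CGANS.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum them, add Gaussian noise, then average.

    Hypothetical helper for illustration; the noise standard deviation is
    noise_multiplier * max_norm, as in the usual DP-SGD formulation.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, max_norm)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can influence the update, which is what lets the added noise translate into a differential privacy guarantee. Turning a given `noise_multiplier` into a concrete privacy budget requires a separate privacy accountant, which this sketch omits.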

