CTAB-GAN+: Enhancing Tabular Data Synthesis

04/01/2022
by Zilong Zhao, et al.

While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., the European General Data Protection Regulation (GDPR)) limit its full effectiveness. Synthetic tabular data emerges as an alternative that enables data sharing while fulfilling regulatory and privacy constraints. State-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GANs). As GANs improve, the synthesized data increasingly resemble the real data, which risks leaking private information. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. We propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state of the art by (i) adding downstream losses to conditional GANs for higher-utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on data similarity and analysis utility against state-of-the-art tabular GANs. The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 48.16% higher utility across multiple datasets and learning tasks under different privacy budgets.
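The abstract names DP stochastic gradient descent but gives no implementation details. As a rough illustration of the general technique only, and not the authors' code, one DP-SGD update clips each example's gradient to a fixed L2 norm, sums the clipped gradients, adds Gaussian noise scaled to that norm, and averages. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this sketch.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update (illustrative sketch, not the paper's code)."""
    n = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        # L2 norm of this example's gradient
        norm = math.sqrt(sum(x * x for x in grad))
        # Shrink the gradient only if its norm exceeds clip_norm
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, x in enumerate(grad):
            summed[i] += x * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm,
    # added once per coordinate to the summed clipped gradients
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # Average over the batch and take a plain gradient-descent step
    return [p - lr * v / n for p, v in zip(params, noisy)]


rng = random.Random(0)
params = [0.0, 0.0, 0.0]
grads = [[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]]  # toy per-example gradients
# noise_multiplier=0.0 disables the noise so the clipping effect is visible
new_params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

In this toy call the first gradient (norm 5) is scaled down to norm 1 while the second (norm 0.1) passes through unchanged. A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the utility trade-off the abstract describes.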

research
02/16/2021

CTAB-GAN: Effective Table Data Synthesizing

While data sharing is crucial for knowledge development, privacy concern...
research
07/06/2021

DTGAN: Differential Private Training for Tabular GANs

Tabular generative adversarial networks (TGAN) have recently emerged to ...
research
08/13/2020

Synthesizing Property Casualty Ratemaking Datasets using Generative Adversarial Networks

Due to confidentiality issues, it can be difficult to access or share in...
research
11/26/2021

DP-SGD vs PATE: Which Has Less Disparate Impact on GANs?

Generative Adversarial Networks (GANs) are among the most popular approa...
research
03/02/2020

Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees

This paper considers the problem of enhancing user privacy in common mac...
research
11/17/2022

Permutation-Invariant Tabular Data Synthesis

Tabular data synthesis is an emerging approach to circumvent strict regu...
research
05/05/2022

Generative Adversarial Network Based Synthetic Learning and a Novel Domain Relevant Loss Term for Spine Radiographs

Problem: There is a lack of big data for the training of deep learning m...
