Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees

03/02/2020
by Aleksei Triastcyn, et al.

This paper considers the problem of enhancing user privacy in common machine learning development tasks, such as data annotation and inspection, by substituting the real data with samples from a generative adversarial network (GAN). We propose employing Bayesian differential privacy as the means to achieve a rigorous theoretical guarantee while providing a better privacy-utility trade-off. We demonstrate experimentally that our approach produces higher-fidelity samples, compared to prior work, allowing us to (1) detect more subtle data errors and biases, and (2) reduce the need for real data labelling by achieving high accuracy when training directly on artificial samples.
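
Claim (2) corresponds to the workflow sketched below: once a privately trained generator is available, a downstream model can be fitted entirely on artificial labelled samples, so the real data is needed only for final evaluation. This is a minimal PyTorch sketch under illustrative assumptions; the conditional generator, the MLP classifier, and all hyperparameters are hypothetical stand-ins, not the architectures or the Bayesian differential privacy mechanism used in the paper.

```python
# Minimal sketch: train a downstream classifier purely on synthetic samples
# drawn from a (hypothetical) conditional generator. In the paper's setting,
# the generator would be a GAN trained under a Bayesian differential privacy
# guarantee; here it is an untrained stand-in for illustration only.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Hypothetical conditional generator: (noise, label) -> sample."""
    def __init__(self, z_dim=16, n_classes=10, x_dim=32):
        super().__init__()
        self.z_dim, self.n_classes = z_dim, n_classes
        self.net = nn.Sequential(nn.Linear(z_dim + n_classes, 64), nn.ReLU(),
                                 nn.Linear(64, x_dim))

    def forward(self, z, y):
        y_onehot = nn.functional.one_hot(y, self.n_classes).float()
        return self.net(torch.cat([z, y_onehot], dim=1))

def synthetic_batch(gen, batch_size):
    # Draw labels uniformly, then sample matching artificial inputs.
    y = torch.randint(0, gen.n_classes, (batch_size,))
    z = torch.randn(batch_size, gen.z_dim)
    with torch.no_grad():
        x = gen(z, y)
    return x, y

gen = CondGenerator()   # in practice: load the privately trained generator
clf = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# The classifier never touches real records during training.
for step in range(100):
    x, y = synthetic_batch(gen, batch_size=64)
    opt.zero_grad()
    loss = loss_fn(clf(x), y)
    loss.backward()
    opt.step()
```

Because only the generator is trained on real data, the privacy guarantee carries over to anything derived from its samples, including the classifier above.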

Related research

- Generating Differentially Private Datasets Using GANs (03/08/2018)
- Generating tabular datasets under differential privacy (08/28/2023)
- CTAB-GAN+: Enhancing Tabular Data Synthesis (04/01/2022)
- imdpGAN: Generating Private and Specific Data with Generative Adversarial Networks (09/29/2020)
- ABCDP: Approximate Bayesian Computation Meets Differential Privacy (10/11/2019)
- Smooth Anonymity for Sparse Binary Matrices (07/13/2022)
- Enhancing Utility in the Watchdog Privacy Mechanism (10/10/2021)
