Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders

04/22/2023
by Benedikt Groß, et al.

Synthetic data has been hailed as a silver bullet for privacy-preserving data analysis: if a record is not real, how could it violate a person's privacy? Moreover, deep-learning-based generative models are successfully employed to approximate complex high-dimensional distributions from data and to draw realistic samples from the learned distribution. It is often overlooked, though, that generative models are prone to memorising details of individual training records and frequently generate synthetic data that too closely resembles the underlying sensitive training data, thereby violating strict privacy regulations such as those encountered in health care. Differential privacy is the well-known state-of-the-art framework for guaranteeing protection of sensitive individuals' data, allowing aggregate statistics and even machine learning models to be released publicly without compromising privacy. Its training mechanisms, however, often add so much noise during training that they severely compromise the utility of the resulting private models. Worse still, tight privacy budgets do not allow for many training epochs, so model quality cannot be properly controlled in practice. In this paper we explore an alternative approach to privately generating data that makes direct use of the inherent stochasticity of generative models such as variational autoencoders. The main idea is to appropriately constrain the continuity modulus of the deep models instead of adding another noise mechanism on top. For this approach we derive mathematically rigorous privacy guarantees and illustrate its effectiveness with practical experiments.
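The core idea, bounding the continuity modulus (Lipschitz constant) of the generative network rather than injecting extra noise, can be illustrated with a small sketch. The abstract does not specify the paper's exact mechanism, so this hypothetical example uses spectral normalisation (rescaling each weight matrix so its largest singular value is at most 1) on a toy two-layer decoder; since ReLU is 1-Lipschitz, the composed map is then at most 1-Lipschitz, and nearby latent codes cannot be mapped to arbitrarily distant outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectrally_normalise(W, target=1.0):
    # Rescale W so its largest singular value (spectral norm) is <= target.
    sigma = np.linalg.norm(W, 2)  # ord=2 on a matrix = largest singular value
    return W * (target / sigma) if sigma > target else W

# Toy two-layer "decoder": z -> ReLU(W1 z) -> W2 h
W1 = spectrally_normalise(rng.normal(size=(16, 4)))
W2 = spectrally_normalise(rng.normal(size=(8, 16)))

def decoder(z):
    return W2 @ np.maximum(W1 @ z, 0.0)

# ReLU is 1-Lipschitz, so the composition is at most
# sigma(W2) * 1 * sigma(W1) <= 1-Lipschitz: the output cannot
# move further than the latent input did.
z1, z2 = rng.normal(size=4), rng.normal(size=4)
ratio = np.linalg.norm(decoder(z1) - decoder(z2)) / np.linalg.norm(z1 - z2)
print(ratio)  # empirically bounded by 1
```

In an actual VAE this constraint would be enforced on the decoder during training (e.g. after each gradient step), so that the randomness of the latent prior, rather than added noise, provides the privacy-relevant smoothing.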


