Synthetic Data – A Privacy Mirage

11/13/2020
by Theresa Stadler, et al.

Synthetic datasets drawn from generative models have been advertised as a silver-bullet solution to privacy-preserving data publishing. In this work, we show through an extensive privacy evaluation that such claims do not match reality. First, synthetic data does not prevent attribute inference. Any data characteristics that a generative model preserves for the purpose of data analysis can also be used by an adversary to reconstruct sensitive information about individuals. Second, synthetic data does not protect against linkage attacks. We demonstrate that high-dimensional synthetic datasets preserve much more information about the raw data than the features in the model's lower-dimensional approximation. This rich information can be exploited by an adversary even when models are trained under differential privacy. Moreover, we observe that some target records receive substantially less protection than others and that the more complex the generative model, the more difficult it is to predict which targets will remain vulnerable to inference attacks. Finally, we show why generative models are unlikely to ever become an appropriate solution to the problem of privacy-preserving data publishing.
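The attribute inference claim can be illustrated with a toy sketch. It is not the evaluation used in this work: the "generative model" here is a hypothetical stand-in that keeps only a fitted line and noise levels, and the data, the function names (`fit_line`, `sample_synthetic`), and all parameters are assumptions made for illustration. The point is only that a correlation preserved for analysts is equally available to an adversary who knows a target's non-sensitive attribute.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def sample_synthetic(xs, ys, n, rng):
    """Toy generative model (an assumption, not a real synthesizer):
    it keeps the mean and spread of x, the fitted x -> y line, and the
    residual noise level, then samples fresh records from those."""
    a, b = fit_line(xs, ys)
    resid_sd = statistics.pstdev([y - (a * x + b) for x, y in zip(xs, ys)])
    mx, sx = statistics.fmean(xs), statistics.pstdev(xs)
    syn_x = [rng.gauss(mx, sx) for _ in range(n)]
    syn_y = [a * x + b + rng.gauss(0, resid_sd) for x in syn_x]
    return syn_x, syn_y


rng = random.Random(0)

# Hypothetical raw data: x is a known attribute, y is the sensitive one.
raw_x = [rng.gauss(50, 10) for _ in range(500)]
raw_y = [2.0 * x + rng.gauss(0, 5) for x in raw_x]

# Only the synthetic dataset is published.
syn_x, syn_y = sample_synthetic(raw_x, raw_y, 500, rng)

# Adversary: learn x -> y from the synthetic data alone, then guess the
# sensitive value of each raw record from its known attribute x.
a, b = fit_line(syn_x, syn_y)
attack_err = statistics.fmean([abs(a * x + b - y) for x, y in zip(raw_x, raw_y)])

# Baseline: guess the synthetic mean of y for everyone.
guess = statistics.fmean(syn_y)
baseline_err = statistics.fmean([abs(guess - y) for y in raw_y])

print(f"attack error:   {attack_err:.2f}")
print(f"baseline error: {baseline_err:.2f}")
```

In this sketch the adversary's error should come out well below the baseline, because the relationship that makes the synthetic data useful for analysis is the same one the adversary exploits.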
