Utility Assessment of Synthetic Data Generation Methods

11/23/2022
by Md Sakib Nizam Khan, et al.

Big data analysis poses the dual problem of privacy preservation and utility, i.e., how accurate data analyses remain after the original data has been transformed to protect the privacy of the individuals it describes, and whether they are accurate enough to be meaningful. In this paper, we investigate across several datasets whether different methods of generating fully synthetic data vary in their utility a priori (when the specific analyses to be performed on the data are not yet known), how closely their results conform to analyses on the original data a posteriori, and whether these two effects are correlated. We find that some methods (decision-tree based) perform better than others across the board, that some choices of imputation parameters (notably the number of released datasets) have sizeable effects, that broad utility metrics show no correlation with analysis accuracy, and that correlations vary for narrow metrics. We did obtain promising results for classification tasks when using synthetic data to train machine learning models, which we consider worth exploring further, also with regard to mitigating privacy attacks against ML models such as membership inference and model inversion.
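As a rough illustration of the question "are these two effects correlated", the sketch below computes a Spearman rank correlation between two lists of scores, one a priori utility score and one a posteriori accuracy score per synthetic dataset. The abstract does not say which correlation measure the authors used, so the choice of Spearman and the function names `average_ranks` and `spearman` are assumptions made for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(utility_scores, accuracy_scores):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Hypothetical inputs: one utility score and one accuracy score per
    synthetic dataset, in the same order.
    """
    if len(utility_scores) != len(accuracy_scores) or len(utility_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    rx = average_ranks(utility_scores)
    ry = average_ranks(accuracy_scores)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A value near 0 would correspond to the "no correlation" finding reported for broad utility metrics, while values near +1 or -1 would indicate that a metric tracks analysis accuracy closely. A rank-based measure is used here because it does not assume a linear relationship between the two scores.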


