An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises

06/15/2021
by Mayana Pereira, et al.

Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not yet well understood. In this contribution, we systematically study the effects of differentially private synthetic data generation on classification. We analyze disparities in model utility and bias caused by the synthetic dataset, measured through algorithmic fairness metrics. Our first set of results shows that, although there seems to be a clear negative correlation between privacy and utility (the more private, the less accurate) across all data synthesizers we evaluated, more privacy does not necessarily imply more bias. Additionally, we assess the effects of using synthetic datasets for both model training and model evaluation. We show that results obtained on synthetic data can misestimate the actual model performance once the model is deployed on real data. We therefore advocate for defining proper testing protocols in scenarios where differentially private synthetic datasets are used for model training and evaluation.
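The abstract describes measuring utility and bias with algorithmic fairness metrics, and comparing scores obtained on synthetic data with those obtained on real data. The abstract does not name the specific metrics, so the sketch below assumes two common ones, demographic parity difference and equal opportunity difference, alongside plain accuracy. The helper names (`evaluation_report`, etc.) are hypothetical, and the predictions are made-up toy values chosen only to show how a synthetic test set can paint a different picture than real data; they are not results from the paper.

```python
from typing import Dict, Sequence


def _rate(values: Sequence[int]) -> float:
    """Fraction of 1s in a list of binary values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """|P(y_hat=1 | group=1) - P(y_hat=1 | group=0)| (assumed metric)."""
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    return abs(_rate(g1) - _rate(g0))


def equal_opportunity_difference(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """|TPR(group=1) - TPR(group=0)|, using only truly positive examples (assumed metric)."""
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    return abs(_rate(g1) - _rate(g0))


def evaluation_report(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> Dict[str, float]:
    """Hypothetical helper bundling utility and fairness scores for one test set."""
    return {
        "accuracy": accuracy(y_true, y_pred),
        "dp_diff": demographic_parity_difference(y_pred, group),
        "eo_diff": equal_opportunity_difference(y_true, y_pred, group),
    }


# Toy, made-up predictions from one model trained on DP synthetic data,
# scored on a synthetic test set and on a real test set.
synthetic_eval = evaluation_report(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 1, 1, 0, 0, 1, 0],
    group=[0, 0, 0, 0, 1, 1, 1, 1],
)
real_eval = evaluation_report(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 0, 0, 0],
    group=[0, 0, 0, 0, 1, 1, 1, 1],
)

print("synthetic:", synthetic_eval)
print("real:     ", real_eval)
```

In this toy case the synthetic test set reports perfect accuracy (1.0) and no equal-opportunity gap, while the real test set gives 0.625 accuracy and an equal-opportunity gap of about 0.67. This is the kind of misestimation the abstract warns about when synthetic data is used for evaluation as well as training.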

research
05/09/2022

Evaluating the Fairness Impact of Differentially Private Synthetic Data

Differentially private (DP) synthetic data is a promising approach to ma...
research
04/27/2022

Spending Privacy Budget Fairly and Wisely

Differentially private (DP) synthetic data generation is a practical met...
research
05/10/2021

Transitioning from Real to Synthetic data: Quantifying the bias in model

With the advent of generative modeling techniques, synthetic data and it...
research
11/23/2022

Utility Assessment of Synthetic Data Generation Methods

Big data analysis poses the dual problem of privacy preservation and uti...
research
05/24/2023

Differentially Private Synthetic Data via Foundation Model APIs 1: Images

Generating differentially private (DP) synthetic data that closely resem...
research
05/04/2023

Leveraging gradient-derived metrics for data selection and valuation in differentially private training

Obtaining high-quality data for collaborative training of machine learni...
research
12/20/2022

Privacy-Preserving Domain Adaptation of Semantic Parsers

Task-oriented dialogue systems often assist users with personal or confi...