Bayesian Estimation of Attribute Disclosure Risks in Synthetic Data with the R Package

03/17/2021
by Ryan Hornby, et al.

Synthetic data is a promising approach to privacy protection in many contexts. A Bayesian synthesis model, also known as a synthesizer, simulates synthetic values of sensitive variables from their posterior predictive distributions. The resulting synthetic data can then be released in place of the confidential data. Before release, the level of privacy protection offered by the synthetic data should be evaluated, typically by estimating disclosure risks. Attribute disclosure, in which an intruder correctly infers the confidential values of synthetic records, is one type of disclosure that is computationally challenging to evaluate. In this paper, we review and discuss in detail some Bayesian estimation approaches to attribute disclosure risk evaluation, with examples of commonly used Bayesian synthesizers. We create the R package to facilitate the implementation of these approaches, and demonstrate its functionality with examples of evaluating attribute disclosure risks in synthetic samples of the Consumer Expenditure Surveys.
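To make the idea of a synthesizer concrete, the following is a minimal sketch of drawing synthetic values from a posterior predictive distribution. It is an illustration only and is not the paper's R package or its API. It assumes a deliberately simple model (normal outcomes with known standard deviation `sigma` and a flat prior on the mean), and the function name `synthesize_normal` and the simulated "confidential" values are hypothetical.

```python
import random
import statistics


def synthesize_normal(confidential, sigma, rng):
    """Draw one synthetic value per record from the posterior predictive
    of a normal model with known sigma and a flat prior on the mean.
    (Hypothetical helper for illustration; not from the paper's package.)"""
    n = len(confidential)
    ybar = statistics.fmean(confidential)
    # Under a flat prior, the posterior of the mean is N(ybar, sigma^2 / n).
    mu = rng.gauss(ybar, sigma / n ** 0.5)
    # Given the sampled mean, each synthetic value is drawn from N(mu, sigma^2).
    return [rng.gauss(mu, sigma) for _ in confidential]


rng = random.Random(2021)
# Made-up stand-in for a confidential variable; not real survey data.
confidential = [rng.gauss(50.0, 10.0) for _ in range(200)]
synthetic = synthesize_normal(confidential, sigma=10.0, rng=rng)
```

The synthetic values preserve the overall distribution of the confidential variable without copying any record's value. Evaluating attribute disclosure risk then asks how well an intruder could recover the confidential values from such a release, which this sketch does not attempt.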


