Bayesian Estimation of Attribute Disclosure Risks in Synthetic Data with the R Package

by Ryan Hornby, et al.

Synthetic data is a promising approach to privacy protection in many contexts. A Bayesian synthesis model, also known as a synthesizer, simulates synthetic values of sensitive variables from their posterior predictive distributions. The resulting synthetic data can then be released in place of the confidential data. An important step before releasing synthetic data is evaluating its level of privacy protection, which often takes the form of disclosure risk evaluation. Attribute disclosure, in which an intruder correctly infers the confidential values of synthetic records, is one type of disclosure that is computationally challenging to evaluate. In this paper, we review and discuss in detail some Bayesian estimation approaches to attribute disclosure risk evaluation, with examples of commonly used Bayesian synthesizers. We create the R package to facilitate their implementation, and demonstrate its functionality with examples of evaluating attribute disclosure risks in synthetic samples of the Consumer Expenditure Surveys.




