Bayesian Estimation of Attribute Disclosure Risks in Synthetic Data with the R Package

03/17/2021
by Ryan Hornby, et al.

Synthetic data is a promising approach to privacy protection in many contexts. A Bayesian synthesis model, also known as a synthesizer, simulates synthetic values of sensitive variables from their posterior predictive distributions. The resulting synthetic data can then be released in place of the confidential data. Before release, the level of privacy protection offered by the synthetic data should be evaluated, usually by estimating its disclosure risks. Attribute disclosure, in which an intruder correctly infers the confidential values of synthetic records, is one type of disclosure that is computationally challenging to evaluate. In this paper, we review and discuss in detail some Bayesian estimation approaches to attribute disclosure risk evaluation, with examples of commonly used Bayesian synthesizers. We create the R package to facilitate the implementation of these approaches, and demonstrate the package's functionality with examples that evaluate attribute disclosure risks in synthetic samples of the Consumer Expenditure Surveys.
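
To make the idea of a synthesizer concrete, here is a minimal Python sketch of drawing synthetic values from a posterior predictive distribution. It is not the paper's method and not the R package's interface. It assumes a deliberately simple conjugate model (normal data with known standard deviation `sigma` and a normal prior on the mean), and the function name `synthesize_normal`, its parameters, and the toy values are hypothetical.

```python
import random
import statistics


def synthesize_normal(confidential, sigma, prior_mean=0.0, prior_sd=100.0, seed=None):
    """Draw one synthetic dataset from the posterior predictive distribution.

    Assumed model (illustrative only): y_i ~ Normal(mu, sigma^2) with sigma
    known, and prior mu ~ Normal(prior_mean, prior_sd^2).
    """
    rng = random.Random(seed)
    n = len(confidential)

    # Conjugate update for the mean: precisions add, and the posterior mean
    # is a precision-weighted combination of the prior mean and the data.
    post_precision = 1.0 / prior_sd**2 + n / sigma**2
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_sd**2 + sum(confidential) / sigma**2)

    # Step 1: draw the parameter from its posterior distribution.
    mu = rng.gauss(post_mean, post_var**0.5)
    # Step 2: simulate one synthetic value per record given that draw.
    synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
    return synthetic, post_mean, post_var


# Toy confidential values (made up for illustration).
confidential = [52.1, 48.3, 50.7, 49.9, 51.4]
synthetic, post_mean, post_var = synthesize_normal(confidential, sigma=2.0, seed=1)
print(synthetic)
```

The synthetic values would be released in place of `confidential`. An attribute disclosure risk evaluation would then ask how well an intruder who sees only `synthetic` could recover the confidential values.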

