Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS

05/22/2022
by Yixiao Cao, et al.

The large number of publicly available survey datasets of wide variety, while useful, raises respondent-level privacy concerns. The synthetic data approach to data privacy and confidentiality has been shown to be useful for both privacy protection and utility preservation. This paper illustrates how synthetic data can facilitate the dissemination of highly sensitive information about youth risk behavior by presenting a case study of synthetic data for a sample of the Youth Risk Behavior Survey (YRBS). Because almost all variables in the YRBS are categorical, the Dirichlet process mixture of products of multinomials (DPMPM) synthesizer is adopted to partially synthesize the YRBS sample, that is, to replace only a selected subset of its variables with synthetic values. Detailed evaluations of utility and disclosure risks demonstrate that the generated synthetic data significantly reduce disclosure risks relative to the confidential YRBS sample while maintaining a high level of utility.
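To make the DPMPM synthesizer concrete, here is a minimal sketch of the general technique, not the authors' implementation. The DPMPM is a latent class model: each record belongs to a latent class, variables are independent categorical draws within a class, and class weights follow a truncated stick-breaking (Dirichlet process) prior. The sketch assumes a truncation level `K`, flat Dirichlet(1, ..., 1) priors on the class-specific category probabilities, and a concentration `alpha` held fixed rather than sampled, which is a simplification. Partial synthesis is illustrated by redrawing only the chosen columns from each record's sampled latent class. The function names `fit_dpmpm`, `partially_synthesize` and `_sample_rows` are hypothetical, and the data are random toy values, not YRBS.

```python
import numpy as np


def fit_dpmpm(X, levels, K=20, alpha=1.0, n_iter=500, rng=None):
    """Gibbs sampler for a truncated DPMPM (latent class) model.

    X      : (n, p) integer array with X[i, j] in {0, ..., levels[j] - 1}.
    levels : number of categories d_j of each variable.
    K      : truncation level of the stick-breaking prior.
    alpha  : DP concentration, held fixed here (a simplification).
    Returns the final posterior draw of (z, pi, phi).
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    z = rng.integers(0, K, size=n)  # initial latent class of each record
    pi = np.full(K, 1.0 / K)
    phi = [None] * p  # phi[j][k] = category probabilities of variable j in class k

    for _ in range(n_iter):
        # 1. phi | z, X: Dirichlet(1, ..., 1) prior plus within-class counts.
        for j, d in enumerate(levels):
            counts = np.zeros((K, d))
            np.add.at(counts, (z, X[:, j]), 1)
            phi[j] = np.array([rng.dirichlet(1.0 + counts[k]) for k in range(K)])

        # 2. Stick-breaking weights pi | z, with V_K = 1 (truncation).
        n_k = np.bincount(z, minlength=K)
        tail = np.cumsum(n_k[::-1])[::-1]  # tail[k] = records in classes >= k
        V = np.append(rng.beta(1.0 + n_k[:-1], alpha + tail[1:]), 1.0)
        pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))

        # 3. z | pi, phi, X: P(z_i = k) proportional to pi_k * prod_j phi[j][k, x_ij].
        log_p = np.tile(np.log(pi + 1e-300), (n, 1))
        for j in range(p):
            log_p += np.log(phi[j][:, X[:, j]].T + 1e-300)
        prob = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        z = _sample_rows(prob, rng)
    return z, pi, phi


def _sample_rows(prob, rng):
    """Draw one category index per row of an (n, d) probability matrix."""
    u = rng.random(prob.shape[0])[:, None]
    idx = (prob.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(idx, prob.shape[1] - 1)  # guard against rounding


def partially_synthesize(X, z, phi, syn_cols, rng=None):
    """Replace only the columns in syn_cols; all other columns are kept as-is."""
    rng = np.random.default_rng(rng)
    X_syn = X.copy()
    for j in syn_cols:
        # Each record's synthetic value comes from its own latent class's distribution.
        X_syn[:, j] = _sample_rows(phi[j][z], rng)
    return X_syn


# Usage on toy categorical data (not YRBS): synthesize columns 2 and 3 only.
rng = np.random.default_rng(0)
levels = [2, 3, 4, 2]
X = np.column_stack([rng.integers(0, d, size=300) for d in levels])
z, pi, phi = fit_dpmpm(X, levels, K=10, n_iter=200, rng=1)
X_syn = partially_synthesize(X, z, phi, syn_cols=[2, 3], rng=2)
```

In practice one would typically also sample `alpha`, discard a burn-in period, and generate several synthetic datasets from well-separated posterior draws rather than from the final draw alone.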
