Strategies to facilitate access to detailed geocoding information using synthetic data

03/15/2018
by Joerg Drechsler, et al.

In this paper we investigate whether generating synthetic data can be a viable strategy for giving external researchers access to detailed geocoding information without compromising the confidentiality of the units included in the database. This research was motivated by a recent project at the Institute for Employment Research (IAB) in Germany that linked exact geocodes to the Integrated Employment Biographies, a large administrative database containing several million records. Based on these data we evaluate the performance of several synthesizers in terms of addressing the trade-off between preserving analytical validity and limiting the risk of disclosure. We propose strategies for making the synthesizers scalable for such large files, present analytical validity measures for the generated data, and provide general recommendations for statistical agencies considering the synthetic data approach for disseminating detailed geographical information. We also illustrate that the commonly used disclosure avoidance strategy of providing geographical information only on an aggregated level will not offer substantial improvements in disclosure protection if coupled with synthesis. As we show in the online supplement accompanying this manuscript, synthesizing additional variables should be preferred if the level of protection from synthesizing only the geocodes is not considered sufficient.

research
11/21/2022

A Framework for Auditable Synthetic Data Generation

Synthetic data has gained significant momentum thanks to sophisticated m...
research
05/22/2022

Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS

The large number of publicly available survey datasets of wide variety, ...
research
08/01/2023

Advancing Microdata Privacy Protection: A Review of Synthetic Data

Synthetic data generation is a powerful tool for privacy protection when...
research
03/17/2021

Bayesian Estimation of Attribute Disclosure Risks in Synthetic Data with the R Package

Synthetic data is a promising approach to privacy protection in many con...
research
11/19/2022

An experimental study on Synthetic Tabular Data Evaluation

In this paper, we present the findings of various methodologies for meas...
research
04/04/2023

30 Years of Synthetic Data

The idea to generate synthetic data as a tool for broadening access to s...
