Identification Risk Evaluation of Continuous Synthesized Variables

06/01/2020
by Ryan Hornby, et al.

We propose a general approach to evaluating the identification risk of continuous synthesized variables in partially synthetic data. We introduce a radius r in the construction of the identification risk probability of each target record, and illustrate the approach with working examples for one or more continuous synthesized variables. We demonstrate our methods with applications to a data sample from the Consumer Expenditure Surveys (CE), and discuss the impacts on risk and data utility of 1) the choice of radius r, 2) the choice of synthesized variables, and 3) the choice of the number of synthetic datasets. We give recommendations to statistical agencies for synthesizing continuous variables and evaluating their identification risk. We provide an R package that implements our proposed methods of identification risk evaluation, along with sample R scripts.
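To make the role of the radius r concrete, the following is a minimal Python sketch for a single continuous synthesized variable. It is not the authors' R package, and the details are assumptions made for illustration: the function name `radius_match_risk` is hypothetical, the radius is taken to be a fraction of the target record's true value (so the matching interval is y_i ± r·|y_i|), and a record's risk is taken to be 1/c_i when its own synthetic value falls inside that interval, where c_i is the number of synthetic values inside it, and 0 otherwise. The full paper may define these quantities differently, for example by also matching on unsynthesized variables known to the intruder.

```python
import numpy as np


def radius_match_risk(original, synthetic, r):
    """Illustrative per-record identification risk for one continuous variable.

    Assumptions (not taken from the paper's text): the radius is relative,
    so record i matches every synthetic value within r * |y_i| of its true
    value y_i, and its risk is 1 / c_i if its own synthetic value is among
    the c_i matches, and 0 otherwise.
    """
    original = np.asarray(original, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    risks = np.zeros(len(original))
    for i, y in enumerate(original):
        half_width = r * abs(y)
        # Synthetic records an intruder could not distinguish from record i.
        in_radius = np.abs(synthetic - y) <= half_width
        c_i = in_radius.sum()
        # The true record contributes risk only if it lies inside the radius.
        if c_i > 0 and in_radius[i]:
            risks[i] = 1.0 / c_i
    return risks


# Hypothetical values: record i in `synthetic` replaces record i in `original`.
original = [100.0, 200.0, 1000.0]
synthetic = [105.0, 195.0, 1500.0]
risks = radius_match_risk(original, synthetic, r=0.1)
```

Under these assumptions a larger r widens each interval, which tends to raise c_i and lower the risk for records whose true match is already inside it, while bringing more true matches into range. This is the kind of trade-off the choice of r controls.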

