Guidelines for Producing Useful Synthetic Data

12/12/2017
by Gillian M. Raab, et al.

We report on our experiences of helping staff of the Scottish Longitudinal Study to create synthetic extracts that can be released to users. In particular, we focus on how the synthesis process can be tailored to produce synthetic extracts that will provide users with similar results to those that would be obtained from the original data. We make recommendations for synthesis methods and illustrate how the staff creating synthetic extracts can evaluate their utility at the time they are being produced. We discuss measures of utility for synthetic data and show that one tabular utility measure is exactly equivalent to a measure calculated from a propensity score. The methods are illustrated by using the R package synthpop to create synthetic versions of data from the 1901 Census of Scotland.
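The equivalence between a tabular utility measure and a propensity-score measure can be illustrated with a small numerical sketch. The abstract does not name the measures or give formulas, so everything here is an assumption: the tabular measure is taken to be a chi-square-type statistic comparing cell counts, the propensity score for each record is taken to be the synthetic share of its table cell, and the propensity measure is the mean squared deviation of that score from the overall synthetic share. The code is plain Python, does not use synthpop, and the counts are invented toy data. Under these assumptions, and with observed and synthetic data sets of equal size n, the two measures differ only by the constant factor 16n.

```python
from collections import Counter

def pmse_from_table(observed, synthetic):
    """Propensity-score mean squared error, with the propensity of each
    record estimated as the synthetic share of its table cell (assumed form)."""
    obs, syn = Counter(observed), Counter(synthetic)
    n_total = len(observed) + len(synthetic)
    c = len(synthetic) / n_total  # propensity expected if the data are indistinguishable
    total = 0.0
    for cell in set(obs) | set(syn):
        o, s = obs[cell], syn[cell]
        p = s / (o + s)                  # estimated propensity for every record in this cell
        total += (o + s) * (p - c) ** 2  # each record in the cell contributes (p - c)^2
    return total / n_total

def tabular_statistic(observed, synthetic):
    """Chi-square-type comparison of cell counts (assumed form):
    sum over cells of (o - s)^2 / ((o + s) / 2)."""
    obs, syn = Counter(observed), Counter(synthetic)
    return sum(
        (obs[cell] - syn[cell]) ** 2 / ((obs[cell] + syn[cell]) / 2)
        for cell in set(obs) | set(syn)
    )

# Invented toy data: two categorical variables, 100 records in each data set.
observed = ([("M", "urban")] * 30 + [("M", "rural")] * 20
            + [("F", "urban")] * 35 + [("F", "rural")] * 15)
synthetic = ([("M", "urban")] * 26 + [("M", "rural")] * 24
             + [("F", "urban")] * 33 + [("F", "rural")] * 17)

n = len(observed)
pmse = pmse_from_table(observed, synthetic)
stat = tabular_statistic(observed, synthetic)

print("pMSE:", pmse)
print("tabular statistic / (16 n):", stat / (16 * n))
```

The two printed values agree up to floating-point rounding, because each cell contributes (o + s)(s/(o + s) - 1/2)^2 = (s - o)^2 / (4(o + s)) to the propensity sum. With unequal data set sizes the constant of proportionality changes, and the sketch does not cover that case.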

