Imputation under Differential Privacy

06/30/2022
by Soumojit Das, et al.

The literature on differential privacy almost invariably assumes that the data to be analyzed are fully observed. In most practical applications this assumption is unrealistic. A popular strategy for handling missing data is imputation, in which missing values are replaced by values estimated from the observed data. In this paper we evaluate various approaches to answering queries on an imputed dataset in a differentially private manner, and we discuss the trade-offs that arise from where in the pipeline privacy is enforced. We show that if imputation is done without regard to privacy, the sensitivity of certain queries can increase linearly with the number of incomplete records. On the other hand, for a general class of imputation strategies, these worst-case scenarios can be greatly reduced by ensuring privacy already during the imputation stage. We use a simulated dataset to demonstrate these results across a number of imputation schemes (both private and non-private) and examine their impact on the utility of a private query on the data.
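To make the linear growth in sensitivity concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's own construction or experiment: values are assumed to lie in [0, 1], imputation is assumed to be plain (non-private) mean imputation, and the query is assumed to be a count of records above a threshold of 0.5. The function names `mean_impute`, `count_above`, and `neighbouring_gap` are hypothetical.

```python
def mean_impute(values):
    """Replace None entries with the mean of the observed entries (non-private)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def count_above(values, threshold):
    """Counting query: number of records strictly above the threshold."""
    return sum(1 for v in values if v > threshold)


def neighbouring_gap(n_missing, threshold=0.5):
    """Difference in the query answer between two datasets that differ in
    exactly one observed record, after mean imputation of the missing ones."""
    missing = [None] * n_missing
    # Assumed toy data: the two datasets differ only in the third record.
    dataset = [0.5, 0.5, 0.4] + missing      # observed mean is below 0.5
    neighbour = [0.5, 0.5, 1.0] + missing    # observed mean is above 0.5
    a = count_above(mean_impute(dataset), threshold)
    b = count_above(mean_impute(neighbour), threshold)
    return abs(a - b)


if __name__ == "__main__":
    for m in (0, 10, 100):
        print(m, neighbouring_gap(m))
```

On fully observed data, changing one record changes a counting query by at most 1. In this sketch, the single changed record also moves the imputed mean across the threshold, so every imputed record flips together and the gap becomes the number of incomplete records plus one.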


