Distribution-Preserving k-Anonymity

11/05/2017
by Dennis Wei, et al.

Preserving the privacy of individuals by protecting their sensitive attributes is an important consideration during microdata release. However, it is equally important to preserve the quality or utility of the data for at least some targeted workloads. We propose a novel framework for privacy preservation based on the k-anonymity model that is ideally suited for workloads that require preserving the probability distribution of the quasi-identifier variables in the data. Our framework combines the principles of distribution-preserving quantization and k-member clustering, and we specialize it to two variants that respectively use intra-cluster and Gaussian dithering of cluster centers to achieve distribution preservation. We perform theoretical analysis of the proposed schemes in terms of distribution preservation, and describe their utility in workloads such as covariate shift and transfer learning where such a property is necessary. Using extensive experiments on real-world Medical Expenditure Panel Survey data, we demonstrate the merits of our algorithms over standard k-anonymization for a hallmark health care application where an insurance company wishes to understand the risk in entering a new market. Furthermore, by empirically quantifying the reidentification risk, we also show that the proposed approaches indeed maintain k-anonymity.
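
As a rough illustration of why dithering cluster centers can help preserve a distribution, the following minimal sketch works on a single numeric quasi-identifier. It is not the algorithm from the paper: the function names `k_member_groups` and `release`, the sort-and-cut grouping, and the choice to scale the Gaussian noise by each group's own standard deviation are all assumptions made for illustration. Replacing every record with its group mean keeps only the between-group variance; adding zero-mean noise with the within-group spread restores the lost variance in expectation. The sketch makes no claim about the re-identification risk of the dithered output.

```python
import random
import statistics


def k_member_groups(values, k):
    """Partition record indices into groups of at least k records.

    Simplifying assumption: sort the 1-D values and cut consecutive runs of
    length k; the final group absorbs any remainder so no group is smaller
    than k.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(order) // k
    groups = [order[g * k:(g + 1) * k] for g in range(n_groups)]
    groups[-1].extend(order[n_groups * k:])  # leftovers join the last group
    return groups


def release(values, k, dither=False, rng=None):
    """Return one released value per record.

    dither=False: every record is replaced by its group mean.
    dither=True: zero-mean Gaussian noise, scaled by the group's population
    standard deviation, is added to the group mean (an illustrative choice).
    """
    if rng is None:
        rng = random.Random(0)
    out = [0.0] * len(values)
    for group in k_member_groups(values, k):
        members = [values[i] for i in group]
        center = statistics.fmean(members)
        spread = statistics.pstdev(members)
        for i in group:
            noise = rng.gauss(0.0, spread) if dither else 0.0
            out[i] = center + noise
    return out


if __name__ == "__main__":
    source = random.Random(42)
    data = [source.gauss(50.0, 10.0) for _ in range(1000)]
    for label, released in [
        ("original", data),
        ("group means", release(data, k=10)),
        ("dithered", release(data, k=10, dither=True)),
    ]:
        print(label, round(statistics.pvariance(released), 2))
```

Running the script prints the variance of the original values, of the group-mean release, and of the dithered release, which gives a quick check of how much spread each variant retains on synthetic data.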

research
07/04/2020

PPaaS: Privacy Preservation as a Service

Personally identifiable information (PII) can find its way into cyberspa...
research
11/01/2017

Re-DPoctor: Real-time health data releasing with w-day differential privacy

Wearable devices enable users to collect health data and share them with...
research
01/06/2020

Clustering based Privacy Preserving of Big Data using Fuzzification and Anonymization Operation

Big Data is used by data miner for analysis purpose which may contain se...
research
02/27/2019

AutoGAN-based Dimension Reduction for Privacy Preservation

Exploiting data and concurrently protecting sensitive information to who...
research
02/20/2023

Efficient Privacy-Preserved Processing of Multimodal Data for Vehicular Traffic Analysis

We estimate vehicular traffic states from multimodal data collected by s...
research
08/25/2020

Local Generalization and Bucketization Technique for Personalized Privacy Preservation

Anonymization technique has been extensively studied and widely applied ...
research
04/18/2021

Why Should I Trust a Model is Private? Using Shifts in Model Explanation for Evaluating Privacy-Preserving Emotion Recognition Model

Privacy preservation is a crucial component of any real-world applicatio...
