Near-Optimal Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding

06/09/2021
by Nour Almadhoun Alserr et al.

Motivation: Researchers need a rich trove of genomic datasets that they can leverage to better understand the human genome and to identify associations between phenotypes and specific parts of DNA. However, sharing genomic datasets that include sensitive genetic or medical information about individuals can have serious privacy consequences if the data falls into the wrong hands. Restricting access to genomic datasets is one solution, but it greatly reduces their usefulness for research. To allow genomic datasets to be shared while addressing these privacy concerns, several studies have proposed privacy-preserving data-sharing mechanisms. Differential privacy (DP) is one such mechanism: it builds on rigorous mathematical foundations to provide privacy guarantees when aggregate statistics about a dataset are shared. However, the original privacy guarantees of DP-based solutions have been shown to degrade when the dataset contains dependent tuples. This is a common scenario for genomic datasets because family members often participate together.

Results: In this work, we introduce a near-optimal mechanism that mitigates the vulnerability of differentially private query results to inference attacks on genomic datasets that include dependent tuples. We propose a utility-maximizing, privacy-preserving approach for sharing statistics that selectively hides SNPs of family members who participate in a genomic dataset. By evaluating our mechanism on a real-world genomic dataset, we empirically demonstrate that it can achieve up to 40% privacy gain compared to the original DP-based solutions, while near-optimally minimizing the utility loss.
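The degradation under dependent tuples can be made concrete with the group-privacy property of DP. The following is a minimal sketch under our own assumptions, not the authors' mechanism. It releases one SNP's minor-allele count with the Laplace mechanism. The sensitivity is 2, because each genotype contributes 0, 1, or 2 alleles. It then computes the worst-case bound k·ε that applies when an individual's record is perfectly correlated with the records of k - 1 relatives in the dataset. The function names (`laplace_noise`, `noisy_allele_count`, `worst_case_epsilon`) and the toy genotypes are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_allele_count(genotypes: list[int], epsilon: float, rng: random.Random) -> float:
    """Release the minor-allele count of one SNP with the Laplace mechanism.

    Each genotype is 0, 1 or 2 copies of the minor allele, so adding or removing
    one individual changes the count by at most 2 (L1 sensitivity = 2).
    """
    sensitivity = 2.0
    return sum(genotypes) + laplace_noise(sensitivity / epsilon, rng)


def worst_case_epsilon(epsilon: float, related_records: int) -> float:
    """Group-privacy bound (assumption: perfect correlation among relatives).

    If an individual's record is perfectly correlated with those of
    related_records - 1 relatives, an epsilon-DP release only bounds the
    leakage about that individual by related_records * epsilon.
    """
    return related_records * epsilon


# Hypothetical usage on toy genotypes for a single SNP.
rng = random.Random(0)
snp = [0, 1, 2, 1, 0, 2, 1]
print(noisy_allele_count(snp, epsilon=1.0, rng=rng))
print(worst_case_epsilon(1.0, related_records=4))  # 4.0: a family of four all contribute
print(worst_case_epsilon(1.0, related_records=1))  # 1.0: relatives' values for this SNP hidden
```

In this simplified picture, hiding a SNP for a participant's relatives lowers k for that SNP's statistic, which conveys the intuition behind selective hiding. The mechanism described above additionally chooses which SNPs to hide so that utility loss is near-minimal. This sketch does not attempt that step.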
