Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and Summary Statistics

The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence or absence of alleles of interest through web services called Beacons. However, even such limited releases are susceptible to likelihood-ratio-based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., by adding noise, as in differential privacy). However, many of these approaches incur a significant utility loss, because they either suppress many variants or add a substantial amount of noise. In this paper, we introduce optimization-based approaches that combine variant suppression and modification to explicitly trade off the utility of summary data or Beacon responses against privacy with respect to likelihood-ratio-based membership-inference attacks. We consider two attack models. In the first, an attacker applies a likelihood-ratio test to make membership-inference claims. In the second, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals who are in the dataset and those who are not. We further introduce highly scalable approaches for approximately solving the privacy-utility tradeoff problem when information is released either as summary statistics or as presence/absence query responses. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public datasets.
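To make the first attack model concrete, the following is a minimal sketch of a likelihood-ratio score computed from Beacon presence/absence responses. It assumes a formulation commonly used in the Beacon literature, not necessarily the exact statistic used in this paper: the attacker queries alleles carried by the target, knows the alternate allele frequencies and the number of individuals `n` in the Beacon, and models a small sequencing error rate `delta`. The function names, parameter names, and the default value of `delta` are hypothetical and chosen only for illustration.

```python
import math


def beacon_lrt_score(responses, alt_freqs, n, delta=1e-6):
    """Log-likelihood ratio log(L(H0) / L(H1)) for one target individual.

    H0: the target is NOT in the Beacon; H1: the target IS in the Beacon.
    responses: 0/1 Beacon answers for alleles the target carries (assumed).
    alt_freqs: alternate allele frequencies in (0, 1), one per query (assumed known).
    n:         number of individuals in the Beacon (assumed known, n >= 1).
    delta:     assumed sequencing error rate (hypothetical default).
    """
    score = 0.0
    for x, f in zip(responses, alt_freqs):
        # Probability that none of n (or n - 1) diploid individuals carries the allele.
        d_n = (1.0 - f) ** (2 * n)
        d_n1 = (1.0 - f) ** (2 * (n - 1))
        if x:
            # "Yes" response: slightly more likely under H1 than under H0.
            score += math.log(1.0 - d_n) - math.log(1.0 - delta * d_n1)
        else:
            # "No" response: under H1 it can only arise from a sequencing error.
            score += math.log(d_n) - math.log(delta * d_n1)
    return score


def claims_membership(score, threshold):
    """The attacker rejects H0 (claims membership) when the score is below the threshold."""
    return score < threshold


# Illustrative values only; these are not taken from the paper.
freqs = [0.01, 0.02, 0.05]
score_all_yes = beacon_lrt_score([1, 1, 1], freqs, n=50)
score_one_no = beacon_lrt_score([1, 1, 0], freqs, n=50)
```

Under these assumptions, every "yes" response lowers the score and a single "no" response raises it sharply, which is one way to see why modifying or suppressing responses for selected variants can blunt this kind of attack at some cost in utility.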

research
02/04/2022

LTU Attacker for Membership Inference

We address the problem of defending predictive models, such as machine l...
research
12/25/2021

Defending Against Membership Inference Attacks on Beacon Services

Large genomic datasets are now created through numerous activities, incl...
research
06/12/2023

Gaussian Membership Inference Privacy

We propose a new privacy notion called f-Membership Inference Privacy (f...
research
01/24/2020

Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons

Sharing genome data in a privacy-preserving way stands as a major bottle...
research
06/19/2019

Adversarial Task-Specific Privacy Preservation under Attribute Attack

With the prevalence of machine learning services, crowdsourced data cont...
research
03/03/2023

Summary Statistic Privacy in Data Sharing

Data sharing between different parties has become increasingly common ac...
research
05/11/2023

The Privacy-Utility Tradeoff in Rank-Preserving Dataset Obfuscation

Dataset obfuscation refers to techniques in which random noise is added ...
