Property Unlearning: A Defense Strategy Against Property Inference Attacks

by Joshua Stock, et al.

During training, machine learning models may store or "learn" more information about the training data than is actually needed for the prediction or classification task. This is exploited by property inference attacks, which aim to extract statistical properties of a model's training data without having access to the training data itself. Such properties may include the quality of pictures (to identify the camera model), the age distribution (to reveal the target audience of a product), or the host types present in a network (to refine a malware attack). The attack is especially accurate when the attacker has access to all model parameters, i.e., in a white-box scenario. By defending against such attacks, model owners can ensure that their training data, its associated properties, and thus their intellectual property stay private, even if they deliberately share their models, e.g., for collaborative training, or if the models are leaked. In this paper, we introduce property unlearning, an effective defense mechanism against white-box property inference attacks, independent of the training data type, model task, or number of properties. Property unlearning mitigates property inference attacks by systematically changing the trained weights and biases of a target model such that an adversary cannot extract chosen properties. We empirically evaluate property unlearning on three different data sets, including tabular and image data, and two types of artificial neural networks. Our results show that property unlearning is both efficient and reliable in protecting machine learning models against property inference attacks, with a good privacy-utility trade-off. Furthermore, our experiments indicate that the mechanism is also effective for unlearning multiple properties.
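The core idea of adjusting a model's parameters until a white-box adversary can no longer infer a property can be illustrated with a toy sketch. All names here are illustrative assumptions, not the authors' implementation: the adversary is modeled as a fixed logistic meta-classifier over the target model's flattened parameters, and unlearning performs gradient steps on those parameters until the adversary's property estimate is maximally uncertain (0.5).

```python
import numpy as np

# Hypothetical sketch of property unlearning (not the paper's actual code).
# The white-box adversary is a logistic meta-classifier that predicts a
# binary property from the target model's flattened parameters `theta`.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def property_unlearn(theta, adv_w, adv_b, lr=0.5, steps=500):
    """Nudge `theta` so the adversary's output approaches 0.5,
    i.e., it reveals no information about the chosen property."""
    theta = theta.copy()
    for _ in range(steps):
        p = sigmoid(adv_w @ theta + adv_b)   # adversary's property estimate
        # loss = (p - 0.5)^2; gradient w.r.t. theta via the chain rule
        grad = 2.0 * (p - 0.5) * p * (1.0 - p) * adv_w
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
adv_w = rng.normal(size=8)            # fixed adversary meta-classifier weights
adv_b = 0.3
theta = rng.normal(size=8) * 0.1      # target model's flattened parameters

before = sigmoid(adv_w @ theta + adv_b)
theta_unlearned = property_unlearn(theta, adv_w, adv_b)
after = sigmoid(adv_w @ theta_unlearned + adv_b)
print(f"adversary output before: {before:.3f}, after: {after:.3f}")
```

In the paper's full setting, this optimization would additionally be constrained to preserve the target model's utility on its original task; the sketch omits that term for brevity.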


