Defending Against Model Stealing Attacks Using Deceptive Perturbations

05/31/2018
by   Taesung Lee, et al.

Machine learning models are vulnerable to simple model stealing attacks if the adversary can obtain output labels for chosen inputs. To protect against these attacks, it has been proposed to limit the information returned to the adversary by omitting probability scores, which significantly reduces the utility of the provided service. In this work, we show how a service provider can still return useful, albeit misleading, class probability information while significantly limiting the success of the attack. Our defense forces the adversary to discard the class probabilities, requiring significantly more queries before it can train a model with comparable performance. We evaluate several attack strategies, model architectures, and hyperparameters under varying adversarial models, and assess the efficacy of our defense against the strongest adversary. Finally, we quantify the amount of noise injected into the class probabilities to measure the loss in utility, e.g., adding 1.74 nats per query on CIFAR-10 and 3.27 nats per query on MNIST. Our extensive evaluation shows that our defense can degrade the accuracy of the stolen model by at least 20%, or require 4x more queries, while keeping the accuracy of the protected model almost intact.
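As a rough illustration of the idea, the sketch below perturbs a class-probability vector while keeping the top-1 label intact and measures the injected noise in nats via the KL divergence between the original and perturbed distributions. The log-space Gaussian perturbation, the `noise_scale` parameter, and the helper names are illustrative assumptions, not the paper's exact perturbation scheme.

```python
import numpy as np


def deceptive_perturbation(probs, noise_scale=1.0, rng=None):
    """Return a perturbed copy of `probs` whose argmax (top-1 label) is unchanged.

    The log-space Gaussian noise and `noise_scale` are illustrative assumptions;
    the paper's actual perturbation may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(probs + 1e-12)
    noise = rng.normal(scale=noise_scale, size=logits.shape)
    noise[np.argmax(probs)] = 0.0  # leave the winning class's logit untouched
    perturbed = np.exp(logits + noise)
    perturbed /= perturbed.sum()
    # In the unlikely event the noise still flipped the label, return the original.
    return perturbed if np.argmax(perturbed) == np.argmax(probs) else probs


def kl_nats(p, q):
    """KL divergence D(p || q) in nats, used here as the utility-loss measure."""
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))


# Example: perturb a 10-class prediction and report the injected noise in nats.
p = np.full(10, 0.02)
p[-1] = 0.82
q = deceptive_perturbation(p, noise_scale=1.0, rng=np.random.default_rng(0))
print("label preserved:", np.argmax(p) == np.argmax(q))
print("injected noise (nats):", round(kl_nats(p, q), 3))
```

Because the returned probabilities remain a valid distribution with the same predicted label, the protected model's accuracy is unaffected, while an adversary training on the perturbed scores receives systematically misleading gradients.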
