Defending Against Model Stealing Attacks with Adaptive Misinformation

11/16/2019
by Sanjay Kariyappa, et al.

Deep Neural Networks (DNNs) are susceptible to model stealing attacks, which allow a data-limited adversary with no knowledge of the training dataset to clone the functionality of a target model using only black-box query access. Such attacks are typically carried out by querying the target model with inputs that are synthetically generated or sampled from a surrogate dataset, and using the returned predictions to construct a labeled dataset. The adversary can then train a clone model on this labeled dataset, achieving a classification accuracy comparable to that of the target model. We propose "Adaptive Misinformation" to defend against such model stealing attacks. We identify that all existing model stealing attacks invariably query the target model with Out-Of-Distribution (OOD) inputs. By selectively sending incorrect predictions for OOD queries, our defense substantially degrades the accuracy of the attacker's clone model (by up to 40%). Compared to existing defenses, our defense offers a significantly better security vs. accuracy trade-off and incurs minimal computational overhead.
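To make the mechanism concrete, the sketch below is a minimal illustration (not the paper's implementation) of how a served classifier could flag likely OOD queries and answer them with misleading predictions. The maximum-softmax-probability detector, the threshold tau, and the separately trained misinfo_model are assumptions introduced here purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def serve_with_adaptive_misinformation(model: nn.Module,
                                        misinfo_model: nn.Module,
                                        x: torch.Tensor,
                                        tau: float = 0.5) -> torch.Tensor:
    """Answer a batch of queries, misleading those that look Out-Of-Distribution.

    `misinfo_model` (assumed to be a network that outputs incorrect class
    probabilities) and the confidence threshold `tau` are illustrative
    placeholders, not details taken from the paper.
    """
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)            # defender's true predictions
        misinfo = F.softmax(misinfo_model(x), dim=1)  # deliberately wrong predictions

        # Treat a low maximum softmax probability as a sign of an OOD query.
        conf, _ = probs.max(dim=1, keepdim=True)
        ood = (conf < tau).float()

        # In-distribution queries get the correct output; suspected OOD queries
        # (typical of model stealing attacks) receive misinformation instead.
        return (1.0 - ood) * probs + ood * misinfo
```

How aggressively OOD answers are corrupted versus how often benign, in-distribution users see degraded outputs is exactly the security vs. accuracy trade-off the abstract refers to.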


Related research

06/16/2020  AdvMind: Inferring Adversary Intent of Black-Box Attacks
Deep neural networks (DNNs) are inherently susceptible to adversarial at...

08/02/2023  Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks
Despite the broad application of Machine Learning models as a Service (M...

05/06/2020  MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
Model Stealing (MS) attacks allow an adversary with black-box access to ...

08/23/2023  BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
We present a novel defense against backdoor attacks on Deep Neural Netw...

11/30/2020  Data-Free Model Extraction
Current model extraction attacks assume that the adversary has access to...

08/09/2019  Februus: Input Purification Defence Against Trojan Attacks on Deep Neural Network Systems
We propose Februus; a novel idea to neutralize insidious and highly poten...

06/26/2019  Prediction Poisoning: Utility-Constrained Defenses Against Model Stealing Attacks
With the advances of ML models in recent years, we are seeing an increas...
