Careful What You Wish For: on the Extraction of Adversarially Trained Models

by Kacem Khaled, et al.

Recent attacks on Machine Learning (ML) models, such as evasion attacks with adversarial examples and model stealing through extraction attacks, pose serious security and privacy threats. Prior work proposes adversarial training to secure models against adversarial examples, which can evade a model's classification and degrade its performance. However, this protection technique affects the model's decision boundary and its prediction probabilities, and hence may raise model privacy risks. In fact, a malicious user with only query access to a model's prediction output can extract it and obtain a high-accuracy, high-fidelity surrogate model. To make extraction more effective, these attacks leverage the prediction probabilities of the victim model. Yet all previous work on extraction attacks ignores changes made to the training process for security purposes. In this paper, we propose a framework to assess extraction attacks on adversarially trained models with vision datasets. To the best of our knowledge, our work is the first to perform such an evaluation. Through an extensive empirical study, we demonstrate that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training: surrogates can achieve up to 1.2× higher accuracy and agreement with fewer than 0.75× the queries. We additionally find that adversarial robustness is transferable through extraction attacks, i.e., Deep Neural Networks (DNNs) extracted from robust models show improved accuracy on adversarial examples compared to DNNs extracted from naturally trained (i.e., standard) models.
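The extraction attacks described above only need query access to the victim's prediction probabilities: the attacker queries the model, collects the soft labels, and distills them into a surrogate. A minimal sketch of that query-then-distill loop, using a hypothetical linear "victim" as a stand-in for the DNNs the paper studies (all names and parameters here are illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim: a fixed linear classifier exposed only through a
# probability-returning API (stand-in for a deployed DNN).
W_victim = rng.normal(size=(2, 3))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def victim_query(x):
    """Black-box API: inputs in, full prediction probability vectors out."""
    return softmax(x @ W_victim)

# 1. Query the victim on attacker-chosen inputs and record the soft labels.
X_q = rng.normal(size=(2000, 2))
P_q = victim_query(X_q)

# 2. Distill a surrogate by matching the soft labels
#    (gradient descent on cross-entropy against the victim's probabilities).
W_sur = np.zeros((2, 3))
lr = 0.5
for _ in range(300):
    P_hat = softmax(X_q @ W_sur)
    grad = X_q.T @ (P_hat - P_q) / len(X_q)   # softmax cross-entropy gradient
    W_sur -= lr * grad

# 3. Fidelity: label agreement between victim and surrogate on held-out inputs.
X_test = rng.normal(size=(1000, 2))
agree = np.mean(victim_query(X_test).argmax(1)
                == softmax(X_test @ W_sur).argmax(1))
print(f"label agreement: {agree:.3f}")
```

The full probability vectors carry much more signal than hard labels alone, which is why the abstract notes that changes to those probabilities (e.g., from adversarial training) directly affect how extractable a model is.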




Related research

- PRADA: Protecting against DNN Model Stealing Attacks
- High-Fidelity Extraction of Neural Network Models
- Practical No-box Adversarial Attacks against DNNs
- Towards Defending against Adversarial Examples via Attack-Invariant Features
- EZClone: Improving DNN Model Extraction Attack via Shape Distillation from GPU Execution Profiles
- Who's Afraid of Thomas Bayes?
- DAWN: Dynamic Adversarial Watermarking of Neural Networks
