DeepAI AI Chat
Log In Sign Up

On the Intriguing Connections of Regularization, Input Gradients and Transferability of Evasion and Poisoning Attacks

by   Ambra Demontis, et al.

Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Studying transferability of attacks has gained interest in the last years due to the deployment of cyber-attack detection services based on machine learning. For these applications of machine learning, service providers avoid disclosing information about their machine-learning algorithms. As a result, attackers trying to bypass detection are forced to craft their attacks against a surrogate model instead of the actual target model used by the service. While previous work has shown that finding test-time transferable attack samples is possible, it is not well understood how an attacker may construct adversarial examples that are likely to transfer against different models, in particular in the case of training-time poisoning attacks. In this paper, we present the first empirical analysis aimed to investigate the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying, formal definition of transferability of such attacks and show how it relates to the input gradients of the surrogate and of the target classification models. We assess to which extent some of the most well-known machine-learning systems are vulnerable to transfer attacks, and explain why such attacks succeed (or not) across different models. To this end, we leverage some interesting connections highlighted in this work among the adversarial vulnerability of machine-learning models, their regularization hyperparameters and input gradients.


Attack-Centric Approach for Evaluating Transferability of Adversarial Samples in Machine Learning Models

Transferability of adversarial samples became a serious concern due to t...

When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks

Attacks against machine learning systems represent a growing threat as h...

Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects

In the last few years, the scientific community showed a remarkable and ...

Energy-Latency Attacks via Sponge Poisoning

Sponge examples are test-time inputs carefully-optimized to increase ene...

Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Backdoor attacks inject poisoning samples during training, with the goal...

NTD: Non-Transferability Enabled Backdoor Detection

A backdoor deep learning (DL) model behaves normally upon clean inputs b...

The Space of Transferable Adversarial Examples

Adversarial examples are maliciously perturbed inputs designed to mislea...