On the Intriguing Connections of Regularization, Input Gradients and Transferability of Evasion and Poisoning Attacks

by   Ambra Demontis, et al.

Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Studying transferability of attacks has gained interest in the last years due to the deployment of cyber-attack detection services based on machine learning. For these applications of machine learning, service providers avoid disclosing information about their machine-learning algorithms. As a result, attackers trying to bypass detection are forced to craft their attacks against a surrogate model instead of the actual target model used by the service. While previous work has shown that finding test-time transferable attack samples is possible, it is not well understood how an attacker may construct adversarial examples that are likely to transfer against different models, in particular in the case of training-time poisoning attacks. In this paper, we present the first empirical analysis aimed to investigate the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying, formal definition of transferability of such attacks and show how it relates to the input gradients of the surrogate and of the target classification models. We assess to which extent some of the most well-known machine-learning systems are vulnerable to transfer attacks, and explain why such attacks succeed (or not) across different models. To this end, we leverage some interesting connections highlighted in this work among the adversarial vulnerability of machine-learning models, their regularization hyperparameters and input gradients.


Attack-Centric Approach for Evaluating Transferability of Adversarial Samples in Machine Learning Models

Transferability of adversarial samples became a serious concern due to t...

Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Backdoor attacks inject poisoning samples during training, with the goal...

Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects

In the last few years, the scientific community showed a remarkable and ...

Your Attack Is Too DUMB: Formalizing Attacker Scenarios for Adversarial Transferability

Evasion attacks are a threat to machine learning models, where adversari...

When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks

Attacks against machine learning systems represent a growing threat as h...

NTD: Non-Transferability Enabled Backdoor Detection

A backdoor deep learning (DL) model behaves normally upon clean inputs b...

Energy-Latency Attacks via Sponge Poisoning

Sponge examples are test-time inputs carefully-optimized to increase ene...

Please sign up or login with your details

Forgot password? Click here to reset