Bypassing Backdoor Detection Algorithms in Deep Learning

05/31/2019
by   Te Juin Lester Tan, et al.

Deep learning models are known to be vulnerable to various adversarial manipulations of their training data, model parameters, and input data. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model, so that the model behaves according to the adversary's objective whenever the input contains the backdoor features (e.g., a stamp on an image). The poisoned model's behavior on clean data, however, remains unchanged. Many detection algorithms are designed to identify backdoors from input samples or from the model's internal activations, in order to remove the backdoor. These algorithms rely on the statistical difference between the latent representations of backdoor-triggered and clean inputs in the poisoned model. In this paper, we design an adversarial backdoor embedding algorithm that can bypass existing detection algorithms, including state-of-the-art techniques (published in IEEE S&P 2019 and NeurIPS 2018). We design a strategic adversarial training procedure that optimizes the original loss function of the model while also maximizing the indistinguishability of the hidden representations of poisoned and clean data. We show the effectiveness of our attack on multiple datasets and model architectures. This work calls for designing adversary-aware defense mechanisms for backdoor detection.
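
The core idea lends itself to a short illustration. The sketch below is not the authors' code; it shows one plausible way such adversarial embedding could be set up in PyTorch: the model is trained on both clean and trigger-stamped inputs, while a small discriminator on the latent representations is trained to tell the two apart and the model is simultaneously penalized for letting it succeed. All names (FeatureExtractor, Classifier, Discriminator, lambda_adv) and the toy data are illustrative assumptions, not details from the paper.

# Minimal sketch (assumed setup, not the authors' implementation):
# adversarial backdoor embedding with a latent-space discriminator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):           # produces the latent representation
    def __init__(self, in_dim=784, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Classifier(nn.Module):                 # task head on top of the latent
    def __init__(self, latent_dim=64, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(latent_dim, num_classes)
    def forward(self, z):
        return self.fc(z)

class Discriminator(nn.Module):              # tries to tell poisoned from clean latents
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, z):
        return self.net(z)

feat, clf, disc = FeatureExtractor(), Classifier(), Discriminator()
opt_model = torch.optim.Adam(list(feat.parameters()) + list(clf.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)
lambda_adv = 1.0                             # weight of the indistinguishability term (assumed)

def train_step(x_clean, y_clean, x_poison, y_target):
    # 1) Update discriminator: separate clean latents (label 0) from poisoned (label 1).
    with torch.no_grad():
        z_c, z_p = feat(x_clean), feat(x_poison)
    d_loss = F.binary_cross_entropy_with_logits(disc(z_c), torch.zeros(len(z_c), 1)) + \
             F.binary_cross_entropy_with_logits(disc(z_p), torch.ones(len(z_p), 1))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Update model: task loss on clean + poisoned data, plus an adversarial term
    #    pushing poisoned latents toward the discriminator's "clean" decision.
    z_c, z_p = feat(x_clean), feat(x_poison)
    task_loss = F.cross_entropy(clf(z_c), y_clean) + F.cross_entropy(clf(z_p), y_target)
    adv_loss = F.binary_cross_entropy_with_logits(disc(z_p), torch.zeros(len(z_p), 1))
    loss = task_loss + lambda_adv * adv_loss
    opt_model.zero_grad(); loss.backward(); opt_model.step()
    return loss.item(), d_loss.item()

# Toy usage with random tensors: 32 clean and 32 trigger-stamped (poisoned) samples.
x_clean = torch.randn(32, 784); y_clean = torch.randint(0, 10, (32,))
x_poison = torch.randn(32, 784); y_target = torch.zeros(32, dtype=torch.long)  # attacker's target class
print(train_step(x_clean, y_clean, x_poison, y_target))

In this sketch the discriminator stands in for the statistical tests that detection algorithms apply to latent representations; penalizing the model whenever the discriminator can separate poisoned from clean latents is what drives the two populations of representations to look alike.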
