Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

12/21/2020
by Siyuan Cheng, et al.

A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data. The backdoor is activated when a normal input is stamped with a certain pattern, called a trigger, causing misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
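To make the trigger taxonomy concrete, the sketch below contrasts a classic input-space patch trigger with a feature-space one in a standard data-poisoning pipeline. This is a minimal PyTorch illustration: the FeatureSpaceTrigger generator, its bounded residual form, and the poison_batch helper are hypothetical stand-ins for exposition, not the paper's actual attack or its controlled-detoxification procedure.

```python
import torch

def stamp_patch_trigger(images, patch_value=1.0, size=6):
    """Classic input-space trigger (BadNets-style): overwrite a small
    square in the bottom-right corner with a solid color."""
    poisoned = images.clone()
    poisoned[:, :, -size:, -size:] = patch_value
    return poisoned

class FeatureSpaceTrigger(torch.nn.Module):
    """Hypothetical trigger generator: a tiny conv net that adds a
    bounded, image-dependent perturbation instead of a visible patch.
    An illustrative stand-in, not the paper's actual generator."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(16, channels, kernel_size=3, padding=1),
            torch.nn.Tanh(),
        )

    def forward(self, x):
        # A small residual keeps the poisoned image visually close to
        # the original (stealthiness) while still shifting deep features.
        return torch.clamp(x + 0.1 * self.net(x), 0.0, 1.0)

def poison_batch(images, labels, trigger, target_class, rate=0.1):
    """Apply the trigger to a fraction of a batch and relabel those
    samples to the attacker's target class; the rest stays clean."""
    images, labels = images.clone(), labels.clone()
    n_poison = max(1, int(rate * images.size(0)))
    idx = torch.randperm(images.size(0))[:n_poison]
    images[idx] = trigger(images[idx])
    labels[idx] = target_class
    return images, labels

# Example: poison 10% of a CIFAR-sized batch toward class 0.
x = torch.rand(32, 3, 32, 32)
y = torch.randint(0, 10, (32,))
with torch.no_grad():
    px, py = poison_batch(x, y, FeatureSpaceTrigger(), target_class=0)
```

A model trained on the output of poison_batch learns the backdoor while retaining clean accuracy. Defenses that reverse-engineer triggers can often recover a fixed pixel pattern like the one produced by stamp_patch_trigger, but a generator-produced, image-dependent perturbation has no such fixed pattern, which is the intuition behind moving the trigger into deep feature space.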
