Invisible Backdoor Attacks Against Deep Neural Networks

09/06/2019
by Shaofeng Li, et al.

Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks, in which hidden features (patterns) are trained into a normal model and activated only by specific inputs (called triggers), tricking the model into producing unexpected behavior. In this paper, we design an optimization framework to create covert and scattered triggers for backdoor attacks, called invisible backdoors, whose triggers amplify specific neuron activations while remaining invisible to both backdoor detection methods and human inspection. We use the Perceptual Adversarial Similarity Score (PASS) (Rozsa et al., 2016) to define invisibility for human users and apply L_2 and L_0 regularization in the optimization process to hide the trigger within the input data. We show that the proposed invisible backdoors are effective across various DNN models and three datasets (CIFAR-10, CIFAR-100, and GTSRB), as measured by their attack success rates and invisibility scores.
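The trigger-generation idea described above (amplify a chosen neuron's activation while an L_2 or L_0 penalty keeps the perturbation imperceptible) can be sketched as a small optimization loop. The snippet below is a minimal, illustrative PyTorch sketch, not the authors' implementation; the model interface, the neuron_index, the clamping budget, and all hyperparameters are assumptions.

# Minimal illustrative sketch, not the paper's code; interfaces and constants are assumed.
import torch

def optimize_trigger(model, base_image, neuron_index, steps=500, lr=0.01, l2_weight=0.1):
    """Search for a small additive trigger that amplifies one internal activation."""
    trigger = torch.zeros_like(base_image, requires_grad=True)
    optimizer = torch.optim.Adam([trigger], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Assumed: model(...) returns the activations of the targeted layer.
        activations = model(base_image + trigger)
        activation_term = -activations.flatten()[neuron_index]  # maximize the chosen neuron
        l2_term = l2_weight * trigger.norm(p=2)                  # keep the trigger small (L_2)
        loss = activation_term + l2_term
        loss.backward()
        optimizer.step()
        # Assumed invisibility budget: bound each pixel of the perturbation.
        with torch.no_grad():
            trigger.clamp_(-0.1, 0.1)
    return trigger.detach()

An L_0-style variant of the same loop would instead limit how many pixels the trigger touches, for example by zeroing all but the largest-magnitude entries of the trigger after each step.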
