PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks

03/17/2022
by Yue Wang, et al.

Backdoor attacks pose a new threat to Deep Neural Networks (DNNs): a backdoor is inserted into the network by poisoning the training dataset, causing inputs that contain the adversary's trigger to be misclassified. The major challenge in defending against these attacks is that only the attacker knows the secret trigger and the target class. The problem is further exacerbated by the recent introduction of "hidden triggers", which are carefully fused into the input so that they bypass human inspection and defeat backdoor identification through anomaly detection. To defend against such imperceptible attacks, in this work we systematically analyze how representations, i.e., the sets of neuron activations of a given DNN when using the training data as inputs, are affected by backdoor attacks. We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data. Our analysis shows that the representations of poisoned data and authentic data in the target class are still embedded in different linear subspaces, which implies that they exhibit different coherence with certain latent spaces. Based on this observation, the PiDAn algorithm learns a sample-wise weight vector that maximizes the projected coherence of the weighted samples; we demonstrate that the learned weight vector has a natural "grouping effect" and distinguishes authentic data from poisoned data. This enables the systematic detection and mitigation of backdoor attacks. Based on our theoretical analysis and experimental results, we demonstrate the effectiveness of PiDAn in defending against backdoor attacks that use different settings of poisoned samples on the GTSRB and ILSVRC2012 datasets. Our PiDAn algorithm can detect more than 90% ...
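
The subspace intuition behind the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: it replaces the coherence objective with a simple surrogate (the unit sample-weight vector maximizing the norm of the weighted sample combination, i.e., the top right singular vector of the representation matrix), just to show why a sample-wise weight vector can exhibit a "grouping effect" when authentic and poisoned representations lie in different linear subspaces. All names, dimensions, and the synthetic data are hypothetical.

```python
# Minimal illustrative sketch, NOT the PiDAn algorithm: a surrogate for the
# coherence objective showing the "grouping effect" of a sample-wise weight
# vector when two populations live in different linear subspaces.
import numpy as np

rng = np.random.default_rng(0)
d, n_auth, n_poison = 64, 80, 20            # hypothetical sizes

# Authentic and poisoned representations drawn from two different
# low-dimensional linear subspaces of the d-dimensional feature space.
basis_auth = rng.standard_normal((d, 3))
basis_poison = rng.standard_normal((d, 3))
X_auth = basis_auth @ rng.standard_normal((3, n_auth))
X_poison = basis_poison @ rng.standard_normal((3, n_poison))
X = np.hstack([X_auth, X_poison])           # d x n representation matrix

# Surrogate objective: the unit weight vector w maximizing ||X w|| is the
# top right singular vector of X.
_, _, vt = np.linalg.svd(X, full_matrices=False)
w = np.abs(vt[0])                           # sample-wise weight magnitudes

# "Grouping effect": the weights concentrate on samples from the dominant
# subspace, so the two populations receive visibly different magnitudes.
print("mean |w| authentic:", w[:n_auth].mean())
print("mean |w| poisoned :", w[n_auth:].mean())
```

In this toy setting the weights concentrate on the larger authentic population, so thresholding the weight magnitudes separates the two groups; PiDAn's actual coherence objective and detection statistics differ, as detailed in the paper.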

Related research

09/06/2019 · Invisible Backdoor Attacks Against Deep Neural Networks
Deep neural networks (DNNs) have been proven vulnerable to backdoor atta...

08/09/2019 · DeepCleanse: Input Sanitization Framework Against Trojan Attacks on Deep Neural Network Systems
Doubts over safety and trustworthiness of deep learning systems have eme...

08/08/2023 · Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
Deep neural networks are vulnerable to backdoor attacks (Trojans), where...

08/09/2019 · Februus: Input Purification Defence Against Trojan Attacks on Deep Neural Network Systems
We propose Februus, a novel idea to neutralize insidious and highly poten...

09/07/2021 · Trojan Signatures in DNN Weights
Deep neural networks have been shown to be vulnerable to backdoor, or tr...

08/18/2023 · Backdoor Mitigation by Correcting the Distribution of Neural Activations
Backdoor (Trojan) attacks are an important type of adversarial exploit a...

05/31/2022 · CASSOCK: Viable Backdoor Attacks against DNN in The Wall of Source-Specific Backdoor Defences
Backdoor attacks have been a critical threat to deep neural network (DNN...
