Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

08/08/2023
by Hang Wang, et al.

Deep neural networks are vulnerable to backdoor (Trojan) attacks, where an attacker poisons the training set with backdoor triggers so that the network learns to classify test-time inputs containing the trigger to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, motivating a general, post-training clipping method for backdoor mitigation in which bounds on internal-layer activations are learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method outperforms peer methods on CIFAR-10 image classification. We also show that it is robust against adaptive attacks and X2X attacks, and effective on other datasets. Finally, we demonstrate an extension of the method for test-time detection and correction, based on the differences between the outputs of the original and activation-bounded networks. The code of our method is available online.
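To illustrate the general idea, here is a minimal sketch of post-training activation clipping and the associated test-time check. This is not the paper's method: the paper chooses the bounds by an optimization that explicitly limits classification margins, whereas this toy example simply uses the per-neuron maximum activation over a small clean set as a stand-in; the toy two-layer network and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: hidden ReLU layer, then a linear classifier.
W1 = rng.normal(size=(8, 4)); b1 = np.zeros(8)
W2 = rng.normal(size=(3, 8)); b2 = np.zeros(3)

def forward(x, bounds=None):
    h = np.maximum(W1 @ x + b1, 0.0)   # internal-layer (ReLU) activations
    if bounds is not None:
        h = np.minimum(h, bounds)      # clip to per-neuron activation bounds
    return W2 @ h + b2                 # class logits

# "Learn" bounds from a small clean set (here: per-neuron max activation;
# the paper instead optimizes the bounds to limit classification margins).
clean = rng.normal(size=(32, 4))
clean_acts = np.maximum(clean @ W1.T + b1, 0.0)
bounds = clean_acts.max(axis=0)

# Test-time detection: flag an input as suspicious if the prediction of the
# activation-bounded network differs from that of the original network.
x = rng.normal(size=4) * 10.0          # an input with abnormally large scale
suspicious = forward(x).argmax() != forward(x, bounds).argmax()
```

Clean samples used to fit the bounds are unaffected by clipping (their activations lie below the bounds by construction), while inputs that drive abnormally large activations, as backdoor triggers tend to, can flip their prediction under clipping, which is the signal used for detection.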


