Adversarial Neuron Pruning Purifies Backdoored Deep Models

10/27/2021
by   Dongxian Wu, et al.

As deep neural networks (DNNs) grow larger, their demand for computational resources becomes enormous, which makes outsourced training increasingly popular. Training on a third-party platform, however, introduces the risk that a malicious trainer returns a backdoored DNN, which behaves normally on clean samples but outputs targeted misclassifications whenever a trigger appears at test time. Without any knowledge of the trigger, it is difficult to distinguish benign DNNs from backdoored ones or to recover the latter. In this paper, we first identify an unexpected sensitivity of backdoored DNNs: they are much easier to collapse, and tend to predict the target label on clean samples, when their neurons are adversarially perturbed. Based on these observations, we propose a novel model-repairing method, termed Adversarial Neuron Pruning (ANP), which prunes sensitive neurons to purify the injected backdoor. Experiments show that, even with only an extremely small amount of clean data (e.g., 1%), ANP effectively removes the injected backdoor without causing obvious performance degradation.
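The core idea of perturbing neurons and pruning the most sensitive ones can be illustrated with a toy sketch. Note this is not the authors' implementation (ANP learns perturbations and pruning masks jointly by gradient-based adversarial optimization over a full network); the tiny one-hidden-layer network, the per-neuron worst-case scaling, and all names below (`forward`, `neuron_sensitivity`, `prune_most_sensitive`) are illustrative assumptions.

```python
def relu(x):
    return x if x > 0 else 0.0

def forward(x, W1, W2, mask, scale=None):
    # One hidden layer. Each hidden neuron is gated by a binary pruning
    # mask and an optional continuous perturbation scale (stand-in for
    # ANP's adversarial neuron perturbation).
    h = []
    for j, w in enumerate(W1):
        a = relu(sum(wi * xi for wi, xi in zip(w, x)))
        s = scale[j] if scale is not None else 1.0
        h.append(a * mask[j] * s)
    return sum(w2j * hj for w2j, hj in zip(W2, h))

def neuron_sensitivity(x, W1, W2, mask, eps=0.4):
    # Crude sensitivity proxy: perturb each neuron's scale to its
    # worst case (1 - eps or 1 + eps) one at a time and record the
    # largest output shift it can cause.
    base = forward(x, W1, W2, mask)
    sens = []
    for j in range(len(W1)):
        worst = 0.0
        for s in (1.0 - eps, 1.0 + eps):
            scale = [1.0] * len(W1)
            scale[j] = s
            worst = max(worst, abs(forward(x, W1, W2, mask, scale) - base))
        sens.append(worst)
    return sens

def prune_most_sensitive(sens, mask, k=1):
    # Zero out the masks of the k most perturbation-sensitive neurons.
    order = sorted(range(len(sens)), key=lambda j: sens[j], reverse=True)
    new_mask = list(mask)
    for j in order[:k]:
        new_mask[j] = 0
    return new_mask

# Toy example: neuron 2 dominates the output, so perturbing it moves
# the prediction the most, and it is the one that gets pruned.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [0.1, 0.1, 5.0]
sens = neuron_sensitivity([1.0, 1.0], W1, W2, [1, 1, 1])
print(prune_most_sensitive(sens, [1, 1, 1], k=1))  # → [1, 1, 0]
```

In the actual method, sensitivity is measured against the classification loss on a small clean set rather than a raw output shift, which is what lets ANP work with as little as 1% clean data.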

Related research

- Reconstructive Neuron Pruning for Backdoor Defense (05/24/2023)
- Interpreting and Improving Adversarial Robustness with Neuron Sensitivity (09/16/2019)
- To compress or not to compress: Understanding the Interactions between Adversarial Attacks and Neural Network Compression (09/29/2018)
- Defending against Backdoor Attack on Deep Neural Networks (02/26/2020)
- Optimistic Estimate Uncovers the Potential of Nonlinear Models (07/18/2023)
- Deep Probabilistic Models to Detect Data Poisoning Attacks (12/03/2019)
- DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation (06/13/2023)
