Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial Examples to Triggered Samples

02/13/2022
by Bingxu Mu, et al.

Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks: once a backdoor trigger is planted at training time, the infected DNN model misclassifies any test sample embedded with the trigger as the target label. Because backdoor attacks are stealthy, it is hard to either detect the backdoor or erase it from an infected model. In this paper, we propose a new Adversarial Fine-Tuning (AFT) approach that erases backdoor triggers by leveraging adversarial examples of the infected model. For an infected model, we observe that its adversarial examples behave similarly to its triggered samples. Based on this observation, we design AFT to break the foundation of the backdoor attack, i.e., the strong correlation between the trigger and the target label. We empirically show that, against 5 state-of-the-art backdoor attacks, AFT effectively erases backdoor triggers without obvious degradation in performance on clean samples, significantly outperforming existing defense methods.
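The abstract only sketches the mechanism, so the following is a minimal PyTorch sketch of the idea under stated assumptions: craft untargeted adversarial examples of the infected model, then fine-tune the model on them with their correct (clean) labels so that trigger-like features stop mapping to the target label. The function names (pgd_attack, adversarial_finetune), the choice of PGD as the adversarial-example generator, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Untargeted PGD on the infected model (illustrative choice); the
    # resulting perturbations tend to exercise the same shortcut features
    # as the planted trigger.
    was_training = model.training
    model.eval()  # freeze batch-norm statistics while crafting examples
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    model.train(was_training)
    return x_adv.detach()

def adversarial_finetune(model, clean_loader, epochs=10, lr=0.01):
    # Fine-tune on adversarial examples paired with their correct labels,
    # weakening the trigger-to-target-label correlation (hypothetical
    # hyperparameters, not the authors' settings).
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            x_adv = pgd_attack(model, x, y)
            opt.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
    return model

In practice the fine-tuning set would be a small held-out clean dataset; the sketch assumes clean_loader yields (image, label) batches with pixel values in [0, 1].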

Related research

02/10/2021
Detecting Localized Adversarial Examples: A Generic Approach using Critical Region Analysis
Deep neural networks (DNNs) have been applied in a wide range of applica...

06/21/2021
Hardness of Samples Is All You Need: Protecting Deep Learning Models Using Hardness of Samples
Several recent studies have shown that Deep Neural Network (DNN)-based c...

03/27/2021
LiBRe: A Practical Bayesian Approach to Adversarial Detection
Despite their appealing flexibility, deep neural networks (DNNs) are vul...

10/21/2022
Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks
An off-the-shelf model as a commercial service could be stolen by model ...

01/24/2022
What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction
Adversarial examples (AEs) pose severe threats to the applications of de...

08/31/2022
Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning
Deep Neural Networks (DNNs) have achieved excellent performance in vario...

10/01/2019
Deep Neural Rejection against Adversarial Examples
Despite the impressive performances reported by deep neural networks in ...
