Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

01/15/2021
by Yige Li, et al.

Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks, a training-time attack that injects a trigger pattern into a small proportion of the training data so as to control the model's predictions at test time. Backdoor attacks are notably dangerous because they do not affect the model's performance on clean examples, yet can fool the model into making incorrect predictions whenever the trigger pattern appears during testing. In this paper, we propose a novel defense framework, Neural Attention Distillation (NAD), to erase backdoor triggers from backdoored DNNs. NAD uses a teacher network to guide the finetuning of the backdoored student network on a small clean subset of data, so that the intermediate-layer attention of the student network aligns with that of the teacher network. The teacher network can be obtained by an independent finetuning process on the same clean subset. We empirically show that, against 6 state-of-the-art backdoor attacks, NAD can effectively erase the backdoor triggers using only 5% of the clean training data, without causing obvious performance degradation on clean examples.
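The attention-alignment objective can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration, not the authors' released implementation: the attention-map definition (a channel-wise power sum followed by L2 normalization) follows common attention-transfer practice, and the names `attention_map`, `nad_loss`, and the weight `beta` are illustrative assumptions. The backdoored student is finetuned on the small clean subset with a cross-entropy term plus a per-layer distance between its attention maps and those of the frozen, clean-finetuned teacher.

```python
import torch
import torch.nn.functional as F

def attention_map(fmap: torch.Tensor, p: int = 2) -> torch.Tensor:
    # Collapse a (B, C, H, W) feature map into a spatial attention map:
    # sum the p-th power of absolute activations over channels, then
    # L2-normalize per sample. (Common attention-transfer formulation;
    # assumed here, not taken verbatim from the paper.)
    a = fmap.abs().pow(p).sum(dim=1)          # (B, H, W)
    return F.normalize(a.flatten(1), dim=1)   # (B, H*W), unit L2 norm

def nad_loss(logits: torch.Tensor,
             labels: torch.Tensor,
             student_fmaps: list[torch.Tensor],
             teacher_fmaps: list[torch.Tensor],
             beta: float = 1000.0) -> torch.Tensor:
    # Clean-data cross-entropy plus an attention-alignment penalty at
    # each selected intermediate layer; `beta` (hypothetical name)
    # weights the distillation term.
    loss = F.cross_entropy(logits, labels)
    for fs, ft in zip(student_fmaps, teacher_fmaps):
        loss = loss + beta * F.mse_loss(attention_map(fs), attention_map(ft))
    return loss
```

In a training loop, one would collect feature maps from matching intermediate layers of the frozen teacher and the trainable student (e.g., via forward hooks), evaluate `nad_loss` on clean mini-batches, and backpropagate through the student only.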


Related research

04/21/2022
Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation

11/22/2022
Backdoor Cleansing with Unlabeled Data

06/13/2023
DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation

09/01/2016
Ternary Neural Networks for Resource-Efficient AI Applications

07/05/2020
Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

12/03/2019
Deep Probabilistic Models to Detect Data Poisoning Attacks

10/12/2022
Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork
