Defending Against Backdoor Attacks by Layer-wise Feature Analysis

02/24/2023
by Najeeb Moharram Jebreel, et al.

Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford these may prefer to outsource training to a third party or to use publicly available pre-trained models. Unfortunately, doing so opens the door to a training-time threat known as the backdoor attack, which aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be largest at a critical layer, which is not always the layer existing defenses typically inspect (i.e., the layer immediately before the fully-connected layers). We also show how to locate this critical layer based on the behavior of benign samples alone. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our defense.
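To make the filtering idea concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: extract features at several candidate layers, locate a critical layer from the behavior of benign samples, and flag suspicious samples whose critical-layer features deviate from the benign centroid. The hook-based extraction, the cosine-cohesion criterion for picking the layer, and the fixed distance threshold are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def collect_layer_features(model, layers, x):
    """Capture the flattened output of each layer in `layers` (given in
    forward-pass order) for a batch `x`, using forward hooks."""
    feats = []
    hooks = [layer.register_forward_hook(
                 lambda mod, inp, out: feats.append(out.flatten(1).detach()))
             for layer in layers]
    model.eval()                      # freeze batch-norm/dropout statistics
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return feats                      # list of [N, D] tensors, one per layer


def locate_critical_layer(benign_feats):
    """Pick the layer whose benign features cluster most tightly around
    their centroid -- an illustrative stand-in for the paper's criterion
    of locating the critical layer from benign behavior."""
    cohesion = [F.cosine_similarity(f, f.mean(0, keepdim=True), dim=1).mean()
                for f in benign_feats]
    return max(range(len(benign_feats)), key=lambda i: cohesion[i])


def filter_poisoned(model, layers, benign_x, suspicious_x, threshold=0.5):
    """Flag suspicious samples whose critical-layer features are far (in
    cosine distance) from the centroid of benign target-class features.
    The 0.5 threshold is arbitrary and would need calibration in practice."""
    benign_feats = collect_layer_features(model, layers, benign_x)
    k = locate_critical_layer(benign_feats)
    centroid = benign_feats[k].mean(0, keepdim=True)         # [1, D]
    suspicious_feats = collect_layer_features(model, layers, suspicious_x)[k]
    distance = 1 - F.cosine_similarity(suspicious_feats, centroid, dim=1)
    return distance > threshold       # boolean mask: True = likely poisoned
```

With a torchvision ResNet, for example, `layers` could be `[model.layer1, model.layer2, model.layer3, model.layer4]` (listed in forward order), and `benign_x` a small batch of verified-clean samples from the target class.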


