L-RED: Efficient Post-Training Detection of Imperceptible Backdoor Attacks without Access to the Training Set

10/20/2020
by Zhen Xiang, et al.

Backdoor attacks (BAs) are an emerging form of adversarial attack typically against deep neural network image classifiers. The attacker aims to have the classifier learn to classify to a target class when test images from one or more source classes contain a backdoor pattern, while maintaining high accuracy on all clean test images. Reverse-Engineering-based Defenses (REDs) against BAs do not require access to the training set but only to an independent clean dataset. Unfortunately, most existing REDs rely on an unrealistic assumption that all classes except the target class are source classes of the attack. REDs that do not rely on this assumption often require a large set of clean images and heavy computation. In this paper, we propose a Lagrangian-based RED (L-RED) that does not require knowledge of the number of source classes (or whether an attack is present). Our defense requires very few clean images to effectively detect BAs and is computationally efficient. Notably, we detect 56 out of 60 BAs using only two clean images per class in our experiments on CIFAR-10.
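The core idea behind reverse-engineering defenses can be illustrated with a toy sketch: for a putative target class, the defender optimizes a small additive pattern that pushes clean samples toward that class, trading off pattern size against misclassification loss via a Lagrangian penalty. The sketch below is hypothetical and uses a toy linear softmax "classifier" in NumPy in place of a trained DNN; all names, weights, and hyperparameters are illustrative, not the paper's actual L-RED procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier: linear softmax over 3 classes.
# (Hypothetical weights; a real RED probes an actual trained DNN.)
W = rng.normal(size=(3, 5))

def logits(x):
    return W @ x

# A few "clean" samples assumed available to the defender.
X = rng.normal(size=(4, 5))
target = 2  # putative backdoor target class under test

# Reverse-engineer a small additive pattern v that induces classification
# to the target class on every clean sample. The objective
#   lam * ||v||^2  +  mean cross-entropy(f(x + v), target)
# mirrors the size-vs-misclassification trade-off REDs solve.
v = np.zeros(5)
lam = 0.01   # penalty weight (fixed here; L-RED adapts its multiplier)
lr = 0.05
for _ in range(1000):
    grad = 2 * lam * v
    for x in X:
        z = logits(x + v)
        p = np.exp(z - z.max())
        p /= p.sum()
        # gradient of -log p[target] w.r.t. v is W^T (p - onehot(target))
        e = p.copy()
        e[target] -= 1.0
        grad += W.T @ e / len(X)
    v -= lr * grad

fooled = sum(np.argmax(logits(x + v)) == target for x in X)
print(f"{fooled}/{len(X)} clean samples pushed to class {target}; "
      f"pattern norm^2 = {float(v @ v):.3f}")
```

A defender would repeat this for each candidate target class and flag the classifier as attacked if some class admits an anomalously small pattern that works on most clean samples.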


Related research

- Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing (10/15/2020)
- Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios (01/20/2022)
- Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification (08/27/2019)
- Poison as a Cure: Detecting and Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks (11/19/2019)
- Odyssey: Creation, Analysis and Detection of Trojan Models (07/16/2020)
- Backdoor Mitigation by Correcting the Distribution of Neural Activations (08/18/2023)
- Adversarial Unlearning of Backdoors via Implicit Hypergradient (10/07/2021)
