Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios

01/20/2022
by   Zhen Xiang, et al.
0

Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers. A victim classifier will predict to an attacker-desired target class whenever a test sample is embedded with the same backdoor pattern (BP) that was used to poison the classifier's training set. Detecting whether a classifier is backdoor attacked is not easy in practice, especially when the defender is, e.g., a downstream user without access to the classifier's training set. This challenge is addressed here by a reverse-engineering defense (RED), which has been shown to yield state-of-the-art performance in several domains. However, existing REDs are not applicable when there are only two classes or when multiple attacks are present. These scenarios are first studied in the current paper, under the practical constraints that the defender neither has access to the classifier's training set nor to supervision from clean reference classifiers trained for the same domain. We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic. We show that our ET statistic is effective using the same detection threshold, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used. The excellent performance of our method is demonstrated on six benchmark datasets. Notably, our detection framework is also applicable to multi-class scenarios with multiple attacks.

READ FULL TEXT

page 7

page 21

page 31

research
10/20/2021

Detecting Backdoor Attacks Against Point Cloud Classifiers

Backdoor attacks (BA) are an emerging threat to deep neural network clas...
research
05/13/2022

Universal Post-Training Backdoor Detection

A Backdoor attack (BA) is an important type of adversarial attack agains...
research
07/08/2022

Defense Against Multi-target Trojan Attacks

Adversarial attacks on deep learning-based models pose a significant thr...
research
10/20/2020

L-RED: Efficient Post-Training Detection of Imperceptible Backdoor Attacks without Access to the Training Set

Backdoor attacks (BAs) are an emerging form of adversarial attack typica...
research
11/18/2019

Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

Recently, a special type of data poisoning (DP) attack, known as a backd...
research
05/29/2023

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

Backdoor (Trojan) attack is a common threat to deep neural networks, whe...
research
10/31/2018

When Not to Classify: Detection of Reverse Engineering Attacks on DNN Image Classifiers

This paper addresses detection of a reverse engineering (RE) attack targ...

Please sign up or login with your details

Forgot password? Click here to reset