Universal Post-Training Backdoor Detection

05/13/2022
by   Hang Wang, et al.
0

A Backdoor attack (BA) is an important type of adversarial attack against deep neural network classifiers, wherein test samples from one or more source classes will be (mis)classified to the attacker's target class when a backdoor pattern (BP) is embedded. In this paper, we focus on the post-training backdoor defense scenario commonly considered in the literature, where the defender aims to detect whether a trained classifier was backdoor attacked, without any access to the training set. To the best of our knowledge, existing post-training backdoor defenses are all designed for BAs with presumed BP types, where each BP type has a specific embedding function. They may fail when the actual BP type used by the attacker (unknown to the defender) is different from the BP type assumed by the defender. In contrast, we propose a universal post-training defense that detects BAs with arbitrary types of BPs, without making any assumptions about the BP type. Our detector leverages the influence of the BA, independently of the BP type, on the landscape of the classifier's outputs prior to the softmax layer. For each class, a maximum margin statistic is estimated using a set of random vectors; detection inference is then performed by applying an unsupervised anomaly detector to these statistics. Thus, our detector is also an advance relative to most existing post-training methods by not needing any legitimate clean samples, and can efficiently detect BAs with arbitrary numbers of source classes. These advantages of our detector over several state-of-the-art methods are demonstrated on four datasets, for three different types of BPs, and for a variety of attack configurations. Finally, we propose a novel, general approach for BA mitigation once a detection is made.

READ FULL TEXT

page 6

page 17

page 18

research
01/20/2022

Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios

Backdoor attacks (BAs) are an emerging threat to deep neural network cla...
research
10/20/2021

Detecting Backdoor Attacks Against Point Cloud Classifiers

Backdoor attacks (BA) are an emerging threat to deep neural network clas...
research
11/18/2019

Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

Recently, a special type of data poisoning (DP) attack, known as a backd...
research
05/29/2023

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

Backdoor (Trojan) attack is a common threat to deep neural networks, whe...
research
08/18/2023

Backdoor Mitigation by Correcting the Distribution of Neural Activations

Backdoor (Trojan) attacks are an important type of adversarial exploit a...
research
01/11/2023

Universal Detection of Backdoor Attacks via Density-based Clustering and Centroids Analysis

In this paper, we propose a Universal Defence based on Clustering and Ce...
research
11/04/2020

Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

This paper proposes a new defense against neural network backdooring att...

Please sign up or login with your details

Forgot password? Click here to reset