A Statistical Difference Reduction Method for Escaping Backdoor Detection

by Pengfei Xia, et al.

Recent studies show that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks. An infected model behaves normally on benign inputs, whereas its prediction is forced to an attack-specific target on adversarial data. Several detection methods have been developed to distinguish clean inputs from adversarial ones and thereby defend against such attacks. These defenses rely on a common hypothesis: that there are large statistical differences between the latent representations of clean and adversarial inputs extracted by the infected model. Despite its importance, there has been no comprehensive study of whether this hypothesis must hold. In this paper, we focus on it and study the following questions: 1) What are the properties of the statistical differences? 2) How can they be reduced effectively without harming the attack intensity? 3) What impact does this reduction have on difference-based defenses? Our work addresses these three questions in turn. First, by introducing the Maximum Mean Discrepancy (MMD) as the metric, we identify that the statistical differences of multi-level representations are all large, not just those of the highest level. Then, we propose a Statistical Difference Reduction Method (SDRM) that adds a multi-level MMD constraint to the loss function when training a backdoor model, which effectively reduces the differences. Last, three typical difference-based detection methods are examined. The F1 scores of these defenses drop from 90 on the models trained with SDRM on all two datasets, four model architectures, and four attack methods. The results indicate that the proposed method can be used to enhance existing attacks to escape backdoor detection algorithms.
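To make the idea of a multi-level MMD constraint concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian (RBF) kernel, a biased estimator of the squared MMD, and a simple sum over levels scaled by a weight; the function names, the `bandwidth` and `weight` parameters, the placeholder task loss of 0.5, and the synthetic "representations" are all hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples of vectors."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def multi_level_penalty(clean_levels, poisoned_levels, weight=1.0, bandwidth=1.0):
    """Sum of per-level squared MMD terms, scaled by an assumed weight."""
    return weight * sum(
        mmd_squared(c, p, bandwidth) for c, p in zip(clean_levels, poisoned_levels)
    )


def sample(mean, n, dim, rng):
    """Synthetic stand-in for latent representations at one level."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
clean = sample(0.0, 50, 4, rng)    # representations of clean inputs
shifted = sample(3.0, 50, 4, rng)  # representations that differ strongly
similar = sample(0.0, 50, 4, rng)  # representations that match closely

far = mmd_squared(clean, shifted, bandwidth=2.0)
near = mmd_squared(clean, similar, bandwidth=2.0)

# Placeholder task loss (0.5) plus the penalty over two hypothetical levels.
total_loss = 0.5 + multi_level_penalty(
    [clean, clean], [shifted, similar], weight=0.1, bandwidth=2.0
)
```

In this toy setting `far` comes out larger than `near`, which mirrors the hypothesis the defenses rely on: a large discrepancy separates the two populations, and a training penalty of this shape would push it down at every level at once. In an actual training loop the penalty would need to be computed with a differentiable framework so that gradients reach the model parameters.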






