
A Statistical Difference Reduction Method for Escaping Backdoor Detection

by   Pengfei Xia, et al.

Recent studies show that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks. An infected model behaves normally on benign inputs, whereas its prediction is forced to an attack-specific target on adversarial data. Several detection methods have been developed to distinguish such inputs and defend against these attacks. The common hypothesis these defenses rely on is that there are large statistical differences between the latent representations that the infected model extracts from clean and adversarial inputs. However, despite its importance, comprehensive research on whether this hypothesis must hold is lacking. In this paper, we focus on it and study three relevant questions: 1) What are the properties of the statistical differences? 2) How can they be reduced effectively without harming the attack's strength? 3) What impact does this reduction have on difference-based defenses? Our work addresses these three questions in turn. First, by introducing the Maximum Mean Discrepancy (MMD) as the metric, we identify that the statistical differences of multi-level representations are all large, not just at the highest level. Then, we propose a Statistical Difference Reduction Method (SDRM) that adds a multi-level MMD constraint to the loss function while training a backdoored model, effectively reducing the differences. Last, we examine three typical difference-based detection methods. The F1 scores of these defenses fall from around 90 when evaluated on models trained with SDRM, across both datasets, four model architectures, and four attack methods. The results indicate that the proposed method can be used to enhance existing attacks to escape backdoor detection algorithms.
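The MMD metric at the heart of the abstract compares two sets of feature vectors by measuring the distance between their distributions in a kernel-induced feature space. As a rough illustration (not the paper's implementation), here is a minimal NumPy sketch of the biased squared-MMD estimator with a Gaussian RBF kernel; the sample sizes, bandwidth `sigma`, and the synthetic "clean" versus "shifted" features are all illustrative assumptions:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel matrix between rows of x and rows of y.
    sq_dists = (np.sum(x**2, axis=1)[:, None]
                + np.sum(y**2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased estimator of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 16))    # stand-in for clean-input features
shifted = rng.normal(1.0, 1.0, size=(200, 16))  # stand-in for poisoned-input features
same = rng.normal(0.0, 1.0, size=(200, 16))     # second clean sample, same distribution
```

In the paper's setting, `clean` and `shifted` would be the latent representations of benign and poisoned inputs at some layer of the infected model; SDRM adds such an MMD term (over multiple layers) as a penalty to the training loss so the two distributions are pulled together. Note that `mmd2(clean, shifted)` comes out much larger than `mmd2(clean, same)`, since the first pair differs in distribution while the second does not.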


On Evaluating Neural Network Backdoor Defenses

Deep neural networks (DNNs) demonstrate superior performance in various ...

M-to-N Backdoor Paradigm: A Stealthy and Fuzzy Attack to Deep Learning Models

Recent studies show that deep neural networks (DNNs) are vulnerable to b...

FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases

Trojan attack on deep neural networks, also known as backdoor attack, is...

EagleEye: Attack-Agnostic Defense against Adversarial Inputs (Technical Report)

Deep neural networks (DNNs) are inherently vulnerable to adversarial inp...

Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks

Minimal adversarial perturbations added to inputs have been shown to be ...

FADER: Fast Adversarial Example Rejection

Deep neural networks are vulnerable to adversarial examples, i.e., caref...