A Statistical Difference Reduction Method for Escaping Backdoor Detection

11/09/2021
by Pengfei Xia, et al.

Recent studies show that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks: an infected model behaves normally on benign inputs, but its predictions are forced to an attack-specific target on adversarial data. Several detection methods have been developed to distinguish such inputs and defend against these attacks. The common hypothesis these defenses rely on is that there are large statistical differences between the latent representations that the infected model extracts from clean and adversarial inputs. However, despite its importance, comprehensive research on whether this hypothesis must hold is lacking. In this paper, we focus on it and study three related questions: 1) What are the properties of the statistical differences? 2) How can they be reduced effectively without harming the attack's strength? 3) What impact does this reduction have on difference-based defenses? We address these three questions in turn. First, by introducing the Maximum Mean Discrepancy (MMD) as the metric, we identify that the statistical differences of multi-level representations are all large, not just those at the highest level. Then, we propose a Statistical Difference Reduction Method (SDRM) that adds a multi-level MMD constraint to the loss function when training a backdoored model, effectively reducing the differences. Last, we examine three typical difference-based detection methods. Their F1 scores drop from around 90 on the undefended models to markedly lower values on the models trained with SDRM, across both datasets, four model architectures, and four attack methods. The results indicate that the proposed method can be used to enhance existing attacks so that they escape backdoor detection algorithms.
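
Since the abstract only sketches how the multi-level MMD constraint enters training, the following is a minimal, hypothetical PyTorch sketch of the idea: an RBF-kernel estimate of squared MMD between clean and poisoned features at several layers, added as a penalty to the usual cross-entropy terms. The `return_features` interface, the kernel bandwidth `sigma`, and the weight `lam` are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def rbf_mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between two feature batches using an RBF kernel.
    x, y = x.flatten(1), y.flatten(1)
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()

def sdrm_style_loss(model, clean_x, clean_y, poisoned_x, target_y, lam=1.0):
    # Cross-entropy on clean and poisoned batches, plus an MMD penalty at every
    # feature level so the latent statistics of the two input types stay close.
    # Assumes (hypothetically) the model can return a list of intermediate features.
    logits_c, feats_c = model(clean_x, return_features=True)
    logits_p, feats_p = model(poisoned_x, return_features=True)
    ce = F.cross_entropy(logits_c, clean_y) + F.cross_entropy(logits_p, target_y)
    mmd = sum(rbf_mmd2(fc, fp) for fc, fp in zip(feats_c, feats_p))
    return ce + lam * mmd
```

In this sketch, minimizing the combined loss keeps the backdoor behavior (the poisoned batch is still pushed toward the attack target) while shrinking the multi-level statistical gap that difference-based detectors look for.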


