AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

10/28/2021
by   Junfeng Guo, et al.

Deep neural networks (DNNs) have been shown to be vulnerable to backdoor attacks. A backdoor is typically embedded in a target DNN by injecting a backdoor trigger into training examples, which causes the target DNN to misclassify any input to which the trigger is attached. Existing backdoor detection methods often require access to the original poisoned training data, the parameters of the target DNN, or the predictive confidence for each given input, all of which are impractical in many real-world applications, e.g., on-device deployed DNNs. We address the black-box hard-label backdoor detection problem, in which the DNN is fully black-box and only its final output label is accessible. We approach this problem from the optimization perspective and show that the objective of backdoor detection is bounded by an adversarial objective. Further theoretical and empirical studies reveal that this adversarial objective leads to a solution with a highly skewed distribution; a singularity is often observed in the adversarial map of a backdoor-infected example, a phenomenon we call adversarial singularity. Based on this observation, we propose adversarial extreme value analysis (AEVA) to detect backdoors in black-box neural networks. AEVA performs an extreme value analysis of the adversarial map, which is computed via Monte Carlo gradient estimation. Extensive experiments across multiple popular tasks and backdoor attacks show that our approach is effective at detecting backdoor attacks under the black-box hard-label scenario.
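The two ingredients the abstract names — Monte Carlo gradient estimation against a black-box objective, and an extreme-value statistic over the resulting adversarial map — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names (`mc_gradient_estimate`, `singularity_score`), the Gaussian-smoothing estimator, and the peak-over-median statistic are all assumptions chosen to convey the idea.

```python
import numpy as np

def mc_gradient_estimate(f, x, n_samples=500, sigma=0.1, seed=0):
    """Monte Carlo (zeroth-order) gradient estimate of a black-box
    scalar objective f at x via Gaussian smoothing:
        grad f(x) ~= E_u[ f(x + sigma*u) * u ] / sigma,  u ~ N(0, I).
    Only queries f; never touches the model's parameters."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        grad += f(x + sigma * u) * u
    return grad / (n_samples * sigma)

def singularity_score(adv_map, eps=1e-12):
    """One possible extreme-value statistic on an adversarial map:
    peak magnitude over median magnitude. A backdoor trigger tends to
    concentrate adversarial perturbation on a few pixels, so infected
    inputs should yield a far larger ratio than benign ones."""
    m = np.abs(np.asarray(adv_map, dtype=float)).ravel()
    return m.max() / (np.median(m) + eps)

# Toy check: a hypothetical "triggered" objective that reacts only to
# coordinate 0 produces a spiked adversarial map; a benign, diffuse
# objective spreads its gradient mass evenly.
x0 = np.zeros(16)
triggered = lambda x: 5.0 * x[0]     # backdoor-like, localized response
diffuse = lambda x: x.sum() / 4.0    # benign, spread-out response

spiked_map = mc_gradient_estimate(triggered, x0)
flat_map = mc_gradient_estimate(diffuse, x0)
```

A detector in this spirit would flag an input as backdoor-infected when its adversarial map's extreme-value score clears a calibrated threshold; `singularity_score(spiked_map)` comes out far larger than `singularity_score(flat_map)` in the toy above.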

