Black-box Detection of Backdoor Attacks with Limited Information and Data

03/24/2021
by Yinpeng Dong, et al.

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor can be embedded in a model by poisoning the training dataset, with the intention of making the infected model give wrong predictions at inference time whenever a specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, existing techniques usually require the poisoned training data or white-box access to the model, both of which are commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method that identifies backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for making reliable predictions with models identified as backdoored. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.
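
To make the idea of query-only trigger reverse-engineering concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not name the gradient-free optimizer, so the antithetic evolution-strategies style estimator below is an assumption. The toy model, the mask-and-pattern parameterization, the function names (`make_toy_model`, `objective`, `reverse_engineer`), and all hyperparameters are also hypothetical and chosen only for illustration.

```python
import numpy as np


def make_toy_model(dim=16, n_classes=3, target=1, seed=0):
    """Hypothetical stand-in for a remote model; only `query` is exposed."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(dim, n_classes))
    W[:2, target] += 8.0  # planted shortcut: two features push toward `target`

    def query(x):
        # Returns class probabilities; the caller never sees W or gradients.
        logits = x @ W
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    return query


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective(query, theta, x, cls, lam):
    """Cross-entropy toward `cls` on stamped inputs plus an L1 penalty on the mask."""
    dim = x.shape[1]
    mask, pattern = sigmoid(theta[:dim]), sigmoid(theta[dim:])
    stamped = (1.0 - mask) * x + mask * pattern
    probs = query(stamped)
    return -np.log(probs[:, cls] + 1e-12).mean() + lam * mask.sum()


def reverse_engineer(query, x, cls, lam=0.05, steps=200, pop=20,
                     sigma=0.1, lr=0.1, seed=0):
    """Search for a small trigger for class `cls` using model queries only."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    theta = np.zeros(2 * dim)
    best_theta, best_loss = theta.copy(), objective(query, theta, x, cls, lam)
    for _ in range(steps):
        eps = rng.normal(size=(pop, theta.size))
        f_pos = np.array([objective(query, theta + sigma * e, x, cls, lam) for e in eps])
        f_neg = np.array([objective(query, theta - sigma * e, x, cls, lam) for e in eps])
        # Antithetic finite-difference estimate of the gradient (assumed optimizer).
        grad = ((f_pos - f_neg)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
        theta = theta - lr * grad
        loss = objective(query, theta, x, cls, lam)
        if loss < best_loss:  # keep the best candidate seen so far
            best_theta, best_loss = theta.copy(), loss
    return sigmoid(best_theta[:dim]), sigmoid(best_theta[dim:]), best_loss


if __name__ == "__main__":
    query = make_toy_model()
    x = np.random.default_rng(1).uniform(size=(32, 16))
    for cls in range(3):
        mask, pattern, loss = reverse_engineer(query, x, cls)
        print(cls, round(float(mask.sum()), 2), round(float(loss), 3))
```

In a setup like this, one would compare the recovered mask sizes across classes and treat a class whose mask is unusually small as a candidate backdoor target. The decision rule actually used by B3D is described in the full paper, not in this abstract.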

research
02/07/2023

SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where ad...
research
10/28/2021

AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

Deep neural networks (DNNs) are proved to be vulnerable against backdoor...
research
04/08/2022

An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks

With the surge of Machine Learning (ML), an emerging amount of intellige...
research
02/03/2021

TAD: Trigger Approximation based Black-box Trojan Detection for AI

An emerging amount of intelligent applications have been developed with ...
research
07/28/2022

Exploiting and Defending Against the Approximate Linearity of Apple's NeuralHash

Perceptual hashes map images with identical semantic content to the same...
research
02/27/2023

Online Black-Box Confidence Estimation of Deep Neural Networks

Autonomous driving (AD) and advanced driver assistance systems (ADAS) in...
research
02/16/2020

REST: Performance Improvement of a Black Box Model via RL-based Spatial Transformation

In recent years, deep neural networks (DNN) have become a highly active ...
