Detecting Backdoor in Deep Neural Networks via Intentional Adversarial Perturbations

05/29/2021
by Mingfu Xue, et al.

Recent research shows that deep learning models are susceptible to backdoor attacks, in which a backdoor embedded in the model is triggered whenever a backdoor instance arrives. In this paper, a novel backdoor detection method based on adversarial examples is proposed. The proposed method leverages intentional adversarial perturbations to detect whether an image contains a trigger, and it can be applied in two scenarios: sanitizing the training set in the training stage and detecting backdoor instances in the inference stage. Specifically, given an untrusted image, an adversarial perturbation is intentionally added to it; if the model's prediction on the perturbed image is consistent with its prediction on the unperturbed image, the input is considered a backdoor instance. The proposed adversarial perturbation based method requires low computational resources and maintains the visual quality of the images, since the added perturbation is very small. Experimental results show that the proposed defense reduces the backdoor attack success rates from as high as 99.47% to as low as 0.24%. In addition, for attacks under different settings (trigger transparency, trigger size and trigger pattern), the false acceptance rates of the proposed method are as low as 1.2% on the Fashion-MNIST, CIFAR-10 and GTSRB datasets, which demonstrates that the proposed method achieves high defense performance against backdoor attacks under different attack settings.
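The detection rule above can be sketched in a few lines of PyTorch. This is only an illustrative reconstruction of the idea described in the abstract, not the authors' implementation: it assumes an untargeted FGSM step as the intentional perturbation, a generic classifier `model`, a single image tensor `x` scaled to [0, 1], and an arbitrary perturbation budget `epsilon`.

```python
import torch
import torch.nn.functional as F


def fgsm_perturb(model, x, epsilon=0.05):
    """One FGSM step away from the model's current prediction on x.
    (Illustrative choice of perturbation; the paper's abstract does not
    specify which adversarial attack is used.)"""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Maximize the loss w.r.t. the currently predicted label.
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()


def is_backdoor_instance(model, x, epsilon=0.05):
    """Flag x as a backdoor instance if its predicted label is unchanged
    under the intentional adversarial perturbation."""
    model.eval()
    with torch.no_grad():
        clean_pred = model(x).argmax(dim=1)
    x_adv = fgsm_perturb(model, x, epsilon)
    with torch.no_grad():
        adv_pred = model(x_adv).argmax(dim=1)
    # A trigger tends to dominate the prediction, so a label that survives
    # the perturbation is treated as evidence of a backdoor instance.
    return bool((clean_pred == adv_pred).all().item())
```

In the training-stage scenario, samples flagged this way would be removed from the training set; in the inference-stage scenario, they would be rejected before the prediction is used.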

