Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

11/18/2019
by Zhen Xiang, et al.

Recently, a special type of data poisoning (DP) attack, known as a backdoor, was proposed. These attacks aim to have a classifier learn to classify to a target class whenever the backdoor pattern is present in a test sample. In this paper, we address post-training detection of perceptible backdoor patterns in DNN image classifiers, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean (unpoisoned) examples from the classification domain. This problem is challenging since a perceptible backdoor pattern could be any seemingly innocuous object in a scene, and, without the poisoned training set, we have no hint about the actual backdoor pattern used during training. We identify two important properties of perceptible backdoor patterns, based upon which we propose a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. We detect whether the trained DNN has been backdoor-attacked and infer the source and target classes used for devising the attack. Our detector, with an easily chosen threshold, is evaluated on five datasets, five DNN structures, and nine backdoor patterns, and shows strong detection capability. Coupled with an imperceptible backdoor detector, our approach helps achieve detection for all evasive backdoors of interest.
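The abstract does not spell out the detector's mechanics, but the MAMF statistic suggests the following shape: for each candidate (source, target) class pair, optimize a putative perceptible pattern on clean source-class images so as to maximize the fraction misclassified to the target class, then flag an attack when that maximum fraction is suspiciously high. Below is a minimal sketch under that reading; the function names, the fixed corner placement, the patch size, and the threshold value are all illustrative assumptions, not the paper's actual search protocol.

```python
# Minimal sketch of MAMF-style post-training detection, assuming a trained
# PyTorch classifier and a small set of clean images per class. Patch shape,
# placement, and optimizer settings are illustrative assumptions.
import torch
import torch.nn.functional as F

def estimate_mamf(model, x_source, target, patch_size=8, steps=200, lr=0.1):
    """Gradient-based estimate of the maximum achievable misclassification
    fraction (MAMF) for one (source, target) class pair."""
    model.eval()
    # Optimize the pixels of a square patch embedded at a fixed corner.
    patch = torch.rand(1, x_source.shape[1], patch_size, patch_size,
                       requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    labels = torch.full((len(x_source),), target, dtype=torch.long)
    for _ in range(steps):
        x = x_source.clone()
        x[:, :, :patch_size, :patch_size] = patch.clamp(0.0, 1.0)
        loss = F.cross_entropy(model(x), labels)  # push toward target class
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        x = x_source.clone()
        x[:, :, :patch_size, :patch_size] = patch.clamp(0.0, 1.0)
        frac = (model(x).argmax(dim=1) == target).float().mean().item()
    return frac

def detect(model, clean_by_class, threshold=0.9):
    """Flag an attack if any (source, target) pair's MAMF exceeds the
    threshold, and report that pair as the inferred attack."""
    best_s, best_t, best_mamf = None, None, 0.0
    for s in clean_by_class:
        for t in clean_by_class:
            if s == t:
                continue
            mamf = estimate_mamf(model, clean_by_class[s], t)
            if mamf > best_mamf:
                best_s, best_t, best_mamf = s, t, mamf
    return best_mamf > threshold, best_s, best_t, best_mamf
```

The plausible intuition behind thresholding: if a (source, target) pair was truly backdoor-attacked with a perceptible pattern, even a small optimized patch can drive nearly all clean source-class images to the target class, whereas for non-attacked pairs the achievable fraction stays much lower, which would explain why the abstract describes the threshold as easily chosen.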


