EX-RAY: Distinguishing Injected Backdoor from Natural Features in Neural Networks by Examining Differential Feature Symmetry

03/16/2021
by Yingqi Liu, et al.

A backdoor attack injects malicious behavior into a model so that inputs embedded with a trigger are misclassified to a target label chosen by the attacker. However, natural features may behave like triggers, causing misclassification once embedded. While such features are inevitable, mis-recognizing them as injected triggers causes false warnings in backdoor scanning. A prominent challenge is hence to distinguish natural features from injected backdoors. We develop a novel symmetric feature differencing method that identifies a smallest set of features separating two classes. A backdoor is considered injected if the corresponding trigger consists of features different from the set of features distinguishing the victim and target classes. We evaluate the technique on thousands of models, including both clean and trojaned models, from the TrojAI rounds 2-4 competitions and a number of models on ImageNet. Existing backdoor scanning techniques may produce hundreds of false positives (i.e., clean models recognized as trojaned). Our technique removes 78-100% of them, with a small increase of false negatives of 0-30%, achieving a substantial accuracy improvement, and facilitates achieving top performance on the leaderboard. It also boosts the performance of other scanners. It outperforms false-positive removal methods based on L2 distance and attribution techniques. We also demonstrate its potential in detecting a number of semantic backdoor attacks.
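The decision rule described above can be sketched in simplified form. This is not the paper's implementation: the function names, the use of mean activation differences as a stand-in for the optimized "smallest distinguishing set," and the overlap threshold are all illustrative assumptions.

```python
import numpy as np

def minimal_differential_mask(feats_victim, feats_target, k):
    # Rank features by the mean absolute activation gap between the two
    # classes and keep the k most discriminative ones -- a crude proxy
    # for the paper's smallest set of class-separating features.
    diff = np.abs(feats_victim.mean(axis=0) - feats_target.mean(axis=0))
    top_k = np.argsort(diff)[-k:]
    mask = np.zeros(diff.shape, dtype=bool)
    mask[top_k] = True
    return mask

def looks_injected(trigger_mask, natural_mask, overlap_threshold=0.5):
    # If the trigger's features barely overlap the natural set that
    # distinguishes victim from target, the trigger likely relies on
    # features injected by an attacker rather than natural ones.
    overlap = np.logical_and(trigger_mask, natural_mask).sum() / max(trigger_mask.sum(), 1)
    return overlap < overlap_threshold
```

In this sketch, a trigger whose features fall outside the natural victim-vs-target distinguishing set is flagged as injected, while a trigger that reuses those natural features is treated as a benign (natural) feature and dismissed.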


